public inbox for dev@dpdk.org
* [PATCH] eal/linux: handle epoll error conditions
@ 2026-01-28 12:20 Kevin Traynor
  2026-01-29 12:51 ` Kevin Traynor
                   ` (5 more replies)
  0 siblings, 6 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-01-28 12:20 UTC (permalink / raw)
  To: dev; +Cc: thomas, david.marchand, dsosnowski, viacheslavo, Kevin Traynor,
	stable

Add handling for epoll error conditions EPOLLERR, EPOLLHUP and
EPOLLRDHUP. These events indicate that the interrupt file descriptor
is in an error state or there has been a hangup.

This may happen when the interrupt file descriptor is deleted or for
mlx5 devices when the device is unbound from mlx5 kernel driver or if
the device is removed by the mlx5 kernel driver as part of LAG setup.

Previously, the interrupts were being read, but the condition was not
cleared and that may lead to an interrupt continuing to fire and a
busy-loop processing it.

Now when this condition is detected, an error message is logged and the
interrupt is removed to prevent busy-looping.

Also cover the case where no bytes are read even though the epoll has
indicated there is something to read.

Bugzilla ID: 1873
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---
 lib/eal/linux/eal_interrupts.c | 63 ++++++++++++++++++++++++----------
 1 file changed, 44 insertions(+), 19 deletions(-)

diff --git a/lib/eal/linux/eal_interrupts.c b/lib/eal/linux/eal_interrupts.c
index 9db978923a..eedc75d776 100644
--- a/lib/eal/linux/eal_interrupts.c
+++ b/lib/eal/linux/eal_interrupts.c
@@ -887,4 +887,21 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 }
 
+static void
+eal_intr_source_free(struct rte_intr_source *src)
+{
+	struct rte_intr_callback *cb, *next;
+
+	/* Free all callbacks */
+	for (cb = TAILQ_FIRST(&src->callbacks); cb; cb = next) {
+		next = TAILQ_NEXT(cb, next);
+		TAILQ_REMOVE(&src->callbacks, cb, next);
+		free(cb);
+	}
+
+	/* Free the interrupt source */
+	rte_intr_instance_free(src->intr_handle);
+	free(src);
+}
+
 static int
 eal_intr_process_interrupts(struct epoll_event *events, int nfds)
@@ -918,4 +935,21 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 		}
 
+		/* Check for error conditions on the fd before processing. */
+		if (events[n].events & (EPOLLRDHUP | EPOLLERR | EPOLLHUP)) {
+			EAL_LOG(WARNING, "Disconnect condition on fd %d "
+				"(events=0x%x), removing from epoll",
+				events[n].data.fd, events[n].events);
+			/*
+			 * There is an error or a hangup. Remove the
+			 * interrupt source and return to force the wait list
+			 * to be rebuilt.
+			 */
+			TAILQ_REMOVE(&intr_sources, src, next);
+			rte_spinlock_unlock(&intr_lock);
+
+			eal_intr_source_free(src);
+			return -1;
+		}
+
 		/* mark this interrupt source as active and release the lock. */
 		src->active = 1;
@@ -957,5 +991,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			 */
 			bytes_read = read(events[n].data.fd, &buf, bytes_read);
-			if (bytes_read < 0) {
+			if (bytes_read > 0) {
+				call = true;
+			} else if (bytes_read < 0) {
 				if (errno == EINTR || errno == EWOULDBLOCK)
 					continue;
@@ -965,27 +1001,16 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 					events[n].data.fd,
 					strerror(errno));
-				/*
-				 * The device is unplugged or buggy, remove
-				 * it as an interrupt source and return to
-				 * force the wait list to be rebuilt.
-				 */
+			} else { /* bytes == 0 */
+				EAL_LOG(WARNING, "Read nothing from file "
+					"descriptor %d", events[n].data.fd);
+			}
+			if (bytes_read <= 0) {
 				rte_spinlock_lock(&intr_lock);
 				TAILQ_REMOVE(&intr_sources, src, next);
 				rte_spinlock_unlock(&intr_lock);
 
-				for (cb = TAILQ_FIRST(&src->callbacks); cb;
-							cb = next) {
-					next = TAILQ_NEXT(cb, next);
-					TAILQ_REMOVE(&src->callbacks, cb, next);
-					free(cb);
-				}
-				rte_intr_instance_free(src->intr_handle);
-				free(src);
+				eal_intr_source_free(src);
 				return -1;
-			} else if (bytes_read == 0)
-				EAL_LOG(ERR, "Read nothing from file "
-					"descriptor %d", events[n].data.fd);
-			else
-				call = true;
+			}
 		}
 
-- 
2.52.0
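
The helper factored out in this patch frees a callback list while walking it, which only works because the successor is saved before the current entry is unlinked. A minimal standalone sketch of that pattern follows. The demo_cb types and function names are hypothetical stand-ins for illustration and are not DPDK's rte_intr_callback or rte_intr_source.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical callback record; not the DPDK rte_intr_callback. */
struct demo_cb {
	TAILQ_ENTRY(demo_cb) next;
	int id;
};
TAILQ_HEAD(demo_cb_list, demo_cb);

/*
 * Free every entry in the list. The successor is read before the
 * current entry is unlinked and freed, so the walk never touches
 * freed memory. Returns the number of entries freed.
 */
int
demo_cb_list_free(struct demo_cb_list *list)
{
	struct demo_cb *cb, *next;
	int freed = 0;

	for (cb = TAILQ_FIRST(list); cb != NULL; cb = next) {
		next = TAILQ_NEXT(cb, next);
		TAILQ_REMOVE(list, cb, next);
		free(cb);
		freed++;
	}
	return freed;
}

/*
 * Build a list of n entries and free it again.
 * Returns the number freed, or -1 on allocation failure or if the
 * list is unexpectedly non-empty afterwards.
 */
int
demo_build_and_free(int n)
{
	struct demo_cb_list list;
	int i, freed;

	TAILQ_INIT(&list);
	for (i = 0; i < n; i++) {
		struct demo_cb *cb = calloc(1, sizeof(*cb));

		if (cb == NULL) {
			demo_cb_list_free(&list);
			return -1;
		}
		cb->id = i;
		TAILQ_INSERT_TAIL(&list, cb, next);
	}
	freed = demo_cb_list_free(&list);
	return TAILQ_EMPTY(&list) ? freed : -1;
}
```

Calling demo_build_and_free(3) from a test program should report three entries freed and leave the list empty.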


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] eal/linux: handle epoll error conditions
  2026-01-28 12:20 [PATCH] eal/linux: handle epoll error conditions Kevin Traynor
@ 2026-01-29 12:51 ` Kevin Traynor
  2026-02-06 17:20 ` [PATCH v2 0/2] interrupt epoll event handling Kevin Traynor
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-01-29 12:51 UTC (permalink / raw)
  To: dev; +Cc: thomas, david.marchand, dsosnowski, viacheslavo, stable

On 28/01/2026 12:20, Kevin Traynor wrote:
> Add handling for epoll error conditions EPOLLERR, EPOLLHUP and
> EPOLLRDHUP. These events indicate that the interrupt file descriptor
> is in an error state or there has been a hangup.
> 
> This may happen when the interrupt file descriptor is deleted or for
> mlx5 devices when the device is unbound from mlx5 kernel driver or if
> the device is removed by the mlx5 kernel driver as part of LAG setup.
> 
> Previously, the interrupts were being read, but the condition was not
> cleared and that may lead to an interrupt continuing to fire and a
> busy-loop processing it.
> 
> Now when this condition is detected, an error message is logged and the
> interrupt is removed to prevent busy-looping.
> 
> Also cover the case where no bytes are read even though the epoll has
> indicated there is something to read.
> 
> Bugzilla ID: 1873
> Fixes: af75078fece3 ("first public release")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> ---
>  lib/eal/linux/eal_interrupts.c | 63 ++++++++++++++++++++++++----------
>  1 file changed, 44 insertions(+), 19 deletions(-)

I have reproduced the CI failure and need to investigate. I may have
been too aggressive in dealing with some event types. I will mark this
as Changes Requested. Thanks.



* [PATCH v2 0/2] interrupt epoll event handling
  2026-01-28 12:20 [PATCH] eal/linux: handle epoll error conditions Kevin Traynor
  2026-01-29 12:51 ` Kevin Traynor
@ 2026-02-06 17:20 ` Kevin Traynor
  2026-02-06 17:20   ` [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt Kevin Traynor
  2026-02-06 17:20   ` [PATCH v2 2/2] eal/linux: handle interrupt epoll events Kevin Traynor
  2026-02-10 18:06 ` [PATCH v3 0/2] interrupt epoll event handling Kevin Traynor
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-02-06 17:20 UTC (permalink / raw)
  To: dev; +Cc: thomas, david.marchand, dsosnowski, viacheslavo, Kevin Traynor

These patches fix some issues with epoll event handling for
EPOLLERR/EPOLLRDHUP/EPOLLHUP.

In the interrupt handling code, some interrupts are read directly in eal
and some just call the registered callback, which will read the interrupt.

v1 dealt too aggressively with epoll disconnect/error events by detecting
and removing the interrupt in the eal interrupt code and not calling the
registered callbacks.

This was a problem for virtio, which needs to handle these scenarios
itself so it can enable reconnect for vhost server.

In v2, if the read is not done in eal then the registered callback will
be called and it is up to handlers external to eal to deal with the
interrupt as they see fit.

This better respects interrupt types like RTE_INTR_HANDLE_EXT and
RTE_INTR_HANDLE_VDEV.

To deal with the observed issue of mlx5 devx interrupts causing a
busy-loop and 100% CPU of dpdk-intr thread, extra handling is added to
the devx interrupt handler.

1/2: deals with mlx5 devx interrupt busy-loop
2/2: deals with disconnect/error epoll events for interrupts read in eal

The patches are independent but 1/2 is the direct fix for the real life
bug observed and reported in https://bugs.dpdk.org/show_bug.cgi?id=1873.

Kevin Traynor (2):
  net/mlx5: check for no data read in devx interrupt
  eal/linux: handle interrupt epoll events

 drivers/net/mlx5/linux/mlx5_ethdev_os.c | 34 ++++++++++---
 lib/eal/linux/eal_interrupts.c          | 67 ++++++++++++++++---------
 2 files changed, 72 insertions(+), 29 deletions(-)

-- 
2.52.0
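
The busy-loop described in this series comes from level-triggered epoll: a hangup or error condition stays set on the fd, so every epoll_wait() returns immediately until the fd is removed from the epoll set. A minimal sketch of that behaviour follows, assuming Linux and using an anonymous pipe whose write end has been closed as a stand-in for a disconnected interrupt fd. The function name and its parameters are hypothetical and this is not the EAL code.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll a hung-up pipe up to max_iters times and
 * count how often epoll_wait() reports it. If remove_on_hup is
 * non-zero, the fd is deleted from the epoll set on the first
 * error/hangup event. Returns the wakeup count, or -1 on setup failure.
 */
int
demo_count_hup_wakeups(int max_iters, int remove_on_hup)
{
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event out;
	int pipefd[2];
	int epfd, i, wakeups = 0;

	if (pipe(pipefd) < 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}
	ev.data.fd = pipefd[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		close(epfd);
		return -1;
	}

	/* Closing the only writer puts the read end into a hangup state. */
	close(pipefd[1]);

	for (i = 0; i < max_iters; i++) {
		/* Zero timeout: report only what is ready right now. */
		int n = epoll_wait(epfd, &out, 1, 0);

		if (n <= 0)
			break;
		wakeups++;
		if (remove_on_hup &&
		    (out.events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)))
			epoll_ctl(epfd, EPOLL_CTL_DEL, pipefd[0], NULL);
	}

	close(pipefd[0]);
	close(epfd);
	return wakeups;
}
```

Without removal the fd is reported on every iteration; with removal it is reported once and the loop then goes quiet, which is the effect the patches aim for in the dpdk-intr thread.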



* [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt
  2026-02-06 17:20 ` [PATCH v2 0/2] interrupt epoll event handling Kevin Traynor
@ 2026-02-06 17:20   ` Kevin Traynor
  2026-02-07  6:09     ` Stephen Hemminger
  2026-02-06 17:20   ` [PATCH v2 2/2] eal/linux: handle interrupt epoll events Kevin Traynor
  1 sibling, 1 reply; 33+ messages in thread
From: Kevin Traynor @ 2026-02-06 17:20 UTC (permalink / raw)
  To: dev; +Cc: thomas, david.marchand, dsosnowski, viacheslavo, Kevin Traynor,
	stable

A busy-loop may occur when there are EPOLLERR, EPOLLHUP or
EPOLLRDHUP epoll events for the devx interrupt fd.

This may happen if the interrupt fd is deleted, if the device
is unbound from mlx5_core kernel driver or if the device is
removed by the mlx5 kernel driver as part of LAG setup.

When that occurs, there is no data to be read and in the devx
interrupt handler an EAGAIN is returned on the first call to
devx_get_async_cmd_comp, but this is not checked.

As the interrupt is not removed or condition reset, it causes
an interrupt processing busy-loop, which leads to the dpdk-intr
thread going to 100% CPU.

e.g.
epoll_wait
   (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)
   = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait
   (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)
   = -1 EAGAIN (Resource temporarily unavailable)

Add a check for an EAGAIN return from devx_get_async_cmd_comp on the
first read. If that happens, unregister the callback to prevent looping.

Bugzilla ID: 1873
Fixes: f15db67df09c ("net/mlx5: accelerate DV flow counter query")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---
 drivers/net/mlx5/linux/mlx5_ethdev_os.c | 34 ++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index 50997c187c..18b0baee04 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -859,11 +859,33 @@ mlx5_dev_interrupt_handler_devx(void *cb_arg)
 	} out;
 	uint8_t *buf = out.buf + sizeof(out.cmd_resp);
+	bool data_read = false;
+	int ret;
 
-	while (!mlx5_glue->devx_get_async_cmd_comp(sh->devx_comp,
-						   &out.cmd_resp,
-						   sizeof(out.buf)))
-		mlx5_flow_async_pool_query_handle
-			(sh, (uint64_t)out.cmd_resp.wr_id,
-			 mlx5_devx_get_out_command_status(buf));
+	while (!(ret = mlx5_glue->devx_get_async_cmd_comp(sh->devx_comp,
+							  &out.cmd_resp,
+							  sizeof(out.buf)))) {
+		data_read = true;
+		mlx5_flow_async_pool_query_handle(sh,
+			(uint64_t)out.cmd_resp.wr_id,
+			mlx5_devx_get_out_command_status(buf));
+	};
+
+	if (!data_read && ret == EAGAIN) {
+		/**
+		 * no data and EAGAIN indicate there is an error or
+		 * disconnect state. Unregister callback to prevent
+		 * interrupt busy-looping.
+		 */
+		DRV_LOG(DEBUG, "no data for mlx5 devx interrupt on fd %d",
+			rte_intr_fd_get(sh->intr_handle_devx));
+
+		if (rte_intr_callback_unregister_pending(sh->intr_handle_devx,
+						     mlx5_dev_interrupt_handler_devx,
+						     (void *)sh, NULL) < 0) {
+			DRV_LOG(WARNING,
+				"unable to unregister mlx5 devx interrupt callback on fd %d",
+				rte_intr_fd_get(sh->intr_handle_devx));
+		}
+	}
 #endif /* HAVE_IBV_DEVX_ASYNC */
 }
-- 
2.52.0
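
The check added here hinges on telling "drained after reading something" apart from "EAGAIN on the very first read". A minimal sketch of that distinction follows, using a non-blocking pipe in place of the devx completion fd. The function names are hypothetical, and plain read() stands in for devx_get_async_cmd_comp, so this illustrates the logic only and not the mlx5 glue API.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Drain a non-blocking fd and report whether the first read already
 * failed with EAGAIN, i.e. the fd was signalled but had no data.
 */
static bool
demo_first_read_was_eagain(int fd)
{
	char buf[64];
	bool data_read = false;
	int err = 0;

	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0) {
			data_read = true;
			continue;
		}
		if (n < 0 && errno == EINTR)
			continue;
		/* n == 0 is end of file; n < 0 carries the errno. */
		err = (n < 0) ? errno : 0;
		break;
	}
	return !data_read && err == EAGAIN;
}

/*
 * Create a non-blocking pipe, optionally preload some bytes, then run
 * the drain loop. Returns 1 if the "no data and EAGAIN" condition was
 * hit, 0 if not, -1 on setup failure.
 */
int
demo_check_first_read(int preload_bytes)
{
	int p[2];
	int i, result;
	char byte = 'x';

	if (pipe(p) < 0)
		return -1;
	if (fcntl(p[0], F_SETFL, O_NONBLOCK) < 0) {
		close(p[0]);
		close(p[1]);
		return -1;
	}
	for (i = 0; i < preload_bytes; i++) {
		if (write(p[1], &byte, 1) != 1) {
			close(p[0]);
			close(p[1]);
			return -1;
		}
	}
	result = demo_first_read_was_eagain(p[0]) ? 1 : 0;
	close(p[0]);
	close(p[1]);
	return result;
}
```

With no preloaded data the first read returns EAGAIN and the condition is hit; with data present the loop ends on EAGAIN only after reading, which is the normal case and should not trigger unregistration.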



* [PATCH v2 2/2] eal/linux: handle interrupt epoll events
  2026-02-06 17:20 ` [PATCH v2 0/2] interrupt epoll event handling Kevin Traynor
  2026-02-06 17:20   ` [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt Kevin Traynor
@ 2026-02-06 17:20   ` Kevin Traynor
  2026-02-07  6:11     ` Stephen Hemminger
  2026-02-10  9:17     ` David Marchand
  1 sibling, 2 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-02-06 17:20 UTC (permalink / raw)
  To: dev; +Cc: thomas, david.marchand, dsosnowski, viacheslavo, Kevin Traynor,
	stable

Add handling for epoll error and disconnect conditions EPOLLERR,
EPOLLHUP and EPOLLRDHUP.

These events indicate that the interrupt file descriptor is in
an error state or there has been a hangup.

Only do this for interrupts that are read in eal. Interrupts that
are read outside eal should deal with different interrupt scenarios
appropriate to their functionality. e.g. virtio interrupt handling
has reconnect mechanisms for some cases.

Also, treat no bytes read as an error condition.

Bugzilla ID: 1873
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---
 lib/eal/linux/eal_interrupts.c | 67 ++++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 23 deletions(-)

diff --git a/lib/eal/linux/eal_interrupts.c b/lib/eal/linux/eal_interrupts.c
index 9db978923a..68ca0f929e 100644
--- a/lib/eal/linux/eal_interrupts.c
+++ b/lib/eal/linux/eal_interrupts.c
@@ -887,4 +887,26 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 }
 
+static void
+eal_intr_source_remove_and_free(struct rte_intr_source *src)
+{
+	struct rte_intr_callback *cb, *next;
+
+	/* Remove the interrupt source */
+	rte_spinlock_lock(&intr_lock);
+	TAILQ_REMOVE(&intr_sources, src, next);
+	rte_spinlock_unlock(&intr_lock);
+
+	/* Free callbacks */
+	for (cb = TAILQ_FIRST(&src->callbacks); cb; cb = next) {
+		next = TAILQ_NEXT(cb, next);
+		TAILQ_REMOVE(&src->callbacks, cb, next);
+		free(cb);
+	}
+
+	/* Free the interrupt source */
+	rte_intr_instance_free(src->intr_handle);
+	free(src);
+}
+
 static int
 eal_intr_process_interrupts(struct epoll_event *events, int nfds)
@@ -952,4 +974,16 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 
 		if (bytes_read > 0) {
+			/**
+			 * Check for epoll error or disconnect events for
+			 * interrupts that are read directly in eal.
+			 */
+			if (events[n].events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)) {
+				EAL_LOG(INFO, "Disconnect condition on fd %d "
+					"(events=0x%x), removing from epoll",
+					events[n].data.fd, events[n].events);
+				eal_intr_source_remove_and_free(src);
+				return -1;
+			}
+
 			/**
 			 * read out to clear the ready-to-be-read flag
@@ -957,5 +991,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			 */
 			bytes_read = read(events[n].data.fd, &buf, bytes_read);
-			if (bytes_read < 0) {
+			if (bytes_read > 0) {
+				call = true;
+			} else if (bytes_read < 0) {
 				if (errno == EINTR || errno == EWOULDBLOCK)
 					continue;
@@ -965,27 +1001,12 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 					events[n].data.fd,
 					strerror(errno));
-				/*
-				 * The device is unplugged or buggy, remove
-				 * it as an interrupt source and return to
-				 * force the wait list to be rebuilt.
-				 */
-				rte_spinlock_lock(&intr_lock);
-				TAILQ_REMOVE(&intr_sources, src, next);
-				rte_spinlock_unlock(&intr_lock);
-
-				for (cb = TAILQ_FIRST(&src->callbacks); cb;
-							cb = next) {
-					next = TAILQ_NEXT(cb, next);
-					TAILQ_REMOVE(&src->callbacks, cb, next);
-					free(cb);
-				}
-				rte_intr_instance_free(src->intr_handle);
-				free(src);
-				return -1;
-			} else if (bytes_read == 0)
-				EAL_LOG(ERR, "Read nothing from file "
+			} else { /* bytes == 0 */
+				EAL_LOG(WARNING, "Read nothing from file "
 					"descriptor %d", events[n].data.fd);
-			else
-				call = true;
+			}
+			if (bytes_read <= 0) {
+				eal_intr_source_remove_and_free(src);
+				return -1;
+			}
 		}
 
-- 
2.52.0
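
The reworked read handling in this patch reduces to a three-way decision on the result of read(). A minimal sketch of that decision as a pure function follows. The enum and function names are hypothetical and only mirror the branches visible in the diff above.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical outcome of handling one read() result. */
enum demo_read_action {
	DEMO_CALL_CALLBACKS,	/* data was read: run the callbacks */
	DEMO_SKIP_EVENT,	/* transient error: move to the next event */
	DEMO_REMOVE_SOURCE,	/* hard error or empty read: drop the source */
};

/*
 * Classify a read() result. err is the errno value captured right
 * after the read and is only meaningful when bytes_read is negative.
 */
enum demo_read_action
demo_classify_read(ssize_t bytes_read, int err)
{
	if (bytes_read > 0)
		return DEMO_CALL_CALLBACKS;
	if (bytes_read < 0 && (err == EINTR || err == EWOULDBLOCK))
		return DEMO_SKIP_EVENT;
	/* Any other error, or zero bytes despite epoll reporting data. */
	return DEMO_REMOVE_SOURCE;
}
```

Keeping the zero-byte case in the same bucket as hard errors is the behavioural change relative to the old code, which only logged it.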



* Re: [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt
  2026-02-06 17:20   ` [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt Kevin Traynor
@ 2026-02-07  6:09     ` Stephen Hemminger
  2026-02-10 15:05       ` Kevin Traynor
  0 siblings, 1 reply; 33+ messages in thread
From: Stephen Hemminger @ 2026-02-07  6:09 UTC (permalink / raw)
  To: Kevin Traynor
  Cc: dev, thomas, david.marchand, dsosnowski, viacheslavo, stable

On Fri,  6 Feb 2026 17:20:53 +0000
Kevin Traynor <ktraynor@redhat.com> wrote:

> A busy-loop may occur when there are EPOLLERR, EPOLLHUP or
> EPOLLRDHUP epoll events for the devx interrupt fd.
> 
> This may happen if the interrupt fd is deleted, if the device
> is unbound from mlx5_core kernel driver or if the device is
> removed by the mlx5 kernel driver as part of LAG setup.
> 
> When that occurs, there is no data to be read and in the devx
> interrupt handler an EAGAIN is returned on the first call to
> devx_get_async_cmd_comp, but this is not checked.
> 
> As the interrupt is not removed or condition reset, it causes
> an interrupt processing busy-loop, which leads to the dpdk-intr
> thread going to 100% CPU.
> 
> e.g.
> epoll_wait
>    (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
> read(28, 0x7f1f5c7fc2f0, 40)
>    = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait
>    (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
> read(28, 0x7f1f5c7fc2f0, 40)
>    = -1 EAGAIN (Resource temporarily unavailable)
> 
> Add a check for an EAGAIN return from devx_get_async_cmd_comp on the
> first read. If that happens, unregister the callback to prevent looping.
> 
> Bugzilla ID: 1873
> Fixes: f15db67df09c ("net/mlx5: accelerate DV flow counter query")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>

AI spotted this, I didn't...


Errors:

    Line 139: Unnecessary semicolon after closing brace

   };

Should be:

   }

    Lines 142-146: Block comment uses incorrect style. Block comments in C code should use /* and */ style, not /**, which is reserved for documentation comments.

   /**
    * no data and EAGAIN indicate there is an error or
    * disconnect state. Unregister callback to prevent
    * interrupt busy-looping.
    */

Should be:

   /*
    * no data and EAGAIN indicate there is an error or
    * disconnect state. Unregister callback to prevent
    * interrupt busy-looping.
    */

Warnings:

    Logic clarity: The variable data_read is set to true inside the while loop but never checked when data WAS read. Consider if data_read is the clearest way to express this condition.


* Re: [PATCH v2 2/2] eal/linux: handle interrupt epoll events
  2026-02-06 17:20   ` [PATCH v2 2/2] eal/linux: handle interrupt epoll events Kevin Traynor
@ 2026-02-07  6:11     ` Stephen Hemminger
  2026-02-10 13:35       ` Kevin Traynor
  2026-02-10  9:17     ` David Marchand
  1 sibling, 1 reply; 33+ messages in thread
From: Stephen Hemminger @ 2026-02-07  6:11 UTC (permalink / raw)
  To: Kevin Traynor
  Cc: dev, thomas, david.marchand, dsosnowski, viacheslavo, stable

On Fri,  6 Feb 2026 17:20:54 +0000
Kevin Traynor <ktraynor@redhat.com> wrote:

> Add handling for epoll error and disconnect conditions EPOLLERR,
> EPOLLHUP and EPOLLRDHUP.
> 
> These events indicate that the interrupt file descriptor is in
> an error state or there has been a hangup.
> 
> Only do this for interrupts that are read in eal. Interrupts that
> are read outside eal should deal with different interrupt scenarios
> appropriate to their functionality. e.g. virtio interrupt handling
> has reconnect mechanisms for some cases.
> 
> Also, treat no bytes read as an error condition.
> 
> Bugzilla ID: 1873
> Fixes: af75078fece3 ("first public release")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>

Minor AI review nits.

Code Review

Errors:

    Lines 295-298: Block comment uses incorrect style. Same issue as patch 1 - should use /* not /** for non-documentation comments.

Warnings:

    Line 304: Return value inconsistency. The function returns -1 to force rebuild of the wait list, but this happens in the middle of processing multiple events. The existing code had the same pattern for error handling, so this is consistent with the codebase style.
    Line 342: Log level changed from ERR to WARNING. For a condition that causes interrupt source removal, WARNING may be too low. Consider if INFO (as used in line 300) might be more appropriate for consistency.

Overall Assessment

Both patches address a real bug (busy-looping on interrupt errors) with reasonable solutions. The main issues are code style violations with comment formatting and a minor semicolon error. The logic appears sound for handling the EAGAIN and epoll error conditions.


* Re: [PATCH v2 2/2] eal/linux: handle interrupt epoll events
  2026-02-06 17:20   ` [PATCH v2 2/2] eal/linux: handle interrupt epoll events Kevin Traynor
  2026-02-07  6:11     ` Stephen Hemminger
@ 2026-02-10  9:17     ` David Marchand
  2026-02-10 14:47       ` Kevin Traynor
  1 sibling, 1 reply; 33+ messages in thread
From: David Marchand @ 2026-02-10  9:17 UTC (permalink / raw)
  To: Kevin Traynor; +Cc: dev, thomas, dsosnowski, viacheslavo, stable, Harman Kalra

Hello Kevin,

On Fri, 6 Feb 2026 at 18:21, Kevin Traynor <ktraynor@redhat.com> wrote:
>
> Add handling for epoll error and disconnect conditions EPOLLERR,
> EPOLLHUP and EPOLLRDHUP.
>
> These events indicate that the interrupt file descriptor is in
> an error state or there has been a hangup.
>
> Only do this for interrupts that are read in eal. Interrupts that
> are read outside eal should deal with different interrupt scenarios
> appropriate to their functionality. e.g. virtio interrupt handling
> has reconnect mechanisms for some cases.
>
> Also, treat no bytes read as an error condition.
>
> Bugzilla ID: 1873
> Fixes: af75078fece3 ("first public release")
> Cc: stable@dpdk.org

Cc: Harman.

>
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> ---
>  lib/eal/linux/eal_interrupts.c | 67 ++++++++++++++++++++++------------
>  1 file changed, 44 insertions(+), 23 deletions(-)
>
> diff --git a/lib/eal/linux/eal_interrupts.c b/lib/eal/linux/eal_interrupts.c
> index 9db978923a..68ca0f929e 100644
> --- a/lib/eal/linux/eal_interrupts.c
> +++ b/lib/eal/linux/eal_interrupts.c
> @@ -887,4 +887,26 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
>  }
>
> +static void
> +eal_intr_source_remove_and_free(struct rte_intr_source *src)
> +{
> +       struct rte_intr_callback *cb, *next;
> +
> +       /* Remove the interrupt source */
> +       rte_spinlock_lock(&intr_lock);
> +       TAILQ_REMOVE(&intr_sources, src, next);
> +       rte_spinlock_unlock(&intr_lock);
> +
> +       /* Free callbacks */
> +       for (cb = TAILQ_FIRST(&src->callbacks); cb; cb = next) {
> +               next = TAILQ_NEXT(cb, next);
> +               TAILQ_REMOVE(&src->callbacks, cb, next);
> +               free(cb);
> +       }
> +
> +       /* Free the interrupt source */
> +       rte_intr_instance_free(src->intr_handle);
> +       free(src);
> +}
> +
>  static int
>  eal_intr_process_interrupts(struct epoll_event *events, int nfds)
> @@ -952,4 +974,16 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
>
>                 if (bytes_read > 0) {
> +                       /**
> +                        * Check for epoll error or disconnect events for
> +                        * interrupts that are read directly in eal.
> +                        */
> +                       if (events[n].events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)) {
> +                               EAL_LOG(INFO, "Disconnect condition on fd %d "

This is an abnormal situation, I would make this log level the same as
other logs below.

The fact that the interrupt fd gets into this state should be
something to report and investigate.


> +                                       "(events=0x%x), removing from epoll",
> +                                       events[n].data.fd, events[n].events);
> +                               eal_intr_source_remove_and_free(src);
> +                               return -1;
> +                       }
> +
>                         /**
>                          * read out to clear the ready-to-be-read flag
> @@ -957,5 +991,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
>                          */
>                         bytes_read = read(events[n].data.fd, &buf, bytes_read);
> -                       if (bytes_read < 0) {
> +                       if (bytes_read > 0) {
> +                               call = true;
> +                       } else if (bytes_read < 0) {
>                                 if (errno == EINTR || errno == EWOULDBLOCK)
>                                         continue;
> @@ -965,27 +1001,12 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
>                                         events[n].data.fd,
>                                         strerror(errno));
> -                               /*
> -                                * The device is unplugged or buggy, remove
> -                                * it as an interrupt source and return to
> -                                * force the wait list to be rebuilt.
> -                                */
> -                               rte_spinlock_lock(&intr_lock);
> -                               TAILQ_REMOVE(&intr_sources, src, next);
> -                               rte_spinlock_unlock(&intr_lock);
> -
> -                               for (cb = TAILQ_FIRST(&src->callbacks); cb;
> -                                                       cb = next) {
> -                                       next = TAILQ_NEXT(cb, next);
> -                                       TAILQ_REMOVE(&src->callbacks, cb, next);
> -                                       free(cb);
> -                               }
> -                               rte_intr_instance_free(src->intr_handle);
> -                               free(src);
> -                               return -1;
> -                       } else if (bytes_read == 0)
> -                               EAL_LOG(ERR, "Read nothing from file "
> +                       } else { /* bytes == 0 */

"bytes_read == 0", or remove this comment as the code is quite compact
and leaves little space for wondering what this else block is about.


> +                               EAL_LOG(WARNING, "Read nothing from file "

I would keep this log at the same level as the < 0 condition.
It seems the same type of error.

>                                         "descriptor %d", events[n].data.fd);

And avoid splitting the format string.


> -                       else
> -                               call = true;
> +                       }
> +                       if (bytes_read <= 0) {
> +                               eal_intr_source_remove_and_free(src);
> +                               return -1;
> +                       }
>                 }
>
> --
> 2.52.0
>

Except those nits, the fix looks correct.

Acked-by: David Marchand <david.marchand@redhat.com>




-- 
David Marchand



* Re: [PATCH v2 2/2] eal/linux: handle interrupt epoll events
  2026-02-07  6:11     ` Stephen Hemminger
@ 2026-02-10 13:35       ` Kevin Traynor
  0 siblings, 0 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-02-10 13:35 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, thomas, david.marchand, dsosnowski, viacheslavo, stable,
	Harman Kalra

On 07/02/2026 06:11, Stephen Hemminger wrote:
> On Fri,  6 Feb 2026 17:20:54 +0000
> Kevin Traynor <ktraynor@redhat.com> wrote:
> 
>> Add handling for epoll error and disconnect conditions EPOLLERR,
>> EPOLLHUP and EPOLLRDHUP.
>>
>> These events indicate that the interrupt file descriptor is in
>> an error state or there has been a hangup.
>>
>> Only do this for interrupts that are read in eal. Interrupts that
>> are read outside eal should deal with different interrupt scenarios
>> appropriate to their functionality. e.g. virtio interrupt handling
>> has reconnect mechanisms for some cases.
>>
>> Also, treat no bytes read as an error condition.
>>
>> Bugzilla ID: 1873
>> Fixes: af75078fece3 ("first public release")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> 
> Minor AI review nits.
> 

Thanks Stephen/AI. I will wait a couple of days and address these and
other comments in v3.

> Code Review
> 
> Errors:
> 
>     Lines 295-298: Block comment uses incorrect style. Same issue as patch 1 - should use /* not /** for non-documentation comments.
> 

Ack

> Warnings:
> 
>     Line 304: Return value inconsistency. The function returns -1 to force rebuild of the wait list, but this happens in the middle of processing multiple events. The existing code had the same pattern for error handling, so this is consistent with the codebase style.

Any interrupts not processed this time are not reset, so this should be
fine.

>     Line 342: Log level changed from ERR to WARNING. For a condition that causes interrupt source removal, WARNING may be too low. Consider if INFO (as used in line 300) might be more appropriate for consistency.
> 

I will review the log levels based on David's feedback.

> Overall Assessment
> 
> Both patches address a real bug (busy-looping on interrupt errors) with reasonable solutions. The main issues are code style violations with comment formatting and a minor semicolon error. The logic appears sound for handling the EAGAIN and epoll error conditions.
> 



* Re: [PATCH v2 2/2] eal/linux: handle interrupt epoll events
  2026-02-10  9:17     ` David Marchand
@ 2026-02-10 14:47       ` Kevin Traynor
  0 siblings, 0 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-02-10 14:47 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, thomas, dsosnowski, viacheslavo, stable, Harman Kalra

On 10/02/2026 09:17, David Marchand wrote:
> Hello Kevin,
> 
> On Fri, 6 Feb 2026 at 18:21, Kevin Traynor <ktraynor@redhat.com> wrote:
>>
>> Add handling for epoll error and disconnect conditions EPOLLERR,
>> EPOLLHUP and EPOLLRDHUP.
>>
>> These events indicate that the interrupt file descriptor is in
>> an error state or there has been a hangup.
>>
>> Only do this for interrupts that are read in eal. Interrupts that
>> are read outside eal should deal with different interrupt scenarios
>> appropriate to their functionality. e.g. virtio interrupt handling
>> has reconnect mechanisms for some cases.
>>
>> Also, treat no bytes read as an error condition.
>>
>> Bugzilla ID: 1873
>> Fixes: af75078fece3 ("first public release")
>> Cc: stable@dpdk.org
> 
> Cc: Harman.
> 
>>
>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
>> ---
>>  lib/eal/linux/eal_interrupts.c | 67 ++++++++++++++++++++++------------
>>  1 file changed, 44 insertions(+), 23 deletions(-)
>>
>> diff --git a/lib/eal/linux/eal_interrupts.c b/lib/eal/linux/eal_interrupts.c
>> index 9db978923a..68ca0f929e 100644
>> --- a/lib/eal/linux/eal_interrupts.c
>> +++ b/lib/eal/linux/eal_interrupts.c
>> @@ -887,4 +887,26 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
>>  }
>>
>> +static void
>> +eal_intr_source_remove_and_free(struct rte_intr_source *src)
>> +{
>> +       struct rte_intr_callback *cb, *next;
>> +
>> +       /* Remove the interrupt source */
>> +       rte_spinlock_lock(&intr_lock);
>> +       TAILQ_REMOVE(&intr_sources, src, next);
>> +       rte_spinlock_unlock(&intr_lock);
>> +
>> +       /* Free callbacks */
>> +       for (cb = TAILQ_FIRST(&src->callbacks); cb; cb = next) {
>> +               next = TAILQ_NEXT(cb, next);
>> +               TAILQ_REMOVE(&src->callbacks, cb, next);
>> +               free(cb);
>> +       }
>> +
>> +       /* Free the interrupt source */
>> +       rte_intr_instance_free(src->intr_handle);
>> +       free(src);
>> +}
>> +
>>  static int
>>  eal_intr_process_interrupts(struct epoll_event *events, int nfds)
>> @@ -952,4 +974,16 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
>>
>>                 if (bytes_read > 0) {
>> +                       /**
>> +                        * Check for epoll error or disconnect events for
>> +                        * interrupts that are read directly in eal.
>> +                        */
>> +                       if (events[n].events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)) {
>> +                               EAL_LOG(INFO, "Disconnect condition on fd %d "
> 
> This is an abnormal situation, I would make this log level the same as
> other logs below.
> 
> The fact that the interrupt fd gets into this state should be
> something to report and investigate.
> 

ok. I'll change to warning.

> 
>> +                                       "(events=0x%x), removing from epoll",
>> +                                       events[n].data.fd, events[n].events);
>> +                               eal_intr_source_remove_and_free(src);
>> +                               return -1;
>> +                       }
>> +
>>                         /**
>>                          * read out to clear the ready-to-be-read flag
>> @@ -957,5 +991,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
>>                          */
>>                         bytes_read = read(events[n].data.fd, &buf, bytes_read);
>> -                       if (bytes_read < 0) {
>> +                       if (bytes_read > 0) {
>> +                               call = true;
>> +                       } else if (bytes_read < 0) {
>>                                 if (errno == EINTR || errno == EWOULDBLOCK)
>>                                         continue;
>> @@ -965,27 +1001,12 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
>>                                         events[n].data.fd,
>>                                         strerror(errno));
>> -                               /*
>> -                                * The device is unplugged or buggy, remove
>> -                                * it as an interrupt source and return to
>> -                                * force the wait list to be rebuilt.
>> -                                */
>> -                               rte_spinlock_lock(&intr_lock);
>> -                               TAILQ_REMOVE(&intr_sources, src, next);
>> -                               rte_spinlock_unlock(&intr_lock);
>> -
>> -                               for (cb = TAILQ_FIRST(&src->callbacks); cb;
>> -                                                       cb = next) {
>> -                                       next = TAILQ_NEXT(cb, next);
>> -                                       TAILQ_REMOVE(&src->callbacks, cb, next);
>> -                                       free(cb);
>> -                               }
>> -                               rte_intr_instance_free(src->intr_handle);
>> -                               free(src);
>> -                               return -1;
>> -                       } else if (bytes_read == 0)
>> -                               EAL_LOG(ERR, "Read nothing from file "
>> +                       } else { /* bytes == 0 */
> 
> "bytes_read == 0", or remove this comment as the code is quite compact
> and leaves little space for wondering what this else block is about.
> 

Ack. I will take it as a compliment and remove the comment ;-)

> 
>> +                               EAL_LOG(WARNING, "Read nothing from file "
> 
> I would keep this log at the same level as the < 0 condition.
> It seems the same type of error.
> 
>>                                         "descriptor %d", events[n].data.fd);
> 
> And avoid splitting the format string.
> 

Ack.

> 
>> -                       else
>> -                               call = true;
>> +                       }
>> +                       if (bytes_read <= 0) {
>> +                               eal_intr_source_remove_and_free(src);
>> +                               return -1;
>> +                       }
>>                 }
>>
>> --
>> 2.52.0
>>
> 
> Except those nits, the fix looks correct.
> 
> Acked-by: David Marchand <david.marchand@redhat.com>
> 


Thanks David. I will make these changes in v3.

> 
> 
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt
  2026-02-07  6:09     ` Stephen Hemminger
@ 2026-02-10 15:05       ` Kevin Traynor
  2026-02-10 17:05         ` Slava Ovsiienko
  0 siblings, 1 reply; 33+ messages in thread
From: Kevin Traynor @ 2026-02-10 15:05 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, thomas, david.marchand, dsosnowski, viacheslavo, stable

On 07/02/2026 06:09, Stephen Hemminger wrote:
> On Fri,  6 Feb 2026 17:20:53 +0000
> Kevin Traynor <ktraynor@redhat.com> wrote:
> 
>> A busy-loop may occur when there are EPOLLERR, EPOLLHUP or
>> EPOLLRDHUP epoll events for the devx interrupt fd.
>>
>> This may happen if the interrupt fd is deleted, if the device
>> is unbound from mlx5_core kernel driver or if the device is
>> removed by the mlx5 kernel driver as part of LAG setup.
>>
>> When that occurs, there is no data to be read and in the devx
>> interrupt handler an EAGAIN is returned on the first call to
>> devx_get_async_cmd_comp, but this is not checked.
>>
>> As the interrupt is not removed or condition reset, it causes
>> an interrupt processing busy-loop, which leads to the dpdk-intr
>> thread going to 100% CPU.
>>
>> e.g.
>> epoll_wait
>>    (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
>> read(28, 0x7f1f5c7fc2f0, 40)
>>    = -1 EAGAIN (Resource temporarily unavailable)
>> epoll_wait
>>    (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
>> read(28, 0x7f1f5c7fc2f0, 40)
>>    = -1 EAGAIN (Resource temporarily unavailable)
>>
>> Add a check for an EAGAIN return from devx_get_async_cmd_comp on the
>> first read. If that happens, unregister the callback to prevent looping.
>>
>> Bugzilla ID: 1873
>> Fixes: f15db67df09c ("net/mlx5: accelerate DV flow counter query")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> 
> AI spotted this, I didn't...
> 
> 
> Errors:
> 
>     Line 139: Unnecessary semicolon after closing brace
> 
>    };
> 
> Should be:
> 
>    }
> 
>     Lines 142-146: Block comment uses incorrect style. Block comments in C code should use /* and */ style, not /**, which is reserved for documentation comments.
> 
>    /**
>     * no data and EAGAIN indicate there is an error or
>     * disconnect state. Unregister callback to prevent
>     * interrupt busy-looping.
>     */
> 
> Should be:
> 
>    /*
>     * no data and EAGAIN indicate there is an error or
>     * disconnect state. Unregister callback to prevent
>     * interrupt busy-looping.
>     */
> 
> Warnings:
> 
>     Logic clarity: The variable data_read is set to true inside the while loop but never checked when data WAS read. Consider if data_read is the clearest way to express this condition.
> 

Ack above. Thanks. Will be fixed in v3.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt
  2026-02-10 15:05       ` Kevin Traynor
@ 2026-02-10 17:05         ` Slava Ovsiienko
  2026-02-10 19:07           ` Kevin Traynor
  0 siblings, 1 reply; 33+ messages in thread
From: Slava Ovsiienko @ 2026-02-10 17:05 UTC (permalink / raw)
  To: Kevin Traynor, Stephen Hemminger
  Cc: dev@dpdk.org, NBU-Contact-Thomas Monjalon (EXTERNAL),
	david.marchand@redhat.com, Dariusz Sosnowski, stable@dpdk.org

Hi,

I'm sorry, I have some concern about the patch.

How it works, as far as I understand:

- DPDK simulates interrupts in user mode with epoll_wait()
- mlx5 PMD emits the async counter query command to the NIC periodically 
- there might be multiple async query commands in flight
- the kernel driver handles the async query completion interrupts, pushes the token to the internal completion queue and unblocks the associated fd
- epoll_wait() sees this unblocked fd and notifies the mlx5 PMD about it
- mlx5 PMD reads the completion token from the kernel queue with devx_get_async_cmd_comp()

The scenario of concern, let's assume:

- we have 2 async query commands in flight
- the first async query completes, the fd is unblocked, the PMD is invoked, and the completion is read by the PMD and is being handled
- the second async query completes, the fd gets unblocked, and the second token is written to the queue
- the PMD completes the handling of the first completion and reads the queue again (with the devx_get_async_cmd_comp() call in the loop)
- it reads the second token successfully and handles it
- then, on the third call, devx_get_async_cmd_comp() returns EAGAIN, which means the queue is empty
- DPDK calls epoll_wait() again and sees the unblocked fd
- it calls the mlx5 PMD, which calls devx_get_async_cmd_comp(), but the queue is empty (already handled in the previous interrupt handling)
- with the patch we wrongly remove the handler

In my opinion, we should handle the EPOLLERR | EPOLLHUP | EPOLLRDHUP flags from the epoll_wait() return also for
RTE_INTR_HANDLE_EXT and RTE_INTR_HANDLE_DEV_EVENT interrupt types.
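
The sequence in the scenario can be reproduced in a small standalone sketch. It is only an illustration under stated assumptions: a non-blocking pipe stands in for the devx completion fd (the real fd is owned by the mlx5 kernel driver and is not modelled here), and drain_tokens() and stale_wakeup_demo() are hypothetical names, not DPDK or mlx5 functions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the handler loop: read 8-byte tokens until
 * the queue is empty. Returns the number of tokens consumed and stores
 * the errno of the terminating read in *last_errno (0 if it was EOF).
 */
static int
drain_tokens(int fd, int *last_errno)
{
	uint64_t token;
	int count = 0;
	ssize_t n;

	assert(fd >= 0);
	while ((n = read(fd, &token, sizeof(token))) == (ssize_t)sizeof(token))
		count++;
	*last_errno = (n < 0) ? errno : 0;
	return count;
}

/*
 * Two completions are queued and both are consumed by the first handler
 * invocation. A second invocation for the already drained queue then
 * gets EAGAIN on its first read although the fd is healthy.
 * Returns 1 if exactly that sequence was observed, 0 otherwise.
 */
static int
stale_wakeup_demo(void)
{
	uint64_t tokens[2] = { 1, 2 };
	int p[2], first, second, err1, err2;

	if (pipe(p) < 0)
		return 0;
	/* Make the read end non-blocking (assumed to match the real fd). */
	if (fcntl(p[0], F_SETFL, fcntl(p[0], F_GETFL) | O_NONBLOCK) < 0 ||
	    write(p[1], tokens, sizeof(tokens)) != (ssize_t)sizeof(tokens)) {
		close(p[0]);
		close(p[1]);
		return 0;
	}

	first = drain_tokens(p[0], &err1);	/* handles both tokens */
	second = drain_tokens(p[0], &err2);	/* late wake-up: nothing left */

	close(p[0]);
	close(p[1]);
	return first == 2 && err1 == EAGAIN && second == 0 && err2 == EAGAIN;
}
```

stale_wakeup_demo() returns 1 when the second drain sees EAGAIN on its very first read while the write end is still open, which is the case where a "no data read and EAGAIN" check on its own cannot tell an empty queue from a hung fd.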

With best regards,
Slava
 





^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3 0/2] interrupt epoll event handling
  2026-01-28 12:20 [PATCH] eal/linux: handle epoll error conditions Kevin Traynor
  2026-01-29 12:51 ` Kevin Traynor
  2026-02-06 17:20 ` [PATCH v2 0/2] interrupt epoll event handling Kevin Traynor
@ 2026-02-10 18:06 ` Kevin Traynor
  2026-02-10 18:06   ` [PATCH v3 1/2] net/mlx5: check for no data read in devx interrupt Kevin Traynor
  2026-02-10 18:06   ` [PATCH v3 2/2] eal/linux: handle interrupt epoll events Kevin Traynor
  2026-02-19 14:37 ` [PATCH v4 0/3] interrupt disconnect/error event handling Kevin Traynor
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-02-10 18:06 UTC (permalink / raw)
  To: dev; +Cc: thomas, david.marchand, dsosnowski, viacheslavo, hkalra,
	Kevin Traynor

These patches are to fix some issues with epoll event handling for
EPOLLERR/EPOLLRDHUP/EPOLLHUP.

In the interrupt handling code, some interrupts are read directly in eal
and some just call a registered callback, which reads the interrupt.

v1 dealt too aggressively with epoll disconnect/error events by
detecting and removing the interrupt in the eal interrupt code and
not calling the registered callbacks.

This was a problem for virtio, which needs to handle these scenarios
itself so it can enable reconnect for vhost server.

In v2, if the read is not done in eal then the registered callback will
be called and it is up to handlers external to eal to deal with the
interrupt as they see fit.

This better respects interrupt types like RTE_INTR_HANDLE_EXT and
RTE_INTR_HANDLE_VDEV.

To deal with the observed issue of mlx5 devx interrupts causing a
busy-loop and 100% CPU of dpdk-intr thread, extra handling is added to
the devx interrupt handler.

1/2: deals with mlx5 devx interrupt busy-loop
2/2: deals with disconnect/error epoll events for interrupts read in eal

The patches are independent but 1/2 is the direct fix for the real life
bug observed and reported in https://bugs.dpdk.org/show_bug.cgi?id=1873.

v3:
- 1/2 and 2/2 fix some coding nits (Stephen/AI/David)
- 2/2 Make log level consistent (David)

Sending v3 as it is fixing some minor coding issues and is prepared.

***NOTE:*** Slava has raised some concerns on the operation of 1/2 that
will need to be discussed further.

Kevin Traynor (2):
  net/mlx5: check for no data read in devx interrupt
  eal/linux: handle interrupt epoll events

 drivers/net/mlx5/linux/mlx5_ethdev_os.c | 34 +++++++++---
 lib/eal/linux/eal_interrupts.c          | 72 ++++++++++++++++---------
 2 files changed, 74 insertions(+), 32 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3 1/2] net/mlx5: check for no data read in devx interrupt
  2026-02-10 18:06 ` [PATCH v3 0/2] interrupt epoll event handling Kevin Traynor
@ 2026-02-10 18:06   ` Kevin Traynor
  2026-02-10 18:06   ` [PATCH v3 2/2] eal/linux: handle interrupt epoll events Kevin Traynor
  1 sibling, 0 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-02-10 18:06 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, dsosnowski, viacheslavo, hkalra,
	Kevin Traynor, stable

A busy-loop may occur when there are EPOLLERR, EPOLLHUP or
EPOLLRDHUP epoll events for the devx interrupt fd.

This may happen if the interrupt fd is deleted, if the device
is unbound from mlx5_core kernel driver or if the device is
removed by the mlx5 kernel driver as part of LAG setup.

When that occurs, there is no data to be read and in the devx
interrupt handler an EAGAIN is returned on the first call to
devx_get_async_cmd_comp, but this is not checked.

As the interrupt is not removed or condition reset, it causes
an interrupt processing busy-loop, which leads to the dpdk-intr
thread going to 100% CPU.

e.g.
epoll_wait
   (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)
   = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait
   (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)
   = -1 EAGAIN (Resource temporarily unavailable)

Add a check for an EAGAIN return from devx_get_async_cmd_comp on the
first read. If that happens, unregister the callback to prevent looping.

Bugzilla ID: 1873
Fixes: f15db67df09c ("net/mlx5: accelerate DV flow counter query")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---
 drivers/net/mlx5/linux/mlx5_ethdev_os.c | 34 ++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index 50997c187c..bb4ef40e06 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -859,11 +859,33 @@ mlx5_dev_interrupt_handler_devx(void *cb_arg)
 	} out;
 	uint8_t *buf = out.buf + sizeof(out.cmd_resp);
+	bool data_read = false;
+	int ret;
 
-	while (!mlx5_glue->devx_get_async_cmd_comp(sh->devx_comp,
-						   &out.cmd_resp,
-						   sizeof(out.buf)))
-		mlx5_flow_async_pool_query_handle
-			(sh, (uint64_t)out.cmd_resp.wr_id,
-			 mlx5_devx_get_out_command_status(buf));
+	while (!(ret = mlx5_glue->devx_get_async_cmd_comp(sh->devx_comp,
+							  &out.cmd_resp,
+							  sizeof(out.buf)))) {
+		data_read = true;
+		mlx5_flow_async_pool_query_handle(sh,
+			(uint64_t)out.cmd_resp.wr_id,
+			mlx5_devx_get_out_command_status(buf));
+	}
+
+	if (!data_read && ret == EAGAIN) {
+		/*
+		 * no data and EAGAIN indicate there is an error or
+		 * disconnect state. Unregister callback to prevent
+		 * interrupt busy-looping.
+		 */
+		DRV_LOG(DEBUG, "no data for mlx5 devx interrupt on fd %d",
+			rte_intr_fd_get(sh->intr_handle_devx));
+
+		if (rte_intr_callback_unregister_pending(sh->intr_handle_devx,
+						     mlx5_dev_interrupt_handler_devx,
+						     (void *)sh, NULL) < 0) {
+			DRV_LOG(WARNING,
+				"unable to unregister mlx5 devx interrupt callback on fd %d",
+				rte_intr_fd_get(sh->intr_handle_devx));
+		}
+	}
 #endif /* HAVE_IBV_DEVX_ASYNC */
 }
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 2/2] eal/linux: handle interrupt epoll events
  2026-02-10 18:06 ` [PATCH v3 0/2] interrupt epoll event handling Kevin Traynor
  2026-02-10 18:06   ` [PATCH v3 1/2] net/mlx5: check for no data read in devx interrupt Kevin Traynor
@ 2026-02-10 18:06   ` Kevin Traynor
  1 sibling, 0 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-02-10 18:06 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, dsosnowski, viacheslavo, hkalra,
	Kevin Traynor, stable

Add handling for epoll error and disconnect conditions EPOLLERR,
EPOLLHUP and EPOLLRDHUP.

These events indicate that the interrupt file descriptor is in
an error state or there has been a hangup.

Only do this for interrupts that are read in eal. Interrupts that
are read outside eal should deal with different interrupt scenarios
appropriate to their functionality. e.g. virtio interrupt handling
has reconnect mechanisms for some cases.

Also, treat no bytes read as an error condition.

Bugzilla ID: 1873
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
---
 lib/eal/linux/eal_interrupts.c | 72 ++++++++++++++++++++++------------
 1 file changed, 46 insertions(+), 26 deletions(-)

diff --git a/lib/eal/linux/eal_interrupts.c b/lib/eal/linux/eal_interrupts.c
index 9db978923a..f3f6bdd01d 100644
--- a/lib/eal/linux/eal_interrupts.c
+++ b/lib/eal/linux/eal_interrupts.c
@@ -887,4 +887,26 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 }
 
+static void
+eal_intr_source_remove_and_free(struct rte_intr_source *src)
+{
+	struct rte_intr_callback *cb, *next;
+
+	/* Remove the interrupt source */
+	rte_spinlock_lock(&intr_lock);
+	TAILQ_REMOVE(&intr_sources, src, next);
+	rte_spinlock_unlock(&intr_lock);
+
+	/* Free callbacks */
+	for (cb = TAILQ_FIRST(&src->callbacks); cb; cb = next) {
+		next = TAILQ_NEXT(cb, next);
+		TAILQ_REMOVE(&src->callbacks, cb, next);
+		free(cb);
+	}
+
+	/* Free the interrupt source */
+	rte_intr_instance_free(src->intr_handle);
+	free(src);
+}
+
 static int
 eal_intr_process_interrupts(struct epoll_event *events, int nfds)
@@ -952,40 +974,38 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 
 		if (bytes_read > 0) {
-			/**
+			/*
+			 * Check for epoll error or disconnect events for
+			 * interrupts that are read directly in eal.
+			 */
+			if (events[n].events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)) {
+				EAL_LOG(ERR, "Disconnect condition on fd %d "
+					"(events=0x%x), removing from epoll",
+					events[n].data.fd, events[n].events);
+				eal_intr_source_remove_and_free(src);
+				return -1;
+			}
+
+			/*
 			 * read out to clear the ready-to-be-read flag
 			 * for epoll_wait.
 			 */
 			bytes_read = read(events[n].data.fd, &buf, bytes_read);
-			if (bytes_read < 0) {
+			if (bytes_read > 0) {
+				call = true;
+			} else if (bytes_read < 0) {
 				if (errno == EINTR || errno == EWOULDBLOCK)
 					continue;
 
-				EAL_LOG(ERR, "Error reading from file "
-					"descriptor %d: %s",
+				EAL_LOG(ERR, "Error reading from file descriptor %d: %s",
 					events[n].data.fd,
 					strerror(errno));
-				/*
-				 * The device is unplugged or buggy, remove
-				 * it as an interrupt source and return to
-				 * force the wait list to be rebuilt.
-				 */
-				rte_spinlock_lock(&intr_lock);
-				TAILQ_REMOVE(&intr_sources, src, next);
-				rte_spinlock_unlock(&intr_lock);
-
-				for (cb = TAILQ_FIRST(&src->callbacks); cb;
-							cb = next) {
-					next = TAILQ_NEXT(cb, next);
-					TAILQ_REMOVE(&src->callbacks, cb, next);
-					free(cb);
-				}
-				rte_intr_instance_free(src->intr_handle);
-				free(src);
+			} else {
+				EAL_LOG(ERR, "Read nothing from file descriptor %d",
+					events[n].data.fd);
+			}
+			if (bytes_read <= 0) {
+				eal_intr_source_remove_and_free(src);
 				return -1;
-			} else if (bytes_read == 0)
-				EAL_LOG(ERR, "Read nothing from file "
-					"descriptor %d", events[n].data.fd);
-			else
-				call = true;
+			}
 		}
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt
  2026-02-10 17:05         ` Slava Ovsiienko
@ 2026-02-10 19:07           ` Kevin Traynor
  2026-02-10 20:58             ` Slava Ovsiienko
  0 siblings, 1 reply; 33+ messages in thread
From: Kevin Traynor @ 2026-02-10 19:07 UTC (permalink / raw)
  To: Slava Ovsiienko, Stephen Hemminger
  Cc: dev@dpdk.org, NBU-Contact-Thomas Monjalon (EXTERNAL),
	david.marchand@redhat.com, Dariusz Sosnowski, stable@dpdk.org,
	Harman Kalra

On 10/02/2026 17:05, Slava Ovsiienko wrote:
> Hi,
> 

Hi Slava,

> I'm sorry, I have some concern about the patch.
> 

No problem, that's what reviews are for :-) thanks for reviewing.

> How it works, as far as I understand:
> 
> - DPDK simulates interrupts in user mode with epoll_wait()
> - mlx5 PMD emits the async counter query command to the NIC periodically

I didn't think this would happen unless there was something like
hardware offload, but regardless, yes I agree there may be async counter
queries.

> - there might be multiple async query commands in flight
> - the kernel driver handles the async query completion interrupts, pushes the token to the internal completion queue and unblocks the associated fd
> - epoll_wait() sees this unblocked fd and notifies the mlx5 PMD about it
> - mlx5 PMD reads the completion token from the kernel queue with devx_get_async_cmd_comp()
> 
> The scenario of concern, let's assume:
> 
> - we have 2 async query commands in flight
> - the first async query completes, the fd is unblocked, the PMD is invoked, and the completion is read by the PMD and is being handled
> - the second async query completes, the fd gets unblocked, and the second token is written to the queue
> - the PMD completes the handling of the first completion and reads the queue again (with the devx_get_async_cmd_comp() call in the loop)
> - it reads the second token successfully and handles it
> - then, on the third call, devx_get_async_cmd_comp() returns EAGAIN, which means the queue is empty
> - DPDK calls epoll_wait() again and sees the unblocked fd
> - it calls the mlx5 PMD, which calls devx_get_async_cmd_comp(), but the queue is empty (already handled in the previous interrupt handling)
> - with the patch we wrongly remove the handler
> 

I'm not sure, but this ^^^ sounds feasible.

> In my opinion, we should handle the EPOLLERR | EPOLLHUP | EPOLLRDHUP flags from the epoll_wait() return also for
> RTE_INTR_HANDLE_EXT and RTE_INTR_HANDLE_DEV_EVENT interrupt types.
> 

That's exactly what I had in v1 of the patch! The issue is that some
clients of eal interrupt may not interpret the condition of
EPOLLHUP/EPOLLRDHUP as an error condition and/or want to do some special
handling.

The example is vhost user server, which puts in place a reconnect
mechanism. If we filter out EPOLLHUP/EPOLLRDHUP events in eal, then
virtio will not receive the callback and vhost server reconnect is
broken. I have some more notes about it in the cover letter.

Trying to base on the read pattern in devx handler was an attempt to
move logic out of eal so different handlers could be flexible in how
they handle this condition.

We do have a distinction in that mlx5 uses RTE_INTR_HANDLE_EXT and
virtio uses RTE_INTR_HANDLE_VDEV but I'm not sure that is generic enough
to base a check/don't check for EPOLLHUP/EPOLLRDHUP events on.

So we'd need to come up with another solution if we wanted to filter
this in eal. Let's think more on this, though we are a bit constrained
by public API as well.

A workaround we can do from the application is David's hack™ "-a
0000:00:00.0" to skip initial probe. That will at least prevent the
issue for mlx devices not used in DPDK, which was the scenario reported.

thanks,
Kevin.

> With best regards,
> Slava
>  
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt
  2026-02-10 19:07           ` Kevin Traynor
@ 2026-02-10 20:58             ` Slava Ovsiienko
  2026-02-19 14:44               ` Kevin Traynor
  0 siblings, 1 reply; 33+ messages in thread
From: Slava Ovsiienko @ 2026-02-10 20:58 UTC (permalink / raw)
  To: Kevin Traynor, Stephen Hemminger
  Cc: dev@dpdk.org, NBU-Contact-Thomas Monjalon (EXTERNAL),
	david.marchand@redhat.com, Dariusz Sosnowski, stable@dpdk.org,
	Harman Kalra

Hi,

What about checking the EPOLLERR | EPOLLHUP | EPOLLRDHUP flags for the specific fd in the mlx5 handler?

if devx_get_async_cmd_comp() returns EAGAIN {
  if no data was read {
    call epoll_wait() for the specific fd with zero timeout
    check EPOLLERR | EPOLLHUP | EPOLLRDHUP flags
    if fd is in hang-up/error state {
      remove handler
    }
  }
}
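
A minimal sketch of the inner check, under stated assumptions: it creates a throwaway epoll instance per call for clarity, uses a pipe as a stand-in for the devx fd in the demo, and fd_is_hung() and hung_check_demo() are hypothetical names rather than existing DPDK or mlx5 functions. Removing the handler is left out.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Poll a single fd with a zero timeout and report whether epoll flags
 * it with an error or hang-up condition.
 * Returns 1 if it does, 0 if it does not, -1 if the check itself failed.
 */
static int
fd_is_hung(int fd)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP };
	struct epoll_event out;
	int epfd, n, hung = 0;

	assert(fd >= 0);
	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;

	ev.data.fd = fd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(epfd);
		return -1;
	}

	/* Zero timeout: report the current state without blocking. */
	n = epoll_wait(epfd, &out, 1, 0);
	if (n < 0)
		hung = -1;
	else if (n == 1 && (out.events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)))
		hung = 1;

	close(epfd);
	return hung;
}

/*
 * Demo with a pipe: an empty but healthy read end is not reported as
 * hung, and the same read end is reported once the writer is closed.
 * Returns 1 if both observations hold, 0 otherwise.
 */
static int
hung_check_demo(void)
{
	int p[2], before, after;

	if (pipe(p) < 0)
		return 0;
	before = fd_is_hung(p[0]);	/* writer open, queue empty */
	close(p[1]);
	after = fd_is_hung(p[0]);	/* writer gone: EPOLLHUP expected */
	close(p[0]);
	return before == 0 && after == 1;
}
```

In the handler, a check like this would only run in the "EAGAIN and no data read" branch, so the extra syscalls stay off the normal completion path. poll(2) with a zero timeout on the single fd would be a lighter alternative to creating an epoll instance.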

With best regards,
Slava

> -----Original Message-----
> From: Kevin Traynor <ktraynor@redhat.com>
> Sent: Tuesday, February 10, 2026 9:08 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; Stephen Hemminger
> <stephen@networkplumber.org>
> Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>; david.marchand@redhat.com; Dariusz Sosnowski
> <dsosnowski@nvidia.com>; stable@dpdk.org; Harman Kalra
> <hkalra@marvell.com>
> Subject: Re: [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt
> 
> On 10/02/2026 17:05, Slava Ovsiienko wrote:
> > Hi,
> >
> 
> Hi Slava,
> 
> > I'm sorry, I have some concern about the patch.
> >
> 
> No problem, that's what reviews are for :-) thanks for reviewing.
> 
> > How it works, as far as I understand:
> >
> > - DPDK simulates interrupts in user mode with epoll_wait()
> > - mlx5 PMD emits the async counter query command to the NIC
> > periodically
> 
> I didn't think this would happen unless there was something like hardware
> offload, but regardless, yes I agree there may be async counter queries.
> 
> > - there might be multiple async query commands in flight
> > - the kernel driver handles the async query completion interrupts, pushes
> > the token to the internal completion queue and unblocks the associated fd
> > - epoll_wait() sees this unblocked fd and notifies the mlx5 PMD about it
> > - mlx5 PMD reads the completion token from the kernel queue with
> > devx_get_async_cmd_comp()
> >
> > The scenario of concern, let's assume:
> >
> > - we have 2 async query commands in flight
> > - the first async query completes, fd is unblocked, PMD is invoked,
> > the completion is read by PMD and is being handled
> > - the second async query completes, fd gets unblocked, the second
> > token is written to the queue
> > - the PMD completes the handling of the first completion and reads the
> > queue again (with devx_get_async_cmd_comp() call in the loop)
> > - it reads the second token successfully and handles it
> > - then, on the third call, devx_get_async_cmd_comp() returns EAGAIN,
> > which means the queue is empty
> > - DPDK calls epoll_wait() again and sees the unblocked fd
> > - it calls the mlx5 PMD, which calls devx_get_async_cmd_comp(), but the
> > queue is empty (handled in the previous interrupt handling)
> > - with the patch we wrongly remove the handler
> >
> 
> I'm not sure, but this ^^^ sounds feasible.
> 
> > In my opinion, we should handle flags EPOLLERR | EPOLLHUP | EPOLLRDHUP
> > from the epoll_wait() return also for the RTE_INTR_HANDLE_EXT and
> > RTE_INTR_HANDLE_DEV_EVENT interrupt types.
> >
> 
> That's exactly what I had in v1 of the patch! The issue is that some clients of eal
> interrupt may not interpret the condition of EPOLLHUP/EPOLLRDHUP as an
> error condition and/or want to do some special handling.
> 
> The example is vhost user server, which puts in place a reconnect mechanism. If
> we filter out EPOLLHUP/EPOLLRDHUP events in eal, then virtio will not receive
> the callback and vhost server reconnect is broken. I have some more notes
> about it in the cover letter.
> 
> Trying to base on the read pattern in devx handler was an attempt to move logic
> out of eal so different handlers could be flexible in how they handle this
> condition.
> 
> We do have a distinction in that mlx5 uses RTE_INTR_HANDLE_EXT and virtio
> uses RTE_INTR_HANDLE_VDEV but I'm not sure that is generic enough to base a
> check/don't check for EPOLLHUP/EPOLLRDHUP events on.
> 
> So we'd need to come up with another solution if we wanted to filter this in eal.
> Let's think more on this, though we are a bit constrained by public API as well.
> 
> A workaround we can do from application is David's hack™ "-a 0000:00:00.0"
> to skip initial probe. That will at least prevent the issue for mlx devices not used
> in DPDK, which was the scenario reported.
> 
> thanks,
> Kevin.
> 
> > With best regards,
> > Slava
> >
> >
> >
> >
> >> -----Original Message-----
> >> From: Kevin Traynor <ktraynor@redhat.com>
> >> Sent: Tuesday, February 10, 2026 5:06 PM
> >> To: Stephen Hemminger <stephen@networkplumber.org>
> >> Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL)
> >> <thomas@monjalon.net>; david.marchand@redhat.com; Dariusz Sosnowski
> >> <dsosnowski@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>;
> >> stable@dpdk.org
> >> Subject: Re: [PATCH v2 1/2] net/mlx5: check for no data read in devx
> >> interrupt
> >>
> >> On 07/02/2026 06:09, Stephen Hemminger wrote:
> >>> On Fri,  6 Feb 2026 17:20:53 +0000
> >>> Kevin Traynor <ktraynor@redhat.com> wrote:
> >>>
> >>>> A busy-loop may occur when there are EPOLLERR, EPOLLHUP or EPOLLRDHUP
> >>>> epoll events for the devx interrupt fd.
> >>>>
> >>>> This may happen if the interrupt fd is deleted, if the device is
> >>>> unbound from mlx5_core kernel driver or if the device is removed by
> >>>> the mlx5 kernel driver as part of LAG setup.
> >>>>
> >>>> When that occurs, there is no data to be read and in the devx
> >>>> interrupt handler an EAGAIN is returned on the first call to
> >>>> devx_get_async_cmd_comp, but this is not checked.
> >>>>
> >>>> As the interrupt is not removed or condition reset, it causes an
> >>>> interrupt processing busy-loop, which leads to the dpdk-intr thread
> >>>> going to 100% CPU.
> >>>>
> >>>> e.g.
> >>>> epoll_wait
> >>>>    (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
> >>>> read(28, 0x7f1f5c7fc2f0, 40)
> >>>>    = -1 EAGAIN (Resource temporarily unavailable)
> >>>> epoll_wait
> >>>>    (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
> >>>> read(28, 0x7f1f5c7fc2f0, 40)
> >>>>    = -1 EAGAIN (Resource temporarily unavailable)
> >>>>
> >>>> Add a check for an EAGAIN return from devx_get_async_cmd_comp on
> >>>> the first read. If that happens, unregister the callback to prevent looping.
> >>>>
> >>>> Bugzilla ID: 1873
> >>>> Fixes: f15db67df09c ("net/mlx5: accelerate DV flow counter query")
> >>>> Cc: stable@dpdk.org
> >>>>
> >>>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> >>>
> >>> AI spotted this, I didn't...
> >>>
> >>>
> >>> Errors:
> >>>
> >>>     Line 139: Unnecessary semicolon after closing brace
> >>>
> >>>    };
> >>>
> >>> Should be:
> >>>
> >>>    }
> >>>
> >>>     Lines 142-146: Block comment uses incorrect style. Block comments
> >>> in C code should use /* and */ style, not /** which is reserved for
> >>> documentation comments.
> >>>
> >>>    /**
> >>>     * no data and EAGAIN indicate there is an error or
> >>>     * disconnect state. Unregister callback to prevent
> >>>     * interrupt busy-looping.
> >>>     */
> >>>
> >>> Should be:
> >>>
> >>>    /*
> >>>     * no data and EAGAIN indicate there is an error or
> >>>     * disconnect state. Unregister callback to prevent
> >>>     * interrupt busy-looping.
> >>>     */
> >>>
> >>> Warnings:
> >>>
> >>>     Logic clarity: The variable data_read is set to true inside the
> >>> while loop but never checked when data WAS read. Consider if data_read
> >>> is the clearest way to express this condition.
> >>>
> >>
> >> Ack above. Thanks. Will be fixed in v3.
> >


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v4 0/3] interrupt disconnect/error event handling
  2026-01-28 12:20 [PATCH] eal/linux: handle epoll error conditions Kevin Traynor
                   ` (2 preceding siblings ...)
  2026-02-10 18:06 ` [PATCH v3 0/2] interrupt epoll event handling Kevin Traynor
@ 2026-02-19 14:37 ` Kevin Traynor
  2026-02-19 14:38 ` Kevin Traynor
  2026-03-03 18:58 ` [PATCH v5 " Kevin Traynor
  5 siblings, 0 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-02-19 14:37 UTC (permalink / raw)
  To: dev; +Cc: thomas, david.marchand, dsosnowski, viacheslavo, hkalra,
	Kevin Traynor

These patches are to fix some issues with epoll event handling for
EPOLLERR/EPOLLRDHUP/EPOLLHUP.

1/3: handles these disconnect/error events for interrupts that are read
in eal

2/3: provides an API for interrupt callbacks to get the interrupt events
for the active interrupt

3/3: deal with the observed issue as reported in
https://bugs.dpdk.org/show_bug.cgi?id=1873 where mlx5 devx interrupts
cause a busy-loop and 100% CPU of dpdk-intr thread.

v4:
Updated to allow for case where devx interrupt handler may handle
multiple completions during one interrupt call, leading to no data being
read in a subsequent call as flagged by Slava.

- 1/3 No change
- 2/3 New API rte_intr_active_events() to get interrupt events
- 3/3 Use new API in mlx5 devx interrupt handler to detect
  disconnect/error events and if so unregister the callback

v3:
- 1/2 and 2/2 fix some coding nits (Stephen/AI/David)
- 2/2 Make log level consistent (David)

v2:
- Only handle disconnect/error epoll events when the read is done in eal
  interrupt code. This is to allow interrupt handlers like virtio to deal
  with disconnects in an appropriate way
- Detect if no data is read in the mlx5 devx interrupt and if so unregister
  the callback

Kevin Traynor (3):
  eal/linux: handle interrupt epoll events
  eal/interrupt: add interrupt event info
  net/mlx5: check devx disconnect/error interrupt events

 drivers/net/mlx5/linux/mlx5_ethdev_os.c |  20 +++++
 lib/eal/freebsd/eal_interrupts.c        |   7 ++
 lib/eal/include/rte_interrupts.h        |  23 ++++++
 lib/eal/linux/eal_interrupts.c          | 103 +++++++++++++++++-------
 lib/eal/windows/eal_interrupts.c        |   7 ++
 5 files changed, 133 insertions(+), 27 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v4 0/3] interrupt disconnect/error event handling
  2026-01-28 12:20 [PATCH] eal/linux: handle epoll error conditions Kevin Traynor
                   ` (3 preceding siblings ...)
  2026-02-19 14:37 ` [PATCH v4 0/3] interrupt disconnect/error event handling Kevin Traynor
@ 2026-02-19 14:38 ` Kevin Traynor
  2026-02-19 14:38   ` [PATCH v4 1/3] eal/linux: handle interrupt epoll events Kevin Traynor
                     ` (3 more replies)
  2026-03-03 18:58 ` [PATCH v5 " Kevin Traynor
  5 siblings, 4 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-02-19 14:38 UTC (permalink / raw)
  To: dev; +Cc: thomas, david.marchand, dsosnowski, viacheslavo, hkalra,
	Kevin Traynor

These patches are to fix some issues with epoll event handling for
EPOLLERR/EPOLLRDHUP/EPOLLHUP.

1/3: handles these disconnect/error events for interrupts that are read
in eal

2/3: provides an API for interrupt callbacks to get the interrupt events
for the active interrupt

3/3: deal with the observed issue as reported in
https://bugs.dpdk.org/show_bug.cgi?id=1873 where mlx5 devx interrupts
cause a busy-loop and 100% CPU of dpdk-intr thread.

v4:
Updated to allow for case where devx interrupt handler may handle
multiple completions during one interrupt call, leading to no data being
read in a subsequent call as flagged by Slava.

- 1/3 No change
- 2/3 New API rte_intr_active_events() to get interrupt events
- 3/3 Use new API in mlx5 devx interrupt handler to detect
  disconnect/error events and if so unregister the callback

v3:
- 1/2 and 2/2 fix some coding nits (Stephen/AI/David)
- 2/2 Make log level consistent (David)

v2:
- Only handle disconnect/error epoll events when the read is done in eal
  interrupt code. This is to allow interrupt handlers like virtio to deal
  with disconnects in an appropriate way
- Detect if no data is read in the mlx5 devx interrupt and if so unregister
  the callback

Kevin Traynor (3):
  eal/linux: handle interrupt epoll events
  eal/interrupt: add interrupt event info
  net/mlx5: check devx disconnect/error interrupt events

 drivers/net/mlx5/linux/mlx5_ethdev_os.c |  20 +++++
 lib/eal/freebsd/eal_interrupts.c        |   7 ++
 lib/eal/include/rte_interrupts.h        |  23 ++++++
 lib/eal/linux/eal_interrupts.c          | 103 +++++++++++++++++-------
 lib/eal/windows/eal_interrupts.c        |   7 ++
 5 files changed, 133 insertions(+), 27 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v4 1/3] eal/linux: handle interrupt epoll events
  2026-02-19 14:38 ` Kevin Traynor
@ 2026-02-19 14:38   ` Kevin Traynor
  2026-02-19 14:38   ` [PATCH v4 2/3] eal/interrupt: add interrupt event info Kevin Traynor
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-02-19 14:38 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, dsosnowski, viacheslavo, hkalra,
	Kevin Traynor, stable

Add handling for epoll error and disconnect conditions EPOLLERR,
EPOLLHUP and EPOLLRDHUP.

These events indicate that the interrupt file descriptor is in
an error state or there has been a hangup.

Only do this for interrupts that are read in eal. Interrupts that
are read outside eal should deal with disconnect/error events
appropriate to their functionality. e.g. virtio interrupt handling
has reconnect mechanisms for some cases.

Also, treat no bytes read as an error condition.

Bugzilla ID: 1873
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
---
 lib/eal/linux/eal_interrupts.c | 72 ++++++++++++++++++++++------------
 1 file changed, 46 insertions(+), 26 deletions(-)

diff --git a/lib/eal/linux/eal_interrupts.c b/lib/eal/linux/eal_interrupts.c
index 9db978923a..f3f6bdd01d 100644
--- a/lib/eal/linux/eal_interrupts.c
+++ b/lib/eal/linux/eal_interrupts.c
@@ -887,4 +887,26 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 }
 
+static void
+eal_intr_source_remove_and_free(struct rte_intr_source *src)
+{
+	struct rte_intr_callback *cb, *next;
+
+	/* Remove the interrupt source */
+	rte_spinlock_lock(&intr_lock);
+	TAILQ_REMOVE(&intr_sources, src, next);
+	rte_spinlock_unlock(&intr_lock);
+
+	/* Free callbacks */
+	for (cb = TAILQ_FIRST(&src->callbacks); cb; cb = next) {
+		next = TAILQ_NEXT(cb, next);
+		TAILQ_REMOVE(&src->callbacks, cb, next);
+		free(cb);
+	}
+
+	/* Free the interrupt source */
+	rte_intr_instance_free(src->intr_handle);
+	free(src);
+}
+
 static int
 eal_intr_process_interrupts(struct epoll_event *events, int nfds)
@@ -952,40 +974,38 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 
 		if (bytes_read > 0) {
-			/**
+			/*
+			 * Check for epoll error or disconnect events for
+			 * interrupts that are read directly in eal.
+			 */
+			if (events[n].events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)) {
+				EAL_LOG(ERR, "Disconnect condition on fd %d "
+					"(events=0x%x), removing from epoll",
+					events[n].data.fd, events[n].events);
+				eal_intr_source_remove_and_free(src);
+				return -1;
+			}
+
+			/*
 			 * read out to clear the ready-to-be-read flag
 			 * for epoll_wait.
 			 */
 			bytes_read = read(events[n].data.fd, &buf, bytes_read);
-			if (bytes_read < 0) {
+			if (bytes_read > 0) {
+				call = true;
+			} else if (bytes_read < 0) {
 				if (errno == EINTR || errno == EWOULDBLOCK)
 					continue;
 
-				EAL_LOG(ERR, "Error reading from file "
-					"descriptor %d: %s",
+				EAL_LOG(ERR, "Error reading from file descriptor %d: %s",
 					events[n].data.fd,
 					strerror(errno));
-				/*
-				 * The device is unplugged or buggy, remove
-				 * it as an interrupt source and return to
-				 * force the wait list to be rebuilt.
-				 */
-				rte_spinlock_lock(&intr_lock);
-				TAILQ_REMOVE(&intr_sources, src, next);
-				rte_spinlock_unlock(&intr_lock);
-
-				for (cb = TAILQ_FIRST(&src->callbacks); cb;
-							cb = next) {
-					next = TAILQ_NEXT(cb, next);
-					TAILQ_REMOVE(&src->callbacks, cb, next);
-					free(cb);
-				}
-				rte_intr_instance_free(src->intr_handle);
-				free(src);
+			} else {
+				EAL_LOG(ERR, "Read nothing from file descriptor %d",
+					events[n].data.fd);
+			}
+			if (bytes_read <= 0) {
+				eal_intr_source_remove_and_free(src);
 				return -1;
-			} else if (bytes_read == 0)
-				EAL_LOG(ERR, "Read nothing from file "
-					"descriptor %d", events[n].data.fd);
-			else
-				call = true;
+			}
 		}
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 2/3] eal/interrupt: add interrupt event info
  2026-02-19 14:38 ` Kevin Traynor
  2026-02-19 14:38   ` [PATCH v4 1/3] eal/linux: handle interrupt epoll events Kevin Traynor
@ 2026-02-19 14:38   ` Kevin Traynor
  2026-02-26 15:41     ` David Marchand
  2026-02-19 14:38   ` [PATCH v4 3/3] net/mlx5: check devx disconnect/error interrupt events Kevin Traynor
  2026-02-19 18:52   ` [PATCH v4 0/3] interrupt disconnect/error event handling Stephen Hemminger
  3 siblings, 1 reply; 33+ messages in thread
From: Kevin Traynor @ 2026-02-19 14:38 UTC (permalink / raw)
  To: dev; +Cc: thomas, david.marchand, dsosnowski, viacheslavo, hkalra,
	Kevin Traynor

Add RTE_INTR_EVENT_* defines and a new API rte_intr_active_events()
in order to retrieve them.

As the events are in the context of the current interrupt,
rte_intr_active_events() must be called from the context of
an interrupt callback.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---
 lib/eal/freebsd/eal_interrupts.c |  7 +++++++
 lib/eal/include/rte_interrupts.h | 23 +++++++++++++++++++++++
 lib/eal/linux/eal_interrupts.c   | 31 ++++++++++++++++++++++++++++++-
 lib/eal/windows/eal_interrupts.c |  7 +++++++
 4 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/lib/eal/freebsd/eal_interrupts.c b/lib/eal/freebsd/eal_interrupts.c
index 5c3ab6699e..aa0bd50009 100644
--- a/lib/eal/freebsd/eal_interrupts.c
+++ b/lib/eal/freebsd/eal_interrupts.c
@@ -769,2 +769,9 @@ int rte_thread_is_intr(void)
 	return rte_thread_equal(intr_thread, rte_thread_self());
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_intr_active_events, 26.03)
+uint32_t
+rte_intr_active_events(void)
+{
+	return 0;
+}
diff --git a/lib/eal/include/rte_interrupts.h b/lib/eal/include/rte_interrupts.h
index 1b9a0b2a78..bff4f98f85 100644
--- a/lib/eal/include/rte_interrupts.h
+++ b/lib/eal/include/rte_interrupts.h
@@ -40,4 +40,10 @@ struct rte_intr_handle;
 #define RTE_INTR_VEC_RXTX_OFFSET      1
 
+/** Interrupt event flags returned by rte_intr_active_events() */
+#define RTE_INTR_EVENT_IN    (1 << 0)  /**< Data available to read */
+#define RTE_INTR_EVENT_ERR   (1 << 1)  /**< Error condition on fd */
+#define RTE_INTR_EVENT_HUP   (1 << 2)  /**< Hang up / disconnect */
+#define RTE_INTR_EVENT_RDHUP (1 << 3)  /**< Read Hang up / disconnect */
+
 /**
  * The interrupt source type, e.g. UIO, VFIO, ALARM etc.
@@ -197,4 +203,21 @@ int rte_intr_ack(const struct rte_intr_handle *intr_handle);
 int rte_thread_is_intr(void);
 
+/**
+ * Return the event flags for the interrupt currently being processed.
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Must be called from an interrupt callback running on the EAL
+ * interrupt thread. The returned value is a bitmask of
+ * RTE_INTR_EVENT_* flags.
+ *
+ * @return
+ *   Active event flags, or 0 if not in interrupt context or
+ *   on platforms that do not support this feature.
+ */
+__rte_experimental
+uint32_t rte_intr_active_events(void);
+
 /**
  * It allocates memory for interrupt instance. API takes flag as an argument
diff --git a/lib/eal/linux/eal_interrupts.c b/lib/eal/linux/eal_interrupts.c
index f3f6bdd01d..43493aa299 100644
--- a/lib/eal/linux/eal_interrupts.c
+++ b/lib/eal/linux/eal_interrupts.c
@@ -41,4 +41,6 @@
 static RTE_DEFINE_PER_LCORE(int, _epfd) = -1; /**< epoll fd per thread */
 
+static uint32_t active_events; /**< events for active interrupt */
+
 /**
  * union for pipe fds.
@@ -887,4 +889,20 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 }
 
+static uint32_t
+epoll_to_intr_events(uint32_t epoll_events)
+{
+	uint32_t ev = 0;
+
+	if (epoll_events & EPOLLIN)
+		ev |= RTE_INTR_EVENT_IN;
+	if (epoll_events & EPOLLERR)
+		ev |= RTE_INTR_EVENT_ERR;
+	if (epoll_events & EPOLLHUP)
+		ev |= RTE_INTR_EVENT_HUP;
+	if (epoll_events & EPOLLRDHUP)
+		ev |= RTE_INTR_EVENT_RDHUP;
+	return ev;
+}
+
 static void
 eal_intr_source_remove_and_free(struct rte_intr_source *src)
@@ -1014,5 +1032,5 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 
 		if (call) {
-
+			active_events = epoll_to_intr_events(events[n].events);
 			/* Finally, call all callbacks. */
 			TAILQ_FOREACH(cb, &src->callbacks, next) {
@@ -1028,4 +1046,5 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 				rte_spinlock_lock(&intr_lock);
 			}
+			active_events = 0;
 		}
 		/* we done with that interrupt source, release it. */
@@ -1642,2 +1661,12 @@ int rte_thread_is_intr(void)
 	return rte_thread_equal(intr_thread, rte_thread_self());
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_intr_active_events, 26.03)
+uint32_t
+rte_intr_active_events(void)
+{
+	if (rte_thread_is_intr())
+		return active_events;
+
+	return 0;
+}
diff --git a/lib/eal/windows/eal_interrupts.c b/lib/eal/windows/eal_interrupts.c
index 5ff30c7631..1c7700eca2 100644
--- a/lib/eal/windows/eal_interrupts.c
+++ b/lib/eal/windows/eal_interrupts.c
@@ -117,4 +117,11 @@ rte_thread_is_intr(void)
 }
 
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_intr_active_events, 26.03)
+uint32_t
+rte_intr_active_events(void)
+{
+	return 0;
+}
+
 RTE_EXPORT_INTERNAL_SYMBOL(rte_intr_rx_ctl)
 int
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 3/3] net/mlx5: check devx disconnect/error interrupt events
  2026-02-19 14:38 ` Kevin Traynor
  2026-02-19 14:38   ` [PATCH v4 1/3] eal/linux: handle interrupt epoll events Kevin Traynor
  2026-02-19 14:38   ` [PATCH v4 2/3] eal/interrupt: add interrupt event info Kevin Traynor
@ 2026-02-19 14:38   ` Kevin Traynor
  2026-03-03 16:16     ` Slava Ovsiienko
  2026-02-19 18:52   ` [PATCH v4 0/3] interrupt disconnect/error event handling Stephen Hemminger
  3 siblings, 1 reply; 33+ messages in thread
From: Kevin Traynor @ 2026-02-19 14:38 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, dsosnowski, viacheslavo, hkalra,
	Kevin Traynor, stable

A busy-loop may occur when there are disconnect/error events
such as EPOLLERR, EPOLLHUP or EPOLLRDHUP on Linux for the devx
interrupt fd.

This may happen if the interrupt fd is deleted, if the device
is unbound from mlx5_core kernel driver or if the device is
removed by the mlx5 kernel driver as part of LAG setup.

As the interrupt is not removed or condition reset, it causes
an interrupt processing busy-loop, which leads to the dpdk-intr
thread going to 100% CPU.

e.g.
epoll_wait
   (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)
   = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait
   (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)
   = -1 EAGAIN (Resource temporarily unavailable)

In order to prevent a busy-loop use the eal API rte_intr_active_events()
to get the interrupt events and check for disconnect/error.

If there is a disconnect/error event, unregister the devx callback.

Bugzilla ID: 1873
Fixes: f15db67df09c ("net/mlx5: accelerate DV flow counter query")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---
 drivers/net/mlx5/linux/mlx5_ethdev_os.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index 18819a4a0f..1ddb620f65 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -860,4 +860,24 @@ mlx5_dev_interrupt_handler_devx(void *cb_arg)
 	} out;
 	uint8_t *buf = out.buf + sizeof(out.cmd_resp);
+	uint32_t events = rte_intr_active_events();
+
+	if (events & (RTE_INTR_EVENT_HUP | RTE_INTR_EVENT_RDHUP | RTE_INTR_EVENT_ERR)) {
+		/*
+		 * Disconnect or Error event that cannot be cleared by reading.
+		 * Unregister callback to prevent interrupt busy-looping.
+		 */
+		DRV_LOG(WARNING, "disconnect or error event for mlx5 devx interrupt on fd %d"
+			" (events=0x%x)",
+			rte_intr_fd_get(sh->intr_handle_devx), events);
+
+		if (rte_intr_callback_unregister_pending(sh->intr_handle_devx,
+							 mlx5_dev_interrupt_handler_devx,
+							 (void *)sh, NULL) < 0) {
+			DRV_LOG(WARNING,
+				"unable to unregister mlx5 devx interrupt callback on fd %d",
+				rte_intr_fd_get(sh->intr_handle_devx));
+		}
+		return;
+	}
 
 	while (!mlx5_glue->devx_get_async_cmd_comp(sh->devx_comp,
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt
  2026-02-10 20:58             ` Slava Ovsiienko
@ 2026-02-19 14:44               ` Kevin Traynor
  0 siblings, 0 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-02-19 14:44 UTC (permalink / raw)
  To: Slava Ovsiienko, Stephen Hemminger
  Cc: dev@dpdk.org, NBU-Contact-Thomas Monjalon (EXTERNAL),
	david.marchand@redhat.com, Dariusz Sosnowski, stable@dpdk.org,
	Harman Kalra

On 10/02/2026 20:58, Slava Ovsiienko wrote:
> Hi,
> 
> What about checking the EPOLLERR | EPOLLHUP | EPOLLRDHUP flags for the specific fd in the mlx5 handler?
> 
> if devx_get_async_cmd_comp() returns EAGAIN {
>   if no data were read {
>     call epoll_wait() for the specific fd with zero timeout
>     check EPOLLERR | EPOLLHUP | EPOLLRDHUP flags
>     if fd is in hangup/error state {
>       - remove handler
>     }
>   }
> }
> 

Thanks for the suggestion. We already have this info at the time of
callback. So I added an API to get it and made it os agnostic in v4.
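
For illustration only, the pattern can be sketched in self-contained C as below. The DEMO_* flags and demo_* functions are stand-ins that mirror the RTE_INTR_EVENT_* defines and rte_intr_active_events() from patch 2/3; they are not the DPDK symbols themselves, the interrupt-thread check is left out, and the handler only counts what a real devx handler would do.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Local stand-ins for the RTE_INTR_EVENT_* flags proposed in patch 2/3. */
#define DEMO_INTR_EVENT_IN    (1u << 0)
#define DEMO_INTR_EVENT_ERR   (1u << 1)
#define DEMO_INTR_EVENT_HUP   (1u << 2)
#define DEMO_INTR_EVENT_RDHUP (1u << 3)

typedef void (*demo_intr_cb_t)(void *arg);

/* Events of the interrupt being dispatched; 0 outside of a callback. */
static uint32_t demo_active_events;

/* Translate Linux epoll bits into the OS-agnostic flags. */
static uint32_t
demo_epoll_to_intr_events(uint32_t epoll_events)
{
	uint32_t ev = 0;

	if (epoll_events & EPOLLIN)
		ev |= DEMO_INTR_EVENT_IN;
	if (epoll_events & EPOLLERR)
		ev |= DEMO_INTR_EVENT_ERR;
	if (epoll_events & EPOLLHUP)
		ev |= DEMO_INTR_EVENT_HUP;
	if (epoll_events & EPOLLRDHUP)
		ev |= DEMO_INTR_EVENT_RDHUP;
	return ev;
}

/* Stand-in for rte_intr_active_events(), without the thread check. */
static uint32_t
demo_intr_active_events(void)
{
	return demo_active_events;
}

/* Publish the events, run the callback, then clear them again. */
static void
demo_dispatch(uint32_t epoll_events, demo_intr_cb_t cb, void *arg)
{
	demo_active_events = demo_epoll_to_intr_events(epoll_events);
	cb(arg);
	demo_active_events = 0;
}

/* Hypothetical bookkeeping so the handler's decisions can be observed. */
struct demo_handler_state {
	int completions_polled; /* times the handler went on to poll */
	int unregistered;       /* set once the handler gives up the fd */
};

/* Handler shaped like the devx one: check events before polling. */
static void
demo_devx_like_handler(void *arg)
{
	struct demo_handler_state *st = arg;
	uint32_t events = demo_intr_active_events();

	if (events & (DEMO_INTR_EVENT_HUP | DEMO_INTR_EVENT_RDHUP |
		      DEMO_INTR_EVENT_ERR)) {
		/* Cannot be cleared by reading: stop handling this fd. */
		st->unregistered = 1;
		return;
	}
	/* A healthy wake-up, even with an empty queue, is just polled. */
	st->completions_polled++;
}
```

With this shape a wake-up that only carries EPOLLIN is polled as usual, even if the queue turns out to be empty, and only a wake-up carrying a disconnect/error flag makes the handler unregister.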

Let me know if you have comments or suggestions. Thanks!

> With best regards,
> Slava
> 
>> -----Original Message-----
>> From: Kevin Traynor <ktraynor@redhat.com>
>> Sent: Tuesday, February 10, 2026 9:08 PM
>> To: Slava Ovsiienko <viacheslavo@nvidia.com>; Stephen Hemminger
>> <stephen@networkplumber.org>
>> Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL)
>> <thomas@monjalon.net>; david.marchand@redhat.com; Dariusz Sosnowski
>> <dsosnowski@nvidia.com>; stable@dpdk.org; Harman Kalra
>> <hkalra@marvell.com>
>> Subject: Re: [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt
>>
>> On 10/02/2026 17:05, Slava Ovsiienko wrote:
>>> Hi,
>>>
>>
>> Hi Slava,
>>
>>> I'm sorry, I have some concern about the patch.
>>>
>>
>> No problem, that's what reviews are for :-) thanks for reviewing.
>>
>>> How it works, as far as I understand:
>>>
>>> - DPDK simulates interrupts in user mode with epoll_wait()
>>> - mlx5 PMD emits the async counter query command to the NIC
>>> periodically
>>
>> I didn't think this would happen unless there was something like hardware
>> offload, but regardless, yes I agree there may be async counter queries.
>>
>>> - there might be multiple async query commands in flight
>>> - the kernel driver handles the async query completion interrupts, pushes
>>> the token to the internal completion queue and unblocks the associated fd
>>> - epoll_wait() sees this unblocked fd and notifies the mlx5 PMD about it
>>> - mlx5 PMD reads the completion token from the kernel queue with
>>> devx_get_async_cmd_comp()
>>>
>>> The scenario of concern, let's assume:
>>>
>>> - we have 2 async query commands in flight
>>> - the first async query completes, fd is unblocked, PMD is invoked,
>>> the completion is read by PMD and is being handled
>>> - the second async query completes, fd gets unblocked, the second
>>> token is written to the queue
>>> - the PMD completes the handling of the first completion and reads the
>>> queue again (with devx_get_async_cmd_comp() call in the loop)
>>> - it reads the second token successfully and handles it
>>> - then, on the third call, devx_get_async_cmd_comp() returns EAGAIN,
>>> which means the queue is empty
>>> - DPDK calls epoll_wait() again and sees the unblocked fd
>>> - it calls the mlx5 PMD, which calls devx_get_async_cmd_comp(), but the
>>> queue is empty (handled in the previous interrupt handling)
>>> - with the patch we wrongly remove the handler
>>>
>>
>> I'm not sure, but this ^^^ sounds feasible.
>>
>>> In my opinion, we should handle flags EPOLLERR | EPOLLHUP | EPOLLRDHUP
>>> from the epoll_wait() return also for the RTE_INTR_HANDLE_EXT and
>>> RTE_INTR_HANDLE_DEV_EVENT interrupt types.
>>>
>>
>> That's exactly what I had in v1 of the patch! The issue is that some clients of eal
>> interrupt may not interpret the condition of EPOLLHUP/EPOLLRDHUP as an
>> error condition and/or want to do some special handling.
>>
>> The example is vhost user server, which puts in place a reconnect mechanism. If
>> we filter out EPOLLHUP/EPOLLRDHUP events in eal, then virtio will not receive
>> the callback and vhost server reconnect is broken. I have some more notes
>> about it in the cover letter.
>>
>> Trying to base on the read pattern in devx handler was an attempt to move logic
>> out of eal so different handlers could be flexible in how they handle this
>> condition.
>>
>> We do have a distinction in that mlx5 uses RTE_INTR_HANDLE_EXT and virtio
>> uses RTE_INTR_HANDLE_VDEV but I'm not sure that is generic enough to base a
>> check/don't check for EPOLLHUP/EPOLLRDHUP events on.
>>
>> So we'd need to come up with another solution if we wanted to filter this in eal.
>> Let's think more on this, though we are a bit constrained by public API as well.
>>
>> A workaround we can do from application is David's hack™ "-a 0000:00:00.0"
>> to skip initial probe. That will at least prevent the issue for mlx devices not used
>> in DPDK, which was the scenario reported.
>>
>> thanks,
>> Kevin.
>>
>>> With best regards,
>>> Slava
>>>
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Kevin Traynor <ktraynor@redhat.com>
>>>> Sent: Tuesday, February 10, 2026 5:06 PM
>>>> To: Stephen Hemminger <stephen@networkplumber.org>
>>>> Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL)
>>>> <thomas@monjalon.net>; david.marchand@redhat.com; Dariusz Sosnowski
>>>> <dsosnowski@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>;
>>>> stable@dpdk.org
>>>> Subject: Re: [PATCH v2 1/2] net/mlx5: check for no data read in devx
>>>> interrupt
>>>>
>>>> On 07/02/2026 06:09, Stephen Hemminger wrote:
>>>>> On Fri,  6 Feb 2026 17:20:53 +0000
>>>>> Kevin Traynor <ktraynor@redhat.com> wrote:
>>>>>
>>>>>> A busy-loop may occur when there are EPOLLERR, EPOLLHUP or EPOLLRDHUP
>>>>>> epoll events for the devx interrupt fd.
>>>>>>
>>>>>> This may happen if the interrupt fd is deleted, if the device is
>>>>>> unbound from mlx5_core kernel driver or if the device is removed by
>>>>>> the mlx5 kernel driver as part of LAG setup.
>>>>>>
>>>>>> When that occurs, there is no data to be read and in the devx
>>>>>> interrupt handler an EAGAIN is returned on the first call to
>>>>>> devx_get_async_cmd_comp, but this is not checked.
>>>>>>
>>>>>> As the interrupt is not removed or condition reset, it causes an
>>>>>> interrupt processing busy-loop, which leads to the dpdk-intr thread
>>>>>> going to 100% CPU.
>>>>>>
>>>>>> e.g.
>>>>>> epoll_wait
>>>>>>    (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
>>>>>> read(28, 0x7f1f5c7fc2f0, 40)
>>>>>>    = -1 EAGAIN (Resource temporarily unavailable)
>>>>>> epoll_wait
>>>>>>    (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
>>>>>> read(28, 0x7f1f5c7fc2f0, 40)
>>>>>>    = -1 EAGAIN (Resource temporarily unavailable)
>>>>>>
>>>>>> Add a check for an EAGAIN return from devx_get_async_cmd_comp on
>>>>>> the first read. If that happens, unregister the callback to prevent looping.
>>>>>>
>>>>>> Bugzilla ID: 1873
>>>>>> Fixes: f15db67df09c ("net/mlx5: accelerate DV flow counter query")
>>>>>> Cc: stable@dpdk.org
>>>>>>
>>>>>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
>>>>>
>>>>> AI spotted this, I didn't...
>>>>>
>>>>>
>>>>> Errors:
>>>>>
>>>>>     Line 139: Unnecessary semicolon after closing brace
>>>>>
>>>>>    };
>>>>>
>>>>> Should be:
>>>>>
>>>>>    }
>>>>>
>>>>>     Lines 142-146: Block comment uses incorrect style. Block comments
>>>>> in C code should use /* and */ style, not /** which is reserved for
>>>>> documentation comments.
>>>>>
>>>>>    /**
>>>>>     * no data and EAGAIN indicate there is an error or
>>>>>     * disconnect state. Unregister callback to prevent
>>>>>     * interrupt busy-looping.
>>>>>     */
>>>>>
>>>>> Should be:
>>>>>
>>>>>    /*
>>>>>     * no data and EAGAIN indicate there is an error or
>>>>>     * disconnect state. Unregister callback to prevent
>>>>>     * interrupt busy-looping.
>>>>>     */
>>>>>
>>>>> Warnings:
>>>>>
>>>>>     Logic clarity: The variable data_read is set to true inside the
>>>>> while loop but never checked when data WAS read. Consider if data_read
>>>>> is the clearest way to express this condition.
>>>>>
>>>>
>>>> Ack above. Thanks. Will be fixed in v3.
>>>
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 0/3] interrupt disconnect/error event handling
  2026-02-19 14:38 ` Kevin Traynor
                     ` (2 preceding siblings ...)
  2026-02-19 14:38   ` [PATCH v4 3/3] net/mlx5: check devx disconnect/error interrupt events Kevin Traynor
@ 2026-02-19 18:52   ` Stephen Hemminger
  2026-03-02 11:41     ` Kevin Traynor
  3 siblings, 1 reply; 33+ messages in thread
From: Stephen Hemminger @ 2026-02-19 18:52 UTC (permalink / raw)
  To: Kevin Traynor
  Cc: dev, thomas, david.marchand, dsosnowski, viacheslavo, hkalra

On Thu, 19 Feb 2026 14:38:49 +0000
Kevin Traynor <ktraynor@redhat.com> wrote:

> These patches are to fix some issues with epoll event handling for
> EPOLLERR/EPOLLRDHUP/EPOLLHUP.
> 
> 1/3: handles these disconnect/error events for interrupts that are read
> in eal
> 
> 2/3: provides an API for interrupt callbacks to get the interrupt events
> for the active interrupt
> 
> 3/3: deal with the observed issue as reported in
> https://bugs.dpdk.org/show_bug.cgi?id=1873 where mlx5 devx interrupts
> cause a busy-loop and 100% CPU of dpdk-intr thread.
> 
> v4:
> Updated to allow for case where devx interrupt handler may handle
> multiple completions during one interrupt call, leading to no data being
> read in a subsequent call as flagged by Slava.
> 
> - 1/3 No change
> - 2/3 New API rte_intr_active_events() to get interrupt events
> - 3/3 Use new API in mlx5 devx interrupt handler to detect
>   disconnect/error events and if so unregister the callback
> 
> v3:
> - 1/2 and 2/2 fix some coding nits (Stephen/AI/David)
> - 2/2 Make log level consistent (David)
> 
> v2:
> - Only handle disconnect/error epoll events when the read is done in eal
>   interrupt code. This is to allow interrupt handlers like virtio to deal
>   with disconnects in an appropriate way
> - Detect if no data is read in the mlx5 devx interrupt and if so unregister
>   the callback
> 
> Kevin Traynor (3):
>   eal/linux: handle interrupt epoll events
>   eal/interrupt: add interrupt event info
>   net/mlx5: check devx disconnect/error interrupt events
> 
>  drivers/net/mlx5/linux/mlx5_ethdev_os.c |  20 +++++
>  lib/eal/freebsd/eal_interrupts.c        |   7 ++
>  lib/eal/include/rte_interrupts.h        |  23 ++++++
>  lib/eal/linux/eal_interrupts.c          | 103 +++++++++++++++++-------
>  lib/eal/windows/eal_interrupts.c        |   7 ++
>  5 files changed, 133 insertions(+), 27 deletions(-)
> 

Series-Acked-by: Stephen Hemminger <stephen@networkplumber.org>

FYI - AI review gives a long-winded version of "this patch is good".
The only worthwhile feedback was that there should be a release note for
a new API.


* Re: [PATCH v4 2/3] eal/interrupt: add interrupt event info
  2026-02-19 14:38   ` [PATCH v4 2/3] eal/interrupt: add interrupt event info Kevin Traynor
@ 2026-02-26 15:41     ` David Marchand
  2026-03-02 11:47       ` Kevin Traynor
  0 siblings, 1 reply; 33+ messages in thread
From: David Marchand @ 2026-02-26 15:41 UTC (permalink / raw)
  To: Kevin Traynor; +Cc: dev, thomas, dsosnowski, viacheslavo, hkalra

On Thu, 19 Feb 2026 at 15:39, Kevin Traynor <ktraynor@redhat.com> wrote:
>
> Add RTE_INTR_EVENT_* defines and a new API rte_intr_active_events()
> in order to retrieve them.
>
> As the events are in the context of the current interrupt,
> rte_intr_active_events() must be called from the context of
> an interrupt callback.
>
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>

I have mixed feelings about letting this API in the open.
We only have one user (the mlx5 driver), so I would mark it internal
for now, and open later if some external user asks for it.

And on the name itself, maybe: rte_intr_active_event_flags() ?

[snip]

> diff --git a/lib/eal/include/rte_interrupts.h b/lib/eal/include/rte_interrupts.h
> index 1b9a0b2a78..bff4f98f85 100644
> --- a/lib/eal/include/rte_interrupts.h
> +++ b/lib/eal/include/rte_interrupts.h
> @@ -40,4 +40,10 @@ struct rte_intr_handle;
>  #define RTE_INTR_VEC_RXTX_OFFSET      1
>
> +/** Interrupt event flags returned by rte_intr_active_events() */
> +#define RTE_INTR_EVENT_IN    (1 << 0)  /**< Data available to read */
> +#define RTE_INTR_EVENT_ERR   (1 << 1)  /**< Error condition on fd */
> +#define RTE_INTR_EVENT_HUP   (1 << 2)  /**< Hang up / disconnect */
> +#define RTE_INTR_EVENT_RDHUP (1 << 3)  /**< Read Hang up / disconnect */

Nit: RTE_BIT32()


-- 
David Marchand



* Re: [PATCH v4 0/3] interrupt disconnect/error event handling
  2026-02-19 18:52   ` [PATCH v4 0/3] interrupt disconnect/error event handling Stephen Hemminger
@ 2026-03-02 11:41     ` Kevin Traynor
  0 siblings, 0 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-03-02 11:41 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, thomas, david.marchand, dsosnowski, viacheslavo, hkalra

On 2/19/26 6:52 PM, Stephen Hemminger wrote:
> On Thu, 19 Feb 2026 14:38:49 +0000
> Kevin Traynor <ktraynor@redhat.com> wrote:
> 
>> These patches are to fix some issues with epoll event handling for
>> EPOLLERR/EPOLLRDHUP/EPOLLHUP.
>>
>> 1/3: handles these disconnect/error events for interrupts that are read
>> in eal
>>
>> 2/3: provides an API for interrupt callbacks to get the interrupt events
>> for the active interrupt
>>
>> 3/3: deal with the observed issue as reported in
>> https://bugs.dpdk.org/show_bug.cgi?id=1873 where mlx5 devx interrupts
>> cause a busy-loop and 100% CPU of dpdk-intr thread.
>>
>> v4:
>> Updated to allow for case where devx interrupt handler may handle
>> multiple completions during one interrupt call, leading to no data being
>> read in a subsequent call as flagged by Slava.
>>
>> - 1/3 No change
>> - 2/3 New API rte_intr_active_events() to get interrupt events
>> - 3/3 Use new API in mlx5 devx interrupt handler to detect
>>   disconnect/error events and if so unregister the callback
>>
>> v3:
>> - 1/2 and 2/2 fix some coding nits (Stephen/AI/David)
>> - 2/2 Make log level consistent (David)
>>
>> v2:
>> - Only handle disconnect/error epoll events when the read is done in eal
>>   interrupt code. This is to allow interrupt handlers like virtio to deal
>>   with disconnects in an appropriate way
>> - Detect if no data is read in the mlx5 devx interrupt and if so unregister
>>   the callback
>>
>> Kevin Traynor (3):
>>   eal/linux: handle interrupt epoll events
>>   eal/interrupt: add interrupt event info
>>   net/mlx5: check devx disconnect/error interrupt events
>>
>>  drivers/net/mlx5/linux/mlx5_ethdev_os.c |  20 +++++
>>  lib/eal/freebsd/eal_interrupts.c        |   7 ++
>>  lib/eal/include/rte_interrupts.h        |  23 ++++++
>>  lib/eal/linux/eal_interrupts.c          | 103 +++++++++++++++++-------
>>  lib/eal/windows/eal_interrupts.c        |   7 ++
>>  5 files changed, 133 insertions(+), 27 deletions(-)
>>
> 
> Series-Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> 
> FYI - AI review gives a long-winded version of "this patch is good".
> The only worthwhile feedback was that there should be a release note for
> a new API.
> 

Thanks Stephen. I will address this on the next version.



* Re: [PATCH v4 2/3] eal/interrupt: add interrupt event info
  2026-02-26 15:41     ` David Marchand
@ 2026-03-02 11:47       ` Kevin Traynor
  0 siblings, 0 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-03-02 11:47 UTC (permalink / raw)
  To: David Marchand, viacheslavo, dsosnowski; +Cc: dev, thomas, hkalra

On 2/26/26 3:41 PM, David Marchand wrote:
> On Thu, 19 Feb 2026 at 15:39, Kevin Traynor <ktraynor@redhat.com> wrote:
>>
>> Add RTE_INTR_EVENT_* defines and a new API rte_intr_active_events()
>> in order to retrieve them.
>>
>> As the events are in the context of the current interrupt,
>> rte_intr_active_events() must be called from the context of
>> an interrupt callback.
>>
>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> 
> I have mixed feelings about letting this API in the open.
> We only have one user (the mlx5 driver), so I would mark it internal
> for now, and open later if some external user asks for it.
> 

Sounds good to me. I think it's generic enough to be public, but as you
say the only known use at present is for mlx5, so let's keep it internal so
we are less restricted if we want to change it later.

> And on the name itself, maybe: rte_intr_active_event_flags() ?
> 

Fine for me, it adds a bit more description.

> [snip]
> 
>> diff --git a/lib/eal/include/rte_interrupts.h b/lib/eal/include/rte_interrupts.h
>> index 1b9a0b2a78..bff4f98f85 100644
>> --- a/lib/eal/include/rte_interrupts.h
>> +++ b/lib/eal/include/rte_interrupts.h
>> @@ -40,4 +40,10 @@ struct rte_intr_handle;
>>  #define RTE_INTR_VEC_RXTX_OFFSET      1
>>
>> +/** Interrupt event flags returned by rte_intr_active_events() */
>> +#define RTE_INTR_EVENT_IN    (1 << 0)  /**< Data available to read */
>> +#define RTE_INTR_EVENT_ERR   (1 << 1)  /**< Error condition on fd */
>> +#define RTE_INTR_EVENT_HUP   (1 << 2)  /**< Hang up / disconnect */
>> +#define RTE_INTR_EVENT_RDHUP (1 << 3)  /**< Read Hang up / disconnect */
> 
> Nit: RTE_BIT32()
> 
> 

Ack. I will update on next version. Thanks David.

Slava/Dariusz, does this address your concerns from the previous version?
Any other concerns/comments from an mlx5 perspective?



* RE: [PATCH v4 3/3] net/mlx5: check devx disconnect/error interrupt events
  2026-02-19 14:38   ` [PATCH v4 3/3] net/mlx5: check devx disconnect/error interrupt events Kevin Traynor
@ 2026-03-03 16:16     ` Slava Ovsiienko
  0 siblings, 0 replies; 33+ messages in thread
From: Slava Ovsiienko @ 2026-03-03 16:16 UTC (permalink / raw)
  To: Kevin Traynor, dev@dpdk.org
  Cc: NBU-Contact-Thomas Monjalon (EXTERNAL), david.marchand@redhat.com,
	Dariusz Sosnowski, hkalra@marvell.com, stable@dpdk.org

> -----Original Message-----
> From: Kevin Traynor <ktraynor@redhat.com>
> Sent: Thursday, February 19, 2026 4:39 PM
> To: dev@dpdk.org
> Cc: NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> david.marchand@redhat.com; Dariusz Sosnowski <dsosnowski@nvidia.com>;
> Slava Ovsiienko <viacheslavo@nvidia.com>; hkalra@marvell.com; Kevin
> Traynor <ktraynor@redhat.com>; stable@dpdk.org
> Subject: [PATCH v4 3/3] net/mlx5: check devx disconnect/error interrupt events
> 
> A busy-loop may occur when there are disconnect/error events such as
> EPOLLERR, EPOLLHUP or EPOLLRDHUP on Linux for the devx interrupt fd.
> 
> This may happen if the interrupt fd is deleted, if the device is unbound from
> mlx5_core kernel driver or if the device is removed by the mlx5 kernel driver as
> part of LAG setup.
> 
> As the interrupt is not removed or condition reset, it causes an interrupt
> processing busy-loop, which leads to the dpdk-intr thread going to 100% CPU.
> 
> e.g.
> epoll_wait
>    (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
> read(28, 0x7f1f5c7fc2f0, 40)
>    = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait
>    (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
> read(28, 0x7f1f5c7fc2f0, 40)
>    = -1 EAGAIN (Resource temporarily unavailable)
> 
> In order to prevent a busy-loop use the eal API rte_intr_active_events() to get
> the interrupt events and check for disconnect/error.
> 
> If there is a disconnect/error event, unregister the devx callback.
> 
> Bugzilla ID: 1873
> Fixes: f15db67df09c ("net/mlx5: accelerate DV flow counter query")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


* [PATCH v5 0/3] interrupt disconnect/error event handling
  2026-01-28 12:20 [PATCH] eal/linux: handle epoll error conditions Kevin Traynor
                   ` (4 preceding siblings ...)
  2026-02-19 14:38 ` Kevin Traynor
@ 2026-03-03 18:58 ` Kevin Traynor
  2026-03-03 18:58   ` [PATCH v5 1/3] eal/linux: handle interrupt epoll events Kevin Traynor
                     ` (3 more replies)
  5 siblings, 4 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-03-03 18:58 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, dsosnowski, viacheslavo, hkalra, stephen,
	Kevin Traynor

These patches are to fix some issues with epoll event handling for
EPOLLERR/EPOLLRDHUP/EPOLLHUP.

1/3: handles these disconnect/error events for interrupts that are read
in eal

2/3: provides an internal API for interrupt callbacks to get the interrupt
events for the active interrupt

3/3: deal with the observed issue as reported in
https://bugs.dpdk.org/show_bug.cgi?id=1873 where mlx5 devx interrupts
cause a busy-loop and 100% CPU of dpdk-intr thread.

v5:
- 2/3 changed API to rte_intr_active_events_flags() and made it internal.
  Used RTE_BIT32() for defines. Added Cc: stable. Kept Acks as there are
  no functional changes.

v4:
Updated to allow for case where devx interrupt handler may handle
multiple completions during one interrupt call, leading to no data being
read in a subsequent call as flagged by Slava.

- 1/3 No change
- 2/3 New API rte_intr_active_events() to get interrupt events
- 3/3 Use new API in mlx5 devx interrupt handler to detect
  disconnect/error events and if so unregister the callback

v3:
- 1/2 and 2/2 fix some coding nits (Stephen/AI/David)
- 2/2 Make log level consistent (David)

v2:
- Only handle disconnect/error epoll events when the read is done in eal
  interrupt code. This is to allow interrupt handlers like virtio to deal
  with disconnects in an appropriate way
- Detect if no data is read in the mlx5 devx interrupt and if so unregister
  the callback

Kevin Traynor (3):
  eal/linux: handle interrupt epoll events
  eal/interrupt: add interrupt event info
  net/mlx5: check devx disconnect/error interrupt events

 drivers/net/mlx5/linux/mlx5_ethdev_os.c |  20 +++++
 lib/eal/freebsd/eal_interrupts.c        |   7 ++
 lib/eal/include/rte_interrupts.h        |  21 +++++
 lib/eal/linux/eal_interrupts.c          | 103 +++++++++++++++++-------
 lib/eal/windows/eal_interrupts.c        |   7 ++
 5 files changed, 131 insertions(+), 27 deletions(-)

-- 
2.53.0



* [PATCH v5 1/3] eal/linux: handle interrupt epoll events
  2026-03-03 18:58 ` [PATCH v5 " Kevin Traynor
@ 2026-03-03 18:58   ` Kevin Traynor
  2026-03-03 18:58   ` [PATCH v5 2/3] eal/interrupt: add interrupt event info Kevin Traynor
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-03-03 18:58 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, dsosnowski, viacheslavo, hkalra, stephen,
	Kevin Traynor, stable

Add handling for epoll error and disconnect conditions EPOLLERR,
EPOLLHUP and EPOLLRDHUP.

These events indicate that the interrupt file descriptor is in
an error state or there has been a hangup.

Only do this for interrupts that are read in eal. Interrupts that
are read outside eal should deal with disconnect/error events
appropriate to their functionality. e.g. virtio interrupt handling
has reconnect mechanisms for some cases.

Also, treat no bytes read as an error condition.

Bugzilla ID: 1873
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/eal/linux/eal_interrupts.c | 72 ++++++++++++++++++++++------------
 1 file changed, 46 insertions(+), 26 deletions(-)

diff --git a/lib/eal/linux/eal_interrupts.c b/lib/eal/linux/eal_interrupts.c
index 9db978923a..f3f6bdd01d 100644
--- a/lib/eal/linux/eal_interrupts.c
+++ b/lib/eal/linux/eal_interrupts.c
@@ -887,4 +887,26 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 }
 
+static void
+eal_intr_source_remove_and_free(struct rte_intr_source *src)
+{
+	struct rte_intr_callback *cb, *next;
+
+	/* Remove the interrupt source */
+	rte_spinlock_lock(&intr_lock);
+	TAILQ_REMOVE(&intr_sources, src, next);
+	rte_spinlock_unlock(&intr_lock);
+
+	/* Free callbacks */
+	for (cb = TAILQ_FIRST(&src->callbacks); cb; cb = next) {
+		next = TAILQ_NEXT(cb, next);
+		TAILQ_REMOVE(&src->callbacks, cb, next);
+		free(cb);
+	}
+
+	/* Free the interrupt source */
+	rte_intr_instance_free(src->intr_handle);
+	free(src);
+}
+
 static int
 eal_intr_process_interrupts(struct epoll_event *events, int nfds)
@@ -952,40 +974,38 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 
 		if (bytes_read > 0) {
-			/**
+			/*
+			 * Check for epoll error or disconnect events for
+			 * interrupts that are read directly in eal.
+			 */
+			if (events[n].events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)) {
+				EAL_LOG(ERR, "Disconnect condition on fd %d "
+					"(events=0x%x), removing from epoll",
+					events[n].data.fd, events[n].events);
+				eal_intr_source_remove_and_free(src);
+				return -1;
+			}
+
+			/*
 			 * read out to clear the ready-to-be-read flag
 			 * for epoll_wait.
 			 */
 			bytes_read = read(events[n].data.fd, &buf, bytes_read);
-			if (bytes_read < 0) {
+			if (bytes_read > 0) {
+				call = true;
+			} else if (bytes_read < 0) {
 				if (errno == EINTR || errno == EWOULDBLOCK)
 					continue;
 
-				EAL_LOG(ERR, "Error reading from file "
-					"descriptor %d: %s",
+				EAL_LOG(ERR, "Error reading from file descriptor %d: %s",
 					events[n].data.fd,
 					strerror(errno));
-				/*
-				 * The device is unplugged or buggy, remove
-				 * it as an interrupt source and return to
-				 * force the wait list to be rebuilt.
-				 */
-				rte_spinlock_lock(&intr_lock);
-				TAILQ_REMOVE(&intr_sources, src, next);
-				rte_spinlock_unlock(&intr_lock);
-
-				for (cb = TAILQ_FIRST(&src->callbacks); cb;
-							cb = next) {
-					next = TAILQ_NEXT(cb, next);
-					TAILQ_REMOVE(&src->callbacks, cb, next);
-					free(cb);
-				}
-				rte_intr_instance_free(src->intr_handle);
-				free(src);
+			} else {
+				EAL_LOG(ERR, "Read nothing from file descriptor %d",
+					events[n].data.fd);
+			}
+			if (bytes_read <= 0) {
+				eal_intr_source_remove_and_free(src);
 				return -1;
-			} else if (bytes_read == 0)
-				EAL_LOG(ERR, "Read nothing from file "
-					"descriptor %d", events[n].data.fd);
-			else
-				call = true;
+			}
 		}
 
-- 
2.53.0

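For readers following the control flow in the diff above, here is a minimal
sketch of the two decisions the patch makes: first check the epoll events for
a disconnect/error condition, then classify the result of read(). It is not
the EAL code. The ex_ and EX_ names are hypothetical and exist only for this
illustration, and locking and source removal are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/types.h>

/* Hypothetical names, for illustration only; not part of eal_interrupts.c. */
enum ex_intr_action {
	EX_INTR_CALL,   /* data was read: run the callbacks */
	EX_INTR_SKIP,   /* transient read error: move on to the next event */
	EX_INTR_REMOVE, /* error, disconnect or empty read: drop the source */
};

/* Events that a read() on the fd cannot clear. */
#define EX_DISCONNECT_EVENTS ((uint32_t)(EPOLLERR | EPOLLHUP | EPOLLRDHUP))

/* Step 1: check the epoll events before attempting the read. */
static int
ex_is_disconnect(uint32_t epoll_events)
{
	return (epoll_events & EX_DISCONNECT_EVENTS) != 0;
}

/* Step 2: classify the read() result; err is the errno value after read(). */
static enum ex_intr_action
ex_classify_read(ssize_t bytes_read, int err)
{
	/* read(2) returns -1 on error, never anything lower. */
	assert(bytes_read >= -1);

	if (bytes_read > 0)
		return EX_INTR_CALL;
	if (bytes_read < 0 && (err == EINTR || err == EWOULDBLOCK))
		return EX_INTR_SKIP;
	/* Hard read error, or zero bytes although epoll reported data. */
	return EX_INTR_REMOVE;
}
```

In the patch, both the disconnect case and the EX_INTR_REMOVE case end in
eal_intr_source_remove_and_free() followed by return -1, which forces the
wait list to be rebuilt.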


* [PATCH v5 2/3] eal/interrupt: add interrupt event info
  2026-03-03 18:58 ` [PATCH v5 " Kevin Traynor
  2026-03-03 18:58   ` [PATCH v5 1/3] eal/linux: handle interrupt epoll events Kevin Traynor
@ 2026-03-03 18:58   ` Kevin Traynor
  2026-03-03 18:58   ` [PATCH v5 3/3] net/mlx5: check devx disconnect/error interrupt events Kevin Traynor
  2026-03-04 11:09   ` [PATCH v5 0/3] interrupt disconnect/error event handling David Marchand
  3 siblings, 0 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-03-03 18:58 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, dsosnowski, viacheslavo, hkalra, stephen,
	Kevin Traynor, stable

Add RTE_INTR_EVENT_* defines and a new internal API
rte_intr_active_events_flags() in order to retrieve them.

As the events are in the context of the current interrupt,
rte_intr_active_events_flags() must be called from the context of
an interrupt callback.

Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/eal/freebsd/eal_interrupts.c |  7 +++++++
 lib/eal/include/rte_interrupts.h | 21 +++++++++++++++++++++
 lib/eal/linux/eal_interrupts.c   | 31 ++++++++++++++++++++++++++++++-
 lib/eal/windows/eal_interrupts.c |  7 +++++++
 4 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/lib/eal/freebsd/eal_interrupts.c b/lib/eal/freebsd/eal_interrupts.c
index 5c3ab6699e..bb2bc529f2 100644
--- a/lib/eal/freebsd/eal_interrupts.c
+++ b/lib/eal/freebsd/eal_interrupts.c
@@ -769,2 +769,9 @@ int rte_thread_is_intr(void)
 	return rte_thread_equal(intr_thread, rte_thread_self());
 }
+
+RTE_EXPORT_INTERNAL_SYMBOL(rte_intr_active_events_flags)
+uint32_t
+rte_intr_active_events_flags(void)
+{
+	return 0;
+}
diff --git a/lib/eal/include/rte_interrupts.h b/lib/eal/include/rte_interrupts.h
index 1b9a0b2a78..0ae16bbe19 100644
--- a/lib/eal/include/rte_interrupts.h
+++ b/lib/eal/include/rte_interrupts.h
@@ -40,4 +40,10 @@ struct rte_intr_handle;
 #define RTE_INTR_VEC_RXTX_OFFSET      1
 
+/** Interrupt event flags returned by rte_intr_active_events_flags() */
+#define RTE_INTR_EVENT_IN    RTE_BIT32(0)  /**< Data available to read */
+#define RTE_INTR_EVENT_ERR   RTE_BIT32(1)  /**< Error condition on fd */
+#define RTE_INTR_EVENT_HUP   RTE_BIT32(2)  /**< Hang up / disconnect */
+#define RTE_INTR_EVENT_RDHUP RTE_BIT32(3)  /**< Read hang up / disconnect */
+
 /**
  * The interrupt source type, e.g. UIO, VFIO, ALARM etc.
@@ -197,4 +203,19 @@ int rte_intr_ack(const struct rte_intr_handle *intr_handle);
 int rte_thread_is_intr(void);
 
+/**
+ * @internal
+ * Return the event flags for the interrupt currently being processed.
+ *
+ * Must be called from an interrupt callback running on the EAL
+ * interrupt thread. The returned value is a bitmask of
+ * RTE_INTR_EVENT_* flags.
+ *
+ * @return
+ *   Active event flags, or 0 if not in interrupt context or
+ *   on platforms that do not support this feature.
+ */
+__rte_internal
+uint32_t rte_intr_active_events_flags(void);
+
 /**
  * It allocates memory for interrupt instance. API takes flag as an argument
diff --git a/lib/eal/linux/eal_interrupts.c b/lib/eal/linux/eal_interrupts.c
index f3f6bdd01d..5d0607effe 100644
--- a/lib/eal/linux/eal_interrupts.c
+++ b/lib/eal/linux/eal_interrupts.c
@@ -41,4 +41,6 @@
 static RTE_DEFINE_PER_LCORE(int, _epfd) = -1; /**< epoll fd per thread */
 
+static uint32_t active_events; /**< events for active interrupt */
+
 /**
  * union for pipe fds.
@@ -887,4 +889,20 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 }
 
+static uint32_t
+epoll_to_intr_events(uint32_t epoll_events)
+{
+	uint32_t ev = 0;
+
+	if (epoll_events & EPOLLIN)
+		ev |= RTE_INTR_EVENT_IN;
+	if (epoll_events & EPOLLERR)
+		ev |= RTE_INTR_EVENT_ERR;
+	if (epoll_events & EPOLLHUP)
+		ev |= RTE_INTR_EVENT_HUP;
+	if (epoll_events & EPOLLRDHUP)
+		ev |= RTE_INTR_EVENT_RDHUP;
+	return ev;
+}
+
 static void
 eal_intr_source_remove_and_free(struct rte_intr_source *src)
@@ -1014,5 +1032,5 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 
 		if (call) {
-
+			active_events = epoll_to_intr_events(events[n].events);
 			/* Finally, call all callbacks. */
 			TAILQ_FOREACH(cb, &src->callbacks, next) {
@@ -1028,4 +1046,5 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 				rte_spinlock_lock(&intr_lock);
 			}
+			active_events = 0;
 		}
 		/* we done with that interrupt source, release it. */
@@ -1642,2 +1661,12 @@ int rte_thread_is_intr(void)
 	return rte_thread_equal(intr_thread, rte_thread_self());
 }
+
+RTE_EXPORT_INTERNAL_SYMBOL(rte_intr_active_events_flags)
+uint32_t
+rte_intr_active_events_flags(void)
+{
+	if (rte_thread_is_intr())
+		return active_events;
+
+	return 0;
+}
diff --git a/lib/eal/windows/eal_interrupts.c b/lib/eal/windows/eal_interrupts.c
index 5ff30c7631..5650b84d1f 100644
--- a/lib/eal/windows/eal_interrupts.c
+++ b/lib/eal/windows/eal_interrupts.c
@@ -117,4 +117,11 @@ rte_thread_is_intr(void)
 }
 
+RTE_EXPORT_INTERNAL_SYMBOL(rte_intr_active_events_flags)
+uint32_t
+rte_intr_active_events_flags(void)
+{
+	return 0;
+}
+
 RTE_EXPORT_INTERNAL_SYMBOL(rte_intr_rx_ctl)
 int
-- 
2.53.0

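As a rough illustration of why the flags are only meaningful inside a
callback, the sketch below models the set-before/clear-after pattern from the
Linux hunk above. It is a standalone approximation, not the EAL code: the EX_
and ex_ names are hypothetical stand-ins for RTE_BIT32(), RTE_INTR_EVENT_* and
the new accessor, and the rte_thread_is_intr() check is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical stand-ins for RTE_BIT32() and RTE_INTR_EVENT_*. */
#define EX_BIT32(nr)   (UINT32_C(1) << (nr))
#define EX_EVENT_IN    EX_BIT32(0)
#define EX_EVENT_ERR   EX_BIT32(1)
#define EX_EVENT_HUP   EX_BIT32(2)
#define EX_EVENT_RDHUP EX_BIT32(3)

typedef void (*ex_cb_fn)(void *arg);

/* Events for the interrupt currently being dispatched, 0 otherwise. */
static uint32_t ex_active_events;

/* Translate epoll event bits into the platform-neutral flags. */
static uint32_t
ex_epoll_to_events(uint32_t epoll_events)
{
	uint32_t ev = 0;

	if (epoll_events & EPOLLIN)
		ev |= EX_EVENT_IN;
	if (epoll_events & EPOLLERR)
		ev |= EX_EVENT_ERR;
	if (epoll_events & EPOLLHUP)
		ev |= EX_EVENT_HUP;
	if (epoll_events & EPOLLRDHUP)
		ev |= EX_EVENT_RDHUP;
	return ev;
}

/* Accessor: returns 0 unless a callback is currently running. */
static uint32_t
ex_active_events_flags(void)
{
	return ex_active_events;
}

/* Publish the flags, run the callback, then clear the flags again. */
static void
ex_dispatch(uint32_t epoll_events, ex_cb_fn cb, void *arg)
{
	assert(cb != NULL);
	ex_active_events = ex_epoll_to_events(epoll_events);
	cb(arg);
	ex_active_events = 0;
}

/* Example callback: stores the flags it observed into *arg. */
static void
ex_record_cb(void *arg)
{
	*(uint32_t *)arg = ex_active_events_flags();
}
```

Calling ex_dispatch(EPOLLIN | EPOLLRDHUP, ex_record_cb, &seen) leaves
EX_EVENT_IN | EX_EVENT_RDHUP in seen, while a call to the accessor after the
dispatch returns 0.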


* [PATCH v5 3/3] net/mlx5: check devx disconnect/error interrupt events
  2026-03-03 18:58 ` [PATCH v5 " Kevin Traynor
  2026-03-03 18:58   ` [PATCH v5 1/3] eal/linux: handle interrupt epoll events Kevin Traynor
  2026-03-03 18:58   ` [PATCH v5 2/3] eal/interrupt: add interrupt event info Kevin Traynor
@ 2026-03-03 18:58   ` Kevin Traynor
  2026-03-04 11:09   ` [PATCH v5 0/3] interrupt disconnect/error event handling David Marchand
  3 siblings, 0 replies; 33+ messages in thread
From: Kevin Traynor @ 2026-03-03 18:58 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, dsosnowski, viacheslavo, hkalra, stephen,
	Kevin Traynor, stable

A busy-loop may occur when there are disconnect/error events
such as EPOLLERR, EPOLLHUP or EPOLLRDHUP on Linux for the devx
interrupt fd.

This may happen if the interrupt fd is deleted, if the device
is unbound from mlx5_core kernel driver or if the device is
removed by the mlx5 kernel driver as part of LAG setup.

As the interrupt is not removed or condition reset, it causes
an interrupt processing busy-loop, which leads to the dpdk-intr
thread going to 100% CPU.

e.g.
epoll_wait
   (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)
   = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait
   (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)
   = -1 EAGAIN (Resource temporarily unavailable)

In order to prevent a busy-loop use the eal API
rte_intr_active_events_flags() to get the interrupt events and check
for disconnect/error.

If there is a disconnect/error event, unregister the devx callback.

Bugzilla ID: 1873
Fixes: f15db67df09c ("net/mlx5: accelerate DV flow counter query")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_ethdev_os.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index 18819a4a0f..4bbc590e91 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -860,4 +860,24 @@ mlx5_dev_interrupt_handler_devx(void *cb_arg)
 	} out;
 	uint8_t *buf = out.buf + sizeof(out.cmd_resp);
+	uint32_t events = rte_intr_active_events_flags();
+
+	if (events & (RTE_INTR_EVENT_HUP | RTE_INTR_EVENT_RDHUP | RTE_INTR_EVENT_ERR)) {
+		/*
+		 * Disconnect or Error event that cannot be cleared by reading.
+		 * Unregister callback to prevent interrupt busy-looping.
+		 */
+		DRV_LOG(WARNING, "disconnect or error event for mlx5 devx interrupt on fd %d"
+			" (events=0x%x)",
+			rte_intr_fd_get(sh->intr_handle_devx), events);
+
+		if (rte_intr_callback_unregister_pending(sh->intr_handle_devx,
+							 mlx5_dev_interrupt_handler_devx,
+							 (void *)sh, NULL) < 0) {
+			DRV_LOG(WARNING,
+				"unable to unregister mlx5 devx interrupt callback on fd %d",
+				rte_intr_fd_get(sh->intr_handle_devx));
+		}
+		return;
+	}
 
 	while (!mlx5_glue->devx_get_async_cmd_comp(sh->devx_comp,
-- 
2.53.0

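The busy-loop described in the commit message comes from epoll being
level-triggered: a hangup condition is reported on every epoll_wait() call
until the fd is removed. The sketch below reproduces that behaviour with a
Unix socketpair rather than a devx fd, so it is only an analogy under that
assumption (a closed socket peer does not return EAGAIN on read the way the
devx fd does). The function name is hypothetical.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Poll a socket whose peer has gone away 'rounds' times and count how
 * often a disconnect/error event is reported. If remove_on_hangup is
 * non-zero, the fd is deleted from the epoll set on the first report.
 * Returns the number of wakeups, or -1 on setup failure.
 */
static int
ex_count_hangup_wakeups(int rounds, int remove_on_hangup)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP };
	struct epoll_event out;
	int sv[2];
	int epfd, i, wakeups = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	ev.data.fd = sv[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0) {
		close(epfd);
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	/* The peer goes away, similar to the device being unbound. */
	close(sv[1]);

	for (i = 0; i < rounds; i++) {
		/* Timeout 0: return immediately if nothing is pending. */
		if (epoll_wait(epfd, &out, 1, 0) != 1)
			continue;
		assert(out.data.fd == sv[0]);
		if (!(out.events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)))
			continue;
		wakeups++;
		/* The fix: stop watching an fd that will never recover. */
		if (remove_on_hangup)
			epoll_ctl(epfd, EPOLL_CTL_DEL, sv[0], NULL);
	}

	close(epfd);
	close(sv[0]);
	return wakeups;
}
```

Without removal, every round reports the hangup again, which with a blocking
epoll_wait() turns into the 100% CPU loop seen in the strace output. With
removal, only the first round reports it.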


* Re: [PATCH v5 0/3] interrupt disconnect/error event handling
  2026-03-03 18:58 ` [PATCH v5 " Kevin Traynor
                     ` (2 preceding siblings ...)
  2026-03-03 18:58   ` [PATCH v5 3/3] net/mlx5: check devx disconnect/error interrupt events Kevin Traynor
@ 2026-03-04 11:09   ` David Marchand
  3 siblings, 0 replies; 33+ messages in thread
From: David Marchand @ 2026-03-04 11:09 UTC (permalink / raw)
  To: Kevin Traynor; +Cc: dev, thomas, dsosnowski, viacheslavo, hkalra, stephen

On Tue, 3 Mar 2026 at 19:58, Kevin Traynor <ktraynor@redhat.com> wrote:
>
> These patches are to fix some issues with epoll event handling for
> EPOLLERR/EPOLLRDHUP/EPOLLHUP.
>
> 1/3: handles these disconnect/error events for interrupts that are read
> in eal
>
> 2/3: provides an internal API for interrupt callbacks to get the interrupt
> events for the active interrupt
>
> 3/3: deal with the observed issue as reported in
> https://bugs.dpdk.org/show_bug.cgi?id=1873 where mlx5 devx interrupts
> cause a busy-loop and 100% CPU of dpdk-intr thread.
>
> v5:
> - 2/3 changed API to rte_intr_active_events_flags() and made it internal.
>   Used RTE_BIT32() for defines. Added Cc: stable. Kept Acks as there are
>   no functional changes.
>
> v4:
> Updated to allow for case where devx interrupt handler may handle
> multiple completions during one interrupt call, leading to no data being
> read in a subsequent call as flagged by Slava.
>
> - 1/3 No change
> - 2/3 New API rte_intr_active_events() to get interrupt events
> - 3/3 Use new API in mlx5 devx interrupt handler to detect
>   disconnect/error events and if so unregister the callback
>
> v3:
> - 1/2 and 2/2 fix some coding nits (Stephen/AI/David)
> - 2/2 Make log level consistent (David)
>
> v2:
> - Only handle disconnect/error epoll events when the read is done in eal
>   interrupt code. This is to allow interrupt handlers like virtio to deal
>   with disconnects in an appropriate way
> - Detect if no data is read in the mlx5 devx interrupt and if so unregister
>   the callback
>
> Kevin Traynor (3):
>   eal/linux: handle interrupt epoll events
>   eal/interrupt: add interrupt event info
>   net/mlx5: check devx disconnect/error interrupt events
>
>  drivers/net/mlx5/linux/mlx5_ethdev_os.c |  20 +++++
>  lib/eal/freebsd/eal_interrupts.c        |   7 ++
>  lib/eal/include/rte_interrupts.h        |  21 +++++
>  lib/eal/linux/eal_interrupts.c          | 103 +++++++++++++++++-------
>  lib/eal/windows/eal_interrupts.c        |   7 ++
>  5 files changed, 131 insertions(+), 27 deletions(-)

Series applied, thanks Kevin.


-- 
David Marchand



end of thread, other threads:[~2026-03-04 11:09 UTC | newest]

Thread overview: 33+ messages
2026-01-28 12:20 [PATCH] eal/linux: handle epoll error conditions Kevin Traynor
2026-01-29 12:51 ` Kevin Traynor
2026-02-06 17:20 ` [PATCH v2 0/2] interrupt epoll event handling Kevin Traynor
2026-02-06 17:20   ` [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt Kevin Traynor
2026-02-07  6:09     ` Stephen Hemminger
2026-02-10 15:05       ` Kevin Traynor
2026-02-10 17:05         ` Slava Ovsiienko
2026-02-10 19:07           ` Kevin Traynor
2026-02-10 20:58             ` Slava Ovsiienko
2026-02-19 14:44               ` Kevin Traynor
2026-02-06 17:20   ` [PATCH v2 2/2] eal/linux: handle interrupt epoll events Kevin Traynor
2026-02-07  6:11     ` Stephen Hemminger
2026-02-10 13:35       ` Kevin Traynor
2026-02-10  9:17     ` David Marchand
2026-02-10 14:47       ` Kevin Traynor
2026-02-10 18:06 ` [PATCH v3 0/2] interrupt epoll event handling Kevin Traynor
2026-02-10 18:06   ` [PATCH v3 1/2] net/mlx5: check for no data read in devx interrupt Kevin Traynor
2026-02-10 18:06   ` [PATCH v3 2/2] eal/linux: handle interrupt epoll events Kevin Traynor
2026-02-19 14:37 ` [PATCH v4 0/3] interrupt disconnect/error event handling Kevin Traynor
2026-02-19 14:38 ` Kevin Traynor
2026-02-19 14:38   ` [PATCH v4 1/3] eal/linux: handle interrupt epoll events Kevin Traynor
2026-02-19 14:38   ` [PATCH v4 2/3] eal/interrupt: add interrupt event info Kevin Traynor
2026-02-26 15:41     ` David Marchand
2026-03-02 11:47       ` Kevin Traynor
2026-02-19 14:38   ` [PATCH v4 3/3] net/mlx5: check devx disconnect/error interrupt events Kevin Traynor
2026-03-03 16:16     ` Slava Ovsiienko
2026-02-19 18:52   ` [PATCH v4 0/3] interrupt disconnect/error event handling Stephen Hemminger
2026-03-02 11:41     ` Kevin Traynor
2026-03-03 18:58 ` [PATCH v5 " Kevin Traynor
2026-03-03 18:58   ` [PATCH v5 1/3] eal/linux: handle interrupt epoll events Kevin Traynor
2026-03-03 18:58   ` [PATCH v5 2/3] eal/interrupt: add interrupt event info Kevin Traynor
2026-03-03 18:58   ` [PATCH v5 3/3] net/mlx5: check devx disconnect/error interrupt events Kevin Traynor
2026-03-04 11:09   ` [PATCH v5 0/3] interrupt disconnect/error event handling David Marchand
