public inbox for dev@dpdk.org
 help / color / mirror / Atom feed
From: Kevin Traynor <ktraynor@redhat.com>
To: dev@dpdk.org
Cc: thomas@monjalon.net, david.marchand@redhat.com,
	dsosnowski@nvidia.com, viacheslavo@nvidia.com,
	Kevin Traynor <ktraynor@redhat.com>,
	stable@dpdk.org
Subject: [PATCH] eal/linux: handle epoll error conditions
Date: Wed, 28 Jan 2026 12:20:55 +0000	[thread overview]
Message-ID: <20260128122055.192104-1-ktraynor@redhat.com> (raw)

Add handling for epoll error conditions EPOLLERR, EPOLLHUP and
EPOLLRDHUP. These events indicate that the interrupt file descriptor
is in an error state or there has been a hangup.

This may happen when the interrupt file descriptor is deleted or for
mlx5 devices when the device is unbound from mlx5 kernel driver or if
the device is removed by the mlx5 kernel driver as part of LAG setup.

Previously, the interrupts were being read, but the condition was not
cleared and that may lead to an interrupt continuing to fire and a
busy-loop processing it.

Now when this condition is detected, an error message is logged and the
interrupt is removed to prevent busy-looping.

Also cover the case where no bytes are read even though the epoll has
indicated there is something to read.

Bugzilla ID: 1873
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---
 lib/eal/linux/eal_interrupts.c | 63 ++++++++++++++++++++++++----------
 1 file changed, 44 insertions(+), 19 deletions(-)

diff --git a/lib/eal/linux/eal_interrupts.c b/lib/eal/linux/eal_interrupts.c
index 9db978923a..eedc75d776 100644
--- a/lib/eal/linux/eal_interrupts.c
+++ b/lib/eal/linux/eal_interrupts.c
@@ -887,4 +887,21 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 }
 
+static void
+eal_intr_source_free(struct rte_intr_source *src)
+{
+	struct rte_intr_callback *cb, *next;
+
+	/* Free all callbacks */
+	for (cb = TAILQ_FIRST(&src->callbacks); cb; cb = next) {
+		next = TAILQ_NEXT(cb, next);
+		TAILQ_REMOVE(&src->callbacks, cb, next);
+		free(cb);
+	}
+
+	/* Free the interrupt source */
+	rte_intr_instance_free(src->intr_handle);
+	free(src);
+}
+
 static int
 eal_intr_process_interrupts(struct epoll_event *events, int nfds)
@@ -918,4 +935,21 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 		}
 
+		/* Check for error conditions on the fd before processing. */
+		if (events[n].events & (EPOLLRDHUP | EPOLLERR | EPOLLHUP)) {
+			EAL_LOG(WARNING, "Disconnect condition on fd %d "
+				"(events=0x%x), removing from epoll",
+				events[n].data.fd, events[n].events);
+			/*
+			 * There is an error or a hangup. Remove the
+			 * interrupt source and return to force the wait list
+			 * to be rebuilt.
+			 */
+			TAILQ_REMOVE(&intr_sources, src, next);
+			rte_spinlock_unlock(&intr_lock);
+
+			eal_intr_source_free(src);
+			return -1;
+		}
+
 		/* mark this interrupt source as active and release the lock. */
 		src->active = 1;
@@ -957,5 +991,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			 */
 			bytes_read = read(events[n].data.fd, &buf, bytes_read);
-			if (bytes_read < 0) {
+			if (bytes_read > 0) {
+				call = true;
+			} else if (bytes_read < 0) {
 				if (errno == EINTR || errno == EWOULDBLOCK)
 					continue;
@@ -965,27 +1001,16 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 					events[n].data.fd,
 					strerror(errno));
-				/*
-				 * The device is unplugged or buggy, remove
-				 * it as an interrupt source and return to
-				 * force the wait list to be rebuilt.
-				 */
+			} else { /* bytes == 0 */
+				EAL_LOG(WARNING, "Read nothing from file "
+					"descriptor %d", events[n].data.fd);
+			}
+			if (bytes_read <= 0) {
 				rte_spinlock_lock(&intr_lock);
 				TAILQ_REMOVE(&intr_sources, src, next);
 				rte_spinlock_unlock(&intr_lock);
 
-				for (cb = TAILQ_FIRST(&src->callbacks); cb;
-							cb = next) {
-					next = TAILQ_NEXT(cb, next);
-					TAILQ_REMOVE(&src->callbacks, cb, next);
-					free(cb);
-				}
-				rte_intr_instance_free(src->intr_handle);
-				free(src);
+				eal_intr_source_free(src);
 				return -1;
-			} else if (bytes_read == 0)
-				EAL_LOG(ERR, "Read nothing from file "
-					"descriptor %d", events[n].data.fd);
-			else
-				call = true;
+			}
 		}
 
-- 
2.52.0


             reply	other threads:[~2026-01-28 12:21 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-28 12:20 Kevin Traynor [this message]
2026-01-29 12:51 ` [PATCH] eal/linux: handle epoll error conditions Kevin Traynor
2026-02-06 17:20 ` [PATCH v2 0/2] interrupt epoll event handling Kevin Traynor
2026-02-06 17:20   ` [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt Kevin Traynor
2026-02-07  6:09     ` Stephen Hemminger
2026-02-10 15:05       ` Kevin Traynor
2026-02-10 17:05         ` Slava Ovsiienko
2026-02-10 19:07           ` Kevin Traynor
2026-02-10 20:58             ` Slava Ovsiienko
2026-02-19 14:44               ` Kevin Traynor
2026-02-06 17:20   ` [PATCH v2 2/2] eal/linux: handle interrupt epoll events Kevin Traynor
2026-02-07  6:11     ` Stephen Hemminger
2026-02-10 13:35       ` Kevin Traynor
2026-02-10  9:17     ` David Marchand
2026-02-10 14:47       ` Kevin Traynor
2026-02-10 18:06 ` [PATCH v3 0/2] interrupt epoll event handling Kevin Traynor
2026-02-10 18:06   ` [PATCH v3 1/2] net/mlx5: check for no data read in devx interrupt Kevin Traynor
2026-02-10 18:06   ` [PATCH v3 2/2] eal/linux: handle interrupt epoll events Kevin Traynor
2026-02-19 14:37 ` [PATCH v4 0/3] interrupt disconnect/error event handling Kevin Traynor
2026-02-19 14:38 ` Kevin Traynor
2026-02-19 14:38   ` [PATCH v4 1/3] eal/linux: handle interrupt epoll events Kevin Traynor
2026-02-19 14:38   ` [PATCH v4 2/3] eal/interrupt: add interrupt event info Kevin Traynor
2026-02-26 15:41     ` David Marchand
2026-03-02 11:47       ` Kevin Traynor
2026-02-19 14:38   ` [PATCH v4 3/3] net/mlx5: check devx disconnect/error interrupt events Kevin Traynor
2026-03-03 16:16     ` Slava Ovsiienko
2026-02-19 18:52   ` [PATCH v4 0/3] interrupt disconnect/error event handling Stephen Hemminger
2026-03-02 11:41     ` Kevin Traynor
2026-03-03 18:58 ` [PATCH v5 " Kevin Traynor
2026-03-03 18:58   ` [PATCH v5 1/3] eal/linux: handle interrupt epoll events Kevin Traynor
2026-03-03 18:58   ` [PATCH v5 2/3] eal/interrupt: add interrupt event info Kevin Traynor
2026-03-03 18:58   ` [PATCH v5 3/3] net/mlx5: check devx disconnect/error interrupt events Kevin Traynor
2026-03-04 11:09   ` [PATCH v5 0/3] interrupt disconnect/error event handling David Marchand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260128122055.192104-1-ktraynor@redhat.com \
    --to=ktraynor@redhat.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=dsosnowski@nvidia.com \
    --cc=stable@dpdk.org \
    --cc=thomas@monjalon.net \
    --cc=viacheslavo@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox