From: Kevin Traynor <ktraynor@redhat.com>
To: Slava Ovsiienko <viacheslavo@nvidia.com>,
Stephen Hemminger <stephen@networkplumber.org>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
"NBU-Contact-Thomas Monjalon (EXTERNAL)" <thomas@monjalon.net>,
"david.marchand@redhat.com" <david.marchand@redhat.com>,
Dariusz Sosnowski <dsosnowski@nvidia.com>,
"stable@dpdk.org" <stable@dpdk.org>,
Harman Kalra <hkalra@marvell.com>
Subject: Re: [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt
Date: Tue, 10 Feb 2026 19:07:52 +0000 [thread overview]
Message-ID: <fdf1a6ed-4505-4170-bb02-ea6104ab445c@redhat.com> (raw)
In-Reply-To: <DM4PR12MB60872D62FD8A3616C69E7BFCDF62A@DM4PR12MB6087.namprd12.prod.outlook.com>
On 10/02/2026 17:05, Slava Ovsiienko wrote:
> Hi,
>
Hi Slava,
> I'm sorry, I have some concern about the patch.
>
No problem, that's what reviews are for :-) thanks for reviewing.
> How it works, as far as I understand:
>
> - DPDK simulates interrupts in user mode with epoll_wait()
> - mlx5 PMD emits the async counter query command to the NIC periodically
I didn't think this would happen unless there was something like
hardware offload, but regardless, yes I agree there may be async counter
queries.
> - there might be multiple async query commands in the flight
> - kernel drivers handles the async query completion interrupts, pushes the token to the internal completion queue and unblocks associated fd
> - epoll_wait() sees this unblocked fd and notifies mlx5 PMD about
> - mlx5 PMD reads the completion token from the kernel queue with devx_get_async_cmd_comp()
>
> The concern scenario, let's assume:
>
> - we have 2 async query commands in the flight
> - the first async query completes, fd is unblocked, PMD is inviked, the completion is read by PMD and is being handled
> - the second async query completes, fd gets unblocked, the second token is written to the queue
> - the PMD completes the handling of the first completion and reads the queue again (with devx_get_async_cmd_comp() call in the loop)
> - it reads the second token successfully and handles
> - then, on the third call, devx_get_async_cmd_comp() returns EAGAIN, it means queue is empty
> - DPDK calls epoll_wait() again and sees unblocked fd
> - it call mlx5 PMD, and it calls devx_get_async_cmd_comp(), but queue is empty (handled in previous interrupt handling)
> - with the patch we wrongly remove the handler
>
I'm not sure, but this ^^^ sounds feasible.
> In my opinion, we should handle flags EPOLLERR | EPOLLHUP | EPOLLRDHU from the epoll_wait()_return also for
> RTE_INTR_HANDLE_EXT and RTE_INTR_HANDLE_DEV_EVENT interrupt types.
>
That's exactly what I had in v1 of the patch! The issue is that some
clients of eal interrupt may not interpret the condition of
EPOLLHUP/EPOLLRDHUP as an error condition and/or want to do some special
handling.
The example is vhost user server, which puts in place a reconnect
mechanism. If we filter out EPOLLHUP/EPOLLRDHUP events in eal, then
virtio will not receive the callback and vhost server reconnect is
broken. I have some more notes about it in the cover letter.
Trying to base on the read pattern in devx handler was an attempt to
move logic out of eal so different handlers could be flexible in how
they handle this condition.
We do have a distinction in that mlx5 uses RTE_INTR_HANDLE_EXT and
virtio uses RTE_INTR_HANDLE_VDEV but i'm not sure that is generic enough
to base a check/don't check for EPOLLHUP/EPOLLRDHUP events on.
So we'd need to come up with another solution if we wanted to filter
this in eal. Let's think more on this, though we are a bit constrained
by public API as well.
A workaround we can do from application is David's hack™ "-a
0000:00:00.0" to skip initial probe. That will at least prevent the
issue for mlx devices not used in DPDK, which was the scenario reported.
thanks,
Kevin.
> With best regards,
> Slava
>
>
>
>
>> -----Original Message-----
>> From: Kevin Traynor <ktraynor@redhat.com>
>> Sent: Tuesday, February 10, 2026 5:06 PM
>> To: Stephen Hemminger <stephen@networkplumber.org>
>> Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL)
>> <thomas@monjalon.net>; david.marchand@redhat.com; Dariusz Sosnowski
>> <dsosnowski@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>;
>> stable@dpdk.org
>> Subject: Re: [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt
>>
>> On 07/02/2026 06:09, Stephen Hemminger wrote:
>>> On Fri, 6 Feb 2026 17:20:53 +0000
>>> Kevin Traynor <ktraynor@redhat.com> wrote:
>>>
>>>> A busy-loop may occur when there are EPOLLERR, EPOLLHUP or
>> EPOLLRDHUP
>>>> epoll events for the devx interrupt fd.
>>>>
>>>> This may happen if the interrupt fd is deleted, if the device is
>>>> unbound from mlx5_core kernel driver or if the device is removed by
>>>> the mlx5 kernel driver as part of LAG setup.
>>>>
>>>> When that occurs, there is no data to be read and in the devx
>>>> interrupt handler an EAGAIN is returned on the first call to
>>>> devx_get_async_cmd_comp, but this is not checked.
>>>>
>>>> As the interrupt is not removed or condition reset, it causes an
>>>> interrupt processing busy-loop, which leads to the dpdk-intr thread
>>>> going to 100% CPU.
>>>>
>>>> e.g.
>>>> epoll_wait
>>>> (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) =
>>>> 1 read(28, 0x7f1f5c7fc2f0, 40)
>>>> = -1 EAGAIN (Resource temporarily unavailable) epoll_wait
>>>> (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) =
>>>> 1 read(28, 0x7f1f5c7fc2f0, 40)
>>>> = -1 EAGAIN (Resource temporarily unavailable)
>>>>
>>>> Add a check for an EAGAIN return from devx_get_async_cmd_comp on the
>>>> first read. If that happens, unregister the callback to prevent looping.
>>>>
>>>> Bugzilla ID: 1873
>>>> Fixes: f15db67df09c ("net/mlx5: accelerate DV flow counter query")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
>>>
>>> AI spotted this, I didn't...
>>>
>>>
>>> Errors:
>>>
>>> Line 139: Unnecessary semicolon after closing brace
>>>
>>> c
>>>
>>> };
>>>
>>> Should be:
>>> c
>>>
>>> }
>>>
>>> Lines 142-146: Block comment uses incorrect style Block comments in C
>> code should use /* and */ style, not /** which is reserved for documentation
>> comments.
>>>
>>> c
>>>
>>> /**
>>> * no data and EAGAIN indicate there is an error or
>>> * disconnect state. Unregister callback to prevent
>>> * interrupt busy-looping.
>>> */
>>>
>>> Should be:
>>> c
>>>
>>> /*
>>> * no data and EAGAIN indicate there is an error or
>>> * disconnect state. Unregister callback to prevent
>>> * interrupt busy-looping.
>>> */
>>>
>>> Warnings:
>>>
>>> Logic clarity: The variable data_read is set to true inside the while loop but
>> never checked when data WAS read. Consider if data_read is the clearest way to
>> express this condition.
>>>
>>
>> Ack above. Thanks.Will be fixed in v3.
>
next prev parent reply other threads:[~2026-02-10 19:14 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-28 12:20 [PATCH] eal/linux: handle epoll error conditions Kevin Traynor
2026-01-29 12:51 ` Kevin Traynor
2026-02-06 17:20 ` [PATCH v2 0/2] interrupt epoll event handling Kevin Traynor
2026-02-06 17:20 ` [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt Kevin Traynor
2026-02-07 6:09 ` Stephen Hemminger
2026-02-10 15:05 ` Kevin Traynor
2026-02-10 17:05 ` Slava Ovsiienko
2026-02-10 19:07 ` Kevin Traynor [this message]
2026-02-10 20:58 ` Slava Ovsiienko
2026-02-19 14:44 ` Kevin Traynor
2026-02-06 17:20 ` [PATCH v2 2/2] eal/linux: handle interrupt epoll events Kevin Traynor
2026-02-07 6:11 ` Stephen Hemminger
2026-02-10 13:35 ` Kevin Traynor
2026-02-10 9:17 ` David Marchand
2026-02-10 14:47 ` Kevin Traynor
2026-02-10 18:06 ` [PATCH v3 0/2] interrupt epoll event handling Kevin Traynor
2026-02-10 18:06 ` [PATCH v3 1/2] net/mlx5: check for no data read in devx interrupt Kevin Traynor
2026-02-10 18:06 ` [PATCH v3 2/2] eal/linux: handle interrupt epoll events Kevin Traynor
2026-02-19 14:37 ` [PATCH v4 0/3] interrupt disconnect/error event handling Kevin Traynor
2026-02-19 14:38 ` Kevin Traynor
2026-02-19 14:38 ` [PATCH v4 1/3] eal/linux: handle interrupt epoll events Kevin Traynor
2026-02-19 14:38 ` [PATCH v4 2/3] eal/interrupt: add interrupt event info Kevin Traynor
2026-02-26 15:41 ` David Marchand
2026-03-02 11:47 ` Kevin Traynor
2026-02-19 14:38 ` [PATCH v4 3/3] net/mlx5: check devx disconnect/error interrupt events Kevin Traynor
2026-03-03 16:16 ` Slava Ovsiienko
2026-02-19 18:52 ` [PATCH v4 0/3] interrupt disconnect/error event handling Stephen Hemminger
2026-03-02 11:41 ` Kevin Traynor
2026-03-03 18:58 ` [PATCH v5 " Kevin Traynor
2026-03-03 18:58 ` [PATCH v5 1/3] eal/linux: handle interrupt epoll events Kevin Traynor
2026-03-03 18:58 ` [PATCH v5 2/3] eal/interrupt: add interrupt event info Kevin Traynor
2026-03-03 18:58 ` [PATCH v5 3/3] net/mlx5: check devx disconnect/error interrupt events Kevin Traynor
2026-03-04 11:09 ` [PATCH v5 0/3] interrupt disconnect/error event handling David Marchand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fdf1a6ed-4505-4170-bb02-ea6104ab445c@redhat.com \
--to=ktraynor@redhat.com \
--cc=david.marchand@redhat.com \
--cc=dev@dpdk.org \
--cc=dsosnowski@nvidia.com \
--cc=hkalra@marvell.com \
--cc=stable@dpdk.org \
--cc=stephen@networkplumber.org \
--cc=thomas@monjalon.net \
--cc=viacheslavo@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox