* [ptdma] pt_core_execute_cmd() from interrupt context results in panic
From: Eric Pilmore @ 2022-12-28 9:41 UTC
To: Mehta, Sanju, Vinod, dmaengine; +Cc: Eric Pilmore
Wondering if this might be a known issue in the ptdma DMA driver. Did
not see anything obvious in bugzilla.
I am testing the ntb_netdev module with the ptdma module providing
the supporting DMA engines on an AMD Rome CPU-based platform. The
ptdma driver in use is the latest code in the Linux (6.2) repository.
There are no issues in doing simple ping operations across the
ntb_netdev (TCP/IP) interface, including sending large packets which
we know will cause the respective DMA engines to be utilized. However,
while doing iperf testing across the ntb_netdev interface, we have
encountered a panic:
[ 1626.776583] RIP: 0010:mutex_spin_on_owner+0x3b/0xa0
....
[ 1626.776588] Call Trace:
[ 1626.776588] <IRQ>
[ 1626.776589] __mutex_lock.isra.7+0xad/0x4c0
[ 1626.776589] ? ntb_transport_rx_enqueue+0x127/0x200 [ntb_transport]
[ 1626.776589] __mutex_lock_slowpath+0x13/0x20
[ 1626.776590] ? __mutex_lock_slowpath+0x13/0x20
[ 1626.776590] mutex_lock+0x2f/0x40
[ 1626.776590] pt_core_perform_passthru+0xc5/0x160 [ptdma]
[ 1626.776591] pt_cmd_callback.part.7+0x262/0x2d0 [ptdma]
[ 1626.776591] pt_cmd_callback+0x13/0x20 [ptdma]
[ 1626.776591] pt_check_status_trans+0xc3/0x120 [ptdma]
[ 1626.776592] pt_core_irq_handler+0x36/0x60 [ptdma]
[ 1626.776592] __handle_irq_event_percpu+0x44/0x1a0
[ 1626.776592] handle_irq_event_percpu+0x32/0x80
[ 1626.776593] handle_irq_event+0x3b/0x60
[ 1626.776593] handle_edge_irq+0x83/0x1a0
[ 1626.776593] handle_irq+0x20/0x30
[ 1626.776593] do_IRQ+0x50/0xe0
[ 1626.776594] common_interrupt+0xf/0xf
The issue is that the ptdma handlers are called in interrupt context,
and the flow ultimately reaches pt_core_execute_cmd(), which attempts
to take a mutex. A mutex can sleep, so taking one is not allowed in
interrupt context. I have temporarily changed the lock in question to
a spinlock, which seems to have resolved the issue. However, I don't
know enough about the ptdma driver to know whether this is the
desired fix.
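To make the failure mode concrete, here is a minimal userspace model of
the path in the trace. It is only a sketch: every model_* name is
hypothetical and none of this is ptdma or kernel code. It records a bug
whenever a lock that may sleep (the mutex case) is taken while the
"interrupt context" flag is set, and records nothing for a lock that
only spins.

```c
#include <stdbool.h>

/* Userspace model only: none of these names are kernel or ptdma APIs. */

static bool model_in_irq;      /* stands in for "running in interrupt context" */
static int  model_sleep_bugs;  /* sleeping-lock attempts made in that context */

struct model_lock {
	bool may_sleep;  /* true models a mutex, false models a spinlock */
	bool held;
};

/* Taking a lock that may sleep while in interrupt context counts as a bug. */
static void model_lock_acquire(struct model_lock *l)
{
	if (l->may_sleep && model_in_irq)
		model_sleep_bugs++;
	l->held = true;
}

static void model_lock_release(struct model_lock *l)
{
	l->held = false;
}

/* Models the submit path reached from the completion callback. */
static void model_execute_cmd(struct model_lock *l)
{
	model_lock_acquire(l);
	/* ... a descriptor would be written to the hardware queue here ... */
	model_lock_release(l);
}

/* Models the IRQ handler: returns how many context bugs the path triggered. */
static int model_irq_handler(struct model_lock *l)
{
	model_sleep_bugs = 0;
	model_in_irq = true;
	model_execute_cmd(l);
	model_in_irq = false;
	return model_sleep_bugs;
}
```

Calling model_irq_handler() with may_sleep set reports one bug, and with
it cleared reports none. In the kernel itself the non-sleeping variant
would normally be written with spin_lock_irqsave() and
spin_unlock_irqrestore(); whether that is the right primitive for this
driver is the question raised here.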
Hoping that others with more knowledge in this driver might be able to
comment as to the validity of this bug and whether a spinlock is the
correct approach here. If it is, I would be happy to submit a patch,
otherwise I can just file a bugzilla for the module owner to make a
more appropriate fix.
Thanks for any advice.
Eric Pilmore
* Re: [ptdma] pt_core_execute_cmd() from interrupt context results in panic
From: Eric Pilmore @ 2023-01-17 18:00 UTC
To: Mehta, Sanju, Vinod, dmaengine
I haven't heard any further on this, so I filed a bugzilla so it
doesn't get lost.
Eric
* Re: [ptdma] pt_core_execute_cmd() from interrupt context results in panic
From: Vinod Koul @ 2023-01-18 6:50 UTC
To: Eric Pilmore; +Cc: Mehta, Sanju, dmaengine
On 17-01-23, 10:00, Eric Pilmore wrote:
> On Wed, Dec 28, 2022 at 1:41 AM Eric Pilmore <epilmore@gigaio.com> wrote:
> >
> > Wondering if this might be a known issue in the ptdma DMA driver. Did
> > not see anything obvious in bugzilla.
> >
> > I am doing some testing of the ntb_netdev module in conjunction with
> > the ptdma module as the supporting DMA engines on an AMD Rome CPU
> > based platform. The ptdma driver being used is the latest code in the
> > Linux (6.2) repository.
> >
> > There are no issues in doing simple ping operations across the
> > ntb_netdev (TCP/IP) interface, including sending large packets which
> > we know will cause the respective DMA engines to be utilized. However,
> > while doing iperf testing across the ntb_netdev interface, we have
> > encountered a panic:
> >
> > [ 1626.776583] RIP: 0010:mutex_spin_on_owner+0x3b/0xa0
> > ....
> > [ 1626.776588] Call Trace:
> > [ 1626.776588] <IRQ>
> > [ 1626.776589] __mutex_lock.isra.7+0xad/0x4c0
> > [ 1626.776589] ? ntb_transport_rx_enqueue+0x127/0x200 [ntb_transport]
> > [ 1626.776589] __mutex_lock_slowpath+0x13/0x20
> > [ 1626.776590] ? __mutex_lock_slowpath+0x13/0x20
> > [ 1626.776590] mutex_lock+0x2f/0x40
> > [ 1626.776590] pt_core_perform_passthru+0xc5/0x160 [ptdma]
> > [ 1626.776591] pt_cmd_callback.part.7+0x262/0x2d0 [ptdma]
> > [ 1626.776591] pt_cmd_callback+0x13/0x20 [ptdma]
> > [ 1626.776591] pt_check_status_trans+0xc3/0x120 [ptdma]
> > [ 1626.776592] pt_core_irq_handler+0x36/0x60 [ptdma]
> > [ 1626.776592] __handle_irq_event_percpu+0x44/0x1a0
> > [ 1626.776592] handle_irq_event_percpu+0x32/0x80
> > [ 1626.776593] handle_irq_event+0x3b/0x60
> > [ 1626.776593] handle_edge_irq+0x83/0x1a0
> > [ 1626.776593] handle_irq+0x20/0x30
> > [ 1626.776593] do_IRQ+0x50/0xe0
> > [ 1626.776594] common_interrupt+0xf/0xf
> >
> > The issue is that the ptdma handlers are getting called in interrupt
> > context, and ultimately the flow leads to pt_core_execute_cmd() which
> > will attempt to grab a mutex, which is really not appropriate in
> > interrupt context. I have temporarily changed the lock in question to
> > a spinlock, which seems to have resolved the issue. However, I don't
> > know enough about the ptdma driver to really know if this is the
> > desired repair.
> >
> > Hoping that others with more knowledge in this driver might be able to
> > comment as to the validity of this bug and whether a spinlock is the
> > correct approach here. If it is, I would be happy to submit a patch,
It is the right approach. The ISR needs to push descriptors, and yes,
you need to hold the lock for that.
Please do send the patch; it looks like the AMD folks didn't bother.
> > otherwise I can just file a bugzilla for the module owner to make a
> > more appropriate fix.
> >
> > Thanks for any advice.
> >
> > Eric Pilmore
>
> I haven't heard any further on this, so I filed a bugzilla so it
> doesn't get lost.
>
> Eric
--
~Vinod
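As an illustration of pushing a descriptor while holding a lock that
never sleeps, here is a small userspace sketch. It assumes a fixed-size
descriptor ring; the demo_* names, the ring length, and the descriptor
size are all hypothetical and are not taken from the ptdma driver. The
lock is a C11 atomic_flag busy-wait lock standing in for a kernel
spinlock.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Userspace sketch: names and sizes are illustrative, not taken from ptdma. */
#define DEMO_Q_LEN     4   /* hypothetical number of ring slots */
#define DEMO_DESC_SIZE 32  /* hypothetical descriptor size in bytes */

struct demo_queue {
	atomic_flag lock;                          /* busy-wait lock, never sleeps */
	uint8_t ring[DEMO_Q_LEN][DEMO_DESC_SIZE];  /* descriptor ring */
	unsigned int qidx;                         /* next slot to fill */
};

static void demo_queue_init(struct demo_queue *q)
{
	atomic_flag_clear(&q->lock);
	memset(q->ring, 0, sizeof(q->ring));
	q->qidx = 0;
}

static void demo_lock(struct demo_queue *q)
{
	/* Spin until the flag was previously clear, i.e. we now own the lock. */
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;
}

static void demo_unlock(struct demo_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Copy one descriptor into the ring and advance the index; returns the slot used. */
static unsigned int demo_push_desc(struct demo_queue *q, const void *desc)
{
	unsigned int slot;

	demo_lock(q);
	slot = q->qidx;
	memcpy(q->ring[slot], desc, DEMO_DESC_SIZE);
	q->qidx = (q->qidx + 1) % DEMO_Q_LEN;  /* wrap at the end of the ring */
	demo_unlock(q);
	return slot;
}
```

Unlike this model, a kernel spinlock taken with spin_lock_irqsave() also
disables local interrupts, which is what prevents an interrupt handler
from spinning on a lock already held by the code it interrupted on the
same CPU.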