public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] iommu: fix crash in report_iommu_fault()
@ 2025-04-08 21:33 Fedor Pchelkin
  2025-04-08 21:38 ` Jason Gunthorpe
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Fedor Pchelkin @ 2025-04-08 21:33 UTC (permalink / raw)
  To: Joerg Roedel, Robin Murphy, Jason Gunthorpe
  Cc: Fedor Pchelkin, Will Deacon, Kevin Tian, Nicolin Chen, iommu,
	linux-kernel, lvc-project

The following crash is observed while handling an IOMMU fault with a
recent kernel:

kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle page fault for address: ffff8c708299f700
PGD 19ee01067 P4D 19ee01067 PUD 101c10063 PMD 80000001028001e3
Oops: Oops: 0011 [#1] SMP NOPTI
CPU: 4 UID: 0 PID: 139 Comm: irq/25-AMD-Vi Not tainted 6.15.0-rc1+ #20 PREEMPT(lazy)
Hardware name: LENOVO 21D0/LNVNB161216, BIOS J6CN50WW 09/27/2024
RIP: 0010:0xffff8c708299f700
Call Trace:
 <TASK>
 ? report_iommu_fault+0x78/0xd3
 ? amd_iommu_report_page_fault+0x91/0x150
 ? amd_iommu_int_thread+0x77/0x180
 ? __pfx_irq_thread_fn+0x10/0x10
 ? irq_thread_fn+0x23/0x60
 ? irq_thread+0xf9/0x1e0
 ? __pfx_irq_thread_dtor+0x10/0x10
 ? __pfx_irq_thread+0x10/0x10
 ? kthread+0xfc/0x240
 ? __pfx_kthread+0x10/0x10
 ? ret_from_fork+0x34/0x50
 ? __pfx_kthread+0x10/0x10
 ? ret_from_fork_asm+0x1a/0x30
 </TASK>

report_iommu_fault() checks for an installed handler comparing the
corresponding field to NULL. It can (and could before) be called for a
domain with a different cookie type - IOMMU_COOKIE_DMA_IOVA, specifically.
Cookie is represented as a union so we may end up with a garbage value
treated there if this happens for a domain with another cookie type.

Formerly there were two exclusive cookie types in the union.
IOMMU_DOMAIN_SVA has a dedicated iommu_report_device_fault().

Call the fault handler only if the passed domain has a required cookie
type.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 6aa63a4ec947 ("iommu: Sort out domain user data")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---

I've seen the discussion [1] on 6aa63a4ec947 ("iommu: Sort out domain user
data") and got a bit confused by the fragment:

> iommu-dma itself isn't ever going to use a fault 
> handler because it expects the DMA API to be used correctly and thus no 
> faults to occur.

My first thought about this is that iommu-dma is not interested in
installing a fault handler ever, okay. But why does it suppose that no
faults would occur? It is a matter of "chance", firmware bugs, abovesaid
DMA API abusing, etc... isn't it?

[1]: https://lore.kernel.org/linux-iommu/d9a6c611-2a19-4830-964d-44b711fffb08@arm.com/


BTW, the device in question is Realtek RTL8852BE PCIe Wireless Network
controller. It had occasionally dumped IO fault messages before the 6.15
changes but I didn't pay attention to them since there was no connectivity
problems observed or similar.

IOMMU group 16 03:00.0 Network controller [0280]: Realtek Semiconductor Co., Ltd. RTL8852BE PCIe 802.11ax Wireless Network Controller [10ec:b852]


[ 2628.582070] rtw89_8852be 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0012 address=0x0 flags=0x0000]

Turns out it started with a recent firmware upgrade so will report that
certain issue to rtw maintainers.

 drivers/iommu/iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c8033ca66377..5729e8ecdda3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2717,7 +2717,8 @@ int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
 	 * if upper layers showed interest and installed a fault handler,
 	 * invoke it.
 	 */
-	if (domain->handler)
+	if (domain->cookie_type == IOMMU_COOKIE_FAULT_HANDLER &&
+	    domain->handler)
 		ret = domain->handler(domain, dev, iova, flags,
 						domain->handler_token);
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] iommu: fix crash in report_iommu_fault()
  2025-04-08 21:33 [PATCH] iommu: fix crash in report_iommu_fault() Fedor Pchelkin
@ 2025-04-08 21:38 ` Jason Gunthorpe
  2025-04-09 11:30   ` Robin Murphy
  2025-04-10  5:21 ` Tian, Kevin
  2025-04-11  7:05 ` Joerg Roedel
  2 siblings, 1 reply; 7+ messages in thread
From: Jason Gunthorpe @ 2025-04-08 21:38 UTC (permalink / raw)
  To: Fedor Pchelkin
  Cc: Joerg Roedel, Robin Murphy, Will Deacon, Kevin Tian, Nicolin Chen,
	iommu, linux-kernel, lvc-project

On Wed, Apr 09, 2025 at 12:33:41AM +0300, Fedor Pchelkin wrote:
> The following crash is observed while handling an IOMMU fault with a
> recent kernel:
> 
> kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> BUG: unable to handle page fault for address: ffff8c708299f700
> PGD 19ee01067 P4D 19ee01067 PUD 101c10063 PMD 80000001028001e3
> Oops: Oops: 0011 [#1] SMP NOPTI
> CPU: 4 UID: 0 PID: 139 Comm: irq/25-AMD-Vi Not tainted 6.15.0-rc1+ #20 PREEMPT(lazy)
> Hardware name: LENOVO 21D0/LNVNB161216, BIOS J6CN50WW 09/27/2024
> RIP: 0010:0xffff8c708299f700
> Call Trace:
>  <TASK>
>  ? report_iommu_fault+0x78/0xd3
>  ? amd_iommu_report_page_fault+0x91/0x150
>  ? amd_iommu_int_thread+0x77/0x180
>  ? __pfx_irq_thread_fn+0x10/0x10
>  ? irq_thread_fn+0x23/0x60
>  ? irq_thread+0xf9/0x1e0
>  ? __pfx_irq_thread_dtor+0x10/0x10
>  ? __pfx_irq_thread+0x10/0x10
>  ? kthread+0xfc/0x240
>  ? __pfx_kthread+0x10/0x10
>  ? ret_from_fork+0x34/0x50
>  ? __pfx_kthread+0x10/0x10
>  ? ret_from_fork_asm+0x1a/0x30
>  </TASK>
> 
> report_iommu_fault() checks for an installed handler comparing the
> corresponding field to NULL. It can (and could before) be called for a
> domain with a different cookie type - IOMMU_COOKIE_DMA_IOVA, specifically.
> Cookie is represented as a union so we may end up with a garbage value
> treated there if this happens for a domain with another cookie type.
> 
> Formerly there were two exclusive cookie types in the union.
> IOMMU_DOMAIN_SVA has a dedicated iommu_report_device_fault().
> 
> Call the fault handler only if the passed domain has a required cookie
> type.
> 
> Found by Linux Verification Center (linuxtesting.org).
> 
> Fixes: 6aa63a4ec947 ("iommu: Sort out domain user data")
> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
> ---

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

This should go to rc

> > iommu-dma itself isn't ever going to use a fault 
> > handler because it expects the DMA API to be used correctly and thus no 
> > faults to occur.
> 
> My first thought about this is that iommu-dma is not interested in
> installing a fault handler ever, okay. But why does it suppose that no
> faults would occur? It is a matter of "chance", firmware bugs, abovesaid
> DMA API abusing, etc... isn't it?

Yes, it should not happen, this driver is clearly buggy.

Jason

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] iommu: fix crash in report_iommu_fault()
  2025-04-08 21:38 ` Jason Gunthorpe
@ 2025-04-09 11:30   ` Robin Murphy
  2025-04-10  5:29     ` Tian, Kevin
  0 siblings, 1 reply; 7+ messages in thread
From: Robin Murphy @ 2025-04-09 11:30 UTC (permalink / raw)
  To: Jason Gunthorpe, Fedor Pchelkin
  Cc: Joerg Roedel, Will Deacon, Kevin Tian, Nicolin Chen, iommu,
	linux-kernel, lvc-project

On 2025-04-08 10:38 pm, Jason Gunthorpe wrote:
> On Wed, Apr 09, 2025 at 12:33:41AM +0300, Fedor Pchelkin wrote:
>> The following crash is observed while handling an IOMMU fault with a
>> recent kernel:
>>
>> kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>> BUG: unable to handle page fault for address: ffff8c708299f700
>> PGD 19ee01067 P4D 19ee01067 PUD 101c10063 PMD 80000001028001e3
>> Oops: Oops: 0011 [#1] SMP NOPTI
>> CPU: 4 UID: 0 PID: 139 Comm: irq/25-AMD-Vi Not tainted 6.15.0-rc1+ #20 PREEMPT(lazy)
>> Hardware name: LENOVO 21D0/LNVNB161216, BIOS J6CN50WW 09/27/2024
>> RIP: 0010:0xffff8c708299f700
>> Call Trace:
>>   <TASK>
>>   ? report_iommu_fault+0x78/0xd3
>>   ? amd_iommu_report_page_fault+0x91/0x150
>>   ? amd_iommu_int_thread+0x77/0x180
>>   ? __pfx_irq_thread_fn+0x10/0x10
>>   ? irq_thread_fn+0x23/0x60
>>   ? irq_thread+0xf9/0x1e0
>>   ? __pfx_irq_thread_dtor+0x10/0x10
>>   ? __pfx_irq_thread+0x10/0x10
>>   ? kthread+0xfc/0x240
>>   ? __pfx_kthread+0x10/0x10
>>   ? ret_from_fork+0x34/0x50
>>   ? __pfx_kthread+0x10/0x10
>>   ? ret_from_fork_asm+0x1a/0x30
>>   </TASK>
>>
>> report_iommu_fault() checks for an installed handler comparing the
>> corresponding field to NULL. It can (and could before) be called for a
>> domain with a different cookie type - IOMMU_COOKIE_DMA_IOVA, specifically.
>> Cookie is represented as a union so we may end up with a garbage value
>> treated there if this happens for a domain with another cookie type.
>>
>> Formerly there were two exclusive cookie types in the union.
>> IOMMU_DOMAIN_SVA has a dedicated iommu_report_device_fault().
>>
>> Call the fault handler only if the passed domain has a required cookie
>> type.
>>
>> Found by Linux Verification Center (linuxtesting.org).
>>
>> Fixes: 6aa63a4ec947 ("iommu: Sort out domain user data")
>> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
>> ---
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> This should go to rc
> 
>>> iommu-dma itself isn't ever going to use a fault
>>> handler because it expects the DMA API to be used correctly and thus no
>>> faults to occur.
>>
>> My first thought about this is that iommu-dma is not interested in
>> installing a fault handler ever, okay. But why does it suppose that no
>> faults would occur? It is a matter of "chance", firmware bugs, abovesaid
>> DMA API abusing, etc... isn't it?
> 
> Yes, it should not happen, this driver is clearly buggy.

Right, it's not that we assume no faults can occur at all, just that any 
faults are *unexpected* since they represent some device or driver doing 
something wrong in any number of ways, thus there is nothing a fault 
handler could reasonably do except print "a fault happened!", which 
every IOMMU driver is going to do anyway, therefore there is no need to 
support fault handlers on DMA domains.

But indeed it is now erroneous to be dereferencing domain->handler 
without checking that it is in fact a handler, sorry I missed that.

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] iommu: fix crash in report_iommu_fault()
  2025-04-08 21:33 [PATCH] iommu: fix crash in report_iommu_fault() Fedor Pchelkin
  2025-04-08 21:38 ` Jason Gunthorpe
@ 2025-04-10  5:21 ` Tian, Kevin
  2025-04-11  7:05 ` Joerg Roedel
  2 siblings, 0 replies; 7+ messages in thread
From: Tian, Kevin @ 2025-04-10  5:21 UTC (permalink / raw)
  To: Fedor Pchelkin, Joerg Roedel, Robin Murphy, Jason Gunthorpe
  Cc: Will Deacon, Nicolin Chen, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, lvc-project@linuxtesting.org

> From: Fedor Pchelkin <pchelkin@ispras.ru>
> Sent: Wednesday, April 9, 2025 5:34 AM
> 
> The following crash is observed while handling an IOMMU fault with a
> recent kernel:
> 
> kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> BUG: unable to handle page fault for address: ffff8c708299f700
> PGD 19ee01067 P4D 19ee01067 PUD 101c10063 PMD 80000001028001e3
> Oops: Oops: 0011 [#1] SMP NOPTI
> CPU: 4 UID: 0 PID: 139 Comm: irq/25-AMD-Vi Not tainted 6.15.0-rc1+ #20
> PREEMPT(lazy)
> Hardware name: LENOVO 21D0/LNVNB161216, BIOS J6CN50WW 09/27/2024
> RIP: 0010:0xffff8c708299f700
> Call Trace:
>  <TASK>
>  ? report_iommu_fault+0x78/0xd3
>  ? amd_iommu_report_page_fault+0x91/0x150
>  ? amd_iommu_int_thread+0x77/0x180
>  ? __pfx_irq_thread_fn+0x10/0x10
>  ? irq_thread_fn+0x23/0x60
>  ? irq_thread+0xf9/0x1e0
>  ? __pfx_irq_thread_dtor+0x10/0x10
>  ? __pfx_irq_thread+0x10/0x10
>  ? kthread+0xfc/0x240
>  ? __pfx_kthread+0x10/0x10
>  ? ret_from_fork+0x34/0x50
>  ? __pfx_kthread+0x10/0x10
>  ? ret_from_fork_asm+0x1a/0x30
>  </TASK>
> 
> report_iommu_fault() checks for an installed handler comparing the
> corresponding field to NULL. It can (and could before) be called for a
> domain with a different cookie type - IOMMU_COOKIE_DMA_IOVA,
> specifically.
> Cookie is represented as a union so we may end up with a garbage value
> treated there if this happens for a domain with another cookie type.
> 
> Formerly there were two exclusive cookie types in the union.
> IOMMU_DOMAIN_SVA has a dedicated iommu_report_device_fault().
> 
> Call the fault handler only if the passed domain has a required cookie
> type.
> 
> Found by Linux Verification Center (linuxtesting.org).
> 
> Fixes: 6aa63a4ec947 ("iommu: Sort out domain user data")
> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] iommu: fix crash in report_iommu_fault()
  2025-04-09 11:30   ` Robin Murphy
@ 2025-04-10  5:29     ` Tian, Kevin
  2025-04-10 13:00       ` Jason Gunthorpe
  0 siblings, 1 reply; 7+ messages in thread
From: Tian, Kevin @ 2025-04-10  5:29 UTC (permalink / raw)
  To: Robin Murphy, Jason Gunthorpe, Fedor Pchelkin
  Cc: Joerg Roedel, Will Deacon, Nicolin Chen, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, lvc-project@linuxtesting.org

> From: Robin Murphy <robin.murphy@arm.com>
> Sent: Wednesday, April 9, 2025 7:30 PM
> 
> On 2025-04-08 10:38 pm, Jason Gunthorpe wrote:
> > On Wed, Apr 09, 2025 at 12:33:41AM +0300, Fedor Pchelkin wrote:
> >> The following crash is observed while handling an IOMMU fault with a
> >> recent kernel:
> >>
> >> kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> >> BUG: unable to handle page fault for address: ffff8c708299f700
> >> PGD 19ee01067 P4D 19ee01067 PUD 101c10063 PMD 80000001028001e3
> >> Oops: Oops: 0011 [#1] SMP NOPTI
> >> CPU: 4 UID: 0 PID: 139 Comm: irq/25-AMD-Vi Not tainted 6.15.0-rc1+ #20
> PREEMPT(lazy)
> >> Hardware name: LENOVO 21D0/LNVNB161216, BIOS J6CN50WW
> 09/27/2024
> >> RIP: 0010:0xffff8c708299f700
> >> Call Trace:
> >>   <TASK>
> >>   ? report_iommu_fault+0x78/0xd3
> >>   ? amd_iommu_report_page_fault+0x91/0x150
> >>   ? amd_iommu_int_thread+0x77/0x180
> >>   ? __pfx_irq_thread_fn+0x10/0x10
> >>   ? irq_thread_fn+0x23/0x60
> >>   ? irq_thread+0xf9/0x1e0
> >>   ? __pfx_irq_thread_dtor+0x10/0x10
> >>   ? __pfx_irq_thread+0x10/0x10
> >>   ? kthread+0xfc/0x240
> >>   ? __pfx_kthread+0x10/0x10
> >>   ? ret_from_fork+0x34/0x50
> >>   ? __pfx_kthread+0x10/0x10
> >>   ? ret_from_fork_asm+0x1a/0x30
> >>   </TASK>
> >>
> >> report_iommu_fault() checks for an installed handler comparing the
> >> corresponding field to NULL. It can (and could before) be called for a
> >> domain with a different cookie type - IOMMU_COOKIE_DMA_IOVA,
> specifically.
> >> Cookie is represented as a union so we may end up with a garbage value
> >> treated there if this happens for a domain with another cookie type.
> >>
> >> Formerly there were two exclusive cookie types in the union.
> >> IOMMU_DOMAIN_SVA has a dedicated iommu_report_device_fault().
> >>
> >> Call the fault handler only if the passed domain has a required cookie
> >> type.
> >>
> >> Found by Linux Verification Center (linuxtesting.org).
> >>
> >> Fixes: 6aa63a4ec947 ("iommu: Sort out domain user data")
> >> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
> >> ---
> >
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> >
> > This should go to rc
> >
> >>> iommu-dma itself isn't ever going to use a fault
> >>> handler because it expects the DMA API to be used correctly and thus no
> >>> faults to occur.
> >>
> >> My first thought about this is that iommu-dma is not interested in
> >> installing a fault handler ever, okay. But why does it suppose that no
> >> faults would occur? It is a matter of "chance", firmware bugs, abovesaid
> >> DMA API abusing, etc... isn't it?
> >
> > Yes, it should not happen, this driver is clearly buggy.
> 
> Right, it's not that we assume no faults can occur at all, just that any
> faults are *unexpected* since they represent some device or driver doing
> something wrong in any number of ways, thus there is nothing a fault
> handler could reasonably do except print "a fault happened!", which
> every IOMMU driver is going to do anyway, therefore there is no need to
> support fault handlers on DMA domains.
> 
> But indeed it is now erroneous to be dereferencing domain->handler
> without checking that it is in fact a handler, sorry I missed that.
> 

btw an interesting observation. The comment of report_iommu_fault()
says:

 * This function should be called by the low-level IOMMU implementations
 * whenever IOMMU faults happen, to allow high-level users, that are
 * interested in such events, to know about them.

It sounds a general requirement to all IOMMU drivers, but in reality
only a subset of iommu drivers call it (e.g. intel/smmuv3 don't). So
there seems to be an implicit assumption from drivers on whether
the underlying IOMMU provides such facility...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] iommu: fix crash in report_iommu_fault()
  2025-04-10  5:29     ` Tian, Kevin
@ 2025-04-10 13:00       ` Jason Gunthorpe
  0 siblings, 0 replies; 7+ messages in thread
From: Jason Gunthorpe @ 2025-04-10 13:00 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Robin Murphy, Fedor Pchelkin, Joerg Roedel, Will Deacon,
	Nicolin Chen, iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	lvc-project@linuxtesting.org

On Thu, Apr 10, 2025 at 05:29:51AM +0000, Tian, Kevin wrote:

> It sounds a general requirement to all IOMMU drivers, but in reality
> only a subset of iommu drivers call it (e.g. intel/smmuv3 don't). So
> there seems to be an implicit assumption from drivers on whether
> the underlying IOMMU provides such facility...

Yeah, it's a bit wonky. And a different intersection calls the new
fault handling API instead :\

There are only 3 things using iommu_set_fault_handler():
 - drivers/gpu/drm/msm
   Seems to actually do restartable page faulting?? Maybe it needs to
   move to the PRI API..

 - drivers/infiniband/hw/usnic
   This just prints a log, we should remove it

 - drivers/remoteproc
   This prints a log and does some crash_handler

I just quickly typed in a small series to improve on this..

Jason

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] iommu: fix crash in report_iommu_fault()
  2025-04-08 21:33 [PATCH] iommu: fix crash in report_iommu_fault() Fedor Pchelkin
  2025-04-08 21:38 ` Jason Gunthorpe
  2025-04-10  5:21 ` Tian, Kevin
@ 2025-04-11  7:05 ` Joerg Roedel
  2 siblings, 0 replies; 7+ messages in thread
From: Joerg Roedel @ 2025-04-11  7:05 UTC (permalink / raw)
  To: Fedor Pchelkin
  Cc: Robin Murphy, Jason Gunthorpe, Will Deacon, Kevin Tian,
	Nicolin Chen, iommu, linux-kernel, lvc-project

On Wed, Apr 09, 2025 at 12:33:41AM +0300, Fedor Pchelkin wrote:
> Found by Linux Verification Center (linuxtesting.org).
> 
> Fixes: 6aa63a4ec947 ("iommu: Sort out domain user data")
> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>

Applied, thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-04-11  7:05 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-08 21:33 [PATCH] iommu: fix crash in report_iommu_fault() Fedor Pchelkin
2025-04-08 21:38 ` Jason Gunthorpe
2025-04-09 11:30   ` Robin Murphy
2025-04-10  5:29     ` Tian, Kevin
2025-04-10 13:00       ` Jason Gunthorpe
2025-04-10  5:21 ` Tian, Kevin
2025-04-11  7:05 ` Joerg Roedel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox