From: Lennert Buytenhek <buytenh@wantstofly.org>
To: "Suthikulpanit, Suravee" <suravee.suthikulpanit@amd.com>
Cc: iommu@lists.linux-foundation.org
Subject: Re: [PATCH v2] iommu/amd: Use report_iommu_fault()
Date: Sat, 25 Sep 2021 17:09:40 +0300 [thread overview]
Message-ID: <YU8tpKhnehH34niv@wantstofly.org> (raw)
In-Reply-To: <YSEfZyrLSWR0gFDu@wantstofly.org>
On Sat, Aug 21, 2021 at 06:44:39PM +0300, Lennert Buytenhek wrote:
> > > - EVENT_FLAG_I unset, but the request was a translation request
> > > (EVENT_FLAG_TR set) or the target page was not present
> > > (EVENT_FLAG_PR unset): call report_iommu_fault(), but the RW
> > > bit will be invalid, so don't try to map it to a
> > > IOMMU_FAULT_{READ,WRITE} code
> >
> > So, why do we need to call report_iommu_fault() for this case?
> > My understanding is we only have IOMMU_FAULT_[READ|WRITE].
> > So, if we can't identify whether the DMA is read / write,
> > we should not need to call report_iommu_fauilt(), is it?
>
> I don't think that we should just altogether avoid logging the subset
> of page faults for which we can't determine the read/write direction
> on AMD platforms.
>
> E.g. "access to an unmapped address" (which will have PR=0, and thus we
> won't know if it was a read or a write access) is just as much of a page
> fault as "write to a read-only page" (which will have PR=1, and thus the
> RW bit will be accurate) is, and for RAS purposes, both events are
> equally interesting, and important to know about.
>
> It's true that we currently don't have a way of signaling to
> report_iommu_fault() (and by extension, to the io_page_fault
> tracepoint) that we're not sure whether the offending access was a read
> or a write, but I think we can just add a bit to include/linux/iommu.h
> to indicate that, something along the lines of:
>
> /* iommu fault flags */
> #define IOMMU_FAULT_READ 0x0
> #define IOMMU_FAULT_WRITE 0x1
> +#define IOMMU_FAULT_RW_UNKNOWN 0x2
>
> (Cc'ing Ohad Ben-Cohen, who originally added this API.)
>
> I don't think that it would be a good idea to just not signal the page
> faults for which we don't know the read/write direction.
I had another look at this, and from some testing, it seems that,
contrary to what the datasheet suggests, the RW (read/write direction)
bit in logged I/O page faults is actually accurate for both faulting
accesses to present pages and faulting accesses to non-present pages.
I made a few hacks to the ixgbe driver to intentionally cause several
different types of I/O page faults, thereby turning an ixgbe NIC into
a poor man's I/O page fault generator, and these are the resulting
logged I/O page faults, on a Ryzen 3700X system:
Read from non-present page:
ixgbe 0000:25:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0x1bbc8f040 flags=0x0000]
=> flags indicate PR(esent)=0, RW=0
Write to non-present page:
ixgbe 0000:25:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0x1bdcb70c0 flags=0x0020]
=> flags indicate PR(esent)=0, RW=1
Read from write-only page:
ixgbe 0000:25:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0xbbcc1c40 flags=0x0050]
=> flags indicate PR(esent)=1, RW=0 (and PE(rmission violation)=1)
Write to read-only page:
ixgbe 0000:25:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0xbdcb70c0 flags=0x0070]
=> flags indicate PR(esent)=1, RW=1 (and PE(rmission violation)=1)
In other words, it seems that the RW bit is reliable even for PR=0 type
faults.
I assume that the restriction mentioned in the docs regarding RW and
PR ("RW is only meaningful when PR=1, TR=0, and I=0" from AMD I/O
Virtualization Technology (IOMMU) Specification, revision 3.00, Table
55: IO_PAGE_FAULT Event Log Buffer Entry Fields) is either confused
or refers to a restriction in older hardware that has since been lifted.
I'll resubmit the patch to unconditionally pass through the RW bit.
ixgbe hacks:
to cause reads from non-present pages (in the TX path):
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8232,7 +8232,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
dma_unmap_len_set(tx_buffer, len, size);
dma_unmap_addr_set(tx_buffer, dma, dma);
- tx_desc->read.buffer_addr = cpu_to_le64(dma);
+ tx_desc->read.buffer_addr = cpu_to_le64(dma + BIT(32));
while (unlikely(size > IXGBE_MAX_DATA_PER_TXD)) {
tx_desc->read.cmd_type_len =
to cause writes to non-present pages (in the RX path):
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1604,7 +1604,7 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
* Refresh the desc even if buffer_addrs didn't change
* because each write-back erases this info.
*/
- rx_desc->read.pkt_addr = cpu_to_le64(bi->dma + bi->page_offset);
+ rx_desc->read.pkt_addr = cpu_to_le64(bi->dma + bi->page_offset + BIT(32));
rx_desc++;
bi++;
to cause reads from write-only pages (in the TX path):
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8220,7 +8220,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
}
#endif
- dma = dma_map_single(tx_ring->dev, skb->data, size, DMA_TO_DEVICE);
+ dma = dma_map_single(tx_ring->dev, skb->data, size, DMA_FROM_DEVICE);
tx_buffer = first;
to cause writes to read-only pages (in the RX path):
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1545,7 +1545,7 @@ static bool ixgbe_alloc_mapped_page(struct ixgbe_ring *rx_ring,
/* map page for use */
dma = dma_map_page_attrs(rx_ring->dev, page, 0,
ixgbe_rx_pg_size(rx_ring),
- DMA_FROM_DEVICE,
+ DMA_TO_DEVICE,
IXGBE_RX_DMA_ATTR);
/*
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
prev parent reply other threads:[~2021-09-25 14:09 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-26 16:31 [PATCH v2] iommu/amd: Use report_iommu_fault() Lennert Buytenhek
2021-07-28 21:51 ` Suthikulpanit, Suravee via iommu
2021-07-30 2:32 ` Lennert Buytenhek
2021-08-03 16:20 ` Lennert Buytenhek
2021-08-05 16:26 ` Suthikulpanit, Suravee via iommu
2021-08-21 15:44 ` Lennert Buytenhek
2021-09-25 14:09 ` Lennert Buytenhek [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YU8tpKhnehH34niv@wantstofly.org \
--to=buytenh@wantstofly.org \
--cc=iommu@lists.linux-foundation.org \
--cc=suravee.suthikulpanit@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox