* [PATCH v2 0/2] iommu/vt-d: Fault logging improvements
@ 2016-03-17 20:12 Alex Williamson
2016-03-17 20:12 ` [PATCH v2 1/2] iommu/vt-d: Ratelimit fault handler Alex Williamson
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Alex Williamson @ 2016-03-17 20:12 UTC (permalink / raw)
To: iommu, dwmw2; +Cc: joe, joro, linux-kernel

Ratelimit and improve formatting.

v2:
 - Use a single ratelimit state as suggested by Joe Perches, except
   I chose to move it up to dmar_fault() so that it includes the
   "handling fault status reg" pr_err and we can avoid collecting
   entries for logging if we don't plan to print them.
 - Added reformatting changes suggested by Joe Perches.
 - While there is clearly more that could be done with disabling
   fault handling for specific context entries on storm and sending
   errors to drivers, this makes a marked improvement on its own.

Thanks,
Alex

---

Alex Williamson (2):
      iommu/vt-d: Ratelimit fault handler
      iommu/vt-d: Improve fault handler error messages

 drivers/iommu/dmar.c |   47 +++++++++++++++++++++++++++--------------------
 1 file changed, 27 insertions(+), 20 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/2] iommu/vt-d: Ratelimit fault handler
  2016-03-17 20:12 [PATCH v2 0/2] iommu/vt-d: Fault logging improvements Alex Williamson
@ 2016-03-17 20:12 ` Alex Williamson
  2016-03-17 20:33   ` Joe Perches
       [not found] ` <20160317200557.29049.32879.stgit-GCcqpEzw8uZBDLzU/O5InQ@public.gmane.org>
  2016-03-17 20:25 ` Joe Perches
  2 siblings, 1 reply; 10+ messages in thread
From: Alex Williamson @ 2016-03-17 20:12 UTC (permalink / raw)
  To: iommu, dwmw2; +Cc: joe, joro, linux-kernel

Fault rates can easily overwhelm the console and make the system
unresponsive.  Ratelimit to allow an opportunity for maintenance.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/iommu/dmar.c |   33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 8ffd756..8f8bfff 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1602,10 +1602,17 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 	int reg, fault_index;
 	u32 fault_status;
 	unsigned long flag;
+	bool ratelimited;
+	static DEFINE_RATELIMIT_STATE(rs,
+				      DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+
+	/* Disable printing, simply clear the fault when ratelimited */
+	ratelimited = !__ratelimit(&rs);
 
 	raw_spin_lock_irqsave(&iommu->register_lock, flag);
 	fault_status = readl(iommu->reg + DMAR_FSTS_REG);
-	if (fault_status)
+	if (fault_status && !ratelimited)
 		pr_err("DRHD: handling fault status reg %x\n", fault_status);
 
 	/* TBD: ignore advanced fault log currently */
@@ -1627,24 +1634,28 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 		if (!(data & DMA_FRCD_F))
 			break;
 
-		fault_reason = dma_frcd_fault_reason(data);
-		type = dma_frcd_type(data);
+		if (!ratelimited) {
+			fault_reason = dma_frcd_fault_reason(data);
+			type = dma_frcd_type(data);
 
-		data = readl(iommu->reg + reg +
-				fault_index * PRIMARY_FAULT_REG_LEN + 8);
-		source_id = dma_frcd_source_id(data);
+			data = readl(iommu->reg + reg +
+				     fault_index * PRIMARY_FAULT_REG_LEN + 8);
+			source_id = dma_frcd_source_id(data);
+
+			guest_addr = dmar_readq(iommu->reg + reg +
+					fault_index * PRIMARY_FAULT_REG_LEN);
+			guest_addr = dma_frcd_page_addr(guest_addr);
+		}
 
-		guest_addr = dmar_readq(iommu->reg + reg +
-				fault_index * PRIMARY_FAULT_REG_LEN);
-		guest_addr = dma_frcd_page_addr(guest_addr);
 		/* clear the fault */
 		writel(DMA_FRCD_F, iommu->reg + reg +
 			fault_index * PRIMARY_FAULT_REG_LEN + 12);
 
 		raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
 
-		dmar_fault_do_one(iommu, type, fault_reason,
-				source_id, guest_addr);
+		if (!ratelimited)
+			dmar_fault_do_one(iommu, type, fault_reason,
+					  source_id, guest_addr);
 
 		fault_index++;
 		if (fault_index >= cap_num_fault_regs(iommu->cap))

^ permalink raw reply related	[flat|nested] 10+ messages in thread
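
Note on the pattern in patch 1/2: the handler asks the ratelimit state once
per invocation, always clears the fault, and only decodes and prints when it
is allowed to. The user-space sketch below models that shape with a simple
fixed-window counter. Everything named rl_sketch, fault_stats and
handle_fault_irq is hypothetical and exists only for illustration; it is not
the kernel's struct ratelimit_state or __ratelimit() implementation, and the
time unit is whatever tick the caller chooses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-window limiter: at most `burst` events per `interval` ticks. */
struct rl_sketch {
	uint64_t interval;	/* window length, in caller-defined ticks */
	unsigned int burst;	/* events allowed per window */
	uint64_t begin;		/* tick at which the current window started */
	unsigned int printed;	/* events allowed so far in this window */
	unsigned int missed;	/* events suppressed in this window */
};

static void rl_sketch_init(struct rl_sketch *rl, uint64_t interval,
			   unsigned int burst)
{
	assert(rl != NULL);
	rl->interval = interval;
	rl->burst = burst;
	rl->begin = 0;
	rl->printed = 0;
	rl->missed = 0;
}

/* Returns true when the caller may log, false when it should stay quiet. */
static bool rl_sketch_allow(struct rl_sketch *rl, uint64_t now)
{
	/* Start a new window once the current one has expired. */
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->missed++;
	return false;
}

/* Counters standing in for the side effects of the real handler. */
struct fault_stats {
	unsigned int cleared;	/* faults acknowledged */
	unsigned int logged;	/* faults decoded and printed */
};

/* Same shape as the patched handler: always clear, log only when allowed. */
static void handle_fault_irq(struct rl_sketch *rl, uint64_t now,
			     struct fault_stats *st)
{
	bool ratelimited = !rl_sketch_allow(rl, now);

	st->cleared++;
	if (!ratelimited)
		st->logged++;
}
```

With an interval of 5000 ticks and a burst of 10, a storm of 100 calls inside
one window clears all 100 faults but logs only the first 10; logging resumes
with the first call in the next window.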

* Re: [PATCH v2 1/2] iommu/vt-d: Ratelimit fault handler
  2016-03-17 20:12 ` [PATCH v2 1/2] iommu/vt-d: Ratelimit fault handler Alex Williamson
@ 2016-03-17 20:33   ` Joe Perches
       [not found]     ` <1458246810.9556.22.camel-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Joe Perches @ 2016-03-17 20:33 UTC (permalink / raw)
  To: Alex Williamson, iommu, dwmw2; +Cc: joro, linux-kernel

On Thu, 2016-03-17 at 14:12 -0600, Alex Williamson wrote:
> Fault rates can easily overwhelm the console and make the system
> unresponsive.  Ratelimit to allow an opportunity for maintenance.
[]
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
[]
> @@ -1602,10 +1602,17 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
>  	int reg, fault_index;
>  	u32 fault_status;
>  	unsigned long flag;
> +	bool ratelimited;
> +	static DEFINE_RATELIMIT_STATE(rs,
> +				      DEFAULT_RATELIMIT_INTERVAL,
> +				      DEFAULT_RATELIMIT_BURST);

Are these the appropriate limits for dmar?

include/linux/ratelimit.h:#define DEFAULT_RATELIMIT_INTERVAL	(5 * HZ)
include/linux/ratelimit.h:#define DEFAULT_RATELIMIT_BURST	10

^ permalink raw reply	[flat|nested] 10+ messages in thread
[parent not found: <1458246810.9556.22.camel-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>]

* Re: [PATCH v2 1/2] iommu/vt-d: Ratelimit fault handler
@ 2016-03-17 20:46 ` Alex Williamson
  0 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2016-03-17 20:46 UTC (permalink / raw)
  To: Joe Perches; +Cc: iommu, dwmw2, joro, linux-kernel

On Thu, 17 Mar 2016 13:33:30 -0700
Joe Perches <joe@perches.com> wrote:

> On Thu, 2016-03-17 at 14:12 -0600, Alex Williamson wrote:
> > Fault rates can easily overwhelm the console and make the system
> > unresponsive.  Ratelimit to allow an opportunity for maintenance.
> []
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> []
> > @@ -1602,10 +1602,17 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
> >  	int reg, fault_index;
> >  	u32 fault_status;
> >  	unsigned long flag;
> > +	bool ratelimited;
> > +	static DEFINE_RATELIMIT_STATE(rs,
> > +				      DEFAULT_RATELIMIT_INTERVAL,
> > +				      DEFAULT_RATELIMIT_BURST);
>
> Are these the appropriate limits for dmar?
>
> include/linux/ratelimit.h:#define DEFAULT_RATELIMIT_INTERVAL	(5 * HZ)
> include/linux/ratelimit.h:#define DEFAULT_RATELIMIT_BURST	10

They seem OK to me, I've got a test running that continuously generates
DMA read faults and I get 20 lines of log every 5 seconds.  That seems
like enough to know there's an issue, it's ongoing, and maybe see some
patterns in the fault addresses.  I expect we could turn up the burst
value but generally when I'm looking at the logs I'm only looking for
things like is it a single target address, is it a sequential address,
or what's the general address space to know if it should or should not
be a valid fault address.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 10+ messages in thread
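
Note on the numbers in this exchange: the two defaults quoted from
include/linux/ratelimit.h give a window of 5 seconds and a burst of 10. One
plausible reading of the reported "20 lines of log every 5 seconds" is that
each allowed handler call prints two lines (the "handling fault status reg"
line plus one fault line); that per-call count is an assumption, not
something the thread states. The sketch below only does this arithmetic. The
SK_ names are hypothetical, and SK_HZ is an arbitrary stand-in because the
kernel's HZ depends on the configuration.

```c
#include <assert.h>

/* Stand-in for the kernel's HZ (ticks per second); the value is an assumption. */
#define SK_HZ			250
/* Defaults as quoted in the thread from include/linux/ratelimit.h. */
#define SK_RATELIMIT_INTERVAL	(5 * SK_HZ)
#define SK_RATELIMIT_BURST	10

/* Window length in seconds; independent of the HZ value chosen above. */
static unsigned int sk_window_seconds(void)
{
	return SK_RATELIMIT_INTERVAL / SK_HZ;
}

/* Upper bound on log lines per window if each allowed call prints `lines_per_call`. */
static unsigned int sk_lines_per_window(unsigned int lines_per_call)
{
	assert(lines_per_call > 0);
	return SK_RATELIMIT_BURST * lines_per_call;
}
```

Under the two-lines-per-call assumption this gives 10 * 2 = 20 lines per
5-second window, which matches the figure reported above. Raising the burst
scales the line count linearly.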
[parent not found: <20160317200557.29049.32879.stgit-GCcqpEzw8uZBDLzU/O5InQ@public.gmane.org>]

* [PATCH v2 2/2] iommu/vt-d: Improve fault handler error messages
@ 2016-03-17 20:12 ` Alex Williamson
  0 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2016-03-17 20:12 UTC (permalink / raw)
  To: iommu, dwmw2; +Cc: joe, joro, linux-kernel

Remove new line in error logs, avoid duplicate and explicit pr_fmt.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/iommu/dmar.c |   14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 8f8bfff..6a86b5d 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1579,18 +1579,14 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
 	reason = dmar_get_fault_reason(fault_reason, &fault_type);
 
 	if (fault_type == INTR_REMAP)
-		pr_err("INTR-REMAP: Request device [[%02x:%02x.%d] "
-		       "fault index %llx\n"
-			"INTR-REMAP:[fault reason %02d] %s\n",
-			(source_id >> 8), PCI_SLOT(source_id & 0xFF),
+		pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index %llx [fault reason %02d] %s\n",
+			source_id >> 8, PCI_SLOT(source_id & 0xFF),
 			PCI_FUNC(source_id & 0xFF), addr >> 48,
 			fault_reason, reason);
 	else
-		pr_err("DMAR:[%s] Request device [%02x:%02x.%d] "
-		       "fault addr %llx \n"
-		       "DMAR:[fault reason %02d] %s\n",
-		       (type ? "DMA Read" : "DMA Write"),
-		       (source_id >> 8), PCI_SLOT(source_id & 0xFF),
+		pr_err("[%s] Request device [%02x:%02x.%d] fault addr %llx [fault reason %02d] %s\n",
+		       type ? "DMA Read" : "DMA Write",
+		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
 		       PCI_FUNC(source_id & 0xFF), addr, fault_reason, reason);
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread
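
Note on the message layout in patch 2/2: the device is printed as
bus:device.function, taken from the 16-bit source id as bus = bits 15:8,
device = bits 7:3 and function = bits 2:0. The user-space sketch below
reproduces that decoding and the new single-line DMA fault layout, without
the pr_fmt prefix that the kernel adds. The SK_ macros follow the bit layout
of the kernel's PCI_SLOT() and PCI_FUNC(); the sk_ helper names, the struct
and the sample values are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same bit layout as PCI_SLOT()/PCI_FUNC(): devfn = device[7:3] | function[2:0]. */
#define SK_PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define SK_PCI_FUNC(devfn)	((devfn) & 0x07)

/* Decoded requester id (hypothetical helper type). */
struct sk_bdf {
	unsigned int bus;
	unsigned int dev;
	unsigned int fn;
};

static struct sk_bdf sk_decode_source_id(uint16_t source_id)
{
	struct sk_bdf bdf = {
		.bus = source_id >> 8,
		.dev = SK_PCI_SLOT(source_id & 0xFF),
		.fn = SK_PCI_FUNC(source_id & 0xFF),
	};

	return bdf;
}

/*
 * Format one DMA fault on a single line, following the layout of the new
 * pr_err() format string. Returns what snprintf() returns.
 */
static int sk_format_dma_fault(char *buf, size_t len, int type,
			       uint16_t source_id, unsigned long long addr,
			       int fault_reason, const char *reason)
{
	struct sk_bdf bdf = sk_decode_source_id(source_id);

	assert(buf != NULL && len > 0 && reason != NULL);
	return snprintf(buf, len,
			"[%s] Request device [%02x:%02x.%u] fault addr %llx [fault reason %02d] %s",
			type ? "DMA Read" : "DMA Write",
			bdf.bus, bdf.dev, bdf.fn, addr, fault_reason, reason);
}
```

For example, a source id of 0x02a3 decodes to bus 0x02, device 0x14,
function 3, so the device field reads [02:14.3]. Keeping the whole record on
one line means a single grep match carries the device, the address and the
reason together.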

* Re: [PATCH v2 0/2] iommu/vt-d: Fault logging improvements
@ 2016-04-05 14:20 ` Joerg Roedel
  0 siblings, 0 replies; 10+ messages in thread
From: Joerg Roedel @ 2016-04-05 14:20 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, dwmw2, joe, linux-kernel

On Thu, Mar 17, 2016 at 02:12:19PM -0600, Alex Williamson wrote:
> Alex Williamson (2):
>       iommu/vt-d: Ratelimit fault handler
>       iommu/vt-d: Improve fault handler error messages

Applied, thanks Alex.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 0/2] iommu/vt-d: Fault logging improvements
  2016-03-17 20:12 [PATCH v2 0/2] iommu/vt-d: Fault logging improvements Alex Williamson
  2016-03-17 20:12 ` [PATCH v2 1/2] iommu/vt-d: Ratelimit fault handler Alex Williamson
       [not found] ` <20160317200557.29049.32879.stgit-GCcqpEzw8uZBDLzU/O5InQ@public.gmane.org>
@ 2016-03-17 20:25 ` Joe Perches
  2 siblings, 0 replies; 10+ messages in thread
From: Joe Perches @ 2016-03-17 20:25 UTC (permalink / raw)
  To: Alex Williamson, iommu, dwmw2; +Cc: joro, linux-kernel

On Thu, 2016-03-17 at 14:12 -0600, Alex Williamson wrote:
> Ratelimit and improve formatting.

Makes sense, thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread