* [PATCH v2 0/2] iommu/vt-d: Fault logging improvements
@ 2016-03-17 20:12 Alex Williamson
  2016-03-17 20:12 ` [PATCH v2 1/2] iommu/vt-d: Ratelimit fault handler Alex Williamson
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Alex Williamson @ 2016-03-17 20:12 UTC (permalink / raw)
  To: iommu, dwmw2; +Cc: joe, joro, linux-kernel

Ratelimit and improve formatting.

v2:

 - Use a single ratelimit state as suggested by Joe Perches, except
   I chose to move it up to dmar_fault() so that it includes the
   "handling fault status reg" pr_err and we can avoid collecting
   entries for logging if we don't plan to print them.
 - Added reformatting changes suggested by Joe Perches.
 - While there is clearly more that could be done with disabling
   fault handling for specific context entries on storm and sending
   errors to drivers, this makes a marked improvement on its own.

Thanks,
Alex

---

Alex Williamson (2):
      iommu/vt-d: Ratelimit fault handler
      iommu/vt-d: Improve fault handler error messages


 drivers/iommu/dmar.c |   47 +++++++++++++++++++++++++++--------------------
 1 file changed, 27 insertions(+), 20 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/2] iommu/vt-d: Ratelimit fault handler
  2016-03-17 20:12 [PATCH v2 0/2] iommu/vt-d: Fault logging improvements Alex Williamson
@ 2016-03-17 20:12 ` Alex Williamson
  2016-03-17 20:33   ` Joe Perches
  2016-03-17 20:25 ` [PATCH v2 0/2] iommu/vt-d: Fault logging improvements Joe Perches
       [not found] ` <20160317200557.29049.32879.stgit-GCcqpEzw8uZBDLzU/O5InQ@public.gmane.org>
  2 siblings, 1 reply; 10+ messages in thread
From: Alex Williamson @ 2016-03-17 20:12 UTC (permalink / raw)
  To: iommu, dwmw2; +Cc: joe, joro, linux-kernel

Fault rates can easily overwhelm the console and make the system
unresponsive.  Ratelimit to allow an opportunity for maintenance.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/iommu/dmar.c |   33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 8ffd756..8f8bfff 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1602,10 +1602,17 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 	int reg, fault_index;
 	u32 fault_status;
 	unsigned long flag;
+	bool ratelimited;
+	static DEFINE_RATELIMIT_STATE(rs,
+				      DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+
+	/* Disable printing, simply clear the fault when ratelimited */
+	ratelimited = !__ratelimit(&rs);
 
 	raw_spin_lock_irqsave(&iommu->register_lock, flag);
 	fault_status = readl(iommu->reg + DMAR_FSTS_REG);
-	if (fault_status)
+	if (fault_status && !ratelimited)
 		pr_err("DRHD: handling fault status reg %x\n", fault_status);
 
 	/* TBD: ignore advanced fault log currently */
@@ -1627,24 +1634,28 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 		if (!(data & DMA_FRCD_F))
 			break;
 
-		fault_reason = dma_frcd_fault_reason(data);
-		type = dma_frcd_type(data);
+		if (!ratelimited) {
+			fault_reason = dma_frcd_fault_reason(data);
+			type = dma_frcd_type(data);
 
-		data = readl(iommu->reg + reg +
-				fault_index * PRIMARY_FAULT_REG_LEN + 8);
-		source_id = dma_frcd_source_id(data);
+			data = readl(iommu->reg + reg +
+				     fault_index * PRIMARY_FAULT_REG_LEN + 8);
+			source_id = dma_frcd_source_id(data);
+
+			guest_addr = dmar_readq(iommu->reg + reg +
+					fault_index * PRIMARY_FAULT_REG_LEN);
+			guest_addr = dma_frcd_page_addr(guest_addr);
+		}
 
-		guest_addr = dmar_readq(iommu->reg + reg +
-				fault_index * PRIMARY_FAULT_REG_LEN);
-		guest_addr = dma_frcd_page_addr(guest_addr);
 		/* clear the fault */
 		writel(DMA_FRCD_F, iommu->reg + reg +
 			fault_index * PRIMARY_FAULT_REG_LEN + 12);
 
 		raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
 
-		dmar_fault_do_one(iommu, type, fault_reason,
-				source_id, guest_addr);
+		if (!ratelimited)
+			dmar_fault_do_one(iommu, type, fault_reason,
+					  source_id, guest_addr);
 
 		fault_index++;
 		if (fault_index >= cap_num_fault_regs(iommu->cap))

^ permalink raw reply related	[flat|nested] 10+ messages in thread
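
The control flow in the patch above (decide once whether to log, skip the decoding reads when ratelimited, but always clear each recorded fault) can be sketched in userspace as follows. This is only an illustrative model: struct fault_rec, struct drain_result, drain_faults() and FRCD_F are hypothetical names, the array stands in for the fault recording registers, and the loop bound is a safeguard added for the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the "fault recorded" bit (DMA_FRCD_F in the patch). */
#define FRCD_F (1u << 31)

/* Hypothetical model of one fault recording register. */
struct fault_rec {
	uint32_t status;	/* FRCD_F set means a fault is pending */
	uint64_t addr;		/* faulting address, unused when ratelimited */
};

struct drain_result {
	int cleared;	/* records whose fault bit was cleared */
	int logged;	/* records that would have been printed */
};

/*
 * Walk the ring starting at 'start' until a record without FRCD_F is
 * found. Logging is gated by 'ratelimited'; clearing is not.
 */
static struct drain_result drain_faults(struct fault_rec *recs, size_t n,
					size_t start, bool ratelimited)
{
	struct drain_result res = { 0, 0 };
	size_t idx = start;

	assert(recs != NULL && n > 0 && start < n);

	for (size_t seen = 0; seen < n; seen++) {
		if (!(recs[idx].status & FRCD_F))
			break;

		/* Stands in for decoding the record and dmar_fault_do_one(). */
		if (!ratelimited)
			res.logged++;

		/* Model of the clear; this happens regardless of logging. */
		recs[idx].status &= ~FRCD_F;
		res.cleared++;

		if (++idx >= n)
			idx = 0;
	}
	return res;
}
```

Calling drain_faults() on a ring with three pending records and ratelimited set to true clears all three and logs none, which is the behaviour the patch comment "simply clear the fault when ratelimited" describes.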

* [PATCH v2 2/2] iommu/vt-d: Improve fault handler error messages
  2016-03-17 20:12 [PATCH v2 0/2] iommu/vt-d: Fault logging improvements Alex Williamson
@ 2016-03-17 20:12     ` Alex Williamson
  2016-03-17 20:25 ` [PATCH v2 0/2] iommu/vt-d: Fault logging improvements Joe Perches
       [not found] ` <20160317200557.29049.32879.stgit-GCcqpEzw8uZBDLzU/O5InQ@public.gmane.org>
  2 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2016-03-17 20:12 UTC (permalink / raw)
  To: iommu, dwmw2; +Cc: joe, joro, linux-kernel

Remove the embedded newline from the error logs and drop the explicit
prefix that duplicates pr_fmt.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/iommu/dmar.c |   14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 8f8bfff..6a86b5d 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1579,18 +1579,14 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
 	reason = dmar_get_fault_reason(fault_reason, &fault_type);
 
 	if (fault_type == INTR_REMAP)
-		pr_err("INTR-REMAP: Request device [[%02x:%02x.%d] "
-		       "fault index %llx\n"
-			"INTR-REMAP:[fault reason %02d] %s\n",
-			(source_id >> 8), PCI_SLOT(source_id & 0xFF),
+		pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index %llx [fault reason %02d] %s\n",
+			source_id >> 8, PCI_SLOT(source_id & 0xFF),
 			PCI_FUNC(source_id & 0xFF), addr >> 48,
 			fault_reason, reason);
 	else
-		pr_err("DMAR:[%s] Request device [%02x:%02x.%d] "
-		       "fault addr %llx \n"
-		       "DMAR:[fault reason %02d] %s\n",
-		       (type ? "DMA Read" : "DMA Write"),
-		       (source_id >> 8), PCI_SLOT(source_id & 0xFF),
+		pr_err("[%s] Request device [%02x:%02x.%d] fault addr %llx [fault reason %02d] %s\n",
+		       type ? "DMA Read" : "DMA Write",
+		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
 		       PCI_FUNC(source_id & 0xFF), addr, fault_reason, reason);
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread
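
The "[%02x:%02x.%d]" field in the new messages is the requester's bus:slot.function, taken from the 16-bit source_id, and the interrupt-remapping "fault index" is addr >> 48. A minimal userspace sketch of that decoding follows. PCI_SLOT and PCI_FUNC are restated here with the same bit layout the kernel macros use; struct bdf, decode_source_id(), format_bdf() and intr_fault_index() are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace restatement of the kernel macros: devfn = slot[7:3] | func[2:0]. */
#define PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)	((devfn) & 0x07)

/* Hypothetical container for a decoded requester ID. */
struct bdf {
	unsigned int bus;
	unsigned int slot;
	unsigned int func;
};

/* Split a source_id into bus (high byte) and devfn (low byte). */
static struct bdf decode_source_id(uint16_t source_id)
{
	struct bdf out = {
		.bus  = (unsigned int)(source_id >> 8),
		.slot = (unsigned int)PCI_SLOT(source_id & 0xFF),
		.func = (unsigned int)PCI_FUNC(source_id & 0xFF),
	};
	return out;
}

/* Render the ID the way the patched pr_err() format string does. */
static int format_bdf(char *buf, size_t len, struct bdf d)
{
	assert(buf != NULL);
	return snprintf(buf, len, "[%02x:%02x.%u]", d.bus, d.slot, d.func);
}

/* For interrupt-remapping faults the patch prints addr >> 48 as the index. */
static uint64_t intr_fault_index(uint64_t addr)
{
	return addr >> 48;
}
```

For example, a source_id of 0x00a8 decodes to bus 0x00, slot 0x15, function 0, which format_bdf() renders as "[00:15.0]".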

* Re: [PATCH v2 0/2] iommu/vt-d: Fault logging improvements
  2016-03-17 20:12 [PATCH v2 0/2] iommu/vt-d: Fault logging improvements Alex Williamson
  2016-03-17 20:12 ` [PATCH v2 1/2] iommu/vt-d: Ratelimit fault handler Alex Williamson
@ 2016-03-17 20:25 ` Joe Perches
       [not found] ` <20160317200557.29049.32879.stgit-GCcqpEzw8uZBDLzU/O5InQ@public.gmane.org>
  2 siblings, 0 replies; 10+ messages in thread
From: Joe Perches @ 2016-03-17 20:25 UTC (permalink / raw)
  To: Alex Williamson, iommu, dwmw2; +Cc: joro, linux-kernel

On Thu, 2016-03-17 at 14:12 -0600, Alex Williamson wrote:
> Ratelimit and improve formatting.

Makes sense, thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] iommu/vt-d: Ratelimit fault handler
  2016-03-17 20:12 ` [PATCH v2 1/2] iommu/vt-d: Ratelimit fault handler Alex Williamson
@ 2016-03-17 20:33   ` Joe Perches
       [not found]     ` <1458246810.9556.22.camel-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Joe Perches @ 2016-03-17 20:33 UTC (permalink / raw)
  To: Alex Williamson, iommu, dwmw2; +Cc: joro, linux-kernel

On Thu, 2016-03-17 at 14:12 -0600, Alex Williamson wrote:
> Fault rates can easily overwhelm the console and make the system
> unresponsive.  Ratelimit to allow an opportunity for maintenance.
[]
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
[]
> @@ -1602,10 +1602,17 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
>  	int reg, fault_index;
>  	u32 fault_status;
>  	unsigned long flag;
> +	bool ratelimited;
> +	static DEFINE_RATELIMIT_STATE(rs,
> +				      DEFAULT_RATELIMIT_INTERVAL,
> +				      DEFAULT_RATELIMIT_BURST);

Are these the appropriate limits for dmar?

include/linux/ratelimit.h:#define DEFAULT_RATELIMIT_INTERVAL    (5 * HZ)
include/linux/ratelimit.h:#define DEFAULT_RATELIMIT_BURST       10

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] iommu/vt-d: Ratelimit fault handler
  2016-03-17 20:33   ` Joe Perches
@ 2016-03-17 20:46         ` Alex Williamson
  0 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2016-03-17 20:46 UTC (permalink / raw)
  To: Joe Perches; +Cc: iommu, dwmw2, joro, linux-kernel

On Thu, 17 Mar 2016 13:33:30 -0700
Joe Perches <joe@perches.com> wrote:

> On Thu, 2016-03-17 at 14:12 -0600, Alex Williamson wrote:
> > Fault rates can easily overwhelm the console and make the system
> > unresponsive.  Ratelimit to allow an opportunity for maintenance.  
> []
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c  
> []
> > @@ -1602,10 +1602,17 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
> >  	int reg, fault_index;
> >  	u32 fault_status;
> >  	unsigned long flag;
> > +	bool ratelimited;
> > +	static DEFINE_RATELIMIT_STATE(rs,
> > +				      DEFAULT_RATELIMIT_INTERVAL,
> > +				      DEFAULT_RATELIMIT_BURST);  
> 
> Are these the appropriate limits for dmar?
> 
> include/linux/ratelimit.h:#define DEFAULT_RATELIMIT_INTERVAL    (5 * HZ)
> include/linux/ratelimit.h:#define DEFAULT_RATELIMIT_BURST       10

They seem OK to me, I've got a test running that continuously generates
DMA read faults and I get 20 lines of log every 5 seconds.  That seems
like enough to know there's an issue, it's ongoing, and maybe see some
patterns in the fault addresses.  I expect we could turn up the burst
value but generally when I'm looking at the logs I'm only looking for
things like is it a single target address, is it a sequential address,
or what's the general address space to know if it should or should not
be a valid fault address.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 10+ messages in thread
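
The "20 lines of log every 5 seconds" figure in the reply above can be reproduced with a small model of a windowed rate limiter. This is a hypothetical userspace sketch, not the kernel's __ratelimit() implementation: struct rl_state, rl_allow() and rl_lines_logged() are invented names, time is counted in abstract ticks, and it assumes each permitted interrupt prints two lines (the fault status line plus one fault record line).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a ratelimit state: 'burst' events per 'interval'. */
struct rl_state {
	long interval;	/* window length in ticks */
	int burst;	/* events allowed per window */
	long begin;	/* tick at which the current window opened */
	int printed;	/* events allowed so far in this window */
	bool started;	/* false until the first call opens a window */
};

/* Return true when the caller may log at time 'now'. */
static bool rl_allow(struct rl_state *rs, long now)
{
	assert(rs != NULL);

	/* Open a fresh window on first use or once the old one has expired. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}

/* Count log lines when one fault interrupt arrives on every tick. */
static int rl_lines_logged(long interval, int burst, long ticks,
			   int lines_per_fault)
{
	struct rl_state rs = { .interval = interval, .burst = burst };
	int lines = 0;

	for (long now = 0; now < ticks; now++)
		if (rl_allow(&rs, now))
			lines += lines_per_fault;
	return lines;
}
```

With 1000 ticks per second assumed, rl_lines_logged(5000, 10, 5000, 2) models one 5 second window at a burst of 10 and yields 20 lines; doubling the duration to 10000 ticks yields 40.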

* Re: [PATCH v2 0/2] iommu/vt-d: Fault logging improvements
  2016-03-17 20:12 [PATCH v2 0/2] iommu/vt-d: Fault logging improvements Alex Williamson
@ 2016-04-05 14:20     ` Joerg Roedel
  2016-03-17 20:25 ` [PATCH v2 0/2] iommu/vt-d: Fault logging improvements Joe Perches
       [not found] ` <20160317200557.29049.32879.stgit-GCcqpEzw8uZBDLzU/O5InQ@public.gmane.org>
  2 siblings, 0 replies; 10+ messages in thread
From: Joerg Roedel @ 2016-04-05 14:20 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, dwmw2, joe, linux-kernel

On Thu, Mar 17, 2016 at 02:12:19PM -0600, Alex Williamson wrote:
> Alex Williamson (2):
>       iommu/vt-d: Ratelimit fault handler
>       iommu/vt-d: Improve fault handler error messages

Applied, thanks Alex.

^ permalink raw reply	[flat|nested] 10+ messages in thread
