From: Suravee Suthikulanit <suravee.suthikulpanit@amd.com>
To: Udo van den Heuvel <udovdh@xs4all.nl>
Cc: "Boris Ostrovsky" <boris.ostrovsky@amd.com>,
	"Jacob Shin" <jacob.shin@amd.com>,
	"Borislav Petkov" <bp@alien8.de>, "Jörg Rödel" <joro@8bytes.org>,
	linux-kernel@vger.kernel.org
Subject: Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out
Date: Tue, 22 Jan 2013 17:29:54 -0600
Message-ID: <50FF20F2.9090503@amd.com>
In-Reply-To: <50FEBE5D.7040905@xs4all.nl>

On 1/22/2013 10:29 AM, Udo van den Heuvel wrote:

> On 2013-01-22 17:12, Boris Ostrovsky wrote:
>> Your BIOS does not have the required erratum workaround. We will provide
>> a patch to close that hole but since the problem is not easily
>> reproducible (and the erratum is also not easy to trigger) it may be
>> difficult to say whether it really helped with your problem.

Udo,

I sent out a patch (http://marc.info/?l=linux-kernel&m=135889686523524&w=2) that should implement
the workaround for erratum 746 on AMD family 15h processors, models 10h-1Fh, in the IOMMU driver.
In your case, the output from "setpci -s 00:00.02 F4.w" is "0050", which tells me that the BIOS does not
implement the workaround. After patching, you should see the following message in "dmesg":

"AMD-Vi: Applying erratum 746 for IOMMU at 0000:00:00.2"
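As a rough illustration of how the "0050" reading can be interpreted, the sketch below assumes that the workaround is signalled by bit 2 of the 16-bit value read at offset F4h. That bit position is an assumption, not something stated in this message, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Assumption: bit 2 of the value read via "setpci ... F4.w" is set
 * once the erratum 746 workaround has been applied. */
#define ERRATUM_746_BIT (1u << 2)

/* Hypothetical helper: returns 1 if the assumed workaround bit is set,
 * 0 otherwise. */
static int erratum_746_applied(uint16_t f4_value)
{
    return (f4_value & ERRATUM_746_BIT) != 0;
}
```

Under that assumption, erratum_746_applied(0x0050) evaluates to 0, because 0x0050 has only bits 4 and 6 set. That is consistent with the conclusion that the BIOS did not apply the workaround. A value such as 0x0054 would evaluate to 1.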

> Can we think of certain loads/actions/etc that could help trigger the issue?
> Then if reproducing is easier we can better say if stuff is actually
> fixed after the workaround.
>
> Udo

Looking at the original kernel message, it seems that the kernel timed out while waiting for the IOMMU
to finish executing the "COMPLETION_WAIT" command. In this particular case, the command was issued as part of
"__domain_flush_pages()" while trying to send the "INVALIDATE_IOMMU_PAGE" command to the IOMMU: the command
buffer was getting full, and the kernel needed to wait for it to free up. However, the kernel
message does not tell us exactly what caused the IOMMU to lock up in the first place.
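To make the "command buffer is getting full" condition concrete, here is a simplified user-space model of a head/tail command ring. It is only a sketch: the 16-byte command size, the "left <= 2" threshold and both function names are assumptions chosen for illustration, and the buffer size is assumed to be a power of two so that the unsigned wrap-around works out.

```c
#include <stdint.h>

#define CMD_SIZE 16u  /* assumed size of one queued command, in bytes */

/* Bytes left between the slot after the next write (producer side, tail)
 * and the consumer position (head). buf_size must be a power of two. */
static uint32_t ring_space_left(uint32_t head, uint32_t tail, uint32_t buf_size)
{
    uint32_t next_tail = (tail + CMD_SIZE) % buf_size;

    /* Unsigned subtraction wraps modulo 2^32; reducing modulo a
     * power-of-two buf_size gives the distance around the ring. */
    return (head - next_tail) % buf_size;
}

/* Assumed policy: treat the ring as full when almost nothing is left,
 * at which point the producer has to wait for the consumer to catch up. */
static int ring_needs_sync(uint32_t head, uint32_t tail, uint32_t buf_size)
{
    return ring_space_left(head, tail, buf_size) <= 2;
}
```

With an 8192-byte ring, an empty queue (head 0, tail 0) leaves 8176 bytes and needs no wait, while head 32 and tail 16 leaves nothing, so the producer would have to issue a wait before queuing more commands.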

From what I have observed, a workload with high disk traffic should trigger a large number of "INVALIDATE_IOMMU_PAGE" commands.
However, this does not automatically issue a "COMPLETION_WAIT" command. The following patch slightly modifies
the code to always issue "COMPLETION_WAIT" after every command. This should increase the chance of reproducing
the issue.


diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index c1c74e0..d05b1f9 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1016,6 +1016,7 @@ static int iommu_queue_command_sync(struct amd_iommu *iommu,
                                     struct iommu_cmd *cmd,
                                     bool sync)
  {
+#if 0
         u32 left, tail, head, next_tail;
         unsigned long flags;
  
@@ -1052,6 +1053,40 @@ again:
  
         spin_unlock_irqrestore(&iommu->lock, flags);
  
+#else
+       u32 tail;
+       unsigned long flags;
+
+       WARN_ON(iommu->cmd_buf_size & CMD_BUFFER_UNINITIALIZED);
+       printk (KERN_DEBUG "AMD-Vi: iommu_queue_command_sync: iommu_queue_command_sync"
+               " data[0]:%#x data[1]:%#x data[2]:%#x data[3]:%#x\n",
+               cmd->data[0], cmd->data[1], cmd->data[2], cmd->data[3] );
+
+       spin_lock_irqsave(&iommu->lock, flags);
+
+       tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
+       copy_cmd_to_buffer(iommu, cmd, tail);
+
+       spin_unlock_irqrestore(&iommu->lock, flags);
+
+       // Sending completion_wait command
+       {
+               struct iommu_cmd sync_cmd;
+               volatile u64 sem = 0;
+               int ret;
+
+               spin_lock_irqsave(&iommu->lock, flags);
+
+               tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
+               build_completion_wait(&sync_cmd, (u64)&sem);
+               copy_cmd_to_buffer(iommu, &sync_cmd, tail);
+
+               spin_unlock_irqrestore(&iommu->lock, flags);
+
+               if ((ret = wait_on_sem(&sem)) != 0)
+                       return ret;
+       }
+#endif
         return 0;
  }
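For readers unfamiliar with the wait step, the following is a minimal user-space model of polling a completion semaphore with a bounded number of iterations. It is a sketch only: wait_on_flag() and its -1 return value are hypothetical stand-ins, not the driver's wait_on_sem(), and no delay is inserted between polls.

```c
#include <stdint.h>

/* Hypothetical model of a bounded completion wait: poll *sem until it
 * becomes non-zero or the iteration budget is used up.
 * Returns 0 on completion and -1 on timeout. */
static int wait_on_flag(volatile uint64_t *sem, uint32_t limit)
{
    uint32_t i = 0;

    while (*sem == 0 && i < limit)
        i++;

    return (*sem == 0) ? -1 : 0;
}
```

In this model a timeout simply means that the flag was never written within the budget. It says nothing about why the device stopped responding, which is the same limitation as the original kernel message.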








