From: Vinod Koul <vinod.koul@intel.com>
To: "Jiang, Dave" <dave.jiang@intel.com>
Cc: Gavin Guo <gavin.guo@canonical.com>,
	"dmaengine@vger.kernel.org" <dmaengine@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Williams, Dan J" <dan.j.williams@intel.com>
Subject: Re: ioatdma(Intel(R) I/OAT DMA Engine init failed)
Date: Thu, 19 May 2016 22:52:44 +0530
Message-ID: <20160519172244.GE2735@localhost>
In-Reply-To: <112A412BB11A1242B37129D931BCE5347249E6B4@fmsmsx116.amr.corp.intel.com>

On Thu, May 19, 2016 at 08:19:30PM +0530, Jiang, Dave wrote:
> > -----Original Message-----
> > From: Gavin Guo [mailto:gavin.guo@canonical.com]
> > Sent: Wednesday, May 18, 2016 8:19 PM
> > To: Jiang, Dave <dave.jiang@intel.com>
> > Cc: Koul, Vinod <vinod.koul@intel.com>; dmaengine@vger.kernel.org; linux-kernel@vger.kernel.org; Williams, Dan J
> > <dan.j.williams@intel.com>
> > Subject: Re: ioatdma(Intel(R) I/OAT DMA Engine init failed)
> > 
> > On Thu, May 19, 2016 at 12:49 AM, Jiang, Dave <dave.jiang@intel.com> wrote:
> > > On Wed, 2016-05-18 at 13:27 +0000, Gavin Guo wrote:
> > >> On Tue, May 17, 2016 at 6:06 PM, Vinod Koul <vinod.koul@intel.com>
> > >> wrote:
> > >> >
> > >> > On Mon, May 16, 2016 at 06:08:20PM +0800, Gavin Guo wrote:
> > >> > >
> > >> > > The following error messages can be observed on the Intel Haswell-E
> > >> > > chipset with the v3.13 kernel. After analysis, I found no difference
> > >> > > in the logic behind these error messages in the current upstream
> > >> > > kernel. I also searched the git log and couldn't find any commit
> > >> > > that fixes the error (correct me if I am wrong). The details
> > >> > > follow, and I'd really appreciate any comments. :)
> > >> > 3.13 is ancient, can you check this on latest kernel
> > >> Thank you for the comment. It's running on the production system.
> > >> However,
> > >> I'll try to figure out if it's possible to test the latest kernel.
> > >
> > > I wonder if you don't have the extended PCI config space access enabled
> > > in your kernel config.
> > 
> > Thanks a lot for your advice. :)
> > 
> > I searched the internet about the extended PCI config space and found
> > the link:
> > 
> > [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in
> > http://lwn.net/Articles/263288/
> 
> Can you try calling pci_enable_ext_config() in the PCI probe for your kernel? I just haven't seen this issue in the latest kernel. 

Does the driver need to call that explicitly? Shouldn't it be enabled
by default?
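
For reference, a minimal userspace sketch of why extended config space
matters for this register. It assumes IOAT_PCI_CHANERR_INT_OFFSET is 0x180
(please check registers.h in your tree); the helper names dword_reachable()
and sysfs_cfg_size() are made up for illustration and are not kernel APIs.
The sysfs "config" file of a PCI device is 4096 bytes when the kernel can
reach the extended space and 256 bytes otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Conventional PCI config space is 256 bytes; PCIe extended space is 4096. */
#define CFG_SPACE_SIZE      256
#define CFG_SPACE_EXP_SIZE  4096

/* Assumed value of IOAT_PCI_CHANERR_INT_OFFSET; verify against your tree. */
#define CHANERR_INT_OFFSET  0x180

/* True if a 4-byte read at 'offset' fits in a config space of 'cfg_size' bytes. */
bool dword_reachable(size_t cfg_size, size_t offset)
{
	assert(offset % 4 == 0);	/* dword accesses must be aligned */
	return cfg_size >= 4 && offset <= cfg_size - 4;
}

/* Size in bytes of a device's sysfs config file, or -1 if stat() fails. */
long sysfs_cfg_size(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (long)st.st_size;
}
```

On the affected box, sysfs_cfg_size("/sys/bus/pci/devices/0000:00:04.0/config")
returning 256 would mean dword_reachable(256, CHANERR_INT_OFFSET) is false,
which is one possible explanation for the failing read, not a confirmed one.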

> 
> > 
> > And I checked the config and found the CONFIG_PCI_MMCONFIG=y. The
> > following string also can be observed in the dmesg:
> > 
> > [    1.419853] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
> > [    1.419855] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
> > 
> > It seems the extended PCI config space is enabled. Is there
> > anything I missed?
> > 
> > >
> > >>
> > >> >
> > >> >
> > >> > >
> > >> > >
> > >> > > ioatdma 0000:00:04.0: channel error register unreachable
> > >> > > ioatdma 0000:00:04.0: channel enumeration error
> > >> > > ioatdma 0000:00:04.0: Intel(R) I/OAT DMA Engine init failed
> > >> > > ioatdma 0000:00:04.1: channel error register unreachable
> > >> > > ioatdma 0000:00:04.1: channel enumeration error
> > >> > > ioatdma 0000:00:04.1: Intel(R) I/OAT DMA Engine init failed
> > >> > > ...
> > >> > > ioatdma 0000:00:04.7: channel error register unreachable
> > >> > > ioatdma 0000:00:04.7: channel enumeration error
> > >> > > ioatdma 0000:00:04.7: Intel(R) I/OAT DMA Engine init failed
> > >> > > mei_me 0000:00:16.0: initialization failed.
> > >> > >
> > >> > > There are 8 I/OAT DMA controllers on the Haswell-E chipset:
> > >> > > 8086:2f20 ~ 8086:2f27
> > >> > > 80:04.0 System peripheral: Intel Corporation Haswell-E DMA Channel 0 (rev 02)
> > >> > > 80:04.1 System peripheral: Intel Corporation Haswell-E DMA Channel 1 (rev 02)
> > >> > > 80:04.2 System peripheral: Intel Corporation Haswell-E DMA Channel 2 (rev 02)
> > >> > > 80:04.3 System peripheral: Intel Corporation Haswell-E DMA Channel 3 (rev 02)
> > >> > > 80:04.4 System peripheral: Intel Corporation Haswell-E DMA Channel 4 (rev 02)
> > >> > > 80:04.5 System peripheral: Intel Corporation Haswell-E DMA Channel 5 (rev 02)
> > >> > > 80:04.6 System peripheral: Intel Corporation Haswell-E DMA Channel 6 (rev 02)
> > >> > > 80:04.7 System peripheral: Intel Corporation Haswell-E DMA Channel 7 (rev 02)
> > >> > >
> > >> > > Analysis:
> > >> > > The bug happens while the driver is resetting the DMA controller.
> > >> > > The sequence is: ioat_pci_probe is called when the DMA controller
> > >> > > is detected on the PCI bus. Then,
> > >> > > ioat3_dma_probe -> ioat_probe -> ioat2_enumerate_channels ->
> > >> > > ioat3_reset_hw. The following code can be found in
> > >> > > ioat3_reset_hw:
> > >> > >
> > >> > > drivers/dma/ioat/dma_v3.c:
> > >> > >         chanerr = readl(chan->reg_base + IOAT_CHANERR_OFFSET);
> > >> > >         writel(chanerr, chan->reg_base + IOAT_CHANERR_OFFSET);
> > >> > > ...
> > >> > >         err = pci_read_config_dword(pdev, IOAT_PCI_CHANERR_INT_OFFSET, &chanerr);
> > >> > >         if (err) {
> > >> > >                 dev_err(&pdev->dev, "channel error register unreachable\n");
> > >> > >                 return err;
> > >> > >         }
> > >> > >
> > >> > > Obviously, something goes wrong in the channel error register
> > >> > > reset process. The error then propagates all the way back to
> > >> > > ioat_probe(). Because of the error, dma->chancnt is set to 0:
> > >> > >
> > >> > > drivers/dma/ioat/dma.c:
> > >> > >         if (!dma->chancnt) {
> > >> > >                 dev_err(dev, "channel enumeration error\n");
> > >> > >                 goto err_setup_interrupts;
> > >> > >         }
> > >> > >
> > >> > > Finally back to ioat_pci_probe:
> > >> > >
> > >> > > drivers/dma/ioat/pci.c:
> > >> > >                 err = ioat3_dma_probe(device, ioat_dca_enabled);
> > >> > >         else
> > >> > >                 return -ENODEV;
> > >> > >
> > >> > >         if (err) {
> > >> > >                 dev_err(dev, "Intel(R) I/OAT DMA Engine init failed\n");
> > >> > >                 return -ENODEV;
> > >> > --
> > >> > ~Vinod
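
To make the failure path in the quoted analysis easier to follow, here is
a small userspace model of it. This is only a sketch: the model_* functions
are hypothetical stand-ins for pci_read_config_dword(), ioat3_reset_hw(),
ioat2_enumerate_channels() and the probe path, the -EINVAL return value is
an assumption, and the real driver spreads these steps over several files.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for pci_read_config_dword(): fails when cfg_ok is 0. */
int model_read_chanerr_int(int cfg_ok)
{
	return cfg_ok ? 0 : -EINVAL;	/* error value is an assumption */
}

/* Models ioat3_reset_hw(): bails out if the config read fails. */
int model_reset_hw(int cfg_ok)
{
	int err = model_read_chanerr_int(cfg_ok);

	if (err) {
		fprintf(stderr, "channel error register unreachable\n");
		return err;
	}
	return 0;
}

/* Models channel enumeration: a failed reset leaves zero usable channels. */
int model_enumerate_channels(int cfg_ok, int hw_channels)
{
	assert(hw_channels >= 0);
	return model_reset_hw(cfg_ok) ? 0 : hw_channels;
}

/* Models the probe path: zero channels is reported to the caller as -ENODEV. */
int model_probe(int cfg_ok, int hw_channels)
{
	if (!model_enumerate_channels(cfg_ok, hw_channels)) {
		fprintf(stderr, "channel enumeration error\n");
		fprintf(stderr, "Intel(R) I/OAT DMA Engine init failed\n");
		return -ENODEV;
	}
	return 0;
}
```

With cfg_ok set to 0 the model prints the same three messages, in the same
order, as each 0000:00:04.x function does in the reported log.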

-- 
~Vinod

