From: Joerg Roedel <joro@8bytes.org>
To: Davidlohr Bueso <davidlohr@hp.com>
Cc: Baoquan He <bhe@redhat.com>,
	linux-scsi <linux-scsi@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	"open list:INTEL IOMMU (VT-d)" <iommu@lists.linux-foundation.org>,
	"Stephen M. Cameron" <scameron@beardog.cce.hp.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Jiang Liu <jiang.liu@linux.intel.com>,
	David Woodhouse <David.Woodhouse@intel.com>
Subject: Re: hpsa driver bug crack kernel down!
Date: Thu, 10 Apr 2014 09:15:35 +0200
Message-ID: <20140410071535.GX13491@8bytes.org>
In-Reply-To: <1397111557.2608.29.camel@buesod1.americas.hpqcorp.net>

[+ David, VT-d maintainer ]

Jiang, David, can you please have a look into this issue?

Thanks,

	Joerg

On Wed, Apr 09, 2014 at 11:32:37PM -0700, Davidlohr Bueso wrote:
> On Wed, 2014-04-09 at 22:03 -0600, Bjorn Helgaas wrote:
> > [+cc Joerg, iommu list]
> > 
> > On Wed, Apr 9, 2014 at 6:19 PM, Davidlohr Bueso <davidlohr@hp.com> wrote:
> > > On Wed, 2014-04-09 at 16:50 -0700, James Bottomley wrote:
> > >> On Wed, 2014-04-09 at 16:40 -0700, Davidlohr Bueso wrote:
> > >> > On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote:
> > >> > > On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote:
> > >> > > > [+linux-scsi]
> > >> > > > On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote:
> > >> > > > > On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote:
> > >> > > > > > Hi,
> > >> > > > > >
> > >> > > > > > The kernel is 3.14.0+ which is pulled just now.
> > >> > > > >
> > >> > > > > Cc'ing more people.
> > >> > > > >
> > >> > > > > While the hpsa driver appears to be involved in some way, I'm not sure if
> > >> > > > > this is a related issue, but as of today's pull I'm getting another
> > >> > > > > problem that causes my DL980 not to come up.
> > >> > > > >
> > >> > > > > *Massive* amounts of:
> > >> > > > >
> > >> > > > > DMAR:[fault reason 02] Present bit in context entry is clear
> > >> > > > > dmar: DRHD: handling fault status reg 602
> > >> > > > > dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
> > >> > > > >
> > >> > > > > Then:
> > >> > > > >
> > >> > > > > hpsa 0000:03:00.0: Controller lockup detected: 0xffff0000
> > >> > > > > ...
> > >> > > > > Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
> > >> > > > > ...
> > >> > > > >
> > >> > > > > Screenshot of the actual LOCKUP:
> > >> > > > > http://stgolabs.net/hpsa-hard-lockup-3.14+.png
> > >> > > > >
> > >> > > > > While I haven't bisected, things worked fine until at least commit
> > >> > > > > 39de65aa2c3e (April 2nd).
> > >> > > > >
> > >> > > > > Any ideas?
> > >> > > >
> > >> > > > Well, it's either a DMA remapping issue or a hpsa one.  Your assertion
> > >> > > > that everything worked fine until 39de65aa2c3e would tend to vindicate
> > >> > > > hpsa,
> > >> >
> > >> > Hmm here you mean DMA, right?
> > >>
> > >> No, it vindicates the hpsa changes ... they don't seem to be causing
> > >> problems until something goes wrong with dma remapping.
> > >>
> > >> > > because all the hpsa changes went in before that under
> > >> > > Missing crucial info:
> > >> > >
> > >> > > commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1
> > >> > >
> > >> > > > Merge: 3e75c6d b2bff6c
> > >> > > > Author: Linus Torvalds <torvalds@linux-foundation.org>
> > >> > > > Date:   Tue Apr 1 18:49:04 2014 -0700
> > >> > > >
> > >> > > >     Merge tag 'scsi-misc' of
> > >> > > > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> > >> > > >
> > >> > > > can you revalidate that this commit works OK just to make sure?
> > >> >
> > >> > Ok so I don't see those DMA messages and system starts just fine. I'm
> > >> > thinking perhaps something broke after the IOMMU stuff in commit
> > >> > 3f583bc21977a608908b83d03ee2250426a5695c... could this be indirectly
> > >> > causing the CPU stalls and just blame hpsa in the path as a side effect?
> > >> >
> > >> > /me goes out to try the commit.
> > >>
> > >> That's my guess.  The DMAR messages are DMA remapping issues caused in
> > >> the IOMMU.  If I had to guess, I'd say the DMAR fault message is
> > >> indicating the IOMMU is calling for a mapping address before it can
> > >> satisfy the driver read request, which is causing the hang apparently in
> > >> the hpsa driver.
> > >>
> > >> I've added linux-pci to the cc; I think they deal with iommu issues on
> > >> x86.
> > >
> > > So that merge commit appears to be the culprit, I see both the DMA
> > > messages and the lockup blaming hpsa...
> > 
> > My understanding so far (please correct me if I'm wrong):
> > 
> > 39de65aa2c3e OK ("Merge branch 'i2c/for-next'")
> > 1a0b6abaea78 OK ("Merge tag 'scsi-misc'")
> > 3f583bc21977 BAD ("Merge tag 'iommu-updates-v3.15'")
> 
> Yes, specifically (finally done bisecting):
> 
> commit 2e45528930388658603ea24d49cf52867b928d3e
> Author: Jiang Liu <jiang.liu@linux.intel.com>
> Date:   Wed Feb 19 14:07:36 2014 +0800
> 
>     iommu/vt-d: Unify the way to process DMAR device scope array
>     
>     Now we have a PCI bus notification based mechanism to update DMAR
>     device scope array, we could extend the mechanism to support boot
>     time initialization too, which will help to unify and simplify
>     the implementation.
>     
>     Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
>     Signed-off-by: Joerg Roedel <joro@8bytes.org>
> 
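
For anyone triaging similar boot logs, here is a minimal sketch (not part of
the original report) that tallies DMAR fault lines per requesting device. The
regular expression is modelled only on the single fault line quoted above;
the function name tally_dmar_faults and the sample list are hypothetical, and
other kernels may format these messages differently.

```python
import re
from collections import Counter

# Assumption: fault lines look like the one quoted in the thread, e.g.
# "dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000"
FAULT_RE = re.compile(
    r"DMAR:\[(?P<kind>DMA (?:Read|Write))\] Request device "
    r"\[(?P<dev>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] "
    r"fault addr (?P<addr>[0-9a-f]+)"
)


def tally_dmar_faults(lines):
    """Count DMAR faults per (device, request kind) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[(match.group("dev"), match.group("kind"))] += 1
    return counts


# Hypothetical sample built from the lines quoted in the report.
sample = [
    "DMAR:[fault reason 02] Present bit in context entry is clear",
    "dmar: DRHD: handling fault status reg 602",
    "dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000",
    "dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000",
    "hpsa 0000:03:00.0: Controller lockup detected: 0xffff0000",
]

for (dev, kind), n in tally_dmar_faults(sample).items():
    print(f"{dev} {kind}: {n} fault(s)")
```

Feeding it the output of dmesg, one line per list element, gives a quick view
of which requester IDs are faulting before going further with a bisect.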

