From: Jiang Liu <jiang.liu@linux.intel.com>
To: Bjorn Helgaas <bhelgaas@google.com>,
	Davidlohr Bueso <davidlohr@hp.com>, Baoquan He <bhe@redhat.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Stephen M. Cameron" <scameron@beardog.cce.hp.com>,
	linux-scsi <linux-scsi@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Joerg Roedel <joro@8bytes.org>,
	"open list:INTEL IOMMU (VT-d)" <iommu@lists.linux-foundation.org>
Subject: Re: hpsa driver bug crack kernel down!
Date: Thu, 10 Apr 2014 16:34:09 +0800
Message-ID: <53465781.4010904@linux.intel.com>
In-Reply-To: <CAErSpo4H=hcro8sMnt2MzDDVCROpASuUTQWBw37OxodHTyOfyw@mail.gmail.com>

Hi Baoquan,
	Could you please provide the output of "lspci -vvvv"?
Is the device "hpsa 0000:03:00.0" a legacy PCI device (non-PCIe)?
The problem may be related to the IOMMU driver.
Thanks!
Gerry
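
One way to answer the PCIe-versus-legacy question from saved "lspci -vv" output is to look for a PCI Express capability line in the device's block. The following is only a minimal sketch: the helper names and the sample text are hypothetical (the sample is not output from any machine in this thread), and it assumes lspci was run as root, since otherwise capabilities are reported as "<access denied>".

```python
import re

# Capability lines in `lspci -vv` output look like
# "Capabilities: [70] Express (v2) Endpoint, MSI 00".
EXPRESS_CAP = re.compile(r"^\s*Capabilities: \[[0-9a-f]+\] Express\b", re.MULTILINE)


def split_devices(lspci_output: str) -> dict:
    """Map each slot (first token of a device block) to that block's text."""
    blocks = [b for b in lspci_output.strip().split("\n\n") if b.strip()]
    return {block.split()[0]: block for block in blocks}


def is_pcie(device_block: str) -> bool:
    """Return True if the block lists a PCI Express capability."""
    return bool(EXPRESS_CAP.search(device_block))


# Hypothetical, abbreviated sample; not taken from the systems discussed here.
SAMPLE = """\
03:00.0 RAID bus controller: Example vendor Example controller
\tCapabilities: [70] Express (v2) Endpoint, MSI 00

04:00.0 Ethernet controller: Example vendor Example NIC
\tCapabilities: [40] Power Management version 2
"""

for slot, block in split_devices(SAMPLE).items():
    print(slot, "PCIe" if is_pcie(block) else "no Express capability listed")
```

A device with no Express capability listed is a candidate legacy PCI device, but the full lspci output is still needed to confirm the bus topology.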

On 2014/4/10 12:03, Bjorn Helgaas wrote:
> [+cc Joerg, iommu list]
> 
> On Wed, Apr 9, 2014 at 6:19 PM, Davidlohr Bueso <davidlohr@hp.com> wrote:
>> On Wed, 2014-04-09 at 16:50 -0700, James Bottomley wrote:
>>> On Wed, 2014-04-09 at 16:40 -0700, Davidlohr Bueso wrote:
>>>> On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote:
>>>>> On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote:
>>>>>> [+linux-scsi]
>>>>>> On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote:
>>>>>>> On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The kernel is 3.14.0+ which is pulled just now.
>>>>>>>
>>>>>>> Cc'ing more people.
>>>>>>>
>>>>>>> While the hpsa driver appears to be involved in some way, I'm not sure if
>>>>>>> this is a related issue, but as of today's pull I'm getting another
>>>>>>> problem that causes my DL980 not to come up.
>>>>>>>
>>>>>>> *Massive* amounts of:
>>>>>>>
>>>>>>> DMAR:[fault reason 02] Present bit in context entry is clear
>>>>>>> dmar: DRHD: handling fault status reg 602
>>>>>>> dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
>>>>>>>
>>>>>>> Then:
>>>>>>>
>>>>>>> hpsa 0000:03:00.0: Controller lockup detected: 0xffff0000
>>>>>>> ...
>>>>>>> Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
>>>>>>> ...
>>>>>>>
>>>>>>> Screenshot of the actual LOCKUP:
>>>>>>> http://stgolabs.net/hpsa-hard-lockup-3.14+.png
>>>>>>>
>>>>>>> While I haven't bisected, things worked fine at least until commit
>>>>>>> 39de65aa2c3e (April 2nd).
>>>>>>>
>>>>>>> Any ideas?
>>>>>>
>>>>>> Well, it's either a DMA remapping issue or a hpsa one.  Your assertion
>>>>>> that everything worked fine until 39de65aa2c3e would tend to vindicate
>>>>>> hpsa,
>>>>
>>>> Hmm here you mean DMA, right?
>>>
>>> No, it vindicates the hpsa changes ... they don't seem to be causing
>>> problems until something goes wrong with dma remapping.
>>>
>>>>> because all the hpsa changes went in before that under
>>>>> Missing crucial info:
>>>>>
>>>>> commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1
>>>>>
>>>>>> Merge: 3e75c6d b2bff6c
>>>>>> Author: Linus Torvalds <torvalds@linux-foundation.org>
>>>>>> Date:   Tue Apr 1 18:49:04 2014 -0700
>>>>>>
>>>>>>     Merge tag 'scsi-misc' of
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>>>>>>
>>>>>> can you revalidate that this commit works OK just to make sure?
>>>>
>>>> Ok so I don't see those DMA messages and system starts just fine. I'm
>>>> thinking perhaps something broke after the IO mmu stuff in commit
>>>> 3f583bc21977a608908b83d03ee2250426a5695c... could this be indirectly
>>>> causing the CPU stalls and just blame hpsa in the path as a side effect?
>>>>
>>>> /me goes out to try the commit.
>>>
>>> That's my guess.  The DMAR messages are DMA remapping issues caused in
>>> the IOMMU.  If I had to guess, I'd say the DMAR fault message is
>>> indicating the IOMMU is calling for a mapping address before it can
>>> satisfy the driver read request, which is causing the hang apparently in
>>> the hpsa driver.
>>>
>>> I've added linux-pci to the cc; I think they deal with iommu issues on
>>> x86.
>>
>> So that merge commit appears to be the culprit, I see both the DMA
>> messages and the lockup blaming hpsa...
> 
> My understanding so far (please correct me if I'm wrong):
> 
> 39de65aa2c3e OK ("Merge branch 'i2c/for-next'")
> 1a0b6abaea78 OK ("Merge tag 'scsi-misc'")
> 3f583bc21977 BAD ("Merge tag 'iommu-updates-v3.15'")
> 
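
The DMAR lines quoted above carry three useful fields: the fault reason, the requesting device (bus:device.function), and the faulting address. A minimal sketch for tallying them from a saved kernel log follows. The regular expressions are written against the exact message format quoted in this thread and are an assumption for other kernel versions; the function name is hypothetical.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread.
REQUEST = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\] Request device "
    r"\[(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] fault addr (?P<addr>[0-9a-f]+)"
)
REASON = re.compile(r"DMAR:\[fault reason (?P<code>\d+)\] (?P<text>.+)")


def summarize_dmar_faults(log: str):
    """Count faults per (requester, operation) and per (reason code, text)."""
    devices = Counter()
    reasons = Counter()
    for line in log.splitlines():
        m = REQUEST.search(line)
        if m:
            devices[(m["bdf"], m["op"])] += 1
            continue
        m = REASON.search(line)
        if m:
            reasons[(int(m["code"]), m["text"].strip())] += 1
    return devices, reasons


# The three lines quoted in the report above.
SAMPLE = """\
DMAR:[fault reason 02] Present bit in context entry is clear
dmar: DRHD: handling fault status reg 602
dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
"""

devices, reasons = summarize_dmar_faults(SAMPLE)
print(devices)
print(reasons)
```

In the quoted log the faulting requester is 02:00.0, while the hpsa controller that later locks up is 0000:03:00.0. A per-device tally like this makes that mismatch easy to spot; what it implies about the bus topology is not established by the log alone.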
