Linux IOMMU Development
From: fandongdong <fandd-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>
To: Jiang Liu <jiang.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>,
	Alex Williamson
	<alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
Cc: "Roland Dreier" <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	闫晓峰 <yanxiaofeng-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>,
	"jiang.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org"
	<jiang.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	linux-kernel
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	刘长生 <liuchangsheng-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>,
	iommu
	<iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
Subject: Re: Panic when cpu hot-remove
Date: Fri, 26 Jun 2015 17:35:39 +0800
Message-ID: <558D1CEB.3050804@inspur.com>
In-Reply-To: <558BB7B8.7000402-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>

On 2015/6/25 16:11, Jiang Liu wrote:
> On 2015/6/18 15:54, fandongdong wrote:
>>
>> On 2015/6/18 15:27, fandongdong wrote:
>>>
>>> On 2015/6/18 13:40, Jiang Liu wrote:
>>>> On 2015/6/17 22:36, Alex Williamson wrote:
>>>>> On Wed, 2015-06-17 at 13:52 +0200, Joerg Roedel wrote:
>>>>>> On Wed, Jun 17, 2015 at 10:42:49AM +0000, 范冬冬 wrote:
>>>>>>> Hi maintainer,
>>>>>>>
>>>>>>> We hit a panic when a CPU was hot-removed.
>>>>>>> We traced the problem using the call trace information.
>>>>>>> The function qi_check_fault() loops endlessly because head never
>>>>>>> becomes equal to tail.
>>>>>>> The relevant code is as follows:
>>>>>>>
>>>>>>>
>>>>>>> do {
>>>>>>>         if (qi->desc_status[head] == QI_IN_USE)
>>>>>>>                 qi->desc_status[head] = QI_ABORT;
>>>>>>>         head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>>>>>> } while (head != tail);
>>>>>> Hmm, this code iterates only over every second QI descriptor, and
>>>>>> tail probably points to a descriptor that is not iterated over.
>>>>>>
>>>>>> Jiang, can you please have a look?
>>>>> I think that part is normal, the way we use the queue is to always
>>>>> submit a work operation followed by a wait operation so that we can
>>>>> determine the work operation is complete.  That's done via
>>>>> qi_submit_sync().  We have had spurious reports of the queue getting
>>>>> impossibly out of sync though.  I saw one that was somehow linked to
>>>>> the I/O AT DMA engine.  Roland Dreier saw something similar[1]. I'm
>>>>> not sure if they're related to this, but maybe worth comparing. Thanks,
>>>> Thanks, Alex and Joerg!
>>>>
>>>> Hi Dongdong,
>>>>      Could you please help to give some instructions about how to
>>>> reproduce this issue? I will try to reproduce it if possible.
>>>> Thanks!
>>>> Gerry
>>> Hi Gerry,
>>>
>>> We're running kernel 4.1.0 on a 4-socket system and we want to
>>> offline socket 1.
>>> The steps are as follows:
>>>
>>> echo 1 > /sys/firmware/acpi/hotplug/force_remove
>>> echo 1 > /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:01/eject
> Hi Dongdong,
> 	I failed to reproduce this issue on my side. Could you please help
> to confirm the following?
> 1) Is this issue reproducible on your side?
> 2) Does this issue happen if you disable the irqbalance service on your
>     system?
> 3) Has the corresponding PCI host bridge been removed before removing
>     the socket?
>
> From the log messages, we only noticed messages for CPU and memory,
> but no messages for PCI (IOMMU) devices. And this log message
> 	"[ 149.976493] acpi ACPI0004:01: Still not present"
> implies that the socket has been powered off during the ejection.
> So the story may be that you powered off the socket while the host
> bridge on the socket is still in use.
> Thanks!
> Gerry
Hi Gerry,

Thanks for your suggestion!
The issue did not happen once the corresponding PCI host bridge was
removed first.

Thanks!
Dongdong
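
The point raised in the quoted discussion, that the walk in
qi_check_fault() only visits every second descriptor, can be checked
with a small user-space model. This is only a sketch and not the kernel
code: the helper name qi_walk_back(), the iteration cap, and the
QI_LENGTH value of 256 are assumptions made for illustration. It shows
that the walk reaches tail only when head and tail have the same
parity; otherwise the real loop, which has no cap, would never exit.

```c
#include <assert.h>

#define QI_LENGTH 256  /* assumed queue length, in descriptors */

enum qi_status { QI_FREE, QI_IN_USE, QI_DONE, QI_ABORT };

/*
 * Hypothetical user-space model of the backward walk quoted in this
 * thread. It marks in-use descriptors as aborted while stepping two
 * slots at a time, exactly like the quoted loop.
 *
 * Returns the number of iterations needed to reach tail, or -1 if tail
 * has not been reached after more than QI_LENGTH iterations. The cap
 * exists only in this model; it stands in for "would loop forever".
 */
int qi_walk_back(int head, int tail, int status[QI_LENGTH])
{
    int steps = 0;

    assert(head >= 0 && head < QI_LENGTH);
    assert(tail >= 0 && tail < QI_LENGTH);

    do {
        if (status[head] == QI_IN_USE)
            status[head] = QI_ABORT;
        head = (head - 2 + QI_LENGTH) % QI_LENGTH;
        if (++steps > QI_LENGTH)
            return -1;  /* parity mismatch: tail is never visited */
    } while (head != tail);

    return steps;
}
```

With an all-free status array, qi_walk_back(6, 2, status) returns 2,
while qi_walk_back(6, 3, status) returns -1 because an even index can
never step onto an odd one. This matches the explanation that the queue
is normally used in work/wait pairs, so the two indices are expected to
stay aligned; the model says nothing about why they were misaligned in
the reported panic.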

