From: fandongdong <fandd@inspur.com>
To: Jiang Liu <jiang.liu@linux.intel.com>,
Alex Williamson <alex.williamson@redhat.com>,
Joerg Roedeljoro <joro@8bytes.org>
Cc: 刘长生 <liuchangsheng@inspur.com>,
iommu <iommu@lists.linux-foundation.org>,
"jiang.liu@intel.com" <jiang.liu@intel.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
闫晓峰 <yanxiaofeng@inspur.com>, "Roland Dreier" <roland@kernel.org>
Subject: Re: Panic when cpu hot-remove
Date: Thu, 25 Jun 2015 18:46:37 +0800
Message-ID: <558BDC0D.2000206@inspur.com>
In-Reply-To: <558BB7B8.7000402@linux.intel.com>
On 2015/6/25 16:11, Jiang Liu wrote:
> On 2015/6/18 15:54, fandongdong wrote:
>>
>> On 2015/6/18 15:27, fandongdong wrote:
>>>
>>> On 2015/6/18 13:40, Jiang Liu wrote:
>>>> On 2015/6/17 22:36, Alex Williamson wrote:
>>>>> On Wed, 2015-06-17 at 13:52 +0200, Joerg Roedeljoro wrote:
>>>>>> On Wed, Jun 17, 2015 at 10:42:49AM +0000, 范冬冬 wrote:
>>>>>>> Hi maintainer,
>>>>>>>
>>>>>>> We found a problem: a panic happens when a CPU is hot-removed.
>>>>>>> We traced the problem using the call trace information.
>>>>>>> An endless loop occurs in the function qi_check_fault() because
>>>>>>> head never becomes equal to tail.
>>>>>>> The relevant code is as follows:
>>>>>>>
>>>>>>>
>>>>>>> do {
>>>>>>>         if (qi->desc_status[head] == QI_IN_USE)
>>>>>>>                 qi->desc_status[head] = QI_ABORT;
>>>>>>>         head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>>>>>> } while (head != tail);
>>>>>> Hmm, this code iterates only over every second QI descriptor, and
>>>>>> tail probably points to a descriptor that is not iterated over.
>>>>>>
>>>>>> Jiang, can you please have a look?
>>>>> I think that part is normal, the way we use the queue is to always
>>>>> submit a work operation followed by a wait operation so that we can
>>>>> determine the work operation is complete. That's done via
>>>>> qi_submit_sync(). We have had spurious reports of the queue getting
>>>>> impossibly out of sync though. I saw one that was somehow linked to
>>>>> the I/O AT DMA engine. Roland Dreier saw something similar[1]. I'm not
>>>>> sure if they're related to this, but maybe worth comparing. Thanks,
>>>> Thanks, Alex and Joerg!
>>>>
>>>> Hi Dongdong,
>>>> Could you please help to give some instructions about how to
>>>> reproduce this issue? I will try to reproduce it if possible.
>>>> Thanks!
>>>> Gerry
>>> Hi Gerry,
>>>
>>> We're running kernel 4.1.0 on a 4-socket system and we want to
>>> offline socket 1.
>>> Steps as follows:
>>>
>>> echo 1 > /sys/firmware/acpi/hotplug/force_remove
>>> echo 1 > /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:01/eject
> Hi Dongdong,
> I failed to reproduce this issue on my side. Could you please help
> to confirm the following?
> 1) Is this issue reproducible on your side?
Yes.
> 2) Does this issue happen if you disable the irqbalance service on your
> system?
Yes.
> 3) Has the corresponding PCI host bridge been removed before removing
> the socket?
No, we will try to remove it before removing the socket later.
Thanks for your help, Gerry.
>
> From the log message, we only noticed log messages for CPU and memory,
> but not messages for PCI (IOMMU) devices. And this log message
> "[ 149.976493] acpi ACPI0004:01: Still not present"
> implies that the socket has been powered off during the ejection.
> So the story may be that you powered off the socket while the host
> bridge on the socket is still in use.
> Thanks!
> Gerry
>
> .
>
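
To illustrate the parity concern Joerg raised about the loop quoted
above, here is a small user-space sketch in C. It is not kernel code:
it replays only the index arithmetic of that loop, assumes QI_LENGTH
is 256, and uses a hypothetical helper, steps_to_tail(), with an added
bound so that it terminates. When head and tail differ by an odd
amount, stepping back by two never lands on tail.

```c
#include <assert.h>

#define QI_LENGTH 256 /* assumed queue length, for illustration only */

/*
 * Hypothetical helper: replay the index walk from the quoted loop.
 * Returns the number of steps taken until head == tail, or -1 if
 * head has still not met tail after more than QI_LENGTH steps.
 */
static int steps_to_tail(int head, int tail)
{
	int steps = 0;

	assert(head >= 0 && head < QI_LENGTH);
	assert(tail >= 0 && tail < QI_LENGTH);

	do {
		/* Same step as the quoted loop: move back two slots. */
		head = (head - 2 + QI_LENGTH) % QI_LENGTH;
		steps++;
		/* Bound added for this sketch; the quoted loop has none. */
		if (steps > QI_LENGTH)
			return -1;
	} while (head != tail);

	return steps;
}
```

With these assumptions, steps_to_tail(10, 4) returns 3, while
steps_to_tail(10, 5) returns -1, which corresponds to the unbounded
loop spinning forever once head and tail have different parity.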