From: fandongdong <fandd-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>
To: Jiang Liu <jiang.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>,
Alex Williamson
<alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
Cc: "Roland Dreier" <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
闫晓峰 <yanxiaofeng-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>,
"jiang.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org"
<jiang.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
linux-kernel
<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
刘长生 <liuchangsheng-6gUaA8visnnQT0dZR+AlfA@public.gmane.org>,
iommu
<iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
Subject: Re: Panic when cpu hot-remove
Date: Thu, 25 Jun 2015 18:46:37 +0800
Message-ID: <558BDC0D.2000206@inspur.com>
In-Reply-To: <558BB7B8.7000402-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
On 2015/6/25 16:11, Jiang Liu wrote:
> On 2015/6/18 15:54, fandongdong wrote:
>>
>> On 2015/6/18 15:27, fandongdong wrote:
>>>
>>> On 2015/6/18 13:40, Jiang Liu wrote:
>>>> On 2015/6/17 22:36, Alex Williamson wrote:
>>>>> On Wed, 2015-06-17 at 13:52 +0200, Joerg Roedel wrote:
>>>>>> On Wed, Jun 17, 2015 at 10:42:49AM +0000, 范冬冬 wrote:
>>>>>>> Hi maintainer,
>>>>>>>
>>>>>>> We found a problem where a panic happens when a CPU is hot-removed.
>>>>>>> We traced the problem using the call trace information.
>>>>>>> An endless loop occurs in the function qi_check_fault() because
>>>>>>> head never becomes equal to tail.
>>>>>>> The relevant code is as follows:
>>>>>>>
>>>>>>>
>>>>>>> do {
>>>>>>>         if (qi->desc_status[head] == QI_IN_USE)
>>>>>>>                 qi->desc_status[head] = QI_ABORT;
>>>>>>>         head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>>>>>> } while (head != tail);
>>>>>> Hmm, this code iterates over only every second QI descriptor, and
>>>>>> tail probably points to a descriptor that is not iterated over.
>>>>>>
>>>>>> Jiang, can you please have a look?
>>>>> I think that part is normal, the way we use the queue is to always
>>>>> submit a work operation followed by a wait operation so that we can
>>>>> determine the work operation is complete. That's done via
>>>>> qi_submit_sync(). We have had spurious reports of the queue getting
>>>>> impossibly out of sync though. I saw one that was somehow linked to
>>>>> the I/O AT DMA engine. Roland Dreier saw something similar[1]. I'm not
>>>>> sure if they're related to this, but maybe worth comparing. Thanks,
>>>> Thanks, Alex and Joerg!
>>>>
>>>> Hi Dongdong,
>>>> Could you please help to give some instructions about how to
>>>> reproduce this issue? I will try to reproduce it if possible.
>>>> Thanks!
>>>> Gerry
>>> Hi Gerry,
>>>
>>> We're running kernel 4.1.0 on a 4-socket system and we want to
>>> offline socket 1.
>>> Steps as follows:
>>>
>>> echo 1 > /sys/firmware/acpi/hotplug/force_remove
>>> echo 1 > /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:01/eject
> Hi Dongdong,
> I failed to reproduce this issue on my side. Could you please help
> to confirm the following?
> 1) Is this issue reproducible on your side?
Yes.
> 2) Does this issue happen if you disable the irqbalance service on your
> system?
Yes.
> 3) Has the corresponding PCI host bridge been removed before removing
> the socket?
No, we will try to remove it before removing the socket later.
Thanks for your help, Gerry.
>
> From the log messages, we only noticed entries for CPU and memory,
> but none for PCI (IOMMU) devices. And this log message
> "[ 149.976493] acpi ACPI0004:01: Still not present"
> implies that the socket has been powered off during the ejection.
> So the story may be that you powered off the socket while the host
> bridge on the socket is still in use.
> Thanks!
> Gerry
>
>
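For reference, here is a minimal user-space sketch of why the loop quoted earlier in this thread can spin forever. The walk steps backwards two slots at a time, so it can only land on tail when head and tail have the same parity. Note that qi_walk_steps() is a hypothetical helper written for illustration, not a kernel function; the QI_LENGTH value of 256 is an assumption here, and the bailout counter exists only in this sketch (the quoted loop has no such bound).

```c
#include <assert.h>

#define QI_LENGTH 256   /* ring size; assumed value for this sketch */

/*
 * Hypothetical helper: walk backwards from head in steps of two, as the
 * quoted loop does, and count the steps needed to reach tail.
 * Returns -1 if tail is never reached within one full lap of the slots
 * that share head's parity (the case where the quoted loop never ends).
 */
int qi_walk_steps(int head, int tail)
{
    int steps = 0;

    assert(head >= 0 && head < QI_LENGTH);
    assert(tail >= 0 && tail < QI_LENGTH);

    do {
        head = (head - 2 + QI_LENGTH) % QI_LENGTH;
        steps++;
        /* Only QI_LENGTH / 2 slots are reachable with a stride of two. */
        if (steps > QI_LENGTH / 2)
            return -1;
    } while (head != tail);

    return steps;
}
```

With these definitions, qi_walk_steps(10, 4) terminates after three steps, while qi_walk_steps(10, 5) returns -1 because an even head can never reach an odd tail. This only illustrates the parity condition; it says nothing about how head and tail ended up mismatched on the affected system.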