From mboxrd@z Thu Jan 1 00:00:00 1970
From: fandongdong
Subject: Re: Panic when cpu hot-remove
Date: Thu, 25 Jun 2015 18:46:37 +0800
Message-ID: <558BDC0D.2000206@inspur.com>
References: <42BB8332972FC149B81C55A0D41E3A79C07469@jtjnmailbox06.home.langchao.com> <20150617115238.GC27750@8bytes.org> <1434551800.5628.5.camel@redhat.com> <558259BD.7080402@linux.intel.com> <558272E3.4000504@inspur.com> <55827927.4080504@inspur.com> <558BB7B8.7000402@linux.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"; Format="flowed"
Content-Transfer-Encoding: base64
Return-path:
In-Reply-To: <558BB7B8.7000402-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: iommu-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: iommu-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: Jiang Liu , Alex Williamson , Joerg Roedeljoro
Cc: Roland Dreier , =?UTF-8?B?6Zer5pmT5bOw?= , "jiang.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org" , linux-kernel , =?UTF-8?B?5YiY6ZW/55Sf?= , iommu
List-Id: iommu@lists.linux-foundation.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752913AbbFYKtU (ORCPT ); Thu, 25 Jun 2015 06:49:20 -0400
Received: from sg02.corpemail.net ([128.199.154.28]:35457 "EHLO sg02.corpemail.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751241AbbFYKtO (ORCPT ); Thu, 25 Jun 2015 06:49:14 -0400
Subject: Re: Panic when cpu hot-remove
To: Jiang Liu , Alex Williamson , Joerg Roedeljoro
References: <42BB8332972FC149B81C55A0D41E3A79C07469@jtjnmailbox06.home.langchao.com>
 <20150617115238.GC27750@8bytes.org> <1434551800.5628.5.camel@redhat.com> <558259BD.7080402@linux.intel.com> <558272E3.4000504@inspur.com> <55827927.4080504@inspur.com> <558BB7B8.7000402@linux.intel.com>
CC: =?UTF-8?B?5YiY6ZW/55Sf?= , iommu , "jiang.liu@intel.com" , linux-kernel , =?UTF-8?B?6Zer5pmT5bOw?= , Roland Dreier
From: fandongdong
Message-ID: <558BDC0D.2000206@inspur.com>
Date: Thu, 25 Jun 2015 18:46:37 +0800
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.0.1
MIME-Version: 1.0
In-Reply-To: <558BB7B8.7000402@linux.intel.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.165.21.134]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2015/6/25 16:11, Jiang Liu wrote:
> On 2015/6/18 15:54, fandongdong wrote:
>>
>> On 2015/6/18 15:27, fandongdong wrote:
>>>
>>> On 2015/6/18 13:40, Jiang Liu wrote:
>>>> On 2015/6/17 22:36, Alex Williamson wrote:
>>>>> On Wed, 2015-06-17 at 13:52 +0200, Joerg Roedeljoro wrote:
>>>>>> On Wed, Jun 17, 2015 at 10:42:49AM +0000, 范冬冬 wrote:
>>>>>>> Hi maintainer,
>>>>>>>
>>>>>>> We found that a panic happens when a CPU is hot-removed.
>>>>>>> We also traced the problem using the call trace information.
>>>>>>> An endless loop occurs because head never becomes equal to
>>>>>>> tail in the function qi_check_fault().
>>>>>>> The code in question is as follows:
>>>>>>>
>>>>>>>
>>>>>>> do {
>>>>>>>         if (qi->desc_status[head] == QI_IN_USE)
>>>>>>>                 qi->desc_status[head] = QI_ABORT;
>>>>>>>         head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>>>>>> } while (head != tail);
>>>>>> Hmm, this code iterates only over every second QI descriptor, and
>>>>>> tail probably points to a descriptor that is not iterated over.
>>>>>>
>>>>>> Jiang, can you please have a look?
>>>>> I think that part is normal; the way we use the queue is to always
>>>>> submit a work operation followed by a wait operation so that we can
>>>>> determine the work operation is complete.  That's done via
>>>>> qi_submit_sync().  We have had spurious reports of the queue getting
>>>>> impossibly out of sync though.  I saw one that was somehow linked to
>>>>> the I/O AT DMA engine.  Roland Dreier saw something similar[1].  I'm not
>>>>> sure if they're related to this, but maybe worth comparing.  Thanks,
>>>> Thanks, Alex and Joerg!
>>>>
>>>> Hi Dongdong,
>>>>      Could you please give some instructions on how to
>>>> reproduce this issue? I will try to reproduce it if possible.
>>>> Thanks!
>>>> Gerry
>>> Hi Gerry,
>>>
>>> We're running kernel 4.1.0 on a 4-socket system and we want to
>>> offline socket 1.
>>> The steps are as follows:
>>>
>>> echo 1 > /sys/firmware/acpi/hotplug/force_remove
>>> echo 1 > /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:01/eject
> Hi Dongdong,
> 	I failed to reproduce this issue on my side. Could you please help
> to confirm the following?
> 1) Is this issue reproducible on your side?
Yes.
> 2) Does this issue happen if you disable the irqbalance service on your
>     system?
Yes.
> 3) Has the corresponding PCI host bridge been removed before removing
>     the socket?
No, we will try to remove it before removing the socket later.
Thanks for your help, Gerry.
>
> From the log message, we only noticed log messages for CPU and memory,
> but not messages for PCI (IOMMU) devices. And this log message
> 	"[ 149.976493] acpi ACPI0004:01: Still not present"
> implies that the socket has been powered off during the ejection.
> So the story may be that you powered off the socket while the host
> bridge on the socket is still in use.
> Thanks!
> Gerry
>
> .
>
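
For reference, the termination condition of the loop quoted earlier in
this thread can be checked in user space. This is only a sketch, not the
kernel code: the helper name walk_terminates() is hypothetical, and
QI_LENGTH is assumed to be 256 (any even ring size behaves the same way).
It steps head back by two, as the quoted loop does, and reports whether
head reaches tail within one lap of the ring. If head and tail differ in
parity, head can never equal tail, which would be consistent with the
endless loop reported above.

```c
#include <stdbool.h>

/* Assumption: ring size; any even value shows the same behavior. */
#define QI_LENGTH 256

/*
 * Hypothetical helper, not a kernel function.
 * Steps head backwards by 2, like the quoted do/while loop, and
 * returns true if head reaches tail within one full lap of the ring.
 * Returns false if a whole lap passes without head == tail, in which
 * case the original loop would spin forever.
 */
static bool walk_terminates(int head, int tail)
{
	/* QI_LENGTH / 2 steps of size 2 cover exactly one lap. */
	for (int steps = 0; steps < QI_LENGTH / 2; steps++) {
		head = (head - 2 + QI_LENGTH) % QI_LENGTH;
		if (head == tail)
			return true;
	}
	return false;
}
```

With head = 10, walk_terminates(10, 4) returns true because both indices
are even, while walk_terminates(10, 5) returns false because the walk only
ever visits even slots. So if the hardware reports a head index whose
parity differs from tail (for example because the queue registers read
back garbage once the socket is powered off, which is a guess and not
something confirmed in this thread), the loop has no exit.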