From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753045AbbFRH4e (ORCPT ); Thu, 18 Jun 2015 03:56:34 -0400
Received: from unicom145.biz-email.net ([210.51.26.145]:2025
	"EHLO unicom145.biz-email.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1752404AbbFRH4d (ORCPT );
	Thu, 18 Jun 2015 03:56:33 -0400
X-Greylist: delayed 76415 seconds by postgrey-1.27 at vger.kernel.org;
	Thu, 18 Jun 2015 03:56:30 EDT
Subject: Re: Panic when cpu hot-remove
To: Jiang Liu, Alex Williamson, Joerg Roedeljoro
References: <42BB8332972FC149B81C55A0D41E3A79C07469@jtjnmailbox06.home.langchao.com>
	<20150617115238.GC27750@8bytes.org>
	<1434551800.5628.5.camel@redhat.com>
	<558259BD.7080402@linux.intel.com>
	<558272E3.4000504@inspur.com>
CC: =?UTF-8?B?5YiY6ZW/55Sf?=, iommu, "jiang.liu@intel.com",
	linux-kernel, =?UTF-8?B?6Zer5pmT5bOw?=, Roland Dreier
From: fandongdong
Message-ID: <55827927.4080504@inspur.com>
Date: Thu, 18 Jun 2015 15:54:15 +0800
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101
	Thunderbird/38.0.1
MIME-Version: 1.0
In-Reply-To: <558272E3.4000504@inspur.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.165.21.134]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2015/6/18 15:27, fandongdong wrote:
>
>
> On 2015/6/18 13:40, Jiang Liu wrote:
>> On 2015/6/17 22:36, Alex Williamson wrote:
>>> On Wed, 2015-06-17 at 13:52 +0200, Joerg Roedeljoro wrote:
>>>> On Wed, Jun 17, 2015 at 10:42:49AM +0000, 范冬冬 wrote:
>>>>> Hi maintainer,
>>>>>
>>>>> We found that a panic happens when a CPU is hot-removed.
>>>>> We traced the problem using the call trace information.
>>>>> An endless loop occurs in qi_check_fault() because head never
>>>>> becomes equal to tail.
>>>>> The code in question is as follows:
>>>>>
>>>>>
>>>>> do {
>>>>>         if (qi->desc_status[head] == QI_IN_USE)
>>>>>                 qi->desc_status[head] = QI_ABORT;
>>>>>         head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>>>> } while (head != tail);
>>>> Hmm, this code iterates only over every second QI descriptor, and
>>>> tail probably points to a descriptor that is not iterated over.
>>>>
>>>> Jiang, can you please have a look?
>>> I think that part is normal. The way we use the queue is to always
>>> submit a work operation followed by a wait operation, so that we can
>>> determine when the work operation is complete. That's done via
>>> qi_submit_sync(). We have had spurious reports of the queue getting
>>> impossibly out of sync, though. I saw one that was somehow linked to
>>> the I/O AT DMA engine. Roland Dreier saw something similar[1]. I'm
>>> not sure if they're related to this, but it may be worth comparing.
>>> Thanks,
>> Thanks, Alex and Joerg!
>>
>> Hi Dongdong,
>>     Could you please give some instructions on how to reproduce this
>> issue? I will try to reproduce it if possible.
>> Thanks!
>> Gerry
> Hi Gerry,
>
> We're running kernel 4.1.0 on a 4-socket system and we want to
> offline socket 1.
> The steps are as follows:
>
> echo 1 > /sys/firmware/acpi/hotplug/force_remove
> echo 1 > /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:01/eject
>
> Thanks!
> Dongdong
>>> Alex
>>>
>>> [1]
>>> http://lists.linuxfoundation.org/pipermail/iommu/2015-January/011502.html
>>>