From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiang Liu
Organization: Intel
Subject: Re: Panic when cpu hot-remove
Date: Thu, 18 Jun 2015 13:40:13 +0800
Message-ID: <558259BD.7080402@linux.intel.com>
References: <42BB8332972FC149B81C55A0D41E3A79C07469@jtjnmailbox06.home.langchao.com> <20150617115238.GC27750@8bytes.org> <1434551800.5628.5.camel@redhat.com>
In-Reply-To: <1434551800.5628.5.camel@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
To: Alex Williamson, Joerg Roedel
Cc: Roland Dreier, 闫晓峰, "jiang.liu@intel.com", linux-kernel, 刘长生, iommu, 范冬冬
List-Id: iommu@lists.linux-foundation.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 2015/6/17 22:36, Alex Williamson wrote:
> On Wed, 2015-06-17 at 13:52 +0200, Joerg Roedel wrote:
>> On Wed, Jun 17, 2015 at 10:42:49AM +0000, 范冬冬 wrote:
>>> Hi maintainer,
>>>
>>> We found a problem: a panic happens when a CPU is hot-removed. We traced
>>> the problem using the call trace information.
>>> An endless loop occurs in the function qi_check_fault() because head
>>> never becomes equal to tail.
>>> The relevant code is as follows:
>>>
>>> do {
>>>         if (qi->desc_status[head] == QI_IN_USE)
>>>                 qi->desc_status[head] = QI_ABORT;
>>>         head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>> } while (head != tail);
>>
>> Hmm, this code iterates only over every second QI descriptor, and tail
>> probably points to a descriptor that is not iterated over.
>>
>> Jiang, can you please have a look?
>
> I think that part is normal, the way we use the queue is to always
> submit a work operation followed by a wait operation so that we can
> determine the work operation is complete.  That's done via
> qi_submit_sync().  We have had spurious reports of the queue getting
> impossibly out of sync though.  I saw one that was somehow linked to the
> I/O AT DMA engine.  Roland Dreier saw something similar[1].  I'm not
> sure if they're related to this, but maybe worth comparing.  Thanks,

Thanks, Alex and Joerg!

Hi Dongdong,
	Could you please give some instructions on how to
reproduce this issue? I will try to reproduce it if possible.
Thanks!
Gerry

>
> Alex
>
> [1] http://lists.linuxfoundation.org/pipermail/iommu/2015-January/011502.html
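
To make the stepping argument in the quoted qi_check_fault() loop concrete,
here is a minimal user-space sketch. It is not the kernel code: it replays
only the index arithmetic from the snippet above. The helper name
steps_to_tail() is hypothetical, and QI_LENGTH = 256 is an assumption (any
even length gives the same result). Because head moves back by two slots
each time and the queue length is even, head keeps its parity, so the walk
can only meet tail if both indices are odd or both are even. That fits
Alex's point that the queue is normally filled in work/wait pairs, and it
shows why a queue that has gotten out of sync would never leave the loop.

```c
#define QI_LENGTH 256   /* assumed queue length; the argument needs it to be even */

/*
 * Replay only the index arithmetic of the quoted do/while loop.
 * Returns the number of steps needed for head to reach tail, or -1 if
 * head comes back to its starting slot without ever meeting tail, which
 * is the case where the original loop would spin forever.
 */
static int steps_to_tail(int head, int tail)
{
    const int start = head;
    int steps = 0;

    do {
        /* Same update as in the quoted snippet: step back two descriptors. */
        head = (head - 2 + QI_LENGTH) % QI_LENGTH;
        steps++;
        if (head == tail)
            return steps;
    } while (head != start);

    return -1;  /* full cycle completed, tail was never visited */
}
```

With these assumptions, steps_to_tail(7, 3) returns 2 because both indices
are odd, while steps_to_tail(7, 4) returns -1: the walk visits only odd
slots and wraps around without ever landing on slot 4.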