From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alex Williamson
To: Joerg Roedel
Cc: Roland Dreier, 闫晓峰, "jiang.liu@intel.com", linux-kernel, 刘长生, iommu, 范冬冬
Subject: Re: Panic when cpu hot-remove
Date: Wed, 17 Jun 2015 08:36:40 -0600
Message-ID: <1434551800.5628.5.camel@redhat.com>
In-Reply-To: <20150617115238.GC27750@8bytes.org>
References: <42BB8332972FC149B81C55A0D41E3A79C07469@jtjnmailbox06.home.langchao.com> <20150617115238.GC27750@8bytes.org>
List-Id: iommu@lists.linux-foundation.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2015-06-17 at 13:52 +0200, Joerg Roedel wrote:
> On Wed, Jun 17, 2015 at 10:42:49AM +0000, 范冬冬 wrote:
> > Hi maintainer,
> >
> > We found a problem where a panic happens when a CPU is hot-removed.
> > We traced the problem using the call trace information.
> > An endless loop occurs in the function qi_check_fault() because the
> > value of head never becomes equal to the value of tail.
> > The code in question is as follows:
> >
> > do {
> >         if (qi->desc_status[head] == QI_IN_USE)
> >                 qi->desc_status[head] = QI_ABORT;
> >         head = (head - 2 + QI_LENGTH) % QI_LENGTH;
> > } while (head != tail);
>
> Hmm, this code iterates only over every second QI descriptor, and tail
> probably points to a descriptor that is not iterated over.
>
> Jiang, can you please have a look?

I think that part is normal: the way we use the queue is to always
submit a work operation followed by a wait operation, so that we can
determine when the work operation is complete.  That's done via
qi_submit_sync().  Since descriptors are queued in pairs, stepping back
two at a time is expected.  We have had spurious reports of the queue
getting impossibly out of sync though.  I saw one that was somehow
linked to the I/O AT DMA engine.  Roland Dreier saw something
similar[1].  I'm not sure if they're related to this, but it may be
worth comparing them.  Thanks,

Alex

[1] http://lists.linuxfoundation.org/pipermail/iommu/2015-January/011502.html
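
As a rough illustration of why parity matters for the stride-2 walk in
the quoted loop, here is a minimal stand-alone C sketch.  It is not the
kernel's code: stride2_walk_reaches_tail() is a hypothetical helper, and
QI_LENGTH is assumed to be 256 (any even ring size behaves the same
way).  It steps back two slots at a time, like the quoted loop, and
reports whether the walk ever lands on tail before returning to its
starting slot.

```c
#include <assert.h>
#include <stdbool.h>

#define QI_LENGTH 256   /* assumed ring size for this sketch; must be even */

/*
 * Hypothetical stand-alone model of the quoted abort walk.
 *
 * Steps back two slots at a time from head.  Returns true if the walk
 * lands on tail, false if it comes all the way around to its starting
 * slot without ever hitting tail (the case where the quoted do/while
 * loop would spin forever).
 */
static bool stride2_walk_reaches_tail(int head, int tail)
{
    int start = head;

    assert(head >= 0 && head < QI_LENGTH);
    assert(tail >= 0 && tail < QI_LENGTH);

    do {
        /* Same index update as in the quoted loop. */
        head = (head - 2 + QI_LENGTH) % QI_LENGTH;
        if (head == tail)
            return true;
    } while (head != start);

    /* Wrapped around: tail sits on a slot of the other parity. */
    return false;
}
```

Under these assumptions, stride2_walk_reaches_tail(10, 4) is true while
stride2_walk_reaches_tail(10, 5) is false.  With work and wait
descriptors always queued in pairs, head and tail should stay an even
distance apart, so the false case corresponds to the queue being out of
sync as described above.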