Message-ID: <1434551800.5628.5.camel@redhat.com>
Subject: Re: Panic when cpu hot-remove
From: Alex Williamson
To: Joerg Roedel
Cc: 范冬冬, 刘长生, iommu, "jiang.liu@intel.com", linux-kernel, 闫晓峰, Roland Dreier
Date: Wed, 17 Jun 2015 08:36:40 -0600
In-Reply-To: <20150617115238.GC27750@8bytes.org>
References: <42BB8332972FC149B81C55A0D41E3A79C07469@jtjnmailbox06.home.langchao.com> <20150617115238.GC27750@8bytes.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2015-06-17 at 13:52 +0200, Joerg Roedel wrote:
> On Wed, Jun 17, 2015 at 10:42:49AM +0000, 范冬冬 wrote:
> > Hi maintainer,
> >
> > We found a problem where a panic happens when a CPU is hot-removed. We
> > also traced the problem using the call trace information.
> > An endless loop occurs because head never becomes equal to tail in the
> > function qi_check_fault().
> > The code in question is as follows:
> >
> >     do {
> >         if (qi->desc_status[head] == QI_IN_USE)
> >             qi->desc_status[head] = QI_ABORT;
> >         head = (head - 2 + QI_LENGTH) % QI_LENGTH;
> >     } while (head != tail);
>
> Hmm, this code iterates only over every second QI descriptor, and tail
> probably points to a descriptor that is not iterated over.
>
> Jiang, can you please have a look?

I think that part is normal: the way we use the queue is to always submit a
work operation followed by a wait operation, so that we can determine when
the work operation is complete. That's done via qi_submit_sync().

We have had spurious reports of the queue getting impossibly out of sync,
though. I saw one that was somehow linked to the I/O AT DMA engine. Roland
Dreier saw something similar[1]. I'm not sure if they're related to this,
but it may be worth comparing.

Thanks,
Alex

[1] http://lists.linuxfoundation.org/pipermail/iommu/2015-January/011502.html
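
To illustrate the parity point: if descriptors are always queued as
work/wait pairs, a backwards walk in steps of two only ever visits slots of
one parity, so it can only meet tail when head and tail share that parity.
The user-space sketch below is not the kernel code. The ring size of 256,
the status values, and the helper walk_reaches_tail() are assumptions made
for illustration, and the walk is bounded so that a mismatch is reported
instead of spinning forever.

```c
#include <assert.h>
#include <stdbool.h>

#define QI_LENGTH 256 /* assumed ring size; the parity argument needs it even */

/* Illustrative status values, named after the ones in the quoted loop. */
enum { QI_FREE, QI_IN_USE, QI_DONE, QI_ABORT };

/*
 * Walk backwards from head in steps of two, as the quoted loop does,
 * marking in-use slots as aborted. Unlike the quoted loop, give up after
 * QI_LENGTH steps. Returns true if the walk reached tail.
 */
static bool walk_reaches_tail(int *desc_status, int head, int tail)
{
    int steps = 0;

    assert(head >= 0 && head < QI_LENGTH);
    assert(tail >= 0 && tail < QI_LENGTH);

    do {
        if (desc_status[head] == QI_IN_USE)
            desc_status[head] = QI_ABORT;
        head = (head - 2 + QI_LENGTH) % QI_LENGTH;
        if (++steps > QI_LENGTH)
            return false; /* head and tail never met: parity mismatch */
    } while (head != tail);

    return true;
}
```

With head = 7 and tail = 1 the walk visits slots 7, 5 and 3 and then stops.
With head = 7 and tail = 2 it can never stop on its own, which would match
the symptom reported above if the two indices had somehow ended up with
different parity.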