From: fandongdong
Date: Thu, 18 Jun 2015 15:54:15 +0800
Subject: Re: Panic when cpu hot-remove
To: Jiang Liu, Alex Williamson, Joerg Roedel
CC: 刘长生, iommu, "jiang.liu@intel.com", linux-kernel, 闫晓峰, Roland Dreier

On 2015/6/18 15:27, fandongdong wrote:
>
>
> On 2015/6/18 13:40, Jiang Liu wrote:
>> On 2015/6/17 22:36, Alex Williamson wrote:
>>> On Wed, 2015-06-17 at 13:52 +0200, Joerg Roedel wrote:
>>>> On Wed, Jun 17, 2015 at 10:42:49AM +0000, 范冬冬 wrote:
>>>>> Hi maintainer,
>>>>>
>>>>> We found a problem: a panic happens when a CPU is hot-removed.
>>>>> We also traced the problem using the call trace information.
>>>>> An endless loop occurs because head never becomes equal to
>>>>> tail in the function qi_check_fault().
>>>>> The relevant code is as follows:
>>>>>
>>>>>
>>>>> do {
>>>>>     if (qi->desc_status[head] == QI_IN_USE)
>>>>>         qi->desc_status[head] = QI_ABORT;
>>>>>     head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>>>> } while (head != tail);
>>>> Hmm, this code iterates only over every second QI descriptor, and
>>>> tail probably points to a descriptor that is not iterated over.
>>>>
>>>> Jiang, can you please have a look?
>>> I think that part is normal: the way we use the queue is to always
>>> submit a work operation followed by a wait operation so that we can
>>> determine that the work operation is complete. That's done via
>>> qi_submit_sync(). We have had spurious reports of the queue getting
>>> impossibly out of sync, though. I saw one that was somehow linked to
>>> the I/OAT DMA engine. Roland Dreier saw something similar[1]. I'm not
>>> sure if they're related to this, but maybe worth comparing. Thanks,
>> Thanks, Alex and Joerg!
>>
>> Hi Dongdong,
>> Could you please give some instructions on how to
>> reproduce this issue? I will try to reproduce it if possible.
>> Thanks!
>> Gerry
> Hi Gerry,
>
> We're running kernel 4.1.0 on a 4-socket system and we want to
> offline socket 1.
> The steps are as follows:
>
> echo 1 > /sys/firmware/acpi/hotplug/force_remove
> echo 1 > /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:01/eject
>
> Thanks!
> Dongdong
>>> Alex
>>>
>>> [1]
>>> http://lists.linuxfoundation.org/pipermail/iommu/2015-January/011502.html