Received: from cavan.codon.org.uk ([93.93.128.6]:47746 "EHLO cavan.codon.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751193Ab3KZRsQ (ORCPT ); Tue, 26 Nov 2013 12:48:16 -0500
Date: Tue, 26 Nov 2013 17:48:06 +0000
From: Matthew Garrett
To: Bjorn Helgaas
Cc: Khalid Aziz, Chang Liu, "linux-pci@vger.kernel.org", Lan Tianyu, Konstantin Khlebnikov, Alan Cox, Takao Indoh, Jility, Florian Otti, "linux-kernel@vger.kernel.org", "Eric W. Biederman"
Subject: Re: [PATCH] PCI: add a quirk for keeping Bus Master bit on shutdown
Message-ID: <20131126174806.GA14789@srcf.ucam.org>
References: <1384285203-642-1-git-send-email-cl91tp@gmail.com> <5294CF1A.6080707@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: linux-pci-owner@vger.kernel.org

On Tue, Nov 26, 2013 at 10:35:26AM -0700, Bjorn Helgaas wrote:
> On Tue, Nov 26, 2013 at 9:40 AM, Khalid Aziz wrote:
> > Disabling Bus Master bit is effectively a brute force and not an elegant way
> > to stop unwanted DMA. It can have side effects as Alan and others pointed
> > out in the original discussion, and we are now seeing one with Lynx Point on
> > Acer.
>
> I'm getting more queasy all the time about disabling Bus Master. I
> don't think RHEL does it, and that's probably where most kexec use is.
> So I doubt we really have much experience with it yet.

Does Windows disable the BM bit on shutdown? If not, it's likely that
there are platforms where the SMM code assumes it's still enabled. We
also know that there are devices that hang if BM is disabled while their
DMA engines are still running. Unless we verify that Windows does this,
I think there's no way we can guarantee that firmware won't make
assumptions about the state of PCI. The easiest compromise would
probably be to set a flag that disables busmastering purely when we're
performing a kexec.

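As a rough illustration of that compromise, here is a minimal userspace
sketch in C. It is not kernel code: struct fake_pci_dev, kexec_requested
and fake_pci_shutdown() are hypothetical names invented for this example.
The only detail taken from the PCI specification is that Bus Master
Enable is bit 2 (0x4) of the Command register. The idea is simply that
the shutdown hook leaves the bit alone unless a kexec is in flight.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bus Master Enable is bit 2 of the PCI Command register. */
#define PCI_COMMAND_MASTER 0x4

/* Hypothetical stand-in for a PCI device; not the kernel's struct pci_dev. */
struct fake_pci_dev {
    uint16_t command;   /* simulated Command register */
};

/* Hypothetical flag, assumed to be set only once a kexec has been requested. */
static bool kexec_requested;

/* Shutdown hook: clear Bus Master only on the kexec path. */
static void fake_pci_shutdown(struct fake_pci_dev *dev)
{
    if (kexec_requested)
        dev->command &= (uint16_t)~PCI_COMMAND_MASTER;
    /* On a normal poweroff or reboot the register is left untouched,
     * so firmware sees the state it may be assuming. */
}
```

With kexec_requested left false, a device whose Command register holds
0x0006 keeps that value across fake_pci_shutdown(); with the flag set,
the same call leaves 0x0002. Whether a real implementation should also
check the device's power state before touching the register is left open
here.
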
> > Eric had pointed out in original discussion -
> > that this code change
> > moves failure from a random point in the kexec'd kernel to a predictable
> > point on shutdown path where it becomes lot easier to debug than a random
> > memory overwrite.
>
> That is probably true in some cases, but not this one. I have no idea
> how to debug this poweroff hang. Poweroff is a path *everybody* uses,
> so it's much more important to have that work reliably than it is to
> have kexec work.

If it's hanging after we've performed the io writes that trap us into
SMM, there's no meaningful way for us to debug it. We're violating
assumptions that the firmware is making, and the only way to fix that is
to cease violating those assumptions.

-- 
Matthew Garrett | mjg59@srcf.ucam.org