From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757086AbYHZIfp (ORCPT );
	Tue, 26 Aug 2008 04:35:45 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1752241AbYHZIfi (ORCPT );
	Tue, 26 Aug 2008 04:35:38 -0400
Received: from hrndva-omtalb.mail.rr.com ([71.74.56.123]:39023 "EHLO
	hrndva-omtalb.mail.rr.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751734AbYHZIfh (ORCPT );
	Tue, 26 Aug 2008 04:35:37 -0400
Date: Mon, 25 Aug 2008 22:35:34 -1000
From: Joshua Hoblitt
To: Ingo Molnar
Cc: Yinghai Lu, Andrew Morton, bugme-daemon@bugzilla.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [Bug 11388] New: 2.6.27-rc3 warns about MTRR range; only 3 of
	16gb of memory is usable
Message-ID: <20080826083534.GH10646@hoblitt.com>
References: <86802c440808211855h50ea65faudc169e48f83e18e2@mail.gmail.com>
	<20080822021512.GJ23377@hoblitt.com>
	<86802c440808211926k1ec5b2a2g2bec1b7faee7ebbb@mail.gmail.com>
	<86802c440808212024v3ccd2a7ey6517c180464275f0@mail.gmail.com>
	<20080822035024.GB30284@elte.hu>
	<20080822035609.GD30284@elte.hu>
	<86802c440808212148k74e56fa2mdb2e329ed4df1271@mail.gmail.com>
	<20080823002235.GL23377@hoblitt.com>
	<86802c440808222252j6f900f1dlfbf1378a7bc35ee4@mail.gmail.com>
	<20080823104311.GC25904@elte.hu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20080823104311.GC25904@elte.hu>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Aug 23, 2008 at 12:43:11PM +0200, Ingo Molnar wrote:
>
> * Yinghai Lu wrote:
>
> > On Fri, Aug 22, 2008 at 5:22 PM, Joshua Hoblitt wrote:
> > > I've confirmed that the boards in these systems are Tyan Tempest
> > > i5400PW (S5397)s. We've discovered a workload that will deadlock
> > > the system under both 2.6.24.2 and -tip kernel with the mtrr masking
> > > patch. The only thing unusual about this workload is that one of
> > > the binaries in it constantly segvs... Is it possible that these
> > > deadlocks (no kernel oops on console) are caused by MSR setup
> > > wierdness or is it likely unrelated?
> >
> > could be other problem.
> >
> > cpu should be smarter enough to understand the missing bits in mask.
> > at least amd cpu. remember that we didn't set mask bits to 40bits with
> > opteron with LinuxBIOS, and everything still works well.
>
> yeah. Is the deadlock debuggable? (does nmi_watchdog=1 produce anything
> useful, or does the enabling of CONFIG_PROVE_LOCKING=y show anything
> weird in the syslog during light, non-deadlocking use of this workload?)

Enabling the nmi_watchdog doesn't produce anything at all (I double
checked the .config... it should be working).

Rebuilding with PROVE_LOCKING seems to have prevented the deadlock. It
used to take 30-45 mins to lock the system up under heavy load and we're
going on 6 hours here with no issues. Absolutely nothing in the dmesg.
Ugh.

Any other suggestions? How bad is it to leave PROVE_LOCKING enabled?

-J