From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1760122AbYG1WAU (ORCPT ); Mon, 28 Jul 2008 18:00:20 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S1751383AbYG1WAF (ORCPT ); Mon, 28 Jul 2008 18:00:05 -0400
Received: from mx1.redhat.com ([66.187.233.31]:41607 "EHLO mx1.redhat.com"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S1751893AbYG1WAE (ORCPT ); Mon, 28 Jul 2008 18:00:04 -0400
Message-ID: <488E4166.5070304@redhat.com>
Date: Mon, 28 Jul 2008 18:00:06 -0400
From: Chuck Ebbert
User-Agent: Thunderbird 2.0.0.14 (X11/20080501)
MIME-Version: 1.0
To: Ingo Molnar
CC: Jan Beulich, Andi Kleen, tglx@linutronix.de,
    linux-kernel@vger.kernel.org, "H. Peter Anvin", Linus Torvalds,
    Joerg Roedel
Subject: Re: [PATCH] i386: improve double fault handling
References: <4880A912.76E4.0078.0@novell.com> <4881263B.7060700@zytor.com>
    <48846B02.76E4.0078.0@novell.com> <20080721110510.GC10782@elte.hu>
    <4885CEFE.76E4.0078.0@novell.com> <20080728134252.GI5515@elte.hu>
In-Reply-To: <20080728134252.GI5515@elte.hu>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Ingo Molnar wrote:
>
> All CPUs hitting a double fault simultaneously and corrupting each
> others' kernel stack is a theoretical possibility - but is handling it
> worth the complexity? It appears to me that a lock plus a short stub
> function that takes the lock (with no stack usage) would handle that
> much better.

That can't happen now because the TSS gets marked busy, so we will get a
triple fault instead. One thing we might want to do in the current code
is unset the busy flag after handling the fault and before we start
looping at the end of the handler, so we can handle another fault later.

>
> So i'm really uneasy about all this. Breakage in such rarely used code
> gets found very late, and has thus a high risk of losing debug
> information when we need it the most. (i.e. it works in the exact
> _opposite_ way of the intended goal of making things more robust - it
> makes things less robust)
>

Also, how much bloat does this cause, having a per-CPU TSS and stack for
every fault handler that uses this method?
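
For reference, here is a minimal user-space sketch of what unsetting the
busy flag amounts to at the descriptor level. It is not kernel code and
not taken from the patch: the helper names tss_desc_type(),
tss_desc_is_busy() and tss_desc_clear_busy() are made up for
illustration, and it assumes the standard 32-bit TSS descriptor layout,
where the type field is 1001b for an available TSS and 1011b for a busy
one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout: in a 32-bit TSS descriptor the 4-bit type field sits
 * in bits 40..43. Type 0x9 is an available TSS and 0xB a busy one, so
 * the busy flag is bit 1 of the type field, i.e. bit 41 overall.
 */
#define TSS_DESC_TYPE_SHIFT 40
#define TSS_DESC_TYPE_MASK  (UINT64_C(0xF) << TSS_DESC_TYPE_SHIFT)
#define TSS_DESC_BUSY_BIT   (UINT64_C(1) << 41)
#define TSS_TYPE_AVAILABLE  UINT64_C(0x9)
#define TSS_TYPE_BUSY       UINT64_C(0xB)

/* Hypothetical helper: extract the type field from a raw descriptor. */
static uint64_t tss_desc_type(uint64_t desc)
{
	return (desc & TSS_DESC_TYPE_MASK) >> TSS_DESC_TYPE_SHIFT;
}

/* Hypothetical helper: true if the descriptor is a busy 32-bit TSS. */
static bool tss_desc_is_busy(uint64_t desc)
{
	return tss_desc_type(desc) == TSS_TYPE_BUSY;
}

/*
 * Hypothetical helper: return the descriptor with the busy flag
 * cleared. Only meaningful for 32-bit TSS descriptors, so anything
 * else trips the assertion.
 */
static uint64_t tss_desc_clear_busy(uint64_t desc)
{
	uint64_t type = tss_desc_type(desc);

	assert(type == TSS_TYPE_AVAILABLE || type == TSS_TYPE_BUSY);
	return desc & ~TSS_DESC_BUSY_BIT;
}
```

With a descriptor for base 0, limit 0x67 and access byte 0x8b, clearing
the flag changes the access byte to 0x89 and leaves everything else
alone. In the handler the same bit would have to be cleared in the
double-fault TSS descriptor in the GDT before the final loop; where
exactly that would go is not shown here.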