Message-ID: <4D88A560.8080405@goop.org>
Date: Tue, 22 Mar 2011 13:34:24 +0000
From: Jeremy Fitzhardinge
To: Benjamin Herrenschmidt
Cc: linuxppc-dev@lists.ozlabs.org, Andrew Morton, Hugh Dickins, Peter Zijlstra
Subject: Re: mmotm threatens ppc preemption again
References: <1300665188.2402.64.camel@pasglop> <4D873571.702@goop.org> <1300747942.2402.262.camel@pasglop>
In-Reply-To: <1300747942.2402.262.camel@pasglop>
List-Id: Linux on PowerPC Developers Mail List

On 03/21/2011 10:52 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2011-03-21 at 11:24 +0000, Jeremy Fitzhardinge wrote:
>> I'm very sorry about that, I didn't realize power was also using that
>> interface.  Unfortunately, the "no preemption" definition was an error,
>> and had to be changed to match the pre-existing locking rules.
>>
>> Could you implement a similar "flush batched pte updates on context
>> switch" as x86?
> Well, we already do that for -rt & co.
>
> However, we have another issue which is the reason we used those
> lazy_mmu hooks to do our flushing.
>
> Our PTEs eventually get faulted into a hash table which is what the real
> MMU uses.  We must never (ever) allow that hash table to contain a
> duplicate entry for a given virtual address.
>
> When we do a batch, we remove things from the linux PTE, and keep a
> reference in our batch structure, and only update the hash table at the
> end of the batch.

Wouldn't implicitly ending a batch on context switch get the same effect?
> That means that we must not allow a hash fault to populate the hash with
> a "new" PTE value prior to the old one having been flushed out (which is
> possible if they differ in protection attributes, for example).  For
> that to happen, we must basically not allow a page fault to re-populate
> a PTE invalidated by a batch before that batch has completed.

Kernel ptes are not generally populated on fault though, unless there's
something power-specific going on?  On x86 it can happen when syncing a
process's kernel pmd with the init_mm one, but that shouldn't happen in
the middle of an update since you'd deadlock anyway.

If a particular kernel subsystem has its own locks to manage the ptes
for a kernel mapping, then that should prevent any nested updates within
a batch, shouldn't it?

> That translates to batches must only happen within a PTE lock section.

Well, in that case, I guess your best bet is to disable batching for
kernel pagetable updates.  These apply_to_page_range() changes are the
first time any attempt has been made to batch kernel pagetable updates
(otherwise you would have seen this problem earlier), so not batching
them will not be a regression for you.  But I'm not sure what the proper
fix to get batching working in your case would be.  The assumption that
there's a pte lock for kernel ptes is simply not valid.

    J