From mboxrd@z Thu Jan 1 00:00:00 1970
From: Benjamin Herrenschmidt
Subject: Re: [rfc] data race in page table setup/walking?
Date: Tue, 29 Apr 2008 15:08:44 +1000
Message-ID: <1209445724.18023.136.camel@pasglop>
References: <20080429050054.GC21795@wotan.suse.de>
In-Reply-To: <20080429050054.GC21795@wotan.suse.de>
Reply-To: benh@kernel.crashing.org
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Sender: linux-arch-owner@vger.kernel.org
To: Nick Piggin
Cc: Linus Torvalds, Hugh Dickins, linux-arch@vger.kernel.org, Linux Memory Management List

On Tue, 2008-04-29 at 07:00 +0200, Nick Piggin wrote:
>
> At this point, the spinlock is not guaranteed to have ordered the previous
> stores to initialize the pte page with the subsequent store to put it in the
> page tables. So another Linux page table walker might be walking down (without
> any locks, because we have split-leaf-ptls), and find that new pte we've
> inserted. It might try to take the spinlock before the store from the other
> CPU initializes it. And subsequently it might read a pte_t out before stores
> from the other CPU have cleared the memory.

Funny, we used to have a similar race where the zeros for clearing a
newly allocated anonymous page ended up reaching the coherency domain
after the new PTE in set_pte, causing memory corruption on threaded
apps.

I think back then we fixed that with an explicit smp_wmb() before a
set_pte(). Maybe we need that also when setting the higher levels.

Ben.
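
For illustration only, here is a rough user-space sketch of the "write barrier before publishing" idea discussed above. It is not kernel code: a C11 release fence stands in for smp_wmb(), an atomic pointer store stands in for installing the pte page in the higher-level table, and an acquire load stands in for whatever ordering the lockless walker needs. The names pmd_slot, pte_page, publish_pte_page, walk_to_pte and the PTRS_PER_PTE value of 512 are all made up for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define PTRS_PER_PTE 512 /* hypothetical table size for this sketch */

typedef struct { unsigned long val; } pte_t;

/* Stand-in for a freshly allocated pte page. */
struct pte_page {
    pte_t entries[PTRS_PER_PTE];
};

/* Stand-in for a slot in the next level up (e.g. a pmd entry). */
typedef struct {
    _Atomic(struct pte_page *) table;
} pmd_slot;

/* Writer side: initialise the page, then make it visible to walkers. */
static void publish_pte_page(pmd_slot *slot, struct pte_page *page)
{
    assert(slot != NULL && page != NULL);

    /* Clear the new page (the "stores to initialize the pte page"). */
    memset(page, 0, sizeof(*page));

    /*
     * Analogue of smp_wmb(): order the clearing stores above before
     * the publishing store below, so a walker that sees the pointer
     * cannot see stale contents.
     */
    atomic_thread_fence(memory_order_release);

    /* Publish: the store that puts the page in the table. */
    atomic_store_explicit(&slot->table, page, memory_order_relaxed);
}

/* Reader side: a lockless walker that may run concurrently. */
static const pte_t *walk_to_pte(pmd_slot *slot, size_t index)
{
    /*
     * Acquire pairs with the release fence in the writer. This is an
     * assumption of the sketch, not a statement about what the kernel
     * walker does.
     */
    struct pte_page *page =
        atomic_load_explicit(&slot->table, memory_order_acquire);

    if (page == NULL || index >= PTRS_PER_PTE)
        return NULL;
    return &page->entries[index];
}
```

Without the fence, the compiler or CPU would be free to make the pointer store visible before the memset() stores, which is the shape of the race described in the quoted text.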