From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from e28smtp06.in.ibm.com (e28smtp06.in.ibm.com [122.248.162.6]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "e28smtp06.in.ibm.com", Issuer "GeoTrust SSL CA" (not verified)) by ozlabs.org (Postfix) with ESMTPS id 1E0A22C0087 for ; Wed, 19 Jun 2013 04:46:27 +1000 (EST) Received: from /spool/local by e28smtp06.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 19 Jun 2013 00:09:11 +0530 Received: from d28relay05.in.ibm.com (d28relay05.in.ibm.com [9.184.220.62]) by d28dlp03.in.ibm.com (Postfix) with ESMTP id 868DA125804E for ; Wed, 19 Jun 2013 00:15:19 +0530 (IST) Received: from d28av04.in.ibm.com (d28av04.in.ibm.com [9.184.220.66]) by d28relay05.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id r5IIkHpc26214626 for ; Wed, 19 Jun 2013 00:16:18 +0530 Received: from d28av04.in.ibm.com (loopback [127.0.0.1]) by d28av04.in.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id r5IIkLVL017830 for ; Wed, 19 Jun 2013 04:46:21 +1000 From: "Aneesh Kumar K.V" To: Benjamin Herrenschmidt Subject: Re: [PATCH -V10 00/15] THP support for PPC64 In-Reply-To: <1371355567.21896.101.camel@pasglop> References: <1370446119-8837-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1371348007.21896.62.camel@pasglop> <1371353865.21896.94.camel@pasglop> <1371355567.21896.101.camel@pasglop> Date: Wed, 19 Jun 2013 00:16:20 +0530 Message-ID: <8738sfi7er.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain Cc: Alexey Kardashevskiy , linuxppc-dev@lists.ozlabs.org, paulus@samba.org List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Benjamin Herrenschmidt writes: > On Sun, 2013-06-16 at 13:37 +1000, Benjamin Herrenschmidt wrote: >> On Sun, 2013-06-16 at 12:00 +1000, Benjamin Herrenschmidt wrote: >> > So at this point, hash_page might *still* see the old pmd. Unless I >> > missed something, you did nothing that will prevent that (the only way >> > to lock against hash_page is really an IPI & wait or to take the PTE's >> > busy and make them !present or something). So as far as I can tell, >> > a concurrent hash_page can still sneak into the hash some "small" >> > entries after you have supposedly flushed them. >> >> Note that the _PAGE_PRESENT bit is removed eventually ... but much >> later, in __collapse_huge_page_copy() which will also flush the hash, so >> at least we will remove a stale hash entry that would have been added by >> the race above I suppose... but: >> >> - _PAGE_ACCESSED can still potentially be set after it was supposed to >> be stable >> >> - The clearing happens *after* copy_user_highpage(), ie, unless I >> missed something here, we potentially still have something writing to >> the 4k page while it's being copied, which is BAD. >> >> Now, let me know if I did miss something here :-) > > An additional issue is that this all collides a bit with Alexey's work > to support TCEs in real mode in KVM, which is necessary to have "usable" > PCI pass-through. > > Look at patches http://patchwork.ozlabs.org/patch/248920/ and followup, > he basically walks the page tables here in a slightly different way than > Paul does in H_ENTER. It's more like gup_fast. It will need to handle > concurrent split/collapse etc... as well which it doesn't right now. > > I'm considering merging Alexei stuff first (provided I don't find major > problems with it), then you can provide a new THP series on top of it. > > While at it, also fix: > > - Some of your patches are bug fixes (like the one about subpage > protection). They need to be either merged in the main patch or put > before the patch that enables THP. We can move them before. One of the reason I would like to keep them as seperate patches is to document the changes better in commit messages. > > - I haven't completely yet considered the impact of the demotion of > segments, but neither do you :-) IE. Under some circumstances, we can > demote entire segments from 64K HW pages to 4K HW pages in the SLB. For > example if a driver (such as HCA) sets the 4K_PFN bit in a PTE, this > will happen at hashing time. I don't think your code deals with that at > all, am I correct ? It *might* be that the right approach with those is: > But will that by anonymous memory ? ie, will we find them suitable for THP allocation ? > * If you find a THP in hash_page and the segment size is 4k, fault > > * In do_page_fault, re-check for that condition (or maybe we can make > hash_page return a specific bit that gets ORed into the error_code into > do_page_fault ?) and split huge pages there. > > But that's just an idea off the top of my mind, there might be a better > way. Of course this needs to be tested. > > BTW. For the subpage protection, similarily, you need to make sure you > properly map the entire segment as "no THP", not just the range > passed-in by the user. Can you explain that more, why should the entire segment be marked no THP ? The segment can work with 4K base page size and we still be able to allocate a hugepage in that segment. -aneesh