From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Kirill A. Shutemov" Subject: Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2) Date: Thu, 25 Oct 2018 13:19:00 +0300 Message-ID: <20181025101900.phqnqpoju5t2gar5@kshutemo-mobl1> References: <20181013013200.206928-1-joel@joelfernandes.org> <20181013013200.206928-3-joel@joelfernandes.org> <20181024101255.it4lptrjogalxbey@kshutemo-mobl1> <20181024115733.GN8537@350D> <20181024125724.yf6frdimjulf35do@kshutemo-mobl1> <20181025020907.GA13560@joelaf.mtv.corp.google.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=cBAD82//F0IHqocoI9u7q0ji6ideZxbf3r0I5XESV+w=; b=ejTgSfWk+en03E xPZvCejgyWtf5cP1cIF5Q3bznOdkjdi/eqZ6S07qwC+l5q5YhvczdyAb15QPT/2x78j3CtFDlfnR+ Y1+PIhkFa63GcPmVCIIvjwF6HlcNJBYvr7ZPVd2oz2mBrzHkzWMQ69CDjijlB5DQRStwATiyzuXoW utEXSm/wWqLsFzdcxo7uZCe+14bXK+5B/Yf6fWXZJOfgFXMF8tNr9l65tdks+8B9jlc07OoBu7Ju9 HBUv9X5RzJX1OVqnkat8nSAB9hS9l9FG78VEb8KHoMPA9ojn/LYbUMdq+7HPbiRlKp9EoDUl5Zt6s 1tFmqYznwWHrCzPnijXA==; DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov-name.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=NN9j8rpITBLmXn0s4bCI/Rhv1UTafbrEZVV+5fg1umY=; b=bXy1o7gBqEG05Ywgx9/FrojpzTDhWv0EkEXVi0IVnfJNaXyjBAccGyM9GDwNX/9AO5 5c3353pMwtFoNPHIz1yi0K7TGWwx5F/cf9+vXWAf6emqwwhT2O0UVxm/EYw4K9od7nUp kQJJB/ZD8mVABPMc8/eZG5E3FsU/2f1/34566e+LfSLfvNFQxYaUmReyfpQf6+h+nd9N DS2otEO8dvj8DjqaiAMoMtmDiCF3R7gg+N70qTfvna2jTzoi4pGKqKOZXdviecnoqgKO ckfGiAGunlFPSXWlqRRH/J1tdarE+zwuAL8bCT4RZVOrp42w0RyNPc72HtdsyY/ZvYBF iZcg== Content-Disposition: inline In-Reply-To: <20181025020907.GA13560@joelaf.mtv.corp.google.com> List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-riscv" Errors-To: linux-riscv-bounces+glpr-linux-riscv=m.gmane.org@lists.infradead.org To: Joel Fernandes Cc: linux-mips@linux-mips.org, Rich Felker , linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Balbir Singh , Dave Hansen , Will Deacon , mhocko@kernel.org, linux-mm@kvack.org, lokeshgidra@google.com, sparclinux@vger.kernel.org, linux-riscv@lists.infradead.org, elfring@users.sourceforge.net, Jonas Bonn , kvmarm@lists.cs.columbia.edu, dancol@google.com, Yoshinori Sato , linux-xtensa@linux-xtensa.org, linux-hexagon@vger.kernel.org, Helge Deller , "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" , hughd@google.com, "James E.J. Bottomley" , kasan-dev@googlegroups.com, anton.ivanov@kot-begemot.co.uk, I On Wed, Oct 24, 2018 at 07:09:07PM -0700, Joel Fernandes wrote: > On Wed, Oct 24, 2018 at 03:57:24PM +0300, Kirill A. Shutemov wrote: > > On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote: > > > On Wed, Oct 24, 2018 at 01:12:56PM +0300, Kirill A. Shutemov wrote: > > > > On Fri, Oct 12, 2018 at 06:31:58PM -0700, Joel Fernandes (Google) wrote: > > > > > diff --git a/mm/mremap.c b/mm/mremap.c > > > > > index 9e68a02a52b1..2fd163cff406 100644 > > > > > --- a/mm/mremap.c > > > > > +++ b/mm/mremap.c > > > > > @@ -191,6 +191,54 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, > > > > > drop_rmap_locks(vma); > > > > > } > > > > > > > > > > +static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr, > > > > > + unsigned long new_addr, unsigned long old_end, > > > > > + pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush) > > > > > +{ > > > > > + spinlock_t *old_ptl, *new_ptl; > > > > > + struct mm_struct *mm = vma->vm_mm; > > > > > + > > > > > + if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK) > > > > > + || old_end - old_addr < PMD_SIZE) > > > > > + return false; > > > > > + > > > > > + /* > > > > > + * The destination pmd shouldn't be established, free_pgtables() > > > > > + * should have release it. > > > > > + */ > > > > > + if (WARN_ON(!pmd_none(*new_pmd))) > > > > > + return false; > > > > > + > > > > > + /* > > > > > + * We don't have to worry about the ordering of src and dst > > > > > + * ptlocks because exclusive mmap_sem prevents deadlock. > > > > > + */ > > > > > + old_ptl = pmd_lock(vma->vm_mm, old_pmd); > > > > > + if (old_ptl) { > > > > > > > > How can it ever be false? > > Kirill, > It cannot, you are right. I'll remove the test. > > By the way, there are new changes upstream by Linus which flush the TLB > before releasing the ptlock instead of after. I'm guessing that patch came > about because of reviews of this patch and someone spotted an issue in the > existing code :) > > Anyway the patch in concern is: > eb66ae030829 ("mremap: properly flush TLB before releasing the page") > > I need to rebase on top of that with appropriate modifications, but I worry > that this patch will slow down performance since we have to flush at every > PMD/PTE move before releasing the ptlock. Where as with my patch, the > intention is to flush only at once in the end of move_page_tables. When I > tried to flush TLB on every PMD move, it was quite slow on my arm64 device [2]. > > Further observation [1] is, it seems like the move_huge_pmds and move_ptes code > is a bit sub optimal in the sense, we are acquiring and releasing the same > ptlock for a bunch of PMDs if the said PMDs are on the same page-table page > right? Instead we can do better by acquiring and release the ptlock less > often. > > I think this observation [1] and the frequent TLB flush issue [2] can be solved > by acquiring the ptlock once for a bunch of PMDs, move them all, then flush > the tlb and then release the ptlock, and then proceed to doing the same thing > for the PMDs in the next page-table page. What do you think? Yeah, that's viable optimization. The tricky part is that one PMD page table can have PMD entires of different types: THP, page table that you can move as whole and the one that you cannot (for any reason). If we cannot move the PMD entry as a whole and must go to PTE page table we would need to drop PMD ptl and take PTE ptl (it might be the same lock in some configuations). Also we don't want to take PMD lock unless it's required. I expect it to be not very trivial to get everything right. But take a shot :) -- Kirill A. Shutemov