From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: "Kirill A. Shutemov" In-Reply-To: <20150917154131.GA27791@linux.intel.com> References: <1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com> <20150916111218.GB23026@node.dhcp.inet.fi> <20150917154131.GA27791@linux.intel.com> Subject: Re: [PATCH] mm: take i_mmap_lock in unmap_mapping_range() for DAX Content-Transfer-Encoding: 7bit Message-Id: <20150917154715.7A857B8@black.fi.intel.com> Date: Thu, 17 Sep 2015 18:47:15 +0300 (EEST) Sender: owner-linux-mm@kvack.org To: Ross Zwisler Cc: "Kirill A. Shutemov" , Matthew Wilcox , Dan Williams , "Kirill A. Shutemov" , Andrew Morton , linux-mm@kvack.org, linux-fsdevel , Linux Kernel Mailing List , "linux-nvdimm@lists.01.org" List-ID: Ross Zwisler wrote: > On Wed, Sep 16, 2015 at 02:12:18PM +0300, Kirill A. Shutemov wrote: > > On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote: > > > Hi Kirill, > > > > > > On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov > > > wrote: > > > > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap. > > > > > > > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from > > > > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock. > > > > > > > > Re-aquiring the lock should be fine since we check i_size after the > > > > point. > > > > > > > > Not-yet-signed-off-by: Matthew Wilcox > > > > Signed-off-by: Kirill A. 
Shutemov > > > > --- > > > > fs/dax.c | 35 +++++++++++++++++++---------------- > > > > mm/memory.c | 11 ++--------- > > > > 2 files changed, 21 insertions(+), 25 deletions(-) > > > > > > > > diff --git a/fs/dax.c b/fs/dax.c > > > > index 9ef9b80cc132..ed54efedade6 100644 > > > > --- a/fs/dax.c > > > > +++ b/fs/dax.c > > > > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, > > > > if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE) > > > > goto fallback; > > > > > > > > + if (buffer_unwritten(&bh) || buffer_new(&bh)) { > > > > + int i; > > > > + for (i = 0; i < PTRS_PER_PMD; i++) > > > > + clear_page(kaddr + i * PAGE_SIZE); > > > > > > This patch, now upstream as commit 46c043ede471, moves the call to > > > clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not > > > set at this point, so I'm not sure this path was ever tested. > > > > Ughh. It's obviously broken. > > > > I took fs/dax.c part of the patch from Matthew. And I'm not sure now we > > would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" > > block around. It should work fine where it was before. Right? > > Matthew? > > Moving the "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" block back seems > correct to me. Matthew is out for a while, so we should probably take care of > this without him. > > Kirill, do you want to whip up a quick patch? I'm happy to do it if you're > busy. I would be better if you'll prepare the patch. Thanks. -- Kirill -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 In-Reply-To: <20150917154131.GA27791@linux.intel.com> References: <1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com> <20150916111218.GB23026@node.dhcp.inet.fi> <20150917154131.GA27791@linux.intel.com> Date: Thu, 17 Sep 2015 08:46:57 -0700 Message-ID: Subject: Re: [PATCH] mm: take i_mmap_lock in unmap_mapping_range() for DAX From: Dan Williams Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org To: Ross Zwisler , "Kirill A. Shutemov" , Matthew Wilcox , Dan Williams , "Kirill A. Shutemov" , Andrew Morton , Linux MM , linux-fsdevel , Linux Kernel Mailing List , "linux-nvdimm@lists.01.org" List-ID: On Thu, Sep 17, 2015 at 8:41 AM, Ross Zwisler wrote: > On Wed, Sep 16, 2015 at 02:12:18PM +0300, Kirill A. Shutemov wrote: >> On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote: >> > Hi Kirill, >> > >> > On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov >> > wrote: >> > > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap. >> > > >> > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from >> > > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock. >> > > >> > > Re-aquiring the lock should be fine since we check i_size after the >> > > point. >> > > >> > > Not-yet-signed-off-by: Matthew Wilcox >> > > Signed-off-by: Kirill A. 
Shutemov >> > > --- >> > > fs/dax.c | 35 +++++++++++++++++++---------------- >> > > mm/memory.c | 11 ++--------- >> > > 2 files changed, 21 insertions(+), 25 deletions(-) >> > > >> > > diff --git a/fs/dax.c b/fs/dax.c >> > > index 9ef9b80cc132..ed54efedade6 100644 >> > > --- a/fs/dax.c >> > > +++ b/fs/dax.c >> > > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, >> > > if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE) >> > > goto fallback; >> > > >> > > + if (buffer_unwritten(&bh) || buffer_new(&bh)) { >> > > + int i; >> > > + for (i = 0; i < PTRS_PER_PMD; i++) >> > > + clear_page(kaddr + i * PAGE_SIZE); >> > >> > This patch, now upstream as commit 46c043ede471, moves the call to >> > clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not >> > set at this point, so I'm not sure this path was ever tested. >> >> Ughh. It's obviously broken. >> >> I took fs/dax.c part of the patch from Matthew. And I'm not sure now we >> would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" >> block around. It should work fine where it was before. Right? >> Matthew? > > Moving the "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" block back seems > correct to me. Matthew is out for a while, so we should probably take care of > this without him. I'd say leave it at its current location and add a local call to bdev_direct_access() as I'm not sure you'd want to trigger one of the failure conditions without having zeroed the page. I.e. right before vmf_insert_pfn_pmd() is probably too late. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 17 Sep 2015 09:41:31 -0600 From: Ross Zwisler Subject: Re: [PATCH] mm: take i_mmap_lock in unmap_mapping_range() for DAX Message-ID: <20150917154131.GA27791@linux.intel.com> References: <1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com> <20150916111218.GB23026@node.dhcp.inet.fi> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150916111218.GB23026@node.dhcp.inet.fi> Sender: owner-linux-mm@kvack.org To: "Kirill A. Shutemov" Cc: Matthew Wilcox , Dan Williams , "Kirill A. Shutemov" , Andrew Morton , linux-mm@kvack.org, linux-fsdevel , Linux Kernel Mailing List , "linux-nvdimm@lists.01.org" , ross.zwisler@linux.intel.com List-ID: On Wed, Sep 16, 2015 at 02:12:18PM +0300, Kirill A. Shutemov wrote: > On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote: > > Hi Kirill, > > > > On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov > > wrote: > > > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap. > > > > > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from > > > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock. > > > > > > Re-aquiring the lock should be fine since we check i_size after the > > > point. > > > > > > Not-yet-signed-off-by: Matthew Wilcox > > > Signed-off-by: Kirill A. 
Shutemov > > > --- > > > fs/dax.c | 35 +++++++++++++++++++---------------- > > > mm/memory.c | 11 ++--------- > > > 2 files changed, 21 insertions(+), 25 deletions(-) > > > > > > diff --git a/fs/dax.c b/fs/dax.c > > > index 9ef9b80cc132..ed54efedade6 100644 > > > --- a/fs/dax.c > > > +++ b/fs/dax.c > > > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, > > > if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE) > > > goto fallback; > > > > > > + if (buffer_unwritten(&bh) || buffer_new(&bh)) { > > > + int i; > > > + for (i = 0; i < PTRS_PER_PMD; i++) > > > + clear_page(kaddr + i * PAGE_SIZE); > > > > This patch, now upstream as commit 46c043ede471, moves the call to > > clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not > > set at this point, so I'm not sure this path was ever tested. > > Ughh. It's obviously broken. > > I took fs/dax.c part of the patch from Matthew. And I'm not sure now we > would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" > block around. It should work fine where it was before. Right? > Matthew? Moving the "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" block back seems correct to me. Matthew is out for a while, so we should probably take care of this without him. Kirill, do you want to whip up a quick patch? I'm happy to do it if you're busy. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Wed, 16 Sep 2015 14:12:18 +0300 From: "Kirill A. 
Shutemov" Subject: Re: [PATCH] mm: take i_mmap_lock in unmap_mapping_range() for DAX Message-ID: <20150916111218.GB23026@node.dhcp.inet.fi> References: <1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org To: Matthew Wilcox , Dan Williams Cc: "Kirill A. Shutemov" , Andrew Morton , linux-mm@kvack.org, linux-fsdevel , Linux Kernel Mailing List , "linux-nvdimm@lists.01.org" , ross.zwisler@linux.intel.com List-ID: On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote: > Hi Kirill, > > On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov > wrote: > > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap. > > > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from > > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock. > > > > Re-aquiring the lock should be fine since we check i_size after the > > point. > > > > Not-yet-signed-off-by: Matthew Wilcox > > Signed-off-by: Kirill A. Shutemov > > --- > > fs/dax.c | 35 +++++++++++++++++++---------------- > > mm/memory.c | 11 ++--------- > > 2 files changed, 21 insertions(+), 25 deletions(-) > > > > diff --git a/fs/dax.c b/fs/dax.c > > index 9ef9b80cc132..ed54efedade6 100644 > > --- a/fs/dax.c > > +++ b/fs/dax.c > > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, > > if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE) > > goto fallback; > > > > + if (buffer_unwritten(&bh) || buffer_new(&bh)) { > > + int i; > > + for (i = 0; i < PTRS_PER_PMD; i++) > > + clear_page(kaddr + i * PAGE_SIZE); > > This patch, now upstream as commit 46c043ede471, moves the call to > clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not > set at this point, so I'm not sure this path was ever tested. Ughh. It's obviously broken. I took fs/dax.c part of the patch from Matthew. 
And I'm not sure now we would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" block around. It should work fine where it was before. Right? Matthew? > I'm also not sure why the compiler is not complaining about an > uninitialized variable? No idea. -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 In-Reply-To: <1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com> References: <1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com> Date: Tue, 15 Sep 2015 16:52:42 -0700 Message-ID: Subject: Re: [PATCH] mm: take i_mmap_lock in unmap_mapping_range() for DAX From: Dan Williams Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org To: "Kirill A. Shutemov" Cc: Andrew Morton , Matthew Wilcox , linux-mm@kvack.org, linux-fsdevel , Linux Kernel Mailing List , "linux-nvdimm@lists.01.org" , ross.zwisler@linux.intel.com List-ID: Hi Kirill, On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov wrote: > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap. > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock. > > Re-aquiring the lock should be fine since we check i_size after the > point. > > Not-yet-signed-off-by: Matthew Wilcox > Signed-off-by: Kirill A. 
Shutemov > --- > fs/dax.c | 35 +++++++++++++++++++---------------- > mm/memory.c | 11 ++--------- > 2 files changed, 21 insertions(+), 25 deletions(-) > > diff --git a/fs/dax.c b/fs/dax.c > index 9ef9b80cc132..ed54efedade6 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, > if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE) > goto fallback; > > + if (buffer_unwritten(&bh) || buffer_new(&bh)) { > + int i; > + for (i = 0; i < PTRS_PER_PMD; i++) > + clear_page(kaddr + i * PAGE_SIZE); This patch, now upstream as commit 46c043ede471, moves the call to clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not set at this point, so I'm not sure this path was ever tested. I'm also not sure why the compiler is not complaining about an uninitialized variable? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Kirill A. Shutemov" Subject: [PATCH] mm: take i_mmap_lock in unmap_mapping_range() for DAX Date: Fri, 7 Aug 2015 14:53:43 +0300 Message-ID: <1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com> Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" To: Andrew Morton , Matthew Wilcox Return-path: Received: from mga09.intel.com ([134.134.136.24]:51156 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752140AbbHGLyU (ORCPT ); Fri, 7 Aug 2015 07:54:20 -0400 Sender: linux-fsdevel-owner@vger.kernel.org List-ID: DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap. __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from all mappings. We need to drop i_mmap_lock there to avoid lock deadlock. Re-aquiring the lock should be fine since we check i_size after the point. 
Not-yet-signed-off-by: Matthew Wilcox Signed-off-by: Kirill A. Shutemov --- fs/dax.c | 35 +++++++++++++++++++---------------- mm/memory.c | 11 ++--------- 2 files changed, 21 insertions(+), 25 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 9ef9b80cc132..ed54efedade6 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE) goto fallback; + if (buffer_unwritten(&bh) || buffer_new(&bh)) { + int i; + for (i = 0; i < PTRS_PER_PMD; i++) + clear_page(kaddr + i * PAGE_SIZE); + count_vm_event(PGMAJFAULT); + mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT); + result |= VM_FAULT_MAJOR; + } + + /* + * If we allocated new storage, make sure no process has any + * zero pages covering this hole + */ + if (buffer_new(&bh)) { + i_mmap_unlock_write(mapping); + unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0); + i_mmap_lock_write(mapping); + } + /* * If a truncate happened while we were allocating blocks, we may * leave blocks allocated to the file that are beyond EOF. 
We can't @@ -568,13 +587,6 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, if ((pgoff | PG_PMD_COLOUR) >= size) goto fallback; - /* - * If we allocated new storage, make sure no process has any - * zero pages covering this hole - */ - if (buffer_new(&bh)) - unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0); - if (!write && !buffer_mapped(&bh) && buffer_uptodate(&bh)) { spinlock_t *ptl; pmd_t entry; @@ -605,15 +617,6 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR)) goto fallback; - if (buffer_unwritten(&bh) || buffer_new(&bh)) { - int i; - for (i = 0; i < PTRS_PER_PMD; i++) - clear_page(kaddr + i * PAGE_SIZE); - count_vm_event(PGMAJFAULT); - mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT); - result |= VM_FAULT_MAJOR; - } - result |= vmf_insert_pfn_pmd(vma, address, pmd, pfn, write); } diff --git a/mm/memory.c b/mm/memory.c index 5a3427bb3f32..670cdfa9f33e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2426,17 +2426,10 @@ void unmap_mapping_range(struct address_space *mapping, if (details.last_index < details.first_index) details.last_index = ULONG_MAX; - - /* - * DAX already holds i_mmap_lock to serialise file truncate vs - * page fault and page fault vs page fault. 
- */ - if (!IS_DAX(mapping->host)) - i_mmap_lock_write(mapping); + i_mmap_lock_write(mapping); if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap))) unmap_mapping_range_tree(&mapping->i_mmap, &details); - if (!IS_DAX(mapping->host)) - i_mmap_unlock_write(mapping); + i_mmap_unlock_write(mapping); } EXPORT_SYMBOL(unmap_mapping_range); -- 2.4.6 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dan Williams Subject: Re: [PATCH] mm: take i_mmap_lock in unmap_mapping_range() for DAX Date: Tue, 15 Sep 2015 16:52:42 -0700 Message-ID: References: <1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Cc: Andrew Morton , Matthew Wilcox , linux-mm@kvack.org, linux-fsdevel , Linux Kernel Mailing List , "linux-nvdimm@lists.01.org" , ross.zwisler@linux.intel.com To: "Kirill A. Shutemov" Return-path: Received: from mail-ig0-f177.google.com ([209.85.213.177]:35134 "EHLO mail-ig0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751929AbbIOXwn (ORCPT ); Tue, 15 Sep 2015 19:52:43 -0400 In-Reply-To: <1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Hi Kirill, On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov wrote: > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap. > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock. > > Re-aquiring the lock should be fine since we check i_size after the > point. > > Not-yet-signed-off-by: Matthew Wilcox > Signed-off-by: Kirill A. 
Shutemov > --- > fs/dax.c | 35 +++++++++++++++++++---------------- > mm/memory.c | 11 ++--------- > 2 files changed, 21 insertions(+), 25 deletions(-) > > diff --git a/fs/dax.c b/fs/dax.c > index 9ef9b80cc132..ed54efedade6 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, > if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE) > goto fallback; > > + if (buffer_unwritten(&bh) || buffer_new(&bh)) { > + int i; > + for (i = 0; i < PTRS_PER_PMD; i++) > + clear_page(kaddr + i * PAGE_SIZE); This patch, now upstream as commit 46c043ede471, moves the call to clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not set at this point, so I'm not sure this path was ever tested. I'm also not sure why the compiler is not complaining about an uninitialized variable? From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Kirill A. Shutemov" Subject: Re: [PATCH] mm: take i_mmap_lock in unmap_mapping_range() for DAX Date: Wed, 16 Sep 2015 14:12:18 +0300 Message-ID: <20150916111218.GB23026@node.dhcp.inet.fi> References: <1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: "Kirill A. Shutemov" , Andrew Morton , linux-mm@kvack.org, linux-fsdevel , Linux Kernel Mailing List , "linux-nvdimm@lists.01.org" , ross.zwisler@linux.intel.com To: Matthew Wilcox , Dan Williams Return-path: Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote: > Hi Kirill, > > On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov > wrote: > > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap. > > > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from > > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock. 
> > > > Re-aquiring the lock should be fine since we check i_size after the > > point. > > > > Not-yet-signed-off-by: Matthew Wilcox > > Signed-off-by: Kirill A. Shutemov > > --- > > fs/dax.c | 35 +++++++++++++++++++---------------- > > mm/memory.c | 11 ++--------- > > 2 files changed, 21 insertions(+), 25 deletions(-) > > > > diff --git a/fs/dax.c b/fs/dax.c > > index 9ef9b80cc132..ed54efedade6 100644 > > --- a/fs/dax.c > > +++ b/fs/dax.c > > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, > > if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE) > > goto fallback; > > > > + if (buffer_unwritten(&bh) || buffer_new(&bh)) { > > + int i; > > + for (i = 0; i < PTRS_PER_PMD; i++) > > + clear_page(kaddr + i * PAGE_SIZE); > > This patch, now upstream as commit 46c043ede471, moves the call to > clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not > set at this point, so I'm not sure this path was ever tested. Ughh. It's obviously broken. I took fs/dax.c part of the patch from Matthew. And I'm not sure now we would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" block around. It should work fine where it was before. Right? Matthew? > I'm also not sure why the compiler is not complaining about an > uninitialized variable? No idea. -- Kirill A. Shutemov From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dan Williams Subject: Re: [PATCH] mm: take i_mmap_lock in unmap_mapping_range() for DAX Date: Thu, 17 Sep 2015 08:46:57 -0700 Message-ID: References: <1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com> <20150916111218.GB23026@node.dhcp.inet.fi> <20150917154131.GA27791@linux.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 To: Ross Zwisler , "Kirill A. Shutemov" , Matthew Wilcox , Dan Williams , "Kirill A. 
Shutemov" , Andrew Morton , Linux MM , linux-fsdevel , Linux Kernel Mailing List , "linux-nvdimm@lists.01.org" Return-path: In-Reply-To: <20150917154131.GA27791@linux.intel.com> Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org On Thu, Sep 17, 2015 at 8:41 AM, Ross Zwisler wrote: > On Wed, Sep 16, 2015 at 02:12:18PM +0300, Kirill A. Shutemov wrote: >> On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote: >> > Hi Kirill, >> > >> > On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov >> > wrote: >> > > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap. >> > > >> > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from >> > > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock. >> > > >> > > Re-aquiring the lock should be fine since we check i_size after the >> > > point. >> > > >> > > Not-yet-signed-off-by: Matthew Wilcox >> > > Signed-off-by: Kirill A. Shutemov >> > > --- >> > > fs/dax.c | 35 +++++++++++++++++++---------------- >> > > mm/memory.c | 11 ++--------- >> > > 2 files changed, 21 insertions(+), 25 deletions(-) >> > > >> > > diff --git a/fs/dax.c b/fs/dax.c >> > > index 9ef9b80cc132..ed54efedade6 100644 >> > > --- a/fs/dax.c >> > > +++ b/fs/dax.c >> > > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, >> > > if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE) >> > > goto fallback; >> > > >> > > + if (buffer_unwritten(&bh) || buffer_new(&bh)) { >> > > + int i; >> > > + for (i = 0; i < PTRS_PER_PMD; i++) >> > > + clear_page(kaddr + i * PAGE_SIZE); >> > >> > This patch, now upstream as commit 46c043ede471, moves the call to >> > clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not >> > set at this point, so I'm not sure this path was ever tested. >> >> Ughh. It's obviously broken. >> >> I took fs/dax.c part of the patch from Matthew. 
And I'm not sure now we >> would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" >> block around. It should work fine where it was before. Right? >> Matthew? > > Moving the "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" block back seems > correct to me. Matthew is out for a while, so we should probably take care of > this without him. I'd say leave it at its current location and add a local call to bdev_direct_access() as I'm not sure you'd want to trigger one of the failure conditions without having zeroed the page. I.e. right before vmf_insert_pfn_pmd() is probably too late. From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Kirill A. Shutemov" Subject: Re: [PATCH] mm: take i_mmap_lock in unmap_mapping_range() for DAX Date: Thu, 17 Sep 2015 18:47:15 +0300 (EEST) Message-ID: <20150917154715.7A857B8@black.fi.intel.com> References: <1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com> <20150916111218.GB23026@node.dhcp.inet.fi> <20150917154131.GA27791@linux.intel.com> Content-Transfer-Encoding: 7bit Cc: "Kirill A. Shutemov" , Matthew Wilcox , Dan Williams , "Kirill A. Shutemov" , Andrew Morton , linux-mm@kvack.org, linux-fsdevel , Linux Kernel Mailing List , "linux-nvdimm@lists.01.org" , ross.zwisler@linux.intel.com To: Ross Zwisler Return-path: In-Reply-To: <20150917154131.GA27791@linux.intel.com> Sender: owner-linux-mm@kvack.org List-Id: linux-fsdevel.vger.kernel.org Ross Zwisler wrote: > On Wed, Sep 16, 2015 at 02:12:18PM +0300, Kirill A. Shutemov wrote: > > On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote: > > > Hi Kirill, > > > > > > On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov > > > wrote: > > > > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap. > > > > > > > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from > > > > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock. 
> > > > > > > > Re-aquiring the lock should be fine since we check i_size after the > > > > point. > > > > > > > > Not-yet-signed-off-by: Matthew Wilcox > > > > Signed-off-by: Kirill A. Shutemov > > > > --- > > > > fs/dax.c | 35 +++++++++++++++++++---------------- > > > > mm/memory.c | 11 ++--------- > > > > 2 files changed, 21 insertions(+), 25 deletions(-) > > > > > > > > diff --git a/fs/dax.c b/fs/dax.c > > > > index 9ef9b80cc132..ed54efedade6 100644 > > > > --- a/fs/dax.c > > > > +++ b/fs/dax.c > > > > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, > > > > if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE) > > > > goto fallback; > > > > > > > > + if (buffer_unwritten(&bh) || buffer_new(&bh)) { > > > > + int i; > > > > + for (i = 0; i < PTRS_PER_PMD; i++) > > > > + clear_page(kaddr + i * PAGE_SIZE); > > > > > > This patch, now upstream as commit 46c043ede471, moves the call to > > > clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not > > > set at this point, so I'm not sure this path was ever tested. > > > > Ughh. It's obviously broken. > > > > I took fs/dax.c part of the patch from Matthew. And I'm not sure now we > > would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" > > block around. It should work fine where it was before. Right? > > Matthew? > > Moving the "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" block back seems > correct to me. Matthew is out for a while, so we should probably take care of > this without him. > > Kirill, do you want to whip up a quick patch? I'm happy to do it if you're > busy. I would be better if you'll prepare the patch. Thanks. -- Kirill -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f172.google.com (mail-pd0-f172.google.com [209.85.192.172]) by kanga.kvack.org (Postfix) with ESMTP id AAB3F6B0038 for ; Fri, 7 Aug 2015 07:54:21 -0400 (EDT) Received: by pdrg1 with SMTP id g1so44809524pdr.2 for ; Fri, 07 Aug 2015 04:54:21 -0700 (PDT) Received: from mga01.intel.com (mga01.intel.com. [192.55.52.88]) by mx.google.com with ESMTP id ve14si17240623pab.20.2015.08.07.04.54.20 for ; Fri, 07 Aug 2015 04:54:20 -0700 (PDT) From: "Kirill A. Shutemov" Subject: [PATCH] mm: take i_mmap_lock in unmap_mapping_range() for DAX Date: Fri, 7 Aug 2015 14:53:43 +0300 Message-Id: <1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton , Matthew Wilcox Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap. __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from all mappings. We need to drop i_mmap_lock there to avoid lock deadlock. Re-aquiring the lock should be fine since we check i_size after the point. Not-yet-signed-off-by: Matthew Wilcox Signed-off-by: Kirill A. 
Shutemov --- fs/dax.c | 35 +++++++++++++++++++---------------- mm/memory.c | 11 ++--------- 2 files changed, 21 insertions(+), 25 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 9ef9b80cc132..ed54efedade6 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE) goto fallback; + if (buffer_unwritten(&bh) || buffer_new(&bh)) { + int i; + for (i = 0; i < PTRS_PER_PMD; i++) + clear_page(kaddr + i * PAGE_SIZE); + count_vm_event(PGMAJFAULT); + mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT); + result |= VM_FAULT_MAJOR; + } + + /* + * If we allocated new storage, make sure no process has any + * zero pages covering this hole + */ + if (buffer_new(&bh)) { + i_mmap_unlock_write(mapping); + unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0); + i_mmap_lock_write(mapping); + } + /* * If a truncate happened while we were allocating blocks, we may * leave blocks allocated to the file that are beyond EOF. 
We can't @@ -568,13 +587,6 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, if ((pgoff | PG_PMD_COLOUR) >= size) goto fallback; - /* - * If we allocated new storage, make sure no process has any - * zero pages covering this hole - */ - if (buffer_new(&bh)) - unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0); - if (!write && !buffer_mapped(&bh) && buffer_uptodate(&bh)) { spinlock_t *ptl; pmd_t entry; @@ -605,15 +617,6 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR)) goto fallback; - if (buffer_unwritten(&bh) || buffer_new(&bh)) { - int i; - for (i = 0; i < PTRS_PER_PMD; i++) - clear_page(kaddr + i * PAGE_SIZE); - count_vm_event(PGMAJFAULT); - mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT); - result |= VM_FAULT_MAJOR; - } - result |= vmf_insert_pfn_pmd(vma, address, pmd, pfn, write); } diff --git a/mm/memory.c b/mm/memory.c index 5a3427bb3f32..670cdfa9f33e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2426,17 +2426,10 @@ void unmap_mapping_range(struct address_space *mapping, if (details.last_index < details.first_index) details.last_index = ULONG_MAX; - - /* - * DAX already holds i_mmap_lock to serialise file truncate vs - * page fault and page fault vs page fault. - */ - if (!IS_DAX(mapping->host)) - i_mmap_lock_write(mapping); + i_mmap_lock_write(mapping); if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap))) unmap_mapping_range_tree(&mapping->i_mmap, &details); - if (!IS_DAX(mapping->host)) - i_mmap_unlock_write(mapping); + i_mmap_unlock_write(mapping); } EXPORT_SYMBOL(unmap_mapping_range); -- 2.4.6 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751874AbbIQPle (ORCPT ); Thu, 17 Sep 2015 11:41:34 -0400
Received: from mga09.intel.com ([134.134.136.24]:17315 "EHLO mga09.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750792AbbIQPlc (ORCPT ); Thu, 17 Sep 2015 11:41:32 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.17,547,1437462000"; d="scan'208";a="646869544"
Date: Thu, 17 Sep 2015 09:41:31 -0600
From: Ross Zwisler
To: "Kirill A. Shutemov"
Cc: Matthew Wilcox, Dan Williams, "Kirill A. Shutemov", Andrew Morton,
	linux-mm@kvack.org, linux-fsdevel, Linux Kernel Mailing List,
	"linux-nvdimm@lists.01.org", ross.zwisler@linux.intel.com
Subject: Re: [PATCH] mm: take i_mmap_lock in unmap_mapping_range() for DAX
Message-ID: <20150917154131.GA27791@linux.intel.com>
Mail-Followup-To: Ross Zwisler, "Kirill A. Shutemov", Matthew Wilcox,
	Dan Williams, "Kirill A. Shutemov", Andrew Morton,
	linux-mm@kvack.org, linux-fsdevel, Linux Kernel Mailing List,
	"linux-nvdimm@lists.01.org"
References: <1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com>
	<20150916111218.GB23026@node.dhcp.inet.fi>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150916111218.GB23026@node.dhcp.inet.fi>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 16, 2015 at 02:12:18PM +0300, Kirill A. Shutemov wrote:
> On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote:
> > Hi Kirill,
> >
> > On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov
> > wrote:
> > > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.
> > >
> > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from
> > > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock.
> > >
> > > Re-aquiring the lock should be fine since we check i_size after the
> > > point.
> > >
> > > Not-yet-signed-off-by: Matthew Wilcox
> > > Signed-off-by: Kirill A. Shutemov
> > > ---
> > >  fs/dax.c    | 35 +++++++++++++++++++----------------
> > >  mm/memory.c | 11 ++---------
> > >  2 files changed, 21 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/fs/dax.c b/fs/dax.c
> > > index 9ef9b80cc132..ed54efedade6 100644
> > > --- a/fs/dax.c
> > > +++ b/fs/dax.c
> > > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > >  	if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
> > >  		goto fallback;
> > >
> > > +	if (buffer_unwritten(&bh) || buffer_new(&bh)) {
> > > +		int i;
> > > +		for (i = 0; i < PTRS_PER_PMD; i++)
> > > +			clear_page(kaddr + i * PAGE_SIZE);
> >
> > This patch, now upstream as commit 46c043ede471, moves the call to
> > clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not
> > set at this point, so I'm not sure this path was ever tested.
>
> Ughh. It's obviously broken.
>
> I took fs/dax.c part of the patch from Matthew. And I'm not sure now we
> would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {"
> block around. It should work fine where it was before. Right?
> Matthew?

Moving the "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" block back seems
correct to me. Matthew is out for a while, so we should probably take care of
this without him.

Kirill, do you want to whip up a quick patch? I'm happy to do it if you're
busy.
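The ordering problem discussed in this message can be sketched in user space. This is a minimal sketch under stated assumptions, not the actual fix: `toy_direct_access()`, `toy_pmd_fault()`, `toy_storage` and the `TOY_*` sizes are hypothetical names and values invented here, and `toy_direct_access()` merely stands in for whichever later step in `__dax_pmd_fault()` assigns `kaddr`. The point it illustrates is the one Dan raises: the zeroing loop is only safe after `kaddr` has been assigned and the length check has passed, which is where the block sat before the patch moved it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy sizes, chosen only to keep the example small. */
#define TOY_PAGE_SIZE		64
#define TOY_PAGES_PER_PMD	8
#define TOY_PMD_SIZE		(TOY_PAGE_SIZE * TOY_PAGES_PER_PMD)

static unsigned char toy_storage[TOY_PMD_SIZE];

/* Hypothetical stand-in for the step that assigns kaddr.
 * Returns the number of usable bytes behind *kaddr. */
static long toy_direct_access(void **kaddr)
{
	*kaddr = toy_storage;
	return TOY_PMD_SIZE;
}

/* Returns 0 on success, -1 when the caller should fall back. */
static int toy_pmd_fault(int block_is_new)
{
	void *kaddr = NULL;	/* not valid until the lookup below */
	long length;
	int i;

	/* The moved block ran its clear loop here, before kaddr was set. */

	length = toy_direct_access(&kaddr);
	if (length < TOY_PMD_SIZE)
		return -1;

	/* Original position: kaddr is valid and covers a whole PMD. */
	if (block_is_new) {
		for (i = 0; i < TOY_PAGES_PER_PMD; i++)
			memset((unsigned char *)kaddr + i * TOY_PAGE_SIZE,
			       0, TOY_PAGE_SIZE);
	}
	return 0;
}
```

With `block_is_new` set, every byte of `toy_storage` is zeroed; with it clear, the storage is left untouched. Moving the loop above the lookup in this model would pass a NULL `kaddr` to `memset()`, the user-space analogue of the bug reported here.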