From: Ross Zwisler <ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
To: Ross Zwisler
<ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>,
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org>,
Alexander Viro
<viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>,
Andreas Dilger
<adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>,
Dan Williams
<dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>,
Jan Kara <jack-IBi9RG/b67k@public.gmane.org>,
Matthew Wilcox <mawilcox-0li6OtcxBFHby3iVrkZq2A@public.gmane.org>,
linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
linux-xfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH v4 10/12] dax: add struct iomap based DAX PMD support
Date: Thu, 6 Oct 2016 15:34:24 -0600 [thread overview]
Message-ID: <20161006213424.GA4569@linux.intel.com> (raw)
In-Reply-To: <20161003210557.GA28177-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
On Mon, Oct 03, 2016 at 03:05:57PM -0600, Ross Zwisler wrote:
> On Mon, Oct 03, 2016 at 12:59:49PM +0200, Jan Kara wrote:
> > On Thu 29-09-16 16:49:28, Ross Zwisler wrote:
<>
> > > +int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > > + pmd_t *pmd, unsigned int flags, struct iomap_ops *ops)
> > > +{
> > > + struct address_space *mapping = vma->vm_file->f_mapping;
> > > + unsigned long pmd_addr = address & PMD_MASK;
> > > + bool write = flags & FAULT_FLAG_WRITE;
> > > + struct inode *inode = mapping->host;
> > > + struct iomap iomap = { 0 };
> > > + int error, result = 0;
> > > + pgoff_t size, pgoff;
> > > + struct vm_fault vmf;
> > > + void *entry;
> > > + loff_t pos;
> > > +
> > > + /* Fall back to PTEs if we're going to COW */
> > > + if (write && !(vma->vm_flags & VM_SHARED)) {
> > > + split_huge_pmd(vma, pmd, address);
> > > + return VM_FAULT_FALLBACK;
> > > + }
> > > +
> > > + /* If the PMD would extend outside the VMA */
> > > + if (pmd_addr < vma->vm_start)
> > > + return VM_FAULT_FALLBACK;
> > > + if ((pmd_addr + PMD_SIZE) > vma->vm_end)
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * Check whether offset isn't beyond end of file now. Caller is
> > > + * supposed to hold locks serializing us with truncate / punch hole so
> > > + * this is a reliable test.
> > > + */
> > > + pgoff = linear_page_index(vma, pmd_addr);
> > > + size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > +
> > > + if (pgoff >= size)
> > > + return VM_FAULT_SIGBUS;
> > > +
> > > + /* If the PMD would extend beyond the file size */
> > > + if ((pgoff | PG_PMD_COLOUR) >= size)
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * grab_mapping_entry() will make sure we get a 2M empty entry, a DAX
> > > + * PMD or a HZP entry. If it can't (because a 4k page is already in
> > > + * the tree, for instance), it will return -EEXIST and we just fall
> > > + * back to 4k entries.
> > > + */
> > > + entry = grab_mapping_entry(mapping, pgoff, RADIX_DAX_PMD);
> > > + if (IS_ERR(entry))
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * Note that we don't use iomap_apply here. We aren't doing I/O, only
> > > + * setting up a mapping, so really we're using iomap_begin() as a way
> > > + * to look up our filesystem block.
> > > + */
> > > + pos = (loff_t)pgoff << PAGE_SHIFT;
> > > + error = ops->iomap_begin(inode, pos, PMD_SIZE, write ? IOMAP_WRITE : 0,
> > > + &iomap);
> >
> > I'm not quite sure if it is OK to call ->iomap_begin() without ever calling
> > ->iomap_end. Specifically the comment before iomap_apply() says:
> >
> > "It is assumed that the filesystems will lock whatever resources they
> > require in the iomap_begin call, and release them in the iomap_end call."
> >
> > so what you do could result in unbalanced allocations / locks / whatever.
> > Christoph?
>
> I'll add the iomap_end() calls to both the PTE and PMD iomap fault handlers.
Interesting - adding iomap_end() calls to the DAX PTE fault handler causes an
AA deadlock because we try and retake ei->dax_sem. We take dax_sem in
ext2_dax_fault() before calling into the DAX code, then if we end up going
through the error path in ext2_iomap_end(), we call
ext2_write_failed()
ext2_truncate_blocks()
dax_sem_down_write()
Where we try and take dax_sem again. This error path is really only valid for
I/O operations, but we happen to call it for page faults because 'written' in
ext2_iomap_end() is just 0.
So...how should we handle this? A few ideas:
1) Just continue to omit the calls to iomap_end() in the DAX page fault
handlers for now, and add them when there is useful work to be done in one of
the filesystems.
2) Add an IOMAP_FAULT flag to the flags passed into iomap_begin() and
iomap_end() so make it explicit that we are calling as part of a fault handler
and not an I/O operation, and use this to adjust the error handling in
ext2_iomap_end().
3) Just work around the existing error handling in ext2_iomap_end() by either
unsetting IOMAP_WRITE or by setting 'written' to the size of the fault.
For #2 or #3, probably add a comment explaining the deadlock and why we need
to never call ext2_write_failed() while handling a page fault.
Thoughts?
WARNING: multiple messages have this Message-ID (diff)
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
Jan Kara <jack@suse.cz>,
linux-kernel@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Hellwig <hch@lst.de>,
Dan Williams <dan.j.williams@intel.com>,
Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.com>,
Matthew Wilcox <mawilcox@microsoft.com>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-nvdimm@lists.01.org,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH v4 10/12] dax: add struct iomap based DAX PMD support
Date: Thu, 6 Oct 2016 15:34:24 -0600 [thread overview]
Message-ID: <20161006213424.GA4569@linux.intel.com> (raw)
In-Reply-To: <20161003210557.GA28177@linux.intel.com>
On Mon, Oct 03, 2016 at 03:05:57PM -0600, Ross Zwisler wrote:
> On Mon, Oct 03, 2016 at 12:59:49PM +0200, Jan Kara wrote:
> > On Thu 29-09-16 16:49:28, Ross Zwisler wrote:
<>
> > > +int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > > + pmd_t *pmd, unsigned int flags, struct iomap_ops *ops)
> > > +{
> > > + struct address_space *mapping = vma->vm_file->f_mapping;
> > > + unsigned long pmd_addr = address & PMD_MASK;
> > > + bool write = flags & FAULT_FLAG_WRITE;
> > > + struct inode *inode = mapping->host;
> > > + struct iomap iomap = { 0 };
> > > + int error, result = 0;
> > > + pgoff_t size, pgoff;
> > > + struct vm_fault vmf;
> > > + void *entry;
> > > + loff_t pos;
> > > +
> > > + /* Fall back to PTEs if we're going to COW */
> > > + if (write && !(vma->vm_flags & VM_SHARED)) {
> > > + split_huge_pmd(vma, pmd, address);
> > > + return VM_FAULT_FALLBACK;
> > > + }
> > > +
> > > + /* If the PMD would extend outside the VMA */
> > > + if (pmd_addr < vma->vm_start)
> > > + return VM_FAULT_FALLBACK;
> > > + if ((pmd_addr + PMD_SIZE) > vma->vm_end)
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * Check whether offset isn't beyond end of file now. Caller is
> > > + * supposed to hold locks serializing us with truncate / punch hole so
> > > + * this is a reliable test.
> > > + */
> > > + pgoff = linear_page_index(vma, pmd_addr);
> > > + size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > +
> > > + if (pgoff >= size)
> > > + return VM_FAULT_SIGBUS;
> > > +
> > > + /* If the PMD would extend beyond the file size */
> > > + if ((pgoff | PG_PMD_COLOUR) >= size)
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * grab_mapping_entry() will make sure we get a 2M empty entry, a DAX
> > > + * PMD or a HZP entry. If it can't (because a 4k page is already in
> > > + * the tree, for instance), it will return -EEXIST and we just fall
> > > + * back to 4k entries.
> > > + */
> > > + entry = grab_mapping_entry(mapping, pgoff, RADIX_DAX_PMD);
> > > + if (IS_ERR(entry))
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * Note that we don't use iomap_apply here. We aren't doing I/O, only
> > > + * setting up a mapping, so really we're using iomap_begin() as a way
> > > + * to look up our filesystem block.
> > > + */
> > > + pos = (loff_t)pgoff << PAGE_SHIFT;
> > > + error = ops->iomap_begin(inode, pos, PMD_SIZE, write ? IOMAP_WRITE : 0,
> > > + &iomap);
> >
> > I'm not quite sure if it is OK to call ->iomap_begin() without ever calling
> > ->iomap_end. Specifically the comment before iomap_apply() says:
> >
> > "It is assumed that the filesystems will lock whatever resources they
> > require in the iomap_begin call, and release them in the iomap_end call."
> >
> > so what you do could result in unbalanced allocations / locks / whatever.
> > Christoph?
>
> I'll add the iomap_end() calls to both the PTE and PMD iomap fault handlers.
Interesting - adding iomap_end() calls to the DAX PTE fault handler causes an
AA deadlock because we try and retake ei->dax_sem. We take dax_sem in
ext2_dax_fault() before calling into the DAX code, then if we end up going
through the error path in ext2_iomap_end(), we call
ext2_write_failed()
ext2_truncate_blocks()
dax_sem_down_write()
Where we try and take dax_sem again. This error path is really only valid for
I/O operations, but we happen to call it for page faults because 'written' in
ext2_iomap_end() is just 0.
So...how should we handle this? A few ideas:
1) Just continue to omit the calls to iomap_end() in the DAX page fault
handlers for now, and add them when there is useful work to be done in one of
the filesystems.
2) Add an IOMAP_FAULT flag to the flags passed into iomap_begin() and
iomap_end() so make it explicit that we are calling as part of a fault handler
and not an I/O operation, and use this to adjust the error handling in
ext2_iomap_end().
3) Just work around the existing error handling in ext2_iomap_end() by either
unsetting IOMAP_WRITE or by setting 'written' to the size of the fault.
For #2 or #3, probably add a comment explaining the deadlock and why we need
to never call ext2_write_failed() while handling a page fault.
Thoughts?
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
WARNING: multiple messages have this Message-ID (diff)
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
Jan Kara <jack@suse.cz>,
linux-kernel@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Hellwig <hch@lst.de>,
Dan Williams <dan.j.williams@intel.com>,
Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.com>,
Matthew Wilcox <mawilcox@microsoft.com>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-nvdimm@lists.01.org,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH v4 10/12] dax: add struct iomap based DAX PMD support
Date: Thu, 6 Oct 2016 15:34:24 -0600 [thread overview]
Message-ID: <20161006213424.GA4569@linux.intel.com> (raw)
In-Reply-To: <20161003210557.GA28177@linux.intel.com>
On Mon, Oct 03, 2016 at 03:05:57PM -0600, Ross Zwisler wrote:
> On Mon, Oct 03, 2016 at 12:59:49PM +0200, Jan Kara wrote:
> > On Thu 29-09-16 16:49:28, Ross Zwisler wrote:
<>
> > > +int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > > + pmd_t *pmd, unsigned int flags, struct iomap_ops *ops)
> > > +{
> > > + struct address_space *mapping = vma->vm_file->f_mapping;
> > > + unsigned long pmd_addr = address & PMD_MASK;
> > > + bool write = flags & FAULT_FLAG_WRITE;
> > > + struct inode *inode = mapping->host;
> > > + struct iomap iomap = { 0 };
> > > + int error, result = 0;
> > > + pgoff_t size, pgoff;
> > > + struct vm_fault vmf;
> > > + void *entry;
> > > + loff_t pos;
> > > +
> > > + /* Fall back to PTEs if we're going to COW */
> > > + if (write && !(vma->vm_flags & VM_SHARED)) {
> > > + split_huge_pmd(vma, pmd, address);
> > > + return VM_FAULT_FALLBACK;
> > > + }
> > > +
> > > + /* If the PMD would extend outside the VMA */
> > > + if (pmd_addr < vma->vm_start)
> > > + return VM_FAULT_FALLBACK;
> > > + if ((pmd_addr + PMD_SIZE) > vma->vm_end)
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * Check whether offset isn't beyond end of file now. Caller is
> > > + * supposed to hold locks serializing us with truncate / punch hole so
> > > + * this is a reliable test.
> > > + */
> > > + pgoff = linear_page_index(vma, pmd_addr);
> > > + size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > +
> > > + if (pgoff >= size)
> > > + return VM_FAULT_SIGBUS;
> > > +
> > > + /* If the PMD would extend beyond the file size */
> > > + if ((pgoff | PG_PMD_COLOUR) >= size)
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * grab_mapping_entry() will make sure we get a 2M empty entry, a DAX
> > > + * PMD or a HZP entry. If it can't (because a 4k page is already in
> > > + * the tree, for instance), it will return -EEXIST and we just fall
> > > + * back to 4k entries.
> > > + */
> > > + entry = grab_mapping_entry(mapping, pgoff, RADIX_DAX_PMD);
> > > + if (IS_ERR(entry))
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * Note that we don't use iomap_apply here. We aren't doing I/O, only
> > > + * setting up a mapping, so really we're using iomap_begin() as a way
> > > + * to look up our filesystem block.
> > > + */
> > > + pos = (loff_t)pgoff << PAGE_SHIFT;
> > > + error = ops->iomap_begin(inode, pos, PMD_SIZE, write ? IOMAP_WRITE : 0,
> > > + &iomap);
> >
> > I'm not quite sure if it is OK to call ->iomap_begin() without ever calling
> > ->iomap_end. Specifically the comment before iomap_apply() says:
> >
> > "It is assumed that the filesystems will lock whatever resources they
> > require in the iomap_begin call, and release them in the iomap_end call."
> >
> > so what you do could result in unbalanced allocations / locks / whatever.
> > Christoph?
>
> I'll add the iomap_end() calls to both the PTE and PMD iomap fault handlers.
Interesting - adding iomap_end() calls to the DAX PTE fault handler causes an
AA deadlock because we try and retake ei->dax_sem. We take dax_sem in
ext2_dax_fault() before calling into the DAX code, then if we end up going
through the error path in ext2_iomap_end(), we call
ext2_write_failed()
ext2_truncate_blocks()
dax_sem_down_write()
Where we try and take dax_sem again. This error path is really only valid for
I/O operations, but we happen to call it for page faults because 'written' in
ext2_iomap_end() is just 0.
So...how should we handle this? A few ideas:
1) Just continue to omit the calls to iomap_end() in the DAX page fault
handlers for now, and add them when there is useful work to be done in one of
the filesystems.
2) Add an IOMAP_FAULT flag to the flags passed into iomap_begin() and
iomap_end() so make it explicit that we are calling as part of a fault handler
and not an I/O operation, and use this to adjust the error handling in
ext2_iomap_end().
3) Just work around the existing error handling in ext2_iomap_end() by either
unsetting IOMAP_WRITE or by setting 'written' to the size of the fault.
For #2 or #3, probably add a comment explaining the deadlock and why we need
to never call ext2_write_failed() while handling a page fault.
Thoughts?
WARNING: multiple messages have this Message-ID (diff)
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
Jan Kara <jack@suse.cz>,
linux-kernel@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Hellwig <hch@lst.de>,
Dan Williams <dan.j.williams@intel.com>,
Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.com>,
Matthew Wilcox <mawilcox@microsoft.com>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-nvdimm@lists.01.org,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH v4 10/12] dax: add struct iomap based DAX PMD support
Date: Thu, 6 Oct 2016 15:34:24 -0600 [thread overview]
Message-ID: <20161006213424.GA4569@linux.intel.com> (raw)
In-Reply-To: <20161003210557.GA28177@linux.intel.com>
On Mon, Oct 03, 2016 at 03:05:57PM -0600, Ross Zwisler wrote:
> On Mon, Oct 03, 2016 at 12:59:49PM +0200, Jan Kara wrote:
> > On Thu 29-09-16 16:49:28, Ross Zwisler wrote:
<>
> > > +int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > > + pmd_t *pmd, unsigned int flags, struct iomap_ops *ops)
> > > +{
> > > + struct address_space *mapping = vma->vm_file->f_mapping;
> > > + unsigned long pmd_addr = address & PMD_MASK;
> > > + bool write = flags & FAULT_FLAG_WRITE;
> > > + struct inode *inode = mapping->host;
> > > + struct iomap iomap = { 0 };
> > > + int error, result = 0;
> > > + pgoff_t size, pgoff;
> > > + struct vm_fault vmf;
> > > + void *entry;
> > > + loff_t pos;
> > > +
> > > + /* Fall back to PTEs if we're going to COW */
> > > + if (write && !(vma->vm_flags & VM_SHARED)) {
> > > + split_huge_pmd(vma, pmd, address);
> > > + return VM_FAULT_FALLBACK;
> > > + }
> > > +
> > > + /* If the PMD would extend outside the VMA */
> > > + if (pmd_addr < vma->vm_start)
> > > + return VM_FAULT_FALLBACK;
> > > + if ((pmd_addr + PMD_SIZE) > vma->vm_end)
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * Check whether offset isn't beyond end of file now. Caller is
> > > + * supposed to hold locks serializing us with truncate / punch hole so
> > > + * this is a reliable test.
> > > + */
> > > + pgoff = linear_page_index(vma, pmd_addr);
> > > + size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > +
> > > + if (pgoff >= size)
> > > + return VM_FAULT_SIGBUS;
> > > +
> > > + /* If the PMD would extend beyond the file size */
> > > + if ((pgoff | PG_PMD_COLOUR) >= size)
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * grab_mapping_entry() will make sure we get a 2M empty entry, a DAX
> > > + * PMD or a HZP entry. If it can't (because a 4k page is already in
> > > + * the tree, for instance), it will return -EEXIST and we just fall
> > > + * back to 4k entries.
> > > + */
> > > + entry = grab_mapping_entry(mapping, pgoff, RADIX_DAX_PMD);
> > > + if (IS_ERR(entry))
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * Note that we don't use iomap_apply here. We aren't doing I/O, only
> > > + * setting up a mapping, so really we're using iomap_begin() as a way
> > > + * to look up our filesystem block.
> > > + */
> > > + pos = (loff_t)pgoff << PAGE_SHIFT;
> > > + error = ops->iomap_begin(inode, pos, PMD_SIZE, write ? IOMAP_WRITE : 0,
> > > + &iomap);
> >
> > I'm not quite sure if it is OK to call ->iomap_begin() without ever calling
> > ->iomap_end. Specifically the comment before iomap_apply() says:
> >
> > "It is assumed that the filesystems will lock whatever resources they
> > require in the iomap_begin call, and release them in the iomap_end call."
> >
> > so what you do could result in unbalanced allocations / locks / whatever.
> > Christoph?
>
> I'll add the iomap_end() calls to both the PTE and PMD iomap fault handlers.
Interesting - adding iomap_end() calls to the DAX PTE fault handler causes an
AA deadlock because we try and retake ei->dax_sem. We take dax_sem in
ext2_dax_fault() before calling into the DAX code, then if we end up going
through the error path in ext2_iomap_end(), we call
ext2_write_failed()
ext2_truncate_blocks()
dax_sem_down_write()
Where we try and take dax_sem again. This error path is really only valid for
I/O operations, but we happen to call it for page faults because 'written' in
ext2_iomap_end() is just 0.
So...how should we handle this? A few ideas:
1) Just continue to omit the calls to iomap_end() in the DAX page fault
handlers for now, and add them when there is useful work to be done in one of
the filesystems.
2) Add an IOMAP_FAULT flag to the flags passed into iomap_begin() and
iomap_end() so make it explicit that we are calling as part of a fault handler
and not an I/O operation, and use this to adjust the error handling in
ext2_iomap_end().
3) Just work around the existing error handling in ext2_iomap_end() by either
unsetting IOMAP_WRITE or by setting 'written' to the size of the fault.
For #2 or #3, probably add a comment explaining the deadlock and why we need
to never call ext2_write_failed() while handling a page fault.
Thoughts?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
Jan Kara <jack@suse.cz>,
linux-kernel@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Hellwig <hch@lst.de>,
Dan Williams <dan.j.williams@intel.com>,
Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.com>,
Matthew Wilcox <mawilcox@microsoft.com>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-nvdimm@ml01.01.org,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH v4 10/12] dax: add struct iomap based DAX PMD support
Date: Thu, 6 Oct 2016 15:34:24 -0600 [thread overview]
Message-ID: <20161006213424.GA4569@linux.intel.com> (raw)
In-Reply-To: <20161003210557.GA28177@linux.intel.com>
On Mon, Oct 03, 2016 at 03:05:57PM -0600, Ross Zwisler wrote:
> On Mon, Oct 03, 2016 at 12:59:49PM +0200, Jan Kara wrote:
> > On Thu 29-09-16 16:49:28, Ross Zwisler wrote:
<>
> > > +int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > > + pmd_t *pmd, unsigned int flags, struct iomap_ops *ops)
> > > +{
> > > + struct address_space *mapping = vma->vm_file->f_mapping;
> > > + unsigned long pmd_addr = address & PMD_MASK;
> > > + bool write = flags & FAULT_FLAG_WRITE;
> > > + struct inode *inode = mapping->host;
> > > + struct iomap iomap = { 0 };
> > > + int error, result = 0;
> > > + pgoff_t size, pgoff;
> > > + struct vm_fault vmf;
> > > + void *entry;
> > > + loff_t pos;
> > > +
> > > + /* Fall back to PTEs if we're going to COW */
> > > + if (write && !(vma->vm_flags & VM_SHARED)) {
> > > + split_huge_pmd(vma, pmd, address);
> > > + return VM_FAULT_FALLBACK;
> > > + }
> > > +
> > > + /* If the PMD would extend outside the VMA */
> > > + if (pmd_addr < vma->vm_start)
> > > + return VM_FAULT_FALLBACK;
> > > + if ((pmd_addr + PMD_SIZE) > vma->vm_end)
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * Check whether offset isn't beyond end of file now. Caller is
> > > + * supposed to hold locks serializing us with truncate / punch hole so
> > > + * this is a reliable test.
> > > + */
> > > + pgoff = linear_page_index(vma, pmd_addr);
> > > + size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > +
> > > + if (pgoff >= size)
> > > + return VM_FAULT_SIGBUS;
> > > +
> > > + /* If the PMD would extend beyond the file size */
> > > + if ((pgoff | PG_PMD_COLOUR) >= size)
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * grab_mapping_entry() will make sure we get a 2M empty entry, a DAX
> > > + * PMD or a HZP entry. If it can't (because a 4k page is already in
> > > + * the tree, for instance), it will return -EEXIST and we just fall
> > > + * back to 4k entries.
> > > + */
> > > + entry = grab_mapping_entry(mapping, pgoff, RADIX_DAX_PMD);
> > > + if (IS_ERR(entry))
> > > + return VM_FAULT_FALLBACK;
> > > +
> > > + /*
> > > + * Note that we don't use iomap_apply here. We aren't doing I/O, only
> > > + * setting up a mapping, so really we're using iomap_begin() as a way
> > > + * to look up our filesystem block.
> > > + */
> > > + pos = (loff_t)pgoff << PAGE_SHIFT;
> > > + error = ops->iomap_begin(inode, pos, PMD_SIZE, write ? IOMAP_WRITE : 0,
> > > + &iomap);
> >
> > I'm not quite sure if it is OK to call ->iomap_begin() without ever calling
> > ->iomap_end. Specifically the comment before iomap_apply() says:
> >
> > "It is assumed that the filesystems will lock whatever resources they
> > require in the iomap_begin call, and release them in the iomap_end call."
> >
> > so what you do could result in unbalanced allocations / locks / whatever.
> > Christoph?
>
> I'll add the iomap_end() calls to both the PTE and PMD iomap fault handlers.
Interesting - adding iomap_end() calls to the DAX PTE fault handler causes an
AA deadlock because we try and retake ei->dax_sem. We take dax_sem in
ext2_dax_fault() before calling into the DAX code, then if we end up going
through the error path in ext2_iomap_end(), we call
ext2_write_failed()
ext2_truncate_blocks()
dax_sem_down_write()
Where we try and take dax_sem again. This error path is really only valid for
I/O operations, but we happen to call it for page faults because 'written' in
ext2_iomap_end() is just 0.
So...how should we handle this? A few ideas:
1) Just continue to omit the calls to iomap_end() in the DAX page fault
handlers for now, and add them when there is useful work to be done in one of
the filesystems.
2) Add an IOMAP_FAULT flag to the flags passed into iomap_begin() and
iomap_end() so make it explicit that we are calling as part of a fault handler
and not an I/O operation, and use this to adjust the error handling in
ext2_iomap_end().
3) Just work around the existing error handling in ext2_iomap_end() by either
unsetting IOMAP_WRITE or by setting 'written' to the size of the fault.
For #2 or #3, probably add a comment explaining the deadlock and why we need
to never call ext2_write_failed() while handling a page fault.
Thoughts?
next prev parent reply other threads:[~2016-10-06 21:34 UTC|newest]
Thread overview: 189+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-29 22:49 [PATCH v4 00/12] re-enable DAX PMD support Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` [PATCH v4 02/12] ext4: tell DAX the size of allocation holes Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` [PATCH v4 03/12] dax: remove buffer_size_valid() Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-30 8:49 ` Christoph Hellwig
2016-09-30 8:49 ` Christoph Hellwig
2016-09-30 8:49 ` Christoph Hellwig
2016-09-30 8:49 ` Christoph Hellwig
[not found] ` <1475189370-31634-1-git-send-email-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-09-29 22:49 ` [PATCH v4 01/12] ext4: allow DAX writeback for hole punch Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` [PATCH v4 04/12] ext2: remove support for DAX PMD faults Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
[not found] ` <1475189370-31634-5-git-send-email-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-09-30 8:49 ` Christoph Hellwig
2016-09-30 8:49 ` Christoph Hellwig
2016-09-30 8:49 ` Christoph Hellwig
2016-09-30 8:49 ` Christoph Hellwig
2016-10-03 9:35 ` Jan Kara
2016-10-03 9:35 ` Jan Kara
2016-10-03 9:35 ` Jan Kara
2016-10-03 9:35 ` Jan Kara
2016-09-29 22:49 ` [PATCH v4 05/12] dax: make 'wait_table' global variable static Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-30 8:50 ` Christoph Hellwig
2016-09-30 8:50 ` Christoph Hellwig
2016-09-30 8:50 ` Christoph Hellwig
2016-09-30 8:50 ` Christoph Hellwig
[not found] ` <1475189370-31634-6-git-send-email-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-10-03 9:36 ` Jan Kara
2016-10-03 9:36 ` Jan Kara
2016-10-03 9:36 ` Jan Kara
2016-10-03 9:36 ` Jan Kara
2016-10-03 9:36 ` Jan Kara
2016-09-29 22:49 ` [PATCH v4 06/12] dax: consistent variable naming for DAX entries Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
[not found] ` <1475189370-31634-7-git-send-email-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-09-30 8:50 ` Christoph Hellwig
2016-09-30 8:50 ` Christoph Hellwig
2016-09-30 8:50 ` Christoph Hellwig
2016-09-30 8:50 ` Christoph Hellwig
2016-09-30 8:50 ` Christoph Hellwig
2016-10-03 9:37 ` Jan Kara
2016-10-03 9:37 ` Jan Kara
2016-10-03 9:37 ` Jan Kara
2016-10-03 9:37 ` Jan Kara
2016-10-03 9:37 ` Jan Kara
2016-09-29 22:49 ` [PATCH v4 07/12] dax: coordinate locking for offsets in PMD range Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-30 9:44 ` Christoph Hellwig
2016-09-30 9:44 ` Christoph Hellwig
2016-10-03 9:55 ` Jan Kara
2016-10-03 9:55 ` Jan Kara
2016-10-03 9:55 ` Jan Kara
2016-10-03 9:55 ` Jan Kara
[not found] ` <20161003095518.GM6457-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2016-10-03 18:40 ` Ross Zwisler
2016-10-03 18:40 ` Ross Zwisler
2016-10-03 18:40 ` Ross Zwisler
2016-10-03 18:40 ` Ross Zwisler
2016-10-03 18:40 ` Ross Zwisler
2016-09-29 22:49 ` [PATCH v4 08/12] dax: remove dax_pmd_fault() Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
[not found] ` <1475189370-31634-9-git-send-email-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-09-30 8:50 ` Christoph Hellwig
2016-09-30 8:50 ` Christoph Hellwig
2016-09-30 8:50 ` Christoph Hellwig
2016-09-30 8:50 ` Christoph Hellwig
2016-09-30 8:50 ` Christoph Hellwig
2016-10-03 9:56 ` Jan Kara
2016-10-03 9:56 ` Jan Kara
2016-10-03 9:56 ` Jan Kara
2016-10-03 9:56 ` Jan Kara
2016-09-29 22:49 ` [PATCH v4 09/12] dax: correct dax iomap code namespace Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-30 8:51 ` Christoph Hellwig
2016-09-30 8:51 ` Christoph Hellwig
2016-09-30 8:51 ` Christoph Hellwig
2016-09-30 8:51 ` Christoph Hellwig
2016-10-03 9:57 ` Jan Kara
2016-10-03 9:57 ` Jan Kara
2016-10-03 9:57 ` Jan Kara
2016-10-03 9:57 ` Jan Kara
2016-09-29 22:49 ` [PATCH v4 10/12] dax: add struct iomap based DAX PMD support Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
[not found] ` <1475189370-31634-11-git-send-email-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-09-30 9:56 ` Christoph Hellwig
2016-09-30 9:56 ` Christoph Hellwig
2016-09-30 9:56 ` Christoph Hellwig
[not found] ` <20160930095627.GB5299-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-10-03 21:16 ` Ross Zwisler
2016-10-03 21:16 ` Ross Zwisler
2016-10-03 21:16 ` Ross Zwisler
2016-10-03 10:59 ` Jan Kara
2016-10-03 10:59 ` Jan Kara
2016-10-03 10:59 ` Jan Kara
2016-10-03 10:59 ` Jan Kara
2016-10-03 16:37 ` Christoph Hellwig
2016-10-03 16:37 ` Christoph Hellwig
2016-10-03 16:37 ` Christoph Hellwig
2016-10-03 21:05 ` Ross Zwisler
2016-10-03 21:05 ` Ross Zwisler
2016-10-03 21:05 ` Ross Zwisler
2016-10-04 5:55 ` Jan Kara
2016-10-04 5:55 ` Jan Kara
2016-10-04 5:55 ` Jan Kara
2016-10-04 5:55 ` Jan Kara
2016-10-04 15:39 ` Ross Zwisler
2016-10-04 15:39 ` Ross Zwisler
2016-10-04 15:39 ` Ross Zwisler
2016-10-04 15:39 ` Ross Zwisler
[not found] ` <20161004153948.GA21248-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-10-05 5:50 ` Jan Kara
2016-10-05 5:50 ` Jan Kara
2016-10-05 5:50 ` Jan Kara
2016-10-05 5:50 ` Jan Kara
2016-10-05 5:50 ` Jan Kara
[not found] ` <20161003210557.GA28177-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-10-06 21:34 ` Ross Zwisler [this message]
2016-10-06 21:34 ` Ross Zwisler
2016-10-06 21:34 ` Ross Zwisler
2016-10-06 21:34 ` Ross Zwisler
2016-10-06 21:34 ` Ross Zwisler
2016-10-07 2:58 ` Ross Zwisler
2016-10-07 2:58 ` Ross Zwisler
2016-10-07 2:58 ` Ross Zwisler
2016-10-07 2:58 ` Ross Zwisler
[not found] ` <20161007025833.GA2934-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-10-07 7:24 ` Jan Kara
2016-10-07 7:24 ` Jan Kara
2016-10-07 7:24 ` Jan Kara
2016-10-07 7:24 ` Jan Kara
2016-10-07 7:24 ` Jan Kara
2016-09-29 22:49 ` [PATCH v4 12/12] dax: remove "depends on BROKEN" from FS_DAX_PMD Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 23:43 ` [PATCH v4 00/12] re-enable DAX PMD support Dave Chinner
2016-09-29 23:43 ` Dave Chinner
2016-09-29 23:43 ` Dave Chinner
2016-09-29 23:43 ` Dave Chinner
2016-09-29 23:43 ` Dave Chinner
2016-09-30 3:03 ` Ross Zwisler
2016-09-30 3:03 ` Ross Zwisler
2016-09-30 3:03 ` Ross Zwisler
2016-09-30 3:03 ` Ross Zwisler
2016-09-30 4:00 ` Darrick J. Wong
2016-09-30 4:00 ` Darrick J. Wong
2016-10-03 18:54 ` Ross Zwisler
2016-10-03 18:54 ` Ross Zwisler
2016-09-30 6:48 ` Dave Chinner
2016-09-30 6:48 ` Dave Chinner
2016-09-30 6:48 ` Dave Chinner
2016-09-30 6:48 ` Dave Chinner
2016-10-03 21:11 ` Ross Zwisler
2016-10-03 21:11 ` Ross Zwisler
2016-10-03 21:11 ` Ross Zwisler
2016-10-03 21:11 ` Ross Zwisler
2016-10-03 23:05 ` Ross Zwisler
2016-10-03 23:05 ` Ross Zwisler
2016-10-03 23:05 ` Ross Zwisler
2016-10-03 23:05 ` Ross Zwisler
2016-09-29 22:49 ` [PATCH v4 11/12] xfs: use struct iomap based DAX PMD fault path Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-29 22:49 ` Ross Zwisler
2016-09-30 11:46 ` [PATCH v4 00/12] re-enable DAX PMD support Christoph Hellwig
2016-09-30 11:46 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161006213424.GA4569@linux.intel.com \
--to=ross.zwisler-vuqaysv1563yd54fqh9/ca@public.gmane.org \
--cc=adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org \
--cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
--cc=dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
--cc=david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org \
--cc=hch-jcswGhMUV9g@public.gmane.org \
--cc=jack-AlSwsSmVLrQ@public.gmane.org \
--cc=jack-IBi9RG/b67k@public.gmane.org \
--cc=linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
--cc=linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org \
--cc=linux-xfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=mawilcox-0li6OtcxBFHby3iVrkZq2A@public.gmane.org \
--cc=tytso-3s7WtUTddSA@public.gmane.org \
--cc=viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.