* [PATCH 0/2 v2] ext4: Fix ENOSPC handling for DAX faults @ 2017-12-21 16:30 Jan Kara 2017-12-21 16:30 ` [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() Jan Kara 2017-12-21 16:30 ` [PATCH 2/2] ext4: Fix ENOSPC handling in DAX page fault handler Jan Kara 0 siblings, 2 replies; 11+ messages in thread From: Jan Kara @ 2017-12-21 16:30 UTC (permalink / raw) To: linux-ext4 Cc: Ted Tso, Ross Zwisler, Dan Williams, linux-fsdevel, linux-nvdimm, Jan Kara Hello, these two patches fix handling of ENOSPC during DAX faults. The problem is that currently running transaction may be holding lots of already freed blocks which can be reallocated only once the transaction commits. Standard retry logic in ext4_iomap_end() does not work for DAX page fault handler since we hold current transaction open in ext4_dax_huge_fault() and thus retry logic cannot force the running transaction and as a result application gets SIGBUS due to premature ENOSPC error. Changes since v1: * Made error passed back from dax_iomap_fault() explicitely be only the error from ->iomap_begin() Comments are welcome. Honza ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() 2017-12-21 16:30 [PATCH 0/2 v2] ext4: Fix ENOSPC handling for DAX faults Jan Kara @ 2017-12-21 16:30 ` Jan Kara 2017-12-21 17:12 ` Dan Williams 2017-12-21 17:16 ` Ross Zwisler 2017-12-21 16:30 ` [PATCH 2/2] ext4: Fix ENOSPC handling in DAX page fault handler Jan Kara 1 sibling, 2 replies; 11+ messages in thread From: Jan Kara @ 2017-12-21 16:30 UTC (permalink / raw) To: linux-ext4 Cc: Ted Tso, Ross Zwisler, Dan Williams, linux-fsdevel, linux-nvdimm, Jan Kara Ext4 needs to pass through error from its iomap handler to the page fault handler so that it can properly detect ENOSPC and force transaction commit and retry the fault (and block allocation). Add argument to dax_iomap_fault() for passing such error. Signed-off-by: Jan Kara <jack@suse.cz> --- fs/dax.c | 9 ++++++--- fs/ext2/file.c | 2 +- fs/ext4/file.c | 2 +- fs/xfs/xfs_file.c | 2 +- include/linux/dax.h | 2 +- 5 files changed, 10 insertions(+), 7 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 95981591977a..f3afa1d6156c 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1096,7 +1096,7 @@ static bool dax_fault_is_synchronous(unsigned long flags, } static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, - const struct iomap_ops *ops) + int *iomap_errp, const struct iomap_ops *ops) { struct vm_area_struct *vma = vmf->vma; struct address_space *mapping = vma->vm_file->f_mapping; @@ -1149,6 +1149,8 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, * that we never have to deal with more than a single extent here. */ error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap); + if (iomap_errp) + *iomap_errp = error; if (error) { vmf_ret = dax_fault_return(error); goto unlock_entry; @@ -1488,6 +1490,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, * @vmf: The description of the fault * @pe_size: Size of the page to fault in * @pfnp: PFN to insert for synchronous faults if fsync is required + * @iomap_errp: Storage for detailed error code in case of error * @ops: Iomap ops passed from the file system * * When a page fault occurs, filesystems may call this helper in @@ -1496,11 +1499,11 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, * successfully. */ int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, - pfn_t *pfnp, const struct iomap_ops *ops) + pfn_t *pfnp, int *iomap_errp, const struct iomap_ops *ops) { switch (pe_size) { case PE_SIZE_PTE: - return dax_iomap_pte_fault(vmf, pfnp, ops); + return dax_iomap_pte_fault(vmf, pfnp, iomap_errp, ops); case PE_SIZE_PMD: return dax_iomap_pmd_fault(vmf, pfnp, ops); default: diff --git a/fs/ext2/file.c b/fs/ext2/file.c index 2da67699dc33..09640220fda8 100644 --- a/fs/ext2/file.c +++ b/fs/ext2/file.c @@ -100,7 +100,7 @@ static int ext2_dax_fault(struct vm_fault *vmf) } down_read(&ei->dax_sem); - ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, &ext2_iomap_ops); + ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, NULL, &ext2_iomap_ops); up_read(&ei->dax_sem); if (vmf->flags & FAULT_FLAG_WRITE) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index a0ae27b1bc66..1c7cd882d998 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -314,7 +314,7 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf, } else { down_read(&EXT4_I(inode)->i_mmap_sem); } - result = dax_iomap_fault(vmf, pe_size, &pfn, &ext4_iomap_ops); + result = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &ext4_iomap_ops); if (write) { ext4_journal_stop(handle); /* Handling synchronous page fault? */ diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 8601275cc5e6..9ea08326f876 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1048,7 +1048,7 @@ __xfs_filemap_fault( if (IS_DAX(inode)) { pfn_t pfn; - ret = dax_iomap_fault(vmf, pe_size, &pfn, &xfs_iomap_ops); + ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops); if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); } else { diff --git a/include/linux/dax.h b/include/linux/dax.h index 5258346c558c..0185ecdae135 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -96,7 +96,7 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev); ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops); int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, - pfn_t *pfnp, const struct iomap_ops *ops); + pfn_t *pfnp, int *errp, const struct iomap_ops *ops); int dax_finish_sync_fault(struct vm_fault *vmf, enum page_entry_size pe_size, pfn_t pfn); int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index); -- 2.12.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() 2017-12-21 16:30 ` [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() Jan Kara @ 2017-12-21 17:12 ` Dan Williams 2018-01-02 18:54 ` Jan Kara 2017-12-21 17:16 ` Ross Zwisler 1 sibling, 1 reply; 11+ messages in thread From: Dan Williams @ 2017-12-21 17:12 UTC (permalink / raw) To: Jan Kara; +Cc: linux-ext4, Ted Tso, Ross Zwisler, linux-fsdevel, linux-nvdimm On Thu, Dec 21, 2017 at 8:30 AM, Jan Kara <jack@suse.cz> wrote: > Ext4 needs to pass through error from its iomap handler to the page > fault handler so that it can properly detect ENOSPC and force > transaction commit and retry the fault (and block allocation). Add > argument to dax_iomap_fault() for passing such error. > > Signed-off-by: Jan Kara <jack@suse.cz> > --- > fs/dax.c | 9 ++++++--- > fs/ext2/file.c | 2 +- > fs/ext4/file.c | 2 +- > fs/xfs/xfs_file.c | 2 +- > include/linux/dax.h | 2 +- > 5 files changed, 10 insertions(+), 7 deletions(-) > > diff --git a/fs/dax.c b/fs/dax.c > index 95981591977a..f3afa1d6156c 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -1096,7 +1096,7 @@ static bool dax_fault_is_synchronous(unsigned long flags, > } > > static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, > - const struct iomap_ops *ops) > + int *iomap_errp, const struct iomap_ops *ops) > { > struct vm_area_struct *vma = vmf->vma; > struct address_space *mapping = vma->vm_file->f_mapping; > @@ -1149,6 +1149,8 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, > * that we never have to deal with more than a single extent here. > */ > error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap); > + if (iomap_errp) > + *iomap_errp = error; Since we already have 'struct iomap' tracking the state of the iomap should we track the error status there as well? I.e. move the on stack allocation of struct iomap to the per-fs dax fault handlers. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() 2017-12-21 17:12 ` Dan Williams @ 2018-01-02 18:54 ` Jan Kara 2018-01-02 18:57 ` Dan Williams 0 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2018-01-02 18:54 UTC (permalink / raw) To: Dan Williams Cc: Jan Kara, linux-ext4, Ted Tso, Ross Zwisler, linux-fsdevel, linux-nvdimm On Thu 21-12-17 09:12:52, Dan Williams wrote: > On Thu, Dec 21, 2017 at 8:30 AM, Jan Kara <jack@suse.cz> wrote: > > Ext4 needs to pass through error from its iomap handler to the page > > fault handler so that it can properly detect ENOSPC and force > > transaction commit and retry the fault (and block allocation). Add > > argument to dax_iomap_fault() for passing such error. > > > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > fs/dax.c | 9 ++++++--- > > fs/ext2/file.c | 2 +- > > fs/ext4/file.c | 2 +- > > fs/xfs/xfs_file.c | 2 +- > > include/linux/dax.h | 2 +- > > 5 files changed, 10 insertions(+), 7 deletions(-) > > > > diff --git a/fs/dax.c b/fs/dax.c > > index 95981591977a..f3afa1d6156c 100644 > > --- a/fs/dax.c > > +++ b/fs/dax.c > > @@ -1096,7 +1096,7 @@ static bool dax_fault_is_synchronous(unsigned long flags, > > } > > > > static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, > > - const struct iomap_ops *ops) > > + int *iomap_errp, const struct iomap_ops *ops) > > { > > struct vm_area_struct *vma = vmf->vma; > > struct address_space *mapping = vma->vm_file->f_mapping; > > @@ -1149,6 +1149,8 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, > > * that we never have to deal with more than a single extent here. > > */ > > error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap); > > + if (iomap_errp) > > + *iomap_errp = error; > > Since we already have 'struct iomap' tracking the state of the iomap > should we track the error status there as well? I.e. move the on > stack allocation of struct iomap to the per-fs dax fault handlers. I don't think that's really adequate. Firstly because at least part of struct iomap needs to be initialized inside dax_iomap_fault() and secondly because by the time dax_iomap_fault() returns, mapping information in struct iomap may be invalid (as we unlocked radix tree entry). So I think it is really better to pass back only the error code and keep struct iomap internal to dax_iomap_fault(). Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() 2018-01-02 18:54 ` Jan Kara @ 2018-01-02 18:57 ` Dan Williams 0 siblings, 0 replies; 11+ messages in thread From: Dan Williams @ 2018-01-02 18:57 UTC (permalink / raw) To: Jan Kara; +Cc: linux-ext4, Ted Tso, Ross Zwisler, linux-fsdevel, linux-nvdimm On Tue, Jan 2, 2018 at 10:54 AM, Jan Kara <jack@suse.cz> wrote: > On Thu 21-12-17 09:12:52, Dan Williams wrote: >> On Thu, Dec 21, 2017 at 8:30 AM, Jan Kara <jack@suse.cz> wrote: >> > Ext4 needs to pass through error from its iomap handler to the page >> > fault handler so that it can properly detect ENOSPC and force >> > transaction commit and retry the fault (and block allocation). Add >> > argument to dax_iomap_fault() for passing such error. >> > >> > Signed-off-by: Jan Kara <jack@suse.cz> >> > --- >> > fs/dax.c | 9 ++++++--- >> > fs/ext2/file.c | 2 +- >> > fs/ext4/file.c | 2 +- >> > fs/xfs/xfs_file.c | 2 +- >> > include/linux/dax.h | 2 +- >> > 5 files changed, 10 insertions(+), 7 deletions(-) >> > >> > diff --git a/fs/dax.c b/fs/dax.c >> > index 95981591977a..f3afa1d6156c 100644 >> > --- a/fs/dax.c >> > +++ b/fs/dax.c >> > @@ -1096,7 +1096,7 @@ static bool dax_fault_is_synchronous(unsigned long flags, >> > } >> > >> > static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, >> > - const struct iomap_ops *ops) >> > + int *iomap_errp, const struct iomap_ops *ops) >> > { >> > struct vm_area_struct *vma = vmf->vma; >> > struct address_space *mapping = vma->vm_file->f_mapping; >> > @@ -1149,6 +1149,8 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, >> > * that we never have to deal with more than a single extent here. >> > */ >> > error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap); >> > + if (iomap_errp) >> > + *iomap_errp = error; >> >> Since we already have 'struct iomap' tracking the state of the iomap >> should we track the error status there as well? I.e. move the on >> stack allocation of struct iomap to the per-fs dax fault handlers. > > I don't think that's really adequate. Firstly because at least part of > struct iomap needs to be initialized inside dax_iomap_fault() and secondly > because by the time dax_iomap_fault() returns, mapping information in struct > iomap may be invalid (as we unlocked radix tree entry). So I think it is > really better to pass back only the error code and keep struct iomap > internal to dax_iomap_fault(). Ok, that makes sense to me. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() 2017-12-21 16:30 ` [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() Jan Kara 2017-12-21 17:12 ` Dan Williams @ 2017-12-21 17:16 ` Ross Zwisler 1 sibling, 0 replies; 11+ messages in thread From: Ross Zwisler @ 2017-12-21 17:16 UTC (permalink / raw) To: Jan Kara Cc: linux-ext4, Ted Tso, Ross Zwisler, Dan Williams, linux-fsdevel, linux-nvdimm On Thu, Dec 21, 2017 at 05:30:54PM +0100, Jan Kara wrote: > Ext4 needs to pass through error from its iomap handler to the page > fault handler so that it can properly detect ENOSPC and force > transaction commit and retry the fault (and block allocation). Add > argument to dax_iomap_fault() for passing such error. > > Signed-off-by: Jan Kara <jack@suse.cz> I like how much simpler this version is. Looks good. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] ext4: Fix ENOSPC handling in DAX page fault handler 2017-12-21 16:30 [PATCH 0/2 v2] ext4: Fix ENOSPC handling for DAX faults Jan Kara 2017-12-21 16:30 ` [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() Jan Kara @ 2017-12-21 16:30 ` Jan Kara 2017-12-21 17:17 ` Ross Zwisler 1 sibling, 1 reply; 11+ messages in thread From: Jan Kara @ 2017-12-21 16:30 UTC (permalink / raw) To: linux-ext4 Cc: Ted Tso, Ross Zwisler, Dan Williams, linux-fsdevel, linux-nvdimm, Jan Kara When allocation of underlying block for a page fault fails, we fail the fault with SIGBUS. However we may well hit ENOSPC just due to lots of free blocks being held by the running / committing transaction. So propagate the error from ext4_iomap_begin() and implement do standard allocation retry loop in ext4_dax_huge_fault(). Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext4/file.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 1c7cd882d998..fb6f023622fe 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -280,7 +280,8 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) static int ext4_dax_huge_fault(struct vm_fault *vmf, enum page_entry_size pe_size) { - int result; + int result, error = 0; + int retries = 0; handle_t *handle = NULL; struct inode *inode = file_inode(vmf->vma->vm_file); struct super_block *sb = inode->i_sb; @@ -304,6 +305,7 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf, sb_start_pagefault(sb); file_update_time(vmf->vma->vm_file); down_read(&EXT4_I(inode)->i_mmap_sem); +retry: handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE, EXT4_DATA_TRANS_BLOCKS(sb)); if (IS_ERR(handle)) { @@ -314,9 +316,13 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf, } else { down_read(&EXT4_I(inode)->i_mmap_sem); } - result = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &ext4_iomap_ops); + result = dax_iomap_fault(vmf, pe_size, &pfn, &error, &ext4_iomap_ops); if (write) { ext4_journal_stop(handle); + + if ((result & VM_FAULT_ERROR) && error == -ENOSPC && + ext4_should_retry_alloc(sb, &retries)) + goto retry; /* Handling synchronous page fault? */ if (result & VM_FAULT_NEEDDSYNC) result = dax_finish_sync_fault(vmf, pe_size, pfn); -- 2.12.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] ext4: Fix ENOSPC handling in DAX page fault handler 2017-12-21 16:30 ` [PATCH 2/2] ext4: Fix ENOSPC handling in DAX page fault handler Jan Kara @ 2017-12-21 17:17 ` Ross Zwisler 0 siblings, 0 replies; 11+ messages in thread From: Ross Zwisler @ 2017-12-21 17:17 UTC (permalink / raw) To: Jan Kara Cc: linux-ext4, Ted Tso, Ross Zwisler, Dan Williams, linux-fsdevel, linux-nvdimm On Thu, Dec 21, 2017 at 05:30:55PM +0100, Jan Kara wrote: > When allocation of underlying block for a page fault fails, we fail the > fault with SIGBUS. However we may well hit ENOSPC just due to lots of > free blocks being held by the running / committing transaction. So > propagate the error from ext4_iomap_begin() and implement do standard > allocation retry loop in ext4_dax_huge_fault(). > > Signed-off-by: Jan Kara <jack@suse.cz> Looks good. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 0/2 v3] ext4: Fix ENOSPC handling for DAX faults @ 2018-01-03 10:08 Jan Kara 2018-01-03 10:08 ` [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() Jan Kara 0 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2018-01-03 10:08 UTC (permalink / raw) To: Ted Tso Cc: linux-ext4, linux-fsdevel, linux-nvdimm, Ross Zwisler, Dan Williams, Jan Kara Hello, these two patches fix handling of ENOSPC during DAX faults. The problem is that currently running transaction may be holding lots of already freed blocks which can be reallocated only once the transaction commits. Standard retry logic in ext4_iomap_end() does not work for DAX page fault handler since we hold current transaction open in ext4_dax_huge_fault() and thus retry logic cannot force the running transaction and as a result application gets SIGBUS due to premature ENOSPC error. Ted, can you please merge these patches through your tree? Ext4 is the part that is touched by these patches the most. Changes since v2: * Added Reviewed-by tags Changes since v1: * Made error passed back from dax_iomap_fault() explicitely be only the error from ->iomap_begin() Comments are welcome. Honza ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() 2018-01-03 10:08 [PATCH 0/2 v3] ext4: Fix ENOSPC handling for DAX faults Jan Kara @ 2018-01-03 10:08 ` Jan Kara 2018-01-07 21:51 ` Theodore Ts'o 0 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2018-01-03 10:08 UTC (permalink / raw) To: Ted Tso Cc: linux-ext4, linux-fsdevel, linux-nvdimm, Ross Zwisler, Dan Williams, Jan Kara Ext4 needs to pass through error from its iomap handler to the page fault handler so that it can properly detect ENOSPC and force transaction commit and retry the fault (and block allocation). Add argument to dax_iomap_fault() for passing such error. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/dax.c | 9 ++++++--- fs/ext2/file.c | 2 +- fs/ext4/file.c | 2 +- fs/xfs/xfs_file.c | 2 +- include/linux/dax.h | 2 +- 5 files changed, 10 insertions(+), 7 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 95981591977a..f3afa1d6156c 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1096,7 +1096,7 @@ static bool dax_fault_is_synchronous(unsigned long flags, } static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, - const struct iomap_ops *ops) + int *iomap_errp, const struct iomap_ops *ops) { struct vm_area_struct *vma = vmf->vma; struct address_space *mapping = vma->vm_file->f_mapping; @@ -1149,6 +1149,8 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, * that we never have to deal with more than a single extent here. */ error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap); + if (iomap_errp) + *iomap_errp = error; if (error) { vmf_ret = dax_fault_return(error); goto unlock_entry; @@ -1488,6 +1490,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, * @vmf: The description of the fault * @pe_size: Size of the page to fault in * @pfnp: PFN to insert for synchronous faults if fsync is required + * @iomap_errp: Storage for detailed error code in case of error * @ops: Iomap ops passed from the file system * * When a page fault occurs, filesystems may call this helper in @@ -1496,11 +1499,11 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, * successfully. */ int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, - pfn_t *pfnp, const struct iomap_ops *ops) + pfn_t *pfnp, int *iomap_errp, const struct iomap_ops *ops) { switch (pe_size) { case PE_SIZE_PTE: - return dax_iomap_pte_fault(vmf, pfnp, ops); + return dax_iomap_pte_fault(vmf, pfnp, iomap_errp, ops); case PE_SIZE_PMD: return dax_iomap_pmd_fault(vmf, pfnp, ops); default: diff --git a/fs/ext2/file.c b/fs/ext2/file.c index 2da67699dc33..09640220fda8 100644 --- a/fs/ext2/file.c +++ b/fs/ext2/file.c @@ -100,7 +100,7 @@ static int ext2_dax_fault(struct vm_fault *vmf) } down_read(&ei->dax_sem); - ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, &ext2_iomap_ops); + ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, NULL, &ext2_iomap_ops); up_read(&ei->dax_sem); if (vmf->flags & FAULT_FLAG_WRITE) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index a0ae27b1bc66..1c7cd882d998 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -314,7 +314,7 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf, } else { down_read(&EXT4_I(inode)->i_mmap_sem); } - result = dax_iomap_fault(vmf, pe_size, &pfn, &ext4_iomap_ops); + result = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &ext4_iomap_ops); if (write) { ext4_journal_stop(handle); /* Handling synchronous page fault? */ diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 8601275cc5e6..9ea08326f876 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1048,7 +1048,7 @@ __xfs_filemap_fault( if (IS_DAX(inode)) { pfn_t pfn; - ret = dax_iomap_fault(vmf, pe_size, &pfn, &xfs_iomap_ops); + ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops); if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); } else { diff --git a/include/linux/dax.h b/include/linux/dax.h index 5258346c558c..0185ecdae135 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -96,7 +96,7 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev); ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops); int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, - pfn_t *pfnp, const struct iomap_ops *ops); + pfn_t *pfnp, int *errp, const struct iomap_ops *ops); int dax_finish_sync_fault(struct vm_fault *vmf, enum page_entry_size pe_size, pfn_t pfn); int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index); -- 2.12.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() 2018-01-03 10:08 ` [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() Jan Kara @ 2018-01-07 21:51 ` Theodore Ts'o 0 siblings, 0 replies; 11+ messages in thread From: Theodore Ts'o @ 2018-01-07 21:51 UTC (permalink / raw) To: Jan Kara Cc: linux-ext4, linux-fsdevel, linux-nvdimm, Ross Zwisler, Dan Williams On Wed, Jan 03, 2018 at 11:08:43AM +0100, Jan Kara wrote: > Ext4 needs to pass through error from its iomap handler to the page > fault handler so that it can properly detect ENOSPC and force > transaction commit and retry the fault (and block allocation). Add > argument to dax_iomap_fault() for passing such error. > > Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> > Signed-off-by: Jan Kara <jack@suse.cz> Applied, thanks. - Ted ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 0/2] ext4: Fix ENOSPC handling for DAX faults @ 2017-12-13 9:13 Jan Kara 2017-12-13 9:13 ` [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() Jan Kara 0 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2017-12-13 9:13 UTC (permalink / raw) To: linux-ext4 Cc: Ted Tso, Ross Zwisler, Dan Williams, linux-fsdevel, linux-nvdimm, Jan Kara Hello, these two patches fix handling of ENOSPC during DAX faults. The problem is that currently running transaction may be holding lots of already freed blocks which can be reallocated only once the transaction commits. Standard retry logic in ext4_iomap_end() does not work for DAX page fault handler since we hold current transaction open in ext4_dax_huge_fault() and thus retry logic cannot force the running transaction and as a result application gets SIGBUS due to premature ENOSPC error. These two patches fix the problem. I'm not too happy about patch 1/2 passing additional info (error code) from the fault handler but I don't see an easy better option since we want to also pass back special return values like VM_FAULT_FALLBACK. Comments are welcome. Honza ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() 2017-12-13 9:13 [PATCH 0/2] ext4: Fix ENOSPC handling for DAX faults Jan Kara @ 2017-12-13 9:13 ` Jan Kara 0 siblings, 0 replies; 11+ messages in thread From: Jan Kara @ 2017-12-13 9:13 UTC (permalink / raw) To: linux-ext4 Cc: Ted Tso, Ross Zwisler, Dan Williams, linux-fsdevel, linux-nvdimm, Jan Kara Ext4 needs to pass through error from its iomap handler to the page fault handler so that it can properly detect ENOSPC and force transaction commit and retry the fault (and block allocation). Add argument to dax_iomap_fault() for passing such error. Signed-off-by: Jan Kara <jack@suse.cz> --- fs/dax.c | 25 +++++++++++++++---------- fs/ext2/file.c | 2 +- fs/ext4/file.c | 2 +- fs/xfs/xfs_file.c | 2 +- include/linux/dax.h | 2 +- 5 files changed, 19 insertions(+), 14 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 78b72c48374e..5bab8210599a 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -872,7 +872,7 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size, * point to real DAX storage instead. */ static int dax_load_hole(struct address_space *mapping, void *entry, - struct vm_fault *vmf) + struct vm_fault *vmf, int *errp) { struct inode *inode = mapping->host; unsigned long vaddr = vmf->address; @@ -890,6 +890,8 @@ static int dax_load_hole(struct address_space *mapping, void *entry, RADIX_DAX_ZERO_PAGE, false); if (IS_ERR(entry2)) { ret = VM_FAULT_SIGBUS; + if (errp) + *errp = PTR_ERR(entry2); goto out; } @@ -1076,10 +1078,12 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, } EXPORT_SYMBOL_GPL(dax_iomap_rw); -static int dax_fault_return(int error) +static int dax_fault_return(int error, int *errp) { if (error == 0) return VM_FAULT_NOPAGE; + if (errp) + *errp = error; if (error == -ENOMEM) return VM_FAULT_OOM; return VM_FAULT_SIGBUS; @@ -1096,7 +1100,7 @@ static bool dax_fault_is_synchronous(unsigned long flags, && (iomap->flags & IOMAP_F_DIRTY); } -static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, +static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, int *errp, const struct iomap_ops *ops) { struct vm_area_struct *vma = vmf->vma; @@ -1129,7 +1133,7 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, entry = grab_mapping_entry(mapping, vmf->pgoff, 0); if (IS_ERR(entry)) { - vmf_ret = dax_fault_return(PTR_ERR(entry)); + vmf_ret = dax_fault_return(PTR_ERR(entry), errp); goto out; } @@ -1151,7 +1155,7 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, */ error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap); if (error) { - vmf_ret = dax_fault_return(error); + vmf_ret = dax_fault_return(error, errp); goto unlock_entry; } if (WARN_ON_ONCE(iomap.offset + iomap.length < pos + PAGE_SIZE)) { @@ -1236,7 +1240,7 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, case IOMAP_UNWRITTEN: case IOMAP_HOLE: if (!write) { - vmf_ret = dax_load_hole(mapping, entry, vmf); + vmf_ret = dax_load_hole(mapping, entry, vmf, errp); goto finish_iomap; } /*FALLTHRU*/ @@ -1247,7 +1251,7 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, } error_finish_iomap: - vmf_ret = dax_fault_return(error) | major; + vmf_ret = dax_fault_return(error, errp) | major; finish_iomap: if (ops->iomap_end) { int copied = PAGE_SIZE; @@ -1489,6 +1493,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, * @vmf: The description of the fault * @pe_size: Size of the page to fault in * @pfnp: PFN to insert for synchronous faults if fsync is required + * @errp: Storage for detailed error code in case of error * @ops: Iomap ops passed from the file system * * When a page fault occurs, filesystems may call this helper in @@ -1497,11 +1502,11 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, * successfully. */ int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, - pfn_t *pfnp, const struct iomap_ops *ops) + pfn_t *pfnp, int *errp, const struct iomap_ops *ops) { switch (pe_size) { case PE_SIZE_PTE: - return dax_iomap_pte_fault(vmf, pfnp, ops); + return dax_iomap_pte_fault(vmf, pfnp, errp, ops); case PE_SIZE_PMD: return dax_iomap_pmd_fault(vmf, pfnp, ops); default: @@ -1547,7 +1552,7 @@ static int dax_insert_pfn_mkwrite(struct vm_fault *vmf, switch (pe_size) { case PE_SIZE_PTE: error = vm_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn); - vmf_ret = dax_fault_return(error); + vmf_ret = dax_fault_return(error, NULL); break; #ifdef CONFIG_FS_DAX_PMD case PE_SIZE_PMD: diff --git a/fs/ext2/file.c b/fs/ext2/file.c index 2da67699dc33..09640220fda8 100644 --- a/fs/ext2/file.c +++ b/fs/ext2/file.c @@ -100,7 +100,7 @@ static int ext2_dax_fault(struct vm_fault *vmf) } down_read(&ei->dax_sem); - ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, &ext2_iomap_ops); + ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, NULL, &ext2_iomap_ops); up_read(&ei->dax_sem); if (vmf->flags & FAULT_FLAG_WRITE) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index a0ae27b1bc66..1c7cd882d998 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -314,7 +314,7 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf, } else { down_read(&EXT4_I(inode)->i_mmap_sem); } - result = dax_iomap_fault(vmf, pe_size, &pfn, &ext4_iomap_ops); + result = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &ext4_iomap_ops); if (write) { ext4_journal_stop(handle); /* Handling synchronous page fault? */ diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 8601275cc5e6..9ea08326f876 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1048,7 +1048,7 @@ __xfs_filemap_fault( if (IS_DAX(inode)) { pfn_t pfn; - ret = dax_iomap_fault(vmf, pe_size, &pfn, &xfs_iomap_ops); + ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops); if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); } else { diff --git a/include/linux/dax.h b/include/linux/dax.h index 5258346c558c..0185ecdae135 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -96,7 +96,7 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev); ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops); int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, - pfn_t *pfnp, const struct iomap_ops *ops); + pfn_t *pfnp, int *errp, const struct iomap_ops *ops); int dax_finish_sync_fault(struct vm_fault *vmf, enum page_entry_size pe_size, pfn_t pfn); int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index); -- 2.12.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
end of thread, other threads:[~2018-01-07 21:51 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-12-21 16:30 [PATCH 0/2 v2] ext4: Fix ENOSPC handling for DAX faults Jan Kara 2017-12-21 16:30 ` [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() Jan Kara 2017-12-21 17:12 ` Dan Williams 2018-01-02 18:54 ` Jan Kara 2018-01-02 18:57 ` Dan Williams 2017-12-21 17:16 ` Ross Zwisler 2017-12-21 16:30 ` [PATCH 2/2] ext4: Fix ENOSPC handling in DAX page fault handler Jan Kara 2017-12-21 17:17 ` Ross Zwisler -- strict thread matches above, loose matches on Subject: below -- 2018-01-03 10:08 [PATCH 0/2 v3] ext4: Fix ENOSPC handling for DAX faults Jan Kara 2018-01-03 10:08 ` [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() Jan Kara 2018-01-07 21:51 ` Theodore Ts'o 2017-12-13 9:13 [PATCH 0/2] ext4: Fix ENOSPC handling for DAX faults Jan Kara 2017-12-13 9:13 ` [PATCH 1/2] dax: Pass detailed error code from dax_iomap_fault() Jan Kara
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).