* [PATCH 0/3] mm, xfs: memory allocation improvements
From: Dave Chinner
Date: 2021-06-25 2:30 UTC
To: linux-xfs, linux-mm

Hi folks,

We're slowly trying to move the XFS code closer to the common memory
allocation routines everyone else uses. This patch set addresses a
regression from a previous set of changes (kmem_realloc() removal)
and removes another couple of kmem wrappers from within the XFS code.

The first patch addresses a regression: using large directory blocks
triggers a warning when calling krealloc() to recombine a buffer
split across two log records into a single contiguous memory buffer.
This results in krealloc() being called to allocate a 64kB buffer
with __GFP_NOFAIL set. The warning is trivially reproduced by
generic/040 and generic/041 when run with 64kB directory block sizes
on a 4kB page size machine.

Log recovery really needs to use kvmalloc() like all the other
"allocate up to 64kB and can't fail" cases in filesystem code (e.g.
for xattrs), but there's no native kvrealloc() function that provides
the necessary semantics. So rather than add a new wrapper, the first
patch adds this helper to the common code and converts XFS to use it
for log recovery.

The latter two patches remove what are essentially kvmalloc()
wrappers from XFS. With the more widespread use of
memalloc_nofs_save/restore(), we can call kvmalloc(GFP_KERNEL) and
have it do the right thing because GFP_NOFS contexts are now covered
by PF_MEMALLOC_NOFS task flags. Hence we don't need
kmem_alloc_large() anymore. And with the slab code guaranteeing at
least 512 byte alignment for sector and block sized heap allocations,
we no longer need the kmem_alloc_io() variant of kmem_alloc_large()
for IO buffers. So these wrappers can all go away...

Cheers,

Dave.
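To make the GFP_NOFS point concrete: once a task enters a
memalloc_nofs_save() scope, the page allocator strips __GFP_FS from
that task's allocations, so plain GFP_KERNEL callers implicitly
behave as GFP_NOFS. A minimal sketch of that pattern follows;
xfs_do_transaction() is a hypothetical stand-in for the real
transaction entry points, not an existing function:

#include <linux/sched/mm.h>     /* memalloc_nofs_save/restore() */
#include <linux/mm.h>           /* kvmalloc(), kvfree() */

/*
 * Hypothetical transaction wrapper. Inside the NOFS scope, any
 * GFP_KERNEL allocation is degraded to GFP_NOFS by the allocator,
 * so callees don't need NOFS-aware allocation wrappers.
 */
static int xfs_do_transaction(size_t bufsize)
{
        unsigned int nofs_flag;
        void *buf;

        nofs_flag = memalloc_nofs_save();

        /* Behaves as kvmalloc(bufsize, GFP_NOFS) inside this scope. */
        buf = kvmalloc(bufsize, GFP_KERNEL);
        if (!buf) {
                memalloc_nofs_restore(nofs_flag);
                return -ENOMEM;
        }

        /* ... do transactional work ... */

        kvfree(buf);
        memalloc_nofs_restore(nofs_flag);
        return 0;
}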
* [PATCH 1/3] mm: Add kvrealloc()
From: Dave Chinner
Date: 2021-06-25 2:30 UTC
To: linux-xfs, linux-mm

From: Dave Chinner <dchinner@redhat.com>

During log recovery of an XFS filesystem with 64kB directory
buffers, rebuilding a buffer split across two log records results
in a memory allocation warning from krealloc like this:

xfs filesystem being mounted at /mnt/scratch supports timestamps until 2038 (0x7fffffff)
XFS (dm-0): Unmounting Filesystem
XFS (dm-0): Mounting V5 Filesystem
XFS (dm-0): Starting recovery (logdev: internal)
------------[ cut here ]------------
WARNING: CPU: 5 PID: 3435170 at mm/page_alloc.c:3539 get_page_from_freelist+0xdee/0xe40
.....
RIP: 0010:get_page_from_freelist+0xdee/0xe40
Call Trace:
 ? complete+0x3f/0x50
 __alloc_pages+0x16f/0x300
 alloc_pages+0x87/0x110
 kmalloc_order+0x2c/0x90
 kmalloc_order_trace+0x1d/0x90
 __kmalloc_track_caller+0x215/0x270
 ? xlog_recover_add_to_cont_trans+0x63/0x1f0
 krealloc+0x54/0xb0
 xlog_recover_add_to_cont_trans+0x63/0x1f0
 xlog_recovery_process_trans+0xc1/0xd0
 xlog_recover_process_ophdr+0x86/0x130
 xlog_recover_process_data+0x9f/0x160
 xlog_recover_process+0xa2/0x120
 xlog_do_recovery_pass+0x40b/0x7d0
 ? __irq_work_queue_local+0x4f/0x60
 ? irq_work_queue+0x3a/0x50
 xlog_do_log_recovery+0x70/0x150
 xlog_do_recover+0x38/0x1d0
 xlog_recover+0xd8/0x170
 xfs_log_mount+0x181/0x300
 xfs_mountfs+0x4a1/0x9b0
 xfs_fs_fill_super+0x3c0/0x7b0
 get_tree_bdev+0x171/0x270
 ? suffix_kstrtoint.constprop.0+0xf0/0xf0
 xfs_fs_get_tree+0x15/0x20
 vfs_get_tree+0x24/0xc0
 path_mount+0x2f5/0xaf0
 __x64_sys_mount+0x108/0x140
 do_syscall_64+0x3a/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Essentially, we are taking a multi-order allocation from kmem_alloc()
(which has an open coded no fail, no warn loop) and then reallocating
it out to 64kB using krealloc(__GFP_NOFAIL), and that is then
triggering the above warning.

This is a regression caused by converting this code from an open
coded no fail/no warn reallocation loop to using __GFP_NOFAIL.

What we actually need here is kvrealloc(), so that if contiguous
page allocation fails we fall back to vmalloc() and we don't get
nasty warnings happening in XFS.

Fixes: 771915c4f688 ("xfs: remove kmem_realloc()")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_recover.c |  2 +-
 include/linux/mm.h       | 14 ++++++++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 1721fce2ec94..fee4fbadea0a 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2062,7 +2062,7 @@ xlog_recover_add_to_cont_trans(
 	old_ptr = item->ri_buf[item->ri_cnt-1].i_addr;
 	old_len = item->ri_buf[item->ri_cnt-1].i_len;
 
-	ptr = krealloc(old_ptr, len + old_len, GFP_KERNEL | __GFP_NOFAIL);
+	ptr = kvrealloc(old_ptr, old_len, len + old_len, GFP_KERNEL);
 	memcpy(&ptr[old_len], dp, len);
 	item->ri_buf[item->ri_cnt-1].i_len += len;
 	item->ri_buf[item->ri_cnt-1].i_addr = ptr;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8ae31622deef..34d88ff00f31 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -830,6 +830,20 @@ static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
 extern void kvfree(const void *addr);
 extern void kvfree_sensitive(const void *addr, size_t len);
 
+static inline void *kvrealloc(void *p, size_t oldsize, size_t newsize,
+		gfp_t flags)
+{
+	void *newp;
+
+	if (oldsize >= newsize)
+		return p;
+	newp = kvmalloc(newsize, flags);
+	memcpy(newp, p, oldsize);
+	kvfree(p);
+	return newp;
+}
+
+
 static inline int head_compound_mapcount(struct page *head)
 {
 	return atomic_read(compound_mapcount_ptr(head)) + 1;
-- 
2.31.1
* Re: [PATCH 1/3] mm: Add kvrealloc()
From: Miaohe Lin
Date: 2021-06-25 2:40 UTC
To: Dave Chinner, linux-xfs, linux-mm

On 2021/6/25 10:30, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> During log recovery of an XFS filesystem with 64kB directory
> buffers, rebuilding a buffer split across two log records results
> in a memory allocation warning from krealloc like this:
>
> [...]
>
> What we actually need here is kvrealloc(), so that if contiguous
> page allocation fails we fall back to vmalloc() and we don't get
> nasty warnings happening in XFS.
>
> Fixes: 771915c4f688 ("xfs: remove kmem_realloc()")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Thank you for your patch.

> +static inline void *kvrealloc(void *p, size_t oldsize, size_t newsize,
> +		gfp_t flags)
> +{
> +	void *newp;
> +
> +	if (oldsize >= newsize)
> +		return p;
> +	newp = kvmalloc(newsize, flags);

Shouldn't we check newp against NULL before memcpy?

Thanks.

> +	memcpy(newp, p, oldsize);
> +	kvfree(p);
> +	return newp;
> +}
* Re: [PATCH 1/3] mm: Add kvrealloc()
From: Dave Chinner
Date: 2021-06-25 5:05 UTC
To: Miaohe Lin
Cc: linux-xfs, linux-mm

On Fri, Jun 25, 2021 at 10:40:08AM +0800, Miaohe Lin wrote:
> On 2021/6/25 10:30, Dave Chinner wrote:
> > +static inline void *kvrealloc(void *p, size_t oldsize, size_t newsize,
> > +		gfp_t flags)
> > +{
> > +	void *newp;
> > +
> > +	if (oldsize >= newsize)
> > +		return p;
> > +	newp = kvmalloc(newsize, flags);
>
> Shouldn't we check newp against NULL before memcpy?

Sure, the flags we pass it from XFS mean it can't fail, but I guess
someone could add a noretry flag or something like that...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: [PATCH 1/3] mm: Add kvrealloc()
From: Matthew Wilcox
Date: 2021-06-25 3:54 UTC
To: Dave Chinner
Cc: linux-xfs, linux-mm

On Fri, Jun 25, 2021 at 12:30:27PM +1000, Dave Chinner wrote:
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8ae31622deef..34d88ff00f31 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -830,6 +830,20 @@ static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
>  extern void kvfree(const void *addr);
>  extern void kvfree_sensitive(const void *addr, size_t len);
>  
> +static inline void *kvrealloc(void *p, size_t oldsize, size_t newsize,
> +		gfp_t flags)
> +{
> +	void *newp;
> +
> +	if (oldsize >= newsize)
> +		return p;
> +	newp = kvmalloc(newsize, flags);
> +	memcpy(newp, p, oldsize);
> +	kvfree(p);
> +	return newp;
> +}

Why make this an inline function instead of putting it in mm/util.c?
* Re: [PATCH 1/3] mm: Add kvrealloc()
From: Dave Chinner
Date: 2021-06-25 5:02 UTC
To: Matthew Wilcox
Cc: linux-xfs, linux-mm

On Fri, Jun 25, 2021 at 04:54:11AM +0100, Matthew Wilcox wrote:
> On Fri, Jun 25, 2021 at 12:30:27PM +1000, Dave Chinner wrote:
> > +static inline void *kvrealloc(void *p, size_t oldsize, size_t newsize,
> > +		gfp_t flags)
> > [...]
>
> Why make this an inline function instead of putting it in mm/util.c?

Not fussed, I can do that.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
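Taken together, the two review comments point at an out-of-line
helper with a NULL check. A sketch of that shape, assuming the
signature stays as posted (the version eventually merged may differ
in detail):

/* include/linux/mm.h */
extern void *kvrealloc(void *p, size_t oldsize, size_t newsize,
		gfp_t flags);

/* mm/util.c */
void *kvrealloc(void *p, size_t oldsize, size_t newsize, gfp_t flags)
{
        void *newp;

        if (oldsize >= newsize)
                return p;
        newp = kvmalloc(newsize, flags);
        if (!newp)
                return NULL;    /* on failure, caller keeps ownership of p */
        memcpy(newp, p, oldsize);
        kvfree(p);
        return newp;
}
EXPORT_SYMBOL(kvrealloc);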
* [PATCH 2/3] xfs: remove kmem_alloc_io()
From: Dave Chinner
Date: 2021-06-25 2:30 UTC
To: linux-xfs, linux-mm

From: Dave Chinner <dchinner@redhat.com>

Since commit 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment
for kmalloc(power-of-two)"), the core slab code guarantees slab
alignment in all situations sufficient for IO purposes (i.e. a
minimum of 512 byte alignment for >= 512 byte sized heap
allocations), so we no longer need the workaround in the XFS code to
provide this guarantee.

Replace the use of kmem_alloc_io() with kmem_alloc() or
kmem_alloc_large() appropriately, and remove the kmem_alloc_io()
interface altogether.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/kmem.c            | 25 -------------------------
 fs/xfs/kmem.h            |  1 -
 fs/xfs/xfs_buf.c         |  3 +--
 fs/xfs/xfs_log.c         |  3 +--
 fs/xfs/xfs_log_recover.c |  4 +---
 fs/xfs/xfs_trace.h       |  1 -
 6 files changed, 3 insertions(+), 34 deletions(-)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index e986b95d94c9..3f2979fd2f2b 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -56,31 +56,6 @@ __kmem_vmalloc(size_t size, xfs_km_flags_t flags)
 	return ptr;
 }
 
-/*
- * Same as kmem_alloc_large, except we guarantee the buffer returned is aligned
- * to the @align_mask. We only guarantee alignment up to page size, we'll clamp
- * alignment at page size if it is larger. vmalloc always returns a PAGE_SIZE
- * aligned region.
- */
-void *
-kmem_alloc_io(size_t size, int align_mask, xfs_km_flags_t flags)
-{
-	void	*ptr;
-
-	trace_kmem_alloc_io(size, flags, _RET_IP_);
-
-	if (WARN_ON_ONCE(align_mask >= PAGE_SIZE))
-		align_mask = PAGE_SIZE - 1;
-
-	ptr = kmem_alloc(size, flags | KM_MAYFAIL);
-	if (ptr) {
-		if (!((uintptr_t)ptr & align_mask))
-			return ptr;
-		kfree(ptr);
-	}
-	return __kmem_vmalloc(size, flags);
-}
-
 void *
 kmem_alloc_large(size_t size, xfs_km_flags_t flags)
 {
diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index 38007117697e..9ff20047f8b8 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -57,7 +57,6 @@ kmem_flags_convert(xfs_km_flags_t flags)
 }
 
 extern void *kmem_alloc(size_t, xfs_km_flags_t);
-extern void *kmem_alloc_io(size_t size, int align_mask, xfs_km_flags_t flags);
 extern void *kmem_alloc_large(size_t size, xfs_km_flags_t);
 static inline void kmem_free(const void *ptr)
 {
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 8ff42b3585e0..a5ef1f9eb622 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -315,7 +315,6 @@ xfs_buf_alloc_kmem(
 	struct xfs_buf	*bp,
 	xfs_buf_flags_t	flags)
 {
-	int		align_mask = xfs_buftarg_dma_alignment(bp->b_target);
 	xfs_km_flags_t	kmflag_mask = KM_NOFS;
 	size_t		size = BBTOB(bp->b_length);
 
@@ -323,7 +322,7 @@ xfs_buf_alloc_kmem(
 	if (!(flags & XBF_READ))
 		kmflag_mask |= KM_ZERO;
 
-	bp->b_addr = kmem_alloc_io(size, align_mask, kmflag_mask);
+	bp->b_addr = kmem_alloc(size, kmflag_mask);
 	if (!bp->b_addr)
 		return -ENOMEM;
 
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index e93cac6b5378..404970a4343c 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1451,7 +1451,6 @@ xlog_alloc_log(
 	 */
 	ASSERT(log->l_iclog_size >= 4096);
 	for (i = 0; i < log->l_iclog_bufs; i++) {
-		int align_mask = xfs_buftarg_dma_alignment(mp->m_logdev_targp);
 		size_t bvec_size = howmany(log->l_iclog_size, PAGE_SIZE) *
 				sizeof(struct bio_vec);
 
@@ -1463,7 +1462,7 @@ xlog_alloc_log(
 		iclog->ic_prev = prev_iclog;
 		prev_iclog = iclog;
 
-		iclog->ic_data = kmem_alloc_io(log->l_iclog_size, align_mask,
+		iclog->ic_data = kmem_alloc_large(log->l_iclog_size,
 						KM_MAYFAIL | KM_ZERO);
 		if (!iclog->ic_data)
 			goto out_free_iclog;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index fee4fbadea0a..cc559815e08f 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -79,8 +79,6 @@ xlog_alloc_buffer(
 	struct xlog	*log,
 	int		nbblks)
 {
-	int align_mask = xfs_buftarg_dma_alignment(log->l_targ);
-
 	/*
 	 * Pass log block 0 since we don't have an addr yet, buffer will be
 	 * verified on read.
@@ -108,7 +106,7 @@ xlog_alloc_buffer(
 	if (nbblks > 1 && log->l_sectBBsize > 1)
 		nbblks += log->l_sectBBsize;
 	nbblks = round_up(nbblks, log->l_sectBBsize);
-	return kmem_alloc_io(BBTOB(nbblks), align_mask, KM_MAYFAIL | KM_ZERO);
+	return kmem_alloc_large(BBTOB(nbblks), KM_MAYFAIL | KM_ZERO);
 }
 
 /*
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index f9d8d605f9b1..6865e838a71b 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3689,7 +3689,6 @@ DEFINE_EVENT(xfs_kmem_class, name, \
 	TP_PROTO(ssize_t size, int flags, unsigned long caller_ip), \
 	TP_ARGS(size, flags, caller_ip))
 DEFINE_KMEM_EVENT(kmem_alloc);
-DEFINE_KMEM_EVENT(kmem_alloc_io);
 DEFINE_KMEM_EVENT(kmem_alloc_large);
 
 TRACE_EVENT(xfs_check_new_dalign,
-- 
2.31.1
* Re: [PATCH 2/3] xfs: remove kmem_alloc_io()
From: Darrick J. Wong
Date: 2021-06-26 2:01 UTC
To: Dave Chinner
Cc: linux-xfs, linux-mm

On Fri, Jun 25, 2021 at 12:30:28PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Since commit 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment
> for kmalloc(power-of-two)"), the core slab code guarantees slab
> alignment in all situations sufficient for IO purposes (i.e. a
> minimum of 512 byte alignment for >= 512 byte sized heap
> allocations), so we no longer need the workaround in the XFS code to
> provide this guarantee.
>
> Replace the use of kmem_alloc_io() with kmem_alloc() or
> kmem_alloc_large() appropriately, and remove the kmem_alloc_io()
> interface altogether.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> [...]
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 8ff42b3585e0..a5ef1f9eb622 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -315,7 +315,6 @@ xfs_buf_alloc_kmem(
>  	struct xfs_buf	*bp,
>  	xfs_buf_flags_t	flags)
>  {
> -	int		align_mask = xfs_buftarg_dma_alignment(bp->b_target);

Is xfs_buftarg_dma_alignment unused now?

-or-

Should we trust that the memory allocators will always maintain at
least the current alignment guarantees, or actually check the
alignment of the returned buffer if CONFIG_XFS_DEBUG=y?

--D

> [remainder of patch snipped]
* Re: [PATCH 2/3] xfs: remove kmem_alloc_io()
From: Dave Chinner
Date: 2021-06-27 22:09 UTC
To: Darrick J. Wong
Cc: linux-xfs, linux-mm

On Fri, Jun 25, 2021 at 07:01:45PM -0700, Darrick J. Wong wrote:
> On Fri, Jun 25, 2021 at 12:30:28PM +1000, Dave Chinner wrote:
> > [...]
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 8ff42b3585e0..a5ef1f9eb622 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -315,7 +315,6 @@ xfs_buf_alloc_kmem(
> >  	struct xfs_buf	*bp,
> >  	xfs_buf_flags_t	flags)
> >  {
> > -	int		align_mask = xfs_buftarg_dma_alignment(bp->b_target);
>
> Is xfs_buftarg_dma_alignment unused now?

It is unused, I'll remove it.

> -or-
>
> Should we trust that the memory allocators will always maintain at
> least the current alignment guarantees, or actually check the
> alignment of the returned buffer if CONFIG_XFS_DEBUG=y?

It's documented in Documentation/core-api/memory-allocation.rst that
the alignment of the memory for power of two sized allocations is
guaranteed. That means it's the responsibility of the mm developers
to test and ensure this API-defined behaviour does not regress. I
don't think it's viable for us to test every guarantee other
subsystems are supposed to provide us with, regardless of whether
CONFIG_XFS_DEBUG=y is set or not...

Unfortunately, I don't see any debug or test infrastructure that
ensures the allocation alignment guarantee is being met. Perhaps
that's something the mm developers need to address?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: [PATCH 2/3] xfs: remove kmem_alloc_io()
From: Darrick J. Wong
Date: 2021-06-27 23:14 UTC
To: Dave Chinner
Cc: linux-xfs, linux-mm

On Mon, Jun 28, 2021 at 08:09:46AM +1000, Dave Chinner wrote:
> On Fri, Jun 25, 2021 at 07:01:45PM -0700, Darrick J. Wong wrote:
> > [...]
> > Should we trust that the memory allocators will always maintain at
> > least the current alignment guarantees, or actually check the
> > alignment of the returned buffer if CONFIG_XFS_DEBUG=y?
>
> It's documented in Documentation/core-api/memory-allocation.rst that
> the alignment of the memory for power of two sized allocations is
> guaranteed. That means it's the responsibility of the mm developers
> to test and ensure this API-defined behaviour does not regress. I
> don't think it's viable for us to test every guarantee other
> subsystems are supposed to provide us with, regardless of whether
> CONFIG_XFS_DEBUG=y is set or not...
>
> Unfortunately, I don't see any debug or test infrastructure that
> ensures the allocation alignment guarantee is being met. Perhaps
> that's something the mm developers need to address?

That would be nice.

--D
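For what it's worth, the spot check discussed above would be small. A
hypothetical sketch of an XFS-side debug assertion follows;
xfs_assert_io_alignment() is an illustrative name, not an existing
helper, and nothing like it was added by this series:

/*
 * Hypothetical CONFIG_XFS_DEBUG-only check: verify that a heap
 * allocation of at least 512 bytes came back with the 512 byte
 * alignment the slab allocator has guaranteed since commit
 * 59bb47985c1d.
 */
static inline void xfs_assert_io_alignment(const void *ptr, size_t size)
{
#ifdef DEBUG
        if (size >= 512)
                ASSERT(!((uintptr_t)ptr & 511));
#endif
}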
* [PATCH 3/3] xfs: replace kmem_alloc_large() with kvmalloc()
From: Dave Chinner
Date: 2021-06-25 2:30 UTC
To: linux-xfs, linux-mm

From: Dave Chinner <dchinner@redhat.com>

There is no reason for this wrapper to exist anymore. All the places
that use KM_NOFS allocation are within transaction contexts and
hence covered by memalloc_nofs_save/restore contexts. Hence we don't
need any special handling of vmalloc for large IOs anymore and so
special casing this code isn't necessary.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/kmem.c                 | 39 -----------------------------------
 fs/xfs/kmem.h                 |  1 -
 fs/xfs/libxfs/xfs_attr_leaf.c |  2 +-
 fs/xfs/scrub/attr.c           | 14 +++++++------
 fs/xfs/scrub/attr.h           |  3 ---
 fs/xfs/xfs_log.c              |  4 ++--
 fs/xfs/xfs_log_cil.c          | 10 ++++++++-
 fs/xfs/xfs_log_recover.c      |  2 +-
 fs/xfs/xfs_trace.h            |  1 -
 9 files changed, 21 insertions(+), 55 deletions(-)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index 3f2979fd2f2b..6f49bf39183c 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -29,42 +29,3 @@ kmem_alloc(size_t size, xfs_km_flags_t flags)
 		congestion_wait(BLK_RW_ASYNC, HZ/50);
 	} while (1);
 }
-
-
-/*
- * __vmalloc() will allocate data pages and auxiliary structures (e.g.
- * pagetables) with GFP_KERNEL, yet we may be under GFP_NOFS context here. Hence
- * we need to tell memory reclaim that we are in such a context via
- * PF_MEMALLOC_NOFS to prevent memory reclaim re-entering the filesystem here
- * and potentially deadlocking.
- */
-static void *
-__kmem_vmalloc(size_t size, xfs_km_flags_t flags)
-{
-	unsigned nofs_flag = 0;
-	void	*ptr;
-	gfp_t	lflags = kmem_flags_convert(flags);
-
-	if (flags & KM_NOFS)
-		nofs_flag = memalloc_nofs_save();
-
-	ptr = __vmalloc(size, lflags);
-
-	if (flags & KM_NOFS)
-		memalloc_nofs_restore(nofs_flag);
-
-	return ptr;
-}
-
-void *
-kmem_alloc_large(size_t size, xfs_km_flags_t flags)
-{
-	void	*ptr;
-
-	trace_kmem_alloc_large(size, flags, _RET_IP_);
-
-	ptr = kmem_alloc(size, flags | KM_MAYFAIL);
-	if (ptr)
-		return ptr;
-	return __kmem_vmalloc(size, flags);
-}
diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index 9ff20047f8b8..54da6d717a06 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -57,7 +57,6 @@ kmem_flags_convert(xfs_km_flags_t flags)
 }
 
 extern void *kmem_alloc(size_t, xfs_km_flags_t);
-extern void *kmem_alloc_large(size_t size, xfs_km_flags_t);
 static inline void kmem_free(const void *ptr)
 {
 	kvfree(ptr);
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index b910bd209949..16d64872acc0 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -489,7 +489,7 @@ xfs_attr_copy_value(
 	}
 
 	if (!args->value) {
-		args->value = kmem_alloc_large(valuelen, KM_NOLOCKDEP);
+		args->value = kvmalloc(valuelen, GFP_KERNEL | __GFP_NOLOCKDEP);
 		if (!args->value)
 			return -ENOMEM;
 	}
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 552af0cf8482..6c36af6dbd35 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -25,11 +25,11 @@
 * reallocating the buffer if necessary. Buffer contents are not preserved
 * across a reallocation.
 */
-int
+static int
 xchk_setup_xattr_buf(
 	struct xfs_scrub	*sc,
 	size_t			value_size,
-	xfs_km_flags_t		flags)
+	gfp_t			flags)
 {
 	size_t			sz;
 	struct xchk_xattr_buf	*ab = sc->buf;
@@ -57,7 +57,7 @@ xchk_setup_xattr_buf(
 	 * Don't zero the buffer upon allocation to avoid runtime overhead.
 	 * All users must be careful never to read uninitialized contents.
 	 */
-	ab = kmem_alloc_large(sizeof(*ab) + sz, flags);
+	ab = kvmalloc(sizeof(*ab) + sz, flags);
 	if (!ab)
 		return -ENOMEM;
 
@@ -79,7 +79,7 @@ xchk_setup_xattr(
 	 * without the inode lock held, which means we can sleep.
 	 */
 	if (sc->flags & XCHK_TRY_HARDER) {
-		error = xchk_setup_xattr_buf(sc, XATTR_SIZE_MAX, 0);
+		error = xchk_setup_xattr_buf(sc, XATTR_SIZE_MAX, GFP_KERNEL);
 		if (error)
 			return error;
 	}
@@ -138,7 +138,8 @@ xchk_xattr_listent(
 	 * doesn't work, we overload the seen_enough variable to convey
 	 * the error message back to the main scrub function.
 	 */
-	error = xchk_setup_xattr_buf(sx->sc, valuelen, KM_MAYFAIL);
+	error = xchk_setup_xattr_buf(sx->sc, valuelen,
+			GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 	if (error == -ENOMEM)
 		error = -EDEADLOCK;
 	if (error) {
@@ -323,7 +324,8 @@ xchk_xattr_block(
 		return 0;
 
 	/* Allocate memory for block usage checking. */
-	error = xchk_setup_xattr_buf(ds->sc, 0, KM_MAYFAIL);
+	error = xchk_setup_xattr_buf(ds->sc, 0,
+			GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 	if (error == -ENOMEM)
 		return -EDEADLOCK;
 	if (error)
diff --git a/fs/xfs/scrub/attr.h b/fs/xfs/scrub/attr.h
index 13a1d2e8424d..1719e1c4da59 100644
--- a/fs/xfs/scrub/attr.h
+++ b/fs/xfs/scrub/attr.h
@@ -65,7 +65,4 @@ xchk_xattr_dstmap(
 			BITS_TO_LONGS(sc->mp->m_attr_geo->blksize);
 }
 
-int xchk_setup_xattr_buf(struct xfs_scrub *sc, size_t value_size,
-		xfs_km_flags_t flags);
-
 #endif	/* __XFS_SCRUB_ATTR_H__ */
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 404970a4343c..6cc51b0cb99e 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1462,8 +1462,8 @@ xlog_alloc_log(
 		iclog->ic_prev = prev_iclog;
 		prev_iclog = iclog;
 
-		iclog->ic_data = kmem_alloc_large(log->l_iclog_size,
-						KM_MAYFAIL | KM_ZERO);
+		iclog->ic_data = kvzalloc(log->l_iclog_size,
+						GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 		if (!iclog->ic_data)
 			goto out_free_iclog;
 #ifdef DEBUG
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 3c2b1205944d..bf9d747352df 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -185,7 +185,15 @@ xlog_cil_alloc_shadow_bufs(
 			 */
 			kmem_free(lip->li_lv_shadow);
 
-			lv = kmem_alloc_large(buf_size, KM_NOFS);
+			/*
+			 * We are in transaction context, which means this
+			 * allocation will pick up GFP_NOFS from the
+			 * memalloc_nofs_save/restore context the transaction
+			 * holds. This means we can use GFP_KERNEL here so the
+			 * generic kvmalloc() code will run vmalloc on
+			 * contiguous page allocation failure as we require.
+			 */
+			lv = kvmalloc(buf_size, GFP_KERNEL);
 			memset(lv, 0, xlog_cil_iovec_space(niovecs));
 
 			lv->lv_item = lip;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index cc559815e08f..1a291bf50776 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -106,7 +106,7 @@ xlog_alloc_buffer(
 	if (nbblks > 1 && log->l_sectBBsize > 1)
 		nbblks += log->l_sectBBsize;
 	nbblks = round_up(nbblks, log->l_sectBBsize);
-	return kmem_alloc_large(BBTOB(nbblks), KM_MAYFAIL | KM_ZERO);
+	return kvzalloc(BBTOB(nbblks), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 }
 
 /*
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 6865e838a71b..38f2f67303f7 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3689,7 +3689,6 @@ DEFINE_EVENT(xfs_kmem_class, name, \
 	TP_PROTO(ssize_t size, int flags, unsigned long caller_ip), \
 	TP_ARGS(size, flags, caller_ip))
 DEFINE_KMEM_EVENT(kmem_alloc);
-DEFINE_KMEM_EVENT(kmem_alloc_large);
 
 TRACE_EVENT(xfs_check_new_dalign,
	TP_PROTO(struct xfs_mount *mp, int new_dalign, xfs_ino_t calc_rootino),
-- 
2.31.1
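The reason GFP_KERNEL matters in the CIL hunk above is kvmalloc()'s
fallback policy: only GFP_KERNEL-compatible flag sets are allowed to
fall through to vmalloc(), because vmalloc() itself allocates page
tables with GFP_KERNEL. Roughly, and with several details elided, the
mm-side logic at the time looked like this (a simplified sketch of
kvmalloc_node() from mm/util.c, not a verbatim copy):

void *kvmalloc_node(size_t size, gfp_t flags, int node)
{
        gfp_t kmalloc_flags = flags;
        void *ret;

        /*
         * vmalloc needs GFP_KERNEL for its internal allocations, so
         * incompatible flag sets get the kmalloc path only.
         */
        if ((flags & GFP_KERNEL) != GFP_KERNEL)
                return kmalloc_node(size, flags, node);

        /* Try a cheap physically contiguous allocation first. */
        if (size > PAGE_SIZE) {
                kmalloc_flags |= __GFP_NOWARN;
                if (!(kmalloc_flags & __GFP_RETRY_MAYFAIL))
                        kmalloc_flags |= __GFP_NORETRY;
        }

        ret = kmalloc_node(size, kmalloc_flags, node);
        if (ret || size <= PAGE_SIZE)
                return ret;

        /* Fall back to virtually contiguous memory. */
        return __vmalloc_node(size, 1, flags, node,
                        __builtin_return_address(0));
}

This also appears to be why the scrub conversions pass
__GFP_RETRY_MAYFAIL: the kmalloc attempt tries hard without invoking
the OOM killer, yet can still fail over to vmalloc(), preserving the
old KM_MAYFAIL behaviour.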
* Re: [PATCH 3/3] xfs: replace kmem_alloc_large() with kvmalloc()
From: Darrick J. Wong
Date: 2021-06-26 2:06 UTC
To: Dave Chinner
Cc: linux-xfs, linux-mm

On Fri, Jun 25, 2021 at 12:30:29PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> There is no reason for this wrapper to exist anymore. All the places
> that use KM_NOFS allocation are within transaction contexts and
> hence covered by memalloc_nofs_save/restore contexts. Hence we don't
> need any special handling of vmalloc for large IOs anymore and so
> special casing this code isn't necessary.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks pretty straightforward,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> [remainder of patch snipped]