* [PATCH 0/2] xfs: fix some new memory allocation failures
@ 2013-09-02 10:52 Dave Chinner
2013-09-02 10:52 ` [PATCH 1/2] xfs: fix memory allocation failures with ACLs Dave Chinner
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Dave Chinner @ 2013-09-02 10:52 UTC (permalink / raw)
To: xfs
Hi folks,
These failures are a result of order-4 allocations being done on v5
filesystems to support the large ACL count xattrs. The first patch
puts out usual falbback to vmalloc workaround in place. The second
patch factors all the places we now have this fallback-to-vmalloc
and makes it transparent to the callers.
Cheers,
Dave.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH 1/2] xfs: fix memory allocation failures with ACLs 2013-09-02 10:52 [PATCH 0/2] xfs: fix some new memory allocation failures Dave Chinner @ 2013-09-02 10:52 ` Dave Chinner 2013-09-06 19:59 ` Mark Tinguely 2013-09-02 10:53 ` [PATCH 2/2] xfs: factor all the kmalloc-or-vmalloc fallback allocations Dave Chinner ` (2 subsequent siblings) 3 siblings, 1 reply; 13+ messages in thread From: Dave Chinner @ 2013-09-02 10:52 UTC (permalink / raw) To: xfs From: Dave Chinner <dchinner@redhat.com> Ever since increasing the number of supported ACLs from 25 to as many as can fit in an xattr, there have been reports of order 4 memory allocations failing in the ACL code. Fix it in the same way we've fixed all the xattr read/write code that has the same problem. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_acl.c | 32 ++++++++++++++++++++++---------- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c index 6951896..4ea73cc 100644 --- a/fs/xfs/xfs_acl.c +++ b/fs/xfs/xfs_acl.c @@ -152,9 +152,12 @@ xfs_get_acl(struct inode *inode, int type) * go out to the disk. */ len = XFS_ACL_MAX_SIZE(ip->i_mount); - xfs_acl = kzalloc(len, GFP_KERNEL); - if (!xfs_acl) - return ERR_PTR(-ENOMEM); + xfs_acl = kmem_zalloc(len, KM_SLEEP | KM_MAYFAIL); + if (!xfs_acl) { + xfs_acl = kmem_zalloc_large(len); + if (!xfs_acl) + return ERR_PTR(-ENOMEM); + } error = -xfs_attr_get(ip, ea_name, (unsigned char *)xfs_acl, &len, ATTR_ROOT); @@ -175,10 +178,13 @@ xfs_get_acl(struct inode *inode, int type) if (IS_ERR(acl)) goto out; - out_update_cache: +out_update_cache: set_cached_acl(inode, type, acl); - out: - kfree(xfs_acl); +out: + if (is_vmalloc_addr(xfs_acl)) + kmem_free_large(xfs_acl); + else + kfree(xfs_acl); return acl; } @@ -209,9 +215,12 @@ xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl) struct xfs_acl *xfs_acl; int len = XFS_ACL_MAX_SIZE(ip->i_mount); - xfs_acl = kzalloc(len, GFP_KERNEL); - if (!xfs_acl) - return -ENOMEM; + xfs_acl = kmem_zalloc(len, KM_SLEEP | KM_MAYFAIL); + if (!xfs_acl) { + xfs_acl = kmem_zalloc_large(len); + if (!xfs_acl) + return -ENOMEM; + } xfs_acl_to_disk(xfs_acl, acl); @@ -222,7 +231,10 @@ xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl) error = -xfs_attr_set(ip, ea_name, (unsigned char *)xfs_acl, len, ATTR_ROOT); - kfree(xfs_acl); + if (is_vmalloc_addr(xfs_acl)) + kmem_free_large(xfs_acl); + else + kfree(xfs_acl); } else { /* * A NULL ACL argument means we want to remove the ACL. -- 1.8.3.2 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] xfs: fix memory allocation failures with ACLs 2013-09-02 10:52 ` [PATCH 1/2] xfs: fix memory allocation failures with ACLs Dave Chinner @ 2013-09-06 19:59 ` Mark Tinguely 0 siblings, 0 replies; 13+ messages in thread From: Mark Tinguely @ 2013-09-06 19:59 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 09/02/13 05:52, Dave Chinner wrote: > From: Dave Chinner<dchinner@redhat.com> > > Ever since increasing the number of supported ACLs from 25 to as > many as can fit in an xattr, there have been reports of order 4 > memory allocations failing in the ACL code. Fix it in the same way > we've fixed all the xattr read/write code that has the same problem. > > Signed-off-by: Dave Chinner<dchinner@redhat.com> > --- Looks good. Reviewed-by: Mark Tinguely <tinguely@sgi.com> _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/2] xfs: factor all the kmalloc-or-vmalloc fallback allocations 2013-09-02 10:52 [PATCH 0/2] xfs: fix some new memory allocation failures Dave Chinner 2013-09-02 10:52 ` [PATCH 1/2] xfs: fix memory allocation failures with ACLs Dave Chinner @ 2013-09-02 10:53 ` Dave Chinner 2013-09-06 20:37 ` Mark Tinguely 2013-09-02 17:03 ` [PATCH 0/2] xfs: fix some new memory allocation failures Mark Tinguely 2013-09-10 22:40 ` Ben Myers 3 siblings, 1 reply; 13+ messages in thread From: Dave Chinner @ 2013-09-02 10:53 UTC (permalink / raw) To: xfs From: Dave Chinner <dchinner@redhat.com> We have quite a few places now where we do: x = kmem_zalloc(large size) if (!x) x = kmem_zalloc_large(large size) and do a similar dance when freeing the memory. kmem_free() already does the correct freeing dance, and kmem_zalloc_large() is only ever called in these constructs, so just factor it all into kmem_zalloc_large() and kmem_free(). Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/kmem.c | 15 ++++++++++++--- fs/xfs/kmem.h | 9 +-------- fs/xfs/xfs_acl.c | 28 ++++++++-------------------- fs/xfs/xfs_bmap_util.c | 15 ++++----------- fs/xfs/xfs_ioctl.c | 34 +++++++++++----------------------- fs/xfs/xfs_ioctl32.c | 18 ++++++------------ fs/xfs/xfs_itable.c | 2 +- 7 files changed, 43 insertions(+), 78 deletions(-) diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c index 4a7286c..a02cfb9 100644 --- a/fs/xfs/kmem.c +++ b/fs/xfs/kmem.c @@ -27,8 +27,6 @@ /* * Greedy allocation. May fail and may return vmalloced memory. - * - * Must be freed using kmem_free_large. */ void * kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) @@ -36,7 +34,7 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) void *ptr; size_t kmsize = maxsize; - while (!(ptr = kmem_zalloc_large(kmsize))) { + while (!(ptr = vzalloc(kmsize))) { if ((kmsize >>= 1) <= minsize) kmsize = minsize; } @@ -75,6 +73,17 @@ kmem_zalloc(size_t size, xfs_km_flags_t flags) return ptr; } +void * +kmem_zalloc_large(size_t size, xfs_km_flags_t flags) +{ + void *ptr; + + ptr = kmem_zalloc(size, flags | KM_MAYFAIL); + if (ptr) + return ptr; + return vzalloc(size); +} + void kmem_free(const void *ptr) { diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h index b2f2620..3a7371c 100644 --- a/fs/xfs/kmem.h +++ b/fs/xfs/kmem.h @@ -57,17 +57,10 @@ kmem_flags_convert(xfs_km_flags_t flags) extern void *kmem_alloc(size_t, xfs_km_flags_t); extern void *kmem_zalloc(size_t, xfs_km_flags_t); +extern void *kmem_zalloc_large(size_t size, xfs_km_flags_t); extern void *kmem_realloc(const void *, size_t, size_t, xfs_km_flags_t); extern void kmem_free(const void *); -static inline void *kmem_zalloc_large(size_t size) -{ - return vzalloc(size); -} -static inline void kmem_free_large(void *ptr) -{ - vfree(ptr); -} extern void *kmem_zalloc_greedy(size_t *, size_t, size_t); diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c index 4ea73cc..0e2f37e 100644 --- a/fs/xfs/xfs_acl.c +++ b/fs/xfs/xfs_acl.c @@ -152,12 +152,9 @@ xfs_get_acl(struct inode *inode, int type) * go out to the disk. */ len = XFS_ACL_MAX_SIZE(ip->i_mount); - xfs_acl = kmem_zalloc(len, KM_SLEEP | KM_MAYFAIL); - if (!xfs_acl) { - xfs_acl = kmem_zalloc_large(len); - if (!xfs_acl) - return ERR_PTR(-ENOMEM); - } + xfs_acl = kmem_zalloc_large(len, KM_SLEEP); + if (!xfs_acl) + return ERR_PTR(-ENOMEM); error = -xfs_attr_get(ip, ea_name, (unsigned char *)xfs_acl, &len, ATTR_ROOT); @@ -181,10 +178,7 @@ xfs_get_acl(struct inode *inode, int type) out_update_cache: set_cached_acl(inode, type, acl); out: - if (is_vmalloc_addr(xfs_acl)) - kmem_free_large(xfs_acl); - else - kfree(xfs_acl); + kmem_free(xfs_acl); return acl; } @@ -215,12 +209,9 @@ xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl) struct xfs_acl *xfs_acl; int len = XFS_ACL_MAX_SIZE(ip->i_mount); - xfs_acl = kmem_zalloc(len, KM_SLEEP | KM_MAYFAIL); - if (!xfs_acl) { - xfs_acl = kmem_zalloc_large(len); - if (!xfs_acl) - return -ENOMEM; - } + xfs_acl = kmem_zalloc_large(len, KM_SLEEP); + if (!xfs_acl) + return -ENOMEM; xfs_acl_to_disk(xfs_acl, acl); @@ -231,10 +222,7 @@ xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl) error = -xfs_attr_set(ip, ea_name, (unsigned char *)xfs_acl, len, ATTR_ROOT); - if (is_vmalloc_addr(xfs_acl)) - kmem_free_large(xfs_acl); - else - kfree(xfs_acl); + kmem_free(xfs_acl); } else { /* * A NULL ACL argument means we want to remove the ACL. diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index c6dc551..97f952c 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -612,13 +612,9 @@ xfs_getbmap( if (bmv->bmv_count > ULONG_MAX / sizeof(struct getbmapx)) return XFS_ERROR(ENOMEM); - out = kmem_zalloc(bmv->bmv_count * sizeof(struct getbmapx), KM_MAYFAIL); - if (!out) { - out = kmem_zalloc_large(bmv->bmv_count * - sizeof(struct getbmapx)); - if (!out) - return XFS_ERROR(ENOMEM); - } + out = kmem_zalloc_large(bmv->bmv_count * sizeof(struct getbmapx), 0); + if (!out) + return XFS_ERROR(ENOMEM); xfs_ilock(ip, XFS_IOLOCK_SHARED); if (whichfork == XFS_DATA_FORK && !(iflags & BMV_IF_DELALLOC)) { @@ -754,10 +750,7 @@ xfs_getbmap( break; } - if (is_vmalloc_addr(out)) - kmem_free_large(out); - else - kmem_free(out); + kmem_free(out); return error; } diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 21d9c9df..668e8f4 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -456,12 +456,9 @@ xfs_attrlist_by_handle( if (IS_ERR(dentry)) return PTR_ERR(dentry); - kbuf = kmem_zalloc(al_hreq.buflen, KM_SLEEP | KM_MAYFAIL); - if (!kbuf) { - kbuf = kmem_zalloc_large(al_hreq.buflen); - if (!kbuf) - goto out_dput; - } + kbuf = kmem_zalloc_large(al_hreq.buflen, KM_SLEEP); + if (!kbuf) + goto out_dput; cursor = (attrlist_cursor_kern_t *)&al_hreq.pos; error = -xfs_attr_list(XFS_I(dentry->d_inode), kbuf, al_hreq.buflen, @@ -472,12 +469,9 @@ xfs_attrlist_by_handle( if (copy_to_user(al_hreq.buffer, kbuf, al_hreq.buflen)) error = -EFAULT; - out_kfree: - if (is_vmalloc_addr(kbuf)) - kmem_free_large(kbuf); - else - kmem_free(kbuf); - out_dput: +out_kfree: + kmem_free(kbuf); +out_dput: dput(dentry); return error; } @@ -495,12 +489,9 @@ xfs_attrmulti_attr_get( if (*len > XATTR_SIZE_MAX) return EINVAL; - kbuf = kmem_zalloc(*len, KM_SLEEP | KM_MAYFAIL); - if (!kbuf) { - kbuf = kmem_zalloc_large(*len); - if (!kbuf) - return ENOMEM; - } + kbuf = kmem_zalloc_large(*len, KM_SLEEP); + if (!kbuf) + return ENOMEM; error = xfs_attr_get(XFS_I(inode), name, kbuf, (int *)len, flags); if (error) @@ -509,11 +500,8 @@ xfs_attrmulti_attr_get( if (copy_to_user(ubuf, kbuf, *len)) error = EFAULT; - out_kfree: - if (is_vmalloc_addr(kbuf)) - kmem_free_large(kbuf); - else - kmem_free(kbuf); +out_kfree: + kmem_free(kbuf); return error; } diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c index d3ab953..f671f7e 100644 --- a/fs/xfs/xfs_ioctl32.c +++ b/fs/xfs/xfs_ioctl32.c @@ -371,12 +371,9 @@ xfs_compat_attrlist_by_handle( return PTR_ERR(dentry); error = -ENOMEM; - kbuf = kmem_zalloc(al_hreq.buflen, KM_SLEEP | KM_MAYFAIL); - if (!kbuf) { - kbuf = kmem_zalloc_large(al_hreq.buflen); - if (!kbuf) - goto out_dput; - } + kbuf = kmem_zalloc_large(al_hreq.buflen, KM_SLEEP); + if (!kbuf) + goto out_dput; cursor = (attrlist_cursor_kern_t *)&al_hreq.pos; error = -xfs_attr_list(XFS_I(dentry->d_inode), kbuf, al_hreq.buflen, @@ -387,12 +384,9 @@ xfs_compat_attrlist_by_handle( if (copy_to_user(compat_ptr(al_hreq.buffer), kbuf, al_hreq.buflen)) error = -EFAULT; - out_kfree: - if (is_vmalloc_addr(kbuf)) - kmem_free_large(kbuf); - else - kmem_free(kbuf); - out_dput: +out_kfree: + kmem_free(kbuf); +out_dput: dput(dentry); return error; } diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index 8a67d53..084b3e1 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -495,7 +495,7 @@ xfs_bulkstat( /* * Done, we're either out of filesystem or space to put the data. */ - kmem_free_large(irbuf); + kmem_free(irbuf); *ubcountp = ubelem; /* * Found some inodes, return them now and return the error next time. -- 1.8.3.2 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] xfs: factor all the kmalloc-or-vmalloc fallback allocations 2013-09-02 10:53 ` [PATCH 2/2] xfs: factor all the kmalloc-or-vmalloc fallback allocations Dave Chinner @ 2013-09-06 20:37 ` Mark Tinguely 0 siblings, 0 replies; 13+ messages in thread From: Mark Tinguely @ 2013-09-06 20:37 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 09/02/13 05:53, Dave Chinner wrote: > From: Dave Chinner<dchinner@redhat.com> > > We have quite a few places now where we do: > > x = kmem_zalloc(large size) > if (!x) > x = kmem_zalloc_large(large size) > > and do a similar dance when freeing the memory. kmem_free() already > does the correct freeing dance, and kmem_zalloc_large() is only ever > called in these constructs, so just factor it all into > kmem_zalloc_large() and kmem_free(). > > Signed-off-by: Dave Chinner<dchinner@redhat.com> > --- Good start on the allocation changes. Reviewed-by: Mark Tinguely <tinguely@sgi.com> _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] xfs: fix some new memory allocation failures 2013-09-02 10:52 [PATCH 0/2] xfs: fix some new memory allocation failures Dave Chinner 2013-09-02 10:52 ` [PATCH 1/2] xfs: fix memory allocation failures with ACLs Dave Chinner 2013-09-02 10:53 ` [PATCH 2/2] xfs: factor all the kmalloc-or-vmalloc fallback allocations Dave Chinner @ 2013-09-02 17:03 ` Mark Tinguely 2013-09-02 22:20 ` Dave Chinner 2013-09-10 22:40 ` Ben Myers 3 siblings, 1 reply; 13+ messages in thread From: Mark Tinguely @ 2013-09-02 17:03 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 09/02/13 05:52, Dave Chinner wrote: > Hi folks, > > These failures are a result of order-4 allocations being done on v5 > filesystems to support the large ACL count xattrs. The first patch > puts out usual falbback to vmalloc workaround in place. The second > patch factors all the places we now have this fallback-to-vmalloc > and makes it transparent to the callers. > > Cheers, > > Dave. Thanks for clean up. Broken record time: Do we really need order allocation in the filesystem? Esp in xfs_ioctl.c. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] xfs: fix some new memory allocation failures 2013-09-02 17:03 ` [PATCH 0/2] xfs: fix some new memory allocation failures Mark Tinguely @ 2013-09-02 22:20 ` Dave Chinner 2013-09-03 13:07 ` Mark Tinguely 0 siblings, 1 reply; 13+ messages in thread From: Dave Chinner @ 2013-09-02 22:20 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Mon, Sep 02, 2013 at 12:03:37PM -0500, Mark Tinguely wrote: > On 09/02/13 05:52, Dave Chinner wrote: > >Hi folks, > > > >These failures are a result of order-4 allocations being done on v5 > >filesystems to support the large ACL count xattrs. The first patch > >puts out usual falbback to vmalloc workaround in place. The second > >patch factors all the places we now have this fallback-to-vmalloc > >and makes it transparent to the callers. > > > >Cheers, > > > >Dave. > > Thanks for clean up. Broken record time: Do we really need order > allocation in the filesystem? Esp in xfs_ioctl.c. I don't understand your question. Are you asking why we need high order allocation? Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] xfs: fix some new memory allocation failures 2013-09-02 22:20 ` Dave Chinner @ 2013-09-03 13:07 ` Mark Tinguely 2013-09-03 20:04 ` Dave Chinner 0 siblings, 1 reply; 13+ messages in thread From: Mark Tinguely @ 2013-09-03 13:07 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 09/02/13 17:20, Dave Chinner wrote: > On Mon, Sep 02, 2013 at 12:03:37PM -0500, Mark Tinguely wrote: >> On 09/02/13 05:52, Dave Chinner wrote: >>> Hi folks, >>> >>> These failures are a result of order-4 allocations being done on v5 >>> filesystems to support the large ACL count xattrs. The first patch >>> puts out usual falbback to vmalloc workaround in place. The second >>> patch factors all the places we now have this fallback-to-vmalloc >>> and makes it transparent to the callers. >>> >>> Cheers, >>> >>> Dave. >> >> Thanks for clean up. Broken record time: Do we really need order >> allocation in the filesystem? Esp in xfs_ioctl.c. > > I don't understand your question. Are you asking why we need high > order allocation? > > Cheers, > > Dave. In patch 2, why not drop the physically contiguous allocation attempt and just do the virtually contiguous allocation? Things that now call kmem_zalloc_large() do not need a physically contiguous memory, it will simplify the allocation, it will leave the physically contiguous pieces for other Linux code that really need it. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] xfs: fix some new memory allocation failures 2013-09-03 13:07 ` Mark Tinguely @ 2013-09-03 20:04 ` Dave Chinner 2013-09-03 20:46 ` Mark Tinguely 0 siblings, 1 reply; 13+ messages in thread From: Dave Chinner @ 2013-09-03 20:04 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Tue, Sep 03, 2013 at 08:07:19AM -0500, Mark Tinguely wrote: > On 09/02/13 17:20, Dave Chinner wrote: > >On Mon, Sep 02, 2013 at 12:03:37PM -0500, Mark Tinguely wrote: > >>On 09/02/13 05:52, Dave Chinner wrote: > >>>Hi folks, > >>> > >>>These failures are a result of order-4 allocations being done on v5 > >>>filesystems to support the large ACL count xattrs. The first patch > >>>puts out usual falbback to vmalloc workaround in place. The second > >>>patch factors all the places we now have this fallback-to-vmalloc > >>>and makes it transparent to the callers. > >>> > >>>Cheers, > >>> > >>>Dave. > >> > >>Thanks for clean up. Broken record time: Do we really need order > >>allocation in the filesystem? Esp in xfs_ioctl.c. > > > >I don't understand your question. Are you asking why we need high > >order allocation? > > > >Cheers, > > > >Dave. > > In patch 2, why not drop the physically contiguous allocation > attempt and just do the virtually contiguous allocation? Because: a) virtual memory space is extremely limited on some platforms - we regularly get people reporting that they've exhausted vmalloc space on 32 bit systems. b) when there is free contiguous memory, allocating that contiguous memory is much faster than allocating virtual memory. c) virtual memory access is slower than physical memory access and it puts pressure on the page tables. IOWs, we want to avoid allocating virtual memory if at all possible. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] xfs: fix some new memory allocation failures 2013-09-03 20:04 ` Dave Chinner @ 2013-09-03 20:46 ` Mark Tinguely 2013-09-03 21:16 ` Dave Chinner 0 siblings, 1 reply; 13+ messages in thread From: Mark Tinguely @ 2013-09-03 20:46 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 09/03/13 15:04, Dave Chinner wrote: > On Tue, Sep 03, 2013 at 08:07:19AM -0500, Mark Tinguely wrote: >> On 09/02/13 17:20, Dave Chinner wrote: >>> On Mon, Sep 02, 2013 at 12:03:37PM -0500, Mark Tinguely wrote: >>>> On 09/02/13 05:52, Dave Chinner wrote: >>>>> Hi folks, >>>>> >>>>> These failures are a result of order-4 allocations being done on v5 >>>>> filesystems to support the large ACL count xattrs. The first patch >>>>> puts out usual falbback to vmalloc workaround in place. The second >>>>> patch factors all the places we now have this fallback-to-vmalloc >>>>> and makes it transparent to the callers. >>>>> >>>>> Cheers, >>>>> >>>>> Dave. >>>> >>>> Thanks for clean up. Broken record time: Do we really need order >>>> allocation in the filesystem? Esp in xfs_ioctl.c. >>> >>> I don't understand your question. Are you asking why we need high >>> order allocation? >>> >>> Cheers, >>> >>> Dave. >> >> In patch 2, why not drop the physically contiguous allocation >> attempt and just do the virtually contiguous allocation? > > Because: > > a) virtual memory space is extremely limited on some > platforms - we regularly get people reporting that they've > exhausted vmalloc space on 32 bit systems. > b) when there is free contiguous memory, allocating that > contiguous memory is much faster than allocating > virtual memory. > c) virtual memory access is slower than physical memory > access and it puts pressure on the page tables. > > IOWs, we want to avoid allocating virtual memory if at all possible. > > Cheers, > > Dave. Ummm, It is all virtual memory it all runs through page tables. The MMU works on virtual addresses. It appears Linux has a special range of kernel virtual memory for the physical contiguous allocations and range for sparse memory allocation. XFS does not need the physical space that backs the kernel virtual address to be contiguous - other parts of the kernel do. Why put pressure on the drivers that need order allocations when we do not need it? --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] xfs: fix some new memory allocation failures 2013-09-03 20:46 ` Mark Tinguely @ 2013-09-03 21:16 ` Dave Chinner 2013-09-03 22:38 ` Mark Tinguely 0 siblings, 1 reply; 13+ messages in thread From: Dave Chinner @ 2013-09-03 21:16 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Tue, Sep 03, 2013 at 03:46:24PM -0500, Mark Tinguely wrote: > On 09/03/13 15:04, Dave Chinner wrote: > >On Tue, Sep 03, 2013 at 08:07:19AM -0500, Mark Tinguely wrote: > >>On 09/02/13 17:20, Dave Chinner wrote: > >>>On Mon, Sep 02, 2013 at 12:03:37PM -0500, Mark Tinguely wrote: > >>>>On 09/02/13 05:52, Dave Chinner wrote: > >>>>>Hi folks, > >>>>> > >>>>>These failures are a result of order-4 allocations being done on v5 > >>>>>filesystems to support the large ACL count xattrs. The first patch > >>>>>puts out usual falbback to vmalloc workaround in place. The second > >>>>>patch factors all the places we now have this fallback-to-vmalloc > >>>>>and makes it transparent to the callers. > >>>>> > >>>>>Cheers, > >>>>> > >>>>>Dave. > >>>> > >>>>Thanks for clean up. Broken record time: Do we really need order > >>>>allocation in the filesystem? Esp in xfs_ioctl.c. > >>> > >>>I don't understand your question. Are you asking why we need high > >>>order allocation? > >>> > >>>Cheers, > >>> > >>>Dave. > >> > >>In patch 2, why not drop the physically contiguous allocation > >>attempt and just do the virtually contiguous allocation? > > > >Because: > > > > a) virtual memory space is extremely limited on some > > platforms - we regularly get people reporting that they've > > exhausted vmalloc space on 32 bit systems. > > b) when there is free contiguous memory, allocating that > > contiguous memory is much faster than allocating > > virtual memory. > > c) virtual memory access is slower than physical memory > > access and it puts pressure on the page tables. > > > >IOWs, we want to avoid allocating virtual memory if at all possible. > > > >Cheers, > > > >Dave. > > Ummm, It is all virtual memory it all runs through page tables. The > MMU works on virtual addresses. Sure, but there's a massive difference between kmalloc and vmalloc()'d memory... > It appears Linux has a special range of kernel virtual memory for > the physical contiguous allocations and range for sparse memory > allocation. ... kernel memory is directly mapped as there is a 1:! relationship between the virtual address of kernel memory and the physical location of the memory. vmalloc memory has an arbitrary virtual to physical mapping that has to be looked in a separate structure on every page fault, same as for any userspace page fault. vmalloc space can become fragmented, be exhausted, etc, just like kmalloc memory can be. Indeed, there are situations where vmalloc can fail yet kmalloc will succeed... > XFS does not need the physical space that backs the kernel virtual > address to be contiguous - other parts of the kernel do. Why put > pressure on the drivers that need order allocations when we do not > need it? Let's just quote Linus from 2003, shall we: http://yarchive.net/comp/linux/vmalloc.html | > I think it'd make more sense to only use vmalloc when it's explicitly | > too big for kmalloc - or simply switch on num_online_cpus > 100 or | > whatever a sensible cutoff is (ie nobody but you would ever see this ;-)) | | No, please please please don't do these things. | | vmalloc() is NOT SOMETHING YOU SHOULD EVER USE! It's only valid for when | you _need_ a big array, and you don't have any choice. It's slow, and it's | a very restricted resource: it's a global resource that is literally | restricted to a few tens of megabytes. It should be _very_ carefully used. | | There are basically no valid new uses of it. There's a few valid legacy | users (I think the file descriptor array), and there are some drivers that | use it (which is crap, but drivers are drivers), and it's _really_ valid | only for modules. Nothing else. | | Basically: if you think you need more memory than a kmalloc() can give, | you need to re-organize your data structures. To either not need a big | area, or to be able to allocate it in chunks. | | Linus Linus will say exactly the same thing today.... And that doesn't take into account that vmalloc() or vm_map_ram() inside an XFS transaction context can deadlock as page table entries use GFP_KERNEL allocation, and this can happen in a GFP_NOFS context in XFS... Seriously, there are excellent reasons for vmalloc being considered a bad thing and hence it's use is actively discouraged across the entire kernel space. We only use it where absolutely necessary to function correctly. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] xfs: fix some new memory allocation failures 2013-09-03 21:16 ` Dave Chinner @ 2013-09-03 22:38 ` Mark Tinguely 0 siblings, 0 replies; 13+ messages in thread From: Mark Tinguely @ 2013-09-03 22:38 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 09/03/13 16:16, Dave Chinner wrote: > Let's just quote Linus from 2003, shall we: > > http://yarchive.net/comp/linux/vmalloc.html > > |> I think it'd make more sense to only use vmalloc when it's explicitly > |> too big for kmalloc - or simply switch on num_online_cpus> 100 or > |> whatever a sensible cutoff is (ie nobody but you would ever see this ;-)) > | > | No, please please please don't do these things. > | > | vmalloc() is NOT SOMETHING YOU SHOULD EVER USE! It's only valid for when > | you_need_ a big array, and you don't have any choice. It's slow, and it's > | a very restricted resource: it's a global resource that is literally > | restricted to a few tens of megabytes. It should be_very_ carefully used. > | > | There are basically no valid new uses of it. There's a few valid legacy > | users (I think the file descriptor array), and there are some drivers that > | use it (which is crap, but drivers are drivers), and it's_really_ valid > | only for modules. Nothing else. > | > | Basically: if you think you need more memory than a kmalloc() can give, > | you need to re-organize your data structures. To either not need a big > | area, or to be able to allocate it in chunks. > | > | Linus > > Linus will say exactly the same thing today.... And that is very very sad that is it funny. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] xfs: fix some new memory allocation failures 2013-09-02 10:52 [PATCH 0/2] xfs: fix some new memory allocation failures Dave Chinner ` (2 preceding siblings ...) 2013-09-02 17:03 ` [PATCH 0/2] xfs: fix some new memory allocation failures Mark Tinguely @ 2013-09-10 22:40 ` Ben Myers 3 siblings, 0 replies; 13+ messages in thread From: Ben Myers @ 2013-09-10 22:40 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Mon, Sep 02, 2013 at 08:52:58PM +1000, Dave Chinner wrote: > Hi folks, > > These failures are a result of order-4 allocations being done on v5 > filesystems to support the large ACL count xattrs. The first patch > puts out usual falbback to vmalloc workaround in place. The second > patch factors all the places we now have this fallback-to-vmalloc > and makes it transparent to the callers. Applied these 2. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2013-09-10 22:40 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-09-02 10:52 [PATCH 0/2] xfs: fix some new memory allocation failures Dave Chinner 2013-09-02 10:52 ` [PATCH 1/2] xfs: fix memory allocation failures with ACLs Dave Chinner 2013-09-06 19:59 ` Mark Tinguely 2013-09-02 10:53 ` [PATCH 2/2] xfs: factor all the kmalloc-or-vmalloc fallback allocations Dave Chinner 2013-09-06 20:37 ` Mark Tinguely 2013-09-02 17:03 ` [PATCH 0/2] xfs: fix some new memory allocation failures Mark Tinguely 2013-09-02 22:20 ` Dave Chinner 2013-09-03 13:07 ` Mark Tinguely 2013-09-03 20:04 ` Dave Chinner 2013-09-03 20:46 ` Mark Tinguely 2013-09-03 21:16 ` Dave Chinner 2013-09-03 22:38 ` Mark Tinguely 2013-09-10 22:40 ` Ben Myers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox