* Failing XFS memory allocation
@ 2016-03-23 10:15 Nikolay Borisov
  2016-03-23 12:43 ` Brian Foster
  2016-03-24  9:33 ` Christoph Hellwig
  0 siblings, 2 replies; 13+ messages in thread
From: Nikolay Borisov @ 2016-03-23 10:15 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs

Hello,

So I have an XFS filesystem which houses two 2.3T sparse files, which are
loop-mounted. Recently I migrated a server to a 4.4.6 kernel and this
morning I observed the following in my dmesg:

XFS: loop0(15174) possible memory allocation deadlock size 107168 in
kmem_alloc (mode:0x2400240)

The mode is essentially (GFP_KERNEL | __GFP_NOWARN) & ~__GFP_FS.
Here is the size of the loop file in case it matters:

du -h --apparent-size /storage/loop/file1
2.3T /storage/loop/file1

du -h /storage/loop/file1
878G /storage/loop/file1

This message is repeated multiple times. Looking at the output of
"echo w > /proc/sysrq-trigger" I see the following suspicious entry:

loop0           D ffff881fe081f038     0 15174      2 0x00000000
 ffff881fe081f038 ffff883ff29fa700 ffff881fecb70d00 ffff88407fffae00
 0000000000000000 0000000502404240 ffffffff81e30d60 0000000000000000
 0000000000000000 ffff881f00000003 0000000000000282 ffff883f00000000
Call Trace:
 [<ffffffff8163ac01>] ? _raw_spin_lock_irqsave+0x21/0x60
 [<ffffffff81636fd7>] schedule+0x47/0x90
 [<ffffffff81639f03>] schedule_timeout+0x113/0x1e0
 [<ffffffff810ac580>] ? lock_timer_base+0x80/0x80
 [<ffffffff816363d4>] io_schedule_timeout+0xa4/0x110
 [<ffffffff8114aadf>] congestion_wait+0x7f/0x130
 [<ffffffff810939e0>] ? woken_wake_function+0x20/0x20
 [<ffffffffa0283bac>] kmem_alloc+0x8c/0x120 [xfs]
 [<ffffffff81181751>] ? __kmalloc+0x121/0x250
 [<ffffffffa0283c73>] kmem_realloc+0x33/0x80 [xfs]
 [<ffffffffa02546cd>] xfs_iext_realloc_indirect+0x3d/0x60 [xfs]
 [<ffffffffa02548cf>] xfs_iext_irec_new+0x3f/0xf0 [xfs]
 [<ffffffffa0254c0d>] xfs_iext_add_indirect_multi+0x14d/0x210 [xfs]
 [<ffffffffa02554b5>] xfs_iext_add+0xc5/0x230 [xfs]
 [<ffffffff8112b5c5>] ? mempool_alloc_slab+0x15/0x20
 [<ffffffffa0256269>] xfs_iext_insert+0x59/0x110 [xfs]
 [<ffffffffa0230928>] ? xfs_bmap_add_extent_hole_delay+0xd8/0x740 [xfs]
 [<ffffffffa0230928>] xfs_bmap_add_extent_hole_delay+0xd8/0x740 [xfs]
 [<ffffffff8112b5c5>] ? mempool_alloc_slab+0x15/0x20
 [<ffffffff8112b725>] ? mempool_alloc+0x65/0x180
 [<ffffffffa02543d8>] ? xfs_iext_get_ext+0x38/0x70 [xfs]
 [<ffffffffa0254e8d>] ? xfs_iext_bno_to_ext+0xed/0x150 [xfs]
 [<ffffffffa02311b5>] xfs_bmapi_reserve_delalloc+0x225/0x250 [xfs]
 [<ffffffffa023131e>] xfs_bmapi_delay+0x13e/0x290 [xfs]
 [<ffffffffa02730ad>] xfs_iomap_write_delay+0x17d/0x300 [xfs]
 [<ffffffffa022e434>] ? xfs_bmapi_read+0x114/0x330 [xfs]
 [<ffffffffa025ddc5>] __xfs_get_blocks+0x585/0xa90 [xfs]
 [<ffffffff81324b53>] ? __percpu_counter_add+0x63/0x80
 [<ffffffff811374cd>] ? account_page_dirtied+0xed/0x1b0
 [<ffffffff811cfc59>] ? alloc_buffer_head+0x49/0x60
 [<ffffffff811d07c0>] ? alloc_page_buffers+0x60/0xb0
 [<ffffffff811d13e5>] ? create_empty_buffers+0x45/0xc0
 [<ffffffffa025e324>] xfs_get_blocks+0x14/0x20 [xfs]
 [<ffffffff811d34e2>] __block_write_begin+0x1c2/0x580
 [<ffffffffa025e310>] ? xfs_get_blocks_direct+0x20/0x20 [xfs]
 [<ffffffffa025bbb1>] xfs_vm_write_begin+0x61/0xf0 [xfs]
 [<ffffffff81127e50>] generic_perform_write+0xd0/0x1f0
 [<ffffffffa026a341>] xfs_file_buffered_aio_write+0xe1/0x240 [xfs]
 [<ffffffff812e16d2>] ? bt_clear_tag+0xb2/0xd0
 [<ffffffffa026ab87>] xfs_file_write_iter+0x167/0x170 [xfs]
 [<ffffffff81199d76>] vfs_iter_write+0x76/0xa0
 [<ffffffffa03fb735>] lo_write_bvec+0x65/0x100 [loop]
 [<ffffffffa03fd589>] loop_queue_work+0x689/0x924 [loop]
 [<ffffffff8163ba52>] ? retint_kernel+0x10/0x10
 [<ffffffff81074d71>] kthread_worker_fn+0x61/0x1c0
 [<ffffffff81074d10>] ? flush_kthread_work+0x120/0x120
 [<ffffffff81074d10>] ? flush_kthread_work+0x120/0x120
 [<ffffffff810744d7>] kthread+0xd7/0xf0
 [<ffffffff8107d22e>] ? schedule_tail+0x1e/0xd0
 [<ffffffff81074400>] ? kthread_freezable_should_stop+0x80/0x80
 [<ffffffff8163b2af>] ret_from_fork+0x3f/0x70
 [<ffffffff81074400>] ? kthread_freezable_should_stop+0x80/0x80

So it seems that there are writes to the loop device being queued, and
while serving them XFS has to do some internal memory allocation to fit
the new data; for some *unknown* reason this fails and it starts looping
in kmem_alloc. I didn't see any OOM reports, so presumably the server was
not out of memory, but unfortunately I didn't check memory fragmentation.
I did collect a crash dump in case you need further info.

The one thing which bugs me is that XFS tried to allocate ~107 KB of
contiguous memory, a 27-page (order-5) allocation. Isn't this way too big
and hard to satisfy once memory gets fragmented, despite direct/background
reclaim being enabled? For now I've reverted to a 3.12.52 kernel, where
this issue hasn't been observed (yet). Any ideas would be much
appreciated.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
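As a quick back-of-the-envelope check on the failing allocation size (my own arithmetic, not from the thread): 107168 bytes works out to a 27-page request, which the buddy allocator rounds up to an order-5 (128 KiB) contiguous allocation under a 4 KiB page size.

```python
# Sanity-check the size reported in the XFS deadlock warning:
# how big is a 107168-byte kmalloc in page/buddy-order terms?
import math

PAGE_SIZE = 4096        # assumed x86-64 page size
alloc_bytes = 107168    # size from the warning message

pages = math.ceil(alloc_bytes / PAGE_SIZE)
order = math.ceil(math.log2(pages))  # buddy allocator rounds up to 2**order pages

print(pages, order)     # 27 pages -> an order-5 (128 KiB) allocation
```

Order-5 allocations are routinely satisfiable on a fresh system but become increasingly unreliable as physical memory fragments, which matches the congestion_wait retry loop seen in the stack trace.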
* Re: Failing XFS memory allocation 2016-03-23 10:15 Failing XFS memory allocation Nikolay Borisov @ 2016-03-23 12:43 ` Brian Foster 2016-03-23 12:56 ` Nikolay Borisov 2016-03-24 9:33 ` Christoph Hellwig 1 sibling, 1 reply; 13+ messages in thread From: Brian Foster @ 2016-03-23 12:43 UTC (permalink / raw) To: Nikolay Borisov; +Cc: xfs On Wed, Mar 23, 2016 at 12:15:42PM +0200, Nikolay Borisov wrote: > Hello, > > So I have an XFS filesystem which houses 2 2.3T sparse files, which are > loop-mounted. Recently I migrated a server to a 4.4.6 kernel and this > morning I observed the following in my dmesg: > > XFS: loop0(15174) possible memory allocation deadlock size 107168 in > kmem_alloc (mode:0x2400240) > Is there a stack trace associated with this message? > the mode is essentially (GFP_KERNEL | GFP_NOWARN) &= ~__GFP_FS. > Here is the site of the loop file in case it matters: > > du -h --apparent-size /storage/loop/file1 > 2.3T /storage/loop/file1 > > du -h /storage/loop/file1 > 878G /storage/loop/file1 > > And this string is repeated multiple times. Looking at the output of > "echo w > /proc/sysrq-trigger" I see the following suspicious entry: > > loop0 D ffff881fe081f038 0 15174 2 0x00000000 > ffff881fe081f038 ffff883ff29fa700 ffff881fecb70d00 ffff88407fffae00 > 0000000000000000 0000000502404240 ffffffff81e30d60 0000000000000000 > 0000000000000000 ffff881f00000003 0000000000000282 ffff883f00000000 > Call Trace: > [<ffffffff8163ac01>] ? _raw_spin_lock_irqsave+0x21/0x60 > [<ffffffff81636fd7>] schedule+0x47/0x90 > [<ffffffff81639f03>] schedule_timeout+0x113/0x1e0 > [<ffffffff810ac580>] ? lock_timer_base+0x80/0x80 > [<ffffffff816363d4>] io_schedule_timeout+0xa4/0x110 > [<ffffffff8114aadf>] congestion_wait+0x7f/0x130 > [<ffffffff810939e0>] ? woken_wake_function+0x20/0x20 > [<ffffffffa0283bac>] kmem_alloc+0x8c/0x120 [xfs] > [<ffffffff81181751>] ? 
__kmalloc+0x121/0x250 > [<ffffffffa0283c73>] kmem_realloc+0x33/0x80 [xfs] > [<ffffffffa02546cd>] xfs_iext_realloc_indirect+0x3d/0x60 [xfs] > [<ffffffffa02548cf>] xfs_iext_irec_new+0x3f/0xf0 [xfs] > [<ffffffffa0254c0d>] xfs_iext_add_indirect_multi+0x14d/0x210 [xfs] > [<ffffffffa02554b5>] xfs_iext_add+0xc5/0x230 [xfs] It looks like it's working to add a new extent to the in-core extent list. If this is the stack associated with the warning message (combined with the large alloc size), I wonder if there's a fragmentation issue on the file leading to an excessive number of extents. What does 'xfs_bmap -v /storage/loop/file1' show? Brian > [<ffffffff8112b5c5>] ? mempool_alloc_slab+0x15/0x20 > [<ffffffffa0256269>] xfs_iext_insert+0x59/0x110 [xfs] > [<ffffffffa0230928>] ? xfs_bmap_add_extent_hole_delay+0xd8/0x740 [xfs] > [<ffffffffa0230928>] xfs_bmap_add_extent_hole_delay+0xd8/0x740 [xfs] > [<ffffffff8112b5c5>] ? mempool_alloc_slab+0x15/0x20 > [<ffffffff8112b725>] ? mempool_alloc+0x65/0x180 > [<ffffffffa02543d8>] ? xfs_iext_get_ext+0x38/0x70 [xfs] > [<ffffffffa0254e8d>] ? xfs_iext_bno_to_ext+0xed/0x150 [xfs] > [<ffffffffa02311b5>] xfs_bmapi_reserve_delalloc+0x225/0x250 [xfs] > [<ffffffffa023131e>] xfs_bmapi_delay+0x13e/0x290 [xfs] > [<ffffffffa02730ad>] xfs_iomap_write_delay+0x17d/0x300 [xfs] > [<ffffffffa022e434>] ? xfs_bmapi_read+0x114/0x330 [xfs] > [<ffffffffa025ddc5>] __xfs_get_blocks+0x585/0xa90 [xfs] > [<ffffffff81324b53>] ? __percpu_counter_add+0x63/0x80 > [<ffffffff811374cd>] ? account_page_dirtied+0xed/0x1b0 > [<ffffffff811cfc59>] ? alloc_buffer_head+0x49/0x60 > [<ffffffff811d07c0>] ? alloc_page_buffers+0x60/0xb0 > [<ffffffff811d13e5>] ? create_empty_buffers+0x45/0xc0 > [<ffffffffa025e324>] xfs_get_blocks+0x14/0x20 [xfs] > [<ffffffff811d34e2>] __block_write_begin+0x1c2/0x580 > [<ffffffffa025e310>] ? 
xfs_get_blocks_direct+0x20/0x20 [xfs] > [<ffffffffa025bbb1>] xfs_vm_write_begin+0x61/0xf0 [xfs] > [<ffffffff81127e50>] generic_perform_write+0xd0/0x1f0 > [<ffffffffa026a341>] xfs_file_buffered_aio_write+0xe1/0x240 [xfs] > [<ffffffff812e16d2>] ? bt_clear_tag+0xb2/0xd0 > [<ffffffffa026ab87>] xfs_file_write_iter+0x167/0x170 [xfs] > [<ffffffff81199d76>] vfs_iter_write+0x76/0xa0 > [<ffffffffa03fb735>] lo_write_bvec+0x65/0x100 [loop] > [<ffffffffa03fd589>] loop_queue_work+0x689/0x924 [loop] > [<ffffffff8163ba52>] ? retint_kernel+0x10/0x10 > [<ffffffff81074d71>] kthread_worker_fn+0x61/0x1c0 > [<ffffffff81074d10>] ? flush_kthread_work+0x120/0x120 > [<ffffffff81074d10>] ? flush_kthread_work+0x120/0x120 > [<ffffffff810744d7>] kthread+0xd7/0xf0 > [<ffffffff8107d22e>] ? schedule_tail+0x1e/0xd0 > [<ffffffff81074400>] ? kthread_freezable_should_stop+0x80/0x80 > [<ffffffff8163b2af>] ret_from_fork+0x3f/0x70 > [<ffffffff81074400>] ? kthread_freezable_should_stop+0x80/0x80 > > So this seems that there are writes to the loop device being queued and > while being served XFS has to do some internal memory allocation to fit > the new data, however due to some *uknown* reason it fails and starts > looping in kmem_alloc. I didn't see any OOM reports so presumably the > server was not out of memory, but unfortunately I didn't check the > memory fragmentation, though I collected a crash dump in case you need > further info. > > The one thing which bugs me is that XFS tried to allocate 107 contiguous > kb which is page-order-26 isn't this waaaaay too big and almost never > satisfiable, despite direct/bg reclaim to be enabled? For now I've > reverted to using 3.12.52 kernel, where this issue hasn't been observed > (yet) any ideas would be much appreciated. 
> > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Failing XFS memory allocation 2016-03-23 12:43 ` Brian Foster @ 2016-03-23 12:56 ` Nikolay Borisov 2016-03-23 13:10 ` Brian Foster 0 siblings, 1 reply; 13+ messages in thread From: Nikolay Borisov @ 2016-03-23 12:56 UTC (permalink / raw) To: Brian Foster; +Cc: xfs On 03/23/2016 02:43 PM, Brian Foster wrote: > On Wed, Mar 23, 2016 at 12:15:42PM +0200, Nikolay Borisov wrote: >> Hello, >> >> So I have an XFS filesystem which houses 2 2.3T sparse files, which are >> loop-mounted. Recently I migrated a server to a 4.4.6 kernel and this >> morning I observed the following in my dmesg: >> >> XFS: loop0(15174) possible memory allocation deadlock size 107168 in >> kmem_alloc (mode:0x2400240) >> > > Is there a stack trace associated with this message? > >> the mode is essentially (GFP_KERNEL | GFP_NOWARN) &= ~__GFP_FS. >> Here is the site of the loop file in case it matters: >> >> du -h --apparent-size /storage/loop/file1 >> 2.3T /storage/loop/file1 >> >> du -h /storage/loop/file1 >> 878G /storage/loop/file1 >> >> And this string is repeated multiple times. Looking at the output of >> "echo w > /proc/sysrq-trigger" I see the following suspicious entry: >> >> loop0 D ffff881fe081f038 0 15174 2 0x00000000 >> ffff881fe081f038 ffff883ff29fa700 ffff881fecb70d00 ffff88407fffae00 >> 0000000000000000 0000000502404240 ffffffff81e30d60 0000000000000000 >> 0000000000000000 ffff881f00000003 0000000000000282 ffff883f00000000 >> Call Trace: >> [<ffffffff8163ac01>] ? _raw_spin_lock_irqsave+0x21/0x60 >> [<ffffffff81636fd7>] schedule+0x47/0x90 >> [<ffffffff81639f03>] schedule_timeout+0x113/0x1e0 >> [<ffffffff810ac580>] ? lock_timer_base+0x80/0x80 >> [<ffffffff816363d4>] io_schedule_timeout+0xa4/0x110 >> [<ffffffff8114aadf>] congestion_wait+0x7f/0x130 >> [<ffffffff810939e0>] ? woken_wake_function+0x20/0x20 >> [<ffffffffa0283bac>] kmem_alloc+0x8c/0x120 [xfs] >> [<ffffffff81181751>] ? 
__kmalloc+0x121/0x250 >> [<ffffffffa0283c73>] kmem_realloc+0x33/0x80 [xfs] >> [<ffffffffa02546cd>] xfs_iext_realloc_indirect+0x3d/0x60 [xfs] >> [<ffffffffa02548cf>] xfs_iext_irec_new+0x3f/0xf0 [xfs] >> [<ffffffffa0254c0d>] xfs_iext_add_indirect_multi+0x14d/0x210 [xfs] >> [<ffffffffa02554b5>] xfs_iext_add+0xc5/0x230 [xfs] > > It looks like it's working to add a new extent to the in-core extent > list. If this is the stack associated with the warning message (combined > with the large alloc size), I wonder if there's a fragmentation issue on > the file leading to an excessive number of extents. Yes this is the stack trace associated. > > What does 'xfs_bmap -v /storage/loop/file1' show? It spews a lot of stuff but here is a summary, more detailed info can be provided if you need it: xfs_bmap -v /storage/loop/file1 | wc -l 900908 xfs_bmap -v /storage/loop/file1 | grep -c hole 94568 Also, what would constitute an "excessive number of extents"? > > Brian > >> [<ffffffff8112b5c5>] ? mempool_alloc_slab+0x15/0x20 >> [<ffffffffa0256269>] xfs_iext_insert+0x59/0x110 [xfs] >> [<ffffffffa0230928>] ? xfs_bmap_add_extent_hole_delay+0xd8/0x740 [xfs] >> [<ffffffffa0230928>] xfs_bmap_add_extent_hole_delay+0xd8/0x740 [xfs] >> [<ffffffff8112b5c5>] ? mempool_alloc_slab+0x15/0x20 >> [<ffffffff8112b725>] ? mempool_alloc+0x65/0x180 >> [<ffffffffa02543d8>] ? xfs_iext_get_ext+0x38/0x70 [xfs] >> [<ffffffffa0254e8d>] ? xfs_iext_bno_to_ext+0xed/0x150 [xfs] >> [<ffffffffa02311b5>] xfs_bmapi_reserve_delalloc+0x225/0x250 [xfs] >> [<ffffffffa023131e>] xfs_bmapi_delay+0x13e/0x290 [xfs] >> [<ffffffffa02730ad>] xfs_iomap_write_delay+0x17d/0x300 [xfs] >> [<ffffffffa022e434>] ? xfs_bmapi_read+0x114/0x330 [xfs] >> [<ffffffffa025ddc5>] __xfs_get_blocks+0x585/0xa90 [xfs] >> [<ffffffff81324b53>] ? __percpu_counter_add+0x63/0x80 >> [<ffffffff811374cd>] ? account_page_dirtied+0xed/0x1b0 >> [<ffffffff811cfc59>] ? alloc_buffer_head+0x49/0x60 >> [<ffffffff811d07c0>] ? 
alloc_page_buffers+0x60/0xb0 >> [<ffffffff811d13e5>] ? create_empty_buffers+0x45/0xc0 >> [<ffffffffa025e324>] xfs_get_blocks+0x14/0x20 [xfs] >> [<ffffffff811d34e2>] __block_write_begin+0x1c2/0x580 >> [<ffffffffa025e310>] ? xfs_get_blocks_direct+0x20/0x20 [xfs] >> [<ffffffffa025bbb1>] xfs_vm_write_begin+0x61/0xf0 [xfs] >> [<ffffffff81127e50>] generic_perform_write+0xd0/0x1f0 >> [<ffffffffa026a341>] xfs_file_buffered_aio_write+0xe1/0x240 [xfs] >> [<ffffffff812e16d2>] ? bt_clear_tag+0xb2/0xd0 >> [<ffffffffa026ab87>] xfs_file_write_iter+0x167/0x170 [xfs] >> [<ffffffff81199d76>] vfs_iter_write+0x76/0xa0 >> [<ffffffffa03fb735>] lo_write_bvec+0x65/0x100 [loop] >> [<ffffffffa03fd589>] loop_queue_work+0x689/0x924 [loop] >> [<ffffffff8163ba52>] ? retint_kernel+0x10/0x10 >> [<ffffffff81074d71>] kthread_worker_fn+0x61/0x1c0 >> [<ffffffff81074d10>] ? flush_kthread_work+0x120/0x120 >> [<ffffffff81074d10>] ? flush_kthread_work+0x120/0x120 >> [<ffffffff810744d7>] kthread+0xd7/0xf0 >> [<ffffffff8107d22e>] ? schedule_tail+0x1e/0xd0 >> [<ffffffff81074400>] ? kthread_freezable_should_stop+0x80/0x80 >> [<ffffffff8163b2af>] ret_from_fork+0x3f/0x70 >> [<ffffffff81074400>] ? kthread_freezable_should_stop+0x80/0x80 >> >> So this seems that there are writes to the loop device being queued and >> while being served XFS has to do some internal memory allocation to fit >> the new data, however due to some *uknown* reason it fails and starts >> looping in kmem_alloc. I didn't see any OOM reports so presumably the >> server was not out of memory, but unfortunately I didn't check the >> memory fragmentation, though I collected a crash dump in case you need >> further info. >> >> The one thing which bugs me is that XFS tried to allocate 107 contiguous >> kb which is page-order-26 isn't this waaaaay too big and almost never >> satisfiable, despite direct/bg reclaim to be enabled? 
For now I've >> reverted to using 3.12.52 kernel, where this issue hasn't been observed >> (yet) any ideas would be much appreciated. >> >> _______________________________________________ >> xfs mailing list >> xfs@oss.sgi.com >> http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Failing XFS memory allocation 2016-03-23 12:56 ` Nikolay Borisov @ 2016-03-23 13:10 ` Brian Foster 2016-03-23 15:03 ` Nikolay Borisov 2016-03-23 23:00 ` Dave Chinner 0 siblings, 2 replies; 13+ messages in thread From: Brian Foster @ 2016-03-23 13:10 UTC (permalink / raw) To: Nikolay Borisov; +Cc: xfs On Wed, Mar 23, 2016 at 02:56:25PM +0200, Nikolay Borisov wrote: > > > On 03/23/2016 02:43 PM, Brian Foster wrote: > > On Wed, Mar 23, 2016 at 12:15:42PM +0200, Nikolay Borisov wrote: ... > > It looks like it's working to add a new extent to the in-core extent > > list. If this is the stack associated with the warning message (combined > > with the large alloc size), I wonder if there's a fragmentation issue on > > the file leading to an excessive number of extents. > > Yes this is the stack trace associated. > > > > > What does 'xfs_bmap -v /storage/loop/file1' show? > > It spews a lot of stuff but here is a summary, more detailed info can be > provided if you need it: > > xfs_bmap -v /storage/loop/file1 | wc -l > 900908 > xfs_bmap -v /storage/loop/file1 | grep -c hole > 94568 > > Also, what would constitute an "excessive number of extents"? > I'm not sure where one would draw the line tbh, it's just a matter of having too many extents to the point that it causes problems in terms of performance (i.e., reading/modifying the extent list) or such as the allocation problem you're running into. As it is, XFS maintains the full extent list for an active inode in memory, so that's 800k+ extents that it's looking for memory for. It looks like that is your problem here. 800k or so extents over 878G looks to be about 1MB per extent. Are you using extent size hints? One option that might prevent this is to use a larger extent size hint value. Another might be to preallocate the entire file up front with fallocate. You'd probably have to experiment with what option or value works best for your workload. Brian > > > > Brian > > > >> [<ffffffff8112b5c5>] ? 
mempool_alloc_slab+0x15/0x20 > >> [<ffffffffa0256269>] xfs_iext_insert+0x59/0x110 [xfs] > >> [<ffffffffa0230928>] ? xfs_bmap_add_extent_hole_delay+0xd8/0x740 [xfs] > >> [<ffffffffa0230928>] xfs_bmap_add_extent_hole_delay+0xd8/0x740 [xfs] > >> [<ffffffff8112b5c5>] ? mempool_alloc_slab+0x15/0x20 > >> [<ffffffff8112b725>] ? mempool_alloc+0x65/0x180 > >> [<ffffffffa02543d8>] ? xfs_iext_get_ext+0x38/0x70 [xfs] > >> [<ffffffffa0254e8d>] ? xfs_iext_bno_to_ext+0xed/0x150 [xfs] > >> [<ffffffffa02311b5>] xfs_bmapi_reserve_delalloc+0x225/0x250 [xfs] > >> [<ffffffffa023131e>] xfs_bmapi_delay+0x13e/0x290 [xfs] > >> [<ffffffffa02730ad>] xfs_iomap_write_delay+0x17d/0x300 [xfs] > >> [<ffffffffa022e434>] ? xfs_bmapi_read+0x114/0x330 [xfs] > >> [<ffffffffa025ddc5>] __xfs_get_blocks+0x585/0xa90 [xfs] > >> [<ffffffff81324b53>] ? __percpu_counter_add+0x63/0x80 > >> [<ffffffff811374cd>] ? account_page_dirtied+0xed/0x1b0 > >> [<ffffffff811cfc59>] ? alloc_buffer_head+0x49/0x60 > >> [<ffffffff811d07c0>] ? alloc_page_buffers+0x60/0xb0 > >> [<ffffffff811d13e5>] ? create_empty_buffers+0x45/0xc0 > >> [<ffffffffa025e324>] xfs_get_blocks+0x14/0x20 [xfs] > >> [<ffffffff811d34e2>] __block_write_begin+0x1c2/0x580 > >> [<ffffffffa025e310>] ? xfs_get_blocks_direct+0x20/0x20 [xfs] > >> [<ffffffffa025bbb1>] xfs_vm_write_begin+0x61/0xf0 [xfs] > >> [<ffffffff81127e50>] generic_perform_write+0xd0/0x1f0 > >> [<ffffffffa026a341>] xfs_file_buffered_aio_write+0xe1/0x240 [xfs] > >> [<ffffffff812e16d2>] ? bt_clear_tag+0xb2/0xd0 > >> [<ffffffffa026ab87>] xfs_file_write_iter+0x167/0x170 [xfs] > >> [<ffffffff81199d76>] vfs_iter_write+0x76/0xa0 > >> [<ffffffffa03fb735>] lo_write_bvec+0x65/0x100 [loop] > >> [<ffffffffa03fd589>] loop_queue_work+0x689/0x924 [loop] > >> [<ffffffff8163ba52>] ? retint_kernel+0x10/0x10 > >> [<ffffffff81074d71>] kthread_worker_fn+0x61/0x1c0 > >> [<ffffffff81074d10>] ? flush_kthread_work+0x120/0x120 > >> [<ffffffff81074d10>] ? 
flush_kthread_work+0x120/0x120 > >> [<ffffffff810744d7>] kthread+0xd7/0xf0 > >> [<ffffffff8107d22e>] ? schedule_tail+0x1e/0xd0 > >> [<ffffffff81074400>] ? kthread_freezable_should_stop+0x80/0x80 > >> [<ffffffff8163b2af>] ret_from_fork+0x3f/0x70 > >> [<ffffffff81074400>] ? kthread_freezable_should_stop+0x80/0x80 > >> > >> So this seems that there are writes to the loop device being queued and > >> while being served XFS has to do some internal memory allocation to fit > >> the new data, however due to some *uknown* reason it fails and starts > >> looping in kmem_alloc. I didn't see any OOM reports so presumably the > >> server was not out of memory, but unfortunately I didn't check the > >> memory fragmentation, though I collected a crash dump in case you need > >> further info. > >> > >> The one thing which bugs me is that XFS tried to allocate 107 contiguous > >> kb which is page-order-26 isn't this waaaaay too big and almost never > >> satisfiable, despite direct/bg reclaim to be enabled? For now I've > >> reverted to using 3.12.52 kernel, where this issue hasn't been observed > >> (yet) any ideas would be much appreciated. > >> > >> _______________________________________________ > >> xfs mailing list > >> xfs@oss.sgi.com > >> http://oss.sgi.com/mailman/listinfo/xfs > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Failing XFS memory allocation 2016-03-23 13:10 ` Brian Foster @ 2016-03-23 15:03 ` Nikolay Borisov 2016-03-23 16:58 ` Brian Foster 2016-03-23 23:00 ` Dave Chinner 1 sibling, 1 reply; 13+ messages in thread From: Nikolay Borisov @ 2016-03-23 15:03 UTC (permalink / raw) To: Brian Foster; +Cc: xfs On 03/23/2016 03:10 PM, Brian Foster wrote: > On Wed, Mar 23, 2016 at 02:56:25PM +0200, Nikolay Borisov wrote: >> >> >> On 03/23/2016 02:43 PM, Brian Foster wrote: >>> On Wed, Mar 23, 2016 at 12:15:42PM +0200, Nikolay Borisov wrote: > ... >>> It looks like it's working to add a new extent to the in-core extent >>> list. If this is the stack associated with the warning message (combined >>> with the large alloc size), I wonder if there's a fragmentation issue on >>> the file leading to an excessive number of extents. >> >> Yes this is the stack trace associated. >> >>> >>> What does 'xfs_bmap -v /storage/loop/file1' show? >> >> It spews a lot of stuff but here is a summary, more detailed info can be >> provided if you need it: >> >> xfs_bmap -v /storage/loop/file1 | wc -l >> 900908 >> xfs_bmap -v /storage/loop/file1 | grep -c hole >> 94568 >> >> Also, what would constitute an "excessive number of extents"? >> > > I'm not sure where one would draw the line tbh, it's just a matter of > having too many extents to the point that it causes problems in terms of > performance (i.e., reading/modifying the extent list) or such as the > allocation problem you're running into. As it is, XFS maintains the full > extent list for an active inode in memory, so that's 800k+ extents that > it's looking for memory for. I saw in the comments that this problem has already been identified and a possible solution would be to add another level of indirection. Also, can you confirm that my understanding of the operation of the indirection array is correct in that each entry in the indirection array xfs_ext_irec is responsible for 256 extents. 
(the er_extbuf is PAGE_SIZE, i.e. 4 KB, and an extent is 16 bytes, which
results in 256 extents)

> It looks like that is your problem here. 800k or so extents over 878G
> looks to be about 1MB per extent. Are you using extent size hints? One
> option that might prevent this is to use a larger extent size hint
> value. Another might be to preallocate the entire file up front with
> fallocate. You'd probably have to experiment with what option or value
> works best for your workload.

By preallocating with fallocate you mean using fallocate with
FALLOC_FL_ZERO_RANGE and not FALLOC_FL_PUNCH_HOLE, right? Because as it
stands now the file does have holes, which presumably are being filled,
and in order to be filled an extent has to be allocated, which caused the
issue? Am I right in this reasoning?

Currently I'm not using extent size hints but will look into that. Also,
if the extent size hint is, say, 4 MB, wouldn't that cause a fairly
serious loss of space, provided that the writes are smaller than 4 MB?
Would XFS try to perform some sort of extent coalescing or something
else? I'm not an FS developer, but my understanding is that with a 4 MB
extent size, whenever a new write occurs, even if it's only 256 KB, a new
4 MB extent would be allocated, no?

And a final question - when I printed the contents of the inode with
xfs_db I get core.nextents = 972564, whereas invoking xfs_bmap | wc -l on
the file gives varying numbers each time?

Thanks a lot for taking the time to reply.
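The arithmetic behind the 256-extents-per-entry figure, and what it implies for this file, can be sketched as follows (the 4 KiB buffer and 16-byte record sizes are assumptions taken from the discussion, not read out of the kernel sources):

```python
# Back-of-the-envelope model of the in-core extent list layout discussed
# in the thread: extents live in page-sized buffers (er_extbuf), tracked
# by a contiguous top-level indirection array.
import math

PAGE_SIZE = 4096
EXTENT_REC_BYTES = 16                          # assumed in-core extent record size
EXTS_PER_BUF = PAGE_SIZE // EXTENT_REC_BYTES   # extents per er_extbuf

nextents = 972564                              # core.nextents reported via xfs_db
min_irec_entries = math.ceil(nextents / EXTS_PER_BUF)

print(EXTS_PER_BUF)       # 256, matching the reasoning above
print(min_irec_entries)   # 3800 indirection entries if every buffer were full

# For comparison: if each indirection entry is ~16 bytes (an assumption),
# the failing 107168-byte realloc would hold ~6700 entries, suggesting
# many er_extbuf buffers are only partially full in practice.
```

Either way, the indirection array itself must be one physically contiguous allocation, which is exactly the kmem_realloc that fails under fragmentation.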
* Re: Failing XFS memory allocation 2016-03-23 15:03 ` Nikolay Borisov @ 2016-03-23 16:58 ` Brian Foster 0 siblings, 0 replies; 13+ messages in thread From: Brian Foster @ 2016-03-23 16:58 UTC (permalink / raw) To: Nikolay Borisov; +Cc: xfs On Wed, Mar 23, 2016 at 05:03:18PM +0200, Nikolay Borisov wrote: ... > > I'm not sure where one would draw the line tbh, it's just a matter of > > having too many extents to the point that it causes problems in terms of > > performance (i.e., reading/modifying the extent list) or such as the > > allocation problem you're running into. As it is, XFS maintains the full > > extent list for an active inode in memory, so that's 800k+ extents that > > it's looking for memory for. > > I saw in the comments that this problem has already been identified and > a possible solution would be to add another level of indirection. Also, > can you confirm that my understanding of the operation of the > indirection array is correct in that each entry in the indirection array > xfs_ext_irec is responsible for 256 extents. (the er_extbuf is > PAGE_SIZE/4kb and an extent is 16 bytes which results in 256 extents) > That looks about right from the XFS_LINEAR_EXTS #define. I see the comment but I've yet to really dig into the in-core extent list data structures too deep to have any intuition or insight on a potential solution (and don't really have time to atm). Dave or others might already have an understanding of a limitation here. > > > > It looks like that is your problem here. 800k or so extents over 878G > > looks to be about 1MB per extent. Are you using extent size hints? One > > option that might prevent this is to use a larger extent size hint > > value. Another might be to preallocate the entire file up front with > > fallocate. You'd probably have to experiment with what option or value > > works best for your workload. 
> > By preallocating with fallocate you mean using fallocate with > FALLOC_FL_ZERO_RANGE and not FALLOC_FL_PUNCH_HOLE, right? Because as it > stands now the file does have holes, which presumably are being filled > and in order to be filled an extent has to be allocated which caused the > issue? Am I right in this reasoning? > You don't need either, but definitely not hole punch. ;) See 'man 2 fallocate' for the default behavior (mode == 0). The idea is that the allocation will occur with as large extents as possible, rather than small, fragmented extents as writes occur. This is more reasonable if you ultimately expect to use the entire file. > Currently I'm not using extents size hint but will look into that, also > if the extent size hint is say 4mb, wouldn't that cause a fairly serious > loss of space, provided that the writes are smaller than 4mb. Would XFS > try to perform some sort of extent coalescing or something else? I'm not > an FS developer but my understanding is that with a 4mb extent size, > whenever a new write occurs even if it's 256kb a new 4mb extent would be > allocated, no? > Yes, the extent size hint will "widen" allocations due to smaller writes to the full hint size and alignment. This results in extra space usage at first but reduces fragmentation over time as more of the file is used. E.g., subsequent writes within that 4m range of your previous 256k write will already have blocks allocated (as part of a larger, contiguous extent). The best bet is probably to experiment with your workload or look into your current file layout and try to choose a value that reduces fragmentation without sacrificing too much space efficiency. > And a final question - when i printed the contents of the inode with > xfs_db I get core.nextents = 972564 whereas invoking the xfs_bmap | wc > -l on the file always gives varying numbers? > I'd assume that the file is being actively modified..? 
I believe xfs_db will read values from disk, which might not be coherent with the latest in memory state, whereas bmap returns the latest layout of the file at the time (which could also change again by the time bmap returns). Brian > Thanks a lot for taking the time to reply. > > > > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
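The "widening" behaviour Brian describes can be sketched with a toy model (purely illustrative; real XFS delayed allocation and speculative preallocation are far more involved): each write allocates either exactly the range it touches, or that range rounded out to a hint-aligned, hint-sized region, and contiguous allocations merge into one extent.

```python
# Toy model: count resulting extents for a set of sparse writes,
# with and without an extent size hint.

def extents_after_writes(write_offsets, write_len, hint):
    """Number of distinct extents after the writes.

    Each write allocates [off, off+write_len) rounded out to `hint`
    alignment; touching/overlapping allocations merge into one extent.
    """
    ranges = []
    for off in write_offsets:
        start = (off // hint) * hint
        end = ((off + write_len + hint - 1) // hint) * hint
        ranges.append((start, end))
    ranges.sort()
    merged = []
    for s, e in ranges:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return len(merged)

KB, MB = 1024, 1024 * 1024
# Four sparse 64 KiB writes clustered near each of 100 spots 16 MiB apart:
offsets = [j * 16 * MB + k * 256 * KB for j in range(100) for k in range(4)]
print(extents_after_writes(offsets, 64 * KB, 4 * KB))  # 400 extents, no hint
print(extents_after_writes(offsets, 64 * KB, 4 * MB))  # 100 extents, 4 MiB hint
```

The hint trades some up-front space (each cluster consumes a full 4 MiB) for a 4x reduction in extent count, which is the trade-off to tune against the actual write pattern.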
* Re: Failing XFS memory allocation
  2016-03-23 13:10 ` Brian Foster
  2016-03-23 15:03 ` Nikolay Borisov
@ 2016-03-23 23:00 ` Dave Chinner
  2016-03-24 9:20 ` Nikolay Borisov
  2016-03-24 9:31 ` Christoph Hellwig
  1 sibling, 2 replies; 13+ messages in thread
From: Dave Chinner @ 2016-03-23 23:00 UTC (permalink / raw)
To: Brian Foster; +Cc: Nikolay Borisov, xfs

On Wed, Mar 23, 2016 at 09:10:59AM -0400, Brian Foster wrote:
> On Wed, Mar 23, 2016 at 02:56:25PM +0200, Nikolay Borisov wrote:
> > On 03/23/2016 02:43 PM, Brian Foster wrote:
> > > On Wed, Mar 23, 2016 at 12:15:42PM +0200, Nikolay Borisov wrote:
> ...
> > > It looks like it's working to add a new extent to the in-core extent
> > > list. If this is the stack associated with the warning message (combined
> > > with the large alloc size), I wonder if there's a fragmentation issue on
> > > the file leading to an excessive number of extents.
> >
> > Yes this is the stack trace associated.
> >
> > >
> > > What does 'xfs_bmap -v /storage/loop/file1' show?
> >
> > It spews a lot of stuff but here is a summary, more detailed info can be
> > provided if you need it:
> >
> > xfs_bmap -v /storage/loop/file1 | wc -l
> > 900908
> > xfs_bmap -v /storage/loop/file1 | grep -c hole
> > 94568
> >
> > Also, what would constitute an "excessive number of extents"?
> >
>
> I'm not sure where one would draw the line tbh, it's just a matter of
> having too many extents to the point that it causes problems in terms of
> performance (i.e., reading/modifying the extent list) or such as the
> allocation problem you're running into. As it is, XFS maintains the full
> extent list for an active inode in memory, so that's 800k+ extents that
> it's looking for memory for.
>
> It looks like that is your problem here. 800k or so extents over 878G
> looks to be about 1MB per extent.

Which I wouldn't call excessive. I use a 1MB extent size hint on all
my VM images as this allows the underlying device to do IOs large
enough to maintain close to full bandwidth when reading and writing
regions of the underlying image file that are non-contiguous w.r.t.
sequential IO from the guest.

Mind you, it's not until I use ext4 or btrfs in the guests that I
actually see significant increases in extent counts. Rule of thumb in
my testing is that if XFS creates 100k extents in the image file,
ext4 will create 500k, and btrfs will create somewhere between 1m
and 5m extents.... i.e. XFS as a guest filesystem results in much
lower image file fragmentation than the other options....

As it is, yes, the memory allocation problem is with the in-core
extent tree, and we've known about it for some time. The issue is
that as memory gets fragmented, the top-level indirection array
grows too large to be allocated as a contiguous chunk. When this
happens really depends on memory load, uptime and the way the extent
tree is being modified.

I'm working on prototype patches to convert it to an in-memory btree
but they are far from ready at this point. This isn't straightforward
because all the extent management code assumes extents are kept in a
linear array and can be directly indexed by array offset rather than
file offset. I also want to make sure we can demand page the extent
list if necessary, and that also complicates things like locking, as
we currently assume the extent list is either completely in memory or
not in memory at all.

Fundamentally, I don't want to repeat the mistakes ext4 and btrfs
have made with their fine-grained in-memory extent trees that are
based on rb-trees (e.g. global locks, shrinkers that don't scale or
consume way too much CPU, excessive memory consumption, etc) and so
solving all aspects of the problem in one go is somewhat complex.
And, of course, there's so much other stuff that needs to be done at
the same time, I cannot find much time to work on it at the moment...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
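[Editor's note: the numbers in the thread line up with a rough model of the in-core extent list. This is a back-of-envelope sketch, not XFS code; the sizes are assumptions from the 4.4-era layout — 16 bytes per packed extent record, 4 KiB extent buffers hanging off the indirection array, and 16 bytes per indirection entry on 64-bit. Notably, the failing 107168-byte allocation divides exactly into 6698 16-byte entries, which sits between the densely packed minimum for ~900k extents and the half-full case, consistent with partially filled buffers:]

```python
# Assumed sizes (4.4-era XFS, 64-bit, 4 KiB pages):
RECORD_BYTES = 16        # xfs_bmbt_rec: two 64-bit words
BUFFER_BYTES = 4096      # one extent buffer per indirection entry
IREC_BYTES = 16          # buffer pointer + extent offset + count
EXTS_PER_BUFFER = BUFFER_BYTES // RECORD_BYTES   # 256

def extent_list_footprint(nextents, fill=1.0):
    """Rough in-core footprint of the extent list for `nextents`
    extents, with indirection buffers a `fill` fraction full."""
    buffers = -(-nextents // int(EXTS_PER_BUFFER * fill))  # ceil division
    return {
        "record_bytes": nextents * RECORD_BYTES,
        "indirection_entries": buffers,
        "indirection_bytes": buffers * IREC_BYTES,
    }

# ~900k extents with completely full buffers:
print(extent_list_footprint(900_000))
# -> {'record_bytes': 14400000, 'indirection_entries': 3516,
#     'indirection_bytes': 56256}

# The failed allocation in the report was 107168 bytes:
print(107168 // IREC_BYTES)  # -> 6698 indirection entries
```

The point of the sketch: the extent records themselves total ~14 MB but live in many page-sized chunks, while the indirection array must be one physically contiguous allocation that grows with extent count, which is exactly what fails under memory fragmentation.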
* Re: Failing XFS memory allocation
  2016-03-23 23:00 ` Dave Chinner
@ 2016-03-24 9:20 ` Nikolay Borisov
  2016-03-24 21:58 ` Dave Chinner
  2016-03-24 9:31 ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Nikolay Borisov @ 2016-03-24 9:20 UTC (permalink / raw)
To: Dave Chinner, Brian Foster; +Cc: xfs

On 03/24/2016 01:00 AM, Dave Chinner wrote:
> On Wed, Mar 23, 2016 at 09:10:59AM -0400, Brian Foster wrote:
>> On Wed, Mar 23, 2016 at 02:56:25PM +0200, Nikolay Borisov wrote:
>>> On 03/23/2016 02:43 PM, Brian Foster wrote:
>>>> On Wed, Mar 23, 2016 at 12:15:42PM +0200, Nikolay Borisov wrote:
>> ...
>>>> It looks like it's working to add a new extent to the in-core extent
>>>> list. If this is the stack associated with the warning message (combined
>>>> with the large alloc size), I wonder if there's a fragmentation issue on
>>>> the file leading to an excessive number of extents.
>>>
>>> Yes this is the stack trace associated.
>>>
>>>>
>>>> What does 'xfs_bmap -v /storage/loop/file1' show?
>>>
>>> It spews a lot of stuff but here is a summary, more detailed info can be
>>> provided if you need it:
>>>
>>> xfs_bmap -v /storage/loop/file1 | wc -l
>>> 900908
>>> xfs_bmap -v /storage/loop/file1 | grep -c hole
>>> 94568
>>>
>>> Also, what would constitute an "excessive number of extents"?
>>>
>>
>> I'm not sure where one would draw the line tbh, it's just a matter of
>> having too many extents to the point that it causes problems in terms of
>> performance (i.e., reading/modifying the extent list) or such as the
>> allocation problem you're running into. As it is, XFS maintains the full
>> extent list for an active inode in memory, so that's 800k+ extents that
>> it's looking for memory for.
>>
>> It looks like that is your problem here. 800k or so extents over 878G
>> looks to be about 1MB per extent.
>
> Which I wouldn't call excessive. I use a 1MB extent size hint on all
> my VM images as this allows the underlying device to do IOs large
> enough to maintain close to full bandwidth when reading and writing
> regions of the underlying image file that are non-contiguous w.r.t.
> sequential IO from the guest.
>
> Mind you, it's not until I use ext4 or btrfs in the guests that I
> actually see significant increases in extent counts. Rule of thumb in
> my testing is that if XFS creates 100k extents in the image file,
> ext4 will create 500k, and btrfs will create somewhere between 1m
> and 5m extents.... i.e. XFS as a guest filesystem results in much
> lower image file fragmentation than the other options....
>
> As it is, yes, the memory allocation problem is with the in-core
> extent tree, and we've known about it for some time. The issue is
> that as memory gets fragmented, the top-level indirection array
> grows too large to be allocated as a contiguous chunk. When this
> happens really depends on memory load, uptime and the way the extent
> tree is being modified.

And what about the following completely crazy idea: switching order > 3
allocations to using vmalloc? I know this would incur a heavy
performance hit, but other than that, would it cause correctness
issues? Of course I'm not saying this should be implemented upstream,
rather asking whether it's worth having a go at experimenting with
this idea.

> I'm working on prototype patches to convert it to an in-memory btree
> but they are far from ready at this point. This isn't straightforward
> because all the extent management code assumes extents are kept in a
> linear array and can be directly indexed by array offset rather than
> file offset. I also want to make sure we can demand page the extent
> list if necessary, and that also complicates things like locking, as
> we currently assume the extent list is either completely in memory or
> not in memory at all.
>
> Fundamentally, I don't want to repeat the mistakes ext4 and btrfs
> have made with their fine-grained in-memory extent trees that are
> based on rb-trees (e.g. global locks, shrinkers that don't scale or
> consume way too much CPU, excessive memory consumption, etc) and so
> solving all aspects of the problem in one go is somewhat complex.
> And, of course, there's so much other stuff that needs to be done at
> the same time, I cannot find much time to work on it at the
> moment...
>
> Cheers,
>
> Dave.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Failing XFS memory allocation
  2016-03-24 9:20 ` Nikolay Borisov
@ 2016-03-24 21:58 ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2016-03-24 21:58 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: Brian Foster, xfs

On Thu, Mar 24, 2016 at 11:20:23AM +0200, Nikolay Borisov wrote:
> On 03/24/2016 01:00 AM, Dave Chinner wrote:
> > As it is, yes, the memory allocation problem is with the in-core
> > extent tree, and we've known about it for some time. The issue is
> > that as memory gets fragmented, the top-level indirection array
> > grows too large to be allocated as a contiguous chunk. When this
> > happens really depends on memory load, uptime and the way the extent
> > tree is being modified.
>
> And what about the following completely crazy idea: switching order > 3
> allocations to using vmalloc? I know this would incur a heavy
> performance hit, but other than that, would it cause correctness
> issues? Of course I'm not saying this should be implemented upstream,
> rather asking whether it's worth having a go at experimenting with
> this idea.

It's not an option, as many supported platforms have extremely limited
vmalloc space.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Failing XFS memory allocation
  2016-03-23 23:00 ` Dave Chinner
  2016-03-24 9:20 ` Nikolay Borisov
@ 2016-03-24 9:31 ` Christoph Hellwig
  2016-03-24 22:00 ` Dave Chinner
  1 sibling, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2016-03-24 9:31 UTC (permalink / raw)
To: Dave Chinner; +Cc: Brian Foster, Nikolay Borisov, xfs

On Thu, Mar 24, 2016 at 10:00:02AM +1100, Dave Chinner wrote:
> I'm working on prototype patches to convert it to an in-memory btree
> but they are far from ready at this point. This isn't straightforward
> because all the extent management code assumes extents are kept in a
> linear array and can be directly indexed by array offset rather than
> file offset. I also want to make sure we can demand page the extent
> list if necessary, and that also complicates things like locking, as
> we currently assume the extent list is either completely in memory or
> not in memory at all.

FYI, I did patches to get rid of almost all direct extent array access
a while ago, but I never bothered to post them as it seemed like too
much churn. Have you started that work yet, or would it be useful to
dust those up again?

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Failing XFS memory allocation
  2016-03-24 9:31 ` Christoph Hellwig
@ 2016-03-24 22:00 ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2016-03-24 22:00 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Brian Foster, Nikolay Borisov, xfs

On Thu, Mar 24, 2016 at 02:31:27AM -0700, Christoph Hellwig wrote:
> On Thu, Mar 24, 2016 at 10:00:02AM +1100, Dave Chinner wrote:
> > I'm working on prototype patches to convert it to an in-memory btree
> > but they are far from ready at this point. This isn't straightforward
> > because all the extent management code assumes extents are kept in a
> > linear array and can be directly indexed by array offset rather than
> > file offset. I also want to make sure we can demand page the extent
> > list if necessary, and that also complicates things like locking, as
> > we currently assume the extent list is either completely in memory or
> > not in memory at all.
>
> FYI, I did patches to get rid of almost all direct extent array access
> a while ago, but I never bothered to post them as it seemed like too
> much churn. Have you started that work yet, or would it be useful to
> dust those up again?

I've done bits of it, but haven't completed it - send me the patches
and I'll see which approach makes the most sense...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Failing XFS memory allocation
  2016-03-23 10:15 Failing XFS memory allocation Nikolay Borisov
  2016-03-23 12:43 ` Brian Foster
@ 2016-03-24 9:33 ` Christoph Hellwig
  2016-03-24 9:42 ` Nikolay Borisov
  1 sibling, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2016-03-24 9:33 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: xfs

[-- Attachment #1: Type: text/plain, Size: 242 bytes --]

Hi Nikolay,

can you give the patch below a spin? While it doesn't solve the root
cause, it makes many typical uses of kmem_realloc behave less badly, so
it should help with at least some of the less dramatic cases of very
fragmented files:

[-- Attachment #2: 0001-xfs-improve-kmem_realloc.patch --]
[-- Type: text/plain, Size: 4600 bytes --]

From 4cfef0d21729704c79dc26621a254e507ea372a7 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Thu, 17 Mar 2016 11:15:59 +0100
Subject: xfs: improve kmem_realloc

Use krealloc to implement our realloc function. This helps to avoid
new allocations if we are still in the slab bucket. At least for the
bmap btree root that's actually the common case.

This also allows removing the now unused oldsize argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/kmem.c                  | 26 +++++++++++++++-----------
 fs/xfs/kmem.h                  |  2 +-
 fs/xfs/libxfs/xfs_inode_fork.c | 10 +++-------
 fs/xfs/xfs_log_recover.c       |  2 +-
 fs/xfs/xfs_mount.c             |  1 -
 5 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index 686ba6f..339c696 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -93,19 +93,23 @@ kmem_zalloc_large(size_t size, xfs_km_flags_t flags)
 }
 
 void *
-kmem_realloc(const void *ptr, size_t newsize, size_t oldsize,
-             xfs_km_flags_t flags)
+kmem_realloc(const void *old, size_t newsize, xfs_km_flags_t flags)
 {
-        void *new;
+        int     retries = 0;
+        gfp_t   lflags = kmem_flags_convert(flags);
+        void    *ptr;
 
-        new = kmem_alloc(newsize, flags);
-        if (ptr) {
-                if (new)
-                        memcpy(new, ptr,
-                               ((oldsize < newsize) ? oldsize : newsize));
-                kmem_free(ptr);
-        }
-        return new;
+        do {
+                ptr = krealloc(old, newsize, lflags);
+                if (ptr || (flags & (KM_MAYFAIL|KM_NOSLEEP)))
+                        return ptr;
+                if (!(++retries % 100))
+                        xfs_err(NULL,
+        "%s(%u) possible memory allocation deadlock size %zu in %s (mode:0x%x)",
+                                current->comm, current->pid,
+                                newsize, __func__, lflags);
+                congestion_wait(BLK_RW_ASYNC, HZ/50);
+        } while (1);
 }
 
 void *
diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index d1c66e4..689f746 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -62,7 +62,7 @@ kmem_flags_convert(xfs_km_flags_t flags)
 
 extern void *kmem_alloc(size_t, xfs_km_flags_t);
 extern void *kmem_zalloc_large(size_t size, xfs_km_flags_t);
-extern void *kmem_realloc(const void *, size_t, size_t, xfs_km_flags_t);
+extern void *kmem_realloc(const void *, size_t, xfs_km_flags_t);
 
 static inline void kmem_free(const void *ptr)
 {
         kvfree(ptr);
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 4fbe226..d3d1477 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -542,7 +542,6 @@ xfs_iroot_realloc(
         new_max = cur_max + rec_diff;
         new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
         ifp->if_broot = kmem_realloc(ifp->if_broot, new_size,
-                                     XFS_BMAP_BROOT_SPACE_CALC(mp, cur_max),
                                      KM_SLEEP | KM_NOFS);
         op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
                                              ifp->if_broot_bytes);
@@ -686,7 +685,6 @@ xfs_idata_realloc(
                         ifp->if_u1.if_data = kmem_realloc(ifp->if_u1.if_data,
                                                           real_size,
-                                                          ifp->if_real_bytes,
                                                           KM_SLEEP | KM_NOFS);
                 }
         } else {
@@ -1402,8 +1400,7 @@ xfs_iext_realloc_direct(
         if (rnew_size != ifp->if_real_bytes) {
                 ifp->if_u1.if_extents =
                         kmem_realloc(ifp->if_u1.if_extents,
-                                     rnew_size,
-                                     ifp->if_real_bytes, KM_NOFS);
+                                     rnew_size, KM_NOFS);
         }
         if (rnew_size > ifp->if_real_bytes) {
                 memset(&ifp->if_u1.if_extents[ifp->if_bytes /
@@ -1487,9 +1484,8 @@ xfs_iext_realloc_indirect(
         if (new_size == 0) {
                 xfs_iext_destroy(ifp);
         } else {
-                ifp->if_u1.if_ext_irec = (xfs_ext_irec_t *)
-                        kmem_realloc(ifp->if_u1.if_ext_irec,
-                                new_size, size, KM_NOFS);
+                ifp->if_u1.if_ext_irec =
+                        kmem_realloc(ifp->if_u1.if_ext_irec, new_size, KM_NOFS);
         }
 }
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 396565f..bf6e807 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3843,7 +3843,7 @@ xlog_recover_add_to_cont_trans(
         old_ptr = item->ri_buf[item->ri_cnt-1].i_addr;
         old_len = item->ri_buf[item->ri_cnt-1].i_len;
 
-        ptr = kmem_realloc(old_ptr, len+old_len, old_len, KM_SLEEP);
+        ptr = kmem_realloc(old_ptr, len + old_len, KM_SLEEP);
         memcpy(&ptr[old_len], dp, len);
         item->ri_buf[item->ri_cnt-1].i_len += len;
         item->ri_buf[item->ri_cnt-1].i_addr = ptr;
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 536a0ee..654799f 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -89,7 +89,6 @@ xfs_uuid_mount(
         if (hole < 0) {
                 xfs_uuid_table = kmem_realloc(xfs_uuid_table,
                         (xfs_uuid_table_size + 1) * sizeof(*xfs_uuid_table),
-                        xfs_uuid_table_size * sizeof(*xfs_uuid_table),
                         KM_SLEEP);
                 hole = xfs_uuid_table_size++;
         }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread
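[Editor's note: the control flow of the patch's new kmem_realloc() — return immediately if the caller can tolerate failure, otherwise retry forever, warning every 100 attempts and backing off — can be sketched as a user-space analogue. This is illustrative Python, not the kernel code; `realloc_retry` and its parameters are invented names, and `time.sleep` stands in for the kernel's congestion_wait():]

```python
import time

def realloc_retry(try_alloc, size, may_fail=False,
                  warn_every=100, backoff_s=0.02, log=print):
    """Retry an allocation until it succeeds, mirroring the patch's
    do/while loop: return immediately (possibly None) if the caller
    can tolerate failure, otherwise warn every `warn_every` attempts
    and back off briefly before retrying."""
    retries = 0
    while True:
        buf = try_alloc(size)
        if buf is not None or may_fail:
            return buf
        retries += 1
        if retries % warn_every == 0:
            log(f"possible memory allocation deadlock size {size}")
        time.sleep(backoff_s)

# A flaky allocator that fails the first two attempts:
attempts = {"n": 0}
def flaky(size):
    attempts["n"] += 1
    return bytearray(size) if attempts["n"] > 2 else None

buf = realloc_retry(flaky, 64, backoff_s=0)
print(len(buf), attempts["n"])  # -> 64 3
```

The design point being demonstrated: callers passing KM_MAYFAIL/KM_NOSLEEP (here `may_fail=True`) get fail-fast semantics, while everyone else gets the warn-and-wait loop that produced the "possible memory allocation deadlock" messages in the original report.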
* Re: Failing XFS memory allocation
  2016-03-24 9:33 ` Christoph Hellwig
@ 2016-03-24 9:42 ` Nikolay Borisov
  0 siblings, 0 replies; 13+ messages in thread
From: Nikolay Borisov @ 2016-03-24 9:42 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs

On 03/24/2016 11:33 AM, Christoph Hellwig wrote:
> Hi Nikolay,
>
> can you give the patch below a spin? While it doesn't solve the root
> cause, it makes many typical uses of kmem_realloc behave less badly,
> so it should help with at least some of the less dramatic cases of
> very fragmented files:
>

Sure. However, I just checked some other servers with an analogous
setup and there are files with even larger extent counts (the largest
I saw was 2 million), so I guess in this particular case memory was
fragmented and compaction, as invoked from the page allocator, couldn't
satisfy the allocation. So I don't know whether it will help in my
particular case, but I will give it a go in any case.

Thanks

^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads: [~2016-03-24 22:00 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2016-03-23 10:15 Failing XFS memory allocation Nikolay Borisov
2016-03-23 12:43 ` Brian Foster
2016-03-23 12:56 ` Nikolay Borisov
2016-03-23 13:10 ` Brian Foster
2016-03-23 15:03 ` Nikolay Borisov
2016-03-23 16:58 ` Brian Foster
2016-03-23 23:00 ` Dave Chinner
2016-03-24 9:20 ` Nikolay Borisov
2016-03-24 21:58 ` Dave Chinner
2016-03-24 9:31 ` Christoph Hellwig
2016-03-24 22:00 ` Dave Chinner
2016-03-24 9:33 ` Christoph Hellwig
2016-03-24 9:42 ` Nikolay Borisov