From: Nikolay Borisov <nborisov@suse.com>
To: dsterba@suse.cz, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 1/7] btrfs: Preallocate chunks in cow_file_range_async
Date: Thu, 28 Mar 2019 14:52:37 +0200
Message-ID: <7dedd732-b3dd-8b2c-c6f5-9cb63fc899d3@suse.com>
In-Reply-To: <04687d49-6642-54bf-1a8f-18f6b54465a0@suse.com>
On 28.03.19 at 14:49, Nikolay Borisov wrote:
>
>
> On 27.03.19 at 19:23, David Sterba wrote:
>> On Tue, Mar 12, 2019 at 05:20:24PM +0200, Nikolay Borisov wrote:
>>> @@ -1190,45 +1201,71 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
>>> unsigned int write_flags)
>>> {
>>> struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>>> - struct async_cow *async_cow;
>>> + struct async_cow *ctx;
>>> + struct async_chunk *async_chunk;
>>> unsigned long nr_pages;
>>> u64 cur_end;
>>> + u64 num_chunks = DIV_ROUND_UP(end - start, SZ_512K);
>>> + int i;
>>> + bool should_compress;
>>>
>>> clear_extent_bit(&BTRFS_I(inode)->io_tree, start, end, EXTENT_LOCKED,
>>> 1, 0, NULL);
>>> - while (start < end) {
>>> - async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS);
>>> - BUG_ON(!async_cow); /* -ENOMEM */
>>> +
>>> + if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS &&
>>> + !btrfs_test_opt(fs_info, FORCE_COMPRESS)) {
>>> + num_chunks = 1;
>>> + should_compress = false;
>>> + } else {
>>> + should_compress = true;
>>> + }
>>> +
>>> + ctx = kmalloc(struct_size(ctx, chunks, num_chunks), GFP_NOFS);
>>
>> This leads to OOM due to high order allocation. And this is worse than
>> the previous state, where there are many small allocations that could
>> potentially fail (but most likely will not due to GFP_NOFS and size <
>> PAGE_SIZE).
>>
>> So this needs to be reworked to avoid the costly allocations or reverted
>> to the previous state.
>
> Right, makes sense. To keep the submission logic simple, I see two
> options: rework the allocation into a loop that allocates a single item
> for every chunk, or switch to using kvmalloc. I think the fact that
> vmalloced memory might not be physically contiguous is not critical
> for the metadata structures in this case?

Just had a quick read through gfp_mask-from-fs-io.rst:

vmalloc doesn't support GFP_NOFS semantic because there are hardcoded
GFP_KERNEL allocations deep inside the allocator which are quite
non-trivial to fix up. That means that calling ``vmalloc`` with
GFP_NOFS/GFP_NOIO is almost always a bug. The good news is that the
NOFS/NOIO semantic can be achieved by the scope API.
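
To make the scope API idea concrete, here is a rough userspace model of
the save/restore pattern. It is only a sketch: everything prefixed
model_ is a hypothetical stand-in (a global instead of current->flags,
malloc() instead of kvmalloc()), and the kernel-side pattern is shown
only in the comment, not claimed to be what the final patch does.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Userspace model of the NOFS scope API. In the kernel, the pattern
 * would look roughly like this (helpers from <linux/sched/mm.h>):
 *
 *     unsigned int nofs = memalloc_nofs_save();
 *     ctx = kvmalloc(struct_size(ctx, chunks, num_chunks), GFP_KERNEL);
 *     memalloc_nofs_restore(nofs);
 *
 * Everything prefixed model_ below is a hypothetical stand-in, not
 * kernel API.
 */
#define MODEL_PF_NOFS 0x1u	/* stand-in for the PF_MEMALLOC_NOFS task flag */

static unsigned int model_task_flags;	/* stand-in for current->flags */

/* Enter a NOFS scope and return the previous state of the flag. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave the scope by putting the saved flag state back. */
static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

static int model_in_nofs_scope(void)
{
	return (model_task_flags & MODEL_PF_NOFS) != 0;
}

/*
 * Allocate inside a NOFS scope. malloc() stands in for
 * kvmalloc(..., GFP_KERNEL); *was_nofs records whether the scope was
 * active at allocation time.
 */
static void *model_alloc_ctx(size_t bytes, int *was_nofs)
{
	unsigned int nofs;
	void *p;

	assert(was_nofs != NULL);
	nofs = model_nofs_save();
	p = malloc(bytes);
	*was_nofs = model_in_nofs_scope();
	model_nofs_restore(nofs);
	return p;
}
```

Because restore puts back the saved state rather than clearing the flag,
scopes nest: an inner save/restore pair leaves an outer scope active.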
>
>>
>> btrfs/138 [19:44:05][ 4034.368157] run fstests btrfs/138 at 2019-03-25 19:44:05
>> [ 4034.559716] BTRFS: device fsid 9300f07a-78f4-4ac6-8376-1a902ef26830 devid 1 transid 5 /dev/vdb
>> [ 4034.573670] BTRFS info (device vdb): disk space caching is enabled
>> [ 4034.575068] BTRFS info (device vdb): has skinny extents
>> [ 4034.576258] BTRFS info (device vdb): flagging fs with big metadata feature
>> [ 4034.580226] BTRFS info (device vdb): checking UUID tree
>> [ 4066.104734] BTRFS info (device vdb): disk space caching is enabled
>> [ 4066.108558] BTRFS info (device vdb): has skinny extents
>> [ 4066.186856] BTRFS info (device vdb): setting 8 feature flag
>> [ 4074.017307] BTRFS info (device vdb): disk space caching is enabled
>> [ 4074.019646] BTRFS info (device vdb): has skinny extents
>> [ 4074.065117] BTRFS info (device vdb): setting 16 feature flag
>> [ 4075.787401] kworker/u8:12: page allocation failure: order:4, mode:0x604040(GFP_NOFS|__GFP_COMP), nodemask=(null)
>> [ 4075.789581] CPU: 0 PID: 31258 Comm: kworker/u8:12 Not tainted 5.0.0-rc8-default+ #524
>> [ 4075.791235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014
>> [ 4075.793334] Workqueue: writeback wb_workfn (flush-btrfs-718)
>> [ 4075.794455] Call Trace:
>> [ 4075.795029] dump_stack+0x67/0x90
>> [ 4075.795756] warn_alloc.cold.131+0x73/0xf3
>> [ 4075.796601] __alloc_pages_slowpath+0xa0e/0xb50
>> [ 4075.797595] ? __wake_up_common_lock+0x89/0xc0
>> [ 4075.798558] __alloc_pages_nodemask+0x2bd/0x310
>> [ 4075.799537] kmalloc_order+0x14/0x60
>> [ 4075.800382] kmalloc_order_trace+0x1d/0x120
>> [ 4075.801341] btrfs_run_delalloc_range+0x3e6/0x4b0 [btrfs]
>> [ 4075.802344] writepage_delalloc+0xf8/0x150 [btrfs]
>> [ 4075.802991] __extent_writepage+0x113/0x420 [btrfs]
>> [ 4075.803640] extent_write_cache_pages+0x2a6/0x400 [btrfs]
>> [ 4075.804340] extent_writepages+0x52/0xa0 [btrfs]
>> [ 4075.804951] do_writepages+0x3e/0xe0
>> [ 4075.805480] ? writeback_sb_inodes+0x133/0x550
>> [ 4075.806406] __writeback_single_inode+0x54/0x640
>> [ 4075.807315] writeback_sb_inodes+0x204/0x550
>> [ 4075.808112] __writeback_inodes_wb+0x5d/0xb0
>> [ 4075.808692] wb_writeback+0x337/0x4a0
>> [ 4075.809207] wb_workfn+0x3a7/0x590
>> [ 4075.809849] process_one_work+0x246/0x610
>> [ 4075.810665] worker_thread+0x3c/0x390
>> [ 4075.811415] ? rescuer_thread+0x360/0x360
>> [ 4075.812293] kthread+0x116/0x130
>> [ 4075.812965] ? kthread_create_on_node+0x60/0x60
>> [ 4075.813870] ret_from_fork+0x24/0x30
>> [ 4075.814664] Mem-Info:
>> [ 4075.815167] active_anon:2942 inactive_anon:15105 isolated_anon:0
>> [ 4075.815167] active_file:2749 inactive_file:454876 isolated_file:0
>> [ 4075.815167] unevictable:0 dirty:68316 writeback:0 unstable:0
>> [ 4075.815167] slab_reclaimable:5500 slab_unreclaimable:6458
>> [ 4075.815167] mapped:940 shmem:15483 pagetables:51 bounce:0
>> [ 4075.815167] free:7068 free_pcp:297 free_cma:0
>> [ 4075.823236] Node 0 active_anon:11768kB inactive_anon:60420kB active_file:10996kB inactive_file:1827676kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3760kB dirty:277360kB writeback:0kB shmem:61932kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
>> [ 4075.828200] Node 0 DMA free:7860kB min:44kB low:56kB high:68kB active_anon:0kB inactive_anon:4kB active_file:0kB inactive_file:8012kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>> [ 4075.834484] lowmem_reserve[]: 0 1955 1955 1955
>> [ 4075.835419] Node 0 DMA32 free:11292kB min:5632kB low:7632kB high:9632kB active_anon:11768kB inactive_anon:60416kB active_file:10996kB inactive_file:1820532kB unevictable:0kB writepending:281184kB present:2080568kB managed:2009324kB mlocked:0kB kernel_stack:1984kB pagetables:204kB bounce:0kB free_pcp:132kB local_pcp:0kB free_cma:0k
>> [ 4075.841848] lowmem_reserve[]: 0 0 0 0
>> [ 4075.842677] Node 0 DMA: 1*4kB (U) 2*8kB (U) 4*16kB (UME) 5*32kB (UME) 1*64kB (E) 3*128kB (UME) 2*256kB (UE) 1*512kB (E) 2*1024kB (UE) 2*2048kB (ME) 0*4096kB = 7860kB
>> [ 4075.844961] Node 0 DMA32: 234*4kB (UME) 238*8kB (UME) 426*16kB (UM) 43*32kB (UM) 28*64kB (UM) 11*128kB (UM) 0*256kB 0*512kB 0*1024kB 1*2048kB (H) 0*4096kB = 16280kB
>> [ 4075.847915] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>> [ 4075.849266] 474599 total pagecache pages
>> [ 4075.850058] 0 pages in swap cache
>> [ 4075.850808] Swap cache stats: add 0, delete 0, find 0/0
>> [ 4075.851990] Free swap = 0kB
>> [ 4075.852811] Total swap = 0kB
>> [ 4075.853635] 524140 pages RAM
>> [ 4075.854351] 0 pages HighMem/MovableOnly
>> [ 4075.855048] 17832 pages reserved
>>
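
For reference, a back-of-the-envelope sketch of how the single
struct_size() allocation can reach order:4 as in the trace above. The
numbers are assumptions, not taken from the thread: 4 KiB pages, a
128 MiB range, an 8-byte header and 200 bytes per chunk entry (the real
sizeof(struct async_chunk) is not stated here). The helper names are
hypothetical as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_512K	(512u * 1024u)
#define PAGE_SZ	4096u	/* assumption: 4 KiB pages */

/* Same rounding as DIV_ROUND_UP(end - start, SZ_512K) in the patch. */
static uint64_t num_chunks(uint64_t start, uint64_t end)
{
	assert(end > start);
	return (end - start + SZ_512K - 1) / SZ_512K;
}

/* Smallest order such that (PAGE_SZ << order) >= bytes. */
static unsigned int alloc_order(size_t bytes)
{
	unsigned int order = 0;

	while (((size_t)PAGE_SZ << order) < bytes)
		order++;
	return order;
}

/*
 * Size of one header plus n trailing chunk entries, i.e. what
 * struct_size(ctx, chunks, n) computes. header_bytes and chunk_bytes
 * are hypothetical inputs.
 */
static size_t ctx_bytes(size_t header_bytes, size_t chunk_bytes, uint64_t n)
{
	return header_bytes + chunk_bytes * (size_t)n;
}
```

With those assumed numbers, a 128 MiB range gives 256 chunks and
8 + 256 * 200 = 51208 bytes, which needs a 64 KiB (order 4) contiguous
allocation, while a single 200-byte entry fits well inside one page.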