From: Nikolay Borisov <nborisov@suse.com>
To: dsterba@suse.cz, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 1/7] btrfs: Preallocate chunks in cow_file_range_async
Date: Thu, 28 Mar 2019 14:52:37 +0200
Message-ID: <7dedd732-b3dd-8b2c-c6f5-9cb63fc899d3@suse.com>
In-Reply-To: <04687d49-6642-54bf-1a8f-18f6b54465a0@suse.com>



On 28.03.19 at 14:49, Nikolay Borisov wrote:
> 
> 
> On 27.03.19 at 19:23, David Sterba wrote:
>> On Tue, Mar 12, 2019 at 05:20:24PM +0200, Nikolay Borisov wrote:
>>> @@ -1190,45 +1201,71 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
>>>  				unsigned int write_flags)
>>>  {
>>>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>>> -	struct async_cow *async_cow;
>>> +	struct async_cow *ctx;
>>> +	struct async_chunk *async_chunk;
>>>  	unsigned long nr_pages;
>>>  	u64 cur_end;
>>> +	u64 num_chunks = DIV_ROUND_UP(end - start, SZ_512K);
>>> +	int i;
>>> +	bool should_compress;
>>>  
>>>  	clear_extent_bit(&BTRFS_I(inode)->io_tree, start, end, EXTENT_LOCKED,
>>>  			 1, 0, NULL);
>>> -	while (start < end) {
>>> -		async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS);
>>> -		BUG_ON(!async_cow); /* -ENOMEM */
>>> +
>>> +	if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS &&
>>> +	    !btrfs_test_opt(fs_info, FORCE_COMPRESS)) {
>>> +		num_chunks = 1;
>>> +		should_compress = false;
>>> +	} else {
>>> +		should_compress = true;
>>> +	}
>>> +
>>> +	ctx = kmalloc(struct_size(ctx, chunks, num_chunks), GFP_NOFS);
>>
>> This leads to OOM due to the high-order allocation. And this is worse
>> than the previous state, where there were many small allocations that
>> could potentially fail (but most likely will not, due to GFP_NOFS and
>> size < PAGE_SIZE).
>>
>> So this needs to be reworked to avoid the costly allocations or reverted
>> to the previous state.
> 
> Right, makes sense. To keep the submission logic simple I'm thinking of
> either reworking the allocation into a loop that allocates a single
> item for every chunk, or alternatively switching to kvmalloc. I think
> the fact that vmalloc'ed memory might not be physically contiguous is
> not critical for the metadata structures in this case?
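
For the loop variant I have something like this in mind (rough, untested
sketch; the field names are only illustrative, since the struct layout in
this series may still change):

	struct async_chunk *async_chunk;
	u64 cur_start = start;
	int i;

	for (i = 0; i < num_chunks; i++) {
		/* one small allocation per 512K chunk, well below PAGE_SIZE */
		async_chunk = kmalloc(sizeof(*async_chunk), GFP_NOFS);
		if (!async_chunk)
			return -ENOMEM; /* after freeing chunks allocated so far */

		async_chunk->start = cur_start;
		async_chunk->end = min(cur_start + SZ_512K - 1, end);
		/* fill in inode, locked_page, write_flags, compression flag */

		cur_start = async_chunk->end + 1;
	}

That keeps every allocation order-0, at the cost of losing the single
contiguous array and needing per-chunk error unwinding.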

Just had a quick read through gfp_mask-from-fs-io.rst:

vmalloc doesn't support GFP_NOFS semantic because there are hardcoded
GFP_KERNEL allocations deep inside the allocator which are quite
non-trivial to fix up. That means that calling ``vmalloc`` with
GFP_NOFS/GFP_NOIO is almost always a bug. The good news is that the
NOFS/NOIO semantic can be achieved by the scope API.
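
So if we do go with kvmalloc it would have to be wrapped in the scope
API, roughly like this (untested sketch, needs linux/sched/mm.h):

	unsigned int nofs_flag;

	/* everything allocated inside the scope implicitly behaves as NOFS */
	nofs_flag = memalloc_nofs_save();
	ctx = kvmalloc(struct_size(ctx, chunks, num_chunks), GFP_KERNEL);
	memalloc_nofs_restore(nofs_flag);
	if (!ctx)
		return -ENOMEM;

	/* ... and kvfree(ctx) instead of kfree() once the work is done */

memalloc_nofs_save() makes every allocation in the scope drop __GFP_FS,
so kvmalloc can legitimately be called with GFP_KERNEL and still fall
back to vmalloc for the larger chunk arrays.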


> 
>>
>> btrfs/138               [19:44:05][ 4034.368157] run fstests btrfs/138 at 2019-03-25 19:44:05
>> [ 4034.559716] BTRFS: device fsid 9300f07a-78f4-4ac6-8376-1a902ef26830 devid 1 transid 5 /dev/vdb
>> [ 4034.573670] BTRFS info (device vdb): disk space caching is enabled
>> [ 4034.575068] BTRFS info (device vdb): has skinny extents
>> [ 4034.576258] BTRFS info (device vdb): flagging fs with big metadata feature
>> [ 4034.580226] BTRFS info (device vdb): checking UUID tree
>> [ 4066.104734] BTRFS info (device vdb): disk space caching is enabled
>> [ 4066.108558] BTRFS info (device vdb): has skinny extents
>> [ 4066.186856] BTRFS info (device vdb): setting 8 feature flag
>> [ 4074.017307] BTRFS info (device vdb): disk space caching is enabled
>> [ 4074.019646] BTRFS info (device vdb): has skinny extents
>> [ 4074.065117] BTRFS info (device vdb): setting 16 feature flag
>> [ 4075.787401] kworker/u8:12: page allocation failure: order:4, mode:0x604040(GFP_NOFS|__GFP_COMP), nodemask=(null)
>> [ 4075.789581] CPU: 0 PID: 31258 Comm: kworker/u8:12 Not tainted 5.0.0-rc8-default+ #524
>> [ 4075.791235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014
>> [ 4075.793334] Workqueue: writeback wb_workfn (flush-btrfs-718)
>> [ 4075.794455] Call Trace:
>> [ 4075.795029]  dump_stack+0x67/0x90
>> [ 4075.795756]  warn_alloc.cold.131+0x73/0xf3
>> [ 4075.796601]  __alloc_pages_slowpath+0xa0e/0xb50
>> [ 4075.797595]  ? __wake_up_common_lock+0x89/0xc0
>> [ 4075.798558]  __alloc_pages_nodemask+0x2bd/0x310
>> [ 4075.799537]  kmalloc_order+0x14/0x60
>> [ 4075.800382]  kmalloc_order_trace+0x1d/0x120
>> [ 4075.801341]  btrfs_run_delalloc_range+0x3e6/0x4b0 [btrfs]
>> [ 4075.802344]  writepage_delalloc+0xf8/0x150 [btrfs]
>> [ 4075.802991]  __extent_writepage+0x113/0x420 [btrfs]
>> [ 4075.803640]  extent_write_cache_pages+0x2a6/0x400 [btrfs]
>> [ 4075.804340]  extent_writepages+0x52/0xa0 [btrfs]
>> [ 4075.804951]  do_writepages+0x3e/0xe0
>> [ 4075.805480]  ? writeback_sb_inodes+0x133/0x550
>> [ 4075.806406]  __writeback_single_inode+0x54/0x640
>> [ 4075.807315]  writeback_sb_inodes+0x204/0x550
>> [ 4075.808112]  __writeback_inodes_wb+0x5d/0xb0
>> [ 4075.808692]  wb_writeback+0x337/0x4a0
>> [ 4075.809207]  wb_workfn+0x3a7/0x590
>> [ 4075.809849]  process_one_work+0x246/0x610
>> [ 4075.810665]  worker_thread+0x3c/0x390
>> [ 4075.811415]  ? rescuer_thread+0x360/0x360
>> [ 4075.812293]  kthread+0x116/0x130
>> [ 4075.812965]  ? kthread_create_on_node+0x60/0x60
>> [ 4075.813870]  ret_from_fork+0x24/0x30
>> [ 4075.814664] Mem-Info:
>> [ 4075.815167] active_anon:2942 inactive_anon:15105 isolated_anon:0
>> [ 4075.815167]  active_file:2749 inactive_file:454876 isolated_file:0
>> [ 4075.815167]  unevictable:0 dirty:68316 writeback:0 unstable:0
>> [ 4075.815167]  slab_reclaimable:5500 slab_unreclaimable:6458
>> [ 4075.815167]  mapped:940 shmem:15483 pagetables:51 bounce:0
>> [ 4075.815167]  free:7068 free_pcp:297 free_cma:0
>> [ 4075.823236] Node 0 active_anon:11768kB inactive_anon:60420kB active_file:10996kB inactive_file:1827676kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3760kB dirty:277360kB writeback:0kB shmem:61932kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
>> [ 4075.828200] Node 0 DMA free:7860kB min:44kB low:56kB high:68kB active_anon:0kB inactive_anon:4kB active_file:0kB inactive_file:8012kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>> [ 4075.834484] lowmem_reserve[]: 0 1955 1955 1955
>> [ 4075.835419] Node 0 DMA32 free:11292kB min:5632kB low:7632kB high:9632kB active_anon:11768kB inactive_anon:60416kB active_file:10996kB inactive_file:1820532kB unevictable:0kB writepending:281184kB present:2080568kB managed:2009324kB mlocked:0kB kernel_stack:1984kB pagetables:204kB bounce:0kB free_pcp:132kB local_pcp:0kB free_cma:0k 
>> [ 4075.841848] lowmem_reserve[]: 0 0 0 0
>> [ 4075.842677] Node 0 DMA: 1*4kB (U) 2*8kB (U) 4*16kB (UME) 5*32kB (UME) 1*64kB (E) 3*128kB (UME) 2*256kB (UE) 1*512kB (E) 2*1024kB (UE) 2*2048kB (ME) 0*4096kB = 7860kB
>> [ 4075.844961] Node 0 DMA32: 234*4kB (UME) 238*8kB (UME) 426*16kB (UM) 43*32kB (UM) 28*64kB (UM) 11*128kB (UM) 0*256kB 0*512kB 0*1024kB 1*2048kB (H) 0*4096kB = 16280kB
>> [ 4075.847915] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>> [ 4075.849266] 474599 total pagecache pages
>> [ 4075.850058] 0 pages in swap cache
>> [ 4075.850808] Swap cache stats: add 0, delete 0, find 0/0
>> [ 4075.851990] Free swap  = 0kB
>> [ 4075.852811] Total swap = 0kB
>> [ 4075.853635] 524140 pages RAM
>> [ 4075.854351] 0 pages HighMem/MovableOnly
>> [ 4075.855048] 17832 pages reserved
>>

Thread overview: 20+ messages
2019-03-12 15:20 [PATCH v4 0/7] Compress path cleanups Nikolay Borisov
2019-03-12 15:20 ` [PATCH 1/7] btrfs: Preallocate chunks in cow_file_range_async Nikolay Borisov
2019-03-12 15:35   ` Johannes Thumshirn
2019-03-27 17:23   ` David Sterba
2019-03-28 12:49     ` Nikolay Borisov
2019-03-28 12:52       ` Nikolay Borisov [this message]
2019-03-28 14:11       ` David Sterba
2019-03-28 15:10         ` Nikolay Borisov
2019-03-28 15:45           ` David Sterba
2019-03-12 15:20 ` [PATCH 2/7] btrfs: Rename async_cow to async_chunk Nikolay Borisov
2019-03-12 15:34   ` Johannes Thumshirn
2019-03-12 15:20 ` [PATCH 3/7] btrfs: Remove fs_info from struct async_chunk Nikolay Borisov
2019-03-12 15:20 ` [PATCH 4/7] btrfs: Make compress_file_range take only " Nikolay Borisov
2019-03-12 15:20 ` [PATCH 5/7] btrfs: Replace clear_extent_bit with unlock_extent Nikolay Borisov
2019-03-12 15:27   ` Johannes Thumshirn
2019-03-12 15:20 ` [PATCH 6/7] btrfs: Set iotree only once in submit_compressed_extents Nikolay Borisov
2019-03-12 15:30   ` Johannes Thumshirn
2019-03-12 15:20 ` [PATCH 7/7] btrfs: Factor out common extent locking code " Nikolay Borisov
2019-03-12 15:31   ` Johannes Thumshirn
2019-03-13 15:36 ` [PATCH v4 0/7] Compress path cleanups David Sterba
