Re: [RFC PATCH] fs: ext4: don't trap kswapd and allocating tasks on ext4 inode IO

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Jan Kara <jack@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jan Kara <jack@suse.com>, "Theodore Ts'o" <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [RFC PATCH] fs: ext4: don't trap kswapd and allocating tasks on ext4 inode IO
Date: Tue, 16 May 2017 16:36:45 +0200	[thread overview]
Message-ID: <20170516143645.GA7316@quack2.suse.cz> (raw)
In-Reply-To: <20170515154634.19733-1-hannes@cmpxchg.org>

On Mon 15-05-17 11:46:34, Johannes Weiner wrote:
> We have observed across several workloads situations where kswapd and
> direct reclaimers get stuck in the inode shrinker of the ext4 / mount,
> causing allocation latencies across tasks in the system, while there
> are dozens of gigabytes of clean page cache covering multiple disks.
> 
> The stack traces of such an instance looks like this:
> 
> [<ffffffff812b3225>] jbd2_log_wait_commit+0x95/0x110
> [<ffffffff812b4f29>] jbd2_complete_transaction+0x59/0x90
> [<ffffffff812668da>] ext4_evict_inode+0x2da/0x480
> [<ffffffff811f2230>] evict+0xc0/0x190
> [<ffffffff811f2339>] dispose_list+0x39/0x50
> [<ffffffff811f323b>] prune_icache_sb+0x4b/0x60
> [<ffffffff811dba71>] super_cache_scan+0x141/0x190
> [<ffffffff8116e755>] shrink_slab+0x235/0x440
> [<ffffffff81172b48>] shrink_zone+0x268/0x2d0
> [<ffffffff81172f04>] do_try_to_free_pages+0x164/0x410
> [<ffffffff81173265>] try_to_free_pages+0xb5/0x160
> [<ffffffff811656b6>] __alloc_pages_nodemask+0x636/0xb30
> [<ffffffff811acac8>] alloc_pages_current+0x88/0x120
> [<ffffffff816d4e46>] skb_page_frag_refill+0xc6/0xf0
> [<ffffffff816d4e8d>] sk_page_frag_refill+0x1d/0x80
> [<ffffffff8173f86b>] tcp_sendmsg+0x28b/0xb10
> [<ffffffff81769727>] inet_sendmsg+0x67/0xa0
> [<ffffffff816d0488>] sock_sendmsg+0x38/0x50
> [<ffffffff816d0518>] sock_write_iter+0x78/0xd0
> [<ffffffff811d774e>] do_iter_readv_writev+0x5e/0xa0
> [<ffffffff811d8468>] do_readv_writev+0x178/0x210
> [<ffffffff811d871c>] vfs_writev+0x3c/0x50
> [<ffffffff811d8782>] do_writev+0x52/0xd0
> [<ffffffff811d9810>] SyS_writev+0x10/0x20
> [<ffffffff81002910>] do_syscall_64+0x50/0xa0
> [<ffffffff817eed3c>] return_from_SYSCALL_64+0x0/0x6a
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> The inode shrinker has provisions to skip any inodes that require
> writeback, to avoid tarpitting the entire system behind a single
> object when there are many other pools to recycle memory from. But
> that logic doesn't cover the situation where an ext4 inode is clean
> but journaled and tied to a commit that yet needs to hit the platter.
> 
> Add a superblock operation that lets the generic inode shrinker query
> the filesystem whether evicting a given inode will require any IO; add
> an ext4 implementation that checks whether the journal is caught up to
> the commit id associated with the inode.
> 
> Fixes: 2d859db3e4a8 ("ext4: fix data corruption in inodes with journalled data")
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

OK. I have to say I'm somewhat surprised you use data journalling on some
of your files / filesystems but whatever - maybe these are long symlink
after all which would make sense. And I'm actually doubly surprised you can
see these stack traces as these days inode_lru_isolate() checks
inode->i_data.nrpages and uncommitted pages cannot be evicted from
pagecache (ext4_releasepage() will refuse to free them) so I don't see how
such inode can get to dispose_list(). But maybe the inode doesn't really
have any pages and i_datasync_tid just happens to be set to the current
transaction because it is initialized that way and we are evicting inode
that was recently read from disk.

Anyway if you add: "&& inode->i_data.nrpages" to the test in
ext4_evict_inode() do the stalls go away?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

next prev parent reply	other threads:[~2017-05-16 14:36 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-15 15:46 [RFC PATCH] fs: ext4: don't trap kswapd and allocating tasks on ext4 inode IO Johannes Weiner
2017-05-16  1:34 ` [RFC PATCH] " Andreas Dilger
2017-05-16 14:36 ` Jan Kara [this message]
2017-05-16 15:41   ` [RFC PATCH] fs: " Johannes Weiner
2017-05-16 16:03     ` Jan Kara
2017-06-12  8:09       ` Jan Kara
2017-06-12 14:37         ` Johannes Weiner
2017-06-12 15:19           ` Jan Kara
2017-06-12 21:51             ` Johannes Weiner
2017-07-11  7:16         ` Jerry Lee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170516143645.GA7316@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=hannes@cmpxchg.org \
    --cc=jack@suse.com \
    --cc=kernel-team@fb.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox