From: Brian Foster <bfoster@redhat.com>
To: Alexander Polakov <apolyakov@beget.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-xfs@vger.kernel.org,
	bugzilla-daemon@bugzilla.kernel.org
Subject: Re: [Bug 192981] New: page allocation stalls
Date: Wed, 15 Feb 2017 11:05:39 -0500	[thread overview]
Message-ID: <20170215160538.GA62565@bfoster.bfoster> (raw)
In-Reply-To: <8f450abd-4e05-92d3-2533-72b05fea2012@beget.ru>

On Wed, Feb 15, 2017 at 03:56:56PM +0300, Alexander Polakov wrote:
> On 01/24/2017 12:51 AM, Andrew Morton wrote:
> > 
> > 
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> > 
> > A 2100 second page allocation stall!
> > 
> 
> I think we finally figured out the problem using Tetsuo Handa's mallocwd
> patch. It seems the problem is in the XFS direct reclaim path.
> 
> Here's how it goes:
> 
> memory is low, so rsync goes into direct reclaim, taking the XFS reclaim
> mutex in xfs_reclaim_inodes_nr():
> 
> 2017-02-14T00:12:59.811447+03:00 storage9 [24646.497290] MemAlloc: rsync(19706) flags=0x404840 switches=8692 seq=340 gfp=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK) order=0 delay=6795
> [24646.497550] rsync           R    0 19706   5000 0x00000000
> [24646.497690]  ffffa4361dc36c00 0000000c1dc36c00 ffffa4361dc36c00 ffffa44347792580
> [24646.497951]  0000000000000000 ffffb1cab6343458 ffffffffb383e799 0000000000000000
> [24646.498208]  ffffa443ffff7a00 0000000000000001 ffffb1cab6343448 0000000000000002
> [24646.498462] Call Trace:
> [24646.498587]  [<ffffffffb383e799>] ? __schedule+0x179/0x5c8
> [24646.498718]  [<ffffffffb383ecc2>] ? schedule+0x32/0x80
> [24646.498846]  [<ffffffffb3841229>] ? schedule_timeout+0x159/0x2a0
> [24646.498977]  [<ffffffffb30f1450>] ? add_timer_on+0x130/0x130
> [24646.499108]  [<ffffffffb318ff13>] ? __alloc_pages_nodemask+0xe73/0x16b0
> [24646.499240]  [<ffffffffb31deb2a>] ? alloc_pages_current+0x9a/0x120
> [24646.499371]  [<ffffffffb33d1a51>] ? xfs_buf_allocate_memory+0x171/0x2c0
> [24646.499503]  [<ffffffffb337c48b>] ? xfs_buf_get_map+0x18b/0x1d0
> [24646.499634]  [<ffffffffb337cdcb>] ? xfs_buf_read_map+0x3b/0x160
> [24646.499765]  [<ffffffffb33b71b0>] ? xfs_trans_read_buf_map+0x1f0/0x490
> [24646.499897]  [<ffffffffb3362859>] ? xfs_imap_to_bp+0x79/0x120
> [24646.500029]  [<ffffffffb3394a38>] ? xfs_iflush+0x118/0x380
> [24646.500158]  [<ffffffffb30d6130>] ? wake_atomic_t_function+0x40/0x40
> [24646.500289]  [<ffffffffb3385af4>] ? xfs_reclaim_inode+0x274/0x3f0
> [24646.500421]  [<ffffffffb3385e29>] ? xfs_reclaim_inodes_ag+0x1b9/0x2c0
> [24646.500553]  [<ffffffffb345c6b8>] ? radix_tree_next_chunk+0x108/0x2a0
> [24646.500685]  [<ffffffffb30f0e31>] ? lock_timer_base+0x51/0x70
> [24646.500814]  [<ffffffffb345ca4e>] ? radix_tree_gang_lookup_tag+0xae/0x180
> [24646.500946]  [<ffffffffb336f588>] ? xfs_perag_get_tag+0x48/0x100
> [24646.501079]  [<ffffffffb31b63c1>] ? __list_lru_walk_one.isra.7+0x31/0x120
> [24646.501212]  [<ffffffffb322aed0>] ? iget5_locked+0x240/0x240
> [24646.501342]  [<ffffffffb33873c1>] ? xfs_reclaim_inodes_nr+0x31/0x40
> [24646.501472]  [<ffffffffb3213160>] ? super_cache_scan+0x1a0/0x1b0
> [24646.501603]  [<ffffffffb319bf62>] ? shrink_slab+0x262/0x440
> [24646.501734]  [<ffffffffb319cc8b>] ? drop_slab_node+0x2b/0x60
> [24646.501864]  [<ffffffffb319cd02>] ? drop_slab+0x42/0x70
> [24646.501994]  [<ffffffffb318abc0>] ? out_of_memory+0x220/0x560
> [24646.502122]  [<ffffffffb31903b2>] ? __alloc_pages_nodemask+0x1312/0x16b0
> [24646.502255]  [<ffffffffb31deb2a>] ? alloc_pages_current+0x9a/0x120
> [24646.502386]  [<ffffffffb304b803>] ? pte_alloc_one+0x13/0x40
> [24646.502517]  [<ffffffffb31be21f>] ? handle_mm_fault+0xc7f/0x14b0
> [24646.502648]  [<ffffffffb30461ff>] ? __do_page_fault+0x1cf/0x5a0
> [24646.502777]  [<ffffffffb3843362>] ? page_fault+0x22/0x30
> 

You're in inode reclaim, blocked on a memory allocation for an inode
buffer required to flush a dirty inode. I suppose this means that the
backing buffer for the inode has already been reclaimed and must be
re-read, which ideally wouldn't have occurred before the inode is
flushed.
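
To make the failure mode concrete, here is a toy userspace model (plain
pthreads, not XFS or kernel code) of the pattern you describe: one task
enters synchronous inode reclaim, takes the reclaim mutex, and then blocks
indefinitely on an allocation, while every other task that falls into
direct reclaim queues up behind the same mutex and never makes progress:

/* Toy model of the reported stall -- NOT XFS code.  Build with: gcc -pthread */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t reclaim_lock = PTHREAD_MUTEX_INITIALIZER;

static void *direct_reclaim(void *arg)
{
	long id = (long)arg;

	printf("task %ld: entering synchronous inode reclaim\n", id);
	pthread_mutex_lock(&reclaim_lock);	/* reclaim mutex analogue */
	printf("task %ld: got the lock, 'allocating' a buffer...\n", id);
	pause();				/* allocation that never completes */
	pthread_mutex_unlock(&reclaim_lock);
	return NULL;
}

int main(void)
{
	pthread_t t[4];
	long i;

	for (i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, direct_reclaim, (void *)i);

	sleep(2);
	puts("2s later: one task is stuck holding the lock; the rest wait behind it");
	return 0;	/* the kernel has no such bailout, hence multi-second stalls */
}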

> But it cannot get memory, presumably because memory is low, so it stays
> blocked.
> 
> Other processes do the same but they can't get past the mutex in
> xfs_reclaim_inodes_nr():
> 
...
> This finally leads to "Kernel panic - not syncing: Out of memory and no
> killable processes...", as no process is able to proceed.
> 
> I quickly hacked this:
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 9ef152b..8adfb0a 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1254,7 +1254,7 @@ struct xfs_inode *
>         xfs_reclaim_work_queue(mp);
>         xfs_ail_push_all(mp->m_ail);
> 
> -       return xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT, &nr_to_scan);
> +       return 0; // xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT, &nr_to_scan);
>  }
> 

So you've disabled inode reclaim completely...

The bz shows you have non-default vm settings such as
'vm.vfs_cache_pressure = 200'. My understanding is that this setting asks
for more aggressive inode reclaim, yet the workaround here is to bypass
XFS inode reclaim entirely. Out of curiosity, have you reproduced this
problem using the default vfs_cache_pressure value (or, if it still
reproduces, moving it in the other direction)?
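
For reference, the value actually in effect can be read back with
'sysctl vm.vfs_cache_pressure' or directly from procfs; a minimal sketch,
assuming the standard /proc/sys/vm/vfs_cache_pressure path:

/* Print the live vfs_cache_pressure setting (default is 100). */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/vm/vfs_cache_pressure", "r");
	int val;

	if (!f || fscanf(f, "%d", &val) != 1) {
		perror("vfs_cache_pressure");
		return 1;
	}
	printf("vm.vfs_cache_pressure = %d\n", val);
	fclose(f);
	return 0;
}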

Brian

> 
> We ran two of our machines with this patch overnight; no more lockups or
> stalls were detected.
> 
> xfsaild does its work asynchronously, so xfs_inode objects don't grow out
> of control, as confirmed by slabtop.
> 
> I put netconsole logs here: http://aplkv.beget.tech/lkml/xfs/ for anyone
> interested.
> 
> -- 
> Alexander Polakov | system software engineer | https://beget.com



Thread overview: 15+ messages
     [not found] <bug-192981-27@https.bugzilla.kernel.org/>
2017-01-23 21:51 ` [Bug 192981] New: page allocation stalls Andrew Morton
2017-01-30 15:11   ` Alexander Polakov
2017-02-01 15:27   ` Michal Hocko
2017-02-15 12:56   ` Alexander Polakov
2017-02-15 16:05     ` Brian Foster [this message]
2017-02-15 16:52       ` Alexander Polakov
2017-02-15 18:09         ` Brian Foster
2017-02-16 10:56           ` Alexander Polakov
2017-02-16 17:20             ` Brian Foster
2017-02-16 22:21               ` Dave Chinner
2017-02-17 11:11                 ` Tetsuo Handa
2017-02-17 23:58                   ` Dave Chinner
2017-02-17 19:05                 ` Brian Foster
2017-02-17 23:52                   ` Dave Chinner
2017-02-18 13:05                     ` Brian Foster
