linux-ext4.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Nikolay Borisov <kernel@kyup.com>, Theodore Ts'o <tytso@mit.edu>,
	linux-ext4@vger.kernel.org, Marian Marinov <mm@1h.com>
Subject: Re: Lockup in wait_transaction_locked under memory pressure
Date: Tue, 30 Jun 2015 11:52:06 +1000
Message-ID: <20150630015206.GL22807@dastard>
In-Reply-To: <20150629093640.GD28471@dhcp22.suse.cz>

On Mon, Jun 29, 2015 at 11:36:40AM +0200, Michal Hocko wrote:
> On Mon 29-06-15 12:01:49, Nikolay Borisov wrote:
> > Today I observed the issue again, this time on a different server. What
> > is particularly strange is the fact that the OOM wasn't triggered for
> > the cgroup, whose tasks have entered D state. There were a couple of
> > SSHD processes and an RSYNC performing some backup tasks. Here is what
> > the stacktrace for the rsync looks like:
> > 
> > crash> set 18308
> >     PID: 18308
> > COMMAND: "rsync"
> >    TASK: ffff883d7c9b0a30  [THREAD_INFO: ffff881773748000]
> >     CPU: 1
> >   STATE: TASK_UNINTERRUPTIBLE
> > crash> bt
> > PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
> >  #0 [ffff88177374ac60] __schedule at ffffffff815ab152
> >  #1 [ffff88177374acb0] schedule at ffffffff815ab76e
> >  #2 [ffff88177374acd0] schedule_timeout at ffffffff815ae5e5
> >  #3 [ffff88177374ad70] io_schedule_timeout at ffffffff815aad6a
> >  #4 [ffff88177374ada0] bit_wait_io at ffffffff815abfc6
> >  #5 [ffff88177374adb0] __wait_on_bit at ffffffff815abda5
> >  #6 [ffff88177374ae00] wait_on_page_bit at ffffffff8111fd4f
> >  #7 [ffff88177374ae50] shrink_page_list at ffffffff81135445
> 
> This is most probably wait_on_page_writeback because the reclaim has
> encountered a dirty page which is under writeback currently.

Yes, and look at the caller path....

> >  #8 [ffff88177374af50] shrink_inactive_list at ffffffff81135845
> >  #9 [ffff88177374b060] shrink_lruvec at ffffffff81135ead
> > #10 [ffff88177374b150] shrink_zone at ffffffff811360c3
> > #11 [ffff88177374b220] shrink_zones at ffffffff81136eff
> > #12 [ffff88177374b2a0] do_try_to_free_pages at ffffffff8113712f
> > #13 [ffff88177374b300] try_to_free_mem_cgroup_pages at ffffffff811372be
> > #14 [ffff88177374b380] try_charge at ffffffff81189423
> > #15 [ffff88177374b430] mem_cgroup_try_charge at ffffffff8118c6f5
> > #16 [ffff88177374b470] __add_to_page_cache_locked at ffffffff8112137d
> > #17 [ffff88177374b4e0] add_to_page_cache_lru at ffffffff81121618
> > #18 [ffff88177374b510] pagecache_get_page at ffffffff8112170b
> > #19 [ffff88177374b560] grow_dev_page at ffffffff811c8297
> > #20 [ffff88177374b5c0] __getblk_slow at ffffffff811c91d6
> > #21 [ffff88177374b600] __getblk_gfp at ffffffff811c92c1
> > #22 [ffff88177374b630] ext4_ext_grow_indepth at ffffffff8124565c
> > #23 [ffff88177374b690] ext4_ext_create_new_leaf at ffffffff81246ca8
> > #24 [ffff88177374b6e0] ext4_ext_insert_extent at ffffffff81246f09
> > #25 [ffff88177374b750] ext4_ext_map_blocks at ffffffff8124a848
> > #26 [ffff88177374b870] ext4_map_blocks at ffffffff8121a5b7
> > #27 [ffff88177374b910] mpage_map_one_extent at ffffffff8121b1fa
> > #28 [ffff88177374b950] mpage_map_and_submit_extent at ffffffff8121f07b
> > #29 [ffff88177374b9b0] ext4_writepages at ffffffff8121f6d5
> > #30 [ffff88177374bb20] do_writepages at ffffffff8112c490
> > #31 [ffff88177374bb30] __filemap_fdatawrite_range at ffffffff81120199
> > #32 [ffff88177374bb80] filemap_flush at ffffffff8112041c

That's a potential self-deadlocking path, isn't it? i.e. the
writeback path has been entered, may hold pages locked in the
current bio being built (waiting for submission), then memory
reclaim has been entered while trying to map more contiguous blocks
to submit, and that waits on page IO to complete on a page in a bio
that ext4 hasn't yet submitted?

i.e. shouldn't ext4 be doing GFP_NOFS allocations all through this
writeback path?
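
To make the scenario concrete, here is a minimal userspace sketch of
it. It is a toy model, not the logic in mm/vmscan.c: the names
toy_page, toy_reclaim_page and the TOY_GFP_* flags are hypothetical,
and it simply assumes that reclaim skips the writeback wait when the
allocation does not allow filesystem recursion. Whether reclaim
really honours the flag that way in a given kernel is an assumption
of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy allocation flags, named after the kernel's but NOT its definitions. */
enum {
	TOY_GFP_IO = 1 << 0,	/* allocation may start block IO */
	TOY_GFP_FS = 1 << 1	/* allocation may recurse into the filesystem */
};
#define TOY_GFP_KERNEL	(TOY_GFP_IO | TOY_GFP_FS)
#define TOY_GFP_NOFS	(TOY_GFP_IO)

/* One page as seen by the toy reclaim path. */
struct toy_page {
	bool writeback;	/* page is marked as under writeback */
	bool submitted;	/* the bio holding this page was submitted */
};

enum toy_reclaim_result {
	TOY_RECLAIM_SKIPPED,		/* reclaim moved on without waiting */
	TOY_RECLAIM_WAITED,		/* waited; in-flight IO will complete */
	TOY_RECLAIM_WOULD_DEADLOCK	/* waiting on IO nobody has submitted */
};

/*
 * Model of reclaim meeting a page under writeback. The caller is assumed
 * to be the writeback path itself, i.e. the only context that could
 * submit the pending bio.
 */
static enum toy_reclaim_result
toy_reclaim_page(const struct toy_page *page, unsigned int gfp)
{
	assert(page != NULL);

	if (!page->writeback)
		return TOY_RECLAIM_SKIPPED;	/* nothing to wait for */

	if (!(gfp & TOY_GFP_FS))
		return TOY_RECLAIM_SKIPPED;	/* NOFS: assumed not to wait */

	/* Waiting only terminates if the IO is actually in flight. */
	return page->submitted ? TOY_RECLAIM_WAITED
			       : TOY_RECLAIM_WOULD_DEADLOCK;
}
```

In this model, a page with writeback set and submitted clear gives
TOY_RECLAIM_WOULD_DEADLOCK for a TOY_GFP_KERNEL allocation, and
TOY_RECLAIM_SKIPPED for a TOY_GFP_NOFS one, which is the distinction
the question above is getting at.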

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
