From: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Al Viro <viro@ZenIV.linux.org.uk>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 1/3] add releasepage hooks to block devices which can be used by file systems
Date: Tue, 06 Jan 2009 13:07:27 +0900
Message-ID: <4962D8FF.2000409@jp.fujitsu.com>
In-Reply-To: <20090105160514.GA8939@mit.edu>

Ted-san,

Theodore Tso wrote:
 > On Mon, Jan 05, 2009 at 05:16:08PM +0900, Toshiyuki Okajima wrote:
 > > >
 > > > I was confirming whether the kernel to which your new patch is
 > > > applied can run without trouble. But unfortunately, I got a hangup
 > > > problem. Now I am investigating the root cause.  After I
 > > > investigated it for a little time, I think calling log_wait_commit()
 > > > from journal_try_to_free_buffers() can cause it.
 >
 > Sounds like a deadlock caused by the fact that we're no longer masking
 > __GFP_WAIT, probably on journal->j_wait_done_commit.  Presumably the
 > system came under pressure during a commit operation, which makes
 > sense, and so we ended up with a deadlock between kjournald and
 > kswapd.  The fix is pretty simple; we just need to mask out the
 > __GFP_WAIT in the filesystem-specific callback, since this is a
 > restriction imposed by the filesystem's use of the jbd/jbd2 layer.

Your analysis is correct.
I investigated in detail and have identified the root cause.

The deadlock was caused by the following two processes:

(1) A process performing memory reclaim
The memory reclaim path, entered from the memory allocator, calls
journal_try_to_free_buffers(). That function then calls log_wait_commit() in
order to free more memory and waits for the committing transaction to finish.

(2) The kjournald process
kjournald is woken up when Process (1) calls log_wait_commit().
It then calls journal_commit_transaction() to write all data buffers
to the filesystem and all metadata buffers to the journal.
Metadata buffers are written by journal_write_metadata_buffer(), which
needs to allocate a new buffer_head (more memory) in order to copy a buffer_head.
That allocation cannot succeed until memory reclaim makes progress, but reclaim
is blocked in Process (1), so neither process can proceed.

Detailed Information:
Process (1):
crash> bt 260
PID: 260    TASK: f71076d0  CPU: 1   COMMAND: "kswapd0"
  #0 [f707dcbc] schedule at c06346a3
  #1 [f707dd34] log_wait_commit at f80904c1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -> It wakes up kjournald
                                                and waits for the commit.
  #2 [f707dd70] journal_try_to_free_buffers at f808c81f
  #3 [f707dd94] blkdev_releasepage at c04916cc
  #4 [f707dda4] try_to_release_page at c04526b1
  #5 [f707ddb0] shrink_page_list at c045b3d1
  #6 [f707de50] shrink_list at c045b72e
  #7 [f707def0] shrink_zone at c045bbc6
  #8 [f707df40] kswapd at c045c12c
  #9 [f707dfd8] kthread at c043612c
#10 [f707dfe4] kernel_thread_helper at c04045e1

journal structure: 0xccab1e00

Process (2) [kjournald]:
PID: 3170   TASK: f717b240  CPU: 1   COMMAND: "kjournald"
  #0 [c42b4cf4] schedule at c06346a3
  #1 [c42b4d6c] schedule_timeout at c06349ef
  #2 [c42b4d90] io_schedule_timeout at c0633e0f
  #3 [c42b4da0] congestion_wait at c045d7ee
  #4 [c42b4dc8] try_to_free_pages at c045c82a
  #5 [c42b4e2c] __alloc_pages_internal at c04579fc
  #6 [c42b4e70] cache_alloc_refill at c0471235
  #7 [c42b4ec0] kmem_cache_alloc at c0470fa8
  #8 [c42b4ed4] alloc_buffer_head at c048c06b
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^-> It tries to allocate a buffer_head
                                               but cannot, because the memory
                                               reclaimers (including process (1))
                                               cannot make progress.
  #9 [c42b4edc] journal_write_metadata_buffer at f8090eb6
#10 [c42b4f10] journal_commit_transaction at f808df80
#11 [c42b4f98] kjournald at f809089d
#12 [c42b4fd8] kthread at c043612c
#13 [c42b4fe4] kernel_thread_helper at c04045e1

journal structure: 0xccab1e00

Additional Information:
kswapd is not the only process that can trigger the deadlock. Other processes
reach the same path through direct reclaim from the memory allocator:
[1]
PID: 1800   TASK: f7379b60  CPU: 1   COMMAND: "rsyslogd"
  #0 [f61c3bfc] schedule at c06346a3
  #1 [f61c3c74] log_wait_commit at f80904c1
  #2 [f61c3cb0] journal_try_to_free_buffers at f808c81f
  #3 [f61c3cd4] blkdev_releasepage at c04916cc
  #4 [f61c3ce4] try_to_release_page at c04526b1
  #5 [f61c3cf0] shrink_page_list at c045b3d1
  #6 [f61c3d90] shrink_list at c045b72e
  #7 [f61c3e30] shrink_zone at c045bbc6
  #8 [f61c3e80] try_to_free_pages at c045c787
  #9 [f61c3ee4] __alloc_pages_internal at c04579fc
#10 [f61c3f28] __get_free_pages at c0457bac
#11 [f61c3f30] copy_process at c0425823
#12 [f61c3f68] do_fork at c042674b
#13 [f61c3fa4] sys_clone at c0402399
#14 [f61c3fb4] system_call at c0403893
     EAX: ffffffda  EBX: 003d0f00  ECX: b7fcd4b4  EDX: b7fcdbd8
     DS:  007b      ESI: b6fcb16c  ES:  007b      EDI: b7fcdbd8
     SS:  007b      ESP: b6fcb100  EBP: b6fcb198
     CS:  0073      EIP: 00d271f8  ERR: 00000078  EFLAGS: 00000296

[2]
PID: 1990   TASK: f70c6000  CPU: 0   COMMAND: "pcscd"
  #0 [f6078be0] schedule at c06346a3
  #1 [f6078c58] log_wait_commit at f80904c1
  #2 [f6078c94] journal_try_to_free_buffers at f808c81f
  #3 [f6078cb8] blkdev_releasepage at c04916cc
  #4 [f6078cc8] try_to_release_page at c04526b1
  #5 [f6078cd4] shrink_page_list at c045b3d1
  #6 [f6078d74] shrink_list at c045b72e
  #7 [f6078e14] shrink_zone at c045bbc6
  #8 [f6078e64] try_to_free_pages at c045c787
  #9 [f6078ec8] __alloc_pages_internal at c04579fc
#10 [f6078f0c] cache_alloc_refill at c0471235
#11 [f6078f5c] kmem_cache_alloc at c0470fa8
#12 [f6078f70] getname at c047b71c
#13 [f6078f88] do_sys_open at c04729d2
#14 [f6078fa0] sys_open at c0472ab6
#15 [f6078fb4] ia32_sysenter_target at c04037da
     EAX: 00000005  EBX: 006a2700  ECX: 00098800  EDX: 00000000
     DS:  007b      ESI: 006a2700  ES:  007b      EDI: 00000000
     SS:  007b      ESP: b801d0f8  EBP: b801d188
     CS:  0073      EIP: b803f424  ERR: 00000005  EFLAGS: 00000202
...

Regards,
Toshiyuki Okajima

