Re: Re: Question about ext4 excessive stall time

From: EUNBONG SONG <eunb.song@samsung.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: "linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"jack@suse.cz" <jack@suse.cz>,
	"dmonakhov@openvz.org" <dmonakhov@openvz.org>,
	"gnehzuil.liu@gmail.com" <gnehzuil.liu@gmail.com>
Subject: Re: Re: Question about ext4 excessive stall time
Date: Wed, 15 May 2013 22:07:33 +0000 (GMT)
Message-ID: <17112733.237501368655652616.JavaMail.weblogic@epv6ml08>


> On Wed, May 15, 2013 at 07:15:02AM +0000, EUNBONG SONG wrote:
> > I know my kernel version is very old. I just want to know why this
> > problem happens.  Is it because my kernel version is old, or
> > because of the disk?  If anyone knows about this problem, could you
> > help me?

> So what's happening is this.  The CFQ I/O scheduler prioritizes reads
> over writes, since most reads are synchronous (for example, if the
> compiler is waiting for the data block from include/unistd.h, it can't
> make forward progress until it receives the data blocks; there is an
> exception for readahead blocks, but those are dealt with at a low
> priority), and most writes are asynchronous (since they are issued by
> the writeback daemons, and unless we are doing an fsync, no one is
> waiting for them).
>
> The problem comes when a metadata block, usually one which is shared
> across multiple files, is undergoing writeback, such as an inode table
> block or an allocation bitmap block.  The write gets issued as a low
> priority I/O operation.  Then during the next jbd2 transaction,
> some userspace operation needs to modify that metadata block, and in
> order to do that, it has to call jbd2_journal_get_write_access().  But
> if there is heavy read traffic going on, due to some other process
> using the disk a lot, the writeback operation may end up getting
> starved, and doesn't get acted on for a very long time.
>
> But the moment a process calls jbd2_journal_get_write_access(), the
> write has effectively become one which is synchronous, in that forward
> progress of at least one process is now getting blocked waiting for
> this I/O to complete, since the buffer_head is locked for writeback,
> possibly for hundreds or thousands of milliseconds, and
> jbd2_journal_get_write_access() cannot proceed until it can get the
> buffer_head lock.
>
> This was discussed at last month's Linux Storage, File System, and MM
> workshop.  The right solution is for lock_buffer() to notice if
> the buffer head has been locked for writeback, and if so, to bump the
> write request to the head of the elevator.  Jeff Moyer is looking at
> this.
>
> The partial workaround which will be in 3.10 is that we're marking all
> metadata writes with REQ_META and REQ_PRIO.  This will cause metadata
> writebacks to be prioritized at the same priority level as synchronous
> reads.  If there is heavy read traffic, the metadata writebacks will
> still be in competition with the reads, but at least they will
> complete.
>
> Once we get priority escalation (or priority inheritance, because what
> we're seeing here is really a classic priority inversion problem),
> then it would make sense for us to no longer set REQ_PRIO for metadata
> writebacks, so the metadata writebacks only get prioritized when they
> are blocking some process from making forward progress.  (Doing this
> will probably result in a slight performance degradation on some
> workloads, but it will improve others with heavy read traffic and
> minimal writeback interference.  We'll want to benchmark what
> percentage of metadata writebacks require getting bumped to the head
> of the line, but I suspect it will be the right choice.)
>
> If you want to try to backport this workaround to your older kernel,
> please see commit 9f203507ed277.


Hi, Ted.
I appreciate your fantastic explanation. It's really great and very helpful for me.
Now I understand this issue, thanks to you.

Thanks!
EunBong
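
The priority inversion described in the quoted explanation can be
illustrated with a toy queue model.  This is only a sketch under stated
assumptions, not kernel code: the function simulate() and the policy
names "cfq_like", "req_prio", and "escalate" are hypothetical labels,
the disk services one request per tick, one new synchronous read
arrives every tick, and a single metadata write is queued at tick 0
behind a backlog of reads.  A process starts waiting on that write at
tick wait_at, standing in for a caller blocked in
jbd2_journal_get_write_access().

```python
from collections import deque


def simulate(policy, backlog=30, wait_at=20, horizon=1000):
    """Toy model: return (tick the write is dispatched, ticks the waiter was blocked).

    policy is one of the hypothetical labels:
      "cfq_like": the write sits in a low-priority queue behind all reads
      "req_prio": the write shares the high-priority queue with reads
      "escalate": low priority until a process waits on it, then bumped
    """
    high = deque(("read", i) for i in range(backlog))  # reads already queued
    low = deque()
    write = ("write", 0)
    (high if policy == "req_prio" else low).append(write)

    for tick in range(horizon):
        # Heavy read traffic: one new synchronous read arrives every tick.
        high.append(("read", backlog + tick))

        # Priority escalation: once a process is waiting on the write,
        # move it to the head of the high-priority queue.
        if policy == "escalate" and tick >= wait_at and write in low:
            low.remove(write)
            high.appendleft(write)

        # The disk services one request per tick, high-priority queue first.
        req = high.popleft() if high else low.popleft()
        if req == write:
            return tick, max(0, tick - wait_at)

    # The write was starved for the whole simulated horizon.
    return None, horizon - wait_at


if __name__ == "__main__":
    for policy in ("cfq_like", "req_prio", "escalate"):
        print(policy, simulate(policy))
```

With the default parameters, "cfq_like" never dispatches the write
within the horizon, "req_prio" dispatches it at tick 30 after the read
backlog drains (the waiter is blocked for 10 ticks), and "escalate"
dispatches it at tick 20, the same tick the waiter appears.  The model
ignores seek costs, time slices, and request merging, so it shows only
the ordering effect, not realistic latencies.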
