public inbox for linux-kernel@vger.kernel.org
From: Jens Axboe <jens.axboe@oracle.com>
To: "Zhang, Yanmin" <yanmin.zhang@intel.com>
Cc: "Lin, Ming M" <ming.m.lin@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: FIO: kjournald blocked for more than 120 seconds
Date: Tue, 17 Jun 2008 10:36:00 +0200
Message-ID: <20080617083600.GE20851@kernel.dk>
In-Reply-To: <37E52D09333DE2469A03574C88DBF40F02011751@pdsmsx414.ccr.corp.intel.com>

On Tue, Jun 17 2008, Zhang, Yanmin wrote:
> >>-----Original Message-----
> >>From: Jens Axboe [mailto:jens.axboe@oracle.com]
> >>Sent: Tuesday, June 17, 2008 3:30 AM
> >>To: Lin, Ming M
> >>Cc: Zhang, Yanmin; Linux Kernel Mailing List
> >>Subject: Re: FIO: kjournald blocked for more than 120 seconds
> >>
> >>On Mon, Jun 16 2008, Lin Ming wrote:
> >>> Hi, Jens
> >>>
> >>> When running the FIO benchmark, kjournald blocked for more than 120
> >>> seconds.
> >>> Detailed root cause analysis and proposed solutions are below.
> >>>
> >>> Any comment is appreciated.
> >>>
> >>> Hardware Environment
> >>> ---------------------
> >>> 13 SEAGATE ST373307FC disks in a JBOD, connected by a Qlogic ISP2312
> >>> Fibre Channel HBA.
> >>>
> >>> Bug description
> >>> ----------------
> >>> fio vsync random read 4K on 13 disks, 4 processes per disk, fio global
> >>> parameters as below.
> >>> Tested 4 IO schedulers; the issue is only seen with CFQ.
> >>>
> >>> INFO: task kjournald:20558 blocked for more than 120 seconds.
> >>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> >>> message.
> >>> kjournald     D ffff810010820978  6712 20558      2
> >>> ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
> >>> ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
> >>> 0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
> >>> Call Trace:
> >>> [<ffffffff803ba6f2>] kobject_get+0x12/0x17
> >>> The disks of my testing machine are tagged devices, so the CFQ idle
> >>> window is disabled. In other words, the active queue of a tagged
> >>> device (cfqd->hw_tag=1) never idles waiting for a new request.
> >>>
> >>> This causes the active queue to be expired immediately once it is
> >>> empty, although its time has not run out. CFQ then selects the next
> >>> queue as the active queue. In this testcase, there are thousands of
> >>> FIO read requests in sync queues and only a few write requests,
> >>> issued by journal_write_commit_record, in async queues.
> >>>
> >>> On the other hand, all processes use the default io class and priority.
> >>> They share the async queue for the same device, but each has its own
> >>> sync queue, so there are 4 sync queues and just 1 async queue for the
> >>> same device.
> >>>
> >>> So a sync queue has a much better chance of being selected as the new
> >>> active queue than the async queue.
> >>>
> >>> Sync queues do not idle and they are dispatched all the time. This
> >>> leads to many unfinished requests in the external queue, namely
> >>> cfqd->sync_flight > 0.
> >>>
> >>> static int cfq_dispatch_requests (...) {
> >>> 	....
> >>> 	while ((cfqq = cfq_select_queue(cfqd)) != NULL) {
> >>> 		....
> >>> 		if (cfqd->sync_flight && !cfq_cfqq_sync(cfqq))
> >>> 			break;
> >>> 		....
> >>> 		__cfq_dispatch_requests(cfqq)
> >>> 	}
> >>> 	....
> >>> }
> >>>
> >>> When cfq_select_queue selects the async queue that holds kjournald's
> >>> write request, the selected async queue is never dispatched because
> >>> cfqd->sync_flight > 0, so kjournald is blocked.
> >>>
> >>> Proposed 3 solutions
> >>> ------------------
> >>> 1. Do not check cfqd->sync_flight
> >>>
> >>> -               if (cfqd->sync_flight && !cfq_cfqq_sync(cfqq))
> >>> -                       break;
> >>>
> >>> 2. If we do need to check cfqd->sync_flight, then for tagged devices
> >>> we should give the async queue a few more chances to be dispatched.
> >>>
> >>> @@ -1102,7 +1102,7 @@ static int cfq_dispatch_requests(struct request_queue *q, int force)
> >>>                                 break;
> >>>                 }
> >>>
> >>> -               if (cfqd->sync_flight && !cfq_cfqq_sync(cfqq))
> >>> +               if (cfqd->sync_flight && !cfq_cfqq_sync(cfqq) && !cfqd->hw_tag)
> >>>                         break;
> >>>
> >>> 3. Force the write request issued by journal_write_commit_record to be
> >>> a sync request. As a matter of fact, it looks like most write requests
> >>> submitted by kjournald are async requests. We need to convert them to
> >>> sync requests.
> >>
> >>Thanks for the very detailed analysis of the problem, complete with
> >>suggestions. While I think that any code that does:
> >>
> >>        submit async io
> >>        wait for it
> >>
> >>should be issuing sync IO (or, better, automatically upgrade the
> >>request from async -> sync), we cannot rely on that.
> [YM] We can discuss this case by case. We could at least convert some
> important async io code to sync io code. For example, kjournald calls
> sync_dirty_buffer, which is what we captured in this case.

I agree, we should fix the obvious cases. My point was merely that there
will probably always be missed cases, so we should attempt to handle it
in the scheduler as well. Does the below buffer patch make it any
better?

> Another case is writeback. Processes doing mmapped I/O might stop in a
> page fault to wait for writeback to finish, or a buffered write might
> trigger dirty page balancing. As the latest kernel is more aggressive
> about starting writeback, it might be an issue now.

Sync process getting stuck in async writeout is another problem of the
same variety.

diff --git a/fs/buffer.c b/fs/buffer.c
index a073f3f..1957a8f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2978,7 +2978,7 @@ int sync_dirty_buffer(struct buffer_head *bh)
 	if (test_clear_buffer_dirty(bh)) {
 		get_bh(bh);
 		bh->b_end_io = end_buffer_write_sync;
-		ret = submit_bh(WRITE, bh);
+		ret = submit_bh(WRITE_SYNC, bh);
 		wait_on_buffer(bh);
 		if (buffer_eopnotsupp(bh)) {
 			clear_buffer_eopnotsupp(bh);

-- 
Jens Axboe

