From: Theodore Tso <tytso@mit.edu>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>,
Arjan van de Ven <arjan@infradead.org>,
linux-kernel@vger.kernel.org, Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: [PATCH] Give kjournald a IOPRIO_CLASS_RT io priority
Date: Thu, 2 Oct 2008 08:04:44 -0400
Message-ID: <20081002120444.GA25164@mit.edu>
In-Reply-To: <20081002010315.1cda8147.akpm@linux-foundation.org>
On Thu, Oct 02, 2008 at 01:03:15AM -0700, Andrew Morton wrote:
>
> An async atime update gets recorded into the current transaction.
> kjournald is working on the committing transaction. We try to keep
> those separated, to prevent user processes from getting blocked behind
> kjournald activity.
>
This is true unless the journal gets too full, and we need to do a
checkpoint operation --- at which point, everything stops. If this
was a metadata-intensive benchmark, and the journal wasn't large
enough, this could be the problem. (And if you make the journal
bigger, then when you *do* finally get forced to do a checkpoint
operation, things get stalled for even longer.)
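
[Editorial illustration, not part of the original message.] The trade-off in the parenthetical above can be put in rough numbers. This is a toy model only: it assumes a forced checkpoint has to write back the whole journal at a fixed rate, which is a simplification and not how jbd necessarily behaves. The function name and the rates are hypothetical.

```python
def checkpoint_profile(journal_blocks, fill_rate, flush_rate):
    """Toy model: return (ticks until the journal fills, ticks stalled).

    Assumption (not from jbd itself): the workload logs `fill_rate`
    blocks per tick, and a forced checkpoint must write back the whole
    journal at `flush_rate` blocks per tick while everything waits.
    """
    ticks_to_fill = journal_blocks / fill_rate
    stall_ticks = journal_blocks / flush_rate
    return ticks_to_fill, stall_ticks


small = checkpoint_profile(journal_blocks=1000, fill_rate=100, flush_rate=50)
large = checkpoint_profile(journal_blocks=4000, fill_rate=100, flush_rate=50)
```

Under these assumptions the larger journal is forced to checkpoint less often, but each forced checkpoint stalls for proportionally longer.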

Arjan, is this *really* about atime updates? I thought most people
these days run with noatime or relatime. If people *really* want true
atime semantics, the best way to solve this problem would be to have
two dirty flags in the inode --- an "atime dirty" and a "dirty" flag.
The atime dirty bit would not actually cause the inode to get written
to disk, unless either (a) we are unmounting the filesystem, or (b) we
are trying to shrink the inode cache due to memory pressure. If, when
we write the inode out to disk, only the atime dirty bit is set, we
can also skip journalling the inode table block. So if there are
people who really care about true atime semantics, without getting
killed by the I/O writes, there are some solutions we can pursue.
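
[Editorial illustration, not part of the original message.] A minimal sketch of the two-flag idea described above, written in Python purely to show the decision logic. The class, function names, and return strings are hypothetical and do not correspond to any kernel structure or API.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    dirty: bool = False        # a real metadata change is pending
    atime_dirty: bool = False  # only the access time has changed


def touch_atime(inode):
    """Record an access without marking the inode fully dirty."""
    inode.atime_dirty = True


def modify(inode):
    """Record a real metadata change."""
    inode.dirty = True


def writeback_action(inode, unmounting=False, shrinking_cache=False):
    """Decide what a writeback pass would do with this inode."""
    if inode.dirty:
        # Normal case: the inode is written and the block is journalled.
        return "write+journal"
    if inode.atime_dirty and (unmounting or shrinking_cache):
        # Only atime changed: write it out, but skip the journal.
        return "write-nojournal"
    # Either clean, or atime-only with no reason to force it to disk yet.
    return "skip"


ino = Inode()
touch_atime(ino)
```

With only `atime_dirty` set, `writeback_action(ino)` returns "skip" until an unmount or inode cache shrink forces the write.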

But if this is really about the "entangled fsync problem", where we
have a large number of processes writing a large amount of async data,
and then we have a single process writing a small amount of data and
then calling fsync(), then that's a different (and very long-standing)
problem in ext3/4. Raising the I/O priority is probably the only
thing we can do in this circumstance. We could try to do some kind of
complex priority inheritance scheme, but it would certainly be much
simpler to raise the I/O priority. We could choose a level just below
real-time priority, but the reality is that if a real-time process is
trying to write to the filesystem, and we are doing a checkpointing
operation, we're going to be blocking the real-time process anyway, and
it will be a priority inversion. So perhaps the simplest and best
algorithm would be to use a priority level just below real-time when
doing a normal commit, but if we start to do a checkpoint, we go to
IOPRIO_CLASS_RT.
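
[Editorial illustration, not part of the original message.] The selection rule proposed above could be sketched as follows. The class numbers and the 13-bit shift match the Linux ioprio encoding; everything else is an assumption: "just below real-time" is read here as the highest level of the best-effort class, the level used within the RT class is arbitrary, and `journal_ioprio` is a hypothetical name.

```python
# Constants matching the Linux ioprio encoding.
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_RT = 1   # real-time class
IOPRIO_CLASS_BE = 2   # best-effort class

# Assumptions for illustration only.
RT_LEVEL = 4          # arbitrary level inside the RT class
BE_TOP_LEVEL = 0      # level 0 is the highest within a class


def ioprio_value(ioclass, level):
    """Pack a class and a level the way IOPRIO_PRIO_VALUE() does."""
    return (ioclass << IOPRIO_CLASS_SHIFT) | level


def journal_ioprio(checkpointing):
    """Pick the journal thread's I/O priority for its current work."""
    if checkpointing:
        # Everything is blocked behind the checkpoint, so go to RT.
        return ioprio_value(IOPRIO_CLASS_RT, RT_LEVEL)
    # Normal commit: stay just below the real-time class.
    return ioprio_value(IOPRIO_CLASS_BE, BE_TOP_LEVEL)
```

Note that the packed value is not itself an ordering: a lower class number means a higher priority, so the class has to be unpacked with a right shift before comparing.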

> But sometimes that doesn't work (including the place where I knowingly
> broke it). If we can find and fix the offending piece of jbd logic (a
> big if) then all is peachy.

Do we have workloads that can easily demonstrate this problem? If so,
we can add some tracing code which will allow us to see which theory
is correct, and what is actually happening.

- Ted