Re: ext3 writing of data before metadata in ordered mode

From: Joel Fernandes <agnel.joel@gmail.com>
To: Josef Bacik <josef@redhat.com>
Cc: linux-fsdevel@vger.kernel.org,
	kernelnewbies <kernelnewbies@nl.linux.org>
Subject: Re: ext3 writing of data before metadata in ordered mode
Date: Mon, 26 Oct 2009 10:21:52 -0700
Message-ID: <9ff7a3bc0910261021k5d9bd5e2r69fd64bfbf882da2@mail.gmail.com>
In-Reply-To: <20091026131939.GA20565@localhost.localdomain>

Hi Josef, your analysis makes perfect sense. Thank you so much.

Another question: what could explain the slowness in data=ordered
mode? I believe everything is asynchronous, right? Various lists are
maintained, and kjournald keeps checking these lists and flushing
data before the metadata is written and marked dirty, as you said. Is
the slowness because the data is flushed earlier than required,
unlike with pdflush, which waits for a certain amount of time?

Regards,
-Joel
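
To make the hypothesis in the question concrete, here is a small timing sketch. It only illustrates the question as asked and is not a confirmed explanation of the slowness. The function names and the interval values (a 5-second periodic commit and a 30-second writeback expiry) are illustrative assumptions, not figures taken from this thread.

```python
import math


def next_commit_time(dirtied_at, commit_interval):
    """Time of the first periodic commit strictly after the buffer is dirtied."""
    return (math.floor(dirtied_at / commit_interval) + 1) * commit_interval


def writeback_only_flush_time(dirtied_at, expire_after):
    """Earliest time background writeback alone would treat the buffer as old enough."""
    return dirtied_at + expire_after


def ordered_flush_time(dirtied_at, commit_interval, expire_after):
    """In ordered mode the data goes out at whichever of the two comes first."""
    return min(
        next_commit_time(dirtied_at, commit_interval),
        writeback_only_flush_time(dirtied_at, expire_after),
    )


if __name__ == "__main__":
    # Assumed values: buffer dirtied at t=2s, commit every 5s, expiry after 30s.
    print(ordered_flush_time(2.0, 5.0, 30.0))
    print(writeback_only_flush_time(2.0, 30.0))
```

Under these assumed intervals the commit forces the data out well before background writeback would have picked it up, which is the effect the question is asking about.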

On Mon, Oct 26, 2009 at 6:19 AM, Josef Bacik <josef@redhat.com> wrote:
> On Sun, Oct 25, 2009 at 02:33:59PM -0700, Joel Fernandes wrote:
>> In data=ordered mode the ext3_ordered_commit_write function marks the
>> buffers as dirty. How then does JBD ensure that the data is
>> written before the metadata?  Once the data buffers are marked as
>> dirty, JBD no longer has control over when the data is
>> actually written to disk, right? Because the actual writing of the
>> data is handled by the page writeback mechanism (pdflush), right?
>>
>> I might be missing something here, thanks for your time and patience.
>>
>
> Ordered mode means we don't care when the data gets flushed out, just so long as
> it happens before we do metadata.  So we mark the buffer as dirty, which is
> appropriate, so that if pdflush decides that it needs to start flushing dirty
> data it can.  We also add the buffer to the transaction's t_sync_datalist list so
> we know all of the data buffers that were modified in this transaction.  So when
> we go to commit the transaction we go through this list, writing out all of the
> dirty buffers on that list.  If we hit a buffer that is not dirty we know it's
> already been written out and we can move on to the next one.  Then after all
> this is done we go through the list of metadata that was modified in that
> transaction, write out the journal entries, and then mark the metadata as dirty
> so it can be written out at some point in the future.  Let me know if that makes
> sense.  Thanks,
>
> Josef
>
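
The commit sequence Josef describes can be sketched as a toy user-space model. This is not kernel code: `Buffer`, `Transaction`, `ordered_write`, `background_flush`, `commit`, and `demo` are hypothetical names, and only `sync_datalist` is meant to mirror the `t_sync_datalist` list mentioned above. The model assumes a single transaction and records "disk" writes in a plain list.

```python
from dataclasses import dataclass, field


@dataclass
class Buffer:
    name: str
    dirty: bool = False


@dataclass
class Transaction:
    # Stand-in for the t_sync_datalist list named in the message above.
    sync_datalist: list = field(default_factory=list)
    # Hypothetical field: metadata buffers modified in this transaction.
    metadata: list = field(default_factory=list)


def ordered_write(txn, buf):
    """Data write path: mark the buffer dirty and remember it in the transaction."""
    buf.dirty = True
    txn.sync_datalist.append(buf)


def background_flush(buf, disk_log):
    """Models pdflush writing out a dirty buffer on its own schedule."""
    if buf.dirty:
        disk_log.append(("data", buf.name))
        buf.dirty = False


def commit(txn, disk_log):
    """Commit: data first, then journal entries, then mark metadata dirty."""
    for buf in txn.sync_datalist:
        if buf.dirty:
            disk_log.append(("data", buf.name))
            buf.dirty = False
        # A clean buffer was already written out, so move on to the next one.
    for meta in txn.metadata:
        disk_log.append(("journal", meta.name))
    for meta in txn.metadata:
        meta.dirty = True  # left for a later write to its final location


def demo():
    disk_log = []
    txn = Transaction()
    a, b, inode = Buffer("data-a"), Buffer("data-b"), Buffer("inode")
    ordered_write(txn, a)
    ordered_write(txn, b)
    txn.metadata.append(inode)
    background_flush(a, disk_log)  # pdflush happens to reach data-a early
    commit(txn, disk_log)
    return disk_log, inode


if __name__ == "__main__":
    log, inode = demo()
    for entry in log:
        print(entry)
```

In the demo, one data buffer is flushed early by the background path and the other is written at commit time. Either way, every data write lands in the log before the journal entry for the metadata, which is the ordering guarantee the message describes.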
