* ext3 writing of data before metadata in ordered mode
@ 2009-10-25 21:33 Joel Fernandes
2009-10-26 4:40 ` Mulyadi Santosa
2009-10-26 13:19 ` Josef Bacik
0 siblings, 2 replies; 6+ messages in thread
From: Joel Fernandes @ 2009-10-25 21:33 UTC
To: linux-fsdevel, kernelnewbies
In data=ordered mode, the ext3_ordered_commit_write function marks the
buffers as dirty. How, then, does JBD ensure that the data is
written before the metadata? Once the data buffers are marked
dirty, JBD no longer has control over when the data is actually
written to disk, right? The actual writing of the data is handled
by the page writeback mechanism (pdflush), isn't it?
I might be missing something here. Thanks for your time and patience.
-Joel
* Re: ext3 writing of data before metadata in ordered mode
2009-10-25 21:33 ext3 writing of data before metadata in ordered mode Joel Fernandes
@ 2009-10-26 4:40 ` Mulyadi Santosa
2009-10-26 7:17 ` Joel Fernandes
2009-10-26 13:19 ` Josef Bacik
1 sibling, 1 reply; 6+ messages in thread
From: Mulyadi Santosa @ 2009-10-26 4:40 UTC
To: Joel Fernandes; +Cc: linux-fsdevel, kernelnewbies
Hi Joel...
On Mon, Oct 26, 2009 at 4:33 AM, Joel Fernandes <agnel.joel@gmail.com> wrote:
> In data=ordered mode, the ext3_ordered_commit_write function marks the
> buffers as dirty. How, then, does JBD ensure that the data is
> written before the metadata? Once the data buffers are marked
> dirty, JBD no longer has control over when the data is actually
> written to disk, right? The actual writing of the data is handled
> by the page writeback mechanism (pdflush), isn't it?
I am not an expert, but here's my thought:
I think writing to the backing device is not done simply by marking the
buffer/page cache dirty. So I think what the kernel does is first prepare
an I/O queue to update the ext3 journal. Since we are talking about
data=ordered here, only metadata is logged.
Perhaps the key here is that metadata writing is done as an async
completion handler of the data writing handler. Thus, data is written
first, followed by metadata logging.
Another possibility is composing a single atomic I/O write request
consisting of both the data write and the metadata logging. Thus, the
I/O scheduler won't be able to reorder the request and must complete
the sequence as we prepared it.
--
regards,
Mulyadi Santosa
Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: ext3 writing of data before metadata in ordered mode
2009-10-26 4:40 ` Mulyadi Santosa
@ 2009-10-26 7:17 ` Joel Fernandes
0 siblings, 0 replies; 6+ messages in thread
From: Joel Fernandes @ 2009-10-26 7:17 UTC
To: Mulyadi Santosa; +Cc: linux-fsdevel, kernelnewbies
Hi Mulyadi,
Thanks for your opinion. Well, if you ask me, JBD and the I/O scheduler
are two independent layers, so I don't think the ordering of data and
metadata is done at that level. But I think there is something to the
data completion handler idea you mention.
Simplistically, during a write() in data=ordered mode:
1. While updating the metadata (before the data is copied), the kernel
updates the metadata buffers and moves the metadata block to a list in
the active transaction (which is going to be logged).
2. Then the actual data buffers (in memory) are updated with the contents.
3. Then journal_dirty_data is called on each affected data buffer
(this apparently ensures that data is written before the metadata, but I
don't know how).
4. Then the block buffers are committed (marked dirty so that
the page flushing mechanism can send them to disk).
Now, steps 3 and 4 seem to be independent, so how does step 3 know
when step 4 completes? The only way I can think of is that step 4
somehow calls back into step 3 after it's done.
Let me know if the above analysis makes sense. Thanks,
-Joel
* Re: ext3 writing of data before metadata in ordered mode
2009-10-25 21:33 ext3 writing of data before metadata in ordered mode Joel Fernandes
2009-10-26 4:40 ` Mulyadi Santosa
@ 2009-10-26 13:19 ` Josef Bacik
2009-10-26 17:21 ` Joel Fernandes
1 sibling, 1 reply; 6+ messages in thread
From: Josef Bacik @ 2009-10-26 13:19 UTC
To: Joel Fernandes; +Cc: linux-fsdevel, kernelnewbies
On Sun, Oct 25, 2009 at 02:33:59PM -0700, Joel Fernandes wrote:
> In data=ordered mode, the ext3_ordered_commit_write function marks the
> buffers as dirty. How, then, does JBD ensure that the data is
> written before the metadata? Once the data buffers are marked
> dirty, JBD no longer has control over when the data is actually
> written to disk, right? The actual writing of the data is handled
> by the page writeback mechanism (pdflush), isn't it?
>
> I might be missing something here. Thanks for your time and patience.
>
Ordered mode means we don't care when the data gets flushed out, just so long as
it happens before we write the metadata. So we mark the buffer as dirty, which is
appropriate, so that if pdflush decides it needs to start flushing dirty
data, it can. We also add the buffer to the transaction's t_sync_datalist list so
we know all of the data buffers that were modified in this transaction. When
we go to commit the transaction, we go through this list writing out all of the
dirty buffers on it. If we hit a buffer that is not dirty, we know it has
already been written out and we can move on to the next one. After all
this is done, we go through the list of metadata that was modified in that
transaction, write out the journal entries, and then mark the metadata as dirty
so it can be written out at some point in the future. Let me know if that makes
sense. Thanks,
Josef
* Re: ext3 writing of data before metadata in ordered mode
2009-10-26 13:19 ` Josef Bacik
@ 2009-10-26 17:21 ` Joel Fernandes
2009-10-26 17:58 ` Josef Bacik
0 siblings, 1 reply; 6+ messages in thread
From: Joel Fernandes @ 2009-10-26 17:21 UTC
To: Josef Bacik; +Cc: linux-fsdevel, kernelnewbies
Hi Josef, your analysis makes perfect sense. Thank you so much.
Another question: what could explain the slowness of data=ordered
mode? I believe everything is asynchronous, right? Various lists are
maintained, and kjournald keeps checking these lists, flushing data
before the metadata is written and marked dirty, as you said. Is the
slowness because the data is flushed earlier than required, unlike
pdflush, which waits for a certain amount of time?
Regards,
-Joel
* Re: ext3 writing of data before metadata in ordered mode
2009-10-26 17:21 ` Joel Fernandes
@ 2009-10-26 17:58 ` Josef Bacik
0 siblings, 0 replies; 6+ messages in thread
From: Josef Bacik @ 2009-10-26 17:58 UTC
To: Joel Fernandes; +Cc: Josef Bacik, linux-fsdevel, kernelnewbies
On Mon, Oct 26, 2009 at 10:21:52AM -0700, Joel Fernandes wrote:
> Hi Josef, your analysis makes perfect sense. Thank you so much.
>
> Another question: what could explain the slowness of data=ordered
> mode? I believe everything is asynchronous, right? Various lists are
> maintained, and kjournald keeps checking these lists, flushing data
> before the metadata is written and marked dirty, as you said. Is the
> slowness because the data is flushed earlier than required, unlike
> pdflush, which waits for a certain amount of time?
>
I'm not sure what slowness you are talking about, but I will assume you mean the
slowness of committing a transaction. Basically, everything that has happened
since the last journal commit must be taken care of. So all data that has been
written needs to be written out synchronously, then its metadata written to
the journal, and then we can let things start going again while the metadata is
written to its final location asynchronously. The key part is that _all_
data needs to be written out. This is slow compared to Ext4, because Ext4
has delayed allocation. Even though we may have dirtied a lot of pages
since the last transaction, they may not have been allocated yet, so
no metadata has changed yet and we don't have to force the data out to
disk. The journal commit therefore takes much less time, because there is
much less work to do. I hope that answers your question. Thanks,
Josef
end of thread
Thread overview: 6+ messages
2009-10-25 21:33 ext3 writing of data before metadata in ordered mode Joel Fernandes
2009-10-26 4:40 ` Mulyadi Santosa
2009-10-26 7:17 ` Joel Fernandes
2009-10-26 13:19 ` Josef Bacik
2009-10-26 17:21 ` Joel Fernandes
2009-10-26 17:58 ` Josef Bacik