* ext3 writing of data before metadata in ordered mode
From: Joel Fernandes @ 2009-10-25 21:33 UTC
To: linux-fsdevel, kernelnewbies

In data=ordered mode the ext3_ordered_commit_write function marks the
buffers as dirty. How, then, does JBD ensure that the data is written
before the metadata? Once the data buffers are marked dirty, JBD no
longer has control over when the data is actually written to disk,
right? The actual writing of the data is handled by the page writeback
mechanism (pdflush), right?

I might be missing something here. Thanks for your time and patience.

-Joel

* Re: ext3 writing of data before metadata in ordered mode
From: Mulyadi Santosa @ 2009-10-26 4:40 UTC
To: Joel Fernandes; +Cc: linux-fsdevel, kernelnewbies

Hi Joel...

On Mon, Oct 26, 2009 at 4:33 AM, Joel Fernandes <agnel.joel@gmail.com> wrote:
> In data=ordered mode the ext3_ordered_commit_write function marks the
> buffers as dirty, how then does the JBD ensure that the data is
> written before the metadata? Once the data buffers are marked as
> dirty, JBD doesn't have control anymore over when the data is
> actually written to disk right? Because the actual writing of the
> data is handled by the page writeback mechanism (pdflush) right?

I am not an expert, but here are my thoughts:

I think writing to the backing device involves more than simply marking
the buffer/page cache dirty. So I think what the kernel does is first
prepare an I/O queue to update the ext3 journal. Since we are talking
about data=ordered here, only metadata is logged.

Perhaps the key is that the metadata write is done as an asynchronous
completion handler of the data write. Thus data is written first,
followed by metadata logging.

Another possibility is composing a single atomic I/O write request made
up of the data write and the metadata logging. Thus the I/O scheduler
won't be able to reorder the request and must complete the sequence as
we prepared it.

--
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

* Re: ext3 writing of data before metadata in ordered mode
From: Joel Fernandes @ 2009-10-26 7:17 UTC
To: Mulyadi Santosa; +Cc: linux-fsdevel, kernelnewbies

Hi Mulyadi,

Thanks for your opinion. If you ask me, JBD and the I/O scheduler are
two independent layers, so I don't think the ordering of data and
metadata is done at that level. But there may be something to the data
completion handler you're talking about, I think.

Simplistically, during a write() in data=ordered mode:

1. While updating the metadata (before the data is copied), the kernel
   updates the metadata buffers and moves the metadata block to a list
   in the active transaction (which is going to be logged).
2. Then the actual data buffers (memory) are updated with the contents.
3. Then journal_dirty_data is called on each affected data buffer (this
   apparently ensures that data is written before the metadata; I don't
   know how).
4. And then the block buffers are committed (marked as dirty so that
   the page flushing mechanism can send them to disk).

Steps 3 and 4 seem to be independent, so I don't know how step 3 knows
when step 4 completes. The only way I can think of is that step 4
somehow calls a callback into step 3 once it is done.

Let me know if the above analysis makes sense. Thanks.

-Joel

* Re: ext3 writing of data before metadata in ordered mode
From: Josef Bacik @ 2009-10-26 13:19 UTC
To: Joel Fernandes; +Cc: linux-fsdevel, kernelnewbies

On Sun, Oct 25, 2009 at 02:33:59PM -0700, Joel Fernandes wrote:
> In data=ordered mode the ext3_ordered_commit_write function marks the
> buffers as dirty, how then does the JBD ensure that the data is
> written before the metadata? Once the data buffers are marked as
> dirty, JBD doesn't have control anymore over when the data is
> actually written to disk right? Because the actual writing of the
> data is handled by the page writeback mechanism (pdflush) right?
>
> I might be missing something here, thanks for your time and patience.
>

Ordered mode means we don't care when the data gets flushed out, just
so long as it happens before we do metadata. So we mark the buffer as
dirty, which is appropriate, so that if pdflush decides that it needs
to start flushing dirty data it can. We also add the buffer to the
transaction's t_sync_datalist list so we know all of the data buffers
that were modified in this transaction. So when we go to commit the
transaction we go through this list, writing out all of the dirty
buffers on that list. If we hit a buffer that is not dirty we know it
has already been written out and we can move on to the next one. Then,
after all this is done, we go through the list of metadata that was
modified in that transaction, write out the journal entries, and then
mark the metadata as dirty so it can be written out at some point in
the future. Let me know if that makes sense. Thanks,

Josef

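The commit ordering described in the message above can be sketched as
a small user-space model. This is only an illustration under stated
assumptions, not JBD code: the only name taken from the thread is
t_sync_datalist (modelled here as `sync_datalist`); `Buffer`,
`Transaction`, `write_out`, `commit`, `demo` and the `metadata` list
are hypothetical names, and real I/O is replaced by appending to a log.

```python
from dataclasses import dataclass, field


@dataclass
class Buffer:
    name: str
    dirty: bool = True


@dataclass
class Transaction:
    # Stands in for t_sync_datalist from the thread.
    sync_datalist: list = field(default_factory=list)
    # Hypothetical list of metadata buffers modified in this transaction.
    metadata: list = field(default_factory=list)


def write_out(buf, log):
    """Pretend to write a data buffer to disk and wait for completion."""
    log.append(("data", buf.name))
    buf.dirty = False


def commit(txn, log):
    # Phase 1: write every data buffer that is still dirty. A clean
    # buffer was already written by background writeback, so skip it.
    for buf in txn.sync_datalist:
        if buf.dirty:
            write_out(buf, log)
    # Phase 2: only now write the journal entries for the metadata.
    for meta in txn.metadata:
        log.append(("journal", meta.name))
    # Phase 3: mark metadata dirty so it reaches its final location later.
    for meta in txn.metadata:
        meta.dirty = True
    txn.sync_datalist.clear()
    txn.metadata.clear()


def demo():
    txn = Transaction()
    a, b = Buffer("data-a"), Buffer("data-b")
    inode = Buffer("inode", dirty=False)
    txn.sync_datalist += [a, b]
    txn.metadata.append(inode)
    b.dirty = False  # background writeback already handled this one
    log = []
    commit(txn, log)
    return log
```

In this model the background flusher is free to write data buffers at
any time; the commit only guarantees that whatever is still dirty on
the list is written before the first journal entry.
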
* Re: ext3 writing of data before metadata in ordered mode
From: Joel Fernandes @ 2009-10-26 17:21 UTC
To: Josef Bacik; +Cc: linux-fsdevel, kernelnewbies

Hi Josef,

Your analysis makes perfect sense. Thank you so much.

Another question: what could explain the slowness in data=ordered mode?
I believe everything is asynchronous, right? Various lists are
maintained, and kjournald keeps checking these lists and flushing data
before the metadata is written and marked dirty, as you said. Is the
slowness because the data is flushed earlier than required, unlike when
it is done by pdflush, which waits for a certain amount of time?

Regards,
-Joel

* Re: ext3 writing of data before metadata in ordered mode
From: Josef Bacik @ 2009-10-26 17:58 UTC
To: Joel Fernandes; +Cc: Josef Bacik, linux-fsdevel, kernelnewbies

On Mon, Oct 26, 2009 at 10:21:52AM -0700, Joel Fernandes wrote:
> Hi Josef, Your analysis makes perfect sense. Thank you so much.
>
> Another question, what could explain the slowness in data=ordered
> mode? I believe everything is asynchronous right? various lists are
> maintained, and kjournald keeps checking these lists and flushing
> data before metadata written and marked dirty as you said. Is the
> slowness because the flushing of data is done earlier than required
> unlike when done by pdflush which waits for a certain amount of time?
>

I'm not sure what slowness you are talking about, but I will assume you
mean the slowness of committing a transaction. Basically, everything
that has happened since the last journal commit must be taken care of.
So all data that has been written needs to be written out
synchronously, then its metadata is written to the journal, and then we
can let things start going again while the metadata is written
asynchronously to where it's supposed to go. The key part of that is
_all_ data needs to be written out. This is slow compared to Ext4
because with Ext4 we have delayed allocation: even though we may have
dirtied a lot of pages since the last transaction occurred, they may
not have been allocated yet, so no metadata has been changed yet, so we
don't have to force the flushing of the data out to disk, and the
journal commit takes much less time because there is much less work to
do. I hope that answers your question. Thanks,

Josef

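The difference in commit work described in the message above can be
put into a rough counting model. This is a deliberate simplification
and an assumption on my part, not how either filesystem accounts for
its buffers: the function name `buffers_forced_at_commit` and its
parameters are hypothetical, and the model only captures the claim that
a commit must flush the data tied to the committing transaction.

```python
def buffers_forced_at_commit(pages_dirtied, delayed_allocation,
                             pages_allocated_before_commit=0):
    """Return how many data pages a commit must flush in this toy model."""
    if delayed_allocation:
        # Pages whose blocks are not yet allocated have changed no
        # metadata, so they are not tied to the committing transaction.
        return min(pages_dirtied, pages_allocated_before_commit)
    # Without delayed allocation, every page written since the last
    # commit has to be flushed before the metadata is journalled.
    return pages_dirtied
```

For example, with 1000 pages dirtied since the last commit, the model
forces all 1000 out when delayed allocation is off, and only those that
happened to be allocated already when it is on.
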