* journal start/stop in ext3_writeback_writepage()
From: Badari Pulavarty @ 2005-02-09 16:18 UTC
To: linux-fsdevel; +Cc: sct, Andrew Morton

Hi,

I am trying to understand the journaling code in ext3.
Can someone enlighten me as to why we need journal start
and stop in ext3_writeback_writepage()? The block
allocation is already made in prepare_write().

What's the purpose of journal start/stop around
block_write_full_page()? It's not flushing metadata anyway?
What's getting written to the journal?

Thanks,
Badari

static int ext3_writeback_writepage(struct page *page,
                                    struct writeback_control *wbc)
{
        ...
        handle = ext3_journal_start(inode,
                        ext3_writepage_trans_blocks(inode));
        ...
        ret = block_write_full_page(page, ext3_get_block, wbc);
        err = ext3_journal_stop(handle);
        ...
}

* Re: journal start/stop in ext3_writeback_writepage()
From: Stephen C. Tweedie @ 2005-02-09 16:37 UTC
To: Badari Pulavarty; +Cc: linux-fsdevel, Andrew Morton, Stephen Tweedie

Hi,

On Wed, 2005-02-09 at 16:18, Badari Pulavarty wrote:

> I am trying to understand the journaling code in ext3.
> Can someone enlighten me as to why we need journal start
> and stop in ext3_writeback_writepage()? The block
> allocation is already made in prepare_write().

prepare_write()/commit_write() are used for write(2) writes: the data is
dirtied, but not immediately queued for IO (unless you're using O_SYNC).

writepage is used when you want to write the page's data to disk
*immediately* --- it's used when the VM is swapping out an mmap()ed file,
or for msync().

So when writepage comes in, there's no guarantee that we've had a
previous prepare. You can, for example, use ftruncate() to create a
large hole in a file, and then mmap() it; if you then dirty a page, then
the allocation occurs in the writepage(). So a transaction handle is
necessary.

--Stephen
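
The scenario Stephen describes can be set up from user space. The following is a minimal sketch, assuming a POSIX system; the helper name dirty_page_in_hole(), its parameters, and its error convention are illustrative and do not come from the thread. It only creates the conditions he mentions (a hole made with ftruncate(), a shared mapping, a dirtied page, and an msync()); whether block allocation actually happens at writepage time depends on the filesystem and kernel in use.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): create a sparse file,
 * dirty one page inside the hole through a shared mapping, and flush it.
 * Returns 0 on success, -1 on any failure.
 */
int dirty_page_in_hole(const char *path, off_t file_size, off_t offset)
{
    long page = sysconf(_SC_PAGESIZE);
    int ret = -1;
    int fd;
    char *map;

    /* The mapping offset must be page aligned and lie inside the file. */
    if (page <= 0 || offset < 0 || offset % page != 0 ||
        offset + page > file_size)
        return -1;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Extend the file without writing any data: the whole range is a hole. */
    if (ftruncate(fd, file_size) != 0)
        goto out_close;

    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
               fd, offset);
    if (map == MAP_FAILED)
        goto out_close;

    /* Dirty the page through the mapping; no write(2) call is involved. */
    memset(map, 'x', (size_t)page);

    /* Request synchronous writeback of the dirty page. */
    if (msync(map, (size_t)page, MS_SYNC) == 0)
        ret = 0;

    munmap(map, (size_t)page);
out_close:
    close(fd);
    return ret;
}
```

Calling, for example, dirty_page_in_hole("/mnt/test/sparse.bin", 1 << 20, 65536) on the filesystem under test (the path is a placeholder) exercises the mmap()/msync() path rather than the write(2) path.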

* Re: journal start/stop in ext3_writeback_writepage()
From: Badari Pulavarty @ 2005-02-09 18:38 UTC
To: Stephen C. Tweedie; +Cc: linux-fsdevel, Andrew Morton

On Wed, 2005-02-09 at 08:37, Stephen C. Tweedie wrote:
> So when writepage comes in, there's no guarantee that we've had a
> previous prepare. You can, for example, use ftruncate() to create a
> large hole in a file, and then mmap() it; if you then dirty a page, then
> the allocation occurs in the writepage(). So a transaction handle is
> necessary.

Thank you. It makes sense now.

I was under the impression that writepage() could also be used to
flush the data even for write(2) writes. Is it not true?

I was trying to add a writepages() interface for ext3. I am wondering
if I need to do journaling for that case too.

Thanks,
Badari

* Re: journal start/stop in ext3_writeback_writepage()
From: Stephen C. Tweedie @ 2005-02-09 23:24 UTC
To: Badari Pulavarty; +Cc: linux-fsdevel, Andrew Morton, Stephen Tweedie

Hi,

On Wed, 2005-02-09 at 18:38, Badari Pulavarty wrote:

> I was under the impression that writepage() could also be used to
> flush the data even for write(2) writes. Is it not true?

That _can_ happen, certainly. But it's not the default path. There's
nothing that stops a deferred writeback from write(2) from subsequently
getting flushed by a writepage(), if the VM decides that it needs to
push that page to disk before the background writeback flush has
happened.

> I was trying to add a writepages() interface for ext3. I am wondering
> if I need to do journaling for that case too.

Yes. If you implement writepages(), then msync() will ultimately use
that for flushing file regions; and as that represents mmap()ed data, it
could easily cover holes in the file, and allocation may be necessary.

Cheers,
Stephen

* Re: journal start/stop in ext3_writeback_writepage()
From: Badari Pulavarty @ 2005-02-10 17:39 UTC
To: Stephen C. Tweedie; +Cc: linux-fsdevel, Andrew Morton, ext2-devel

On Wed, 2005-02-09 at 08:37, Stephen C. Tweedie wrote:
> So when writepage comes in, there's no guarantee that we've had a
> previous prepare. You can, for example, use ftruncate() to create a
> large hole in a file, and then mmap() it; if you then dirty a page, then
> the allocation occurs in the writepage(). So a transaction handle is
> necessary.

Okay, I started hacking. I added ext3_writeback_writepages(), which calls
journal start/stop before calling mpage_writepages().

I am getting an oops which puzzles me, for two reasons:

1) First of all, the oops is in __mod_timer(), which I have not touched.

2) journal_destroy() is calling journal_start() now. But even
   with the original code, it would be calling journal_start() in
   ext3_writeback_writepage(). I am wondering why it's a problem only
   now.

Ideas?

Thanks,
Badari

Unable to handle kernel NULL pointer dereference at 0000000000000018
RIP:
<ffffffff8013fa5b>{__mod_timer+219}
PML4 19b4f4067 PGD 19eb3c067 PMD 0
Oops: 0002 [1] SMP
CPU 3
Modules linked in:
Pid: 12823, comm: umount Not tainted 2.6.10n
RIP: 0010:[<ffffffff8013fa5b>] <ffffffff8013fa5b>{__mod_timer+219}
RSP: 0018:000001017f6abae8 EFLAGS: 00010002
RAX: 0000000000000010 RBX: 00000101d4fd7f08 RCX: 0000000000000260
RDX: ffffffff8013a428 RSI: 0000000000000216 RDI: 00000101c0715aa0
RBP: 00000101c0715aa0 R08: 00000000000927c0 R09: 0000000000000720
R10: 00000000ffffffff R11: 0000000000000000 R12: 00000101d4fd7ed8
R13: 00000101d4fd7ef0 R14: 00000001000086e1 R15: 0000000000000216
FS: 0000002a9588e700(0000) GS:ffffffff80628900(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 00000001bffa4000 CR4: 00000000000006e0
Process umount (pid: 12823, threadinfo 000001017f6aa000, task 000001017ee935a0)
Stack: 000000000007a000 00000101d6ac42c8 0000000000000000 000001019fa93000
       000001017eece3c0 000000000000000e 00000101d6ac42c8 ffffffff801fa130
       000001019fa93024 000000007f6abb48
Call Trace:<ffffffff801fa130>{start_this_handle+608}
       <ffffffff80132580>{finish_task_switch+64}
       <ffffffff803eb330>{thread_return+80}
       <ffffffff801fa6d3>{journal_start+227}
       <ffffffff801ea1e6>{ext3_writeback_writepages+70}
       <ffffffff8015fcbc>{do_writepages+28}
       <ffffffff8019f50c>{__writeback_single_inode+492}
       <ffffffff803eb9e0>{__wait_on_bit+96}
       <ffffffff8017eed0>{sync_buffer+0}
       <ffffffff803ebac3>{out_of_line_wait_on_bit+195}
       <ffffffff8014bba0>{wake_bit_function+0}
       <ffffffff8019f7c6>{write_inode_now+102}
       <ffffffff80196f4e>{generic_drop_inode+174}
       <ffffffff80195b0e>{iput+126}
       <ffffffff801ff5ca>{journal_destroy+618}
       <ffffffff8014bb70>{autoremove_wake_function+0}
       <ffffffff8014bb70>{autoremove_wake_function+0}
       <ffffffff801aa28c>{mb_cache_shrink+188}
       <ffffffff801f06f9>{ext3_put_super+41}
       <ffffffff80182a37>{generic_shutdown_super+151}
       <ffffffff80182afd>{kill_block_super+45}
       <ffffffff80182bd1>{deactivate_super+81}
       <ffffffff8019974a>{sys_umount+666}
       <ffffffff8026bd40>{__up_write+48}
       <ffffffff8016d57a>{sys_munmap+90}
       <ffffffff8010e4ce>{system_call+126}

Code: 48 89 50 08 48 89 02 49 c7 44 24 08 00 02 20 00 49 c7 04 24
RIP <ffffffff8013fa5b>{__mod_timer+219} RSP <000001017f6abae8>
CR2: 0000000000000018

* Re: journal start/stop in ext3_writeback_writepage()
From: Sonny Rao @ 2005-02-10 19:07 UTC
To: Badari Pulavarty; +Cc: Stephen C. Tweedie, linux-fsdevel, Andrew Morton, ext2-devel

On Thu, Feb 10, 2005 at 09:39:05AM -0800, Badari Pulavarty wrote:
> Okay, I started hacking. I added ext3_writeback_writepages(), which calls
> journal start/stop before calling mpage_writepages().
>
> I am getting an oops which puzzles me, for two reasons:
>
> 1) First of all, the oops is in __mod_timer(), which I have not touched.

Total guess: is a callback after I/O completion blowing up? Especially if
it is trying to touch a buffer_head that isn't there or something.

Sonny

* Re: journal start/stop in ext3_writeback_writepage()
From: Badari Pulavarty @ 2005-02-10 19:17 UTC
To: Stephen C. Tweedie; +Cc: linux-fsdevel, Andrew Morton, ext2-devel

Okay, I figured out why I am getting the oops from mod_timer.
It's because journal_destroy() stopped kjournald(), which
deleted the transaction timer. journal_start() is adding the
timer back again, and since it has been deleted, I get the oops.

But I still don't understand why this can't happen
through the original code:

journal_destroy()
  iput(journal inode)
    do_writepages()
      generic_writepages()
        ext3_writeback_writepage()
          journal_start()

What am I missing?

Thanks,
Badari

* Re: journal start/stop in ext3_writeback_writepage()
From: Andrew Morton @ 2005-02-10 20:21 UTC
To: Badari Pulavarty; +Cc: sct, linux-fsdevel, ext2-devel

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> But I still don't understand why this can't happen
> through the original code:
>
> journal_destroy()
>   iput(journal inode)
>     do_writepages()
>       generic_writepages()
>         ext3_writeback_writepage()
>           journal_start()
>
> What am I missing?

Presumably there are never any dirty pages or inodes when we run
journal_destroy().

* Re: journal start/stop in ext3_writeback_writepage()
From: Stephen C. Tweedie @ 2005-02-10 23:12 UTC
To: Andrew Morton; +Cc: Badari Pulavarty, linux-fsdevel, ext2-devel@lists.sourceforge.net, Stephen Tweedie

Hi,

On Thu, 2005-02-10 at 20:21, Andrew Morton wrote:

> > But I still don't understand why this can't happen
> > through the original code:
>
> > What am I missing?
>
> Presumably there are never any dirty pages or inodes when we run
> journal_destroy().

I assume so, yes. If there is no a_ops->writepages(), then we default
to generic_writepages(), which is a noop if there are no dirty pages. If
your new ext3-specific writepages code tries to do a journal_start() in
that case, then yes, it is likely to blow up spectacularly during
journal_destroy!

--Stephen

* Re: [Ext2-devel] Re: journal start/stop in ext3_writeback_writepage()
From: Badari Pulavarty @ 2005-02-11 0:22 UTC
To: Stephen C. Tweedie; +Cc: Andrew Morton, linux-fsdevel, ext2-devel

On Thu, 2005-02-10 at 15:12, Stephen C. Tweedie wrote:
> I assume so, yes. If there is no a_ops->writepages(), then we default
> to generic_writepages(), which is a noop if there are no dirty pages. If
> your new ext3-specific writepages code tries to do a journal_start() in
> that case, then yes, it is likely to blow up spectacularly during
> journal_destroy!
>
> --Stephen

Yep. I found out the hard way that this is exactly what's happening.
generic_writepages() is clever enough to do nothing if there are
no dirty pages. But I am being stupid in my writepages().

I need to teach writepages() to do nothing in case of no dirty pages.
Is there an easy way, like checking a count somewhere, rather than doing
all the stuff mpage_writepages() is doing to figure this out, like ..

        while (!done && (index <= end) &&
               (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
                        PAGECACHE_TAG_DIRTY,
                        min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1)))
        ...

Thanks,
Badari

* Re: [Ext2-devel] Re: journal start/stop in ext3_writeback_writepage()
From: Andrew Morton @ 2005-02-11 0:27 UTC
To: Badari Pulavarty; +Cc: sct, linux-fsdevel, ext2-devel

Badari Pulavarty <pbadari@us.ibm.com> wrote:
>
> I need to teach writepages() to do nothing in case of no dirty pages.
> Is there an easy way, like checking a count somewhere, rather than doing
> all the stuff mpage_writepages() is doing to figure this out

        if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
                return;
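
The effect of the early return suggested above can be illustrated with a small user-space model. This is only a sketch under loud assumptions: every toy_* type and function below is a hypothetical stand-in invented for illustration, not a kernel API, and the model returns an error where the thread's trace shows an oops. It contrasts a writepages() that always opens a transaction handle with one that first checks whether anything is dirty.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; every name here is hypothetical. */
struct toy_journal {
    bool running;        /* false once the journal has been shut down */
    int handles_started; /* how many transaction handles were opened */
};

struct toy_mapping {
    size_t nr_dirty;     /* models the PAGECACHE_TAG_DIRTY tag */
    struct toy_journal *journal;
};

/*
 * Models journal_start(). Starting a handle on a stopped journal is the
 * failure case; in the thread's trace this is where the oops occurred,
 * while the model simply reports an error.
 */
static int toy_journal_start(struct toy_journal *j)
{
    if (!j->running)
        return -1;
    j->handles_started++;
    return 0;
}

/* Models journal_stop(); nothing to do in this toy. */
static void toy_journal_stop(struct toy_journal *j)
{
    (void)j;
}

/* Models mapping_tagged(mapping, PAGECACHE_TAG_DIRTY). */
static bool toy_mapping_dirty(const struct toy_mapping *m)
{
    return m->nr_dirty != 0;
}

/* Unguarded: always opens a handle, even when there is nothing to write. */
int toy_writepages_unguarded(struct toy_mapping *m)
{
    if (toy_journal_start(m->journal) != 0)
        return -1;
    m->nr_dirty = 0; /* pretend all dirty pages were written out */
    toy_journal_stop(m->journal);
    return 0;
}

/* Guarded: return before touching the journal when nothing is dirty. */
int toy_writepages_guarded(struct toy_mapping *m)
{
    if (!toy_mapping_dirty(m))
        return 0;
    return toy_writepages_unguarded(m);
}
```

With a clean mapping and a stopped journal, the unguarded variant fails while the guarded one returns 0 without starting a handle, which mirrors why generic_writepages() was harmless during journal_destroy().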