From: Aastha Mehta <aasthakm@gmail.com>
To: Josef Bacik <jbacik@fusionio.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Questions regarding logging upon fsync in btrfs
Date: Tue, 1 Oct 2013 21:40:33 +0200 [thread overview]
Message-ID: <CAEx9m47fk2zwpAdPq2gFx1Y5DDV9rqtWzo=QxmCUgeTS83fBOQ@mail.gmail.com> (raw)
In-Reply-To: <20131001173416.GF27490@localhost.localdomain>
[-- Attachment #1: Type: text/plain, Size: 4456 bytes --]
On 1 October 2013 19:34, Josef Bacik <jbacik@fusionio.com> wrote:
> On Mon, Sep 30, 2013 at 11:07:20PM +0200, Aastha Mehta wrote:
>> On 30 September 2013 22:47, Josef Bacik <jbacik@fusionio.com> wrote:
>> > On Mon, Sep 30, 2013 at 10:30:59PM +0200, Aastha Mehta wrote:
>> >> On 30 September 2013 22:11, Josef Bacik <jbacik@fusionio.com> wrote:
>> >> > On Mon, Sep 30, 2013 at 09:32:54PM +0200, Aastha Mehta wrote:
>> >> >> On 29 September 2013 15:12, Josef Bacik <jbacik@fusionio.com> wrote:
>> >> >> > On Sun, Sep 29, 2013 at 11:22:36AM +0200, Aastha Mehta wrote:
>> >> >> >> Thank you very much for the reply. That clarifies a lot of things.
>> >> >> >>
>> >> >> >> I was trying a small test case that opens a file, writes a block of
>> >> >> >> data, calls fsync and then closes the file. If I understand correctly,
>> >> >> >> fsync would return only after all in-memory buffers have been
>> >> >> >> committed to disk. I have added few print statements in the
>> >> >> >> __extent_writepage function, and I notice that the function gets
>> >> >> >> called a bit later after fsync returns. It seems that I am not
>> >> >> >> guaranteed to see the data going to disk by the time fsync returns.
>> >> >> >>
>> >> >> >> Am I doing something wrong, or am I looking at the wrong place for
>> >> >> >> disk write? This happens both with tree logging enabled as well as
>> >> >> >> with notreelog.
>> >> >> >>
>> >> >> >
>> >> >> > So 3.1 was a long time ago and to be sure it had issues I don't think it was
>> >> >> > _that_ broken. You are probably better off instrumenting a recent kernel, 3.11
>> >> >> > or just build btrfs-next from git. But if I were to make a guess I'd say that
>> >> >> > __extent_writepage was how both data and metadata was written out at the time (I
>> >> >> > don't think I changed it until 3.2 or something later) so what you are likely
>> >> >> > seeing is the normal transaction commit after the fsync. In the case of
>> >> >> > notreelog we are likely starting another transaction and you are seeing that
>> >> >> > commit (at the time the transaction kthread would start a transaction even if
>> >> >> > none had been started yet.) Thanks,
>> >> >> >
>> >> >> > Josef
>> >> >>
>> >> >> Is there any special handling for very small file write, less than 4K? As
>> >> >> I understand there is an optimization to inline the first extent in a file if
>> >> >> it is smaller than 4K, does it affect the writeback on fsync as well? I did
>> >> >> set the max_inline mount option to 0, but even then it seems there is
>> >> >> some difference in fsync behaviour for writing first extent of less than 4K
>> >> >> size and writing 4K or more.
>> >> >>
>> >> >
>> >> > Yeah if the file is an inline extent then it will be copied into the log
>> >> > directly and the log will be written out, no going through the data write path
>> >> > at all. Max inline == 0 should make it so we don't inline, so if it isn't
>> >> > honoring that then that may be a bug. Thanks,
>> >> >
>> >> > Josef
>> >>
>> >> I tried it on 3.12-rc2 release, and it seems there is a bug then.
>> >> Please find attached logs to confirm.
>> >> Also, probably on the older release.
>> >>
>> >
>> > Oooh ok I understand, you have your printk's in the wrong place ;).
>> > do_writepages doesn't necessarily mean you are writing something. If you want
>> > to see if stuff got written to the disk I'd put a printk at run_delalloc_range
>> > and have it spit out the range it is writing out since thats what we think is
>> > actually dirty. Thanks,
>> >
>> > Josef
>>
>> No, but I also placed dump_stack() in the beginning of
>> __extent_writepage. run_delalloc_range is being called only from
>> __extent_writepage, if it were to be called, the dump_stack() at the
>> top of __extent_writepage would have printed as well, no?
>>
>
> Ok I've done the same thing and I'm not seeing what you are seeing. Are you
> using any mount options other than notreelog and max_inline=0? Could you adjust
> your printk to print out the root objectid for the inode as well? It could be
> possible that this is the writeout for the space cache or inode cache. Thanks,
>
> Josef
I actually printed the stack only when the root objectid is 5. I have
attached another log for writing the first 500 bytes in a file. I also
print the root objectid for the inode in run_delalloc and
__extent_writepage.
Thanks
--
Aastha Mehta
MPI-SWS, Germany
E-mail: aasthakm@mpi-sws.org
[-- Attachment #2: fsync-3.12-rc2-500bytes-011013.txt --]
[-- Type: text/plain, Size: 6945 bytes --]
write success, size: 500
fsync:: Success
file size: 500
fclose:: Success
[ 829.847483] btrfs: device fsid 4d2767ea-85cc-443c-88bd-2d8c693f7c7c devid 1 transid 2201 /dev/sdd
[ 829.848199] btrfs: max_inline at 0
[ 829.848201] btrfs: disabling tree log
[ 829.848202] btrfs: disk space caching is enabled
[ 835.266483] [do_writepages] inum: 258, nr_to_write: 9223372036854775807, writepages fn: ffffffff8129ce40
[ 835.266492] CPU: 0 PID: 10021 Comm: bb Not tainted 3.12.0-rc2-latest+ #7
[ 835.266494] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
[ 835.266496] ffff88003cdc5e60 ffff88003cdc5e28 ffffffff817a8acf ffff88003fc0eb48
[ 835.266511] ffff88001f478b28 ffff88003cdc5e48 ffffffff81105c1f ffff88001f4789e0
[ 835.266515] ffff880037557400 ffff88003cdc5e88 ffffffff810fb689 ffff88003fc12c00
[ 835.266518] Call Trace:
[ 835.266528] [<ffffffff817a8acf>] dump_stack+0x46/0x58
[ 835.266534] [<ffffffff81105c1f>] do_writepages+0x4f/0x90
[ 835.266538] [<ffffffff810fb689>] __filemap_fdatawrite_range+0x49/0x50
[ 835.266542] [<ffffffff810fc41e>] filemap_fdatawrite_range+0xe/0x10
[ 835.266546] [<ffffffff812ab01c>] btrfs_sync_file+0xac/0x380
[ 835.266553] [<ffffffff817aee32>] ? __schedule+0x692/0x730
[ 835.266559] [<ffffffff811752d8>] do_fsync+0x58/0x90
[ 835.266563] [<ffffffff81148a6d>] ? SyS_write+0x4d/0xa0
[ 835.266568] [<ffffffff8117565b>] SyS_fsync+0xb/0x10
[ 835.266572] [<ffffffff817b8422>] system_call_fastpath+0x16/0x1b
[ 835.266579] [btrfs_sync_file] inum: (258, 5), called write_range, start: 0, end: 9223372036854775807, ret: 0
[ 835.266581] [btrfs_sync_file] inum: (258, 5), calling btrfs_wait_ordered_range, start: 0, end: 9223372036854775807
[ 835.266584] [do_writepages] inum: 258, nr_to_write: 9223372036854775807, writepages fn: ffffffff8129ce40
[ 835.266587] CPU: 0 PID: 10021 Comm: bb Not tainted 3.12.0-rc2-latest+ #7
[ 835.266589] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
[ 835.266590] ffff88003cdc5e20 ffff88003cdc5de8 ffffffff817a8acf 0000000000000007
[ 835.266593] ffff88001f478b28 ffff88003cdc5e08 ffffffff81105c1f 0000000000000000
[ 835.266596] ffff88001f4789e0 ffff88003cdc5e48 ffffffff810fb689 ffff880037557400
[ 835.266599] Call Trace:
[ 835.266603] [<ffffffff817a8acf>] dump_stack+0x46/0x58
[ 835.266607] [<ffffffff81105c1f>] do_writepages+0x4f/0x90
[ 835.266610] [<ffffffff810fb689>] __filemap_fdatawrite_range+0x49/0x50
[ 835.266613] [<ffffffff810fc41e>] filemap_fdatawrite_range+0xe/0x10
[ 835.266618] [<ffffffff812b2ad8>] btrfs_wait_ordered_range+0x58/0x100
[ 835.266621] [<ffffffff812ab118>] btrfs_sync_file+0x1a8/0x380
[ 835.266625] [<ffffffff817aee32>] ? __schedule+0x692/0x730
[ 835.266629] [<ffffffff811752d8>] do_fsync+0x58/0x90
[ 835.266632] [<ffffffff81148a6d>] ? SyS_write+0x4d/0xa0
[ 835.266636] [<ffffffff8117565b>] SyS_fsync+0xb/0x10
[ 835.266640] [<ffffffff817b8422>] system_call_fastpath+0x16/0x1b
[ 835.266647] [btrfs_sync_file] inum: (258, 5), return from log_dentry, start: 0, end: 9223372036854775807, ret: 1
[ 865.312082] [do_writepages] inum: 258, nr_to_write: 3072, writepages fn: ffffffff8129ce40
[ 865.312090] CPU: 0 PID: 711 Comm: kworker/u2:5 Not tainted 3.12.0-rc2-latest+ #7
[ 865.312093] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
[ 865.312100] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-2)
[ 865.312104] ffff88003c793b60 ffff88003c793a88 ffffffff817a8acf ffff88003fc0eb48
[ 865.312107] ffff88001f478b28 ffff88003c793aa8 ffffffff81105c1f ffff88001f4789e0
[ 865.312110] ffff88003c793b60 ffff88003c793b08 ffffffff8116ea30 ffff880039caf398
[ 865.312113] Call Trace:
[ 865.312121] [<ffffffff817a8acf>] dump_stack+0x46/0x58
[ 865.312126] [<ffffffff81105c1f>] do_writepages+0x4f/0x90
[ 865.312131] [<ffffffff8116ea30>] __writeback_single_inode+0xa0/0x260
[ 865.312135] [<ffffffff8116f741>] writeback_sb_inodes+0x2b1/0x430
[ 865.312139] [<ffffffff8116f93e>] __writeback_inodes_wb+0x7e/0xc0
[ 865.312143] [<ffffffff8117000b>] wb_writeback+0x18b/0x2e0
[ 865.312147] [<ffffffff811701e3>] wb_check_old_data_flush+0x83/0xb0
[ 865.312151] [<ffffffff81171aac>] bdi_writeback_workfn+0x1dc/0x440
[ 865.312157] [<ffffffff8105f19c>] process_one_work+0x2bc/0x430
[ 865.312161] [<ffffffff810600c5>] worker_thread+0x215/0x320
[ 865.312165] [<ffffffff8105feb0>] ? manage_workers.isra.18+0x2b0/0x2b0
[ 865.312169] [<ffffffff810666f6>] kthread+0xb6/0xc0
[ 865.312172] [<ffffffff81066640>] ? flush_kthread_worker+0xa0/0xa0
[ 865.312178] [<ffffffff817b837c>] ret_from_fork+0x7c/0xb0
[ 865.312181] [<ffffffff81066640>] ? flush_kthread_worker+0xa0/0xa0
[ 865.312188] [__extent_writepage] inum: (258, 5)
[ 865.312191] CPU: 0 PID: 711 Comm: kworker/u2:5 Not tainted 3.12.0-rc2-latest+ #7
[ 865.312192] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
[ 865.312196] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-2)
[ 865.312198] ffffea0000b93700 ffff88003c793808 ffffffff817a8acf 0000000000000007
[ 865.312201] ffff88001f478850 ffff88003c793918 ffffffff812b7cf2 ffff88003c793878
[ 865.312204] ffffffff817a36ae ffff88003c793888 ffffffff00000020 ffff88003c793888
[ 865.312207] Call Trace:
[ 865.312211] [<ffffffff817a8acf>] dump_stack+0x46/0x58
[ 865.312216] [<ffffffff812b7cf2>] __extent_writepage+0x82/0x770
[ 865.312219] [<ffffffff817a36ae>] ? printk+0x5c/0x5e
[ 865.312222] [<ffffffff81066640>] ? flush_kthread_worker+0xa0/0xa0
[ 865.312226] [<ffffffff810fac71>] ? find_get_pages_tag+0x121/0x160
[ 865.312231] [<ffffffff812b85b2>] extent_write_cache_pages.isra.24.constprop.31+0x1d2/0x380
[ 865.312235] [<ffffffff81005877>] ? show_trace_log_lvl+0x57/0x70
[ 865.312239] [<ffffffff812b8a39>] extent_writepages+0x49/0x60
[ 865.312244] [<ffffffff8129f160>] ? btrfs_submit_direct+0x6c0/0x6c0
[ 865.312247] [<ffffffff8129ce63>] btrfs_writepages+0x23/0x30
[ 865.312251] [<ffffffff81105c3e>] do_writepages+0x6e/0x90
[ 865.312255] [<ffffffff8116ea30>] __writeback_single_inode+0xa0/0x260
[ 865.312259] [<ffffffff8116f741>] writeback_sb_inodes+0x2b1/0x430
[ 865.312263] [<ffffffff8116f93e>] __writeback_inodes_wb+0x7e/0xc0
[ 865.312267] [<ffffffff8117000b>] wb_writeback+0x18b/0x2e0
[ 865.312271] [<ffffffff811701e3>] wb_check_old_data_flush+0x83/0xb0
[ 865.312275] [<ffffffff81171aac>] bdi_writeback_workfn+0x1dc/0x440
[ 865.312279] [<ffffffff8105f19c>] process_one_work+0x2bc/0x430
[ 865.312283] [<ffffffff810600c5>] worker_thread+0x215/0x320
[ 865.312287] [<ffffffff8105feb0>] ? manage_workers.isra.18+0x2b0/0x2b0
[ 865.312290] [<ffffffff810666f6>] kthread+0xb6/0xc0
[ 865.312293] [<ffffffff81066640>] ? flush_kthread_worker+0xa0/0xa0
[ 865.312297] [<ffffffff817b837c>] ret_from_fork+0x7c/0xb0
[ 865.312301] [<ffffffff81066640>] ? flush_kthread_worker+0xa0/0xa0
[ 865.312309] [run_delalloc_range] inum: (258, 5), alloc start: 0, end: 4095
next prev parent reply other threads:[~2013-10-01 19:40 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-28 23:35 Questions regarding logging upon fsync in btrfs Aastha Mehta
2013-09-28 23:46 ` Aastha Mehta
2013-09-29 0:21 ` Hugo Mills
2013-09-29 0:42 ` Josef Bacik
2013-09-29 9:22 ` Aastha Mehta
2013-09-29 13:12 ` Josef Bacik
2013-09-30 19:32 ` Aastha Mehta
2013-09-30 20:11 ` Josef Bacik
2013-09-30 20:30 ` Aastha Mehta
2013-09-30 20:47 ` Josef Bacik
2013-09-30 21:07 ` Aastha Mehta
2013-09-30 21:17 ` Josef Bacik
2013-10-01 17:34 ` Josef Bacik
2013-10-01 19:40 ` Aastha Mehta [this message]
2013-10-01 19:42 ` Aastha Mehta
2013-10-01 20:13 ` Aastha Mehta
2013-10-02 11:52 ` Josef Bacik
2013-10-02 20:12 ` Aastha Mehta
2013-10-02 23:28 ` Josef Bacik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAEx9m47fk2zwpAdPq2gFx1Y5DDV9rqtWzo=QxmCUgeTS83fBOQ@mail.gmail.com' \
--to=aasthakm@gmail.com \
--cc=jbacik@fusionio.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).