linux-btrfs.vger.kernel.org archive mirror
From: Aastha Mehta <aasthakm@gmail.com>
To: Josef Bacik <jbacik@fusionio.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Questions regarding logging upon fsync in btrfs
Date: Tue, 1 Oct 2013 22:13:25 +0200
Message-ID: <CAEx9m46UY3HiTgO-mt8x2Ykj=jn3_Sf=4SCSdJTwAHS0BfLkwQ@mail.gmail.com>
In-Reply-To: <CAEx9m4649eJR0kAXRoQx9myUw+8ZBXi71pm3aYRsquZ69+ULZA@mail.gmail.com>

On 1 October 2013 21:42, Aastha Mehta <aasthakm@gmail.com> wrote:
> On 1 October 2013 21:40, Aastha Mehta <aasthakm@gmail.com> wrote:
>> On 1 October 2013 19:34, Josef Bacik <jbacik@fusionio.com> wrote:
>>> On Mon, Sep 30, 2013 at 11:07:20PM +0200, Aastha Mehta wrote:
>>>> On 30 September 2013 22:47, Josef Bacik <jbacik@fusionio.com> wrote:
>>>> > On Mon, Sep 30, 2013 at 10:30:59PM +0200, Aastha Mehta wrote:
>>>> >> On 30 September 2013 22:11, Josef Bacik <jbacik@fusionio.com> wrote:
>>>> >> > On Mon, Sep 30, 2013 at 09:32:54PM +0200, Aastha Mehta wrote:
>>>> >> >> On 29 September 2013 15:12, Josef Bacik <jbacik@fusionio.com> wrote:
>>>> >> >> > On Sun, Sep 29, 2013 at 11:22:36AM +0200, Aastha Mehta wrote:
>>>> >> >> >> Thank you very much for the reply. That clarifies a lot of things.
>>>> >> >> >>
>>>> >> >> >> I was trying a small test case that opens a file, writes a block of
>>>> >> >> >> data, calls fsync and then closes the file. If I understand correctly,
>>>> >> >> >> fsync would return only after all in-memory buffers have been
>>>> >> >> >> committed to disk. I have added a few print statements in the
>>>> >> >> >> __extent_writepage function, and I notice that the function gets
>>>> >> >> >> called a bit later after fsync returns. It seems that I am not
>>>> >> >> >> guaranteed to see the data going to disk by the time fsync returns.
>>>> >> >> >>
>>>> >> >> >> Am I doing something wrong, or am I looking at the wrong place for
>>>> >> >> >> disk write? This happens both with tree logging enabled as well as
>>>> >> >> >> with notreelog.
>>>> >> >> >>
>>>> >> >> >
>>>> >> >> > So 3.1 was a long time ago, and to be sure it had issues, but I don't think it was
>>>> >> >> > _that_ broken.  You are probably better off instrumenting a recent kernel, 3.11
>>>> >> >> > or just build btrfs-next from git.  But if I were to make a guess I'd say that
>>>> >> >> > __extent_writepage was how both data and metadata were written out at the time (I
>>>> >> >> > don't think I changed it until 3.2 or something later) so what you are likely
>>>> >> >> > seeing is the normal transaction commit after the fsync.  In the case of
>>>> >> >> > notreelog we are likely starting another transaction and you are seeing that
>>>> >> >> > commit (at the time the transaction kthread would start a transaction even if
>>>> >> >> > none had been started yet.)  Thanks,
>>>> >> >> >
>>>> >> >> > Josef
>>>> >> >>
>>>> >> >> Is there any special handling for very small file writes, less than 4K? As
>>>> >> >> I understand there is an optimization to inline the first extent in a file if
>>>> >> >> it is smaller than 4K, does it affect the writeback on fsync as well? I did
>>>> >> >> set the max_inline mount option to 0, but even then it seems there is
>>>> >> >> some difference in fsync behaviour for writing first extent of less than 4K
>>>> >> >> size and writing 4K or more.
>>>> >> >>
>>>> >> >
>>>> >> > Yeah if the file is an inline extent then it will be copied into the log
>>>> >> > directly and the log will be written out, no going through the data write path
>>>> >> > at all.  Max inline == 0 should make it so we don't inline, so if it isn't
>>>> >> > honoring that then that may be a bug.  Thanks,
>>>> >> >
>>>> >> > Josef
>>>> >>
>>>> >> I tried it on the 3.12-rc2 release, and it seems there is a bug then.
>>>> >> Please find the attached logs to confirm.
>>>> >> The older release probably has it as well.
>>>> >>
>>>> >
>>>> > Oooh ok I understand, you have your printk's in the wrong place ;).
>>>> > do_writepages doesn't necessarily mean you are writing something.  If you want
>>>> > to see if stuff got written to the disk I'd put a printk at run_delalloc_range
>>>> > and have it spit out the range it is writing out since that's what we think is
>>>> > actually dirty.  Thanks,
>>>> >
>>>> > Josef
>>>>
>>>> No, but I also placed dump_stack() at the beginning of
>>>> __extent_writepage. run_delalloc_range is called only from
>>>> __extent_writepage, so if it were called, the dump_stack() at the
>>>> top of __extent_writepage would have printed as well, no?
>>>>
>>>
>>> Ok I've done the same thing and I'm not seeing what you are seeing.  Are you
>>> using any mount options other than notreelog and max_inline=0?  Could you adjust
>>> your printk to print out the root objectid for the inode as well?  It could be
>>> possible that this is the writeout for the space cache or inode cache.  Thanks,
>>>
>>> Josef
>>
>> I actually printed the stack only when the root objectid is 5. I have
>> attached another log for writing the first 500 bytes in a file. I also
>> print the root objectid for the inode in run_delalloc and
>> __extent_writepage.
>>
>> Thanks
>>
>
> Just to clarify, in the latest logs, I allowed printing of debug
> printk's and stack dump for all root objectid's.

Actually, I see the same behaviour for any write shorter than 4K,
at any offset, unless the write straddles a page boundary.
To summarise:
1. write 4K -> write in the fsync path
2. write less than 4K, within a single page -> bdi_writeback by flush worker
3. small write that straddles a page boundary or write 4K+delta -> the
first page gets written in the fsync path, the remaining length that
straddles the page boundary is written in the bdi_writeback path
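
For anyone who wants to try these cases, here is a minimal user-space
sketch, assuming a 4K page size. It is not the test program behind the
attached logs: the helper names (write_and_fsync, run_cases), the file
names, and the exact offsets and lengths are illustrative only. It just
issues the write and the fsync; which kernel path does the writeback
still has to be observed through the printk's.

```python
import os

PAGE_SIZE = 4096  # assumed page size; compare with os.sysconf("SC_PAGE_SIZE")

# The three cases from the summary above, as (file name, offset, length).
# The concrete offsets and lengths are illustrative choices.
CASES = [
    ("case1_full_page", 0, PAGE_SIZE),
    ("case2_sub_page", 0, 500),
    ("case3_straddle", PAGE_SIZE - 100, 200),
]


def write_and_fsync(path, offset, length):
    """Open path, write `length` bytes at `offset`, fsync, close.

    Returns the number of bytes the write reported.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        written = os.pwrite(fd, b"a" * length, offset)
        os.fsync(fd)  # blocks until the kernel reports the fsync as complete
        return written
    finally:
        os.close(fd)


def run_cases(directory):
    """Run every case against a fresh file in `directory`; return byte counts."""
    results = {}
    for name, offset, length in CASES:
        path = os.path.join(directory, name)
        results[name] = write_and_fsync(path, offset, length)
    return results
```

Calling run_cases() with a directory on the btrfs mount under test
(mounted with notreelog and max_inline=0, as in this thread) issues one
write per case, each on its own file, so the printk output for the
cases does not get mixed up.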

Please let me know if I am testing the wrong cases.

Sorry for so many mails.

Thanks
