All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tao Ma <tao.ma@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] New update for sync in reflink. [Re: some thoughts about data integrity test for reflink.]
Date: Sat, 08 Aug 2009 22:28:34 +0800	[thread overview]
Message-ID: <4A7D8B92.5090906@oracle.com> (raw)
In-Reply-To: <20090808001005.GE30923@mail.oracle.com>



Joel Becker wrote:
> [I've added ocfs2-devel to the Cc list.  Please copy the list, otherwise
>   we can't get Mark's thoughts, and he knows some of the code better
>   than I do. ]
>
> On Fri, Aug 07, 2009 at 11:23:52PM +0800, Tao Ma wrote:
>   
>> Joel Becker wrote:
>>     
>>>> I called ocfs2_sync_inode before reflink to flush the page
>>>> cache.  So I  think we should get all B's and C's are missed.
>>>> The reason I call ocfs2_sync_inode is the following test case:
>>>> 1. Write a file with all A's.
>>>> 2. reflink A to B.
>>>> 3. read B.
>>>> If the page cache of A isn't synced to the disk, we may get the
>>>> garbage  from reading of B.
>>>>         
>>> 	We won't get garbage.  We get 'A's, which is a valid state of
>>> the file.  Whether it's valid in the face of relink is what we're trying
>>> to decide here.
>>>       
>> We don't get 'A' here actually and we would get garbage. Strange,
>> uh? I just added some printk and it seems that while reading file B,
>> the buffer_head is created
>> and read from the disk. I used to think that while we write file A,
>> we will create buffer_head. And reading file B will use this
>> buffer_head.
>> But that is not the case actually. The only right sequence is that
>> we add a sync between step 1 and 2.
>>     
>
> 	Um, what?
> 	Oh, we do the check for PageDirty() in CoW, but not in straight
> read...
>   
So what do you mean here? I need to add another patch for straight read 
check because of reflink?
I am lack of this type of technique. So please give me some tips. Thanks.
>   
>>> 	Now, whether we want a reflink to force a sync or not in the
>>> writeback case...on the one hand, people will probably expect write();
>>> reflink(); read() to work.  On the other hand, they really should
>>> write(); fsync(); reflink(); if they care.  We're already dealing
>>> with this regarding rename(), and its ugly.  Force the fsync() for them,
>>> and we help the naive caller but penalize (performance-wise) the
>>> intelligent caller.
>>>       
>> I have checked my mount mode, it is data=ordered, so we also meet
>> with this issue in ordered mode.
>>     
>
> 	Yes, the original discussion was concerning the page cache
> contents during CoW.  You've coded that up correctly.  Here we're
> talking about simple read.
>
>   
>>> 	I really lean towards just requiring the caller to deal with it.
>>> It's what unix-like systems do for all other operations.  I poked Chris,
>>> and he agrees.  So a caller if they knew they had such a situation, does
>>> 'fsync(); reflink();'.
>>>       
>> So a user always need to do fsync before he do reflink. Is it OK? If
>> yes, I will ask tristan to update his test case.
>>     
>
> 	Well... I would normally say yes, because stale vs not-stale is
> one thing.  But complete garbage from before the first write is not OK.
>   
No problem, I will ask tristan to do it. And we may also need to add 
some explanations about it when we finally release reflink.
>   
>>>> I think it is OK if I call ocfs2_sync_inode before reflink?(See
>>>> my git  tree for details).
>>>>         
>>> 	Calling it unconditionally is just a performance disaster
>>> waiting to happen.  Even if we want to require reflink() to happen with
>>> data flushed, you don't want to call ocfs2_sync_inode() in the ordered
>>> case.  You want the ordered journal to handle it.
>>>       
>> Since it doesn't work in order mode which is more common for ocfs2
>> usage, do you change your mind here?
>>     
>
> 	I do not.  Having to call ocfs2_sync_inode() for every reflink
> is a disaster.  We need to think about this.  Basically, how can the
> reflink target notice it has to either sync the source or perhaps copy
> the data over?
>   
OK, I agree with you for ocfs2_sync_inode part. I am just a little 
worried about the above case I mentioned, so if we can
resolve it gracefully, I would be very glad about it.

Regards,
Tao

  reply	other threads:[~2009-08-08 14:28 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <4A1B8E15.20005@oracle.com>
     [not found] ` <1243322353.6616.18.camel@tristan-laptop.cn.oracle.com>
     [not found]   ` <4A1C8708.8050704@oracle.com>
     [not found]     ` <1243389610.6616.30.camel@tristan-laptop.cn.oracle.com>
     [not found]       ` <4A1DDFCF.4060203@oracle.com>
     [not found]         ` <4A1E3B9A.7000807@oracle.com>
     [not found]           ` <20090528171406.GA315@mail.oracle.com>
     [not found]             ` <4A7C4708.9030303@oracle.com>
2009-08-08  0:10               ` [Ocfs2-devel] New update for sync in reflink. [Re: some thoughts about data integrity test for reflink.] Joel Becker
2009-08-08 14:28                 ` Tao Ma [this message]
2009-08-10 18:59                   ` Joel Becker
2009-08-11  1:29                     ` Tao Ma
2009-08-11 16:37                       ` Joel Becker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A7D8B92.5090906@oracle.com \
    --to=tao.ma@oracle.com \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.