From: Miao Xie <miaox@cn.fujitsu.com>
To: Alex Lyakas <alex.btrfs@zadarastorage.com>
Cc: Chen Yang <chenyang.fnst@cn.fujitsu.com>,
Jan Schmidt <list.btrfs@jan-o-sch.net>,
Alexander Block <ablock84@googlemail.com>,
linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] Btrfs/send: sparse and pre-allocated file support for, btrfs-send mechanism
Date: Thu, 24 Jan 2013 19:39:17 +0800 [thread overview]
Message-ID: <51011D65.3070800@cn.fujitsu.com> (raw)
In-Reply-To: <CAOcd+r0hMRic8=fXNmUSbifAXwnQER8wCkj=mHSBGpQ7fM=LDA@mail.gmail.com>
On thu, 24 Jan 2013 10:53:17 +0200, Alex Lyakas wrote:
> Hi Chen,
> with all due respect, what do you mean by "I see" and "OK for users"?
> The semantics of the fallocate API with/without the
> FALLOC_FL_PUNCH_HOLE flag is not defined by me or you. As far as I
> understand, the file system *must* adhere to the semantics of this
> API, as defined elsewhere.
> Looking, for example, here:
> http://man7.org/linux/man-pages/man2/fallocate.2.html
>
> "Allocating disk space
> The default operation (i.e., mode is zero) of fallocate() allocates
> and initializes to zero the disk space within the range specified by
> offset and len."
>
> "Deallocating file space
> Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux
> 2.6.38) in mode deallocates space (i.e., creates a hole) in the byte
> range starting at offset and continuing for len bytes. Within the
> specified range, partial file system blocks are zeroed, and whole file
> system blocks are removed from the file."
>
> These are clearly two different modes of operation, and I don't think
> you or me can decide otherwise, at this point.
I think send/receive commands just need ensure the content of the file is
right, not the disk format. That is also the reason why the developers just
send out the zero data when they meet a hole or pre-allocated extent. From
this viewpoint, they are the same.
Thanks
Miao
>
> However, I may be not knowledgeable enough to confirm this.
> Jan/Alexander, can you perhaps comment on this?
>
> Thanks,
> Alex.
>
>
>
>
>
>
>
> On Thu, Jan 24, 2013 at 3:54 AM, Chen Yang <chenyang.fnst@cn.fujitsu.com> wrote:
>> 于 2013-1-23 19:56, Alex Lyakas 写道:
>>> Hi,
>>>
>>> On Wed, Jan 23, 2013 at 1:04 PM, Chen Yang <chenyang.fnst@cn.fujitsu.com> wrote:
>>>> From: Chen Yang <chenyang.fnst@cn.fujitsu.com>
>>>> Date: Wed, 23 Jan 2013 11:21:51 +0800
>>>> Subject: [PATCH] Btrfs/send: sparse and pre-allocated file support for
>>>> btrfs-send mechanism
>>>>
>>>> When sending a file with sparse or pre-allocated part,
>>>> these parts will be sent as ZERO streams, and it's unnecessary.
>>>>
>>>> There are two ways to improve this, one is just skip the EMPTY parts,
>>>> and the other one is to add a punch command to send, when an EMPTY parts
>>>> was detected. But considering a case of incremental sends, if we choose
>>>> the first one, when a hole got punched into the file after the initial
>>>> send, the data will be unchanged on the receiving side when received
>>>> incrementally. So the second choice is right.
>>>>
>>>> Signed-off-by: Cheng Yang <chenyang.fnst@cn.fujitsu.com>
>>>> ---
>>>> fs/btrfs/send.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>> fs/btrfs/send.h | 3 +-
>>>> 2 files changed, 61 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>>> index 5445454..31e9aef 100644
>>>> --- a/fs/btrfs/send.c
>>>> +++ b/fs/btrfs/send.c
>>>> @@ -3585,6 +3585,52 @@ out:
>>>> return ret;
>>>> }
>>>>
>>>> +static int send_punch(struct send_ctx *sctx, u64 offset, u32 len)
>>>> +{
>>>> + int ret = 0;
>>>> + struct fs_path *p;
>>>> + mm_segment_t old_fs;
>>>> +
>>>> + p = fs_path_alloc(sctx);
>>>> + if (!p)
>>>> + return -ENOMEM;
>>>> +
>>>> + /*
>>>> + * vfs normally only accepts user space buffers for security reasons.
>>>> + * we only read from the file and also only provide the read_buf buffer
>>>> + * to vfs. As this buffer does not come from a user space call, it's
>>>> + * ok to temporary allow kernel space buffers.
>>>> + */
>>>> + old_fs = get_fs();
>>>> + set_fs(KERNEL_DS);
>>>> +
>>>> +verbose_printk("btrfs: send_fallocate offset=%llu, len=%d\n", offset, len);
>>>> +
>>>> + ret = open_cur_inode_file(sctx);
>>>> + if (ret < 0)
>>>> + goto out;
>>>> +
>>>> + ret = begin_cmd(sctx, BTRFS_SEND_C_PUNCH);
>>>> + if (ret < 0)
>>>> + goto out;
>>>> +
>>>> + ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p);
>>>> + if (ret < 0)
>>>> + goto out;
>>>> +
>>>> + TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
>>>> + TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
>>>> + TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len);
>>>> +
>>>> + ret = send_cmd(sctx);
>>>> +
>>>> +tlv_put_failure:
>>>> +out:
>>>> + fs_path_free(sctx, p);
>>>> + set_fs(old_fs);
>>>> + return ret;
>>>> +}
>>>> +
>>>> /*
>>>> * Read some bytes from the current inode/file and send a write command to
>>>> * user space.
>>>> @@ -3718,6 +3764,7 @@ static int send_write_or_clone(struct send_ctx *sctx,
>>>> u64 pos = 0;
>>>> u64 len;
>>>> u32 l;
>>>> + u64 bytenr;
>>>> u8 type;
>>>>
>>>> ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
>>>> @@ -3731,8 +3778,19 @@ static int send_write_or_clone(struct send_ctx *sctx,
>>>> * sure to send the whole thing
>>>> */
>>>> len = PAGE_CACHE_ALIGN(len);
>>>> - } else {
>>>> + } else if (type == BTRFS_FILE_EXTENT_REG) {
>>>> len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
>>>> + bytenr = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
>>>> + if (bytenr == 0) {
>>>> + ret = send_punch(sctx, offset, len);
>>>> + goto out;
>>>> + }
>>>> + } else if (type == BTRFS_FILE_EXTENT_PREALLOC) {
>>>> + len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
>>>> + ret = send_punch(sctx, offset, len);
>>>> + goto out;
>>>> + } else {
>>>> + BUG();
>>>> }
>>>
>>> Are these two cases really the same? In the bytenr == 0 we want to
>>> deallocate the range. While in the prealloc case, we want to ensure
>>> disk space allocation. Or am I mistaken?
>>> Looking at the receive side, you use the same command for both:
>>> ret = fallocate(r->write_fd,
>>> FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>>> offset, len);
>>>
>>> Looking at btrfs_fallocate code, in that case it will always punch the hole:
>>> static long btrfs_fallocate(struct file *file, int mode,
>>> loff_t offset, loff_t len)
>>> ...
>>> if (mode & FALLOC_FL_PUNCH_HOLE)
>>> return btrfs_punch_hole(inode, offset, len);
>>>
>>> So maybe you should have two different commands, or add a flag to
>>> distinguish between the two cases?
>>>
>>> Thanks,
>>> Alex.
>>>
>>
>> I see sparse and pre-allocated parts both as EMPTY hole,
>> and although different for a file, it's OK for users.
>>
>> Thanks
>>
>>>
>>>
>>>>
>>>> if (offset + len > sctx->cur_inode_size)
>>>> diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
>>>> index 1bf4f32..659ac8f 100644
>>>> --- a/fs/btrfs/send.h
>>>> +++ b/fs/btrfs/send.h
>>>> @@ -20,7 +20,7 @@
>>>> #include "ctree.h"
>>>>
>>>> #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
>>>> -#define BTRFS_SEND_STREAM_VERSION 1
>>>> +#define BTRFS_SEND_STREAM_VERSION 2
>>>>
>>>> #define BTRFS_SEND_BUF_SIZE (1024 * 64)
>>>> #define BTRFS_SEND_READ_SIZE (1024 * 48)
>>>> @@ -80,6 +80,7 @@ enum btrfs_send_cmd {
>>>> BTRFS_SEND_C_WRITE,
>>>> BTRFS_SEND_C_CLONE,
>>>>
>>>> + BTRFS_SEND_C_PUNCH,
>>>> BTRFS_SEND_C_TRUNCATE,
>>>> BTRFS_SEND_C_CHMOD,
>>>> BTRFS_SEND_C_CHOWN,
>>>> --
>>>> 1.7.7.6
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
next prev parent reply other threads:[~2013-01-24 11:38 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <50FF58BC.7090604@cn.fujitsu.com>
2013-01-23 11:04 ` [PATCH] Btrfs/send: sparse and pre-allocated file support for, btrfs-send mechanism Chen Yang
2013-01-23 11:56 ` Alex Lyakas
[not found] ` <51009455.7040005@cn.fujitsu.com>
2013-01-24 8:53 ` Alex Lyakas
2013-01-24 11:39 ` Miao Xie [this message]
2013-01-24 11:58 ` Hugo Mills
2013-01-24 12:13 ` Miao Xie
2013-01-24 12:41 ` Hugo Mills
2013-01-24 17:13 ` Jan Schmidt
2013-01-24 18:06 ` Alex Lyakas
2013-01-25 8:56 ` Jan Schmidt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51011D65.3070800@cn.fujitsu.com \
--to=miaox@cn.fujitsu.com \
--cc=ablock84@googlemail.com \
--cc=alex.btrfs@zadarastorage.com \
--cc=chenyang.fnst@cn.fujitsu.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=list.btrfs@jan-o-sch.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).