Subject: Re: btrfs-send format that contains binary diffs
From: Andrei Borzenkov
To: Claudius Heine, linux-btrfs@vger.kernel.org
Cc: Henning Schild
Date: Tue, 30 Mar 2021 08:38:24 +0300
Message-ID: <3cfc2421-4683-9439-1301-09d013a670ec@gmail.com>
In-Reply-To: <535709bb-0bde-6193-2cef-0c1d037ba211@gmail.com>
References: <5ba46b04-f3ba-03ef-6ad5-38fd44f8c67e@denx.de> <535709bb-0bde-6193-2cef-0c1d037ba211@gmail.com>

On 30.03.2021 08:33, Andrei Borzenkov wrote:
> On 29.03.2021 22:14, Claudius Heine wrote:
>> Hi Andrei,
>>
>> On 2021-03-29 18:30, Andrei Borzenkov wrote:
>>> On 29.03.2021 16:16, Claudius Heine wrote:
>>>> Hi,
>>>>
>>>> I am currently investigating the possibility to use `btrfs-stream` files
>>>> (generated by `btrfs send`) for deploying an image-based update to
>>>> systems (probably embedded ones).
>>>>
>>>> One of the issues I encountered here is that btrfs-send does not use any
>>>> diff algorithm on files that have changed from one snapshot to the next.
>>>>
>>>
>>> btrfs send works on block level. It sends blocks that differ between two
>>> snapshots.
>>
>> Are you sure?
>>
>
> Yes.
>
>> I did a test with a 32MiB random file. I created one snapshot, then
>> changed (not deleted or added) one byte in that file and then created a
>> snapshot again. `btrfs send` created a >32MiB `btrfs-stream` file. If it
>> would be only block based, then I would have expected that it would just
>> contain the changed block, not the whole file.
>> And if I use a smaller file on the same file system, then the
>> `btrfs-stream` is smaller as well.
>>
>> I looked into those `btrfs-stream` files using [1] and also [2] as well
>> as the code. While I haven't understood everything there yet, it
>> currently looks to me like it is file based.
>>
>
> btrfs send is not pure block based image, because it would require two
> absolutely identical filesystems. It needs to replicate filesystem
> structure so it of course needs to know which files are created/deleted.
> But for each file it only sends changed parts since previous snapshot.
> This only works if both snapshots refer to the *same* file.
>

Or, more precisely: btrfs send knows which filesystem content was part of
the previous snapshot and is therefore already present on the destination,
and it will not send that content again. Which files this content belongs
to is more or less irrelevant.

> As was already mentioned, you need to understand how your files are
> changed. In particular, standard tools for software update do not
> rewrite files in place - they create new files with new content. From
> btrfs perspective they are completely different; two files with the same
> name in two snapshots do not share a single byte. When you compute delta
> between two snapshots you get instructions to delete old file and create
> new file with new content (that will be renamed to the same name as
> deleted old file). This also by necessity sends full new content.
>
> So yes, btrfs replication is block based; similarity is determined by
> how much physical data is shared between two files. And you expect file
> based replication where file names determine whether files should be
> considered the same and changes are computed for two files with the same
> name.
>
>>>
>>>> One way to implement this would be to add some sort of 'patch' command
>>>> to the `btrfs-stream` format.
>>>>
>>>
>>> This would require reading complete content of both snapshots instead of
>>> just computing block diff using metadata. Unless I misunderstand what
>>> you mean.
>> I think I should only need access to the old snapshot as well as the
>> `btrfs-stream` file. But I currently don't have a complete PoC of this
>> ready.
>>
>> regards,
>> Claudius
>>
>> [1] https://github.com/sysnux/btrfs-snapshots-diff
>> [2] https://btrfs.wiki.kernel.org/index.php/Design_notes_on_Send/Receive
>
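
To make the distinction between rewriting a file in place and replacing it
with a new file concrete, here is a minimal Python sketch. It is not taken
from the thread or from any btrfs tool; the function names
(`patch_in_place`, `replace_via_rename`, `demo`) and the file name are
made up for illustration. It only shows the two update styles from user
space, using the inode number as a rough hint of whether the "same" file
survives. On btrfs one would expect the in-place variant to leave the
untouched blocks shared with the older snapshot, while the rename variant
produces a file that shares nothing with it, but the sketch itself does not
measure that.

```python
import os
import tempfile


def patch_in_place(path, offset, data):
    """Overwrite bytes at `offset` inside the existing file."""
    with open(path, "r+b") as f:      # "r+b": open for update, no truncation
        f.seek(offset)
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def replace_via_rename(path, offset, data):
    """Write a complete new copy and rename it over the old file.

    This mimics the pattern attributed above to typical update tools:
    the result has the same name but is a different file.
    """
    with open(path, "rb") as f:
        content = bytearray(f.read())
    content[offset:offset + len(data)] = data

    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)        # atomic rename over the old name


def demo():
    """Return (inode kept after in-place patch, inode kept after rename)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "blob.bin")   # hypothetical test file
        with open(path, "wb") as f:
            f.write(os.urandom(1024 * 1024))

        inode_before = os.stat(path).st_ino
        patch_in_place(path, 4096, b"\x00")
        inode_patched = os.stat(path).st_ino

        replace_via_rename(path, 4096, b"\x01")
        inode_replaced = os.stat(path).st_ino

        return inode_before == inode_patched, inode_patched == inode_replaced


if __name__ == "__main__":
    kept_in_place, kept_after_rename = demo()
    print("inode kept after in-place patch:", kept_in_place)
    print("inode kept after rename replace:", kept_after_rename)
```

If the one-byte change in the 32MiB test was made with a tool that follows
the second pattern, that would be consistent with the full-size stream
reported above; checking which pattern a given tool uses is left to the
reader.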