From: Ingo Molnar <mingo@elte.hu>
To: Paul Jackson <pj@engr.sgi.com>
Cc: torvalds@osdl.org, pasky@ucw.cz, rddunlap@osdl.org,
ross@jose.lug.udel.edu, linux-kernel@vger.kernel.org,
git@vger.kernel.org
Subject: Re: [rfc] git: combo-blobs
Date: Mon, 11 Apr 2005 17:12:04 +0200
Message-ID: <20050411151204.GA5562@elte.hu>
In-Reply-To: <20050411074552.4e2e656b.pj@engr.sgi.com>
* Paul Jackson <pj@engr.sgi.com> wrote:
> Hmmm ... I have this strong sense that I am about 2 hours away from
> smacking my forehead and groaning "Duh - so that's what Ingo meant!"
>
> However, one must play out one's destiny.
>
> Could you provide an example scenario, which results in the creation
> of a combo-blob?
>
> The best I can come up with is the following.
>
> Let's say Nick changes one line in the middle of kernel/sched.c (yeah
> - I know - unlikely scenario - he usually changes more than that -
> nevermind that detail.)
>
> In the days Before Combo Blobs (BCB), git would have been told that
> kernel/sched.c was to be picked up, and would have wrapped it up in a
> zlib'd blob, sha1summed it, seen it was a new sum, and added that blob
> to its objects (or something like this -- I'm still a little fuzzy on
> these git details.)
>
> But Nick just downloaded the latest git 1.5.11.1 which has added
> support for combo blobs, so now, guessing here, instead of wrapping up
> the new sched.c, git instead unwraps the old one, diff's with the new,
> notices a couple of long sequences that are unchanged, wraps up both
> of those sequences as a couple of relatively large blobs, and wraps up
> the new lines that Nick just coded in the middle as a small blob, and
> puts all three in the object store, along with another small
> combo-blob, tying them all together.

actually, git would just include by reference the previous blob.

let's say we had the previous version of sched.c in a blob, ID
cc4ee6107d19f89898a8c89d45810f01710f2ff4. We have the new edit (which is
small, let's say 20 bytes) in blob e010fab710092b19be6e26de1721e249dff2d141.
We'd create the combo-blob representing the new version of sched.c in the
following way:
    include cc4ee6107d19f89898a8c89d45810f01710f2ff4 0 54010
    include e010fab710092b19be6e26de1721e249dff2d141 0 20
    include cc4ee6107d19f89898a8c89d45810f01710f2ff4 54030 73061
so we'd include (by reference) most of the previous version, with a
small blob for the extras. Since sched.c compresses down to 36K, we
saved ~32K of bandwidth, and somewhere on the order of 20K of storage.
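
A minimal sketch of how include lines like the ones above could be derived from the old and new versions of a file. Everything here is an assumption for illustration: the helper names (`blob_id`, `make_combo`), the plain dict used as an object store, and the reading of the two numbers on each include line as start and end byte offsets (end exclusive), which the mail does not spell out. Plain SHA-1 over the raw bytes stands in for real object naming, and a common prefix/suffix scan stands in for a real diff.

```python
import hashlib


def blob_id(data: bytes) -> str:
    # Hypothetical stand-in for object naming: SHA-1 of the raw bytes.
    return hashlib.sha1(data).hexdigest()


def make_combo(old: bytes, new: bytes, store: dict) -> str:
    """Describe `new` as include lines over `old` plus one small delta blob.

    Assumed line format: include <id> <start> <end>, end exclusive.
    """
    old_id = blob_id(old)
    store[old_id] = old

    # Longest common prefix of the two versions.
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1

    # Longest common suffix that does not overlap the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1

    # Whatever lies between prefix and suffix in the new version is the edit.
    delta = new[prefix:len(new) - suffix]

    lines = []
    if prefix:
        lines.append(f"include {old_id} 0 {prefix}")
    if delta:
        delta_id = blob_id(delta)
        store[delta_id] = delta
        lines.append(f"include {delta_id} 0 {len(delta)}")
    if suffix:
        lines.append(f"include {old_id} {len(old) - suffix} {len(old)}")
    return "\n".join(lines) + "\n"
```

For a 73061-byte old version in which 20 bytes at offset 54010 are replaced by 20 different bytes, this yields three include lines with the same offsets as in the example above. A single prefix/suffix scan only handles one contiguous edit; several separate edits would need a real diff.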

to construct the combo-blob later on, we do have to unpack the previous
sched.c blob (and if that is itself a combo-blob that is not cached, then
we'd have to unpack all of its parents until we arrive at some full blob).
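
The unpacking chain described here can be sketched as a recursive lookup. This is only an illustration under stated assumptions: the store layout (a dict mapping an ID to a `("full", zlib-compressed bytes)` or `("combo", include-line text)` pair), the function name `resolve`, and the start/end (end exclusive) reading of the include offsets are all hypothetical and not taken from git.

```python
import zlib


def resolve(obj_id, store, cache=None):
    """Return the full contents of obj_id, following include lines.

    Assumed store layout (hypothetical):
      id -> ("full", zlib-compressed bytes)
      id -> ("combo", text made of "include <id> <start> <end>" lines)
    """
    if cache is None:
        cache = {}
    if obj_id in cache:
        # Already unpacked once, so no need to walk the parents again.
        return cache[obj_id]

    kind, payload = store[obj_id]
    if kind == "full":
        data = zlib.decompress(payload)
    else:
        parts = []
        for line in payload.splitlines():
            _keyword, ref, start, end = line.split()
            # A referenced blob may itself be a combo-blob, so recurse
            # until a full blob is reached.
            parent = resolve(ref, store, cache)
            parts.append(parent[int(start):int(end)])
        data = b"".join(parts)

    cache[obj_id] = data
    return data
```

Passing the same `cache` dict across calls keeps already-unpacked parents around, which is the "cached" case mentioned above; without it, a long chain of combo-blobs is unpacked from the full blob every time.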
> So far, not too bad. Haven't gained anything, and required the
> unpacking of a zlib blob we didn't require before, and the running and
> analyzing of a diff we didn't require before, but the end result is
> only moderately worse - four object blobs instead of one, but of total
> size not much larger (well, total size typically 3 disk blocks worse,
> due to a slight increase in fragmentation from using 4 blocks to store
> what used to be in one.)

we'd have 2 new objects (the 'delta' and the 'combo' blob).

(if the # of objects is an issue then we could include new data in the
combo-blob itself too, but that's getting too complex I think.)

	Ingo