* on when to checksum
From: Tom Lord @ 2005-04-20 22:25 UTC
To: git; +Cc: torvalds

Linus,

I think you have made a mistake by moving the sha1 checksum from the
zipped form to the inflated form. Here is why:

What you have set in motion with `git' is an ad-hoc p2p network for
sharing filesystem trees -- a global distributed filesystem. I
believe your starter here has a good chance of taking off to be much,
much larger than just a tool for the kernel. A subset of your work,
blobs and blob databases, has much wider application than just
sharing trees: those parts of `git' can form a very solid foundation
for many other applications as well.

To the extent `git' succeeds in the context of the kernel, it will be
invested in and extended and generalized --- and the kernel project
will benefit. So don't ignore those wider applications even though
they are not your focus today: they will generate investment that
feeds back to your project.

Your `git' is silent on transports and mirroring of blob databases --
tasks for scripting, sure -- but those elements won't be far behind.
Eventually, slinging around blobs as atomic elements of payloads will
become very common. The blob handle (aka "address")/payload model of
a blob db is very clean and simple.

In a network of nodes speaking to one another by exchanging blobs, I
foresee a prominent need for intermediate nodes that process blobs
"blindly" and as quickly as possible. Blob compression is mostly
goofy if regarded just as a way to save on (diminishingly cheap) disk
space but it is mostly sane if regarded as a way to cut the cost of
network bandwidth roughly in half.

Must intermediate nodes inflate the payloads passing through them, or
which they cache, just to validate them? That's not a desirable
outcome for many obvious reasons.

There *are* concerns about checksumming zips: it is necessary to nail
down the zip process and make sure it is absolutely and permanently
deterministic for this application. But *that* is the problem to
solve, not avoid by moving what the checksum refers to.

Thanks,
-t
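The trade-off raised in the message above can be illustrated with a small
Python sketch. This is not git's code: it uses only `hashlib` and `zlib`
from the standard library, the helper names (`id_of_inflated`,
`id_of_zipped`, `verify_inflated`, `verify_zipped`) are hypothetical, and
git's real object format is left out. It shows, under those assumptions,
that an id computed over the inflated bytes does not depend on compressor
settings but needs an inflate step to verify, while an id computed over
the zipped bytes can be checked "blindly" but changes whenever the
compressed bytes change.

```python
import hashlib
import zlib


def id_of_inflated(payload: bytes) -> str:
    """Hash the uncompressed bytes (hypothetical helper name)."""
    return hashlib.sha1(payload).hexdigest()


def id_of_zipped(payload: bytes, level: int) -> str:
    """Hash the compressed bytes; the result depends on the compressor."""
    return hashlib.sha1(zlib.compress(payload, level)).hexdigest()


def verify_inflated(stored: bytes, expected_id: str) -> bool:
    """Validating an inflated-form id requires decompressing first."""
    return id_of_inflated(zlib.decompress(stored)) == expected_id


def verify_zipped(stored: bytes, expected_id: str) -> bool:
    """Validating a zipped-form id needs no decompression."""
    return hashlib.sha1(stored).hexdigest() == expected_id


payload = b"int main(void) { return 0; }\n" * 100
stored_fast = zlib.compress(payload, 1)  # same content, level 1
stored_best = zlib.compress(payload, 9)  # same content, level 9

# The inflated-form id is the same regardless of how the blob was zipped.
same_inflated_id = (
    id_of_inflated(zlib.decompress(stored_fast))
    == id_of_inflated(zlib.decompress(stored_best))
)

# The zipped-form id is stable only if the compressed bytes are identical,
# which is why the zip process would have to be pinned down exactly.
same_zipped_id = id_of_zipped(payload, 1) == id_of_zipped(payload, 9)
```

An intermediate node holding only `stored_best` could run `verify_zipped`
without inflating anything, but only against an id produced with exactly
the same compressor settings.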
* Re: on when to checksum
From: Linus Torvalds @ 2005-04-20 22:41 UTC
To: Tom Lord; +Cc: git

On Wed, 20 Apr 2005, Tom Lord wrote:
>
> I think you have made a mistake by moving the sha1 checksum from the
> zipped form to the inflated form. Here is why:

I'd have agreed with you (and I did, violently) if it wasn't for the
performance issues. It makes a huge difference for write-tree, and to me,
clearly performance _does_ matter.

Fractions of seconds may not sound like a lot, but they add up. I work
with 200-patch series myself all the time, so I'm very sensitive to a 0.3
second difference in performance.

Linus
* Re: on when to checksum
From: Tom Lord @ 2005-04-20 22:52 UTC
To: torvalds; +Cc: git

   From: Linus Torvalds <torvalds@osdl.org>

   On Wed, 20 Apr 2005, Tom Lord wrote:
   >
   > I think you have made a mistake by moving the sha1 checksum from the
   > zipped form to the inflated form. Here is why:

   I'd have agreed with you (and I did, violently) if it wasn't for the
   performance issues. It makes a huge difference for write-tree, and to
   me, clearly performance _does_ matter.

   Fractions of seconds may not sound like a lot, but they add up. I work
   with 200-patch series myself all the time, so I'm very sensitive to a
   0.3 second difference in performance.

How many times per day do you invoke `write-tree' and why?

It takes a large multiple of `0.3s' to get me to take you seriously
on this point.

I have long harbored the suspicion that your perceived bandwidth
implies that you process a lot of patches unread or barely read --
implying that your day-to-day bitslinging could/should largely be
handled by an Arch-style patch-queue-manager (a script).

-t
* Re: on when to checksum
From: Linus Torvalds @ 2005-04-20 23:07 UTC
To: Tom Lord; +Cc: git

On Wed, 20 Apr 2005, Tom Lord wrote:
>
> How many times per day do you invoke `write-tree' and why?

Every single commit does a write-tree, so when I merge with Andrew, it's
usually a series of 100-250 of them in a row.

(Actually, _usually_ it's smaller series, but it's the big series that can
be painful enough to matter).

> It takes a large multiple of `0.3s' to get me to take you seriously
> on this point.

The thing is, I don't "trickle" things in. That would be horribly
inefficient for me. So I go over the patches, make a mbox, and do them all
in one go. And then they need to happen _fast_.

If it takes 20 minutes, I go away for coffee or something, and then if
something didn't apply half-way through, I will have lost my "context".
That's why I want things instant. Not because I have huge daily throughput
issues, but I have huge _latency_ issues.

I considered doing a "two-level" thing, where I first did the stuff in a
light-weight patch manager, and then batched things up in the background
for the real thing. But the fact is, I don't think it's needed. Not the
way git performs now. If I can apply a hundred patches in a minute or two,
I have not "lost the context" if it turns out that there is some silly
glitch with one of them.

Linus
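The "they add up" arithmetic from the messages above can be worked through
in a few lines of Python. This is only a sketch: the 0.3 second
per-write-tree difference and the series sizes of 100 to 250 commits are
the figures quoted in the thread, while the function name `added_latency`
is made up for illustration.

```python
# Per-commit difference quoted in the thread (seconds per write-tree).
EXTRA_SECONDS_PER_WRITE_TREE = 0.3


def added_latency(commits: int,
                  extra: float = EXTRA_SECONDS_PER_WRITE_TREE) -> float:
    """Extra wall-clock time for a series, one write-tree per commit."""
    return commits * extra


# Series sizes mentioned in the thread: 100-250 commits, 200 as typical.
for n in (100, 200, 250):
    print(f"{n} commits: +{added_latency(n):.0f} s")
```

For a 200-patch series that is about a minute of added waiting, which is
the same order as the "minute or two" quoted above for applying a hundred
patches.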
* Re: on when to checksum
From: Tom Lord @ 2005-04-20 23:39 UTC
To: torvalds; +Cc: git

(I'll have to study/think about that for a while before a proper
reply. Tomorrow, probably.)

Thanks,
-t
* Re: on when to checksum
From: Tom Lord @ 2005-05-02 19:21 UTC
To: torvalds; +Cc: git

   The thing is, I don't "trickle" things in. That would be horribly
   inefficient for me. So I go over the patches, make a mbox, and do them
   all in one go. And then they need to happen _fast_.

   If it takes 20 minutes, I go away for coffee or something, and then if
   something didn't apply half-way through, I will have lost my "context".
   That's why I want things instant. Not because I have huge daily
   throughput issues, but I have huge _latency_ issues.

I'm curious about what is the value of the "batch" nature of that
process?

Presumably most patches apply cleanly and most are orthogonal (order
independent). I'm sure that there are frequently interesting
exceptions but am I generally right about "most" here?

So, if I understand, you review each change before stuffing it in a
mailbox, then you apply all the patches in that mailbox in batch.
In the majority of cases, the buffering of changes in the mailbox
adds nothing.

Why isn't that more automated: when you approve a change, it could be
applied at once, in the background. If conflictless, it can be
committed, tested, whatever. If conflicting, *then* the change can be
buffered up for you to look at. Explicit declarations from
programmers or text-based computations about dependencies among the
patches can help improve the queue management in more complicated
cases.

In other words, a more asynchronous process might save you time *and*
pay off by reserving more of your attention for areas where it's
really needed.

-t
* Re: on when to checksum
From: Linus Torvalds @ 2005-05-02 19:57 UTC
To: Tom Lord; +Cc: git

On Mon, 2 May 2005, Tom Lord wrote:
>
> I'm curious about what is the value of the "batch" nature of that
> process?

My time.

I don't know about other people, but I don't multitask. I do one thing,
and that's it. I don't move my mouse around. I sit in my mail reader, and
I read email. I don't read one email, switch to another window, apply it,
switch back, read the next email etc etc.

In fact, I claim that anybody who works that way is going to have an IQ of
about 15 points lower than somebody who batches things up. Just because
you end up losing your context, and that effectively makes you stupid.
Concentration is a wonderful thing, but it _requires_ that you do things
in a concentrated manner.

> So, if I understand, you review each change before stuffing it in a
> mailbox, then you apply all the patches in that mailbox in batch.
> In the majority of cases, the buffering of changes in the mailbox
> adds nothing.

I read email, and while reading email I save the interesting ones off to
another mbox (I call mine "doit"). They get saved off for "later perusal".

I do a first-order review at that stage, and in fact, 95% of the time,
what goes into the "doit" folder _will_ get applied. Not 100%, though,
exactly because at this stage I just read email and work in a mail-reader:
I don't usually even look at the actual kernel sources that a patch
involves. In particular, sometimes it turns out that the patch wasn't
against my version at all, but against a -mm tree, and I just don't even
worry about technical details at that stage.

Stage #2 is going through the "doit" folder at some later date (maybe a
couple of times a day), and going through it one more time. Maybe not that
much more "carefully", but with a different intent - now I actually check
sign-offs, add my own, and check out the actual problems in the source
tree if needed.

Stage #3 is actually applying it.

_Each_ stage culls out bad things. And I _really_ don't bounce between
stages.

> In other words, a more asynchronous process might save you time *and*
> pay off by reserving more of your attention for areas where it's
> really needed.

It's not asynchronous. It's batched in different stages so that I can work
better. And latency matters.

Linus
* RE: on when to checksum
From: Andrew Timberlake-Newell @ 2005-04-21 16:53 UTC
To: 'Tom Lord'; +Cc: torvalds, git

Tom Lord graced us with:
> I think you have made a mistake by moving the sha1 checksum from the
> zipped form to the inflated form. Here is why:
>
> What you have set in motion with `git' is an ad-hoc p2p network for
> sharing filesystem trees -- a global distributed filesystem. I
> believe your starter here has a good chance of taking off to be much,
> much larger than just a tool for the kernel.

This might rather be a call for a git derivative. As Linus has already
mentioned in this thread, git is optimized for his need for local speed.
But while sacrificing local speed for network speed would break git by
stepping away from the git philosophy, a gitling with a different
philosophy but making use of gitish techniques could make that change
without being broken even though git itself can't.