* Re: [darcs-devel] Darcs and git: plan of action
From: linux @ 2005-04-18 21:04 UTC
To: torvalds; +Cc: darcs-devel, git
> Hell no.
>
> The commit _does_ specify the patch uniquely and exactly, so I really
> don't see the point. You can always get the patch by just doing a
>
> git diff $parent_tree $thistree
>
> so putting the patch in the comment is not an option.

Er... no.

One of darcs' big points is that it has at least two fundamentally
different *kinds* of patches. One is the classic diff(1) style.
The other is "replace every instance of identifier `foo` with identifier `bar`".

Note that merging such a patch with another that adds a new instance
of "foo" has quite a different effect from merging a similar diff-style
patch, even though both have identical effects on the tree to which
they were initially merged.

And darcs is specifically intended to support additional kinds of patches.
Again, this is all so that the patch can work better when applied to
trees *other* than the one it was originally developed against.

Anyway, the point is that, in the darcs world, it is NOT possible to
reconstruct a patch from the before and after trees.
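The claim above, that a token-replace patch and a diff-style hunk can have identical effects on the tree they were recorded against yet diverge once another instance of the token is merged in, can be checked with a small C sketch. The `struct patch` type, `apply_hunk`, `apply_replace`, and the one-line hunk format are hypothetical simplifications for illustration, not darcs's internal representation; the token character set is assumed to be `[A-Za-z_0-9]`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the two patch kinds (not darcs's own types). */
enum patch_kind { PATCH_HUNK, PATCH_REPLACE };

struct patch {
    enum patch_kind kind;
    int line;         /* PATCH_HUNK only: 1-based line to rewrite    */
    const char *from; /* hunk: expected old line; replace: old token */
    const char *to;   /* hunk: new line;          replace: new token */
};

/* Assumed token character set: [A-Za-z_0-9]. */
static int is_token_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Rewrite every whole token equal to `from` as `to`.
   Returns -1 if `out` (capacity `cap`) is too small. */
static int apply_replace(const char *in, const char *from, const char *to,
                         char *out, size_t cap)
{
    size_t o = 0;
    assert(cap > 0);
    while (*in) {
        const char *src = in, *next = in + 1;
        size_t len = 1;                 /* non-token chars copy through */
        if (is_token_char(*in)) {
            next = in;
            while (is_token_char(*next))
                next++;                 /* find the end of the token */
            len = (size_t)(next - in);
            if (len == strlen(from) && memcmp(in, from, len) == 0) {
                src = to;
                len = strlen(to);
            }
        }
        if (o + len >= cap)
            return -1;
        memcpy(out + o, src, len);
        o += len;
        in = next;
    }
    out[o] = '\0';
    return 0;
}

/* Replace line `line` (which must read exactly `from`) with `to`.
   Returns -1 on a context mismatch, a missing line, or a full buffer. */
static int apply_hunk(const char *in, int line, const char *from,
                      const char *to, char *out, size_t cap)
{
    size_t o = 0;
    int cur;
    assert(cap > 0);
    for (cur = 1; *in; cur++) {
        const char *eol = strchr(in, '\n');
        size_t len = eol ? (size_t)(eol - in) : strlen(in);
        const char *src = in;
        size_t src_len = len;
        if (cur == line) {
            if (len != strlen(from) || memcmp(in, from, len) != 0)
                return -1;              /* context does not match */
            src = to;
            src_len = strlen(to);
        }
        if (o + src_len + 1 >= cap)
            return -1;
        memcpy(out + o, src, src_len);
        o += src_len;
        if (eol)
            out[o++] = '\n';
        in = eol ? eol + 1 : in + len;
    }
    out[o] = '\0';
    return (line >= 1 && line < cur) ? 0 : -1;
}

static int apply_patch(const struct patch *p, const char *in,
                       char *out, size_t cap)
{
    if (p->kind == PATCH_HUNK)
        return apply_hunk(in, p->line, p->from, p->to, out, cap);
    return apply_replace(in, p->from, p->to, out, cap);
}
```

On the one-line file `int foo = 1;` both patches produce `int bar = 1;`, so the before and after trees cannot tell them apart. Once a `foo++;` line has been merged in, only the replace patch renames it.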

* Re: [darcs-devel] Darcs and git: plan of action
From: Ray Lee @ 2005-04-19 0:07 UTC
To: linux; +Cc: git, darcs-devel, torvalds

On Mon, 2005-04-18 at 21:04 +0000, linux@horizon.com wrote:
> The other is "replace every instance of identifier `foo` with identifier `bar`".

That could be derived, however, by a particularly smart parser [1].
Alternatively, that itself could be embedded in the comment for patches
sourced from darcs. Of course, that means patches from others are less
commutable than those from other darcs users, but that's the price you'd
pay for relying on the user to explicitly note a token rename.

[1] An example: http://minnie.tuhs.org/Programs/Ctcompare/index.html

As for "darcs mv", that can be derived from the before/after pictures of
the trees.

> And darcs is specifically intended to support additional kinds of patches.

Anything missing out of what I listed above? (darcs has adddir and
addfile, IIRC, but those are trivially discovered via inspection of the
trees as well, I think.)

> Anyway, the point is that, in the darcs world, it is NOT possible to
> reconstruct a patch from the before and after trees.

Not yet, and maybe not ever, but I think we can certainly get closer to
discovering what the coder was thinking during a changeset.

Ray

* Re: [darcs-devel] Darcs and git: plan of action
From: Kevin Smith @ 2005-04-19 1:05 UTC
To: Ray Lee; +Cc: git, darcs-devel

Ray Lee wrote:
> On Mon, 2005-04-18 at 21:04 +0000, linux@horizon.com wrote:
>
>>The other is "replace every instance of identifier `foo` with identifier `bar`".
>
> That could be derived, however, by a particularly smart parser [1].

No, it can't. Seriously. A darcs replace patch is encoded as rules, not
effects, and it is impossible to derive the rules just by looking at the
results. Not difficult. Impossible. You could guess, but that's not good
enough for darcs to be able to reliably commute the patches later.

I am curious whether Linus's suggestion about including the
corresponding darcs patch id in the git commit comments would be good
enough.

> As for "darcs mv", that can be derived from the before/after pictures of
> the trees.

Perhaps. If a file is moved and edited within the same commit, I'm not
sure that you can be certain whether it was done with 'darcs mv' or not.
Requiring separate checkins for the rename and the subsequent modify
would make things easier on SCMs, but is impractical in real life.
Automated refactoring tools, for example, perform the rename+modify as
an atomic operation.

Now, git might not need to deal with any of this, because it only needs
to work with the kernel project. But darcs does have to deal with this
wide range of uses, as does just about any other SCM.

I'm *not* advocating cluttering up git with features that are not
directly needed for kernel development. I'm just trying to clarify the
facts so everyone can understand what darcs is trying to do.

Kevin

* Re: [darcs-devel] Darcs and git: plan of action
From: Ray Lee @ 2005-04-19 1:42 UTC
To: Kevin Smith; +Cc: git, darcs-devel

On Mon, 2005-04-18 at 21:05 -0400, Kevin Smith wrote:
> >>The other is "replace every instance of identifier `foo` with identifier `bar`".
> > That could be derived, however, by a particularly smart parser [1].
>
> No, it can't. Seriously. A darcs replace patch is encoded as rules, not
> effects, and it is impossible to derive the rules just by looking at the
> results. Not difficult. Impossible.

Okay, either I'm a sight stupider than I thought, or I'm not
communicating well. Same net effect either way, I 'spose.

If I do a token replace in an editor (say one of those fancy new-fangled
refactoring thangs, or good ol' vi), a token-level comparator can
discover what I did. That link I sent is an example of one such beast.

> You could guess, but that's not good
> enough for darcs to be able to reliably commute the patches later.

Who said anything about guessing? If a user replaces all instances of
foo with bar, that's as close to proof as you can ever get, without
recording intent of the user at the time it's done. Now, I realize that
darcs *does* record intent, but I claim that's immaterial.

Perhaps I'm clueless; it's happened before, I'm resigned to it happening
again. So, tell it to me with full jargon, if you will. When it comes
down to brass tacks, why does my suggestion place weaker guarantees
about the quality of the resulting patch operator?

> > As for "darcs mv", that can be derived from the before/after pictures of
> > the trees.
>
> Perhaps. If a file is moved and edited within the same commit, I'm not
> sure that you can be certain whether it was done with 'darcs mv' or
> not.

Agreed. But then you go lart the committer of that patch.

> Requiring separate checkins for the rename and the subsequent
> modify would make things easier on SCMs, but is impractical in real
> life.

Eh? Why? "darcs mv" *is* a commit. Just because it doesn't seem to look
like one doesn't change the fact that you just invoked the SCM.

> Automated refactoring tools, for example, perform the
> rename+modify as an atomic operation.

And that's harder, I agree. But unless I'm missing some nifty
refactoring editor out there that integrates with darcs during the edit
session, the user *still* has to tell the SCM about the rename manually.

> Now, git might not need to deal with any of this, because it only needs
> to work with the kernel project.

It'd be unfortunate if git were limited to such a small developer base.

> I'm *not* advocating cluttering up git with features that are not
> directly needed for kernel development.

I'm not claiming you are. We want the same thing -- a nuanced SCM that
can take some of the drudge-work away from this stuff.

Ray

* Re: [darcs-devel] Darcs and git: plan of action
From: Kevin Smith @ 2005-04-19 2:05 UTC
To: Ray Lee; +Cc: git, darcs-devel

Ray Lee wrote:
> On Mon, 2005-04-18 at 21:05 -0400, Kevin Smith wrote:
>
>>>>The other is "replace every instance of identifier `foo` with identifier `bar`".
>>>
>>>That could be derived, however, by a particularly smart parser [1].
>>
>>No, it can't. Seriously. A darcs replace patch is encoded as rules, not
>>effects, and it is impossible to derive the rules just by looking at the
>>results. Not difficult. Impossible.
>
>
> Okay, either I'm a sight stupider than I thought, or I'm not
> communicating well. Same net effect either way, I 'spose.
>
> If I do a token replace in an editor (say one of those fancy new-fangled
> refactoring thangs, or good ol' vi), a token-level comparator can
> discover what I did. That link I sent is an example of one such beast.

The big feature of a darcs replace patch is that it works forward and
backward in time. Let me try to come up with an example that can help
explain it. Hopefully I'll get it right. Let's start with a file like
this that exists in a project for which both you and I have darcs repos:

cat
dog
fish

Now, you change it to:

cat dog
dog
fish

while I simultaneously do a replace of "dog" with "plant", resulting in:

cat
plant
fish

We merge. The final result in both of our trees is:

cat plant
plant
fish

Notice that just by looking at my diffs, you can't tell that I used a
replace operation. I didn't just replace the instances of "dog" that
were in my file at that moment. I conceptually replaced all instances,
including ones that aren't there yet.

Now, I should mention here that I personally dislike the replace
operation, and I think it is more dangerous than helpful. However, other
darcs users are quite happy with it, and it certainly is a creative and
powerful feature.

Other creative patch types have also been dreamed of. For example, a
powerful language-specific refactoring operation has been discussed as a
far-future possibility. That would be safe, and cool.

>>Automated refactoring tools, for example, perform the
>>rename+modify as an atomic operation.
>
> And that's harder, I agree. But unless I'm missing some nifty
> refactoring editor out there that integrates with darcs during the edit
> session, the user *still* has to tell the SCM about the rename manually.

Although there are no such nifty refactoring tools available today, they
will exist at some point. If they existed today, the world would be a
better place.

Even without tools, many shops have policies against checking in code
that won't compile. If you rename a Java class, you must simultaneously
perform the rename and modify the class name inside. If you commit
between those steps, it's broken. [I do realize that the kernel doesn't
have Java code, by the way.]

I should also mention that I currently believe that Linus is correct
that explicit rename tracking is not required for git. I have every hope
that his plan for handling the more general case of "moved text" will
take care of renames as a side effect. I don't know if that will be
sufficient to allow a two-way lossless gateway between git and darcs or
other systems that do track renames explicitly.

Kevin
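Kevin's cat/dog/fish merge can be replayed with a deliberately simplified model of commutation: to move a one-line hunk past a replace patch, pass the hunk's own text through the replace. The names `struct hunk`, `apply_hunk`, `apply_replace`, and `hunk_after_replace` are hypothetical, and darcs's actual commutation rules are more involved (this sketch ignores, for instance, a hunk whose text already contains the new token).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A one-line hunk: line `line` changes from `from` to `to`. */
struct hunk {
    int line;
    char from[64];
    char to[64];
};

static int is_token_char(char c)   /* assumed set: [A-Za-z_0-9] */
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Whole-token replace; returns -1 if `out` is too small. */
static int apply_replace(const char *in, const char *from, const char *to,
                         char *out, size_t cap)
{
    size_t o = 0;
    assert(cap > 0);
    while (*in) {
        const char *src = in, *next = in + 1;
        size_t len = 1;
        if (is_token_char(*in)) {
            next = in;
            while (is_token_char(*next))
                next++;
            len = (size_t)(next - in);
            if (len == strlen(from) && memcmp(in, from, len) == 0) {
                src = to;
                len = strlen(to);
            }
        }
        if (o + len >= cap)
            return -1;
        memcpy(out + o, src, len);
        o += len;
        in = next;
    }
    out[o] = '\0';
    return 0;
}

/* Replace line `line`, which must read exactly `from`, with `to`. */
static int apply_hunk(const char *in, int line, const char *from,
                      const char *to, char *out, size_t cap)
{
    size_t o = 0;
    int cur;
    assert(cap > 0);
    for (cur = 1; *in; cur++) {
        const char *eol = strchr(in, '\n');
        size_t len = eol ? (size_t)(eol - in) : strlen(in);
        const char *src = in;
        size_t src_len = len;
        if (cur == line) {
            if (len != strlen(from) || memcmp(in, from, len) != 0)
                return -1;
            src = to;
            src_len = strlen(to);
        }
        if (o + src_len + 1 >= cap)
            return -1;
        memcpy(out + o, src, src_len);
        o += src_len;
        if (eol)
            out[o++] = '\n';
        in = eol ? eol + 1 : in + len;
    }
    out[o] = '\0';
    return (line >= 1 && line < cur) ? 0 : -1;
}

/* Simplified commute: rewrite the hunk's text through the replace so
   that it can be applied after the replace instead of before it. */
static int hunk_after_replace(const struct hunk *h, const char *old_tok,
                              const char *new_tok, struct hunk *out)
{
    out->line = h->line;
    if (apply_replace(h->from, old_tok, new_tok,
                      out->from, sizeof out->from) != 0)
        return -1;
    return apply_replace(h->to, old_tok, new_tok, out->to, sizeof out->to);
}
```

Applying "your" hunk (`cat` to `cat dog`) and then "my" replace (`dog` to `plant`), or the replace first and then the rewritten hunk (`cat` to `cat plant`), both end at `cat plant` / `plant` / `fish`: the replace also catches the `dog` that the other developer's hunk introduced.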

* Re: [darcs-devel] Darcs and git: plan of action
From: Patrick McFarland @ 2005-04-19 22:08 UTC
To: darcs-devel; +Cc: Kevin Smith, Ray Lee, git

On Monday 18 April 2005 10:05 pm, Kevin Smith wrote:
> The big feature of a darcs replace patch is that it works forward and
> backward in time. Let me try to come up with an example that can help
> explain it. Hopefully I'll get it right. Let's start with a file like
> this that exists in a project for which both you and I have darcs repos:
>
> cat
> dog
> fish
>
> Now, you change it to:
>
> cat dog
> dog
> fish
>
> while I simultaneously do a replace of "dog" with "plant", resulting in:
>
> cat
> plant
> fish
>
> We merge. The final result in both of our trees is:
>
> cat plant
> plant
> fish
>
> Notice that just by looking at my diffs, you can't tell that I used a
> replace operation. I didn't just replace the instances of "dog" that
> were in my file at that moment. I conceptually replaced all instances,
> including ones that aren't there yet.

I think that's the best explanation of how it works. And that is
partially why darcs is so powerful.

--
Patrick "Diablo-D3" McFarland || pmcfarland@downeast.net
"Computer games don't affect kids; I mean if Pac-Man affected us as kids,
we'd all be running around in darkened rooms, munching magic pills and
listening to repetitive electronic music."
-- Kristian Wilson, Nintendo, Inc, 1989

* Re: Darcs and git: plan of action
From: Ray Lee @ 2005-04-19 22:40 UTC
To: Kevin Smith; +Cc: git, darcs-devel

(Sorry for the delayed reply -- I'm living on tape delay for a bit.)

On Mon, 2005-04-18 at 22:05 -0400, Kevin Smith wrote:
> >>>>The other is "replace every instance of identifier `foo` with identifier `bar`".
> >>>
> >>>That could be derived, however, by a particularly smart parser [1].
> >>
> >>No, it can't. Seriously. A darcs replace patch is encoded as rules, not
> >>effects, and it is impossible to derive the rules just by looking at the
> >>results. Not difficult. Impossible.
> >
> > If I do a token replace in an editor (say one of those fancy new-fangled
> > refactoring thangs, or good ol' vi), a token-level comparator can
> > discover what I did. That link I sent is an example of one such beast.
>
> The big feature of a darcs replace patch is that it works forward and
> backward in time.

That's *not* a feature of the token replace patch, however. That's a
feature of the darcs commutation machinery, correct? (With the obvious
caveat that darcs can only *do* the commutation if it has correctly
nuanced darcs-style token replace patches, rather than mere ASCII
textual diffs.)

> Let me try to come up with an example that can help
> explain it. Hopefully I'll get it right. Let's start with a file like
> this that exists in a project for which both you and I have darcs repos:
>
> cat
> dog
> fish
>
> Now, you change it to:
>
> cat dog
> dog
> fish
>
> while I simultaneously do a replace of "dog" with "plant", resulting in:
>
> cat
> plant
> fish
>
> We merge. The final result in both of our trees is:
>
> cat plant
> plant
> fish

Okay, that all makes sense.

> Notice that just by looking at my diffs, you can't tell that I used a
> replace operation.

Here's where we disagree. If you checkpoint your tree before the
replace, and immediately after, the only differences in the
source-controlled files would be due to the replace. And since the
language of the file is known (and thereby the tokenization -- it *is*
well-defined), then a tokenizer that compares the before and after trees
(for just the files that changed, obviously), can discover what you did,
and promote the mere ASCII diff into a token-replace diff. (The same
sort of idea could be done for reindention, I'd hope.)

> I didn't just replace the instances of "dog" that
> were in my file at that moment. I conceptually replaced all instances,
> including ones that aren't there yet.

Well yes, that's exactly what we want. And the key point of all of this
is that there's no magic here. The darcs machinery does all the
commutations such that the patches can wiggle together without
conflicts. To do its job, of course, it needs nuanced patches, rather
than the quite literal ones generated by diff.

We agree on everything except that it's provable that one can discover a
replace operation, given a before and after tree.

> Now, I should mention here that I personally dislike the replace
> operation, and I think it is more dangerous than helpful. However, other
> darcs users are quite happy with it, and it certainly is a creative and
> powerful feature.

It's creative alright, though I had the same misgivings. In my common
code workflow, I almost never have global tokens -- all my replaces
would be per function, so I never saw an opportunity to use it when I
was screwing around with darcs.

> Other creative patch types have also been dreamed of. For example, a
> powerful language-specific refactoring operation has been discussed as a
> far-future possibility. That would be safe, and cool.

<subliminal> indention patch type, indention patch type... </subliminal>

> > > Automated refactoring tools, for example, perform the
> > > rename+modify as an atomic operation.
> > [...]
> Although there are no such nifty refactoring tools available today, they
> will exist at some point.

Yeah, I spent some time drooling over the refactoring editors before
slapping myself and deciding I'd wait for others to live on that
bleeding edge for a while. I've had to clean up too much code from
other people.

> Even without tools, many shops have policies against checking in code
> that won't compile. If you rename a Java class, you must simultaneously
> perform the rename and modify the class name inside. If you commit
> between those steps, it's broken.

I'm trying hard to find a nice way to say that's silly. I'm failing.

My suggestion in that case would be that the local coder commit many
patches to a local repository, one of which is the rename. Then upon
completion of the refactoring, the set of patches is committed to the
group repository. Tags before and after preserve the repository's
precondition that it always compiles.

> [I do realize that the kernel doesn't have Java code, by the way.]

Don't worry, I didn't think that you did :-).

Ray
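Ray's proposed "token-level comparator" can be sketched as follows: find the first pair of differing tokens between the before and after text, then accept it as a rename only if replacing that token everywhere in the before text reproduces the after text exactly. This is a hypothetical heuristic written for illustration (`infer_rename` is not part of darcs or git), and it assumes `[A-Za-z_0-9]` as the token character set.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int is_token_char(char c)   /* assumed set: [A-Za-z_0-9] */
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Whole-token replace; returns -1 if `out` is too small. */
static int apply_replace(const char *in, const char *from, const char *to,
                         char *out, size_t cap)
{
    size_t o = 0;
    assert(cap > 0);
    while (*in) {
        const char *src = in, *next = in + 1;
        size_t len = 1;
        if (is_token_char(*in)) {
            next = in;
            while (is_token_char(*next))
                next++;
            len = (size_t)(next - in);
            if (len == strlen(from) && memcmp(in, from, len) == 0) {
                src = to;
                len = strlen(to);
            }
        }
        if (o + len >= cap)
            return -1;
        memcpy(out + o, src, len);
        o += len;
        in = next;
    }
    out[o] = '\0';
    return 0;
}

/* Hypothetical heuristic: returns 1 and fills `old_tok`/`new_tok` if
   `after` is exactly `before` with one token renamed everywhere. */
static int infer_rename(const char *before, const char *after,
                        char *old_tok, char *new_tok, size_t tok_cap)
{
    const char *b = before, *a = after;
    size_t cap;
    char *buf;
    int ok;

    /* 1. Find the first pair of differing tokens; any change to
          non-token text before that point rules out a pure rename. */
    for (;;) {
        if (*b == '\0' || *a == '\0')
            return 0;
        if (is_token_char(*b) && is_token_char(*a)) {
            const char *be = b, *ae = a;
            size_t bl, al;
            while (is_token_char(*be))
                be++;
            while (is_token_char(*ae))
                ae++;
            bl = (size_t)(be - b);
            al = (size_t)(ae - a);
            if (bl != al || memcmp(b, a, bl) != 0) {
                if (bl >= tok_cap || al >= tok_cap)
                    return 0;
                memcpy(old_tok, b, bl);
                old_tok[bl] = '\0';
                memcpy(new_tok, a, al);
                new_tok[al] = '\0';
                break;
            }
            b = be;
            a = ae;
        } else if (*b == *a) {
            b++;
            a++;
        } else {
            return 0;
        }
    }

    /* 2. Accept only if renaming that token everywhere in `before`
          reproduces `after` exactly. */
    cap = strlen(after) + 1;
    buf = malloc(cap);
    ok = buf != NULL
         && apply_replace(before, old_tok, new_tok, buf, cap) == 0
         && strcmp(buf, after) == 0;
    free(buf);
    return ok;
}
```

A full rename of `foo` to `bar` is detected, and a partial one is rejected. Note, however, that any single-token change such as `abcde` to `wow` is always "explained" by a rename, so a check like this shows only that a replace is consistent with the diff, not that the author recorded one.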

* Re: Darcs and git: plan of action
From: Tupshin Harper @ 2005-04-19 23:00 UTC
To: Ray Lee; +Cc: git, Kevin Smith, darcs-devel

Ray Lee wrote:

>Here's where we disagree. If you checkpoint your tree before the
>replace, and immediately after, the only differences in the
>source-controlled files would be due to the replace.
>
This is assuming that you only have one replace and no other operations
recorded in the patch. If you have multiple replaces or a replace and a
traditional diff recorded in the same patch, then this is not true.

> And since the
>language of the file is known (and thereby the tokenization -- it *is*
>well-defined), then a tokenizer that compares the before and after trees
>(for just the files that changed, obviously), can discover what you did,
>and promote the mere ASCII diff into a token-replace diff. (The same
>sort of idea could be done for reindention, I'd hope.)
>
See above for one set of limitations on this.

A more fundamental problem comes back to intent. If I have a file "foo"
before:

a1
a2

and after:

b1
b2

is that a "replace [_a-zA-Z0-9] a b foo" patch, or is that a

-a1
-a2
+b1
+b2

patch? Note that this comes down to heuristics, and no matter what you
use, you will be wrong sometimes, *and* the choice that is made can
substantively affect the contents of the repository after additional
patches are applied.

>We agree on everything except that it's provable that one can discover a
>replace operation, given a before and after tree.
>
It's provable that you can not.

-Tupshin

* Re: Darcs and git: plan of action
From: Ray Lee @ 2005-04-19 23:21 UTC
To: Tupshin Harper; +Cc: Kevin Smith, git, darcs-devel

On Tue, 2005-04-19 at 16:00 -0700, Tupshin Harper wrote:
> Ray Lee wrote:
>
> >Here's where we disagree. If you checkpoint your tree before the
> >replace, and immediately after, the only differences in the
> >source-controlled files would be due to the replace.
> >
> This is assuming that you only have one replace and no other operations
> recorded in the patch. If you have multiple replaces or a replace and a
> traditional diff recorded in the same patch, then this is not true.

I had a precondition on my argument (not quoted), that the code was
checkpointed before and after. Obviously, a large set of changes in one
patch is a problem. However, a darcs replace is (effectively) a commit
on its own, so I was limiting myself to the same situation under a
different system.

> A more fundamental problem comes back to intent. If I have a file
> "foo" before:
> a1
> a2
> and after:
> b1
> b2
> is that a "replace [_a-zA-Z0-9] a b foo" patch, or is that a
> -a1
> -a2
> +b1
> +b2
> patch?

Okay, so in reading the online darcs manual (yet) again, I now see that
it allows regular expressions for the match and replace, which means
multiple unique tokens could change atomically. (Does anyone actually
*use* regexes? Sounds like a cannon that'd be hard to aim.)

Regardless, I only care about code, not free text. If it's in a language
that doesn't do some use-'em-as-you-need-'em duck typing spiel
(<cough>python</cough>), then the context of your patch (namely, the
file) already has those tokens somewhere in them. And I bet that if
*you* looked at that file, you could tell if it was a replace or a mere
textual diff. Am I wrong?

> Note that this comes down to heuristics, and no matter what you
> use, you will be wrong sometimes, *and* the choice that is made can
> substantively affect the contents of the repository after additional
> patches are applied.

Unless I'm missing something, the darcs replace patch can already do the
wrong thing. If I do a replace patch on a variable introduced in a local
tree, then do a darcs replace on it before committing it to a shared
repository, and coder B introduces a variable of the same original name
in my copy, then there's a chance that the replace patch will
incorrectly apply upon his newly introduced variable. No?

> It's provable that you can not.

I'm still not seeing the problem, at least when it comes to ANSI C.

Ray

* Re: Darcs and git: plan of action
From: Tupshin Harper @ 2005-04-19 23:38 UTC
To: Ray Lee; +Cc: Kevin Smith, git, darcs-devel

Ray Lee wrote:

>it allows regular expressions for the match and replace, which means
>multiple unique tokens could change atomically. (Does anyone actually
>*use* regexes? Sounds like a cannon that'd be hard to aim.)
>
Yes, and replace patches need to be used very carefully.

>Regardless, I only care about code, not free text. If it's in a language
>that doesn't do some use-'em-as-you-need-'em duck typing spiel
>(<cough>python</cough>), then the context of your patch (namely, the
>file) already has those tokens somewhere in them. And I bet that if
>*you* looked at that file, you could tell if it was a replace or a mere
>textual diff. Am I wrong?
>
Yes. See my hello world example from my last email.

>Unless I'm missing something, the darcs replace patch can already do the
>wrong thing.
>
Yes, depending on how you define wrong. Darcs replace is fully
predictable, and poorly chosen replaces can lead to incorrect results
after future patches are applied.

>If I do a replace patch on a variable introduced in a local
>tree, then do a darcs replace on it before committing it to a shared
>repository, and coder B introduces a variable of the same original name
>in my copy, then there's a chance that the replace patch will
>incorrectly apply upon his newly introduced variable. No?
>
Absolutely correct, and the exact reason why replace patches need to be
used *very* selectively.

>>It's provable that you can not.
>
>I'm still not seeing the problem, at least when it comes to ANSI C.
>
>Ray
>
See the hello world example in my other email. You can argue that it is
an existing problem in darcs, but really, it just points out the fact
that a computer is *incapable* of knowing whether it is safe to use a
replace patch based on a diff, because replace patches are dangerous if
not used intelligently.

-Tupshin

* Re: [darcs-devel] Darcs and git: plan of action
From: Kevin Smith @ 2005-04-19 23:03 UTC
To: Ray Lee; +Cc: git, darcs-devel

Ray Lee wrote:
> On Mon, 2005-04-18 at 22:05 -0400, Kevin Smith wrote:
>
>>Notice that just by looking at my diffs, you can't tell that I used a
>>replace operation.
>
> Here's where we disagree. If you checkpoint your tree before the
> replace, and immediately after, the only differences in the
> source-controlled files would be due to the replace.

But I might have manually changed those tokens, or I might have done it
with a replace operation. Just looking at the diffs, those two cases
would look identical and be indistinguishable. The only way to know
whether or not a darcs replace was done is to look at the patch
metadata.

Pop quiz:

Here is revision 1 of my file:
abcde

Here is revision 2:
wow

Now, did I do that with a darcs replace, or just by typing?

Kevin

* Re: [darcs-devel] Darcs and git: plan of action
From: Ray Lee @ 2005-04-19 23:06 UTC
To: Kevin Smith; +Cc: git, darcs-devel

On Tue, 2005-04-19 at 19:03 -0400, Kevin Smith wrote:
> Pop quiz:
> Here is revision 1 of my file:
> abcde
>
> Here is revision 2:
> wow
> Now, did I do that with a darcs replace, or just by typing?

I'm still not communicating well.

Give me a case where assuming it's a replace will do the wrong thing,
for C code, where it's a variable or function name.

Ray

* Re: Darcs and git: plan of action
From: Tupshin Harper @ 2005-04-19 23:32 UTC
To: Ray Lee; +Cc: git, Kevin Smith, darcs-devel

Ray Lee wrote:

> I'm still not communicating well.
>
>Give me a case where assuming it's a replace will do the wrong thing,
>for C code, where it's a variable or function name.
>
>Ray
>
I think you are communicating fine, but not fully understanding darcs.

try this:
initial patch creates hello.c

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello world!\n");
    return 0;
}

second patch:

replace ./hello.c [A-Za-z_0-9] world universe

third patch, for conceptual clarity, created in another repository that
had seen the first patch, but not the second (adds function wide_world):

hunk ./hello.c 3
+void wide_world()
+{
+    printf("Hello wide world\n");
+}
+
hunk ./hello.c 11
+    wide_world();
}

If patch2 was a replace patch, then the result of running the combined
3 patch version would be:

Hello universe!
Hello wide universe

but if patch2 was a non-replace patch, then the result would be:

Hello universe!
Hello wide world

-Tupshin
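Tupshin's example can be reproduced by applying the token replace from the second patch to the file as it stands after the first and third patches. `merged_hello` is a reconstruction of that file (its indentation is a guess), and `apply_replace` is a hypothetical helper that treats maximal runs of `[A-Za-z_0-9]` as tokens, mirroring the character set in `replace ./hello.c [A-Za-z_0-9] world universe`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_token_char(char c)   /* [A-Za-z_0-9] */
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Whole-token replace; returns -1 if `out` is too small. */
static int apply_replace(const char *in, const char *from, const char *to,
                         char *out, size_t cap)
{
    size_t o = 0;
    assert(cap > 0);
    while (*in) {
        const char *src = in, *next = in + 1;
        size_t len = 1;
        if (is_token_char(*in)) {
            next = in;
            while (is_token_char(*next))
                next++;
            len = (size_t)(next - in);
            if (len == strlen(from) && memcmp(in, from, len) == 0) {
                src = to;
                len = strlen(to);
            }
        }
        if (o + len >= cap)
            return -1;
        memcpy(out + o, src, len);
        o += len;
        in = next;
    }
    out[o] = '\0';
    return 0;
}

/* hello.c after patches 1 and 3, before the replace is applied. */
static const char merged_hello[] =
    "#include <stdio.h>\n"
    "\n"
    "void wide_world()\n"
    "{\n"
    "    printf(\"Hello wide world\\n\");\n"
    "}\n"
    "\n"
    "int main(int argc, char *argv[])\n"
    "{\n"
    "    printf(\"Hello world!\\n\");\n"
    "    wide_world();\n"
    "    return 0;\n"
    "}\n";
```

The replace reaches the `world` in the newly added `printf`, while `wide_world` survives because the underscore makes it a single, different token. Recorded as an ordinary hunk, the second patch would only have touched the original `printf` line, leaving `Hello wide world`.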

* Re: [darcs-devel] Darcs and git: plan of action
From: Ray Lee @ 2005-04-20 1:11 UTC
To: Tupshin Harper; +Cc: Kevin Smith, git, darcs-devel

Thanks for your patience.

On Tue, 2005-04-19 at 16:32 -0700, Tupshin Harper wrote:
> >Give me a case where assuming it's a replace will do the wrong thing,
> >for C code, where it's a variable or function name.

> try this:
> initial patch creates hello.c
> #include <stdio.h>
>
> int main(int argc, char *argv[])
> {
>     printf("Hello world!\n");
>     return 0;
> }
>
> second patch:
> replace ./hello.c [A-Za-z_0-9] world universe

Aha! Okay, I now see at least part of the issue: we're using different
definitions of 'token.' Yours is quite sensible, in that it matches the
darcs syntax. However, I'm claiming a token is defined by the file's
language, and that a replace patch on anything but a token as per those
language standards is a silly thing. In your example, I'd claim you did
an intra-token edit, as the natural token there was "Hello world!\n".

With that, let me restate what I think is possible. One should be able
to discover renames (replaces) of user identifiers in C code
programmatically.

Is that everything darcs replace does? Obviously not. Is that what users
would usually *want*? If I were using it, that's what I'd want
(especially including the limited scope of replacement -- user
identifiers such as variable or function names, etc.). But then I'm not
a lurker on the darcs user list, so I don't know how usage of darcs
replace plays out in actual practice.

So, it's a subset. Is it a useful subset? Yes, as it addresses what
happens during refactoring, which is when I'd usually see this getting
used. (Syntactically ignorant search and replace is so, y'know,
*1970s*.)

Any clearer?

Ray

* Re: [darcs-devel] Darcs and git: plan of action
From: Juliusz Chroboczek @ 2005-04-20 7:52 UTC
To: git, darcs-devel

> However, I'm claiming a token is defined by the file's language, and
> that a replace patch on anything but a token as per those language
> standards is a silly thing.

Please recall the context of this discussion: getting Darcs to grok git
repositories.

You are arguing that it should be possible to design a set of heuristics
that Do The Right Thing often enough. And you are probably right. But
the point is immaterial, as nobody has stepped up to implement in Darcs
the sort of heuristics you have in mind. Partly because nobody has time,
but mostly because we don't like heuristics; we prefer Darcs to remain
deterministic.

So while yes, it might be possible to go about it using heuristics, it
seems rather unlikely that that's what we'll do.

Juliusz
* Re: [darcs-devel] Darcs and git: plan of action
2005-04-20 1:11 ` [darcs-devel] " Ray Lee
2005-04-20 7:52 ` Juliusz Chroboczek
@ 2005-04-20 11:55 ` David Roundy
1 sibling, 0 replies; 30+ messages in thread
From: David Roundy @ 2005-04-20 11:55 UTC
To: Ray Lee; +Cc: Tupshin Harper, Kevin Smith, git, darcs-devel
On Tue, Apr 19, 2005 at 06:11:43PM -0700, Ray Lee wrote:
> > second patch:
> > replace ./hello.c [A-Za-z_0-9] world universe
>
> Aha! Okay, I now see at least part of issue: we're using different
> definitions of 'token.' Yours is quite sensible, in that it matches the
> darcs syntax. However, I'm claiming a token is defined by the file's
> language, and that a replace patch on anything but a token as per those
> language standards is a silly thing.
The trouble is that a token based on language standards is also wrong, unless your file is syntactically correct at all times. It also means (for C in particular) that the result of the token replace isn't uniquely determined by the combination of the token replace patch and the file it applies to, since you need to parse any header files in order to tokenize the C file. In the case of header files, it may not be possible to tokenize them uniquely, since they may tokenize differently depending on what other header files are included before them. And of course, none of this may be possible if you haven't run autoconf and configure, since you may not actually *have* the header files in the first place...
In a (reasonably) general-purpose tool like darcs, I think it's better to stick with a simpler definition of token that doesn't require a complete integrated development environment.
It's also true that often you want to modify headers and string contents simultaneously with the change of the code itself. When I replace get_pseudowavefunction with get_atomic_orbital, I also want to modify
// We call get_pseudowavefunction to get the atomic orbital...
and
printf("Error in get_pseudowavefunction!\n");
--
David Roundy
http://www.darcs.net
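David's simpler definition of a token (a maximal run of characters from the patch's character class, such as the `[A-Za-z_0-9]` in the replace lines quoted in this thread) can be sketched in a few lines. This is a minimal illustration, not darcs's own code (darcs is written in Haskell); the function name `token_replace` is hypothetical. It shows that comments and string literals are rewritten along with code, while a longer identifier that merely contains the old name is left alone.

```python
import re


def token_replace(text: str, token_chars: str, old: str, new: str) -> str:
    """Replace each whole token equal to `old` with `new`.

    A token is a maximal run of characters from `token_chars`, given as the
    body of a regex character class (e.g. "A-Za-z_0-9"). Comments and string
    literals get no special treatment: the text is never parsed as C.
    """
    token_re = re.compile("[" + token_chars + "]+")
    return token_re.sub(lambda m: new if m.group(0) == old else m.group(0), text)


src = (
    "// We call get_pseudowavefunction to get the atomic orbital...\n"
    "x = get_pseudowavefunction(n);\n"
    'printf("Error in get_pseudowavefunction!\\n");\n'
    "y = my_get_pseudowavefunction(n);\n"
)
out = token_replace(src, "A-Za-z_0-9", "get_pseudowavefunction", "get_atomic_orbital")
print(out)
```

Because the token class travels with the patch, the result depends only on the patch and the file it is applied to, which is the determinism David is arguing for.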
* Re: [darcs-devel] Darcs and git: plan of action
2005-04-19 23:06 ` Ray Lee
2005-04-19 23:32 ` Tupshin Harper
@ 2005-04-20 17:11 ` Ralph Corderoy
1 sibling, 0 replies; 30+ messages in thread
From: Ralph Corderoy @ 2005-04-20 17:11 UTC
To: Ray Lee; +Cc: Kevin Smith, git, darcs-devel
Hi Ray,
> Give me a case where assuming it's a replace will do the wrong thing,
> for C code, where it's a variable or function name.
How about two patches:
1. s/foo/bar/ throughout file because foo() has been decided upon as the name of a new globally visible forthcoming function but was already in use as a static function.
2. Add definition of new foo().
Patch 1 mustn't be a `darcs replace' despite it changing every occurrence of the C token foo into bar.
Cheers,
Ralph.
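Ralph's counterexample can be made concrete with a simplified model (not darcs's actual commutation code). Suppose patch 1 ends up applied to a tree that already contains patch 2's new `foo()`, as can happen once patches are reordered or merged. Replayed as plain hunks, patch 1 changes only the lines it recorded. Replayed as a token replace, it also renames the new definition. The helper names and the sample C lines are hypothetical.

```python
import re

TOKEN = re.compile(r"[A-Za-z_0-9]+")


def as_token_replace(lines, old, new):
    """Replay patch 1 as a darcs-style token replace: every `old` token changes."""
    return [TOKEN.sub(lambda m: new if m.group(0) == old else m.group(0), line)
            for line in lines]


def as_hunks(lines, hunks):
    """Replay patch 1 as plain hunks: only the recorded lines change.

    `hunks` maps a line index to the (old_text, new_text) pair recorded
    when the patch was made; the old text must still match.
    """
    out = list(lines)
    for index, (old_text, new_text) in hunks.items():
        if out[index] != old_text:
            raise ValueError(f"hunk context mismatch at line {index}")
        out[index] = new_text
    return out


v0 = ["static int foo(int x) { return x; }",
      "int main(void) { return foo(1); }"]
# Patch 1: rename the existing static foo to bar, recorded line by line.
patch1 = {0: (v0[0], "static int bar(int x) { return x; }"),
          1: (v0[1], "int main(void) { return bar(1); }")}
# Patch 2: add the new, globally visible foo().
new_foo = "int foo(void) { return 42; }"

# A tree that already contains patch 2 when patch 1 is applied.
tree = v0 + [new_foo]
print(as_hunks(tree, patch1)[-1])                 # the new foo() survives
print(as_token_replace(tree, "foo", "bar")[-1])   # the new foo() is renamed too
```

Both forms agree on the lines patch 1 originally touched; they differ only on text that patch 1 never saw, which is exactly where the recorded intent matters.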
* Re: [darcs-devel] Darcs and git: plan of action
2005-04-19 1:42 ` Ray Lee
2005-04-19 2:05 ` Kevin Smith
@ 2005-04-19 11:05 ` David Roundy
1 sibling, 0 replies; 30+ messages in thread
From: David Roundy @ 2005-04-19 11:05 UTC
To: Ray Lee; +Cc: Kevin Smith, git, darcs-devel
On Mon, Apr 18, 2005 at 06:42:11PM -0700, Ray Lee wrote:
> On Mon, 2005-04-18 at 21:05 -0400, Kevin Smith wrote:
> > You could guess, but that's not good enough for darcs to be able to
> > reliably commute the patches later.
>
> Who said anything about guessing? If a user replaces all instances of
> foo with bar, that's as close to proof as you can ever get, without
> recording intent of the user at the time it's done. Now, I realize that
> darcs *does* record intent, but I claim that's immaterial.
The problem is, how do you know how to define a token? That's also included in a darcs patch. And a darcs user may choose not to use a replace patch, if (for example) he's renaming a local variable, since he might not want to mess with other functions in the same file.
Guessing the author's intent cannot reliably reproduce the author's stated intent. Either we need to include that information in one form or another (and in one location or another), or we've got to simply disallow replaces (and moves?) when interacting with git.
--
David Roundy
http://www.darcs.net
* Re: [darcs-devel] Darcs and git: plan of action
[not found] <20050419235832.56117.qmail@web51003.mail.yahoo.com>
@ 2005-04-20 7:55 ` Juliusz Chroboczek
0 siblings, 0 replies; 30+ messages in thread
From: Juliusz Chroboczek @ 2005-04-20 7:55 UTC
To: darcs-devel, git
> We're talking about interoperating with a Git repository here,
> right? Even if we got the metadata in there, doesn't Git have to
> understand a replace patch for things to work out?
> 0. All three are in sync to begin with.
> 1. CC creates a token-replace patch, sends the changes in normal hunk
> format to AA.
> 2. BB makes changes, sends a normal hunk patch to AA and CC. AA will
> apply the hunk normally. For CC the token replace might apply here
> and so the result could be different.
> 3. when AA and CC try to sync, they will get spurious merge conflicts.
> Isn't this a potential problem?
It is. In a heterogeneous environment they will get spurious merge conflicts.
Juliusz
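The AA/BB/CC scenario can be played out on a one-line file. This is a simplified model: darcs merging CC's replace patch with BB's hunk is represented by replaying the replace over the merged text, while AA, who only ever saw hunks, applies BB's change literally. The file contents and the helper name `rename_token` are hypothetical.

```python
import re

TOKEN = re.compile(r"[A-Za-z_0-9]+")


def rename_token(text, old, new):
    """Darcs-style token replace over the whole text."""
    return TOKEN.sub(lambda m: new if m.group(0) == old else m.group(0), text)


base = 'printf("hello world\\n");\n'      # 0. all three start here

# 1. CC records a token replace world -> universe; AA receives only hunks.
cc_change = rename_token(base, "world", "universe")

# 2. BB, who has not seen step 1, appends a line that mentions `world`.
bb_line = "/* greet the world */\n"

# AA applies BB's hunk literally on top of CC's hunks.
aa = cc_change + bb_line
# CC still holds the replace patch, which also covers BB's new line.
cc = rename_token(base + bb_line, "world", "universe")

print(aa == cc)  # False: the two copies have silently diverged
```

When AA and CC later sync, that divergence is what surfaces as the spurious conflict Juliusz confirms.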
* Re: Darcs and git: plan of action
[not found] <7ivf6lm594.fsf@lanthane.pps.jussieu.fr>
@ 2005-04-18 12:20 ` David Roundy
2005-04-18 15:38 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: David Roundy @ 2005-04-18 12:20 UTC
To: darcs-devel; +Cc: Linus Torvalds, Git Mailing List
Linus and gittish people,
I'm cc'ing you on this email, since Juliusz had some interesting ideas as to how darcs could interact with git, which then gave me an idea concerning which I'd like feedback from you. In particular, it would make life (that is, life interacting back and forth with git) easier if we were to embed darcs patches in their entirety in the git comment block. It's a bit of an ugly idea, but would greatly simplify the two-way interaction between git and darcs, since no information would be lost when a darcs patch was merged into git. See below for the discussion.
As I say, it's a bit ugly, and before we explore the idea further, it would be nice to know if this would cause Linus to vomit in disgust and/or refuse patches from darcs users. Another slightly less noxious possibility would be to store the darcs patch as a "hidden" file, if git were given the concept of commit-specific files. So then we could include in the commit log something like "Darcs-patch: 780c057447d4feef015a905aaf6c87db894ff58c". We could do this silently, except that I wonder if fsck would delete these files, since they aren't pointed to by any trees.
On Mon, Apr 18, 2005 at 12:02:15AM +0200, Juliusz Chroboczek wrote:
> David,
>
> I've read git over the week-end. I think I can see where it's coming
> from.
>
> Git is basically a (userspace) filesystem with support for efficiently
> finding identical objects. It's both simple and generic enough to be
> usable by us.
Right.
> You mentioned that you'd like to use git as a cache for Darcs; and I
> don't think I agree. Caches are tricky -- they need to be kept in
> synch -- and they might result in unexpected performance (you need to
> update both the native and the cached data structures on every
> modification).
It's true that we'd need to keep the cache in sync, which would mean making sure it gets updated with every repository-modifying darcs command, but we've already got a cache that has those properties, and it seems like modifying the interface to deal with a more complex cache would be relatively straightforward, and would likely have other advantages, such as if we wanted to implement a per-file cache to speed up annotate (since the speed of annotate seems to be a relatively common concern).
Basically, I'm imagining that we'd have to replace writePristine and write_dirty_Pristine with the applyPristine that Ian implemented for efficiency reasons. So we'd write to pristine by throwing patches at it, and letting it do what it pleases with them. Then we'd read from Pristine as usual--but we might want to add interfaces for reading slurpies of older versions from the pristine cache. This would again be a helpful interface anyway, since it might allow us, for example, to use checkpoints when reading older versions.
> I'd rather remodularise Darcs so that the on-disk patch representation
> is decoupled from the in-memory representation, so that we can use
> various backends in the same way as we use the native repository
> format.
The problem I have with this is that "other" repository formats (e.g. git) store "tree versions", not "changes", and I think it would be fragile to try to store "changes" (in the darcs sense) in them.
> As you seem motivated by git (my motivation is slightly different -- I
> want to be able to pull from Arch and other widespread systems with
> dysfunctional user interfaces), I suggest that we start with that.
I see. You're thinking of using darcs as a client for other SCMs. That's sort of how I'm thinking of darcs interacting with git, so we aren't so far off in terms of goals. My hope would tend to be that people would coalesce around git--since Linus will be using git. If everyone can interoperate with git, we'd be able to interoperate with everyone, in a sense, anyway.
> I suggest we do the following:
>
> 1. remove the assumption that patch IDs have a fixed format. Patch
> IDs should be opaque blobs of binary data that Darcs only compares
> for equality.
I'm not really comfortable with this, although I can see that there is an appeal to it, and that something like it may turn out to be necessary for interacting with systems for which we can't create a simple mapping of patch IDs.
> 2. get Darcs to pull from git. By restricting ourselves to a fairly
> simple command, this should be doable in finite time.
Okay, this is definitely a good goal. See below for thoughts on how this should be accomplished.
> 3. allow a patch to have multiple IDs; if the IDs associated to two
> patches are not disjoint, then the patches are the same patch.
This I find a bit confusing. So a patch can have two IDs, presumably something like a "darcs ID" and a "git ID"? I can see that this might simplify some things, but am not sure how it would work. The IDs would have to have a hierarchy, so that you wouldn't ever end up with the "same" patch having disjoint IDs in two cases.
> 4. allow applying to git repos of non-merger patches.
Here's where I think I'd differ. I think when dealing with git (and probably also with *any* other SCM (arch being a possible exception)), we need to consider the exchange medium to be not a patch, but a tag. Git only knows about "versions" of the tree, which in darcs terminology are tags. It *does* know about the (possibly multiple) parents of a given version, so we have a "context" for the patch--provided those two (or one...) parents are treated as tags.
So in pulling from git, I'd treat each git change as a patch followed by a tag. When pulling from git, unfortunately, the contents of that patch will be determined by our diff algorithm, so if we want long-term stability we might need to mummify a variant of the diff algorithm that we agree never to change, and always use when computing patches from a git archive.
This tagging (and I imagine the tags will look something like "git:0c16636264037e8b5ccd38b28ecd191aebc67389") will mean that we can create a single-patch darcs "patch bundle" for any given git commit. That is to say, we'll be able to "see" a git repository as an odd-looking darcs repository. This means that getting a fresh darcs repository from git would potentially involve a whole lot of merging...
Putting darcs patches *into* git is more complicated, since we'll want to get them back again without modification. Normal "hunk" patches would be no problem, provided we never change our diff algorithm (which has been discussed recently, in the context of making hunks better align with blocks of code). We could perhaps tell users not to use "replace" patches. But avoiding "mv" patches would be downright silly. So we're somehow going to have to either sneak this sort of metadata into the git repository, or we're going to have to store just the darcs "patch ID" in the git repository, and require that darcs users get the actual patch from somewhere else. I had been imagining the latter, but now I'm wondering if the former is a reasonable possibility.
Linus has said that he figures an SCM needs to be built on top of git, and that SCM--rather than git itself--would be the one that would know about things like file renames, probably by storing some sort of rename metadata. I wonder if we could perhaps store the entire darcs patch in the git commit? It seems a bit abusive, but would certainly be the easiest way to interface losslessly with git.
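The "each git commit becomes a patch followed by a tag" mapping could be sketched as below. Everything here is a hypothetical model: commits are plain Python objects (a parent list plus a path-to-contents tree), Python's `difflib` stands in for the frozen ("mummified") diff algorithm David mentions, and merges are simplified by diffing against the first parent only while listing every parent as a `git:<hash>` tag in the patch's context. Added and deleted files are treated as diffs against empty contents.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Commit:
    parents: list[str]      # hashes of parent commits (two or more for a merge)
    tree: dict[str, str]    # path -> file contents


@dataclass
class PatchAndTag:
    context: list[str]               # tags the patch applies on top of
    hunks: dict[str, list[str]]      # path -> unified-diff lines
    tag: str                         # tag recorded right after the patch


def commit_to_darcs(commits: dict[str, Commit], sha: str) -> PatchAndTag:
    commit = commits[sha]
    # Simplification: diff against the first parent only. For a merge the
    # remaining parents appear solely as tags in the context.
    before_tree = commits[commit.parents[0]].tree if commit.parents else {}
    hunks = {}
    for path in sorted(set(before_tree) | set(commit.tree)):
        before = before_tree.get(path, "").splitlines(keepends=True)
        after = commit.tree.get(path, "").splitlines(keepends=True)
        if before != after:
            # difflib stands in for a frozen, never-changing diff algorithm.
            hunks[path] = list(difflib.unified_diff(before, after, path, path))
    return PatchAndTag(["git:" + p for p in commit.parents], hunks, "git:" + sha)


commits = {
    "a1": Commit([], {"hello.c": "hello\n"}),
    "b2": Commit(["a1"], {"hello.c": "hello world\n"}),
}
result = commit_to_darcs(commits, "b2")
print(result.context, result.tag)
```

Freezing the diff step matters because a different diff implementation could produce different hunks for the same pair of trees, and therefore a different darcs patch.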
So when we pull from git, we'd look in the commit log for the magic words indicating that this is really a darcs patch. If so, we could handle it natively. If not, we'd know it was actually a "gittish" entity, one that requires that we diff a couple of trees to find the actual patch to be used with darcs' patch theory.
The ugliness of this idea is that it involves storing redundant information. And I think we'll have a bit of an exercise in commutation and merging when we get the patches from a git repository, since in git they'll be stored in tree form, but that's something we'll have to do even to create a read-only darcs mirror of a git repository. We could perhaps alleviate the pain by not including the actual contents of new or deleted files in the patch, but instead retrieving those from git directly. But that might be more trouble than it's worth, at least for the first sketch.
> 5. think about mergers.
Since git stores a branched history rather than a linear history, I'm not sure that we'll ever need to store mergers in git. Instead, we could just commute until the mergers disappear (which might be a bit scary), and then store *that* in git. On the other hand, if this is too inefficient, and if we store actual darcs patches in git, then we perhaps wouldn't need to worry about mergers as a special case.
> Whether we end up with a useful implementation of Darcs/git or not,
> this will result in a more modular Darcs, and hence one that will be
> easier to optimise.
>
> What do you think?
Err, I think I've answered that one... :)
--
David Roundy
http://www.darcs.net
* Re: Darcs and git: plan of action
2005-04-18 12:20 ` David Roundy
@ 2005-04-18 15:38 ` Linus Torvalds
2005-04-19 10:42 ` [darcs-devel] " David Roundy
2005-04-18 18:35 ` Ray Lee
2005-04-19 0:55 ` Juliusz Chroboczek
2 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2005-04-18 15:38 UTC
To: David Roundy; +Cc: Git Mailing List, darcs-devel
On Mon, 18 Apr 2005, David Roundy wrote:
>
> I'm cc'ing you on this email, since Juliusz had some interesting ideas as
> to how darcs could interact with git, which then gave me an idea concerning
> which I'd like feedback from you. In particular, it would make life (that
> is, life interacting back and forth with git) easier if we were to embed
> darcs patches in their entirety in the git comment block.
Hell no.
The commit _does_ specify the patch uniquely and exactly, so I really don't see the point. You can always get the patch by just doing a
git diff $parent_tree $thistree
so putting the patch in the comment is not an option. Then you can use the patch to index to whatever extra "darcs index" information you want to.
> As I say, it's a bit ugly, and before we explore the idea further, it would
> be nice to know if this would cause Linus to vomit in disgust and/or refuse
> patches from darcs users.
That's definitely the case. I will _not_ be taking random files etc just to keep other people's stuff straightened up. If you want to add a "log ID", you can certainly do that, but the data the ID refers to is _your_ data, and will not go into the git archive.
So:
> Another slightly less noxious possibility would
> be to store the darcs patch as a "hidden" file, if git were given the
> concept of commit-specific files.
No, git will not track commit-specific files. There's the comment section, and that _is_ the commit-specific file. But I will refuse to take any comments that aren't just human-readable explanations, together with maybe one extra line of
# Darcs ID: 780c057447d4feef015a905aaf6c87db894ff58c
(others will want to track _their_ PR numbers etc) and that's it. The actual darcs data that that ID refers to can obviously be maintained in _another_ git archive, but it's not one I'm going to carry about.
Linus
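Reading back the single extra line Linus is willing to accept is straightforward; this also covers David's "magic words" check, where a commit without the line is treated as a plain git commit whose patch must be derived by diffing trees. A minimal sketch, assuming the ID is 40 lowercase hex digits on a line of its own in exactly the `# Darcs ID: ...` form shown above; `darcs_id` is a hypothetical helper name.

```python
import re

# Assumes exactly the "# Darcs ID: <40 lowercase hex digits>" line form.
DARCS_ID_LINE = re.compile(r"^# Darcs ID: ([0-9a-f]{40})[ \t]*$", re.MULTILINE)


def darcs_id(commit_message: str):
    """Return the darcs ID recorded in a git commit comment, or None.

    None means a plain git commit, whose darcs patch must instead be
    derived by diffing the parent and child trees.
    """
    match = DARCS_ID_LINE.search(commit_message)
    return match.group(1) if match else None


msg = "Fix the frobnicator.\n\n# Darcs ID: 780c057447d4feef015a905aaf6c87db894ff58c\n"
print(darcs_id(msg))
```

One practical caveat for present-day git: when a commit message is edited in an editor, `git commit` by default strips lines beginning with `#` (see its `--cleanup` option), so a tool writing such a line would want to supply the message non-interactively or choose a different prefix.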
* Re: [darcs-devel] Darcs and git: plan of action
2005-04-18 15:38 ` Linus Torvalds
@ 2005-04-19 10:42 ` David Roundy
2005-04-19 14:55 ` Linus Torvalds
0 siblings, 1 reply; 30+ messages in thread
From: David Roundy @ 2005-04-19 10:42 UTC
To: Linus Torvalds; +Cc: darcs-devel, Git Mailing List
On Mon, Apr 18, 2005 at 08:38:25AM -0700, Linus Torvalds wrote:
> On Mon, 18 Apr 2005, David Roundy wrote:
> > .... In particular, it would make life (that is, life interacting back
> > and forth with git) easier if we were to embed darcs patches in their
> > entirety in the git comment block.
>
> Hell no.
I was afraid that would be the response...
> The commit _does_ specify the patch uniquely and exactly, so I really
> don't see the point. You can always get the patch by just doing a
>
> git diff $parent_tree $thistree
>
> so putting the patch in the comment is not an option.
The issue is that in darcs the parent and child trees *don't* uniquely or exactly specify the patch. In fact, even the output of git diff will depend on what version of diff you're using (e.g. if someone were to use BSD diff rather than GNU diff).
> > As I say, it's a bit ugly, and before we explore the idea further, it would
> > be nice to know if this would cause Linus to vomit in disgust and/or refuse
> > patches from darcs users.
>
> That's definitely the case. I will _not_ be taking random files etc just
> to keep other peoples stuff straightened up.
Okay.
> > Another slightly less noxious possibility would be to store the darcs
> > patch as a "hidden" file, if git were given the concept of
> > commit-specific files.
>
> No, git will not track commit-specific files. There's the comment
> section, and that _is_ the commit-specific file. But I will refuse to
> take any comments that aren't just human-readable explanations, together
> with maybe one extra line of
>
> # Darcs ID: 780c057447d4feef015a905aaf6c87db894ff58c
>
> (others will want to track _their_ PR numbers etc) and that's it. The
> actual darcs data that that ID refers to can obviously be maintained in
> _another_ git archive, but it's not one I'm going to carry about.
The trouble is that the philosophies of darcs and git are about as orthogonal as they can be. Git treats the content as fundamental, whereas in darcs the changes are fundamental. Since in darcs there can be different changes that lead from the same parent to the same child--and these differences are meaningful when merges happen--when interacting with git, we either need to restrict darcs to only describe changes in a way that can be uniquely determined by a parent and child, or we need to have extra metadata somewhere.
For bidirectional functionality, we either need to avoid the use of advanced darcs features, or we need to include that information in git somehow, or we need to keep a parallel darcs archive holding that information.
Would a small amount of human-readable change information be acceptable in the free-form comment area? In the rename thread I got the impression this would be okay for renames. For example,
rename foo bar
or (this is less important, but you might consider it to be a useful human-readable comment)
replace [_a-zA-Z0-9] old_variable new_variable file/path
Currently these two patch types account for almost the sum total of the cases where different patches lead to the same resulting trees.
--
David Roundy
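If such lines were allowed in the comment, reading them back could look like the sketch below. It assumes the exact word order of David's two examples (note that this differs from darcs's own patch format, `replace ./path [chars] old new`, which puts the path first) and whitespace-free names. `Rename`, `Replace` and `parse_change_line` are hypothetical names.

```python
from typing import NamedTuple, Optional, Union


class Rename(NamedTuple):
    old: str
    new: str


class Replace(NamedTuple):
    token_chars: str   # body of the [...] class, e.g. "_a-zA-Z0-9"
    old: str
    new: str
    path: str


def parse_change_line(line: str) -> Optional[Union[Rename, Replace]]:
    """Parse one comment line in the two forms proposed above; None otherwise."""
    parts = line.split()
    if len(parts) == 3 and parts[0] == "rename":
        return Rename(parts[1], parts[2])
    if (len(parts) == 5 and parts[0] == "replace"
            and parts[1].startswith("[") and parts[1].endswith("]")):
        return Replace(parts[1][1:-1], parts[2], parts[3], parts[4])
    return None


print(parse_change_line("replace [_a-zA-Z0-9] old_variable new_variable file/path"))
```

Because the lines are meant to double as prose, an ordinary sentence such as "rename it now" would also parse as a rename; a real tool would need some unambiguous marker to tell them apart.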
* Re: Darcs and git: plan of action
2005-04-19 10:42 ` [darcs-devel] " David Roundy
@ 2005-04-19 14:55 ` Linus Torvalds
2005-04-19 16:33 ` [darcs-devel] " Tupshin Harper
0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2005-04-19 14:55 UTC
To: David Roundy; +Cc: Git Mailing List, darcs-devel
On Tue, 19 Apr 2005, David Roundy wrote:
>
> Would a small amount of human-readable change information be acceptable in
> the free-form comment area? In the rename thread I got the impression this
> would be okay for renames. For example,
>
> rename foo bar
Sure. That's human-readable and meaningful, as in "it actually makes sense as a commit comment regardless of any darcs issues".
As does:
> replace [_a-zA-Z0-9] old_variable new_variable file/path
which is almost so (a human would have written "rename old to new", but the above isn't _that_ different).
HOWEVER, then the requirement would be that we'd never have complex combinations of the above. I.e. having 2-5 lines of something like that is "human-readable". Having 10+ lines of the above is not.
See? I have this suspicion that the "replace" thing often ends up being done on dozens of files, and I don't want to have dozens of lines of stuff that ends up really being machine-readable.
But if it's ok to depend on the content changes (you _do_ see which files changed) together with a single line of "replace [token-def] xxx yyy", then hell yes - I consider that to be useful information even outside of git.
(In other words: if it looks like something a careful human _could_ have written, it's certainly ok. But if it looks like something a careful human would have used a script to generate 40 entries of, it's bad).
Linus
* Re: [darcs-devel] Darcs and git: plan of action
2005-04-19 14:55 ` Linus Torvalds
@ 2005-04-19 16:33 ` Tupshin Harper
2005-04-19 16:49 ` Linus Torvalds
0 siblings, 1 reply; 30+ messages in thread
From: Tupshin Harper @ 2005-04-19 16:33 UTC
To: Linus Torvalds; +Cc: David Roundy, darcs-devel, Git Mailing List
Linus Torvalds wrote:
>(In other words: if it looks like something a careful human _could_ have
>written, it's certainly ok. But if it looks like something a careful human
>would have used a script to generate 40 entries of, it's bad).
>
> Linus
>
>
This is the way that darcs would currently represent a "darcs replace foo bar" on 15 files, which is obviously exactly what you are objecting to:
[global foo to bar
tupshin@tupshin.com**20050419155539] {
replace ./dir1/file1 [A-Za-z_0-9] foo bar
replace ./dir1/file2 [A-Za-z_0-9] foo bar
replace ./dir1/file3 [A-Za-z_0-9] foo bar
replace ./dir1/file4 [A-Za-z_0-9] foo bar
replace ./dir1/file5 [A-Za-z_0-9] foo bar
replace ./dir2/file1 [A-Za-z_0-9] foo bar
replace ./dir2/file2 [A-Za-z_0-9] foo bar
replace ./dir2/file3 [A-Za-z_0-9] foo bar
replace ./dir2/file4 [A-Za-z_0-9] foo bar
replace ./dir2/file5 [A-Za-z_0-9] foo bar
replace ./dir3/file1 [A-Za-z_0-9] foo bar
replace ./dir3/file2 [A-Za-z_0-9] foo bar
replace ./dir3/file3 [A-Za-z_0-9] foo bar
replace ./dir3/file4 [A-Za-z_0-9] foo bar
replace ./dir3/file5 [A-Za-z_0-9] foo bar
}
I see two possible complementary ways to address this:
1) allow something akin to the above form in git free-form comments as a *technical* solution, while leaving it up to the individual repository owner whether to accept such patches on aesthetic grounds.
2) explore adding a different format to darcs that would allow the files affected to be represented more compactly.
I suspect that any use of wildcards in a new format would be impossible for darcs since it wouldn't allow darcs to construct dependencies, though I'll leave it to David to respond to that. At a minimum, something like:
replace ./dir1/[file1|file2|file3|file4|file5] [A-Za-z_0-9] foo bar
replace ./dir2/[file1|file2|file3|file4|file5] [A-Za-z_0-9] foo bar
replace ./dir3/[file1|file2|file3|file4|file5] [A-Za-z_0-9] foo bar
should be pretty feasible.
I don't believe, however, that it would ever be 100% reliable to try to look at a one-line replace description and combine it with the actual changes and end up with a correct darcs replace patch.
-Tupshin
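Tupshin's compact notation (one line per directory, with the affected file names listed inside brackets) can be generated mechanically from the per-file list. A minimal sketch, assuming no file name contains `|` or `]`, which would make the bracketed list ambiguous; `compact_replaces` is a hypothetical name.

```python
import posixpath
from collections import defaultdict


def compact_replaces(paths, token_chars, old, new):
    """Group per-file replace lines by directory, in Tupshin's notation."""
    by_dir = defaultdict(list)
    for path in paths:
        directory, name = posixpath.split(path)
        by_dir[directory].append(name)
    return [f"replace {d}/[{'|'.join(names)}] [{token_chars}] {old} {new}"
            for d, names in sorted(by_dir.items())]


paths = [f"./dir{d}/file{f}" for d in (1, 2, 3) for f in range(1, 6)]
for line in compact_replaces(paths, "A-Za-z_0-9", "foo", "bar"):
    print(line)
```

The grouping is lossless, since every file is still named explicitly, which is what keeps it compatible with darcs computing dependencies, unlike a wildcard.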
* Re: [darcs-devel] Darcs and git: plan of action
2005-04-19 16:33 ` [darcs-devel] " Tupshin Harper
@ 2005-04-19 16:49 ` Linus Torvalds
0 siblings, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2005-04-19 16:49 UTC
To: Tupshin Harper; +Cc: David Roundy, darcs-devel, Git Mailing List
On Tue, 19 Apr 2005, Tupshin Harper wrote:
>
> I suspect that any use of wildcards in a new format would be impossible
> for darcs since it wouldn't allow darcs to construct dependencies,
> though I'll leave it to david to respond to that.
Note that git _does_ very efficiently (and I mean _very_) expose the changed files. So if this kind of darcs patch is always the same pattern just repeated over <n> files, then you really don't need to even list the files at all. Git gives you a very efficient file listing by just doing a "diff-tree" (which does not diff the _contents_ - it really just gives you a pretty much zero-cost "which files changed" listing).
So that combination would be 100% reliable _if_ you always split up darcs patches to "common elements".
And note that there does not have to be a 1:1 relationship between a git commit and a darcs patch. For example, say that you have a darcs patch that does a combination of "change token x to token y in 100 files" and "rename file a into b". I don't know if you do that kind of "combination patch" at all, but if you do, why not just split them up into two? That way the list of files changed _does_ 100% determine the list of files for the token exchange.
Linus
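Linus's suggestion amounts to this: store a single `replace [token-def] xxx yyy` line and let git's changed-file listing supply the paths. Rebuilding the per-file darcs lines is then mechanical. A minimal sketch; the changed-file list would come from something like `git diff-tree --no-commit-id --name-only -r <commit>`, but here it is simply passed in as a list, and `expand_replace` is a hypothetical name. As Linus notes, the result is only sound if the commit contains nothing but the token replace.

```python
import re


def expand_replace(summary: str, changed_files: list[str]) -> list[str]:
    """Rebuild per-file darcs replace lines from a one-line summary.

    Only sound if the commit contains nothing but the token replace, so
    that every changed file really is a target of the replace.
    """
    match = re.fullmatch(r"replace \[([^\]]+)\] (\S+) (\S+)", summary.strip())
    if match is None:
        raise ValueError(f"not a one-line replace summary: {summary!r}")
    token_chars, old, new = match.groups()
    return [f"replace ./{path} [{token_chars}] {old} {new}"
            for path in sorted(changed_files)]


for line in expand_replace("replace [A-Za-z_0-9] foo bar", ["dir2/file1", "dir1/file1"]):
    print(line)
```

This is also where Tupshin's reservation applies: a commit that mixes the replace with unrelated edits would make the expansion wrong, which is why splitting "combination patches" is the precondition.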
* Re: [darcs-devel] Darcs and git: plan of action
2005-04-18 12:20 ` David Roundy
2005-04-18 15:38 ` Linus Torvalds
@ 2005-04-18 18:35 ` Ray Lee
2005-04-19 0:55 ` Juliusz Chroboczek
2 siblings, 0 replies; 30+ messages in thread
From: Ray Lee @ 2005-04-18 18:35 UTC
To: David Roundy; +Cc: Linus Torvalds, Git Mailing List, darcs-devel
On Mon, 2005-04-18 at 08:20 -0400, David Roundy wrote:
> Putting darcs patches *into* git is more complicated, since we'll want to
> get them back again without modification. Normal "hunk" patches would be
> no problem, provided we never change our diff algorithm (which has been
> discussed recently, in the context of making hunks better align with blocks
> of code). We could perhaps tell users not to use "replace" patches. But
> avoiding "mv" patches would be downright silly.
Okay, I still haven't used git yet (and have only toyed around with darcs for a bit), so take what I'm saying with a grain of salt. Regardless, I think you may be asking the wrong question.
The tracking of renames was bandied about pretty thoroughly on-list from Wednesday through Friday (for far better commentary and insight, see Linus' messages with subject: Merge with git-pasky II.)
git does track changesets that describe the parent tree(s) and the result. The trees track filenames and hashes. So, doing a fairly straightforward compare on two trees will let you immediately discover renames that have occurred, as the filename in the tree changed while the hash didn't.
So, the question then becomes, can an outside tool cheaply derive all the information that darcs would need to perform its work? The renames should be easy, as long as no content changed during the rename. As for token replacement (and whitespace changes, etc.), that could be discovered via domain-specific parsers (something specific per language, for example). Linus tossed a link to one such tool (hmm, where was it. Sheesh. You sure write a lot, dude :-).)
http://minnie.tuhs.org/Programs (see Ctcompare)
...which should be viewed more as a proof-of-concept than a mergeable code-set. It does show that diff's vocabulary is sadly lacking in expressiveness, and improving that, I think, would be a useful area to expend effort.
Again, I may be off here, especially considering I've a backlog of a couple hundred messages to read since the weekend. (You guys need to go outside more often.)
Ray
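Ray's tree comparison can be sketched as follows: treat each tree as a mapping from path to content hash, and call it a rename when a path disappears and a new path appears with the same hash. A minimal sketch with placeholder hash strings; `detect_renames` is a hypothetical name. It only pairs paths when the hash is unique among the removed and added files, and it reports anything else as ambiguous rather than guessing. A file that was both renamed and edited is not detected at all, matching Ray's "as long as no content changed" caveat.

```python
from collections import defaultdict


def detect_renames(old_tree: dict[str, str], new_tree: dict[str, str]):
    """Pair removed and added paths that share a content hash.

    Returns (renames, ambiguous): renames is a sorted list of (old, new)
    pairs; ambiguous lists hash groups that cannot be paired uniquely.
    """
    removed, added = defaultdict(list), defaultdict(list)
    for path, h in old_tree.items():
        if path not in new_tree:
            removed[h].append(path)
    for path, h in new_tree.items():
        if path not in old_tree:
            added[h].append(path)
    renames, ambiguous = [], []
    for h, olds in removed.items():
        news = added.get(h, [])
        if len(olds) == 1 and len(news) == 1:
            renames.append((olds[0], news[0]))
        elif news:
            ambiguous.append((sorted(olds), sorted(news)))
    return sorted(renames), sorted(ambiguous)


old = {"a.c": "h1", "b.c": "h2", "x.h": "h3"}
new = {"a.c": "h1", "c.c": "h2", "x.h": "h3"}
print(detect_renames(old, new))
```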
* Re: Darcs and git: plan of action
2005-04-18 12:20 ` David Roundy
2005-04-18 15:38 ` Linus Torvalds
2005-04-18 18:35 ` Ray Lee
@ 2005-04-19 0:55 ` Juliusz Chroboczek
2005-04-19 1:43 ` [darcs-devel] " Ray Lee
2005-04-19 11:04 ` David Roundy
2 siblings, 2 replies; 30+ messages in thread
From: Juliusz Chroboczek @ 2005-04-19 0:55 UTC
To: darcs-devel; +Cc: Linus Torvalds, Git Mailing List
[Using git as a backend for Darcs.]
> The problem I have with this is that "other" repository formats (e.g. git)
> store "tree versions", not "changes", and I think it would be fragile to
> try to store "changes" (in the darcs sense) in them.
Not really; a Darcs patch is just a pair of two git versions (from and to). Which is why Darcs needs to support arbitrarily formatted patch ids -- a patch originating from git will be identified by a pair of git hashes.
Obviously, we'll need to think harder when pushing from darcs into git (we'll need to preserve the Darcs patch id somehow), but it's premature to worry about that right now.
>> 1. remove the assumption that patch IDs have a fixed format. Patch
>> IDs should be opaque blobs of binary data that Darcs only compares
>> for equality.
> I'm not really comfortable with this,
Why?
>> 3. allow a patch to have multiple IDs; if the IDs associated to two
>> patches are not disjoint, then the patches are the same patch.
>
> This I find a bit confusing. So a patch can have two IDs, presumably
> something like a "darcs ID" and a "git ID"? I can see that this might
> simplify some things, but am not sure how it would work. The IDs would
> have to have a hierarchy, so that you wouldn't ever end up with the "same"
> patch having disjoint IDs in two cases.
It's a case of ``don't do that''.
Suppose I record a patch in Darcs; it gets a Darcs id. I push it into git, at which point it gets a git id, whether we want it to or not. What do we do when we pull that patch back into darcs?
Either we arbitrarily discard one of the ids (which one?), or we keep both. If there's more pulling/pushing going on on the git side, we definitely need to keep both.
> Here's where I think I'd differ.
Same to you ;-)
> I think when dealing with git (and probably also with *any* other
> SCM (arch being a possible exception), we need to consider the
> exchange medium to be not a patch, but a tag.
We're thinking in opposite directions -- you're thinking of the alien versions as integrals of Darcs patches, I'm thinking of Darcs patches as derivatives of alien versions.
You: alien version = Darcs tag
Me: Darcs patch = pair of successive alien versions
My gut instinct is that the second model can be made to work almost seamlessly, unlike the first one. But that's just a guess.
> if we want long-term stability we might need to mummify a variant of
> the diff algorithm that we agree not to change,
Good point, noted.
> But avoiding "mv" patches would be downright silly.
Aye, that will require some metadata on the git side (the hack, suggested by Linus, of using git hashes to notice moves won't work). Happily, it's premature to worry about that, too.
Juliusz
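Juliusz's rule (a patch carries a set of opaque IDs, and two patches are the same if their ID sets intersect) is easy to model, and the model also shows why David asks for a hierarchy: the relation is not transitive until the ID sets are merged. A minimal sketch; the class and the `darcs:`/`git:` ID strings are hypothetical placeholders for opaque IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    ids: frozenset  # opaque IDs; strings like "darcs:..." are placeholders

    def same_patch(self, other: "Patch") -> bool:
        # Juliusz's rule: the same patch if the ID sets are not disjoint.
        return not self.ids.isdisjoint(other.ids)

    def merged_with(self, other: "Patch") -> "Patch":
        # Keep every ID once two copies are known to be the same patch.
        return Patch(self.ids | other.ids)


recorded = Patch(frozenset({"darcs:1"}))            # recorded in darcs
pushed = Patch(frozenset({"darcs:1", "git:abc"}))   # after the push to git
pulled = Patch(frozenset({"git:abc"}))              # seen from the git side only

print(recorded.same_patch(pushed), pushed.same_patch(pulled))  # True True
print(recorded.same_patch(pulled))  # False: the relation is not transitive
```

Keeping both IDs, as Juliusz proposes, repairs the chain: `recorded.merged_with(pushed)` then matches `pulled` directly.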
* Re: [darcs-devel] Darcs and git: plan of action
2005-04-19 0:55 ` Juliusz Chroboczek
@ 2005-04-19 1:43 ` Ray Lee
2005-04-19 8:22 ` Juliusz Chroboczek
2005-04-19 11:04 ` David Roundy
1 sibling, 1 reply; 30+ messages in thread
From: Ray Lee @ 2005-04-19 1:43 UTC
To: Juliusz Chroboczek; +Cc: darcs-devel, Git Mailing List, Linus Torvalds
On Tue, 2005-04-19 at 02:55 +0200, Juliusz Chroboczek wrote:
> > But avoiding "mv" patches would be downright silly.
>
> Aye, that will require some metadata on the git side (the hack,
> suggested by Linus, of using git hashes to notice moves won't work).
Okay, I'm coming to believe I missed something. So, why won't it work?
Ray
* Re: [darcs-devel] Darcs and git: plan of action
2005-04-19 1:43 ` [darcs-devel] " Ray Lee
@ 2005-04-19 8:22 ` Juliusz Chroboczek
2005-04-20 1:22 ` Ray Lee
0 siblings, 1 reply; 30+ messages in thread
From: Juliusz Chroboczek @ 2005-04-19 8:22 UTC
To: Ray Lee; +Cc: darcs-devel, Git Mailing List
> > Aye, that will require some metadata on the git side (the hack,
> > suggested by Linus, of using git hashes to notice moves won't work).
> So, why won't it work?
Because two files can legitimately have identical contents without being ``the same'' file from the VC system's point of view.
In other words, two files may happen to have the same contents but have distinct histories.
Juliusz
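Juliusz's objection follows from how git names file contents: a blob's ID is the SHA-1 of a short `blob <size>` header, a NUL byte, and the bytes themselves, so it depends on the contents alone and says nothing about which file, or which history, the contents belong to. A short illustration; `git_blob_id` is a hypothetical helper, and the file contents are made up.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to file contents: SHA-1 of the header
    b"blob <size>" plus a NUL byte, followed by the contents."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two unrelated files that happen to hold the same text get the same ID,
# so tree entries alone cannot say which one a later file "came from".
notice_in_driver_a = b"GPL v2\n"
notice_in_driver_b = b"GPL v2\n"
print(git_blob_id(notice_in_driver_a) == git_blob_id(notice_in_driver_b))  # True
```

For the empty file this yields e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same ID `git hash-object` reports, so deleting one empty file and creating an unrelated empty file elsewhere is indistinguishable, by hash alone, from a rename.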
* Re: [darcs-devel] Darcs and git: plan of action
2005-04-19 8:22 ` Juliusz Chroboczek
@ 2005-04-20 1:22 ` Ray Lee
0 siblings, 0 replies; 30+ messages in thread
From: Ray Lee @ 2005-04-20 1:22 UTC
To: Juliusz Chroboczek; +Cc: darcs-devel, git, Kevin Smith
On Tue, 2005-04-19 at 10:22 +0200, Juliusz Chroboczek wrote:
> > > Aye, that will require some metadata on the git side (the hack,
> > > suggested by Linus, of using git hashes to notice moves won't work).
>
> > So, why won't it work?
>
> Because two files can legitimately have identical contents without
> being ``the same'' file from the VC system's point of view.
>
> In other words, two files may happen to have the same contents but
> have distinct histories.
Eh, let's not talk using an integral/summation view across all the patches that ever could have come in against the file. We're hamstringing ourselves if we do that, and it's not what darcs does. darcs looks at a differential view of the changes, and for a mv, it looks at it when it happens.
darcs does a "darcs mv" to commit a "file move patch" to whatever logging or patch repository it keeps below the surface. The equivalent in git would be to have a given tree, move a file via bash's mv, and then checkpoint a new tree. (I'm sure there are details in there, but that's plumbing, and what we have Petr for.) A differential comparison of the two trees shows no content changed, but a file label was modified. Ergo, a rename occurred. QED.
~r.
* Re: [darcs-devel] Darcs and git: plan of action
2005-04-19 0:55 ` Juliusz Chroboczek
2005-04-19 1:43 ` [darcs-devel] " Ray Lee
@ 2005-04-19 11:04 ` David Roundy
2005-04-19 12:20 ` Juliusz Chroboczek
1 sibling, 1 reply; 30+ messages in thread
From: David Roundy @ 2005-04-19 11:04 UTC
To: Juliusz Chroboczek; +Cc: darcs-devel, Linus Torvalds, Git Mailing List

On Tue, Apr 19, 2005 at 02:55:05AM +0200, Juliusz Chroboczek wrote:
> [Using git as a backend for Darcs.]
...
> >> 1. remove the assumption that patch IDs have a fixed format. Patch
> >> IDs should be opaque blobs of binary data that Darcs only compares
> >> for equality.
>
> > I'm not really comfortable with this,
>
> Why?

I'm not clear why it would be necessary, and it takes the only immutable
piece of information regarding a patch, and makes it variable. Just
seems dangerous and complicated, and I'm not sure why we'd need to do
it.

> Suppose I record a patch in Darcs; it gets a Darcs id. I push it into
> git, at which point it gets a git id, whether we want it to or not.
> What do we do when we pull that patch back into darcs?
>
> Either we arbitrarily discard one of the ids (which one?), or we keep
> both. If there's more pulling/pushing going on on the git side, we
> definitely need to keep both.

Or alternatively, we could have a one-to-one mapping between git IDs and
darcs IDs, which is what I'd do.

> > I think when dealing with git (and probably also with *any* other SCM
> > (arch being a possible exception), we need to consider the exchange
> > medium to be not a patch, but a tag.
>
> We're thinking in opposite directions -- you're thinking of the alien
> versions as integrals of Darcs patches, I'm thinking of Darcs patches
> as derivatives of alien versions.
>
> You: alien version = Darcs tag
>
> Me: Darcs patch = pair of successive alien versions
>
> My gut instinct is that the second model can be made to work almost
> seamlessly, unlike the first one. But that's just a guess.
The problem is that there is no sequence of alien versions that one can
differentiate. Git has a branched history, with each version that follows
a merge having multiple parents. How do you define that change? It's
easy enough to do if we tag each git version in darcs, since we know what
the two parents are, and we know what the final state is, but there *is*
no translation from a single git ID either to a single patch(1) patch, or
to a single darcs patch--unless you treat its parents as tags.

The key is that we can't make git work like darcs, so we'll have to make
darcs work like git. If we do it right (automatically tagging like crazy
people), darcs users between themselves can cherry-pick all they like,
without introducing inconsistencies or losing interoperability with git.

To summarize how I'd see the mapping between git information and darcs,
a git commit would be composed of one darcs patch and one darcs tag. With
this mapping, I don't believe we lose any information, and I believe
we'll be able to simply translate the darcs system right back again,
since it's a one-to-one correspondence of information (except that
patches would have to be uniquely determined by a pair of trees).

My proposed mapping:

    tree 6ff0e9f3d131bd110d32829f0b14f07da8313c45
    # This is a darcs tag ID
    parent abd62b9caee377595a9bf75f363328c82a38f86e
    # This is the context of both a patch and tag.
    author James Bottomley <James.Bottomley@SteelEye.com> 1113879319 -0700
    # This is the author and date of the patch
    committer Linus Torvalds <torvalds@ppc970.osdl.org.(none)> 1113879319 -0700
    # This is the author and date of the tag
    # Everything below would be the name and long comment of the patch

    [PATCH] SCSI trees, merges and git status

    Doing the latest SCSI merge exposed two bugs in your merge script:

    1) It doesn't like a completely new directory (the misc tree contains
       a new drivers/scsi/lpfc)

    2) the merge testing logic is wrong. You only want to exit 1 if the
       merge fails.
-- 
David Roundy
http://www.darcs.net
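David's proposed mapping can be sketched as a parser that splits one git commit object into the darcs patch and darcs tag he describes. This is a minimal Python sketch under stated assumptions: the commit text follows git's commit header layout (`tree`, zero or more `parent` lines, `author`, `committer`, a blank line, then the message), and the `DarcsPatch`/`DarcsTag` types and their field names are hypothetical, not darcs data structures. A merge commit simply yields several parents in the shared context, which is exactly the case David points out cannot be turned into a single diff.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DarcsPatch:  # hypothetical: the "patch" half of a git commit
    author: str
    date: str
    name: str
    comment: str

@dataclass
class DarcsTag:  # hypothetical: the "tag" half of a git commit
    tree_id: str
    committer: str
    date: str
    context: List[str] = field(default_factory=list)  # parent commit IDs

def split_ident(value: str) -> Tuple[str, str]:
    """Split 'Name <email> 1113879319 -0700' into (who, 'timestamp tz')."""
    who, _, when = value.rpartition("> ")
    return who + ">", when

def commit_to_darcs(raw: str) -> Tuple[DarcsPatch, DarcsTag]:
    header, _, message = raw.partition("\n\n")
    parents, fields = [], {}
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            parents.append(value)  # a merge commit has more than one
        else:
            fields[key] = value
    author, author_date = split_ident(fields["author"])
    committer, commit_date = split_ident(fields["committer"])
    name, _, comment = message.partition("\n")  # first line = patch name
    patch = DarcsPatch(author, author_date, name, comment.strip())
    tag = DarcsTag(fields["tree"], committer, commit_date, parents)
    return patch, tag

# Usage with the commit quoted in the message above.
raw = (
    "tree 6ff0e9f3d131bd110d32829f0b14f07da8313c45\n"
    "parent abd62b9caee377595a9bf75f363328c82a38f86e\n"
    "author James Bottomley <James.Bottomley@SteelEye.com> 1113879319 -0700\n"
    "committer Linus Torvalds <torvalds@ppc970.osdl.org.(none)> 1113879319 -0700\n"
    "\n"
    "[PATCH] SCSI trees, merges and git status\n"
    "\n"
    "Doing the latest SCSI merge exposed two bugs in your merge script:\n"
)
patch, tag = commit_to_darcs(raw)
print(patch.name)   # [PATCH] SCSI trees, merges and git status
print(tag.context)  # ['abd62b9caee377595a9bf75f363328c82a38f86e']
```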
* Re: Darcs and git: plan of action
2005-04-19 11:04 ` David Roundy
@ 2005-04-19 12:20 ` Juliusz Chroboczek
2005-04-19 12:25 ` [darcs-devel] " Petr Baudis
2005-04-20 11:29 ` David Roundy
0 siblings, 2 replies; 30+ messages in thread
From: Juliusz Chroboczek @ 2005-04-19 12:20 UTC
To: darcs-devel, Git Mailing List

[Removing Linus from CC, keeping the Git list -- or should we remove it?]

> I'm not clear why it would be necessary, and it takes the only immutable
> piece of information regarding a patch, and makes it variable.

Er... I'm not suggesting making it variable, just making it an opaque
blob of bytes (still immutable). I see from the examples you give below
that you agree that the format needs extending, so I suspect we're
actually agreeing here, just failing to communicate.

About having multiple ids per patch:

> Or alternatively, we could have a one-to-one mapping between git IDs and
> darcs IDs, which is what I'd do.

Okay, you've convinced me. It's much simpler that way; we'll see how
well it works.

> The problem is that there is no sequence of alien versions that one can
> differentiate. Git has a branched history, with each version that follows
> a merge having multiple parents.

Yep. I've just realised that this morning. Is there some notion of
``primary parent'' as in Arch? Can a changeset have 0 parents?

> If we do it right (automatically tagging like crazy people), darcs
> users between themselves can cherry-pick all they like, without
> introducing inconsistencies or losing interoperability with git.

You've lost me here. How can you cherry-pick if every tag depends on
the preceding patches? Or are you thinking of pulling just the patch
and not the tag -- in that case, what happens when you push to git a
Darcs patch that depends on a patch that originated with git?

I've started interfacing Haskell with git this week-end; that's
something we'll need whichever model we choose.
We should be able to start playing with actually modifying Darcs after
next week-end.

Juliusz
* Re: [darcs-devel] Darcs and git: plan of action
2005-04-19 12:20 ` Juliusz Chroboczek
@ 2005-04-19 12:25 ` Petr Baudis
2005-04-20 11:18 ` David Roundy
2005-04-20 11:29 ` David Roundy
1 sibling, 1 reply; 30+ messages in thread
From: Petr Baudis @ 2005-04-19 12:25 UTC
To: Juliusz Chroboczek; +Cc: darcs-devel, Git Mailing List

Dear diary, on Tue, Apr 19, 2005 at 02:20:55PM CEST, I got a letter
where Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr> told me that...
> > The problem is that there is no sequence of alien versions that one can
> > differentiate. Git has a branched history, with each version that follows
> > a merge having multiple parents.
>
> Yep. I've just realised that this morning. Is there some notion of
> ``primary parent'' as in Arch? Can a changeset have 0 parents?

Yes, the root commit. Usually, there is only one, but there may be
multiple of them theoretically.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
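Petr's answer can be illustrated on a toy history. A minimal Python sketch, assuming the history is given as a hypothetical dict from commit ID to its list of parent IDs (the commit names are made up): root commits are those with no parents, and merges are those with more than one.

```python
def classify_commits(history):
    """history: dict commit_id -> list of parent commit IDs (hypothetical)."""
    roots = sorted(c for c, parents in history.items() if not parents)
    merges = sorted(c for c, parents in history.items() if len(parents) > 1)
    return roots, merges

# Two independent roots (R1, R2) whose lines of development are merged at M.
history = {
    "R1": [],
    "R2": [],
    "A": ["R1"],
    "B": ["R2"],
    "M": ["A", "B"],
}
print(classify_commits(history))  # (['R1', 'R2'], ['M'])
```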
* Re: [darcs-devel] Darcs and git: plan of action
2005-04-19 12:25 ` [darcs-devel] " Petr Baudis
@ 2005-04-20 11:18 ` David Roundy
0 siblings, 0 replies; 30+ messages in thread
From: David Roundy @ 2005-04-20 11:18 UTC
To: Petr Baudis; +Cc: Juliusz Chroboczek, darcs-devel, Git Mailing List

On Tue, Apr 19, 2005 at 02:25:18PM +0200, Petr Baudis wrote:
> Dear diary, on Tue, Apr 19, 2005 at 02:20:55PM CEST, I got a letter
> where Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr> told me that...
> > > The problem is that there is no sequence of alien versions that one
> > > can differentiate. Git has a branched history, with each version
> > > that follows a merge having multiple parents.
> >
> > Yep. I've just realised that this morning. Is there some notion of
> > ``primary parent'' as in Arch? Can a changeset have 0 parents?
>
> Yes, the root commit. Usually, there is only one, but there may be
> multiple of them theoretically.

Incidentally (and completely off-topic for this thread), wouldn't there
be a sha1 tree hash corresponding to a completely empty directory, and
couldn't one use that as the parent for the root? Would there be any
reason to do so? Just a silly thought...
-- 
David Roundy
http://www.darcs.net
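Such a hash does exist: git names a tree object by hashing a `tree <size>\0` header followed by its entries, so a tree with no entries has one fixed ID. A minimal Python sketch computing it; `git_object_id` is a hypothetical helper, not a git API. Note that in git's commit format the `parent` field refers to another commit rather than a tree, so using this ID as a parent would still need some convention on top of it.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 object ID as git computes it: '<type> <size>\\0' + body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# An empty directory is a tree object with no entries (empty body).
print(git_object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```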
* Re: [darcs-devel] Darcs and git: plan of action
2005-04-19 12:20 ` Juliusz Chroboczek
2005-04-19 12:25 ` [darcs-devel] " Petr Baudis
@ 2005-04-20 11:29 ` David Roundy
1 sibling, 0 replies; 30+ messages in thread
From: David Roundy @ 2005-04-20 11:29 UTC
To: Juliusz Chroboczek; +Cc: darcs-devel, Git Mailing List

On Tue, Apr 19, 2005 at 02:20:55PM +0200, Juliusz Chroboczek wrote:
> [Removing Linus from CC, keeping the Git list -- or should we remove it?]

I think leaving much of this on the git list would be appropriate, since
there are issues of how to relate to git that should be relevant there.

> > If we do it right (automatically tagging like crazy people), darcs
> > users between themselves can cherry-pick all they like, without
> > introducing inconsistencies or losing interoperability with git.
>
> You've lost me here. How can you cherry-pick if every tag depends on
> the preceding patches? Or are you thinking of pulling just the patch
> and not the tag -- in that case, what happens when you push to git a
> Darcs patch that depends on a patch that originated with git?

Yes, I'm thinking of pulling patches from one darcs repo to another.
If we cherry-pick in this way, we need to create a "git-tag" for each
patch that we pull without its associated tag. To git, this would look
like two separate changes that have the same commit log, except that
they have different parents and different committers and commit dates.
I don't think this will be a problem for git, and since darcs will
recognize the two patches as the identical darcs patch (we'll need to
put a magic word somewhere in the git commit log indicating that this
patch originated in darcs), there won't be a problem for darcs either.
In case I haven't been clear (which seems likely), the scenario is that
darcs user 1 makes the following changes to his darcs version of a
git-based repository:

changes in 1: A -> B
tags in 1:    A1   B1

Darcs user 2 wants B, but not A, and didn't do any development:

changes in 2: B
tags in 2:    B2

User 2 pushes to git, and now git has (where P is the parent of both of
the above):

git: P -> B/B2

(where B/B2 is the commit log with B2 as "committer info" and B as the
"author info and long comment")

User 1 pushes (everything) to git and merges the two (patch M, which has
two parents, B1 and B2):

git:  ->B/B2---------
     /                \
    P--> A/A1 -> B/B1---> M

It's a little lame, and if user 2 doesn't do any real work, the
git-using person might be annoyed, but I think it's doable.
-- 
David Roundy
http://www.darcs.net
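The bookkeeping described above, where two git commits with different parents and committers are recognized by darcs as one patch, can be sketched by keying commits on a marker line in the commit log. A minimal Python sketch; the `darcs-patch:` marker syntax, the commit dicts, and the helper names are all hypothetical, since the message only says some magic word would be needed. The commit labels follow the scenario's diagram.

```python
import re
from collections import defaultdict

# Hypothetical marker line that darcs would write into each commit log.
MARKER = re.compile(r"^darcs-patch: (\S+)$", re.MULTILINE)

def darcs_patch_id(log: str):
    """Return the darcs patch ID embedded in a commit log, or None."""
    m = MARKER.search(log)
    return m.group(1) if m else None

def group_by_darcs_patch(commits):
    """Map darcs patch ID -> git commits that carry that same patch."""
    groups = defaultdict(list)
    for commit in commits:
        pid = darcs_patch_id(commit["log"])
        if pid is not None:
            groups[pid].append(commit["id"])
    return dict(groups)

# User 1's B (tag B1, parent A/A1) and user 2's cherry-picked B (tag B2, parent P).
log_b = "Patch B\n\nLong comment for B.\ndarcs-patch: B\n"
commits = [
    {"id": "A/A1", "parents": ["P"], "log": "Patch A\ndarcs-patch: A\n"},
    {"id": "B/B1", "parents": ["A/A1"], "log": log_b},
    {"id": "B/B2", "parents": ["P"], "log": log_b},
    {"id": "M", "parents": ["B/B1", "B/B2"], "log": "Merge\n"},
]
print(group_by_darcs_patch(commits))
# {'A': ['A/A1'], 'B': ['B/B1', 'B/B2']}
```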
end of thread, other threads:[~2005-04-20 17:09 UTC | newest]

Thread overview: 30+ messages
2005-04-18 21:04 [darcs-devel] Darcs and git: plan of action linux
2005-04-19 0:07 ` Ray Lee
2005-04-19 1:05 ` Kevin Smith
2005-04-19 1:42 ` Ray Lee
2005-04-19 2:05 ` Kevin Smith
2005-04-19 22:08 ` Patrick McFarland
2005-04-19 22:40 ` Ray Lee
2005-04-19 23:00 ` Tupshin Harper
2005-04-19 23:21 ` Ray Lee
2005-04-19 23:38 ` Tupshin Harper
2005-04-19 23:03 ` [darcs-devel] " Kevin Smith
2005-04-19 23:06 ` Ray Lee
2005-04-19 23:32 ` Tupshin Harper
2005-04-20 1:11 ` [darcs-devel] " Ray Lee
2005-04-20 7:52 ` Juliusz Chroboczek
2005-04-20 11:55 ` David Roundy
2005-04-20 17:11 ` Ralph Corderoy
2005-04-19 11:05 ` David Roundy
[not found] <20050419235832.56117.qmail@web51003.mail.yahoo.com>
2005-04-20 7:55 ` Juliusz Chroboczek
[not found] <7ivf6lm594.fsf@lanthane.pps.jussieu.fr>
2005-04-18 12:20 ` David Roundy
2005-04-18 15:38 ` Linus Torvalds
2005-04-19 10:42 ` [darcs-devel] " David Roundy
2005-04-19 14:55 ` Linus Torvalds
2005-04-19 16:33 ` [darcs-devel] " Tupshin Harper
2005-04-19 16:49 ` Linus Torvalds
2005-04-18 18:35 ` Ray Lee
2005-04-19 0:55 ` Juliusz Chroboczek
2005-04-19 1:43 ` [darcs-devel] " Ray Lee
2005-04-19 8:22 ` Juliusz Chroboczek
2005-04-20 1:22 ` Ray Lee
2005-04-19 11:04 ` David Roundy
2005-04-19 12:20 ` Juliusz Chroboczek
2005-04-19 12:25 ` [darcs-devel] " Petr Baudis
2005-04-20 11:18 ` David Roundy
2005-04-20 11:29 ` David Roundy