From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Fredrik Kuivinen <freku045@student.liu.se>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Handling large files with GIT
Date: Wed, 15 Feb 2006 07:44:08 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0602150715470.3691@g5.osdl.org>
In-Reply-To: <7vd5hpj6ab.fsf@assigned-by-dhcp.cox.net>



On Wed, 15 Feb 2006, Junio C Hamano wrote:
> 
> I was thinking about implementing mergers as a pipeline:
> 
> 	git-merge-tree O A B |
>         git-merge-renaming A |
>         git-merge-aggressive A |
>         git-merge-filemerge

Great minds think alike.

> git-merge-tree (yours) does not do trivial collapsing, and
> produces a raw-diff from A.

(It does _truly_ trivial collapsing, but I think we both agree: it doesn't 
do anything that we used to rely on git-merge-one-file for)

> git-merge-renaming reads it, finds
> copied/renamed entries (maybe reusing parts of diffcore), and
> writes out the results in the same format as merge-tree output

I was considering perhaps doing a first cut at that in git-merge-tree 
already. Not sure.

One issue is that I think I may have to change the output format if I do 
that. I should anyway. 

Why?

It's hard to see where "one event" stops, and another starts. I stupidly 
initially thought that you can do it entirely based on looking at the 
numbers, but you can't. Right now you have to look at the pathname too, 
which is kind of sad, and doesn't work after rename detection (since then 
the pathnames won't be sorted any more, and one "event" can have different 
pathnames in different stages).

[ Side note: it doesn't even work for file/directory conflicts, which can 
  have the same name, but are two different "events". So you'd actually 
  have to look at both mode _and_ filename to sort out if two lines that 
  start with "1" and "3" respectively are one event (removal in first 
  branch) or two events ("1" on one file: removal in both branches + "3" 
  on another file: add in second branch) ]

So to do the rename output, you can't use the same format as merge-tree 
uses right _now_. We'd have to add a marker to mark what the event 
boundaries are.

The "mark" could be a running "event number", or even as easy as an 
alternating character ("#" vs " " as the second character in the line or 
similar)
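
[ A rough sketch of the alternating-character idea, not what git-merge-tree 
  actually does. The function name `format_events` and the tuple layout are 
  made up for illustration; the only assumption is that the producer 
  already knows where its event boundaries are and flips the marker once 
  per event. ]

```python
def format_events(events, markers="# "):
    """Render events as lines, flipping the marker character once per event.

    `events` is a list of events; each event is a list of
    (stage, mode, sha1, path) tuples.  This layout is an assumption made
    for the sketch, not an existing git data structure.
    """
    lines = []
    for index, event in enumerate(events):
        mark = markers[index % len(markers)]  # "#", " ", "#", " ", ...
        for stage, mode, sha1, path in event:
            lines.append(f"{stage}{mark}{mode} {sha1} {path}")
    return lines


# Hypothetical events with shortened object names:
# a one-line event, then a two-line event, then another one-line event.
events = [
    [(2, "100644", "ff28", "a.txt")],
    [(1, "100644", "b889", "old-name"), (3, "100644", "b889", "old-name")],
    [(2, "100644", "00a0", "new-name")],
]
for line in format_events(events):
    print(line)
```

Both lines of the middle event carry the same marker, so a reader never has 
to compare pathnames to find the boundary.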

So instead of

	2 100644 ff280e2e1613e808e4d7844376134dfa2bb1fc21 Documentation/cputopology.txt
	2 100644 28c5b7d1eb90f0ccd8e0307c170f89bd7954dc9c Documentation/hwmon/f71805f
	1 100644 b88953dfd58022aef1680c266c7438605b146fc8 Documentation/i2c/busses/i2c-sis69x
	3 100644 b88953dfd58022aef1680c266c7438605b146fc8 Documentation/i2c/busses/i2c-sis69x
	2 100644 00a009b977e92b1a942d1138afdccf1b725df956 Documentation/i2c/busses/i2c-sis96x
	2 100644 90a5e9e5bef1daa9d0f0621e209827f0d180f384 Documentation/unshare.txt
	2 100644 5127f39fa9bf9a384a6529c6d5deb1002e945de5 arch/arm/mach-s3c2410/s3c2400-gpio.c
	2 100644 8b2394e1ed4088c3b8d38e87e58bde2f38152bf7 arch/arm/mach-s3c2410/s3c2400.h
	 ...

it might be

	2#100644 ff280e2e1613e808e4d7844376134dfa2bb1fc21 Documentation/cputopology.txt
	2 100644 28c5b7d1eb90f0ccd8e0307c170f89bd7954dc9c Documentation/hwmon/f71805f
	1#100644 b88953dfd58022aef1680c266c7438605b146fc8 Documentation/i2c/busses/i2c-sis69x
	3#100644 b88953dfd58022aef1680c266c7438605b146fc8 Documentation/i2c/busses/i2c-sis69x
	2 100644 00a009b977e92b1a942d1138afdccf1b725df956 Documentation/i2c/busses/i2c-sis96x
	2#100644 90a5e9e5bef1daa9d0f0621e209827f0d180f384 Documentation/unshare.txt
	2 100644 5127f39fa9bf9a384a6529c6d5deb1002e945de5 arch/arm/mach-s3c2410/s3c2400-gpio.c
	2#100644 8b2394e1ed4088c3b8d38e87e58bde2f38152bf7 arch/arm/mach-s3c2410/s3c2400.h
	 ...

where you can clearly see the "grouping" without having to even look at 
the filename.
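
[ A minimal sketch of how a consumer might split the marked output into 
  events, assuming the line layout shown above: stage digit, marker 
  character, mode, object name, path. `Entry`, `parse_line` and 
  `group_events` are hypothetical names, and the format itself is still 
  only a proposal. ]

```python
from itertools import groupby
from typing import NamedTuple


class Entry(NamedTuple):
    stage: int
    mode: str
    sha1: str
    path: str


def parse_line(raw):
    """Return (marker, Entry) for one line of the proposed format."""
    line = raw.strip()
    stage, marker = int(line[0]), line[1]
    mode, sha1, path = line[2:].split(" ", 2)
    return marker, Entry(stage, mode, sha1, path)


def group_events(lines):
    """Group consecutive lines that share a marker character into events."""
    parsed = [parse_line(raw) for raw in lines if raw.strip()]
    return [[entry for _, entry in run]
            for _, run in groupby(parsed, key=lambda pair: pair[0])]


SAMPLE = """\
2#100644 ff280e2e1613e808e4d7844376134dfa2bb1fc21 Documentation/cputopology.txt
2 100644 28c5b7d1eb90f0ccd8e0307c170f89bd7954dc9c Documentation/hwmon/f71805f
1#100644 b88953dfd58022aef1680c266c7438605b146fc8 Documentation/i2c/busses/i2c-sis69x
3#100644 b88953dfd58022aef1680c266c7438605b146fc8 Documentation/i2c/busses/i2c-sis69x
2 100644 00a009b977e92b1a942d1138afdccf1b725df956 Documentation/i2c/busses/i2c-sis96x
2#100644 90a5e9e5bef1daa9d0f0621e209827f0d180f384 Documentation/unshare.txt
2 100644 5127f39fa9bf9a384a6529c6d5deb1002e945de5 arch/arm/mach-s3c2410/s3c2400-gpio.c
2#100644 8b2394e1ed4088c3b8d38e87e58bde2f38152bf7 arch/arm/mach-s3c2410/s3c2400.h
"""

for event in group_events(SAMPLE.splitlines()):
    print([entry.stage for entry in event], event[0].path)
```

The grouping never inspects the mode or the path, which is the point of 
the marker: it keeps working once rename detection puts different 
pathnames into a single event.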

(The example I show actually has a rename-with-modifications that was made 
on the first branch: notice that i2c-sis69x vs i2c-sis96x thing?)

I don't know exactly what the "after rename detection" output format would 
be, but it _might_ turn that

	...
	1#b889... i2c-sis69x
	3#b889... i2c-sis69x
	2 00a0... i2c-sis96x
	...

into one event:

	...
	1#b889... i2c-sis69x
	2#00a0... i2c-sis96x
	3#b889... i2c-sis69x
	...

and then the actual file-merge logic would have to merge the names as well 
as the file contents (and in this case, the final name would thus be 
"i2c-sis96x", since one branch hadn't changed it).

Hmm?
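
[ The name merge described above can be treated as a plain three-way 
  choice, the same rule one would apply to an unchanged-on-one-side file. 
  A hedged sketch: `merge_three_way` is a hypothetical helper, not 
  existing git code, and it simply reports a conflict (None) when both 
  branches changed the value in different ways. ]

```python
def merge_three_way(base, first, second):
    """Pick the merged value, or return None if both sides changed it differently."""
    if first == second:
        return first      # both branches agree
    if first == base:
        return second     # only the second branch changed it
    if second == base:
        return first      # only the first branch changed it
    return None           # conflicting changes


# Stage 1 is the common ancestor, 2 the first branch, 3 the second branch.
names = {1: "i2c-sis69x", 2: "i2c-sis96x", 3: "i2c-sis69x"}
print(merge_three_way(names[1], names[2], names[3]))  # prints i2c-sis96x
```

The same helper could be applied to the object names of the three stages, 
leaving a real content merge only for the case where it returns None.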

		Linus
