Git development
* Re: performance problem: "git commit filename"
From: Linus Torvalds @ 2008-01-14 21:00 UTC
  To: Junio C Hamano; +Cc: Git Mailing List, Kristian Høgsberg
In-Reply-To: <7vodbojhkj.fsf@gitster.siamese.dyndns.org>



On Mon, 14 Jan 2008, Junio C Hamano wrote:
> 
> If we are using different types anyway, we might want to start
> using time_t (a worse alternative is ulong which we use for
> timestamps everywhere else, which we probably want to convert to
> time_t as well).

Careful.

There are two issues, one trivial one and one important one:
 (a) trivially, right now, the code depends on the fact that the in-memory 
     structure is actually smaller than the on-disk one, to avoid having 
     to estimate the size of the allocation for the in-memory array. That 
     was a matter of getting a quickly working and efficient patch (we do 
     *not* want to allocate those initial "struct cache_entry" entries one 
     by one, we want to allocate one big block!)

     This should be pretty easy to fix up, by just taking the sizes and 
     number of entries (which we do know) into account of the initial 
     allocation. However, it's made a bit more interesting by the 
     differing alignment of the "name" part (and the fact that we align 
     each individual on-disk and in-memory structure).

 (b) More importantly, the on-disk structures DO NOT CONTAIN the whole 
     stat information! The classic example of this is "ce_size": it's 
     32-bit, but it works even if you have a file that is larger than 32 
     bits in size! It just means that from a stat comparison standpoint, 
     we only compare the low 32 bits!

     This means that if you make "ce_size" be a "loff_t", for example, you 
     still need to then *compare* it in just an "unsigned int",  because 
     the upper bits aren't zero - they are "nonexistent".

That (b) is important, and is why some of the code changed from

	-       if (ce->ce_ino != htonl(st->st_ino))
	+       if (ce->ce_ino != (unsigned int) st->st_ino)

ie note how this didn't just remove the "htonl()", it replaced it by a 
"truncate to 'unsigned int'"!

So the fact that the types aren't necessarily the "native" types is 
actually *important*.
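
The point about comparing only the stored bits can be illustrated with a minimal sketch. Everything named `demo_*` here is hypothetical and not taken from the patch; the sketch also assumes `unsigned int` is 32 bits wide, as it is on common platforms.

```c
/* Hypothetical stand-in for the few "struct stat" fields used here. */
struct demo_stat {
	unsigned long long ino;
	unsigned long long size;
};

/* Hypothetical in-memory entry: only the low 32 bits are ever stored. */
struct demo_entry {
	unsigned int ino;
	unsigned int size;
};

/* Record the stat data the way a 32-bit index field can hold it. */
static void demo_fill(struct demo_entry *e, const struct demo_stat *st)
{
	e->ino = (unsigned int) st->ino;
	e->size = (unsigned int) st->size;
}

/*
 * Return nonzero if the entry no longer matches the stat data.
 * The stat side is truncated before comparing, because the upper
 * bits were never stored and so cannot take part in the comparison.
 */
static int demo_changed(const struct demo_entry *e, const struct demo_stat *st)
{
	if (e->ino != (unsigned int) st->ino)
		return 1;
	if (e->size != (unsigned int) st->size)
		return 1;
	return 0;
}
```

With a file of size 0x100000007, the entry stores 7. A comparison without the cast would report the file as changed on every check, while the truncated comparison reports it as unchanged until the low 32 bits differ.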

> Is there still a reason to insist that ce_flags should be a
> single field that is multi-purposed for storing stage, namelen
> and other flags?  Wouldn't the code become even simpler and
> safer if we separated them into individual fields?  For example,
> a piece like this:

No reason for that part, except I wanted to make this particular initial 
patch be as minimal as possible.

> I somehow had this impression that it was a huge deal to you
> that we do not have to read and populate each cache entry when
> reading from the existing index file, and thought that was the
> reason why we mmap and access the fields in network byte order.
> If that was my misconception, then I agree this is a good change
> to make everything else easier to write and much less error
> prone.

I was a bit worried about it, but I did make sure that the allocation is 
done as one single allocation, and I did time it. Doing a 

	git update-index --refresh

seems to be identical before and after, so the costs of conversion are 
either very small or are possibly counteracted by the fact that we can 
then avoid the byte-order conversion of individual words at run-time.

		Linus
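
One way to size the single allocation discussed in (a), if the in-memory header ever became larger than the on-disk one, might look like the sketch below. It is only an illustration under stated assumptions: the `demo_*` names are hypothetical, the rounding mirrors the `(offset + len + 8) & ~7` form of `cache_entry_size()`, and `ondisk_bytes` means the bytes occupied by the entries alone, without the index header.

```c
#include <stddef.h>

/* Size of one entry whose name starts at name_offset, padded to 8 bytes. */
static size_t demo_entry_size(size_t name_offset, size_t namelen)
{
	return (name_offset + namelen + 8) & ~(size_t) 7;
}

/*
 * Upper bound on the bytes needed for nr in-memory entries, given the
 * bytes the same entries occupy on disk. Because both sizes are rounded
 * to a multiple of 8, one entry can grow by at most the header
 * difference rounded up to 8, whatever its name length is.
 */
static size_t demo_alloc_bound(size_t ondisk_bytes, size_t nr,
			       size_t disk_name_offset, size_t mem_name_offset)
{
	size_t growth = 0;

	if (mem_name_offset > disk_name_offset)
		growth = (mem_name_offset - disk_name_offset + 7) & ~(size_t) 7;
	return ondisk_bytes + nr * growth;
}
```

When the in-memory header is the smaller one, as in the patch, the bound collapses to `ondisk_bytes`, which matches allocating by the size of the index file.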


* Re: [PATCH] index: be careful when handling long names
From: Junio C Hamano @ 2008-01-14 21:03 UTC
  To: Alex Riesen; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <20080113233323.GB19970@steel.home>

Alex Riesen <raa.lkml@gmail.com> writes:

> Junio C Hamano, Mon, Jan 14, 2008 00:08:07 +0100:
> ...
>> I would agree that it might overflow the argument limit when
>> this is given to "echo", though.  We cannot do much about it,
>> but you may have cleverer ideas.
>
> I thought about conditionally disabling the test, like it was done
> when the tabs in filenames had to be tested.

Yes, we can and should do that when somebody reports this (or
any other test) as a real issue, as we have done for other tests.


* Re: [PATCH 1/2] parse_commit_buffer: don't parse invalid commits
From: Martin Koegler @ 2008-01-14 20:58 UTC
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vk5mclvk3.fsf@gitster.siamese.dyndns.org>

On Sun, Jan 13, 2008 at 11:23:40PM -0800, Junio C Hamano wrote:
> Martin Koegler <mkoegler@auto.tuwien.ac.at> writes:
> 
> > +	if (!parse_commit_date(bufptr, tail, &item->date))
> > +		return error("bogus commit date in object %s", sha1_to_hex(item->object.sha1));
> >  
> >  	if (track_object_refs) {
> >  		unsigned i = 0;
>
> I suspect this might be an undesirable regression.

You seem to have missed my reply to your last mail:
http://marc.info/?l=git&m=119969163624138&w=2

I asked you what you thought about this change, but got no
answer. Anyway, now I know your opinion ;-)

> If somebody managed to create a commit with a bogus "author"
> line and wanted to clean up the history, your previous one at
> least gave something usable back, even though it had to come up
> with a bogus date.  It gave the rest of the data back without
> barfing.  And it was easy to see which "resurrected" commit had
> a missing author date (bogus ones always gave 0 timestamp).
>
> This round you made it to error out, and callers that check the
> return value of parse_commit() would stop traversing the
> history, even if the commit in question has perfectly valid
> "parent " lines, thinking "ah, this commit object is faulty".
> It actively interferes with attempts to resurrect data from
> history that contains a faulty commit.

On the other hand, somebody could push such a commit out without
noticing it. Then it's difficult to get rid of the broken commit.
[Didn't a broken commit show up on pu recently?]

parse_commit_date is not very strict, so it is likely to miss such a
commit. Commit parsing is too common a function to slow down with
further checks.

> Your previous version was much better with respect to this
> issue.  It was about being more careful not to read outside the
> commit object buffer, while still allowing the data from a
> history that has an unfortunate commit with broken author line
> to be resurrected more easily.

I'll repost a version that reverts this change.

> I do not think the checks done by fsck and parse_commit should
> share the same strictness.  They serve different purposes.

Maybe I can improve fsck.

Regards, Martin Kögler


* Re: Git Cygwin - unable to create any repository - help!
From: Alex Riesen @ 2008-01-14 20:29 UTC
  To: Paul Umbers; +Cc: git
In-Reply-To: <a5eb9c330801140921m63b1b8a9pe67bf6f0d2e58dba@mail.gmail.com>

Paul Umbers, Mon, Jan 14, 2008 18:21:44 +0100:
> Trying to create a repository under the cygwin install of git, windows
> XP Pro. I can create the initial repository OK using "git init" and
> add files using "git add .", but when I come to commit I get the
> messages:
> 
> error: invalid object d9b06fceac52f6c24357e6a7f85c601088381152
> fatal: git-write-tree: error building trees

Is it a "text-mode" mount where your repository is to reside?


* Re: [PATCH] http-push: making HTTP push more robust and more user-friendly
From: Johannes Schindelin @ 2008-01-14 20:22 UTC
  To: Junio C Hamano; +Cc: Grégoire Barbier, git
In-Reply-To: <7v1w8kkxo7.fsf@gitster.siamese.dyndns.org>

Hi,

On Mon, 14 Jan 2008, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > On Sun, 13 Jan 2008, Junio C Hamano wrote:
> >
> >> The second one to add a couple of "goto cleanup" looked correct.  
> >> Acks, people?
> >
> > I haven't used http-push in ages, but there was a bug report with 
> > msysgit.  Hopefully that issue gets fixed by this patch.
> 
> Could you work with the reporter to see if this fixes the issue for him?

I wanted to try to reproduce first, but I definitely did not have enough 
time for git today.

Will try to find some time tomorrow,
Dscho


* Re: performance problem: "git commit filename"
From: Junio C Hamano @ 2008-01-14 20:08 UTC
  To: Linus Torvalds; +Cc: Git Mailing List, Kristian Høgsberg
In-Reply-To: <alpine.LFD.1.00.0801141132250.2806@woody.linux-foundation.org>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, 14 Jan 2008, Linus Torvalds wrote:
>> 
>> So I think this patch is good, but I think it would be even better if we 
>> just bit the bullet and started looking at having a different in-memory 
>> representation from the on-disk one.
>
> Ok, so here's a possible patch.
>
> It passes all the tests for me, and looks fairly ok, but it's also a bit 
> big.
>
> What makes it big is that I made the in-memory format be in host order, so 
> that we can remove a *lot* of the "htonl/ntohl" switcheroo, and do it just 
> on index file read/write.
>
> The nice thing about this patch is that it would make it a lot easier to 
> do any index handling changes,  because it makes a clear difference 
> between the on-disk and the in-memory formats.
>
> I realize that the patch looks big (195 lines inserted and 148 lines 
> removed), but *most* of the lines are literally those ntohl() 
> simplifications, ie stuff like
>
> 	-       if (S_ISGITLINK(ntohl(ce->ce_mode))) {
> 	+       if (S_ISGITLINK(ce->ce_mode)) {
>
> so while it adds lines (for the "convert from disk" and "convert to disk" 
> format conversions), in  many ways it really simplifies the source code 
> too.
>
> Comments?
>
> This is on top of current master, so it's *before* Junio's thing that adds 
> a CE_UPTODATE.
>
> With this, the high 16 bits of "ce_flags" are in-memory only, so you could 
> just make CE_UPTODATE be 0x10000, and it automatically ends up never being 
> written to disk (and always "reads" as zero).
>
> 		Linus

> diff --git a/cache.h b/cache.h
> index 39331c2..0aed11e 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -94,17 +94,31 @@ struct cache_time {
>   * We save the fields in big-endian order to allow using the
>   * index file over NFS transparently.
>   */
> +struct ondisk_cache_entry {
> +	struct cache_time ctime;
> +	struct cache_time mtime;
> +	unsigned int dev;
> +	unsigned int ino;
> +	unsigned int mode;
> +	unsigned int uid;
> +	unsigned int gid;
> +	unsigned int size;
> +	unsigned char sha1[20];
> +	unsigned short flags;
> +	char name[FLEX_ARRAY]; /* more */
> +};
> +
>  struct cache_entry {
> -	struct cache_time ce_ctime;
> -	struct cache_time ce_mtime;
> +	unsigned int ce_ctime;
> +	unsigned int ce_mtime;
>  	unsigned int ce_dev;
>  	unsigned int ce_ino;
>  	unsigned int ce_mode;
>  	unsigned int ce_uid;
>  	unsigned int ce_gid;
>  	unsigned int ce_size;
> +	unsigned int ce_flags;
>  	unsigned char sha1[20];
> -	unsigned short ce_flags;
>  	char name[FLEX_ARRAY]; /* more */
>  };

If we are using different types anyway, we might want to start
using time_t (a worse alternative is ulong which we use for
timestamps everywhere else, which we probably want to convert to
time_t as well).

Is there still a reason to insist that ce_flags should be a
single field that is multi-purposed for storing stage, namelen
and other flags?  Wouldn't the code become even simpler and
safer if we separated them into individual fields?  For example,
a piece like this:

@@ -2388,7 +2388,7 @@ static void add_index_file(const char *path, unsigned mode, void *buf, unsigned
 	ce = xcalloc(1, ce_size);
 	memcpy(ce->name, path, namelen);
 	ce->ce_mode = create_ce_mode(mode);
-	ce->ce_flags = htons(namelen);
+	ce->ce_flags = namelen;
 	if (S_ISGITLINK(mode)) {
 		const char *s = buf;
 
still has that "names longer than 4096 bytes go unchecked,
corrupting stage information" issue.

+static void convert_from_disk(struct ondisk_cache_entry *ondisk, struct cache_entry *ce)
+{
+	ce->ce_ctime = ntohl(ondisk->ctime.sec);
+	ce->ce_mtime = ntohl(ondisk->mtime.sec);
+	ce->ce_dev   = ntohl(ondisk->dev);
+	ce->ce_ino   = ntohl(ondisk->ino);
+	ce->ce_mode  = ntohl(ondisk->mode);
+	ce->ce_uid   = ntohl(ondisk->uid);
+	ce->ce_gid   = ntohl(ondisk->gid);
+	ce->ce_size  = ntohl(ondisk->size);
+	/* On-disk flags are just 16 bits */
+	ce->ce_flags = ntohs(ondisk->flags);
+	hashcpy(ce->sha1, ondisk->sha1);
+	memcpy(ce->name, ondisk->name, ce_namelen(ce)+1);
+}

I presume that the fix to handle names that are longer than 4096
bytes naturally fits here.  We can make the low 12 bits of
ondisk->flags all 1 for such names, and then actually
count the strlen to populate ce->ce_namelen.
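
A minimal sketch of that idea, assuming the 12-bit name-length field and the stage shift of 12 from cache.h, could look like this. The `DEMO_*` and `demo_*` names are hypothetical, and the real fix may well be structured differently.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_NAMEMASK   0x0fff	/* low 12 bits hold the name length */
#define DEMO_STAGESHIFT 12	/* stage bits sit just above them */

/*
 * Build the flags word without letting a long name spill into the
 * stage bits: any length that does not fit is stored as all ones.
 */
static unsigned int demo_create_flags(size_t len, unsigned int stage)
{
	unsigned int lenbits;

	lenbits = len < DEMO_NAMEMASK ? (unsigned int) len : DEMO_NAMEMASK;
	return lenbits | (stage << DEMO_STAGESHIFT);
}

/*
 * Recover the name length. An all-ones field means "too long to
 * store", so fall back to counting the NUL-terminated name.
 */
static size_t demo_namelen(unsigned int flags, const char *name)
{
	unsigned int lenbits = flags & DEMO_NAMEMASK;

	if (lenbits < DEMO_NAMEMASK)
		return lenbits;
	return strlen(name);
}
```

For a 4096-byte name, the unchecked `len | (stage << 12)` form sets bit 12 and turns stage 0 into stage 1, whereas the saturating form keeps the stage intact at the cost of one `strlen()` when the entry is read.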

> +	/*
> +	 * The disk format is actually larger than the in-memory format,
> +	 * due to space for nsec etc, so even though the in-memory one
> +	 * has room for a few  more flags, we can allocate using the same
> +	 * index size
> +	 */
> +	istate->alloc = xmalloc(mmap_size);
> +
> +	src_offset = sizeof(*hdr);
> +	dst_offset = 0;
>  	for (i = 0; i < istate->cache_nr; i++) {
> +		struct ondisk_cache_entry *disk_ce;
>  		struct cache_entry *ce;
>  
> -		ce = (struct cache_entry *)((char *)(istate->mmap) + offset);
> -		offset = offset + ce_size(ce);
> +		disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
> +		ce = (struct cache_entry *)((char *)istate->alloc + dst_offset);
> +		convert_from_disk(disk_ce, ce);
>  		istate->cache[i] = ce;
> +
> +		src_offset += ondisk_ce_size(ce);
> +		dst_offset += ce_size(ce);
>  	}
>  	istate->timestamp = st.st_mtime;

I somehow had this impression that it was a huge deal to you
that we do not have to read and populate each cache entry when
reading from the existing index file, and thought that was the
reason why we mmap and access the fields in network byte order.
If that was my misconception, then I agree this is a good change
to make everything else easier to write and much less error
prone.


* Re: [msysGit] Re: safecrlf not in 1.5.4
From: Junio C Hamano @ 2008-01-14 19:41 UTC
  To: Steffen Prohaska
  Cc: Johannes Schindelin, Mark Levedahl, Git Mailing List, msysGit
In-Reply-To: <E4FD5B11-F61A-4838-B9AD-1E6F6C2B0AD6@zib.de>

Steffen Prohaska <prohaska@zib.de> writes:

> On Jan 14, 2008, at 8:30 AM, Junio C Hamano wrote:
>
>> By definition of 'maint', 1.5.4.X are to fix bugs in the
>> features that are in 1.5.4, so the answer is no.
>
> I expected this answer.

And it won't change.

>> But we could end up having a short cycle for 1.5.5 if we agree
>> that the lack of crlf=safe is a severe bug that is worth fixing
>> post 1.5.4.
> ...
> So I should try harder to find better arguments.  But this has
> time until the 1.5.4 release is out.  For now, I am being quiet.

Instead, you could be louder and convince people that it is a
severe bug worth fixing before 1.5.4, like Linus did with the
issue with performance regression on a partial commit.  It's
entirely your choice.

> (Well, I'll continue to improve the safecrlf patch and most
> likely will send it to the list, too...)

Please do.  "I am currently not convinced" does not mean "I am
always right" nor "I won't reconsider".


* Re: performance problem: "git commit filename"
From: Linus Torvalds @ 2008-01-14 19:39 UTC
  To: Junio C Hamano; +Cc: Git Mailing List, Kristian Høgsberg
In-Reply-To: <alpine.LFD.1.00.0801140902140.2806@woody.linux-foundation.org>



On Mon, 14 Jan 2008, Linus Torvalds wrote:
> 
> So I think this patch is good, but I think it would be even better if we 
> just bit the bullet and started looking at having a different in-memory 
> representation from the on-disk one.

Ok, so here's a possible patch.

It passes all the tests for me, and looks fairly ok, but it's also a bit 
big.

What makes it big is that I made the in-memory format be in host order, so 
that we can remove a *lot* of the "htonl/ntohl" switcheroo, and do it just 
on index file read/write.

The nice thing about this patch is that it would make it a lot easier to 
do any index handling changes,  because it makes a clear difference 
between the on-disk and the in-memory formats.

I realize that the patch looks big (195 lines inserted and 148 lines 
removed), but *most* of the lines are literally those ntohl() 
simplifications, ie stuff like

	-       if (S_ISGITLINK(ntohl(ce->ce_mode))) {
	+       if (S_ISGITLINK(ce->ce_mode)) {

so while it adds lines (for the "convert from disk" and "convert to disk" 
format conversions), in  many ways it really simplifies the source code 
too.

Comments?

This is on top of current master, so it's *before* Junio's thing that adds 
a CE_UPTODATE.

With this, the high 16 bits of "ce_flags" are in-memory only, so you could 
just make CE_UPTODATE be 0x10000, and it automatically ends up never being 
written to disk (and always "reads" as zero).

		Linus
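
The remark about the high 16 bits of "ce_flags" can be shown with a small round trip. This is a sketch only: the `demo_*` helpers are hypothetical stand-ins for the flags part of the conversion functions, `DEMO_UPTODATE` uses the 0x10000 value suggested above, and `htons()`/`ntohs()` are assumed to come from the POSIX `<arpa/inet.h>` header.

```c
#include <arpa/inet.h>

#define DEMO_UPTODATE 0x10000	/* in-memory only: above the 16 on-disk bits */

/* Host-order, 32-bit in-memory flags to the 16-bit big-endian disk field. */
static unsigned short demo_flags_to_disk(unsigned int flags)
{
	return htons((unsigned short) (flags & 0xffff));
}

/* 16-bit big-endian disk field back to host-order in-memory flags. */
static unsigned int demo_flags_from_disk(unsigned short ondisk)
{
	return ntohs(ondisk);
}
```

Any bit at or above 0x10000 is dropped on the way to disk and comes back as zero, so such a flag needs no explicit clearing before the index is written.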

---
 builtin-apply.c        |   10 ++--
 builtin-blame.c        |    2 +-
 builtin-fsck.c         |    2 +-
 builtin-grep.c         |    4 +-
 builtin-ls-files.c     |   10 ++--
 builtin-read-tree.c    |    2 +-
 builtin-rerere.c       |    4 +-
 builtin-update-index.c |   18 +++---
 cache-tree.c           |    2 +-
 cache.h                |   41 +++++++----
 diff-lib.c             |   31 ++++-----
 dir.c                  |    2 +-
 entry.c                |    6 +-
 merge-index.c          |    2 +-
 merge-recursive.c      |    2 +-
 reachable.c            |    2 +-
 read-cache.c           |  183 ++++++++++++++++++++++++++++-------------------
 sha1_name.c            |    2 +-
 tree.c                 |    4 +-
 unpack-trees.c         |   14 ++--
 20 files changed, 195 insertions(+), 148 deletions(-)

diff --git a/builtin-apply.c b/builtin-apply.c
index d57bb6e..bd7cc37 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -1946,7 +1946,7 @@ static int read_file_or_gitlink(struct cache_entry *ce, struct strbuf *buf)
 	if (!ce)
 		return 0;
 
-	if (S_ISGITLINK(ntohl(ce->ce_mode))) {
+	if (S_ISGITLINK(ce->ce_mode)) {
 		strbuf_grow(buf, 100);
 		strbuf_addf(buf, "Subproject commit %s\n", sha1_to_hex(ce->sha1));
 	} else {
@@ -2023,7 +2023,7 @@ static int check_to_create_blob(const char *new_name, int ok_if_exists)
 
 static int verify_index_match(struct cache_entry *ce, struct stat *st)
 {
-	if (S_ISGITLINK(ntohl(ce->ce_mode))) {
+	if (S_ISGITLINK(ce->ce_mode)) {
 		if (!S_ISDIR(st->st_mode))
 			return -1;
 		return 0;
@@ -2082,12 +2082,12 @@ static int check_patch(struct patch *patch, struct patch *prev_patch)
 				return error("%s: does not match index",
 					     old_name);
 			if (cached)
-				st_mode = ntohl(ce->ce_mode);
+				st_mode = ce->ce_mode;
 		} else if (stat_ret < 0)
 			return error("%s: %s", old_name, strerror(errno));
 
 		if (!cached)
-			st_mode = ntohl(ce_mode_from_stat(ce, st.st_mode));
+			st_mode = ce_mode_from_stat(ce, st.st_mode);
 
 		if (patch->is_new < 0)
 			patch->is_new = 0;
@@ -2388,7 +2388,7 @@ static void add_index_file(const char *path, unsigned mode, void *buf, unsigned
 	ce = xcalloc(1, ce_size);
 	memcpy(ce->name, path, namelen);
 	ce->ce_mode = create_ce_mode(mode);
-	ce->ce_flags = htons(namelen);
+	ce->ce_flags = namelen;
 	if (S_ISGITLINK(mode)) {
 		const char *s = buf;
 
diff --git a/builtin-blame.c b/builtin-blame.c
index 9b4c02e..c7e6887 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -2092,7 +2092,7 @@ static struct commit *fake_working_tree_commit(const char *path, const char *con
 	if (!mode) {
 		int pos = cache_name_pos(path, len);
 		if (0 <= pos)
-			mode = ntohl(active_cache[pos]->ce_mode);
+			mode = active_cache[pos]->ce_mode;
 		else
 			/* Let's not bother reading from HEAD tree */
 			mode = S_IFREG | 0644;
diff --git a/builtin-fsck.c b/builtin-fsck.c
index e4874f6..8876d34 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -762,7 +762,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 			struct blob *blob;
 			struct object *obj;
 
-			mode = ntohl(active_cache[i]->ce_mode);
+			mode = active_cache[i]->ce_mode;
 			if (S_ISGITLINK(mode))
 				continue;
 			blob = lookup_blob(active_cache[i]->sha1);
diff --git a/builtin-grep.c b/builtin-grep.c
index 0d6cc73..9180b39 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -331,7 +331,7 @@ static int external_grep(struct grep_opt *opt, const char **paths, int cached)
 		struct cache_entry *ce = active_cache[i];
 		char *name;
 		int kept;
-		if (!S_ISREG(ntohl(ce->ce_mode)))
+		if (!S_ISREG(ce->ce_mode))
 			continue;
 		if (!pathspec_matches(paths, ce->name))
 			continue;
@@ -387,7 +387,7 @@ static int grep_cache(struct grep_opt *opt, const char **paths, int cached)
 
 	for (nr = 0; nr < active_nr; nr++) {
 		struct cache_entry *ce = active_cache[nr];
-		if (!S_ISREG(ntohl(ce->ce_mode)))
+		if (!S_ISREG(ce->ce_mode))
 			continue;
 		if (!pathspec_matches(paths, ce->name))
 			continue;
diff --git a/builtin-ls-files.c b/builtin-ls-files.c
index 0f0ab2d..d56e33e 100644
--- a/builtin-ls-files.c
+++ b/builtin-ls-files.c
@@ -189,7 +189,7 @@ static void show_ce_entry(const char *tag, struct cache_entry *ce)
 		return;
 
 	if (tag && *tag && show_valid_bit &&
-	    (ce->ce_flags & htons(CE_VALID))) {
+	    (ce->ce_flags & CE_VALID)) {
 		static char alttag[4];
 		memcpy(alttag, tag, 3);
 		if (isalpha(tag[0]))
@@ -210,7 +210,7 @@ static void show_ce_entry(const char *tag, struct cache_entry *ce)
 	} else {
 		printf("%s%06o %s %d\t",
 		       tag,
-		       ntohl(ce->ce_mode),
+		       ce->ce_mode,
 		       abbrev ? find_unique_abbrev(ce->sha1,abbrev)
 				: sha1_to_hex(ce->sha1),
 		       ce_stage(ce));
@@ -242,7 +242,7 @@ static void show_files(struct dir_struct *dir, const char *prefix)
 				continue;
 			if (show_unmerged && !ce_stage(ce))
 				continue;
-			if (ce->ce_flags & htons(CE_UPDATE))
+			if (ce->ce_flags & CE_UPDATE)
 				continue;
 			show_ce_entry(ce_stage(ce) ? tag_unmerged : tag_cached, ce);
 		}
@@ -350,7 +350,7 @@ void overlay_tree_on_cache(const char *tree_name, const char *prefix)
 		struct cache_entry *ce = active_cache[i];
 		if (!ce_stage(ce))
 			continue;
-		ce->ce_flags |= htons(CE_STAGEMASK);
+		ce->ce_flags |= CE_STAGEMASK;
 	}
 
 	if (prefix) {
@@ -379,7 +379,7 @@ void overlay_tree_on_cache(const char *tree_name, const char *prefix)
 			 */
 			if (last_stage0 &&
 			    !strcmp(last_stage0->name, ce->name))
-				ce->ce_flags |= htons(CE_UPDATE);
+				ce->ce_flags |= CE_UPDATE;
 		}
 	}
 }
diff --git a/builtin-read-tree.c b/builtin-read-tree.c
index 43cd56a..eb879e1 100644
--- a/builtin-read-tree.c
+++ b/builtin-read-tree.c
@@ -46,7 +46,7 @@ static int read_cache_unmerged(void)
 			cache_tree_invalidate_path(active_cache_tree, ce->name);
 			last = ce;
 			ce->ce_mode = 0;
-			ce->ce_flags &= ~htons(CE_STAGEMASK);
+			ce->ce_flags &= ~CE_STAGEMASK;
 		}
 		*dst++ = ce;
 	}
diff --git a/builtin-rerere.c b/builtin-rerere.c
index 37e6248..df1a55d 100644
--- a/builtin-rerere.c
+++ b/builtin-rerere.c
@@ -149,8 +149,8 @@ static int find_conflict(struct path_list *conflict)
 		if (ce_stage(e2) == 2 &&
 		    ce_stage(e3) == 3 &&
 		    ce_same_name(e2, e3) &&
-		    S_ISREG(ntohl(e2->ce_mode)) &&
-		    S_ISREG(ntohl(e3->ce_mode))) {
+		    S_ISREG(e2->ce_mode) &&
+		    S_ISREG(e3->ce_mode)) {
 			path_list_insert((const char *)e2->name, conflict);
 			i++; /* skip over both #2 and #3 */
 		}
diff --git a/builtin-update-index.c b/builtin-update-index.c
index e1a938d..b5a34ed 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -47,10 +47,10 @@ static int mark_valid(const char *path)
 	if (0 <= pos) {
 		switch (mark_valid_only) {
 		case MARK_VALID:
-			active_cache[pos]->ce_flags |= htons(CE_VALID);
+			active_cache[pos]->ce_flags |= CE_VALID;
 			break;
 		case UNMARK_VALID:
-			active_cache[pos]->ce_flags &= ~htons(CE_VALID);
+			active_cache[pos]->ce_flags &= ~CE_VALID;
 			break;
 		}
 		cache_tree_invalidate_path(active_cache_tree, path);
@@ -95,7 +95,7 @@ static int add_one_path(struct cache_entry *old, const char *path, int len, stru
 	size = cache_entry_size(len);
 	ce = xcalloc(1, size);
 	memcpy(ce->name, path, len);
-	ce->ce_flags = htons(len);
+	ce->ce_flags = len;
 	fill_stat_cache_info(ce, st);
 	ce->ce_mode = ce_mode_from_stat(old, st->st_mode);
 
@@ -139,7 +139,7 @@ static int process_directory(const char *path, int len, struct stat *st)
 	/* Exact match: file or existing gitlink */
 	if (pos >= 0) {
 		struct cache_entry *ce = active_cache[pos];
-		if (S_ISGITLINK(ntohl(ce->ce_mode))) {
+		if (S_ISGITLINK(ce->ce_mode)) {
 
 			/* Do nothing to the index if there is no HEAD! */
 			if (resolve_gitlink_ref(path, "HEAD", sha1) < 0)
@@ -183,7 +183,7 @@ static int process_file(const char *path, int len, struct stat *st)
 	int pos = cache_name_pos(path, len);
 	struct cache_entry *ce = pos < 0 ? NULL : active_cache[pos];
 
-	if (ce && S_ISGITLINK(ntohl(ce->ce_mode)))
+	if (ce && S_ISGITLINK(ce->ce_mode))
 		return error("%s is already a gitlink, not replacing", path);
 
 	return add_one_path(ce, path, len, st);
@@ -226,7 +226,7 @@ static int add_cacheinfo(unsigned int mode, const unsigned char *sha1,
 	ce->ce_flags = create_ce_flags(len, stage);
 	ce->ce_mode = create_ce_mode(mode);
 	if (assume_unchanged)
-		ce->ce_flags |= htons(CE_VALID);
+		ce->ce_flags |= CE_VALID;
 	option = allow_add ? ADD_CACHE_OK_TO_ADD : 0;
 	option |= allow_replace ? ADD_CACHE_OK_TO_REPLACE : 0;
 	if (add_cache_entry(ce, option))
@@ -246,14 +246,14 @@ static void chmod_path(int flip, const char *path)
 	if (pos < 0)
 		goto fail;
 	ce = active_cache[pos];
-	mode = ntohl(ce->ce_mode);
+	mode = ce->ce_mode;
 	if (!S_ISREG(mode))
 		goto fail;
 	switch (flip) {
 	case '+':
-		ce->ce_mode |= htonl(0111); break;
+		ce->ce_mode |= 0111; break;
 	case '-':
-		ce->ce_mode &= htonl(~0111); break;
+		ce->ce_mode &= ~0111; break;
 	default:
 		goto fail;
 	}
diff --git a/cache-tree.c b/cache-tree.c
index 50b3526..3ef5f87 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -320,7 +320,7 @@ static int update_one(struct cache_tree *it,
 		}
 		else {
 			sha1 = ce->sha1;
-			mode = ntohl(ce->ce_mode);
+			mode = ce->ce_mode;
 			entlen = pathlen - baselen;
 		}
 		if (mode != S_IFGITLINK && !missing_ok && !has_sha1_file(sha1))
diff --git a/cache.h b/cache.h
index 39331c2..0aed11e 100644
--- a/cache.h
+++ b/cache.h
@@ -94,17 +94,31 @@ struct cache_time {
  * We save the fields in big-endian order to allow using the
  * index file over NFS transparently.
  */
+struct ondisk_cache_entry {
+	struct cache_time ctime;
+	struct cache_time mtime;
+	unsigned int dev;
+	unsigned int ino;
+	unsigned int mode;
+	unsigned int uid;
+	unsigned int gid;
+	unsigned int size;
+	unsigned char sha1[20];
+	unsigned short flags;
+	char name[FLEX_ARRAY]; /* more */
+};
+
 struct cache_entry {
-	struct cache_time ce_ctime;
-	struct cache_time ce_mtime;
+	unsigned int ce_ctime;
+	unsigned int ce_mtime;
 	unsigned int ce_dev;
 	unsigned int ce_ino;
 	unsigned int ce_mode;
 	unsigned int ce_uid;
 	unsigned int ce_gid;
 	unsigned int ce_size;
+	unsigned int ce_flags;
 	unsigned char sha1[20];
-	unsigned short ce_flags;
 	char name[FLEX_ARRAY]; /* more */
 };
 
@@ -114,28 +128,29 @@ struct cache_entry {
 #define CE_VALID     (0x8000)
 #define CE_STAGESHIFT 12
 
-#define create_ce_flags(len, stage) htons((len) | ((stage) << CE_STAGESHIFT))
-#define ce_namelen(ce) (CE_NAMEMASK & ntohs((ce)->ce_flags))
+#define create_ce_flags(len, stage) ((len) | ((stage) << CE_STAGESHIFT))
+#define ce_namelen(ce) (CE_NAMEMASK & (ce)->ce_flags)
 #define ce_size(ce) cache_entry_size(ce_namelen(ce))
-#define ce_stage(ce) ((CE_STAGEMASK & ntohs((ce)->ce_flags)) >> CE_STAGESHIFT)
+#define ondisk_ce_size(ce) ondisk_cache_entry_size(ce_namelen(ce))
+#define ce_stage(ce) ((CE_STAGEMASK & (ce)->ce_flags) >> CE_STAGESHIFT)
 
 #define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644)
 static inline unsigned int create_ce_mode(unsigned int mode)
 {
 	if (S_ISLNK(mode))
-		return htonl(S_IFLNK);
+		return S_IFLNK;
 	if (S_ISDIR(mode) || S_ISGITLINK(mode))
-		return htonl(S_IFGITLINK);
-	return htonl(S_IFREG | ce_permissions(mode));
+		return S_IFGITLINK;
+	return S_IFREG | ce_permissions(mode);
 }
 static inline unsigned int ce_mode_from_stat(struct cache_entry *ce, unsigned int mode)
 {
 	extern int trust_executable_bit, has_symlinks;
 	if (!has_symlinks && S_ISREG(mode) &&
-	    ce && S_ISLNK(ntohl(ce->ce_mode)))
+	    ce && S_ISLNK(ce->ce_mode))
 		return ce->ce_mode;
 	if (!trust_executable_bit && S_ISREG(mode)) {
-		if (ce && S_ISREG(ntohl(ce->ce_mode)))
+		if (ce && S_ISREG(ce->ce_mode))
 			return ce->ce_mode;
 		return create_ce_mode(0666);
 	}
@@ -146,14 +161,14 @@ static inline unsigned int ce_mode_from_stat(struct cache_entry *ce, unsigned in
 	S_ISLNK(mode) ? S_IFLNK : S_ISDIR(mode) ? S_IFDIR : S_IFGITLINK)
 
 #define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
+#define ondisk_cache_entry_size(len) ((offsetof(struct ondisk_cache_entry,name) + (len) + 8) & ~7)
 
 struct index_state {
 	struct cache_entry **cache;
 	unsigned int cache_nr, cache_alloc, cache_changed;
 	struct cache_tree *cache_tree;
 	time_t timestamp;
-	void *mmap;
-	size_t mmap_size;
+	void *alloc;
 };
 
 extern struct index_state the_index;
diff --git a/diff-lib.c b/diff-lib.c
index d85d8f3..c20adaa 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -37,7 +37,7 @@ static int get_mode(const char *path, int *mode)
 	if (!path || !strcmp(path, "/dev/null"))
 		*mode = 0;
 	else if (!strcmp(path, "-"))
-		*mode = ntohl(create_ce_mode(0666));
+		*mode = create_ce_mode(0666);
 	else if (stat(path, &st))
 		return error("Could not access '%s'", path);
 	else
@@ -384,7 +384,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 					continue;
 			}
 			else
-				dpath->mode = ntohl(ce_mode_from_stat(ce, st.st_mode));
+				dpath->mode = ce_mode_from_stat(ce, st.st_mode);
 
 			while (i < entries) {
 				struct cache_entry *nce = active_cache[i];
@@ -398,10 +398,10 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 				 */
 				stage = ce_stage(nce);
 				if (2 <= stage) {
-					int mode = ntohl(nce->ce_mode);
+					int mode = nce->ce_mode;
 					num_compare_stages++;
 					hashcpy(dpath->parent[stage-2].sha1, nce->sha1);
-					dpath->parent[stage-2].mode = ntohl(ce_mode_from_stat(nce, mode));
+					dpath->parent[stage-2].mode = ce_mode_from_stat(nce, mode);
 					dpath->parent[stage-2].status =
 						DIFF_STATUS_MODIFIED;
 				}
@@ -442,15 +442,15 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			}
 			if (silent_on_removed)
 				continue;
-			diff_addremove(&revs->diffopt, '-', ntohl(ce->ce_mode),
+			diff_addremove(&revs->diffopt, '-', ce->ce_mode,
 				       ce->sha1, ce->name, NULL);
 			continue;
 		}
 		changed = ce_match_stat(ce, &st, ce_option);
 		if (!changed && !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
 			continue;
-		oldmode = ntohl(ce->ce_mode);
-		newmode = ntohl(ce_mode_from_stat(ce, st.st_mode));
+		oldmode = ce->ce_mode;
+		newmode = ce_mode_from_stat(ce, st.st_mode);
 		diff_change(&revs->diffopt, oldmode, newmode,
 			    ce->sha1, (changed ? null_sha1 : ce->sha1),
 			    ce->name, NULL);
@@ -471,7 +471,7 @@ static void diff_index_show_file(struct rev_info *revs,
 				 struct cache_entry *ce,
 				 unsigned char *sha1, unsigned int mode)
 {
-	diff_addremove(&revs->diffopt, prefix[0], ntohl(mode),
+	diff_addremove(&revs->diffopt, prefix[0], mode,
 		       sha1, ce->name, NULL);
 }
 
@@ -550,14 +550,14 @@ static int show_modified(struct rev_info *revs,
 		p->len = pathlen;
 		memcpy(p->path, new->name, pathlen);
 		p->path[pathlen] = 0;
-		p->mode = ntohl(mode);
+		p->mode = mode;
 		hashclr(p->sha1);
 		memset(p->parent, 0, 2 * sizeof(struct combine_diff_parent));
 		p->parent[0].status = DIFF_STATUS_MODIFIED;
-		p->parent[0].mode = ntohl(new->ce_mode);
+		p->parent[0].mode = new->ce_mode;
 		hashcpy(p->parent[0].sha1, new->sha1);
 		p->parent[1].status = DIFF_STATUS_MODIFIED;
-		p->parent[1].mode = ntohl(old->ce_mode);
+		p->parent[1].mode = old->ce_mode;
 		hashcpy(p->parent[1].sha1, old->sha1);
 		show_combined_diff(p, 2, revs->dense_combined_merges, revs);
 		free(p);
@@ -569,9 +569,6 @@ static int show_modified(struct rev_info *revs,
 	    !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
 		return 0;
 
-	mode = ntohl(mode);
-	oldmode = ntohl(oldmode);
-
 	diff_change(&revs->diffopt, oldmode, mode,
 		    old->sha1, sha1, old->name, NULL);
 	return 0;
@@ -628,7 +625,7 @@ static int diff_cache(struct rev_info *revs,
 					   cached, match_missing))
 				break;
 			diff_unmerge(&revs->diffopt, ce->name,
-				     ntohl(ce->ce_mode), ce->sha1);
+				     ce->ce_mode, ce->sha1);
 			break;
 		case 3:
 			diff_unmerge(&revs->diffopt, ce->name,
@@ -664,7 +661,7 @@ static void mark_merge_entries(void)
 		struct cache_entry *ce = active_cache[i];
 		if (!ce_stage(ce))
 			continue;
-		ce->ce_flags |= htons(CE_STAGEMASK);
+		ce->ce_flags |= CE_STAGEMASK;
 	}
 }
 
@@ -723,7 +720,7 @@ int do_diff_cache(const unsigned char *tree_sha1, struct diff_options *opt)
 						   ce->name);
 			last = ce;
 			ce->ce_mode = 0;
-			ce->ce_flags &= ~htons(CE_STAGEMASK);
+			ce->ce_flags &= ~CE_STAGEMASK;
 		}
 		*dst++ = ce;
 	}
diff --git a/dir.c b/dir.c
index 3e345c2..1b9cc7a 100644
--- a/dir.c
+++ b/dir.c
@@ -391,7 +391,7 @@ static enum exist_status directory_exists_in_index(const char *dirname, int len)
 			break;
 		if (endchar == '/')
 			return index_directory;
-		if (!endchar && S_ISGITLINK(ntohl(ce->ce_mode)))
+		if (!endchar && S_ISGITLINK(ce->ce_mode))
 			return index_gitdir;
 	}
 	return index_nonexistent;
diff --git a/entry.c b/entry.c
index 257ab46..44f4b89 100644
--- a/entry.c
+++ b/entry.c
@@ -103,7 +103,7 @@ static int write_entry(struct cache_entry *ce, char *path, const struct checkout
 	int fd;
 	long wrote;
 
-	switch (ntohl(ce->ce_mode) & S_IFMT) {
+	switch (ce->ce_mode & S_IFMT) {
 		char *new;
 		struct strbuf buf;
 		unsigned long size;
@@ -129,7 +129,7 @@ static int write_entry(struct cache_entry *ce, char *path, const struct checkout
 			strcpy(path, ".merge_file_XXXXXX");
 			fd = mkstemp(path);
 		} else
-			fd = create_file(path, ntohl(ce->ce_mode));
+			fd = create_file(path, ce->ce_mode);
 		if (fd < 0) {
 			free(new);
 			return error("git-checkout-index: unable to create file %s (%s)",
@@ -221,7 +221,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
 		unlink(path);
 		if (S_ISDIR(st.st_mode)) {
 			/* If it is a gitlink, leave it alone! */
-			if (S_ISGITLINK(ntohl(ce->ce_mode)))
+			if (S_ISGITLINK(ce->ce_mode))
 				return 0;
 			if (!state->force)
 				return error("%s is a directory", path);
diff --git a/merge-index.c b/merge-index.c
index fa719cb..bbb700b 100644
--- a/merge-index.c
+++ b/merge-index.c
@@ -48,7 +48,7 @@ static int merge_entry(int pos, const char *path)
 			break;
 		found++;
 		strcpy(hexbuf[stage], sha1_to_hex(ce->sha1));
-		sprintf(ownbuf[stage], "%o", ntohl(ce->ce_mode));
+		sprintf(ownbuf[stage], "%o", ce->ce_mode);
 		arguments[stage] = hexbuf[stage];
 		arguments[stage + 4] = ownbuf[stage];
 	} while (++pos < active_nr);
diff --git a/merge-recursive.c b/merge-recursive.c
index b34177d..0db0b3a 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -333,7 +333,7 @@ static struct path_list *get_unmerged(void)
 			item->util = xcalloc(1, sizeof(struct stage_data));
 		}
 		e = item->util;
-		e->stages[ce_stage(ce)].mode = ntohl(ce->ce_mode);
+		e->stages[ce_stage(ce)].mode = ce->ce_mode;
 		hashcpy(e->stages[ce_stage(ce)].sha, ce->sha1);
 	}
 
diff --git a/reachable.c b/reachable.c
index 6383401..00f289f 100644
--- a/reachable.c
+++ b/reachable.c
@@ -176,7 +176,7 @@ static void add_cache_refs(struct rev_info *revs)
 		 * lookup_blob() on them, to avoid populating the hash table
 		 * with invalid information
 		 */
-		if (S_ISGITLINK(ntohl(active_cache[i]->ce_mode)))
+		if (S_ISGITLINK(active_cache[i]->ce_mode))
 			continue;
 
 		lookup_blob(active_cache[i]->sha1);
diff --git a/read-cache.c b/read-cache.c
index 7db5588..4414a40 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -30,20 +30,16 @@ struct index_state the_index;
  */
 void fill_stat_cache_info(struct cache_entry *ce, struct stat *st)
 {
-	ce->ce_ctime.sec = htonl(st->st_ctime);
-	ce->ce_mtime.sec = htonl(st->st_mtime);
-#ifdef USE_NSEC
-	ce->ce_ctime.nsec = htonl(st->st_ctim.tv_nsec);
-	ce->ce_mtime.nsec = htonl(st->st_mtim.tv_nsec);
-#endif
-	ce->ce_dev = htonl(st->st_dev);
-	ce->ce_ino = htonl(st->st_ino);
-	ce->ce_uid = htonl(st->st_uid);
-	ce->ce_gid = htonl(st->st_gid);
-	ce->ce_size = htonl(st->st_size);
+	ce->ce_ctime = st->st_ctime;
+	ce->ce_mtime = st->st_mtime;
+	ce->ce_dev = st->st_dev;
+	ce->ce_ino = st->st_ino;
+	ce->ce_uid = st->st_uid;
+	ce->ce_gid = st->st_gid;
+	ce->ce_size = st->st_size;
 
 	if (assume_unchanged)
-		ce->ce_flags |= htons(CE_VALID);
+		ce->ce_flags |= CE_VALID;
 }
 
 static int ce_compare_data(struct cache_entry *ce, struct stat *st)
@@ -116,7 +112,7 @@ static int ce_modified_check_fs(struct cache_entry *ce, struct stat *st)
 			return DATA_CHANGED;
 		break;
 	case S_IFDIR:
-		if (S_ISGITLINK(ntohl(ce->ce_mode)))
+		if (S_ISGITLINK(ce->ce_mode))
 			return 0;
 	default:
 		return TYPE_CHANGED;
@@ -128,14 +124,14 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
 {
 	unsigned int changed = 0;
 
-	switch (ntohl(ce->ce_mode) & S_IFMT) {
+	switch (ce->ce_mode & S_IFMT) {
 	case S_IFREG:
 		changed |= !S_ISREG(st->st_mode) ? TYPE_CHANGED : 0;
 		/* We consider only the owner x bit to be relevant for
 		 * "mode changes"
 		 */
 		if (trust_executable_bit &&
-		    (0100 & (ntohl(ce->ce_mode) ^ st->st_mode)))
+		    (0100 & (ce->ce_mode ^ st->st_mode)))
 			changed |= MODE_CHANGED;
 		break;
 	case S_IFLNK:
@@ -152,29 +148,17 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
 	case 0: /* Special case: unmerged file in index */
 		return MODE_CHANGED | DATA_CHANGED | TYPE_CHANGED;
 	default:
-		die("internal error: ce_mode is %o", ntohl(ce->ce_mode));
+		die("internal error: ce_mode is %o", ce->ce_mode);
 	}
-	if (ce->ce_mtime.sec != htonl(st->st_mtime))
+	if (ce->ce_mtime != (unsigned int) st->st_mtime)
 		changed |= MTIME_CHANGED;
-	if (ce->ce_ctime.sec != htonl(st->st_ctime))
+	if (ce->ce_ctime != (unsigned int) st->st_ctime)
 		changed |= CTIME_CHANGED;
 
-#ifdef USE_NSEC
-	/*
-	 * nsec seems unreliable - not all filesystems support it, so
-	 * as long as it is in the inode cache you get right nsec
-	 * but after it gets flushed, you get zero nsec.
-	 */
-	if (ce->ce_mtime.nsec != htonl(st->st_mtim.tv_nsec))
-		changed |= MTIME_CHANGED;
-	if (ce->ce_ctime.nsec != htonl(st->st_ctim.tv_nsec))
-		changed |= CTIME_CHANGED;
-#endif
-
-	if (ce->ce_uid != htonl(st->st_uid) ||
-	    ce->ce_gid != htonl(st->st_gid))
+	if (ce->ce_uid != (unsigned int) st->st_uid ||
+	    ce->ce_gid != (unsigned int) st->st_gid)
 		changed |= OWNER_CHANGED;
-	if (ce->ce_ino != htonl(st->st_ino))
+	if (ce->ce_ino != (unsigned int) st->st_ino)
 		changed |= INODE_CHANGED;
 
 #ifdef USE_STDEV
@@ -183,11 +167,11 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
 	 * clients will have different views of what "device"
 	 * the filesystem is on
 	 */
-	if (ce->ce_dev != htonl(st->st_dev))
+	if (ce->ce_dev != (unsigned int) st->st_dev)
 		changed |= INODE_CHANGED;
 #endif
 
-	if (ce->ce_size != htonl(st->st_size))
+	if (ce->ce_size != (unsigned int) st->st_size)
 		changed |= DATA_CHANGED;
 
 	return changed;
@@ -205,7 +189,7 @@ int ie_match_stat(struct index_state *istate,
 	 * If it's marked as always valid in the index, it's
 	 * valid whatever the checked-out copy says.
 	 */
-	if (!ignore_valid && (ce->ce_flags & htons(CE_VALID)))
+	if (!ignore_valid && (ce->ce_flags & CE_VALID))
 		return 0;
 
 	changed = ce_match_stat_basic(ce, st);
@@ -228,7 +212,7 @@ int ie_match_stat(struct index_state *istate,
 	 */
 	if (!changed &&
 	    istate->timestamp &&
-	    istate->timestamp <= ntohl(ce->ce_mtime.sec)) {
+	    istate->timestamp <= ce->ce_mtime) {
 		if (assume_racy_is_modified)
 			changed |= DATA_CHANGED;
 		else
@@ -320,7 +304,7 @@ int index_name_pos(struct index_state *istate, const char *name, int namelen)
 	while (last > first) {
 		int next = (last + first) >> 1;
 		struct cache_entry *ce = istate->cache[next];
-		int cmp = cache_name_compare(name, namelen, ce->name, ntohs(ce->ce_flags));
+		int cmp = cache_name_compare(name, namelen, ce->name, ce->ce_flags);
 		if (!cmp)
 			return next;
 		if (cmp < 0) {
@@ -405,7 +389,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 	size = cache_entry_size(namelen);
 	ce = xcalloc(1, size);
 	memcpy(ce->name, path, namelen);
-	ce->ce_flags = htons(namelen);
+	ce->ce_flags = namelen;
 	fill_stat_cache_info(ce, &st);
 
 	if (trust_executable_bit && has_symlinks)
@@ -616,7 +600,7 @@ static int has_dir_name(struct index_state *istate,
 		}
 		len = slash - name;
 
-		pos = index_name_pos(istate, name, ntohs(create_ce_flags(len, stage)));
+		pos = index_name_pos(istate, name, create_ce_flags(len, stage));
 		if (pos >= 0) {
 			/*
 			 * Found one, but not so fast.  This could
@@ -704,7 +688,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
 
 	cache_tree_invalidate_path(istate->cache_tree, ce->name);
-	pos = index_name_pos(istate, ce->name, ntohs(ce->ce_flags));
+	pos = index_name_pos(istate, ce->name, ce->ce_flags);
 
 	/* existing match? Just replace it. */
 	if (pos >= 0) {
@@ -736,7 +720,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 		if (!ok_to_replace)
 			return error("'%s' appears as both a file and as a directory",
 				     ce->name);
-		pos = index_name_pos(istate, ce->name, ntohs(ce->ce_flags));
+		pos = index_name_pos(istate, ce->name, ce->ce_flags);
 		pos = -pos-1;
 	}
 	return pos + 1;
@@ -810,7 +794,7 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 		 * valid again, under "assume unchanged" mode.
 		 */
 		if (ignore_valid && assume_unchanged &&
-		    !(ce->ce_flags & htons(CE_VALID)))
+		    !(ce->ce_flags & CE_VALID))
 			; /* mark this one VALID again */
 		else
 			return ce;
@@ -826,7 +810,6 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 	updated = xmalloc(size);
 	memcpy(updated, ce, size);
 	fill_stat_cache_info(updated, &st);
-
 	/*
 	 * If ignore_valid is not set, we should leave CE_VALID bit
 	 * alone.  Otherwise, paths marked with --no-assume-unchanged
@@ -834,8 +817,8 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 	 * automatically, which is not really what we want.
 	 */
 	if (!ignore_valid && assume_unchanged &&
-	    !(ce->ce_flags & htons(CE_VALID)))
-		updated->ce_flags &= ~htons(CE_VALID);
+	    !(ce->ce_flags & CE_VALID))
+		updated->ce_flags &= ~CE_VALID;
 
 	return updated;
 }
@@ -880,7 +863,7 @@ int refresh_index(struct index_state *istate, unsigned int flags, const char **p
 				/* If we are doing --really-refresh that
 				 * means the index is not valid anymore.
 				 */
-				ce->ce_flags &= ~htons(CE_VALID);
+				ce->ce_flags &= ~CE_VALID;
 				istate->cache_changed = 1;
 			}
 			if (quiet)
@@ -942,16 +925,34 @@ int read_index(struct index_state *istate)
 	return read_index_from(istate, get_index_file());
 }
 
+static void convert_from_disk(struct ondisk_cache_entry *ondisk, struct cache_entry *ce)
+{
+	ce->ce_ctime = ntohl(ondisk->ctime.sec);
+	ce->ce_mtime = ntohl(ondisk->mtime.sec);
+	ce->ce_dev   = ntohl(ondisk->dev);
+	ce->ce_ino   = ntohl(ondisk->ino);
+	ce->ce_mode  = ntohl(ondisk->mode);
+	ce->ce_uid   = ntohl(ondisk->uid);
+	ce->ce_gid   = ntohl(ondisk->gid);
+	ce->ce_size  = ntohl(ondisk->size);
+	/* On-disk flags are just 16 bits */
+	ce->ce_flags = ntohs(ondisk->flags);
+	hashcpy(ce->sha1, ondisk->sha1);
+	memcpy(ce->name, ondisk->name, ce_namelen(ce)+1);
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int read_index_from(struct index_state *istate, const char *path)
 {
 	int fd, i;
 	struct stat st;
-	unsigned long offset;
+	unsigned long src_offset, dst_offset;
 	struct cache_header *hdr;
+	void *mmap;
+	size_t mmap_size;
 
 	errno = EBUSY;
-	if (istate->mmap)
+	if (istate->alloc)
 		return istate->cache_nr;
 
 	errno = ENOENT;
@@ -967,31 +968,47 @@ int read_index_from(struct index_state *istate, const char *path)
 		die("cannot stat the open index (%s)", strerror(errno));
 
 	errno = EINVAL;
-	istate->mmap_size = xsize_t(st.st_size);
-	if (istate->mmap_size < sizeof(struct cache_header) + 20)
+	mmap_size = xsize_t(st.st_size);
+	if (mmap_size < sizeof(struct cache_header) + 20)
 		die("index file smaller than expected");
 
-	istate->mmap = xmmap(NULL, istate->mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+	mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
 	close(fd);
+	if (mmap == MAP_FAILED)
+		die("unable to map index file");
 
-	hdr = istate->mmap;
-	if (verify_hdr(hdr, istate->mmap_size) < 0)
+	hdr = mmap;
+	if (verify_hdr(hdr, mmap_size) < 0)
 		goto unmap;
 
 	istate->cache_nr = ntohl(hdr->hdr_entries);
 	istate->cache_alloc = alloc_nr(istate->cache_nr);
 	istate->cache = xcalloc(istate->cache_alloc, sizeof(struct cache_entry *));
 
-	offset = sizeof(*hdr);
+	/*
+	 * The disk format is actually larger than the in-memory format,
+	 * due to space for nsec etc, so even though the in-memory one
+	 * has room for a few  more flags, we can allocate using the same
+	 * index size
+	 */
+	istate->alloc = xmalloc(mmap_size);
+
+	src_offset = sizeof(*hdr);
+	dst_offset = 0;
 	for (i = 0; i < istate->cache_nr; i++) {
+		struct ondisk_cache_entry *disk_ce;
 		struct cache_entry *ce;
 
-		ce = (struct cache_entry *)((char *)(istate->mmap) + offset);
-		offset = offset + ce_size(ce);
+		disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
+		ce = (struct cache_entry *)((char *)istate->alloc + dst_offset);
+		convert_from_disk(disk_ce, ce);
 		istate->cache[i] = ce;
+
+		src_offset += ondisk_ce_size(ce);
+		dst_offset += ce_size(ce);
 	}
 	istate->timestamp = st.st_mtime;
-	while (offset <= istate->mmap_size - 20 - 8) {
+	while (src_offset <= mmap_size - 20 - 8) {
 		/* After an array of active_nr index entries,
 		 * there can be arbitrary number of extended
 		 * sections, each of which is prefixed with
@@ -999,40 +1016,36 @@ int read_index_from(struct index_state *istate, const char *path)
 		 * in 4-byte network byte order.
 		 */
 		unsigned long extsize;
-		memcpy(&extsize, (char *)(istate->mmap) + offset + 4, 4);
+		memcpy(&extsize, (char *)mmap + src_offset + 4, 4);
 		extsize = ntohl(extsize);
 		if (read_index_extension(istate,
-					 ((const char *) (istate->mmap)) + offset,
-					 (char *) (istate->mmap) + offset + 8,
+					 (const char *) mmap + src_offset,
+					 (char *) mmap + src_offset + 8,
 					 extsize) < 0)
 			goto unmap;
-		offset += 8;
-		offset += extsize;
+		src_offset += 8;
+		src_offset += extsize;
 	}
+	munmap(mmap, mmap_size);
 	return istate->cache_nr;
 
 unmap:
-	munmap(istate->mmap, istate->mmap_size);
+	munmap(mmap, mmap_size);
 	errno = EINVAL;
 	die("index file corrupt");
 }
 
 int discard_index(struct index_state *istate)
 {
-	int ret;
-
 	istate->cache_nr = 0;
 	istate->cache_changed = 0;
 	istate->timestamp = 0;
 	cache_tree_free(&(istate->cache_tree));
-	if (istate->mmap == NULL)
-		return 0;
-	ret = munmap(istate->mmap, istate->mmap_size);
-	istate->mmap = NULL;
-	istate->mmap_size = 0;
+	free(istate->alloc);
+	istate->alloc = NULL;
 
 	/* no need to throw away allocated active_cache */
-	return ret;
+	return 0;
 }
 
 #define WRITE_BUFFER_SIZE 8192
@@ -1148,6 +1161,28 @@ static void ce_smudge_racily_clean_entry(struct cache_entry *ce)
 	}
 }
 
+static int ce_write_entry(SHA_CTX *c, int fd, struct cache_entry *ce)
+{
+	int size = ondisk_ce_size(ce);
+	struct ondisk_cache_entry *ondisk = xcalloc(1, size);
+
+	ondisk->ctime.sec = htonl(ce->ce_ctime);
+	ondisk->ctime.nsec = 0;
+	ondisk->mtime.sec = htonl(ce->ce_mtime);
+	ondisk->mtime.nsec = 0;
+	ondisk->dev  = htonl(ce->ce_dev);
+	ondisk->ino  = htonl(ce->ce_ino);
+	ondisk->mode = htonl(ce->ce_mode);
+	ondisk->uid  = htonl(ce->ce_uid);
+	ondisk->gid  = htonl(ce->ce_gid);
+	ondisk->size = htonl(ce->ce_size);
+	hashcpy(ondisk->sha1, ce->sha1);
+	ondisk->flags = htons(ce->ce_flags);
+	memcpy(ondisk->name, ce->name, ce_namelen(ce));
+
+	return ce_write(c, fd, ondisk, size);
+}
+
 int write_index(struct index_state *istate, int newfd)
 {
 	SHA_CTX c;
@@ -1173,9 +1208,9 @@ int write_index(struct index_state *istate, int newfd)
 		if (!ce->ce_mode)
 			continue;
 		if (istate->timestamp &&
-		    istate->timestamp <= ntohl(ce->ce_mtime.sec))
+		    istate->timestamp <= ce->ce_mtime)
 			ce_smudge_racily_clean_entry(ce);
-		if (ce_write(&c, newfd, ce, ce_size(ce)) < 0)
+		if (ce_write_entry(&c, newfd, ce) < 0)
 			return -1;
 	}
 
diff --git a/sha1_name.c b/sha1_name.c
index 13e1164..be8489e 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -695,7 +695,7 @@ int get_sha1_with_mode(const char *name, unsigned char *sha1, unsigned *mode)
 				break;
 			if (ce_stage(ce) == stage) {
 				hashcpy(sha1, ce->sha1);
-				*mode = ntohl(ce->ce_mode);
+				*mode = ce->ce_mode;
 				return 0;
 			}
 			pos++;
diff --git a/tree.c b/tree.c
index 8c0819f..87708ef 100644
--- a/tree.c
+++ b/tree.c
@@ -142,8 +142,8 @@ static int cmp_cache_name_compare(const void *a_, const void *b_)
 
 	ce1 = *((const struct cache_entry **)a_);
 	ce2 = *((const struct cache_entry **)b_);
-	return cache_name_compare(ce1->name, ntohs(ce1->ce_flags),
-				  ce2->name, ntohs(ce2->ce_flags));
+	return cache_name_compare(ce1->name, ce1->ce_flags,
+				  ce2->name, ce2->ce_flags);
 }
 
 int read_tree(struct tree *tree, int stage, const char **match)
diff --git a/unpack-trees.c b/unpack-trees.c
index aa2513e..9205320 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -289,7 +289,7 @@ static struct checkout state;
 static void check_updates(struct cache_entry **src, int nr,
 			struct unpack_trees_options *o)
 {
-	unsigned short mask = htons(CE_UPDATE);
+	unsigned short mask = CE_UPDATE;
 	unsigned cnt = 0, total = 0;
 	struct progress *progress = NULL;
 	char last_symlink[PATH_MAX];
@@ -408,7 +408,7 @@ static void verify_uptodate(struct cache_entry *ce,
 		 * submodules that are marked to be automatically
 		 * checked out.
 		 */
-		if (S_ISGITLINK(ntohl(ce->ce_mode)))
+		if (S_ISGITLINK(ce->ce_mode))
 			return;
 		errno = 0;
 	}
@@ -450,7 +450,7 @@ static int verify_clean_subdirectory(struct cache_entry *ce, const char *action,
 	int cnt = 0;
 	unsigned char sha1[20];
 
-	if (S_ISGITLINK(ntohl(ce->ce_mode)) &&
+	if (S_ISGITLINK(ce->ce_mode) &&
 	    resolve_gitlink_ref(ce->name, "HEAD", sha1) == 0) {
 		/* If we are not going to update the submodule, then
 		 * we don't care.
@@ -580,7 +580,7 @@ static void verify_absent(struct cache_entry *ce, const char *action,
 static int merged_entry(struct cache_entry *merge, struct cache_entry *old,
 		struct unpack_trees_options *o)
 {
-	merge->ce_flags |= htons(CE_UPDATE);
+	merge->ce_flags |= CE_UPDATE;
 	if (old) {
 		/*
 		 * See if we can re-use the old CE directly?
@@ -601,7 +601,7 @@ static int merged_entry(struct cache_entry *merge, struct cache_entry *old,
 		invalidate_ce_path(merge);
 	}
 
-	merge->ce_flags &= ~htons(CE_STAGEMASK);
+	merge->ce_flags &= ~CE_STAGEMASK;
 	add_cache_entry(merge, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
 	return 1;
 }
@@ -634,7 +634,7 @@ static void show_stage_entry(FILE *o,
 	else
 		fprintf(o, "%s%06o %s %d\t%s\n",
 			label,
-			ntohl(ce->ce_mode),
+			ce->ce_mode,
 			sha1_to_hex(ce->sha1),
 			ce_stage(ce),
 			ce->name);
@@ -920,7 +920,7 @@ int oneway_merge(struct cache_entry **src,
 			struct stat st;
 			if (lstat(old->name, &st) ||
 			    ce_match_stat(old, &st, CE_MATCH_IGNORE_VALID))
-				old->ce_flags |= htons(CE_UPDATE);
+				old->ce_flags |= CE_UPDATE;
 		}
 		return keep_entry(old, o);
 	}

^ permalink raw reply related

* Re: [PATCH] http-push: making HTTP push more robust and more user-friendly
From: Junio C Hamano @ 2008-01-14 19:35 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Grégoire Barbier, git
In-Reply-To: <alpine.LSU.1.00.0801141220001.8333@wbgn129.biozentrum.uni-wuerzburg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Sun, 13 Jan 2008, Junio C Hamano wrote:
>
>> The second one to add a couple of "goto cleanup" looked correct.  Acks, 
>> people?
>
> I haven't used http-push in ages, but there was a bug report with msysgit.  
> Hopefully that issue gets fixed by this patch.

Could you work with the reporter to see if this fixes the issue
for him?

^ permalink raw reply

* Re: performance problem: "git commit filename"
From: Junio C Hamano @ 2008-01-14 18:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Kristian Høgsberg
In-Reply-To: <alpine.LFD.1.00.0801140902140.2806@woody.linux-foundation.org>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> ... I think it would be even better if we 
> just bit the bullet and started looking at having a different in-memory 
> representation from the on-disk one. Possibly not *that* much different: 
> perhaps just keeping a pointer to the on-disk one along with a flags 
> value.

We have two things we currently do that are not about the on-disk
index file ('-'), and this patch adds another ('+'):

 - Update the work tree file that corresponds to this entry
   (CE_UPDATE);

 - This entry is to be removed (ce_mode == 0);

 + The work tree file that corresponds to this entry is known to
   be unchanged (CE_UPTODATE).

We could introduce "struct in_core_cache_entry" that holds this
information, indexed and sorted by name, and has a pointer to
what we read from the on-disk index.

	struct in_core_cache_entry {
		struct cache_entry *e;
                unsigned is_up_to_date : 1,
                	 to_be_updated : 1,
                         to_be_removed : 1;
	};

The code that iterates over active_cache[] will instead iterate
over this.  The number of the entries in this array will be the
new active_nr.

In the existing code, we reference "ce->name" and "ce->sha1"
everywhere. When we check and update flags we do bitops between
"ce->ce_flags" and "htons(CE_BLAH)" in many places.  Converting
them would add another indirection and be quite painful.  But
the compiler can reliably spot the places we fail to find, so it
at least is not so risky.  It will just be a lot of work.
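
A rough sketch of how code might walk such a wrapper array, for illustration only. The `struct cache_entry` here is a hypothetical stand-in with just two fields, and `count_needing_stat()` and `prune_removed()` are made-up helpers, not existing functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the on-disk-shaped entry. */
struct cache_entry {
	unsigned int ce_mode;
	const char *name;
};

/* The wrapper proposed above: a pointer plus in-core-only state. */
struct in_core_cache_entry {
	struct cache_entry *e;
	unsigned is_up_to_date : 1,
		 to_be_updated : 1,
		 to_be_removed : 1;
};

/* Count entries that are neither known up to date nor marked for removal. */
static size_t count_needing_stat(const struct in_core_cache_entry *cache,
				 size_t nr)
{
	size_t i, n = 0;

	assert(cache != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (cache[i].to_be_removed || cache[i].is_up_to_date)
			continue;
		n++;
	}
	return n;
}

/* Drop entries marked for removal, compacting in place; returns new count. */
static size_t prune_removed(struct in_core_cache_entry *cache, size_t nr)
{
	size_t src, dst = 0;

	assert(cache != NULL || nr == 0);
	for (src = 0; src < nr; src++) {
		if (cache[src].to_be_removed)
			continue;
		cache[dst++] = cache[src];
	}
	return dst;
}
```

The flags never touch the pointed-to entry, so nothing in-core-only can leak into what gets written back; the cost is the extra `->e` hop at every existing "ce->name" and "ce->sha1" reference.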

^ permalink raw reply

* Re: How to bypass the post-commit hook?
From: Jan Hudec @ 2008-01-14 17:59 UTC (permalink / raw)
  To: Pascal Obry; +Cc: Ping Yin, Git Mailing List
In-Reply-To: <4778BB63.6080908@obry.net>

On Mon, Dec 31, 2007 at 10:50:27 +0100, Pascal Obry wrote:
> Jan Hudec a écrit :
> > By the way, what is your post-commit hook doing anyway? Modifying the work
> > tree *after* a commit does not sound like a common thing to do.
> 
> Or just trigger a build via a build robot or record commit information
> into an issue tracker...

That's not a case of "modifying a work tree *after* a commit".

There are many valid uses for post-commit hook and this is one of them. But
like the others it does not modify versioned files. That was the issue
discussed (and pre-commit indeed turned out a better match).

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

^ permalink raw reply

* Re: safecrlf not in 1.5.4
From: Pierre Habouzit @ 2008-01-14 17:35 UTC (permalink / raw)
  To: Dmitry Potapov
  Cc: Junio C Hamano, Steffen Prohaska, Johannes Schindelin,
	Mark Levedahl, Git Mailing List, msysGit
In-Reply-To: <20080114090456.GZ2963@dpotapov.dyndns.org>

On Mon, Jan 14, 2008 at 09:04:56AM +0000, Dmitry Potapov wrote:
> On Sun, Jan 13, 2008 at 11:30:51PM -0800, Junio C Hamano wrote:
> > 
> > But we could end up having a short cycle for 1.5.5 if we agree
> > that the lack of crlf=safe is a severe bug that is worth fixing
> > post 1.5.4.
> 
> Hopefully, the cycle for 1.5.5 will be a bit shorter than 1.5.4, because
> 1.5.4 seems to have the longest development cycle of all versions, and
> it already contains almost as much changes as three previous versions
> ("git diff v1.5.3 master" is almost as big as "git diff v1.5.0 v1.5.3").

  hehe, though we still do not have Megabytes of changes between two RCs
yet ;)

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

^ permalink raw reply

* Git Cygwin - unable to create any repository - help!
From: Paul Umbers @ 2008-01-14 17:21 UTC (permalink / raw)
  To: git

Trying to create a repository under the Cygwin install of git on Windows
XP Pro. I can create the initial repository OK using "git init" and
add files using "git add .", but when I come to commit I get the
messages:

error: invalid object d9b06fceac52f6c24357e6a7f85c601088381152
fatal: git-write-tree: error building trees

git-fsck gives me:

notice: HEAD points to an unborn branch (master)
notice: No default references
missing blob d9b06fceac52f6c24357e6a7f85c601088381152

This is with a simple repository of one directory containing one plain
ascii text file with some text in it. I get similar messages, with one
missing blob for each file or directory in the project, on more
complex projects. At home, I use git under Ubuntu linux and haven't
had any such problems.

The git/cygwin install followed the instructions for a cygwin binary
installation from the wiki and no problems were reported. I've been
unable to find online comments relating to this problem and am stuck.

Any suggestions?

Paul

^ permalink raw reply

* Re: performance problem: "git commit filename"
From: Linus Torvalds @ 2008-01-14 17:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Kristian Høgsberg
In-Reply-To: <7vr6glnrvp.fsf@gitster.siamese.dyndns.org>



On Sun, 13 Jan 2008, Junio C Hamano wrote:
> 
> I've reworked the patch, and in the kernel repository, a
> single-path commit after touching that path now calls 23k
> lstat(2).  It used to call 46k lstat(2) after your fix.

Ok, I really like what the patch does, and how it looks.

At the same time, I *really* hate how we now edit the cache entries in 
place for these kinds of things that really have nothing to do with the 
on-disk format. That's not a new thing (CE_UPDATE is the same), but it 
definitely got uglier.

So I think this patch is good, but I think it would be even better if we 
just bit the bullet and started looking at having a different in-memory 
representation from the on-disk one. Possibly not *that* much different: 
perhaps just keeping a pointer to the on-disk one along with a flags 
value.

That would be a fairly painful change, though (and quite independent from 
this particular one - apart from the fact that CE_UPTODATE is one of the 
users that could be cleaned up if we did that).

Comments?

			Linus

^ permalink raw reply

* [PATCH] Move sha1_file_to_archive into libgit
From: Lars Hjemli @ 2008-01-14 16:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

When the specfile (export-subst) attribute was introduced, it added a
dependency from archive-{tar|zip}.c to builtin-archive.c. This broke the
support for archive-operations in libgit.a since builtin-archive.o doesn't
belong in libgit.a.

This patch moves the functions required by libgit.a from builtin-archive.c
to the new file archive.c (which becomes part of libgit.a).

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
---
 Makefile          |    2 +-
 archive.c         |   84 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 builtin-archive.c |   80 --------------------------------------------------
 3 files changed, 85 insertions(+), 81 deletions(-)
 create mode 100644 archive.c

diff --git a/Makefile b/Makefile
index 21c80e6..c9b482a 100644
--- a/Makefile
+++ b/Makefile
@@ -316,7 +316,7 @@ LIB_OBJS = \
 	alloc.o merge-file.o path-list.o help.o unpack-trees.o $(DIFF_OBJS) \
 	color.o wt-status.o archive-zip.o archive-tar.o shallow.o utf8.o \
 	convert.o attr.o decorate.o progress.o mailmap.o symlinks.o remote.o \
-	transport.o bundle.o walker.o parse-options.o ws.o
+	transport.o bundle.o walker.o parse-options.o ws.o archive.o
 
 BUILTIN_OBJS = \
 	builtin-add.o \
diff --git a/archive.c b/archive.c
new file mode 100644
index 0000000..fb159fe
--- /dev/null
+++ b/archive.c
@@ -0,0 +1,84 @@
+#include "cache.h"
+#include "commit.h"
+#include "attr.h"
+
+static void format_subst(const struct commit *commit,
+                         const char *src, size_t len,
+                         struct strbuf *buf)
+{
+	char *to_free = NULL;
+	struct strbuf fmt;
+
+	if (src == buf->buf)
+		to_free = strbuf_detach(buf, NULL);
+	strbuf_init(&fmt, 0);
+	for (;;) {
+		const char *b, *c;
+
+		b = memmem(src, len, "$Format:", 8);
+		if (!b || src + len < b + 9)
+			break;
+		c = memchr(b + 8, '$', len - 8);
+		if (!c)
+			break;
+
+		strbuf_reset(&fmt);
+		strbuf_add(&fmt, b + 8, c - b - 8);
+
+		strbuf_add(buf, src, b - src);
+		format_commit_message(commit, fmt.buf, buf);
+		len -= c + 1 - src;
+		src  = c + 1;
+	}
+	strbuf_add(buf, src, len);
+	strbuf_release(&fmt);
+	free(to_free);
+}
+
+static int convert_to_archive(const char *path,
+                              const void *src, size_t len,
+                              struct strbuf *buf,
+                              const struct commit *commit)
+{
+	static struct git_attr *attr_export_subst;
+	struct git_attr_check check[1];
+
+	if (!commit)
+		return 0;
+
+	if (!attr_export_subst)
+		attr_export_subst = git_attr("export-subst", 12);
+
+	check[0].attr = attr_export_subst;
+	if (git_checkattr(path, ARRAY_SIZE(check), check))
+		return 0;
+	if (!ATTR_TRUE(check[0].value))
+		return 0;
+
+	format_subst(commit, src, len, buf);
+	return 1;
+}
+
+void *sha1_file_to_archive(const char *path, const unsigned char *sha1,
+                           unsigned int mode, enum object_type *type,
+                           unsigned long *sizep,
+                           const struct commit *commit)
+{
+	void *buffer;
+
+	buffer = read_sha1_file(sha1, type, sizep);
+	if (buffer && S_ISREG(mode)) {
+		struct strbuf buf;
+		size_t size = 0;
+
+		strbuf_init(&buf, 0);
+		strbuf_attach(&buf, buffer, *sizep, *sizep + 1);
+		convert_to_working_tree(path, buf.buf, buf.len, &buf);
+		convert_to_archive(path, buf.buf, buf.len, &buf, commit);
+		buffer = strbuf_detach(&buf, &size);
+		*sizep = size;
+	}
+
+	return buffer;
+}
+
diff --git a/builtin-archive.c b/builtin-archive.c
index 14a1b30..c2e0c1e 100644
--- a/builtin-archive.c
+++ b/builtin-archive.c
@@ -79,86 +79,6 @@ static int run_remote_archiver(const char *remote, int argc,
 	return !!rv;
 }
 
-static void format_subst(const struct commit *commit,
-                         const char *src, size_t len,
-                         struct strbuf *buf)
-{
-	char *to_free = NULL;
-	struct strbuf fmt;
-
-	if (src == buf->buf)
-		to_free = strbuf_detach(buf, NULL);
-	strbuf_init(&fmt, 0);
-	for (;;) {
-		const char *b, *c;
-
-		b = memmem(src, len, "$Format:", 8);
-		if (!b || src + len < b + 9)
-			break;
-		c = memchr(b + 8, '$', len - 8);
-		if (!c)
-			break;
-
-		strbuf_reset(&fmt);
-		strbuf_add(&fmt, b + 8, c - b - 8);
-
-		strbuf_add(buf, src, b - src);
-		format_commit_message(commit, fmt.buf, buf);
-		len -= c + 1 - src;
-		src  = c + 1;
-	}
-	strbuf_add(buf, src, len);
-	strbuf_release(&fmt);
-	free(to_free);
-}
-
-static int convert_to_archive(const char *path,
-                              const void *src, size_t len,
-                              struct strbuf *buf,
-                              const struct commit *commit)
-{
-	static struct git_attr *attr_export_subst;
-	struct git_attr_check check[1];
-
-	if (!commit)
-		return 0;
-
-	if (!attr_export_subst)
-		attr_export_subst = git_attr("export-subst", 12);
-
-	check[0].attr = attr_export_subst;
-	if (git_checkattr(path, ARRAY_SIZE(check), check))
-		return 0;
-	if (!ATTR_TRUE(check[0].value))
-		return 0;
-
-	format_subst(commit, src, len, buf);
-	return 1;
-}
-
-void *sha1_file_to_archive(const char *path, const unsigned char *sha1,
-                           unsigned int mode, enum object_type *type,
-                           unsigned long *sizep,
-                           const struct commit *commit)
-{
-	void *buffer;
-
-	buffer = read_sha1_file(sha1, type, sizep);
-	if (buffer && S_ISREG(mode)) {
-		struct strbuf buf;
-		size_t size = 0;
-
-		strbuf_init(&buf, 0);
-		strbuf_attach(&buf, buffer, *sizep, *sizep + 1);
-		convert_to_working_tree(path, buf.buf, buf.len, &buf);
-		convert_to_archive(path, buf.buf, buf.len, &buf, commit);
-		buffer = strbuf_detach(&buf, &size);
-		*sizep = size;
-	}
-
-	return buffer;
-}
-
 static int init_archiver(const char *name, struct archiver *ar)
 {
 	int rv = -1, i;
-- 
1.5.4.rc2.69.g047fe-dirty


* Re: [PATCH] gitk: make Ctrl "+" really increase the font size
From: Johannes Schindelin @ 2008-01-14 15:57 UTC (permalink / raw)
  To: Stephan Hennig; +Cc: msysgit-/JYPxA39Uh5TLH3MbocFFw, git-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <fmft8d$kuv$2@ger.gmane.org>


Hi,

[please do not cull me from the Cc: list.  It's like answering to what I 
 said while looking at someone else.]

On Mon, 14 Jan 2008, Stephan Hennig wrote:

> Johannes Schindelin wrote:
> > 
> > Only Ctrl "=" was bound to increase the font size, probably because
> > English keyboards have the plus on the same key as the equal sign.
> > However, not the whole world is English, and at least with some
> > other keyboard layouts, Ctrl "+" did not work as documented.
> > 
> > Noticed by Stephan Hennig.
> > 
> > Signed-off-by: Johannes Schindelin <Johannes.Schindelin-Mmb7MZpHnFY@public.gmane.org>
> > 
> > ---
> > 
> > 	On Thu, 10 Jan 2008, Stephan Hennig wrote:
> > 
> > 	> 
> > 	> Hi,
> > 	> 
> > 	> reducing font size in gitk with CTRL-- works, but enlarging font 
> > 	> size fails.  Typing CTRL-+ just doesn't have any effect here.
> > 
> > 	This is no bug in msysgit, but in gitk.
> > 
> > 	Paul, please apply.
> 
> The bug is still present in gitk that comes with Msysgit

If you are complaining about msysgit, please do not flood the git list 
with your response.

Besides, it is asking a little much after just 4 days for a patch to come 
through no less than 4 different repositories: gitk -> git.git -> 
mingw.git -> 4msysgit.git.

A patch that you could apply yourself easily, with the further benefit of 
being able to add your "Tested-by:" line.

After all, I just worked for you, for free, and I expect something back.

Hth,
Dscho


* Re: git-svn: network error while git-svn dcommit
From: Pedro Melo @ 2008-01-14 15:34 UTC (permalink / raw)
  To: Pedro Melo; +Cc: Git Mailing List
In-Reply-To: <75A83473-664E-4CC8-97ED-119D18F17F76@simplicidade.org>

Hi,

Sorry to reply to myself. Figured it out.

Created a new branch from the remote svn, brought it up-to-date, then  
got the last applied sha1, and rebased that onto my new branch.

There might be an easier way, but this worked.

Best regards,

On Jan 14, 2008, at 1:20 PM, Pedro Melo wrote:

> Hi,
>
> I was doing a git-svn dcommit (git-1.5.4-rc2)  and the network to  
> the svn server died on me.
>
> Network connection closed unexpectedly: Connection closed  
> unexpectedly at /usr/local/git/bin/git-svn line 450
>
> If I try again, I get a warning about a dirty index.
>
> Cannot dcommit with a dirty index.  Commit your changes first, or  
> stash them with `git stash'.
>  at /usr/local/git/bin/git-svn line 406
>
> What's the best way to recover from this?
>
> Thanks,
> -- 
> Pedro Melo
> Blog: http://www.simplicidade.org/notes/
> XMPP ID: melo@simplicidade.org
> Use XMPP!

-- 
Pedro Melo
Blog: http://www.simplicidade.org/notes/
XMPP ID: melo@simplicidade.org
Use XMPP!


* Re: [PATCH] gitk: make Ctrl "+" really increase the font size
From: Stephan Hennig @ 2008-01-14 14:52 UTC (permalink / raw)
  To: msysgit-/JYPxA39Uh5TLH3MbocFFw; +Cc: git-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <alpine.LSU.1.00.0801111238150.31053-OGWIkrnhIhzN0uC3ymp8PA@public.gmane.org>


Johannes Schindelin wrote:
> 
> Only Ctrl "=" was bound to increase the font size, probably because
> English keyboards have the plus on the same key as the equal sign.
> However, not the whole world is English, and at least with some
> other keyboard layouts, Ctrl "+" did not work as documented.
> 
> Noticed by Stephan Hennig.
> 
> Signed-off-by: Johannes Schindelin <Johannes.Schindelin-Mmb7MZpHnFY@public.gmane.org>
> 
> ---
> 
> 	On Thu, 10 Jan 2008, Stephan Hennig wrote:
> 
> 	> 
> 	> Hi,
> 	> 
> 	> reducing font size in gitk with CTRL-- works, but enlarging font 
> 	> size fails.  Typing CTRL-+ just doesn't have any effect here.
> 
> 	This is no bug in msysgit, but in gitk.
> 
> 	Paul, please apply.

The bug is still present in gitk that comes with Msysgit

> $ git version
> git version 1.5.4.rc3.941.gebb79

Best regards,
Stephan Hennig


* Re: Digging through old vendor code
From: Jon Smirl @ 2008-01-14 14:49 UTC (permalink / raw)
  To: Jeff King; +Cc: linux, git, torvalds
In-Reply-To: <20080114120807.GB12723@coredump.intra.peff.net>

On 1/14/08, Jeff King <peff@peff.net> wrote:
> On Sun, Jan 13, 2008 at 11:28:06AM -0500, linux@horizon.com wrote:
>
> > Maybe a real git wizard will show me how to insert the index entries
> > directly without ever doing anything as pedestrian as extracting, hashing,
> > and then deleting the files, but it's still not that bad.
>
> git-read-tree?  Unfortunately it has no option to insert only a subset
> of the tree. But you can make do with git-ls-tree piped to
> git-update-index.
>
> Using the script below, Jon's sample file seems to be
>
>   v2.6.15-rc6-81-g0b57ee9:drivers/serial/amba-pl010.c
>
> and it runs in about 8 seconds on v2.6.13..v2.6.15. I think it might be
> more intuitive to just diff a temporary index against each tree, but I
> don't think there's a way to say "find copies harder, but use only this
> subset of files as the source" which makes it less efficient.
>
> Jon, you might try playing around with different ranges. I get a
> different answer for v2.6.13..v2.6.16.

That's probably because some of the other drivers in there were also
created via copy and edit. The similarity between drivers is what made
the original base hard to find. amba-pl010.c is probably a copy/edit
from one of the other drivers.

I'm pretty sure 2.6.15/amba-pl010.c is right. I started undoing some
of the obvious changes like renaming the functions and the diffs are
getting pretty small.

Thanks for the help on this. I have enough info now to separate the diffs.

This would probably make a nice command to add to git since I'm sure
other vendors have done copy/edit to make their out of tree drivers. I
doubt if any other SCM has a command like this.

I'm really starting to hate the "port and forget" that goes on in the
embedded world. But I'm beginning to understand why it happens. I've
been working since November to get a patch  into i2c without success -
19 versions. The process is chewing up way too much time relative to
the importance of the patch to my code. So I'm just about ready to
"port and forget" the patch.

>
> -- >8 --
> SRC=drivers/serial
>
> echo >&2 Cleaning up after old runs...
> rm -f tmpindex
> git branch -D tmpbranch
>
> echo >&2 Creating giant source commit...
> for i in `git rev-list v2.6.13..v2.6.15 -- $SRC`; do
>   git ls-tree -r $i -- $SRC |
>     # note the whitespace is a literal tab
>     sed "s,	,	$i/," |
>     GIT_INDEX_FILE=tmpindex git update-index --index-info
> done
> tree=`GIT_INDEX_FILE=tmpindex git write-tree`
> commit=`echo source | git commit-tree $tree`
> git update-ref refs/heads/tmpbranch $commit
>
> echo >&2 Creating updated index...
> GIT_INDEX_FILE=tmpindex git add candidate.c
> echo >&2 Diffing...
> GIT_INDEX_FILE=tmpindex git diff-index --cached -l0 -M1% -C1% --find-copies-harder tmpbranch
>
> # now you should manually git-describe the winner
>


-- 
Jon Smirl
jonsmirl@gmail.com


* git-svn: network error while git-svn dcommit
From: Pedro Melo @ 2008-01-14 13:20 UTC (permalink / raw)
  To: Git Mailing List

Hi,

I was doing a git-svn dcommit (git-1.5.4-rc2)  and the network to the  
svn server died on me.

Network connection closed unexpectedly: Connection closed  
unexpectedly at /usr/local/git/bin/git-svn line 450

If I try again, I get a warning about a dirty index.

Cannot dcommit with a dirty index.  Commit your changes first, or  
stash them with `git stash'.
  at /usr/local/git/bin/git-svn line 406

What's the best way to recover from this?

Thanks,
-- 
Pedro Melo
Blog: http://www.simplicidade.org/notes/
XMPP ID: melo@simplicidade.org
Use XMPP!


* valgrind test scripts (was Re: [PATCH] Teach remote...)
From: Jeff King @ 2008-01-14 12:16 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git
In-Reply-To: <alpine.LSU.1.00.0801141202240.8333@wbgn129.biozentrum.uni-wuerzburg.de>

On Mon, Jan 14, 2008 at 12:18:13PM +0100, Johannes Schindelin wrote:

> >  * test scripts to use valgrind (Jeff King, but there was another
> >    one in the past -- can their efforts be compared and coordinated
> >    better?).
> 
> Yes, that was written in Perl by Christian Couder:
> 
> http://article.gmane.org/gmane.comp.version-control.git/69236
> 
> Peff's version does not need Perl, and is better integrated with the 
> testsuite (via the new option -m).  Christian's version parses the output, 
> and might therefore be nicer to look at.

I don't think parsing is necessary. Christian's version counts the
errors, whereas I just barf if valgrind has mentioned any errors. And
using the '-q' output of valgrind means the output is fairly cleaned up.

But of course the main difference is that I tried to integrate into the
test scripts, and stop running as soon as any errors are found.
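
The "barf on any error" idea can be sketched roughly as below. This is only an illustration, not the code from either patch: it assumes that a quiet run ("valgrind -q") writes nothing to stderr unless something is wrong, and the helper name fail_on_stderr is made up for this example.

```shell
# Hypothetical helper: run a command and treat any stderr output as a failure.
# Assumption: the checker (e.g. "valgrind -q") is silent on stderr when clean.
fail_on_stderr() {
	log=$(mktemp) || return 2
	"$@" 2>"$log"
	status=$?
	if [ -s "$log" ]; then
		# Something was reported: show it and fail regardless of exit status.
		cat "$log" >&2
		rm -f "$log"
		return 1
	fi
	rm -f "$log"
	return $status
}

# Intended use (not run here): fail_on_stderr valgrind -q ./git-foo
```

One caveat with this approach: a program that legitimately prints to stderr would also trip the check, so a real harness would probably want valgrind's messages in a separate log file.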

> However, I think that both versions do not account for scripts, and I 
> imagine that going through Git.pm and git-sh-setup is necessary for that.

Both versions use the 'alias' approach. A more comprehensive approach
would be something like:

  mkdir wrapper-bin
  cat >wrapper-bin/git <<EOF
  ...
  EOF
  chmod 755 wrapper-bin/git
  for i in $GIT_PROGRAMS; do
    ln -s git wrapper-bin/git-$i
  done
  PATH=$PWD/wrapper-bin:$PATH

which should get all git calls (though we should probably not wrap
"git-foo" if git-foo is a script (or we should convert it to "git
foo") since I have no desire to valgrind bash or perl).
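
A rough sketch of the "do not wrap scripts" part, separate from the elided wrapper above: the two function names are invented for illustration, and checking for a leading "#!" is just one plausible way to tell a script from a compiled program.

```shell
# Hypothetical check: does the file start with "#!"?
is_script() {
	[ "$(dd if="$1" bs=2 count=1 2>/dev/null)" = '#!' ]
}

# Hypothetical dispatcher: run scripts directly, compiled programs under
# valgrind, so that bash or perl themselves never get valgrinded.
run_wrapped() {
	if is_script "$1"; then
		"$@"
	else
		valgrind -q "$@"
	fi
}
```

A wrapper-bin/git script could call something like run_wrapped on the real binary it stands in for; how the real path is located is left open here.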

> Also, it might be a good idea to be able to provide extra arguments, such 
> as "--db-attach=yes".

Yes. I suspect some people will need to add custom suppression files
depending on their platform, as well.

> Post 1.5.4, definitely.

Agreed.

-Peff


* Re: Adding Git to Better SCM Initiative : Comparison
From: Jakub Narebski @ 2008-01-14 12:14 UTC (permalink / raw)
  To: Dmitry Potapov
  Cc: Shlomi Fish, git, Eyvind Bernhardsen, David Kastrup,
	Florian Weimer, Chris Shoemaker
In-Reply-To: <20080114065810.GY2963@dpotapov.dyndns.org>

On Monday, 14 January 2008 07:58, Dmitry Potapov wrote:
> On Mon, Jan 14, 2008 at 01:31:19AM +0100, Jakub Narebski wrote:
> > On Mon, 14 January 2008, Dmitry Potapov wrote:
> > 
> > > Yes. Git can automatically detect renames and show the history together;
> > > however, being content oriented rather than file oriented, the notion of
> > > "retaining the history of the file" cannot exactly be applied to it.
> > 
> > "History of a file" can be defined as "<scm> log 'file'", and this is
> > well defined also for git. 
> 
> You missed the key word here -- *retaining*. In fact, if you define the
> history of a file as just what "<scm> log" produces, then what
> is the problem with CVS here? Why do most people say that CVS does not
> retain file history over rename? Certainly, you can type "cvs old-name"
> and see history of one file, and if you type "cvs new-name" then history
> of another... But somehow most people think about these two pieces as
> being the history of *one* file... So, your definition is incorrect or,
> at least, very different from what most people mean by that.

I assume that the 0th part of rename support holds, i.e. that we can
recover the previous full-tree state of the repository.

> BTW, when you type "git log 'file'", it shows you not history of a file,
> but history of changes that affect the specified paths...

The fact that in "git log <path>" the <path> part is a path _limiter_
(and can be a directory, or a set of directories) rather than being limited
to a single filename is what makes git different, both in a good way
("git log subsystem/path") and in a bad way (it differs from what users of
other SCMs are used to).

When you type "git log --follow='file'", it shows you the history of
the _contents_ which are currently in 'file', even if there was a rename
somewhere in the history of 'file'.

When you type "git log 'directory'", it shows you the (simplified) history
of changes affecting the specified directory (usually some subsystem).


IMHO "rename support" should be defined as
1.) showing renames when examining given revision (status, log, show;
    whatever it is called).
2.) it should be able to follow history of a file when it looks like
    this: add, change, rename, change.
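
Point 2 can be tried out with a throwaway repository. The sketch below is only an illustration: the file names, commit messages and the demo identity are made up, and it uses the "git log --follow -- <file>" spelling. It counts how many commits each form of "git log" reports for an add, change, rename, change history.

```shell
demo_follow() {
	repo=$(mktemp -d) || return 1
	(
		cd "$repo" || exit 1
		git init -q .
		# Stage everything and commit with a throwaway identity.
		ci() {
			git add -A &&
			git -c user.name=demo -c user.email=demo@example.com \
				commit -q -m "$1"
		}
		printf 'one\ntwo\nthree\n' >old.txt && ci add
		echo four >>old.txt && ci change
		git mv old.txt new.txt && ci rename
		echo five >>new.txt && ci change-again
		# Path limiter only: history stops at the rename.
		echo "plain: $(git log --oneline -- new.txt | wc -l | tr -d ' ')"
		# With --follow: history continues under the old name.
		echo "follow: $(git log --oneline --follow -- new.txt | wc -l | tr -d ' ')"
	)
	rm -rf "$repo"
}

demo_follow
```

The plain form sees only the commits that touch the path new.txt, while --follow should also report the two commits made while the contents still lived in old.txt.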

> > And 'rename support' for file means just
> > that this history of a file (of a current file contents) follows file
> > renames.
[...]
> > 
> > IIRC this does not work for directories... 
> 
> Git works for directories; it is just that the --follow option cannot
> be applied to them, because this option means to follow the file contents,
> which does not make much sense for directories.

But it would be nice to have "git log --follow=directory" work somehow,
even if the directory in which a subsystem resides was renamed. This is
also harder to do because (I think) directories are split and joined
more often than files' contents.

> > > Git is different in that it tracks the content as the whole rather than
> > > tracking a set of files. When you look at some source code, what you
> > > really want to know who and why wrote *this*, and usually it does not
> > > matter to you whether it was written in this file or another one. CVS
> > > is really bad at that, because if you renamed a file, it would be very
> > > difficult to go back to history and find that. Many file-ids based SCMs
> > > have solved this problem, however, they do not do any better than CVS
> > > in another very common case -- when your code is moved around as result
> > > of refactoring, but Git addresses both problems, not just one!
> > 
> > AFAIK Mercurial (hg) is not file-id based, but does explicitly track
> > renames. There was even an idea presented on git mailing list to mark
> > renames in commit object in some "note" header.
> 
> I suspect the main reason why Mercurial supports that is that a lot of
> programmers whose mind was mangled by many years of CVS experience asked
> for that feature. In practice, what you really want to track is contents.
> And it is not difficult to add some "note" to the commit and teach Git to
> follow it, but I don't see any practical value in that...

From an architecture point of view, Mercurial can IMHO be viewed a bit
as "CVS done right", much more so than Subversion, with its path-hashed
changeset storage, manifest file, and changelog / changerev file.

And I guess that Mercurial supports this because of the most important
part of "renames support" (which is present only as TODO for Better-SCM
comparison), namely merging correct files in presence of renames.
 
> > 
> > It would be much better if for each feature there was some test
> > described which would allow to check if the feature is supported.
> 
> Wanna test your LCD monitor with some old CRT tests? -:)

If those tests were done correctly, not from the technical side ("renames
support" and other similar thingies for SCMs, refresh rate for LCD/CRT),
but from the user side (does the command which shows the history of a file
follow renames; eyestrain / image sharpness for monitors).... :-)

-- 
Jakub Narebski
Poland


* Re: Digging through old vendor code
From: Jeff King @ 2008-01-14 12:08 UTC (permalink / raw)
  To: linux; +Cc: git, jonsmirl, torvalds
In-Reply-To: <20080113162806.13991.qmail@science.horizon.com>

On Sun, Jan 13, 2008 at 11:28:06AM -0500, linux@horizon.com wrote:

> Maybe a real git wizard will show me how to insert the index entries
> directly without ever doing anything as pedestrian as extracting, hashing,
> and then deleting the files, but it's still not that bad.

git-read-tree?  Unfortunately it has no option to insert only a subset
of the tree. But you can make do with git-ls-tree piped to
git-update-index.

Using the script below, Jon's sample file seems to be

  v2.6.15-rc6-81-g0b57ee9:drivers/serial/amba-pl010.c

and it runs in about 8 seconds on v2.6.13..v2.6.15. I think it might be
more intuitive to just diff a temporary index against each tree, but I
don't think there's a way to say "find copies harder, but use only this
subset of files as the source" which makes it less efficient.

Jon, you might try playing around with different ranges. I get a
different answer for v2.6.13..v2.6.16.

-- >8 --
SRC=drivers/serial

echo >&2 Cleaning up after old runs...
rm -f tmpindex
git branch -D tmpbranch

echo >&2 Creating giant source commit...
for i in `git rev-list v2.6.13..v2.6.15 -- $SRC`; do
  git ls-tree -r $i -- $SRC |
    # note the whitespace is a literal tab
    sed "s,	,	$i/," |
    GIT_INDEX_FILE=tmpindex git update-index --index-info
done
tree=`GIT_INDEX_FILE=tmpindex git write-tree`
commit=`echo source | git commit-tree $tree`
git update-ref refs/heads/tmpbranch $commit

echo >&2 Creating updated index...
GIT_INDEX_FILE=tmpindex git add candidate.c
echo >&2 Diffing...
GIT_INDEX_FILE=tmpindex git diff-index --cached -l0 -M1% -C1% --find-copies-harder tmpbranch

# now you should manually git-describe the winner
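
To see what the sed step in the loop does on its own, here is a small standalone sketch. The function name and the sample object id are made up; it only rewrites text and never touches an index.

```shell
# Hypothetical helper: prefix every path in "git ls-tree -r" style output
# with "<commit>/", so entries from different commits can coexist in one index.
prefix_paths() {
	tab=$(printf '\t')
	# ls-tree separates "<mode> <type> <sha1>" from the path with a tab;
	# replacing that first tab with "tab + commit + /" moves the path
	# under a per-commit directory.
	sed "s,$tab,$tab$1/,"
}

# Sample line with a placeholder object id and a placeholder commit name.
printf '100644 blob 0123456789abcdef0123456789abcdef01234567\tdrivers/serial/8250.c\n' |
	prefix_paths deadbeef
```

In the script above the same rewrite is done inline, and the result is fed to "git update-index --index-info", which accepts lines in exactly this "<mode> <type> <sha1><tab><path>" form.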


* Re: [PATCH] http-push: making HTTP push more robust and more user-friendly
From: Johannes Schindelin @ 2008-01-14 11:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Grégoire Barbier, git
In-Reply-To: <7vbq7ppbyh.fsf@gitster.siamese.dyndns.org>

Hi,

On Sun, 13 Jan 2008, Junio C Hamano wrote:

> The second one to add a couple of "goto cleanup" looked correct.  Acks, 
> people?

I haven't used http-push in ages, but there was a bug report with msysgit.  
Hopefully that issue gets fixed by this patch.

Ciao,
Dscho


* Re: [PATCH] Teach remote machinery about remotes.default config variable
From: Johannes Schindelin @ 2008-01-14 11:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Mark Levedahl, git
In-Reply-To: <7vir1xmazm.fsf@gitster.siamese.dyndns.org>

Hi,

On Sun, 13 Jan 2008, Junio C Hamano wrote:

> And we already have "clone -o" and claim to support that option.

My understanding was _always_ that the "-o" option was meant for the case 
that you want to clone from somewhere else than where you want to pull 
from.  Something like an initial clone from a USB disk.

>  * test scripts to use valgrind (Jeff King, but there was another
>    one in the past -- can their efforts be compared and coordinated
>    better?).

Yes, that was written in Perl by Christian Couder:

http://article.gmane.org/gmane.comp.version-control.git/69236

Peff's version does not need Perl, and is better integrated with the 
testsuite (via the new option -m).  Christian's version parses the output, 
and might therefore be nicer to look at.

However, I think that both versions do not account for scripts, and I 
imagine that going through Git.pm and git-sh-setup is necessary for that.

Also, it might be a good idea to be able to provide extra arguments, such 
as "--db-attach=yes".

Post 1.5.4, definitely.

Ciao,
Dscho


