From: fork0@t-online.de (Alex Riesen)
To: Shawn Pearce <spearce@spearce.org>
Cc: Jakub Narebski <jnareb@gmail.com>,
Junio C Hamano <junkio@cox.net>,
git@vger.kernel.org
Subject: Re: win2k/cygwin cannot handle even moderately sized packs
Date: Wed, 8 Nov 2006 00:11:30 +0100 [thread overview]
Message-ID: <20061107231130.GA5141@steel.home> (raw)
In-Reply-To: <20061107185648.GE26591@spearce.org>
Shawn Pearce, Tue, Nov 07, 2006 19:56:48 +0100:
> I've pushed the changes out to repo.or.cz:
>
> http://repo.or.cz/w/git/fastimport.git
>
> in the window-mapping branch. Note that this is based on a slightly
> older version of Git (v1.4.2). There are two "tuneables" on line 376
> of sha1_file.c, this is the maximum amount of memory (in bytes) to
> denote to packs and the maximum chunk size of each pack (in bytes).
Thanks.
I couldn't help noticing that the interface to the packs data is
a bit complex:
unsigned char *use_pack(struct packed_git *p,
struct pack_window **window,
unsigned long offset,
unsigned int *left);
void unuse_pack(struct pack_window **w);
Can I suggest something like below?
unsigned char *use_pack(struct packed_git *p,
off_t offset, size_t size, size_t *mapped);
void unuse_pack(struct packed_git *p, off_t offset, size_t size);
or
void unuse_pack(struct packed_git *p, unsigned char *data);
(where size/maxsize is the amount of bytes at the offset in the
pack file the caller was asking for. use_pack would fail if offset
or size do not pass sanity checks (offset past end of file, size
is 0). The mapped argument would get the length of the data
actually mapped, and can be less than requested, subject to end of
data). The window sliding code seem to have all information it
needs with this set of arguments.
Or am I missing something very obvious, and something like this
is just not feasible for some reasons?
I'm asking because I tried to slowly rebase the window-mapping up and
merge the newer branches into it (to get it working with more recent
code). At some point I came over conflicts and one of them got me
thinking about the interface. That's the part:
<<<<<<< HEAD/sha1_file.c
c = *use_pack(p, w, offset++, NULL);
*type = (c >> 4) & 7;
size = c & 15;
shift = 4;
while (c & 0x80) {
c = *use_pack(p, w, offset++, NULL);
size += (c & 0x7f) << shift;
shift += 7;
}
*sizep = size;
return offset;
=======
used = unpack_object_header_gently((unsigned char *)p->pack_base +
offset,
p->pack_size - offset, type, sizep);
if (!used)
die("object offset outside of pack file");
return offset + used;
>>>>>>> f685d07de045423a69045e42e72d2efc22a541ca/sha1_file.c
I was almost about to move your code into unpack_object_header_gently,
but ... The header isn't that big, is it? It is variable in the pack,
but the implementation of the parser is at the moment restricted by
the type we use for object size (unsigned long for the particular
platform). For example:
/* Object size is in the first 4 bits, and in the low 7 bits
* of the subsequent bytes which have high bit set */
#define MAX_LOCAL_HDRSIZE ((sizeof(long) * 8 - 4) / 7 + 1)
unsigned long size;
unsigned char *header = use_pack(p, offset, MAX_LOCAL_HDRSIZE, &size);
if (!header)
die("object header offset out of range");
/* unpack_object_header_gently takes care about truncated
* headers, by returning 0 if it encounters one */
used = unpack_object_header_gently(header, size, type, sizep);
Wouldn't have to change unpack_object_header_gently at all.
(BTW, current unpack_object_header_gently does not use it's len
argument to check if there actually is enough data to hold at least
minimal header. Is the size of mapped data checked for correctness
somewhere before?)
next prev parent reply other threads:[~2006-11-07 23:11 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-11-07 11:02 win2k/cygwin cannot handle even moderately sized packs Alex Riesen
2006-11-07 12:17 ` Noel Grandin
2006-11-07 13:55 ` Alex Riesen
2006-11-07 15:50 ` Jakub Narebski
2006-11-07 17:28 ` Alex Riesen
2006-11-07 17:48 ` Shawn Pearce
2006-11-07 18:13 ` Alex Riesen
2006-11-07 18:18 ` Shawn Pearce
2006-11-07 18:26 ` Shawn Pearce
2006-11-07 18:56 ` Shawn Pearce
2006-11-07 23:11 ` Alex Riesen [this message]
2006-11-08 5:19 ` Shawn Pearce
2006-11-08 13:37 ` Alex Riesen
2006-11-08 17:11 ` Shawn Pearce
2006-11-08 21:33 ` Alex Riesen
2006-11-08 22:28 ` Shawn Pearce
2006-11-07 19:27 ` Alex Riesen
2006-11-08 19:22 ` Christopher Faylor
2006-11-13 12:45 ` Johannes Schindelin
2006-11-13 17:34 ` Alex Riesen
2006-11-13 17:36 ` Alex Riesen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20061107231130.GA5141@steel.home \
--to=fork0@t-online.de \
--cc=git@vger.kernel.org \
--cc=jnareb@gmail.com \
--cc=junkio@cox.net \
--cc=raa.lkml@gmail.com \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).