* [PATCH] Make object contents optionally available
@ 2005-05-17 4:56 Daniel Barkalow
2005-05-17 15:29 ` Linus Torvalds
0 siblings, 1 reply; 7+ messages in thread
From: Daniel Barkalow @ 2005-05-17 4:56 UTC (permalink / raw)
To: git, Petr Baudis, Junio C Hamano; +Cc: Linus Torvalds
This adds contents and size fields to struct object. If unpack_object is
called on an object, it will fill in the contents field with the complete
raw contents of the object. If free_object_contents is called on an
object, the contents will be freed. If contents is filled when an object
is parsed, it is not unpacked an extra time, but the contents are not
retained if they were not unpacked before parsing.
This patch also centralizes the code to unpack and check the type of
objects, making it worthwhile even without callers for the new functions.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Index: blob.c
===================================================================
--- 4fdd3c0825943d0d489ce9ca46cb853b41db3807/blob.c (mode:100644 sha1:280f5241577ac029e9d5a7eb5bf895642b342fc8)
+++ e63ecfdf25ec8d61fe16b6c22152a2e45e60efd2/blob.c (mode:100644 sha1:68c1adc2b1009fbc5525207b0ddaac34e9c4413b)
@@ -30,21 +30,15 @@
int parse_blob(struct blob *item)
{
- char type[20];
- void *buffer;
- unsigned long size;
+ int unpacked = !!item->object.contents;
int ret;
if (item->object.parsed)
return 0;
- buffer = read_sha1_file(item->object.sha1, type, &size);
- if (!buffer)
- return error("Could not read %s",
- sha1_to_hex(item->object.sha1));
- if (strcmp(type, blob_type))
- return error("Object %s not a blob",
- sha1_to_hex(item->object.sha1));
- ret = parse_blob_buffer(item, buffer, size);
- free(buffer);
+ unpack_object(&item->object);
+ ret = parse_blob_buffer(item, item->object.contents,
+ item->object.size);
+ if (!unpacked)
+ free_object_contents(&item->object);
return ret;
}
Index: commit.c
===================================================================
--- 4fdd3c0825943d0d489ce9ca46cb853b41db3807/commit.c (mode:100644 sha1:706c7cba08ebc2100c2dbf63ed1238f39324f750)
+++ e63ecfdf25ec8d61fe16b6c22152a2e45e60efd2/commit.c (mode:100644 sha1:38324dcc16785caa4baafb260c2fa9c0af1248a9)
@@ -69,24 +69,16 @@
int parse_commit(struct commit *item)
{
- char type[20];
- void *buffer;
- unsigned long size;
+ int unpacked = !!item->object.contents;
int ret;
if (item->object.parsed)
return 0;
- buffer = read_sha1_file(item->object.sha1, type, &size);
- if (!buffer)
- return error("Could not read %s",
- sha1_to_hex(item->object.sha1));
- if (strcmp(type, commit_type)) {
- free(buffer);
- return error("Object %s not a commit",
- sha1_to_hex(item->object.sha1));
- }
- ret = parse_commit_buffer(item, buffer, size);
- free(buffer);
+ unpack_object(&item->object);
+ ret = parse_commit_buffer(item, item->object.contents,
+ item->object.size);
+ if (!unpacked)
+ free_object_contents(&item->object);
return ret;
}
Index: object.c
===================================================================
--- 4fdd3c0825943d0d489ce9ca46cb853b41db3807/object.c (mode:100644 sha1:b5a62e7f87f24c2ab0ea83f3c445d81bcbff027a)
+++ e63ecfdf25ec8d61fe16b6c22152a2e45e60efd2/object.c (mode:100644 sha1:3e712d5f665d22e06e310932447629b0badc6b48)
@@ -99,6 +99,34 @@
}
}
+int unpack_object(struct object *obj)
+{
+ unsigned long mapsize;
+ void *map;
+ char type[20];
+ if (obj->contents)
+ return 0;
+ map = map_sha1_file(obj->sha1, &mapsize);
+ if (!map)
+ return error("Could not map %s",
+ sha1_to_hex(obj->sha1));
+ obj->contents = unpack_sha1_file(map, mapsize, type, &obj->size);
+ munmap(map, mapsize);
+ if (!obj->contents)
+ return error("Could not unpack %s",
+ sha1_to_hex(obj->sha1));
+ if (strcmp(type, obj->type))
+ return error("Object %s not a %s",
+ sha1_to_hex(obj->sha1), obj->type);
+ return 0;
+}
+
+void free_object_contents(struct object *obj)
+{
+ free(obj->contents);
+ obj->contents = NULL;
+}
+
struct object *parse_object(unsigned char *sha1)
{
unsigned long mapsize;
Index: object.h
===================================================================
--- 4fdd3c0825943d0d489ce9ca46cb853b41db3807/object.h (mode:100644 sha1:09700d376077b2d6136620faf6efb77ee679deeb)
+++ e63ecfdf25ec8d61fe16b6c22152a2e45e60efd2/object.h (mode:100644 sha1:848aee6a1b9138618d5e44abf8567d2c2d3c35c1)
@@ -13,6 +13,8 @@
unsigned char sha1[20];
const char *type;
struct object_list *refs;
+ char *contents;
+ unsigned long size;
};
extern int nr_objs;
@@ -25,6 +27,12 @@
/** Returns the object, having parsed it to find out what it is. **/
struct object *parse_object(unsigned char *sha1);
+/** Unpacks the object into the contents field. **/
+int unpack_object(struct object *obj);
+
+/** Deallocates the unpacked contents of the object. **/
+void free_object_contents(struct object *obj);
+
void add_ref(struct object *refer, struct object *target);
void mark_reachable(struct object *obj, unsigned int mask);
Index: tag.c
===================================================================
--- 4fdd3c0825943d0d489ce9ca46cb853b41db3807/tag.c (mode:100644 sha1:22deb243ad58b2c57fb8652fe2d08c571ee3e781)
+++ e63ecfdf25ec8d61fe16b6c22152a2e45e60efd2/tag.c (mode:100644 sha1:7fc7960c866a2704ec8f230d24259bc331f7f209)
@@ -66,23 +66,15 @@
int parse_tag(struct tag *item)
{
- char type[20];
- void *data;
- unsigned long size;
+ int unpacked = !!item->object.contents;
int ret;
if (item->object.parsed)
return 0;
- data = read_sha1_file(item->object.sha1, type, &size);
- if (!data)
- return error("Could not read %s",
- sha1_to_hex(item->object.sha1));
- if (strcmp(type, tag_type)) {
- free(data);
- return error("Object %s not a tag",
- sha1_to_hex(item->object.sha1));
- }
- ret = parse_tag_buffer(item, data, size);
- free(data);
+ unpack_object(&item->object);
+ ret = parse_tag_buffer(item, item->object.contents,
+ item->object.size);
+ if (!unpacked)
+ free_object_contents(&item->object);
return ret;
}
Index: tree.c
===================================================================
--- 4fdd3c0825943d0d489ce9ca46cb853b41db3807/tree.c (mode:100644 sha1:ca800a85f771be1bd10d6575d93ca05bd3fc381c)
+++ e63ecfdf25ec8d61fe16b6c22152a2e45e60efd2/tree.c (mode:100644 sha1:3b10428aa0a8b9da32c0deaa16edea38c74da0c0)
@@ -140,23 +140,15 @@
int parse_tree(struct tree *item)
{
- char type[20];
- void *buffer;
- unsigned long size;
- int ret;
+ int unpacked = !!item->object.contents;
+ int ret;
if (item->object.parsed)
return 0;
- buffer = read_sha1_file(item->object.sha1, type, &size);
- if (!buffer)
- return error("Could not read %s",
- sha1_to_hex(item->object.sha1));
- if (strcmp(type, tree_type)) {
- free(buffer);
- return error("Object %s not a tree",
- sha1_to_hex(item->object.sha1));
- }
- ret = parse_tree_buffer(item, buffer, size);
- free(buffer);
+ unpack_object(&item->object);
+ ret = parse_tree_buffer(item, item->object.contents,
+ item->object.size);
+ if (!unpacked)
+ free_object_contents(&item->object);
return ret;
}
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] Make object contents optionally available
2005-05-17 4:56 [PATCH] Make object contents optionally available Daniel Barkalow
@ 2005-05-17 15:29 ` Linus Torvalds
2005-05-17 15:52 ` Daniel Barkalow
0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2005-05-17 15:29 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: git, Petr Baudis, Junio C Hamano
On Tue, 17 May 2005, Daniel Barkalow wrote:
>
> This adds contents and size fields to struct object. If unpack_object is
> called on an object, it will fill in the contents field with the complete
> raw contents of the object. If free_object_contents is called on an
> object, the contents will be freed. If contents is filled when an object
> is parsed, it is not unpacked an extra time, but the contents are not
> retained if they were not unpacked before parsing.
I really hate magic interfaces like that. It's just a bug waiting to
happen.
I'd actually prefer it if "parse_object()" just always unpacks it, and
leaves the unpacked thing in memory.
Then, the _few_ programs that really care about memory use because they
parse potentially millions of objects (especially blobs, which obviously
can be very large) can be updated to just do a manual
"free_object_contents()". That's primarily things like fsck and
convert-cache, I suspect.
That would leave a lot less special cases and strange rules..
Linus
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Make object contents optionally available
2005-05-17 15:29 ` Linus Torvalds
@ 2005-05-17 15:52 ` Daniel Barkalow
2005-05-17 17:12 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Daniel Barkalow @ 2005-05-17 15:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git, Petr Baudis, Junio C Hamano
On Tue, 17 May 2005, Linus Torvalds wrote:
>
>
> On Tue, 17 May 2005, Daniel Barkalow wrote:
> >
> > This adds contents and size fields to struct object. If unpack_object is
> > called on an object, it will fill in the contents field with the complete
> > raw contents of the object. If free_object_contents is called on an
> > object, the contents will be freed. If contents is filled when an object
> > is parsed, it is not unpacked an extra time, but the contents are not
> > retained if they were not unpacked before parsing.
>
> I really hate magic interfaces like that. It's just a bug waiting to
> happen.
I think I described it with too many cases above. parse_*() doesn't change
whether the contents are unpacked. The obvious optimization is performed
if possible.
I'm already going to add a per-type global to have the parse functions
also unpack the object contents user-visibly, for the case that Junio
pointed out. (Making it: parse_* doesn't change whether the contents are
unpacked, unless you tell it to unpack objects.)
I think the only likely bug would be unpacking objects after parsing
them, instead of before, which is correct but inefficient. It should be
clear to a user whether the raw contents are available at any point in the
user code.
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Make object contents optionally available
2005-05-17 15:52 ` Daniel Barkalow
@ 2005-05-17 17:12 ` Junio C Hamano
2005-05-17 17:55 ` Daniel Barkalow
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2005-05-17 17:12 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Linus Torvalds, git, Petr Baudis
>>>>> "DB" == Daniel Barkalow <barkalow@iabervon.org> writes:
DB> I'm already going to add a per-type global to have the parse functions
DB> also unpack the object contents user-visibly, for the case that Junio
DB> pointed out. (Making it: parse_* doesn't change whether the contents are
DB> unpacked, unless you tell it to unpack objects.)
That sounds better than the patch you sent last night, but are
we sure that callers would be satisfied if you just make some
_types_ more special than others?
Parse functions need to unpack anyway, so it might make more
sense to add an optional callback just after where unpacking
happens to ask the main program if it wants the unpacked raw
data to be kept. So you would do something like:
struct object *parse_object(unsigned char *sha1)
{
...
unsigned long size;
void *buffer = unpack_sha1_file(map, mapsize, type, &size);
munmap(map, mapsize);
if (!buffer)
return NULL;
...
} else {
obj = NULL;
}
+ if (obj && keep_object_raw_data(sha1, type, size, buffer)) {
+ obj.raw_data = buffer;
+ obj.raw_size = size;
+ } else
free(buffer);
return obj;
}
return NULL;
}
And put a dummy implementation of keep_object_raw_data() that
says "I do not want anything to be kept" in the libgit.a. Main
programs that _care_ about raw data can implement their own
keep_object_raw_data() callback that is more intelligent.
In addition you give them the interface to free raw data you
already have.
DB> I think the only likely bug would be unpacking objects after parsing
DB> them, instead of before, which is correct but inefficient. It should be
DB> clear to a user whether the raw contents are available at any point in the
DB> user code.
Another possibility is not to make double unpacking too costly
by having an LRU of unpack_sha1_file(), but I am not sure how
effective that would be.
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] Make object contents optionally available
2005-05-17 17:12 ` Junio C Hamano
@ 2005-05-17 17:55 ` Daniel Barkalow
2005-05-17 19:32 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Daniel Barkalow @ 2005-05-17 17:55 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, git, Petr Baudis
On Tue, 17 May 2005, Junio C Hamano wrote:
> >>>>> "DB" == Daniel Barkalow <barkalow@iabervon.org> writes:
>
> DB> I'm already going to add a per-type global to have the parse functions
> DB> also unpack the object contents user-visibly, for the case that Junio
> DB> pointed out. (Making it: parse_* doesn't change whether the contents are
> DB> unpacked, unless you tell it to unpack objects.)
>
> That sounds better than the patch you sent last night, but are
> we sure that callers would be satisfied if you just make some
> _types_ more special than others?
It wouldn't make any types special; the caller can just control each type
individually, so that code that only cares about commits could get commits
unpacked, but not get trees unpacked even if it has them parsed.
> Parse functions need to unpack anyway, so it might make more
> sense to add an optional callback just after where unpacking
> happens to ask the main program if it wants the unpacked raw
> data to be kept.
I think that a callback is a bit excessive, and it wouldn't get the useful
information anyway, which is really: "what is supposed to happen to this
object, which you haven't seen before, after it's parsed?"
> So you would do something like:
> struct object *parse_object(unsigned char *sha1)
> {
That reminds me that I should also fix the parse_object path, which can
now be simplified substantially.
Note that parse_object calls parse_<type>, rather than the reverse,
because parse_<type> can skip the step of figuring out what routine to
use. The parse_<type> version also has lookup_<type>, which can return the
struct which will be filled out later; this isn't possible for an object
of unknown type, since it can't tell what struct type to allocate.
> Another possibility is not to make double unpacking too costly
> by having an LRU of unpack_sha1_file(), but I am not sure how
> effective that would be.
It would work for pop_most_recent_commit, which parses the thing it's
about to return, but I don't think it's a particularly good solution in
general.
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] Make object contents optionally available
2005-05-17 17:55 ` Daniel Barkalow
@ 2005-05-17 19:32 ` Junio C Hamano
2005-05-17 19:59 ` Daniel Barkalow
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2005-05-17 19:32 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Linus Torvalds, git, Petr Baudis
>>>>> "DB" == Daniel Barkalow <barkalow@iabervon.org> writes:
DB> It wouldn't make any types special; the caller can just control each type
DB> individually, so that code that only cares about commits could get commits
DB> unpacked, but not get trees unpacked even if it has them parsed.
That is exactly the behaviour I am questioning about. You are
giving the caller the ability to discriminate objects based on
their _type_ (which is fine and better than not having that
control at all). Why can't the caller discriminate objects
based on their, say, size for example [*1*]? That's what I
meant to say by "types are special" but I phrased it so badly.
And the callback would solve that naturally. Or you can
additionally give the callback the result of the parse_xxx(),
which would be even more useful. The callback can then throw
commit raw-data away based on the date, for example.
[Footnote]
*1* So this hypothethical program uses fast cached raw data for
small things but is willing to pay penalty for big things if it
later finds those big things turn out to be needed.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Make object contents optionally available
2005-05-17 19:32 ` Junio C Hamano
@ 2005-05-17 19:59 ` Daniel Barkalow
0 siblings, 0 replies; 7+ messages in thread
From: Daniel Barkalow @ 2005-05-17 19:59 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, git, Petr Baudis
On Tue, 17 May 2005, Junio C Hamano wrote:
> That's what I meant to say by "types are special" but I phrased it so
> badly.
Ah, okay. I'm not convinced that discriminating at all at the point where
parse is deciding whether to free the contents is that useful; doing it
based on type is just sufficiently easy that it could be worthwhile.
> And the callback would solve that naturally. Or you can
> additionally give the callback the result of the parse_xxx(),
> which would be even more useful. The callback can then throw
> commit raw-data away based on the date, for example.
I don't think a callback has any advantage over turning on a flag to
provide all unpacked objects, and having the program free them when it
wants to, either right after whatever caused them to be parsed returns
or later. I suspect that the main relevant thing is going to be the stack,
not the data.
I have another idea, which is almost certainly too magical, but which
would work: have library functions that parse objects return them unpacked
if the objects they are given are unpacked. So, for example,
pop_most_recent_commit() would unpack parents of a commit if the commit
was unpacked. This would make everything work the way you'd expect if you
didn't think about it, although it would probably be really surprising if
you were thinking about it. I'm just mentioning this because it might make
someone think of something similar but sane.
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2005-05-17 20:00 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-05-17 4:56 [PATCH] Make object contents optionally available Daniel Barkalow
2005-05-17 15:29 ` Linus Torvalds
2005-05-17 15:52 ` Daniel Barkalow
2005-05-17 17:12 ` Junio C Hamano
2005-05-17 17:55 ` Daniel Barkalow
2005-05-17 19:32 ` Junio C Hamano
2005-05-17 19:59 ` Daniel Barkalow
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox