* How do I get the contents of a directory in fast-import
@ 2016-01-01 15:54 Stefan Monnier
2016-01-09 23:56 ` Stefan Monnier
2016-01-15 22:39 ` Jeff King
From: Stefan Monnier @ 2016-01-01 15:54 UTC
To: git
I have a program which tries to collect info from lots of branches and
generate some table from that data into another branch.
For performance reasons, I'd like to do that from fast-import, and as
long as I know the name of all the files I need to consult, everything
is fine since I can use the "ls" and "cat-blob" commands of fast-import
to efficiently get the data I need.
But I also need to look at some files whose names I don't know beforehand
(i.e. all the files in some directories). If I do "cat-blob" on those
directories I get some binary "thing" which I don't understand.
So how do I get a directory listing from fast-import, i.e.
like I can get with "git cat-file -p", but without having to fork
a separate git process?
Stefan
* Re: How do I get the contents of a directory in fast-import
From: Stefan Monnier @ 2016-01-09 23:56 UTC
To: git
Any help would be most welcome, including "sorry, can't do that".
Stefan
* Re: How do I get the contents of a directory in fast-import
From: Jeff King @ 2016-01-15 22:39 UTC
To: Stefan Monnier; +Cc: git
On Fri, Jan 01, 2016 at 10:54:00AM -0500, Stefan Monnier wrote:
> I have a program which tries to collect info from lots of branches and
> generate some table from that data into another branch.
>
> For performance reasons, I'd like to do that from fast-import, and as
> long as I know the name of all the files I need to consult, everything
> is fine since I can use the "ls" and "cat-blob" commands of fast-import
> to get efficiently the data I need.
>
> But I also need to look at some files whose names I don't know beforehand
> (i.e. all the files in some directories). If I do "cat-blob" on those
> directories I get some binary "thing" which I don't understand.
>
> So how do I get a directory listing from fast-import, i.e.
> like I can get with "git cat-file -p", but without having to fork
> a separate git process?
I'm not sure I understand your use case exactly, but is the directory
listing you want part of the newly-added objects from fast-import, or
does it already exist in the branches you are collecting from?
If the latter, I wonder if a separate "cat-file --batch" process could
give you what you need (it's a separate process, but you can start a
single process and make many queries of it; I assume your desire not to
add an extra process is to avoid the overhead).
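To make the "one process, many queries" idea concrete, here is a minimal
sketch in Python. It assumes the usual --batch protocol: one object name per
line on stdin, and for each a "<sha1> <type> <size>" header line followed by
<size> bytes of content and a trailing newline, or "<name> missing" if the
object does not exist. The names CatFile and read_batch_reply are made up
for this illustration, and other replies (e.g. for an ambiguous abbreviated
object name) are not handled.

```python
import subprocess
from typing import BinaryIO, Optional, Tuple


def read_batch_reply(out: BinaryIO) -> Optional[Tuple[str, str, bytes]]:
    """Read one reply from a `git cat-file --batch` output stream.

    Returns (sha1, type, content), or None if the object was reported missing.
    """
    header = out.readline().rstrip(b"\n").decode("utf-8", "replace")
    if header.endswith(" missing"):
        return None
    sha1, objtype, size = header.split(" ")
    body = out.read(int(size))
    out.read(1)  # consume the newline that follows the content
    return sha1, objtype, body


class CatFile:
    """Hypothetical wrapper: one long-lived cat-file process, many queries."""

    def __init__(self, repo: str = ".") -> None:
        self.proc = subprocess.Popen(
            ["git", "-C", repo, "cat-file", "--batch"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )

    def get(self, name: str) -> Optional[Tuple[str, str, bytes]]:
        # One request per line; flush so git sees it right away.
        self.proc.stdin.write(name.encode() + b"\n")
        self.proc.stdin.flush()
        return read_batch_reply(self.proc.stdout)

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.wait()
```

Something like CatFile().get("some-branch:some/dir") (branch and path being
placeholders) would then return the object for that directory, with type
"tree", without starting a new process per query.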
But I think it won't pretty-print trees for you; it will give you the
raw tree data (which I imagine is what you are getting from cat-blob,
too). I'm not sure that's actually documented anywhere (it was part of
the original revisions of git, and hasn't changed since). But it is
basically:
    tree       = tree_entry*
    tree_entry = mode SP path NUL sha1
    mode       = ascii mode, in octal (e.g., "100644")
    path       = <any byte except NUL>*
    sha1       = <any byte>{20}
    SP         = ascii space (0x20)
    NUL        = 0-byte
So it is pretty simple to parse.
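As a rough illustration of the grammar above, a parser could look like the
following Python sketch. The names TreeEntry and parse_tree are invented for
this example; the path is kept as raw bytes because the format says nothing
about its encoding, and the 20 raw sha1 bytes are rendered as hex.

```python
from typing import Iterator, NamedTuple


class TreeEntry(NamedTuple):
    mode: str    # ASCII octal mode, e.g. "100644"
    path: bytes  # raw path bytes, anything except NUL
    sha1: str    # hex form of the 20 raw sha1 bytes


def parse_tree(data: bytes) -> Iterator[TreeEntry]:
    """Yield the entries of a raw tree object: (mode SP path NUL sha1)*."""
    pos = 0
    while pos < len(data):
        # The mode contains no space, so the first space ends it.
        sp = data.index(b" ", pos)
        # The path runs up to the next NUL byte (it may contain spaces).
        nul = data.index(b"\0", sp + 1)
        end = nul + 1 + 20
        if end > len(data):
            raise ValueError("truncated tree entry")
        yield TreeEntry(
            mode=data[pos:sp].decode("ascii"),
            path=data[sp + 1:nul],
            sha1=data[nul + 1:end].hex(),
        )
        pos = end
```

Feeding it the raw bytes of a tree object gives one TreeEntry per file or
subdirectory, which is roughly the information "git cat-file -p" prints.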
There may be a better way to do what you want with fast-import. I'm not
familiar enough with it to say.
-Peff
* Re: How do I get the contents of a directory in fast-import
From: Stefan Monnier @ 2016-01-16 1:59 UTC
To: git
>> So how do I get a directory listing from fast-import, i.e.
>> like I can get with "git cat-file -p", but without having to fork
>> a separate git process?
> I'm not sure I understand your use case exactly, but is the directory
> listing you want part of the newly-added objects from fast-import, or
> does it already exist in the branches you are collecting from?
For the most important cases, the relevant revision already exists
before fast-import, yes.
> If the latter, I wonder if a separate "cat-file --batch" process could
> give you what you need (it's a separate process, but you can start a
I'm not sure exactly how "git cat-file --batch" works internally
(whether it tries to keep active revisions, like fast-import does), but
I've indeed used it successfully (though only for files).
> single process and make many queries of it; I assume your desire not to
> add an extra process is to avoid the overhead).
The overhead of starting a new process is one part, but another is the
overhead of re-reading the refs (I can have tens of thousands of
branches in my repository), etc.
> But I think it won't pretty-print trees for you; it will give you the
> raw tree data
Indeed.
> (which I imagine is what you are getting from cat-blob, too).
Actually no, "cat-blob" gives an error instead:
    fatal: Object 2ca1672d50c9dbfe582dc53af3c7ce9891a7a664 is a tree but a blob was expected.
> I'm not sure that's actually documented anywhere (it was part of
> the original revisions of git, and hasn't changed since). But it is
> basically:
>     tree       = tree_entry*
>     tree_entry = mode SP path NUL sha1
>     mode       = ascii mode, in octal (e.g., "100644")
>     path       = <any byte except NUL>*
>     sha1       = <any byte>{20}
>     SP         = ascii space (0x20)
>     NUL        = 0-byte
Ah, thanks. It'd be great if cat-blob could return this instead of
signalling an error.
> So it is pretty simple to parse.
My program is written in /bin/sh so parsing the above is actually rather
inconvenient, but it's much better than just getting an error.
Stefan