git.vger.kernel.org archive mirror
* How do I get the contents of a directory in fast-import
@ 2016-01-01 15:54 Stefan Monnier
  2016-01-09 23:56 ` Stefan Monnier
  2016-01-15 22:39 ` Jeff King
  0 siblings, 2 replies; 4+ messages in thread
From: Stefan Monnier @ 2016-01-01 15:54 UTC (permalink / raw)
  To: git

I have a program which tries to collect info from lots of branches and
generate some table from that data into another branch.

For performance reasons, I'd like to do that from fast-import, and as
long as I know the name of all the files I need to consult, everything
is fine since I can use the "ls" and "cat-blob" commands of fast-import
to efficiently get the data I need.
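For reference, here is a minimal Python sketch of parsing the replies to those two commands. It assumes the reply layouts described in the git-fast-import documentation: an `ls` reply of `<mode> SP <type> SP <dataref> HT <path>` (or `missing SP <path>`), and a `cat-blob` header of `<sha1> SP blob SP <size>` followed by the contents. The names `LsEntry`, `parse_ls_response` and `parse_cat_blob_header` are hypothetical, and C-style quoted paths are not unquoted.

```python
import re
from typing import NamedTuple, Optional, Tuple


class LsEntry(NamedTuple):
    mode: str     # octal mode as printed, e.g. "100644" or "040000"
    kind: str     # "blob", "tree" or "commit"
    dataref: str  # object name in hex
    path: str     # NOT unquoted: C-style quoted paths are returned as-is


# Assumed reply to "ls": <mode> SP <type> SP <dataref> HT <path> LF
_LS_RE = re.compile(r"^([0-7]+) (blob|tree|commit) ([0-9a-f]+)\t(.*)$")


def parse_ls_response(line: str) -> Optional[LsEntry]:
    """Parse one reply line of fast-import's "ls"; None means the path is missing."""
    line = line.rstrip("\n")
    if line.startswith("missing "):
        return None
    m = _LS_RE.match(line)
    if m is None:
        raise ValueError(f"unexpected ls reply: {line!r}")
    return LsEntry(*m.groups())


def parse_cat_blob_header(line: str) -> Tuple[str, int]:
    """Parse the "<sha1> SP blob SP <size>" header that precedes a blob's bytes."""
    parts = line.split()
    if len(parts) == 2 and parts[1] == "missing":
        raise KeyError(parts[0])
    sha1, kind, size = parts
    if kind != "blob":
        raise ValueError(f"expected a blob header, got {line!r}")
    return sha1, int(size)
```

Note that an `ls` reply of type `tree` only gives a directory's object name, not its entries.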

But I also need to look at some files whose names I don't know beforehand
(i.e. all the files in some directories).  If I do "cat-blob" on those
directories I get some binary "thing" which I don't understand.

So how do I get a directory listing from fast-import, i.e.
like I can get with "git cat-file -p", but without having to fork
a separate git process?


        Stefan

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: How do I get the contents of a directory in fast-import
  2016-01-01 15:54 How do I get the contents of a directory in fast-import Stefan Monnier
@ 2016-01-09 23:56 ` Stefan Monnier
  2016-01-15 22:39 ` Jeff King
  1 sibling, 0 replies; 4+ messages in thread
From: Stefan Monnier @ 2016-01-09 23:56 UTC (permalink / raw)
  To: git

Any help would be greatly appreciated, including "sorry, can't do that".


        Stefan


>>>>> "Stefan" == Stefan Monnier <monnier@iro.umontreal.ca> writes:

> I have a program which tries to collect info from lots of branches and
> generate some table from that data into another branch.

> For performance reasons, I'd like to do that from fast-import, and as
> long as I know the name of all the files I need to consult, everything
> is fine since I can use the "ls" and "cat-blob" commands of fast-import
> to efficiently get the data I need.

> But I also need to look at some files whose names I don't know beforehand
> (i.e. all the files in some directories).  If I do "cat-blob" on those
> directories I get some binary "thing" which I don't understand.

> So how do I get a directory listing from fast-import, i.e.
> like I can get with "git cat-file -p", but without having to fork
> a separate git process?


>         Stefan

* Re: How do I get the contents of a directory in fast-import
  2016-01-01 15:54 How do I get the contents of a directory in fast-import Stefan Monnier
  2016-01-09 23:56 ` Stefan Monnier
@ 2016-01-15 22:39 ` Jeff King
  2016-01-16  1:59   ` Stefan Monnier
  1 sibling, 1 reply; 4+ messages in thread
From: Jeff King @ 2016-01-15 22:39 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: git

On Fri, Jan 01, 2016 at 10:54:00AM -0500, Stefan Monnier wrote:

> I have a program which tries to collect info from lots of branches and
> generate some table from that data into another branch.
> 
> For performance reasons, I'd like to do that from fast-import, and as
> long as I know the name of all the files I need to consult, everything
> is fine since I can use the "ls" and "cat-blob" commands of fast-import
> to efficiently get the data I need.
> 
> But I also need to look at some files whose names I don't know beforehand
> (i.e. all the files in some directories).  If I do "cat-blob" on those
> directories I get some binary "thing" which I don't understand.
> 
> So how do I get a directory listing from fast-import, i.e.
> like I can get with "git cat-file -p", but without having to fork
> a separate git process?

I'm not sure I understand your use case exactly, but is the directory
listing you want part of the newly-added objects from fast-import, or
does it already exist in the branches you are collecting from?

If the latter, I wonder if a separate "cat-file --batch" process could
give you what you need (it's a separate process, but you can start a
single process and make many queries of it; I assume your desire not to
add an extra process is to avoid the overhead).
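A minimal Python sketch of that single-process approach, assuming the default `--batch` output format (`<sha1> SP <type> SP <size> LF <contents> LF`, or `<object> SP missing LF`). `CatFileBatch` and `read_batch_response` are hypothetical names. For a tree object the returned contents are the raw tree bytes, not the `cat-file -p` listing.

```python
import subprocess
from typing import BinaryIO, Optional, Tuple


def read_batch_response(out: BinaryIO) -> Optional[Tuple[str, str, bytes]]:
    """Read one "<sha1> SP <type> SP <size> LF <contents> LF" reply; None if not found."""
    header = out.readline()
    if not header:
        raise EOFError("git cat-file --batch closed its output")
    text = header.decode("utf-8", "replace").rstrip("\n")
    if text.endswith(" missing") or text.endswith(" ambiguous"):
        return None
    sha1, kind, size = text.split(" ")
    body = out.read(int(size))
    out.read(1)  # consume the LF that follows the contents
    return sha1, kind, body


class CatFileBatch:
    """A single long-lived "git cat-file --batch" process answering many queries."""

    def __init__(self, repo_dir: str = ".") -> None:
        self.proc = subprocess.Popen(
            ["git", "cat-file", "--batch"],
            cwd=repo_dir, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
        )

    def get(self, name: str) -> Optional[Tuple[str, str, bytes]]:
        # name is anything cat-file accepts, e.g. "some-branch:some/dir"
        assert self.proc.stdin is not None and self.proc.stdout is not None
        self.proc.stdin.write(name.encode() + b"\n")
        self.proc.stdin.flush()
        return read_batch_response(self.proc.stdout)

    def close(self) -> None:
        assert self.proc.stdin is not None
        self.proc.stdin.close()
        self.proc.wait()
```

Keeping one process open means its startup cost is paid once, however many objects are queried.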

But I think it won't pretty-print trees for you; it will give you the
raw tree data (which I imagine is what you are getting from cat-blob,
too).  I'm not sure that's actually documented anywhere (it was part of
the original revisions of git, and hasn't changed since). But it is
basically:

  tree = tree_entry*
  tree_entry = mode SP path NUL sha1
  mode = ascii mode, in octal (e.g., "100644")
  path = <any byte except NUL>*
  sha1 = <any byte>{20}
  SP = ascii space (0x20)
  NUL = 0-byte

So it is pretty simple to parse.
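A minimal Python sketch of a parser for that grammar, plus a formatter that prints each entry roughly the way `git ls-tree` and `git cat-file -p` do (`%06o <type> <hex>\t<path>`). `parse_tree` and `format_entry` are hypothetical names. Git stores directory modes as `40000` without a leading zero, which `int(..., 8)` handles. Unlike `ls-tree`, this sketch does not quote unusual path names.

```python
from typing import Iterator, NamedTuple


class TreeEntry(NamedTuple):
    mode: int    # e.g. 0o100644; directories are stored as "40000"
    path: bytes  # raw bytes; git does not mandate an encoding
    sha1: bytes  # 20 raw bytes, not hex


def parse_tree(data: bytes) -> Iterator[TreeEntry]:
    """Walk raw tree data: mode SP path NUL sha1, repeated."""
    pos = 0
    while pos < len(data):
        sp = data.index(b" ", pos)    # the mode never contains a space
        nul = data.index(b"\0", sp)   # the path never contains a NUL
        sha1 = data[nul + 1:nul + 21]
        if len(sha1) != 20:
            raise ValueError("truncated tree entry")
        yield TreeEntry(int(data[pos:sp], 8), data[sp + 1:nul], sha1)
        pos = nul + 21


def format_entry(e: TreeEntry) -> str:
    """Render one entry in an ls-tree-like layout."""
    if e.mode == 0o040000:
        kind = "tree"
    elif e.mode == 0o160000:
        kind = "commit"  # submodule (gitlink)
    else:
        kind = "blob"    # regular files, executables, symlinks
    return f"{e.mode:06o} {kind} {e.sha1.hex()}\t{e.path.decode('utf-8', 'replace')}"
```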

There may be a better way to do what you want with fast-import. I'm not
familiar enough with it to say.

-Peff

* Re: How do I get the contents of a directory in fast-import
  2016-01-15 22:39 ` Jeff King
@ 2016-01-16  1:59   ` Stefan Monnier
  0 siblings, 0 replies; 4+ messages in thread
From: Stefan Monnier @ 2016-01-16  1:59 UTC (permalink / raw)
  To: git

>> So how do I get a directory listing from fast-import, i.e.
>> like I can get with "git cat-file -p", but without having to fork
>> a separate git process?
> I'm not sure I understand your use case exactly, but is the directory
> listing you want part of the newly-added objects from fast-import, or
> does it already exist in the branches you are collecting from?

For the most important cases, the relevant revision already exists
before fast-import, yes.

> If the latter, I wonder if a separate "cat-file --batch" process could
> give you what you need (it's a separate process, but you can start a

I'm not sure exactly how "git cat-file --batch" works internally
(whether it tries to keep active revisions, like fast-import does), but
I've indeed used it successfully (though only for files).

> single process and make many queries of it; I assume your desire not to
> add an extra process is to avoid the overhead).

The overhead of starting a new process is one part, but another is the
overhead of re-reading the refs (I can have tens of thousands of
branches in my repository), etc.

> But I think it won't pretty-print trees for you; it will give you the
> raw tree data

Indeed.

> (which I imagine is what you are getting from cat-blob, too).

Actually no, "cat-blob" gives an error instead:

    fatal: Object 2ca1672d50c9dbfe582dc53af3c7ce9891a7a664 is a tree but a blob was expected.

> I'm not sure that's actually documented anywhere (it was part of
> the original revisions of git, and hasn't changed since). But it is
> basically:

>   tree = tree_entry*
>   tree_entry = mode SP path NUL sha1
>   mode = ascii mode, in octal (e.g., "100644")
>   path = <any byte except NUL>*
>   sha1 = <any byte>{20}
>   SP = ascii space (0x20)
>   NUL = 0-byte

Ah, thanks.  It'd be great if cat-blob could return this instead of
signalling an error.

> So it is pretty simple to parse.

My program is written in /bin/sh, so parsing the above is actually rather
inconvenient, but it's much better than just getting an error.


        Stefan
