Fast access git-rev-list output: some OS knowledge required

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Fast access git-rev-list output: some OS knowledge required
@ 2006-12-06 19:24 Marco Costalba
  2006-12-06 19:28 ` Shawn Pearce
  0 siblings, 1 reply; 17+ messages in thread
From: Marco Costalba @ 2006-12-06 19:24 UTC (permalink / raw)
  To: Git Mailing List

I ask help to the list because my knowledge on this is not enough.

Currently qgit uses, socket based, QProcess class to read data from
'git rev-list'  when loading the repository at startup.

The time it takes to read, without processing, the whole Linux tree
with this approach it's almost _double_ of the time it takes 'git
rev-list' to write to a file:

$git rev-list --header --boundary --parents --topo-order HEAD >> tmp.txt

We are talking of about 7s against less then 4s, on my box (warm cache).

So I have a patch to make 'git rev-list' writing into a temporary file
and then read it in memory, perhaps it's not the cleaner way, but it's
faster, about 1s less.

I have browsed Qt sources and found that QProcess uses internal
buffers that are then copied again before to be used by the
application. File approach uses a call to read() /fread() buired
inside the Qt's QFile class, and no intermediate buffers, so perhaps
this could be the reason the second way it's faster.

Anyway there are some issues:

1) File tmp.txt is deleted as soon as read, but this is not enough
sometimes to avoid a costly and wasteful write access to disk by the
OS. What is the easiest, portable way to create a temporary 'in memory
only' file, with no disk access? Or at least delay the HD write access
enough to be able to read and delete the file before the fist block of
tmp.txt is flushed to disk?

2) There is a faster/cleaner (and *safe* ) way to access directly 'git
rev-list' output, something like (just as an example):

$git rev-list --header --boundary --parents --topo-order HEAD >> /dev/mem

Or something similar, possibly _simple_ and _portable_ , so to be able
to copy the big amount of 'git rev-list' output just once (about 30MB
with current tree).

3) Other suggestions?  ;-)

Thanks

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-06 19:24 Fast access git-rev-list output: some OS knowledge required Marco Costalba
@ 2006-12-06 19:28 ` Shawn Pearce
  2006-12-06 19:34   ` Marco Costalba
  2006-12-06 23:27   ` Johannes Schindelin
  0 siblings, 2 replies; 17+ messages in thread
From: Shawn Pearce @ 2006-12-06 19:28 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Git Mailing List

Marco Costalba <mcostalba@gmail.com> wrote:
> The time it takes to read, without processing, the whole Linux tree
> with this approach it's almost _double_ of the time it takes 'git
> rev-list' to write to a file:
> 
> 3) Other suggestions?  ;-)

The revision listing machinery is fairly well isolated behind some
pretty clean APIs in Git.  Why not link qgit against libgit.a and
just do the revision listing in process?

-- 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-06 19:28 ` Shawn Pearce
@ 2006-12-06 19:34   ` Marco Costalba
  2006-12-06 19:42     ` Shawn Pearce
  2006-12-06 23:27   ` Johannes Schindelin
  1 sibling, 1 reply; 17+ messages in thread
From: Marco Costalba @ 2006-12-06 19:34 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Git Mailing List

On 12/6/06, Shawn Pearce <spearce@spearce.org> wrote:
> Marco Costalba <mcostalba@gmail.com> wrote:
> > The time it takes to read, without processing, the whole Linux tree
> > with this approach it's almost _double_ of the time it takes 'git
> > rev-list' to write to a file:
> >
> > 3) Other suggestions?  ;-)
>
> The revision listing machinery is fairly well isolated behind some
> pretty clean APIs in Git.  Why not link qgit against libgit.a and
> just do the revision listing in process?
>

Where can I found some documentation (yes I know RTFS, but...) or,
better, an example of using the API to read git-rev-list output?

if it is possible I also would like to avoid to mess with internal git
API's, of course *if it is possible*  ;-)

Thanks

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-06 19:34   ` Marco Costalba
@ 2006-12-06 19:42     ` Shawn Pearce
  2006-12-06 19:51       ` Shawn Pearce
  0 siblings, 1 reply; 17+ messages in thread
From: Shawn Pearce @ 2006-12-06 19:42 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Git Mailing List

Marco Costalba <mcostalba@gmail.com> wrote:
> On 12/6/06, Shawn Pearce <spearce@spearce.org> wrote:
> >Marco Costalba <mcostalba@gmail.com> wrote:
> >> The time it takes to read, without processing, the whole Linux tree
> >> with this approach it's almost _double_ of the time it takes 'git
> >> rev-list' to write to a file:
> >>
> >> 3) Other suggestions?  ;-)
> >
> >The revision listing machinery is fairly well isolated behind some
> >pretty clean APIs in Git.  Why not link qgit against libgit.a and
> >just do the revision listing in process?
> >
> 
> Where can I found some documentation (yes I know RTFS, but...) or,
> better, an example of using the API to read git-rev-list output?

builtin-rev-list.c.  :-)
 
I think all you may need is:

	#include "revision.h"
	...
	struct rev_info revs;
	init_revisions(&revs, prefix);
	revs.abbrev = 0;
	revs.commit_format = CMIT_FMT_UNSPECIFIED;
	argc = setup_revisions(argc, argv, &revs, NULL);

where argv just a char** of the arguments you were going to hand
to rev-list on the command line.

then get the data back:

	static void show_commit(struct commit *commit)
	{
		const char * hex = sha1_to_hex(commit->object.sha1);
		... copy from hex to your own structures ...
	}

	static void show_object(struct object_array_entry *p)
	{
		/* do nothing */
	}

	prepare_revision_walk(&revs);
	traverse_commit_list(&revs, show_commit, show_object);

-- 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-06 19:42     ` Shawn Pearce
@ 2006-12-06 19:51       ` Shawn Pearce
  2006-12-06 20:08         ` Marco Costalba
  2006-12-07 13:25         ` Andreas Ericsson
  0 siblings, 2 replies; 17+ messages in thread
From: Shawn Pearce @ 2006-12-06 19:51 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Git Mailing List

Shawn Pearce <spearce@spearce.org> wrote:
> I think all you may need is:
> 
> 	#include "revision.h"
> 	...

You'll also need to call:

	setup_git_directory();

before any of the below; but that should be done once per process.

> 	struct rev_info revs;
> 	init_revisions(&revs, prefix);
> 	revs.abbrev = 0;
> 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
> 	argc = setup_revisions(argc, argv, &revs, NULL);

Although now that I think about it the library may not be enough
of a library.  Some data (e.g. commits) will stay in memory forever
once loaded.  Pack files won't be released once read; a pack recently
made available while the application is running may not get noticed.

Perhaps there is some fast IPC API supported by Qt that you could
use to run the revision listing outside of the main UI process,
to eliminate the bottlenecks you are seeing and remove the problems
noted above?  One that doesn't involve reading from a pipe I mean...

-- 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-06 19:51       ` Shawn Pearce
@ 2006-12-06 20:08         ` Marco Costalba
  2006-12-06 20:18           ` Shawn Pearce
  2006-12-07 13:25         ` Andreas Ericsson
  1 sibling, 1 reply; 17+ messages in thread
From: Marco Costalba @ 2006-12-06 20:08 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Git Mailing List

On 12/6/06, Shawn Pearce <spearce@spearce.org> wrote:
> Shawn Pearce <spearce@spearce.org> wrote:
>
> Perhaps there is some fast IPC API supported by Qt that you could
> use to run the revision listing outside of the main UI process,
> to eliminate the bottlenecks you are seeing and remove the problems
> noted above?  One that doesn't involve reading from a pipe I mean...
>

Qt it's very fast in reading from files, also git-rev-list is fast in
write to a file...the problem is I would not want the file to be saved
on disk, but stay cached in the OS memory for the few seconds needed
to be written and read back, and then deleted. It's a kind of shared
memory at the end. But I don't know how to realize it.

Also let git-rev-list to write directly in qgit process address space
would be nice, indeed very nice.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-06 20:08         ` Marco Costalba
@ 2006-12-06 20:18           ` Shawn Pearce
  0 siblings, 0 replies; 17+ messages in thread
From: Shawn Pearce @ 2006-12-06 20:18 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Git Mailing List

Marco Costalba <mcostalba@gmail.com> wrote:
> On 12/6/06, Shawn Pearce <spearce@spearce.org> wrote:
> >Shawn Pearce <spearce@spearce.org> wrote:
> >
> >Perhaps there is some fast IPC API supported by Qt that you could
> >use to run the revision listing outside of the main UI process,
> >to eliminate the bottlenecks you are seeing and remove the problems
> >noted above?  One that doesn't involve reading from a pipe I mean...
> >
> 
> Qt it's very fast in reading from files, also git-rev-list is fast in
> write to a file...the problem is I would not want the file to be saved
> on disk, but stay cached in the OS memory for the few seconds needed
> to be written and read back, and then deleted. It's a kind of shared
> memory at the end. But I don't know how to realize it.

On a modern Linux (probably your largest target audience) a small
file which has a very short lifespan (few seconds) is unlikey to
hit the platter.  Most filesystems will put the data into buffer
cache and delay writing to disk because temporary files are so
common on UNIX.

Though our resident Linux experts may chime in with more details...

> Also let git-rev-list to write directly in qgit process address space
> would be nice, indeed very nice.

And ugly.  :-)

SysV IPC (shared memory, semaphores) are messy and difficult to
get right.  mmap against a random file in the filesystem tends
to work better on those systems which support it well, provided
that the file isn't on a network mount.  But again you still need
semaphores or something like them to control access to the data in
the mmap'd region.

I was thinking that maybe if Qt had a bounded buffer available for
use between a process and its child, that you could use that to run
your own "qgit-rev-list" child and get the data back more quickly,
without the need for a temporary file.  But it doesn't look like
they have one.  Oh well.

Your current temporary file approach is probably the best you can
get, and has the simplest possible implementation.  Doing better
would require linking against libgit.a, and getting the core Git
hackers to make at least the revision machinery more useful in a
library setting.

-- 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-06 19:51       ` Shawn Pearce
  2006-12-06 20:08         ` Marco Costalba
@ 2006-12-07 13:25         ` Andreas Ericsson
  2006-12-07 14:53           ` Johannes Schindelin
  2006-12-08 18:34           ` Marco Costalba
  1 sibling, 2 replies; 17+ messages in thread
From: Andreas Ericsson @ 2006-12-07 13:25 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Marco Costalba, Git Mailing List

Shawn Pearce wrote:
> 
> Perhaps there is some fast IPC API supported by Qt that you could
> use to run the revision listing outside of the main UI process,
> to eliminate the bottlenecks you are seeing and remove the problems
> noted above?  One that doesn't involve reading from a pipe I mean...
> 

Why not just fork() + exec() and read from the filedescriptor? You can 
up the output buffer of the forked program to something suitable, which 
means the OS will cache it for you until you copy it to a buffer in qgit 
(i.e., read from the descriptor).

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-07 13:25         ` Andreas Ericsson
@ 2006-12-07 14:53           ` Johannes Schindelin
  2006-12-07 15:28             ` Andreas Ericsson
  2006-12-08 18:34           ` Marco Costalba
  1 sibling, 1 reply; 17+ messages in thread
From: Johannes Schindelin @ 2006-12-07 14:53 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Shawn Pearce, Marco Costalba, Git Mailing List

Hi,

On Thu, 7 Dec 2006, Andreas Ericsson wrote:

> Shawn Pearce wrote:
> > 
> > Perhaps there is some fast IPC API supported by Qt that you could use 
> > to run the revision listing outside of the main UI process, to 
> > eliminate the bottlenecks you are seeing and remove the problems noted 
> > above?  One that doesn't involve reading from a pipe I mean...
> > 
> 
> Why not just fork() + exec() and read from the filedescriptor? You can 
> up the output buffer of the forked program to something suitable, which 
> means the OS will cache it for you until you copy it to a buffer in qgit 
> (i.e., read from the descriptor).

Could somebody remind me why different processes are needed? I thought 
that the revision machinery should be used directly, by linking to 
libgit.a...

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-07 14:53           ` Johannes Schindelin
@ 2006-12-07 15:28             ` Andreas Ericsson
  2006-12-07 16:01               ` Johannes Schindelin
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Ericsson @ 2006-12-07 15:28 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Shawn Pearce, Marco Costalba, Git Mailing List

Johannes Schindelin wrote:
> Hi,
> 
> On Thu, 7 Dec 2006, Andreas Ericsson wrote:
> 
>> Shawn Pearce wrote:
>>> Perhaps there is some fast IPC API supported by Qt that you could use 
>>> to run the revision listing outside of the main UI process, to 
>>> eliminate the bottlenecks you are seeing and remove the problems noted 
>>> above?  One that doesn't involve reading from a pipe I mean...
>>>
>> Why not just fork() + exec() and read from the filedescriptor? You can 
>> up the output buffer of the forked program to something suitable, which 
>> means the OS will cache it for you until you copy it to a buffer in qgit 
>> (i.e., read from the descriptor).
> 
> Could somebody remind me why different processes are needed? I thought 
> that the revision machinery should be used directly, by linking to 
> libgit.a...
> 

You wrote:
--%<--%<--%<--
Because, depending on what you do, the revision machinery is not
reentrable. For example, if you filter by filename, the history is
rewritten in-memory to simulate a history where just that filename was
tracked, and nothing else. These changes are not cleaned up after 
calling the internal revision machinery.
--%<--%<--%<--

When I wrote the above suggestion, I hadn't read the posts following the 
email where I cut this text from (where Linus said "we can add a 'reset' 
thingie to the revision walking machinery" and Marco replied with some 
more questions).

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-07 15:28             ` Andreas Ericsson
@ 2006-12-07 16:01               ` Johannes Schindelin
  0 siblings, 0 replies; 17+ messages in thread
From: Johannes Schindelin @ 2006-12-07 16:01 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Shawn Pearce, Marco Costalba, Git Mailing List

Hi,

On Thu, 7 Dec 2006, Andreas Ericsson wrote:

> Johannes Schindelin wrote:
> > Hi,
> > 
> > On Thu, 7 Dec 2006, Andreas Ericsson wrote:
> > 
> > > Shawn Pearce wrote:
> > > > Perhaps there is some fast IPC API supported by Qt that you could use to
> > > > run the revision listing outside of the main UI process, to eliminate
> > > > the bottlenecks you are seeing and remove the problems noted above?  One
> > > > that doesn't involve reading from a pipe I mean...
> > > > 
> > > Why not just fork() + exec() and read from the filedescriptor? You can up
> > > the output buffer of the forked program to something suitable, which means
> > > the OS will cache it for you until you copy it to a buffer in qgit (i.e.,
> > > read from the descriptor).
> > 
> > Could somebody remind me why different processes are needed? I thought that
> > the revision machinery should be used directly, by linking to libgit.a...
> > 
> 
> You wrote:
> --%<--%<--%<--
> Because, depending on what you do, the revision machinery is not
> reentrable. For example, if you filter by filename, the history is
> rewritten in-memory to simulate a history where just that filename was
> tracked, and nothing else. These changes are not cleaned up after calling the
> internal revision machinery.
> --%<--%<--%<--
> 
> When I wrote the above suggestion, I hadn't read the posts following the 
> email where I cut this text from (where Linus said "we can add a 'reset' 
> thingie to the revision walking machinery" and Marco replied with some 
> more questions).

Yes. The reset thingie is already in place: clear_commit_marks(). It would 
have to be enhanced a little, though:

1) the function rewrite_parents(), should add another flag, HALFORPHANED, 
   and
2) clear_commit_marks() should unset the "parsed" flag of the commits for 
   which HALFORPHANED is reset.

-- snip --
diff --git a/commit.c b/commit.c
index d5103cd..fd225c8 100644
--- a/commit.c
+++ b/commit.c
@@ -431,6 +431,10 @@ void clear_commit_marks(struct commit *commit, unsigned int mark)
 {
 	struct commit_list *parents;
 
+	/* were parents rewritten? */
+	if ((mark & commit->object.flags) & HALFORPHANED)
+		commit->object.parsed = 0;
+
 	commit->object.flags &= ~mark;
 	parents = commit->parents;
 	while (parents) {
diff --git a/revision.c b/revision.c
index 993bb66..461ee06 100644
--- a/revision.c
+++ b/revision.c
@@ -1097,6 +1097,7 @@ static void rewrite_parents(struct rev_info *revs, struct commit *commit)
 		struct commit_list *parent = *pp;
 		if (rewrite_one(revs, &parent->item) < 0) {
 			*pp = parent->next;
+			commit->object.flags |= HALFORPHANED;
 			continue;
 		}
 		pp = &parent->next;
diff --git a/revision.h b/revision.h
index 3adab95..544238c 100644
--- a/revision.h
+++ b/revision.h
@@ -9,6 +9,7 @@
 #define BOUNDARY	(1u<<5)
 #define BOUNDARY_SHOW	(1u<<6)
 #define ADDED		(1u<<7)	/* Parents already parsed and added? */
+#define HALFORPHANED	(1u<<8) /* parents were rewritten */
 
 struct rev_info;
 struct log_info;
-- snap --

Note that this is just the idea. This particular implementation opens a 
gaping memory leak, since the buffer of the commit is not free()d, and a 
reparse would probably not pick up on the fact that the parent commits are 
already in memory.

Ciao,
Dscho

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-07 13:25         ` Andreas Ericsson
  2006-12-07 14:53           ` Johannes Schindelin
@ 2006-12-08 18:34           ` Marco Costalba
  2006-12-08 20:10             ` Michael K. Edwards
  1 sibling, 1 reply; 17+ messages in thread
From: Marco Costalba @ 2006-12-08 18:34 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Shawn Pearce, Git Mailing List

On 12/7/06, Andreas Ericsson <ae@op5.se> wrote:
> Shawn Pearce wrote:
> >
> > Perhaps there is some fast IPC API supported by Qt that you could
> > use to run the revision listing outside of the main UI process,
> > to eliminate the bottlenecks you are seeing and remove the problems
> > noted above?  One that doesn't involve reading from a pipe I mean...
> >
>
> Why not just fork() + exec() and read from the filedescriptor? You can
> up the output buffer of the forked program to something suitable, which
> means the OS will cache it for you until you copy it to a buffer in qgit
> (i.e., read from the descriptor).
>

Please, what do you mean with "something suitable"? How can I redirect
the output to a memory buffer or to a file that the OS will cache
*until* I've copied it?

If I redirect to a 'normal' file, this will be flushed by OS after
some time, normally few seconds.

Could you please post links with examples/docs about this kind of
implementation?


Thanks

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-08 18:34           ` Marco Costalba
@ 2006-12-08 20:10             ` Michael K. Edwards
  2006-12-09 12:15               ` Marco Costalba
  0 siblings, 1 reply; 17+ messages in thread
From: Michael K. Edwards @ 2006-12-08 20:10 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Andreas Ericsson, Shawn Pearce, Git Mailing List

There is a very handy solution to this problem called "tmpfs".  It
should already be mounted at /tmp.  Put tmp.txt there and your problem
will go away.

Cheers,

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-08 20:10             ` Michael K. Edwards
@ 2006-12-09 12:15               ` Marco Costalba
  0 siblings, 0 replies; 17+ messages in thread
From: Marco Costalba @ 2006-12-09 12:15 UTC (permalink / raw)
  To: Michael K. Edwards; +Cc: Andreas Ericsson, Shawn Pearce, Git Mailing List

On 12/8/06, Michael K. Edwards <medwards.linux@gmail.com> wrote:
> There is a very handy solution to this problem called "tmpfs".  It
> should already be mounted at /tmp.  Put tmp.txt there and your problem
> will go away.
>

Thanks Michael,

It seems to work! patch pushed.

 Marco

P.S: I've looked again to Shawn idea (and code) of linking qgit
against libgit.a but I found these two difficult points:

- traverse_commit_list(&revs, show_commit, show_object) is blocking,
i.e. the GUI will stop responding for few seconds while traversing the
list. This is easily and transparently solved by the OS scheduler if
an external process is used for git-rev-list. To solve this in qgit I
have two ways: 1) call QEventLoop() once in a while from inside
show_commit()/ show_object() to process pending events  2) Use a
separate thread (QThread class). The first idea is not nice, the
second opens a whole a new set of problems and it's a big amount of
not trivial new code to add.

-  traverse_commit_list() having an internal state it's not
re-entrant. git-rev-list it's used to load main view data but also
file history in another tab, and the two calls _could_ be ran
concurrently. With external process I simply run two instances of
DataLoader class and consequently two external git-rev-list processes,

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-06 19:28 ` Shawn Pearce
  2006-12-06 19:34   ` Marco Costalba
@ 2006-12-06 23:27   ` Johannes Schindelin
  2006-12-07  0:47     ` Linus Torvalds
  1 sibling, 1 reply; 17+ messages in thread
From: Johannes Schindelin @ 2006-12-06 23:27 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Marco Costalba, Git Mailing List

Hi,

On Wed, 6 Dec 2006, Shawn Pearce wrote:

> Marco Costalba <mcostalba@gmail.com> wrote:
> > The time it takes to read, without processing, the whole Linux tree
> > with this approach it's almost _double_ of the time it takes 'git
> > rev-list' to write to a file:
> > 
> > 3) Other suggestions?  ;-)
> 
> The revision listing machinery is fairly well isolated behind some 
> pretty clean APIs in Git.  Why not link qgit against libgit.a and just 
> do the revision listing in process?

Because, depending on what you do, the revision machinery is not 
reentrable. For example, if you filter by filename, the history is 
rewritten in-memory to simulate a history where just that filename was 
tracked, and nothing else. These changes are not cleaned up after calling 
the internal revision machinery.

Hth,
Dscho

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-06 23:27   ` Johannes Schindelin
@ 2006-12-07  0:47     ` Linus Torvalds
  2006-12-07  6:46       ` Marco Costalba
  0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2006-12-07  0:47 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Shawn Pearce, Marco Costalba, Git Mailing List



On Thu, 7 Dec 2006, Johannes Schindelin wrote:
> 
> Because, depending on what you do, the revision machinery is not 
> reentrable. For example, if you filter by filename, the history is 
> rewritten in-memory to simulate a history where just that filename was 
> tracked, and nothing else. These changes are not cleaned up after calling 
> the internal revision machinery.

Well, it really wouldn't be that hard to add a new library interface to 
"reset object state". We could fairly trivially either:

 - walk all objects in the object hashes and clear all the flags.
 - just clear all objects _and_ the hashes.

Yes, it implies a small amount of manual "management", but considering 
that the reason it needs to be manual is that the functions simply _need_ 
the state, is that such a big deal?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fast access git-rev-list output: some OS knowledge required
  2006-12-07  0:47     ` Linus Torvalds
@ 2006-12-07  6:46       ` Marco Costalba
  0 siblings, 0 replies; 17+ messages in thread
From: Marco Costalba @ 2006-12-07  6:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, Shawn Pearce, Git Mailing List

On 12/7/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Thu, 7 Dec 2006, Johannes Schindelin wrote:
> >
> > Because, depending on what you do, the revision machinery is not
> > reentrable. For example, if you filter by filename, the history is
> > rewritten in-memory to simulate a history where just that filename was
> > tracked, and nothing else. These changes are not cleaned up after calling
> > the internal revision machinery.
>
> Well, it really wouldn't be that hard to add a new library interface to
> "reset object state". We could fairly trivially either:
>

So the library approach sounds like the best?

Of course in this case the producer git-rev-list and the receiver use
the same address space.

In the case of a temporary file data is first copied to OS disk cache
buffers and then again to userspace, in qgit address space. But the
real pain is that the temporary file is always flushed to disk after
4-5 seconds from creation, also if under heavy read/write activity.
This is a problem for big repos. I really don't know how to workaround
this useless disk flush.

Finally, what about using some kind of shared memory at run time,
instead of _sharing_ developer libraries ;-) ? is it too messy?

Probably the concurrent reading while writing is possible without
syncro if the reader understands that a sequence of _two_ or more \0
it means the end of current write stream if producer is still running
or the end of data if producer is not running anymore. I use a similar
approach in the 'temporary file' patch where receiver is able to read
while producer writes without explicit synchronization. In that case a
read() of a block smaller then maximum with producer still running is
used as the 'break' condition in the receiver while loop.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2006-12-09 12:15 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-06 19:24 Fast access git-rev-list output: some OS knowledge required Marco Costalba
2006-12-06 19:28 ` Shawn Pearce
2006-12-06 19:34   ` Marco Costalba
2006-12-06 19:42     ` Shawn Pearce
2006-12-06 19:51       ` Shawn Pearce
2006-12-06 20:08         ` Marco Costalba
2006-12-06 20:18           ` Shawn Pearce
2006-12-07 13:25         ` Andreas Ericsson
2006-12-07 14:53           ` Johannes Schindelin
2006-12-07 15:28             ` Andreas Ericsson
2006-12-07 16:01               ` Johannes Schindelin
2006-12-08 18:34           ` Marco Costalba
2006-12-08 20:10             ` Michael K. Edwards
2006-12-09 12:15               ` Marco Costalba
2006-12-06 23:27   ` Johannes Schindelin
2006-12-07  0:47     ` Linus Torvalds
2006-12-07  6:46       ` Marco Costalba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).