* sync() slowdown
@ 2008-05-26 14:26 Sebastien Gross
  2008-05-26 16:06 ` Kenneth P. Turvey
  2008-05-26 16:09 ` Matthieu Moy
  0 siblings, 2 replies; 4+ messages in thread
From: Sebastien Gross @ 2008-05-26 14:26 UTC (permalink / raw)
  To: git

Hi git users,

I use git (in a very basic way) every day, and I noticed a big slowdown
when I run "git repack -a -d".

I noticed that it only happens while I am doing a backup to a USB stick.

After some investigation, I found that sync() is called when repacking
objects (from both builtin-prune.c and builtin-prune-packed.c).

I do understand that syncing the filesystem is useful and needed.

But would it be a good idea to add a --no-sync option to prevent that
behaviour?

I think this might be useful if you repack many repositories.
If you call the sync command before looping the repacks I guess this
could be equivalent (modulo changes done in repositories during that
time).

Any ideas or suggestions?


Thanks a lot

Cheers.

-- 
Sébastien Gross

* Re: sync() slowdown
  2008-05-26 14:26 sync() slowdown Sebastien Gross
@ 2008-05-26 16:06 ` Kenneth P. Turvey
  2008-05-26 16:32   ` Sebastien Gross
  2008-05-26 16:09 ` Matthieu Moy
  1 sibling, 1 reply; 4+ messages in thread
From: Kenneth P. Turvey @ 2008-05-26 16:06 UTC (permalink / raw)
  To: git

On Mon, 26 May 2008 16:26:07 +0200, Sebastien Gross wrote:

> I do understand that syncing the filesystem is useful and needed.
> 
> But would it be a good idea to add a --no-sync option to prevent that
> behaviour?

Just a user here, but I would prefer it if it didn't sync at all.  If I 
want to sync it, I will, or the operating system will handle it like it 
does with all other file accesses.  

Just my 2 cents. 

I was just editing my backup script the other day and part of the problem 
with it was that I was syncing too often.  What I needed was a single 
sync when everything was done.  

This was possible because I was just doing copies and tars.  It should be 
possible with git too.  

-- 
Kenneth P. Turvey <kt-usenet@squeakydolphin.com>
http://www.electricsenator.net

  There are two major products that come out of Berkeley: LSD and UNIX.
  We don't believe this to be a coincidence.
        -- Jeremy S. Anderson

* Re: sync() slowdown
  2008-05-26 14:26 sync() slowdown Sebastien Gross
  2008-05-26 16:06 ` Kenneth P. Turvey
@ 2008-05-26 16:09 ` Matthieu Moy
  1 sibling, 0 replies; 4+ messages in thread
From: Matthieu Moy @ 2008-05-26 16:09 UTC (permalink / raw)
  To: Sebastien Gross; +Cc: git

Sebastien Gross <seb-git@chezwam.org> writes:

> I think this might be useful if you repack many repositories.
> If you call the sync command before looping the repacks I guess this
> could be equivalent (modulo changes done in repositories during that
> time).

I suppose git-repack does something like

write_new_data();
sync();
delete_old_data();

And if you remove the "sync" and your system crashes (or you eject
your USB key, or ...) after "delete_old_data" has been done, but
before "write_new_data" has been synced to the disk, you're in
trouble.
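
The ordering described above can be sketched in C. This is only an
illustration, not git's code: replace_after_flush() and its arguments are
hypothetical names, and it uses fsync() on the one new file instead of the
global sync() that git calls, so it shows the write / flush / delete order
rather than what git-repack actually does.

```c
#define _XOPEN_SOURCE 700 /* expose POSIX declarations such as fsync() */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not git code): write `data` to `new_path`, force it
 * to stable storage, and only then remove `old_path`.
 * Returns 0 on success, -1 on error. On a write or flush error the old
 * file is left untouched.
 */
int replace_after_flush(const char *new_path, const char *old_path,
                        const char *data)
{
    assert(new_path && old_path && data);

    int fd = open(new_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write_new_data(): loop because write() may be partial. */
    size_t len = strlen(data);
    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Flush step: the new data must reach the device before any delete. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;

    /* delete_old_data(): only reached once the new copy is flushed. */
    return unlink(old_path);
}
```

If the flush step were dropped, a crash right after unlink() could leave
neither the old file nor a complete new one on the device, which is the
failure described above.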

If you repack many repositories, I guess the first sync is expensive,
but the next ones pay only for what they just wrote.

My 2 cents,

-- 
Matthieu

* Re: sync() slowdown
  2008-05-26 16:06 ` Kenneth P. Turvey
@ 2008-05-26 16:32   ` Sebastien Gross
  0 siblings, 0 replies; 4+ messages in thread
From: Sebastien Gross @ 2008-05-26 16:32 UTC (permalink / raw)
  To: git

On Mon, May 26, 2008 at 04:06:41PM +0000, kt-usenet@squeakydolphin.com wrote:
> On Mon, 26 May 2008 16:26:07 +0200, Sebastien Gross wrote:
> 
> > I do understand that syncing the filesystem is useful and needed.
> > 
> > But would it be a good idea to add a --no-sync option to prevent that
> > behaviour?
> 
> Just a user here, but I would prefer it if it didn't sync at all.  If I 
> want to sync it, I will, or the operating system will handle it like it 
> does with all other file accesses.  

Well, I guess I left something out of my explanation.

I do my backups to a USB stick (mounted somewhere like /media/usb0) and I
work in git directories (somewhere under /srv/git-repo). Obviously these
two mount points are on different physical devices.

Normally the system would sync the cache to the storage media when
needed.
But git (both the prune and prune-packed commands) calls the sync()
function before pruning objects and packs:

builtin-prune-packed.c:

int cmd_prune_packed(int argc, const char **argv, const char *prefix)
{
  ...
  sync();
  prune_packed_objects(opts);
  return 0;
}

The code is exactly the same in builtin-prune.c.

Calling sync() is a good way to be sure that no unsaved data remains in
RAM, so that everything is included in the packs.

This must remain the default behaviour.

But in some cases sync() also acts on USB storage (which is my
case), and that is very slow.

I repack a lot of repositories with something like:
for d in *.git; do (cd "$d" && git repack -a -d); done

If I am using the USB stick for a backup at the same time, its cached
data keeps changing, so every sync() call has to flush a freshly changed
cache to the stick.

That's why I suggested adding a --no-sync option to bypass the sync()
call.
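
A minimal sketch of what such an option could look like, assuming a plain
scan of argv. Everything here is hypothetical: --no-sync is not an existing
git flag, and want_sync(), prune_stub() and cmd_prune_packed_sketch() are
illustration names, with prune_stub() standing in for the real
prune_packed_objects().

```c
#define _XOPEN_SOURCE 700 /* expose the XSI declaration of sync() */
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical option check: returns 0 if "--no-sync" appears among the
 * arguments, 1 otherwise. Syncing stays the default.
 */
int want_sync(int argc, const char **argv)
{
    assert(argc == 0 || argv);
    for (int i = 1; i < argc; i++)
        if (!strcmp(argv[i], "--no-sync"))
            return 0;
    return 1;
}

/* Stand-in for prune_packed_objects(); the real function lives in git. */
static void prune_stub(void)
{
}

/* Sketch of the command entry point with the proposed bypass. */
int cmd_prune_packed_sketch(int argc, const char **argv)
{
    if (want_sync(argc, argv))
        sync(); /* default: flush all filesystems before pruning */
    prune_stub();
    return 0;
}
```

With this shape the default path is unchanged, and only a caller that
passes the flag explicitly skips the flush.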

In any case, this would be a dangerous option that should not be used
unless you know what you are doing.

Cheers


-- 
Sébastien Gross
