git.vger.kernel.org archive mirror
* server-info dumbing-down
@ 2005-12-07 21:58 Petr Baudis
  2005-12-07 22:45 ` Junio C Hamano
  0 siblings, 1 reply; 2+ messages in thread
From: Petr Baudis @ 2005-12-07 21:58 UTC (permalink / raw)
  To: git

  Hello,

  I've noticed a few commits from Dec 4 landing in the git repository,
which remove various computations and corresponding lines from the
server info (3e15c67 and a few ancestors). I'm curious about this - were
the computations really that expensive? If not, wouldn't it be better
to leave them in for future use (since they don't cost a lot) rather
than making the future deployment of anything using this data much
harder, since the server infos won't have it anymore?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.


* Re: server-info dumbing-down
  2005-12-07 21:58 server-info dumbing-down Petr Baudis
@ 2005-12-07 22:45 ` Junio C Hamano
  0 siblings, 0 replies; 2+ messages in thread
From: Junio C Hamano @ 2005-12-07 22:45 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

Petr Baudis <pasky@suse.cz> writes:

>   I've noticed a few commits from Dec 4 landing in the git repository,
> which remove various computations and corresponding lines from the
> server info (3e15c67 and a few ancestors). I'm curious about this - were
> the computations really that expensive? If not, wouldn't it be better
> to leave them in for future use (since they don't cost a lot) rather
> than making the future deployment of anything using this data much
> harder, since the server infos won't have it anymore?

T and D lines were expensive.  Very expensive.

They are not used by existing Porcelains.

It is doubtful whether those lines are useful.

The information those lines were attempting to give Porcelains
was designed way before dumb-transport clients were completed,
and it was pure guesswork what _might_ be needed by them.

For example, I did not foresee that dumb-transport clients would
grab all the .idx files to see which packs are needed themselves
without consulting T lines (which turns out to be the right
thing to do anyway), and once they have the .idx file, clients
can do better computation themselves to pick which pack is the
best one to fetch without help from the T and D lines.

For example, if we implement the "staggered overlapping packs"
you suggested, the clients will face a choice when walking the
commit chain.  Two packs may give the object currently being
sought.  Which one to pick?  One strategy would be to pick
the one that contains the fewest objects we already have.
Another would be to pick the one that contains the most
objects we do not have yet.  This can be done with only the
.idx files, and you need to have the .idx files for both of
them to realize that you have a choice to begin with.
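
To make the two strategies concrete, here is a minimal sketch in
Python.  It is not taken from any existing client.  It assumes the
object names listed in each .idx file have already been loaded into
a set, and the function and pack names are hypothetical, used only
for illustration.

```python
from typing import Dict, List, Optional, Set


def candidate_packs(wanted: str, pack_objects: Dict[str, Set[str]]) -> List[str]:
    """Names of packs whose .idx lists the wanted object (sorted for stable ties)."""
    return [name for name in sorted(pack_objects) if wanted in pack_objects[name]]


def pick_fewest_already_have(
    wanted: str, pack_objects: Dict[str, Set[str]], have: Set[str]
) -> Optional[str]:
    """Strategy 1: pick the pack that overlaps least with what we already have."""
    candidates = candidate_packs(wanted, pack_objects)
    if not candidates:
        return None
    return min(candidates, key=lambda name: len(pack_objects[name] & have))


def pick_most_missing(
    wanted: str, pack_objects: Dict[str, Set[str]], have: Set[str]
) -> Optional[str]:
    """Strategy 2: pick the pack that supplies the most objects we still lack."""
    candidates = candidate_packs(wanted, pack_objects)
    if not candidates:
        return None
    return max(candidates, key=lambda name: len(pack_objects[name] - have))


if __name__ == "__main__":
    # Hypothetical data: object names as they would be read from two .idx files.
    packs = {
        "pack-a": {"o3", "h1"},
        "pack-b": {"o3", "h1", "h2", "n1", "n2", "n3"},
    }
    have = {"h1", "h2"}
    print(pick_fewest_already_have("o3", packs, have))
    print(pick_most_missing("o3", packs, have))
```

The two strategies can disagree, as they do on the sample data
above, and neither needs anything beyond the contents of the .idx
files.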

IIRC, in your "staggered packs" approach, some recent objects
are left unpacked and also in the latest pack.  Clients that
have all the objects in the latest pack are better off walking
individual commits, while other clients that are way behind are
better off fetching the pack.  To help them, we would need to
describe the object database differently from the way
objects/info/packs attempted with those T and D lines.  We need
to say "if you do not have these objects, do not walk individual
commits beyond this commit, even though they are available as
loose objects, because you are better off grabbing this pack
instead".
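
As a rough illustration of what such a description could look like
to a client, here is a sketch in Python.  Nothing like this exists
in server-info; the PackHint record, its field names and the
next_step function are all hypothetical.  It also assumes that "do
not have these objects" means lacking at least one of them.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class PackHint:
    """Hypothetical hint: stop walking at boundary_commit and fetch pack_name
    instead, unless the client already has all of marker_objects."""
    boundary_commit: str
    pack_name: str
    marker_objects: FrozenSet[str]


def next_step(
    commit: str, have: Set[str], hints: Iterable[PackHint]
) -> Tuple[str, str]:
    """Decide whether to keep walking loose objects or to grab a pack."""
    for hint in hints:
        lacks_markers = not hint.marker_objects <= have
        if hint.boundary_commit == commit and lacks_markers:
            # The client is far behind: the pack is the better deal.
            return ("fetch-pack", hint.pack_name)
    # Either no hint applies here or the client is nearly up to date.
    return ("walk", commit)


if __name__ == "__main__":
    hints = [PackHint("c5", "pack-1", frozenset({"x", "y"}))]
    print(next_step("c5", {"x"}, hints))       # client is missing "y"
    print(next_step("c5", {"x", "y"}, hints))  # client has every marker object
```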

For these reasons, I feel that the whole thing should be
stripped down first.  The improvements to the dumb-transport
clients may need additional information to be computed by
server-info, but it is far more likely than not that this
additional information would be quite different from what the T
and D lines were giving them.

The repacking strategy, the repacking program to prepare the
repository to be helpful to dumb-transport clients, the logic in
the clients to take advantage of that repacking strategy, and
the additional information server-info supplies to help that
happen, need to be designed together, and in this order.  The
old T/D lines were developed in the wrong order: we did not
know what the best repacking strategy was (and I suspect we
still don't), and these lines were written without knowing
whether they would be useful.
