policy and mechanism for less-connected clients

git.vger.kernel.org archive mirror

* policy and mechanism for less-connected clients
@ 2008-06-25  0:36 David Jeske
  0 siblings, 0 replies; 28+ messages in thread
From: David Jeske @ 2008-06-25  0:36 UTC (permalink / raw)
  To: git

I'd like to hear feedback and ideas about a different mechanism than is being
used with git-pull or git-push.

The purpose of this mechanism is to host a distributed source repository in a
world where most developer contributors are behind firewalls and do not
have access to, or do not want to configure a unix server, ftp, or ssh to
possibly contribute to a project. The model of allowing less-authoritative
developers to make their changes available for more-authoritative users to pull
is accepted as superior. However, no users are assumed to be authoritative over
each-other, or an entire tree, and many users should have authority only to
supply new deltas to their own branches. The ability to handle emailed patches
is an asset, but is deemed too manual for this need.

I believe git's design is strong; that many of the mechanisms are already
built; that new mechanisms to build this can be simple; and that with such
mechanisms, many more developers would have access to git's decentralized
development style. Further, it would address drawbacks in today's git relative
to public central version control systems, making this system closer to a 'best
of both worlds'.

design assumptions:

- all developers are firewalled and can not be "pulled" from directly.
- there can be one or more well-connected servers which all users can access.
- .. but which they cannot have ssh, ftp, or other dangerous access to
- .. and whose protocol should be layered on http(s)
- there is a shared namespace for branches, and tags
- .. users are not-trusted to change the branches or tags of other users
- .. only certain users are trusted to change the shared origin branches
- .. also allow directory ACLs on shared branch commits
- all their DAGs should be in a single repository for space efficiency
- users generally want to follow well-named branches
- .. will be free to follow any branch, and pull changes from any branch

I would like to make it easy for users to:

(a) safely "share" every DAG, branch, and tag data in their repository to a
well-connected server, into an established namespace, while only changing
branches and tags in their namespace. This will allow all users to see the
changes of other users, without needing direct access to their trees (which are
inaccessible behind firewalls). [1]

(b) fetch selected DAG, branch, and tag data of others to their tree, to see
the changes of others (whether merged with head or not) while disconnected or
remote.

(c) grant and enforce permission for certain users to submit _merges only_ onto
certain sub-portions of the "well-named branches"

There are many many benefits of git's mechanisms for this topology, and I
expect you know them so I'll skip them. I see the following challenges from the
current git implementation. Please tell me where I'm mistaken.. this is all
AFAIK, some from tests, some from docs.

(1) A server will need to support the required permissions and isolation
enforcement. Namely, permissions for portions of the branch/tag namespace,
assurance that DAGs are valid, and directory permissions.

(2) a "share" client command will need to be implemented which transmits-up
local changes to only my DAGs, branches, and tags without affecting the shared
origin namespace pointers on the server. It will share all these changes
regardless of what the user's "active" branch is. Local branches might be
mapped to a branch on the server such as origin/users/(username)/(branchname).
Branches which are supposed to stay local might be named "local/branch", and be
ignored by the "share". [1]

(3) a mechanism for controlling permissions (possibly based on checking out and
editing a special subtree, ala cvs)

(4) A mechanism to be sure "share" does not cause users who have permission to
inadvertently move the origin/master branch pointer, even if they are working
on their local master branch. For example, their changes would be named by
origin/users/(username)/master. This is necessary because "share"  is the only
way for the firewalled user to make their changes available to others. As a
result, it is imperative that this be separate from a decision to promote their
changes onto the shared origin branch. Currently git-push implies both of these
together. git-share would be to git-push what git-fetch is to git-pull. git-push
would continue to be used to tell the system you wish to promote your change to
origin/master. [2]
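
To make (2) and (4) a little more concrete, here is a minimal sketch of how a
"share" wrapper might map branch names. It is only an illustration: the helper
name share_refspecs and the user name jeske are made up, and it assumes the
server accepts ordinary push refspecs into a users/ namespace.

```shell
#!/bin/sh
# Hypothetical helper: turn local branch names (one per line on stdin) into
# push refspecs under refs/heads/users/<user>/ on the server.
# Branches named local/* are skipped so they never leave the client.
share_refspecs() {
    user=$1
    while IFS= read -r branch; do
        case $branch in
            local/*) continue ;;    # private by convention, never shared
        esac
        printf 'refs/heads/%s:refs/heads/users/%s/%s\n' "$branch" "$user" "$branch"
    done
}

# Possible use (assumes a remote named "origin"):
#   specs=$(git for-each-ref --format='%(refname:short)' refs/heads |
#           share_refspecs jeske)
#   [ -n "$specs" ] && git push origin $specs
```

Because a local master is mapped to users/jeske/master, sharing in this sketch
never touches the shared master pointer; promoting a change stays a separate,
deliberate push.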

[1] - "share" permissions can be considered two ways. In the strictly client
server model, the server will only allow the client to change branch pointers
that it owns in the namespace. However, if clients establish their own PGP-keys
or other hash-identity keys with the server, then branch changes may be signed
by clients, and propagate between clients in any direction and order, until
they fully propagate. It's not clear if this additional complexity is worth it.

[2] - it might be reasonable to build a mechanism to allow a local "intent to
promote" to precede a git-share, in which case git-share could safely
fast-forward the head. However, it's unclear what benefit this has over
git-fetch.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
@ 2008-06-25  2:33 Theodore Tso
       [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6@3N@FEDjCXZO>
  0 siblings, 1 reply; 28+ messages in thread
From: Theodore Tso @ 2008-06-25  2:33 UTC (permalink / raw)
  To: David Jeske; +Cc: git

On Wed, Jun 25, 2008 at 12:36:03AM -0000, David Jeske wrote:
> The purpose of this mechanism is to host a distributed source
> repository in a world where most developer contributors are
> behind firewalls and do not have access to, or do not want to
> configure a unix server, ftp, or ssh to possibly contribute to a
> project. 

> design assumptions:
> 
> - all developers are firewalled and can not be "pulled" from directly.
> - there can be one or more well-connected servers which all users can access.
> - .. but which they cannot have ssh, ftp, or other dangerous access to
> - .. and whose protocol should be layered on http(s)
> - there is a shared namespace for branches, and tags
> - .. users are not-trusted to change the branches or tags of other users

Up to here, you can do this all with repo.or.cz, and/or github; you
just give each developer their own repository, which they are allowed
to push to, and no one else.  Within their own repository they can
make changes to their branches, so that all works just fine.

> (a) safely "share" every DAG, branch, and tag data in their
> repository to a well-connected server, into an established
> namespace, while only changing branches and tags in their
> namespace. This will allow all users to see the changes of other
> users, without needing direct access to their trees (which are
> inaccessible behind firewalls). [1]

Right, so that's github and/or git.or.cz.  Each user gets his/her own
repository, but that's a very minor change.  Not a big deal.

> (b) fetch selected DAG, branch, and tag data of others to their tree, to see
> the changes of others (whether merged with head or not) while disconnected or
> remote.

This is also easy; you just establish remote tracking branches.  I
have a single shell scripted command, git-get-all, which pulls from
all of the repositories I am interested in into various remote
tracking branches so while I am disconnected, I can see what other
folks have done on their trees.

> (c) grant and enforce permission for certain users to submit _merges
> only_ onto certain sub-portions of the "well-named branches"

This is the weird one.  *** Why ***?  There is nothing magical about
merges; all a merge is a commit that contains more than one parent.
You can put anything into a merge, and in theory the result of a merge
could have nothing to do with either parent.  It would be a very
perverse merge, but it's certainly possible.  So what's the point of
trying to enforce rules about "merges only"?

					- Ted

* Re: policy and mechanism for less-connected clients
       [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6@3N@FEDjCXZO>
  2008-06-25  5:20   ` David Jeske
@ 2008-06-25  5:20   ` David Jeske
  2008-06-25  9:30     ` Jakub Narebski
  1 sibling, 1 reply; 28+ messages in thread
From: David Jeske @ 2008-06-25  5:20 UTC (permalink / raw)
  To: Theodore Tso; +Cc: git

-- Theodore Tso wrote:
> Up to here, you can do this all with repo.or.cz, and/or github; you
> just give each developer their own repository, which they are allowed
> to push to, and no one else. Within their own repository they can
> make changes to their branches, so that all works just fine.

Yup. That's one of the reasons git is so attractive. There is some good stuff
under "here" though....

> > (a) safely "share" every DAG, branch, and tag data in their
> > repository to a well-connected server, into an established
> > namespace, while only changing branches and tags in their
> > namespace. This will allow all users to see the changes of other
> > users, without needing direct access to their trees (which are
> > inaccessible behind firewalls). [1]
>
> Right, so that's github and/or git.or.cz. Each user gets his/her own
> repository, but that's a very minor change. Not a big deal.

...most notably, all their DAGs in a single repository to save space is
important. Thousands of copies of thousands of repositories adds up. Especially
when most of the users who want to commit something probably commit <1-10k of
unique stuff. Seems pretty easy to change though. git.or.cz and github will
both be wanting this eventually.

The other big one is ACLs in 'well named' repositories, so multiple people can
safely be allowed to add changes to them, without giving them ability to blow
away the repository. I can see this isn't the way all git users work, but at
least a few users are working this way now with shared push repositories. This is
just making it 'safer'. Also seems pretty easy to do.
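
As a rough sketch of the kind of check I mean, something like the following
could sit behind a server-side "update" hook, which git runs once per pushed
ref. The function name, the users/ layout, and the way the server learns who
is pushing are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical policy check: may <user> update <ref>?
# Arguments: user name, full ref name, space-separated list of maintainers.
ref_allowed() {
    user=$1 ref=$2 maintainers=$3
    case $ref in
        refs/heads/users/"$user"/*|refs/tags/users/"$user"/*)
            return 0 ;;              # own namespace: always allowed
        refs/heads/users/*|refs/tags/users/*)
            return 1 ;;              # someone else's namespace: never
    esac
    for m in $maintainers; do        # shared branches: trusted users only
        [ "$m" = "$user" ] && return 0
    done
    return 1
}

# Possible use inside an update hook ($1 is the ref name; $PUSH_USER is an
# assumed variable that the server front end would have to provide):
#   ref_allowed "$PUSH_USER" "$1" "tytso" || exit 1
```

This covers only the namespace part of the ACLs; directory-level permissions
would need to inspect the changed paths as well.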

> > (b) fetch selected DAG, branch, and tag data of others to their tree, to see
> > the changes of others (whether merged with head or not) while disconnected or
> > remote.
>
> This is also easy; you just establish remote tracking branches. I
> have a single shell scripted command, git-get-all, which pulls from
> all of the repositories I am interested in into various remote
> tracking branches so while I am disconnected, I can see what other
> folks have done on their trees.

Yes, so I'd have the same thing, except instead of a remote repository, it
would be a pattern of the branch namespace, such as /origin/users/jeske/*. It
doesn't seem like the current remote tracking branch stuff can do this, but it
would be easy to provide a client wrapper that would. Users who tracked the
whole repository would just get everything, which is also fine. Maybe a client
patch to make this better would be accepted.
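
That said, a wildcard fetch refspec may already get most of the way there. A
sketch, assuming user branches live under refs/heads/users/<user>/ on the
shared server; the helper name is made up and I have not tried this against a
real setup like the one described above.

```shell
#!/bin/sh
# Hypothetical helper: print a fetch refspec that tracks only one user's
# branches from the shared repository.
# Arguments: remote name, user name.
user_fetch_refspec() {
    remote=$1 user=$2
    printf '+refs/heads/users/%s/*:refs/remotes/%s/users/%s/*\n' \
        "$user" "$remote" "$user"
}

# Possible use:
#   git config --add remote.origin.fetch "$(user_fetch_refspec origin jeske)"
#   git fetch origin
```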

> > (c) grant and enforce permission for certain users to submit _merges
> > only_ onto certain sub-portions of the "well-named branches"
>
> This is the weird one. *** Why ***? There is nothing magical about
> merges; all a merge is a commit that contains more than one parent.
> You can put anything into a merge, and in theory the result of a merge
> could have nothing to do with either parent. It would be a very
> perverse merge, but it's certainly possible. So what's the point of
> trying to enforce rules about "merges only"?

I'll explain why I wrote this, but I admit it's a strange roundabout way to get
what I was hoping for. I hope there is a better way. One better way is to just
change the client, but I was hoping not to have to do that. Let me explain.

Think about using CVS. user does "cvs up; hack hack hack; cvs commit (to
server)". In git, this workflow is "git pull; hack; commit; hack; commit; git
push (to server)". I want those interim "commits" to share the changes with the
server. I want to change this to "git pull; hack; commit-and-share; hack;
commit-and-share; git-push (to shared branch tag)"

It would be nice if "commit-and-share" could just use "git-push". However,
because users are going to do this habitually every commit, probably through a
script or merged command, I didn't want users who are accidentally working
directly in the master to accidentally fast-forward origin/master. (everyone
seems to discourage working on master anyhow). I was hoping to enforce this
only with server policy, so any git client works. That leaves me with the
challenge of figuring out which commits on origin/master are actually intended
to move the pointer, and which are accidents because someone forgot to branch
before hacking in their client. One simple way to do this is to require any
origin/master commit to have two parents, one on the master, one somewhere
else. If you have a commit that is directly hanging off of master in this
design, you are doing the wrong thing. The server would tell you to "git
checkout master; git branch mymaster; git reset origin/master; git push".
This would put their local changes onto their private branch where they should
be. When they wanted to do the equivalent of "cvs commit;" or current "git
push;", they would do a merge to the master, and push again. The server would
allow it, because it sees the merge.
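
The server-side test could be as small as counting the parents of the pushed
tip. A sketch, assuming the check runs in a hook that is handed the new commit
id; count_parents and is_merge are made-up names.

```shell
#!/bin/sh
# Reads one line in the format printed by
#   git rev-list --parents -n 1 <commit>
# that is, "<commit> <parent1> <parent2> ...", and prints the parent count.
count_parents() {
    read -r commit parents || return 1
    set -- $parents          # split the parent ids into positional parameters
    echo $#
}

# Hypothetical policy: succeed only if the commit on stdin is a merge.
is_merge() {
    [ "$(count_parents)" -ge 2 ]
}

# Possible use inside a hook, with $newrev supplied by git:
#   git rev-list --parents -n 1 "$newrev" | is_merge || exit 1
```

This only looks at the shape of the tip commit, so it would catch the
"forgot to branch" accident and nothing more.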

I recognize this is a bit strange. I'd love to have a better solution, but this
is the solution I can think of which only involves server enforcement. Other
solutions I thought of would all require client changes that would change
everyone's behavior. The candidate I liked best was: disallowing changes to
tracking branches, including master, probably by implicitly creating a branch
on commit to a tracking branch... However, I don't get the impression this will
fit into current git very well, because users would need to turn their current
"git push", into a "git merge master;git push"

I'm interested in other ideas to address this.

I know that all of what I wrote above seems strange if you don't buy into the
design assumptions. That it's critical to share a single server-repository,
that it's critical to have a shared 'well known' branch that only trusts
clients to add new changes to, etc.. However, these are important.

* Re: policy and mechanism for less-connected clients
  2008-06-25  5:20   ` David Jeske
@ 2008-06-25  9:30     ` Jakub Narebski
  0 siblings, 0 replies; 28+ messages in thread
From: Jakub Narebski @ 2008-06-25  9:30 UTC (permalink / raw)
  To: David Jeske; +Cc: Theodore Tso, git

"David Jeske" <jeske@willowmail.com> writes:
> -- Theodore Tso wrote:
> > [...]
> > > 
> > > (a) safely "share" every DAG, branch, and tag data in their
> > > repository to a well-connected server, into an established
> > > namespace, while only changing branches and tags in their
> > > namespace. This will allow all users to see the changes of other
> > > users, without needing direct access to their trees (which are
> > > inaccessible behind firewalls). [1]
> >
> > Right, so that's github and/or git.or.cz. Each user gets his/her own
> > repository, but that's a very minor change. Not a big deal.
> 
> ...most notably, all their DAGs in a single repository to save space
> is important. Thousands of copies of thousands of repositories adds
> up. Especially when most of the users who want to commit something
> probably commit <1-10k of unique stuff. Seems pretty easy to change
> though. git.or.cz and github will both be wanting this eventually.

repo.or.cz has support for forks, i.e. sharing object database (for
old objects) via alternates, although it is not "common object
database" (as in, for example, $GIT_DIR/objects symlinked to single
common parent repository)

GitHub has also some support for "forks", but as it is closed source I
don't think anybody knows how it is done.

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: policy and mechanism for less-connected clients
@ 2008-06-25 13:34 Theodore Tso
  2008-06-25 17:34 ` Junio C Hamano
       [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6OB5yFEDjCYe3>
  0 siblings, 2 replies; 28+ messages in thread
From: Theodore Tso @ 2008-06-25 13:34 UTC (permalink / raw)
  To: David Jeske; +Cc: git

On Wed, Jun 25, 2008 at 05:20:49AM -0000, David Jeske wrote:
> The other big one is ACLs in 'well named' repositories, so multiple
> people can safely be allowed to add changes to them, without giving
> them ability to blow away the repository. I can see this isn't the
> way all git users work, but at least a few users working this way
> now with shared push repositories. This is just making it
> 'safer'. Also seems pretty easy to do.

So this isn't true security, since someone determined (or an ingenious
enough fool) can always blow away the repository if you allow them to add
changes; they could just add a change which rm's all of the files,
yes?  You just want to prevent something stupid.

Well, as long as they don't do non-fast forward updates (i.e., they
never do something like: "git push publish +head:head", or any other
incantation involving a leading '+' in the refspec), they should be
pretty safe.  I don't see how they would do any damage just due to
user confusion.  So I think git is pretty safe as-is.
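
If you wanted an extra guard against that, a push wrapper could simply refuse
any refspec with a leading '+'.  A sketch only; the function name is made up,
and it looks at refspecs alone, not at options such as --force:

```shell
#!/bin/sh
# Hypothetical guard for a push wrapper: returns 0 if any of the given
# refspecs starts with '+', i.e. asks for a forced, non-fast-forward update.
has_forced_refspec() {
    for spec in "$@"; do
        case $spec in
            +*) return 0 ;;
        esac
    done
    return 1
}

# Possible use in a wrapper script that takes refspecs as its arguments:
#   has_forced_refspec "$@" && { echo "refusing forced push" >&2; exit 1; }
#   git push publish "$@"
```

A client-side check like this only protects against accidents; on the server
side, the receive.denyNonFastForwards setting is the sturdier option where it
is available.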

> > This is also easy; you just establish remote tracking branches. I
> > have a single shell scripted command, git-get-all, which pulls from
> > all of the repositories I am interested in into various remote
> > tracking branches so while I am disconnected, I can see what other
> > folks have done on their trees.
> 
> Yes, so I'd have the same thing, except instead of a remote
> repository, it would be a pattern of the branch namespace, such as
> /origin/users/jeske/*.

And the advantage of using branch namespaces instead of separate
remote repositories is.... ?  I don't see any....

> Think about using CVS. user does "cvs up; hack hack hack; cvs commit
> (to server)". In git, this workflow is "git pull; hack; commit;
> hack; commit; git push (to server)". I want those interim "commits"
> to share the changes with the server. I want to change this to "git
> pull; hack; commit-and-share; hack; commit-and-share; git-push (to
> shared branch tag)"

OK, so *why* is it a good idea to ask people to share their
in-progress work?  What's the upside?  Maybe if the idea is as backup
if people are working from their laptops, and they're about to travel
internationally or some such, but in general, sharing in-progress work
is highly overrated.

The other thing is in your design assumption is that remote
repositories are somehow expensive, when in fact they are very cheap;
use either repo.or.cz or github; they support repo sharing so there
isn't a major cost to letting each developer have their own repository
to push to.

So the way I would do things is to simply encourage people to start
their work by branching off of an up-to-date master branch, but *not*
do any git pulls or git pushes.  They can use git commit as necessary
to save interim work, and they do all of this work on a private
branch.  When they are done doing their work, they should review the
git commit points and make sure they make sense; in some cases they
may be better off squashing the commits down to a single commit, or
possibly refactoring their work so that each individual commit is
free-standing, so that their series of commits is git-bisectable
(i.e., after each commit the tree will fully compile and fully pass
the project regression test suite).

Once they have done *that*, they make sure the master branch has been
fully updated, and then do a git-rebase on their feature branch so
that it is up-to-date with respect to master, and then they do a full
build and regression test.  Then they switch back to the master
branch, and do a "git push publish" --- where <publish> is defined in
.git/config to be something like this:

[remote "publish"]
	url = ssh://master.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git
	push = refs/heads/master:refs/heads/master

This will *only* push the master branch (and not any of the feature
branches), and it will not allow non-fast-forward updates.  Hence, if
the user screwed up and accidentally made changes to the master branch
(say, an accidental git-rebase while on the master branch, or
something else bone-headed), the git push will fail.  This gives you
the safety you desire about not accidentally screwing up the master branch.
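
A minimal sketch of the workflow above, run entirely in a throwaway
directory. The local bare repository standing in for the published one,
the "dev" helper function, the file names, and the explicit
fast-forward merge of the feature branch into master are assumptions
made for illustration; only the "publish" remote and its push refspec
come from the description.

```shell
# Throwaway setup: a local bare repository stands in for the published one.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
git init -q --bare "$tmp/publish.git"
git init -q "$tmp/dev"
dev() { git -C "$tmp/dev" "$@"; }          # hypothetical helper
dev symbolic-ref HEAD refs/heads/master
dev remote add publish "$tmp/publish.git"
dev config remote.publish.push refs/heads/master:refs/heads/master

# Initial published state.
echo base > "$tmp/dev/base.txt"
dev add base.txt
dev commit -q -m "initial"
dev push -q publish

# Interim commits on a private feature branch; nothing is pushed here.
dev checkout -q -b feature
echo step1 > "$tmp/dev/feature.txt"
dev add feature.txt
dev commit -q -m "feature: step 1"
echo step2 >> "$tmp/dev/feature.txt"
dev commit -q -a -m "feature: step 2"

# Meanwhile master gains a commit (simulated locally).
dev checkout -q master
echo other > "$tmp/dev/other.txt"
dev add other.txt
dev commit -q -m "master: other change"

# Bring the feature branch up to date, fast-forward master, publish.
dev checkout -q feature
dev rebase -q master
dev checkout -q master
dev merge -q --ff-only feature
dev push -q publish
published_refs=$(git --git-dir="$tmp/publish.git" for-each-ref --format='%(refname)')

# Rewriting master after publishing makes the next push a non-fast-forward.
dev commit -q --amend -m "feature: step 2 (rewritten)"
if dev push -q publish 2>/dev/null; then rewrite=accepted; else rewrite=rejected; fi
echo "published: $published_refs; rewritten master: $rewrite"
```

The feature branch never reaches the published repository because the
configured refspec names only master, and the amended master is refused
because the refspec has no leading '+'.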

And you're done.  The only reason why you need a per-user repository
is if you want some safety in terms of backups in case the work being
done on the laptop gets destroyed, but you can get that pretty much
for free via repo.or.cz or github.  I really don't buy the sharing
argument, because if you are in the middle of implementing a feature,
it's generally not useful for others to look at your in-progress work.

> I know that all of what I wrote above seems strange if you don't buy into the
> design assumptions. That it's critical to share a single server-repository,
> that it's critical to have a shared 'well known' branch that only trusts
> clients to add new changes to, etc.. However, these are important.

Yep.  And you still haven't justified why it's critical to share a
single server repository.  ***Why*** is that important?

And when you have shared push repositories, as long as users don't use
the '+', in practice they can only add new changes.  And if you don't
trust them not to use the '+' character in refspecs, are you really
going to trust them not to introduce bone-headed mistakes into
the code?  Or to "git rm" the wrong files, git commit them, and then
merge that into the repository?  If all you care about is avoiding the
accidentally stupid user mistakes, then putting in a convenience
default so that "git push publish" always does what you want should be
good enough.

So fundamentally, yeah, I think your primary problem is with the
design assumptions, which haven't been justified at all.

       		    	  	       		     - Ted

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
@ 2008-06-25 14:03 Petr Baudis
  0 siblings, 0 replies; 28+ messages in thread
From: Petr Baudis @ 2008-06-25 14:03 UTC (permalink / raw)
  To: David Jeske; +Cc: git

  Hi,

On Wed, Jun 25, 2008 at 12:36:03AM -0000, David Jeske wrote:
> The purpose of this mechanism is to host a distributed source repository in a
> world where most most developer contributors are behind firewalls and do not
> have access to, or do not want to configure a unix server, ftp, or ssh to
> possibly contribute to a project. The model of allowing less-authoritative
> developers to make their changes available for more-authoritative users to pull
> is accepted as superior. However, no users are assumed to be authoritative over
> each-other, or an entire tree, and many users should have authority only to
> supply new deltas to their own branches. The ability to handle emailed patches
> is an asset, but is deemed too manual for this need.

  BTW, have you read about git-bundle(1)?

> design assumptions:
> 
> - all developers are firewalled and can not be "pulled" from directly.
> - there can be one or more well-connected servers which all users can access.
> - .. but which they cannot have ssh, ftp, or other dangerous access to
> - .. and whose protocol should be layered on http(s)

  Please note that we support pushing using the HTTP DAV extensions. It
seems to be only rarely used in practice though, since developers seem
to either work at sane companies, are tunneling through the firewalls or
the firewalls are adjusted if this is required for development of their
day-job applications. There are some cases where this is useful, but I
don't think they are very numerous (in particular, I've had more
requests (about three?) for git-cvsserver than for HTTP DAV (zero to
one?) at repo.or.cz). Do _you_ have any real large-scale scenario where
this is an actual issue?

> - there is a shared namespace for branches, and tags
> - .. users are not-trusted to change the branches or tags of other users
> - .. only certain users are trusted to change the shared origin branches
> - .. also allow directory ACLS on shared branch commits
> - all their DAGs should be in a single repository for space efficiency
> - users generally want to follow well-named branches
> - .. will be free to follow any branch, and pull changes from any branch

  Of course, if pushing through the DAV extensions this can get hairy;
if you allow push access for users, you better trust them since they can
touch the objects database. If you don't care about possible DoS attack
vectors, I assume you could configure refs permissions for various users
using some fancy Apache configuration.

  As previously noted though, I believe the space efficiency is not an
issue in the real world. Are you familiar with Git's alternates? In a Git
repository, you can specify alternate locations for searching objects,
so you can create a "sub-repository" for each user, where an alternate
is set up pointing to the object database of the main project
repository. Then, the bulk of the objects will be in the main repository
and the sub-repositories will carry only a tiny amount of objects specific
to the local development of the given person.

  This is exactly how (and a large part of why) the repo.or.cz forks are
set up, by the way.
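
  A minimal sketch of the alternates mechanism, assuming local
throwaway repositories. The names project.git and alice.git are
hypothetical, and this is not a description of how repo.or.cz is
actually configured. The per-user repository stores no objects of its
own, yet can resolve the project's history through
objects/info/alternates.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Main project repository with some history in it.
git init -q --bare "$tmp/project.git"
git init -q "$tmp/dev"
git -C "$tmp/dev" symbolic-ref HEAD refs/heads/master
echo hello > "$tmp/dev/README"
git -C "$tmp/dev" add README
git -C "$tmp/dev" commit -q -m "project history"
git -C "$tmp/dev" push -q "$tmp/project.git" master
tip=$(git -C "$tmp/dev" rev-parse master)

# Per-user "sub-repository" that borrows objects from the main one.
git init -q --bare "$tmp/alice.git"
echo "$tmp/project.git/objects" > "$tmp/alice.git/objects/info/alternates"

# The fork can point a branch at the project's tip without copying objects.
git --git-dir="$tmp/alice.git" update-ref refs/heads/master "$tip"
fork_type=$(git --git-dir="$tmp/alice.git" cat-file -t "$tip")
fork_objects=$(git --git-dir="$tmp/alice.git" count-objects)
echo "tip resolves as: $fork_type; objects stored in the fork: $fork_objects"
```

  Only objects created later by that user's own pushes would be stored
in the per-user repository itself.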

-- 
				Petr "Pasky" Baudis
The last good thing written in C++ was the Pachelbel Canon. -- J. Olson

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
  2008-06-25 13:34 Theodore Tso
@ 2008-06-25 17:34 ` Junio C Hamano
       [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6OB5yFEDjCYe3>
  1 sibling, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2008-06-25 17:34 UTC (permalink / raw)
  To: Theodore Tso; +Cc: David Jeske, git

Theodore Tso <tytso@mit.edu> writes:

> And when you have shared push repositories, as long as users don't use
> the '+', in practice they can only add new changes.  And if you don't
> trust them not to use the '+' character in refspecs, are you really
> going to trust them not to introduce bone-headed mistakes into
> the code?

Well, if you do not trust them, just set receive.denynonfastforwards
and they won't be able to.
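
A minimal sketch of that setting in action, assuming a throwaway local
bare repository (the name shared.git and the file names are
hypothetical). With receive.denyNonFastForwards enabled on the
receiving side, even a push that uses '+' is refused.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# The shared repository refuses non-fast-forward updates.
git init -q --bare "$tmp/shared.git"
git --git-dir="$tmp/shared.git" config receive.denyNonFastForwards true

# A client publishes one commit.
git init -q "$tmp/dev"
git -C "$tmp/dev" symbolic-ref HEAD refs/heads/master
echo one > "$tmp/dev/file.txt"
git -C "$tmp/dev" add file.txt
git -C "$tmp/dev" commit -q -m "first"
git -C "$tmp/dev" push -q "$tmp/shared.git" master

# The client rewrites history and tries to force it with '+'.
git -C "$tmp/dev" commit -q --amend -m "first, rewritten"
if git -C "$tmp/dev" push -q "$tmp/shared.git" +master 2>/dev/null; then
    forced=accepted
else
    forced=rejected
fi
remote_subject=$(git --git-dir="$tmp/shared.git" log -1 --format=%s master)
echo "forced push: $forced; shared master still says: $remote_subject"
```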

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
  2008-06-25  5:20   ` David Jeske
@ 2008-06-25 19:17     ` Daniel Barkalow
  2008-06-25 20:12       ` Raimund Bauer
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Barkalow @ 2008-06-25 19:17 UTC (permalink / raw)
  To: David Jeske; +Cc: Theodore Tso, git

On Wed, 25 Jun 2008, David Jeske wrote:

> Yes, so I'd have the same thing, except instead of a remote repository, it
> would be a pattern of the branch namespace, such as /origin/users/jeske/*. It
> doesn't seem like the current remote tracking branch stuff can do this, but it
> would be easy to provide a client wrapper that would. Users who tracked the
> whole repository would just get everything, which is also fine. Maybe a client
> patch to make this better would be accepted.

Git actually has good support for large numbers of repositories sharing 
the same object storage. It's actually more efficient (in terms of 
server load) to have thousands of repositories with the same contents than 
one repository with thousands of branches.

> > > (c) grant and enforce permission for certain users to submit _merges
> > > only_ onto certain sub-portions of the "well-named branches"
> >
> > This is the weird one. *** Why ***? There is nothing magical about
> > merges; all a merge is a commit that contains more than one parent.
> > You can put anything into a merge, and in theory the result of a merge
> > could have nothing to do with either parent. It would be a very
> > perverse merge, but it's certainly possible. So what's the point of
> > trying to enforce rules about "merges only"?
> 
> I'll explain why I wrote this, but I admit it's a strange roundabout way to get
> what I was hoping for. I hope there is a better way. One better way is to just
> change the client, but I was hoping not to have to do that. let me explain..
> 
> Think about using CVS. user does "cvs up; hack hack hack; cvs commit (to
> server)". In git, this workflow is "git pull; hack; commit; hack; commit; git
> push (to server)". I want those interim "commits" to share the changes with the
> server. I want to change this to "git pull; hack; commit-and-share; hack;
> commit-and-share; git-push (to shared branch tag)"
> 
> It would be nice if "commit-and-share" could just use "git-push". However,
> because users are going to do this habitually every commit, probably through a
> script or merged command, I didn't want users who are accidentally working
> directly in the master to accidentally fast-forward origin/master. (everyone
> seems to discourage working on master anyhow). I was hoping to enforce this
> only with server policy, so any git client works. That leaves me with the
> challenge of figuring out which commits on origin/master are actually intended
> to move the pointer, and which are accidents because someone forgot to branch
> before hacking in their client. One simple way to do this is to require any
> origin/master commit to have two children, one on the master, one somewhere
> else. If you have a commit that is directly hanging off of master in this
> design, you are doing the wrong thing. The server would tell you to "git
> checkout master; git branch mymaster; git reset origin/master; git push".
> This would put their local changes onto their private branch where they should
> be. When they wanted to do the equivalent of "cvs commit;" or current "git
> push;", they would do a merge to the master, and push again. The server would
> allow it, because it sees the merge.

You have a fundamental misconception about git's data model. A commit 
doesn't have a particular branch it is on. There is only the DAG, where 
each node is a commit that is structured identically to all of the other 
commits. Branches pick out particular nodes in the DAG at particular 
times.

You can even think of there being a single theoretical universal DAG, 
independent of the actual development that gets done, and developers work 
to find the interesting portions, which are ones that contain trees that 
contain working code and useful messages and history that is informative. 
And they use branches to hold references to worthwhile parts of the DAG, 
and not (as in systems like SVN) to partition the DAG, which makes no 
reference to branches.

It therefore doesn't make any sense to ask if a commit is directly hanging 
off of master. If your local branch is up to date, and you commit, your 
commit's parent is the current master. If you now check out master and 
merge your local branch, master gets the same (non-merge) commit.
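
A minimal sketch of that last point, assuming a throwaway repository
(the branch name "topic", the file name, and the "repo" helper are
hypothetical). After the merge, master names exactly the commit that
was made on the local branch, and that commit has a single parent, so
nothing in it records which branch it was made on.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
git init -q "$tmp/repo"
repo() { git -C "$tmp/repo" "$@"; }        # hypothetical helper
repo symbolic-ref HEAD refs/heads/master

echo base > "$tmp/repo/file.txt"
repo add file.txt
repo commit -q -m "base"
base=$(repo rev-parse master)

# Commit on an up-to-date local branch.
repo checkout -q -b topic
echo change >> "$tmp/repo/file.txt"
repo commit -q -a -m "work done on topic"
topic_tip=$(repo rev-parse topic)

# Merging into master is a fast-forward: no merge commit is created.
repo checkout -q master
repo merge -q --ff-only topic
master_tip=$(repo rev-parse master)
master_parents=$(repo log -1 --format=%P master)
echo "master=$master_tip topic=$topic_tip parents=$master_parents"
```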

> I recognize this is a bit strange. I'd love to have a better solution, but this
> is the solution I can think of which only involves server enforcement.

You fundamentally can't do what you want with only server enforcement, 
because git doesn't provide the history of what local operations were used 
to prepare to ask the server to change something. It fundamentally can't, 
because there's no room in its data model of changes to hold that, and 
because its design is to allow flexibility in this preparation.

> Other solutions I thought of would all require client changes that would 
> change everyone's behavior. The candidate I liked best was: disallowing 
> changes to tracking branches, including master, probably by implicitly 
> creating a branch on commit to a tracking branch... However, I don't get 
> the impression this will fit into current git very well, because users 
> would need to turn their current "git push", into a "git merge 
> master;git push"

Git prevents you from committing to tracking branches at all. Any branch 
you can commit to is inherently a local branch, because that's what it 
means for a branch to be local. The "push" operation updates a remote 
branch from a local branch.

Now, what might be good would be to introduce a type of ref that you can 
update with "merge" but not with "commit". Of course, this has to be 
client-side, because the final state doesn't depend on whether you commit 
in a temporary branch and merge into a publishing branch or commit 
directly in the publishing branch.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
       [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6OB5yFEDjCYe3>
@ 2008-06-25 19:37   ` David Jeske
  2008-06-25 20:52     ` Jakub Narebski
  2008-06-25 20:54     ` Jakub Narebski
  2008-06-25 19:37   ` David Jeske
  1 sibling, 2 replies; 28+ messages in thread
From: David Jeske @ 2008-06-25 19:37 UTC (permalink / raw)
  To: Theodore Tso; +Cc: git

Thanks for the info about shared object storage for shared repositories. That's
great, and looks like a good implementation method.

Previously I was thinking in terms of making a different server to change
behavior. However, I think the comments I've read are shifting my mindset
towards making a client-wrapper. I want to provide a system [wrapper] without
the user-burden of thinking about three repositories (local, my-public,
shared-public). Doing this as a wrapper has other benefits, like the fact that
users can treat services like repo.or.cz as the "networked filesystem of their
version control system", so I like it.

I have a model for the operations of this wrapper below.

-- Theodore Tso wrote:
> [snip] sharing in-progress work is highly overrated.

_Seeing_ unfinished changes is overrated. However, so is managing multiple
repositories and managing which data is shared.

I think my new wrapper approach below eliminates this overly-aggressive sharing
while still reducing complexity for the average user.

> So the way I would do things is to simply encourage people to start
> their work by branching off of an up-to-date master branch, but *not*
> do any git pulls or git pushes.

You confused me here. If their repo.or.cz private repository is their only way
of sharing (because their home directory is inaccessible and emailing patches
is cumbersome), how do they exchange their own changes without pushing? Even in
a short time on the git mailing list I see mini-unfinished-patches being posted.

> [ description of commit rewriting, rebase, push ]

The method you describe is burdening all users with learning a bunch of new
concepts to do things that are unnecessary micromanagement for their needs. I'd
prefer to give my users many of the benefits of DVCS/git with a
command/argument set 1/20th the size and a much simpler mental model.

Most of the software we're all using was developed while working with
centralized source control, where people just hack and commit and those commits
are not even known-working. They don't bother with patch/commit rewriting and
management, and it works out just fine. I can see how that finer granularity
may be valuable for linux kernel coordinators. However, most projects don't
need to bother with all that, and even in the ones that do, most of their
contributors don't.

Despite the success of centralized revision control, distributed source control
revision models have some very attractive features which can add efficiency to
a shared-central-repo model without straying far from the familiar (cvs up;
hack; hack; cvs up; cvs commit;) workflow. I read some commentary from Linus
that compared git to a 'filesystem', and that's what I see.. a really awesome
underlying set of mechanisms for implementing SCM.. I'm trying to understand
how to layer an easy to use SCM system on top of it.

Some 'git' users might say the right thing to do is do a different project, but
I think, just like with the filesystem-analogy, there is significant benefit to
sharing a single repository model so a simple source control system can then be
used in powerful ways by powerful users. This is similar to the direction "eg"
(easy git) is heading, but more extreme and extending to the server.

In fact, it seems like we might be better off if all of these source control
user-interfaces (cvs, perforce, git, eg, mercurial, etc. etc.) could be written
on top of a version-control-api that they shared. Witness the similar
implementation strategies of this modern rash of DVCS systems.

--------------------------------------------------------------

I'll try to explain my wrapper model in terms of an example... Imagine I'm
going to deliver a "cvs drop in replacement", ncvs, that mostly keeps the cvs
mental model, but is implemented underneath using git and just works better
than cvs (yet is simpler than git). I'll use the exact cvs command parameters
for illustration, but I wouldn't plan to do this. Notice how each ncvs command
uses many git commands. It's possible these things should be done in terms of
plumbing instead of porcelain to reduce dependence on git changes, but it's
more concise to express them as porcelain.

From the earlier feedback, there are now two repositories, one is considered
the "shared-root" while the other is the "user" repository.

(1) make "cvs update" safe, make it easy to see granular comments for things
you have not pushed

CVS users do potentially destructive merges all the time. Despite the way we
use terminology, working files ARE a branch, and "cvs up" IS a merge. That
merge can require edits to resolve, and after those edits are complete, the
previous state is NOT recoverable. There is no reason for this. We can easily
save the delta by just making "cvs up" equal "git commit; git pull;", or
alternately, "git stash; git pull; git stash apply;".

: "ncvs up" ->
:
: git stash; git pull; git stash apply;
: git diff --stat <baseof:current branch> - un-pushed filenames
: git-show-branch <current branch> - un-pushed comments

Question: when I say "baseof:current branch", I mean "the common-ancestor
between my local-repo tracking branch and the remote-repo branch it's
tracking". How do I find that out?

Adding "git diff --stat <baseof:current branch>" helps keep us aware of what
changes are in our local repo. Any files not pushed up to the branch head on
the server are seen. Likewise with "git-show-branch <current branch>" (which
somehow is not the same as git-show-branch --current).
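
One possible way to compute "baseof: current branch" is "git
merge-base" between the local branch and its remote-tracking branch.
The sketch below assumes a throwaway setup in which a local bare
repository plays the server. The names origin.git and work, the "work"
helper, and the file names are hypothetical.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
git init -q "$tmp/work"
work() { git -C "$tmp/work" "$@"; }        # hypothetical helper
work symbolic-ref HEAD refs/heads/master
work remote add origin "$tmp/origin.git"

# One commit that has been pushed...
echo shared > "$tmp/work/shared.txt"
work add shared.txt
work commit -q -m "shared change"
work push -q origin master
shared=$(work rev-parse master)

# ...and one that exists only locally.
echo local > "$tmp/work/local.txt"
work add local.txt
work commit -q -m "local only"

# Common ancestor of the local branch and the branch it tracks.
base=$(work merge-base HEAD origin/master)
unpushed_files=$(work diff --name-only "$base" HEAD)   # un-pushed filenames
unpushed_log=$(work log --format=%s "$base"..HEAD)     # un-pushed comments
echo "base=$base files=$unpushed_files log=$unpushed_log"
```

Because the base is a common ancestor, the same commands keep working
after the remote branch has moved on and been fetched.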

(2) make "planned ahead of time" branches cheap to make

"cvs up" is the easiest merge in cvs, therefore, separate sets of checked out
working files become the most common form of branching in cvs. They are
basically personal work branches that you can't commit on, and can't
collaborate on. I've seen developers with cvs working directories weeks or
months old because that's an easier way to work on different ideas than
creating a branch and checking them in. DVCS fixes this, by making branches
cheap to make, and by making all branch merges closer to the simplicity of
cvs's easy branch merge "cvs up". However, I don't need to burden the user with
the extra complexity and workload of the default being local branches, which
they then need to do more work to share. I want branches to be shared by
default.

: "ncvs tag -b --shared $branch" ->
:
: [ create a branch on the "shared root" repo, pointing
:   to where I am in my local tree, if I have permission ]
:  git branch --track $branch origin/$branch

: "ncvs tag -b $mybranch" ->
:
: [ create a branch on my "user" repo, pointing to where I am
:   in my local tree, if I have permission ]
:  git branch --track $mybranch my-origin/$mybranch

Question: I'm not sure what commands to use above. How do I create a branch on
a remote repo when I'm on my local machine, without sshing to it?
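
One possible answer, as a sketch: pushing a commit to a ref name that
does not yet exist on the remote creates the branch there, with no
login on the server. The setup below assumes a throwaway local bare
repository in place of the real server. The branch name "topic" and the
"work" helper are hypothetical.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
git init -q "$tmp/work"
work() { git -C "$tmp/work" "$@"; }        # hypothetical helper
work symbolic-ref HEAD refs/heads/master
work remote add origin "$tmp/origin.git"
echo base > "$tmp/work/file.txt"
work add file.txt
work commit -q -m "base"
work push -q origin master

# Create the branch on the remote, pointing at the current local commit.
branch=topic
work push -q origin "HEAD:refs/heads/$branch"

# Then create a local branch that tracks it.
work branch --track "$branch" "origin/$branch" >/dev/null

here=$(work rev-parse HEAD)
remote_tip=$(git --git-dir="$tmp/origin.git" rev-parse "refs/heads/$branch")
tracking=$(work config "branch.$branch.remote")
echo "remote $branch at $remote_tip, tracked via $tracking"
```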

The advantages of git's repository over cvs's repository in this use-case are
not created because the branch is on the local machine. In fact, we also
created it on the server. The benefit comes from the git revision storage model
being faster and BETTER.

Then to switch our working pointer to this branch, we might do:

: "ncvs up -r mybranch" ->
:
: git stash; git checkout mybranch; git pull;
: git stash show --relevant --recent;

Our "safe update" automatically saved away any local directory changes before
switching off to the branch (if there were any). Our "stash show" is there
always to show us if any stashes hang off a recent parent of the tree we just
switched to, but it only shows them if they are hanging off this tree, and only
if they are recent. If there is, we might want to look at or grab it, or we
might just ignore it and not care.

(3) allow users to commit their 'final' changes to others (only on the branch
they are on)

: "ncvs commit" -> "git commit; git push <only this branch>;"

Question: how do I only push the branch I'm on? "eg" says it does this, but
from a quick look at the code, it wasn't obvious to me how.
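
One possible answer, as a sketch: name the current branch explicitly on
the command line, for example with "git push origin HEAD", so that no
other local branch is considered. This is not necessarily how "eg" does
it. The setup assumes a throwaway local bare repository. The branch
name "scratch" and the "work" helper are hypothetical.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
git init -q "$tmp/work"
work() { git -C "$tmp/work" "$@"; }        # hypothetical helper
work symbolic-ref HEAD refs/heads/master
work remote add origin "$tmp/origin.git"
echo base > "$tmp/work/file.txt"
work add file.txt
work commit -q -m "base"

# A second local branch that should stay private.
work checkout -q -b scratch
echo wip > "$tmp/work/wip.txt"
work add wip.txt
work commit -q -m "work in progress"
work checkout -q master

# Push only the branch that is currently checked out.
current=$(work symbolic-ref HEAD)
work push -q origin HEAD
remote_refs=$(git --git-dir="$tmp/origin.git" for-each-ref --format='%(refname)')
echo "on $current; remote now has: $remote_refs"
```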

Developers who are plenty happy with their existing model of never saving local
changes, can continue doing what they are doing. This makes the ability to save
local changes an added benefit to the users like me that want to do it, instead
of an extra burden to the other users. It also simplifies the issue of which
changes are pushed to the server and which are not, because pushing is managed
by "git push <only this branch>", not by creating and managing local and remote
branch names separately. (easy git took the same approach with push)

(4) Allow users to save interim changes, without ahead of time planning, ahead
of time naming, and hopefully, without naming at all.

Saving interim changes in a cvs working tree before merging with head is not
cheap. Making my own branch tag isn't too hard, but it takes a long time on a
big tree. Ironically, perforce made the branching mechanism faster while making the
cognitive load of branching much higher.

: "ncvs save" -> "git commit -a"
:
: "ncvs stash [$name]" ->
:
: $currentbranch = `git branch`
: $baseish = '<baseof: current branch>'
: git stash;
: git branch -m $currentbranch $name;
: git checkout $baseish;
: git branch $currentbranch

This "ncvs stash" is acknowledging the value of the "git stash" idea, while
also recognizing that when I'm using "git commit" regularly, I don't have
anything in the working set! I really want to stash the changes made since
"origin/<branchname>" and return there with my local <branchname>. This is
really after the fact branch creation. If no $name is supplied, then it can
auto-generate one like stash does.
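
A sketch of how this after-the-fact branch creation could look with
existing commands, assuming the working tree is clean (uncommitted
changes would be stashed first, as in the outline above). It uses "git
checkout -b" in place of the separate checkout and branch steps. The
name "saved-work", the "work" helper, and the local bare repository
standing in for the server are hypothetical.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
git init -q "$tmp/work"
work() { git -C "$tmp/work" "$@"; }        # hypothetical helper
work symbolic-ref HEAD refs/heads/master
work remote add origin "$tmp/origin.git"
echo base > "$tmp/work/file.txt"
work add file.txt
work commit -q -m "base"
work push -q origin master

# Interim local commits made directly on master.
echo idea >> "$tmp/work/file.txt"
work commit -q -a -m "interim save"
local_tip=$(work rev-parse HEAD)

# Move those commits onto a named branch and return master to its base.
name=saved-work
current=$(work symbolic-ref --short HEAD)
base=$(work merge-base HEAD "origin/$current")
work branch -m "$current" "$name"
work checkout -q -b "$current" "$base"

saved_tip=$(work rev-parse "$name")
now_on=$(work symbolic-ref --short HEAD)
echo "$name holds $saved_tip; back on $now_on at $(work rev-parse HEAD)"
```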

(5) make it obvious there is a difference between local and remote changes, but
make it easy to diff against remote before "ncvs commit;"

: "ncvs diff" ->
:
: echo -n "since commit(-C): "  \
:   `git diff --shortstat <baseof:current branch>`; \
:   echo
: echo -n "since save(-S): " \
:   `git diff`; echo
:
: "ncvs diff -S" -> "git diff"
: "ncvs diff -C" -> "git diff <baseof:current branch>"
--------------------------------------------------------------

I'm primarily trying to understand how to map my model to git.
Continued thanks for the discussion and help.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
       [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6OB5yFEDjCYe3>
  2008-06-25 19:37   ` David Jeske
@ 2008-06-25 19:37   ` David Jeske
       [not found]     ` <willow-jeske-01l6@3PlFEDjCVAh-01l6XqjPFEDjCY6P>
  1 sibling, 1 reply; 28+ messages in thread
From: David Jeske @ 2008-06-25 19:37 UTC (permalink / raw)
  To: Theodore Tso; +Cc: git

Thanks for the info about shared object storage for shared repositories. That's
great, and looks like a good implementation method.

Previously I was thinking in terms of making a different server to change
behavior. However, I think the comments I've read are shifting my mindset
towards making a client-wrapper. I want to provide a system [wrapper] without
the user-burden of thinking about three repositories (local, my-public,
shared-public). Doing this as a wrapper has other benefits, like the fact that
users can treat services like repo.or.cz as the "networked filesystem of their
version control system", so I like it.

I have a model for the operations of this wrapper below.

-- Theodore Tso wrote:
> [snip] sharing in-progress work is highly overrated.

_Seeing_ unfinished changes is overrated. However, so is managing multiple
repositories and managing which data is shared.

I think my new wrapper approach below eliminates this overly-aggressive sharing
while still reducing complexity for the average user.

> So the way I would do things is to simply encourage people to do start
> their work by branching off of an up-to-date master branch, but *not*
> do any git pulls or git pushes.

You confused me here. If their repo.or.cz private repository is their only way
of sharing (because their home directory is inaccessible and emailing patches
is cumbersome), how do they exchange their own changes without pushing? Even in
a short time on git mailing list I see mini-unfinished-patches being posted.

> [ description of commit rewriting, rebase, push ]

The method you describe is burdening all users with learning a bunch of new
concepts to do things that are unnecessary micromanagement for their needs. I'd
prefer to give my users many of the benefits of DVCS/git with a
command/argument set 1/20th the size and a much simpler mental model.

Most of the software we're all using was developed while working with
centralized source control, where people just hack and commit and those commits
are not even known-working. They don't bother with patch/commit rewriting and
management, and it works out just fine. I can see how that finer granularity
may be valuable for linux kernel coordinators. However, most projects don't
need to bother with all that, and even in the ones that do, most of their
contributors don't.

Despite the success of centralized revision control, distributed source control
revision models have some very attractive features which can add efficiency to
a shared-central-repo model without straying far from the familiar (cvs up;
hack; hack; cvs up; cvs commit;) workflow. I read some commentary from Linus
that compared git to a 'filesystem', and that's what I see.. a really awesome
underlying set of mechanisms for implementing SCM.. I'm trying to understand
how to layer an easy to use SCM system on top if it.

Some 'git' users might say the right thing to do is do a different project, but
I think, just like with the filesystem-analogy, there is significant benefit to
sharing a single repository model so a simple source control system can then be
used in powerful ways by powerful users. This is similar to the direction "eg"
(easy git) is heading, but more extreme and extending to the server.

In fact, it seems like we might be better off if all of these source control
user-interfaces (cvs, perforce, git, eg, mercurial, etc. etc.) could be written
on top of a version-control-api that they shared. Witness the similar
implementation strategies of this modern rash of DVCS systems.

--------------------------------------------------------------

I'll try to explain my wrapper model in terms of an example... Imagine I'm
going to deliver a "cvs drop in replacement", ncvs, that mostly keeps the cvs
mental model, but is implemented underneath using git and just works better
than cvs (yet is simpler than git). I'll use the exact cvs command parameters
for illustration, but I wouldn't plan to do this. Notice how each ncvs command
uses many git commands. It's possible these things should be done in terms of
plumbing instead of porcelain to reduce dependence on git changes, but it's
more concise to express them as porcelain.

>From the earlier feedback, there are now two repositories, one is considered
the "shared-root" while the other is the "user" repository.

(1) make "cvs update" safe, make it easy to see granular comments for things
you have not pushed

CVS users do potentially destructive merges all the time. Despite the way we
use terminology, working files ARE a branch, and "cvs up" IS a merge. That
merge can require edits to resolve, and after those edits are complete, the
previous state is NOT recoverable. There is no reason for this. We can easily
save the delta by just making "cvs up" equal "git commit; git pull;", or
alternately, "git stash; git pull; git apply;".

: "ncvs up" ->
:
: git stash; git pull; git apply;
: git diff --stat <baseof:current branch> - un-pushed filenames
: git-show-branch <current branch> - un-pushed comments

Question: when I say "baseof:current branch", I mean "the common-ancestor
between my local-repo tracking branch and the remote-repo branch it's
tracking". How do I find that out?

Adding "git diff --stat <baseof:current branch>" helps keep us aware of what
changes are in our local repo. Any files not pushed up to the branch head on
the server are seen. Likewise with "git-show-branch <current branch>" (which
somehow is not the same as git-show-branch --current).

(2) make "planned ahead of time" branches cheap to make

"cvs up" is the easiest merge in cvs, therefore, separate sets of checked out
working files become the most common form of branching in cvs. They are
basically personal work branches that you can't commit on, and can't
collaborate on. I've seen developers with cvs working directories weeks or
months old because that's an easier way to work on different ideas than
creating a branch and checking them in. DVCS fixes this, by making branches
cheap to make, and by making all branch merges closer to the simplicity of
cvs's easy branch merge "cvs up". However, I don't need to burden the user with
the extra complexity and workload of the default being local branches, which
they then need to do more work to share. I want branches to be shared by
default.

: "ncvs tag -b --shared $branch" ->
:
: [ create a branch on the "shared root" repo, pointing
:   to where I am in my local tree, if I have permission ]
:  git branch --track $branch origin/$branch

: "ncvs tag -b mybranch
:
: [ create a branch on my "user" repo, pointing to where I am
in my local tree, if I have permission ]
: git branch --track $mybranch my-origin/$branch

Question: I'm not sure what commands to use above. How do I create a branch on
a remote repo when I'm on my local machine, without sshing to it?

The advantages of git's repository over cvs's repository in this use-case are
not created because the branch is on the local machine. In fact, we also
created it on the server. The benefit comes from the git revision storage model
being faster and BETTER.

Then to switch our working pointer to this branch, we might do:

: "ncvs up -r mybranch" ->
:
: git stash; git checkout mybranch; git pull;
: git stash show --relevant --recent;

Our "safe update" automatically saved away any local directory changes before
switching off to the branch (if there were any). Our "stash show" is there
always to show us if any stashes hang off a recent parent of the tree we just
switched to, but it only shows them if they are hanging off this tree, and only
if they are recent. If there is, we might want to look at or grab it, or we
might just ignore it and not care.

(3) allow users to commit their 'final' changes to others (only on the branch
they are on)

: "ncvs commit" -> "git commit; git push <only this branch>;"

Question: how do I only push the branch I'm on? "eg" says it does this, but
from a quick look at the code, it wasn't obvious to me how.

Developers who are plenty happy with their existing model of never saving local
changes, can continue doing what they are doing. This makes the ability to save
local changes an added benefit to the users like me that want to do it, instead
of an extra burden to the other users. It also simplifies the issue of which
changes are pushed to the server and which are not, because pushing is managed
by "git push <only this branch>", not by creating and managing local and remote
branch names separately. (easy git took the same approach with push)

(4) Allow users to save interim changes without planning ahead of time, without
naming ahead of time, and hopefully without naming at all.

Saving interim changes in a cvs working tree before merging with head is not
cheap. Making my own branch tag isn't too hard, but it takes a long time on a
big tree. Ironically, perforce made the branching mechanism faster while making
the cognitive load of branching much higher.

: "ncvs save" -> "git commit -a"
:
: "ncvs stash [$name]" ->
:
: $currentbranch = `git branch`
: $baseish = '<baseof: current branch>'
: git stash;
: git branch -m $currentbranch $name;
: git checkout $baseish;
: git branch $currentbranch

This "ncvs stash" is acknowledging the value of the "git stash" idea, while
also recognizing that when I'm using "git commit" regularly, I don't have
anything in the working set! I really want to stash the changes made since
"origin/<branchname>" and return there with my local <branchname>. This is
really after the fact branch creation. If no $name is supplied, then it can
auto-generate one like stash does.

(5) make it obvious there is a difference between local and remote changes, but
make it easy to diff against remote before "ncvs commit;"

: "ncvs diff" ->
:
: echo -n "since commit(-C): "  \
:   `git diff --shortstat <baseof:current branch>`; \
:   echo
: echo -n "since save(-S): " \
:   `git diff`; echo
:
: "ncvs diff -S" -> "git diff"
: "ncvs diff -C" -> "git diff <baseof:current branch>"
--------------------------------------------------------------

I'm primarily trying to understand how to map my model to git.
Continued thanks for the discussion and help.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
  2008-06-25 19:17     ` Daniel Barkalow
@ 2008-06-25 20:12       ` Raimund Bauer
  0 siblings, 0 replies; 28+ messages in thread
From: Raimund Bauer @ 2008-06-25 20:12 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: David Jeske, Theodore Tso, git

On Wed, 2008-06-25 at 15:17 -0400, Daniel Barkalow wrote:

> You have a fundamental misconception about git's data model. A commit 
> doesn't have a particular branch it is on. There is only the DAG, where 
> each node is a commit that is structured identically to all of the other 
> commits. Branches pick out particular nodes in the DAG at particular 
> times.

But a branch in a repository also has a local history: the ref-log.
And git could use that to produce a distributed branch-history.

<wishful thinking>

A developer prepares a series of commits in a local branch to push to
the server.
On the server the ref-log of a branch gets updated with a new entry for
each push, and other developers pulling from the server get the server's
ref-log as ref-log of their remote tracking branch and can see the
push-points there.

Those push-points seem to be somehow more important than other commits -
there was a reason for the first developer to push exactly this branch
tip, right?
Seems like valuable (optional) information to me.

</wishful thinking>

> It therefore doesn't make any sense to ask if a commit is directly hanging 
> off of master. If your local branch is up to date, and you commit, your 
> commit's parent is the current master. If you now check out master and 
> merge your local branch, master gets the same (non-merge) commit.

Check if the commit is in master's ref-log?

regards,
Ray

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
  2008-06-25 19:37   ` David Jeske
@ 2008-06-25 20:52     ` Jakub Narebski
  2008-06-25 20:54     ` Jakub Narebski
  1 sibling, 0 replies; 28+ messages in thread
From: Jakub Narebski @ 2008-06-25 20:52 UTC (permalink / raw)
  To: git

<posted and mailed>

David Jeske wrote:

> Question: when I say "baseof:current branch", I mean "the common-ancestor
> between my local-repo tracking branch and the remote-repo branch it's
> tracking". How do I find that out?

git-merge-base
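
A minimal sketch of what that looks like, assuming a throwaway repository
and two hypothetical branches named "trunk" and "topic" (none of these
names come from the thread). The commit where the two branches forked is
what merge-base reports:

```shell
#!/bin/sh
# Sketch only: repository location, branch names and files are made up.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# One commit on "trunk"; this is where the branches will fork.
git checkout -q -b trunk
echo one > file.txt
git add file.txt
git commit -q -m "first commit"
fork=$(git rev-parse HEAD)

# Diverge: one commit on "topic", one more on "trunk".
git checkout -q -b topic
echo two > topic.txt
git add topic.txt
git commit -q -m "topic work"
git checkout -q trunk
echo three >> file.txt
git commit -q -a -m "trunk work"

# The common ancestor of the two branch tips.
base=$(git merge-base trunk topic)
echo "merge base: $base"
```

In a real setup the two arguments would be the local branch and its
remote-tracking branch rather than two local branches.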

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
  2008-06-25 19:37   ` David Jeske
  2008-06-25 20:52     ` Jakub Narebski
@ 2008-06-25 20:54     ` Jakub Narebski
  1 sibling, 0 replies; 28+ messages in thread
From: Jakub Narebski @ 2008-06-25 20:54 UTC (permalink / raw)
  To: git

David Jeske wrote:

> Question: how do I only push the branch I'm on? "eg" says it does this, but
> from a quick look at the code, it wasn't obvious to me how.

git push <remote> HEAD   # with current enough git
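
As a rough illustration, assuming a local bare repository standing in for
the server and two hypothetical branches "feature-a" and "feature-b"
(all names here are made up for the example), only the checked-out branch
ends up on the remote:

```shell
#!/bin/sh
# Sketch only: a local bare repository plays the role of the remote.
set -e
work=$(mktemp -d)
git init -q --bare "$work/shared.git"
git init -q "$work/dev"
cd "$work/dev"
git config user.email "demo@example.com"
git config user.name "Demo"
git remote add origin "$work/shared.git"

# Two local branches, each with its own commit.
git checkout -q -b feature-a
echo a > a.txt
git add a.txt
git commit -q -m "work on a"
git checkout -q -b feature-b
echo b > b.txt
git add b.txt
git commit -q -m "work on b"

# HEAD is feature-b, so only feature-b is sent.
git push -q origin HEAD

# List the branches that now exist on the "server".
pushed=$(git --git-dir="$work/shared.git" \
    for-each-ref --format='%(refname:short)' refs/heads)
echo "branches on remote: $pushed"
```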

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
       [not found]     ` <willow-jeske-01l6@3PlFEDjCVAh-01l6XqjPFEDjCY6P>
@ 2008-06-25 21:34       ` David Jeske
  2008-06-25 22:10         ` Jakub Narebski
  2008-06-25 22:13         ` Junio C Hamano
  2008-06-25 21:34       ` David Jeske
  1 sibling, 2 replies; 28+ messages in thread
From: David Jeske @ 2008-06-25 21:34 UTC (permalink / raw)
  To: David Jeske; +Cc: Theodore Tso, git

Some answers thanks to Jakub...

-- David Jeske wrote:
> : "ncvs up" ->
> :
> : git stash; git pull; git apply;
> : git diff --stat <baseof:current branch> - un-pushed filenames
> : git-show-branch <current branch> - un-pushed comments
>
> Question: when I say "baseof:current branch", I mean "the common-ancestor
> between my local-repo tracking branch and the remote-repo branch it's
> tracking". How do I find that out?

I'm told I need...

git diff --stat `git-merge-base HEAD ORIG_HEAD`

> : "ncvs commit" -> "git commit; git push <only this branch>;"
>
> Question: how do I only push the branch I'm on? "eg" says it does this, but
> from a quick look at the code, it wasn't obvious to me how.

and...

git push <remote> HEAD


which just leaves this one....

Question: How do I create a branch on a remote repo when I'm on
my local machine, without sshing to it?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
  2008-06-25 21:34       ` David Jeske
@ 2008-06-25 22:10         ` Jakub Narebski
  2008-06-25 22:13         ` Junio C Hamano
  1 sibling, 0 replies; 28+ messages in thread
From: Jakub Narebski @ 2008-06-25 22:10 UTC (permalink / raw)
  To: git

David Jeske wrote:

> Question: How do I create a branch on a remote repo when I'm on
> my local machine, without sshing to it?

Push into it.
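
A minimal sketch, assuming a remote named "shared" and a new branch named
"mybranch" (both hypothetical; a local bare repository stands in for a
real server URL). Pushing a refspec whose right-hand side does not exist
yet creates that branch on the remote:

```shell
#!/bin/sh
# Sketch only: "shared", "mybranch" and the paths are illustrative.
set -e
work=$(mktemp -d)
git init -q --bare "$work/server.git"
git init -q "$work/local"
cd "$work/local"
git config user.email "demo@example.com"
git config user.name "Demo"
git remote add shared "$work/server.git"

echo hello > README
git add README
git commit -q -m "initial commit"

# <src>:<dst> refspec: send the current commit, create refs/heads/mybranch.
git push -q shared HEAD:refs/heads/mybranch

local_tip=$(git rev-parse HEAD)
remote_tip=$(git --git-dir="$work/server.git" rev-parse refs/heads/mybranch)
echo "local:  $local_tip"
echo "remote: $remote_tip"
```

Whether the push is accepted still depends on the permissions the server
grants; the sketch assumes full write access.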
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
  2008-06-25 21:34       ` David Jeske
  2008-06-25 22:10         ` Jakub Narebski
@ 2008-06-25 22:13         ` Junio C Hamano
       [not found]           ` <willow-jeske-01l6@3PlFEDjCVAh-01l6[3InFEDjC[dy>
  1 sibling, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2008-06-25 22:13 UTC (permalink / raw)
  To: David Jeske; +Cc: Theodore Tso, git

"David Jeske" <jeske@willowmail.com> writes:

>> : "ncvs up" ->
>> :
>> : git stash; git pull; git apply;

First of all, if you are in CVS mindset, you may not want to necessarily
do "git pull", but "git fetch" followed by "git rebase".

I suspect the last one in the above sequence of yours is "git stash pop".
Definitely not "git apply" without any argument which is a no-op.

>> : git diff --stat <baseof:current branch> - un-pushed filenames

"git diff [--options] origin..." (three-dots) is often used.  This is a
shorthand for:

	git diff [--options] $(git merge-base origin HEAD) HEAD

that is, "show me what I did since I forked from origin".

>> : git-show-branch <current branch> - un-pushed comments

This would be useful if you are using "fetch + rebase", but in any case

	git log --graph --pretty=oneline origin..

may be prettier these days.  --graph is a recent invention that appeared
first in 1.5.6.
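
To make the two commands above concrete, here is a small sketch, assuming
a throwaway "upstream" repository and a clone of it (the paths and file
names are made up). Upstream gains a commit after the clone, and the clone
gains one of its own:

```shell
#!/bin/sh
# Sketch only: repository layout and file names are illustrative.
set -e
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git config user.email "demo@example.com"
git config user.name "Demo"
echo base > base.txt
git add base.txt
git commit -q -m "base"

git clone -q "$work/upstream" "$work/clone"

# Upstream moves on after the clone was taken.
echo theirs > theirs.txt
git add theirs.txt
git commit -q -m "upstream change"

cd "$work/clone"
git config user.email "demo@example.com"
git config user.name "Demo"
echo mine > mine.txt
git add mine.txt
git commit -q -m "my change"
git fetch -q origin

# Three dots: diff from the fork point to HEAD, i.e. only my changes.
changed=$(git diff --name-only origin...)
# Two dots: commits reachable from HEAD but not from origin.
unpushed=$(git log --pretty=oneline origin.. | wc -l | tr -d ' ')
echo "files I changed: $changed"
echo "unpushed commits: $unpushed"
```

The upstream commit does not show up in either result, since it is
reachable from origin.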

> Question: How do I create a branch on a remote repo when I'm on
> my local machine, without sshing to it?

I hope that the question is not "How do I do anything on a remote without
having any network connection to it" as its answer cannot be anything but
"telepathy".

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
       [not found]           ` <willow-jeske-01l6@3PlFEDjCVAh-01l6[3InFEDjC[dy>
  2008-06-25 23:03             ` David Jeske
@ 2008-06-25 23:03             ` David Jeske
  1 sibling, 0 replies; 28+ messages in thread
From: David Jeske @ 2008-06-25 23:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Theodore Tso, git

-- Junio C Hamano wrote:
> >> : "ncvs up" ->
> >> :
> >> : git stash; git pull; git apply;
>
> First of all, if you are in CVS mindset, you may not want to necessarily
> do "git pull", but "git fetch" followed by "git rebase".

I don't want to replicate CVS behavior, just the workflow. I've considered
rebase, but the diagrams on the documentation page look scary. I want to keep
the dag-nodes made by their local "git commit". At those commits the code
worked and was tested in their tree. rebase looks like it tosses those
dag-nodes when it rewrites the diffs -- who knows if the tests actually pass
for every point along that new rebase. That's no good.

I can see the use of rebase when your job is to "author an understandable
public source tree", but I'm working on SCM, where the goal is to be able to
reproduce the state of past successes reliably.

I want someone to be able to checkout what was actually in the user's local
client as they were working. Which means I think I want "fetch and merge" which
is pull. Did I get that wrong?

> I suspect the last one in the above sequence of yours is "git stash pop".
> Definitely not "git apply" without any argument which is a no-op.

I meant to type "git stash apply", but I think you're right, pop is what I
wanted.
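
For reference, a sketch of the corrected "ncvs up" sequence, assuming a
throwaway "central" repository and a clone called "dev" (hypothetical
names). The uncommitted edit is stashed, the upstream change is pulled,
and the edit is then restored on top:

```shell
#!/bin/sh
# Sketch only: repository names and files are made up for illustration.
set -e
work=$(mktemp -d)
git init -q "$work/central"
cd "$work/central"
git config user.email "demo@example.com"
git config user.name "Demo"
echo v1 > notes.txt
git add notes.txt
git commit -q -m "v1"

git clone -q "$work/central" "$work/dev"

# Someone else commits upstream after the clone.
echo upstream > other.txt
git add other.txt
git commit -q -m "upstream change"

cd "$work/dev"
git config user.email "demo@example.com"
git config user.name "Demo"
echo "uncommitted edit" >> notes.txt

git stash -q                        # set the dirty changes aside
git pull -q --no-rebase --no-edit   # bring in the upstream commit
git stash pop -q                    # put the dirty changes back

git status --short
```

The sketch uses an edit that does not conflict with the upstream change;
a conflicting edit would leave the stash in place for manual resolution.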

> >> : git diff --stat <baseof:current branch> - un-pushed filenames
>
> "git diff [--options] origin..." (three-dots) is often used. This is a
> shorthand for:
>
> git diff [--options] $(git merge-base origin HEAD) HEAD
>
> that is, "show me what I did since I forked from origin".

I'm still a little foggy on the remote references, but remember I have two
remotes (shared) and (personal). Something in the docs led me to believe
'origin' was repository wide, not private to each branch. Is "origin" a magic
name for the current branch's target?

> >> : git-show-branch <current branch> - un-pushed comments
>
> This would be useful if you are using "fetch + rebase", but in any case
>
> git log --graph --pretty=oneline origin..

Ahh, yes, thanks! How does this interact with the "pull" I just did?

What I want is "show me the commit messages (and sha1 keys) for changes in my
local branch that are not yet submitted to its remote tracking location".

Will that command above include the commit lines that came down in my pull
(fetch/merge)? If so, how do I not include them?

> > Question: How do I create a branch on a remote repo when I'm on
> > my local machine, without sshing to it?
>
> I hope that the question is not "How do I do anything on a remote without
> having any network connection to it" as its answer cannot be anything but
> "telepathy".

Funny. I'm asking how I can run a command locally, that during the next "git
push HEAD" will cause a branch to be created on a remote repository, without
assuming that is the same repository that my current branch is pointing to.
Will this do the trick?

git branch --track mynewbranch git://myserver/path/foo.git
# hack hack
git commit
git push HEAD

- David

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
@ 2008-06-26  5:23 Theodore Tso
  2008-06-26  5:26 ` Junio C Hamano
       [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6it3ZFEDjCd5X>
  0 siblings, 2 replies; 28+ messages in thread
From: Theodore Tso @ 2008-06-26  5:23 UTC (permalink / raw)
  To: David Jeske; +Cc: Junio C Hamano, git

On Wed, Jun 25, 2008 at 11:03:02PM -0000, David Jeske wrote:
> I don't want to replicate CVS behavior, just the workflow.

It's not clear exactly what you want.  If you want the CVS workflow
(with all of its downsides), then just use "git pull; hack hack hack;
git push" all on the master branch.  If you are going to preserve the
workflow of CVS, then you're also going to preserve all of the
downsides of CVS.  If you aren't willing to make the users learn
anything new, then what's the point?

And if you are willing to make the users change their behaviour
somewhat -- how much are you willing to make them deviate from
the CVS workflow, and how much smarts are you willing to assume that
they have?

							- Ted

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
  2008-06-26  5:23 Theodore Tso
@ 2008-06-26  5:26 ` Junio C Hamano
       [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6it3ZFEDjCd5X>
  1 sibling, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2008-06-26  5:26 UTC (permalink / raw)
  To: Theodore Tso; +Cc: David Jeske, git

Theodore Tso <tytso@mit.edu> writes:

> On Wed, Jun 25, 2008 at 11:03:02PM -0000, David Jeske wrote:
>> I don't want to replicate CVS behavior, just the workflow.
>
> It's not clear exactly what you want.  If you want the CVS workflow
> (with all of its downsides), then just use "git pull; hack hack hack;
> git push" all on the master branch.

Eh, my point was more about "to preserve CVS workflow, fetch+rebase+push
is much closer than pull+push"...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
       [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6it3ZFEDjCd5X>
  2008-06-26  6:08   ` David Jeske
@ 2008-06-26  6:08   ` David Jeske
  1 sibling, 0 replies; 28+ messages in thread
From: David Jeske @ 2008-06-26  6:08 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Junio C Hamano, git

-- Theodore Tso wrote:
> If you are going to preserve the workflow of CVS, then you're
> also going to preserve all of the downsides of CVS.

I don't agree with this, and I don't see you proposing any logic that proves it
to be true. Of course I plan to make small changes. However, in my previous
message I proposed 3 same-workflow improvements, and 2 small-workflow-extension
improvements. I have more in mind..

http://marc.info/?l=git&m=121442660332114&w=2

Maybe it was too confusing or too long to read. Just consider the first simple
example.

Currently "cvs up" in a dirty tree is a destructive operation. If you merge
badly, you can't get back to your local working files before the "up". I've
been burned by this in cvs/perforce enough that now when there are complicated
update-conflicts I tar up the tree before trying to fix them. I still can't
really get back to the pre-up state.

I can be better than cvs with the EXACT same workflow, by checking in their
local changes (git commit -a;) and then doing the "up" (git pull;). If they
decide they botched their merge, they can get back to where they were before
the UP because I'm using a richer underlying mechanism to implement their
workflow.
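
A rough sketch of that idea, assuming a throwaway "central" repository and
a clone called "dev" (hypothetical names, not part of any existing tool).
The dirty tree is committed, its commit id is remembered, and after the
update the tree can be put back exactly where it was:

```shell
#!/bin/sh
# Sketch only: repository names and files are made up for illustration.
set -e
work=$(mktemp -d)
git init -q "$work/central"
cd "$work/central"
git config user.email "demo@example.com"
git config user.name "Demo"
echo v1 > notes.txt
git add notes.txt
git commit -q -m "v1"

git clone -q "$work/central" "$work/dev"

# Upstream changes after the clone.
echo upstream > other.txt
git add other.txt
git commit -q -m "upstream change"

cd "$work/dev"
git config user.email "demo@example.com"
git config user.name "Demo"
echo "local edit" >> notes.txt

# Step 1 of the "up": record the dirty tree as a commit.
git commit -q -a -m "save before update"
saved=$(git rev-parse HEAD)

# Step 2: merge in the upstream work.
git pull -q --no-rebase --no-edit

# If the merge turned out badly, go back to the pre-update state.
git reset -q --hard "$saved"
restored=$(git rev-parse HEAD)
echo "back at: $restored"
```

The reset discards the merge result, so it only makes sense before the
merged state has been pushed anywhere.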

Do you think that's not an improvement? or not the same workflow? It sure seems
like a same-workflow improvement to me.

----

git's mechanisms are really great for making a hybrid central/distributed
system which has the simplicity of cvs/perforce and several of the benefits of
git. The git interface is just too complicated to be used for this.
Fortunately, building on git means that power users will still be able to use
git directly and people can distribute the repositories as much as they want.

> how much change are you willing to make them deviate from
> the CVS workflow, and how much smarts are you willing to assume that
> they have?

Good question. I'm working on a command-line wrapper for git that does it.
Digging into the "plumbing" is making it more obvious why I find git's
porcelain operations hard to understand. I think I can make a 2-repository
setup (personal-inaccessible, origin) work like cvs/perforce with local
checkins, and I can make a 3-repository setup (personal-inaccessible,
personal-accessible, origin) work nearly the same as cvs while allowing
distributed collaboration. I think I will need a tiny bit of custom server
support (to create the personal-accessible repositories automatically).

Right now it looks like it'll be a simple hybrid of cvs/perforce, with a couple
git concepts peppered in. (but just a couple) It seems simple so far, it's just
taking me a while to dig through git-plumbing to understand it.

Also remember, this isn't built to handle what linux-kernel folks do with git.
It's designed to provide a familiar environment for cvs/perforce style users
that is just as simple but a whole lot better. Even if it eventually gets lots
of git concepts, they won't HAVE to understand them to use it. They can learn
them as they go.  This is obviously something that people want, as cogito and
easy-git show.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: policy and mechanism for less-connected clients
@ 2008-06-26 11:37 Theodore Tso
       [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6rSE7FEDjCYv6>
  0 siblings, 1 reply; 28+ messages in thread
From: Theodore Tso @ 2008-06-26 11:37 UTC (permalink / raw)
  To: David Jeske; +Cc: Junio C Hamano, git

On Thu, Jun 26, 2008 at 06:08:55AM -0000, David Jeske wrote:
> I can be better than cvs with the EXACT same workflow, by checking in their
> local changes (git commit -a;) and then doing the "up" (git pull;). If they
> decide they botched their merge, they can get back to where they were before
> the UP because I'm using a richer underlying mechanism to implement their
> workflow.

This is a really good example of the problems involved.  One of the
major problems with CVS is that CVS developers have a tendency to use
"cvs up" *way* too often --- i.e., with a dirty tree.  Why do they
have a dirty tree?  Well, generally because they commit too rarely;
since CVS branches are so awful to use, they generally don't use CVS
branches, and so if they are in the middle of making major changes to the
source base, they may not do a CVS checkin for weeks or months, since
they don't want to break the centrally visible branch until their
project actually is at a stage where it can be checked into the tree
without breaking core functionality.

(I once supervised a programmer who didn't do a CVS checkin for two
months, and then lost two months of work when his local disk died, and
as a result he had a nervous breakdown; you just can't make up some of
the massive, major problems that can result from CVS-inspired
workflows.)

So if you are going to accommodate the broken workflow where people
leave dirty state in their local tree for vast amounts of time, and
thus insist on running "cvs up" all the time, and will try to cover
for it by committing their work under their noses when they do the
equivalent of "cvs up" in a dirty worktree --- what does that mean?
Well, maybe you can make it work, but it breaks other nice features of
git.  For example, it means that "git bisect" can't possibly work,
since there will be a huge number of commits where the tree may not even
build!

Accommodating the CVS workflow is basically about the fact that users
don't want to learn about CVS branches, because they were horrible to
use, and even worse to merge.  But that's not true with git branches;
so maybe it's better to teach them how to use git branches instead
of trying to coddle them by letting them use the same
old broken CVS workflow that was based on branch-avoidance?

I've created and taught a Usenix tutorial which covers the basics of
distributed source code management systems, including branches,
repositories, pushing and pulling between them, for git, hg, AND bzr,
and I did it in half a day.  The concepts really aren't hard.  The
main problem with git is that because the UI grew organically, there
are all sorts of exceptions and non-linearities in its CLI.  

For example, the fact that "git checkout" can be used both to switch
between branches and to revert an edited file.  Or the fact that how
you specify a set of revisions in git-format-patch is different in
terms of what happens when you specify a single commit; it's
documented in the man page now, at least, and people who teach git
after a while learn about the things that you have to teach newbies
that git experts take for granted.  (Just as people who teach English
as a second language learn about all of the exceptions to the language
that you have to point out that are second nature to the natives.)
But really, git *isn't* that hard, once you get past the somewhat
awkward CLI.  (It's no worse, and probably much better, than the Unix
shell/test/awk/sed/head/tail/sort/uniq/comm, etc.  You just have to
get over the learning curve.)

> git's mechanisms are really great for making a hybrid
> central/distributed system which has the simplicity of cvs/perforce
> and several of the benefits of git. The git interface is just too
> complicated to be used for this.  Fortunately, building on git means
> that power users will still be able to use git directly and people
> can distribute the repositories as much as they want.

I'd suggest that you try using git straight for a bit longer, before
you start drawing these conclusions.  Trust me, the concepts of git
really aren't that hard to explain to people; that's not what you need
to hide from people coming from the CVS world.  The hard part is the
fact that git's UI has all sorts of non-linearities and that git's
documentation and introductory tutorials are not as good as they should
be.  (Although it's gotten a LOT better than just a year or two ago.)

Also, if your program, when used by CVS refugees, causes the git
repository to be peppered with trash commits which don't build, even
if power users are using git directly, their ability to browse the
repository using "git log" or "gitk", or to try to find problems using
"git bisect", will be horribly, negatively affected.  So I am a bit
worried that the result will end up destroying value for the project
in the long-term, and that the costs will not be matched by the
benefits of simply teaching the CVS refugees a few bits of git and
DSCM core concepts, which I've found are *not* the hard part of
getting newbies to use git.

> Good question. I'm working on a command-line wrapper for git that does it.
> Digging into the "plumbing" is making it more obvious why I find git's
> porcelain operations hard to understand.

Exactly.  So what I would ask you to consider is that you may find it
personally useful to design this system, but afterwards, before you
inflict it on projects, and deal with some of the attendant side
effects (like all of these trash commits causing "git bisect" to go
down the drain), that you consider whether *now* that you understand
how git works and why it does some of the things it does, and what the
shortcomings of the git porcelain are from a UI perspective, whether
CVS refugees really would be best served by this system you are
designing, or whether a few wrapper scripts to hide some of the more
pointy spikes in git's CLI, plus some better tutorials, might in the
long run be much better for these CVS developers that you are trying
to serve.

						- Ted

* Re: policy and mechanism for less-connected clients
       [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6rSE7FEDjCYv6>
@ 2008-06-26 16:21   ` David Jeske
  2008-06-26 16:21   ` David Jeske
  1 sibling, 0 replies; 28+ messages in thread
From: David Jeske @ 2008-06-26 16:21 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Junio C Hamano, git

Thanks for pointing out the issue with automatically committing and bisect.
You're right, if I'm going to automatically commit under the covers I should
use stash instead. However, I don't want users to keep a dirty tree, and now
they don't have to.

To use your two-months-without-checkins example: one of the big problems I
have with cvs/p4 is this notion that I'm not supposed to record my work every
5-50 minutes. I check in every time my code does something new and the tests
pass. In my own startup projects/companies this is fine, because it's my tree.
As soon as the group policy stops me from checking in every 5-50 minutes, I
painfully make my own branch so I can check in on my schedule. I'm starting to
witness some users solving this problem in a very libertarian way, by using git
to manage their local changes even though they work in a code-review restricted
cvs/p4 environment.

-- Theodore Tso wrote:
> I'd suggest that you try using git straight for a bit longer, before
> you start drawing these conclusions. Trust me, the concepts of git
> really aren't that hard to explain to people; that's not what you need
> to hide from people coming from the CVS world.  The hard part is the
> fact that git's UI has all sorts of non-linearities and that git's
> documentation and introductory tutorials are not as good as it should
> be.  (Although it's gotten a LOT better than just a year or two ago.)

I agree 100%. I am using git straight. I think I have read more git
documentation and definitely read more git source-code in trying to use it over
a couple months, than I have read of cvs/p4 in decades - just to try to
understand which of the 3 ways to get from here-to-there is correct, and then
when I pull back the red curtain a little further I realize I was totally
wrong.

This started as a "cheat sheet" file with the combination of git commands I had
to execute to perform each task. However, they are only valid in the context of
a git-repo that's configured in certain ways. I realized it would be simpler
(even for just me) if I had something that grouped commands and did 'lint'
sanity checks, with helpful tutorial responses. Thus the wrapper.

> Exactly.  So what I would ask you to consider is that you may find it
> personally useful to design this system,

I see where you're going with this, and I agree...

> but afterwards, before you inflict it on projects, and deal
> with some of the attendent side effects (like all of these trash
> commits causing "git bisect" to go down the drain), that you
> consider whether *now* that you understand how git works and
> why it does some of the things it does, and what the
> shortcomings of the git porcelain are from a UI perspective, whether
> CVS refugees really would be best served by this system you are
> designing, or whether a few wrapper scripts to hide some of the more
> pointy spikes in git's CLI, plus some better tutorials, might in the
> long run be much better for these CVS developers that you are trying
> to serve.

Absolutely. I hope that you can understand my goal of an 'interactive command
line/tutorial linear path from cvs/p4 to git'. One where they don't get stuck
and turn back, but also where they work in ways which are 'fairly reasonable'
in the git community. I also hope you'll help me evaluate whether I've succeeded
or just made another confusing set of compromises that are no good. There is no
need for more of the latter.

I also have a group that's been using git and wants to switch back to cvs/p4.
They are willing to give up tracking their local changes (or do it with private
gits) in order to get a simpler model for 'shared head of tree' development. I
think they are a good test-case as well.

------

So far, 1/2 of the lines of my script are merely transitional documentation from
p4/cvs to git. As I write more of this prose, I realize that it may be helpful
as transition documentation webpages. However, it is much more than passive
documentation, because if there are 3 steps from here to there, I can look at
the repository and see where the user is, and tell them what they need to do
next.

As one example, I have a command "pending" (like p4 pending) which shows local
changes in my branch (on my inaccessible firewalled machine) which are not on
my origin repo(s). Except that in order for this concept to even make sense, it
first:

- checks if I have an 'origin' for a public repo
- checks that my current branch is tracking some origin
- if it is mapped to a 'myorigin' personal published repo
  (because I'm firewalled), checks that the name of the
  branch matches the myorigin/branchname (because it's easier
  to think straight if myorigin is a literal copy of my local
  repo)
- shows what changes I have which are not submitted to myorigin
  and/or origin

If at any step along that path something doesn't check out, it explains what
didn't check out, and has a helpful help-page about ways that I might configure
it so 'pending' can do something useful. Think of it like "git lint" and some
documentation.
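
The sequence of checks above can be sketched roughly as follows. This is an
illustrative Python sketch, not the actual wrapper: the function names
(git, check_setup, pending) and the messages are made up for the example, and
it assumes the remotes are literally named 'origin' and 'myorigin' as in the
description. The git commands it calls (git remote, git symbolic-ref,
git config branch.<name>.remote / .merge, git log A..B) are standard ones.

```python
import subprocess
from typing import List, Optional


def git(*args: str) -> Optional[str]:
    """Run a git command; return its stripped stdout, or None on failure."""
    try:
        done = subprocess.run(["git", *args], capture_output=True, text=True)
    except OSError:  # git is not installed or not on PATH
        return None
    return done.stdout.strip() if done.returncode == 0 else None


def check_setup(remotes: List[str], branch: Optional[str],
                tracked_remote: Optional[str],
                tracked_branch: Optional[str]) -> List[str]:
    """Return the problems found; an empty list means 'pending' makes sense."""
    problems = []
    if "origin" not in remotes:
        problems.append("no 'origin' remote for a public repo")
    if branch is None:
        problems.append("not on a branch")
    elif tracked_remote is None or tracked_branch is None:
        problems.append("branch '%s' is not tracking any remote" % branch)
    elif tracked_remote == "myorigin" and tracked_branch != branch:
        problems.append("branch '%s' is published as myorigin/%s; "
                        "names should match" % (branch, tracked_branch))
    return problems


def pending() -> int:
    """Print commits on the current branch that its upstream does not have."""
    remotes = (git("remote") or "").split()
    branch = git("symbolic-ref", "--short", "HEAD")
    tracked_remote = tracked_branch = None
    if branch:
        tracked_remote = git("config", "branch.%s.remote" % branch)
        merge = git("config", "branch.%s.merge" % branch)  # refs/heads/<name>
        if merge and merge.startswith("refs/heads/"):
            tracked_branch = merge[len("refs/heads/"):]
    problems = check_setup(remotes, branch, tracked_remote, tracked_branch)
    for problem in problems:
        print("pending: " + problem)  # a real tool would add a help page here
    if problems:
        return 1
    upstream = "%s/%s" % (tracked_remote, tracked_branch)
    log = git("log", "--oneline", upstream + "..HEAD")
    if log is None:
        print("pending: cannot read %s; try 'git fetch %s'"
              % (upstream, tracked_remote))
        return 1
    print(log or "nothing pending against " + upstream)
    return 0
```

Keeping check_setup free of git calls means the 'lint' rules can be exercised
without a repository; pending() only gathers the facts and prints the result.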

That said, it's trickier than I thought, because git is capable of working in
so many ways. (all that complexity isn't there for nothing) Time will tell if I
can strike a useful balance.

end of thread, other threads:[~2008-06-26 17:43 UTC | newest]

Thread overview: 28+ messages
2008-06-25  2:33 policy and mechanism for less-connected clients Theodore Tso
     [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6@3N@FEDjCXZO>
2008-06-25  5:20   ` David Jeske
2008-06-25 19:17     ` Daniel Barkalow
2008-06-25 20:12       ` Raimund Bauer
2008-06-25  5:20   ` David Jeske
2008-06-25  9:30     ` Jakub Narebski
  -- strict thread matches above, loose matches on Subject: below --
2008-06-26 11:37 Theodore Tso
     [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6rSE7FEDjCYv6>
2008-06-26 16:21   ` David Jeske
2008-06-26 16:21   ` David Jeske
2008-06-26  5:23 Theodore Tso
2008-06-26  5:26 ` Junio C Hamano
     [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6it3ZFEDjCd5X>
2008-06-26  6:08   ` David Jeske
2008-06-26  6:08   ` David Jeske
2008-06-25 14:03 Petr Baudis
2008-06-25 13:34 Theodore Tso
2008-06-25 17:34 ` Junio C Hamano
     [not found] ` <willow-jeske-01l6@3PlFEDjCVAh-01l6OB5yFEDjCYe3>
2008-06-25 19:37   ` David Jeske
2008-06-25 20:52     ` Jakub Narebski
2008-06-25 20:54     ` Jakub Narebski
2008-06-25 19:37   ` David Jeske
     [not found]     ` <willow-jeske-01l6@3PlFEDjCVAh-01l6XqjPFEDjCY6P>
2008-06-25 21:34       ` David Jeske
2008-06-25 22:10         ` Jakub Narebski
2008-06-25 22:13         ` Junio C Hamano
     [not found]           ` <willow-jeske-01l6@3PlFEDjCVAh-01l6[3InFEDjC[dy>
2008-06-25 23:03             ` David Jeske
2008-06-25 23:03             ` David Jeske
2008-06-25 21:34       ` David Jeske
2008-06-25  0:36 David Jeske
2008-06-25  0:36 David Jeske
