git.vger.kernel.org archive mirror
* Looking for feedback and help with a git-mirror for local usage
@ 2015-06-11 20:44 Bernd Naumann
From: Bernd Naumann @ 2015-06-11 20:44 UTC
  To: git

Hello,

I have come up with an idea
# Yep, I know, exactly the kind of e-mail everyone wants to read ;)
and I'm currently working on a shell prototype to address the
following situation, and I need some feedback/advice:


I often build, for example, 'openwrt' with various build scripts which
depend heavily on a fresh or clean environment, and they fetch many
sources via `git clone`, which sometimes results in over 100 MB of
traffic for a single build. /* The .tar.gz source archives needed later
are stored in a symlinked download directory, which 'openwrt/.config'
has supported for a few months now, to reduce network traffic. */

My connection to the internet is not the fastest in the world and
sometimes unstable, so I wanted to have some kind of local bare
repository mirror, which is possible with `git clone --mirror`.

From these repositories I can later clone by calling `git clone
--reference /path/to.git <url>`, but I do not wish to edit all the
build scripts and Makefiles.


So I wrote a git wrapper script (`$HOME/bin/git`), which checks if
`git` was called with 'clone', and if so, it first clones the
repository as a mirror and then clones from that local mirror. If the
mirror already exists, it is only updated (`git remote update`). This
works for now.
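A minimal sketch of the wrapper's decision logic, written in Python rather than shell and kept free of side effects so it can be checked without network access. It assumes the real binary is `/usr/bin/git` (as in footnote [0]) and that the caller already knows the mirror path. It uses `git clone --reference <mirror> <url>`, as mentioned above, instead of cloning straight from the mirror, so the new clone's `origin` keeps pointing at the upstream URL. The function name is hypothetical.

```python
import os

# Assumption: the real git binary lives at /usr/bin/git (see footnote [0]),
# so a $HOME/bin/git wrapper does not end up calling itself.
REAL_GIT = "/usr/bin/git"


def plan_mirrored_clone(url, mirror, directory=None, mirror_exists=None):
    """Return the git commands (as argv lists) the wrapper would run.

    url        -- the URL the build script passed to `git clone`
    mirror     -- path of the bare mirror for that URL
    directory  -- optional target directory of the working clone
    """
    if mirror_exists is None:
        mirror_exists = os.path.isdir(mirror)

    commands = []
    if mirror_exists:
        # Mirror already present: only refresh it from upstream.
        commands.append([REAL_GIT, "--git-dir=" + mirror, "remote", "update"])
    else:
        # First use of this URL: create a bare mirror of the upstream repo.
        commands.append([REAL_GIT, "clone", "--mirror", url, mirror])

    # Clone the working copy, borrowing objects from the local mirror.
    clone = [REAL_GIT, "clone", "--reference", mirror, url]
    if directory is not None:
        clone.append(directory)
    commands.append(clone)
    return commands
```

Each returned list could then be handed to `subprocess.run(cmd, check=True)` in order.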

/*
  To be able to have multiple identically named repositories,
  the script builds paths like:

  ~/var/cache/gitmirror $ find . -name "*.git"

  ./github.com/openwrt-management/packages.git
  ./github.com/openwrt/packages.git
  ./github.com/openwrt-routing/packages.git
  ./nbd.name/packages.git
  ./git.openwrt.org/packages.git
  ./git.openwrt.org/openwrt.git

It strips the scheme from the URL and replaces ":" with "/" in case a
port is specified or an svn link is provided. What remains should be a
valid Linux file and directory structure, if I guess correctly!?
*/
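The path mapping described in that comment can be sketched as a small Python function. It is an approximation under stated assumptions, not the prototype's actual code: it strips a `scheme://` prefix and any `user@` part, turns every ":" into "/", appends ".git" when missing, and rejects ".." components so a crafted URL cannot escape the cache directory. The function name is illustrative; the `cache_root` default is taken from the listing above.

```python
import os
import re


def mirror_path(url, cache_root="~/var/cache/gitmirror"):
    """Map a clone URL to the path of its bare mirror under cache_root."""
    # Strip a leading scheme such as "https://", "git://" or "ssh://".
    rest = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", url)
    # Drop a "user@" prefix, e.g. in "git@github.com:openwrt/packages.git".
    rest = re.sub(r"^[^/@]+@", "", rest)
    # Turn port separators and scp-style "host:path" colons into "/".
    rest = rest.replace(":", "/")
    # Collapse repeated slashes and trim leading/trailing ones.
    rest = re.sub(r"/+", "/", rest).strip("/")
    if ".." in rest.split("/"):
        raise ValueError("refusing '..' in mirror path: " + url)
    # Let "packages" and "packages.git" share one mirror.
    if not rest.endswith(".git"):
        rest += ".git"
    return os.path.join(os.path.expanduser(cache_root), rest)
```

With this mapping, both `https://github.com/openwrt/packages.git` and `git@github.com:openwrt/packages.git` land in `github.com/openwrt/packages.git` under the cache root. A port ends up as an extra directory level.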

Ok, so far, so good, but the implementation of the current
shell prototype looks way too hacky [0], and I have found some edge
cases on which my script fails:
  The script depends on the last, or at least the second-to-last,
argument being a valid git URL, but the following is a valid call, too:

  `git --no-pager \
   clone git@github.com:openwrt/packages.git openwrt-packages --depth 1`

But these calls are not handled either:

  `git clone https://github.com/openwrt/packages.git \
   --reference packages.git packages-2`
or
  `git clone --verbose https://github.com/openwrt/packages.git \
   packages-2 --reference packages.git`

I found out that git-clone itself can also only guess which argument
is the URL and which is not.



However, now I'm looking for a way to write something like a module
for git which checks for a *new* git-config value like "user.mirror"
(or something...) that points to a directory to clone from. In the
case of 'fetch', 'pull' or 'remote update', the mirror would be
updated first, and the working directory would then be updated from
that mirror. (And in the case of 'push', the mirror would of course be
updated from the working directory.)
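`git config` already accepts arbitrary keys, so even before patching git a wrapper could read such a setting with `git config --get`, which exits with status 1 when the key is not set. A minimal sketch, assuming the hypothetical key name `user.mirror` from the paragraph above (git itself does not define or use it). The `run` parameter exists only so the lookup can be tested without a real repository, and the function name is hypothetical.

```python
import subprocess

# Hypothetical key proposed above; git does not define or read it.
MIRROR_KEY = "user.mirror"


def configured_mirror_root(run=subprocess.run, git="git"):
    """Return the directory named by the mirror key, or None if unset."""
    result = run(
        [git, "config", "--get", MIRROR_KEY],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    # `git config --get` exits with status 1 when the key is not found.
    if result.returncode != 0:
        return None
    value = result.stdout.strip()
    return value or None
```

The key itself would be set with an ordinary `git config --global user.mirror ~/var/cache/gitmirror`.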


I would like to hear some thoughts on that, and how I could start to
build this module, or whether someone more talented than I am is
willing to spend some time on it. If requested, I could send a link to
the shell prototype.


[0] For some reason I have to do ugly things like
`$( eval exec /usr/bin/git clone --mirror $REPO_URL ) 2>&1 >/dev/null`
because otherwise, with just `eval exec`, the script stops after
execution, and without `eval exec`, arguments containing spaces are
interpreted as separate arguments, which makes the command fail.
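If the wrapper were rewritten in Python, passing the command as a list to `subprocess.run` sidesteps both problems described in [0]: no shell parses the arguments, so values containing spaces stay intact without `eval`, and the script continues after the child exits, unlike `exec`. (In shell itself, forwarding arguments as "$@" and quoting every expansion usually removes the need for `eval`.) A minimal sketch that discards stdout and captures stderr; the helper name is hypothetical.

```python
import subprocess


def run_quietly(argv):
    """Run a command without a shell and return the CompletedProcess.

    Every element of argv reaches the program as exactly one argument, so
    values containing spaces are never split and no eval is needed. stdout
    is discarded, stderr is captured as text, and control returns to the
    caller afterwards, unlike exec.
    """
    return subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
```

A call would look like `run_quietly(["/usr/bin/git", "clone", "--mirror", repo_url, mirror_dir])`, where `repo_url` and `mirror_dir` are placeholders.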


Thanks for your time!
Yours faithfully, Bernd

* Re: Need some help on patching buildin-files // was: Looking for feedback and help with a git-mirror for local usage
@ 2015-06-12 10:52 Bernd Naumann
From: Bernd Naumann @ 2015-06-12 10:52 UTC
  To: git

Hello again,

After digging through the code I may have a clue where to start, but I
would still appreciate some help from a developer, because I have never
learned to write C. (I had some basics at school, over a decade ago.)

Currently I have questions on:

* How to patch clone: would cmd_clone() be a good place? Or are there
other calls which might be better? I am thinking about inserting the
check whether a mirror needs to be set up or just updated right after
dest_exists.

* Is it correct that a new config key is just specified via a config
file or by cmd_init_db()? So later, a check on that value is enough?
Would the section 'user' be a good place for this key, or should it
get its own new section?

* Have I missed a relevant file?

git/git.c
git/builtin/clone.c
git/builtin/fetch.c
git/builtin/push.c
git/builtin/remote.c
along with the translations and documentation, of course.


If you have some comments on that, please share them with me, and if
you are interested in helping me get this implemented, I would
appreciate that :)

Sincere regards,
Bernd


-- 
Bernd Naumann <bernd@kr217.de>

PGP:   0xA150A04F via pool.sks-keyservers.net
XMPP:  bn@weimarnetz.de

* Re: Need some help on patching buildin-files // was: Looking for feedback and help with a git-mirror for local usage
@ 2015-06-14  8:17 David Aguilar
From: David Aguilar @ 2015-06-14  8:17 UTC
  To: Bernd Naumann; +Cc: git

On Fri, Jun 12, 2015 at 12:52:44PM +0200, Bernd Naumann wrote:
> Hello again,
> 
> After digging through the code I may have a clue where to start, but I
> would still appreciate some help from a developer, because I have never
> learned to write C. (I had some basics at school, over a decade ago.)
> 
> Currently I have questions on:
> 
> * How to patch clone: would cmd_clone() be a good place? Or are there
> other calls which might be better? I am thinking about inserting the
> check whether a mirror needs to be set up or just updated right after
> dest_exists.

If you'd still like to modify "git clone" itself, then the
"cmd_clone" entry point is certainly the place to start.
I would suggest exploring other alternatives, though.


Is it possible to use a caching HTTP proxy, so that "git clone"
goes through a local caching proxy?  I haven't tried this myself,
so maybe it's not even possible, but that seems like a natural
http-ish solution.


Another idea is to use Git's URL rewriting feature.  If your
clone URLs all follow a similar pattern then they can
automatically be rewritten to point to some other URL.

e.g. in ~/.gitconfig:

[url "file:///home/git/mirror/github.com/"]
	insteadOf = "https://github.com/"

This will make git clone from /home/git/mirror/github.com/
whenever it sees https://github.com/ URLs.
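To check what a rule like this does to a given URL, the documented matching can be emulated in a few lines: any URL starting with an insteadOf value has that prefix replaced by the `<base>` from the section name, and when several values match, the longest one wins. This is a sketch of the documented behavior, not git's implementation; the function name is hypothetical.

```python
def apply_instead_of(url, rules):
    """Emulate url.<base>.insteadOf rewriting for a single URL.

    rules maps each insteadOf prefix to its <base> replacement. As git
    documents, the longest matching prefix wins; an unmatched URL is
    returned unchanged.
    """
    best = None
    for prefix in rules:
        if url.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        return url
    return rules[best] + url[len(best):]
```

With the mirror layout from the original message, the base would be something like `file:///home/<user>/var/cache/gitmirror/github.com/`.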

This is not perfect because it ends up cloning from your local
copies rather than setting up the references via --mirror, but
at least it avoids hitting the network.  You'll need to
periodically update your local mirrors, though.

If you prefer to keep ~/.gitconfig pristine, then you could do it
in a wrapper script by injecting the setting with git's "-c" flag, e.g.:

	git \
	-c url.file://foo/bar/.insteadOf=https://github.com/ \
	clone ...
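The same idea as a tiny Python wrapper that builds the argument list for the real git. The binary path, mirror base and upstream prefix are hypothetical values. Replacing the process with `os.execv` passes git's exit status straight back to the build script.

```python
import os
import sys

# Hypothetical values: where the real git lives and one rewrite rule that
# points https://github.com/ at a local mirror tree.
REAL_GIT = "/usr/bin/git"
MIRROR_BASE = "file:///home/git/mirror/github.com/"
UPSTREAM_PREFIX = "https://github.com/"


def wrapped_argv(args):
    """Return the argv for the real git with an insteadOf rule injected
    via -c, so ~/.gitconfig stays untouched."""
    rule = "url.{}.insteadOf={}".format(MIRROR_BASE, UPSTREAM_PREFIX)
    return [REAL_GIT, "-c", rule] + list(args)


def main():
    """Entry point for a $HOME/bin/git-style wrapper script."""
    argv = wrapped_argv(sys.argv[1:])
    # Replace this process with git so its exit status reaches the caller.
    os.execv(argv[0], argv)
```

An actual wrapper file would end with a call to `main()`.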

> [...snip...]
> > 
> > I often build, for example, 'openwrt' with various build scripts which
> > depend heavily on a fresh or clean environment, and they fetch many
> > sources via `git clone`, which sometimes results in over 100 MB of
> > traffic for a single build. /* The .tar.gz source archives needed later
> > are stored in a symlinked download directory, which 'openwrt/.config'
> > has supported for a few months now, to reduce network traffic. */

Why does a rebuild delete existing Git repositories?
That seems like a bad practice, and shouldn't be needed.
If possible, it would be worth improving the build scripts.

For example, a clone can be made pristine by doing
"git reset --hard && git clean -fdx".  Deleting a repository
just so that it can be re-cloned is very wasteful.
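For build scripts that can be changed, that advice amounts to: reuse an existing clone and reset it, and only clone when nothing is there yet. A minimal sketch that returns the commands rather than running them; it assumes git 1.8.5 or later for `git -C`, and the function name is hypothetical.

```python
import os


def plan_refresh(directory, url, git="git"):
    """Return the commands that leave `directory` as a pristine clone of url.

    An existing clone is reset and cleaned in place; `git clone` is used
    only when no clone is present.
    """
    # ".git" may be a directory or, for worktrees and submodules, a file.
    if os.path.exists(os.path.join(directory, ".git")):
        return [
            # Throw away modifications to tracked files.
            [git, "-C", directory, "reset", "--hard"],
            # Delete untracked files and directories, including ignored ones.
            [git, "-C", directory, "clean", "-fdx"],
        ]
    return [[git, "clone", url, directory]]
```

This restores the currently checked-out commit; picking up new upstream commits would additionally need a `git fetch` and a reset to the fetched branch.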

> > My connection to the internet is not the fastest in the world and
> > sometimes unstable, so I wanted to have some kind of local bare
> > repository mirror, which is possible with `git clone --mirror`.
> > 
> > From these repositories I can later clone by calling
> > `git clone --reference /path/to.git <url>`, but I do not wish to edit
> > all the build scripts and Makefiles.

Maybe it'd be possible to make just the "git clone" part of the
build scripts configurable?

That'd make it really easy to inject a wrapper script that scans
the arguments and injects the needed --mirror arguments, in the
case that the above options won't work.
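One way to make only the clone step configurable is to read the command from an environment variable with a plain `git clone` default, so a wrapper can be swapped in without touching the rest of the scripts. Both the variable name `GIT_CLONE` and the wrapper name `mirror-clone` used below are hypothetical; `shlex.split` lets the variable carry extra options.

```python
import os
import shlex

# Hypothetical: build scripts read the clone command from GIT_CLONE and fall
# back to plain "git clone" when it is unset.
DEFAULT_CLONE = "git clone"


def clone_command(url, directory, env=None):
    """Return the argv a build script would use to clone url."""
    if env is None:
        env = os.environ
    # shlex.split lets the variable carry a wrapper plus extra options.
    prefix = shlex.split(env.get("GIT_CLONE", DEFAULT_CLONE))
    return prefix + [url, directory]
```

With `GIT_CLONE="mirror-clone --verbose"` set, the same build step would run the hypothetical wrapper instead of git.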


> > So I wrote a git wrapper script (`$HOME/bin/git`), which checks if
> > `git` was called with 'clone', and if so, it first clones the
> > repository as a mirror and then clones from that local mirror. If the
> > mirror already exists, it is only updated (`git remote update`).
> > This works for now.
> > 
> > [...snip...]
> > 
> > Ok, so far, so good, but the implementation of the current
> > shell prototype looks way too hacky [0], and I have found some edge
> > cases on which my script fails: The script depends on the last, or
> > at least the second-to-last, argument being a valid git URL, but the
> > following is a valid call, too:
> > 
> >   `git --no-pager \
> >    clone git@github.com:openwrt/packages.git openwrt-packages --depth 1`
> > 
> > But these calls are not handled either:
> > 
> >   `git clone https://github.com/openwrt/packages.git \
> >    --reference packages.git packages-2`
> > or
> >   `git clone --verbose https://github.com/openwrt/packages.git \
> >    packages-2 --reference packages.git`
> > 
> > 
> > I found out that git-clone itself can also only guess which argument
> > is the URL and which is not.

Another option is to rewrite the wrapper script in a better language.
For example, Python's argparse module can handle the above cases
with minimal fuss.
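A sketch of that approach for the calls quoted above, assuming Python 3.7 or later for `parse_intermixed_args`, which lets positional arguments and options appear in any order. Only a subset of `git clone`'s options is declared here, so any other option makes argparse exit with an error; a real wrapper would need the full list, in particular every option that takes a value. The wrapper would also first have to skip git's global options such as `--no-pager` and the `clone` word itself. The function name is hypothetical.

```python
import argparse


def parse_clone_args(argv):
    """Split the arguments that follow `git clone` into options and the
    repository/directory positionals, regardless of their order."""
    p = argparse.ArgumentParser(prog="git clone", add_help=False)
    # Options that take a value (a subset of what git clone accepts).
    p.add_argument("--reference")
    p.add_argument("--depth")
    p.add_argument("-b", "--branch")
    p.add_argument("-o", "--origin")
    p.add_argument("-c", "--config", action="append")
    # Options that are plain flags (again only a subset).
    p.add_argument("-v", "--verbose", action="store_true")
    p.add_argument("-q", "--quiet", action="store_true")
    p.add_argument("--mirror", action="store_true")
    p.add_argument("--bare", action="store_true")
    # Positionals: the repository is required, the directory is optional.
    p.add_argument("repository")
    p.add_argument("directory", nargs="?")
    return p.parse_intermixed_args(argv)
```

For all three quoted calls, `repository` comes out as the URL and `directory` as the target name, no matter where `--depth` or `--reference` appears.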

Anyway, as I said before, the root problem is really the build
scripts.  I bet modifying the build scripts to reuse existing
git repositories is easier than modifying "git clone".

cheers,
-- 
David
