From: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
To: Marek Vasut <marex@denx.de>,
"bitbake-devel@lists.openembedded.org"
<bitbake-devel@lists.openembedded.org>
Cc: Martin Jansa <Martin.Jansa@gmail.com>,
Richard Purdie <richard.purdie@linuxfoundation.org>
Subject: RE: [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
Date: Sat, 20 Aug 2022 12:06:55 +0000
Message-ID: <ab9e6c5562c44ac68e5d5e13e82dc8c0@axis.com>
In-Reply-To: <20220819165455.270130-1-marex@denx.de>
> -----Original Message-----
> From: Marek Vasut <marex@denx.de>
> Sent: den 19 augusti 2022 18:55
> To: bitbake-devel@lists.openembedded.org
> Cc: Marek Vasut <marex@denx.de>; Martin Jansa <Martin.Jansa@gmail.com>; Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Richard Purdie <richard.purdie@linuxfoundation.org>
> Subject: [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
>
> The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> single ref in the remote repository. This works poorly with gitlab
> and github, which use the remote git repository to track their metadata
> like merge requests, CI pipelines and such.
>
> Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> and refs/keep-around/*, and they all contain massive amounts of data that
> are useless for bitbake build purposes. The amount of useless data
> can in fact be so massive (e.g. with the FDO mesa.git repository) that some
> proxies may outright terminate the 'git fetch' connection, making it
> appear as if bitbake got stuck on 'git fetch' with no output.
>
> To avoid fetching all this useless metadata, tweak the git fetcher such
> that it only fetches refs/heads/* and refs/tags/*. Avoid using negative
> refspecs as those are only available in new git versions.
>
> Signed-off-by: Marek Vasut <marex@denx.de>
> ---
> Cc: Martin Jansa <Martin.Jansa@gmail.com>
> Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
> lib/bb/fetch2/git.py | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> index 4534bd75..b5fc0a51 100644
> --- a/lib/bb/fetch2/git.py
> +++ b/lib/bb/fetch2/git.py
> @@ -382,7 +382,7 @@ class Git(FetchMethod):
>                runfetchcmd("%s remote rm origin" % ud.basecmd, d, workdir=ud.clonedir)
>
>              runfetchcmd("%s remote add --mirror=fetch origin %s" % (ud.basecmd, shlex.quote(repourl)), d, workdir=ud.clonedir)
> -            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/*:refs/*" % (ud.basecmd, shlex.quote(repourl))
> +            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/*" % (ud.basecmd, shlex.quote(repourl))
>              if ud.proto.lower() != 'file':
>                  bb.fetch2.check_network_access(d, fetch_cmd, ud.url)
>              progresshandler = GitProgressHandler(d)
> --
> 2.35.1
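
For reference, the effect of the change on the generated command line can be sketched in isolation. This is only an illustration, not the bitbake code: the helper name build_fetch_cmd, its parameters and the example URL are made up here, and only the command format and the two refspecs are taken from the patch above.

```python
import shlex

# Refspecs from the patch: branches and tags only, no hosting-service metadata.
REFSPECS = ("refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*")


def build_fetch_cmd(basecmd, repourl, refspecs=REFSPECS):
    """Hypothetical helper: build a 'git fetch' command limited to refspecs.

    Mirrors the string formatting used in the patch; the refspecs are
    passed through unquoted, exactly as the patch does.
    """
    return "LANG=C %s fetch -f --progress %s %s" % (
        basecmd,
        shlex.quote(repourl),
        " ".join(refspecs),
    )


if __name__ == "__main__":
    # Example URL is a placeholder, not a repository from this thread.
    print(build_fetch_cmd("git", "https://example.com/repo.git"))
```

Passing ("refs/*:refs/*",) as refspecs would give the command as it was before the patch, which makes it easy to compare the two fetches by hand.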
Seems like the right thing to do. We use Gerrit, which also keeps its
metadata in special refs/ namespaces. One repository I tested with grew
from 3 MB to 35 MB when I fetched using refs/*, while another grew
from 20 MB to 120 MB, so there is definitely space and time to be
saved by only fetching the refs/heads and refs/tags namespaces.
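
To get a feel for how many refs a server exposes outside refs/heads and refs/tags, something like the following could be run on the output of 'git ls-remote'. It is a rough sketch: the function name count_ref_namespaces and the sample data are invented for illustration, and the number of refs is only a hint, since it says nothing about how much object data hangs off them.

```python
from collections import Counter

# Invented sample in 'git ls-remote' format: "<object id>\t<ref name>".
SAMPLE = (
    "1111111111111111111111111111111111111111\tHEAD\n"
    "1111111111111111111111111111111111111111\trefs/heads/master\n"
    "2222222222222222222222222222222222222222\trefs/tags/v1.0\n"
    "3333333333333333333333333333333333333333\trefs/changes/01/1/1\n"
    "4444444444444444444444444444444444444444\trefs/changes/02/2/1\n"
)


def count_ref_namespaces(ls_remote_output):
    """Count refs per top-level namespace (refs/heads, refs/tags, ...)."""
    counts = Counter()
    for line in ls_remote_output.splitlines():
        parts = line.split("\t", 1)
        if len(parts) != 2 or not parts[1].startswith("refs/"):
            continue  # skip HEAD and anything that is not a ref line
        namespace = "/".join(parts[1].split("/")[:2])
        counts[namespace] += 1
    return counts


if __name__ == "__main__":
    for namespace, count in sorted(count_ref_namespaces(SAMPLE).items()):
        print(namespace, count)
```

Everything reported outside refs/heads and refs/tags is what the new refspecs would leave on the server.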
Reviewed-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
//Peter