* [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
@ 2022-08-19 16:54 Marek Vasut
  2022-08-20 12:06 ` Peter Kjellerstedt
  2022-08-22 16:02 ` Luca Ceresoli
  0 siblings, 2 replies; 23+ messages in thread
From: Marek Vasut @ 2022-08-19 16:54 UTC (permalink / raw)
  To: bitbake-devel
  Cc: Marek Vasut, Martin Jansa, Peter Kjellerstedt, Richard Purdie

The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
single ref in the remote repository and every object reachable from
those refs. This works poorly with gitlab and github, which use the
remote git repository to track their own metadata, such as merge
requests and CI pipelines.

Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
and refs/keep-around/*, all of which contain massive amounts of data
that are useless for bitbake build purposes. The amount of useless data
can in fact be so massive (e.g. with the FDO mesa.git repository) that
some proxies may outright terminate the 'git fetch' connection, making
it appear as if bitbake got stuck on 'git fetch' with no output.

To avoid fetching all this useless metadata, tweak the git fetcher such
that it only fetches refs/heads/* and refs/tags/*. Avoid using negative
refspecs, as those are only available in newer git versions.

Signed-off-by: Marek Vasut <marex@denx.de>
---
Cc: Martin Jansa <Martin.Jansa@gmail.com>
Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 lib/bb/fetch2/git.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
index 4534bd75..b5fc0a51 100644
--- a/lib/bb/fetch2/git.py
+++ b/lib/bb/fetch2/git.py
@@ -382,7 +382,7 @@ class Git(FetchMethod):
               runfetchcmd("%s remote rm origin" % ud.basecmd, d, workdir=ud.clonedir)
 
             runfetchcmd("%s remote add --mirror=fetch origin %s" % (ud.basecmd, shlex.quote(repourl)), d, workdir=ud.clonedir)
-            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/*:refs/*" % (ud.basecmd, shlex.quote(repourl))
+            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/*" % (ud.basecmd, shlex.quote(repourl))
             if ud.proto.lower() != 'file':
                 bb.fetch2.check_network_access(d, fetch_cmd, ud.url)
             progresshandler = GitProgressHandler(d)
-- 
2.35.1
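
To see the effect of the change in isolation, here is a minimal sketch of how the restricted fetch command is assembled. Only the command format string and the two refspecs come from the patch; the helper name `build_fetch_cmd` and the constant `DEFAULT_REFSPECS` are assumptions made for illustration and do not exist in the bitbake fetcher.

```python
import shlex

# Refspecs taken from the patch: branches and tags only.
DEFAULT_REFSPECS = ("refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*")


def build_fetch_cmd(basecmd, repourl, refspecs=DEFAULT_REFSPECS):
    """Return the shell command string that fetches only the given refspecs.

    Hypothetical helper; it mirrors the format string used in the patch.
    """
    return "LANG=C %s fetch -f --progress %s %s" % (
        basecmd,
        shlex.quote(repourl),  # quote the URL exactly as the patch does
        " ".join(refspecs),
    )


if __name__ == "__main__":
    print(build_fetch_cmd("git", "https://example.com/repo.git"))
```

Passing `["refs/*:refs/*"]` as `refspecs` reproduces the previous behaviour, which makes the difference between the two variants easy to compare.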




* RE: [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-19 16:54 [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata Marek Vasut
@ 2022-08-20 12:06 ` Peter Kjellerstedt
  2022-08-22  5:19   ` [bitbake-devel] " Mikko.Rapeli
  2022-08-22 16:02 ` Luca Ceresoli
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Kjellerstedt @ 2022-08-20 12:06 UTC (permalink / raw)
  To: Marek Vasut, bitbake-devel@lists.openembedded.org
  Cc: Martin Jansa, Richard Purdie

> -----Original Message-----
> From: Marek Vasut <marex@denx.de>
> Sent: den 19 augusti 2022 18:55
> To: bitbake-devel@lists.openembedded.org
> Cc: Marek Vasut <marex@denx.de>; Martin Jansa <Martin.Jansa@gmail.com>; Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Richard Purdie <richard.purdie@linuxfoundation.org>
> Subject: [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
> 
> The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> single object in the remote repository. This works poorly with gitlab
> and github, which use the remote git repository to track its metadata
> like merge requests, CI pipelines and such.
> 
> Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> and refs/keep-around/* and they all contain massive amount of data that
> are useless for the bitbake build purposes. The amount of useless data
> can in fact be so massive (e.g. with FDO mesa.git repository) that some
> proxies may outright terminate the 'git fetch' connection, and make it
> appear as if bitbake got stuck on 'git fetch' with no output.
> 
> To avoid fetching all these useless metadata, tweak the git fetcher such
> that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> refspecs as those are only available in new git versions.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> ---
> Cc: Martin Jansa <Martin.Jansa@gmail.com>
> Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
>  lib/bb/fetch2/git.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> index 4534bd75..b5fc0a51 100644
> --- a/lib/bb/fetch2/git.py
> +++ b/lib/bb/fetch2/git.py
> @@ -382,7 +382,7 @@ class Git(FetchMethod):
>                runfetchcmd("%s remote rm origin" % ud.basecmd, d, workdir=ud.clonedir)
> 
>              runfetchcmd("%s remote add --mirror=fetch origin %s" % (ud.basecmd, shlex.quote(repourl)), d, workdir=ud.clonedir)
> -            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/*:refs/*" % (ud.basecmd, shlex.quote(repourl))
> +            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/*" % (ud.basecmd, shlex.quote(repourl))
>              if ud.proto.lower() != 'file':
>                  bb.fetch2.check_network_access(d, fetch_cmd, ud.url)
>              progresshandler = GitProgressHandler(d)
> --
> 2.35.1

Seems like the right thing to do. We use Gerrit, which also has its 
metadata in special refs/ spaces. One repository I tested with grew 
from 3 MB to 35 MB when I fetched using refs/* while another grew 
from 20 MB to 120 MB, so there is definitely space and time to be 
saved by only fetching the refs/heads and refs/tags spaces....

Reviewed-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com>

//Peter




* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-20 12:06 ` Peter Kjellerstedt
@ 2022-08-22  5:19   ` Mikko.Rapeli
  2022-08-22  6:57     ` Alexander Kanavin
  0 siblings, 1 reply; 23+ messages in thread
From: Mikko.Rapeli @ 2022-08-22  5:19 UTC (permalink / raw)
  To: peter.kjellerstedt; +Cc: marex, bitbake-devel, Martin.Jansa, richard.purdie

Hi,

On Sat, Aug 20, 2022 at 12:06:55PM +0000, Peter Kjellerstedt wrote:
> > -----Original Message-----
> > From: Marek Vasut <marex@denx.de>
> > Sent: den 19 augusti 2022 18:55
> > To: bitbake-devel@lists.openembedded.org
> > Cc: Marek Vasut <marex@denx.de>; Martin Jansa <Martin.Jansa@gmail.com>; Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Richard Purdie <richard.purdie@linuxfoundation.org>
> > Subject: [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
> > 
> > The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> > single object in the remote repository. This works poorly with gitlab
> > and github, which use the remote git repository to track its metadata
> > like merge requests, CI pipelines and such.
> > 
> > Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> > and refs/keep-around/* and they all contain massive amount of data that
> > are useless for the bitbake build purposes. The amount of useless data
> > can in fact be so massive (e.g. with FDO mesa.git repository) that some
> > proxies may outright terminate the 'git fetch' connection, and make it
> > appear as if bitbake got stuck on 'git fetch' with no output.
> > 
> > To avoid fetching all these useless metadata, tweak the git fetcher such
> > that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> > refspecs as those are only available in new git versions.
> > 
> > Signed-off-by: Marek Vasut <marex@denx.de>
> > ---
> > Cc: Martin Jansa <Martin.Jansa@gmail.com>
> > Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> > Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
> > ---
> >  lib/bb/fetch2/git.py | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> > index 4534bd75..b5fc0a51 100644
> > --- a/lib/bb/fetch2/git.py
> > +++ b/lib/bb/fetch2/git.py
> > @@ -382,7 +382,7 @@ class Git(FetchMethod):
> >                runfetchcmd("%s remote rm origin" % ud.basecmd, d, workdir=ud.clonedir)
> > 
> >              runfetchcmd("%s remote add --mirror=fetch origin %s" % (ud.basecmd, shlex.quote(repourl)), d, workdir=ud.clonedir)
> > -            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/*:refs/*" % (ud.basecmd, shlex.quote(repourl))
> > +            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/*" % (ud.basecmd, shlex.quote(repourl))
> >              if ud.proto.lower() != 'file':
> >                  bb.fetch2.check_network_access(d, fetch_cmd, ud.url)
> >              progresshandler = GitProgressHandler(d)
> > --
> > 2.35.1
> 
> Seems like the right thing to do. We use Gerrit, which also has its 
> metadata in special refs/ spaces. One repository I tested with grew 
> from 3 MB to 35 MB when I fetched using refs/* while another grew 
> from 20 MB to 120 MB, so there is definitely space and time to be 
> saved by only fetching the refs/heads and refs/tags spaces....

As a user of Gerrit, I fear this will cause problems. In my case, developers
are used to creating test topics and using git hashes in recipes which
are not yet released, i.e. not yet in release branches or tags. This can of
course create problems when such changes end up in real releases.

A workaround is that developers can create throwaway testing branches
and refer to them in recipes.

On the one hand, having less data in caches is an improvement; on the
other hand, this adds extra steps for developers who want to test
changes to their recipes. I can't decide which is more important, though :/

Cheers,

-Mikko
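
The concern above can be illustrated with a small sketch. It assumes that Gerrit publishes open reviews under refs/changes/ and that gitlab publishes merge requests under refs/merge-requests/; the helper name `refs_fetched` and the sample ref names are made up for illustration. A commit that is reachable only from a ref outside refs/heads/ and refs/tags/ would no longer be transferred by the restricted fetch.

```python
# Ref namespaces kept by the proposed refspecs (must be a tuple for startswith).
KEPT_PREFIXES = ("refs/heads/", "refs/tags/")


def refs_fetched(remote_refs, kept_prefixes=KEPT_PREFIXES):
    """Return the refs that a heads-and-tags-only fetch would still transfer."""
    return [ref for ref in remote_refs if ref.startswith(kept_prefixes)]


if __name__ == "__main__":
    remote = [
        "refs/heads/master",
        "refs/tags/v1.0",
        "refs/changes/34/1234/1",      # Gerrit review ref (example value)
        "refs/merge-requests/7/head",  # gitlab merge request ref (example value)
    ]
    print(refs_fetched(remote))
```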


* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22  5:19   ` [bitbake-devel] " Mikko.Rapeli
@ 2022-08-22  6:57     ` Alexander Kanavin
  2022-08-22  7:38       ` Mikko.Rapeli
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander Kanavin @ 2022-08-22  6:57 UTC (permalink / raw)
  To: Mikko Rapeli
  Cc: Martin.Jansa, bitbake-devel, marex, peter.kjellerstedt,
	richard.purdie


Perhaps this can be solved with a parameter to the fetcher?

Alex



* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22  6:57     ` Alexander Kanavin
@ 2022-08-22  7:38       ` Mikko.Rapeli
  2022-08-22  8:29         ` Marek Vasut
  0 siblings, 1 reply; 23+ messages in thread
From: Mikko.Rapeli @ 2022-08-22  7:38 UTC (permalink / raw)
  To: alex.kanavin
  Cc: Martin.Jansa, bitbake-devel, marex, peter.kjellerstedt,
	richard.purdie

On Mon, Aug 22, 2022 at 08:57:11AM +0200, Alexander Kanavin wrote:
> Can be solved with a parameter to a fetcher perhaps?

Frequently developers know to change the git URL in recipes from
"branch=master" to "nobranch=1" for their test commits.

This could be used for fetching the changes too, to limit the scope.

Cheers,

-Mikko
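
As a rough illustration of how a fetcher could notice the "nobranch=1" switch mentioned above, the sketch below splits a recipe-style URL into its base URL and its ';key=value' parameters. This is not bitbake's own URL decoder; the function names `parse_url_params` and `is_nobranch` and the example URL are assumptions made for this example.

```python
def parse_url_params(src_uri):
    """Split a 'url;key=value;...' string into (url, params)."""
    url, *raw_params = src_uri.split(";")
    params = {}
    for item in raw_params:
        if not item:
            continue  # tolerate a stray trailing ';'
        key, _, value = item.partition("=")
        params[key] = value
    return url, params


def is_nobranch(src_uri):
    """Return True when the URL carries nobranch=1 (the default is "0")."""
    _, params = parse_url_params(src_uri)
    return params.get("nobranch", "0") == "1"


if __name__ == "__main__":
    print(is_nobranch("git://example.com/repo.git;protocol=https;branch=master"))
    print(is_nobranch("git://example.com/repo.git;protocol=https;nobranch=1"))
```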


* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22  7:38       ` Mikko.Rapeli
@ 2022-08-22  8:29         ` Marek Vasut
  2022-08-22  8:37           ` Marek Vasut
  0 siblings, 1 reply; 23+ messages in thread
From: Marek Vasut @ 2022-08-22  8:29 UTC (permalink / raw)
  To: Mikko.Rapeli, alex.kanavin
  Cc: Martin.Jansa, bitbake-devel, peter.kjellerstedt, richard.purdie

On 8/22/22 09:38, Mikko.Rapeli@bmw.de wrote:
> On Mon, Aug 22, 2022 at 08:57:11AM +0200, Alexander Kanavin wrote:
>> Can be solved with a parameter to a fetcher perhaps?
> 
> Frequently developers know to change the git URL in recipes from
> "branch=master" to "nobranch=1" for their test commits.
> 
> This could be used for fetching the changes too, to limit the scope.

So maybe the easy way out is: if nobranch=1, fetch everything; otherwise,
fetch just heads and tags?



* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22  8:29         ` Marek Vasut
@ 2022-08-22  8:37           ` Marek Vasut
  2022-08-22  8:41             ` Marek Vasut
  2022-08-22  8:41             ` Alexander Kanavin
  0 siblings, 2 replies; 23+ messages in thread
From: Marek Vasut @ 2022-08-22  8:37 UTC (permalink / raw)
  To: Mikko.Rapeli, alex.kanavin
  Cc: Martin.Jansa, bitbake-devel, peter.kjellerstedt, richard.purdie

On 8/22/22 10:29, Marek Vasut wrote:
> On 8/22/22 09:38, Mikko.Rapeli@bmw.de wrote:
>> On Mon, Aug 22, 2022 at 08:57:11AM +0200, Alexander Kanavin wrote:
>>> Can be solved with a parameter to a fetcher perhaps?
>>
>> Frequently developers know to change the git URL in recipes from
>> "branch=master" to "nobranch=1" for their test commits.
>>
>> This could be used for fetching the changes too, to limit the scope.
> 
> So maybe the easy way out is, if nobranch=1 then fetch everything, else 
> just heads and tags ?

No, this won't do, nobranch expects the commit to be in a tag.



* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22  8:37           ` Marek Vasut
@ 2022-08-22  8:41             ` Marek Vasut
  2022-08-22  9:09               ` Mikko.Rapeli
  2022-08-22  8:41             ` Alexander Kanavin
  1 sibling, 1 reply; 23+ messages in thread
From: Marek Vasut @ 2022-08-22  8:41 UTC (permalink / raw)
  To: Mikko.Rapeli, alex.kanavin
  Cc: Martin.Jansa, bitbake-devel, peter.kjellerstedt, richard.purdie

On 8/22/22 10:37, Marek Vasut wrote:
> On 8/22/22 10:29, Marek Vasut wrote:
>> On 8/22/22 09:38, Mikko.Rapeli@bmw.de wrote:
>>> On Mon, Aug 22, 2022 at 08:57:11AM +0200, Alexander Kanavin wrote:
>>>> Can be solved with a parameter to a fetcher perhaps?
>>>
>>> Frequently developers know to change the git URL in recipes from
>>> "branch=master" to "nobranch=1" for their test commits.
>>>
>>> This could be used for fetching the changes too, to limit the scope.
>>
>> So maybe the easy way out is, if nobranch=1 then fetch everything, 
>> else just heads and tags ?
> 
> No, this won't do, nobranch expects the commit to be in a tag.

But then, if gerrit works with nobranch=1, gerrit must be
generating tags which contain the commits you test?

And since this patch fetches refs/tags/, this patch won't break
the gerrit setup?



* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22  8:37           ` Marek Vasut
  2022-08-22  8:41             ` Marek Vasut
@ 2022-08-22  8:41             ` Alexander Kanavin
  2022-08-22 10:35               ` Marek Vasut
  1 sibling, 1 reply; 23+ messages in thread
From: Alexander Kanavin @ 2022-08-22  8:41 UTC (permalink / raw)
  To: Marek Vasut
  Cc: Mikko Rapeli, Martin Jansa, bitbake-devel, Peter Kjellerstedt,
	Richard Purdie

On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:

> > So maybe the easy way out is, if nobranch=1 then fetch everything, else
> > just heads and tags ?
>
> No, this won't do, nobranch expects the commit to be in a tag.

I don't think it expects that.

Alex



* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22  8:41             ` Marek Vasut
@ 2022-08-22  9:09               ` Mikko.Rapeli
  0 siblings, 0 replies; 23+ messages in thread
From: Mikko.Rapeli @ 2022-08-22  9:09 UTC (permalink / raw)
  To: marex
  Cc: alex.kanavin, Martin.Jansa, bitbake-devel, peter.kjellerstedt,
	richard.purdie

Hi,

On Mon, Aug 22, 2022 at 10:41:23AM +0200, Marek Vasut wrote:
> On 8/22/22 10:37, Marek Vasut wrote:
> > On 8/22/22 10:29, Marek Vasut wrote:
> > > On 8/22/22 09:38, Mikko.Rapeli@bmw.de wrote:
> > > > On Mon, Aug 22, 2022 at 08:57:11AM +0200, Alexander Kanavin wrote:
> > > > > Can be solved with a parameter to a fetcher perhaps?
> > > > 
> > > > Frequently developers know to change the git URL in recipes from
> > > > "branch=master" to "nobranch=1" for their test commits.
> > > > 
> > > > This could be used for fetching the changes too, to limit the scope.
> > > 
> > > So maybe the easy way out is, if nobranch=1 then fetch everything,
> > > else just heads and tags ?

To me this would be the way to go.

> > No, this won't do, nobranch expects the commit to be in a tag.
> 
> But then, if gerrit works with nobranch=1, then gerrit must be generating
> tags which contain the commits you test ?
> 
> And since this patch fetches refs/tags/ , then this patch won't break the
> gerrit setup ?

nobranch=1 works with any branch or tag or open gerrit review commit id.
At least with sumo and dunfell.

Cheers,

-Mikko


* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22  8:41             ` Alexander Kanavin
@ 2022-08-22 10:35               ` Marek Vasut
  2022-08-22 10:51                 ` Mikko.Rapeli
  2022-08-22 10:57                 ` Quentin Schulz
  0 siblings, 2 replies; 23+ messages in thread
From: Marek Vasut @ 2022-08-22 10:35 UTC (permalink / raw)
  To: Alexander Kanavin
  Cc: Mikko Rapeli, Martin Jansa, bitbake-devel, Peter Kjellerstedt,
	Richard Purdie

On 8/22/22 10:41, Alexander Kanavin wrote:
> On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
> 
>>> So maybe the easy way out is, if nobranch=1 then fetch everything, else
>>> just heads and tags ?
>>
>> No, this won't do, nobranch expects the commit to be in a tag.
> 
> I don't think it expects that.

Documentation says it does:

https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
"
- nobranch
    Don't check the SHA validation for branch. set this option for the recipe
    referring to commit which is valid in tag instead of branch.
    The default is "0", set nobranch=1 if needed.
"



* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22 10:35               ` Marek Vasut
@ 2022-08-22 10:51                 ` Mikko.Rapeli
  2022-08-22 10:57                 ` Quentin Schulz
  1 sibling, 0 replies; 23+ messages in thread
From: Mikko.Rapeli @ 2022-08-22 10:51 UTC (permalink / raw)
  To: marex
  Cc: alex.kanavin, Martin.Jansa, bitbake-devel, peter.kjellerstedt,
	richard.purdie

On Mon, Aug 22, 2022 at 12:35:08PM +0200, Marek Vasut wrote:
> On 8/22/22 10:41, Alexander Kanavin wrote:
> > On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
> > 
> > > > So maybe the easy way out is, if nobranch=1 then fetch everything, else
> > > > just heads and tags ?
> > > 
> > > No, this won't do, nobranch expects the commit to be in a tag.
> > 
> > I don't think it expects that.
> 
> Documentation says it does:
> 
> https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
> "
> - nobranch
>    Don't check the SHA validation for branch. set this option for the recipe
>    referring to commit which is valid in tag instead of branch.
>    The default is "0", set nobranch=1 if needed.
> "

Only the first sentence is enforced. The change can be in a branch, in
a tag, or in any other namespace, as long as the commit is found at checkout
time.

Cheers,

-Mikko


* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22 10:35               ` Marek Vasut
  2022-08-22 10:51                 ` Mikko.Rapeli
@ 2022-08-22 10:57                 ` Quentin Schulz
  2022-08-22 11:55                   ` Marek Vasut
  1 sibling, 1 reply; 23+ messages in thread
From: Quentin Schulz @ 2022-08-22 10:57 UTC (permalink / raw)
  To: Marek Vasut, Alexander Kanavin
  Cc: Mikko Rapeli, Martin Jansa, bitbake-devel, Peter Kjellerstedt,
	Richard Purdie

Hi Marek,

On 8/22/22 12:35, Marek Vasut wrote:
> On 8/22/22 10:41, Alexander Kanavin wrote:
>> On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
>>
>>>> So maybe the easy way out is, if nobranch=1 then fetch everything, else
>>>> just heads and tags ?
>>>
>>> No, this won't do, nobranch expects the commit to be in a tag.
>>
>> I don't think it expects that.
> 
> Documentation says it does:
> 
> https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
> "
> - nobranch
>     Don't check the SHA validation for branch. set this option for the 
> recipe
>     referring to commit which is valid in tag instead of branch.

I assume this was meant to give the example of tags which aren't
necessarily in a branch (annotated tags, or tags of commits that no longer
belong to any branch, e.g. after a force-push or a branch deletion).

The git fetcher does a git log --pretty=oneline -n 1 <hash> when 
nobranch is set, otherwise git branch --contains <hash> --list <branch> 
to check whether a commit exists and can be used by bitbake.

Considering this check, I assume nobranch=1 is working for any commit 
that was fetched by the git fetcher?

(We need to update the docs to reflect that in that case).
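
The two checks described above can be sketched as follows. The helper name `contains_ref_cmd` is hypothetical, and the real fetcher may append further arguments or shell redirections that are not shown here; only the two git invocations are taken from the description in this message.

```python
def contains_ref_cmd(basecmd, revision, branch, nobranch):
    """Return the git command used to check that a revision is usable.

    Hypothetical helper: with nobranch set, the commit only has to exist in
    the local clone; otherwise it must be contained in the named branch.
    """
    if nobranch:
        return "%s log --pretty=oneline -n 1 %s" % (basecmd, revision)
    return "%s branch --contains %s --list %s" % (basecmd, revision, branch)


if __name__ == "__main__":
    rev = "0123456789abcdef0123456789abcdef01234567"  # example hash
    print(contains_ref_cmd("git", rev, "master", nobranch=True))
    print(contains_ref_cmd("git", rev, "master", nobranch=False))
```

Because the nobranch variant never names a branch or tag, it should succeed for any commit present in the clone, whichever ref it arrived through.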

Cheers,
Quentin

>     The default is "0", set nobranch=1 if needed.
> "
> 
> 



* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22 10:57                 ` Quentin Schulz
@ 2022-08-22 11:55                   ` Marek Vasut
  2022-08-22 14:17                     ` Richard Purdie
  0 siblings, 1 reply; 23+ messages in thread
From: Marek Vasut @ 2022-08-22 11:55 UTC (permalink / raw)
  To: Quentin Schulz, Alexander Kanavin
  Cc: Mikko Rapeli, Martin Jansa, bitbake-devel, Peter Kjellerstedt,
	Richard Purdie

On 8/22/22 12:57, Quentin Schulz wrote:
> Hi Marek,
> 
> On 8/22/22 12:35, Marek Vasut wrote:
>> On 8/22/22 10:41, Alexander Kanavin wrote:
>>> On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
>>>
>>>>> So maybe the easy way out is, if nobranch=1 then fetch everything, 
>>>>> else
>>>>> just heads and tags ?
>>>>
>>>> No, this won't do, nobranch expects the commit to be in a tag.
>>>
>>> I don't think it expects that.
>>
>> Documentation says it does:
>>
>> https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45 "
>> - nobranch
>>     Don't check the SHA validation for branch. set this option for the 
>> recipe
>>     referring to commit which is valid in tag instead of branch.
> 
> I assume this was meant to give the example of tags which aren't 
> necessarily in a branch (annotated tags or tags of commits not belong to 
> any branch anymore (force-push for example, or branch deletion).
> 
> The git fetcher does a git log --pretty=oneline -n 1 <hash> when 
> nobranch is set, otherwise git branch --contains <hash> --list <branch> 
> to check whether a commit exists and can be used by bitbake.
> 
> Considering this check, I assume nobranch=1 is working for any commit 
> that was fetched by the git fetcher?
> 
> (We need to update the docs to reflect that in that case).

In that case, fetch 'refs/*' if nobranch is set, and 'refs/heads/*' plus
'refs/tags/*' otherwise?
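
A minimal sketch of this proposal, assuming the fetcher already knows whether nobranch is set. The names `select_refspecs`, `ALL_REFS` and `HEADS_AND_TAGS` are illustrative and not taken from the bitbake source.

```python
# Previous behaviour: mirror every ref from the remote.
ALL_REFS = ("refs/*:refs/*",)
# Proposed default: branches and tags only.
HEADS_AND_TAGS = ("refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*")


def select_refspecs(nobranch):
    """Pick the refspecs to fetch depending on the nobranch setting."""
    return ALL_REFS if nobranch else HEADS_AND_TAGS


if __name__ == "__main__":
    print(" ".join(select_refspecs(nobranch=True)))
    print(" ".join(select_refspecs(nobranch=False)))
```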



* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22 11:55                   ` Marek Vasut
@ 2022-08-22 14:17                     ` Richard Purdie
  2022-08-22 15:21                       ` Peter Kjellerstedt
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Purdie @ 2022-08-22 14:17 UTC (permalink / raw)
  To: Marek Vasut, Quentin Schulz, Alexander Kanavin
  Cc: Mikko Rapeli, Martin Jansa, bitbake-devel, Peter Kjellerstedt

On Mon, 2022-08-22 at 13:55 +0200, Marek Vasut wrote:
> On 8/22/22 12:57, Quentin Schulz wrote:
> > Hi Marek,
> > 
> > On 8/22/22 12:35, Marek Vasut wrote:
> > > On 8/22/22 10:41, Alexander Kanavin wrote:
> > > > On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
> > > > 
> > > > > > So maybe the easy way out is, if nobranch=1 then fetch everything, 
> > > > > > else
> > > > > > just heads and tags ?
> > > > > 
> > > > > No, this won't do, nobranch expects the commit to be in a tag.
> > > > 
> > > > I don't think it expects that.
> > > 
> > > Documentation says it does:
> > > 
> > > https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45 "
> > > - nobranch
> > >     Don't check the SHA validation for branch. set this option for the 
> > > recipe
> > >     referring to commit which is valid in tag instead of branch.
> > 
> > I assume this was meant to give the example of tags which aren't 
> > necessarily in a branch (annotated tags or tags of commits not belong to 
> > any branch anymore (force-push for example, or branch deletion).
> > 
> > The git fetcher does a git log --pretty=oneline -n 1 <hash> when 
> > nobranch is set, otherwise git branch --contains <hash> --list <branch> 
> > to check whether a commit exists and can be used by bitbake.
> > 
> > Considering this check, I assume nobranch=1 is working for any commit 
> > that was fetched by the git fetcher?
> > 
> > (We need to update the docs to reflect that in that case).
> 
> In that case, 'git fetch refs/*' in case nobranch is set and 'refs/head 
> refs/tags' otherwise ?

This does get a bit more complex though, since you now need two
different mirror tarballs, one for each option. The code can do that if
set up correctly, but we do need to cover that issue.

Cheers,

Richard




* RE: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22 14:17                     ` Richard Purdie
@ 2022-08-22 15:21                       ` Peter Kjellerstedt
  2022-08-22 16:39                         ` Marek Vasut
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Kjellerstedt @ 2022-08-22 15:21 UTC (permalink / raw)
  To: Richard Purdie, Marek Vasut, Quentin Schulz, Alexander Kanavin
  Cc: Mikko Rapeli, Martin Jansa, bitbake-devel

> -----Original Message-----
> From: Richard Purdie <richard.purdie@linuxfoundation.org>
> Sent: 22 August 2022 16:17
> To: Marek Vasut <marex@denx.de>; Quentin Schulz <quentin.schulz@theobroma-
> systems.com>; Alexander Kanavin <alex.kanavin@gmail.com>
> Cc: Mikko Rapeli <Mikko.Rapeli@bmw.de>; Martin Jansa
> <Martin.Jansa@gmail.com>; bitbake-devel <bitbake-
> devel@lists.openembedded.org>; Peter Kjellerstedt
> <peter.kjellerstedt@axis.com>
> Subject: Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher
> from fetching gitlab repository metadata
> 
> On Mon, 2022-08-22 at 13:55 +0200, Marek Vasut wrote:
> > On 8/22/22 12:57, Quentin Schulz wrote:
> > > Hi Marek,
> > >
> > > On 8/22/22 12:35, Marek Vasut wrote:
> > > > On 8/22/22 10:41, Alexander Kanavin wrote:
> > > > > On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
> > > > >
> > > > > > > So maybe the easy way out is, if nobranch=1 then fetch
> everything,
> > > > > > > else
> > > > > > > just heads and tags ?
> > > > > >
> > > > > > No, this won't do, nobranch expects the commit to be in a tag.
> > > > >
> > > > > I don't think it expects that.
> > > >
> > > > Documentation says it does:
> > > >
> > > > > https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
> > > > - nobranch
> > > >     Don't check the SHA validation for branch. set this option for
> the
> > > > recipe
> > > >     referring to commit which is valid in tag instead of branch.
> > >
> > > I assume this was meant to give the example of tags which aren't
> > > necessarily in a branch (annotated tags or tags of commits not belong
> to
> > > any branch anymore (force-push for example, or branch deletion).
> > >
> > > The git fetcher does a git log --pretty=oneline -n 1 <hash> when
> > > nobranch is set, otherwise git branch --contains <hash> --list
> <branch>
> > > to check whether a commit exists and can be used by bitbake.
> > >
> > > Considering this check, I assume nobranch=1 is working for any commit
> > > that was fetched by the git fetcher?
> > >
> > > (We need to update the docs to reflect that in that case).
> >
> > In that case, 'git fetch refs/*' in case nobranch is set and 'refs/head
> > refs/tags' otherwise ?
> 
> This does get a bit more complex though since you now need two
> different mirror tarballs, one for each option. The code can do that if
> setup correctly but we do need to cover that issue.
> 
> Cheers,
> 
> Richard

I did some testing, and for Gerrit to continue to work it would be
enough to use:

            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* refs/changes/*:refs/changes/*" % (ud.basecmd, shlex.quote(repourl))

This should not affect other Git servers and should avoid using 
different fetch commands depending on the URL. The drawback is of 
course that for Gerrit, there would be only marginal benefits to 
this change since the majority of its metadata is in the 
refs/changes space.
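
A minimal sketch of how such a command string could be assembled from a
list of ref namespaces instead of being written out by hand.
build_fetch_cmd and DEFAULT_NAMESPACES are hypothetical names used for
illustration only; the command layout simply mirrors the line above.

```python
import shlex

# Ref namespaces to fetch; refs/changes is only needed for Gerrit.
DEFAULT_NAMESPACES = ("refs/heads", "refs/tags", "refs/changes")


def build_fetch_cmd(basecmd, repourl, namespaces=DEFAULT_NAMESPACES):
    """Build a 'git fetch' command line limited to the given namespaces."""
    # Each namespace maps onto itself, e.g. refs/heads/*:refs/heads/*
    refspecs = " ".join("%s/*:%s/*" % (ns, ns) for ns in namespaces)
    return "LANG=C %s fetch -f --progress %s %s" % (
        basecmd, shlex.quote(repourl), refspecs)


if __name__ == "__main__":
    print(build_fetch_cmd("git", "https://example.com/repo.git"))
```

Keeping the namespaces in one list means a server-specific namespace
such as refs/changes can be added or dropped in a single place.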

However, I wonder if the suggested change actually has any significant 
effect, given that the initial clone is done using --mirror, which means 
all refs/ spaces are fetched. If I remove the --mirror option from the 
clone command the change works as expected, but I have no idea if that 
has any other significant impact...

//Peter



* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-19 16:54 [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata Marek Vasut
  2022-08-20 12:06 ` Peter Kjellerstedt
@ 2022-08-22 16:02 ` Luca Ceresoli
  2022-08-22 16:06   ` Peter Kjellerstedt
  2022-08-22 16:07   ` Richard Purdie
  1 sibling, 2 replies; 23+ messages in thread
From: Luca Ceresoli @ 2022-08-22 16:02 UTC (permalink / raw)
  To: Marek Vasut
  Cc: bitbake-devel, Martin Jansa, Peter Kjellerstedt, Richard Purdie

Hi Marek,

On Fri, 19 Aug 2022 18:54:55 +0200
"Marek Vasut" <marex@denx.de> wrote:

> The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> single object in the remote repository. This works poorly with gitlab
> and github, which use the remote git repository to track its metadata
> like merge requests, CI pipelines and such.
> 
> Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> and refs/keep-around/* and they all contain massive amount of data that
> are useless for the bitbake build purposes. The amount of useless data
> can in fact be so massive (e.g. with FDO mesa.git repository) that some
> proxies may outright terminate the 'git fetch' connection, and make it
> appear as if bitbake got stuck on 'git fetch' with no output.
> 
> To avoid fetching all these useless metadata, tweak the git fetcher such
> that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> refspecs as those are only available in new git versions.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>

Of course this might become irrelevant with whatever implementation
ends up in v2. However, when testing with this patch applied I got the
following warning and wonder whether it is related:

WARNING: mesa-2_22.1.6-r0 do_package_qa: QA Issue: File /usr/lib/dri/nouveau_dri.so in package mesa-megadriver contains reference to TMPDIR

Full log:
https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/5753/steps/32/logs/stdio

-- 
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



* RE: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22 16:02 ` Luca Ceresoli
@ 2022-08-22 16:06   ` Peter Kjellerstedt
  2022-08-22 16:07   ` Richard Purdie
  1 sibling, 0 replies; 23+ messages in thread
From: Peter Kjellerstedt @ 2022-08-22 16:06 UTC (permalink / raw)
  To: Luca Ceresoli, Marek Vasut
  Cc: bitbake-devel@lists.openembedded.org, Martin Jansa,
	Richard Purdie

> -----Original Message-----
> From: Luca Ceresoli <luca.ceresoli@bootlin.com>
> Sent: 22 August 2022 18:03
> To: Marek Vasut <marex@denx.de>
> Cc: bitbake-devel@lists.openembedded.org; Martin Jansa
> <Martin.Jansa@gmail.com>; Peter Kjellerstedt
> <peter.kjellerstedt@axis.com>; Richard Purdie
> <richard.purdie@linuxfoundation.org>
> Subject: Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher
> from fetching gitlab repository metadata
> 
> Hi Marek,
> 
> On Fri, 19 Aug 2022 18:54:55 +0200
> "Marek Vasut" <marex@denx.de> wrote:
> 
> > The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> > single object in the remote repository. This works poorly with gitlab
> > and github, which use the remote git repository to track its metadata
> > like merge requests, CI pipelines and such.
> >
> > Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> > and refs/keep-around/* and they all contain massive amount of data that
> > are useless for the bitbake build purposes. The amount of useless data
> > can in fact be so massive (e.g. with FDO mesa.git repository) that some
> > proxies may outright terminate the 'git fetch' connection, and make it
> > appear as if bitbake got stuck on 'git fetch' with no output.
> >
> > To avoid fetching all these useless metadata, tweak the git fetcher such
> > that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> > refspecs as those are only available in new git versions.
> >
> > Signed-off-by: Marek Vasut <marex@denx.de>
> 
> Of course this might become irrelevant with whatever implementation
> will be in v2, however when testing with this patch applied I got the
> following warning and wonder whether they are related:
> 
> WARNING: mesa-2_22.1.6-r0 do_package_qa: QA Issue: File /usr/lib/dri/nouveau_dri.so in package mesa-megadriver contains reference to TMPDIR

I cannot see any way in which they could be related.

> 
> Full log:
> https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/5753/steps/32/logs/stdio
> 
> --
> Luca Ceresoli, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com

//Peter




* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22 16:02 ` Luca Ceresoli
  2022-08-22 16:06   ` Peter Kjellerstedt
@ 2022-08-22 16:07   ` Richard Purdie
  2022-08-23 14:34     ` Luca Ceresoli
  1 sibling, 1 reply; 23+ messages in thread
From: Richard Purdie @ 2022-08-22 16:07 UTC (permalink / raw)
  To: Luca Ceresoli, Marek Vasut
  Cc: bitbake-devel, Martin Jansa, Peter Kjellerstedt

On Mon, 2022-08-22 at 18:02 +0200, Luca Ceresoli wrote:
> Hi Marek,
> 
> On Fri, 19 Aug 2022 18:54:55 +0200
> "Marek Vasut" <marex@denx.de> wrote:
> 
> > The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> > single object in the remote repository. This works poorly with gitlab
> > and github, which use the remote git repository to track its metadata
> > like merge requests, CI pipelines and such.
> > 
> > Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> > and refs/keep-around/* and they all contain massive amount of data that
> > are useless for the bitbake build purposes. The amount of useless data
> > can in fact be so massive (e.g. with FDO mesa.git repository) that some
> > proxies may outright terminate the 'git fetch' connection, and make it
> > appear as if bitbake got stuck on 'git fetch' with no output.
> > 
> > To avoid fetching all these useless metadata, tweak the git fetcher such
> > that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> > refspecs as those are only available in new git versions.
> > 
> > Signed-off-by: Marek Vasut <marex@denx.de>
> 
> Of course this might become irrelevant with whatever implementation
> will be in v2, however when testing with this patch applied I got the
> following warning and wonder whether they are related:
> 
> WARNING: mesa-2_22.1.6-r0 do_package_qa: QA Issue: File /usr/lib/dri/nouveau_dri.so in package mesa-megadriver contains reference to TMPDIR
> 
> Full log:
> https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/5753/steps/32/logs/stdio
> 

This is a known issue with an open bug assigned to me (unfortunately),
it isn't related. It is intermittent as it is llvm-native related and
we don't commonly rebuild this codepath.

Cheers,

Richard




* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22 15:21                       ` Peter Kjellerstedt
@ 2022-08-22 16:39                         ` Marek Vasut
  2022-09-01 17:50                           ` Marek Vasut
  0 siblings, 1 reply; 23+ messages in thread
From: Marek Vasut @ 2022-08-22 16:39 UTC (permalink / raw)
  To: Peter Kjellerstedt, Richard Purdie, Quentin Schulz,
	Alexander Kanavin
  Cc: Mikko Rapeli, Martin Jansa, bitbake-devel

On 8/22/22 17:21, Peter Kjellerstedt wrote:
>> -----Original Message-----
>> From: Richard Purdie <richard.purdie@linuxfoundation.org>
>> Sent: 22 August 2022 16:17
>> To: Marek Vasut <marex@denx.de>; Quentin Schulz <quentin.schulz@theobroma-
>> systems.com>; Alexander Kanavin <alex.kanavin@gmail.com>
>> Cc: Mikko Rapeli <Mikko.Rapeli@bmw.de>; Martin Jansa
>> <Martin.Jansa@gmail.com>; bitbake-devel <bitbake-
>> devel@lists.openembedded.org>; Peter Kjellerstedt
>> <peter.kjellerstedt@axis.com>
>> Subject: Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher
>> from fetching gitlab repository metadata
>>
>> On Mon, 2022-08-22 at 13:55 +0200, Marek Vasut wrote:
>>> On 8/22/22 12:57, Quentin Schulz wrote:
>>>> Hi Marek,
>>>>
>>>> On 8/22/22 12:35, Marek Vasut wrote:
>>>>> On 8/22/22 10:41, Alexander Kanavin wrote:
>>>>>> On Mon, 22 Aug 2022 at 10:37, Marek Vasut <marex@denx.de> wrote:
>>>>>>
>>>>>>>> So maybe the easy way out is, if nobranch=1 then fetch
>> everything,
>>>>>>>> else
>>>>>>>> just heads and tags ?
>>>>>>>
>>>>>>> No, this won't do, nobranch expects the commit to be in a tag.
>>>>>>
>>>>>> I don't think it expects that.
>>>>>
>>>>> Documentation says it does:
>>>>>
>>>>> https://git.openembedded.org/bitbake/tree/lib/bb/fetch2/git.py#n45
>>>>> - nobranch
>>>>>      Don't check the SHA validation for branch. set this option for
>> the
>>>>> recipe
>>>>>      referring to commit which is valid in tag instead of branch.
>>>>
>>>> I assume this was meant to give the example of tags which aren't
>>>> necessarily in a branch (annotated tags or tags of commits not belong
>> to
>>>> any branch anymore (force-push for example, or branch deletion).
>>>>
>>>> The git fetcher does a git log --pretty=oneline -n 1 <hash> when
>>>> nobranch is set, otherwise git branch --contains <hash> --list
>> <branch>
>>>> to check whether a commit exists and can be used by bitbake.
>>>>
>>>> Considering this check, I assume nobranch=1 is working for any commit
>>>> that was fetched by the git fetcher?
>>>>
>>>> (We need to update the docs to reflect that in that case).
>>>
>>> In that case, 'git fetch refs/*' in case nobranch is set and 'refs/head
>>> refs/tags' otherwise ?
>>
>> This does get a bit more complex though since you now need two
>> different mirror tarballs, one for each option. The code can do that if
>> setup correctly but we do need to cover that issue.
>>
>> Cheers,
>>
>> Richard
> 
> I made some testing, and for Gerrit to continue to work it would be
> enough to use:
> 
>              fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* refs/changes/*:refs/changes/*" % (ud.basecmd, shlex.quote(repourl))
> 
> This should not affect other Git servers and should avoid using
> different fetch commands depending on the URL. The drawback is of
> course that for Gerrit, there would be only marginal benefits to
> this change since the majority of its metadata is in the
> refs/changes space.
> 
> However, I wonder if the suggested change actually has any significant
> effect, given that the initial clone is done using --mirror, which means
> all refs/ spaces are fetched. If I remove the --mirror option from the
> clone command the change works as expected, but I have no idea if that
> has any other significant impact...

With this change, I am actually able to fetch mesa from
gitlab.freedesktop.org without the local CI proxy terminating the
connection in the process. So yes, it does have an effect.



* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22 16:07   ` Richard Purdie
@ 2022-08-23 14:34     ` Luca Ceresoli
  0 siblings, 0 replies; 23+ messages in thread
From: Luca Ceresoli @ 2022-08-23 14:34 UTC (permalink / raw)
  To: Richard Purdie
  Cc: Marek Vasut, bitbake-devel, Martin Jansa, Peter Kjellerstedt

Hi Richard, Peter,

On Mon, 22 Aug 2022 17:07:50 +0100
"Richard Purdie" <richard.purdie@linuxfoundation.org> wrote:

> On Mon, 2022-08-22 at 18:02 +0200, Luca Ceresoli wrote:
> > Hi Marek,
> > 
> > On Fri, 19 Aug 2022 18:54:55 +0200
> > "Marek Vasut" <marex@denx.de> wrote:
> >   
> > > The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> > > single object in the remote repository. This works poorly with gitlab
> > > and github, which use the remote git repository to track its metadata
> > > like merge requests, CI pipelines and such.
> > > 
> > > Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> > > and refs/keep-around/* and they all contain massive amount of data that
> > > are useless for the bitbake build purposes. The amount of useless data
> > > can in fact be so massive (e.g. with FDO mesa.git repository) that some
> > > proxies may outright terminate the 'git fetch' connection, and make it
> > > appear as if bitbake got stuck on 'git fetch' with no output.
> > > 
> > > To avoid fetching all these useless metadata, tweak the git fetcher such
> > > that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> > > refspecs as those are only available in new git versions.
> > > 
> > > Signed-off-by: Marek Vasut <marex@denx.de>  
> > 
> > Of course this might become irrelevant with whatever implementation
> > will be in v2, however when testing with this patch applied I got the
> > following warning and wonder whether they are related:
> > 
> > WARNING: mesa-2_22.1.6-r0 do_package_qa: QA Issue: File /usr/lib/dri/nouveau_dri.so in package mesa-megadriver contains reference to TMPDIR
> > 
> > Full log:
> > https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/5753/steps/32/logs/stdio
> >   
> 
> This is a known issue with an open bug assigned to me (unfortunately),
> it isn't related. It is intermittent as it is llvm-native related and
> we don't commonly rebuild this codepath.

Indeed, added to https://bugzilla.yoctoproject.org/show_bug.cgi?id=14897

Thanks for the hint and apologies for the noise.

-- 
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-08-22 16:39                         ` Marek Vasut
@ 2022-09-01 17:50                           ` Marek Vasut
  2022-09-02 15:54                             ` Richard Purdie
  0 siblings, 1 reply; 23+ messages in thread
From: Marek Vasut @ 2022-09-01 17:50 UTC (permalink / raw)
  To: Peter Kjellerstedt, Richard Purdie, Quentin Schulz,
	Alexander Kanavin
  Cc: Mikko Rapeli, Martin Jansa, bitbake-devel

On 8/22/22 18:39, Marek Vasut wrote:

Hi,

[...]

>> I made some testing, and for Gerrit to continue to work it would be
>> enough to use:
>>
>>              fetch_cmd = "LANG=C %s fetch -f --progress %s 
>> refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* 
>> refs/changes/*:refs/changes/*" % (ud.basecmd, shlex.quote(repourl))
>>
>> This should not affect other Git servers and should avoid using
>> different fetch commands depending on the URL. The drawback is of
>> course that for Gerrit, there would be only marginal benefits to
>> this change since the majority of its metadata is in the
>> refs/changes space.
>>
>> However, I wonder if the suggested change actually has any significant
>> effect, given that the initial clone is done using --mirror, which means
>> all refs/ spaces are fetched. If I remove the --mirror option from the
>> clone command the change works as expected, but I have no idea if that
>> has any other significant impact...
> 
> With this change, I am able to actually fetch mesa from 
> gitlab.freedesktop.org without local CI proxy terminating the connection 
> in the process. So yes, it does have effect.

I keep running into this problem with mesa. How can we proceed to fix it
upstream?



* Re: [bitbake-devel] [PATCH] [RFC] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-09-01 17:50                           ` Marek Vasut
@ 2022-09-02 15:54                             ` Richard Purdie
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Purdie @ 2022-09-02 15:54 UTC (permalink / raw)
  To: Marek Vasut, Peter Kjellerstedt, Quentin Schulz,
	Alexander Kanavin
  Cc: Mikko Rapeli, Martin Jansa, bitbake-devel

On Thu, 2022-09-01 at 19:50 +0200, Marek Vasut wrote:
> On 8/22/22 18:39, Marek Vasut wrote:
> 
> Hi,
> 
> [...]
> 
> > > I made some testing, and for Gerrit to continue to work it would be
> > > enough to use:
> > > 
> > >              fetch_cmd = "LANG=C %s fetch -f --progress %s 
> > > refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* 
> > > refs/changes/*:refs/changes/*" % (ud.basecmd, shlex.quote(repourl))
> > > 
> > > This should not affect other Git servers and should avoid using
> > > different fetch commands depending on the URL. The drawback is of
> > > course that for Gerrit, there would be only marginal benefits to
> > > this change since the majority of its metadata is in the
> > > refs/changes space.
> > > 
> > > However, I wonder if the suggested change actually has any significant
> > > effect, given that the initial clone is done using --mirror, which means
> > > all refs/ spaces are fetched. If I remove the --mirror option from the
> > > clone command the change works as expected, but I have no idea if that
> > > has any other significant impact...
> > 
> > With this change, I am able to actually fetch mesa from 
> > gitlab.freedesktop.org without local CI proxy terminating the connection 
> > in the process. So yes, it does have effect.
> 
> I keep running into this problem with mesa, how can we proceed to fix it 
> upstream ?

We probably need a version of the patch which restricts by default but
allows the restriction to be turned off on a per-URL basis with a
parameter.

That restriction needs to be reflected in the mirror tarball name too.
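
A minimal sketch of what that could look like, under loud assumptions:
the "allrefs" URL parameter is a made-up name, and the
"git2_<name>.tar.gz" pattern is only assumed to match how the fetcher
names mirror tarballs today. Neither helper below exists in bitbake.

```python
def fetch_all_refs(parm):
    """True if the (hypothetical) allrefs=1 URL parameter is set."""
    return parm.get("allrefs", "0") == "1"


def mirror_tarball_name(gitsrcname, parm):
    """Derive a mirror tarball name that encodes the fetch mode.

    The restricted (default) mode keeps the plain name; the unrestricted
    mode gets a suffix so the two tarballs never shadow each other.
    """
    suffix = "_allrefs" if fetch_all_refs(parm) else ""
    return "git2_%s%s.tar.gz" % (gitsrcname, suffix)


if __name__ == "__main__":
    print(mirror_tarball_name("example.com.repo", {}))
    print(mirror_tarball_name("example.com.repo", {"allrefs": "1"}))
```

Which of the two modes keeps the existing name is a design choice; the
point is only that the two tarballs must be distinguishable.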

Cheers,

Richard



