public inbox for openembedded-core@lists.openembedded.org
From: Akash Hadke <akash.hadke27@gmail.com>
To: openembedded-core@lists.openembedded.org
Cc: Stefan Koch <stefan-koch@siemens.com>,
	Richard Purdie <richard.purdie@linuxfoundation.org>
Subject: [poky][scarthgap][PATCH 08/23] bitbake: fetch2/git: Add support for fast initial shallow fetch
Date: Fri,  8 Aug 2025 14:19:16 +0530
Message-ID: <20250808084931.2156763-8-akash.hadke27@gmail.com>
In-Reply-To: <20250808084931.2156763-1-akash.hadke27@gmail.com>

From: Stefan Koch <stefan-koch@siemens.com>

When `ud.shallow` is enabled (`BB_GIT_SHALLOW = "1"`):
- Prefer an initial shallow clone over an initial full bare clone,
  while still using any existing full bare clone.
- If the Git error "Server does not allow request for unadvertised object"
  occurs, fall back automatically to an initial full bare clone.
  This can happen when the Git server does not allow the request
  or when the Git client has issues with this functionality,
  notably the Git client shipped with Ubuntu 20.04.

Benefits:
- Resolves timeout issues during initial clones on slow internet connections
  by reducing the amount of data transferred.
- Eliminates the need to use an HTTPS tarball `SRC_URI`
  to reduce data transfer.
- Allows SSH-based authentication (e.g. certificate- and agent-based) when
  using non-public repos, so additional HTTPS tokens may not be required.

(Bitbake rev: 457288b2fda86fd00cdcaefac616129b0029e1f9)

Signed-off-by: Stefan Koch <stefan-koch@siemens.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 65ae50cd16a3989699bea845566d38476f2ae9a7)
Signed-off-by: Akash Hadke <akash.hadke27@gmail.com>
---
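
The try-fast-then-fall-back flow described in the commit message can be
sketched in isolation as follows. This is a minimal illustration under
stated assumptions, not the fetcher code: `fetch_with_fallback`,
`UnadvertisedObjectError`, and the two callables are hypothetical names
standing in for the shallow and the regular clone steps.

```python
import logging

logger = logging.getLogger("fetch-sketch")


class UnadvertisedObjectError(RuntimeError):
    """Hypothetical stand-in for the Git error named in the commit message."""


def fetch_with_fallback(shallow_enabled, shallow_fetch, full_fetch):
    """Try the cheap shallow fetch first; fall back to the full clone on failure.

    Both fetch arguments are zero-argument callables (assumed for this sketch).
    Returns the name of the strategy that succeeded.
    """
    if shallow_enabled:
        try:
            shallow_fetch()
            return "shallow"
        except Exception as exc:
            # Any failure of the fast path is treated as recoverable here.
            logger.warning("Fast shallow fetch failed (%s), trying full clone", exc)
    full_fetch()
    return "full"
```

Passing a `shallow_fetch` that raises `UnadvertisedObjectError` makes the
function call `full_fetch` and return "full"; with shallow mode disabled the
fast path is never attempted.
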
 bitbake/lib/bb/fetch2/git.py | 114 ++++++++++++++++++++++++++---------
 1 file changed, 85 insertions(+), 29 deletions(-)
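
The patch promotes `create_atomic` from a nested helper to a method. The
underlying pattern (write to a temporary file in the target directory, then
rename it into place) can be sketched standalone as below. This is a hedged
illustration, not the BitBake implementation: `atomic_path` is a hypothetical
name, and unlike the patch it uses `os.replace` and deletes the temporary
file when the body fails.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_path(filename):
    """Yield a temp path next to `filename`; move it into place on success."""
    directory = os.path.dirname(os.path.abspath(filename))
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        yield tmp
        # mkstemp creates the file with mode 0600; widen it to what the
        # process umask would normally allow for a regular file.
        umask = os.umask(0)
        os.umask(umask)
        os.chmod(tmp, 0o666 & ~umask)
        os.replace(tmp, filename)
    except BaseException:
        # Sketch-only behaviour: do not leave a stray temp file behind.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

A caller writes to the yielded path inside the `with` block; the target
filename only appears once the block exits without an exception, so readers
never observe a partially written file.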

diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
index 168f14d0c8..9a15abaa79 100644
--- a/bitbake/lib/bb/fetch2/git.py
+++ b/bitbake/lib/bb/fetch2/git.py
@@ -207,6 +207,7 @@ class Git(FetchMethod):
         if ud.bareclone:
             ud.cloneflags += " --mirror"
 
+        ud.shallow_skip_fast = False
         ud.shallow = d.getVar("BB_GIT_SHALLOW") == "1"
         ud.shallow_extra_refs = (d.getVar("BB_GIT_SHALLOW_EXTRA_REFS") or "").split()
 
@@ -446,6 +447,24 @@ class Git(FetchMethod):
             if ud.proto.lower() != 'file':
                 bb.fetch2.check_network_access(d, clone_cmd, ud.url)
             progresshandler = GitProgressHandler(d)
+
+            # Try creating a fast initial shallow clone
+            # Enabling ud.shallow_skip_fast will skip this
+            # If the Git error "Server does not allow request for unadvertised object"
+            # occurs, shallow_skip_fast is enabled automatically.
+            # This may happen if the Git server does not allow the request
+            # or if the Git client has issues with this functionality.
+            if ud.shallow and not ud.shallow_skip_fast:
+                try:
+                    self.clone_shallow_with_tarball(ud, d)
+                    # When the shallow clone has succeeded, use the shallow tarball
+                    ud.localpath = ud.fullshallow
+                    return
+                except:
+                    logger.warning("Creating fast initial shallow clone failed, try initial regular clone now.")
+
+            # When skipping fast initial shallow or the fast initial shallow clone failed:
+            # Try again with an initial regular clone
             runfetchcmd(clone_cmd, d, log=progresshandler)
 
         # Update the checkout if needed
@@ -508,48 +527,74 @@ class Git(FetchMethod):
                 if os.path.exists(os.path.join(ud.destdir, ".git", "lfs")):
                     runfetchcmd("tar -cf - lfs | tar -xf - -C %s" % ud.clonedir, d, workdir="%s/.git" % ud.destdir)
 
-    def build_mirror_data(self, ud, d):
-
-        # Create as a temp file and move atomically into position to avoid races
-        @contextmanager
-        def create_atomic(filename):
-            fd, tfile = tempfile.mkstemp(dir=os.path.dirname(filename))
-            try:
-                yield tfile
-                umask = os.umask(0o666)
-                os.umask(umask)
-                os.chmod(tfile, (0o666 & ~umask))
-                os.rename(tfile, filename)
-            finally:
-                os.close(fd)
+    def lfs_fetch(self, ud, d, clonedir, revision, fetchall=False, progresshandler=None):
+        """Helper method for fetching Git LFS data"""
+        try:
+            if self._need_lfs(ud) and self._contains_lfs(ud, d, clonedir) and self._find_git_lfs(d) and len(revision):
+                # Using worktree with the revision because .lfsconfig may exist
+                worktree_add_cmd = "%s worktree add wt %s" % (ud.basecmd, revision)
+                runfetchcmd(worktree_add_cmd, d, log=progresshandler, workdir=clonedir)
+                lfs_fetch_cmd = "%s lfs fetch %s" % (ud.basecmd, "--all" if fetchall else "")
+                runfetchcmd(lfs_fetch_cmd, d, log=progresshandler, workdir=(clonedir + "/wt"))
+                worktree_rem_cmd = "%s worktree remove -f wt" % ud.basecmd
+                runfetchcmd(worktree_rem_cmd, d, log=progresshandler, workdir=clonedir)
+        except:
+            logger.warning("Fetching LFS did not succeed.")
+
+    @contextmanager
+    def create_atomic(self, filename):
+        """Create as a temp file and move atomically into position to avoid races"""
+        fd, tfile = tempfile.mkstemp(dir=os.path.dirname(filename))
+        try:
+            yield tfile
+            umask = os.umask(0o666)
+            os.umask(umask)
+            os.chmod(tfile, (0o666 & ~umask))
+            os.rename(tfile, filename)
+        finally:
+            os.close(fd)
 
+    def build_mirror_data(self, ud, d):
         if ud.shallow and ud.write_shallow_tarballs:
             if not os.path.exists(ud.fullshallow):
                 if os.path.islink(ud.fullshallow):
                     os.unlink(ud.fullshallow)
-                tempdir = tempfile.mkdtemp(dir=d.getVar('DL_DIR'))
-                shallowclone = os.path.join(tempdir, 'git')
-                try:
-                    self.clone_shallow_local(ud, shallowclone, d)
-
-                    logger.info("Creating tarball of git repository")
-                    with create_atomic(ud.fullshallow) as tfile:
-                        runfetchcmd("tar -czf %s ." % tfile, d, workdir=shallowclone)
-                    runfetchcmd("touch %s.done" % ud.fullshallow, d)
-                finally:
-                    bb.utils.remove(tempdir, recurse=True)
+                self.clone_shallow_with_tarball(ud, d)
         elif ud.write_tarballs and not os.path.exists(ud.fullmirror):
             if os.path.islink(ud.fullmirror):
                 os.unlink(ud.fullmirror)
 
             logger.info("Creating tarball of git repository")
-            with create_atomic(ud.fullmirror) as tfile:
+            with self.create_atomic(ud.fullmirror) as tfile:
                 mtime = runfetchcmd("{} log --all -1 --format=%cD".format(ud.basecmd), d,
                         quiet=True, workdir=ud.clonedir)
                 runfetchcmd("tar -czf %s --owner oe:0 --group oe:0 --mtime \"%s\" ."
                         % (tfile, mtime), d, workdir=ud.clonedir)
             runfetchcmd("touch %s.done" % ud.fullmirror, d)
 
+    def clone_shallow_with_tarball(self, ud, d):
+        ret = False
+        tempdir = tempfile.mkdtemp(dir=d.getVar('DL_DIR'))
+        shallowclone = os.path.join(tempdir, 'git')
+        try:
+            try:
+                self.clone_shallow_local(ud, shallowclone, d)
+            except:
+                logger.warning("Fast shallow clone failed, try to skip fast mode now.")
+                bb.utils.remove(tempdir, recurse=True)
+                os.mkdir(tempdir)
+                ud.shallow_skip_fast = True
+                self.clone_shallow_local(ud, shallowclone, d)
+            logger.info("Creating tarball of git repository")
+            with self.create_atomic(ud.fullshallow) as tfile:
+                runfetchcmd("tar -czf %s ." % tfile, d, workdir=shallowclone)
+            runfetchcmd("touch %s.done" % ud.fullshallow, d)
+            ret = True
+        finally:
+            bb.utils.remove(tempdir, recurse=True)
+
+        return ret
+
     def clone_shallow_local(self, ud, dest, d):
         """
         Shallow fetch from ud.clonedir (${DL_DIR}/git2/<gitrepo> by default):
@@ -557,12 +602,20 @@ class Git(FetchMethod):
         - For BB_GIT_SHALLOW_REVS: git fetch --shallow-exclude=<revs> rev
         """
 
+        progresshandler = GitProgressHandler(d)
+        repourl = self._get_repo_url(ud)
         bb.utils.mkdirhier(dest)
         init_cmd = "%s init -q" % ud.basecmd
         if ud.bareclone:
             init_cmd += " --bare"
         runfetchcmd(init_cmd, d, workdir=dest)
-        runfetchcmd("%s remote add origin %s" % (ud.basecmd, ud.clonedir), d, workdir=dest)
+        # Use repourl when creating a fast initial shallow clone
+        # Prefer already existing full bare clones if available
+        if not ud.shallow_skip_fast and not os.path.exists(ud.clonedir):
+            remote = shlex.quote(repourl)
+        else:
+            remote = ud.clonedir
+        runfetchcmd("%s remote add origin %s" % (ud.basecmd, remote), d, workdir=dest)
 
         # Check the histories which should be excluded
         shallow_exclude = ''
@@ -600,10 +653,14 @@ class Git(FetchMethod):
             # The ud.clonedir is a local temporary dir, will be removed when
             # fetch is done, so we can do anything on it.
             adv_cmd = 'git branch -f advertise-%s %s' % (revision, revision)
-            runfetchcmd(adv_cmd, d, workdir=ud.clonedir)
+            if ud.shallow_skip_fast:
+                runfetchcmd(adv_cmd, d, workdir=ud.clonedir)
 
             runfetchcmd(fetch_cmd, d, workdir=dest)
             runfetchcmd("%s update-ref %s %s" % (ud.basecmd, ref, revision), d, workdir=dest)
+            # Fetch Git LFS data for fast shallow clones
+            if not ud.shallow_skip_fast:
+                self.lfs_fetch(ud, d, dest, ud.revisions[ud.names[0]])
 
         # Apply extra ref wildcards
         all_refs_remote = runfetchcmd("%s ls-remote origin 'refs/*'" % ud.basecmd, \
@@ -629,7 +686,6 @@ class Git(FetchMethod):
             runfetchcmd("%s update-ref %s %s" % (ud.basecmd, ref, revision), d, workdir=dest)
 
         # The url is local ud.clonedir, set it to upstream one
-        repourl = self._get_repo_url(ud)
         runfetchcmd("%s remote set-url origin %s" % (ud.basecmd, shlex.quote(repourl)), d, workdir=dest)
 
     def unpack(self, ud, destdir, d):
-- 
2.25.1


