[RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)

* [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
@ 2024-12-20 11:25 Stefan Herbrechtsmeier
  2024-12-20 11:25 ` [RFC PATCH 01/21] tests: fetch: update npmsw tests to new lockfile format Stefan Herbrechtsmeier
                   ` (24 more replies)
  0 siblings, 25 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:25 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

This patch series improves the fetcher support for tightly coupled
package managers (npm, go and cargo). It adds support for embedded
dependency fetchers via a common dependency mixin. The series reworks
the npm-shrinkwrap.json (package-lock.json) support and adds fetchers
for go.sum and Cargo.lock files. The dependency mixin works in two
stages. The first stage locates a local specification file or fetches
an archive or git repository that contains a specification file. The
second stage resolves the dependency URLs from the specification file
and fetches the dependencies.

SRC_URI = "<type>://npm-shrinkwrap.json"
SRC_URI = "<type>+http://example.com/npm-shrinkwrap.json"
SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"
SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https"
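
As a rough illustration of the first stage, the sketch below splits such
a URL into the dependency type and the location of the specification
file. It is not the code from this series: the names SplitUrl and
split_dependency_url are hypothetical, and "npmsw" merely stands in for
<type>.

```python
from typing import NamedTuple, Optional


class SplitUrl(NamedTuple):
    """Hypothetical result type, not part of bitbake."""
    dep_type: str              # stands in for <type>, e.g. "npmsw"
    inner_url: Optional[str]   # URL for the first-stage fetch, if any
    local_path: Optional[str]  # local specification file otherwise


def split_dependency_url(url: str) -> SplitUrl:
    """Split '<type>://file' or '<type>+<scheme>://...' into its parts."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError("not a URL: %r" % url)
    dep_type, plus, inner_scheme = scheme.partition("+")
    if plus:
        # An archive, git repository or remote file carries the
        # specification file; hand the inner URL to a common fetcher.
        return SplitUrl(dep_type, inner_scheme + "://" + rest, None)
    # No inner scheme: the specification file is a local file.
    return SplitUrl(dep_type, None, rest)
```

The second stage would then parse the located specification file and
fetch each resolved dependency URL.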

Additionally, the patch series reworks the npm fetcher to work without
an npm binary or an external package repository. It adds a common
dependency name and version schema to integrate the dependencies into
the SBOM.

= Background
Bitbake has diverse concepts, each with drawbacks, for the different
tightly coupled package managers. The Python support uses a recipe per
dependency and generates common fetcher URLs via a Python function. The
other languages embed the dependencies inside the recipe. The Node.js
support offers an npmsw fetcher which uses a lock file beside the recipe
to generate multiple common fetcher URLs on the fly and thereby hides
the real download sources. This leads, for example, to a single source
in the SBOM. The Go support contains two parallel implementations: a
vendor-based solution with a common fetcher and a go-mod-based solution
with a gomod fetcher. The vendor-based solution includes the individual
dependencies in the SRC_URI of the recipe and uses a Python function to
generate common fetcher URLs with additional information for the vendor
task. The gomod fetcher uses a proprietary gomod URL. It translates the
URL into a common URL and prepares metadata during unpack. The Rust
support includes the individual dependencies in the SRC_URI of the
recipe and uses proprietary crate URLs. The crate fetcher translates a
proprietary URL into a common fetcher URL and prepares metadata during
unpack. The recipetool does not support the crate and gomod fetchers.
This leads to missing dependency licenses in the recipe, for example in
librsvg.

The steps needed to fetch dependencies for Node.js, Go and Rust are
similar:
1. Extract the dependencies from a specification file (name, version,
   checksum and URL)
2. Generate proprietary fetcher URIs
  a. npm://registry.npmjs.org/;package=glob;version=10.3.15
  b. gomod://golang.org/x/net;version=v0.9.0
     gomodgit://golang.org/x/net;version=v0.9.0;repo=go.googlesource.com/net
  c. crate://crates.io/glob/0.3.1
3. Generate wget or git fetcher URIs
  a. https://registry.npmjs.org/glob/-/glob-10.3.15.tgz;downloadfilename=…
  b. https://proxy.golang.org/golang.org/x/net/@v/v0.9.0.zip;downloadfilename=…
     git://go.googlesource.com/net;protocol=https;subdir=…
  c. https://crates.io/api/v1/crates/glob/0.3.1/download;downloadfilename=…
4. Unpack
5. Create meta files
  a. Update lockfile and create tar.gz archives
  b. Create go.mod file
     Create info, go.mod file and zip archives
  c. Create .cargo-checksum.json files
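
Step 3 can be sketched with three small helpers that rebuild the wget
URLs listed above from a name and a version. The function names are
hypothetical and do not come from bitbake; the downloadfilename
parameter is left out on purpose.

```python
def npm_tarball_url(registry: str, package: str, version: str) -> str:
    """Build the registry tarball URL for an npm package."""
    # A scoped package ("@scope/name") keeps its scope in the path,
    # while the tarball file name uses only the bare name.
    basename = package.split("/")[-1]
    return "https://%s/%s/-/%s-%s.tgz" % (registry, package, basename, version)


def gomod_zip_url(module: str, version: str) -> str:
    """Build the Go module proxy URL for a module zip archive."""
    # The proxy expects uppercase letters in module paths to be
    # escaped; this sketch does not handle that.
    return "https://proxy.golang.org/%s/@v/%s.zip" % (module, version)


def crate_download_url(name: str, version: str) -> str:
    """Build the crates.io download URL for a crate."""
    return "https://crates.io/api/v1/crates/%s/%s/download" % (name, version)
```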

It looks like the recipetool is not widely used, and therefore this
patch series integrates the dependency resolution into the fetcher. Once
a concept is agreed on, the fetcher could be extended: it could download
the license information per package, and a new build task could run the
license cruncher from the recipetool.

= Open questions

* Where should we download dependencies?
** Should we use a folder per fetcher (ex. git and npm)?
** Should we use the main folder (ex. crate)?
** Should we translate the name into a folder (ex. gomod)?
** Should we integrate the name into the filename (ex. git)?
* Where should we unpack the dependencies?
** Should we use a folder inside the parent folder (ex. node_modules)?
** Should we use a fixed folder inside unpackdir
   (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
* How should we treat archives for package manager caches?
** Should we unpack the archives to support patching (ex. npm)?
** Should we copy the packed archive to avoid unpacking and packaging
   (ex. gomod)?

This patch series depends on patch series
20241209103158.20833-1-stefan.herbrechtsmeier-oss@weidmueller.com
("[1/4] tests: fetch: adapt npmsw tests to fixed unpack behavior").

Stefan Herbrechtsmeier (21):
  tests: fetch: update npmsw tests to new lockfile format
  fetch2: npmsw: remove old lockfile format support
  tests: fetch: replace [url] with urls for npm
  fetch2: do not prefix embedded checksums
  fetch2: read checksum from SRC_URI flag for npm
  fetch2: introduce common package manager metadata
  fetch2: add unpack support for npm archives
  utils: add Go mod h1 checksum support
  fetch2: add destdir to FetchData
  fetch: npm: rework
  tests: fetch: adapt style in npm(sw) class
  tests: fetch: move npmsw test cases into npmsw test class
  tests: fetch: adapt npm test cases
  fetch: add dependency mixin
  tests: fetch: add test cases for dependency fetcher
  fetch: npmsw: migrate to dependency mixin
  tests: fetch: adapt npmsw test cases
  fetch: add gosum fetcher
  tests: fetch: add test cases for gosum
  fetch: add cargolock fetcher
  tests: fetch: add test cases for cargolock

 lib/bb/fetch2/__init__.py   |  35 +-
 lib/bb/fetch2/cargolock.py  |  73 +++
 lib/bb/fetch2/dependency.py | 167 +++++++
 lib/bb/fetch2/gomod.py      |   5 +-
 lib/bb/fetch2/gosum.py      |  51 +++
 lib/bb/fetch2/npm.py        | 244 +++-------
 lib/bb/fetch2/npmsw.py      | 347 ++++----------
 lib/bb/tests/fetch.py       | 880 +++++++++++++++++-------------------
 lib/bb/utils.py             |  25 +
 9 files changed, 916 insertions(+), 911 deletions(-)
 create mode 100644 lib/bb/fetch2/cargolock.py
 create mode 100644 lib/bb/fetch2/dependency.py
 create mode 100644 lib/bb/fetch2/gosum.py

-- 
2.39.5


* [RFC PATCH 01/21] tests: fetch: update npmsw tests to new lockfile format
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
@ 2024-12-20 11:25 ` Stefan Herbrechtsmeier
  2024-12-20 11:25 ` [RFC PATCH 02/21] fetch2: npmsw: remove old lockfile format support Stefan Herbrechtsmeier
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:25 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Update the npmsw test cases to the new lockfile format. The old lockfile
format [1] is required by npm 6 / Node.js 14, which is out of
maintenance [2].

[1] https://docs.npmjs.com/cli/v6/configuring-npm/package-lock-json
[2] https://nodejs.org/en/about/previous-releases

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/tests/fetch.py | 135 +++++++++++++++++-------------------------
 1 file changed, 54 insertions(+), 81 deletions(-)

diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
index 6dda0d381..b89348236 100644
--- a/lib/bb/tests/fetch.py
+++ b/lib/bb/tests/fetch.py
@@ -2843,23 +2843,25 @@ class NPMTest(FetcherTest):
     @skipIfNoNetwork()
     def test_npmsw(self):
         swfile = self.create_shrinkwrap_file({
-            'dependencies': {
-                'array-flatten': {
+            'packages': {
+                'node_modules/array-flatten': {
                     'version': '1.1.1',
                     'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
                     'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI=',
                     'dependencies': {
-                        'content-type': {
-                            'version': 'https://registry.npmjs.org/content-type/-/content-type-1.0.4.tgz',
-                            'integrity': 'sha512-hIP3EEPs8tB9AT1L+NUqtwOAps4mk2Zob89MWXMHjHWg9milF/j4osnnQLXBCBFBk/tvIG/tUc9mOUJiPBhPXA==',
-                            'dependencies': {
-                                'cookie': {
-                                    'version': 'git+https://github.com/jshttp/cookie.git#aec1177c7da67e3b3273df96cf476824dbc9ae09',
-                                    'from': 'git+https://github.com/jshttp/cookie.git'
-                                }
-                            }
-                        }
+                        'content-type': "1.0.4"
                     }
+                },
+                'node_modules/array-flatten/node_modules/content-type': {
+                    'version': '1.0.4',
+                    'resolved': 'https://registry.npmjs.org/content-type/-/content-type-1.0.4.tgz',
+                    'integrity': 'sha512-hIP3EEPs8tB9AT1L+NUqtwOAps4mk2Zob89MWXMHjHWg9milF/j4osnnQLXBCBFBk/tvIG/tUc9mOUJiPBhPXA==',
+                    'dependencies': {
+                        'cookie': 'git+https://github.com/jshttp/cookie.git#aec1177c7da67e3b3273df96cf476824dbc9ae09'
+                    }
+                },
+                'node_modules/array-flatten/node_modules/content-type/node_modules/cookie': {
+                    'resolved': 'git+https://github.com/jshttp/cookie.git#aec1177c7da67e3b3273df96cf476824dbc9ae09'
                 }
             }
         })
@@ -2877,10 +2879,9 @@ class NPMTest(FetcherTest):
     @skipIfNoNetwork()
     def test_npmsw_git(self):
         swfile = self.create_shrinkwrap_file({
-            'dependencies': {
-                'cookie': {
-                    'version': 'github:jshttp/cookie.git#aec1177c7da67e3b3273df96cf476824dbc9ae09',
-                    'from': 'github:jshttp/cookie.git'
+            'packages': {
+                'node_modules/cookie': {
+                    'resolved': 'git+https://github.com/jshttp/cookie.git#aec1177c7da67e3b3273df96cf476824dbc9ae09'
                 }
             }
         })
@@ -2888,40 +2889,16 @@ class NPMTest(FetcherTest):
         fetcher.download()
         self.assertTrue(os.path.exists(os.path.join(self.dldir, 'git2', 'github.com.jshttp.cookie.git')))
 
-        swfile = self.create_shrinkwrap_file({
-            'dependencies': {
-                'cookie': {
-                    'version': 'jshttp/cookie.git#aec1177c7da67e3b3273df96cf476824dbc9ae09',
-                    'from': 'jshttp/cookie.git'
-                }
-            }
-        })
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
-        fetcher.download()
-        self.assertTrue(os.path.exists(os.path.join(self.dldir, 'git2', 'github.com.jshttp.cookie.git')))
-
-        swfile = self.create_shrinkwrap_file({
-            'dependencies': {
-                'nodejs': {
-                    'version': 'gitlab:gitlab-examples/nodejs.git#892a1f16725e56cc3a2cb0d677be42935c8fc262',
-                    'from': 'gitlab:gitlab-examples/nodejs'
-                }
-            }
-        })
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
-        fetcher.download()
-        self.assertTrue(os.path.exists(os.path.join(self.dldir, 'git2', 'gitlab.com.gitlab-examples.nodejs.git')))
-
     @skipIfNoNetwork()
     def test_npmsw_dev(self):
         swfile = self.create_shrinkwrap_file({
-            'dependencies': {
-                'array-flatten': {
+            'packages': {
+                'node_modules/array-flatten': {
                     'version': '1.1.1',
                     'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
                     'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
                 },
-                'content-type': {
+                'node_modules/content-type': {
                     'version': '1.0.4',
                     'resolved': 'https://registry.npmjs.org/content-type/-/content-type-1.0.4.tgz',
                     'integrity': 'sha512-hIP3EEPs8tB9AT1L+NUqtwOAps4mk2Zob89MWXMHjHWg9milF/j4osnnQLXBCBFBk/tvIG/tUc9mOUJiPBhPXA==',
@@ -2943,8 +2920,8 @@ class NPMTest(FetcherTest):
     @skipIfNoNetwork()
     def test_npmsw_destsuffix(self):
         swfile = self.create_shrinkwrap_file({
-            'dependencies': {
-                'array-flatten': {
+            'packages': {
+                'node_modules/array-flatten': {
                     'version': '1.1.1',
                     'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
                     'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
@@ -2958,8 +2935,8 @@ class NPMTest(FetcherTest):
 
     def test_npmsw_no_network_no_tarball(self):
         swfile = self.create_shrinkwrap_file({
-            'dependencies': {
-                'array-flatten': {
+            'packages': {
+                'node_modules/array-flatten': {
                     'version': '1.1.1',
                     'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
                     'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
@@ -2981,8 +2958,8 @@ class NPMTest(FetcherTest):
         self.d.setVar('BB_NO_NETWORK', '1')
         # Fetch again
         swfile = self.create_shrinkwrap_file({
-            'dependencies': {
-                'array-flatten': {
+            'packages': {
+                'node_modules/array-flatten': {
                     'version': '1.1.1',
                     'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
                     'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
@@ -2998,8 +2975,8 @@ class NPMTest(FetcherTest):
     def test_npmsw_npm_reusability(self):
         # Fetch once with npmsw
         swfile = self.create_shrinkwrap_file({
-            'dependencies': {
-                'array-flatten': {
+            'packages': {
+                'node_modules/array-flatten': {
                     'version': '1.1.1',
                     'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
                     'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
@@ -3020,8 +2997,8 @@ class NPMTest(FetcherTest):
     def test_npmsw_bad_checksum(self):
         # Try to fetch with bad checksum
         swfile = self.create_shrinkwrap_file({
-            'dependencies': {
-                'array-flatten': {
+            'packages': {
+                'node_modules/array-flatten': {
                     'version': '1.1.1',
                     'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
                     'integrity': 'sha1-gfNEp2hqgLTFKT6P3AsBYMgsBqg='
@@ -3033,8 +3010,8 @@ class NPMTest(FetcherTest):
             fetcher.download()
         # Fetch correctly to get a tarball
         swfile = self.create_shrinkwrap_file({
-            'dependencies': {
-                'array-flatten': {
+            'packages': {
+                'node_modules/array-flatten': {
                     'version': '1.1.1',
                     'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
                     'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
@@ -3072,8 +3049,8 @@ class NPMTest(FetcherTest):
         # Fetch again
         self.assertFalse(os.path.exists(ud.localpath))
         swfile = self.create_shrinkwrap_file({
-            'dependencies': {
-                'array-flatten': {
+            'packages': {
+                'node_modules/array-flatten': {
                     'version': '1.1.1',
                     'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
                     'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
@@ -3100,8 +3077,8 @@ class NPMTest(FetcherTest):
         # Fetch again with invalid url
         self.assertFalse(os.path.exists(ud.localpath))
         swfile = self.create_shrinkwrap_file({
-            'dependencies': {
-                'array-flatten': {
+            'packages': {
+                'node_modules/array-flatten': {
                     'version': '1.1.1',
                     'resolved': 'https://invalid',
                     'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
@@ -3114,29 +3091,25 @@ class NPMTest(FetcherTest):
 
     @skipIfNoNetwork()
     def test_npmsw_bundled(self):
-        for packages_key, package_prefix, bundled_key in [
-            ('dependencies', '', 'bundled'),
-            ('packages', 'node_modules/', 'inBundle')
-        ]:
-            swfile = self.create_shrinkwrap_file({
-                packages_key: {
-                    package_prefix + 'array-flatten': {
-                        'version': '1.1.1',
-                        'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
-                        'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
-                    },
-                    package_prefix + 'content-type': {
-                        'version': '1.0.4',
-                        'resolved': 'https://registry.npmjs.org/content-type/-/content-type-1.0.4.tgz',
-                        'integrity': 'sha512-hIP3EEPs8tB9AT1L+NUqtwOAps4mk2Zob89MWXMHjHWg9milF/j4osnnQLXBCBFBk/tvIG/tUc9mOUJiPBhPXA==',
-                        bundled_key: True
-                    }
+        swfile = self.create_shrinkwrap_file({
+            'packages': {
+                'node_modules/array-flatten': {
+                    'version': '1.1.1',
+                    'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
+                    'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
+                },
+                'node_modules/content-type': {
+                    'version': '1.0.4',
+                    'resolved': 'https://registry.npmjs.org/content-type/-/content-type-1.0.4.tgz',
+                    'integrity': 'sha512-hIP3EEPs8tB9AT1L+NUqtwOAps4mk2Zob89MWXMHjHWg9milF/j4osnnQLXBCBFBk/tvIG/tUc9mOUJiPBhPXA==',
+                    'inBundle': True
                 }
-            })
-            fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
-            fetcher.download()
-            self.assertTrue(os.path.exists(os.path.join(self.dldir, 'npm2', 'array-flatten-1.1.1.tgz')))
-            self.assertFalse(os.path.exists(os.path.join(self.dldir, 'npm2', 'content-type-1.0.4.tgz')))
+            }
+        })
+        fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
+        fetcher.download()
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, 'npm2', 'array-flatten-1.1.1.tgz')))
+        self.assertFalse(os.path.exists(os.path.join(self.dldir, 'npm2', 'content-type-1.0.4.tgz')))
 
 class GitSharedTest(FetcherTest):
     def setUp(self):
-- 
2.39.5
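
The change in key layout exercised by these tests can be illustrated
with a small helper that maps old-style nested 'dependencies' onto
new-style 'packages' locations. flatten_dependencies is a hypothetical
name and this is not a full converter: as the diff above shows, the
formats also differ in other ways, for example git sources move from
'version' to 'resolved'.

```python
def flatten_dependencies(deps, prefix=""):
    """Map nested old-style entries to flat 'node_modules/...' keys."""
    packages = {}
    for name, data in deps.items():
        location = prefix + "node_modules/" + name
        # Copy the entry without its nested children; each child
        # becomes its own entry below the parent location.
        packages[location] = {
            key: value for key, value in data.items() if key != "dependencies"
        }
        children = data.get("dependencies", {})
        packages.update(flatten_dependencies(children, location + "/"))
    return packages
```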




* [RFC PATCH 02/21] fetch2: npmsw: remove old lockfile format support
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
  2024-12-20 11:25 ` [RFC PATCH 01/21] tests: fetch: update npmsw tests to new lockfile format Stefan Herbrechtsmeier
@ 2024-12-20 11:25 ` Stefan Herbrechtsmeier
  2024-12-20 11:25 ` [RFC PATCH 03/21] tests: fetch: replace [url] with urls for npm Stefan Herbrechtsmeier
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:25 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Remove support for the old lockfile format. The old lockfile format [1]
is required by npm 6 / Node.js 14, which is out of maintenance [2].

[1] https://docs.npmjs.com/cli/v6/configuring-npm/package-lock-json
[2] https://nodejs.org/en/about/previous-releases

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/fetch2/npmsw.py | 97 ++++++++++++++++--------------------------
 1 file changed, 36 insertions(+), 61 deletions(-)

diff --git a/lib/bb/fetch2/npmsw.py b/lib/bb/fetch2/npmsw.py
index 558c9a2b0..2f9599ee9 100644
--- a/lib/bb/fetch2/npmsw.py
+++ b/lib/bb/fetch2/npmsw.py
@@ -37,40 +37,26 @@ def foreach_dependencies(shrinkwrap, callback=None, dev=False):
     """
         Run a callback for each dependencies of a shrinkwrap file.
         The callback is using the format:
-            callback(name, params, deptree)
+            callback(name, data, location)
         with:
             name = the package name (string)
-            params = the package parameters (dictionary)
-            destdir = the destination of the package (string)
+            data = the package data (dictionary)
+            location = the location of the package (string)
     """
-    # For handling old style dependencies entries in shinkwrap files
-    def _walk_deps(deps, deptree):
-        for name in deps:
-            subtree = [*deptree, name]
-            _walk_deps(deps[name].get("dependencies", {}), subtree)
-            if callback is not None:
-                if deps[name].get("dev", False) and not dev:
-                    continue
-                elif deps[name].get("bundled", False):
-                    continue
-                destsubdirs = [os.path.join("node_modules", dep) for dep in subtree]
-                destsuffix = os.path.join(*destsubdirs)
-                callback(name, deps[name], destsuffix)
-
-    # packages entry means new style shrinkwrap file, else use dependencies
-    packages = shrinkwrap.get("packages", None)
-    if packages is not None:
-        for package in packages:
-            if package != "":
-                name = package.split('node_modules/')[-1]
-                package_infos = packages.get(package, {})
-                if dev == False and package_infos.get("dev", False):
-                    continue
-                elif package_infos.get("inBundle", False):
-                    continue
-                callback(name, package_infos, package)
-    else:
-        _walk_deps(shrinkwrap.get("dependencies", {}), [])
+    packages = shrinkwrap.get("packages")
+    if not packages:
+        raise FetchError("Invalid shrinkwrap file format")
+
+    for location, data in packages.items():
+        # Skip empty main and local link target packages
+        if not location.startswith('node_modules/'):
+            continue
+        elif not dev and data.get("dev", False):
+            continue
+        elif data.get("inBundle", False):
+            continue
+        name = location.split('node_modules/')[-1]
+        callback(name, data, location)
 
 class NpmShrinkWrap(FetchMethod):
     """Class to fetch all package from a shrinkwrap file"""
@@ -97,12 +83,18 @@ class NpmShrinkWrap(FetchMethod):
             extrapaths = []
             unpack = True
 
-            integrity = params.get("integrity", None)
-            resolved = params.get("resolved", None)
-            version = params.get("version", resolved)
+            integrity = params.get("integrity")
+            resolved = params.get("resolved")
+            version = params.get("version")
+            link = params.get("link", False)
+
+            # Handle link sources
+            if link:
+                localpath = resolved
+                unpack = False
 
             # Handle registry sources
-            if is_semver(version) and integrity:
+            elif version and is_semver(version) and integrity:
                 # Handle duplicate dependencies without url
                 if not resolved:
                     return
@@ -130,10 +122,10 @@ class NpmShrinkWrap(FetchMethod):
                 extrapaths.append(resolvefile)
 
             # Handle http tarball sources
-            elif version.startswith("http") and integrity:
-                localfile = npm_localfile(os.path.basename(version))
+            elif resolved.startswith("http") and integrity:
+                localfile = npm_localfile(os.path.basename(resolved))
 
-                uri = URI(version)
+                uri = URI(resolved)
                 uri.params["downloadfilename"] = localfile
 
                 checksum_name, checksum_expected = npm_integrity(integrity)
@@ -143,28 +135,12 @@ class NpmShrinkWrap(FetchMethod):
 
                 localpath = os.path.join(d.getVar("DL_DIR"), localfile)
 
-            # Handle local tarball and link sources
-            elif version.startswith("file"):
-                localpath = version[5:]
-                if not version.endswith(".tgz"):
-                    unpack = False
+            # Handle local tarball sources
+            elif resolved.startswith("file"):
+                localpath = resolved[5:]
 
             # Handle git sources
-            elif version.startswith(("git", "bitbucket","gist")) or (
-                not version.endswith((".tgz", ".tar", ".tar.gz"))
-                and not version.startswith((".", "@", "/"))
-                and "/" in version
-            ):
-                if version.startswith("github:"):
-                    version = "git+https://github.com/" + version[len("github:"):]
-                elif version.startswith("gist:"):
-                    version = "git+https://gist.github.com/" + version[len("gist:"):]
-                elif version.startswith("bitbucket:"):
-                    version = "git+https://bitbucket.org/" + version[len("bitbucket:"):]
-                elif version.startswith("gitlab:"):
-                    version = "git+https://gitlab.com/" + version[len("gitlab:"):]
-                elif not version.startswith(("git+","git:")):
-                    version = "git+https://github.com/" + version
+            elif resolved.startswith("git"):
                 regex = re.compile(r"""
                     ^
                     git\+
@@ -176,10 +152,9 @@ class NpmShrinkWrap(FetchMethod):
                     $
                     """, re.VERBOSE)
 
-                match = regex.match(version)
-
+                match = regex.match(resolved)
                 if not match:
-                    raise ParameterError("Invalid git url: %s" % version, ud.url)
+                    raise ParameterError("Invalid git url: %s" % resolved, ud.url)
 
                 groups = match.groupdict()
 
-- 
2.39.5
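
For experimenting with the filtering rules outside of bitbake, the
reworked loop can be written as a standalone generator. This is only a
sketch under assumptions: iter_packages is a hypothetical name, and
ValueError stands in for FetchError so that the snippet has no bitbake
dependency.

```python
def iter_packages(shrinkwrap, dev=False):
    """Yield (name, data, location) for each package to fetch."""
    packages = shrinkwrap.get("packages")
    if not packages:
        raise ValueError("Invalid shrinkwrap file format")

    for location, data in packages.items():
        # Skip the empty main entry and local link target packages
        if not location.startswith("node_modules/"):
            continue
        # Skip development dependencies unless requested
        if data.get("dev", False) and not dev:
            continue
        # Skip packages that ship inside another package's tarball
        if data.get("inBundle", False):
            continue
        # The name is everything after the last 'node_modules/'
        name = location.split("node_modules/")[-1]
        yield name, data, location
```

Splitting on the last 'node_modules/' keeps scoped names such as
'@scope/name' intact for nested locations.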




* [RFC PATCH 03/21] tests: fetch: replace [url] with urls for npm
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
  2024-12-20 11:25 ` [RFC PATCH 01/21] tests: fetch: update npmsw tests to new lockfile format Stefan Herbrechtsmeier
  2024-12-20 11:25 ` [RFC PATCH 02/21] fetch2: npmsw: remove old lockfile format support Stefan Herbrechtsmeier
@ 2024-12-20 11:25 ` Stefan Herbrechtsmeier
  2024-12-20 11:25 ` [RFC PATCH 04/21] fetch2: do not prefix embedded checksums Stefan Herbrechtsmeier
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:25 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Replace [url] with urls to simplify future modifications.

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/tests/fetch.py | 66 +++++++++++++++++++++----------------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
index b89348236..c5ec84dc5 100644
--- a/lib/bb/tests/fetch.py
+++ b/lib/bb/tests/fetch.py
@@ -2627,8 +2627,8 @@ class NPMTest(FetcherTest):
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm(self):
-        url = 'npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0'
-        fetcher = bb.fetch.Fetch([url], self.d)
+        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
+        fetcher = bb.fetch.Fetch(urls, self.d)
         ud = fetcher.ud[fetcher.urls[0]]
         fetcher.download()
         self.assertTrue(os.path.exists(ud.localpath))
@@ -2641,9 +2641,9 @@ class NPMTest(FetcherTest):
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_bad_checksum(self):
-        url = 'npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0'
+        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
         # Fetch once to get a tarball
-        fetcher = bb.fetch.Fetch([url], self.d)
+        fetcher = bb.fetch.Fetch(urls, self.d)
         ud = fetcher.ud[fetcher.urls[0]]
         fetcher.download()
         self.assertTrue(os.path.exists(ud.localpath))
@@ -2660,9 +2660,9 @@ class NPMTest(FetcherTest):
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_premirrors(self):
-        url = 'npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0'
+        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
         # Fetch once to get a tarball
-        fetcher = bb.fetch.Fetch([url], self.d)
+        fetcher = bb.fetch.Fetch(urls, self.d)
         ud = fetcher.ud[fetcher.urls[0]]
         fetcher.download()
         self.assertTrue(os.path.exists(ud.localpath))
@@ -2682,7 +2682,7 @@ class NPMTest(FetcherTest):
         # while the fetcher object exists, which it does when we rename the
         # download directory to "mirror" above. Thus we need a new fetcher to go
         # with the now empty download directory.
-        fetcher = bb.fetch.Fetch([url], self.d)
+        fetcher = bb.fetch.Fetch(urls, self.d)
         ud = fetcher.ud[fetcher.urls[0]]
         fetcher.download()
         self.assertTrue(os.path.exists(ud.localpath))
@@ -2690,9 +2690,9 @@ class NPMTest(FetcherTest):
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_premirrors_with_specified_filename(self):
-        url = 'npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0'
+        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
         # Fetch once to get a tarball
-        fetcher = bb.fetch.Fetch([url], self.d)
+        fetcher = bb.fetch.Fetch(urls, self.d)
         ud = fetcher.ud[fetcher.urls[0]]
         fetcher.download()
         self.assertTrue(os.path.exists(ud.localpath))
@@ -2712,8 +2712,8 @@ class NPMTest(FetcherTest):
     @skipIfNoNetwork()
     def test_npm_mirrors(self):
         # Fetch once to get a tarball
-        url = 'npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0'
-        fetcher = bb.fetch.Fetch([url], self.d)
+        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
+        fetcher = bb.fetch.Fetch(urls, self.d)
         ud = fetcher.ud[fetcher.urls[0]]
         fetcher.download()
         self.assertTrue(os.path.exists(ud.localpath))
@@ -2737,8 +2737,8 @@ class NPMTest(FetcherTest):
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_destsuffix_downloadfilename(self):
-        url = 'npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0;destsuffix=foo/bar;downloadfilename=foo-bar.tgz'
-        fetcher = bb.fetch.Fetch([url], self.d)
+        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0;destsuffix=foo/bar;downloadfilename=foo-bar.tgz']
+        fetcher = bb.fetch.Fetch(urls, self.d)
         fetcher.download()
         self.assertTrue(os.path.exists(os.path.join(self.dldir, 'npm2', 'foo-bar.tgz')))
         fetcher.unpack(self.unpackdir)
@@ -2746,18 +2746,18 @@ class NPMTest(FetcherTest):
         self.assertTrue(os.path.exists(os.path.join(unpackdir, 'package.json')))
 
     def test_npm_no_network_no_tarball(self):
-        url = 'npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0'
+        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
         self.d.setVar('BB_NO_NETWORK', '1')
-        fetcher = bb.fetch.Fetch([url], self.d)
+        fetcher = bb.fetch.Fetch(urls, self.d)
         with self.assertRaises(bb.fetch2.NetworkAccess):
             fetcher.download()
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_no_network_with_tarball(self):
-        url = 'npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0'
+        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
         # Fetch once to get a tarball
-        fetcher = bb.fetch.Fetch([url], self.d)
+        fetcher = bb.fetch.Fetch(urls, self.d)
         fetcher.download()
         # Disable network access
         self.d.setVar('BB_NO_NETWORK', '1')
@@ -2770,8 +2770,8 @@ class NPMTest(FetcherTest):
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_registry_alternate(self):
-        url = 'npm://skimdb.npmjs.com;package=@savoirfairelinux/node-server-example;version=1.0.0'
-        fetcher = bb.fetch.Fetch([url], self.d)
+        urls = ['npm://skimdb.npmjs.com;package=@savoirfairelinux/node-server-example;version=1.0.0']
+        fetcher = bb.fetch.Fetch(urls, self.d)
         fetcher.download()
         fetcher.unpack(self.unpackdir)
         unpackdir = os.path.join(self.unpackdir, 'npm')
@@ -2780,8 +2780,8 @@ class NPMTest(FetcherTest):
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_version_latest(self):
-        url = 'npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=latest'
-        fetcher = bb.fetch.Fetch([url], self.d)
+        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=latest']
+        fetcher = bb.fetch.Fetch(urls, self.d)
         fetcher.download()
         fetcher.unpack(self.unpackdir)
         unpackdir = os.path.join(self.unpackdir, 'npm')
@@ -2790,46 +2790,46 @@ class NPMTest(FetcherTest):
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_registry_invalid(self):
-        url = 'npm://registry.invalid.org;package=@savoirfairelinux/node-server-example;version=1.0.0'
-        fetcher = bb.fetch.Fetch([url], self.d)
+        urls = ['npm://registry.invalid.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
+        fetcher = bb.fetch.Fetch(urls, self.d)
         with self.assertRaises(bb.fetch2.FetchError):
             fetcher.download()
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_package_invalid(self):
-        url = 'npm://registry.npmjs.org;package=@savoirfairelinux/invalid;version=1.0.0'
-        fetcher = bb.fetch.Fetch([url], self.d)
+        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/invalid;version=1.0.0']
+        fetcher = bb.fetch.Fetch(urls, self.d)
         with self.assertRaises(bb.fetch2.FetchError):
             fetcher.download()
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_version_invalid(self):
-        url = 'npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=invalid'
+        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=invalid']
         with self.assertRaises(bb.fetch2.ParameterError):
-            fetcher = bb.fetch.Fetch([url], self.d)
+            fetcher = bb.fetch.Fetch(urls, self.d)
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_registry_none(self):
-        url = 'npm://;package=@savoirfairelinux/node-server-example;version=1.0.0'
+        urls = ['npm://;package=@savoirfairelinux/node-server-example;version=1.0.0']
         with self.assertRaises(bb.fetch2.MalformedUrl):
-            fetcher = bb.fetch.Fetch([url], self.d)
+            fetcher = bb.fetch.Fetch(urls, self.d)
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_package_none(self):
-        url = 'npm://registry.npmjs.org;version=1.0.0'
+        urls = ['npm://registry.npmjs.org;version=1.0.0']
         with self.assertRaises(bb.fetch2.MissingParameterError):
-            fetcher = bb.fetch.Fetch([url], self.d)
+            fetcher = bb.fetch.Fetch(urls, self.d)
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_version_none(self):
-        url = 'npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example'
+        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example']
         with self.assertRaises(bb.fetch2.MissingParameterError):
-            fetcher = bb.fetch.Fetch([url], self.d)
+            fetcher = bb.fetch.Fetch(urls, self.d)
 
     def create_shrinkwrap_file(self, data):
         import json
-- 
2.39.5




* [RFC PATCH 04/21] fetch2: do not prefix embedded checksums
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (2 preceding siblings ...)
  2024-12-20 11:25 ` [RFC PATCH 03/21] tests: fetch: replace [url] with urls for npm Stefan Herbrechtsmeier
@ 2024-12-20 11:25 ` Stefan Herbrechtsmeier
  2024-12-20 11:25 ` [RFC PATCH 05/21] fetch2: read checksum from SRC_URI flag for npm Stefan Herbrechtsmeier
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:25 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

The fetcher supports entries with an embedded checksum like 'sha256sum'
in the SRC_URI. It adds the parameter 'name' as a prefix to the checksum
names if the parameter is set. This behavior is unexpected and leads to
hacks in fetchers. Fall back to the checksum without the useless prefix
and set the parameter 'name' in the gomod fetcher unconditionally.
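
The resulting lookup order can be sketched outside of BitBake as follows.
This is a minimal illustration, not the fetcher code itself: the function
name resolve_checksum and the plain dictionaries parm (URL parameters) and
flags (SRC_URI varflags) are hypothetical stand-ins for FetchData and the
datastore.

```python
def resolve_checksum(parm, flags, checksum_id, fetch_type):
    """Return (checksum_name, expected_value) using the patched lookup order."""
    # Fetcher types that read the checksum from a SRC_URI flag (as of this patch).
    flag_types = ["http", "https", "ftp", "ftps", "sftp", "s3", "az",
                  "crate", "gs", "gomod"]
    plain_name = "%ssum" % checksum_id
    if "name" in parm:
        name = "%s.%s" % (parm["name"], plain_name)
    else:
        name = plain_name

    if name in parm:
        # 1. Embedded checksum with the 'name' prefix.
        return name, parm[name]
    if plain_name in parm:
        # 2. New fallback: embedded checksum without the prefix.
        return plain_name, parm[plain_name]
    if fetch_type not in flag_types:
        # 3. No checksum expected for this fetcher type.
        return name, None
    # 4. Checksum from the SRC_URI flag of the recipe.
    return name, flags.get(name)


# 'name' is set, but the embedded checksum carries no prefix.
parm = {"name": "example.com/mod@v1.0.0", "sha256sum": "abc"}
print(resolve_checksum(parm, {}, "sha256", "gomod"))
```

With the fallback in place, a fetcher can set 'name' unconditionally without
hiding an unprefixed checksum that is embedded in the URL.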

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/fetch2/__init__.py | 9 ++++++---
 lib/bb/fetch2/gomod.py    | 5 +----
 lib/bb/tests/fetch.py     | 4 ----
 3 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
index f79e278b1..3d31a8d4a 100644
--- a/lib/bb/fetch2/__init__.py
+++ b/lib/bb/fetch2/__init__.py
@@ -1316,20 +1316,23 @@ class FetchData(object):
         self.setup = False
 
         def configure_checksum(checksum_id):
+            checksum_plain_name = "%ssum" % checksum_id
             if "name" in self.parm:
                 checksum_name = "%s.%ssum" % (self.parm["name"], checksum_id)
             else:
-                checksum_name = "%ssum" % checksum_id
-
-            setattr(self, "%s_name" % checksum_id, checksum_name)
+                checksum_name = checksum_plain_name
 
             if checksum_name in self.parm:
                 checksum_expected = self.parm[checksum_name]
+            elif checksum_plain_name in self.parm:
+                checksum_expected = self.parm[checksum_plain_name]
+                checksum_name = checksum_plain_name
             elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3", "az", "crate", "gs", "gomod"]:
                 checksum_expected = None
             else:
                 checksum_expected = d.getVarFlag("SRC_URI", checksum_name)
 
+            setattr(self, "%s_name" % checksum_id, checksum_name)
             setattr(self, "%s_expected" % checksum_id, checksum_expected)
 
         self.names = self.parm.get("name",'default').split(',')
diff --git a/lib/bb/fetch2/gomod.py b/lib/bb/fetch2/gomod.py
index 21fbe80f5..6c999e8ba 100644
--- a/lib/bb/fetch2/gomod.py
+++ b/lib/bb/fetch2/gomod.py
@@ -119,10 +119,7 @@ class GoMod(Wget):
             ('https', proxy, '/' + path, None, None, None))
         ud.parm['downloadfilename'] = path
 
-        # Set name parameter if sha256sum is set in recipe
-        name = f"{module}@{ud.parm['version']}"
-        if d.getVarFlag('SRC_URI', name + '.sha256sum'):
-            ud.parm['name'] = name
+        ud.parm['name'] = f"{module}@{ud.parm['version']}"
 
         # Set subdir for unpack
         ud.parm['subdir'] = os.path.join(moddir, 'cache/download',
diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
index c5ec84dc5..6b8e3e060 100644
--- a/lib/bb/tests/fetch.py
+++ b/lib/bb/tests/fetch.py
@@ -3391,7 +3391,6 @@ class GoModTest(FetcherTest):
         fetcher = bb.fetch2.Fetch(urls, self.d)
         ud = fetcher.ud[urls[0]]
         self.assertEqual(ud.url, 'https://proxy.golang.org/github.com/%21azure/azure-sdk-for-go/sdk/storage/azblob/%40v/v1.0.0.zip')
-        self.assertNotIn('name', ud.parm)
 
         fetcher.download()
         fetcher.unpack(self.unpackdir)
@@ -3409,7 +3408,6 @@ class GoModTest(FetcherTest):
         fetcher = bb.fetch2.Fetch(urls, self.d)
         ud = fetcher.ud[urls[0]]
         self.assertEqual(ud.url, 'https://proxy.golang.org/github.com/%21azure/azure-sdk-for-go/sdk/storage/azblob/%40v/v1.0.0.mod')
-        self.assertNotIn('name', ud.parm)
 
         fetcher.download()
         fetcher.unpack(self.unpackdir)
@@ -3442,7 +3440,6 @@ class GoModTest(FetcherTest):
         fetcher = bb.fetch2.Fetch(urls, self.d)
         ud = fetcher.ud[urls[0]]
         self.assertEqual(ud.url, 'https://proxy.golang.org/gopkg.in/ini.v1/%40v/v1.67.0.zip')
-        self.assertNotIn('name', ud.parm)
 
         fetcher.download()
         fetcher.unpack(self.unpackdir)
@@ -3460,7 +3457,6 @@ class GoModTest(FetcherTest):
         fetcher = bb.fetch2.Fetch(urls, self.d)
         ud = fetcher.ud[urls[0]]
         self.assertEqual(ud.url, 'https://proxy.golang.org/go.opencensus.io/%40v/v0.24.0.zip')
-        self.assertNotIn('name', ud.parm)
 
         fetcher.download()
         fetcher.unpack(self.unpackdir)
-- 
2.39.5




* [RFC PATCH 05/21] fetch2: read checksum from SRC_URI flag for npm
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (3 preceding siblings ...)
  2024-12-20 11:25 ` [RFC PATCH 04/21] fetch2: do not prefix embedded checksums Stefan Herbrechtsmeier
@ 2024-12-20 11:25 ` Stefan Herbrechtsmeier
  2024-12-20 11:25 ` [RFC PATCH 06/21] fetch2: introduce common package manager metadata Stefan Herbrechtsmeier
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:25 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/fetch2/__init__.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
index 3d31a8d4a..d2a30c18f 100644
--- a/lib/bb/fetch2/__init__.py
+++ b/lib/bb/fetch2/__init__.py
@@ -1327,7 +1327,7 @@ class FetchData(object):
             elif checksum_plain_name in self.parm:
                 checksum_expected = self.parm[checksum_plain_name]
                 checksum_name = checksum_plain_name
-            elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3", "az", "crate", "gs", "gomod"]:
+            elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3", "az", "crate", "gs", "gomod", "npm"]:
                 checksum_expected = None
             else:
                 checksum_expected = d.getVarFlag("SRC_URI", checksum_name)
-- 
2.39.5




* [RFC PATCH 06/21] fetch2: introduce common package manager metadata
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (4 preceding siblings ...)
  2024-12-20 11:25 ` [RFC PATCH 05/21] fetch2: read checksum from SRC_URI flag for npm Stefan Herbrechtsmeier
@ 2024-12-20 11:25 ` Stefan Herbrechtsmeier
  2024-12-20 11:25 ` [RFC PATCH 07/21] fetch2: add unpack support for npm archives Stefan Herbrechtsmeier
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:25 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Downloads from package manager repositories are identified via registry,
name, and version. The fetchers use individual styles to define the
download metadata:

npm://<REGISTRY>;package=<NAME>;version=<VERSION>

crate://<REGISTRY>/<NAME>/<VERSION>

GO_MOD_PROXY = "<REGISTRY>"
gomod://<NAME>;version=<VERSION>
gomodgit://<NAME>;version=<VERSION>;repo=<REPOSITORY>

The name and version are important for the SBOM to add usable name,
version, and CPE to the SBOM entries for the downloaded dependencies.
Introduce a common style and check the existence of the parameters:

<TYPE>://<REGISTRY | REPOSITORY>;dn=<NAME>;dv=<VERSION>

The style clearly separates the metadata and supports slashes and @
in the name.
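
As a rough illustration of the common style, the following sketch splits
such a URL and checks that both parameters exist. It is not BitBake's URL
decoder: the helper names split_download_url and check_download_metadata
are hypothetical, and a plain ValueError stands in for
MissingParameterError.

```python
def split_download_url(url):
    """Split '<TYPE>://<LOCATION>;key=value;...' into type, location, params."""
    scheme, _, rest = url.partition("://")
    location, *pairs = rest.split(";")
    # Split each pair on the first '=' only, so values may contain '='.
    params = dict(pair.split("=", 1) for pair in pairs if pair)
    return scheme, location, params


def check_download_metadata(url):
    """Return (type, location, name, version) or raise if dn/dv is missing."""
    scheme, location, params = split_download_url(url)
    for key in ("dn", "dv"):
        if key not in params:
            raise ValueError("Missing parameter '%s' in '%s'" % (key, url))
    return scheme, location, params["dn"], params["dv"]


url = "npm://registry.npmjs.org;dn=@savoirfairelinux/node-server-example;dv=1.0.0"
print(check_download_metadata(url))
```

Because the name is a parameter value and not part of the path, the scope
prefix '@savoirfairelinux/' survives without any escaping.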

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/fetch2/__init__.py | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
index d2a30c18f..4b7c01d6a 100644
--- a/lib/bb/fetch2/__init__.py
+++ b/lib/bb/fetch2/__init__.py
@@ -1356,6 +1356,12 @@ class FetchData(object):
         if hasattr(self.method, "urldata_init"):
             self.method.urldata_init(self, d)
 
+        if self.method.require_download_metadata():
+            if "dn" not in self.parm:
+                raise MissingParameterError("dn", self.url)
+            if "dv" not in self.parm:
+                raise MissingParameterError("dv", self.url)
+
         for checksum_id in CHECKSUM_LIST:
             configure_checksum(checksum_id)
 
@@ -1711,6 +1717,12 @@ class FetchMethod(object):
         """
         return []
 
+    def require_download_metadata(self):
+        """
+        The fetcher requires the download name (dn) and version (dv) parameters.
+        """
+        return False
+
 
 class DummyUnpackTracer(object):
     """
-- 
2.39.5




* [RFC PATCH 07/21] fetch2: add unpack support for npm archives
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (5 preceding siblings ...)
  2024-12-20 11:25 ` [RFC PATCH 06/21] fetch2: introduce common package manager metadata Stefan Herbrechtsmeier
@ 2024-12-20 11:25 ` Stefan Herbrechtsmeier
  2024-12-23 11:56   ` [bitbake-devel] " Richard Purdie
  2024-12-20 11:25 ` [RFC PATCH 08/21] utils: add Go mod h1 checksum support Stefan Herbrechtsmeier
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:25 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Add unpack support for npm archives with unusual member ordering and
disable warnings for unknown extended header keywords.

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/fetch2/__init__.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
index 4b7c01d6a..7d8f71b20 100644
--- a/lib/bb/fetch2/__init__.py
+++ b/lib/bb/fetch2/__init__.py
@@ -1535,6 +1535,7 @@ class FetchMethod(object):
 
         if unpack:
             tar_cmd = 'tar --extract --no-same-owner'
+            tar_cmd += ' --delay-directory-restore --warning=no-unknown-keyword'
             if 'striplevel' in urldata.parm:
                 tar_cmd += ' --strip-components=%s' %  urldata.parm['striplevel']
             if file.endswith('.tar'):
-- 
2.39.5




* [RFC PATCH 08/21] utils: add Go mod h1 checksum support
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (6 preceding siblings ...)
  2024-12-20 11:25 ` [RFC PATCH 07/21] fetch2: add unpack support for npm archives Stefan Herbrechtsmeier
@ 2024-12-20 11:25 ` Stefan Herbrechtsmeier
  2024-12-23 10:01   ` [bitbake-devel] " Richard Purdie
  2024-12-20 11:26 ` [RFC PATCH 09/21] fetch2: add destdir to FetchData Stefan Herbrechtsmeier
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:25 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Add support for the Go mod h1 hash. The hash is
based on the Go dirhash package. The package
defines hashes over directory trees and is used
for Go mod files and zip archives.
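
The hash construction can be sketched on in-memory data as follows. The
function name dirhash_h1 and the dictionary input are hypothetical and only
serve the illustration. The patch stores the hex digest; the 'h1:' prefix
with standard base64 encoding shown alongside is, to my understanding, the
notation Go writes into go.sum and is included only for comparison.

```python
import base64
import hashlib


def dirhash_h1(files):
    """Hash a mapping of member name -> bytes; return (hex, go_style)."""
    lines = []
    # One summary line per member, sorted by name: "<sha256 hex>  <name>\n".
    for name in sorted(files):
        digest = hashlib.sha256(files[name]).hexdigest()
        lines.append("%s  %s\n" % (digest, name))
    # The final hash is the SHA-256 of the concatenated summary lines.
    summary = hashlib.sha256("".join(lines).encode("utf-8"))
    go_style = "h1:" + base64.b64encode(summary.digest()).decode("ascii")
    return summary.hexdigest(), go_style


# A single go.mod file is hashed as a tree with one member named "go.mod".
hex_value, go_value = dirhash_h1({"go.mod": b"module example.com/m\n"})
print(hex_value)
print(go_value)
```

Both values encode the same 32 bytes, so a go.sum entry can be converted to
the hex form by base64-decoding the part after 'h1:'.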

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/fetch2/__init__.py |  2 +-
 lib/bb/utils.py           | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
index 7d8f71b20..0c2d6d73e 100644
--- a/lib/bb/fetch2/__init__.py
+++ b/lib/bb/fetch2/__init__.py
@@ -34,7 +34,7 @@ _revisions_cache = bb.checksum.RevisionsCache()
 
 logger = logging.getLogger("BitBake.Fetcher")
 
-CHECKSUM_LIST = [ "md5", "sha256", "sha1", "sha384", "sha512" ]
+CHECKSUM_LIST = [ "h1", "md5", "sha256", "sha1", "sha384", "sha512" ]
 SHOWN_CHECKSUM_LIST = ["sha256"]
 
 class BBFetchException(Exception):
diff --git a/lib/bb/utils.py b/lib/bb/utils.py
index e722f9113..131766e33 100644
--- a/lib/bb/utils.py
+++ b/lib/bb/utils.py
@@ -585,6 +585,31 @@ def sha512_file(filename):
     import hashlib
     return _hasher(hashlib.sha512(), filename)
 
+def h1_file(filename):
+    """
+    Return the hex string representation of the Go mod h1 checksum of the
+    filename. The Go mod h1 checksum uses the Go dirhash package. The package
+    defines hashes over directory trees and is used by go mod for mod files and
+    zip archives.
+    """
+    import hashlib
+    import zipfile
+
+    lines = []
+    if zipfile.is_zipfile(filename):
+        with zipfile.ZipFile(filename) as archive:
+            for fn in sorted(archive.namelist()):
+                method = hashlib.sha256()
+                method.update(archive.read(fn))
+                hash = method.hexdigest()
+                lines.append("%s  %s\n" % (hash, fn))
+    else:
+        hash = _hasher(hashlib.sha256(), filename)
+        lines.append("%s  go.mod\n" % hash)
+    method = hashlib.sha256()
+    method.update("".join(lines).encode('utf-8'))
+    return method.hexdigest()
+
 def preserved_envvars_exported():
     """Variables which are taken from the environment and placed in and exported
     from the metadata"""
-- 
2.39.5




* [RFC PATCH 09/21] fetch2: add destdir to FetchData
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (7 preceding siblings ...)
  2024-12-20 11:25 ` [RFC PATCH 08/21] utils: add Go mod h1 checksum support Stefan Herbrechtsmeier
@ 2024-12-20 11:26 ` Stefan Herbrechtsmeier
  2024-12-23  9:56   ` [bitbake-devel] " Richard Purdie
  2024-12-20 11:26 ` [RFC PATCH 10/21] fetch: npm: rework Stefan Herbrechtsmeier
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:26 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Add a `destdir` attribute to the `FetchData` class to record the
destination directory of the unpack method. Users of the `FetchData`
class can use it to unpack additional content into that directory. The
git fetcher class already records the destination directory in the
`destdir` attribute of `FetchData`.

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/fetch2/__init__.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
index 0c2d6d73e..3a7030bf3 100644
--- a/lib/bb/fetch2/__init__.py
+++ b/lib/bb/fetch2/__init__.py
@@ -1314,6 +1314,7 @@ class FetchData(object):
         if not self.pswd and "pswd" in self.parm:
             self.pswd = self.parm["pswd"]
         self.setup = False
+        self.destdir = None
 
         def configure_checksum(checksum_id):
             checksum_plain_name = "%ssum" % checksum_id
@@ -1609,6 +1610,8 @@ class FetchMethod(object):
         else:
             unpackdir = rootdir
 
+        urldata.destdir = unpackdir
+
         if not unpack or not cmd:
             urldata.unpack_tracer.unpack("file-copy", unpackdir)
             # If file == dest, then avoid any copies, as we already put the file into dest!
-- 
2.39.5




* [RFC PATCH 10/21] fetch: npm: rework
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (8 preceding siblings ...)
  2024-12-20 11:26 ` [RFC PATCH 09/21] fetch2: add destdir to FetchData Stefan Herbrechtsmeier
@ 2024-12-20 11:26 ` Stefan Herbrechtsmeier
  2024-12-20 11:26 ` [RFC PATCH 11/21] tests: fetch: adapt style in npm(sw) class Stefan Herbrechtsmeier
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:26 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Rework the npm class regarding testability and integrity:
* Remove dependency on the npm binary.
* Construct URL via a fixed style and don’t resolve the URL via package
  registry.
* Use the checksum from the recipe or URI and don’t depend on the
  checksum from the package registry.
* Add common name and version schema.
* Mark unused NpmEnvironment and npm_unpack function as deprecated.
* Use Wget class as base and remove foreign done stamp handling.
* Add support to compute the latest release version.
* Remove support for latest version because it requires a package
  registry and should be rarely used.
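
The fixed URL style can be tried standalone. The two helpers below are
taken from the diff that follows; only the example call is added, and the
registry host in it is just an illustration.

```python
def construct_url_path(name, version):
    # A scoped name such as '@scope/pkg' keeps its scope in the directory
    # part, while the tarball file name uses only the part after the last '/'.
    return f"/{name}/-/{name.split('/')[-1]}-{version}.tgz"


def construct_url(registry, name, version):
    path = construct_url_path(name, version)
    return f"https://{registry}{path}"


print(construct_url("registry.npmjs.org",
                    "@savoirfairelinux/node-server-example", "1.0.0"))
# https://registry.npmjs.org/@savoirfairelinux/node-server-example/-/node-server-example-1.0.0.tgz
```

Because the URL is derived from registry, name, and version alone, no
request to the package registry is needed to resolve the download location.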

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/fetch2/npm.py | 244 +++++++++++--------------------------------
 1 file changed, 63 insertions(+), 181 deletions(-)

diff --git a/lib/bb/fetch2/npm.py b/lib/bb/fetch2/npm.py
index ac76d64cd..120dddbfd 100644
--- a/lib/bb/fetch2/npm.py
+++ b/lib/bb/fetch2/npm.py
@@ -10,20 +10,19 @@ SRC_URI = "npm://some.registry.url;OptionA=xxx;OptionB=xxx;..."
 
 Supported SRC_URI options are:
 
-- package
+- dn
    The npm package name. This is a mandatory parameter.
 
-- version
+- dv
     The npm package version. This is a mandatory parameter.
 
 - downloadfilename
     Specifies the filename used when storing the downloaded file.
 
 - destsuffix
-    Specifies the directory to use to unpack the package (default: npm).
+    The name of the path in which to place the package (default: npm).
 """
 
-import base64
 import json
 import os
 import re
@@ -40,6 +39,8 @@ from bb.fetch2 import check_network_access
 from bb.fetch2 import runfetchcmd
 from bb.utils import is_semver
 
+from bb.fetch2.wget import Wget
+
 def npm_package(package):
     """Convert the npm package name to remove unsupported character"""
     # For scoped package names ('@user/package') the '/' is replaced by a '-'.
@@ -64,14 +65,7 @@ def npm_localfile(package, version=None):
         filename = package
     return os.path.join("npm2", filename)
 
-def npm_integrity(integrity):
-    """
-    Get the checksum name and expected value from the subresource integrity
-        https://www.w3.org/TR/SRI/
-    """
-    algo, value = integrity.split("-", maxsplit=1)
-    return "%ssum" % algo, base64.b64decode(value).hex()
-
+# Deprecated
 def npm_unpack(tarball, destdir, d):
     """Unpack a npm tarball"""
     bb.utils.mkdirhier(destdir)
@@ -80,8 +74,8 @@ def npm_unpack(tarball, destdir, d):
     cmd += " --delay-directory-restore"
     cmd += " --strip-components=1"
     runfetchcmd(cmd, d, workdir=destdir)
-    runfetchcmd("chmod -R +X '%s'" % (destdir), d, quiet=True, workdir=destdir)
 
+# Deprecated
 class NpmEnvironment(object):
     """
     Using a npm config file seems more reliable than using cli arguments.
@@ -130,7 +124,15 @@ class NpmEnvironment(object):
 
             return _run(cmd)
 
-class Npm(FetchMethod):
+
+def construct_url_path(name, version):
+    return f"/{name}/-/{name.split('/')[-1]}-{version}.tgz"
+
+def construct_url(registry, name, version):
+    path = construct_url_path(name, version)
+    return f"https://{registry}{path}"
+
+class Npm(Wget):
     """Class to fetch a package from a npm registry"""
 
     def supports(self, ud, d):
@@ -139,178 +141,58 @@ class Npm(FetchMethod):
 
     def urldata_init(self, ud, d):
         """Init npm specific variables within url data"""
-        ud.package = None
-        ud.version = None
-        ud.registry = None
 
-        # Get the 'package' parameter
         if "package" in ud.parm:
-            ud.package = ud.parm.get("package")
+            bb.warn(f"Parameter 'package' in '{ud.url}' is deprecated. "
+                    "Please use 'dn' parameter instead.")
+            ud.parm["dn"] = ud.parm["package"]
+            del ud.parm["package"]
+        if "version" in ud.parm:
+            bb.warn(f"Parameter 'version' in '{ud.url}' is deprecated. "
+                    "Please use 'dv' parameter instead.")
+            ud.parm["dv"] = ud.parm["version"]
+            del ud.parm["version"]
+
+        if any(x not in ud.parm for x in ["dn", "dv"]):
+            return
+
+        registry = ud.host
+        if ud.path != '/':
+            registry += ud.path
+        name = ud.parm["dn"]
+        version = ud.parm["dv"]
+
+        if not is_semver(version):
+            if version == "latest":
+                raise ParameterError("Value 'latest' for parameter 'version' is no longer supported", ud.url)
+            else:
+                raise ParameterError("Invalid 'version' parameter", ud.url)
 
-        if not ud.package:
-            raise MissingParameterError("Parameter 'package' required", ud.url)
+        ud.url = construct_url(registry, name, version)
+        ud.info_url = f"https://{registry}/{name}"
 
-        # Get the 'version' parameter
-        if "version" in ud.parm:
-            ud.version = ud.parm.get("version")
+        if "downloadfilename" not in ud.parm:
+            ud.parm['downloadfilename'] = npm_localfile(name, version)
 
-        if not ud.version:
-            raise MissingParameterError("Parameter 'version' required", ud.url)
+        destsuffix = ud.parm.get("destsuffix", "npm")
+        subdir = ud.parm.get("subdir", "")
+        ud.parm["destsuffix"] = destsuffix
+        ud.parm["subdir"] = os.path.join(subdir, destsuffix)
 
-        if not is_semver(ud.version) and not ud.version == "latest":
-            raise ParameterError("Invalid 'version' parameter", ud.url)
+        if 'name' not in ud.parm:
+            ud.parm["name"] = f"{npm_package(name)}-{version}"
 
-        # Extract the 'registry' part of the url
-        ud.registry = re.sub(r"^npm://", "https://", ud.url.split(";")[0])
+        ud.parm["striplevel"] = 1
 
-        # Using the 'downloadfilename' parameter as local filename
-        # or the npm package name.
-        if "downloadfilename" in ud.parm:
-            ud.localfile = npm_localfile(d.expand(ud.parm["downloadfilename"]))
-        else:
-            ud.localfile = npm_localfile(ud.package, ud.version)
-
-        # Get the base 'npm' command
-        ud.basecmd = d.getVar("FETCHCMD_npm") or "npm"
-
-        # This fetcher resolves a URI from a npm package name and version and
-        # then forwards it to a proxy fetcher. A resolve file containing the
-        # resolved URI is created to avoid unwanted network access (if the file
-        # already exists). The management of the donestamp file, the lockfile
-        # and the checksums are forwarded to the proxy fetcher.
-        ud.proxy = None
-        ud.needdonestamp = False
-        ud.resolvefile = self.localpath(ud, d) + ".resolved"
-
-    def _resolve_proxy_url(self, ud, d):
-        def _npm_view():
-            args = []
-            args.append(("json", "true"))
-            args.append(("registry", ud.registry))
-            pkgver = shlex.quote(ud.package + "@" + ud.version)
-            cmd = ud.basecmd + " view %s" % pkgver
-            env = NpmEnvironment(d)
-            check_network_access(d, cmd, ud.registry)
-            view_string = env.run(cmd, args=args)
-
-            if not view_string:
-                raise FetchError("Unavailable package %s" % pkgver, ud.url)
-
-            try:
-                view = json.loads(view_string)
-
-                error = view.get("error")
-                if error is not None:
-                    raise FetchError(error.get("summary"), ud.url)
-
-                if ud.version == "latest":
-                    bb.warn("The npm package %s is using the latest " \
-                            "version available. This could lead to " \
-                            "non-reproducible builds." % pkgver)
-                elif ud.version != view.get("version"):
-                    raise ParameterError("Invalid 'version' parameter", ud.url)
-
-                return view
-
-            except Exception as e:
-                raise FetchError("Invalid view from npm: %s" % str(e), ud.url)
-
-        def _get_url(view):
-            tarball_url = view.get("dist", {}).get("tarball")
-
-            if tarball_url is None:
-                raise FetchError("Invalid 'dist.tarball' in view", ud.url)
-
-            uri = URI(tarball_url)
-            uri.params["downloadfilename"] = ud.localfile
-
-            integrity = view.get("dist", {}).get("integrity")
-            shasum = view.get("dist", {}).get("shasum")
-
-            if integrity is not None:
-                checksum_name, checksum_expected = npm_integrity(integrity)
-                uri.params[checksum_name] = checksum_expected
-            elif shasum is not None:
-                uri.params["sha1sum"] = shasum
-            else:
-                raise FetchError("Invalid 'dist.integrity' in view", ud.url)
-
-            return str(uri)
-
-        url = _get_url(_npm_view())
-
-        bb.utils.mkdirhier(os.path.dirname(ud.resolvefile))
-        with open(ud.resolvefile, "w") as f:
-            f.write(url)
-
-    def _setup_proxy(self, ud, d):
-        if ud.proxy is None:
-            if not os.path.exists(ud.resolvefile):
-                self._resolve_proxy_url(ud, d)
-
-            with open(ud.resolvefile, "r") as f:
-                url = f.read()
-
-            # Avoid conflicts between the environment data and:
-            # - the proxy url checksum
-            data = bb.data.createCopy(d)
-            data.delVarFlags("SRC_URI")
-            ud.proxy = Fetch([url], data)
-
-    def _get_proxy_method(self, ud, d):
-        self._setup_proxy(ud, d)
-        proxy_url = ud.proxy.urls[0]
-        proxy_ud = ud.proxy.ud[proxy_url]
-        proxy_d = ud.proxy.d
-        proxy_ud.setup_localpath(proxy_d)
-        return proxy_ud.method, proxy_ud, proxy_d
-
-    def verify_donestamp(self, ud, d):
-        """Verify the donestamp file"""
-        proxy_m, proxy_ud, proxy_d = self._get_proxy_method(ud, d)
-        return proxy_m.verify_donestamp(proxy_ud, proxy_d)
-
-    def update_donestamp(self, ud, d):
-        """Update the donestamp file"""
-        proxy_m, proxy_ud, proxy_d = self._get_proxy_method(ud, d)
-        proxy_m.update_donestamp(proxy_ud, proxy_d)
-
-    def need_update(self, ud, d):
-        """Force a fetch, even if localpath exists ?"""
-        if not os.path.exists(ud.resolvefile):
-            return True
-        if ud.version == "latest":
-            return True
-        proxy_m, proxy_ud, proxy_d = self._get_proxy_method(ud, d)
-        return proxy_m.need_update(proxy_ud, proxy_d)
-
-    def try_mirrors(self, fetch, ud, d, mirrors):
-        """Try to use a mirror"""
-        proxy_m, proxy_ud, proxy_d = self._get_proxy_method(ud, d)
-        return proxy_m.try_mirrors(fetch, proxy_ud, proxy_d, mirrors)
-
-    def download(self, ud, d):
-        """Fetch url"""
-        self._setup_proxy(ud, d)
-        ud.proxy.download()
-
-    def unpack(self, ud, rootdir, d):
-        """Unpack the downloaded archive"""
-        destsuffix = ud.parm.get("destsuffix", "npm")
-        destdir = os.path.join(rootdir, destsuffix)
-        npm_unpack(ud.localpath, destdir, d)
-        ud.unpack_tracer.unpack("npm", destdir)
-
-    def clean(self, ud, d):
-        """Clean any existing full or partial download"""
-        if os.path.exists(ud.resolvefile):
-            self._setup_proxy(ud, d)
-            ud.proxy.clean()
-            bb.utils.remove(ud.resolvefile)
-
-    def done(self, ud, d):
-        """Is the download done ?"""
-        if not os.path.exists(ud.resolvefile):
-            return False
-        proxy_m, proxy_ud, proxy_d = self._get_proxy_method(ud, d)
-        return proxy_m.done(proxy_ud, proxy_d)
+        super().urldata_init(ud, d)
+
+    def latest_versionstring(self, ud, d):
+        from functools import cmp_to_key
+        info = json.loads(self._fetch_index(ud.info_url, ud, d))
+        versions = [(0, v, "") for v in info["versions"]]
+        versions = sorted(versions, key=cmp_to_key(bb.utils.vercmp))
+
+        return (versions[-1][1], "")
+
+    def require_download_metadata(self):
+        return True
-- 
2.39.5
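
Note on latest_versionstring() above: the method builds (epoch, version,
revision) tuples from the keys of the registry index and sorts them with a
version-aware comparison instead of a plain string sort. A minimal sketch of
that idea follows. It assumes the index is JSON with a "versions" object keyed
by version string, and compare_versions() is a hypothetical stand-in for
bb.utils.vercmp that only understands dotted numeric versions.

```python
import json
from functools import cmp_to_key


def compare_versions(a, b):
    """Hypothetical stand-in for bb.utils.vercmp (dotted numeric versions only)."""
    def parts(entry):
        # entry mirrors the (epoch, version, revision) tuples built in the patch
        return [int(part) for part in entry[1].split(".")]
    pa, pb = parts(a), parts(b)
    return (pa > pb) - (pa < pb)


def latest_version(index_json):
    """Return the highest version listed in a registry-style index document."""
    info = json.loads(index_json)
    versions = [(0, v, "") for v in info["versions"]]
    versions = sorted(versions, key=cmp_to_key(compare_versions))
    return versions[-1][1]


# Assumed sample data, not taken from a real registry response.
index = json.dumps({"versions": {"1.2.0": {}, "1.10.0": {}, "1.9.3": {}}})
print(latest_version(index))  # prints 1.10.0
```

A plain string sort of the same keys would pick 1.9.3, which is why a
comparison function is used here.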




* [RFC PATCH 11/21] tests: fetch: adapt style in npm(sw) class
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (9 preceding siblings ...)
  2024-12-20 11:26 ` [RFC PATCH 10/21] fetch: npm: rework Stefan Herbrechtsmeier
@ 2024-12-20 11:26 ` Stefan Herbrechtsmeier
  2024-12-20 11:26 ` [RFC PATCH 12/21] tests: fetch: move npmsw test cases into npmsw test class Stefan Herbrechtsmeier
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:26 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/tests/fetch.py | 348 +++++++++++++++++++++---------------------
 1 file changed, 174 insertions(+), 174 deletions(-)
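
Note for readers of the tests below: the shrinkwrap fixtures carry "integrity"
values such as "sha1-..." and "sha512-...", and the bad-checksum tests compare
against hashlib hex digests. A rough sketch of how such a string maps to a
checksum parameter is given here. integrity_to_checksum() is a hypothetical
helper written for illustration and is not claimed to match the fetcher's own
implementation.

```python
import base64
import hashlib


def integrity_to_checksum(integrity):
    """Split '<algo>-<base64 digest>' into (parameter name, hex digest).

    Hypothetical helper for illustration only.
    """
    algo, value = integrity.split("-", maxsplit=1)
    return "%ssum" % algo, base64.b64decode(value).hex()


# Build an integrity string from known data so the result can be checked.
payload = b"bad checksum"
digest = hashlib.sha1(payload).digest()
integrity = "sha1-" + base64.b64encode(digest).decode("ascii")

name, expected = integrity_to_checksum(integrity)
print(name, expected == hashlib.sha1(payload).hexdigest())  # prints sha1sum True
```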

diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
index 6b8e3e060..934b96cac 100644
--- a/lib/bb/tests/fetch.py
+++ b/lib/bb/tests/fetch.py
@@ -2620,47 +2620,47 @@ class CrateTest(FetcherTest):
 class NPMTest(FetcherTest):
     def skipIfNoNpm():
         import shutil
-        if not shutil.which('npm'):
-            return unittest.skip('npm not installed')
+        if not shutil.which("npm"):
+            return unittest.skip("npm not installed")
         return lambda f: f
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm(self):
-        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
+        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
         fetcher = bb.fetch.Fetch(urls, self.d)
         ud = fetcher.ud[fetcher.urls[0]]
         fetcher.download()
         self.assertTrue(os.path.exists(ud.localpath))
-        self.assertTrue(os.path.exists(ud.localpath + '.done'))
+        self.assertTrue(os.path.exists(ud.localpath + ".done"))
         self.assertTrue(os.path.exists(ud.resolvefile))
         fetcher.unpack(self.unpackdir)
-        unpackdir = os.path.join(self.unpackdir, 'npm')
-        self.assertTrue(os.path.exists(os.path.join(unpackdir, 'package.json')))
+        unpackdir = os.path.join(self.unpackdir, "npm")
+        self.assertTrue(os.path.exists(os.path.join(unpackdir, "package.json")))
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_bad_checksum(self):
-        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
+        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
         # Fetch once to get a tarball
         fetcher = bb.fetch.Fetch(urls, self.d)
         ud = fetcher.ud[fetcher.urls[0]]
         fetcher.download()
         self.assertTrue(os.path.exists(ud.localpath))
         # Modify the tarball
-        bad = b'bad checksum'
-        with open(ud.localpath, 'wb') as f:
+        bad = b"bad checksum"
+        with open(ud.localpath, "wb") as f:
             f.write(bad)
         # Verify that the tarball is fetched again
         fetcher.download()
         badsum = hashlib.sha512(bad).hexdigest()
-        self.assertTrue(os.path.exists(ud.localpath + '_bad-checksum_' + badsum))
+        self.assertTrue(os.path.exists(ud.localpath + "_bad-checksum_" + badsum))
         self.assertTrue(os.path.exists(ud.localpath))
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_premirrors(self):
-        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
+        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
         # Fetch once to get a tarball
         fetcher = bb.fetch.Fetch(urls, self.d)
         ud = fetcher.ud[fetcher.urls[0]]
@@ -2668,17 +2668,17 @@ class NPMTest(FetcherTest):
         self.assertTrue(os.path.exists(ud.localpath))
 
         # Setup the mirror by renaming the download directory
-        mirrordir = os.path.join(self.tempdir, 'mirror')
+        mirrordir = os.path.join(self.tempdir, "mirror")
         bb.utils.rename(self.dldir, mirrordir)
         os.mkdir(self.dldir)
 
         # Configure the premirror to be used
-        self.d.setVar('PREMIRRORS', 'https?$://.*/.* file://%s/npm2' % mirrordir)
-        self.d.setVar('BB_FETCH_PREMIRRORONLY', '1')
+        self.d.setVar("PREMIRRORS", "https?$://.*/.* file://%s/npm2" % mirrordir)
+        self.d.setVar("BB_FETCH_PREMIRRORONLY", "1")
 
         # Fetch again
         self.assertFalse(os.path.exists(ud.localpath))
-        # The npm fetcher doesn't handle that the .resolved file disappears
+        # The npm fetcher doesn't handle that the .resolved file disappears
         # while the fetcher object exists, which it does when we rename the
         # download directory to "mirror" above. Thus we need a new fetcher to go
         # with the now empty download directory.
@@ -2690,19 +2690,19 @@ class NPMTest(FetcherTest):
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_premirrors_with_specified_filename(self):
-        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
+        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
         # Fetch once to get a tarball
         fetcher = bb.fetch.Fetch(urls, self.d)
         ud = fetcher.ud[fetcher.urls[0]]
         fetcher.download()
         self.assertTrue(os.path.exists(ud.localpath))
         # Setup the mirror
-        mirrordir = os.path.join(self.tempdir, 'mirror')
+        mirrordir = os.path.join(self.tempdir, "mirror")
         bb.utils.mkdirhier(mirrordir)
         mirrorfilename = os.path.join(mirrordir, os.path.basename(ud.localpath))
         os.replace(ud.localpath, mirrorfilename)
-        self.d.setVar('PREMIRRORS', 'https?$://.*/.* file://%s' % mirrorfilename)
-        self.d.setVar('BB_FETCH_PREMIRRORONLY', '1')
+        self.d.setVar("PREMIRRORS", "https?$://.*/.* file://%s" % mirrorfilename)
+        self.d.setVar("BB_FETCH_PREMIRRORONLY", "1")
         # Fetch again
         self.assertFalse(os.path.exists(ud.localpath))
         fetcher.download()
@@ -2712,22 +2712,22 @@ class NPMTest(FetcherTest):
     @skipIfNoNetwork()
     def test_npm_mirrors(self):
         # Fetch once to get a tarball
-        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
+        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
         fetcher = bb.fetch.Fetch(urls, self.d)
         ud = fetcher.ud[fetcher.urls[0]]
         fetcher.download()
         self.assertTrue(os.path.exists(ud.localpath))
         # Setup the mirror
-        mirrordir = os.path.join(self.tempdir, 'mirror')
+        mirrordir = os.path.join(self.tempdir, "mirror")
         bb.utils.mkdirhier(mirrordir)
         os.replace(ud.localpath, os.path.join(mirrordir, os.path.basename(ud.localpath)))
-        self.d.setVar('MIRRORS', 'https?$://.*/.* file://%s/' % mirrordir)
+        self.d.setVar("MIRRORS", "https?$://.*/.* file://%s/" % mirrordir)
         # Update the resolved url to an invalid url
-        with open(ud.resolvefile, 'r') as f:
+        with open(ud.resolvefile, "r") as f:
             url = f.read()
         uri = URI(url)
-        uri.path = '/invalid'
-        with open(ud.resolvefile, 'w') as f:
+        uri.path = "/invalid"
+        with open(ud.resolvefile, "w") as f:
             f.write(str(uri))
         # Fetch again
         self.assertFalse(os.path.exists(ud.localpath))
@@ -2737,17 +2737,17 @@ class NPMTest(FetcherTest):
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_destsuffix_downloadfilename(self):
-        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0;destsuffix=foo/bar;downloadfilename=foo-bar.tgz']
+        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0;destsuffix=foo/bar;downloadfilename=foo-bar.tgz"]
         fetcher = bb.fetch.Fetch(urls, self.d)
         fetcher.download()
-        self.assertTrue(os.path.exists(os.path.join(self.dldir, 'npm2', 'foo-bar.tgz')))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/foo-bar.tgz")))
         fetcher.unpack(self.unpackdir)
-        unpackdir = os.path.join(self.unpackdir, 'foo', 'bar')
-        self.assertTrue(os.path.exists(os.path.join(unpackdir, 'package.json')))
+        unpackdir = os.path.join(self.unpackdir, "foo", "bar")
+        self.assertTrue(os.path.exists(os.path.join(unpackdir, "package.json")))
 
     def test_npm_no_network_no_tarball(self):
-        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
-        self.d.setVar('BB_NO_NETWORK', '1')
+        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
+        self.d.setVar("BB_NO_NETWORK", "1")
         fetcher = bb.fetch.Fetch(urls, self.d)
         with self.assertRaises(bb.fetch2.NetworkAccess):
             fetcher.download()
@@ -2755,42 +2755,42 @@ class NPMTest(FetcherTest):
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_no_network_with_tarball(self):
-        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
+        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
         # Fetch once to get a tarball
         fetcher = bb.fetch.Fetch(urls, self.d)
         fetcher.download()
         # Disable network access
-        self.d.setVar('BB_NO_NETWORK', '1')
+        self.d.setVar("BB_NO_NETWORK", "1")
         # Fetch again
         fetcher.download()
         fetcher.unpack(self.unpackdir)
-        unpackdir = os.path.join(self.unpackdir, 'npm')
-        self.assertTrue(os.path.exists(os.path.join(unpackdir, 'package.json')))
+        unpackdir = os.path.join(self.unpackdir, "npm")
+        self.assertTrue(os.path.exists(os.path.join(unpackdir, "package.json")))
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_registry_alternate(self):
-        urls = ['npm://skimdb.npmjs.com;package=@savoirfairelinux/node-server-example;version=1.0.0']
+        urls = ["npm://skimdb.npmjs.com;package=@savoirfairelinux/node-server-example;version=1.0.0"]
         fetcher = bb.fetch.Fetch(urls, self.d)
         fetcher.download()
         fetcher.unpack(self.unpackdir)
-        unpackdir = os.path.join(self.unpackdir, 'npm')
-        self.assertTrue(os.path.exists(os.path.join(unpackdir, 'package.json')))
+        unpackdir = os.path.join(self.unpackdir, "npm")
+        self.assertTrue(os.path.exists(os.path.join(unpackdir, "package.json")))
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_version_latest(self):
-        url = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=latest']
+        url = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=latest"]
         fetcher = bb.fetch.Fetch(url, self.d)
         fetcher.download()
         fetcher.unpack(self.unpackdir)
-        unpackdir = os.path.join(self.unpackdir, 'npm')
-        self.assertTrue(os.path.exists(os.path.join(unpackdir, 'package.json')))
+        unpackdir = os.path.join(self.unpackdir, "npm")
+        self.assertTrue(os.path.exists(os.path.join(unpackdir, "package.json")))
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_registry_invalid(self):
-        urls = ['npm://registry.invalid.org;package=@savoirfairelinux/node-server-example;version=1.0.0']
+        urls = ["npm://registry.invalid.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
         fetcher = bb.fetch.Fetch(urls, self.d)
         with self.assertRaises(bb.fetch2.FetchError):
             fetcher.download()
@@ -2798,7 +2798,7 @@ class NPMTest(FetcherTest):
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_package_invalid(self):
-        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/invalid;version=1.0.0']
+        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/invalid;version=1.0.0"]
         fetcher = bb.fetch.Fetch(urls, self.d)
         with self.assertRaises(bb.fetch2.FetchError):
             fetcher.download()
@@ -2806,145 +2806,145 @@ class NPMTest(FetcherTest):
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_version_invalid(self):
-        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=invalid']
+        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=invalid"]
         with self.assertRaises(bb.fetch2.ParameterError):
             fetcher = bb.fetch.Fetch(urls, self.d)
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_registry_none(self):
-        urls = ['npm://;package=@savoirfairelinux/node-server-example;version=1.0.0']
+        urls = ["npm://;package=@savoirfairelinux/node-server-example;version=1.0.0"]
         with self.assertRaises(bb.fetch2.MalformedUrl):
             fetcher = bb.fetch.Fetch(urls, self.d)
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_package_none(self):
-        urls = ['npm://registry.npmjs.org;version=1.0.0']
+        urls = ["npm://registry.npmjs.org;version=1.0.0"]
         with self.assertRaises(bb.fetch2.MissingParameterError):
             fetcher = bb.fetch.Fetch(urls, self.d)
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_version_none(self):
-        urls = ['npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example']
+        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example"]
         with self.assertRaises(bb.fetch2.MissingParameterError):
             fetcher = bb.fetch.Fetch(urls, self.d)
 
     def create_shrinkwrap_file(self, data):
         import json
-        datadir = os.path.join(self.tempdir, 'data')
-        swfile = os.path.join(datadir, 'npm-shrinkwrap.json')
+        datadir = os.path.join(self.tempdir, "data")
+        swfile = os.path.join(datadir, "npm-shrinkwrap.json")
         bb.utils.mkdirhier(datadir)
-        with open(swfile, 'w') as f:
+        with open(swfile, "w") as f:
             json.dump(data, f)
         return swfile
 
     @skipIfNoNetwork()
     def test_npmsw(self):
         swfile = self.create_shrinkwrap_file({
-            'packages': {
-                'node_modules/array-flatten': {
-                    'version': '1.1.1',
-                    'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
-                    'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI=',
-                    'dependencies': {
-                        'content-type': "1.0.4"
+            "packages": {
+                "node_modules/array-flatten": {
+                    "version": "1.1.1",
+                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
+                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI=",
+                    "dependencies": {
+                        "content-type": "1.0.4"
                     }
                 },
-                'node_modules/array-flatten/node_modules/content-type': {
-                    'version': '1.0.4',
-                    'resolved': 'https://registry.npmjs.org/content-type/-/content-type-1.0.4.tgz',
-                    'integrity': 'sha512-hIP3EEPs8tB9AT1L+NUqtwOAps4mk2Zob89MWXMHjHWg9milF/j4osnnQLXBCBFBk/tvIG/tUc9mOUJiPBhPXA==',
-                    'dependencies': {
-                        'cookie': 'git+https://github.com/jshttp/cookie.git#aec1177c7da67e3b3273df96cf476824dbc9ae09'
+                "node_modules/array-flatten/node_modules/content-type": {
+                    "version": "1.0.4",
+                    "resolved": "https://registry.npmjs.org/content-type/-/content-type-1.0.4.tgz",
+                    "integrity": "sha512-hIP3EEPs8tB9AT1L+NUqtwOAps4mk2Zob89MWXMHjHWg9milF/j4osnnQLXBCBFBk/tvIG/tUc9mOUJiPBhPXA==",
+                    "dependencies": {
+                        "cookie": "git+https://github.com/jshttp/cookie.git#aec1177c7da67e3b3273df96cf476824dbc9ae09"
                     }
                 },
-                'node_modules/array-flatten/node_modules/content-type/node_modules/cookie': {
-                    'resolved': 'git+https://github.com/jshttp/cookie.git#aec1177c7da67e3b3273df96cf476824dbc9ae09'
+                "node_modules/array-flatten/node_modules/content-type/node_modules/cookie": {
+                    "resolved": "git+https://github.com/jshttp/cookie.git#aec1177c7da67e3b3273df96cf476824dbc9ae09"
                 }
             }
         })
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
+        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
         fetcher.download()
-        self.assertTrue(os.path.exists(os.path.join(self.dldir, 'npm2', 'array-flatten-1.1.1.tgz')))
-        self.assertTrue(os.path.exists(os.path.join(self.dldir, 'npm2', 'content-type-1.0.4.tgz')))
-        self.assertTrue(os.path.exists(os.path.join(self.dldir, 'git2', 'github.com.jshttp.cookie.git')))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/content-type-1.0.4.tgz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "git2/github.com.jshttp.cookie.git")))
         fetcher.unpack(self.unpackdir)
-        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, 'npm-shrinkwrap.json')))
-        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, 'node_modules', 'array-flatten', 'package.json')))
-        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, 'node_modules', 'array-flatten', 'node_modules', 'content-type', 'package.json')))
-        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, 'node_modules', 'array-flatten', 'node_modules', 'content-type', 'node_modules', 'cookie', 'package.json')))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "npm-shrinkwrap.json")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "node_modules/array-flatten/package.json")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "node_modules/array-flatten/node_modules/content-type/package.json")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "node_modules/array-flatten/node_modules/content-type/node_modules/cookie/package.json")))
 
     @skipIfNoNetwork()
     def test_npmsw_git(self):
         swfile = self.create_shrinkwrap_file({
-            'packages': {
-                'node_modules/cookie': {
-                    'resolved': 'git+https://github.com/jshttp/cookie.git#aec1177c7da67e3b3273df96cf476824dbc9ae09'
+            "packages": {
+                "node_modules/cookie": {
+                    "resolved": "git+https://github.com/jshttp/cookie.git#aec1177c7da67e3b3273df96cf476824dbc9ae09"
                 }
             }
         })
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
+        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
         fetcher.download()
-        self.assertTrue(os.path.exists(os.path.join(self.dldir, 'git2', 'github.com.jshttp.cookie.git')))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "git2/github.com.jshttp.cookie.git")))
 
     @skipIfNoNetwork()
     def test_npmsw_dev(self):
         swfile = self.create_shrinkwrap_file({
-            'packages': {
-                'node_modules/array-flatten': {
-                    'version': '1.1.1',
-                    'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
-                    'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
+            "packages": {
+                "node_modules/array-flatten": {
+                    "version": "1.1.1",
+                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
+                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
                 },
-                'node_modules/content-type': {
-                    'version': '1.0.4',
-                    'resolved': 'https://registry.npmjs.org/content-type/-/content-type-1.0.4.tgz',
-                    'integrity': 'sha512-hIP3EEPs8tB9AT1L+NUqtwOAps4mk2Zob89MWXMHjHWg9milF/j4osnnQLXBCBFBk/tvIG/tUc9mOUJiPBhPXA==',
-                    'dev': True
+                "node_modules/content-type": {
+                    "version": "1.0.4",
+                    "resolved": "https://registry.npmjs.org/content-type/-/content-type-1.0.4.tgz",
+                    "integrity": "sha512-hIP3EEPs8tB9AT1L+NUqtwOAps4mk2Zob89MWXMHjHWg9milF/j4osnnQLXBCBFBk/tvIG/tUc9mOUJiPBhPXA==",
+                    "dev": True
                 }
             }
         })
         # Fetch with dev disabled
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
+        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
         fetcher.download()
-        self.assertTrue(os.path.exists(os.path.join(self.dldir, 'npm2', 'array-flatten-1.1.1.tgz')))
-        self.assertFalse(os.path.exists(os.path.join(self.dldir, 'npm2', 'content-type-1.0.4.tgz')))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz")))
+        self.assertFalse(os.path.exists(os.path.join(self.dldir, "npm2/content-type-1.0.4.tgz")))
         # Fetch with dev enabled
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile + ';dev=1'], self.d)
+        fetcher = bb.fetch.Fetch(["npmsw://" + swfile + ";dev=1"], self.d)
         fetcher.download()
-        self.assertTrue(os.path.exists(os.path.join(self.dldir, 'npm2', 'array-flatten-1.1.1.tgz')))
-        self.assertTrue(os.path.exists(os.path.join(self.dldir, 'npm2', 'content-type-1.0.4.tgz')))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/content-type-1.0.4.tgz")))
 
     @skipIfNoNetwork()
     def test_npmsw_destsuffix(self):
         swfile = self.create_shrinkwrap_file({
-            'packages': {
-                'node_modules/array-flatten': {
-                    'version': '1.1.1',
-                    'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
-                    'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
+            "packages": {
+                "node_modules/array-flatten": {
+                    "version": "1.1.1",
+                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
+                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
                 }
             }
         })
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile + ';destsuffix=foo/bar'], self.d)
+        fetcher = bb.fetch.Fetch(["npmsw://" + swfile + ";destsuffix=foo/bar"], self.d)
         fetcher.download()
         fetcher.unpack(self.unpackdir)
-        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, 'foo', 'bar', 'node_modules', 'array-flatten', 'package.json')))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "foo/bar/node_modules/array-flatten/package.json")))
 
     def test_npmsw_no_network_no_tarball(self):
         swfile = self.create_shrinkwrap_file({
-            'packages': {
-                'node_modules/array-flatten': {
-                    'version': '1.1.1',
-                    'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
-                    'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
+            "packages": {
+                "node_modules/array-flatten": {
+                    "version": "1.1.1",
+                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
+                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
                 }
             }
         })
-        self.d.setVar('BB_NO_NETWORK', '1')
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
+        self.d.setVar("BB_NO_NETWORK", "1")
+        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
         with self.assertRaises(bb.fetch2.NetworkAccess):
             fetcher.download()
 
@@ -2952,112 +2952,112 @@ class NPMTest(FetcherTest):
     @skipIfNoNetwork()
     def test_npmsw_no_network_with_tarball(self):
         # Fetch once to get a tarball
-        fetcher = bb.fetch.Fetch(['npm://registry.npmjs.org;package=array-flatten;version=1.1.1'], self.d)
+        fetcher = bb.fetch.Fetch(["npm://registry.npmjs.org;package=array-flatten;version=1.1.1"], self.d)
         fetcher.download()
         # Disable network access
-        self.d.setVar('BB_NO_NETWORK', '1')
+        self.d.setVar("BB_NO_NETWORK", "1")
         # Fetch again
         swfile = self.create_shrinkwrap_file({
-            'packages': {
-                'node_modules/array-flatten': {
-                    'version': '1.1.1',
-                    'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
-                    'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
+            "packages": {
+                "node_modules/array-flatten": {
+                    "version": "1.1.1",
+                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
+                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
                 }
             }
         })
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
+        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
         fetcher.download()
         fetcher.unpack(self.unpackdir)
-        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, 'node_modules', 'array-flatten', 'package.json')))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "node_modules/array-flatten/package.json")))
 
     @skipIfNoNetwork()
     def test_npmsw_npm_reusability(self):
         # Fetch once with npmsw
         swfile = self.create_shrinkwrap_file({
-            'packages': {
-                'node_modules/array-flatten': {
-                    'version': '1.1.1',
-                    'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
-                    'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
+            "packages": {
+                "node_modules/array-flatten": {
+                    "version": "1.1.1",
+                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
+                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
                 }
             }
         })
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
+        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
         fetcher.download()
         # Disable network access
-        self.d.setVar('BB_NO_NETWORK', '1')
+        self.d.setVar("BB_NO_NETWORK", "1")
         # Fetch again with npm
-        fetcher = bb.fetch.Fetch(['npm://registry.npmjs.org;package=array-flatten;version=1.1.1'], self.d)
+        fetcher = bb.fetch.Fetch(["npm://registry.npmjs.org;package=array-flatten;version=1.1.1"], self.d)
         fetcher.download()
         fetcher.unpack(self.unpackdir)
-        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, 'npm', 'package.json')))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "npm/package.json")))
 
     @skipIfNoNetwork()
     def test_npmsw_bad_checksum(self):
         # Try to fetch with bad checksum
         swfile = self.create_shrinkwrap_file({
-            'packages': {
-                'node_modules/array-flatten': {
-                    'version': '1.1.1',
-                    'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
-                    'integrity': 'sha1-gfNEp2hqgLTFKT6P3AsBYMgsBqg='
+            "packages": {
+                "node_modules/array-flatten": {
+                    "version": "1.1.1",
+                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
+                    "integrity": "sha1-gfNEp2hqgLTFKT6P3AsBYMgsBqg="
                 }
             }
         })
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
+        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
         with self.assertRaises(bb.fetch2.FetchError):
             fetcher.download()
         # Fetch correctly to get a tarball
         swfile = self.create_shrinkwrap_file({
-            'packages': {
-                'node_modules/array-flatten': {
-                    'version': '1.1.1',
-                    'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
-                    'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
+            "packages": {
+                "node_modules/array-flatten": {
+                    "version": "1.1.1",
+                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
+                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
                 }
             }
         })
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
+        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
         fetcher.download()
-        localpath = os.path.join(self.dldir, 'npm2', 'array-flatten-1.1.1.tgz')
+        localpath = os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz")
         self.assertTrue(os.path.exists(localpath))
         # Modify the tarball
-        bad = b'bad checksum'
-        with open(localpath, 'wb') as f:
+        bad = b"bad checksum"
+        with open(localpath, "wb") as f:
             f.write(bad)
         # Verify that the tarball is fetched again
         fetcher.download()
         badsum = hashlib.sha1(bad).hexdigest()
-        self.assertTrue(os.path.exists(localpath + '_bad-checksum_' + badsum))
+        self.assertTrue(os.path.exists(localpath + "_bad-checksum_" + badsum))
         self.assertTrue(os.path.exists(localpath))
 
     @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npmsw_premirrors(self):
         # Fetch once to get a tarball
-        fetcher = bb.fetch.Fetch(['npm://registry.npmjs.org;package=array-flatten;version=1.1.1'], self.d)
+        fetcher = bb.fetch.Fetch(["npm://registry.npmjs.org;package=array-flatten;version=1.1.1"], self.d)
         ud = fetcher.ud[fetcher.urls[0]]
         fetcher.download()
         self.assertTrue(os.path.exists(ud.localpath))
         # Setup the mirror
-        mirrordir = os.path.join(self.tempdir, 'mirror')
+        mirrordir = os.path.join(self.tempdir, "mirror")
         bb.utils.mkdirhier(mirrordir)
         os.replace(ud.localpath, os.path.join(mirrordir, os.path.basename(ud.localpath)))
-        self.d.setVar('PREMIRRORS', 'https?$://.*/.* file://%s/' % mirrordir)
-        self.d.setVar('BB_FETCH_PREMIRRORONLY', '1')
+        self.d.setVar("PREMIRRORS", "https?$://.*/.* file://%s/" % mirrordir)
+        self.d.setVar("BB_FETCH_PREMIRRORONLY", "1")
         # Fetch again
         self.assertFalse(os.path.exists(ud.localpath))
         swfile = self.create_shrinkwrap_file({
-            'packages': {
-                'node_modules/array-flatten': {
-                    'version': '1.1.1',
-                    'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
-                    'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
+            "packages": {
+                "node_modules/array-flatten": {
+                    "version": "1.1.1",
+                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
+                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
                 }
             }
         })
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
+        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
         fetcher.download()
         self.assertTrue(os.path.exists(ud.localpath))
 
@@ -3065,51 +3065,51 @@ class NPMTest(FetcherTest):
     @skipIfNoNetwork()
     def test_npmsw_mirrors(self):
         # Fetch once to get a tarball
-        fetcher = bb.fetch.Fetch(['npm://registry.npmjs.org;package=array-flatten;version=1.1.1'], self.d)
+        fetcher = bb.fetch.Fetch(["npm://registry.npmjs.org;package=array-flatten;version=1.1.1"], self.d)
         ud = fetcher.ud[fetcher.urls[0]]
         fetcher.download()
         self.assertTrue(os.path.exists(ud.localpath))
         # Setup the mirror
-        mirrordir = os.path.join(self.tempdir, 'mirror')
+        mirrordir = os.path.join(self.tempdir, "mirror")
         bb.utils.mkdirhier(mirrordir)
         os.replace(ud.localpath, os.path.join(mirrordir, os.path.basename(ud.localpath)))
-        self.d.setVar('MIRRORS', 'https?$://.*/.* file://%s/' % mirrordir)
+        self.d.setVar("MIRRORS", "https?$://.*/.* file://%s/" % mirrordir)
         # Fetch again with invalid url
         self.assertFalse(os.path.exists(ud.localpath))
         swfile = self.create_shrinkwrap_file({
-            'packages': {
-                'node_modules/array-flatten': {
-                    'version': '1.1.1',
-                    'resolved': 'https://invalid',
-                    'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
+            "packages": {
+                "node_modules/array-flatten": {
+                    "version": "1.1.1",
+                    "resolved": "https://invalid",
+                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
                 }
             }
         })
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
+        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
         fetcher.download()
         self.assertTrue(os.path.exists(ud.localpath))
 
     @skipIfNoNetwork()
     def test_npmsw_bundled(self):
         swfile = self.create_shrinkwrap_file({
-            'packages': {
-                'node_modules/array-flatten': {
-                    'version': '1.1.1',
-                    'resolved': 'https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz',
-                    'integrity': 'sha1-ml9pkFGx5wczKPKgCJaLZOopVdI='
+            "packages": {
+                "node_modules/array-flatten": {
+                    "version": "1.1.1",
+                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
+                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
                 },
-                'node_modules/content-type': {
-                    'version': '1.0.4',
-                    'resolved': 'https://registry.npmjs.org/content-type/-/content-type-1.0.4.tgz',
-                    'integrity': 'sha512-hIP3EEPs8tB9AT1L+NUqtwOAps4mk2Zob89MWXMHjHWg9milF/j4osnnQLXBCBFBk/tvIG/tUc9mOUJiPBhPXA==',
-                    'inBundle': True
+                "node_modules/content-type": {
+                    "version": "1.0.4",
+                    "resolved": "https://registry.npmjs.org/content-type/-/content-type-1.0.4.tgz",
+                    "integrity": "sha512-hIP3EEPs8tB9AT1L+NUqtwOAps4mk2Zob89MWXMHjHWg9milF/j4osnnQLXBCBFBk/tvIG/tUc9mOUJiPBhPXA==",
+                    "inBundle": True
                 }
             }
         })
-        fetcher = bb.fetch.Fetch(['npmsw://' + swfile], self.d)
+        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
         fetcher.download()
-        self.assertTrue(os.path.exists(os.path.join(self.dldir, 'npm2', 'array-flatten-1.1.1.tgz')))
-        self.assertFalse(os.path.exists(os.path.join(self.dldir, 'npm2', 'content-type-1.0.4.tgz')))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz")))
+        self.assertFalse(os.path.exists(os.path.join(self.dldir, "npm2/content-type-1.0.4.tgz")))
 
 class GitSharedTest(FetcherTest):
     def setUp(self):
-- 
2.39.5




* [RFC PATCH 12/21] tests: fetch: move npmsw test cases into npmsw test class
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (10 preceding siblings ...)
  2024-12-20 11:26 ` [RFC PATCH 11/21] tests: fetch: adapt style in npm(sw) class Stefan Herbrechtsmeier
@ 2024-12-20 11:26 ` Stefan Herbrechtsmeier
  2024-12-20 11:26 ` [RFC PATCH 13/21] tests: fetch: adapt npm test cases Stefan Herbrechtsmeier
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:26 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/tests/fetch.py | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
index 934b96cac..3fd4c82cd 100644
--- a/lib/bb/tests/fetch.py
+++ b/lib/bb/tests/fetch.py
@@ -2831,6 +2831,13 @@ class NPMTest(FetcherTest):
         with self.assertRaises(bb.fetch2.MissingParameterError):
             fetcher = bb.fetch.Fetch(urls, self.d)
 
+class NPMSWTest(FetcherTest):
+    def skipIfNoNpm():
+        import shutil
+        if not shutil.which("npm"):
+            return unittest.skip("npm not installed")
+        return lambda f: f
+
     def create_shrinkwrap_file(self, data):
         import json
         datadir = os.path.join(self.tempdir, "data")
-- 
2.39.5




* [RFC PATCH 13/21] tests: fetch: adapt npm test cases
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (11 preceding siblings ...)
  2024-12-20 11:26 ` [RFC PATCH 12/21] tests: fetch: move npmsw test cases into npmsw test class Stefan Herbrechtsmeier
@ 2024-12-20 11:26 ` Stefan Herbrechtsmeier
  2024-12-20 11:26 ` [RFC PATCH 14/21] fetch: add dependency mixin Stefan Herbrechtsmeier
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:26 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Adapt the npm test cases to the reworked npm fetcher:
* Add test case for latest version check
* Remove decorator for npm binary check
* Use common npm package for test cases
* Define expected file names
* Remove test cases for (pre)mirrors, network and invalid URLs because
  the reworked class is based on the wget fetcher.

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---
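For illustration, the file names the adapted tests expect can be
reproduced with a small standalone helper. This is a sketch inferred from
the assertions in this patch, not the fetcher's implementation:
`expected_download_name` and `is_not_newer` are hypothetical names, and
the tuple comparison is a simplified stand-in for bb.utils.vercmp_string
that only handles the "A.B.C" pattern.

```python
def expected_download_name(package, version):
    """Return the download path (relative to DL_DIR) the tests check for.

    Assumption: a scoped package such as "@types/node" has its "/" replaced
    by "-", as the asserted name "npm2/@types-node-22.10.2.tgz" suggests.
    """
    return f"npm2/{package.replace('/', '-')}-{version}.tgz"


def is_not_newer(known, upstream):
    """Simplified version check: True if known <= upstream.

    Only purely numeric "A.B.C" strings are handled; the real test uses
    bb.utils.vercmp_string and accepts a result of -1 or 0.
    """
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))
    return as_tuple(known) <= as_tuple(upstream)


if __name__ == "__main__":
    print(expected_download_name("@types/node", "22.10.2"))
    print(is_not_newer("22.10.2", "23.0.0"))
```

The latest-version test only requires that the recorded version is not
newer than the version reported upstream, so it stays valid when the
registry publishes new releases.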

 lib/bb/tests/fetch.py | 221 +++++++-----------------------------------
 1 file changed, 36 insertions(+), 185 deletions(-)

diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
index 3fd4c82cd..09f493f8a 100644
--- a/lib/bb/tests/fetch.py
+++ b/lib/bb/tests/fetch.py
@@ -1507,6 +1507,14 @@ class FetchLatestVersionTest(FetcherTest):
             : "0.9.29"
    }
 
+    test_npm_uris = {
+        # basic example; version pattern "A.B.C"
+        (
+            "types-node",
+            "npm://registry.npmjs.org;package=@types/node;version=16.0.0"
+        ) : "22.10.2"
+   }
+
     @skipIfNoNetwork()
     def test_git_latest_versionstring(self):
         for k, v in self.test_git_uris.items():
@@ -1557,6 +1565,17 @@ class FetchLatestVersionTest(FetcherTest):
             r = bb.utils.vercmp_string(v, verstring)
             self.assertTrue(r == -1 or r == 0, msg="Package %s, version: %s <= %s" % (k[0], v, verstring))
 
+    @skipIfNoNetwork()
+    def test_npm_latest_versionstring(self):
+        for k, v in self.test_npm_uris.items():
+            self.d.setVar("PN", k[0])
+            ud = bb.fetch2.FetchData(k[1], self.d)
+            pupver = ud.method.latest_versionstring(ud, self.d)
+            verstring = pupver[0]
+            self.assertTrue(verstring, msg="Could not find upstream version for %s" % k[0])
+            r = bb.utils.vercmp_string(v, verstring)
+            self.assertTrue(r == -1 or r == 0, msg="Package %s, version: %s <= %s" % (k[0], v, verstring))
+
 class FetchCheckStatusTest(FetcherTest):
     test_wget_uris = ["https://downloads.yoctoproject.org/releases/sato/sato-engine-0.1.tar.gz",
                       "https://downloads.yoctoproject.org/releases/sato/sato-engine-0.2.tar.gz",
@@ -2618,216 +2637,48 @@ class CrateTest(FetcherTest):
             fetcher.download()
 
 class NPMTest(FetcherTest):
-    def skipIfNoNpm():
-        import shutil
-        if not shutil.which("npm"):
-            return unittest.skip("npm not installed")
-        return lambda f: f
-
-    @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm(self):
-        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
+        urls = [
+            "npm://registry.npmjs.org;package=@types/node;version=22.10.2;"
+            "sha256sum=9dad888e5280e9969393d7410e26d4edf726a828ee4762318c8ddf6fcfee793e"
+        ]
         fetcher = bb.fetch.Fetch(urls, self.d)
         ud = fetcher.ud[fetcher.urls[0]]
         fetcher.download()
-        self.assertTrue(os.path.exists(ud.localpath))
-        self.assertTrue(os.path.exists(ud.localpath + ".done"))
-        self.assertTrue(os.path.exists(ud.resolvefile))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/@types-node-22.10.2.tgz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/@types-node-22.10.2.tgz.done")))
         fetcher.unpack(self.unpackdir)
-        unpackdir = os.path.join(self.unpackdir, "npm")
-        self.assertTrue(os.path.exists(os.path.join(unpackdir, "package.json")))
-
-    @skipIfNoNpm()
-    @skipIfNoNetwork()
-    def test_npm_bad_checksum(self):
-        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
-        # Fetch once to get a tarball
-        fetcher = bb.fetch.Fetch(urls, self.d)
-        ud = fetcher.ud[fetcher.urls[0]]
-        fetcher.download()
-        self.assertTrue(os.path.exists(ud.localpath))
-        # Modify the tarball
-        bad = b"bad checksum"
-        with open(ud.localpath, "wb") as f:
-            f.write(bad)
-        # Verify that the tarball is fetched again
-        fetcher.download()
-        badsum = hashlib.sha512(bad).hexdigest()
-        self.assertTrue(os.path.exists(ud.localpath + "_bad-checksum_" + badsum))
-        self.assertTrue(os.path.exists(ud.localpath))
-
-    @skipIfNoNpm()
-    @skipIfNoNetwork()
-    def test_npm_premirrors(self):
-        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
-        # Fetch once to get a tarball
-        fetcher = bb.fetch.Fetch(urls, self.d)
-        ud = fetcher.ud[fetcher.urls[0]]
-        fetcher.download()
-        self.assertTrue(os.path.exists(ud.localpath))
-
-        # Setup the mirror by renaming the download directory
-        mirrordir = os.path.join(self.tempdir, "mirror")
-        bb.utils.rename(self.dldir, mirrordir)
-        os.mkdir(self.dldir)
-
-        # Configure the premirror to be used
-        self.d.setVar("PREMIRRORS", "https?$://.*/.* file://%s/npm2" % mirrordir)
-        self.d.setVar("BB_FETCH_PREMIRRORONLY", "1")
-
-        # Fetch again
-        self.assertFalse(os.path.exists(ud.localpath))
-        # The npm fetcher doesn"t handle that the .resolved file disappears
-        # while the fetcher object exists, which it does when we rename the
-        # download directory to "mirror" above. Thus we need a new fetcher to go
-        # with the now empty download directory.
-        fetcher = bb.fetch.Fetch(urls, self.d)
-        ud = fetcher.ud[fetcher.urls[0]]
-        fetcher.download()
-        self.assertTrue(os.path.exists(ud.localpath))
-
-    @skipIfNoNpm()
-    @skipIfNoNetwork()
-    def test_npm_premirrors_with_specified_filename(self):
-        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
-        # Fetch once to get a tarball
-        fetcher = bb.fetch.Fetch(urls, self.d)
-        ud = fetcher.ud[fetcher.urls[0]]
-        fetcher.download()
-        self.assertTrue(os.path.exists(ud.localpath))
-        # Setup the mirror
-        mirrordir = os.path.join(self.tempdir, "mirror")
-        bb.utils.mkdirhier(mirrordir)
-        mirrorfilename = os.path.join(mirrordir, os.path.basename(ud.localpath))
-        os.replace(ud.localpath, mirrorfilename)
-        self.d.setVar("PREMIRRORS", "https?$://.*/.* file://%s" % mirrorfilename)
-        self.d.setVar("BB_FETCH_PREMIRRORONLY", "1")
-        # Fetch again
-        self.assertFalse(os.path.exists(ud.localpath))
-        fetcher.download()
-        self.assertTrue(os.path.exists(ud.localpath))
-
-    @skipIfNoNpm()
-    @skipIfNoNetwork()
-    def test_npm_mirrors(self):
-        # Fetch once to get a tarball
-        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
-        fetcher = bb.fetch.Fetch(urls, self.d)
-        ud = fetcher.ud[fetcher.urls[0]]
-        fetcher.download()
-        self.assertTrue(os.path.exists(ud.localpath))
-        # Setup the mirror
-        mirrordir = os.path.join(self.tempdir, "mirror")
-        bb.utils.mkdirhier(mirrordir)
-        os.replace(ud.localpath, os.path.join(mirrordir, os.path.basename(ud.localpath)))
-        self.d.setVar("MIRRORS", "https?$://.*/.* file://%s/" % mirrordir)
-        # Update the resolved url to an invalid url
-        with open(ud.resolvefile, "r") as f:
-            url = f.read()
-        uri = URI(url)
-        uri.path = "/invalid"
-        with open(ud.resolvefile, "w") as f:
-            f.write(str(uri))
-        # Fetch again
-        self.assertFalse(os.path.exists(ud.localpath))
-        fetcher.download()
-        self.assertTrue(os.path.exists(ud.localpath))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "npm/package.json")))
 
-    @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_destsuffix_downloadfilename(self):
-        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0;destsuffix=foo/bar;downloadfilename=foo-bar.tgz"]
+        urls = [
+            "npm://registry.npmjs.org;package=@types/node;version=22.10.2;"
+            "destsuffix=foo/bar;downloadfilename=npm2/foo-bar.tgz;"
+            "sha256sum=9dad888e5280e9969393d7410e26d4edf726a828ee4762318c8ddf6fcfee793e"
+        ]
         fetcher = bb.fetch.Fetch(urls, self.d)
         fetcher.download()
         self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/foo-bar.tgz")))
         fetcher.unpack(self.unpackdir)
-        unpackdir = os.path.join(self.unpackdir, "foo", "bar")
-        self.assertTrue(os.path.exists(os.path.join(unpackdir, "package.json")))
-
-    def test_npm_no_network_no_tarball(self):
-        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
-        self.d.setVar("BB_NO_NETWORK", "1")
-        fetcher = bb.fetch.Fetch(urls, self.d)
-        with self.assertRaises(bb.fetch2.NetworkAccess):
-            fetcher.download()
-
-    @skipIfNoNpm()
-    @skipIfNoNetwork()
-    def test_npm_no_network_with_tarball(self):
-        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
-        # Fetch once to get a tarball
-        fetcher = bb.fetch.Fetch(urls, self.d)
-        fetcher.download()
-        # Disable network access
-        self.d.setVar("BB_NO_NETWORK", "1")
-        # Fetch again
-        fetcher.download()
-        fetcher.unpack(self.unpackdir)
-        unpackdir = os.path.join(self.unpackdir, "npm")
-        self.assertTrue(os.path.exists(os.path.join(unpackdir, "package.json")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "foo/bar/package.json")))
 
-    @skipIfNoNpm()
-    @skipIfNoNetwork()
-    def test_npm_registry_alternate(self):
-        urls = ["npm://skimdb.npmjs.com;package=@savoirfairelinux/node-server-example;version=1.0.0"]
-        fetcher = bb.fetch.Fetch(urls, self.d)
-        fetcher.download()
-        fetcher.unpack(self.unpackdir)
-        unpackdir = os.path.join(self.unpackdir, "npm")
-        self.assertTrue(os.path.exists(os.path.join(unpackdir, "package.json")))
-
-    @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_version_latest(self):
-        url = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=latest"]
-        fetcher = bb.fetch.Fetch(urls, self.d)
-        fetcher.download()
-        fetcher.unpack(self.unpackdir)
-        unpackdir = os.path.join(self.unpackdir, "npm")
-        self.assertTrue(os.path.exists(os.path.join(unpackdir, "package.json")))
-
-    @skipIfNoNpm()
-    @skipIfNoNetwork()
-    def test_npm_registry_invalid(self):
-        urls = ["npm://registry.invalid.org;package=@savoirfairelinux/node-server-example;version=1.0.0"]
-        fetcher = bb.fetch.Fetch(urls, self.d)
-        with self.assertRaises(bb.fetch2.FetchError):
-            fetcher.download()
-
-    @skipIfNoNpm()
-    @skipIfNoNetwork()
-    def test_npm_package_invalid(self):
-        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/invalid;version=1.0.0"]
-        fetcher = bb.fetch.Fetch(urls, self.d)
-        with self.assertRaises(bb.fetch2.FetchError):
-            fetcher.download()
-
-    @skipIfNoNpm()
-    @skipIfNoNetwork()
-    def test_npm_version_invalid(self):
-        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example;version=invalid"]
+        urls = ["npm://registry.npmjs.org;package=@types/node;version=latest"]
         with self.assertRaises(bb.fetch2.ParameterError):
-            fetcher = bb.fetch.Fetch(urls, self.d)
-
-    @skipIfNoNpm()
-    @skipIfNoNetwork()
-    def test_npm_registry_none(self):
-        urls = ["npm://;package=@savoirfairelinux/node-server-example;version=1.0.0"]
-        with self.assertRaises(bb.fetch2.MalformedUrl):
-            fetcher = bb.fetch.Fetch(urls, self.d)
+            bb.fetch.Fetch(urls, self.d)
 
-    @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_package_none(self):
-        urls = ["npm://registry.npmjs.org;version=1.0.0"]
+        urls = ["npm://registry.npmjs.org;version=22.10.2"]
         with self.assertRaises(bb.fetch2.MissingParameterError):
             fetcher = bb.fetch.Fetch(urls, self.d)
 
-    @skipIfNoNpm()
     @skipIfNoNetwork()
     def test_npm_version_none(self):
-        urls = ["npm://registry.npmjs.org;package=@savoirfairelinux/node-server-example"]
+        urls = ["npm://registry.npmjs.org;package=@types/node"]
         with self.assertRaises(bb.fetch2.MissingParameterError):
             fetcher = bb.fetch.Fetch(urls, self.d)
 
-- 
2.39.5




* [RFC PATCH 14/21] fetch: add dependency mixin
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (12 preceding siblings ...)
  2024-12-20 11:26 ` [RFC PATCH 13/21] tests: fetch: adapt npm test cases Stefan Herbrechtsmeier
@ 2024-12-20 11:26 ` Stefan Herbrechtsmeier
  2024-12-20 11:26 ` [RFC PATCH 15/21] tests: fetch: add test cases for dependency fetcher Stefan Herbrechtsmeier
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:26 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Add dependency mixin classes and a module. The mixin implements generic
methods to fetch dependencies via a dependency specification file. A
subclass must implement a resolve_dependencies method, which resolves
the dependencies from a specification file into fetcher URLs. The
module provides the following fetcher subtypes:

<type> (Local)
    The fetcher uses a local specification file to fetch dependencies.

    SRC_URI = "<type>://specification.txt"

<type>+http(s) (Wget)
    The fetcher downloads a specification file, or an archive with a
    specification file in its root folder, and uses the specification
    file to fetch dependencies.

    SRC_URI = "<type>+http://example.com/specification.txt"
    SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"

<type>+git (Git)
    The fetcher checks out a git repository and uses the specification
    file inside it to fetch dependencies.

    SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https"

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---
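The URL handling of the three subtypes can be sketched without BitBake.
The following is a minimal standalone illustration that assumes only the
behaviour visible in DependencyMixin.urldata_init and in the supports
methods of this patch; `split_dependency_url` and `subtype_for` are
hypothetical helper names and are not part of the patch.

```python
def split_dependency_url(url):
    """Return (underlying_type, underlying_url) for a dependency URL.

    Mirrors the two assignments in DependencyMixin.urldata_init: the part
    after "+" selects the underlying fetcher, and a bare "<type>" scheme
    falls back to the local "file" fetcher.
    """
    scheme, rest = url.split(":", 1)
    underlying = scheme.split("+")[-1] if "+" in scheme else "file"
    return underlying, ":".join((underlying, rest))


def subtype_for(scheme, dep_type):
    """Name the subtype whose supports() check would accept the scheme."""
    if scheme == dep_type:
        return "local"
    if scheme in (f"{dep_type}+http", f"{dep_type}+https"):
        return "wget"
    if scheme == f"{dep_type}+git":
        return "git"
    return None


if __name__ == "__main__":
    examples = (
        "npmsw://npm-shrinkwrap.json",
        "npmsw+https://example.com/app.tar.gz;striplevel=1",
        "npmsw+git://example.com/app.git;protocol=https",
    )
    for url in examples:
        scheme = url.split(":", 1)[0]
        print(subtype_for(scheme, "npmsw"), split_dependency_url(url))
```

The subtype is selected on the original scheme, and the rewritten URL is
what the underlying Local, Wget or Git fetcher then sees.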

 lib/bb/fetch2/dependency.py | 175 ++++++++++++++++++++++++++++++++++++
 1 file changed, 175 insertions(+)
 create mode 100644 lib/bb/fetch2/dependency.py

diff --git a/lib/bb/fetch2/dependency.py b/lib/bb/fetch2/dependency.py
new file mode 100644
index 000000000..4acad8779
--- /dev/null
+++ b/lib/bb/fetch2/dependency.py
@@ -0,0 +1,175 @@
+# Copyright (C) 2024-2025 Weidmueller Interface GmbH & Co. KG
+# Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
+#
+# SPDX-License-Identifier: MIT
+#
+"""
+BitBake 'Fetch' mixin implementation for dependency specification files
+"""
+
+import tempfile
+import bb
+from bb.fetch2 import Fetch
+from bb.fetch2.git import Git
+from bb.fetch2.local import Local
+from bb.fetch2.wget import Wget
+from bb.utils import lockfile, unlockfile
+
+class DependencyMixin:
+    """Class to fetch all dependencies resolved via foreign function"""
+
+    def urldata_init(self, ud, d):
+        ud.type = ud.type.split("+")[-1] if "+" in ud.type else "file"
+        ud.url = ":".join((ud.type, ud.url.split(":", 1)[-1]))
+        super().urldata_init(ud, d)
+        ud.proxy = None
+
+    def _init_proxy(self, ud, d):
+        if ud.proxy:
+            return
+
+        urls = self.process_source(ud, d)
+        if urls:
+            ud.proxy = Fetch(urls, d)
+
+    @staticmethod
+    def _foreach_proxy_method(ud, handle, d):
+        """Call method for each dependency"""
+        returns = []
+        for proxy_url in ud.proxy.urls:
+            proxy_ud = ud.proxy.ud[proxy_url]
+            proxy_d = ud.proxy.d
+            proxy_ud.setup_localpath(proxy_d)
+            lf = lockfile(proxy_ud.lockfile)
+            returns.append(handle(proxy_ud.method, proxy_ud, proxy_d))
+            unlockfile(lf)
+        return returns
+
+    def verify_donestamp(self, ud, d):
+        """Verify the donestamp file"""
+        if not super().verify_donestamp(ud, d):
+            return False
+
+        self._init_proxy(ud, d)
+        def handle(m, ud, d):
+            return m.verify_donestamp(ud, d)
+        return all(self._foreach_proxy_method(ud, handle, d))
+
+    def update_donestamp(self, ud, d):
+        """Update the donestamp file"""
+        super().update_donestamp(ud, d)
+
+        self._init_proxy(ud, d)
+        def handle(m, ud, d):
+            m.update_donestamp(ud, d)
+        self._foreach_proxy_method(ud, handle, d)
+
+    def need_update(self, ud, d):
+        """Force a fetch, even if localpath exists ?"""
+        if super().need_update(ud, d):
+            return True
+
+        self._init_proxy(ud, d)
+        def handle(m, ud, d):
+            return m.need_update(ud, d)
+        return any(self._foreach_proxy_method(ud, handle, d))
+
+    def try_mirrors(self, fetch, ud, d, mirrors):
+        """Try to use a mirror"""
+        if not super().try_mirrors(fetch, ud, d, mirrors):
+            return False
+
+        self._init_proxy(ud, d)
+        def handle(m, ud, d):
+            return m.try_mirrors(fetch, ud, d, mirrors)
+        return all(self._foreach_proxy_method(ud, handle, d))
+
+    def download(self, ud, d):
+        """Fetch url"""
+        super().download(ud, d)
+        self._init_proxy(ud, d)
+        ud.proxy.download()
+
+    def unpack(self, ud, rootdir, d):
+        """Unpack the downloaded dependencies"""
+        super().unpack(ud, rootdir, d)
+        self._init_proxy(ud, d)
+        ud.proxy.unpack(ud.destdir)
+
+    def clean(self, ud, d):
+        """Clean any existing full or partial download"""
+        self._init_proxy(ud, d)
+        ud.proxy.clean()
+        super().clean(ud, d)
+
+    def done(self, ud, d):
+        """Is the download done ?"""
+        if not super().done(ud, d):
+            return False
+
+        self._init_proxy(ud, d)
+        def handle(m, ud, d):
+            return m.done(ud, d)
+        return all(self._foreach_proxy_method(ud, handle, d))
+
+class LocalDependency(DependencyMixin, Local):
+    """
+    Abstract class to fetch all dependencies from a local specification file
+    """
+
+    def process_source(self, ud, d):
+        return self.resolve_dependencies(ud, ud.localpath, d)
+
+class WgetDependency(DependencyMixin, Wget):
+    """
+    Abstract class to fetch all dependencies from a specification file inside an
+    archive
+    """
+
+    def process_source(self, ud, d):
+        with tempfile.TemporaryDirectory(dir=d.getVar('DL_DIR')) as tmpdir:
+            Wget.unpack(self, ud, tmpdir, d)
+            return self.resolve_dependencies(ud, ud.destdir, d)
+
+class GitDependency(DependencyMixin, Git):
+    """
+    Abstract class to fetch all dependencies from a specification file inside a
+    git repository
+    """
+
+    def process_source(self, ud, d):
+        with tempfile.TemporaryDirectory(dir=d.getVar('DL_DIR')) as tmpdir:
+            Git.unpack(self, ud, tmpdir, d)
+            return self.resolve_dependencies(ud, ud.destdir, d)
+
+def create_methods(type, mixin):
+    class SpecificLocalDependency(mixin, LocalDependency):
+        """
+        Specific class to fetch all dependencies from a local specification file
+        """
+
+        def supports(self, ud, d):
+            return ud.type == type
+
+    class SpecificWgetDependency(mixin, WgetDependency):
+        """
+        Specific class to fetch all dependencies from a specification file
+        inside an archive
+        """
+
+        def supports(self, ud, d):
+            return ud.type in [f"{type}+http", f"{type}+https"]
+
+    class SpecificGitDependency(mixin, GitDependency):
+        """
+        Specific class to fetch all dependencies from a specification file
+        inside a git repository
+        """
+
+        def supports(self, ud, d):
+            return ud.type == f"{type}+git"
+
+    return [
+        SpecificLocalDependency(),
+        SpecificWgetDependency(),
+        SpecificGitDependency()]
-- 
2.39.5




* [RFC PATCH 15/21] tests: fetch: add test cases for dependency fetcher
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (13 preceding siblings ...)
  2024-12-20 11:26 ` [RFC PATCH 14/21] fetch: add dependency mixin Stefan Herbrechtsmeier
@ 2024-12-20 11:26 ` Stefan Herbrechtsmeier
  2024-12-20 11:26 ` [RFC PATCH 16/21] fetch: npmsw: migrate to dependency mixin Stefan Herbrechtsmeier
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:26 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Add test cases for the dependency fetcher. The tests use a dummy fetcher
because the dependency fetcher provides a mixin only and isn't
self-contained.

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---
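The resolve step that the dummy fetcher performs can be tried on its own.
The sketch below is a standalone variant of the DummyMixin method from
this patch, without the ud and d arguments. The `filename` parameter and
the skipping of blank lines are assumptions added for illustration; the
DummyMixin keeps every line.

```python
import os
import tempfile


def resolve_dependencies(localpath, filename="dummy.txt"):
    """Return the fetcher URLs listed one per line in a specification file.

    localpath may be the file itself or a directory that contains it, which
    covers the local case and the unpacked archive or git checkout case.
    """
    if os.path.isdir(localpath):
        localpath = os.path.join(localpath, filename)
    with open(localpath, "r") as f:
        # Blank lines are dropped here; the DummyMixin does not do this.
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as srcdir:
        with open(os.path.join(srcdir, "dummy.txt"), "w") as f:
            f.write("https://example.com/a.tar.gz\n")
            f.write("git://example.com/b;protocol=https;branch=master\n")
        # Passing the directory or the file yields the same URL list.
        print(resolve_dependencies(srcdir))
```

The returned list is what the mixin hands to a nested Fetch object, so
each entry must be a complete fetcher URL.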

 lib/bb/tests/fetch.py | 121 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 121 insertions(+)

diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
index 09f493f8a..903095746 100644
--- a/lib/bb/tests/fetch.py
+++ b/lib/bb/tests/fetch.py
@@ -3443,3 +3443,124 @@ class GoModGitTest(FetcherTest):
         self.assertTrue(os.path.exists(os.path.join(downloaddir, 'go.opencensus.io/@v/v0.24.0.mod')))
         self.assertEqual(bb.utils.sha256_file(os.path.join(downloaddir, 'go.opencensus.io/@v/v0.24.0.mod')),
                          '0dc9ccc660ad21cebaffd548f2cc6efa27891c68b4fbc1f8a3893b00f1acec96')
+
+class DependencyTest(FetcherTest):
+    class DummyMixin:
+        def resolve_dependencies(self, ud, localpath, d):
+            urls = []
+            if os.path.isdir(localpath):
+                localpath = os.path.join(localpath, "dummy.txt")
+            with open(localpath, "r") as f:
+                for line in f:
+                    line = line.strip()
+                    urls.append(line)
+            return urls
+
+    def create_specification_file(self):
+        dummyfile = "dummy.txt"
+        lines = [
+            "https://downloads.yoctoproject.org/releases/bitbake/bitbake-1.0.tar.gz",
+            "git://git.openembedded.org/bitbake;branch=master;protocol=https;rev=82ea737a0b42a8b53e11c9cde141e9e9c0bd8c40"
+        ]
+        with open(os.path.join(self.srcdir, dummyfile), "w") as f:
+            f.write("\n".join(lines) + "\n")
+        return dummyfile
+
+    def setUp(self):
+        from bb.fetch2.dependency import create_methods
+        super().setUp()
+        self.srcdir = os.path.join(self.tempdir, "src")
+        os.makedirs(self.srcdir)
+        bb.fetch.methods.extend(create_methods("dummy", self.DummyMixin))
+        self.srcfilename = self.create_specification_file()
+
+    @skipIfNoNetwork()
+    def test_dummy(self):
+        self.d.setVar("FILESPATH", self.srcdir)
+        fetcher = bb.fetch.Fetch([f"dummy://{self.srcfilename}"], self.d)
+        fetcher.download()
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "bitbake-1.0.tar.gz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "bitbake-1.0.tar.gz.done")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "git2/git.openembedded.org.bitbake")))
+        fetcher.unpack(self.unpackdir)
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "dummy.txt")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "bitbake-1.0")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "git")))
+
+    @skipIfNoNetwork()
+    def test_dummy_git(self):
+        self.git_init(self.srcdir)
+        self.git(["add", "--all", "."], self.srcdir)
+        self.git(["commit", "-m", "Dummy commit"], self.srcdir)
+        rev = self.git(["rev-parse", "HEAD"], self.srcdir).strip()
+        urls = [
+            f"dummy+git://{self.srcdir};branch=master;protocol=file;rev={rev}"
+        ]
+        fetcher = bb.fetch.Fetch(urls, self.d)
+        fetcher.download()
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "bitbake-1.0.tar.gz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "bitbake-1.0.tar.gz.done")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "git2/git.openembedded.org.bitbake")))
+        archivename = self.srcdir[1:].replace('/', '.')
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "git2", archivename)))
+        fetcher.unpack(self.unpackdir)
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "git/dummy.txt")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "git/bitbake-1.0")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "git/git")))
+
+    @skipIfNoNetwork()
+    def test_dummy_https_file(self):
+        archivename = "archive.tar.gz"
+        sha256sum = bb.utils.sha256_file(os.path.join(self.srcdir, self.srcfilename))
+        server = HTTPService(self.srcdir, "127.0.0.1")
+        server.start()
+        port = server.port
+        try:
+            urls = [
+                f"dummy+http://{server.host}:{server.port}/{self.srcfilename};"
+                f"sha256sum={sha256sum}"
+            ]
+            fetcher = bb.fetch.Fetch(urls, self.d)
+            fetcher.download()
+        finally:
+            server.stop()
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "dummy.txt")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "bitbake-1.0.tar.gz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "bitbake-1.0.tar.gz.done")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "git2/git.openembedded.org.bitbake")))
+        fetcher.unpack(self.unpackdir)
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "dummy.txt")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "bitbake-1.0")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "git")))
+
+    @skipIfNoNetwork()
+    def test_dummy_https_archive(self):
+        archivename = "archive.tar.gz"
+        projectname = "dummy"
+        projectdir = os.path.join(self.srcdir, projectname)
+        os.makedirs(projectdir)
+        os.rename(os.path.join(self.srcdir, self.srcfilename),
+                    os.path.join(projectdir, self.srcfilename))
+        bb.process.run(f"tar czf {archivename} -C {projectname} .", cwd=self.srcdir)
+        sha256sum = bb.utils.sha256_file(os.path.join(self.srcdir, archivename))
+        server = HTTPService(self.srcdir, "127.0.0.1")
+        server.start()
+        port = server.port
+        try:
+            urls = [
+                f"dummy+http://{server.host}:{server.port}/{archivename};"
+                f"sha256sum={sha256sum};striplevel=1;subdir={projectname}"
+            ]
+            fetcher = bb.fetch.Fetch(urls, self.d)
+            fetcher.download()
+        finally:
+            server.stop()
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "archive.tar.gz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "archive.tar.gz.done")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "bitbake-1.0.tar.gz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "bitbake-1.0.tar.gz.done")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "git2/git.openembedded.org.bitbake")))
+        fetcher.unpack(self.unpackdir)
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "dummy/dummy.txt")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "dummy/bitbake-1.0")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "dummy/git")))
-- 
2.39.5




* [RFC PATCH 16/21] fetch: npmsw: migrate to dependency mixin
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (14 preceding siblings ...)
  2024-12-20 11:26 ` [RFC PATCH 15/21] tests: fetch: add test cases for dependency fetcher Stefan Herbrechtsmeier
@ 2024-12-20 11:26 ` Stefan Herbrechtsmeier
  2024-12-20 11:26 ` [RFC PATCH 17/21] tests: fetch: adapt npmsw test cases Stefan Herbrechtsmeier
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:26 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Migrate the npmsw fetcher to the dependency mixin to support different
fetcher subtypes. The migrated fetcher fetches dependencies via an
npm-shrinkwrap.json file or, if that is missing, via a package-lock.json
file. It supports the following types:

npmsw
    The fetcher uses a local npm-shrinkwrap.json or package-lock.json
    file to fetch dependencies.

    SRC_URI = "npmsw://npm-shrinkwrap.json"

npmsw+https
    The fetcher downloads an npm-shrinkwrap.json or package-lock.json
    file, or an archive with such a file in its root folder, and uses
    that file to fetch dependencies.

    SRC_URI = "npmsw+http://example.com/npm-shrinkwrap.json"
    SRC_URI = "npmsw+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"

npmsw+git
    The fetcher checks out a git repository that contains an
    npm-shrinkwrap.json or package-lock.json file and uses that file to
    fetch dependencies.

    SRC_URI = "npmsw+git://example.com/${BPN}.git;protocol=https"
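
As an illustration of the resolve step, the following standalone sketch
shows how a lockfile entry can be translated into fetcher parameters. It
mirrors the integrity and git handling in the diff below but uses plain
string operations instead of the bb.fetch2 URI class; the function names
are hypothetical:

```python
import base64


def integrity_to_checksum(integrity):
    """Convert an integrity string '<algorithm>-<base64>' into a checksum
    parameter name and a hex digest."""
    algorithm, value = integrity.split("-", maxsplit=1)
    return f"{algorithm}sum", base64.b64decode(value).hex()


def split_git_resolved(resolved):
    """Split 'git+<protocol>://<location>#<rev>' into its parts."""
    url, _, rev = resolved.partition("#")
    scheme, _, location = url.partition("://")
    base, _, protocol = scheme.partition("+")
    return base, protocol, location, rev


# Values taken from the npmsw test cases in this series.
checksum = integrity_to_checksum("sha1-ml9pkFGx5wczKPKgCJaLZOopVdI=")
parts = split_git_resolved(
    "git+https://github.com/jshttp/cookie.git"
    "#aec1177c7da67e3b3273df96cf476824dbc9ae09")
```

The checksum name and value become a URL parameter of the dependency URL,
and the protocol and revision become the protocol and rev parameters of a
git URL.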

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/fetch2/__init__.py   |   2 +-
 lib/bb/fetch2/dependency.py |   8 --
 lib/bb/fetch2/npmsw.py      | 272 +++++++++---------------------------
 3 files changed, 66 insertions(+), 216 deletions(-)

diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
index 3a7030bf3..5dbc0598d 100644
--- a/lib/bb/fetch2/__init__.py
+++ b/lib/bb/fetch2/__init__.py
@@ -2134,9 +2134,9 @@ methods.append(osc.Osc())
 methods.append(repo.Repo())
 methods.append(clearcase.ClearCase())
 methods.append(npm.Npm())
-methods.append(npmsw.NpmShrinkWrap())
 methods.append(az.Az())
 methods.append(crate.Crate())
 methods.append(gcp.GCP())
 methods.append(gomod.GoMod())
 methods.append(gomod.GoModGit())
+methods.extend(npmsw.methods)
diff --git a/lib/bb/fetch2/dependency.py b/lib/bb/fetch2/dependency.py
index 4acad8779..e30d7fb73 100644
--- a/lib/bb/fetch2/dependency.py
+++ b/lib/bb/fetch2/dependency.py
@@ -46,7 +46,6 @@ class DependencyMixin:
         return returns
 
     def verify_donestamp(self, ud, d):
-        """Verify the donestamp file"""
         if not super().verify_donestamp(ud, d):
             return False
 
@@ -56,7 +55,6 @@ class DependencyMixin:
         return all(self._foreach_proxy_method(ud, handle, d))
 
     def update_donestamp(self, ud, d):
-        """Update the donestamp file"""
         super().update_donestamp(ud, d)
 
         self._init_proxy(ud, d)
@@ -65,7 +63,6 @@ class DependencyMixin:
         self._foreach_proxy_method(ud, handle, d)
 
     def need_update(self, ud, d):
-        """Force a fetch, even if localpath exists ?"""
         if super().need_update(ud, d):
             return True
 
@@ -75,7 +72,6 @@ class DependencyMixin:
         return any(self._foreach_proxy_method(ud, handle, d))
 
     def try_mirrors(self, fetch, ud, d, mirrors):
-        """Try to use a mirror"""
         if not super().try_mirrors(fetch, ud, d, mirrors):
             return False
 
@@ -85,25 +81,21 @@ class DependencyMixin:
         return all(self._foreach_proxy_method(ud, handle, d))
 
     def download(self, ud, d):
-        """Fetch url"""
         super().download(ud, d)
         self._init_proxy(ud, d)
         ud.proxy.download()
 
     def unpack(self, ud, rootdir, d):
-        """Unpack the downloaded dependencies"""
         super().unpack(ud, rootdir, d)
         self._init_proxy(ud, d)
         ud.proxy.unpack(ud.destdir)
 
     def clean(self, ud, d):
-        """Clean any existing full or partial download"""
         self._init_proxy(ud, d)
         ud.proxy.clean()
         super().clean(ud, d)
 
     def done(self, ud, d):
-        """Is the download done ?"""
         if not super().done(ud, d):
             return False
 
diff --git a/lib/bb/fetch2/npmsw.py b/lib/bb/fetch2/npmsw.py
index 2f9599ee9..fffb2a102 100644
--- a/lib/bb/fetch2/npmsw.py
+++ b/lib/bb/fetch2/npmsw.py
@@ -1,37 +1,35 @@
 # Copyright (C) 2020 Savoir-Faire Linux
+# Copyright (C) 2024-2025 Weidmueller Interface GmbH & Co. KG
+# Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
 #
 # SPDX-License-Identifier: GPL-2.0-only
 #
 """
-BitBake 'Fetch' npm shrinkwrap implementation
+BitBake 'Fetch' implementation for npm-shrinkwrap.json and package-lock.json
 
-npm fetcher support the SRC_URI with format of:
-SRC_URI = "npmsw://some.registry.url;OptionA=xxx;OptionB=xxx;..."
+The npmsw, npmsw+https and npmsw+git fetchers are used to download npm package
+dependencies via an npm-shrinkwrap.json or package-lock.json file.
 
-Supported SRC_URI options are:
+The fetchers support SRC_URI in the following formats:
+SRC_URI = "npmsw://npm-shrinkwrap.json"
+SRC_URI = "npmsw+https://example.com/name-1.2.3.tar.gz"
+SRC_URI = "npmsw+git://example.com/repo.git"
+
+Additional supported SRC_URI options are:
 
 - dev
    Set to 1 to also install devDependencies.
-
-- destsuffix
-    Specifies the directory to use to unpack the dependencies (default: ${S}).
 """
 
+import base64
 import json
 import os
 import re
 import bb
-from bb.fetch2 import Fetch
-from bb.fetch2 import FetchMethod
-from bb.fetch2 import ParameterError
-from bb.fetch2 import runfetchcmd
-from bb.fetch2 import URI
-from bb.fetch2.npm import npm_integrity
-from bb.fetch2.npm import npm_localfile
-from bb.fetch2.npm import npm_unpack
+from bb.fetch2 import FetchError, ParameterError, URI
+from bb.fetch2.dependency import create_methods
+from bb.fetch2.npm import construct_url_path
 from bb.utils import is_semver
-from bb.utils import lockfile
-from bb.utils import unlockfile
 
 def foreach_dependencies(shrinkwrap, callback=None, dev=False):
     """
@@ -58,40 +56,31 @@ def foreach_dependencies(shrinkwrap, callback=None, dev=False):
         name = location.split('node_modules/')[-1]
         callback(name, data, location)
 
-class NpmShrinkWrap(FetchMethod):
-    """Class to fetch all package from a shrinkwrap file"""
-
-    def supports(self, ud, d):
-        """Check if a given url can be fetched with npmsw"""
-        return ud.type in ["npmsw"]
-
+class NpmShrinkWrapMixin:
     def urldata_init(self, ud, d):
         """Init npmsw specific variables within url data"""
-
-        # Get the 'shrinkwrap' parameter
-        ud.shrinkwrap_file = re.sub(r"^npmsw://", "", ud.url.split(";")[0])
-
-        # Get the 'dev' parameter
+        super().urldata_init(ud, d)
         ud.dev = bb.utils.to_boolean(ud.parm.get("dev"), False)
 
-        # Resolve the dependencies
-        ud.deps = []
+    def resolve_dependencies(self, ud, localpath, d):
+        urls = []
 
-        def _resolve_dependency(name, params, destsuffix):
+        def resolve_dependency(name, data, location):
             url = None
-            localpath = None
-            extrapaths = []
-            unpack = True
 
-            integrity = params.get("integrity")
-            resolved = params.get("resolved")
-            version = params.get("version")
-            link = params.get("link", False)
+            integrity = data.get("integrity")
+            resolved = data.get("resolved")
+            version = data.get("version")
+            link = data.get("link", False)
+
+            if integrity:
+                algorithm, value = integrity.split("-", maxsplit=1)
+                checksum_name = f"{algorithm}sum"
+                checksum_value = base64.b64decode(value).hex()
 
-            # Handle link sources
+            # Skip link sources
             if link:
-                localpath = resolved
-                unpack = False
+                return
 
             # Handle registry sources
             elif version and is_semver(version) and integrity:
@@ -99,193 +88,62 @@ class NpmShrinkWrap(FetchMethod):
                 if not resolved:
                     return
 
-                localfile = npm_localfile(name, version)
-
                 uri = URI(resolved)
-                uri.params["downloadfilename"] = localfile
-
-                checksum_name, checksum_expected = npm_integrity(integrity)
-                uri.params[checksum_name] = checksum_expected
-
+                package_path = construct_url_path(name, version)
+                if uri.scheme == "https" and uri.path.endswith(package_path):
+                    uri.scheme = "npm"
+                    uri.path = uri.path[:-len(package_path)]
+                    uri.params["dn"] = name
+                    uri.params["dv"] = version
+                    uri.params["destsuffix"] = location
+                else:
+                    bb.warn(f"Please add support for the url to npm fetcher: {resolved}")
+                uri.params[checksum_name] = checksum_value
                 url = str(uri)
 
-                localpath = os.path.join(d.getVar("DL_DIR"), localfile)
-
-                # Create a resolve file to mimic the npm fetcher and allow
-                # re-usability of the downloaded file.
-                resolvefile = localpath + ".resolved"
-
-                bb.utils.mkdirhier(os.path.dirname(resolvefile))
-                with open(resolvefile, "w") as f:
-                    f.write(url)
-
-                extrapaths.append(resolvefile)
-
             # Handle http tarball sources
             elif resolved.startswith("http") and integrity:
-                localfile = npm_localfile(os.path.basename(resolved))
-
                 uri = URI(resolved)
-                uri.params["downloadfilename"] = localfile
-
-                checksum_name, checksum_expected = npm_integrity(integrity)
-                uri.params[checksum_name] = checksum_expected
-
+                uri.params["subdir"] = location
+                uri.params["striplevel"] = 1
+                uri.params[checksum_name] = checksum_value
                 url = str(uri)
 
-                localpath = os.path.join(d.getVar("DL_DIR"), localfile)
-
-            # Handle local tarball sources
+            # Skip local tarball
             elif resolved.startswith("file"):
-                localpath = resolved[5:]
+                return
 
             # Handle git sources
             elif resolved.startswith("git"):
-                regex = re.compile(r"""
-                    ^
-                    git\+
-                    (?P<protocol>[a-z]+)
-                    ://
-                    (?P<url>[^#]+)
-                    \#
-                    (?P<rev>[0-9a-f]+)
-                    $
-                    """, re.VERBOSE)
-
-                match = regex.match(resolved)
-                if not match:
-                    raise ParameterError("Invalid git url: %s" % resolved, ud.url)
-
-                groups = match.groupdict()
-
-                uri = URI("git://" + str(groups["url"]))
-                uri.params["protocol"] = str(groups["protocol"])
-                uri.params["rev"] = str(groups["rev"])
+                url, _, rev = resolved.partition("#")
+                uri = URI(url)
+                scheme, _, protocol = uri.scheme.partition("+")
+                if protocol:
+                    uri.params["protocol"] = protocol
+                    uri.scheme = scheme
+                uri.params["rev"] = rev
                 uri.params["nobranch"] = "1"
-                uri.params["destsuffix"] = destsuffix
-
+                uri.params["destsuffix"] = location
                 url = str(uri)
 
             else:
-                raise ParameterError("Unsupported dependency: %s" % name, ud.url)
+                raise ParameterError(f"Unsupported dependency: {name}", ud.url)
 
-            # name is needed by unpack tracer for module mapping
-            ud.deps.append({
-                "name": name,
-                "url": url,
-                "localpath": localpath,
-                "extrapaths": extrapaths,
-                "destsuffix": destsuffix,
-                "unpack": unpack,
-            })
+            urls.append(url)
 
+        if os.path.isdir(localpath):
+            localdir = localpath
+            localpath = os.path.join(localdir, "npm-shrinkwrap.json")
+            if not os.path.isfile(localpath):
+                localpath = os.path.join(localdir, "package-lock.json")
         try:
-            with open(ud.shrinkwrap_file, "r") as f:
+            with open(localpath, "r") as f:
                 shrinkwrap = json.load(f)
         except Exception as e:
             raise ParameterError("Invalid shrinkwrap file: %s" % str(e), ud.url)
 
-        foreach_dependencies(shrinkwrap, _resolve_dependency, ud.dev)
-
-        # Avoid conflicts between the environment data and:
-        # - the proxy url revision
-        # - the proxy url checksum
-        data = bb.data.createCopy(d)
-        data.delVar("SRCREV")
-        data.delVarFlags("SRC_URI")
-
-        # This fetcher resolves multiple URIs from a shrinkwrap file and then
-        # forwards it to a proxy fetcher. The management of the donestamp file,
-        # the lockfile and the checksums are forwarded to the proxy fetcher.
-        shrinkwrap_urls = [dep["url"] for dep in ud.deps if dep["url"]]
-        if shrinkwrap_urls:
-            ud.proxy = Fetch(shrinkwrap_urls, data)
-        ud.needdonestamp = False
-
-    @staticmethod
-    def _foreach_proxy_method(ud, handle):
-        returns = []
-        #Check if there are dependencies before try to fetch them
-        if len(ud.deps) > 0:
-            for proxy_url in ud.proxy.urls:
-                proxy_ud = ud.proxy.ud[proxy_url]
-                proxy_d = ud.proxy.d
-                proxy_ud.setup_localpath(proxy_d)
-                lf = lockfile(proxy_ud.lockfile)
-                returns.append(handle(proxy_ud.method, proxy_ud, proxy_d))
-                unlockfile(lf)
-        return returns
-
-    def verify_donestamp(self, ud, d):
-        """Verify the donestamp file"""
-        def _handle(m, ud, d):
-            return m.verify_donestamp(ud, d)
-        return all(self._foreach_proxy_method(ud, _handle))
-
-    def update_donestamp(self, ud, d):
-        """Update the donestamp file"""
-        def _handle(m, ud, d):
-            m.update_donestamp(ud, d)
-        self._foreach_proxy_method(ud, _handle)
-
-    def need_update(self, ud, d):
-        """Force a fetch, even if localpath exists ?"""
-        def _handle(m, ud, d):
-            return m.need_update(ud, d)
-        return all(self._foreach_proxy_method(ud, _handle))
-
-    def try_mirrors(self, fetch, ud, d, mirrors):
-        """Try to use a mirror"""
-        def _handle(m, ud, d):
-            return m.try_mirrors(fetch, ud, d, mirrors)
-        return all(self._foreach_proxy_method(ud, _handle))
-
-    def download(self, ud, d):
-        """Fetch url"""
-        ud.proxy.download()
-
-    def unpack(self, ud, rootdir, d):
-        """Unpack the downloaded dependencies"""
-        destdir = rootdir
-        destsuffix = ud.parm.get("destsuffix")
-        if destsuffix:
-            destdir = os.path.join(rootdir, destsuffix)
-        ud.unpack_tracer.unpack("npm-shrinkwrap", destdir)
-
-        bb.utils.mkdirhier(destdir)
-        bb.utils.copyfile(ud.shrinkwrap_file,
-                          os.path.join(destdir, "npm-shrinkwrap.json"))
-
-        auto = [dep["url"] for dep in ud.deps if not dep["localpath"]]
-        manual = [dep for dep in ud.deps if dep["localpath"]]
-
-        if auto:
-            ud.proxy.unpack(destdir, auto)
-
-        for dep in manual:
-            depdestdir = os.path.join(destdir, dep["destsuffix"])
-            if dep["url"]:
-                npm_unpack(dep["localpath"], depdestdir, d)
-            else:
-                depsrcdir= os.path.join(destdir, dep["localpath"])
-                if dep["unpack"]:
-                    npm_unpack(depsrcdir, depdestdir, d)
-                else:
-                    bb.utils.mkdirhier(depdestdir)
-                    cmd = 'cp -fpPRH "%s/." .' % (depsrcdir)
-                    runfetchcmd(cmd, d, workdir=depdestdir)
-
-    def clean(self, ud, d):
-        """Clean any existing full or partial download"""
-        ud.proxy.clean()
+        foreach_dependencies(shrinkwrap, resolve_dependency, ud.dev)
 
-        # Clean extra files
-        for dep in ud.deps:
-            for path in dep["extrapaths"]:
-                bb.utils.remove(path)
+        return urls
 
-    def done(self, ud, d):
-        """Is the download done ?"""
-        def _handle(m, ud, d):
-            return m.done(ud, d)
-        return all(self._foreach_proxy_method(ud, _handle))
+methods = create_methods("npmsw", NpmShrinkWrapMixin)
-- 
2.39.5




* [RFC PATCH 17/21] tests: fetch: adapt npmsw test cases
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (15 preceding siblings ...)
  2024-12-20 11:26 ` [RFC PATCH 16/21] fetch: npmsw: migrate to dependency mixin Stefan Herbrechtsmeier
@ 2024-12-20 11:26 ` Stefan Herbrechtsmeier
  2024-12-20 11:26 ` [RFC PATCH 18/21] fetch: add gosum fetcher Stefan Herbrechtsmeier
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:26 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Adapt the npmsw test cases to the reworked npmsw fetcher:
* Remove decorator for npm binary check
* Define expected file names
* Remove test cases for (pre)mirrors, network and invalid urls because
  the reworked class uses the npm, wget and git fetchers.

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/tests/fetch.py | 255 +++++++++++-------------------------------
 1 file changed, 65 insertions(+), 190 deletions(-)

diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
index 903095746..437571f1c 100644
--- a/lib/bb/tests/fetch.py
+++ b/lib/bb/tests/fetch.py
@@ -2683,24 +2683,22 @@ class NPMTest(FetcherTest):
             fetcher = bb.fetch.Fetch(urls, self.d)
 
 class NPMSWTest(FetcherTest):
-    def skipIfNoNpm():
-        import shutil
-        if not shutil.which("npm"):
-            return unittest.skip("npm not installed")
-        return lambda f: f
+    def setUp(self):
+        super().setUp()
+        self.localsrcdir = os.path.join(self.tempdir, "localsrc")
+        os.makedirs(self.localsrcdir)
+        self.d.setVar("FILESPATH", self.localsrcdir)
 
     def create_shrinkwrap_file(self, data):
         import json
-        datadir = os.path.join(self.tempdir, "data")
-        swfile = os.path.join(datadir, "npm-shrinkwrap.json")
-        bb.utils.mkdirhier(datadir)
-        with open(swfile, "w") as f:
+        filename = "npm-shrinkwrap.json"
+        with open(os.path.join(self.localsrcdir, filename), 'w') as f:
             json.dump(data, f)
-        return swfile
+        return filename
 
     @skipIfNoNetwork()
     def test_npmsw(self):
-        swfile = self.create_shrinkwrap_file({
+        filename = self.create_shrinkwrap_file({
             "packages": {
                 "node_modules/array-flatten": {
                     "version": "1.1.1",
@@ -2720,13 +2718,19 @@ class NPMSWTest(FetcherTest):
                 },
                 "node_modules/array-flatten/node_modules/content-type/node_modules/cookie": {
                     "resolved": "git+https://github.com/jshttp/cookie.git#aec1177c7da67e3b3273df96cf476824dbc9ae09"
+                },
+                "node_modules/karma": {
+                    "resolved": "",
+                    "link": True
                 }
             }
         })
-        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
+        fetcher = bb.fetch.Fetch([f"npmsw://{filename}"], self.d)
         fetcher.download()
         self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz.done")))
         self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/content-type-1.0.4.tgz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/content-type-1.0.4.tgz.done")))
         self.assertTrue(os.path.exists(os.path.join(self.dldir, "git2/github.com.jshttp.cookie.git")))
         fetcher.unpack(self.unpackdir)
         self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "npm-shrinkwrap.json")))
@@ -2735,21 +2739,8 @@ class NPMSWTest(FetcherTest):
         self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "node_modules/array-flatten/node_modules/content-type/node_modules/cookie/package.json")))
 
     @skipIfNoNetwork()
-    def test_npmsw_git(self):
-        swfile = self.create_shrinkwrap_file({
-            "packages": {
-                "node_modules/cookie": {
-                    "resolved": "git+https://github.com/jshttp/cookie.git#aec1177c7da67e3b3273df96cf476824dbc9ae09"
-                }
-            }
-        })
-        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
-        fetcher.download()
-        self.assertTrue(os.path.exists(os.path.join(self.dldir, "git2/github.com.jshttp.cookie.git")))
-
-    @skipIfNoNetwork()
-    def test_npmsw_dev(self):
-        swfile = self.create_shrinkwrap_file({
+    def test_npmsw_resolve_dev(self):
+        filename = self.create_shrinkwrap_file({
             "packages": {
                 "node_modules/array-flatten": {
                     "version": "1.1.1",
@@ -2765,19 +2756,23 @@ class NPMSWTest(FetcherTest):
             }
         })
         # Fetch with dev disabled
-        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
+        fetcher = bb.fetch.Fetch([f"npmsw://{filename}"], self.d)
         fetcher.download()
         self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz.done")))
         self.assertFalse(os.path.exists(os.path.join(self.dldir, "npm2/content-type-1.0.4.tgz")))
+        self.assertFalse(os.path.exists(os.path.join(self.dldir, "npm2/content-type-1.0.4.tgz.done")))
         # Fetch with dev enabled
-        fetcher = bb.fetch.Fetch(["npmsw://" + swfile + ";dev=1"], self.d)
+        fetcher = bb.fetch.Fetch([f"npmsw://{filename};dev=1"], self.d)
         fetcher.download()
         self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz.done")))
         self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/content-type-1.0.4.tgz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/content-type-1.0.4.tgz.done")))
 
     @skipIfNoNetwork()
-    def test_npmsw_destsuffix(self):
-        swfile = self.create_shrinkwrap_file({
+    def test_npmsw_subdir(self):
+        filename = self.create_shrinkwrap_file({
             "packages": {
                 "node_modules/array-flatten": {
                     "version": "1.1.1",
@@ -2786,170 +2781,17 @@ class NPMSWTest(FetcherTest):
                 }
             }
         })
-        fetcher = bb.fetch.Fetch(["npmsw://" + swfile + ";destsuffix=foo/bar"], self.d)
+        fetcher = bb.fetch.Fetch([f"npmsw://{filename};subdir=foo/bar"], self.d)
         fetcher.download()
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz.done")))
         fetcher.unpack(self.unpackdir)
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "foo/bar/npm-shrinkwrap.json")))
         self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "foo/bar/node_modules/array-flatten/package.json")))
 
-    def test_npmsw_no_network_no_tarball(self):
-        swfile = self.create_shrinkwrap_file({
-            "packages": {
-                "node_modules/array-flatten": {
-                    "version": "1.1.1",
-                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
-                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
-                }
-            }
-        })
-        self.d.setVar("BB_NO_NETWORK", "1")
-        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
-        with self.assertRaises(bb.fetch2.NetworkAccess):
-            fetcher.download()
-
-    @skipIfNoNpm()
-    @skipIfNoNetwork()
-    def test_npmsw_no_network_with_tarball(self):
-        # Fetch once to get a tarball
-        fetcher = bb.fetch.Fetch(["npm://registry.npmjs.org;package=array-flatten;version=1.1.1"], self.d)
-        fetcher.download()
-        # Disable network access
-        self.d.setVar("BB_NO_NETWORK", "1")
-        # Fetch again
-        swfile = self.create_shrinkwrap_file({
-            "packages": {
-                "node_modules/array-flatten": {
-                    "version": "1.1.1",
-                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
-                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
-                }
-            }
-        })
-        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
-        fetcher.download()
-        fetcher.unpack(self.unpackdir)
-        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "node_modules/array-flatten/package.json")))
-
-    @skipIfNoNetwork()
-    def test_npmsw_npm_reusability(self):
-        # Fetch once with npmsw
-        swfile = self.create_shrinkwrap_file({
-            "packages": {
-                "node_modules/array-flatten": {
-                    "version": "1.1.1",
-                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
-                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
-                }
-            }
-        })
-        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
-        fetcher.download()
-        # Disable network access
-        self.d.setVar("BB_NO_NETWORK", "1")
-        # Fetch again with npm
-        fetcher = bb.fetch.Fetch(["npm://registry.npmjs.org;package=array-flatten;version=1.1.1"], self.d)
-        fetcher.download()
-        fetcher.unpack(self.unpackdir)
-        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "npm/package.json")))
-
-    @skipIfNoNetwork()
-    def test_npmsw_bad_checksum(self):
-        # Try to fetch with bad checksum
-        swfile = self.create_shrinkwrap_file({
-            "packages": {
-                "node_modules/array-flatten": {
-                    "version": "1.1.1",
-                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
-                    "integrity": "sha1-gfNEp2hqgLTFKT6P3AsBYMgsBqg="
-                }
-            }
-        })
-        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
-        with self.assertRaises(bb.fetch2.FetchError):
-            fetcher.download()
-        # Fetch correctly to get a tarball
-        swfile = self.create_shrinkwrap_file({
-            "packages": {
-                "node_modules/array-flatten": {
-                    "version": "1.1.1",
-                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
-                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
-                }
-            }
-        })
-        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
-        fetcher.download()
-        localpath = os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz")
-        self.assertTrue(os.path.exists(localpath))
-        # Modify the tarball
-        bad = b"bad checksum"
-        with open(localpath, "wb") as f:
-            f.write(bad)
-        # Verify that the tarball is fetched again
-        fetcher.download()
-        badsum = hashlib.sha1(bad).hexdigest()
-        self.assertTrue(os.path.exists(localpath + "_bad-checksum_" + badsum))
-        self.assertTrue(os.path.exists(localpath))
-
-    @skipIfNoNpm()
-    @skipIfNoNetwork()
-    def test_npmsw_premirrors(self):
-        # Fetch once to get a tarball
-        fetcher = bb.fetch.Fetch(["npm://registry.npmjs.org;package=array-flatten;version=1.1.1"], self.d)
-        ud = fetcher.ud[fetcher.urls[0]]
-        fetcher.download()
-        self.assertTrue(os.path.exists(ud.localpath))
-        # Setup the mirror
-        mirrordir = os.path.join(self.tempdir, "mirror")
-        bb.utils.mkdirhier(mirrordir)
-        os.replace(ud.localpath, os.path.join(mirrordir, os.path.basename(ud.localpath)))
-        self.d.setVar("PREMIRRORS", "https?$://.*/.* file://%s/" % mirrordir)
-        self.d.setVar("BB_FETCH_PREMIRRORONLY", "1")
-        # Fetch again
-        self.assertFalse(os.path.exists(ud.localpath))
-        swfile = self.create_shrinkwrap_file({
-            "packages": {
-                "node_modules/array-flatten": {
-                    "version": "1.1.1",
-                    "resolved": "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz",
-                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
-                }
-            }
-        })
-        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
-        fetcher.download()
-        self.assertTrue(os.path.exists(ud.localpath))
-
-    @skipIfNoNpm()
-    @skipIfNoNetwork()
-    def test_npmsw_mirrors(self):
-        # Fetch once to get a tarball
-        fetcher = bb.fetch.Fetch(["npm://registry.npmjs.org;package=array-flatten;version=1.1.1"], self.d)
-        ud = fetcher.ud[fetcher.urls[0]]
-        fetcher.download()
-        self.assertTrue(os.path.exists(ud.localpath))
-        # Setup the mirror
-        mirrordir = os.path.join(self.tempdir, "mirror")
-        bb.utils.mkdirhier(mirrordir)
-        os.replace(ud.localpath, os.path.join(mirrordir, os.path.basename(ud.localpath)))
-        self.d.setVar("MIRRORS", "https?$://.*/.* file://%s/" % mirrordir)
-        # Fetch again with invalid url
-        self.assertFalse(os.path.exists(ud.localpath))
-        swfile = self.create_shrinkwrap_file({
-            "packages": {
-                "node_modules/array-flatten": {
-                    "version": "1.1.1",
-                    "resolved": "https://invalid",
-                    "integrity": "sha1-ml9pkFGx5wczKPKgCJaLZOopVdI="
-                }
-            }
-        })
-        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
-        fetcher.download()
-        self.assertTrue(os.path.exists(ud.localpath))
-
     @skipIfNoNetwork()
     def test_npmsw_bundled(self):
-        swfile = self.create_shrinkwrap_file({
+        filename = self.create_shrinkwrap_file({
             "packages": {
                 "node_modules/array-flatten": {
                     "version": "1.1.1",
@@ -2964,10 +2806,43 @@ class NPMSWTest(FetcherTest):
                 }
             }
         })
-        fetcher = bb.fetch.Fetch(["npmsw://" + swfile], self.d)
+        fetcher = bb.fetch.Fetch([f"npmsw://{filename}"], self.d)
         fetcher.download()
         self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/array-flatten-1.1.1.tgz.done")))
         self.assertFalse(os.path.exists(os.path.join(self.dldir, "npm2/content-type-1.0.4.tgz")))
+        self.assertFalse(os.path.exists(os.path.join(self.dldir, "npm2/content-type-1.0.4.tgz.done")))
+
+    @skipIfNoNetwork()
+    def test_npmsw_git(self):
+        urls = [
+            "npmsw+git://github.com/karma-runner/karma.git;protocol=https;"
+            "rev=84f85e7016efc2266fa6b3465f494a3fa151c85c"
+        ]
+        fetcher = bb.fetch.Fetch(urls, self.d)
+        fetcher.download()
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "git2/github.com.karma-runner.karma.git")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/http-proxy-1.18.1.tgz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/http-proxy-1.18.1.tgz.done")))
+        fetcher.unpack(self.unpackdir)
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "git/package.json")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "git/node_modules/http-proxy")))
+
+    @skipIfNoNetwork()
+    def test_npmsw_https(self):
+        urls = [
+            "npmsw+https://github.com/karma-runner/karma/archive/refs/tags/v6.4.4.tar.gz;"
+            "striplevel=1;subdir=karma-6.4.4;"
+            "sha256sum=3cbd3b72da3b0b8aa650a90ac3a97aa5d0995ad2415989b9b8b59d09c460a6bc"
+        ]
+        fetcher = bb.fetch.Fetch(urls, self.d)
+        fetcher.download()
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "v6.4.4.tar.gz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "v6.4.4.tar.gz.done")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "npm2/http-proxy-1.18.1.tgz")))
+        fetcher.unpack(self.unpackdir)
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "karma-6.4.4/package.json")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "karma-6.4.4/node_modules/http-proxy")))
 
 class GitSharedTest(FetcherTest):
     def setUp(self):
-- 
2.39.5




* [RFC PATCH 18/21] fetch: add gosum fetcher
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (16 preceding siblings ...)
  2024-12-20 11:26 ` [RFC PATCH 17/21] tests: fetch: adapt npmsw test cases Stefan Herbrechtsmeier
@ 2024-12-20 11:26 ` Stefan Herbrechtsmeier
  2024-12-20 11:26 ` [RFC PATCH 19/21] tests: fetch: add test cases for gosum Stefan Herbrechtsmeier
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:26 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Add gosum fetcher to fetch dependencies via a go.sum file. The fetcher
uses the dependency mixin and supports different types:

gosum
    The fetcher uses a local go.sum file to fetch dependencies.

    SRC_URI = "gosum://go.sum"

gosum+https
    The fetcher downloads a go.sum file or an archive with a go.sum file
    in the root folder and uses the go.sum file to fetch dependencies.

    SRC_URI = "gosum+http://example.com/go.sum"
    SRC_URI = "gosum+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"

gosum+git
    The fetcher checks out a git repository with a go.sum file to fetch
    dependencies.

    SRC_URI = "gosum+git://example.com/${BPN}.git;protocol=https"

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/fetch2/__init__.py |  2 ++
 lib/bb/fetch2/gosum.py    | 51 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)
 create mode 100644 lib/bb/fetch2/gosum.py

diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
index 5dbc0598d..10b4cf24b 100644
--- a/lib/bb/fetch2/__init__.py
+++ b/lib/bb/fetch2/__init__.py
@@ -2116,6 +2116,7 @@ from . import az
 from . import crate
 from . import gcp
 from . import gomod
+from . import gosum
 
 methods.append(local.Local())
 methods.append(wget.Wget())
@@ -2140,3 +2141,4 @@ methods.append(gcp.GCP())
 methods.append(gomod.GoMod())
 methods.append(gomod.GoModGit())
 methods.extend(npmsw.methods)
+methods.extend(gosum.methods)
diff --git a/lib/bb/fetch2/gosum.py b/lib/bb/fetch2/gosum.py
new file mode 100644
index 000000000..001ef3304
--- /dev/null
+++ b/lib/bb/fetch2/gosum.py
@@ -0,0 +1,51 @@
+# Copyright (C) 2024-2025 Weidmueller Interface GmbH & Co. KG
+# Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
+#
+# SPDX-License-Identifier: MIT
+#
+"""
+BitBake 'Fetch' implementation for go.sum
+
+The gosum, gosum+https and gosum+git fetchers are used to download
+Go mod dependencies via a go.sum file.
+"""
+
+import os
+import bb
+import base64
+from bb.fetch2 import ParameterError
+from bb.fetch2 import URI
+from bb.fetch2.dependency import create_methods
+
+class GoSumMixin:
+    def resolve_dependencies(self, ud, localpath, d):
+        urls = []
+
+        def resolve_dependency(module_path, version, hash):
+            uri = URI(f"gomod://{module_path}")
+            if version.endswith("/go.mod"):
+                uri.params["version"] = version[:-7]
+                uri.params["mod"] = "1"
+            else:
+                uri.params["version"] = version
+            if hash.startswith("h1:"):
+                uri.params["h1sum"] = base64.b64decode(hash[3:]).hex()
+            else:
+                raise ParameterError(f"Invalid hash: {hash}", ud.url)
+            urls.append(str(uri))
+
+        if os.path.isdir(localpath):
+            localpath = os.path.join(localpath, "go.sum")
+        try:
+            with open(localpath, "r") as f:
+                for line in f:
+                    fields = line.strip().split()
+                    if len(fields) != 3:
+                        raise ParameterError(f"Invalid go.sum line: {line}", ud.url)
+                    resolve_dependency(*fields)
+        except Exception as e:
+            raise ParameterError(f"Invalid go.sum file: {str(e)}", ud.url)
+
+        return urls
+
+methods = create_methods("gosum", GoSumMixin)
-- 
2.39.5
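
For readers following the gosum patch above, the mapping from one go.sum line to the gomod URL parameters can be sketched outside of BitBake. This is a minimal illustration, not the patch itself: the function name gosum_line_to_params is hypothetical, it returns a plain dict instead of a bb.fetch2.URI, and it raises ValueError where the patch raises ParameterError.

```python
import base64


def gosum_line_to_params(line):
    """Map one go.sum line to (module_path, params) as the patch does for gomod:// URLs."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"Invalid go.sum line: {line!r}")
    module_path, version, digest = fields

    params = {}
    if version.endswith("/go.mod"):
        # The line describes only the go.mod file of the module.
        params["version"] = version[:-len("/go.mod")]
        params["mod"] = "1"
    else:
        params["version"] = version

    if not digest.startswith("h1:"):
        raise ValueError(f"Invalid hash: {digest}")
    # go.sum stores the hash as base64; the patch passes it on as hex.
    params["h1sum"] = base64.b64decode(digest[3:]).hex()
    return module_path, params


path, params = gosum_line_to_params(
    "gopkg.in/ini.v1 v1.67.0/go.mod h1:pNLf8WUiyNEtQjuu5G5vTm06TEv9tsIgeAvK8hOrP4k="
)
print(path, params)
```

The input line is taken from the gosum test case in this series. A module usually has two lines in go.sum, one for the zip archive and one for the go.mod file, so one module can yield two URLs.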




* [RFC PATCH 19/21] tests: fetch: add test cases for gosum
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (17 preceding siblings ...)
  2024-12-20 11:26 ` [RFC PATCH 18/21] fetch: add gosum fetcher Stefan Herbrechtsmeier
@ 2024-12-20 11:26 ` Stefan Herbrechtsmeier
  2024-12-20 11:26 ` [RFC PATCH 20/21] fetch: add cargolock fetcher Stefan Herbrechtsmeier
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:26 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/tests/fetch.py | 56 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
index 437571f1c..30c821a67 100644
--- a/lib/bb/tests/fetch.py
+++ b/lib/bb/tests/fetch.py
@@ -3439,3 +3439,59 @@ class DependencyTest(FetcherTest):
         self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "dummy/dummy.txt")))
         self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "dummy/bitbake-1.0")))
         self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "dummy/git")))
+
+class GoSumTest(FetcherTest):
+    def setUp(self):
+        super().setUp()
+        self.localsrcdir = os.path.join(self.tempdir, 'localsrc')
+        os.makedirs(self.localsrcdir)
+        self.d.setVar("FILESPATH", self.localsrcdir)
+
+    def create_go_sum_file(self, data):
+        filename = "go.sum"
+        with open(os.path.join(self.localsrcdir, filename), 'w') as f:
+            for module_path, version, hash in data:
+                f.write(f"{module_path} {version} {hash}\n")
+        return filename
+
+    @skipIfNoNetwork()
+    def test_gosum(self):
+        filename = self.create_go_sum_file([
+            (
+                "github.com/Azure/azure-sdk-for-go/sdk/storage/azblob",
+                "v1.0.0",
+                "h1:u/LLAOFgsMv7HmNL4Qufg58y+qElGOt5qv0z1mURkRY="
+            ), (
+                "gopkg.in/ini.v1",
+                "v1.67.0/go.mod",
+                "h1:pNLf8WUiyNEtQjuu5G5vTm06TEv9tsIgeAvK8hOrP4k="
+            )
+        ])
+        fetcher = bb.fetch.Fetch(["gosum://" + filename], self.d)
+        fetcher.download()
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "github.com/!azure/azure-sdk-for-go/sdk/storage/azblob/@v/v1.0.0.zip")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "github.com/!azure/azure-sdk-for-go/sdk/storage/azblob/@v/v1.0.0.zip.done")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "gopkg.in/ini.v1/@v/v1.67.0.mod")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "gopkg.in/ini.v1/@v/v1.67.0.mod.done")))
+        fetcher.unpack(self.unpackdir)
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "go.sum")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "pkg/mod/cache/download/github.com/!azure/azure-sdk-for-go/sdk/storage/azblob/@v/v1.0.0.zip")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "pkg/mod/cache/download/github.com/!azure/azure-sdk-for-go/sdk/storage/azblob/@v/v1.0.0.mod")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "pkg/mod/cache/download/gopkg.in/ini.v1/@v/v1.67.0.mod")))
+
+    @skipIfNoNetwork()
+    def test_gosum_git(self):
+        urls = [
+            "gosum+git://github.com/Azure/azure-sdk-for-go.git;protocol=https;"
+            "nobranch=1;subpath=sdk/storage/azblob;"
+            "rev=ec928e0ed34db682b3f783d3739d1c538142e0c3"
+        ]
+        fetcher = bb.fetch.Fetch(urls, self.d)
+        fetcher.download()
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "git2/github.com.Azure.azure-sdk-for-go.git")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "golang.org/x/net/@v/v0.0.0-20220425223048-2871e0cb64e4.zip")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "golang.org/x/net/@v/v0.0.0-20220425223048-2871e0cb64e4.zip.done")))
+        fetcher.unpack(self.unpackdir)
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "azblob/go.sum")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "azblob/pkg/mod/cache/download/golang.org/x/net/@v/v0.0.0-20220425223048-2871e0cb64e4.zip")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "azblob/pkg/mod/cache/download/golang.org/x/net/@v/v0.0.0-20220425223048-2871e0cb64e4.mod")))
-- 
2.39.5
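
The "!azure" component in the paths asserted by the gosum tests above comes from the Go module cache naming rule, which escapes every uppercase ASCII letter as "!" followed by the lowercase letter. The sketch below reproduces only that rule and the "@v" layout seen in the tests. The helper names are hypothetical, and how the gomod fetcher builds these paths internally is not shown in this part of the thread.

```python
def escape_go_path(text):
    """Replace each ASCII uppercase letter with '!' plus its lowercase form."""
    return "".join("!" + ch.lower() if "A" <= ch <= "Z" else ch for ch in text)


def cache_relpath(module_path, version, suffix):
    """Relative path of a module file below the download cache directory."""
    return f"{escape_go_path(module_path)}/@v/{escape_go_path(version)}.{suffix}"


print(cache_relpath("github.com/Azure/azure-sdk-for-go/sdk/storage/azblob", "v1.0.0", "zip"))
```

The escaping keeps module paths distinct on case-insensitive file systems.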




* [RFC PATCH 20/21] fetch: add cargolock fetcher
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (18 preceding siblings ...)
  2024-12-20 11:26 ` [RFC PATCH 19/21] tests: fetch: add test cases for gosum Stefan Herbrechtsmeier
@ 2024-12-20 11:26 ` Stefan Herbrechtsmeier
  2024-12-20 11:26 ` [RFC PATCH 21/21] tests: fetch: add test cases for cargolock Stefan Herbrechtsmeier
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:26 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Add cargolock fetcher to fetch dependencies via a cargo.lock file. The
fetcher uses the dependency mixin and supports different types:

cargolock
    The fetcher uses a local cargo.lock file to fetch dependencies.

    SRC_URI = "cargolock://cargo.lock"

cargolock+https
    The fetcher downloads a cargo.lock file or archive with a cargo.lock
    file in the root folder and uses the cargo.lock file to fetch
    dependencies.

    SRC_URI = "cargolock+http://example.com/cargo.lock"
    SRC_URI = "cargolock+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"

cargolock+git
    The fetcher checks out a git repository with a cargo.lock file to
    fetch dependencies.

    SRC_URI = "cargolock+git://example.com/${BPN}.git;protocol=https"

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
---

 lib/bb/fetch2/__init__.py  |  2 ++
 lib/bb/fetch2/cargolock.py | 73 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)
 create mode 100644 lib/bb/fetch2/cargolock.py

diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
index 10b4cf24b..012130ac6 100644
--- a/lib/bb/fetch2/__init__.py
+++ b/lib/bb/fetch2/__init__.py
@@ -2117,6 +2117,7 @@ from . import crate
 from . import gcp
 from . import gomod
 from . import gosum
+from . import cargolock
 
 methods.append(local.Local())
 methods.append(wget.Wget())
@@ -2142,3 +2143,4 @@ methods.append(gomod.GoMod())
 methods.append(gomod.GoModGit())
 methods.extend(npmsw.methods)
 methods.extend(gosum.methods)
+methods.extend(cargolock.methods)
diff --git a/lib/bb/fetch2/cargolock.py b/lib/bb/fetch2/cargolock.py
new file mode 100644
index 000000000..18df160ed
--- /dev/null
+++ b/lib/bb/fetch2/cargolock.py
@@ -0,0 +1,73 @@
+# Copyright (C) 2024-2025 Weidmueller Interface GmbH & Co. KG
+# Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
+#
+# SPDX-License-Identifier: MIT
+#
+"""
+BitBake 'Fetch' implementation for cargo.lock
+
+The cargolock, cargolock+https and cargolock+git fetchers are used to download
+Cargo dependencies via a cargo.lock file.
+"""
+
+import os
+import tomllib
+import bb
+from bb.fetch2 import ParameterError
+from bb.fetch2 import URI
+from bb.fetch2.dependency import create_methods
+
+class CargoLockMixin:
+    def resolve_dependencies(self, ud, localpath, d):
+        if os.path.isdir(localpath):
+            localpath = os.path.join(localpath, "Cargo.lock")
+        try:
+            with open(localpath, "rb") as f:
+                crates = tomllib.load(f)
+        except Exception as e:
+            raise ParameterError("Invalid Cargo lock file: %s" % str(e), ud.url)
+
+        urls = []
+        for crate in crates.get("package", []):
+            name = crate.get("name")
+            version = crate.get("version")
+            source = crate.get("source")
+
+            if not source:
+                continue
+
+            if source.startswith("registry"):
+                uri = URI(source[9:])
+                if (uri.scheme == "https" and uri.hostname == "github.com"
+                        and uri.path == "/rust-lang/crates.io-index"):
+                    uri.scheme = "crate"
+                    uri.hostname = "crates.io"
+                    uri.path = f"/{name}/{version}"
+                    uri.params["dn"] = name
+                    uri.params["dv"] = version
+                else:
+                    bb.warn(f"Please add support for the url to crate fetcher: {source}")
+                    uri.params["subdir"] = os.path.join("cargo_home", "bitbake")
+                uri.params["sha256sum"] = crate.get('checksum')
+                url = str(uri)
+
+            elif source.startswith("git"):
+                url, _, rev = source.partition("#")
+                uri = URI(url)
+                scheme, _, protocol = uri.scheme.partition("+")
+                if protocol:
+                    uri.params["protocol"] = protocol
+                    uri.scheme = scheme
+                uri.params["rev"] = rev
+                uri.params["nobranch"] = "1"
+                uri.params["destsuffix"] = f"{name}-{version}"
+                uri.params["subdir"] = os.path.join("cargo_home", "bitbake")
+                url = str(uri)
+            else:
+                raise ParameterError(f"Unsupported dependency: {name}", ud.url)
+
+            urls.append(url)
+
+        return urls
+
+methods = create_methods("cargolock", CargoLockMixin)
-- 
2.39.5




* [RFC PATCH 21/21] tests: fetch: add test cases for cargolock
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (19 preceding siblings ...)
  2024-12-20 11:26 ` [RFC PATCH 20/21] fetch: add cargolock fetcher Stefan Herbrechtsmeier
@ 2024-12-20 11:26 ` Stefan Herbrechtsmeier
  2024-12-23 10:03 ` [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Richard Purdie
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2024-12-20 11:26 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Stefan Herbrechtsmeier

From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>

---

 lib/bb/tests/fetch.py | 75 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
index 30c821a67..37146c087 100644
--- a/lib/bb/tests/fetch.py
+++ b/lib/bb/tests/fetch.py
@@ -3495,3 +3495,78 @@ class GoSumTest(FetcherTest):
         self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "azblob/go.sum")))
         self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "azblob/pkg/mod/cache/download/golang.org/x/net/@v/v0.0.0-20220425223048-2871e0cb64e4.zip")))
         self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "azblob/pkg/mod/cache/download/golang.org/x/net/@v/v0.0.0-20220425223048-2871e0cb64e4.mod")))
+
+class CargoLockTest(FetcherTest):
+    def setUp(self):
+        super().setUp()
+        self.localsrcdir = os.path.join(self.tempdir, "localsrc")
+        os.makedirs(self.localsrcdir)
+        self.d.setVar("FILESPATH", self.localsrcdir)
+
+    def create_cargo_lock_file(self, data):
+        import tomllib
+        filename = "Cargo.lock"
+        with open(os.path.join(self.localsrcdir, filename), "w") as f:
+            for package in data.get("package", []):
+                f.write("\n[[package]]\n")
+                for key in package.keys():
+                    f.write(f'{key} = "{package[key]}"\n')
+        return filename
+
+    @skipIfNoNetwork()
+    def test_cargolock(self):
+        filename = self.create_cargo_lock_file({
+            "package": [
+                {
+                    "name": "regex",
+                    "version": "1.4.0",
+                    "source": "registry+https://github.com/rust-lang/crates.io-index",
+                    "checksum": "36f45b719a674bf4b828ff318906d6c133264c793eff7a41e30074a45b5099e2"
+                }, {
+                    "name": "regex",
+                    "version": "1.5.0",
+                    "source": "git+https://github.com/rust-lang/regex.git#9f9f693768c584971a4d53bc3c586c33ed3a6831"
+                }
+            ]
+        })
+        fetcher = bb.fetch.Fetch([f"cargolock://{filename}"], self.d)
+        fetcher.download()
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "regex-1.4.0.crate")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "regex-1.4.0.crate.done")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "git2/github.com.rust-lang.regex.git")))
+        fetcher.unpack(self.unpackdir)
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "Cargo.lock")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "cargo_home/bitbake/regex-1.4.0/Cargo.toml")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "cargo_home/bitbake/regex-1.5.0/Cargo.toml")))
+
+    @skipIfNoNetwork()
+    def test_cargolock_https(self):
+        urls = [
+            "cargolock+https://download.gnome.org/sources/librsvg/2.58/librsvg-2.58.2.tar.xz;"
+            "striplevel=1;subdir=librsvg-2.58.2;"
+            "sha256sum=18e9d70c08cf25f50d610d6d5af571561d67cf4179f962e04266475df6e2e224"
+        ]
+        fetcher = bb.fetch.Fetch(urls, self.d)
+        fetcher.download()
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "librsvg-2.58.2.tar.xz")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "librsvg-2.58.2.tar.xz.done")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "adler-1.0.2.crate")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "adler-1.0.2.crate.done")))
+        fetcher.unpack(self.unpackdir)
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "librsvg-2.58.2/Cargo.toml")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "librsvg-2.58.2/cargo_home/bitbake/adler-1.0.2/Cargo.toml")))
+
+    @skipIfNoNetwork()
+    def test_cargolock_git(self):
+        urls = [
+            "cargolock+git://gitlab.gnome.org/GNOME/librsvg.git;protocol=https;"
+            "nobranch=1;rev=ef5c94d8362c35573d7eb651cf9a07c6df9df6da"
+        ]
+        fetcher = bb.fetch.Fetch(urls, self.d)
+        fetcher.download()
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "git2/gitlab.gnome.org.GNOME.librsvg.git")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "adler-1.0.2.crate")))
+        self.assertTrue(os.path.exists(os.path.join(self.dldir, "adler-1.0.2.crate.done")))
+        fetcher.unpack(self.unpackdir)
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "git/Cargo.toml")))
+        self.assertTrue(os.path.exists(os.path.join(self.unpackdir, "git/cargo_home/bitbake/adler-1.0.2/Cargo.toml")))
-- 
2.39.5




* Re: [bitbake-devel] [RFC PATCH 09/21] fetch2: add destdir to FetchData
  2024-12-20 11:26 ` [RFC PATCH 09/21] fetch2: add destdir to FetchData Stefan Herbrechtsmeier
@ 2024-12-23  9:56   ` Richard Purdie
  2025-01-02  8:04     ` Stefan Herbrechtsmeier
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Purdie @ 2024-12-23  9:56 UTC (permalink / raw)
  To: stefan.herbrechtsmeier-oss, bitbake-devel; +Cc: Stefan Herbrechtsmeier

On Fri, 2024-12-20 at 12:26 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
> 
> Add a `destdir` variable to the `FetchData` class to record destination
> directory of unpack method. Users of the `FetchData` class can use the
> directory to unpack additional content into the directory. The git
> fetcher class already records the destination directory in `destdir`
> class variable of `FetchData`.
> 
> Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
> ---
> 
>  lib/bb/fetch2/__init__.py | 3 +++
>  1 file changed, 3 insertions(+)

Where/how is this being used/needed?

unpackdir is only used during the do_unpack task and we've deliberately
kept this out of FetchData since in general it is meant to be
independent of the target location.

Cheers,

Richard



* Re: [bitbake-devel] [RFC PATCH 08/21] utils: add Go mod h1 checksum support
  2024-12-20 11:25 ` [RFC PATCH 08/21] utils: add Go mod h1 checksum support Stefan Herbrechtsmeier
@ 2024-12-23 10:01   ` Richard Purdie
  2025-01-02  8:27     ` Stefan Herbrechtsmeier
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Purdie @ 2024-12-23 10:01 UTC (permalink / raw)
  To: stefan.herbrechtsmeier-oss, bitbake-devel; +Cc: Stefan Herbrechtsmeier

On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
> 
> Add support for the Go mod h1 hash. The hash is
> based on the Go dirhash package. The package
> defines hashes over directory trees and is used
> for Go mod files and zip archives.
> 
> Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
> ---
> 
>  lib/bb/fetch2/__init__.py |  2 +-
>  lib/bb/utils.py           | 25 +++++++++++++++++++++++++
>  2 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
> index 7d8f71b20..0c2d6d73e 100644
> --- a/lib/bb/fetch2/__init__.py
> +++ b/lib/bb/fetch2/__init__.py
> @@ -34,7 +34,7 @@ _revisions_cache = bb.checksum.RevisionsCache()
>  
>  logger = logging.getLogger("BitBake.Fetcher")
>  
> -CHECKSUM_LIST = [ "md5", "sha256", "sha1", "sha384", "sha512" ]
> +CHECKSUM_LIST = [ "h1", "md5", "sha256", "sha1", "sha384", "sha512" ]
>  SHOWN_CHECKSUM_LIST = ["sha256"]
>  
>  class BBFetchException(Exception):
> 

The others are all generic checksum formats, so I'm wondering if we need
to indicate that this one is Go-specific, e.g. go-h1/goh1/go_h1?

Cheers,

Richard
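
For context on why h1 differs from the generic checksums: as far as Go's dirhash package documents it, the h1 value is a SHA-256 over a sorted summary of per-file SHA-256 digests, not over the raw bytes of one file. The sketch below follows that description. It is an assumption-based illustration and not the bb.utils code from the quoted patch, whose body is not shown here. The function name h1_hash and the in-memory file mapping are hypothetical.

```python
import base64
import hashlib


def h1_hash(files):
    """Compute an h1-style hash for a mapping of file name -> bytes content."""
    summary = hashlib.sha256()
    for name in sorted(files):
        if "\n" in name:
            raise ValueError("file names with newlines are not allowed")
        digest = hashlib.sha256(files[name]).hexdigest()
        # One summary line per file: hex digest, two spaces, file name.
        summary.update(f"{digest}  {name}\n".encode())
    return "h1:" + base64.b64encode(summary.digest()).decode()


print(h1_hash({"go.mod": b"module example.com/demo\n"}))
```

Because the hash covers file names and contents rather than an archive's byte stream, two zip files with identical contents but different compression produce the same value.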





* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (20 preceding siblings ...)
  2024-12-20 11:26 ` [RFC PATCH 21/21] tests: fetch: add test cases for cargolock Stefan Herbrechtsmeier
@ 2024-12-23 10:03 ` Richard Purdie
  2024-12-25 15:17   ` Alexander Kanavin
  2025-01-02  8:55   ` Stefan Herbrechtsmeier
  2025-01-06 11:04 ` Richard Purdie
                   ` (2 subsequent siblings)
  24 siblings, 2 replies; 66+ messages in thread
From: Richard Purdie @ 2024-12-23 10:03 UTC (permalink / raw)
  To: stefan.herbrechtsmeier-oss, bitbake-devel; +Cc: Stefan Herbrechtsmeier

On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
> 
> The patch series improves the fetcher support for tightly coupled
> package manager (npm, go and cargo). It adds support for embedded
> dependency fetcher via a common dependency mixin. The patch series
> reworks the npm-shrinkwrap.json (package-lock.json) support and adds a
> fetcher for go.sum and cargo.lock files. The dependency mixin contains
> two stages. The first stage locates a local specification file or
> fetches an archive or git repository with a specification file. The
> second stage resolves the dependency URLs from the specification file
> and fetches the dependencies.
> 
> SRC_URI = "<type>://npm-shrinkwrap.json"
> SRC_URI = "<type>+http://example.com/ npm-shrinkwrap.json"
> SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"
> SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https"
> 
> Additionally, the patch series reworks the npm fetcher to work without a
> npm binary and external package repository. It adds support for a common
> dependency name and version schema to integrate the dependencies into
> the SBOM.

This certainly sounds promising, thanks for working on it. It will take
me a bit of time to digest the changes.

A while back I was asked to document the constraints the fetchers
operate within and I documented this here:

https://git.yoctoproject.org/poky/tree/bitbake/lib/bb/fetch2/README

Would you be able to check if this work meets the criteria set out
there and if not, what the differences are?

Thanks,

Richard


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 07/21] fetch2: add unpack support for npm archives
  2024-12-20 11:25 ` [RFC PATCH 07/21] fetch2: add unpack support for npm archives Stefan Herbrechtsmeier
@ 2024-12-23 11:56   ` Richard Purdie
  2025-01-02 12:39     ` Stefan Herbrechtsmeier
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Purdie @ 2024-12-23 11:56 UTC (permalink / raw)
  To: stefan.herbrechtsmeier-oss, bitbake-devel; +Cc: Stefan Herbrechtsmeier

On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
> 
> Add unpack support for npm archives with unusual member ordering and
> disable warnings for unknown extended header keywords.
> 
> Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
> ---
> 
>  lib/bb/fetch2/__init__.py | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
> index 4b7c01d6a..7d8f71b20 100644
> --- a/lib/bb/fetch2/__init__.py
> +++ b/lib/bb/fetch2/__init__.py
> @@ -1535,6 +1535,7 @@ class FetchMethod(object):
>  
>          if unpack:
>              tar_cmd = 'tar --extract --no-same-owner'
> +            tar_cmd += ' --delay-directory-restore --warning=no-unknown-keyword'
>              if 'striplevel' in urldata.parm:
>                  tar_cmd += ' --strip-components=%s' %  urldata.parm['striplevel']
>              if file.endswith('.tar'):

I think I'd be happier if this option was only added for the npm fetcher...

Cheers,

Richard


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2024-12-23 10:03 ` [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Richard Purdie
@ 2024-12-25 15:17   ` Alexander Kanavin
  2025-01-06 14:42     ` Stefan Herbrechtsmeier
  2025-01-02  8:55   ` Stefan Herbrechtsmeier
  1 sibling, 1 reply; 66+ messages in thread
From: Alexander Kanavin @ 2024-12-25 15:17 UTC (permalink / raw)
  To: richard.purdie
  Cc: stefan.herbrechtsmeier-oss, bitbake-devel, Stefan Herbrechtsmeier

On Mon, 23 Dec 2024 at 11:03, Richard Purdie via
lists.openembedded.org
<richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote:
> Would you be able to check if this work meets the criteria set out
> there and if not, what the differences are?

I'd also add that this would benefit from a demonstration with one of
the real go/rust recipes in oe-core: basically it would be
good to push a branch of poky somewhere public, and provide
instructions on how to see the new fetchers in action, and observe
their benefits.

Alex


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 09/21] fetch2: add destdir to FetchData
  2024-12-23  9:56   ` [bitbake-devel] " Richard Purdie
@ 2025-01-02  8:04     ` Stefan Herbrechtsmeier
  0 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-02  8:04 UTC (permalink / raw)
  To: Richard Purdie, bitbake-devel; +Cc: Stefan Herbrechtsmeier

Am 23.12.2024 um 10:56 schrieb Richard Purdie:
> On Fri, 2024-12-20 at 12:26 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
>> From: Stefan Herbrechtsmeier<stefan.herbrechtsmeier@weidmueller.com>
>>
>> Add a `destdir` variable to the `FetchData` class to record destination
>> directory of unpack method. Users of the `FetchData` class can use the
>> directory to unpack additional content into the directory. The git
>> fetcher class already records the destination directory in `destdir`
>> class variable of `FetchData`.
>>
>> Signed-off-by: Stefan Herbrechtsmeier<stefan.herbrechtsmeier@weidmueller.com>
>> ---
>>
>>   lib/bb/fetch2/__init__.py | 3 +++
>>   1 file changed, 3 insertions(+)
> Where/how is this being used/needed?

The git fetcher already sets the variable. The variable is useful for
knowing the destination directory, including the subdir and destsuffix
parameters. Otherwise the user of unpack must handle the parameters
manually (see the S variable). The dependency mixin unpacks the main
source to process the dependency specification file and needs to know
the destination folder of the content.

> unpackdir is only used during the do_unpack task and we've deliberately
> kept this out of FetchData since in general it is meant to be
> independent of the target location.

The unpackdir depends on the subdir and destsuffix parameters of the
FetchData, and therefore the target location isn't independent of the
FetchData. Only the unpackdir root is independent of the FetchData.
Would it be acceptable to save the relative destination directory inside
the FetchData or to add a function to determine it? The function could
be used to eliminate code duplication between the generic and the
git-specific unpack.
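
To make the proposal concrete, here is a minimal sketch of such a helper.
It is an illustration only, not code from the series: the function name
relative_destdir and the default_destsuffix argument are hypothetical,
and the assumption is that subdir applies to every fetcher while
destsuffix (with a fetcher-specific default) applies to git.

```python
import os


def relative_destdir(parm, default_destsuffix=None):
    """Return the unpack destination relative to the unpack root.

    parm is the URL parameter dict; default_destsuffix is the fetcher's
    default (hypothetical argument, None for fetchers without one).
    """
    subdir = parm.get("subdir", "")
    if os.path.isabs(subdir):
        # An absolute subdir cannot be expressed relative to the root.
        raise ValueError("absolute subdir is not relative to the unpack root")
    destsuffix = parm.get("destsuffix", default_destsuffix or "")
    # normpath drops trailing slashes and collapses an empty result to "."
    return os.path.normpath(os.path.join(subdir, destsuffix))
```

The caller would join the result with the unpack root, so the FetchData
itself never needs to know the absolute target location.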

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 08/21] utils: add Go mod h1 checksum support
  2024-12-23 10:01   ` [bitbake-devel] " Richard Purdie
@ 2025-01-02  8:27     ` Stefan Herbrechtsmeier
  0 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-02  8:27 UTC (permalink / raw)
  To: Richard Purdie, bitbake-devel; +Cc: Stefan Herbrechtsmeier

Am 23.12.2024 um 11:01 schrieb Richard Purdie:
> On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
>> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
>>
>> Add support for the Go mod h1 hash. The hash is
>> based on the Go dirhash package. The package
>> defines hashes over directory trees and is used
>> for Go mod files and zip archives.
>>
>> Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
>> ---
>>
>>   lib/bb/fetch2/__init__.py |  2 +-
>>   lib/bb/utils.py           | 25 +++++++++++++++++++++++++
>>   2 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
>> index 7d8f71b20..0c2d6d73e 100644
>> --- a/lib/bb/fetch2/__init__.py
>> +++ b/lib/bb/fetch2/__init__.py
>> @@ -34,7 +34,7 @@ _revisions_cache = bb.checksum.RevisionsCache()
>>   
>>   logger = logging.getLogger("BitBake.Fetcher")
>>   
>> -CHECKSUM_LIST = [ "md5", "sha256", "sha1", "sha384", "sha512" ]
>> +CHECKSUM_LIST = [ "h1", "md5", "sha256", "sha1", "sha384", "sha512" ]
>>   SHOWN_CHECKSUM_LIST = ["sha256"]
>>   
>>   class BBFetchException(Exception):
>>
> The others are all generic checksum formats so I'm wondering if we need
> to indicate this one is go specific go-h1/goh1/go_h1?

It looks like the Go dirhash (h1) [1] is different from the dirhash
standard [2], so the name should be Go-specific. The SRC_URI parameters
use compound words without a separator (goh1sum). Which do you prefer:

SRC_URI[package_name-1.2.3.go-h1sum]
SRC_URI[package_name-1.2.3.goh1sum]
SRC_URI[package_name-1.2.3.go_h1sum]

[1] https://pkg.go.dev/golang.org/x/mod/sumdb/dirhash
[2] https://github.com/andhus/dirhash
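
For readers unfamiliar with the format, a minimal sketch of the h1
computation as described in the Go dirhash documentation [1]: each file
contributes one summary line (SHA-256 hex digest, two spaces, file name,
newline), the lines are ordered by file name, and the SHA-256 of that
summary is base64 encoded and prefixed with "h1:". The function name
go_h1 and the in-memory mapping are assumptions for illustration; this is
not the code from the patch.

```python
import base64
import hashlib


def go_h1(files):
    """Return the Go "h1:" hash for a mapping of file name -> bytes."""
    summary = hashlib.sha256()
    for name in sorted(files):
        if "\n" in name:
            raise ValueError("file names with newlines are not supported")
        digest = hashlib.sha256(files[name]).hexdigest()
        # One line per file: hex digest, two spaces, file name, newline.
        summary.update(f"{digest}  {name}\n".encode("utf-8"))
    return "h1:" + base64.b64encode(summary.digest()).decode("ascii")
```

Because the hash covers a sorted list of per-file digests rather than the
archive bytes, it cannot be verified with a plain sha256sum of the
downloaded file, which is one reason a Go-specific name seems sensible.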



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2024-12-23 10:03 ` [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Richard Purdie
  2024-12-25 15:17   ` Alexander Kanavin
@ 2025-01-02  8:55   ` Stefan Herbrechtsmeier
  2025-01-02  9:32     ` Richard Purdie
  1 sibling, 1 reply; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-02  8:55 UTC (permalink / raw)
  To: richard.purdie, bitbake-devel; +Cc: Stefan Herbrechtsmeier

Am 23.12.2024 um 11:03 schrieb Richard Purdie via lists.openembedded.org:
> On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
>> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
>>
>> The patch series improves the fetcher support for tightly coupled
>> package manager (npm, go and cargo). It adds support for embedded
>> dependency fetcher via a common dependency mixin. The patch series
>> reworks the npm-shrinkwrap.json (package-lock.json) support and adds a
>> fetcher for go.sum and cargo.lock files. The dependency mixin contains
>> two stages. The first stage locates a local specification file or
>> fetches an archive or git repository with a specification file. The
>> second stage resolves the dependency URLs from the specification file
>> and fetches the dependencies.
>>
>> SRC_URI = "<type>://npm-shrinkwrap.json"
>> SRC_URI = "<type>+http://example.com/ npm-shrinkwrap.json"
>> SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"
>> SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https"
>>
>> Additionally, the patch series reworks the npm fetcher to work without a
>> npm binary and external package repository. It adds support for a common
>> dependency name and version schema to integrate the dependencies into
>> the SBOM.
> This certainly sounds promising, thanks for working on it. It will take
> me a bit of time to digest the changes.
>
> A while back I was asked to document the constraints the fetchers
> operate within and I documented this here:
>
> https://git.yoctoproject.org/poky/tree/bitbake/lib/bb/fetch2/README
>
> Would you be able to check if this work meets the criteria set out
> there and if not, what the differences are?

The fetchers inherit from existing fetchers and reuse existing
functionality. The npm fetcher inherits from the wget fetcher and only
overrides the urldata_init and latest_versionstring functions. The
reworked urldata_init function preprocesses the URL and works without
internet access. The dependency mixin is inspired by the gitsm fetcher.
The cargolock, gosum and npmsw fetchers inherit from the local, wget and
git fetchers. They forward the function calls to the parent class, process
the dependency specification file and handle the dependencies if needed.
In doing so, the content of the specification file is translated into
source URLs for existing fetchers and saved inside a proxy object. The
user has to call the download function to download the main source with
the specification file and all dependencies. The dependencies are
downloaded via existing fetchers.

Because existing fetchers are reused, all criteria should be satisfied
by the new fetchers; any gap would need to be fixed inside the existing
fetchers.
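
The forwarding pattern described above can be sketched as follows. This
is a toy model under stated assumptions, not the classes from the series:
BaseFetcher, DependencyMixin, LockFileFetcher and the registry URL are
all hypothetical stand-ins, and the real fetchers operate on FetchData
objects rather than bare URLs.

```python
class BaseFetcher:
    """Stand-in for an existing fetcher such as wget or git (hypothetical)."""

    def __init__(self):
        self.downloaded = []

    def download(self, url):
        # A real fetcher would transfer the file into DL_DIR here.
        self.downloaded.append(url)


class DependencyMixin:
    """Two-stage fetch: main source first, then the resolved dependencies."""

    def resolve_dependencies(self, url):
        raise NotImplementedError

    def download(self, url):
        # Stage 1: forward to the parent fetcher for the main source.
        super().download(url)
        # Stage 2: translate the specification file into plain URLs and
        # hand each one to an existing fetcher.
        for dep_url in self.resolve_dependencies(url):
            BaseFetcher.download(self, dep_url)


class LockFileFetcher(DependencyMixin, BaseFetcher):
    """Hypothetical lock-file fetcher built from the mixin and a parent."""

    def __init__(self, lock_entries):
        super().__init__()
        self.lock_entries = lock_entries

    def resolve_dependencies(self, url):
        # Assumed URL schema, for illustration only.
        return [f"https://registry.example.com/{name}-{version}.tgz"
                for name, version in self.lock_entries]
```

The mixin never downloads anything itself; it only decides which URLs the
existing fetchers are asked to handle.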



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-02  8:55   ` Stefan Herbrechtsmeier
@ 2025-01-02  9:32     ` Richard Purdie
  2025-01-02 10:51       ` Stefan Herbrechtsmeier
  2025-01-02 13:50       ` Stefan Herbrechtsmeier
  0 siblings, 2 replies; 66+ messages in thread
From: Richard Purdie @ 2025-01-02  9:32 UTC (permalink / raw)
  To: Stefan Herbrechtsmeier, bitbake-devel; +Cc: Stefan Herbrechtsmeier

On Thu, 2025-01-02 at 09:55 +0100, Stefan Herbrechtsmeier wrote:
> Am 23.12.2024 um 11:03 schrieb Richard Purdie via lists.openembedded.org:
> > On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
> > > From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
> > > 
> > > The patch series improves the fetcher support for tightly coupled
> > > package manager (npm, go and cargo). It adds support for embedded
> > > dependency fetcher via a common dependency mixin. The patch series
> > > reworks the npm-shrinkwrap.json (package-lock.json) support and adds a
> > > fetcher for go.sum and cargo.lock files. The dependency mixin contains
> > > two stages. The first stage locates a local specification file or
> > > fetches an archive or git repository with a specification file. The
> > > second stage resolves the dependency URLs from the specification file
> > > and fetches the dependencies.
> > > 
> > > SRC_URI = "<type>://npm-shrinkwrap.json"
> > > SRC_URI = "<type>+http://example.com/ npm-shrinkwrap.json"
> > > SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"
> > > SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https"
> > > 
> > > Additionally, the patch series reworks the npm fetcher to work without a
> > > npm binary and external package repository. It adds support for a common
> > > dependency name and version schema to integrate the dependencies into
> > > the SBOM.
> > This certainly sounds promising, thanks for working on it. It will take
> > me a bit of time to digest the changes.
> > 
> > A while back I was asked to document the constraints the fetchers
> > operate within and I documented this here:
> > 
> > https://git.yoctoproject.org/poky/tree/bitbake/lib/bb/fetch2/README
> > 
> > Would you be able to check if this work meets the criteria set out
> > there and if not, what the differences are?
> 
> > The fetchers inherit from existing fetchers and reuse existing
> > functionality. The npm fetcher inherits from the wget fetcher and only
> > overrides the urldata_init and latest_versionstring functions. The
> > reworked urldata_init function preprocesses the URL and works without
> > internet access. The dependency mixin is inspired by the gitsm fetcher.
> > The cargolock, gosum and npmsw fetchers inherit from the local, wget and
> > git fetchers. They forward the function calls to the parent class, process
> > the dependency specification file and handle the dependencies if needed.
> > In doing so, the content of the specification file is translated into
> > source URLs for existing fetchers and saved inside a proxy object. The
> > user has to call the download function to download the main source with
> > the specification file and all dependencies. The dependencies are
> > downloaded via existing fetchers.
> > 
> > Because existing fetchers are reused, all criteria should be satisfied
> > by the new fetchers; any gap would need to be fixed inside the existing
> > fetchers.

Even if you forward everything to the parent API, there are ways you
could use it such that the parent class meets the criteria but the
derived one does not. I'm trying to aid the review process by asking
those questions, it will just take longer if I have to work this out
myself.

The other question I'm wondering about is compatibility and how we
change the way urls are working. Do these changes need a flag day where
recipes need to be updated to match? If so, how do we best handle that?
Is the user going to get errors they can easily fix or how is that
going to work?

Cheers,

Richard



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-02  9:32     ` Richard Purdie
@ 2025-01-02 10:51       ` Stefan Herbrechtsmeier
  2025-01-02 13:50       ` Stefan Herbrechtsmeier
  1 sibling, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-02 10:51 UTC (permalink / raw)
  To: Richard Purdie, bitbake-devel; +Cc: Stefan Herbrechtsmeier

Am 02.01.2025 um 10:32 schrieb Richard Purdie:
> On Thu, 2025-01-02 at 09:55 +0100, Stefan Herbrechtsmeier wrote:
>> Am 23.12.2024 um 11:03 schrieb Richard Purdie via lists.openembedded.org:
>>> On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
>>>> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
>>>>
>>>> The patch series improves the fetcher support for tightly coupled
>>>> package manager (npm, go and cargo). It adds support for embedded
>>>> dependency fetcher via a common dependency mixin. The patch series
>>>> reworks the npm-shrinkwrap.json (package-lock.json) support and adds a
>>>> fetcher for go.sum and cargo.lock files. The dependency mixin contains
>>>> two stages. The first stage locates a local specification file or
>>>> fetches an archive or git repository with a specification file. The
>>>> second stage resolves the dependency URLs from the specification file
>>>> and fetches the dependencies.
>>>>
>>>> SRC_URI = "<type>://npm-shrinkwrap.json"
>>>> SRC_URI = "<type>+http://example.com/ npm-shrinkwrap.json"
>>>> SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"
>>>> SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https"
>>>>
>>>> Additionally, the patch series reworks the npm fetcher to work without a
>>>> npm binary and external package repository. It adds support for a common
>>>> dependency name and version schema to integrate the dependencies into
>>>> the SBOM.
>>> This certainly sounds promising, thanks for working on it. It will take
>>> me a bit of time to digest the changes.
>>>
>>> A while back I was asked to document the constraints the fetchers
>>> operate within and I documented this here:
>>>
>>> https://git.yoctoproject.org/poky/tree/bitbake/lib/bb/fetch2/README
>>>
>>> Would you be able to check if this work meets the criteria set out
>>> there and if not, what the differences are?
>> The fetchers inherit from existing fetchers and reuse existing
>> functionality. The npm fetcher inherits from the wget fetcher and only
>> overrides the urldata_init and latest_versionstring functions. The
>> reworked urldata_init function preprocesses the URL and works without
>> internet access. The dependency mixin is inspired by the gitsm fetcher.
>> The cargolock, gosum and npmsw fetchers inherit from the local, wget and
>> git fetchers. They forward the function calls to the parent class, process
>> the dependency specification file and handle the dependencies if needed.
>> In doing so, the content of the specification file is translated into
>> source URLs for existing fetchers and saved inside a proxy object. The
>> user has to call the download function to download the main source with
>> the specification file and all dependencies. The dependencies are
>> downloaded via existing fetchers.
>>
>> Because existing fetchers are reused, all criteria should be satisfied
>> by the new fetchers; any gap would need to be fixed inside the existing
>> fetchers.
> Even if you forward everything to the parent API, there are ways you
> could use it such that the parent class meets the criteria but the
> derived one does not. I'm trying to aid the review process by asking
> those questions, it will just take longer if I have to work this out
> myself.

I don't see a reason why the fetchers shouldn't meet the constraints. 
The fetchers were designed to fulfill the constraints.

> The other question I'm wondering about is compatibility and how we
> change the way urls are working. Do these changes need a flag day where
> recipes need to be updated to match? If so, how do we best handle that?
> Is the user going to get errors they can easily fix or how is that
> going to work?

The fetchers should be backward compatible for recipes. I added warnings
that propose the desired changes:

Parameter 'package' in '<url>' is deprecated. Please use 'dn' parameter 
instead.
Parameter 'version' in '<url>' is deprecated. Please use 'dv' parameter 
instead.
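
A minimal sketch of how such a backward-compatible rename could work. The
warning text and the parameter names 'package', 'version', 'dn' and 'dv'
are from the messages above; the function normalize_params, the logger
name and the rule that an explicit new-style value wins are assumptions
for illustration, not the code from the series.

```python
import logging

logger = logging.getLogger("example.fetcher")

# Old parameter name -> new parameter name, as proposed in the warnings.
DEPRECATED_PARAMS = {"package": "dn", "version": "dv"}


def normalize_params(url, params):
    """Return a copy of params with deprecated names mapped to new ones."""
    result = dict(params)
    for old, new in DEPRECATED_PARAMS.items():
        if old in result:
            logger.warning(
                "Parameter '%s' in '%s' is deprecated. Please use '%s' "
                "parameter instead.", old, url, new)
            # Keep an explicitly given new-style value if both are present.
            result.setdefault(new, result[old])
            del result[old]
    return result
```

Existing recipes keep working with a warning, while the rest of the
fetcher only ever sees the new names.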

If we reach an agreement on a common schema for repository host,
package name and package version, I will update the crate and gomod
fetchers in a backward compatible way too. Hopefully most users will
switch to the new fetchers instead of updating their own tools to
generate recipes and include files.

The only intended incompatible changes are the removal of the old
npm-shrinkwrap.json format and of the support for the "latest" version in
the npm fetcher. All upstream supported npm versions support the new
format, and the user can update the package lock file via npm. Like
AUTOREV, the "latest" version for npm leads to many problems, and its
usefulness should be very low.

I left the functions which are used by oe-core in the npm and npmsw
fetchers. They should be moved into oe-core and afterwards removed from
the fetchers.



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 07/21] fetch2: add unpack support for npm archives
  2024-12-23 11:56   ` [bitbake-devel] " Richard Purdie
@ 2025-01-02 12:39     ` Stefan Herbrechtsmeier
  2025-01-02 13:59       ` Richard Purdie
  0 siblings, 1 reply; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-02 12:39 UTC (permalink / raw)
  To: Richard Purdie, bitbake-devel; +Cc: Stefan Herbrechtsmeier

Am 23.12.2024 um 12:56 schrieb Richard Purdie:
> On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
>> From: Stefan Herbrechtsmeier<stefan.herbrechtsmeier@weidmueller.com>
>>
>> Add unpack support for npm archives with unusual member ordering and
>> disable warnings for unknown extended header keywords.
>>
>> Signed-off-by: Stefan Herbrechtsmeier<stefan.herbrechtsmeier@weidmueller.com>
>> ---
>>
>>   lib/bb/fetch2/__init__.py | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
>> index 4b7c01d6a..7d8f71b20 100644
>> --- a/lib/bb/fetch2/__init__.py
>> +++ b/lib/bb/fetch2/__init__.py
>> @@ -1535,6 +1535,7 @@ class FetchMethod(object):
>>   
>>           if unpack:
>>               tar_cmd = 'tar --extract --no-same-owner'
>> +            tar_cmd += ' --delay-directory-restore --warning=no-unknown-keyword'
>>               if 'striplevel' in urldata.parm:
>>                   tar_cmd += ' --strip-components=%s' %  urldata.parm['striplevel']
>>               if file.endswith('.tar'):
> I think I'd be happier if this option was only added for the npm fetcher...

Can I check ud.type to add the args, or should I move the args to an
optional argument of the unpack method or a FetchMethod variable?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-02  9:32     ` Richard Purdie
  2025-01-02 10:51       ` Stefan Herbrechtsmeier
@ 2025-01-02 13:50       ` Stefan Herbrechtsmeier
  2025-01-02 14:07         ` Richard Purdie
  1 sibling, 1 reply; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-02 13:50 UTC (permalink / raw)
  To: Richard Purdie, bitbake-devel; +Cc: Stefan Herbrechtsmeier


Am 02.01.2025 um 10:32 schrieb Richard Purdie:
> On Thu, 2025-01-02 at 09:55 +0100, Stefan Herbrechtsmeier wrote:
>> Am 23.12.2024 um 11:03 schrieb Richard Purdie via lists.openembedded.org:
>>> On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
>>>> From: Stefan Herbrechtsmeier<stefan.herbrechtsmeier@weidmueller.com>
>>>>
>>>> The patch series improves the fetcher support for tightly coupled
>>>> package manager (npm, go and cargo). It adds support for embedded
>>>> dependency fetcher via a common dependency mixin. The patch series
>>>> reworks the npm-shrinkwrap.json (package-lock.json) support and adds a
>>>> fetcher for go.sum and cargo.lock files. The dependency mixin contains
>>>> two stages. The first stage locates a local specification file or
>>>> fetches an archive or git repository with a specification file. The
>>>> second stage resolves the dependency URLs from the specification file
>>>> and fetches the dependencies.
>>>>
>>>> SRC_URI = "<type>://npm-shrinkwrap.json"
>>>> SRC_URI = "<type>+http://example.com/ npm-shrinkwrap.json"
>>>> SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"
>>>> SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https"
>>>>
>>>> Additionally, the patch series reworks the npm fetcher to work without a
>>>> npm binary and external package repository. It adds support for a common
>>>> dependency name and version schema to integrate the dependencies into
>>>> the SBOM.

[SNIP]

> I'm trying to aid the review process by asking
> those questions, it will just take longer if I have to work this out
> myself.

Maybe we can discuss some design decisions without code to
simplify the review:

The dependency fetcher needs to know the path of the dependency
specification file. In the case of the local fetcher, the path is the URI.
In the case of the git fetcher, the path depends on the subdir and
destsuffix parameters. In the case of the wget fetcher, the path is
unknown. This series requires the parameters striplevel=1 and
subdir=${BP} to work. Additionally, it doesn't support specification
files inside subdirectories. Therefore I plan to add a srcdir parameter.
Should this parameter be mandatory for the wget fetcher, or should the
fetcher use the PN or S variable to determine a default value?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 07/21] fetch2: add unpack support for npm archives
  2025-01-02 12:39     ` Stefan Herbrechtsmeier
@ 2025-01-02 13:59       ` Richard Purdie
  0 siblings, 0 replies; 66+ messages in thread
From: Richard Purdie @ 2025-01-02 13:59 UTC (permalink / raw)
  To: Stefan Herbrechtsmeier, bitbake-devel; +Cc: Stefan Herbrechtsmeier

On Thu, 2025-01-02 at 13:39 +0100, Stefan Herbrechtsmeier wrote:
>  
> Am 23.12.2024 um 12:56 schrieb Richard Purdie:
>  
>  
> >  
> > On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via
> > lists.openembedded.org wrote:
> >  
> > >  
> > > From: Stefan Herbrechtsmeier
> > > <stefan.herbrechtsmeier@weidmueller.com>
> > > 
> > > Add unpack support for npm archives with unusual member ordering
> > > and
> > > disable warnings for unknown extended header keywords.
> > > 
> > > Signed-off-by: Stefan Herbrechtsmeier
> > > <stefan.herbrechtsmeier@weidmueller.com>
> > > ---
> > > 
> > >  lib/bb/fetch2/__init__.py | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > > diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
> > > index 4b7c01d6a..7d8f71b20 100644
> > > --- a/lib/bb/fetch2/__init__.py
> > > +++ b/lib/bb/fetch2/__init__.py
> > > @@ -1535,6 +1535,7 @@ class FetchMethod(object):
> > >  
> > >          if unpack:
> > >              tar_cmd = 'tar --extract --no-same-owner'
> > > > +            tar_cmd += ' --delay-directory-restore --warning=no-unknown-keyword'
> > > >              if 'striplevel' in urldata.parm:
> > > >                  tar_cmd += ' --strip-components=%s' %  urldata.parm['striplevel']
> > >              if file.endswith('.tar'):
> > >  
> >  
> > I think I'd be happier if this option was only added for the npm
> > fetcher...
> 
> Can I check the ud.type to add the args or should I move the args to
> an optional argument for the unpack method or an FetchMethod
> variable?

I suspect a FetchMethod variable would be most in keeping with the rest
of the code. We've tried very hard not to fill the code with "if
ud.type" conditionals as we did that in fetch v1 and it became
unreadable and unmaintainable. This way it forces us to have some kind
of thought and methods/variables to document things.
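
A minimal sketch of the FetchMethod-variable approach, as opposed to an
"if ud.type" check. The tar options are the ones from the patch; the
attribute name tar_extra_args, the tar_command helper and the simplified
classes are hypothetical and only illustrate the idea.

```python
class FetchMethod:
    """Simplified stand-in for the fetcher base class."""

    # Hypothetical class variable: extra tar options used during unpack.
    tar_extra_args = []

    def tar_command(self, parm):
        cmd = ["tar", "--extract", "--no-same-owner"]
        # Subclasses contribute options without any type check here.
        cmd += self.tar_extra_args
        if "striplevel" in parm:
            cmd.append("--strip-components=%s" % parm["striplevel"])
        return cmd


class Npm(FetchMethod):
    # npm archives may have unusual member ordering and unknown
    # extended header keywords.
    tar_extra_args = ["--delay-directory-restore",
                      "--warning=no-unknown-keyword"]
```

The generic unpack code stays free of fetcher names, and the variable
itself documents which fetchers need special handling.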

Cheers,

Richard


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-02 13:50       ` Stefan Herbrechtsmeier
@ 2025-01-02 14:07         ` Richard Purdie
  2025-01-02 15:11           ` Stefan Herbrechtsmeier
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Purdie @ 2025-01-02 14:07 UTC (permalink / raw)
  To: Stefan Herbrechtsmeier, bitbake-devel; +Cc: Stefan Herbrechtsmeier

On Thu, 2025-01-02 at 14:50 +0100, Stefan Herbrechtsmeier wrote:
> > 
> > I'm trying to aid the review process by asking
> > those questions, it will just take longer if I have to work this
> > out
> > myself.
> >  
> Maybe we are able to discuss some design decision without code to
> simplify the review:
> 
>  The dependency fetcher needs to know the path of the dependency
> specification file. In case of the local fetcher the path is the uri.
> In case of the git fetcher the path depends on the subdir and
> destsuffix parameter. In case of the wget the path is unknown. This
> series requires the parameters striplevel=1 and subdir=${BP} to work.
> Additionally it doesn't support specification files inside sub
> directories. Therefore I plan to add a srcdir parameter. Should this
> parameter be mandatory for the wget fetcher or should the fetcher use
> the PN or S variable to determine a default value?

do_fetch never touches ${S}, it would only touch ${DL_DIR}. For that
reason, ${S} is passed as a parameter to do_unpack and is only
referenced at that time. 

wget shouldn't need more information than it already has in the current
code to have a default, so something isn't adding up.

I'm still not sure why you'd need both a subdir and srcdir but I think
I need to think about this more deeply, FWIW I'm technically still on
vacation.

Cheers,

Richard




^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-02 14:07         ` Richard Purdie
@ 2025-01-02 15:11           ` Stefan Herbrechtsmeier
  0 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-02 15:11 UTC (permalink / raw)
  To: Richard Purdie, bitbake-devel; +Cc: Stefan Herbrechtsmeier

Am 02.01.2025 um 15:07 schrieb Richard Purdie:
> On Thu, 2025-01-02 at 14:50 +0100, Stefan Herbrechtsmeier wrote:
>>> I'm trying to aid the review process by asking
>>> those questions, it will just take longer if I have to work this
>>> out
>>> myself.
>>>   
>> Maybe we are able to discuss some design decision without code to
>> simplify the review:
>>
>>   The dependency fetcher needs to know the path of the dependency
>> specification file. In case of the local fetcher the path is the uri.
>> In case of the git fetcher the path depends on the subdir and
>> destsuffix parameter. In case of the wget the path is unknown. This
>> series requires the parameters striplevel=1 and subdir=${BP} to work.
>> Additionally it doesn't support specification files inside sub
>> directories. Therefore I plan to add a srcdir parameter. Should this
>> parameter be mandatory for the wget fetcher or should the fetcher use
>> the PN or S variable to determine a default value?
> do_fetch never touches ${S}, it would only touch ${DL_DIR}. For that
> reason, ${S} is passed as a parameter to do_unpack and is only
> referenced at that time.

do_unpack uses ${UNPACKDIR}, not ${S}. ${S} points to the main folder of
the source, which in most cases is the main folder unpacked from the
archive.

> wget shouldn't need more information to have a default it already has
> in the current code so something isn't adding up.

The additional information is needed by the dependency resolution, not
by wget. The dependency resolution needs to temporarily unpack the archive
to read the dependency specification file inside the archive.

> I'm still not sure why you'd need both a subdir and srcdir but I think
> I need to think about this more deeply, FWIW I'm technically still on
> vacation.

The subdir is used to place the archive content inside an arbitrary
folder. This is only needed if you have to place one source inside
another source. The srcdir is needed to know the path of the specification
file inside the archive. The main folder inside an archive is archive
specific. Therefore the S variable uses a common default
(${WORKDIR}/${BP}), and it is common to override the variable because the
archive uses another folder name. Additionally, the specification file
could be located inside a sub folder of the archive.

librsvg:
SRC_URI = "cargolock+${GNOME_MIRROR}/${GNOMEBN}/${@gnome_verdir("${PV}")}/${GNOMEBN}-${PV}.tar.${GNOME_COMPRESS_TYPE};name=archive;srcdir=${BP}"
--> librsvg-2.58.2/Cargo.lock

python3-bcrypt:
SRC_URI = "cargolock+${@pypi_src_uri(d)};srcdir=${PYPI_PACKAGE}-${PV}/${CARGO_SRC_DIR}"
S = "${WORKDIR}/${PYPI_PACKAGE}-${PV}"
CARGO_SRC_DIR = "src/_bcrypt"
--> bcrypt-4.2.1/src/_bcrypt/Cargo.lock

Maybe the srcdir should be named specdir. Or specsuffix if its default 
value is ${BN}.
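
To make the distinction concrete, here is a minimal Python sketch of how the
path of the specification file inside the archive could be derived from such
a parameter. The function name spec_path and its arguments are hypothetical
and not part of the series; only the two resulting paths are taken from the
examples above.

```python
import posixpath


def spec_path(srcdir, spec_name):
    """Return the archive-relative path of the specification file.

    srcdir is the (hypothetical) URL parameter naming the folder inside
    the archive; spec_name is the lock file name, e.g. Cargo.lock.
    """
    path = posixpath.normpath(posixpath.join(srcdir, spec_name))
    # Reject absolute paths and anything that escapes the archive root.
    if path.startswith("/") or path == ".." or path.startswith("../"):
        raise ValueError("srcdir must stay inside the archive: %r" % srcdir)
    return path


print(spec_path("librsvg-2.58.2", "Cargo.lock"))
print(spec_path("bcrypt-4.2.1/src/_bcrypt", "Cargo.lock"))
```

An empty srcdir would simply yield the file name, which corresponds to an
archive without a main folder.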


* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (21 preceding siblings ...)
  2024-12-23 10:03 ` [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Richard Purdie
@ 2025-01-06 11:04 ` Richard Purdie
  2025-01-06 14:35   ` Stefan Herbrechtsmeier
  2025-01-09 11:53 ` Martin Jansa
       [not found] ` <1812DEFF37B8C65E.26783@lists.openembedded.org>
  24 siblings, 1 reply; 66+ messages in thread
From: Richard Purdie @ 2025-01-06 11:04 UTC (permalink / raw)
  To: stefan.herbrechtsmeier-oss, bitbake-devel; +Cc: Stefan Herbrechtsmeier

On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
> 
> The patch series improves the fetcher support for tightly coupled
> package managers (npm, go and cargo). It adds support for embedded
> dependency fetchers via a common dependency mixin. The patch series
> reworks the npm-shrinkwrap.json (package-lock.json) support and adds a
> fetcher for go.sum and cargo.lock files. The dependency mixin contains
> two stages. The first stage locates a local specification file or
> fetches an archive or git repository with a specification file. The
> second stage resolves the dependency URLs from the specification file
> and fetches the dependencies.
> 
> SRC_URI = "<type>://npm-shrinkwrap.json"
> SRC_URI = "<type>+http://example.com/npm-shrinkwrap.json"
> SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"
> SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https"
> 
> Additionally, the patch series reworks the npm fetcher to work without a
> npm binary and external package repository. It adds support for a common
> dependency name and version schema to integrate the dependencies into
> the SBOM.
> 
> = Background
> Bitbake has diverse concepts and drawbacks for different tightly coupled
> package managers. The Python support uses a recipe per dependency and
> generates common fetcher URLs via a python function. The other languages
> embed the dependencies inside the recipe. The Node.js support offers a
> npmsw fetcher which uses a lock file beside the recipe to generate
> multiple common fetcher URLs on the fly and thereby hides the real
> download sources. This leads to a single source in the SBOM for example.
> The Go support contains two parallel implementations: a vendor-based
> solution with a common fetcher and a go-mod-based solution with a gomod
> fetcher. The vendor-based solution includes the individual dependencies
> in the SRC_URI of the recipe and uses a python function to generate
> common fetcher URLs with additional information for the vendor task. The
> gomod fetcher uses a proprietary gomod URL. It translates the URL into a
> common URL and prepares meta data during unpack. The Rust support
> includes the individual dependencies in the SRC_URI of the recipe and
> uses proprietary crate URLs. The crate fetcher translates a proprietary
> URL into a common fetcher URL and prepares meta data during unpack. The
> recipetool does not support the crate and the gomod fetcher. This leads
> to missing licenses of the dependencies in the recipe, for example
> librsvg.
> 
> The steps needed to fetch dependencies for Node.js, Go and Rust are
> similar:
> 1. Extract the dependencies from a specification file (name, version,
>    checksum and URL)
> 2. Generate proprietary fetcher URIs
>   a. npm://registry.npmjs.org/;package=glob;version=10.3.15
>   b. gomod://golang.org/x/net;version=v0.9.0
>      gomodgit://golang.org/x/net;version=v0.9.0;repo=go.googlesource.com/net
>   c. crate://crates.io/glob/0.3.1
> 3. Generate wget or git fetcher URIs
>   a. https://registry.npmjs.org/glob/-/glob-10.3.15.tgz;downloadfilename=…
>   b. https://proxy.golang.org/golang.org/x/net/@v/v0.9.0.zip;downloadfilename=…
>      git://go.googlesource.com/net;protocol=https;subdir=…
>   c. https://crates.io/api/v1/crates/glob/0.3.1/download;downloadfilename=…
> 4. Unpack
> 5. Create meta files
>   a. Update lockfile and create tar.gz archives
>   b. Create go.mod file
>      Create info, go.mod file and zip archives
>   c. Create .cargo-checksum.json files
> 
> It looks like the recipetool is not widely used and therefore this patch
> series integrates the dependency resolving into the fetcher. After an
> agreement on a concept the fetcher could be extended. The fetcher could
> download the license information per package and a new build task could
> run the license cruncher from the recipetool.

I've spent a bit more time thinking about this and looking at the code
and I've mixed feelings on it. I can certainly see why you've
implemented it this way and it does have a lot of potential but there
are also potential risks. My comments (on various elements):

With a npm-shrinkwrap.json/package-lock.json/go.sum file, are
dependencies always recorded as specific entities with checksums? I'm a
little bit worried about how easily you could sneak a "floating"
version into this and make the fetcher non-deterministic. Does (or
could?) the code detect and error on that?

Put another way, could one of these SRC_URIs map to multiple different
combinations of underlying component versions?

Our existing method effectively hardcodes/expands the lock file into
extended SRC_URI entries which makes the specific versions and
components really clear. This change abstracts that away into the
fetcher and makes it opaque to the user, and much harder for code like
the archiver/license/spdx code to find/handle.

I noticed that any fetcher operation has to first expand the lock file
using a temporary directory. You're using DL_DIR for that which I
suspect isn't a great idea for tmp files. In many cases that will work
fine but it is a bit of a performance overhead.

I did start wondering if we should cache the lock files in a subdir of
DL_DIR to help performance and also give some extra assurance about
changing content.

The url scheme is clever but also has a potential risk in that you
can't really pass parameters to both the top level fetcher and the
underlying one. I'm worried that is going to bite us further down the
line.

> = Open questions
> 
> * Where should we download dependencies?
> ** Should we use a folder per fetcher (ex. git and npm)?
> ** Should we use the main folder (ex. crate)?
> ** Should we translate the name into folder (ex. gomod)?
> ** Should we integrate the name into the filename (ex. git)?

DL_DIR is meant to be a complete cache of the source so it would need
to be downloaded there. Given it maps to the other fetchers, the
existing cache mechanisms likely work for these just fine, the open
question is on whether the lock/spec files should be cached after
extraction.

> * Where should we unpack the dependencies?
> ** Should we use a folder inside the parent folder (ex. node_modules)?
> ** Should we use a fixed folder inside unpackdir
>    (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?

This likely depends on the fetcher as the different mechanisms will
have different expectations about how they should be extracted (as
npm/etc. would).

> * How should we treat archives for package manager caches?
> ** Should we unpack the archives to support patching (ex. npm)?
> ** Should we copy the packed archive to avoid unpacking and packaging
>    (ex. gomod)?

If there are archives left after do_unpack, which task is going to
unpack those? Are we expecting the build process in configure/compile
to decompress them? Would those management tools accept things if they
were extracted earlier? "unpack" would be the correct time to do it but
I can see this getting into conflict with the package manager :/.

> This patch series depends on patch series
> 20241209103158.20833-1-stefan.herbrechtsmeier-oss@weidmueller.com
> ("[1/4] tests: fetch: adapt npmsw tests to fixed unpack behavior").

Those merged thanks. I did wonder if patches 1-5 of this series could
be merged separately too as they look reasonable regardless of the rest
of the series?

Cheers,

Richard


* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-06 11:04 ` Richard Purdie
@ 2025-01-06 14:35   ` Stefan Herbrechtsmeier
  2025-01-06 15:30     ` Richard Purdie
  0 siblings, 1 reply; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-06 14:35 UTC (permalink / raw)
  To: Richard Purdie, bitbake-devel; +Cc: Stefan Herbrechtsmeier


Am 06.01.2025 um 12:04 schrieb Richard Purdie:
> On Fri, 2024-12-20 at 12:25 +0100, Stefan Herbrechtsmeier via lists.openembedded.org wrote:
>> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
>>
>> The patch series improves the fetcher support for tightly coupled
>> package managers (npm, go and cargo). It adds support for embedded
>> dependency fetchers via a common dependency mixin. The patch series
>> reworks the npm-shrinkwrap.json (package-lock.json) support and adds a
>> fetcher for go.sum and cargo.lock files. The dependency mixin contains
>> two stages. The first stage locates a local specification file or
>> fetches an archive or git repository with a specification file. The
>> second stage resolves the dependency URLs from the specification file
>> and fetches the dependencies.
>>
>> SRC_URI = "<type>://npm-shrinkwrap.json"
>> SRC_URI = "<type>+http://example.com/npm-shrinkwrap.json"
>> SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"
>> SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https"
>>
>> Additionally, the patch series reworks the npm fetcher to work without a
>> npm binary and external package repository. It adds support for a common
>> dependency name and version schema to integrate the dependencies into
>> the SBOM.
>>
>> = Background
>> Bitbake has diverse concepts and drawbacks for different tightly coupled
>> package managers. The Python support uses a recipe per dependency and
>> generates common fetcher URLs via a python function. The other languages
>> embed the dependencies inside the recipe. The Node.js support offers a
>> npmsw fetcher which uses a lock file beside the recipe to generate
>> multiple common fetcher URLs on the fly and thereby hides the real
>> download sources. This leads to a single source in the SBOM for example.
>> The Go support contains two parallel implementations: a vendor-based
>> solution with a common fetcher and a go-mod-based solution with a gomod
>> fetcher. The vendor-based solution includes the individual dependencies
>> in the SRC_URI of the recipe and uses a python function to generate
>> common fetcher URLs with additional information for the vendor task. The
>> gomod fetcher uses a proprietary gomod URL. It translates the URL into a
>> common URL and prepares meta data during unpack. The Rust support
>> includes the individual dependencies in the SRC_URI of the recipe and
>> uses proprietary crate URLs. The crate fetcher translates a proprietary
>> URL into a common fetcher URL and prepares meta data during unpack. The
>> recipetool does not support the crate and the gomod fetcher. This leads
>> to missing licenses of the dependencies in the recipe, for example
>> librsvg.
>>
>> The steps needed to fetch dependencies for Node.js, Go and Rust are
>> similar:
>> 1. Extract the dependencies from a specification file (name, version,
>>     checksum and URL)
>> 2. Generate proprietary fetcher URIs
>>    a. npm://registry.npmjs.org/;package=glob;version=10.3.15
>>    b. gomod://golang.org/x/net;version=v0.9.0
>>       gomodgit://golang.org/x/net;version=v0.9.0;repo=go.googlesource.com/net
>>    c. crate://crates.io/glob/0.3.1
>> 3. Generate wget or git fetcher URIs
>>    a. https://registry.npmjs.org/glob/-/glob-10.3.15.tgz;downloadfilename=…
>>    b. https://proxy.golang.org/golang.org/x/net/@v/v0.9.0.zip;downloadfilename=…
>>       git://go.googlesource.com/net;protocol=https;subdir=…
>>    c. https://crates.io/api/v1/crates/glob/0.3.1/download;downloadfilename=…
>> 4. Unpack
>> 5. Create meta files
>>    a. Update lockfile and create tar.gz archives
>>    b. Create go.mod file
>>       Create info, go.mod file and zip archives
>>    c. Create .cargo-checksum.json files
>>
>> It looks like the recipetool is not widely used and therefore this patch
>> series integrates the dependency resolving into the fetcher. After an
>> agreement on a concept the fetcher could be extended. The fetcher could
>> download the license information per package and a new build task could
>> run the license cruncher from the recipetool.
> I've spent a bit more time thinking about this and looking at the code
> and I've mixed feelings on it. I can certainly see why you've
> implemented it this way and it does have a lot of potential but there
> are also potential risks.

Thank you very much for your feedback.

> My comments (on various elements):
>
> With a npm-shrinkwrap.json/package-lock.json/go.sum file, are
> dependencies always recorded as specific entities with checksums?

Yes, every dependency contains a fixed version and a checksum. The 
purpose of the file is integrity and reproducibility.

>   I'm a
> little bit worried about how easily you could sneak a "floating"
> version into this and make the fetcher non-deterministic. Does (or
> could?) the code detect and error on that?

We could raise an error if a checksum is missing in the dependency
specification file or make the checksum mandatory for the dependency
fetcher. Furthermore, we could inspect the dependency URLs to detect a
misuse of the file, such as a "latest" string for the version.
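
A minimal Python sketch of such a check for a package-lock.json in lockfile
version 2 or 3 could look like the following. It is an illustration under
assumptions, not the code of the series: the function name
check_lock_packages, the version pattern and the sample data (including the
placeholder checksum and the registry.example.com URL) are made up. Matching
the version against the resolved URL is only a heuristic and would need
refinement for git dependencies.

```python
import json
import re

# Heuristic for a concrete version such as 10.3.15 (not a full semver parser).
PINNED_VERSION = re.compile(r"^\d+\.\d+\.\d+")


def check_lock_packages(lock):
    """Return a list of problems found in a parsed package-lock.json."""
    problems = []
    for path, entry in lock.get("packages", {}).items():
        # "" is the root project; links and bundled packages are not fetched.
        if path == "" or entry.get("link") or entry.get("inBundle"):
            continue
        version = entry.get("version", "")
        if not PINNED_VERSION.match(version):
            problems.append("%s: version %r is not pinned" % (path, version))
        if not entry.get("integrity"):
            problems.append("%s: missing integrity checksum" % path)
        if version and version not in entry.get("resolved", ""):
            problems.append("%s: resolved URL does not contain version %r"
                            % (path, version))
    return problems


example = json.loads("""
{
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "demo", "version": "1.0.0"},
    "node_modules/glob": {
      "version": "10.3.15",
      "resolved": "https://registry.npmjs.org/glob/-/glob-10.3.15.tgz",
      "integrity": "sha512-PLACEHOLDER"
    },
    "node_modules/floating": {
      "version": "latest",
      "resolved": "https://registry.example.com/floating/latest"
    }
  }
}
""")

for problem in check_lock_packages(example):
    print(problem)
```

A fetcher could turn a non-empty result into a fetch error instead of
printing it.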

> Put another way, could one of these SRC_URIs map to multiple different
> combinations of underlying component versions?

If you mean the extracted SRC_URI for a single dependency from the
dependency specification file (ex. npm-shrinkwrap.json), it could use
special URLs to map to the latest version. But this is a misuse of the
dependency specification file and could be detected. The tools always
generate files with fixed versions because a floating version with a fixed
checksum makes no sense.

> Our existing method effectively hardcodes/expands the lock file into
> extended SRC_URI entries which makes the specific versions and
> components really clear. This change abstracts that away into the
> fetcher and makes it opaque to the user, and much harder for code like
> the archiver/license/spdx code to find/handle.

Really? Let's use the crate fetcher as an example. At the moment the
cargo-update-recipe-crates class extracts the URI and checksum from the
Cargo.lock. The class ignores the licenses, and this leads to missing
licenses in the recipe. The spdx files contain only bitbake-specific
fetcher URLs, which are unknown outside of bitbake.

I also thought it would make sense to generate recipes from the
dependency specification files and therefore worked on the recipetool
previously. But it looks like the tool isn't really used, and I'm afraid
nobody will use the recipe to fix dependencies. In most cases it is easy
to update a dependency in the native tooling and only provide an updated
dependency specification file.

I have a WIP to integrate the dependencies into the spdx. This uses
the expanded_urldata / implicit_urldata functions to add the dependencies
to the process list of the archiver and spdx.

https://github.com/weidmueller/poky/tree/feature/dependency-fetcher

Regarding the license we could migrate the functionality from recipetool 
into a class and detect the licenses at build time. Theoretically the 
fetcher could fetch the license from the package manager repository but 
we have to trust the repository because we have no checksum to detect 
changes. Maybe we could integrate tools like Syft or ScanCode to detect 
the licenses at build time. At the moment the best solution is to make 
sure that the SBOM contains the name and version of the dependencies and 
let other tools handle the license via SBOM for now. Therefore I propose 
a common scheme to define the dependency name (dn) and version (dv) in 
the SRC_URI.
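
As a rough illustration of that scheme, the following Python sketch attaches
dn and dv parameters to the crates.io download URL from the list above and
reads them back for the SBOM. The helper names crate_src_uri and
dependency_identity are hypothetical, and the exact parameter handling in the
series may differ.

```python
def crate_src_uri(name, version):
    """Build a wget-style URL for a crate with dn/dv parameters attached."""
    url = "https://crates.io/api/v1/crates/%s/%s/download" % (name, version)
    return "%s;dn=%s;dv=%s" % (url, name, version)


def dependency_identity(src_uri):
    """Return (name, version) from the dn/dv parameters of a SRC_URI entry."""
    params = dict(p.split("=", 1) for p in src_uri.split(";")[1:] if "=" in p)
    return params.get("dn"), params.get("dv")


uri = crate_src_uri("glob", "0.3.1")
print(uri)
print(dependency_identity(uri))
```

The point of the scheme is that an SBOM generator only has to look at two
parameters, independent of the package manager behind the URL.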

> I noticed that any fetcher operation has to first expand the lock file
> using a temporary directory.
I followed gitsm and am open to suggestions. The expansion happens only
once per fetcher object. The sub fetcher object is saved in the proxy
variable.

> You're using DL_DIR for that which I
> suspect isn't a great idea for tmp files.
Taken over from gitsm.

>   In many cases that will work
> fine but it is a bit of a performance overhead.
>
> I did start wondering if we should cache the lock files in a subdir of
> DL_DIR to help performance and also give some extra assurance about
> changing content.
This would be possible. I assume the best would be another sub SRC_URI 
to avoid code duplication for the locking and change detection.

> The url scheme is clever but also has a potential risk in that you
> can't really pass parameters to both the top level fetcher and the
> underlying one. I'm worried that is going to bite us further down the
> line.

At the moment I don't see a real problem, but maybe you are right. The
existing language-specific fetchers use fixed paths for their downloads.

What do you propose? Should the fetcher skip the unpack of the source, or
should we introduce a sub fetcher which uses the download from another
SRC_URI entry? The two entries could be linked via the name parameter.
This approach could be combined with your suggestion above. The new
fetcher would unpack a lock file from another (default) download.

>> = Open questions
>>
>> * Where should we download dependencies?
>> ** Should we use a folder per fetcher (ex. git and npm)?
>> ** Should we use the main folder (ex. crate)?
>> ** Should we translate the name into folder (ex. gomod)?
>> ** Should we integrate the name into the filename (ex. git)?
> DL_DIR is meant to be a complete cache of the source so it would need
> to be downloaded there. Given it maps to the other fetchers, the
> existing cache mechanisms likely work for these just fine, the open
> question is on whether the lock/spec files should be cached after
> extraction.

You misunderstood the question. It is about the downloadfilename
parameter. At the moment some fetchers use a sub folder inside DL_DIR and
others use the main folder. It looks like every fetcher has its own
concept to handle file collisions between different fetchers. The git and
npm fetchers use their own folder, the crate fetcher uses its own .crate
file suffix, the gomod fetcher translates the URL into multiple folders,
and the git fetcher translates the URL into a single folder name.

>> * Where should we unpack the dependencies?
>> ** Should we use a folder inside the parent folder (ex. node_modules)?
>> ** Should we use a fixed folder inside unpackdir
>>     (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
> This likely depends on the fetcher as the different mechanisms will
> have different expectations about how they should be extracted (as
> npm/etc. would).

It depends on the fetcher, but the fetchers could use the same approach.
At the moment every fetcher uses a different approach. The crate fetcher
uses a fixed value. The gomod fetcher uses a variable (GO_MOD_CACHE_DIR)
and the npm fetcher uses a parameter (destsuffix). Furthermore, the gomod
fetcher overrides the common subdir parameter.

>> * How should we treat archives for package manager caches?
>> ** Should we unpack the archives to support patching (ex. npm)?
>> ** Should we copy the packed archive to avoid unpacking and packaging
>>     (ex. gomod)?
> If there are archives left after do_unpack, which task is going to
> unpack those? Are we expecting the build process in configure/compile
> to decompress them? Would those management tools accept things if they
> were extracted earlier? "unpack" would be the correct time to do it but
> I can see this getting into conflict with the package manager :/.

Most package managers expect archives. In the npm case the archive is
unpacked by the fetcher and repacked by the npm.bbclass to support
patching. The gomod fetcher doesn't unpack the downloaded archive, and the
gomodgit fetcher creates archives from git folders during unpack. It would
be possible to always keep the archives or to always extract the archives
and recreate them during build. It is a trade-off between performance and
patchability.

At the moment it is complicated to work with the different fetchers
because every fetcher uses a different concept and it is unclear which
approach is desired.

>> This patch series depends on patch series
>> 20241209103158.20833-1-stefan.herbrechtsmeier-oss@weidmueller.com
>> ("[1/4] tests: fetch: adapt npmsw tests to fixed unpack behavior").
> Those merged thanks.

Thanks.

> I did wonder if patches 1-5 of this series could
> be merged separately too as they look reasonable regardless of the rest
> of the series?

Sure. Should I resend the patches as separate series?

Regards
   Stefan



* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2024-12-25 15:17   ` Alexander Kanavin
@ 2025-01-06 14:42     ` Stefan Herbrechtsmeier
  2025-01-09 10:40       ` Alexander Kanavin
       [not found]       ` <18190013516DD62F.1999@lists.openembedded.org>
  0 siblings, 2 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-06 14:42 UTC (permalink / raw)
  To: Alexander Kanavin, richard.purdie; +Cc: bitbake-devel, Stefan Herbrechtsmeier

Am 25.12.2024 um 16:17 schrieb Alexander Kanavin:
> On Mon, 23 Dec 2024 at 11:03, Richard Purdie via
> lists.openembedded.org
> <richard.purdie=linuxfoundation.org@lists.openembedded.org> wrote:
>> Would you be able to check if this work meets the criteria set out
>> there and if not, what the differences are?
> I'd also add that this would benefit from a demonstration with one of
> the real go/rust recipes in oe-core: basically it would be
> good to push a branch of poky somewhere public, and provide
> instructions on how to see the new fetchers in action, and observe
> their benefits.

https://github.com/yoctoproject/poky/compare/master...weidmueller:poky:feature/dependency-fetcher

I have migrated the crate recipes to the new fetcher and improved the spdx
2.2 class to include the name and version of the crate dependencies.

You have to inherit the create-spdx-2.2 class and build the librsvg
recipe to test the new fetcher.




* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-06 14:35   ` Stefan Herbrechtsmeier
@ 2025-01-06 15:30     ` Richard Purdie
  2025-01-07  9:47       ` Stefan Herbrechtsmeier
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Purdie @ 2025-01-06 15:30 UTC (permalink / raw)
  To: Stefan Herbrechtsmeier, bitbake-devel; +Cc: Stefan Herbrechtsmeier

On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote:
> Am 06.01.2025 um 12:04 schrieb Richard Purdie:
> 
> > 
> > My comments (on various elements):
> > 
> > With a npm-shrinkwrap.json/package-lock.json/go.sum file, are
> > dependencies always recorded as specific entities with checksums?
> >   
> Yes, every dependency contains a fixed version and a checksum. The
> purpose of the file is integrity and reproducibility.

Thanks for confirming, that hasn't always been the case for some of
these package management systems!

> >  I'm a
> > little bit worried about how easily you could sneak a "floating"
> > version into this and make the fetcher non-deterministic. Does (or
> > could?) the code detect and error on that? 
> 
> We could raise an error if a checksum is missing in the dependency
> specification file or make the checksum mandatory for the dependency
> fetcher.  Furthermore we could inspect the dependency URLs to detect
> a misuse of the file like a latest string for the version.


I think adding such an error would be a requirement for merging this.

> > Put another way, could one of these SRC_URIs map to multiple
> > different
> > combinations of underlying component versions?
> 
> If you mean the extracted SRC_URI for a single dependency from the
> dependency specification file (ex. npm-shrinkwrap.json) it could use
> special URLs to map to the latest version. But this is a misuse of
> the dependency specification file and could be detected. The tools
> always generate files with fixed versions because a floating version
> with a fixed checksum makes no sense.

Even if it shouldn't happen, we need to detect and error for this case
as it would become very problematic for us.

> > Our existing method effectively hardcodes/expands the lock file
> > into
> > extended SRC_URI entries which makes the specific versions and
> > components really clear. This change abstracts that away into the
> > fetcher and makes it opaque to the user, and much harder for code
> > like
> > the archiver/license/spdx code to find/handle.
> 
> Really? Let's use the crate fetcher as an example. At the moment the
> cargo-update-recipe-crates class extracts the URI and checksum from
> the Cargo.lock. The class ignores the licenses and this leads to
> missing licenses in the recipe. The spdx files contain bitbake
> specific fetcher URLs only which are unknown outside of bitbake.

I guess what I'm trying to say is that people generally easily
understand the explicit expanded urls. Whilst that class does ignore
license handling, the hope was that it would get added, it is certainly
possible to fix that.

>  I also thought it would make sense to generate recipes from the
> dependency specification files and therefore worked on the recipetool
> previously. But it looks like the tool isn't really used and I'm afraid
> nobody will use the recipe to fix dependencies. In most cases it is
> easy to update a dependency in the native tooling and only provide an
> updated dependency specification file.

I think people have wanted a single simple command to translate the
specification file into our recipe format to update the recipe. For
various reasons people didn't seem to find the recipetool approach was
working and created the task workflow based one. There are pros and
cons to both and I don't have a strong preference. I would like to see
something which makes it clear to users what is going on though and is
simple to use.

People do intuitively understand a .inc file with a list of urls in it.
There are challenges in updating it.

This other approach is not as intuitive as everything is abstracted out
of sight.

One thing for example which worries me is how are the license fields in
the recipe going to be updated?

Currently, if we teach the class, it can set LICENSE variables
appropriately. With the new approach, you don't know the licenses until
after unpack has run. Yes it can write it into the SPDX, but it won't
work for something like the layer index or forms of analysis which
don't build things.

This does also extend to vulnerability analysis since we can't know
what is in a given recipe without actually unpacking it. For example we
could know crate XXX at version YYY has a CVE but we can't tell if a
recipe uses that crate until after do_unpack, or at least not without
expandurl. 

>  I have a WIP to integrate the dependencies into the spdx. This
> uses the expanded_urldata / implicit_urldata functions to add the
> dependencies to the process list of archiver and spdx.
>  
> https://github.com/weidmueller/poky/tree/feature/dependency-fetcher
> 
> Regarding the license we could migrate the functionality from
> recipetool into a class and detect the licenses at build time.
> Theoretically the fetcher could fetch the license from the package
> manager repository but we have to trust the repository because we
> have no checksum to detect changes. Maybe we could integrate tools
> like Syft or ScanCode to detect the licenses at build time. At the
> moment the best solution is to make sure that the SBOM contains the
> name and version of the dependencies and let other tools handle the
> license via SBOM for now. Therefore I propose a common scheme to
> define the dependency name (dn) and version (dv) in the SRC_URI.

We could compare what licenses the package manager is showing us with
what is in the recipe and error if different. There would then need to
be a command to update the licenses in the recipe (in much the way urls
currently get updated).
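
As a rough sketch of that comparison in Python, assuming both sides have
already been reduced to flat lists of license identifiers (a real LICENSE
value is an expression with operators, which this ignores). The function
name compare_licenses and the sample identifiers are only for illustration.

```python
def compare_licenses(recipe_licenses, reported_licenses):
    """Return (missing, stale) license identifiers as sorted lists.

    missing: reported by the package manager but absent from the recipe.
    stale: listed in the recipe but no longer reported.
    """
    recipe = set(recipe_licenses)
    reported = set(reported_licenses)
    return sorted(reported - recipe), sorted(recipe - reported)


missing, stale = compare_licenses(["MIT", "Apache-2.0"],
                                  ["MIT", "BSD-3-Clause"])
if missing or stale:
    print("license mismatch: missing=%s stale=%s" % (missing, stale))
```

An update command would then only need to rewrite the recipe from the
reported list when either result is non-empty.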

> >  
> > I noticed that any fetcher operation has to first expand the lock
> > file
> > using a temporary directory.
> >  
>  I followed gitsm and am open to suggestions. The expand happens only
> once per fetcher object. The sub fetcher object is saved in the proxy
> variable.

That fetcher object has to be recreated in every task or task context
using the fetcher.

> >  
> > You're using DL_DIR for that which I
> > suspect isn't a great idea for tmp files.
> >  
>  Taken over from gitsm.

Probably not the best fetcher and I'd say gitsm should be fixed.

> >  
> >  In many cases that will work
> > fine but it is a bit of a performance overhead.
> > 
> > I did start wondering if we should cache the lock files in a subdir
> > of DL_DIR to help performance and also give some extra assurance
> > about changing content.
> >  
>  This would be possible. I assume the best would be another sub
> SRC_URI to avoid code duplication for the locking and change
> detection.

Probably, I did wonder if the mixin could cover that
abstraction/caching.

> >  
> > The url scheme is clever but also has a potential risk in that you
> > can't really pass parameters to both the top level fetcher and the
> > underlying one. I'm worried that is going to bite us further down
> > the line.
> >  
>  
> At the moment I don't see a real problem but maybe you are right. The
> existing language specific fetchers use fixed paths for their
> downloads.
>  
>  What do you propose? Should the fetcher skip the unpack of the
> source or should we introduce a sub fetcher which uses the download
> from another SRC_URI entry? The two entries could be linked via the
> name parameter. This approach could be combined with your suggestion
> above. The new fetcher would unpack a lock file from another
> (default) download.
>  
I'm not really sure what is best right now. I'm trying to spell out the
pros/cons of what is going on here in the hope it encourages others to
give feedback as well. I agree there isn't a problem right now but I
worry there soon will be by mixing two things together like this. The
way we handle git protocol does cause us friction with other urls
schemes already.

> > > = Open questions
> > > 
> > > * Where should we download dependencies?
> > > ** Should we use a folder per fetcher (ex. git and npm)?
> > > ** Should we use the main folder (ex. crate)?
> > > ** Should we translate the name into folder (ex. gomod)?
> > > ** Should we integrate the name into the filename (ex. git)?
> > >  
> >  
> > DL_DIR is meant to be a complete cache of the source so it would
> > need to be downloaded there. Given it maps to the other fetchers, the
> > existing cache mechanisms likely work for these just fine, the open
> > question is on whether the lock/spec files should be cached after
> > extraction.
> >  
>  
> You misunderstand the question. It's about the downloadfilename
> parameter. At the moment some fetchers use a sub folder inside DL_DIR
> and others use the main folder. It looks like every fetcher has its
> own concept to handle file collisions between different fetchers. The
> git and npm fetchers use their own folder, the crate fetcher uses its
> own .crate file prefix, the gomod fetcher translates the URL into
> multiple folders and the git fetcher translates the URL into a single
> folder name.

That makes more sense. The layout is partially legacy. The wget and
local fetchers were first and hence go directly into DL_DIR. git/svn
were separated out into their own directories with a plan to have a
directory per fetcher. That didn't always work out with each newer
fetcher. Each fetcher does have to handle a unique naming of its urls
as only the specific fetcher can know all the urls parameters and which
ones affect the output vs which ones don't.


> > > * Where should we unpack the dependencies?
> > > ** Should we use a folder inside the parent folder (ex.
> > > node_modules)?
> > > ** Should we use a fixed folder inside unpackdir
> > >    (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
> > >  
> >  
> > This likely depends on the fetcher as the different mechanisms will
> > have different expectations about how they should be extracted (as
> > npm/etc. would).
> >  
>  
> It depends on the fetcher but the fetchers could use the same
> approach. At the moment every fetcher uses a different approach. The
> crate fetcher uses a fixed value. The gomod fetcher uses a variable
> (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix).
> Furthermore the gomod fetcher overrides the common subdir parameter.

I think we really need to standardise that if we can. Each new fetcher
has claimed a certain approach is effectively required by the package
manager.

>  
> > > * How should we treat archives for package manager caches?
> > > ** Should we unpack the archives to support patching (ex. npm)?
> > > ** Should we copy the packed archive to avoid unpacking and
> > > packaging
> > >    (ex. gomod)?
> > >  
> >  
> > If there are archives left after do_unpack, which task is going to
> > unpack those? Are we expecting the build process in
> > configure/compile to decompress them? Would those management tools
> > accept things if they were extracted earlier? "unpack" would be the
> > correct time to do it but I can see this getting into conflict with
> > the package manager :/.
> >  
>  
> Most package managers expect archives. In the npm case the archive is
> unpacked by the fetcher and packed by the npm.bbclass to support
> patching. The gomod fetcher doesn't unpack the downloaded archive and
> the gomodgit fetcher creates archives from git folders during unpack.
> It would be possible to always keep the archives or always extract
> the archives and recreate archives during build. It is a decision
> between performance and patchability.
>  
>  At the moment it is complicated to work with the different fetchers
> because every fetcher uses a different concept and it is unclear what
> the desired approach is.

This is a challenge. Can we handle the unpacking with the package
manager as a specific step or does it have to be combined with other
steps like configure/compile?

> >  
> > I did wonder if patches 1-5 of this series could
> > be merged separately too as they look reasonable regardless of the
> > rest
> > of the series?
> >  
>  
> Sure. Should I resend the patches as separate series?

Yes please, that would then let us remove the bits we can easily
review/sort and focus on this other part.

Cheers,

Richard




* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-06 15:30     ` Richard Purdie
@ 2025-01-07  9:47       ` Stefan Herbrechtsmeier
  2025-01-07 11:01         ` Richard Purdie
  0 siblings, 1 reply; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-07  9:47 UTC (permalink / raw)
  To: Richard Purdie, bitbake-devel; +Cc: Stefan Herbrechtsmeier


On 06.01.2025 at 16:30, Richard Purdie wrote:
> On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote:
>> On 06.01.2025 at 12:04, Richard Purdie wrote:
>>
>>> My comments (on various elements):
>>>
>>> With a npm-shrinkwrap.json/package-lock.json/go.sum file, are
>>> dependencies always recorded as specific entities with checksums?
>>>    
>> Yes, every dependency contains a fixed version and a checksum. The
>> purpose of the file is integrity and reproducibility.
> Thanks for confirming, that hasn't always been the case for some of
> these package management systems!
>
>>>   I'm a
>>> little bit worried about how easily you could sneak a "floating"
>>> version into this and make the fetcher non-deterministic. Does (or
>>> could?) the code detect and error on that?
>> We could raise an error if a checksum is missing in the dependency
>> specification file or make the checksum mandatory for the dependency
>> fetcher.  Furthermore we could inspect the dependency URLs to detect
>> a misuse of the file like a latest string for the version.
>
> I think adding such an error would be a requirement for merging this.

Should the dependency fetcher (ex. npmsw) or the language specific 
fetcher (ex. npm) fail if the version points to a latest version?
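
Wherever the check ends up, it could look roughly like the sketch below.
It assumes the "packages" map of a package-lock.json / npm-shrinkwrap.json
in lockfile version 2 or 3 with its "version", "integrity" and "link"
fields. The function name and the version pattern are my assumptions and
not part of this series. Entries without an integrity value (ex. git
dependencies) are reported as well and would need their own rule:

```python
import json
import re

# Accept exact versions only, e.g. "1.2.3" or "1.2.3-beta.1".
# Ranges ("^1.2.3") and tags ("latest") do not match.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.+-]+)?$")


def find_unpinned(lockfile_text):
    """Return messages for entries without a fixed version or a checksum."""
    data = json.loads(lockfile_text)
    errors = []
    for path, entry in sorted(data.get("packages", {}).items()):
        if path == "" or entry.get("link"):
            continue  # skip the root project and symlinked entries
        version = entry.get("version", "")
        if not EXACT_VERSION.match(version):
            errors.append(f"{path}: version {version!r} is not fixed")
        if not entry.get("integrity"):
            errors.append(f"{path}: missing integrity checksum")
    return errors


# Hypothetical lock file with one good and one floating entry.
example = json.dumps({
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "app", "version": "1.0.0"},
        "node_modules/a": {
            "version": "1.2.3",
            "resolved": "https://registry.npmjs.org/a/-/a-1.2.3.tgz",
            "integrity": "sha512-AAAA",
        },
        "node_modules/b": {
            "version": "latest",
            "resolved": "https://example.com/b.tgz",
        },
    },
})
errors = find_unpinned(example)
```

The fetcher would then raise an error when the list is not empty.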

>>> Put another way, could one of these SRC_URIs map to multiple
>>> different
>>> combinations of underlying component versions?
>> If you mean the extracted SRC_URI for a single dependency from the
>> dependency specification file (ex. npm-shrinkwrap.json) it could use
>> special URLs to map to the latest version. But this is a misuse of
>> the dependency specification file and could be detected. The tools
>> always generate files with fixed versions because a floating version
>> with a fixed checksum makes no sense.
> Even if it shouldn't happen, we need to detect and error for this case
> as it would become very problematic for us.

Okay. Should we disallow a dynamic version for package manager downloads 
generally or do you see a reasonable use case?

>>> Our existing method effectively hardcodes/expands the lock file
>>> into
>>> extended SRC_URI entries which makes the specific versions and
>>> components really clear. This change abstracts that away into the
>>> fetcher and makes it opaque to the user, and much harder for code
>>> like
>>> the archiver/license/spdx code to find/handle.
>> Really? Let's use the crate fetcher as an example. At the moment the
>> cargo-update-recipe-crates class extracts the URI and checksum from
>> the Cargo.lock. The class ignores the licenses and this leads to
>> missing licenses in the recipe. The spdx files contain only bitbake
>> specific fetcher URLs, which are unknown outside of bitbake.
> I guess what I'm trying to say is that people generally easily
> understand the explicit expanded urls. Whilst that class does ignore
> license handling, the hope was that it would get added, it is certainly
> possible to fix that.

I will check how complicated it is to extract the license information 
from the package registries.

>>   I also thought it would make sense to generate recipes from the
>> dependency specification files and therefore worked on the recipetool
>> previously. But it looks like the tool isn't really used and I'm afraid
>> nobody will use the recipe to fix dependencies. In most cases it is
>> easy to update a dependency in the native tooling and only provide an
>> updated dependency specification file.
> I think people have wanted a single simple command to translate the
> specification file into our recipe format to update the recipe. For
> various reasons people didn't seem to find the recipetool approach was
> working and created the task workflow based one. There are pros and
> cons to both and I don't have a strong preference. I would like to see
> something which makes it clear to users what is going on though and is
> simple to use.
>
> People do intuitively understand a .inc file with a list of urls in it.
> There are challenges in updating it.
>
> This other approach is not as intuitive as everything is abstracted out
> of sight.
>
> One thing for example which worries me is how are the license fields in
> the recipe going to be updated?
>
> Currently, if we teach the class, it can set LICENSE variables
> appropriately. With the new approach, you don't know the licenses until
> after unpack has run. Yes it can write it into the SPDX, but it won't
> work for something like the layer index or forms of analysis which
> don't build things.
>
> This does also extend to vulnerability analysis since we can't know
> what is in a given recipe without actually unpacking it. For example we
> could know crate XXX at version YYY has a CVE but we can't tell if a
> recipe uses that crate until after do_unpack, or at least not without
> expandurl.

The main question is if the meta data should contain all information. If 
yes, we shouldn't allow any fetcher which requires an external source. 
This should include the gitsm fetcher and we should replace the single 
SRC_URI with multiple git SRC_URIs.

We can go even further and forbid specific package manager fetchers and 
use plain https or git SRC_URIs. The python and go-vendor fetcher use 
this approach.

Alternatively we allow dependency fetchers and require that the meta data
always be used via bitbake. In this case we could extend the meta data
via the fetcher.

In both cases it is possible to produce the same meta data. It doesn't 
matter if we use recipetool, devtool, bbclasses or fetcher. In any case 
we could resolve the SRC_URIs, checksums or srcrev from a file. The 
license information could be fetched from the package repositories 
without integrity checks or could be extracted from the individual 
package description file inside the downloaded sources (ex. npm). We 
should skip the license detection from license files for now because
it generates manual work and could be discussed later.

The recipe approach has the advantage that it uses fixed licenses and
that license changes could (theoretically) be reviewed during recipe
update. In contrast the fetcher approach reduces the update procedure to 
a simple file rename or SRCREV update (ex. gitsm). Furthermore, the user 
could simply place a file beside the recipe to update the dependencies. 
Could we realize the same via devtool integration and a patch?

We have different solutions between the languages (ex. npmsw vs crate vs 
pypi) and even inside the languages (ex. go-vendor vs gomod). I would 
like to unify the dependency support. It doesn't matter if we decide to 
use the bitbake fetcher or a bitbake / devtool command for the 
dependency and license resolution.

>>   I have a WIP to integrate the dependencies into the SPDX. This
>> uses the expanded_urldata / implicit_urldata function to add the
>> dependencies to the process list of archiver and spdx.
>>   
>> https://github.com/weidmueller/poky/tree/feature/dependency-fetcher
>>
>> Regarding the license we could migrate the functionality from
>> recipetool into a class and detect the licenses at build time.
>> Theoretically the fetcher could fetch the license from the package
>> manager repository but we have to trust the repository because we
>> have no checksum to detect changes. Maybe we could integrate tools
>> like Syft or ScanCode to detect the licenses at build time. At the
>> moment the best solution is to make sure that the SBOM contains the
>> name and version of the dependencies and let other tools handle the
>> license via SBOM for now. Therefore I propose a common scheme to
>> define the dependency name (dn) and version (dv) in the SRC_URI.
> We could compare what licenses the package manager is showing us with
> what is in the recipe and error if different. There would then need to
> be a command to update the licenses in the recipe (in much the way urls
> currently get updated).

Either we request the licenses from the package manager during package
update or during fetch. I wouldn't do both. Instead I would analyze the
license file during build and compare the detected license with the
recipe or fetcher generated licenses. But the license detection from
files is another topic and I would like to postpone it for now.

>>>   
>>> I noticed that any fetcher operation has to first expand the lock
>>> file
>>> using a temporary directory.
>>>   
>>   I followed gitsm and am open to suggestions. The expansion happens only
>> once per fetcher object. The sub-fetcher object is saved in the proxy
>> variable.
> That fetcher object has to be recreated in every task or task context
> using the fetcher.

Okay. In this case it makes sense to cache the resolved URIs.
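
A rough sketch of such a cache, keyed by the checksum of the lock file so
that changed content can never hit a stale entry. The function name, the
cache layout and the resolve callback are hypothetical. The real fetcher
would additionally need the usual locking:

```python
import hashlib
import json
import os
import tempfile


def cached_resolve(lockfile_path, cache_dir, resolve):
    """Return the URL list for a lock file; call resolve() only on a miss."""
    with open(lockfile_path, "rb") as f:
        content = f.read()
    digest = hashlib.sha256(content).hexdigest()
    cache_file = os.path.join(cache_dir, digest + ".json")
    if os.path.exists(cache_file):
        with open(cache_file) as f:
            return json.load(f)
    urls = resolve(content)
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temporary file and rename it, so that a reader in
    # another task never sees a partially written cache entry.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(urls, f)
    os.replace(tmp, cache_file)
    return urls
```

Every task context would still create its own fetcher object but the
second and later ones would only read the JSON file.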

>>> You're using DL_DIR for that which I
>>> suspect isn't a great idea for tmp files.
>>>   
>>   Taken over from gitsm.
> Probably not the best fetcher and I'd say gitsm should be fixed.

I don't see a reason why the gitsm fetcher shouldn't be handled like the
other dependency fetchers. We could update the handler after we have a
decision for the dependency fetchers.

>>>   In many cases that will work
>>> fine but it is a bit of a performance overhead.
>>>
>>> I did start wondering if we should cache the lock files in a subdir
>>> of
>>> DL_DIR to help performance and also give some extra assurance about
>>> changing content.
>>>   
>>   This would be possible. I assume the best would be another sub
>> SRC_URI to avoid code duplication for the locking and change
>> detection.
> Probably, I did wonder if the mixin could cover that
> abstraction/caching.

That should be possible.

>>> The url scheme is clever but also has a potential risk in that you
>>> can't really pass parameters to both the top level fetcher and the
>>> underlying one. I'm worried that is going to bite us further down
>>> the
>>> line.
>>>   
>>   
>> At the moment I don't see a real problem but maybe you are right. The
>> existing language specific fetchers use fixed paths for their
>> downloads.
>>   
>>   What do you propose? Should the fetcher skip the unpack of the
>> source or should we introduce a sub fetcher which uses the download
>> from another SRC_URI entry? The two entries could be linked via the
>> name parameter. This approach could be combined with your suggestion
>> above. The new fetcher would unpack a lock file from another
>> (default) download.
>>   
> I'm not really sure what is best right now. I'm trying to spell out the
> pros/cons of what is going on here in the hope it encourages others to
> give feedback as well. I agree there isn't a problem right now but I
> worry there soon will be by mixing two things together like this. The
> way we handle git protocol does cause us friction with other urls
> schemes already.

The dependency fetcher could simply skip the unpack. In this case the
user needs to use a variable to pass the same URL to the git and
dependency fetcher or we could provide a python function to generate two
SRC_URI entries with the same base URL.
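
A minimal sketch of such a helper. The function name is hypothetical and
the "<type>+" prefix follows the URL scheme of this series. It only shows
the string handling, not how the dependency entry would skip the unpack:

```python
def dependency_src_uri(base_url, dep_type, params=""):
    """Build a plain entry plus a '<type>+' entry that share one base URL."""
    suffix = ";" + params if params else ""
    plain = f"{base_url}{suffix}"
    dependency = f"{dep_type}+{base_url}{suffix}"
    return f"{plain} {dependency}"


# Hypothetical repository URL, for illustration only.
uris = dependency_src_uri("git://example.com/foo.git", "npmsw", "protocol=https")
```

The drawback is that both entries get the same parameters, so fetcher
specific parameters would need a second argument.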

>>>> = Open questions
>>>>
>>>> * Where should we download dependencies?
>>>> ** Should we use a folder per fetcher (ex. git and npm)?
>>>> ** Should we use the main folder (ex. crate)?
>>>> ** Should we translate the name into folder (ex. gomod)?
>>>> ** Should we integrate the name into the filename (ex. git)?
>>>>   
>>>   
>>> DL_DIR is meant to be a complete cache of the source so it would
>>> need
>>> to be downloaded there. Given it maps to the other fetchers, the
>>> existing cache mechanisms likely work for these just fine, the open
>>> question is on whether the lock/spec files should be cached after
>>> extraction.
>>>   
>>   
>> You misunderstand the question. It's about the downloadfilename
>> parameter. At the moment some fetchers use a sub folder inside DL_DIR
>> and others use the main folder. It looks like every fetcher has its
>> own concept to handle file collisions between different fetchers. The
>> git and npm fetchers use their own folder, the crate fetcher uses its
>> own .crate file prefix, the gomod fetcher translates the URL into
>> multiple folders and the git fetcher translates the URL into a single
>> folder name.
> That makes more sense. The layout is partially legacy. The wget and
> local fetchers were first and hence go directly into DL_DIR. git/svn
> were separated out into their own directories with a plan to have a
> directory per fetcher. That didn't always work out with each newer
> fetcher. Each fetcher does have to handle a unique naming of its urls
> as only the specific fetcher can know all the urls parameters and which
> ones affect the output vs which ones don't.
This doesn't explain why the npm fetcher uses a sub folder but the gomod
and crate fetchers don't. All of them are based on the wget fetcher.

>>>> * Where should we unpack the dependencies?
>>>> ** Should we use a folder inside the parent folder (ex.
>>>> node_modules)?
>>>> ** Should we use a fixed folder inside unpackdir
>>>>     (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
>>>>   
>>>   
>>> This likely depends on the fetcher as the different mechanisms will
>>> have different expectations about how they should be extracted (as
>>> npm/etc. would).
>>>   
>>   
>> It depends on the fetcher but the fetchers could use the same
>> approach. At the moment every fetcher uses a different approach. The
>> crate fetcher uses a fixed value. The gomod fetcher uses a variable
>> (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix).
>> Furthermore the gomod fetcher overrides the common subdir parameter.
> I think we really need to standardise that if we can. Each new fetcher
> has claimed a certain approach is effectively required by the package
> manager.

What would be your desired solution? Is the variable okay or do you
prefer a self-contained SRC_URI?

>>>> * How should we treat archives for package manager caches?
>>>> ** Should we unpack the archives to support patching (ex. npm)?
>>>> ** Should we copy the packed archive to avoid unpacking and
>>>> packaging
>>>>     (ex. gomod)?
>>>>   
>>>   
>>> If there are archives left after do_unpack, which task is going to
>>> unpack those? Are we expecting the build process in
>>> configure/compile
>>> to decompress them? Would those management tools accept things if
>>> they
>>> were extracted earlier? "unpack" would be the correct time to do it
>>> but
>>> I can see this getting into conflict with the package manager :/.
>>>   
>>   
>> Most package managers expect archives. In the npm case the archive is
>> unpacked by the fetcher and packed by the npm.bbclass to support
>> patching. The gomod fetcher doesn't unpack the downloaded archive and
>> the gomodgit fetcher creates archives from git folders during unpack.
>> It would be possible to always keep the archives or always extract
>> the archives and recreate archives during build. It is a decision
>> between performance and patchability.
>>   
>>   At the moment it is complicated to work with the different fetchers
>> because every fetcher uses a different concept and it is unclear what
>> the desired approach is.
> This is a challenge. Can we handle the unpacking with the package
> manager as a specific step or does it have to be combined with other
> steps like configure/compile?

It looks like this is possible:
cargo fetch
go mod vendor
npm install

I suspect you're thinking about using the package manager in do_unpack 
to unpack the archives and patch the unpacked archives afterwards?

>>> I did wonder if patches 1-5 of this series could
>>> be merged separately too as they look reasonable regardless of the
>>> rest
>>> of the series?
>>>   
>>   
>> Sure. Should I resend the patches as separate series?
> Yes please, that would then let us remove the bits we can easily
> review/sort and focus on this other part.

Done.

I will also resend the go h1 checksum commit separately because it could
be useful for the gomod fetcher.

Should I also move the dn / dv parameter patches to a separate series
because they could be useful without the dependency fetcher? I could add
the parameters to the fetchers in a backward compatible way.

Regards
   Stefan



* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-07  9:47       ` Stefan Herbrechtsmeier
@ 2025-01-07 11:01         ` Richard Purdie
  2025-01-07 16:13           ` Stefan Herbrechtsmeier
  0 siblings, 1 reply; 66+ messages in thread
From: Richard Purdie @ 2025-01-07 11:01 UTC (permalink / raw)
  To: Stefan Herbrechtsmeier, bitbake-devel; +Cc: Stefan Herbrechtsmeier

On Tue, 2025-01-07 at 10:47 +0100, Stefan Herbrechtsmeier wrote:
> On 06.01.2025 at 16:30, Richard Purdie wrote:
> > On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote:
> > > >  I'm a little bit worried about how easily you could sneak a
> > > > "floating" version into this and make the fetcher non-
> > > > deterministic. Does (or could?) the code detect and error on
> > > > that? 
> > > >   
> > > We could raise an error if a checksum is missing in the
> > > dependency specification file or make the checksum mandatory for
> > > the dependency fetcher.  Furthermore we could inspect the
> > > dependency URLs to detect a misuse of the file like a latest
> > > string for the version.
> > >  
> > 
> > I think adding such an error would be a requirement for merging
> > this.
> >   
> Should the dependency fetcher (ex. npmsw) or the language specific
> fetcher (ex. npm) fail if the version points to a latest version?

I think right now it has to error to try and reduce complexity. It is
possible to support such things but you have to pass that version
information back up the stack so that PV represents the different
versions and that is a new level of complexity.

I guess we should consider how you could theoretically support it as
that might influence the design. With multiple git repos in SRC_URI for
example, we end up adding multiple shortened shas to construct a PV so
that if any change, PV changes. We also have to add an incrementing
integer so that opkg/dpkg/rpm operations work and versions sort.
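
To illustrate the idea only, a simplified sketch follows. The function
name and the exact string layout are made up for this mail and are not
what bitbake generates:

```python
def compose_pv(base_version, revisions, counter, sha_len=10):
    """Build a version string that changes whenever any revision changes.

    The counter comes before the hashes because package managers cannot
    order git hashes, so the integer is what makes newer builds sort higher.
    """
    short = "_".join(rev[:sha_len] for rev in revisions)
    return f"{base_version}+git{counter}+{short}"


# Two hypothetical repositories; only the second one moves to a new commit.
old = compose_pv("1.0", ["a" * 40, "b" * 40], counter=3)
new = compose_pv("1.0", ["a" * 40, "c" * 40], counter=4)
```

A floating dependency inside a lock file would need to feed into PV in a
similar way, which is the extra complexity I'd like to avoid for now.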

> > > > Put another way, could one of these SRC_URIs map to multiple
> > > > different combinations of underlying component versions?
> > > 
> > > If you mean the extracted SRC_URI for a single dependency from
> > > the dependency specification file (ex. npm-shrinkwrap.json) it
> > > could use special URLs to map to the latest version. But this is
> > > a misuse of the dependency specification file and could be
> > > detected. The tools always generate files with fixed versions
> > > because a floating version with a fixed checksum makes no sense.
> > 
> > Even if it shouldn't happen, we need to detect and error for this
> > case as it would become very problematic for us.
> > 
> Okay. Should we disallow a dynamic version for package manager
> downloads generally or do you see a reasonable use case?

See above.

> > > 
> > >  I also thought it would make sense to generate recipes from the
> > > dependency specification files and therefore worked on the
> > > recipetool previously. But it looks like the tool isn't really used
> > > and I'm afraid nobody will use the recipe to fix dependencies. In
> > > most cases it is easy to update a dependency in the native tooling
> > > and only provide an updated dependency specification file.
> > >  
> >  
> > I think people have wanted a single simple command to translate the
> > specification file into our recipe format to update the recipe. For
> > various reasons people didn't seem to find the recipetool approach
> > was working and created the task workflow based one. There are pros
> > and cons to both and I don't have a strong preference. I would like
> > to see something which makes it clear to users what is going on
> > though and is simple to use.
> > 
> > People do intuitively understand a .inc file with a list of urls in
> > it. There are challenges in updating it.
> > 
> > This other approach is not as intuitive as everything is abstracted
> > out of sight.
> > 
> > One thing for example which worries me is how are the license
> > fields in the recipe going to be updated?
> > 
> > Currently, if we teach the class, it can set LICENSE variables
> > appropriately. With the new approach, you don't know the licenses
> > until after unpack has run. Yes it can write it into the SPDX, but it
> > won't work for something like the layer index or forms of analysis
> > which don't build things.
> > 
> > This does also extend to vulnerability analysis since we can't know
> > what is in a given recipe without actually unpacking it. For
> > example we could know crate XXX at version YYY has a CVE but we can't
> > tell if a recipe uses that crate until after do_unpack, or at least
> > not without expandurl.
> >  
>  
> The main question is if the meta data should contain all information.
> If yes, we shouldn't allow any fetcher which requires an external
> source. This should include the gitsm fetcher and we should replace
> the single SRC_URI with multiple git SRC_URIs.


If we had tooling that supported that well we could certainly consider
it. It isn't straightforward as you can have a git repo containing
submodules which then themselves contain submodules which can then
contain more levels of submodules. There are therefore multiple levels
of expansion possible.
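
As a sketch of what such tooling would have to do, assuming the submodule
information had already been collected into a nested structure (the
structure and the function name are hypothetical):

```python
def flatten_submodules(repo, prefix=""):
    """Yield (path, url, rev) for every submodule at every nesting level."""
    for sub in repo.get("submodules", []):
        path = prefix + sub["path"]
        yield path, sub["url"], sub["rev"]
        # A submodule can carry submodules of its own, so recurse.
        yield from flatten_submodules(sub, path + "/")


# Hypothetical repository with two levels of submodules.
repo = {
    "submodules": [
        {
            "path": "libs/a",
            "url": "git://example.com/a.git",
            "rev": "1" * 40,
            "submodules": [
                {
                    "path": "third_party/b",
                    "url": "git://example.com/b.git",
                    "rev": "2" * 40,
                },
            ],
        },
    ],
}
entries = list(flatten_submodules(repo))
```

The hard part in practice is that each level is only known after the
level above it has been fetched at the right revision.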

> We can go even further and forbid specific package manager fetchers
> and use plain https or git SRC_URIs. The python and go-vendor fetcher
> use this approach.
>  
>  Alternatively we allow dependency fetchers and require that the meta
> data always be used via bitbake. In this case we could extend the
> meta data via the fetcher.
>  
>  In both cases it is possible to produce the same meta data. It
> doesn't matter if we use recipetool, devtool, bbclasses or fetcher.
> In any case we could resolve the SRC_URIs, checksums or srcrev from a
> file. The license information could be fetched from the package
> repositories without integrity checks or could be extracted from the
> individual package description file inside the downloaded sources
> (ex. npm). We should skip the license detection from license files
> for now because it generates manual work and could be discussed
> later.

That was the reason the current task based approach doesn't use them,
yet! I mention it just to highlight that it can be solved either way,
the approach doesn't really change what we need to do. The bigger
concern is having information available in the metadata which I think
we need to do to some level regardless of which approach we choose.

> The recipe approach has the advantage that it uses fixed licenses and
> that license changes could (theoretically) be reviewed during recipe
> update. 

FWIW that is an important use case and one of our general strengths. We
can only do that as the license information is written in recipes and
can be compared at update time.

> In contrast the fetcher approach reduces the update procedure to a
> simple file rename or SRCREV update (ex. gitsm). Furthermore, the
> user could simply place a file beside the recipe to update the
> dependencies. Could we realize the same via devtool integration and a
> patch?

This is effectively what the task based approach is aiming for
currently. I think the idea was that we could have devtool/recipetool
integration around that update task, a task was just a convenient way
to capture the code to do it and get things working without needing the
tool to be finished.


>  We have different solutions between the languages (ex. npmsw vs
> crate vs pypi) and even inside the languages (ex. go-vendor vs
> gomod). I would like to unify the dependency support. It doesn't
> matter if we decide to use the bitbake fetcher or a bitbake / devtool
> command for the dependency and license resolution.

I do very much prefer having one good way of doing things rather than
multiple ways of doing things, each with a potential drawback. I'm
therefore broadly in favour of doing that as long as we don't upset too
much existing mindshare along the way.


> > >  
> > >  I have a WIP to integrate the dependencies into the SPDX. This
> > > uses the expanded_urldata / implicit_urldata function to add the
> > > dependencies to the process list of archiver and spdx.
> > >  
> > > https://github.com/weidmueller/poky/tree/feature/dependency-fetcher
> > > 
> > > Regarding the license we could migrate the functionality from
> > > recipetool into a class and detect the licenses at build time.
> > > Theoretically the fetcher could fetch the license from the package
> > > manager repository but we have to trust the repository because we
> > > have no checksum to detect changes. Maybe we could integrate tools
> > > like Syft or ScanCode to detect the licenses at build time. At the
> > > moment the best solution is to make sure that the SBOM contains the
> > > name and version of the dependencies and let other tools handle the
> > > license via SBOM for now. Therefore I propose a common scheme to
> > > define the dependency name (dn) and version (dv) in the SRC_URI.
> > >  
> >  
> > We could compare what licenses the package manager is showing us
> > with what is in the recipe and error if different. There would then
> > need to be a command to update the licenses in the recipe (in much
> > the way urls currently get updated).
> >  
>  
> Either we request the licenses from the package manager during
> package update or during fetch. I wouldn't do both. Instead I would
> analyze the license file during build and compare the detected
> license with the recipe or fetcher generated licenses. But the
> license detection from files is another topic and I would like to
> postpone it for now.

Agreed, I mention it just to highlight that supporting them does have
impact on the design so any solution needs to ultimately be able to
support it.

> > > > You're using DL_DIR for that which I
> > > > suspect isn't a great idea for tmp files.
> > > 
> > >  Take over from gitsm.
> > 
> > Probably not the best fetcher and I'd say gitsm should be fixed.
>
> I don't see a reason why the gitsm fetcher shouldn't be handled like the
> other dependency fetchers. We could update the handler after we have a
> decision for the dependency fetchers.


In principle perhaps but as mentioned above, gitsm has its own challenges.

> > > > The url scheme is clever but also has a potential risk in that you
> > > > can't really pass parameters to both the top level fetcher and the
> > > > underlying one. I'm worried that is going to bite us further down
> > > > the
> > > > line.
> > > 
> > > At the moment I don't see a real problem but maybe you are right. The
> > > existing language specific fetcher use fixed paths for there
> > > downloads.
> > >  
> > >  What do you propose? Should the fetcher skip the unpack of the
> > > source or should we introduce a sub fetcher which uses the download
> > > from an other SRC_URI entry. The two entries could be linked via the
> > > name parameter. This approach could be combined with your suggestion
> > > above. The new fetcher will unpack a lock file from an other
> > > (default) download.
> >  
> > I'm not really sure what is best right now. I'm trying to spell out the
> > pros/cons of what is going on here in the hope it encourages others to
> > give feedback as well. I agree there isn't a problem right now but I
> > worry there soon will be by mixing two things together like this. The
> > way we handle git protocol does cause us friction with other urls
> > schemes already.
> 
> The dependency fetcher could simply skip the unpack. In this case the
> user needs to use a variable to pass the same URL to the git and
> dependency fetcher or we could provide a python function to generate
> two SRC_URI with the same base URL.
> 



I'm starting to wonder about a slightly different approach, basically
an optional generated file alongside a recipe which contains "expanded"
information which is effectively expensive to generate (in computation
or resource like network access/process terms). We could teach bitbake
a new phase of parsing where it generated them if missing. There are
some other pieces of information which we know during the build process
which it would be helpful to know earlier (e.g. which packages a recipe
generates). I've wondered about this for a long time and the fetcher
issues remind me of it again. It would be a big change with advantages
and drawbacks. I think it would put more pressure on a layer maintainer
as they'd have to computationally keep this up to date and it would
complicate the patch workflow (who should send/regen the files?). I'm
putting the idea there, I'm not saying I think we should do it, I'm
just considering options.


>  = Open questions
> > > > > 
> > > > > * Where should we download dependencies?
> > > > > ** Should we use a folder per fetcher (ex. git and npm)?
> > > > > ** Should we use the main folder (ex. crate)?
> > > > > ** Should we translate the name into folder (ex. gomod)?
> > > > > ** Should we integrate the name into the filename (ex. git)?
> > > > >  
> > > > >  
> > > >  
> > > >  
> > > > DL_DIR is meant to be a complete cache of the source so it would
> > > > need
> > > > to be downloaded there. Given it maps to the other fetchers, the
> > > > existing cache mechanisms likely work for these just fine, the open
> > > > question is on whether the lock/spec files should be cached after
> > > > extraction.
> > >  
> > > You misunderstand the question. It's about the downloadfilename
> > > parameter. At the moment some fetchers use a sub folder inside DL_DIR
> > > and others use the main folder. It looks like every fetcher has its
> > > own concept to handle file collisions between different fetchers. The
> > > git and npm fetchers use their own folder, the crate fetcher uses its
> > > own .crate file prefix, the gomod fetcher translates the URL into
> > > multiple folders and the git fetcher translates the URL into a single
> > > folder name.
> > 
> > That makes more sense. The layout is partially legacy. The wget and
> > local fetchers were first and hence go directly into DL_DIR. git/svn
> > were separated out into their own directories with a plan to have a
> > directory per fetcher. That didn't always work out with each newer
> > fetcher. Each fetcher does have to handle a unique naming of its urls
> > as only the specific fetcher can know all the urls parameters and which
> > ones affect the output vs which ones don't.
> >  
>  This doesn't explain why the npm but not the gomod and crate fetcher
> use a sub folder. All fetchers are based on the wget fetcher.

That is probably "my fault". Put yourself in my position. You get a ton
of different patches, all touching very varied aspects of the system.
When reviewing them you have to try and remember the original design
decisions, the future directions, the ways things broke in the past, a
desire to try and have clean consistent APIs and so on. I have tried
very hard to move things in a direction where things incrementally
improve, without unnecessarily blocking new features. It means that
things that merge often aren't perfect. We've tried a few different
approaches with the newer programming languages and each approach has
had pros and cons. The inconsistency is probably as I missed something
in review. Sorry :(.

I only have finite time. There are few people who seem to want to dive
in and help with review of patches like these. I did ask some people
yesterday, one told me they simply couldn't understand these patches.
I'm doing my best to ask the right questions, try and help others
understand them, ensure my own concerns I can identify are resolved and
I don't want to de-motivate you on this work either, I think the idea
of improving this is great and I'd love to see it. Equally, I'm also
the first person everyone will complain to if we change something and
it causes problems for people. 

So the explanation is probably I just missed something in review at
some point. The intent was to separate out the fetcher output going
forward (unless it makes sense to be shared).

FWIW there are multiple things which bother me about the existing
fetcher storage layout but that is a different discussion.

> > > > > 
> > > > > * Where should we unpack the dependencies?
> > > > > ** Should we use a folder inside the parent folder (ex.
> > > > > node_modules)?
> > > > > ** Should we use a fixed folder inside unpackdir
> > > > >    (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
> > > >  
> > > > This likely depends on the fetcher as the different mechanisms will
> > > > have different expectations about how they should be extracted (as
> > > > npm/etc. would).
> > > 
> > >  
> > > It depends on the fetcher but the fetcher could use the same
> > > approach. At the moment every fetcher use a different approach. The
> > > crate fetcher use a fixed value. The gomod fetcher uses a variable
> > > (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix).
> > > Furthermore the gomod fetcher override the common subdir parameter.
> > 
> > I think we really need to standardise that if we can. Each new fetcher
> > has claimed a certain approach is effectively required by the package
> > manager.


> >  What would be your desired solution? Is the variable okay or do you prefer a self contain SRC_URI?

I suspect we need a default via a variable and then the option to
change the default via parameters. The default value should be a
bitbake fetcher namespaced control variable.

I'm wary of making a definitive statement saying X if that isn't going
to make sense for some backend though. I simply don't have enough
knowledge of them all, which is why you see me being reluctant to make
definitive statements about design.

> > > > > * How should we treat archives for package manager caches?
> > > > > ** Should we unpack the archives to support patching (ex. npm)?
> > > > > ** Should we copy the packed archive to avoid unpacking and
> > > > > packaging
> > > > >    (ex. gomod)?
> > > > >  
> > > > If there are archives left after do_unpack, which task is going
> > > > to unpack those? Are we expecting the build process in
> > > > configure/compile to decompress them? Would those management
> > > > tools accept things if they were extracted earlier? "unpack"
> > > > would be the correct time to do it but I can see this getting
> > > > into conflict with the package manager :/.
> > >  
> > > Most package managers expect archives. In the npm case the archive is
> > > unpacked by the fetcher and packed by the npm.bbclass to support
> > > patching. The gomod fetcher doesn't unpack the downloaded archive and
> > > the gomodgit fetcher create archives from git folders during unpack.
> > > It would be possible to always keep the archives or always extract
> > > the archives and recreate archives during build. It is a decision
> > > between performance and patchability.
> > >  
> > >  At the moment it is complicated to work with the different fetcher
> > > because every fetcher use a different concept and it is unclear what
> > > is the desired approach.
> >  
> > This is a challenge. Can we handle the unpacking with the package
> > manager as a specific step or does it have to be combined with other
> > steps like configure/compile?
> >   
> It looks like this is possible:
>  cargo fetch
>  go mod vendor
>  npm install
>  
>  I suspect you're thinking about using the package manager in
> do_unpack to unpack the archives and patch the unpacked archives
> afterwards?

I'm wondering about it, yes. I know we've had challenges with patching
rust modules for example so this isn't a theoretical problem :/.


> > > > I did wonder if patches 1-5 of this series could be merged
> > > > separately too as they look reasonable regardless of the rest
> > > > of the series? 
> > >  
> > > Sure. Should I resend the patches as separate series?
> > 
> > Yes please, that would then let us remove the bits we can easily
> > review/sort and focus on this other part.
> >   
> Done.

Thanks.

> I will also resend the go h1 checksum commit separate because it
> could be useful for the gomod fetcher.

Yes, I was waiting for a new version of that one with the naming tweaked.

> Should I also move the dn / dv parameter patches to a separate series
> because it could be useful without the dependency fetcher. I could
> add the parameters to the fetchers in a backward compatible way.

I need to think more about that one...

Cheers,

Richard






* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-07 11:01         ` Richard Purdie
@ 2025-01-07 16:13           ` Stefan Herbrechtsmeier
  2025-01-07 16:58             ` Bruce Ashfield
  0 siblings, 1 reply; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-07 16:13 UTC (permalink / raw)
  To: Richard Purdie, bitbake-devel; +Cc: Stefan Herbrechtsmeier


On 07.01.2025 at 12:01, Richard Purdie wrote:
> On Tue, 2025-01-07 at 10:47 +0100, Stefan Herbrechtsmeier wrote:
>> On 06.01.2025 at 16:30, Richard Purdie wrote:
>>> On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote:
>>>>>   I'm a little bit worried about how easily you could sneak a
>>>>> "floating" version into this and make the fetcher non-
>>>>> deterministic. Does (or could?) the code detect and error on
>>>>> that?
>>>>>    
>>>> We could raise an error if a checksum is missing in the
>>>> dependency specification file or make the checksum mandatory for
>>>> the dependency fetcher.  Furthermore we could inspect the
>>>> dependency URLs to detect a misuse of the file like a latest
>>>> string for the version.
>>>>   
>>> I think adding such an error would be a requirement for merging
>>> this.
>>>    
>> Should the dependency fetcher (ex. npmsw) or the language specific
>> fetcher (ex. npm) fail if the version points to a latest version?
> I think right now it has to error to try and reduce complexity. It is
> possible to support such things but you have to pass that version
> information back up the stack so that PV represents the different
> versions and that is a new level of complexity.
>
> I guess we should consider how you could theoretically support it as
> that might influence the design. With multiple git repos in SRC_URI for
> example, we end up adding multiple shortened shas to construct a PV so
> that if any change, PV changes. We also have to add an incrementing
> integer so that on opkg/dpkg/rpm operations work and versions sort.

Okay. In this case we should add the checks to the dependency
resolution. That way we prohibit dynamic versions for the dependencies
but still allow users to add support for them in the fetcher of the
package manager.
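
A minimal sketch of such a check, assuming a package-lock.json style file (lockfile version 2 or 3) with a top-level "packages" object in which every non-root entry is a registry package. The function name and the error wording are hypothetical and not part of the patch series; link and bundled entries are not handled here.

```python
import json
import re

# Exact semantic version such as 1.2.3 or 1.2.3-beta.1 (no ranges, no tags).
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.+-]+)?$")


def find_floating_dependencies(lockfile_text):
    """Return (path, reason) tuples for lock file entries that are not pinned."""
    problems = []
    packages = json.loads(lockfile_text).get("packages", {})
    for path, entry in packages.items():
        if path == "":
            continue  # the root entry describes the project itself
        version = entry.get("version", "")
        if not EXACT_VERSION.match(version):
            problems.append((path, "version %r is not an exact version" % version))
        if not entry.get("integrity"):
            problems.append((path, "missing integrity checksum"))
    return problems
```

The dependency resolution could raise a fetch error if the returned list is not empty, so that a "latest" tag or a missing checksum never reaches the language specific fetcher.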

>>>>> Put another way, could one of these SRC_URIs map to multiple
>>>>> different combinations of underlying component versions?
>>>> If you mean the extracted SRC_URI for a single dependency from
>>>> the dependency specification file (ex. npm-shrinkwrap.json) it
>>>> could use special URLs to map to the latest version. But this is
>>>> a misuse of the dependency specification file and could be
>>>> detected. The tools generate files with fixed versions always
>>>> because a floating version with a fixed checksum makes no sense.
>>> Even if it shouldn't happen, we need to detect and error for this
>>> case as it would become very problematic for us.
>>>
>> Okay. Should we disallow a dynamic version for package manager
>> downloads generally or do you see a reasonable use case?
> See above.
>
>>>>   I also thought it would make sense to generate recipes from the
>>>> dependency specification files and therefore worked on the
>>>> recipetool
>>>> previously. But it looks like the tool isn't really used and I'm
>>>> afraid
>>>> nobody will use the recipe to fix dependencies. In most cases it
>>>> is
>>>> easy to update a dependency in the native tooling and only
>>>> provide an
>>>> updated dependency specification file.
>>>>   
>>>   
>>> I think people have wanted a single simple command to translate the
>>> specification file into our recipe format to update the recipe. For
>>> various reasons people didn't seem to find the recipetool approach
>>> was working and created the task workflow based one. There are pros
>>> and cons to both and I don't have a strong preference. I would like
>>> to see something which makes it clear to users what is going on
>>> though and is simple to use.
>>>
>>> People do intuitively understand a .inc file with a list of urls in
>>> it. There are challenges in updating it.
>>>
>>> This other approach is not as intuitive as everything is abstracted
>>> out of sight.
>>>
>>> One thing for example which worries me is how are the license
>>> fields in the recipe going to be updated?
>>>
>>> Currently, if we teach the class, it can set LICENSE variables
>>> appropriately. With the new approach, you don't know the licenses
>>> until
>>> after unpack has run. Yes it can write it into the SPDX, but it
>>> won't
>>> work for something like the layer index or forms of analysis which
>>> don't build things.
>>>
>>> This does also extend to vulnerability analysis since we can't know
>>> what is in a given recipe without actually unpacking it. For
>>> example we
>>> could know crate XXX at version YYY has a CVE but we can't tell if
>>> a
>>> recipe uses that crate until after do_unpack, or at least not
>>> without
>>> expandurl.
>>>   
>>   
>> The main question is if the meta data should contain all information.
>> If yes, we shouldn't allow any fetcher which requires an external
>> source. This should include the gitsm fetcher and we should replace
>> the single SRC_URI with multiple git SRC_URIs.
>
> If we had tooling that supported that well we could certainly consider
> it. It isn't straight forward as you can have a git repo containing
> submodules which then themselves contain submodules which can then
> contain more levels of submodules. There are therefore multiple levels
> of expansion possible.

Okay. That makes git submodules special compared to the other
dependency fetchers.

>> We can go even further and forbid specific package manager fetchers
>> and use plain https or git SRC_URIs. The python and go-vendor fetcher
>> use this approach.
>>   
>>   Alternative we allow dependency fetchers and require that the meta
>> data be always used via bitbake. In this case we could extend the
>> meta data via the fetcher.
>>   
>>   In both cases it is possible to produce the same meta data. It
>> doesn't matter if we use recipetool, devtool, bbclasses or fetcher.
>> In any case we could resolve the SRC_URIs, checksums or srcrev from a
>> file. The license information could be fetched from the package
>> repositories without integrity checks or could be extracted from the
>> individual package description file inside the downloaded sources
>> (ex. npm). We should skip the license detection from license files
>> for now because they generate manual work and could be discussed
>> later.
> That was the reason the current task based approach doesn't use them,
> yet! I mention it just to highlight that it can be solved either way,
> the approach doesn't really change what we need to do. The bigger
> concern is having information available in the metadata which I think
> we need do to some level regardless of which approach we choose.
>
>> The recipe approach has the advantage that it uses fixed licenses and
>> that license changes could be (theoretically) reviewed during recipe
>> update.
> FWIW that is an important use case and one of our general strengths. We
> can only do that as the license information is written in recipes and
> can be compared at update time.

Does this apply to the license of every individual dependency or
only to the combined license?

>> In contrast the fetcher approach reduces the update procedure to a
>> simple file rename or SRCREV update (ex. gitsm). Furthermore, the
>> user could simply place a file beside the recipe to update the
>> dependencies. Could we realize the same via devtool integration and a
>> patch?
> This is effectively what the task based approach is aiming for
> currently. I think the idea was that we could have devtool/recipetool
> integration around that update task, a task was just a convenient way
> to capture the code to do it and get things working without needing the
> tool to be finished.
What is the task based approach? `bitbake -c update xyz`?

>>   We have different solutions between the languages (ex. npmsw vs
>> crate vs pypi) and even inside the languages (ex. go-vendor vs
>> gomod). I would like to unify the dependency support. It doesn't
>> matter if we decide to use the bitbake fetcher or a bitbake / devtool
>> command for the dependency and license resolution.
> I do very much prefer having one good way of doing things rather than
> multiple ways of doing things, each with a potential drawback. I'm
> therefore broadly in favour of doing that as long as we don't upset too
> much existing mindshare along the way.

Okay

>>>>   
>>>>   I have a WIP to integrate the dependencies into the SPDX.
>>>> This
>>>> uses the expanded_urldata / implicit_urldata function to add the
>>>> dependencies to the process list of archiver and spdx.
>>>>   
>>>> https://github.com/weidmueller/poky/tree/feature/dependency-
>>>> fetcher
>>>>
>>>> Regarding the license we could migrate the functionality from
>>>> recipetool into a class and detect the licenses at build time.
>>>> Theoretically the fetcher could fetch the license from the
>>>> package
>>>> manager repository but we have to trust the repository because we
>>>> have no checksum to detect changes. Maybe we could integrate
>>>> tools
>>>> like Syft or ScanCode to detect the licenses at build time. At
>>>> the
>>>> moment the best solution is to make sure that the SBOM contains
>>>> the
>>>> name and version of the dependencies and let other tools handle
>>>> the
>>>> license via SBOM for now. Therefore I propose a common scheme to
>>>> define the dependency name (dn) and version (dv) in the SRC_URI.
>>>>   
>>>   
>>> We could compare what licenses the package manager is showing us
>>> with
>>> what is in the recipe and error if different. There would then need
>>> to
>>> be a command to update the licenses in the recipe (in much the way
>>> urls
>>> currently get updated).
>>>   
>>   
>> Either we request the licenses from the package manager during
>> package update or during fetch. I wouldn't do both. Instead I would
>> analyze the license file during build and compare the detected
>> license with the recipe or fetcher generated licenses. But the
>> license detection from files is another topic and I would like to
>> postpone it for now.
> Agreed, I mention it just to highlight that supporting them does have
> impact on the design so any solution needs to ultimately be able to
> support it.
>
>>>>> You're using DL_DIR for that which I
>>>>> suspect isn't a great idea for tmp files.
>>>>   Take over from gitsm.
>>> Probably not the best fetcher and I'd say gitsm should be fixed.
>> I don't see a reason why the gitsm fetcher shouldn't be handled like the
>> other dependency fetchers. We could update the handler after we have a
>> decision for the dependency fetchers.
>
> In principle perhaps but as mentioned above, gitsm has its own challenges.

Based on your feedback I have the feeling that a dependency fetcher 
isn't the correct solution. The fetcher makes it impossible to review 
changes during recipe update. Additionally it needs caching for the 
resolved fetch and license data.

The alternative is to create an inc file with SRC_URIs, checksums,
SRCREVs and LICENSE. Any recommendation on how to integrate the
dependency resolution and inc creation into oe-core?

>>>>> The url scheme is clever but also has a potential risk in that you
>>>>> can't really pass parameters to both the top level fetcher and the
>>>>> underlying one. I'm worried that is going to bite us further down
>>>>> the
>>>>> line.
>>>> At the moment I don't see a real problem but maybe you are right. The
>>>> existing language specific fetcher use fixed paths for there
>>>> downloads.
>>>>   
>>>>   What do you propose? Should the fetcher skip the unpack of the
>>>> source or should we introduce a sub fetcher which uses the download
>>>> from an other SRC_URI entry. The two entries could be linked via the
>>>> name parameter. This approach could be combined with your suggestion
>>>> above. The new fetcher will unpack a lock file from an other
>>>> (default) download.
>>>   
>>> I'm not really sure what is best right now. I'm trying to spell out the
>>> pros/cons of what is going on here in the hope it encourages others to
>>> give feedback as well. I agree there isn't a problem right now but I
>>> worry there soon will be by mixing two things together like this. The
>>> way we handle git protocol does cause us friction with other urls
>>> schemes already.
>> The dependency fetcher could simply skip the unpack. In this case the
>> user needs to use a variable to pass the same URL to the git and
>> dependency fetcher or we could provide a python function to generate
>> two SRC_URI with the same base URL.
>>
> I'm starting to wonder about a slightly different approach, basically
> an optional generated file alongside a recipe which contains "expanded"
> information which is effectively expensive to generate (in computation
> or resource like network access/process terms). We could teach bitbake
> a new phase of parsing where it generated them if missing. There are
> some other pieces of information which we know during the build process
> which it would be helpful to know earlier (e.g. which packages a recipe
> generates). I've wondered about this for a long time and the fetcher
> issues remind me of it again. It would be a big change with advantages
> and drawbacks. I think it would put more pressure on a layer maintainer
> as they'd have to computationally keep this up to date and it would
> complicate the patch workflow (who should send/regen the files?). I'm
> putting the idea there, I'm not saying I think we should do it, I'm
> just considering options.

Do you mean like a cache or like the inc files? Is the file fully
auto-generated or is manual editing acceptable?

>>   = Open questions
>>>>>> * Where should we download dependencies?
>>>>>> ** Should we use a folder per fetcher (ex. git and npm)?
>>>>>> ** Should we use the main folder (ex. crate)?
>>>>>> ** Should we translate the name into folder (ex. gomod)?
>>>>>> ** Should we integrate the name into the filename (ex. git)?
>>>>>>   
>>>>>>   
>>>>>   
>>>>>   
>>>>> DL_DIR is meant to be a complete cache of the source so it would
>>>>> need
>>>>> to be downloaded there. Given it maps to the other fetchers, the
>>>>> existing cache mechanisms likely work for these just fine, the open
>>>>> question is on whether the lock/spec files should be cached after
>>>>> extraction.
>>>>   
>>>> You misunderstand the question. It's about the downloadfilename
>>>> parameter. At the moment some fetchers use a sub folder inside DL_DIR
>>>> and others use the main folder. It looks like every fetcher has its
>>>> own concept to handle file collisions between different fetchers. The
>>>> git and npm fetchers use their own folder, the crate fetcher uses its
>>>> own .crate file prefix, the gomod fetcher translates the URL into
>>>> multiple folders and the git fetcher translates the URL into a single
>>>> folder name.
>>> That makes more sense. The layout is partially legacy. The wget and
>>> local fetchers were first and hence go directly into DL_DIR. git/svn
>>> were separated out into their own directories with a plan to have a
>>> directory per fetcher. That didn't always work out with each newer
>>> fetcher. Each fetcher does have to handle a unique naming of its urls
>>> as only the specific fetcher can know all the urls parameters and which
>>> ones affect the output vs which ones don't.
>>>   
>>   This doesn't explain why the npm but not the gomod and crate fetcher
>> use a sub folder. All fetchers are based on the wget fetcher.
> That is probably "my fault". Put yourself in my position. You get a ton
> of different patches, all touching very varied aspects of the system.
> When reviewing them you have to try and remember the original design
> decisions, the future directions, the ways things broke in the past, a
> desire to try and have clean consistent APIs and so on. I have tried
> very hard to move things in a direction where things incrementally
> improve, without unnecessarily blocking new features. It means that
> things that merge often aren't perfect. We've tried a few different
> approaches with the newer programming languages and each approach has
> had pros and cons. The inconsistency is probably as I missed something
> in review. Sorry :(.

Sorry, I don't want to criticize you. I see that you have a lot of work.
I want to understand the reasons for the current design and what it
should look like.

> I only have finite time. There are few people who seem to want to dive
> in and help with review of patches like these. I did ask some people
> yesterday, one told me they simply couldn't understand these patches.

What can I do to improve the review?

> I'm doing my best to ask the right questions, try and help others
> understand them, ensure my own concerns I can identify are resolved and
> I don't want to de-motivate you on this work either, I think the idea
> of improving this is great and I'd love to see it. Equally, I'm also
> the first person everyone will complain to if we change something and
> it causes problems for people.
>
> So the explanation is probably I just missed something in review at
> some point. The intent was to separate out the fetcher output going
> forward (unless it makes sense to be shared).
>
> FWIW there are multiple things which bother me about the existing
> fetcher storage layout but that is a different discussion.

Okay.

>>>>>> * Where should we unpack the dependencies?
>>>>>> ** Should we use a folder inside the parent folder (ex.
>>>>>> node_modules)?
>>>>>> ** Should we use a fixed folder inside unpackdir
>>>>>>     (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
>>>>>   
>>>>> This likely depends on the fetcher as the different mechanisms will
>>>>> have different expectations about how they should be extracted (as
>>>>> npm/etc. would).
>>>>   
>>>> It depends on the fetcher but the fetcher could use the same
>>>> approach. At the moment every fetcher use a different approach. The
>>>> crate fetcher use a fixed value. The gomod fetcher uses a variable
>>>> (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix).
>>>> Furthermore the gomod fetcher override the common subdir parameter.
>>> I think we really need to standardise that if we can. Each new fetcher
>>> has claimed a certain approach is effectively required by the package
>>> manager.
>
>>>   What would be your desired solution? Is the variable okay or do you prefer a self contain SRC_URI?
> I suspect we need a default via a variable and then the option to
> change the default via parameters. The default value should be a
> bitbake fetcher namespaced control variable.
>
> I'm wary of making a definitive statement saying X if that isn't going
> to make sense for some backend though. I simply don't have enough
> knowledge of them all, which is why you see me being reluctant to make
> definitive statements about design.

Okay.

>>>>>> * How should we treat archives for package manager caches?
>>>>>> ** Should we unpack the archives to support patching (ex. npm)?
>>>>>> ** Should we copy the packed archive to avoid unpacking and
>>>>>> packaging
>>>>>>     (ex. gomod)?
>>>>>>   
>>>>> If there are archives left after do_unpack, which task is going
>>>>> to unpack those? Are we expecting the build process in
>>>>> configure/compile to decompress them? Would those management
>>>>> tools accept things if they were extracted earlier? "unpack"
>>>>> would be the correct time to do it but I can see this getting
>>>>> into conflict with the package manager :/.
>>>>   
>>>> Most package managers expect archives. In the npm case the archive is
>>>> unpacked by the fetcher and packed by the npm.bbclass to support
>>>> patching. The gomod fetcher doesn't unpack the downloaded archive and
>>>> the gomodgit fetcher create archives from git folders during unpack.
>>>> It would be possible to always keep the archives or always extract
>>>> the archives and recreate archives during build. It is a decision
>>>> between performance and patchability.
>>>>   
>>>>   At the moment it is complicated to work with the different fetcher
>>>> because every fetcher use a different concept and it is unclear what
>>>> is the desired approach.
>>>   
>>> This is a challenge. Can we handle the unpacking with the package
>>> manager as a specific step or does it have to be combined with other
>>> steps like configure/compile?
>>>    
>> It looks like this is possible:
>>   cargo fetch
>>   go mod vendor
>>   npm install
>>   
>>   I suspect you're thinking about using the package manager in
>> do_unpack to unpack the archives and patch the unpacked archives
>> afterwards?
> I'm wondering about it, yes. I know we've had challenges with patching
> rust modules for example so this isn't a theoretical problem :/.

It is an interesting idea because most package managers check the
integrity before unpacking. Additionally it should simplify and speed up
the npm build because it removes the repacking of the packages. The problem
is that we need an additional task to patch the dependency specification
file and to unpack the file.

>>>>> I did wonder if patches 1-5 of this series could be merged
>>>>> separately too as they look reasonable regardless of the rest
>>>>> of the series?
>>>>   
>>>> Sure. Should I resend the patches as separate series?
>>> Yes please, that would then let us remove the bits we can easily
>>> review/sort and focus on this other part.
>>>    
>> Done.
> Thanks.
>
>> I will also resend the go h1 checksum commit separately because it
>> could be useful for the gomod fetcher.
> Yes, I was waiting for a new version of that one with the naming tweaked.

Done.

>> Should I also move the dn / dv parameter patches to a separate series
>> because they could be useful without the dependency fetcher? I could
>> add the parameters to the fetchers in a backward compatible way.
> I need to think more about that one...

The motivation is to include the dependencies with name, version,
license and CPE in the SBOM.
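
[Editor's sketch, not part of the patch series] To make the dn / dv idea
concrete, the snippet below shows how a name and version carried as
SRC_URI parameters could be turned into a minimal SBOM component record.
The function names, the record keys and the example URL are assumptions
for illustration. It uses plain string handling and not the bitbake
fetcher API.

```python
def parse_src_uri_entry(entry):
    """Split one SRC_URI entry into its URL and ';key=value' parameters."""
    url, *raw_params = entry.split(";")
    params = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing ';'
        key, _, value = raw.partition("=")
        params[key] = value
    return url, params


def sbom_component(entry):
    """Return a small component record, or None if dn/dv are absent."""
    url, params = parse_src_uri_entry(entry)
    if "dn" not in params or "dv" not in params:
        return None
    return {
        "name": params["dn"],
        "version": params["dv"],
        "download_location": url,
    }
```

Entries without both parameters return None, which matches the idea of
adding the parameters in a backward compatible way.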

Regards
   Stefan


* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-07 16:13           ` Stefan Herbrechtsmeier
@ 2025-01-07 16:58             ` Bruce Ashfield
  2025-01-07 17:46               ` Stefan Herbrechtsmeier
  0 siblings, 1 reply; 66+ messages in thread
From: Bruce Ashfield @ 2025-01-07 16:58 UTC (permalink / raw)
  To: stefan.herbrechtsmeier-oss
  Cc: Richard Purdie, bitbake-devel, Stefan Herbrechtsmeier


Hi all,

I'm going to reply at this point in the thread to at least let everyone
know that I've been reading along, but honestly can't say if a few
questions that I have have been asked (and answered).

The biggest use case that I have for the layers and recipes that I maintain
is about being able to both "easily" patch or update vendor/dependencies of
the main application build.

It was unclear to me how I'd do that with these changes.

For the copied/extracted dependencies, I can see that you'd just be able to
figure out where they were extracted (and I see the discussions on where to
extract/store some of the files) and then write a patch as you would with
any recipe. But would there be a way to patch the dependency "lock file"?
I definitely don't see a way that I'd be able to tweak a source hash and
have an updated dependency pulled in, but I could have easily missed that.

Those are the primary reasons why I'll stay with explicitly listed /
visible dependencies, unless something similar is available in a re-worked
/ unified fetcher.

I prefer the translation to git, so I have debug source for vendor
dependencies as well as a well travelled path to mirror and archive the
source, but something like the update task of rust is at least explicit and
visible to me, so I can also use it without too many issues.

Bruce


On Tue, Jan 7, 2025 at 11:13 AM Stefan Herbrechtsmeier via
lists.openembedded.org <stefan.herbrechtsmeier-oss=
weidmueller.com@lists.openembedded.org> wrote:

> Am 07.01.2025 um 12:01 schrieb Richard Purdie:
>
> On Tue, 2025-01-07 at 10:47 +0100, Stefan Herbrechtsmeier wrote:
>
> Am 06.01.2025 um 16:30 schrieb Richard Purdie:
>
> On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote:
>
>  I'm a little bit worried about how easily you could sneak a
> "floating" version into this and make the fetcher non-
> deterministic. Does (or could?) the code detect and error on
> that?
>
>
> We could raise an error if a checksum is missing in the
> dependency specification file or make the checksum mandatory for
> the dependency fetcher.  Furthermore we could inspect the
> dependency URLs to detect a misuse of the file like a latest
> string for the version.
>
>
> I think adding such an error would be a requirement for merging
> this.
>
>
> Should the dependency fetcher (ex. npmsw) or the language specific
> fetcher (ex. npm) fail if the version points to a latest version?
>
> I think right now it has to error to try and reduce complexity. It is
> possible to support such things but you have to pass that version
> information back up the stack so that PV represents the different
> versions and that is a new level of complexity.
>
> I guess we should consider how you could theoretically support it as
> that might influence the design. With multiple git repos in SRC_URI for
> example, we end up adding multiple shortened shas to construct a PV so
> that if any change, PV changes. We also have to add an incrementing
> integer so that on opkg/dpkg/rpm operations work and versions sort.
>
> Okay. In this case we should add the checks to the dependency resolution.
> Thereby we prohibit dynamic versions for the dependencies and allow users
> to add support for it to the fetcher of the package manager.
>
> Put another way, could one of these SRC_URIs map to multiple
> different combinations of underlying component versions?
>
> If you mean the extracted SRC_URI for a single dependency from
> the dependency specification file (ex. npm-shrinkwrap.json) it
> could use special URLs to map to the latest version. But this is
> a misuse of the dependency specification file and could be
> detected. The tools always generate files with fixed versions
> because a floating version with a fixed checksum makes no sense.
>
> Even if it shouldn't happen, we need to detect and error for this
> case as it would become very problematic for us.
>
>
> Okay. Should we disallow a dynamic version for package manager
> downloads generally or do you see a reasonable use case?
>
> See above.
>
>
>  I also thought it would make sense to generate recipes from the
> dependency specification files and therefore worked on the
> recipetool previously. But it looks like the tool isn't really used
> and I'm afraid nobody will use the recipe to fix dependencies. In
> most cases it is easy to update a dependency in the native tooling
> and only provide an updated dependency specification file.
>
>
>
> I think people have wanted a single simple command to translate the
> specification file into our recipe format to update the recipe. For
> various reasons people didn't seem to find the recipetool approach
> was working and created the task workflow based one. There are pros
> and cons to both and I don't have a strong preference. I would like
> to see something which makes it clear to users what is going on
> though and is simple to use.
>
> People do intuitively understand a .inc file with a list of urls in
> it. There are challenges in updating it.
>
> This other approach is not as intuitive as everything is abstracted
> out of sight.
>
> One thing for example which worries me is how are the license
> fields in the recipe going to be updated?
>
> Currently, if we teach the class, it can set LICENSE variables
> appropriately. With the new approach, you don't know the licenses
> until after unpack has run. Yes it can write it into the SPDX, but it
> won't work for something like the layer index or forms of analysis
> which don't build things.
>
> This does also extend to vulnerability analysis since we can't know
> what is in a given recipe without actually unpacking it. For example
> we could know crate XXX at version YYY has a CVE but we can't tell if
> a recipe uses that crate until after do_unpack, or at least not
> without expandurl.
>
>
>
> The main question is if the meta data should contain all information.
> If yes, we shouldn't allow any fetcher which requires an external
> source. This should include the gitsm fetcher and we should replace
> the single SRC_URI with multiple git SRC_URIs.
>
> If we had tooling that supported that well we could certainly consider
> it. It isn't straight forward as you can have a git repo containing
> submodules which then themselves contain submodules which can then
> contain more levels of submodules. There are therefore multiple levels
> of expansion possible.
>
> Okay. That makes the git submodule special compared to the other
> dependency fetchers.
>
> We can go even further and forbid specific package manager fetchers
> and use plain https or git SRC_URIs. The python and go-vendor fetcher
> use this approach.
>
>  Alternatively we allow dependency fetchers and require that the meta
> data always be used via bitbake. In this case we could extend the
> meta data via the fetcher.
>
>  In both cases it is possible to produce the same meta data. It
> doesn't matter if we use recipetool, devtool, bbclasses or fetcher.
> In any case we could resolve the SRC_URIs, checksums or srcrev from a
> file. The license information could be fetched from the package
> repositories without integrity checks or could be extracted from the
> individual package description file inside the downloaded sources
> (ex. npm). We should skip the license detection from license files
> for now because it generates manual work and could be discussed
> later.
>
> That was the reason the current task based approach doesn't use them,
> yet! I mention it just to highlight that it can be solved either way,
> the approach doesn't really change what we need to do. The bigger
> concern is having information available in the metadata which I think
> we need to do to some level regardless of which approach we choose.
>
>
> The recipe approach has the advantage that it uses fixed licenses and
> that license changes could (theoretically) be reviewed during recipe
> update.
>
> FWIW that is an important use case and one of our general strengths. We
> can only do that as the license information is written in recipes and
> can be compared at update time.
>
> Does this apply to the license of every individual dependency or only
> to the combined license?
>
> In contrast the fetcher approach reduces the update procedure to a
> simple file rename or SRCREV update (ex. gitsm). Furthermore, the
> user could simply place a file beside the recipe to update the
> dependencies. Could we realize the same via devtool integration and a
> patch?
>
> This is effectively what the task based approach is aiming for
> currently. I think the idea was that we could have devtool/recipetool
> integration around that update task, a task was just a convenient way
> to capture the code to do it and get things working without needing the
> tool to be finished.
>
> What is the task based approach? `bitbake -c update xyz`?
>
>  We have different solutions between the languages (ex. npmsw vs
> crate vs pypi) and even inside the languages (ex. go-vendor vs
> gomod). I would like to unify the dependency support. It doesn't
> matter if we decide to use the bitbake fetcher or a bitbake / devtool
> command for the dependency and license resolution.
>
> I do very much prefer having one good way of doing things rather than
> multiple ways of doing things, each with a potential drawback. I'm
> therefore broadly in favour of doing that as long as we don't upset too
> much existing mindshare along the way.
>
> Okay
>
>
>  I have a WIP to integrate the dependencies into the SPDX. This
> uses the expanded_urldata / implicit_urldata function to add the
> dependencies to the process list of archiver and spdx.
>  https://github.com/weidmueller/poky/tree/feature/dependency-fetcher
>
> Regarding the license we could migrate the functionality from
> recipetool into a class and detect the licenses at build time.
> Theoretically the fetcher could fetch the license from the package
> manager repository but we have to trust the repository because we
> have no checksum to detect changes. Maybe we could integrate tools
> like Syft or ScanCode to detect the licenses at build time. At the
> moment the best solution is to make sure that the SBOM contains the
> name and version of the dependencies and let other tools handle the
> license via SBOM for now. Therefore I propose a common scheme to
> define the dependency name (dn) and version (dv) in the SRC_URI.
>
>
>
> We could compare what licenses the package manager is showing us
> with what is in the recipe and error if different. There would then
> need to be a command to update the licenses in the recipe (in much
> the way urls currently get updated).
>
>
>
> Either we request the licenses from the package manager during
> package update or during fetch. I wouldn't do both. Instead I would
> analyze the license file during build and compare the detected
> license with the recipe or fetcher generated licenses. But the
> license detection from files is another topic and I would like to
> postpone it for now.
>
> Agreed, I mention it just to highlight that supporting them does have
> impact on the design so any solution needs to ultimately be able to
> support it.
>
>
> You're using DL_DIR for that which I
> suspect isn't a great idea for tmp files.
>
>  Taken over from gitsm.
>
> Probably not the best fetcher and I'd say gitsm should be fixed.
>
> I don't see a reason why the gitsm fetcher shouldn't be handled like the
> other dependency fetchers. We could update the handler after we have a
> decision for the dependency fetchers.
>
> In principle perhaps but as mentioned above, gitsm has its own challenges.
>
> Based on your feedback I have the feeling that a dependency fetcher isn't
> the correct solution. The fetcher makes it impossible to review changes
> during recipe update. Additionally it needs caching for the resolved fetch
> and license data.
>
> The alternative is to create an inc file with SRC_URIs, checksums, SRCREVs
> and LICENSE. Any recommendation how to integrate the dependency resolution
> and inc creation into oe-core?
>
> The url scheme is clever but also has a potential risk in that you
> can't really pass parameters to both the top level fetcher and the
> underlying one. I'm worried that is going to bite us further down
> the line.
>
> At the moment I don't see a real problem but maybe you are right. The
> existing language specific fetchers use fixed paths for their
> downloads.
>
>  What do you propose? Should the fetcher skip the unpack of the
> source or should we introduce a sub fetcher which uses the download
> from another SRC_URI entry? The two entries could be linked via the
> name parameter. This approach could be combined with your suggestion
> above. The new fetcher will unpack a lock file from another
> (default) download.
>
>
> I'm not really sure what is best right now. I'm trying to spell out the
> pros/cons of what is going on here in the hope it encourages others to
> give feedback as well. I agree there isn't a problem right now but I
> worry there soon will be by mixing two things together like this. The
> way we handle git protocol does cause us friction with other url
> schemes already.
>
> The dependency fetcher could simply skip the unpack. In this case the
> user needs to use a variable to pass the same URL to the git and
> dependency fetcher or we could provide a python function to generate
> two SRC_URIs with the same base URL.
>
>
> I'm starting to wonder about a slightly different approach, basically
> an optional generated file alongside a recipe which contains "expanded"
> information which is effectively expensive to generate (in computation
> or resource like network access/process terms). We could teach bitbake
> a new phase of parsing where it generated them if missing. There are
> some other pieces of information which we know during the build process
> which it would be helpful to know earlier (e.g. which packages a recipe
> generates). I've wondered about this for a long time and the fetcher
> issues remind me of it again. It would be a big change with advantages
> and drawbacks. I think it would put more pressure on a layer maintainer
> as they'd have to computationally keep this up to date and it would
> complicate the patch workflow (who should send/regen the files?). I'm
> putting the idea there, I'm not saying I think we should do it, I'm
> just considering options.
>
> Do you mean like a cache or like the inc files? Is the file totally auto
> generated or is manual editing acceptable?
>
>  = Open questions
>
> * Where should we download dependencies?
> ** Should we use a folder per fetcher (ex. git and npm)?
> ** Should we use the main folder (ex. crate)?
> ** Should we translate the name into folder (ex. gomod)?
> ** Should we integrate the name into the filename (ex. git)?
>
>
>
>
>
> DL_DIR is meant to be a complete cache of the source so it would need
> to be downloaded there. Given it maps to the other fetchers, the
> existing cache mechanisms likely work for these just fine, the open
> question is on whether the lock/spec files should be cached after
> extraction.
>
>
> You misunderstand the question. It's about the downloadfilename
> parameter. At the moment some fetchers use a sub folder inside DL_DIR
> and others use the main folder. It looks like every fetcher has its
> own concept to handle file collisions between different fetchers. The
> git and npm fetchers use their own folder, the crate fetcher uses its
> own .crate file prefix, the gomod fetcher translates the URL into
> multiple folders and the git fetcher translates the URL into a single
> folder name.
>
> That makes more sense. The layout is partially legacy. The wget and
> local fetchers were first and hence go directly into DL_DIR. git/svn
> were separated out into their own directories with a plan to have a
> directory per fetcher. That didn't always work out with each newer
> fetcher. Each fetcher does have to handle a unique naming of its urls
> as only the specific fetcher can know all the url parameters and which
> ones affect the output vs which ones don't.
>
>
>  This doesn't explain why the npm fetcher uses a sub folder but the
> gomod and crate fetchers don't. All fetchers are based on the wget fetcher.
>
> That is probably "my fault". Put yourself in my position. You get a ton
> of different patches, all touching very varied aspects of the system.
> When reviewing them you have to try and remember the original design
> decisions, the future directions, the ways things broke in the past, a
> desire to try and have clean consistent APIs and so on. I have tried
> very hard to move things in a direction where things incrementally
> improve, without unnecessarily blocking new features. It means that
> things that merge often aren't perfect. We've tried a few different
> approaches with the newer programming languages and each approach has
> had pros and cons. The inconsistency is probably as I missed something
> in review. Sorry :(.
>
> Sorry, I don't want to criticize you. I see that you have a lot of work. I
> want to understand the reasons for the actual design and how it should
> look.
>
> I only have finite time. There are few people who seem to want to dive
> in and help with review of patches like these. I did ask some people
> yesterday, one told me they simply couldn't understand these patches.
>
> What can I do to improve the review?
>
> I'm doing my best to ask the right questions, try and help others
> understand them, ensure my own concerns I can identify are resolved and
> I don't want to de-motivate you on this work either, I think the idea
> of improving this is great and I'd love to see it. Equally, I'm also
> the first person everyone will complain to if we change something and
> it causes problems for people.
>
> So the explanation is probably I just missed something in review at
> some point. The intent was to separate out the fetcher output going
> forward (unless it makes sense to be shared).
>
> FWIW there are multiple things which bother me about the existing
> fetcher storage layout but that is a different discussion.
>
> Okay.
>
> * Where should we unpack the dependencies?
> ** Should we use a folder inside the parent folder (ex.
> node_modules)?
> ** Should we use a fixed folder inside unpackdir
>    (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
>
>
> This likely depends on the fetcher as the different mechanisms will
> have different expectations about how they should be extracted (as
> npm/etc. would).
>
>
> It depends on the fetcher but the fetchers could use the same
> approach. At the moment every fetcher uses a different approach. The
> crate fetcher uses a fixed value. The gomod fetcher uses a variable
> (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix).
> Furthermore the gomod fetcher overrides the common subdir parameter.
>
> I think we really need to standardise that if we can. Each new fetcher
> has claimed a certain approach is effectively required by the package
> manager.
>
>  What would be your desired solution? Is the variable okay or do you prefer a self-contained SRC_URI?
>
> I suspect we need a default via a variable and then the option to
> change the default via parameters. The default value should be a
> bitbake fetcher namespaced control variable.
>
> I'm wary of making a definitive statement saying X if that isn't going
> to make sense for some backend though. I simply don't have enough
> knowledge of them all, which is why you see me being reluctant to make
> definitive statements about design.
>
> Okay.
>
> * How should we treat archives for package manager caches?
> ** Should we unpack the archives to support patching (ex. npm)?
> ** Should we copy the packed archive to avoid unpacking and packaging
>    (ex. gomod)?
>
>
> If there are archives left after do_unpack, which task is going
> to unpack those? Are we expecting the build process in
> configure/compile to decompress them? Would those management
> tools accept things if they were extracted earlier? "unpack"
> would be the correct time to do it but I can see this getting
> into conflict with the package manager :/.
>
>
> Most package managers expect archives. In the npm case the archive is
> unpacked by the fetcher and repacked by the npm.bbclass to support
> patching. The gomod fetcher doesn't unpack the downloaded archive and
> the gomodgit fetcher creates archives from git folders during unpack.
> It would be possible to always keep the archives or always extract
> the archives and recreate archives during build. It is a decision
> between performance and patchability.
>
>  At the moment it is complicated to work with the different fetchers
> because every fetcher uses a different concept and it is unclear what
> the desired approach is.
>
>
> This is a challenge. Can we handle the unpacking with the package
> manager as a specific step or does it have to be combined with other
> steps like configure/compile?
>
>
> It looks like this is possible:
>  cargo fetch
>  go mod vendor
>  npm install
>
>  I suspect you're thinking about using the package manager in
> do_unpack to unpack the archives and patch the unpacked archives
> afterwards?
>
> I'm wondering about it, yes. I know we've had challenges with patching
> rust modules for example so this isn't a theoretical problem :/.
>
> It is an interesting idea because most package managers check the integrity
> before unpacking. Additionally it should simplify and speed up the npm build
> because it removes the repacking of the packages. The problem is that we need
> an additional task to patch the dependency specification file and to unpack
> the file.
>
> I did wonder if patches 1-5 of this series could be merged
> separately too as they look reasonable regardless of the rest
> of the series?
>
>
> Sure. Should I resend the patches as separate series?
>
> Yes please, that would then let us remove the bits we can easily
> review/sort and focus on this other part.
>
>
> Done.
>
> Thanks.
>
>
> I will also resend the go h1 checksum commit separately because it
> could be useful for the gomod fetcher.
>
> Yes, I was waiting for a new version of that one with the naming tweaked.
>
> Done.
>
> Should I also move the dn / dv parameter patches to a separate series
> because they could be useful without the dependency fetcher? I could
> add the parameters to the fetchers in a backward compatible way.
>
> I need to think more about that one...
>
> The motivation is to include the dependencies with name, version, license
> and CPE in the SBOM.
>
> Regards
>   Stefan
>
>
>
>

-- 
- Thou shalt not follow the NULL pointer, for chaos and madness await thee
at its end
- "Use the force Harry" - Gandalf, Star Trek II


* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-07 16:58             ` Bruce Ashfield
@ 2025-01-07 17:46               ` Stefan Herbrechtsmeier
  2025-01-08 15:43                 ` Bruce Ashfield
  0 siblings, 1 reply; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-07 17:46 UTC (permalink / raw)
  To: Bruce Ashfield; +Cc: Richard Purdie, bitbake-devel, Stefan Herbrechtsmeier


Am 07.01.2025 um 17:58 schrieb Bruce Ashfield:
> Hi all,
>
> I'm going to reply at this point in the thread to at least let 
> everyone know that I've been reading along, but honestly can't say if 
> a few questions that I have have been asked (and answered).
>
> The biggest use case that I have for the layers and recipes that I 
> maintain is about being able to both "easily" patch or update 
> vendor/dependencies of the main application build.
>
> It was unclear to me how I'd do that with these changes.
>
> For the copied/extracted dependencies, I can see that you'd just be 
> able to figure out where they were extracted (and I see the 
> discussions on where to extract/store some of the files) and then 
> write a patch as you would with any recipe. But would there be a way 
> to patch the dependency "lock file" ? I definitely don't see a way 
> that I'd be able to tweak a source hash and have an updated dependency 
> pulled in .. but I could have easily missed that.

You have to provide your own "lock file" and place it beside the recipe. 
The "lock file" is fetched via the file fetcher and is used to fetch the 
dependencies.
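
[Editor's sketch, not part of the patch series] The second stage described
above (resolving dependency URLs from the lock file) could refuse floating
versions and missing checksums roughly as follows. It assumes the "packages"
layout of package-lock.json / npm-shrinkwrap.json with "version", "resolved"
and "integrity" keys per entry. The names resolve_dependencies and
LockfileError are hypothetical, and special entries such as links or bundled
packages are deliberately treated as errors to keep the sketch short.

```python
import json
import re

# Accept only fully pinned versions such as 1.2.3 or 1.2.3-beta.1.
PINNED_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.+-]+)?$")


class LockfileError(Exception):
    """Raised when a lock file entry would make the fetch non-deterministic."""


def resolve_dependencies(lockfile_text):
    """Return (path, version, url, integrity) tuples for all dependencies."""
    lock = json.loads(lockfile_text)
    deps = []
    for path, entry in sorted(lock.get("packages", {}).items()):
        if path == "":
            continue  # the root package itself is not a dependency
        version = entry.get("version", "")
        if not PINNED_VERSION.match(version):
            raise LockfileError("%s: version %r is not pinned" % (path, version))
        if not entry.get("resolved"):
            raise LockfileError("%s: missing resolved URL" % path)
        if not entry.get("integrity"):
            raise LockfileError("%s: missing integrity checksum" % path)
        deps.append((path, version, entry["resolved"], entry["integrity"]))
    return deps
```

With a check like this, swapping in a modified lock file beside the recipe
still yields a deterministic set of downloads or a clear error.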

> Those are the primary reasons why I'll stay with explicitly listed / 
> visible dependencies, unless something similar is available in a 
> re-worked / unified fetcher.

It is impossible to patch the sources inside bitbake. Therefore the
dependency resolution must be moved inside a dependency fetch task and
an additional dependency patch task needs to be added.

> I prefer the translation to git, so I have debug source for vendor 
> dependencies as well as a well travelled path to mirror and archive 
> the source

Are you referring to the go-vendor implementation? Do you mean the vendor
directory? The gomod fetcher should support mirroring and archiving the
sources. It should be possible to create a vendor folder from the gomod
archives.

> , but something like the update task of rust is at least explicit and 
> visible to me, so I can also use it without too many issues.

Do you mean `bitbake -c update_crates recipe-name`?

Regards
   Stefan

> On Tue, Jan 7, 2025 at 11:13 AM Stefan Herbrechtsmeier via 
> lists.openembedded.org <http://lists.openembedded.org> 
> <stefan.herbrechtsmeier-oss=weidmueller.com@lists.openembedded.org> wrote:
>
>     Am 07.01.2025 um 12:01 schrieb Richard Purdie:
>>     On Tue, 2025-01-07 at 10:47 +0100, Stefan Herbrechtsmeier wrote:
>>>     Am 06.01.2025 um 16:30 schrieb Richard Purdie:
>>>>     On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote:
>>>>>>       I'm a little bit worried about how easily you could sneak a
>>>>>>     "floating" version into this and make the fetcher non-
>>>>>>     deterministic. Does (or could?) the code detect and error on
>>>>>>     that?
>>>>>>        
>>>>>     We could raise an error if a checksum is missing in the
>>>>>     dependency specification file or make the checksum mandatory for
>>>>>     the dependency fetcher.  Furthermore we could inspect the
>>>>>     dependency URLs to detect a misuse of the file like a latest
>>>>>     string for the version.
>>>>>       
>>>>     I think adding such an error would be a requirement for merging
>>>>     this.
>>>>        
>>>     Should the dependency fetcher (ex. npmsw) or the language specific
>>>     fetcher (ex. npm) fail if the version points to a latest version?
>>     I think right now it has to error to try and reduce complexity. It is
>>     possible to support such things but you have to pass that version
>>     information back up the stack so that PV represents the different
>>     versions and that is a new level of complexity.
>>
>>     I guess we should consider how you could theoretically support it as
>>     that might influence the design. With multiple git repos in SRC_URI for
>>     example, we end up adding multiple shortened shas to construct a PV so
>>     that if any change, PV changes. We also have to add an incrementing
>>     integer so that on opkg/dpkg/rpm operations work and versions sort.
>
>     Okay. In this case we should add the checks to the dependency
>     resolution. Thereby we prohibit dynamic versions for the
>     dependencies and allow users to add support for it to the fetcher
>     of the package manager.
>
>>>>>>     Put another way, could one of these SRC_URIs map to multiple
>>>>>>     different combinations of underlying component versions?
>>>>>     If you mean the extracted SRC_URI for a single dependency from
>>>>>     the dependency specification file (ex. npm-shrinkwrap.json) it
>>>>>     could use special URLs to map to the latest version. But this is
>>>>>     a misuse of the dependency specification file and could be
>>>>>     detected. The tools always generate files with fixed versions
>>>>>     because a floating version with a fixed checksum makes no sense.
>>>>     Even if it shouldn't happen, we need to detect and error for this
>>>>     case as it would become very problematic for us.
>>>>
>>>     Okay. Should we disallow a dynamic version for package manager
>>>     downloads generally or do you see a reasonable use case?
>>     See above.
>>
>>>>>       I also thought it would make sense to generate recipes from the
>>>>>     dependency specification files and therefore worked on the
>>>>>     recipetool previously. But it looks like the tool isn't really
>>>>>     used and I'm afraid nobody will use the recipe to fix
>>>>>     dependencies. In most cases it is easy to update a dependency in
>>>>>     the native tooling and only provide an updated dependency
>>>>>     specification file.
>>>>>       
>>>>       
>>>>     I think people have wanted a single simple command to translate the
>>>>     specification file into our recipe format to update the recipe. For
>>>>     various reasons people didn't seem to find the recipetool approach
>>>>     was working and created the task workflow based one. There are pros
>>>>     and cons to both and I don't have a strong preference. I would like
>>>>     to see something which makes it clear to users what is going on
>>>>     though and is simple to use.
>>>>
>>>>     People do intuitively understand a .inc file with a list of urls in
>>>>     it. There are challenges in updating it.
>>>>
>>>>     This other approach is not as intuitive as everything is abstracted
>>>>     out of sight.
>>>>
>>>>     One thing for example which worries me is how are the license
>>>>     fields in the recipe going to be updated?
>>>>
>>>>     Currently, if we teach the class, it can set LICENSE variables
>>>>     appropriately. With the new approach, you don't know the licenses
>>>>     until after unpack has run. Yes it can write it into the SPDX, but
>>>>     it won't work for something like the layer index or forms of
>>>>     analysis which don't build things.
>>>>
>>>>     This does also extend to vulnerability analysis since we can't know
>>>>     what is in a given recipe without actually unpacking it. For
>>>>     example we could know crate XXX at version YYY has a CVE but we
>>>>     can't tell if a recipe uses that crate until after do_unpack, or at
>>>>     least not without expandurl.
>>>>       
>>>       
>>>     The main question is if the meta data should contain all information.
>>>     If yes, we shouldn't allow any fetcher which requires an external
>>>     source. This should include the gitsm fetcher and we should replace
>>>     the single SRC_URI with multiple git SRC_URIs.
>>     If we had tooling that supported that well we could certainly consider
>>     it. It isn't straight forward as you can have a git repo containing
>>     submodules which then themselves contain submodules which can then
>>     contain more levels of submodules. There are therefore multiple levels
>>     of expansion possible.
>
>     Okay. That makes the git submodule special compared to the other
>     dependency fetchers.
>
>>>     We can go even further and forbid specific package manager fetchers
>>>     and use plain https or git SRC_URIs. The python and go-vendor fetcher
>>>     use this approach.
>>>       
>>>       Alternatively, we allow dependency fetchers and require that the
>>>     meta data always be used via bitbake. In this case we could extend
>>>     the meta data via the fetcher.
>>>       
>>>       In both cases it is possible to produce the same meta data. It
>>>     doesn't matter if we use recipetool, devtool, bbclasses or fetcher.
>>>     In any case we could resolve the SRC_URIs, checksums or srcrev from a
>>>     file. The license information could be fetched from the package
>>>     repositories without integrity checks or could be extracted from the
>>>     individual package description file inside the downloaded sources
>>>     (ex. npm). We should skip the license detection from license files
>>>     for now because it generates manual work and could be discussed
>>>     later.
>>     That was the reason the current task based approach doesn't use them,
>>     yet! I mention it just to highlight that it can be solved either way,
>>     the approach doesn't really change what we need to do. The bigger
>>     concern is having information available in the metadata which I think
>>     we need to do to some level regardless of which approach we choose.
>>
>>>     The recipe approach has the advantage that it uses fixed licenses and
>>>     that license changes could (theoretically) be reviewed during recipe
>>>     update.
>>     FWIW that is an important use case and one of our general strengths. We
>>     can only do that as the license information is written in recipes and
>>     can be compared at update time.
>
>     Does this apply to the license of every individual dependency
>     or only to the combined license?
>
>>>     In contrast the fetcher approach reduces the update procedure to a
>>>     simple file rename or SRCREV update (ex. gitsm). Furthermore, the
>>>     user could simply place a file beside the recipe to update the
>>>     dependencies. Could we realize the same via devtool integration and a
>>>     patch?
>>     This is effectively what the task based approach is aiming for
>>     currently. I think the idea was that we could have devtool/recipetool
>>     integration around that update task, a task was just a convenient way
>>     to capture the code to do it and get things working without needing the
>>     tool to be finished.
>     What is the task based approach? `bitbake -c update xyz`?
>
>>>       We have different solutions between the languages (ex. npmsw vs
>>>     crate vs pypi) and even inside the languages (ex. go-vendor vs
>>>     gomod). I would like to unify the dependency support. It doesn't
>>>     matter if we decide to use the bitbake fetcher or a bitbake / devtool
>>>     command for the dependency and license resolution.
>>     I do very much prefer having one good way of doing things rather than
>>     multiple ways of doing things, each with a potential drawback. I'm
>>     therefore broadly in favour of doing that as long as we don't upset too
>>     much existing mindshare along the way.
>
>     Okay
>
>>>>>       
>>>>>       I have a WIP to integrate the dependencies into the SPDX.
>>>>>     This uses the expanded_urldata / implicit_urldata function to add
>>>>>     the dependencies to the process list of archiver and spdx.
>>>>>       
>>>>>     https://github.com/weidmueller/poky/tree/feature/dependency-fetcher
>>>>>
>>>>>     Regarding the license we could migrate the functionality from
>>>>>     recipetool into a class and detect the licenses at build time.
>>>>>     Theoretically the fetcher could fetch the license from the
>>>>>     package manager repository but we have to trust the repository
>>>>>     because we have no checksum to detect changes. Maybe we could
>>>>>     integrate tools like Syft or ScanCode to detect the licenses at
>>>>>     build time. At the moment the best solution is to make sure that
>>>>>     the SBOM contains the name and version of the dependencies and let
>>>>>     other tools handle the license via SBOM for now. Therefore I
>>>>>     propose a common scheme to define the dependency name (dn) and
>>>>>     version (dv) in the SRC_URI.
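
A minimal sketch of how a consumer such as an SBOM generator might read
the proposed dn / dv parameters from one SRC_URI entry. The function
name and the example URL are hypothetical, and the standalone parameter
parsing is an assumption made only to keep the sketch self-contained;
inside bitbake the fetcher's own URL decoding would be the natural
choice.

```python
def dependency_identity(src_uri):
    """Return the (dn, dv) parameters of one SRC_URI entry, or None for each
    parameter that is absent."""
    # Everything after the first ';' is a list of key=value parameters.
    _, _, param_text = src_uri.partition(";")
    params = dict(
        item.split("=", 1) for item in param_text.split(";") if "=" in item)
    return params.get("dn"), params.get("dv")


if __name__ == "__main__":
    # Hypothetical entry; only the dn and dv parameter names come from the
    # proposal in this thread.
    uri = "https://example.com/foo-1.2.3.tgz;dn=foo;dv=1.2.3"
    print(dependency_identity(uri))
```

Entries without the parameters return None for both fields, so existing
SRC_URIs stay valid and the parameters can be added incrementally.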
>>>>>       
>>>>       
>>>>     We could compare what licenses the package manager is showing us
>>>>     with what is in the recipe and error if different. There would then
>>>>     need to be a command to update the licenses in the recipe (in much
>>>>     the way urls currently get updated).
>>>>       
>>>       
>>>     Either we request the licenses from the package manager during
>>>     package update or during fetch. I wouldn't do both. Instead I would
>>>     analyze the license file during build and compare the detected
>>>     license with the recipe or fetcher generated licenses. But the
>>>     license detection from files is another topic and I would like to
>>>     postpone it for now.
>>     Agreed, I mention it just to highlight that supporting them does have
>>     impact on the design so any solution needs to ultimately be able to
>>     support it.
>>
>>>>>>     You're using DL_DIR for that which I
>>>>>>     suspect isn't a great idea for tmp files.
>>>>>       Taken over from gitsm.
>>>>     Probably not the best fetcher and I'd say gitsm should be fixed.
>>>     I don't see a reason why the gitsm fetcher shouldn't be handled like
>>>     the other dependency fetchers. We could update the handler after we have a
>>>     decision for the dependency fetchers.
>>     In principle perhaps but as mentioned above, gitsm has its own challenges.
>
>     Based on your feedback I have the feeling that a dependency
>     fetcher isn't the correct solution. The fetcher makes it
>     impossible to review changes during recipe update. Additionally it
>     needs caching for the resolved fetch and license data.
>
>     The alternative is to create an inc file with SRC_URIs, checksums,
>     SRCREVs and LICENSE. Any recommendation how to integrate the
>     dependency resolution and inc creation into oe-core?
>
>>>>>>     The url scheme is clever but also has a potential risk in that you
>>>>>>     can't really pass parameters to both the top level fetcher and the
>>>>>>     underlying one. I'm worried that is going to bite us further down
>>>>>>     the line.
>>>>>     At the moment I don't see a real problem but maybe you are right. The
>>>>>     existing language specific fetchers use fixed paths for their
>>>>>     downloads.
>>>>>       
>>>>>       What do you propose? Should the fetcher skip the unpack of the
>>>>>     source or should we introduce a sub fetcher which uses the download
>>>>>     from another SRC_URI entry. The two entries could be linked via the
>>>>>     name parameter. This approach could be combined with your suggestion
>>>>>     above. The new fetcher will unpack a lock file from another
>>>>>     (default) download.
>>>>       
>>>>     I'm not really sure what is best right now. I'm trying to spell out the
>>>>     pros/cons of what is going on here in the hope it encourages others to
>>>>     give feedback as well. I agree there isn't a problem right now but I
>>>>     worry there soon will be by mixing two things together like this. The
>>>>     way we handle git protocol does cause us friction with other urls
>>>>     schemes already.
>>>     The dependency fetcher could simply skip the unpack. In this case the
>>>     user needs to use a variable to pass the same URL to the git and
>>>     dependency fetcher or we could provide a python function to generate
>>>     two SRC_URI with the same base URL.
>>>
>>     I'm starting to wonder about a slightly different approach, basically
>>     an optional generated file alongside a recipe which contains "expanded"
>>     information which is effectively expensive to generate (in computation
>>     or resource like network access/process terms). We could teach bitbake
>>     a new phase of parsing where it generated them if missing. There are
>>     some other pieces of information which we know during the build process
>>     which it would be helpful to know earlier (e.g. which packages a recipe
>>     generates). I've wondered about this for a long time and the fetcher
>>     issues remind me of it again. It would be a big change with advantages
>>     and drawbacks. I think it would put more pressure on a layer maintainer
>>     as they'd have to computationally keep this up to date and it would
>>     complicate the patch workflow (who should send/regen the files?). I'm
>>     putting the idea there, I'm not saying I think we should do it, I'm
>>     just considering options.
>
>     Do you mean like a cache or like the inc files? Is the file
>     totally auto generated or is manual editing acceptable?
>
>>>       = Open questions
>>>>>>>     * Where should we download dependencies?
>>>>>>>     ** Should we use a folder per fetcher (ex. git and npm)?
>>>>>>>     ** Should we use the main folder (ex. crate)?
>>>>>>>     ** Should we translate the name into folder (ex. gomod)?
>>>>>>>     ** Should we integrate the name into the filename (ex. git)?
>>>>>>>       
>>>>>>>       
>>>>>>       
>>>>>>       
>>>>>>     DL_DIR is meant to be a complete cache of the source so it would
>>>>>>     need to be downloaded there. Given it maps to the other fetchers, the
>>>>>>     existing cache mechanisms likely work for these just fine, the open
>>>>>>     question is on whether the lock/spec files should be cached after
>>>>>>     extraction.
>>>>>       
>>>>>     You misunderstand the question. It's about the downloadfilename
>>>>>     parameter. At the moment some fetchers use a subfolder inside DL_DIR
>>>>>     and others use the main folder. It looks like every fetcher has its
>>>>>     own concept to handle file collisions between different fetchers. The
>>>>>     git and npm fetchers use their own folder, the crate fetcher uses its
>>>>>     own .crate file prefix, the gomod fetcher translates the URL into
>>>>>     multiple folders and the git fetcher translates the URL into a single
>>>>>     folder name.
>>>>     That makes more sense. The layout is partially legacy. The wget and
>>>>     local fetchers were first and hence go directly into DL_DIR. git/svn
>>>>     were separated out into their own directories with a plan to have a
>>>>     directory per fetcher. That didn't always work out with each newer
>>>>     fetcher. Each fetcher does have to handle a unique naming of its urls
>>>>     as only the specific fetcher can know all the urls parameters and which
>>>>     ones affect the output vs which ones don't.
>>>>       
>>>       This doesn't explain why the npm fetcher uses a subfolder but the
>>>     gomod and crate fetchers do not. All fetchers are based on the wget
>>>     fetcher.
>>     That is probably "my fault". Put yourself in my position. You get a ton
>>     of different patches, all touching very varied aspects of the system.
>>     When reviewing them you have to try and remember the original design
>>     decisions, the future directions, the ways things broke in the past, a
>>     desire to try and have clean consistent APIs and so on. I have tried
>>     very hard to move things in a direction where things incrementally
>>     improve, without unnecessarily blocking new features. It means that
>>     things that merge often aren't perfect. We've tried a few different
>>     approaches with the newer programming languages and each approach has
>>     had pros and cons. The inconsistency is probably as I missed something
>>     in review. Sorry :(.
>
>     Sorry, I don't want to criticize you. I see that you have a lot of
>     work. I want to understand the reasons for the actual design and
>     what it should look like.
>
>>     I only have finite time. There are few people who seem to want to dive
>>     in and help with review of patches like these. I did ask some people
>>     yesterday, one told me they simply couldn't understand these patches.
>
>     What can I do to improve the review?
>
>>     I'm doing my best to ask the right questions, try and help others
>>     understand them, ensure my own concerns I can identify are resolved and
>>     I don't want to de-motivate you on this work either, I think the idea
>>     of improving this is great and I'd love to see it. Equally, I'm also
>>     the first person everyone will complain to if we change something and
>>     it causes problems for people.
>>
>>     So the explanation is probably I just missed something in review at
>>     some point. The intent was to separate out the fetcher output going
>>     forward (unless it makes sense to be shared).
>>
>>     FWIW there are multiple things which bother me about the existing
>>     fetcher storage layout but that is a different discussion.
>
>     Okay.
>
>>>>>>>     * Where should we unpack the dependencies?
>>>>>>>     ** Should we use a folder inside the parent folder
>>>>>>>         (ex. node_modules)?
>>>>>>>     ** Should we use a fixed folder inside unpackdir
>>>>>>>         (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
>>>>>>       
>>>>>>     This likely depends on the fetcher as the different mechanisms will
>>>>>>     have different expectations about how they should be extracted (as
>>>>>>     npm/etc. would).
>>>>>       
>>>>>     It depends on the fetcher but the fetchers could use the same
>>>>>     approach. At the moment every fetcher uses a different approach. The
>>>>>     crate fetcher uses a fixed value. The gomod fetcher uses a variable
>>>>>     (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix).
>>>>>     Furthermore the gomod fetcher overrides the common subdir parameter.
>>>>     I think we really need to standardise that if we can. Each new fetcher
>>>>     has claimed a certain approach is effectively required by the package
>>>>     manager.
>>>       What would be your desired solution? Is the variable okay or do
>>>     you prefer a self-contained SRC_URI?
>>     I suspect we need a default via a variable and then the option to
>>     change the default via parameters. The default value should be a
>>     bitbake fetcher namespaced control variable.
>>
>>     I'm wary of making a definitive statement saying X if that isn't going
>>     to make sense for some backend though. I simply don't have enough
>>     knowledge of them all, which is why you see me being reluctant to make
>>     definitive statements about design.
>
>     Okay.
>
>>>>>>>     * How should we treat archives for package manager caches?
>>>>>>>     ** Should we unpack the archives to support patching (ex. npm)?
>>>>>>>     ** Should we copy the packed archive to avoid unpacking and
>>>>>>>         packaging (ex. gomod)?
>>>>>>>       
>>>>>>     If there are archives left after do_unpack, which task is going
>>>>>>     to unpack those? Are we expecting the build process in
>>>>>>     configure/compile to decompress them? Would those management
>>>>>>     tools accept things if they were extracted earlier? "unpack"
>>>>>>     would be the correct time to do it but I can see this getting
>>>>>>     into conflict with the package manager :/.
>>>>>       
>>>>>     Most package managers expect archives. In the npm case the archive is
>>>>>     unpacked by the fetcher and packed by the npm.bbclass to support
>>>>>     patching. The gomod fetcher doesn't unpack the downloaded archive and
>>>>>     the gomodgit fetcher creates archives from git folders during unpack.
>>>>>     It would be possible to always keep the archives or always extract
>>>>>     the archives and recreate archives during build. It is a decision
>>>>>     between performance and patchability.
>>>>>       
>>>>>       At the moment it is complicated to work with the different fetchers
>>>>>     because every fetcher uses a different concept and it is unclear what
>>>>>     the desired approach is.
>>>>       
>>>>     This is a challenge. Can we handle the unpacking with the package
>>>>     manager as a specific step or does it have to be combined with other
>>>>     steps like configure/compile?
>>>>        
>>>     It looks like this is possible:
>>>       cargo fetch
>>>       go mod vendor
>>>       npm install
>>>       
>>>       I suspect you're thinking about using the package manager in
>>>     do_unpack to unpack the archives and patch the unpacked archives
>>>     afterwards?
>>     I'm wondering about it, yes. I know we've had challenges with patching
>>     rust modules for example so this isn't a theoretical problem :/.
>
>     It is an interesting idea because most package managers check the
>     integrity before unpack. Additionally it should simplify and speed
>     up the npm build because it removes the repack of the packages.
>     The problem is that we need an additional task to patch the
>     dependency specification file and to unpack the file.
>
>>>>>>     I did wonder if patches 1-5 of this series could be merged
>>>>>>     separately too as they look reasonable regardless of the rest
>>>>>>     of the series?
>>>>>       
>>>>>     Sure. Should I resend the patches as separate series?
>>>>     Yes please, that would then let us remove the bits we can easily
>>>>     review/sort and focus on this other part.
>>>>        
>>>     Done.
>>     Thanks.
>>
>>>     I will also resend the go h1 checksum commit separately because it
>>>     could be useful for the gomod fetcher.
>>     Yes, I was waiting for a new version of that one with the naming tweaked.
>
>     Done.
>
>>>     Should I also move the dn / dv parameter patches to a separate series
>>>     because they could be useful without the dependency fetcher? I could
>>>     add the parameters to the fetchers in a backward compatible way.
>>     I need to think more about that one...
>
>     The motivation is to include the dependencies with name, version,
>     license and cpe into the SBOM.
>
>     Regards
>       Stefan
>
> -- 
> - Thou shalt not follow the NULL pointer, for chaos and madness await 
> thee at its end
> - "Use the force Harry" - Gandalf, Star Trek II
>

[-- Attachment #2: Type: text/html, Size: 36014 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-07 17:46               ` Stefan Herbrechtsmeier
@ 2025-01-08 15:43                 ` Bruce Ashfield
  2025-01-09 11:51                   ` Stefan Herbrechtsmeier
  0 siblings, 1 reply; 66+ messages in thread
From: Bruce Ashfield @ 2025-01-08 15:43 UTC (permalink / raw)
  To: Stefan Herbrechtsmeier
  Cc: Richard Purdie, bitbake-devel, Stefan Herbrechtsmeier

[-- Attachment #1: Type: text/plain, Size: 26322 bytes --]

On Tue, Jan 7, 2025 at 12:46 PM Stefan Herbrechtsmeier <
stefan.herbrechtsmeier-oss@weidmueller.com> wrote:

> Am 07.01.2025 um 17:58 schrieb Bruce Ashfield:
>
> Hi all,
>
> I'm going to reply at this point in the thread to at least let everyone
> know that I've been reading along, but honestly can't say if a few
> questions that I have have been asked (and answered).
>
> The biggest use case that I have for the layers and recipes that I
> maintain is about being able to both "easily" patch or update
> vendor/dependencies of the main application build.
>
> It was unclear to me how I'd do that with these changes.
>
> For the copied/extracted dependencies, I can see that you'd just be able
> to figure out where they were extracted (and I see the discussions on where
> to extract/store some of the files) and then write a patch as you would
> with any recipe. But would there be a way to patch the dependency "lock
> file" ? I definitely don't see a way that I'd be able to tweak a source
> hash and have an updated dependency pulled in .. but I could have easily
> missed that.
>
> You have to provide your own "lock file" and place it beside the recipe.
> The "lock file" is fetched via the file fetcher and is used to fetch the
> dependencies.
>
My requirement would be to individually bump the vendored dependencies. A
copy and update of just a single entry in the lock file is possible, which
is what I'd do. I'm just pointing out that finer grained control is
required when quickly iterating or developing packages.
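
A minimal sketch of that single-entry bump, assuming the lock file uses
the npm lockfileVersion 2/3 layout where entries live under "packages"
and are keyed by "node_modules/<name>". The function name, package name,
URLs and checksums are made up for illustration; nothing here is part of
the patch series.

```python
import copy
import json


def bump_lock_entry(lock, name, version, resolved, integrity):
    """Return a copy of `lock` with exactly one dependency entry replaced."""
    key = "node_modules/" + name
    if key not in lock.get("packages", {}):
        raise KeyError("%s is not listed in the lock file" % key)
    # Work on a deep copy so the original lock data stays untouched.
    updated = copy.deepcopy(lock)
    updated["packages"][key].update(
        version=version, resolved=resolved, integrity=integrity)
    return updated


if __name__ == "__main__":
    # Made-up entry; a real file carries a real registry URL and hash.
    lock = {
        "lockfileVersion": 3,
        "packages": {
            "node_modules/example-dep": {
                "version": "1.0.0",
                "resolved": "https://registry.example.com/example-dep-1.0.0.tgz",
                "integrity": "sha512-OLD",
            }
        },
    }
    bumped = bump_lock_entry(
        lock, "example-dep", "1.0.1",
        "https://registry.example.com/example-dep-1.0.1.tgz", "sha512-NEW")
    print(json.dumps(bumped["packages"], indent=2))
```

The copy placed beside the recipe then differs from upstream in one
entry only, which keeps the change reviewable.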

I find a lot of mindshare goes towards just building and creating images,
where there's also a need to support development workflows.


> Those are the primary reasons why I'll stay with explicitly listed /
> visible dependencies, unless something similar is available in a re-worked
> / unified fetcher.
>
> It is impossible to patch the sources inside bitbake. Therefore the
> dependency resolution must be moved inside a dependency fetch task and an
> additional dependency patch task needs to be added.
>
I'm just talking about being able to patch the vendor source once they are
fetched and placed in their build location. Using normal patch files on the
SRC_URI. When the location of the vendor source isn't obvious (because it
is calculated or dynamically generated), this becomes more challenging.


> I prefer the translation to git, so I have debug source for vendor
> dependencies as well as a well travelled path to mirror and archive the
> source
>
> Do you refer to the go-vendor implementation? Do you mean the vendor
> directory? The gomod fetcher should support mirroring and archiving the
> sources. It should be possible to create a vendor folder from the gomod
> archives.
>

Nope. I don't use that either. I have my own tools to locate the source of
the dependencies, clone and put them into a vendor directory. The recipe
simply clones and copies using git after that.



> , but something like the update task of rust is at least explicit and
> visible to me, so I can also use it without too many issues.
>
> Do you mean `bitbake -c update_crates recipe-name`?
>

Correct. The .inc file updating mechanisms.

Bruce



>
> Regards
>   Stefan
>
> On Tue, Jan 7, 2025 at 11:13 AM Stefan Herbrechtsmeier via
> lists.openembedded.org <stefan.herbrechtsmeier-oss=
> weidmueller.com@lists.openembedded.org> wrote:
>
>> Am 07.01.2025 um 12:01 schrieb Richard Purdie:
>>
>> On Tue, 2025-01-07 at 10:47 +0100, Stefan Herbrechtsmeier wrote:
>>
>> Am 06.01.2025 um 16:30 schrieb Richard Purdie:
>>
>> On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote:
>>
>>  I'm a little bit worried about how easily you could sneak a
>> "floating" version into this and make the fetcher non-
>> deterministic. Does (or could?) the code detect and error on
>> that?
>>
>>
>> We could raise an error if a checksum is missing in the
>> dependency specification file or make the checksum mandatory for
>> the dependency fetcher.  Furthermore we could inspect the
>> dependency URLs to detect a misuse of the file like a latest
>> string for the version.
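
A minimal sketch of such a check for npm-shrinkwrap.json, assuming the
lockfileVersion 2/3 layout with a "packages" mapping. The function name
and the version pattern are illustrative assumptions, not code from the
series; entries pinned another way (for example by a git commit) would
need their own rule.

```python
import re

# Accept only plain release versions such as 1.2.3 or 1.2.3-rc.1.
# Ranges (^1.2.0), tags (latest) and wildcards (1.x) do not match.
FIXED_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)*$")


def find_unpinned(lock):
    """Return (package path, reason) pairs for entries that are not pinned."""
    problems = []
    for path, entry in lock.get("packages", {}).items():
        if path == "" or entry.get("link"):
            # The root project and local links are not downloaded.
            continue
        version = entry.get("version", "")
        if not FIXED_VERSION.match(version):
            problems.append((path, "floating version %r" % version))
        if not entry.get("integrity"):
            problems.append((path, "missing integrity checksum"))
    return problems


if __name__ == "__main__":
    lock = {
        "packages": {
            "": {"name": "root"},
            "node_modules/a": {"version": "1.2.3", "integrity": "sha512-x"},
            "node_modules/b": {"version": "latest", "integrity": "sha512-y"},
        }
    }
    for path, reason in find_unpinned(lock):
        print("%s: %s" % (path, reason))
```

A fetcher could turn a non-empty result into a fatal error before any
download starts.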
>>
>>
>> I think adding such an error would be a requirement for merging
>> this.
>>
>>
>> Should the dependency fetcher (ex. npmsw) or the language specific
>> fetcher (ex. npm) fail if the version points to a latest version?
>>
>> I think right now it has to error to try and reduce complexity. It is
>> possible to support such things but you have to pass that version
>> information back up the stack so that PV represents the different
>> versions and that is a new level of complexity.
>>
>> I guess we should consider how you could theoretically support it as
>> that might influence the design. With multiple git repos in SRC_URI for
>> example, we end up adding multiple shortened shas to construct a PV so
>> that if any change, PV changes. We also have to add an incrementing
>> integer so that on opkg/dpkg/rpm operations work and versions sort.
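
To illustrate the idea only: a sketch that combines a base version, an
incrementing counter and the shortened revision of every repository
into one string. The "+git<counter>+<sha>_<sha>" layout, the function
name and the ten character length are assumptions chosen for the
example; this is not the code bitbake uses.

```python
def build_pv(base, revisions, counter, length=10):
    """Build a version string that changes whenever any revision changes.

    `revisions` maps a repository name to its full commit hash. The
    counter has to be incremented by the caller on every change so that
    package managers sort the versions correctly.
    """
    # Sort by name so the result does not depend on dictionary order.
    shas = "_".join(rev[:length] for _, rev in sorted(revisions.items()))
    return "%s+git%d+%s" % (base, counter, shas)


if __name__ == "__main__":
    revisions = {"lib": "b" * 40, "app": "a" * 40}
    print(build_pv("1.0", revisions, 3))
```

Supporting floating dependency versions would mean feeding the resolved
version of every dependency into a string like this, which is the extra
complexity mentioned above.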
>>
>> Okay. In this case we should add the checks to the dependency resolution.
>> Thereby we prohibit dynamic versions for the dependencies and allow users
>> to add support for them to the fetcher of the package manager.
>>
>> Put another way, could one of these SRC_URIs map to multiple
>> different combinations of underlying component versions?
>>
>> If you mean the extracted SRC_URI for a single dependency from
>> the dependency specification file (ex. npm-shrinkwrap.json) it
>> could use special URLs to map to the latest version. But this is
>> a misuse of the dependency specification file and could be
>> detected. The tools always generate files with fixed versions
>> because a floating version with a fixed checksum makes no sense.
>>
>> Even if it shouldn't happen, we need to detect and error for this
>> case as it would become very problematic for us.
>>
>>
>> Okay. Should we disallow a dynamic version for package manager
>> downloads generally or do you see a reasonable use case?
>>
>> See above.
>>
>>
>>  I also thought it would make sense to generate recipes from the
>> dependency specification files and therefore worked on the
>> recipetool previously. But it looks like the tool isn't really
>> used and I'm afraid nobody will use the recipe to fix
>> dependencies. In most cases it is easy to update a dependency in
>> the native tooling and only provide an updated dependency
>> specification file.
>>
>>
>>
>> I think people have wanted a single simple command to translate the
>> specification file into our recipe format to update the recipe. For
>> various reasons people didn't seem to find the recipetool approach
>> was working and created the task workflow based one. There are pros
>> and cons to both and I don't have a strong preference. I would like
>> to see something which makes it clear to users what is going on
>> though and is simple to use.
>>
>> People do intuitively understand a .inc file with a list of urls in
>> it. There are challenges in updating it.
>>
>> This other approach is not as intuitive as everything is abstracted
>> out of sight.
>>
>> One thing for example which worries me is how are the license
>> fields in the recipe going to be updated?
>>
>> Currently, if we teach the class, it can set LICENSE variables
>> appropriately. With the new approach, you don't know the licenses
>> until after unpack has run. Yes it can write it into the SPDX, but
>> it won't work for something like the layer index or forms of
>> analysis which don't build things.
>>
>> This does also extend to vulnerability analysis since we can't know
>> what is in a given recipe without actually unpacking it. For
>> example we could know crate XXX at version YYY has a CVE but we
>> can't tell if a recipe uses that crate until after do_unpack, or at
>> least not without expandurl.
>>
>>
>>
>> The main question is if the meta data should contain all information.
>> If yes, we shouldn't allow any fetcher which requires an external
>> source. This should include the gitsm fetcher and we should replace
>> the single SRC_URI with multiple git SRC_URIs.
>>
>> If we had tooling that supported that well we could certainly consider
>> it. It isn't straight forward as you can have a git repo containing
>> submodules which then themselves contain submodules which can then
>> contain more levels of submodules. There are therefore multiple levels
>> of expansion possible.
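
The multi-level expansion can be sketched as a depth-first walk,
assuming the .gitmodules content of every repository has already been
read into a plain mapping. The `submodules_of` mapping and the function
name are hypothetical stand-ins and not part of the gitsm fetcher.

```python
def expand_submodules(repo, submodules_of, seen=None):
    """Return every repository reachable from `repo`, depth first."""
    seen = set() if seen is None else seen
    result = []
    for child in submodules_of.get(repo, []):
        if child in seen:
            # Guard against a repository listed twice or a cycle.
            continue
        seen.add(child)
        result.append(child)
        # A submodule can bring in further submodules of its own.
        result.extend(expand_submodules(child, submodules_of, seen))
    return result


if __name__ == "__main__":
    submodules_of = {"top": ["a", "b"], "a": ["c"], "c": ["d"]}
    print(expand_submodules("top", submodules_of))
```

Each returned entry would become its own git SRC_URI, and the list can
only be produced after every intermediate repository has been fetched.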
>>
>> Okay. That makes the git submodule special compared to the other
>> dependency fetchers.
>>
>> We can go even further and forbid specific package manager fetchers
>> and use plain https or git SRC_URIs. The python and go-vendor fetcher
>> use this approach.
>>
>>  Alternatively, we allow dependency fetchers and require that the
>> meta data always be used via bitbake. In this case we could extend
>> the meta data via the fetcher.
>>
>>  In both cases it is possible to produce the same meta data. It
>> doesn't matter if we use recipetool, devtool, bbclasses or fetcher.
>> In any case we could resolve the SRC_URIs, checksums or srcrev from a
>> file. The license information could be fetched from the package
>> repositories without integrity checks or could be extracted from the
>> individual package description file inside the downloaded sources
>> (ex. npm). We should skip the license detection from license files
>> for now because it generates manual work and could be discussed
>> later.
>>
>> That was the reason the current task based approach doesn't use them,
>> yet! I mention it just to highlight that it can be solved either way,
>> the approach doesn't really change what we need to do. The bigger
>> concern is having information available in the metadata which I think
>> we need to do to some level regardless of which approach we choose.
>>
>>
>> The recipe approach has the advantage that it uses fixed licenses and
>> that license changes could (theoretically) be reviewed during recipe
>> update.
>>
>> FWIW that is an important use case and one of our general strengths. We
>> can only do that as the license information is written in recipes and
>> can be compared at update time.
>>
>> Does this apply to the license of every individual dependency or only
>> to the combined license?
>>
>> In contrast the fetcher approach reduces the update procedure to a
>> simple file rename or SRCREV update (ex. gitsm). Furthermore, the
>> user could simply place a file beside the recipe to update the
>> dependencies. Could we realize the same via devtool integration and a
>> patch?
>>
>> This is effectively what the task based approach is aiming for
>> currently. I think the idea was that we could have devtool/recipetool
>> integration around that update task, a task was just a convenient way
>> to capture the code to do it and get things working without needing the
>> tool to be finished.
>>
>> What is the task based approach? `bitbake -c update xyz`?
>>
>>  We have different solutions between the languages (ex. npmsw vs
>> crate vs pypi) and even inside the languages (ex. go-vendor vs
>> gomod). I would like to unify the dependency support. It doesn't
>> matter if we decide to use the bitbake fetcher or a bitbake / devtool
>> command for the dependency and license resolution.
>>
>> I do very much prefer having one good way of doing things rather than
>> multiple ways of doing things, each with a potential drawback. I'm
>> therefore broadly in favour of doing that as long as we don't upset too
>> much existing mindshare along the way.
>>
>> Okay
>>
>>
>>  I have a WIP to integrate the dependencies into the SPDX. This
>> uses the expanded_urldata / implicit_urldata function to add the
>> dependencies to the process list of archiver and spdx.
>>  https://github.com/weidmueller/poky/tree/feature/dependency-fetcher
>>
>> Regarding the license we could migrate the functionality from
>> recipetool into a class and detect the licenses at build time.
>> Theoretically the fetcher could fetch the license from the package
>> manager repository but we have to trust the repository because we
>> have no checksum to detect changes. Maybe we could integrate tools
>> like Syft or ScanCode to detect the licenses at build time. At the
>> moment the best solution is to make sure that the SBOM contains the
>> name and version of the dependencies and let other tools handle the
>> license via SBOM for now. Therefore I propose a common scheme to
>> define the dependency name (dn) and version (dv) in the SRC_URI.
>>
>>
>>
>> We could compare what licenses the package manager is showing us with
>> what is in the recipe and error if different. There would then need to
>> be a command to update the licenses in the recipe (in much the way urls
>> currently get updated).
>>
>>
>>
>> Either we request the licenses from the package manager during
>> package update or during fetch. I wouldn't do both. Instead I would
>> analyze the license file during build and compare the detected
>> license with the recipe or fetcher generated licenses. But the
>> license detection from files is another topic and I would like to
>> postpone it for now.
>>
>> Agreed, I mention it just to highlight that supporting them does have
>> impact on the design so any solution needs to ultimately be able to
>> support it.
>>
>>
>> You're using DL_DIR for that which I
>> suspect isn't a great idea for tmp files.
>>
>>  Take over from gitsm.
>>
>> Probably not the best fetcher and I'd say gitsm should be fixed.
>>
>> I don't see a reason why the gitsm fetcher shouldn't be handled like the
>> other dependency fetchers. We could update the handler after we have a
>> decision for the dependency fetchers.
>>
>> In principle perhaps but as mentioned above, gitsm has its own challenges.
>>
>> Based on your feedback I have the feeling that a dependency fetcher isn't
>> the correct solution. The fetcher makes it impossible to review changes
>> during recipe update. Additionally it needs caching for the resolved fetch
>> and license data.
>>
>> The alternative is to create an inc file with SRC_URIs, checksums,
>> SRCREVs and LICENSE. Any recommendation how to integrate the dependency
>> resolution and inc creation into oe-core?
>>
>> The url scheme is clever but also has a potential risk in that you
>> can't really pass parameters to both the top level fetcher and the
>> underlying one. I'm worried that is going to bite us further down the
>> line.
>>
>> At the moment I don't see a real problem but maybe you are right. The
>> existing language specific fetchers use fixed paths for their
>> downloads.
>>
>>  What do you propose? Should the fetcher skip the unpack of the
>> source or should we introduce a sub fetcher which uses the download
>> from another SRC_URI entry? The two entries could be linked via the
>> name parameter. This approach could be combined with your suggestion
>> above. The new fetcher will unpack a lock file from another
>> (default) download.
>>
>>
>> I'm not really sure what is best right now. I'm trying to spell out the
>> pros/cons of what is going on here in the hope it encourages others to
>> give feedback as well. I agree there isn't a problem right now but I
>> worry there soon will be by mixing two things together like this. The
>> way we handle git protocol does cause us friction with other urls
>> schemes already.
>>
>> The dependency fetcher could simply skip the unpack. In this case the
>> user needs to use a variable to pass the same URL to the git and
>> dependency fetchers or we could provide a python function to generate
>> two SRC_URIs with the same base URL.
>>
>>
>> I'm starting to wonder about a slightly different approach, basically
>> an optional generated file alongside a recipe which contains "expanded"
>> information which is effectively expensive to generate (in computation
>> or resource like network access/process terms). We could teach bitbake
>> a new phase of parsing where it generated them if missing. There are
>> some other pieces of information which we know during the build process
>> which it would be helpful to know earlier (e.g. which packages a recipe
>> generates). I've wondered about this for a long time and the fetcher
>> issues remind me of it again. It would be a big change with advantages
>> and drawbacks. I think it would put more pressure on a layer maintainer
>> as they'd have to computationally keep this up to date and it would
>> complicate the patch workflow (who should send/regen the files?). I'm
>> putting the idea there, I'm not saying I think we should do it, I'm
>> just considering options.
>>
>> Do you mean like a cache or like the inc files? Is the file totally auto
>> generated or is manual editing acceptable?
>>
>>  = Open questions
>>
>> * Where should we download dependencies?
>> ** Should we use a folder per fetcher (ex. git and npm)?
>> ** Should we use the main folder (ex. crate)?
>> ** Should we translate the name into folder (ex. gomod)?
>> ** Should we integrate the name into the filename (ex. git)?
>>
>>
>>
>>
>>
>> DL_DIR is meant to be a complete cache of the source so it would need
>> to be downloaded there. Given it maps to the other fetchers, the
>> existing cache mechanisms likely work for these just fine, the open
>> question is on whether the lock/spec files should be cached after
>> extraction.
>>
>>
>> You misunderstand the question. It's about the downloadfilename
>> parameter. At the moment some fetchers use a sub folder inside DL_DIR
>> and others use the main folder. It looks like every fetcher has its
>> own concept to handle file collisions between different fetchers. The
>> git and npm fetchers use their own folder, the crate fetcher uses its
>> own .crate file prefix, the gomod fetcher translates the URL into
>> multiple folders and the git fetcher translates the URL into a single
>> folder name.
>>
>> That makes more sense. The layout is partially legacy. The wget and
>> local fetchers were first and hence go directly into DL_DIR. git/svn
>> were separated out into their own directories with a plan to have a
>> directory per fetcher. That didn't always work out with each newer
>> fetcher. Each fetcher does have to handle a unique naming of its urls
>> as only the specific fetcher can know all the urls parameters and which
>> ones affect the output vs which ones don't.
>>
>>
>>  This doesn't explain why the npm fetcher, but not the gomod and crate
>> fetchers, uses a sub folder. All fetchers are based on the wget fetcher.
>>
>> That is probably "my fault". Put yourself in my position. You get a ton
>> of different patches, all touching very varied aspects of the system.
>> When reviewing them you have to try and remember the original design
>> decisions, the future directions, the ways things broke in the past, a
>> desire to try and have clean consistent APIs and so on. I have tried
>> very hard to move things in a direction where things incrementally
>> improve, without unnecessarily blocking new features. It means that
>> things that merge often aren't perfect. We've tried a few different
>> approaches with the newer programming languages and each approach has
>> had pros and cons. The inconsistency is probably as I missed something
>> in review. Sorry :(.
>>
>> Sorry, I don't want to criticize you. I see that you have a lot of work.
>> I want to understand the reasons for the actual design and what it should
>> look like.
>>
>> I only have finite time. There are few people who seem to want to dive
>> in and help with review of patches like these. I did ask some people
>> yesterday, one told me they simply couldn't understand these patches.
>>
>> What can I do to improve the review?
>>
>> I'm doing my best to ask the right questions, try and help others
>> understand them, ensure my own concerns I can identify are resolved and
>> I don't want to de-motivate you on this work either, I think the idea
>> of improving this is great and I'd love to see it. Equally, I'm also
>> the first person everyone will complain to if we change something and
>> it causes problems for people.
>>
>> So the explanation is probably I just missed something in review at
>> some point. The intent was to separate out the fetcher output going
>> forward (unless it makes sense to be shared).
>>
>> FWIW there are multiple things which bother me about the existing
>> fetcher storage layout but that is a different discussion.
>>
>> Okay.
>>
>> * Where should we unpack the dependencies?
>> ** Should we use a folder inside the parent folder (ex. node_modules)?
>> ** Should we use a fixed folder inside unpackdir
>>    (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
>>
>>
>> This likely depends on the fetcher as the different mechanisms will
>> have different expectations about how they should be extracted (as
>> npm/etc. would).
>>
>>
>> It depends on the fetcher but the fetchers could use the same
>> approach. At the moment every fetcher uses a different approach. The
>> crate fetcher uses a fixed value. The gomod fetcher uses a variable
>> (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix).
>> Furthermore the gomod fetcher overrides the common subdir parameter.
>>
>> I think we really need to standardise that if we can. Each new fetcher
>> has claimed a certain approach is effectively required by the package
>> manager.
>>
>>  What would be your desired solution? Is the variable okay or do you prefer a self-contained SRC_URI?
>>
>> I suspect we need a default via a variable and then the option to
>> change the default via parameters. The default value should be a
>> bitbake fetcher namespaced control variable.
>>
>> I'm wary of making a definitive statement saying X if that isn't going
>> to make sense for some backend though. I simply don't have enough
>> knowledge of them all, which is why you see me being reluctant to make
>> definitive statements about design.
>>
>> Okay.
>>
>> * How should we treat archives for package manager caches?
>> ** Should we unpack the archives to support patching (ex. npm)?
>> ** Should we copy the packed archive to avoid unpacking and packaging
>>    (ex. gomod)?
>>
>>
>> If there are archives left after do_unpack, which task is going
>> to unpack those? Are we expecting the build process in
>> configure/compile to decompress them? Would those management
>> tools accept things if they were extracted earlier? "unpack"
>> would be the correct time to do it but I can see this getting
>> into conflict with the package manager :/.
>>
>>
>> Most package managers expect archives. In the npm case the archive is
>> unpacked by the fetcher and packed by the npm.bbclass to support
>> patching. The gomod fetcher doesn't unpack the downloaded archive and
>> the gomodgit fetcher creates archives from git folders during unpack.
>> It would be possible to always keep the archives or always extract
>> the archives and recreate archives during build. It is a decision
>> between performance and patchability.
>>
>>  At the moment it is complicated to work with the different fetchers
>> because every fetcher uses a different concept and it is unclear what
>> the desired approach is.
>>
>>
>> This is a challenge. Can we handle the unpacking with the package
>> manager as a specific step or does it have to be combined with other
>> steps like configure/compile?
>>
>>
>> It looks like this is possible:
>>  cargo fetch
>>  go mod vendor
>>  npm install
>>
>>  I suspect you're thinking about using the package manager in
>> do_unpack to unpack the archives and patch the unpacked archives
>> afterwards?
>>
>> I'm wondering about it, yes. I know we've had challenges with patching
>> rust modules for example so this isn't a theoretical problem :/.
>>
>> It is an interesting idea because most package managers check the
>> integrity before unpacking. Additionally it should simplify and speed up the
>> npm build because it removes the repack of the packages. The problem is
>> that we need an additional task to patch the dependency specification file
>> and to unpack the file.
>>
>> I did wonder if patches 1-5 of this series could be merged
>> separately too as they look reasonable regardless of the rest
>> of the series?
>>
>>
>> Sure. Should I resend the patches as separate series?
>>
>> Yes please, that would then let us remove the bits we can easily
>> review/sort and focus on this other part.
>>
>>
>> Done.
>>
>> Thanks.
>>
>>
>> I will also resend the go h1 checksum commit separately because it
>> could be useful for the gomod fetcher.
>>
>> Yes, I was waiting for a new version of that one with the naming tweaked.
>>
>> Done.
>>
>> Should I also move the dn / dv parameter patches to a separate series
>> because they could be useful without the dependency fetcher? I could
>> add the parameters to the fetchers in a backward compatible way.
>>
>> I need to think more about that one...
>>
>> The motivation is to include the dependencies with name, version, license
>> and cpe into the SBOM.
>>
>> Regards
>>   Stefan
>>
>>
>>
>>
>
> --
> - Thou shalt not follow the NULL pointer, for chaos and madness await thee
> at its end
> - "Use the force Harry" - Gandalf, Star Trek II
>
>


[-- Attachment #2: Type: text/html, Size: 38661 bytes --]


* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-06 14:42     ` Stefan Herbrechtsmeier
@ 2025-01-09 10:40       ` Alexander Kanavin
  2025-01-09 14:00         ` Stefan Herbrechtsmeier
       [not found]       ` <18190013516DD62F.1999@lists.openembedded.org>
  1 sibling, 1 reply; 66+ messages in thread
From: Alexander Kanavin @ 2025-01-09 10:40 UTC (permalink / raw)
  To: Stefan Herbrechtsmeier
  Cc: richard.purdie, bitbake-devel, Stefan Herbrechtsmeier

On Mon, 6 Jan 2025 at 15:43, Stefan Herbrechtsmeier
<stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
> https://github.com/yoctoproject/poky/compare/master...weidmueller:poky:feature/dependency-fetcher
>
> I have migrated the crate recipes to the new fetcher and improved the spdx
> 2.2 class to include the name and version of the crate dependencies.
>
> You have to inherit the create-spdx-2.2 class and build the librsvg
> recipe to test the new fetcher.

Thanks, I checked out the branch and ran bitbake -c patch librsvg with
the default build/conf/ config. It works and the recipe is short and
neat. I'm not sure what create-spdx-2.2 is needed for, though: I didn't
use it, and there were no errors.

Like others, I'm torn on two things:
- visibility
- control

When a recipe explicitly lists what goes into a build, this can be
easily seen, audited, and adjusted directly in the recipe. With the
new fetchers, you need to actually run a build to produce that list,
and it isn't clear where the list is placed, in which format, and what
to do if something needs to deviate from versions prescribed by
upstream.

This is not a theoretical concern, I'm thinking specifically of
log4j-like vulnerabilities, and how one would check that their product
doesn't contain them:
https://lwn.net/Articles/878570/

Alex


* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
       [not found]       ` <18190013516DD62F.1999@lists.openembedded.org>
@ 2025-01-09 10:50         ` Alexander Kanavin
  2025-01-09 14:18           ` Stefan Herbrechtsmeier
  0 siblings, 1 reply; 66+ messages in thread
From: Alexander Kanavin @ 2025-01-09 10:50 UTC (permalink / raw)
  To: alex.kanavin
  Cc: Stefan Herbrechtsmeier, richard.purdie, bitbake-devel,
	Stefan Herbrechtsmeier

On Thu, 9 Jan 2025 at 11:40, Alexander Kanavin via
lists.openembedded.org <alex.kanavin=gmail.com@lists.openembedded.org>
wrote:
> This is not a theoretical concern, I'm thinking specifically of
> log4j-like vulnerabilities, and how one would check that their product
> doesn't contain them:
> https://lwn.net/Articles/878570/

I meant to say 'yocto layer' here, not product. And ideally it should
be possible with 'static analysis', e.g. just by looking at the layer
content.
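
To illustrate the kind of static check meant here: if the pinned
dependency list lives in the layer (as a lock file or a generated inc
file), a small script can answer "does anything pull in package X at a
bad version" without running a build. A minimal sketch, assuming an npm
package-lock.json in the v2/v3 layout with a top-level "packages" map;
the function name, package names and versions below are made up for the
example.

```python
import json


def find_dependency(lock_text, name, bad_versions):
    """Return (path, version) pairs for lock entries matching name and a bad version."""
    lock = json.loads(lock_text)
    hits = []
    for path, entry in lock.get("packages", {}).items():
        # Keys look like "node_modules/a/node_modules/b"; the root project is "".
        pkg_name = entry.get("name") or path.rpartition("node_modules/")[2]
        if pkg_name == name and entry.get("version") in bad_versions:
            hits.append((path, entry["version"]))
    return hits


# Hypothetical lock file content, only for illustration.
EXAMPLE_LOCK = json.dumps({
    "name": "demo",
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "demo", "version": "1.0.0"},
        "node_modules/example-logger": {"version": "2.17.0"},
        "node_modules/foo": {"version": "0.3.1"},
        "node_modules/foo/node_modules/example-logger": {"version": "2.14.1"},
    },
})

for path, version in find_dependency(EXAMPLE_LOCK, "example-logger", {"2.14.1"}):
    print(f"{path} pins example-logger {version}")
```

The same idea would apply to Cargo.lock or go.sum with a different
parser. The point is only that the input has to be present in the layer.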

Alex



* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-08 15:43                 ` Bruce Ashfield
@ 2025-01-09 11:51                   ` Stefan Herbrechtsmeier
  0 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-09 11:51 UTC (permalink / raw)
  To: Bruce Ashfield; +Cc: Richard Purdie, bitbake-devel, Stefan Herbrechtsmeier

[-- Attachment #1: Type: text/plain, Size: 31578 bytes --]

Am 08.01.2025 um 16:43 schrieb Bruce Ashfield:
>
> On Tue, Jan 7, 2025 at 12:46 PM Stefan Herbrechtsmeier 
> <stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
>
>     Am 07.01.2025 um 17:58 schrieb Bruce Ashfield:
>>     Hi all,
>>
>>     I'm going to reply at this point in the thread to at least let
>>     everyone know that I've been reading along, but honestly can't
>>     say if a few questions that I have have been asked (and answered).
>>
>>     The biggest use case that I have for the layers and recipes that
>>     I maintain is about being able to both "easily" patch or update
>>     vendor/dependencies of the main application build.
>>
>>     It was unclear to me how I'd do that with these changes.
>>
>>     For the copied/extracted dependencies, I can see that you'd just
>>     be able to figure out where they were extracted (and I see the
>>     discussions on where to extract/store some of the files) and then
>>     write a patch as you would with any recipe. But would there be a
>>     way to patch the dependency "lock file" ? I definitely don't see
>>     a way that I'd be able to tweak a source hash and have an updated
>>     dependency pulled in .. but I could have easily missed that.
>
>     You have to provide your own "lock file" and place it beside the
>     recipe. The "lock file" is fetched via the file fetcher and is
>     used to fetch the dependencies.
>
> My requirement would be to individually bump the vendored 
> dependencies. A copy and update of just a single entry in the lock 
> file is possible, which is what I'd do. I'm just pointing out that 
> finer grained control is required when quickly iterating or developing 
> packages.
>
> I find a lot of mindshare goes towards just building and creating 
> images, where there's also a need to support development workflows.

That's the reason I use the package manager specific lock file as the base.
Every package manager has tools and workflows to manage, update or
override the dependencies. These tools not only update the source URL and
checksum but also handle the effect on other dependencies, sub
dependencies and version selection. It is much easier to update the lock
file with the existing tooling and pass the updated lock file (or a patch)
to bitbake.
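
Before handing an updated lock file to bitbake, one could sanity-check
that every entry still carries a fixed URL and a checksum, so nothing
floating sneaks in. A minimal sketch, assuming an npm package-lock.json
in the v2/v3 layout; the function name and the example data are
hypothetical and not part of the patch series.

```python
import json


def unpinned_entries(lock_text):
    """Return (path, reason) pairs for lock entries without a fixed source."""
    lock = json.loads(lock_text)
    problems = []
    for path, entry in lock.get("packages", {}).items():
        # Skip the root project ("") and workspace links, which are not downloaded.
        if path == "" or entry.get("link"):
            continue
        if not entry.get("resolved"):
            problems.append((path, "missing resolved URL"))
        if not entry.get("integrity"):
            problems.append((path, "missing integrity checksum"))
    return problems


# Hypothetical lock file content; URL and checksum are placeholders.
EXAMPLE_LOCK = json.dumps({
    "name": "demo",
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "demo", "version": "1.0.0"},
        "node_modules/good": {
            "version": "1.2.3",
            "resolved": "https://registry.example.com/good/-/good-1.2.3.tgz",
            "integrity": "sha512-PLACEHOLDER",
        },
        "node_modules/bad": {"version": "2.0.0"},
    },
})

for path, reason in unpinned_entries(EXAMPLE_LOCK):
    print(f"{path}: {reason}")
```

This is deliberately strict: git and bundled dependencies have no
integrity field and would need their own rule.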

>>     Those are the primary reasons why I'll stay with explicitly
>>     listed / visible dependencies, unless something similar is
>>     available in a re-worked / unified fetcher.
>
>     It is impossible to patch the sources inside bitbake. Therefore
>     the dependency resolution must be moved inside a dependency fetch
>     task and an additional dependency patch task needs to be added.
>
> I'm just talking about being able to patch the vendor source once they 
> are fetched and placed in their build location. Using normal patch 
> files on the SRC_URI. When the location of the vendor source isn't 
> obvious (because it is calculated or dynamically generated), this 
> becomes more challenging.

This should be possible if we use vendoring and create the vendor folder 
before do_patch.

do_fetch
do_unpack
do_vendor
do_patch

We could use a do_update task to parse the lock file and update an inc
file. Or we could add additional tasks to resolve additional fetcher
URLs from the spec / lock file:

do_vendor_spec_fetch
do_vendor_spec_unpack
do_vendor_spec_patch
do_vendor_fetch
do_fetch
do_unpack
do_vendor
do_patch

This sequence ensures that do_fetch still downloads all dependencies.
Only do_vendor_fetch needs internet access.
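
The do_update variant (parse the lock file and update an inc file)
could look roughly like the sketch below. It assumes a Cargo.lock as
input and emits crate:// URLs plus sha256sum entries in the style of the
existing crates inc files, as far as I understand them. The parser is a
deliberately small line-based reader, not a full TOML parser, and the
function names and example data are made up.

```python
import re

# Matches simple string fields such as: name = "libc"
FIELD = re.compile(r'^(\w+) = "(.*)"$')


def parse_cargo_lock(text):
    """Very small Cargo.lock reader: one dict per [[package]] block."""
    packages = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line == "[[package]]":
            current = {}
            packages.append(current)
        elif line.startswith("["):
            current = None  # some other table, e.g. [metadata]
        elif current is not None:
            match = FIELD.match(line)
            if match:
                current[match.group(1)] = match.group(2)
    return packages


def render_inc(packages):
    """Render SRC_URI and checksum lines for registry crates only."""
    uris, sums = [], []
    for pkg in sorted(packages, key=lambda p: (p["name"], p["version"])):
        if not pkg.get("source", "").startswith("registry+"):
            continue  # local (path) crates are part of the main source
        name, version = pkg["name"], pkg["version"]
        if "checksum" not in pkg:
            raise ValueError(f"{name} {version} has no checksum")
        checksum = pkg["checksum"]
        uris.append(f"    crate://crates.io/{name}/{version} \\")
        sums.append(f'SRC_URI[{name}-{version}.sha256sum] = "{checksum}"')
    lines = ['SRC_URI += " \\'] + uris + ['"', ""] + sums
    return "\n".join(lines) + "\n"


# Hypothetical lock file; the checksum is a placeholder, not a real hash.
EXAMPLE = """\
version = 3

[[package]]
name = "demo"
version = "0.1.0"
dependencies = [
 "libc",
]

[[package]]
name = "libc"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0000000000000000000000000000000000000000000000000000000000000000"
"""

print(render_inc(parse_cargo_lock(EXAMPLE)), end="")
```

A missing checksum is treated as an error here, which matches the
request to reject entries that are not fully pinned.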

>
>>     I prefer the translation to git, so I have debug source for
>>     vendor dependencies as well as a well travelled path to mirror
>>     and archive the source
>
>     Do you refer to the go-vendor implementation? Do you mean the
>     vendor directory? The gomod fetcher should support mirroring and
>     archiving the sources. It should be possible to create a vendor
>     folder from the gomod archives.
>
>
> Nope. I don't use that either. I have my own tools to locate the 
> source of the dependencies, clone and put them into a vendor 
> directory. The recipe simply clones and copies using git after that.
>
>>     , but something like the update task of rust is at least explicit
>>     and visible to me, so I can also use it without too many issues.
>
>     Do you mean `bitbake -c update_crates recipe-name`?
>
>
> Correct. The .inc file updating mechanisms.

Okay.

Regards
  Stefan

>>     On Tue, Jan 7, 2025 at 11:13 AM Stefan Herbrechtsmeier via
>>     lists.openembedded.org <http://lists.openembedded.org>
>>     <stefan.herbrechtsmeier-oss=weidmueller.com@lists.openembedded.org>
>>     wrote:
>>
>>         Am 07.01.2025 um 12:01 schrieb Richard Purdie:
>>>         On Tue, 2025-01-07 at 10:47 +0100, Stefan Herbrechtsmeier wrote:
>>>>         Am 06.01.2025 um 16:30 schrieb Richard Purdie:
>>>>>         On Mon, 2025-01-06 at 15:35 +0100, Stefan Herbrechtsmeier wrote:
>>>>>>>           I'm a little bit worried about how easily you could sneak a
>>>>>>>         "floating" version into this and make the fetcher non-
>>>>>>>         deterministic. Does (or could?) the code detect and error on
>>>>>>>         that?
>>>>>>>            
>>>>>>         We could raise an error if a checksum is missing in the
>>>>>>         dependency specification file or make the checksum mandatory for
>>>>>>         the dependency fetcher.  Furthermore we could inspect the
>>>>>>         dependency URLs to detect a misuse of the file like a latest
>>>>>>         string for the version.
>>>>>>           
>>>>>         I think adding such an error would be a requirement for merging
>>>>>         this.
>>>>>            
>>>>         Should the dependency fetcher (ex. npmsw) or the language specific
>>>>         fetcher (ex. npm) fail if the version points to a latest version?
>>>         I think right now it has to error to try and reduce complexity. It is
>>>         possible to support such things but you have to pass that version
>>>         information back up the stack so that PV represents the different
>>>         versions and that is a new level of complexity.
>>>
>>>         I guess we should consider how you could theoretically support it as
>>>         that might influence the design. With multiple git repos in SRC_URI for
>>>         example, we end up adding multiple shortened shas to construct a PV so
>>>         that if any change, PV changes. We also have to add an incrementing
>>>         integer so that on opkg/dpkg/rpm operations work and versions sort.
>>
>>         Okay. In this case we should add the checks to the dependency
>>         resolution. Thereby we prohibit dynamic versions for the
>>         dependencies and allow users to add support for it to the
>>         fetcher of the package manager.
>>
>>>>>>>         Put another way, could one of these SRC_URIs map to multiple
>>>>>>>         different combinations of underlying component versions?
>>>>>>         If you mean the extracted SRC_URI for a single dependency from
>>>>>>         the dependency specification file (ex. npm-shrinkwrap.json) it
>>>>>>         could use special URLs to map to the latest version. But this is
>>>>>>         a misuse of the dependency specification file and could be
>>>>>>         detected. The tools always generate files with fixed versions
>>>>>>         because a floating version with a fixed checksum makes no sense.
>>>>>         Even if it shouldn't happen, we need to detect and error for this
>>>>>         case as it would become very problematic for us.
>>>>>
>>>>         Okay. Should we disallow a dynamic version for package manager
>>>>         downloads generally or do you see a reasonable use case?
>>>         See above.
>>>
>>>>>>           I also thought it would make sense to generate recipes from the
>>>>>>         dependency specification files and therefore worked on the
>>>>>>         recipetool previously. But it looks like the tool isn't really
>>>>>>         used and I'm afraid nobody will use the recipe to fix
>>>>>>         dependencies. In most cases it is easy to update a dependency in
>>>>>>         the native tooling and only provide an updated dependency
>>>>>>         specification file.
>>>>>>           
>>>>>           
>>>>>         I think people have wanted a single simple command to translate the
>>>>>         specification file into our recipe format to update the recipe. For
>>>>>         various reasons people didn't seem to find the recipetool approach
>>>>>         was working and created the task workflow based one. There are pros
>>>>>         and cons to both and I don't have a strong preference. I would like
>>>>>         to see something which makes it clear to users what is going on
>>>>>         though and is simple to use.
>>>>>
>>>>>         People do intuitively understand a .inc file with a list of urls in
>>>>>         it. There are challenges in updating it.
>>>>>
>>>>>         This other approach is not as intuitive as everything is abstracted
>>>>>         out of sight.
>>>>>
>>>>>         One thing for example which worries me is how are the license
>>>>>         fields in the recipe going to be updated?
>>>>>
>>>>>         Currently, if we teach the class, it can set LICENSE variables
>>>>>         appropriately. With the new approach, you don't know the licenses
>>>>>         until after unpack has run. Yes it can write it into the SPDX, but
>>>>>         it won't work for something like the layer index or forms of
>>>>>         analysis which don't build things.
>>>>>
>>>>>         This does also extend to vulnerability analysis since we can't know
>>>>>         what is in a given recipe without actually unpacking it. For
>>>>>         example we could know crate XXX at version YYY has a CVE but we
>>>>>         can't tell if a recipe uses that crate until after do_unpack, or at
>>>>>         least not without expandurl.
>>>>>           
>>>>           
>>>>         The main question is if the meta data should contain all information.
>>>>         If yes, we shouldn't allow any fetcher which requires an external
>>>>         source. This should include the gitsm fetcher and we should replace
>>>>         the single SRC_URI with multiple git SRC_URIs.
>>>         If we had tooling that supported that well we could certainly consider
>>>         it. It isn't straight forward as you can have a git repo containing
>>>         submodules which then themselves contain submodules which can then
>>>         contain more levels of submodules. There are therefore multiple levels
>>>         of expansion possible.
>>
>>         Okay. That makes the git submodule special compared to the
>>         other dependency fetchers.
>>
>>>>         We can go even further and forbid specific package manager fetchers
>>>>         and use plain https or git SRC_URIs. The python and go-vendor fetcher
>>>>         use this approach.
>>>>           
>>>>           Alternatively, we allow dependency fetchers and require that the meta
>>>>         data always be used via bitbake. In this case we could extend the
>>>>         meta data via the fetcher.
>>>>           
>>>>           In both cases it is possible to produce the same meta data. It
>>>>         doesn't matter if we use recipetool, devtool, bbclasses or fetcher.
>>>>         In any case we could resolve the SRC_URIs, checksums or srcrev from a
>>>>         file. The license information could be fetched from the package
>>>>         repositories without integrity checks or could be extracted from the
>>>>         individual package description file inside the downloaded sources
>>>>         (ex. npm). We should skip the license detection from license files
>>>>         for now because it generates manual work and could be discussed
>>>>         later.
>>>         That was the reason the current task based approach doesn't use them,
>>>         yet! I mention it just to highlight that it can be solved either way,
>>>         the approach doesn't really change what we need to do. The bigger
>>>         concern is having information available in the metadata which I think
>>>         we need to do to some level regardless of which approach we choose.
>>>
>>>>         The recipe approach has the advantage that it uses fixed licenses and
>>>>         that license changes could (theoretically) be reviewed during recipe
>>>>         update.
>>>         FWIW that is an important use case and one of our general strengths. We
>>>         can only do that as the license information is written in recipes and
>>>         can be compared at update time.
>>
>>         Does this apply to the license of every individual
>>         dependency or only to the combined license?
>>
>>>>         In contrast the fetcher approach reduces the update procedure to a
>>>>         simple file rename or SRCREV update (ex. gitsm). Furthermore, the
>>>>         user could simply place a file beside the recipe to update the
>>>>         dependencies. Could we realize the same via devtool integration and a
>>>>         patch?
>>>         This is effectively what the task based approach is aiming for
>>>         currently. I think the idea was that we could have devtool/recipetool
>>>         integration around that update task, a task was just a convenient way
>>>         to capture the code to do it and get things working without needing the
>>>         tool to be finished.
>>         What is the task based approach? `bitbake -c update xyz`?
>>
>>>>           We have different solutions between the languages (ex. npmsw vs
>>>>         crate vs pypi) and even inside the languages (ex. go-vendor vs
>>>>         gomod). I would like to unify the dependency support. It doesn't
>>>>         matter if we decide to use the bitbake fetcher or a bitbake / devtool
>>>>         command for the dependency and license resolution.
>>>         I do very much prefer having one good way of doing things rather than
>>>         multiple ways of doing things, each with a potential drawback. I'm
>>>         therefore broadly in favour of doing that as long as we don't upset too
>>>         much existing mindshare along the way.
>>
>>         Okay
>>
>>>>>>           
>>>>>>           I have a WIP to integrate the dependencies into the SPDX. This
>>>>>>         uses the expanded_urldata / implicit_urldata function to add the
>>>>>>         dependencies to the process list of archiver and spdx.
>>>>>>           
>>>>>>         https://github.com/weidmueller/poky/tree/feature/dependency-fetcher
>>>>>>
>>>>>>         Regarding the license we could migrate the functionality from
>>>>>>         recipetool into a class and detect the licenses at build time.
>>>>>>         Theoretically the fetcher could fetch the license from the
>>>>>>         package manager repository but we have to trust the repository
>>>>>>         because we have no checksum to detect changes. Maybe we could
>>>>>>         integrate tools like Syft or ScanCode to detect the licenses at
>>>>>>         build time. At the moment the best solution is to make sure that
>>>>>>         the SBOM contains the name and version of the dependencies and let
>>>>>>         other tools handle the license via SBOM for now. Therefore I
>>>>>>         propose a common scheme to define the dependency name (dn) and
>>>>>>         version (dv) in the SRC_URI.
>>>>>>           
>>>>>           
>>>>>         We could compare what licenses the package manager is showing us
>>>>>         with what is in the recipe and error if different. There would then
>>>>>         need to be a command to update the licenses in the recipe (in much
>>>>>         the way urls currently get updated).
>>>>>           
>>>>           
>>>>         Either we request the licenses from the package manager during
>>>>         package update or during fetch. I wouldn't do both. Instead I would
>>>>         analyze the license file during build and compare the detected
>>>>         license with the recipe or fetcher generated licenses. But the
>>>>         license detection from files is another topic and I would like to
>>>>         postpone it for now.
>>>         Agreed, I mention it just to highlight that supporting them does have
>>>         impact on the design so any solution needs to ultimately be able to
>>>         support it.
>>>
>>>>>>>         You're using DL_DIR for that which I
>>>>>>>         suspect isn't a great idea for tmp files.
>>>>>>           Taken over from gitsm.
>>>>>         Probably not the best fetcher and I'd say gitsm should be fixed.
>>>>         I don't see a reason why the gitsm fetcher shouldn't be handled like
>>>>         the other dependency fetchers. We could update the handler after we
>>>>         have a decision for the dependency fetchers.
>>>         In principle perhaps but as mentioned above, gitsm has its own challenges.
>>
>>         Based on your feedback I have the feeling that a dependency
>>         fetcher isn't the correct solution. The fetcher makes it
>>         impossible to review changes during recipe
>>         update. Additionally it needs caching for the resolved fetch
>>         and license data.
>>
>>         The alternative is to create an inc file with SRC_URIs,
>>         checksums, SRCREVs and LICENSE. Any recommendation how to
>>         integrate the dependency resolution and inc creation into
>>         oe-core?
>>
>>>>>>>         The url scheme is clever but also has a potential risk in that you
>>>>>>>         can't really pass parameters to both the top level fetcher and the
>>>>>>>         underlying one. I'm worried that is going to bite us further down
>>>>>>>         the line.
>>>>>>         At the moment I don't see a real problem but maybe you are right. The
>>>>>>         existing language-specific fetchers use fixed paths for their
>>>>>>         downloads.
>>>>>>           
>>>>>>           What do you propose? Should the fetcher skip the unpack of the
>>>>>>         source or should we introduce a sub fetcher which uses the download
>>>>>>         from another SRC_URI entry? The two entries could be linked via the
>>>>>>         name parameter. This approach could be combined with your suggestion
>>>>>>         above. The new fetcher will unpack a lock file from another
>>>>>>         (default) download.
>>>>>           
>>>>>         I'm not really sure what is best right now. I'm trying to spell out the
>>>>>         pros/cons of what is going on here in the hope it encourages others to
>>>>>         give feedback as well. I agree there isn't a problem right now but I
>>>>>         worry there soon will be by mixing two things together like this. The
>>>>>         way we handle git protocol does cause us friction with other url
>>>>>         schemes already.
>>>>         The dependency fetcher could simply skip the unpack. In this case the
>>>>         user needs to use a variable to pass the same URL to the git and
>>>>         dependency fetcher or we could provide a python function to generate
>>>>         two SRC_URI entries with the same base URL.
>>>>
>>>         I'm starting to wonder about a slightly different approach, basically
>>>         an optional generated file alongside a recipe which contains "expanded"
>>>         information which is effectively expensive to generate (in computation
>>>         or resource like network access/process terms). We could teach bitbake
>>>         a new phase of parsing where it generated them if missing. There are
>>>         some other pieces of information which we know during the build process
>>>         which it would be helpful to know earlier (e.g. which packages a recipe
>>>         generates). I've wondered about this for a long time and the fetcher
>>>         issues remind me of it again. It would be a big change with advantages
>>>         and drawbacks. I think it would put more pressure on a layer maintainer
>>>         as they'd have to computationally keep this up to date and it would
>>>         complicate the patch workflow (who should send/regen the files?). I'm
>>>         putting the idea there, I'm not saying I think we should do it, I'm
>>>         just considering options.
>>
>>         Do you mean like a cache or like the inc files? Is the file
>>         totally auto generated or is manual editing acceptable?
>>
>>>>           = Open questions
>>>>>>>>         * Where should we download dependencies?
>>>>>>>>         ** Should we use a folder per fetcher (ex. git and npm)?
>>>>>>>>         ** Should we use the main folder (ex. crate)?
>>>>>>>>         ** Should we translate the name into folder (ex. gomod)?
>>>>>>>>         ** Should we integrate the name into the filename (ex. git)?
>>>>>>>>           
>>>>>>>>           
>>>>>>>           
>>>>>>>           
>>>>>>>         DL_DIR is meant to be a complete cache of the source so it would
>>>>>>>         need to be downloaded there. Given it maps to the other fetchers, the
>>>>>>>         existing cache mechanisms likely work for these just fine, the open
>>>>>>>         question is on whether the lock/spec files should be cached after
>>>>>>>         extraction.
>>>>>>           
>>>>>>         You misunderstand the question. It is about the downloadfilename
>>>>>>         parameter. At the moment some fetchers use a sub folder inside DL_DIR
>>>>>>         and others use the main folder. It looks like every fetcher has its
>>>>>>         own concept to handle file collisions between different fetchers. The
>>>>>>         git and npm fetchers use their own folders, the crate fetcher uses its
>>>>>>         own .crate file prefix, the gomod fetcher translates the URL into
>>>>>>         multiple folders and the git fetcher translates the URL into a single
>>>>>>         folder name.
>>>>>         That makes more sense. The layout is partially legacy. The wget and
>>>>>         local fetchers were first and hence go directly into DL_DIR. git/svn
>>>>>         were separated out into their own directories with a plan to have a
>>>>>         directory per fetcher. That didn't always work out with each newer
>>>>>         fetcher. Each fetcher does have to handle a unique naming of its urls
>>>>>         as only the specific fetcher can know all the url parameters and which
>>>>>         ones affect the output vs which ones don't.
>>>>>           
>>>>           This doesn't explain why the npm fetcher uses a sub folder but the
>>>>         gomod and crate fetchers do not. All fetchers are based on the wget
>>>>         fetcher.
>>>         That is probably "my fault". Put yourself in my position. You get a ton
>>>         of different patches, all touching very varied aspects of the system.
>>>         When reviewing them you have to try and remember the original design
>>>         decisions, the future directions, the ways things broke in the past, a
>>>         desire to try and have clean consistent APIs and so on. I have tried
>>>         very hard to move things in a direction where things incrementally
>>>         improve, without unnecessarily blocking new features. It means that
>>>         things that merge often aren't perfect. We've tried a few different
>>>         approaches with the newer programming languages and each approach has
>>>         had pros and cons. The inconsistency is probably as I missed something
>>>         in review. Sorry :(.
>>
>>         Sorry, I don't want to criticize you. I see that you have a
>>         lot of work. I want to understand the reasons for the current
>>         design and how it should look.
>>
>>>         I only have finite time. There are few people who seem to want to dive
>>>         in and help with review of patches like these. I did ask some people
>>>         yesterday, one told me they simply couldn't understand these patches.
>>
>>         What can I do to improve the review?
>>
>>>         I'm doing my best to ask the right questions, try and help others
>>>         understand them, ensure my own concerns I can identify are resolved and
>>>         I don't want to de-motivate you on this work either, I think the idea
>>>         of improving this is great and I'd love to see it. Equally, I'm also
>>>         the first person everyone will complain to if we change something and
>>>         it causes problems for people.
>>>
>>>         So the explanation is probably I just missed something in review at
>>>         some point. The intent was to separate out the fetcher output going
>>>         forward (unless it makes sense to be shared).
>>>
>>>         FWIW there are multiple things which bother me about the existing
>>>         fetcher storage layout but that is a different discussion.
>>
>>         Okay.
>>
>>>>>>>>         * Where should we unpack the dependencies?
>>>>>>>>         ** Should we use a folder inside the parent folder (ex.
>>>>>>>>         node_modules)?
>>>>>>>>         ** Should we use a fixed folder inside unpackdir
>>>>>>>>             (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
>>>>>>>           
>>>>>>>         This likely depends on the fetcher as the different mechanisms will
>>>>>>>         have different expectations about how they should be extracted (as
>>>>>>>         npm/etc. would).
>>>>>>           
>>>>>>         It depends on the fetcher but the fetchers could use the same
>>>>>>         approach. At the moment every fetcher uses a different approach. The
>>>>>>         crate fetcher uses a fixed value. The gomod fetcher uses a variable
>>>>>>         (GO_MOD_CACHE_DIR) and the npm fetcher uses a parameter (destsuffix).
>>>>>>         Furthermore the gomod fetcher overrides the common subdir parameter.
>>>>>         I think we really need to standardise that if we can. Each new fetcher
>>>>>         has claimed a certain approach is effectively required by the package
>>>>>         manager.
>>>>>           What would be your desired solution? Is the variable okay or do
>>>>>         you prefer a self-contained SRC_URI?
>>>         I suspect we need a default via a variable and then the option to
>>>         change the default via parameters. The default value should be a
>>>         bitbake fetcher namespaced control variable.
>>>
>>>         I'm wary of making a definitive statement saying X if that isn't going
>>>         to make sense for some backend though. I simply don't have enough
>>>         knowledge of them all, which is why you see me being reluctant to make
>>>         definitive statements about design.
>>
>>         Okay.
>>
>>>>>>>>         * How should we treat archives for package manager caches?
>>>>>>>>         ** Should we unpack the archives to support patching (ex. npm)?
>>>>>>>>         ** Should we copy the packed archive to avoid unpacking and
>>>>>>>>             packaging (ex. gomod)?
>>>>>>>>           
>>>>>>>         If there are archives left after do_unpack, which task is going
>>>>>>>         to unpack those? Are we expecting the build process in
>>>>>>>         configure/compile to decompress them? Would those management
>>>>>>>         tools accept things if they were extracted earlier? "unpack"
>>>>>>>         would be the correct time to do it but I can see this getting
>>>>>>>         into conflict with the package manager :/.
>>>>>>           
>>>>>>         Most package managers expect archives. In the npm case the archive is
>>>>>>         unpacked by the fetcher and packed by the npm.bbclass to support
>>>>>>         patching. The gomod fetcher doesn't unpack the downloaded archive and
>>>>>>         the gomodgit fetcher creates archives from git folders during unpack.
>>>>>>         It would be possible to always keep the archives or always extract
>>>>>>         the archives and recreate archives during build. It is a decision
>>>>>>         between performance and patchability.
>>>>>>           
>>>>>>           At the moment it is complicated to work with the different fetchers
>>>>>>         because every fetcher uses a different concept and it is unclear what
>>>>>>         is the desired approach.
>>>>>           
>>>>>         This is a challenge. Can we handle the unpacking with the package
>>>>>         manager as a specific step or does it have to be combined with other
>>>>>         steps like configure/compile?
>>>>>            
>>>>         It looks like this is possible:
>>>>           cargo fetch
>>>>           go mod vendor
>>>>           npm install
>>>>           
>>>>           I suspect you're thinking about using the package manager in
>>>>         do_unpack to unpack the archives and patch the unpacked archives
>>>>         afterwards?
>>>         I'm wondering about it, yes. I know we've had challenges with patching
>>>         rust modules for example so this isn't a theoretical problem :/.
>>
>>         It is an interesting idea because most package managers check
>>         the integrity before unpacking. Additionally it should simplify
>>         and speed up the npm build because it removes the repack of
>>         the packages. The problem is that we need an additional task
>>         to patch the dependency specification file and to unpack the
>>         file.
>>
>>>>>>>         I did wonder if patches 1-5 of this series could be merged
>>>>>>>         separately too as they look reasonable regardless of the rest
>>>>>>>         of the series?
>>>>>>           
>>>>>>         Sure. Should I resend the patches as separate series?
>>>>>         Yes please, that would then let us remove the bits we can easily
>>>>>         review/sort and focus on this other part.
>>>>>            
>>>>         Done.
>>>         Thanks.
>>>
>>>>         I will also resend the go h1 checksum commit separately because it
>>>>         could be useful for the gomod fetcher.
>>>         Yes, I was waiting for a new version of that one with the naming tweaked.
>>
>>         Done.
>>
>>>>         Should I also move the dn / dv parameter patches to a separate series
>>>>         because they could be useful without the dependency fetcher? I could
>>>>         add the parameters to the fetchers in a backward compatible way.
>>>         I need to think more about that one...
>>
>>         The motivation is to include the dependencies with name,
>>         version, license and cpe into the SBOM.
>>
>>         Regards
>>           Stefan
>>
>>     -- 
>>     - Thou shalt not follow the NULL pointer, for chaos and madness
>>     await thee at its end
>>     - "Use the force Harry" - Gandalf, Star Trek II
>>

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
                   ` (22 preceding siblings ...)
  2025-01-06 11:04 ` Richard Purdie
@ 2025-01-09 11:53 ` Martin Jansa
  2025-01-09 14:26   ` Stefan Herbrechtsmeier
       [not found] ` <1812DEFF37B8C65E.26783@lists.openembedded.org>
  24 siblings, 1 reply; 66+ messages in thread
From: Martin Jansa @ 2025-01-09 11:53 UTC (permalink / raw)
  To: stefan.herbrechtsmeier-oss; +Cc: bitbake-devel, Stefan Herbrechtsmeier

Hi,

thanks for looking into this.

With this series applied I've noticed some recipes now showing warnings like:
WARNING: enact-dev-native-6.1.3-r0 do_unpack: Please add support for
the url to npm fetcher:
https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz
WARNING: enact-dev-native-6.1.3-r0 do_unpack: Please add support for
the url to npm fetcher:
https://registry.npmjs.org/strip-ansi/-/strip-ansi-6.0.1.tgz
WARNING: enact-dev-native-6.1.3-r0 do_unpack: Please add support for
the url to npm fetcher:
https://registry.npmjs.org/wrap-ansi/-/wrap-ansi-7.0.0.tgz

and the same warnings are shown earlier by do_fetch.

Not sure what's special about these, but I believe it used to work
with the previous npmsw implementation. Any hint on what to check?


* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-09 10:40       ` Alexander Kanavin
@ 2025-01-09 14:00         ` Stefan Herbrechtsmeier
  2025-01-09 19:40           ` Alexander Kanavin
  0 siblings, 1 reply; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-09 14:00 UTC (permalink / raw)
  To: Alexander Kanavin; +Cc: richard.purdie, bitbake-devel, Stefan Herbrechtsmeier

Am 09.01.2025 um 11:40 schrieb Alexander Kanavin:
> On Mon, 6 Jan 2025 at 15:43, Stefan Herbrechtsmeier
> <stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
>> https://github.com/yoctoproject/poky/compare/master...weidmueller:poky:feature/dependency-fetcher
>>
>> I have migrated the crate recipes to the new fetcher and improved the spdx
>> 2.2 class to include the name and version of the crate dependencies.
>>
>> You have to inherit the create-spdx-2.2 class and build the librsvg
>> recipe to test the new fetcher.
> Thanks, I checked out the branch and ran bitbake -c patch librsvg with
> the default build/conf/ config. It works and the recipe is short and
> neat.
Thanks for your test.

> I'm not sure what create-spdx-2.2 is needed for? I didn't use
> it, and there were no errors.

The change is needed to add the dependencies and their names and 
versions to the SBOM.

> Like others, I'm torn on two things:
> - visibility
> - control
>
> When a recipe explicitly lists what goes into a build, this can be
> easily seen, audited, and adjusted directly in the recipe. With the
> new fetchers, you need to actually run a build to produce that list,
> and it isn't clear where the list is placed, in which format, and what
> to do if something needs to deviate from versions prescribed by
> upstream.

The appropriate function is missing from the dependency mixin in this
series. The list is created on demand (see archiver or spdx patch).
Every deviation needs to be handled in a package manager lock file.
Therefore you could place a lock file beside the recipe. You could use
an editor or the language-specific tools to manipulate the lock file.
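
As a rough illustration of how such a list could be read from a lock file
without running a build, here is a minimal sketch that parses go.sum text.
It is not code from the series. The function name parse_go_sum and the
sample entries are made up, and the checksums are placeholders. A line whose
version ends in "/go.mod" only covers the go.mod file, while a line without
that suffix covers the module source archive.

```python
def parse_go_sum(text):
    """Map (module, version) to the checksums found in go.sum text.

    Each value is a dict with an optional "zip" key (hash of the source
    archive) and an optional "mod" key (hash of the go.mod file only).
    """
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        module, version, checksum = fields
        kind = "zip"
        if version.endswith("/go.mod"):
            version = version[: -len("/go.mod")]
            kind = "mod"
        entries.setdefault((module, version), {})[kind] = checksum
    return entries


# Sample data with placeholder checksums (not real hashes).
SAMPLE = """\
golang.org/x/net v0.9.0 h1:PLACEHOLDER-ZIP-HASH=
golang.org/x/net v0.9.0/go.mod h1:PLACEHOLDER-MOD-HASH=
golang.org/x/text v0.9.0/go.mod h1:PLACEHOLDER-MOD-HASH=
"""

for (module, version), sums in sorted(parse_go_sum(SAMPLE).items()):
    note = "source archive" if "zip" in sums else "go.mod only"
    print(f"{module} {version} ({note})")
```

A reviewer could run something like this against the lock file placed
beside the recipe to see which modules and versions a build would pull in.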

> This is not a theoretical concern, I'm thinking specifically of
> log4j-like vulnerabilities, and how one would check that their product
> doesn't contain them:
> https://lwn.net/Articles/878570/

Do you have any tools to check it at the moment?

I proposed a common style for the package name and version parameter of 
a package manager fetch URI (ex. crate). The information can be included 
in the SBOM and used outside of bitbake. As a follow-up we could use the
information to create a CPE and add the dependencies to the CVE check.
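
To make the idea concrete, here is a minimal sketch of how a CPE could be
derived from the dn / dv parameters. The helper names split_params and
cpe_from_params are hypothetical, the URI is only an illustration of where
the parameters might appear, and the wildcard vendor is an assumption. Only
the parameter names dn and dv come from the proposal.

```python
def split_params(uri):
    """Split a bitbake-style URI into its base and a dict of ;key=value parameters."""
    base, *raw = uri.split(";")
    params = dict(item.split("=", 1) for item in raw if "=" in item)
    return base, params


def cpe_from_params(params):
    """Build a CPE 2.3 string with a wildcard vendor, or None if dn/dv are missing."""
    name, version = params.get("dn"), params.get("dv")
    if not name or not version:
        return None
    # part=a (application), vendor=* (unknown), then product and version;
    # the remaining seven attributes are left as wildcards.
    return f"cpe:2.3:a:*:{name}:{version}:*:*:*:*:*:*:*"


# Hypothetical URI: the placement of dn/dv here is an assumption.
uri = "https://crates.io/api/v1/crates/glob/0.3.1/download;dn=glob;dv=0.3.1"
base, params = split_params(uri)
print(cpe_from_params(params))
```

A wildcard vendor will over-match for common product names, so a real
integration would probably need a way to set the vendor per dependency.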



* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-09 10:50         ` Alexander Kanavin
@ 2025-01-09 14:18           ` Stefan Herbrechtsmeier
  0 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-09 14:18 UTC (permalink / raw)
  To: Alexander Kanavin; +Cc: richard.purdie, bitbake-devel, Stefan Herbrechtsmeier

Am 09.01.2025 um 11:50 schrieb Alexander Kanavin:

> On Thu, 9 Jan 2025 at 11:40, Alexander Kanavin via
> lists.openembedded.org<alex.kanavin=gmail.com@lists.openembedded.org>
> wrote:
>> This is not a theoretical concern, I'm thinking specifically of
>> log4j-like vulnerabilities, and how one would check that their product
>> doesn't contain them:
>> https://lwn.net/Articles/878570/
> I meant to say 'yocto layer' here, not product. And ideally it should
> be possible with 'static analysis', e.g. just by looking at the layer
> content.

What is the motivation for that requirement ("just by looking at the
layer content")? Why can't we use an SBOM for the vulnerability check? To
be really safe you have to scan the code because of embedded packages
(vendoring) or git submodules.
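
For illustration, a check against an SBOM could look roughly like the
following sketch. It assumes an SPDX 2.2 JSON document with a top-level
"packages" list whose entries carry "name" and "versionInfo". The function
name find_package_versions and the sample document are made up, and the
librsvg version in it is invented.

```python
import json


def find_package_versions(sbom, name):
    """Return the sorted versions of all SPDX packages with the given name."""
    return sorted(
        pkg.get("versionInfo", "unknown")
        for pkg in sbom.get("packages", [])
        if pkg.get("name") == name
    )


# Made-up sample document; a real SBOM would be loaded with json.load().
SAMPLE = json.loads("""
{
  "spdxVersion": "SPDX-2.2",
  "packages": [
    {"SPDXID": "SPDXRef-Package-librsvg", "name": "librsvg", "versionInfo": "1.0.0"},
    {"SPDXID": "SPDXRef-Package-glob", "name": "glob", "versionInfo": "0.3.1"}
  ]
}
""")

print(find_package_versions(SAMPLE, "glob"))
print(find_package_versions(SAMPLE, "log4j"))
```

This only finds dependencies that were recorded with a name and version,
which is why the dependencies need to end up in the SBOM in the first place.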

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-09 11:53 ` Martin Jansa
@ 2025-01-09 14:26   ` Stefan Herbrechtsmeier
  0 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-09 14:26 UTC (permalink / raw)
  To: Martin Jansa; +Cc: bitbake-devel, Stefan Herbrechtsmeier

Am 09.01.2025 um 12:53 schrieb Martin Jansa:
> Hi,
>
> thanks for looking into this.
>
> With this series applied I've noticed some recipes now showing warnings like:
> WARNING: enact-dev-native-6.1.3-r0 do_unpack: Please add support for
> the url to npm fetcher:
> https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz
> WARNING: enact-dev-native-6.1.3-r0 do_unpack: Please add support for
> the url to npm fetcher:
> https://registry.npmjs.org/strip-ansi/-/strip-ansi-6.0.1.tgz
> WARNING: enact-dev-native-6.1.3-r0 do_unpack: Please add support for
> the url to npm fetcher:
> https://registry.npmjs.org/wrap-ansi/-/wrap-ansi-7.0.0.tgz

Thanks for your test.

I assume the packages are renamed. I have a fix for that in my WIP branch.
https://github.com/weidmueller/poky/commit/ae988d20777d7a542fe18fcbf95110829eef0b4f

> and the same is shown before from do_fetch.
>
> Not sure what's special about these, but I believe it used to work
> with previous npmsw implementation. Any hint what to check?
Could you please check whether the entry in the package-lock.json
contains a "name" field? It looks like this is an undocumented feature of
the package-lock.json.
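
One way to run that check is sketched below. It assumes the lock file uses the "packages" layout, where each key is an install path such as node_modules/<name>. The helper name find_renamed_packages and the example data (including the install path string-width-cjs) are made up for illustration and are not part of the patch series.

```python
import json


def find_renamed_packages(lockfile):
    """Return (path, name, version) for entries whose "name" field differs
    from the package name implied by the install path."""
    renamed = []
    for path, entry in lockfile.get("packages", {}).items():
        if not path:
            continue  # the empty key describes the root project itself
        name = entry.get("name")
        if name is None:
            continue  # no explicit name: the path already names the package
        implied = path.rpartition("node_modules/")[2]
        if name != implied:
            renamed.append((path, name, entry.get("version")))
    return renamed


# Hypothetical lock file content, only to show the shape of the data.
example = json.loads("""
{
  "packages": {
    "": {"name": "demo"},
    "node_modules/glob": {"version": "10.3.15"},
    "node_modules/string-width-cjs": {"name": "string-width", "version": "4.2.3"}
  }
}
""")

renamed = find_renamed_packages(example)
```

For a real recipe, load the lock file with json.load() and pass the result to the helper; every tuple it returns is an entry where the install path and the package name differ.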

> On Fri, Dec 20, 2024 at 12:26 PM Stefan Herbrechtsmeier via
> lists.openembedded.org
> <stefan.herbrechtsmeier-oss=weidmueller.com@lists.openembedded.org>
> wrote:
>> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
>>
>> The patch series improves the fetcher support for tightly coupled
>> package manager (npm, go and cargo). It adds support for embedded
>> dependency fetcher via a common dependency mixin. The patch series
>> reworks the npm-shrinkwrap.json (package-lock.json) support and adds a
>> fetcher for go.sum and cargo.lock files. The dependency mixin contains
>> two stages. The first stage locates a local specification file or
>> fetches an archive or git repository with a specification file. The
>> second stage resolves the dependency URLs from the specification file
>> and fetches the dependencies.
>>
>> SRC_URI = "<type>://npm-shrinkwrap.json"
>> SRC_URI = "<type>+http://example.com/npm-shrinkwrap.json"
>> SRC_URI = "<type>+http://example.com/${BP}.tar.gz;striplevel=1;subdir=${BP}"
>> SRC_URI = "<type>+git://example.com/${BPN}.git;protocol=https"
>>
>> Additionally, the patch series reworks the npm fetcher to work without a
>> npm binary and external package repository. It adds support for a common
>> dependency name and version schema to integrate the dependencies into
>> the SBOM.
>>
>> = Background
>> Bitbake has diverse concepts and drawbacks for different tightly coupled
>> package manager. The Python support uses a recipe per dependency and
>> generates common fetcher URLs via a python function. The other languages
>> embed the dependencies inside the recipe. The Node.js support offers a
>> npmsw fetcher which uses a lock file beside the recipe to generate
>> multiple common fetcher URLs on the fly and thereby hides the real
>> download sources. This leads to a single source in the SBOM for example.
>> The Go support contains two parallel implementations. A vendor-based
>> solution with a common fetcher and a go-mod-based solution with a gomod
>> fetcher. The vendor-based solution includes the individual dependencies
>> into the SRC_URI of the recipe and uses a python function to generate
>> common fetcher URLs with additional information for the vendor task. The
>> gomod fetcher uses a proprietary gomod URL. It translates the URL into a
>> common URL and prepares meta data during unpack. The Rust support
>> includes the individual dependencies in the SRC_URI of the recipe and
>> uses proprietary crate URLs. The crate fetcher translates a proprietary
>> URL into a common fetcher URL and prepares meta data during unpack. The
>> recipetool does not support the crate and the gomod fetcher. This leads
>> to missing licenses of the dependencies in the recipe for example
>> librsvg.
>>
>> The steps needed to fetch dependencies for Node.js, Go and Rust are
>> similar:
>> 1. Extract the dependencies from a specification file (name, version,
>>     checksum and URL)
>> 2. Generate proprietary fetcher URIs
>>    a. npm://registry.npmjs.org/;package=glob;version=10.3.15
>>    b. gomod://golang.org/x/net;version=v0.9.0
>>       gomodgit://golang.org/x/net;version=v0.9.0;repo=go.googlesource.com/net
>>    c. crate://crates.io/glob/0.3.1
>> 3. Generate wget or git fetcher URIs
>>    a. https://registry.npmjs.org/glob/-/glob-10.3.15.tgz;downloadfilename=…
>>    b. https://proxy.golang.org/golang.org/x/net/@v/v0.9.0.zip;downloadfilename=…
>>       git://go.googlesource.com/net;protocol=https;subdir=…
>>    c. https://crates.io/api/v1/crates/glob/0.3.1/download;downloadfilename=…
>> 4. Unpack
>> 5. Create meta files
>>    a. Update lockfile and create tar.gz archives
>>    b. Create go.mod file
>>       Create info, go.mod file and zip archives
>>    c. Create .cargo-checksum.json files
>>
>> It looks like the recipetool is not widely used and therefore this patch
>> series integrates the dependency resolving into the fetcher. After an
>> agreement on a concept the fetcher could be extended. The fetcher could
>> download the license information per package and a new build task could
>> run the license cruncher from the recipetool.
>>
>> = Open questions
>>
>> * Where should we download dependencies?
>> ** Should we use a folder per fetcher (ex. git and npm)?
>> ** Should we use the main folder (ex. crate)?
>> ** Should we translate the name into folder (ex. gomod)?
>> ** Should we integrate the name into the filename (ex. git)?
>> * Where should we unpack the dependencies?
>> ** Should we use a folder inside the parent folder (ex. node_modules)?
>> ** Should we use a fixed folder inside unpackdir
>>     (ex. go/pkg/mod/cache/download and cargo_home/bitbake)?
>> * How should we treat archives for package manager caches?
>> ** Should we unpack the archives to support patching (ex. npm)?
>> ** Should we copy the packed archive to avoid unpacking and packaging
>>     (ex. gomod)?
>>
>> This patch series depends on patch series
>> 20241209103158.20833-1-stefan.herbrechtsmeier-oss@weidmueller.com
>> ("[1/4] tests: fetch: adapt npmsw tests to fixed unpack behavior").
>>
>>
>> Stefan Herbrechtsmeier (21):
>>    tests: fetch: update npmsw tests to new lockfile format
>>    fetch2: npmsw: remove old lockfile format support
>>    tests: fetch: replace [url] with urls for npm
>>    fetch2: do not prefix embedded checksums
>>    fetch2: read checksum from SRC_URI flag for npm
>>    fetch2: introduce common package manager metadata
>>    fetch2: add unpack support for npm archives
>>    utils: add Go mod h1 checksum support
>>    fetch2: add destdir to FetchData
>>    fetch: npm: rework
>>    tests: fetch: adapt style in npm(sw) class
>>    tests: fetch: move npmsw test cases into npmsw test class
>>    tests: fetch: adapt npm test cases
>>    fetch: add dependency mixin
>>    tests: fetch: add test cases for dependency fetcher
>>    fetch: npmsw: migrate to dependency mixin
>>    tests: fetch: adapt npmsw test cases
>>    fetch: add gosum fetcher
>>    tests: fetch: add test cases for gosum
>>    fetch: add cargolock fetcher
>>    tests: fetch: add test cases for cargolock
>>
>>   lib/bb/fetch2/__init__.py   |  35 +-
>>   lib/bb/fetch2/cargolock.py  |  73 +++
>>   lib/bb/fetch2/dependency.py | 167 +++++++
>>   lib/bb/fetch2/gomod.py      |   5 +-
>>   lib/bb/fetch2/gosum.py      |  51 +++
>>   lib/bb/fetch2/npm.py        | 244 +++-------
>>   lib/bb/fetch2/npmsw.py      | 347 ++++----------
>>   lib/bb/tests/fetch.py       | 880 +++++++++++++++++-------------------
>>   lib/bb/utils.py             |  25 +
>>   9 files changed, 916 insertions(+), 911 deletions(-)
>>   create mode 100644 lib/bb/fetch2/cargolock.py
>>   create mode 100644 lib/bb/fetch2/dependency.py
>>   create mode 100644 lib/bb/fetch2/gosum.py
>>
>> --
>> 2.39.5


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-09 14:00         ` Stefan Herbrechtsmeier
@ 2025-01-09 19:40           ` Alexander Kanavin
  2025-01-10 11:32             ` Stefan Herbrechtsmeier
  0 siblings, 1 reply; 66+ messages in thread
From: Alexander Kanavin @ 2025-01-09 19:40 UTC (permalink / raw)
  To: Stefan Herbrechtsmeier
  Cc: richard.purdie, bitbake-devel, Stefan Herbrechtsmeier

On Thu, 9 Jan 2025 at 15:00, Stefan Herbrechtsmeier
<stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
> I missed the appropriate function in the dependency mixin in this
> series. The list is created on demand (see archiver or spdx patch).
> Every derivation need to be handled in a package manager lock file.
> Therefore you could place a lock file beside the recipe. You could use
> an editor or the language specific tools to manipulate the lock file.

I'm not sure, is there code I can try that does this (and if so,
how?), or is this code still to be written?

> What is the motivation for that requirement ("just by looking at the layer content")?
> Why can't we use a SBOM for the vulnerability check? To be really safe you have
> to scan the code because of embedded packages (vendoring) or git submodules.

That's right. I don't disagree with this.

I do however have another concern I want to express: I can't convince
myself that the 'integrated fetcher' is an overall significant,
obvious, major improvement over the 'generate the SRC_URI lists in
.inc files via task in a bbclass' approach.

Just a couple reasons:

- the 'integrated fetcher' is not trivial, and notably increases the
complexity of bitbake fetcher codebase. We already struggle to
maintain bitbake, RP is overloaded, and very few other people have
time and knowledge to look at bitbake patches, and understand what is
going on. You've already seen this with your patchset where getting it
properly reviewed by anyone else than RP is an ongoing challenge. On
the other hand, the .inc updaters are fully contained in oe-core
classes, they implement a task in well-understood 'recipe python'
dialect and thus benefit from a lot more people being able to take
care of them. They're also safer in the sense that any bugs in them
are only triggered when someone needs to update a recipe. Fetchers, on
the other hand, are fairly critical pieces of code and they must work
regardless of host environment, python versions, unforeseen corner
cases in source trees and so on.

- we might be able to remove those long SRC_URI lists by migrating
recipes to the integrated fetcher, but we won't be able to do this
with the licensing information (pointers+checksums to licenses,
license strings) for items that are being fetched. For that, you still
need some way to write it into a recipe with a tool. We don't do this
yet, but we really should.

Alex


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 06/21] fetch2: introduce common package manager metadata
       [not found] ` <1812DEFF37B8C65E.26783@lists.openembedded.org>
@ 2025-01-10  7:12   ` Stefan Herbrechtsmeier
  0 siblings, 0 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-10  7:12 UTC (permalink / raw)
  To: bitbake-devel, Richard Purdie; +Cc: Stefan Herbrechtsmeier

On 20.12.2024 at 12:25, Stefan Herbrechtsmeier via
lists.openembedded.org wrote:
> From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
>
> Downloads from package manager repositories are identified via registry,
> name, and version. The fetchers use individual styles to define the
> download metadata:
>
> npm://<REGISTRY>;package=<NAME>;version=<VERSION>
>
> crate://<REGISTRY>/<NAME>/<VERSION>
>
> GO_MOD_PROXY = "<REGISTRY>"
> gomod://<NAME>;version=<VERSION>
> gomodgit://<NAME>;version=<VERSION>;repo=<REPOSITORY>
>
> The name and version are important for the SBOM to add usable name,
> version, and CPE to the SBOM entries for the downloaded dependencies.
> Introduce a common style and check the existence of the parameters:
>
> <TYPE>://<REGISTRY | REPOSITORY>;dn=<NAME>;dv=<VERSION>
>
> The style clearly separates the metadata and supports slashes and @
> in the name.
>
> Signed-off-by: Stefan Herbrechtsmeier <stefan.herbrechtsmeier@weidmueller.com>
> ---
>
>   lib/bb/fetch2/__init__.py | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/lib/bb/fetch2/__init__.py b/lib/bb/fetch2/__init__.py
> index d2a30c18f..4b7c01d6a 100644
> --- a/lib/bb/fetch2/__init__.py
> +++ b/lib/bb/fetch2/__init__.py
> @@ -1356,6 +1356,12 @@ class FetchData(object):
>           if hasattr(self.method, "urldata_init"):
>               self.method.urldata_init(self, d)
>   
> +        if self.method.require_download_metadata():
> +            if "dn" not in self.parm:
> +                raise MissingParameterError("dn", self.url)
> +            if "dv" not in self.parm:
> +                raise MissingParameterError("dv", self.url)
> +

As an alternative to the short names (dn, dv), we could add an optional
version to the resolution of the checksum and source revision, and remove
the version value from the name parameter:

configure_checksum:
     if all(key in self.parm for key in ["name", "version"]):
         checksum_name = "%s@%s.%ssum" % (self.parm["name"],
                                          self.parm["version"], checksum_id)

srcrev_internal_helper:
     if name and version:
         attempts.append("SRCREV_%s@%s" % (name, version))
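
The two fragments above can be written as standalone functions to show which variable names would be looked up. This is only a sketch: the function names are made up, and the fallbacks without a version (name only, or neither) are assumptions about how the existing lookup would stay in place, not code taken from the series.

```python
def checksum_flag_name(parm, checksum_id):
    """Build the SRC_URI flag name that holds a checksum for one URL."""
    if all(key in parm for key in ("name", "version")):
        # Proposed form: name and version are separate URL parameters.
        return "%s@%s.%ssum" % (parm["name"], parm["version"], checksum_id)
    if "name" in parm:
        # Assumed fallback: name without a version.
        return "%s.%ssum" % (parm["name"], checksum_id)
    return "%ssum" % checksum_id


def srcrev_candidates(name=None, version=None):
    """List the SRCREV variable names to try, most specific first."""
    attempts = []
    if name and version:
        attempts.append("SRCREV_%s@%s" % (name, version))
    if name:
        attempts.append("SRCREV_%s" % name)  # assumed fallback
    attempts.append("SRCREV")  # assumed fallback
    return attempts
```

With name=glob and version=0.3.1 the checksum would be read from the flag glob@0.3.1.sha256sum, so the version no longer has to be encoded in the name parameter itself.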

>           for checksum_id in CHECKSUM_LIST:
>               configure_checksum(checksum_id)
>   
> @@ -1711,6 +1717,12 @@ class FetchMethod(object):
>           """
>           return []
>   
> +    def require_download_metadata(self):
> +        """
> +        The fetcher requires download name (dn) and version (dv) parameters.
> +        """
> +        return False
> +
>   
>   class DummyUnpackTracer(object):
>       """


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-09 19:40           ` Alexander Kanavin
@ 2025-01-10 11:32             ` Stefan Herbrechtsmeier
  2025-01-10 13:26               ` Alexander Kanavin
  0 siblings, 1 reply; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-10 11:32 UTC (permalink / raw)
  To: Alexander Kanavin; +Cc: richard.purdie, bitbake-devel, Stefan Herbrechtsmeier

On 09.01.2025 at 20:40, Alexander Kanavin wrote:
> On Thu, 9 Jan 2025 at 15:00, Stefan Herbrechtsmeier
> <stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
>> I missed the appropriate function in the dependency mixin in this
>> series. The list is created on demand (see archiver or spdx patch).
>> Every derivation need to be handled in a package manager lock file.
>> Therefore you could place a lock file beside the recipe. You could use
>> an editor or the language specific tools to manipulate the lock file.
> I'm not sure, is there code I can try that does this (and if so,
> how?), or is this code still to be written?

You have to inherit create-spdx-2.2. If you use poky as the distro, you
have to replace the 3.0 in create-spdx with 2.2, because it is impossible
to override the inherit in poky.conf. Afterwards you can create an SBOM
with the following command:

bitbake -c create_spdx librsvg

The same feature will be added to create-spdx-3.0, but for that I need
some recommendations from the SPDX experts. Once we agree on how to
provide the needed information, I will work on the create-spdx-3.0
support.

>> What is the motivation for that requirement ("just by looking at the layer content")?
>> Why can't we use a SBOM for the vulnerability check? To be really safe you have
>> to scan the code because of embedded packages (vendoring) or git submodules.
> That's right. I don't disagree with this.
>
> I do however have another concern I want to express: I can't convince
> myself that the 'integrated fetcher' is an overall significant,
> obvious, major improvement over the 'generate the SRC_URI lists in
> .inc files via task in a bbclass' approach.
>
> Just a couple reasons:
>
> - the 'integrated fetcher' is not trivial, and notably increases the
> complexity of bitbake fetcher codebase. We already struggle to
> maintain bitbake, RP is overloaded, and very few other people have
> time and knowledge to look at bitbake patches, and understand what is
> going on. You've already seen this with your patchset where getting it
> properly reviewed by anyone else than RP is an ongoing challenge. On
> the other hand, the .inc updaters are fully contained in oe-core
> classes, they implement a task in well-understood 'recipe python'
> dialect and thus benefit from a lot more people being able to take
> care of them. They're also safer in the sense that any bugs in them
> are only triggered when someone needs to update a recipe. Fetchers, on
> the other hand, are fairly critical pieces of code and they must work
> regardless of host environment, python versions, unforeseen corner
> cases in source trees and so on.

You mixed two different points. We have to distinguish between the
bitbake fetcher and the on-the-fly resolution of SRC_URIs.

Regarding the bitbake fetcher, the same reasons apply to the existing
language-specific fetchers. These fetchers are based on the wget or git
fetcher. They only add pre-processing of the source URI and
post-processing of the download. There is no requirement to do this
inside the fetcher.
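
The pre-processing step can be illustrated with the URL examples from the cover letter. The sketch below only rebuilds those three translations; the function names are made up, the downloadfilename parameter and all post-processing are left out, and the escaping of upper-case letters that the Go module proxy expects is not handled.

```python
def npm_tarball_url(registry, name, version):
    """npm://<registry>;package=<name>;version=<version> -> tarball URL."""
    # For scoped packages (@scope/name) only the unscoped part is used
    # in the tarball file name.
    basename = name.rpartition("/")[2]
    return "https://%s/%s/-/%s-%s.tgz" % (registry, name, basename, version)


def gomod_zip_url(proxy, module, version):
    """gomod://<module>;version=<version> -> module proxy zip URL."""
    return "https://%s/%s/@v/%s.zip" % (proxy, module, version)


def crate_download_url(registry, name, version):
    """crate://<registry>/<name>/<version> -> crate download URL."""
    return "https://%s/api/v1/crates/%s/%s/download" % (registry, name, version)


urls = [
    npm_tarball_url("registry.npmjs.org", "glob", "10.3.15"),
    gomod_zip_url("proxy.golang.org", "golang.org/x/net", "v0.9.0"),
    crate_download_url("crates.io", "glob", "0.3.1"),
]
```

Each function is a pure string transformation, which is why the same step could run in a fetcher, in a class in oe-core, or in an external tool.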

The on-the-fly resolution is also possible in oe-core. I don't think it is
really practicable to manipulate the resolved source URIs, because of the
dependencies between dependencies and the relationship to other package
manager configuration files. Why shouldn't we use the package manager
specific tools to update the configuration and dependency specification?
I understand that a patch is more straightforward than a new dependency
specification file.

The inc file is an OE-specific format of a dependency specification /
lock file, without any tools available to update entries with respect to
the relationship between entries. Furthermore, it is impossible to use
the changes outside of OE for tests or debugging.

What is your opinion regarding gitsm? Should we remove the bitbake
fetcher and use an update task to generate an inc file with the source
URIs and source revisions?

Do you really review the changes of the inc file?

I understand the points, but I have the feeling that they are more
theoretical for package manager dependencies, or could be solved in
another way (e.g. caching).

> - we might be able to remove those long SRC_URI lists by migrating
> recipes to the integrated fetcher, but we won't be able to do this
> with the licensing information (pointers+checksums to licenses,
> license strings) for items that are being fetched. For that, you still
> need some way to write it into a recipe with a tool. We don't do this
> yet, but we really should.

The license topic is independent of the fetcher because the dependency
specification doesn't contain license information. The recipetool shows
that the license topic is really complicated. It is possible to fetch a
license string from the package manager repository, but this information
is useless without a pointer to the license file and a checksum.
Automatically determining the license from a file needs very good
tooling, because we need to trust the process and it must minimize manual
correction. Furthermore, you need a central database; otherwise you have
to fix the same problem several times, because multiple recipes could use
the same dependency.

Even if we put all license information inside the inc file: who should
review the changes? What tooling is used to review the change (license
content)? If we blindly trust the inc file generator, the inc file is
useless and we can generate the information on-the-fly.

I understand the motivation for the update task / inc file, but I don't
think it adds any practical benefit.

Nevertheless, I will move my implementation to oe-core and add a task to
generate an inc file as a starting point.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-10 11:32             ` Stefan Herbrechtsmeier
@ 2025-01-10 13:26               ` Alexander Kanavin
  2025-01-10 15:04                 ` Stefan Herbrechtsmeier
  0 siblings, 1 reply; 66+ messages in thread
From: Alexander Kanavin @ 2025-01-10 13:26 UTC (permalink / raw)
  To: Stefan Herbrechtsmeier
  Cc: richard.purdie, bitbake-devel, Stefan Herbrechtsmeier

On Fri, 10 Jan 2025 at 12:32, Stefan Herbrechtsmeier
<stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
> What is your opinion regarding gitsm. Should we remove the bitbake fetcher and use a update task to generate a inc file with the source uris and source revisions?

That ship has sailed. We can't remove gitsm, it has users, and they
will be very angry.

> Do you really review the changes of the inc file?
> I understand the points but I have the feeling that they are more theoretically for package manager dependencies or could be solved in an other way (ex. caching)

But do you? I have to restate the point: a solution that can be placed
inside a layer is much more scalable and maintainable than adding code
to bitbake. That's why I'm leaning towards drawing the line at
existing fetchers that are wget/git convenience wrappers, and shifting
dependency/lockfile management to layers. It's ultimately RP's call,
but he does seek feedback :)

I'm fine with large SRC_URI/sha256 diffs when recipes get updated to
new versions. And since you asked, no, no one looks at them, they're
auto-generated noise that we learned to block out, just as we learned
to quickly skim over recipe patch changes that are just line number
churn and similar non-functional changes.

> Even if we put all license information inside the inc file. Who should review the changes ? What tooling is used to review the change (license content)? If we blindly trust the inc file generator, the inc file is useless and we can generate the information on-the-fly.

We won't blindly trust a generator. There are multiple gate-keeping
steps, some of which already work, and some should still be
implemented:

- when creating a recipe with devtool, devtool should discover all
licenses and generate appropriate recipe metadata. For classic unix-y
components this has to rely on 'guessing', but things like crates have
deterministic licensing metadata (a field in Cargo.toml, and LICENSE-*
files if I remember right). We can also propose adding such
determinism upstream if it's not currently good enough.

- when updating a recipe with devtool to a new upstream release, it
uses the file:// entries in LIC_FILES_CHKSUM to generate a diff of
previous license texts and the new ones, and writes that as a comment
into the updated recipe. The diff is reviewed by a human performing
the update, and condensed into an update to the LICENSE field (if
needed), and an explanation of what changed in the License-Update tag
in the commit message. This could be further automated if upstream has
deterministic ways to specify licenses, e.g. LICENSE =
"&".join(all_license_ids).

- when sending the resulting patch for review, there's a mailing list
bot (patchtest), which will check that any update in license checksums
is accompanied by an explanation in License-Update tag. There are also
humans which will check that the licensing changes are sensible.
Otherwise we do trust that submitters spot important changes in
licensing (from the diff in the previous step or by manual comparison,
if they want) and summarise them in LICENSE correctly.

- finally there are various license checks that run in recipe_qa task
and implemented in insane.bbclass. They could be extended to verify
that every dependency has a matching license entry in the recipe and
so on. Anything that can be caught by looking at the source tree and
the license metadata.
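
A rough sketch of the last two ideas, combining deterministic license identifiers and checking that every dependency has a license entry, could look like this. The function names and the input shapes (a list of identifiers, a dict from dependency name to license) are assumptions for illustration; nothing here is existing devtool or insane.bbclass code.

```python
def combine_licenses(license_ids):
    """Join the license identifiers of all dependencies into one value.

    Duplicates are dropped and the result is sorted so that the output
    does not depend on the order of the dependencies.
    """
    return " & ".join(sorted(set(license_ids)))


def dependencies_without_license(dependencies, license_map):
    """Return the dependencies that have no entry in the license map."""
    return sorted(dep for dep in dependencies if dep not in license_map)


# Made-up example data.
license_map = {"glob": "MIT", "libc": "Apache-2.0"}
combined = combine_licenses(license_map.values())
missing = dependencies_without_license(["glob", "libc", "net"], license_map)
```

A QA check along these lines would fail the recipe as long as the list of dependencies without a license entry is not empty.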

> Nevertheless I will move my implementation to oe-core and add a task to generate an inc file as starting point.

That would be much appreciated. The more I think about it the more I'm
convinced we should have it standardized in core.

Alex


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-10 13:26               ` Alexander Kanavin
@ 2025-01-10 15:04                 ` Stefan Herbrechtsmeier
  2025-01-10 16:07                   ` Alexander Kanavin
  2025-01-10 20:24                   ` Bruce Ashfield
  0 siblings, 2 replies; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-10 15:04 UTC (permalink / raw)
  To: Alexander Kanavin; +Cc: richard.purdie, bitbake-devel, Stefan Herbrechtsmeier

On 10.01.2025 at 14:26, Alexander Kanavin wrote:
> On Fri, 10 Jan 2025 at 12:32, Stefan Herbrechtsmeier
> <stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
>> What is your opinion regarding gitsm. Should we remove the bitbake fetcher and use a update task to generate a inc file with the source uris and source revisions?
> That ship has sailed. We can't remove gitsm, it has users, and they
> will be very angry.

This makes it impossible to fix wrong design decisions or to remove
low-quality code.

>> Do you really review the changes of the inc file?
>> I understand the points but I have the feeling that they are more theoretically for package manager dependencies or could be solved in an other way (ex. caching)
> But do you? I have to restate the point: a solution that can be placed
> inside a layer is much more scalable and maintainable than adding code
> to bitbake. That's why I'm leaning towards drawing the line at
> existing fetchers that are wget/git convenience wrappers, and shifting
> dependency/lockfile management to layers. It's ultimately RP's call,
> but he does seek feedback :)
I'm working on it.

> I'm fine with large SRC_URI/sha256 diffs when recipes get updated to
> new versions. And since you asked, no, no one looks at them, they're
> auto-generated noise that we learned to block out, just as we learned
> to quickly skim over recipe patch changes that are just line number
> churn and similar non-functional changes.

Instead of an inc file, the generated SRC_URIs could be saved inside the
work directory of the recipe. This would eliminate the noise and avoid a
manual run of an update task after a recipe change.
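
As a purely hypothetical illustration of that idea (not the proof of concept itself), the resolved URIs could be cached in a file in the work directory and reused as long as the lock file is unchanged. The file name resolved-src-uris.json, the function name and the use of a SHA-256 digest of the lock file as cache key are all assumptions.

```python
import hashlib
import json
import os
import tempfile


def load_or_resolve_uris(lockfile_path, workdir, resolve):
    """Return the URIs for a lock file, resolving them only when needed."""
    with open(lockfile_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()

    cache_path = os.path.join(workdir, "resolved-src-uris.json")
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cached = json.load(f)
        if cached.get("lockfile_sha256") == digest:
            return cached["uris"]  # lock file unchanged: reuse the list

    uris = resolve(lockfile_path)
    os.makedirs(workdir, exist_ok=True)
    with open(cache_path, "w") as f:
        json.dump({"lockfile_sha256": digest, "uris": uris}, f, indent=2)
    return uris


# Demo with a stand-in resolver that only records how often it is called.
calls = []


def fake_resolve(path):
    calls.append(path)
    return ["https://crates.io/api/v1/crates/glob/0.3.1/download"]


with tempfile.TemporaryDirectory() as tmp:
    lockfile = os.path.join(tmp, "Cargo.lock")
    with open(lockfile, "w") as f:
        f.write("placeholder lock file content\n")
    workdir = os.path.join(tmp, "work")
    first = load_or_resolve_uris(lockfile, workdir, fake_resolve)
    second = load_or_resolve_uris(lockfile, workdir, fake_resolve)
```

The second call reads the list from the cache file, so the resolver runs only once for an unchanged lock file.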

>> Even if we put all license information inside the inc file. Who should review the changes ? What tooling is used to review the change (license content)? If we blindly trust the inc file generator, the inc file is useless and we can generate the information on-the-fly.
> We won't blindly trust a generator. There are multiple gate-keeping
> steps, some of which already work, and some should still be
> implemented:
>
> - when creating a recipe with devtool, devtool should discover all
> licenses and generate appropriate recipe metadata. For classic unix-y
> components this has to rely on 'guessing', but things like crates have
> deterministic licensing metadata (a field in Cargo.toml, and LICENSE-*
> files if I remember right). We can also propose adding such
> determinism upstream if it's not currently good enough.
>
> - when updating a recipe with devtool to a new upstream release, it
> uses the file:// entries in LIC_FILES_CHKSUM to generate a diff of
> previous license texts and the new ones, and writes that as a comment
> into the updated recipe. The diff is reviewed by a human performing
> the update, and condensed into an update to the LICENSE field (if
> needed), and an explanation of what changed in the License-Update tag
> in the commit message. This could be further automated if upstream has
> deterministic ways to specify licenses, e.g. LICENSE =
> "&".join(all_license_ids).
>
> - when sending the resulting patch for review, there's a mailing list
> bot (patchtest), which will check that any update in license checksums
> is accompanied by an explanation in License-Update tag. There are also
> humans which will check that the licensing changes are sensible.
> Otherwise we do trust that submitters spot important changes in
> licensing (from the diff in the previous step or by manual comparison,
> if they want) and summarise them in LICENSE correctly.
>
> - finally there are various license checks that run in recipe_qa task
> and implemented in insane.bbclass. They could be extended to verify
> that every dependency has a matching license entry in the recipe and
> so on. Anything that can be caught by looking at the source tree and
> the license metadata.
This works for an individual project but becomes complicated for
dependencies, because you have to handle the same change multiple times.
But let's stop the discussion for now, because licensing is out of scope
for this series.

>> Nevertheless I will move my implementation to oe-core and add a task to generate an inc file as starting point.
> That would be much appreciated. The more I think about it the more I'm
> convinced we should have it standardized in core.
What do you mean by standardized?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-10 15:04                 ` Stefan Herbrechtsmeier
@ 2025-01-10 16:07                   ` Alexander Kanavin
  2025-01-10 20:24                   ` Bruce Ashfield
  1 sibling, 0 replies; 66+ messages in thread
From: Alexander Kanavin @ 2025-01-10 16:07 UTC (permalink / raw)
  To: Stefan Herbrechtsmeier
  Cc: richard.purdie, bitbake-devel, Stefan Herbrechtsmeier

On Fri, 10 Jan 2025 at 16:04, Stefan Herbrechtsmeier
<stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
> What is your opinion regarding gitsm. Should we remove the bitbake fetcher and use a update task to generate a inc file with the source uris and source revisions?
>
> That ship has sailed. We can't remove gitsm, it has users, and they
> will be very angry.
>
> This makes it impossible to fix wrong design decision or remove code with a low code quality.

It's still possible, you just can't be heavy-handed and dictatorial
about 'removing' stuff you don't like. When the existing thing works
very well for a lot of people (and gitsm does), then the new thing has
to be obviously better, you need to do your best to convince as many
people as possible of that, and it needs to co-exist with the old
thing, so that users can migrate at their own pace. And some of the
users may never do that, and they will get annoyed at or ignore
deprecation warnings or similar attempts to push them.

> Instead of an inc file the generated SRC_URIs could be saved inside the work directory of the recipe. This will eliminate the noise and avoid a manual run of an update task after a recipe changes.

I would be very interested to see the proof of concept that does this.

> Nevertheless I will move my implementation to oe-core and add a task to generate an inc file as starting point.
>
> That would be much appreciated. The more I think about it the more I'm
> convinced we should have it standardized in core.
>
> What do you mean by standardized?

Standardized handling of embedded dependencies, lock files and various
other aspects of language-specific package managers, so that adding
support for a new thing would be writing a new
extension/plugin/subclass for the existing framework.
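
To make the "new subclass for an existing framework" idea concrete, here is a small sketch. The class names, the registry mechanism and the tuple layout are invented for illustration and do not describe any existing oe-core or bitbake interface; the go.sum parsing relies only on the file's "module version hash" line format, and the hashes in the example are placeholders.

```python
from abc import ABC, abstractmethod


class LockfileResolver(ABC):
    """Hypothetical base class: one subclass per package manager."""

    registry = {}    # maps a lock file name to its resolver class
    filename = None  # set by subclasses, e.g. "go.sum"

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.filename:
            LockfileResolver.registry[cls.filename] = cls

    @abstractmethod
    def dependencies(self, text):
        """Return a list of (name, version, checksum) tuples."""


class GoSumResolver(LockfileResolver):
    filename = "go.sum"

    def dependencies(self, text):
        deps = []
        for line in text.splitlines():
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            module, version, checksum = fields
            if version.endswith("/go.mod"):
                continue  # hash of the go.mod file only, not of the module
            deps.append((module, version, checksum))
        return deps


def resolver_for(filename):
    """Look up and instantiate the resolver registered for a lock file."""
    return LockfileResolver.registry[filename]()


# Placeholder hashes, not real checksums.
go_sum = (
    "golang.org/x/net v0.9.0 h1:abc=\n"
    "golang.org/x/net v0.9.0/go.mod h1:def=\n"
)
deps = resolver_for("go.sum").dependencies(go_sum)
```

Supporting another package manager would then mean adding one more subclass with its own filename and dependencies() implementation.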

Alex

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-10 15:04                 ` Stefan Herbrechtsmeier
  2025-01-10 16:07                   ` Alexander Kanavin
@ 2025-01-10 20:24                   ` Bruce Ashfield
  2025-01-13  7:11                     ` Stefan Herbrechtsmeier
  1 sibling, 1 reply; 66+ messages in thread
From: Bruce Ashfield @ 2025-01-10 20:24 UTC (permalink / raw)
  To: stefan.herbrechtsmeier-oss
  Cc: Alexander Kanavin, richard.purdie, bitbake-devel,
	Stefan Herbrechtsmeier

[-- Attachment #1: Type: text/plain, Size: 5801 bytes --]

On Fri, Jan 10, 2025 at 10:04 AM Stefan Herbrechtsmeier via
lists.openembedded.org <stefan.herbrechtsmeier-oss=
weidmueller.com@lists.openembedded.org> wrote:

> Am 10.01.2025 um 14:26 schrieb Alexander Kanavin:
>
> On Fri, 10 Jan 2025 at 12:32, Stefan Herbrechtsmeier<stefan.herbrechtsmeier-oss@weidmueller.com> <stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
>
> What is your opinion regarding gitsm. Should we remove the bitbake fetcher and use a update task to generate a inc file with the source uris and source revisions?
>
> That ship has sailed. We can't remove gitsm, it has users, and they
> will be very angry.
>
> This makes it impossible to fix wrong design decision or remove code with
> a low code quality.
>
> Do you really review the changes of the inc file?
> I understand the points but I have the feeling that they are more theoretically for package manager dependencies or could be solved in an other way (ex. caching)
>
> But do you? I have to restate the point: a solution that can be placed
> inside a layer is much more scalable and maintainable than adding code
> to bitbake. That's why I'm leaning towards drawing the line at
> existing fetchers that are wget/git convenience wrappers, and shifting
> dependency/lockfile management to layers. It's ultimately RP's call,
> but he does seek feedback :)
>
> I'm working on it.
>
> I'm fine with large SRC_URI/sha256 diffs when recipes get updated to
> new versions. And since you asked, no, no one looks at them, they're
> auto-generated noise that we learned to block out, just as we learned
> to quickly skim over recipe patch changes that are just line number
> churn and similar non-functional changes.
>
> Instead of an inc file the generated SRC_URIs could be saved inside the
> work directory of the recipe. This will eliminate the noise and avoid a
> manual run of an update task after a recipe changes.
>

Except for those that want the .inc file changes to be version controlled
(as well as SRC_URI changes), but maybe I'm misunderstanding what you
described above.

A generated temporary/build file is definitely more visible than something
that is programmatically done and held internally during recipe processing
and build.  It opens the door for extension and doing version control on
it.  So I don't object to the concept, I just don't think I have all the
details straight in my head.

Cheers,

Bruce

> Even if we put all license information inside the inc file. Who should
> review the changes? What tooling is used to review the change
> (license content)? If we blindly trust the inc file generator, the inc
> file is useless and we can generate the information on-the-fly.
>
> We won't blindly trust a generator. There are multiple gate-keeping
> steps, some of which already work, and some should still be
> implemented:
>
> - when creating a recipe with devtool, devtool should discover all
> licenses and generate appropriate recipe metadata. For classic unix-y
> components this has to rely on 'guessing', but things like crates have
> deterministic licensing metadata (a field in Cargo.toml, and LICENSE-*
> files if I remember right). We can also propose adding such
> determinism upstream if it's not currently good enough.
>
> - when updating a recipe with devtool to a new upstream release, it
> uses the file:// entries in LIC_FILES_CHKSUM to generate a diff of
> previous license texts and the new ones, and writes that as a comment
> into the updated recipe. The diff is reviewed by a human performing
> the update, and condensed into an update to the LICENSE field (if
> needed), and an explanation of what changed in the License-Update tag
> in the commit message. This could be further automated if upstream has
> deterministic ways to specify licenses, e.g. LICENSE =
> "&".join(all_license_ids).
>
> - when sending the resulting patch for review, there's a mailing list
> bot (patchtest), which will check that any update in license checksums
> is accompanied by an explanation in License-Update tag. There are also
> humans which will check that the licensing changes are sensible.
> Otherwise we do trust that submitters spot important changes in
> licensing (from the diff in the previous step or by manual comparison,
> if they want) and summarise them in LICENSE correctly.
>
> - finally there are various license checks that run in recipe_qa task
> and implemented in insane.bbclass. They could be extended to verify
> that every dependency has a matching license entry in the recipe and
> so on. Anything that can be caught by looking at the source tree and
> the license metadata.
>
> This works for individual project but become complicated for dependencies
> because you have to handle the same change multiple times. But lets stop
> the discussion for now because license is out of scope of this series.
>
> Nevertheless I will move my implementation to oe-core and add a task to generate an inc file as starting point.
>
> That would be much appreciated. The more I think about it the more I'm
> convinced we should have it standardized in core.
>
> What do you mean by standardized?
>
>
>
>

-- 
- Thou shalt not follow the NULL pointer, for chaos and madness await thee
at its end
- "Use the force Harry" - Gandalf, Star Trek II


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-10 20:24                   ` Bruce Ashfield
@ 2025-01-13  7:11                     ` Stefan Herbrechtsmeier
  2025-01-17  4:19                       ` Bruce Ashfield
  0 siblings, 1 reply; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-13  7:11 UTC (permalink / raw)
  To: Bruce Ashfield
  Cc: Alexander Kanavin, richard.purdie, bitbake-devel,
	Stefan Herbrechtsmeier

[-- Attachment #1: Type: text/plain, Size: 6850 bytes --]

Am 10.01.2025 um 21:24 schrieb Bruce Ashfield:
> On Fri, Jan 10, 2025 at 10:04 AM Stefan Herbrechtsmeier via 
> lists.openembedded.org <http://lists.openembedded.org> 
> <stefan.herbrechtsmeier-oss=weidmueller.com@lists.openembedded.org> wrote:
>
>     Am 10.01.2025 um 14:26 schrieb Alexander Kanavin:
>>     On Fri, 10 Jan 2025 at 12:32, Stefan Herbrechtsmeier
>>     <stefan.herbrechtsmeier-oss@weidmueller.com> <mailto:stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
>>>     What is your opinion regarding gitsm. Should we remove the bitbake fetcher and use a update task to generate a inc file with the source uris and source revisions?
>>     That ship has sailed. We can't remove gitsm, it has users, and they
>>     will be very angry.
>
>     This makes it impossible to fix wrong design decision or remove
>     code with a low code quality.
>
>>>     Do you really review the changes of the inc file?
>>>     I understand the points but I have the feeling that they are more theoretically for package manager dependencies or could be solved in an other way (ex. caching)
>>     But do you? I have to restate the point: a solution that can be placed
>>     inside a layer is much more scalable and maintainable than adding code
>>     to bitbake. That's why I'm leaning towards drawing the line at
>>     existing fetchers that are wget/git convenience wrappers, and shifting
>>     dependency/lockfile management to layers. It's ultimately RP's call,
>>     but he does seek feedback :)
>     I'm working on it.
>
>>     I'm fine with large SRC_URI/sha256 diffs when recipes get updated to
>>     new versions. And since you asked, no, no one looks at them, they're
>>     auto-generated noise that we learned to block out, just as we learned
>>     to quickly skim over recipe patch changes that are just line number
>>     churn and similar non-functional changes.
>
>     Instead of an inc file the generated SRC_URIs could be saved
>     inside the work directory of the recipe. This will eliminate the
>     noise and avoid a manual run of an update task after a recipe changes.
>
>
> Except for those that want the .inc file changes to be version 
> controlled (as well as SRC_URI changes), but maybe I'm 
> misunderstanding what you described above

Why should somebody version control the generated SRC_URI?


> A generated temporary/build file is definitely more visible than 
> something that is programmatically done and held internally during 
> recipe processing and build.  It opens the door for extension and 
> doing version control on it. So I don't object to the concept, I just 
> don't think I have all the details straight in my head.

A generated build file is saved in the work directory of the recipe like 
any other generated build file, so it cannot be added to the version 
control system. The update task, by contrast, creates a generated source 
file that is version controlled. I don't understand why version control 
is needed for it, because both the generator and its source (the lock 
file) are already version controlled, especially if the output is ignored 
during patch review. I think it is much more straightforward to patch the 
source (lock file), because manual changes are complicated to preserve 
when a generated file is regenerated.

>>>     Even if we put all license information inside the inc file. Who should review the changes ? What tooling is used to review the change (license content)? If we blindly trust the inc file generator, the inc file is useless and we can generate the information on-the-fly.
>>     We won't blindly trust a generator. There are multiple gate-keeping
>>     steps, some of which already work, and some should still be
>>     implemented:
>>
>>     - when creating a recipe with devtool, devtool should discover all
>>     licenses and generate appropriate recipe metadata. For classic unix-y
>>     components this has to rely on 'guessing', but things like crates have
>>     deterministic licensing metadata (a field in Cargo.toml, and LICENSE-*
>>     files if I remember right). We can also propose adding such
>>     determinism upstream if it's not currently good enough.
>>
>>     - when updating a recipe with devtool to a new upstream release, it
>>     uses the file:// entries in LIC_FILES_CHKSUM to generate a diff of
>>     previous license texts and the new ones, and writes that as a comment
>>     into the updated recipe. The diff is reviewed by a human performing
>>     the update, and condensed into an update to the LICENSE field (if
>>     needed), and an explanation of what changed in the License-Update tag
>>     in the commit message. This could be further automated if upstream has
>>     deterministic ways to specify licenses, e.g. LICENSE =
>>     "&".join(all_license_ids).
>>
>>     - when sending the resulting patch for review, there's a mailing list
>>     bot (patchtest), which will check that any update in license checksums
>>     is accompanied by an explanation in License-Update tag. There are also
>>     humans which will check that the licensing changes are sensible.
>>     Otherwise we do trust that submitters spot important changes in
>>     licensing (from the diff in the previous step or by manual comparison,
>>     if they want) and summarise them in LICENSE correctly.
>>
>>     - finally there are various license checks that run in recipe_qa task
>>     and implemented in insane.bbclass. They could be extended to verify
>>     that every dependency has a matching license entry in the recipe and
>>     so on. Anything that can be caught by looking at the source tree and
>>     the license metadata.
>     This works for individual project but become complicated for
>     dependencies because you have to handle the same change multiple
>     times. But lets stop the discussion for now because license is out
>     of scope of this series.
>
>>>     Nevertheless I will move my implementation to oe-core and add a task to generate an inc file as starting point.
>>     That would be much appreciated. The more I think about it the more I'm
>>     convinced we should have it standardized in core.
>     What do you mean by standardized?
>
>
>
>
>
>


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-13  7:11                     ` Stefan Herbrechtsmeier
@ 2025-01-17  4:19                       ` Bruce Ashfield
  2025-01-17  5:37                         ` Alexander Kanavin
  2025-01-17  7:45                         ` Stefan Herbrechtsmeier
  0 siblings, 2 replies; 66+ messages in thread
From: Bruce Ashfield @ 2025-01-17  4:19 UTC (permalink / raw)
  To: Stefan Herbrechtsmeier
  Cc: Alexander Kanavin, richard.purdie, bitbake-devel,
	Stefan Herbrechtsmeier

[-- Attachment #1: Type: text/plain, Size: 8158 bytes --]

On Mon, Jan 13, 2025 at 2:11 AM Stefan Herbrechtsmeier <
stefan.herbrechtsmeier-oss@weidmueller.com> wrote:

> Am 10.01.2025 um 21:24 schrieb Bruce Ashfield:
>
> On Fri, Jan 10, 2025 at 10:04 AM Stefan Herbrechtsmeier via
> lists.openembedded.org <stefan.herbrechtsmeier-oss=
> weidmueller.com@lists.openembedded.org> wrote:
>
>> Am 10.01.2025 um 14:26 schrieb Alexander Kanavin:
>>
>> On Fri, 10 Jan 2025 at 12:32, Stefan Herbrechtsmeier<stefan.herbrechtsmeier-oss@weidmueller.com> <stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
>>
>> What is your opinion regarding gitsm. Should we remove the bitbake fetcher and use a update task to generate a inc file with the source uris and source revisions?
>>
>> That ship has sailed. We can't remove gitsm, it has users, and they
>> will be very angry.
>>
>> This makes it impossible to fix wrong design decision or remove code with
>> a low code quality.
>>
>> Do you really review the changes of the inc file?
>> I understand the points but I have the feeling that they are more theoretically for package manager dependencies or could be solved in an other way (ex. caching)
>>
>> But do you? I have to restate the point: a solution that can be placed
>> inside a layer is much more scalable and maintainable than adding code
>> to bitbake. That's why I'm leaning towards drawing the line at
>> existing fetchers that are wget/git convenience wrappers, and shifting
>> dependency/lockfile management to layers. It's ultimately RP's call,
>> but he does seek feedback :)
>>
>> I'm working on it.
>>
>> I'm fine with large SRC_URI/sha256 diffs when recipes get updated to
>> new versions. And since you asked, no, no one looks at them, they're
>> auto-generated noise that we learned to block out, just as we learned
>> to quickly skim over recipe patch changes that are just line number
>> churn and similar non-functional changes.
>>
>> Instead of an inc file the generated SRC_URIs could be saved inside the
>> work directory of the recipe. This will eliminate the noise and avoid a
>> manual run of an update task after a recipe changes.
>>
>
> Except for those that want the .inc file changes to be version controlled
> (as well as SRC_URI changes), but maybe I'm misunderstanding what you
> described above
>
> Why should somebody version control the generated SRC_URI?
>
>
Why wouldn't they? I'm talking about the case where the SRC_URI is generated
as git fetches (or whatever); that is part of the recipe and version
controlled.

My point is that this is not throwaway / transient information for many
use cases. It is something that can be tracked between updates to the
recipes.



> A generated temporary/build file is definitely more visible than something
> that is programmatically done and held internally during recipe processing
> and build.  It opens the door for extension and doing version control on
> it.  So I don't object to the concept, I just don't think I have all the
> details straight in my head.
>
> A generated build file will be saved in the work directory of the recipe
> like any other generated build file. It is impossible to add it to the
> version control system. The update task create a version controlled
> generated source file. I don't understand why the version control is needed
> because the source of the generator and the generator are version
> controlled. Especially if the output is ignored during patch review. I
> think it is much more straightforward to patch the source (lock file)
> because it is complicated to handle manual changes during regeneration of a
> generated file.
>
*sigh*. I'm quite aware of what can and cannot be done. That's not what I
meant. I'm obviously not talking about something in WORKDIR. I'm just
saying that if something is written to disk, then depending on how things
are implemented it can be viewed, debugged and manipulated. If it is always
generated, held internally to the classes and used, I have no way to do
that sort of debugging. Similarly, for anything that is generated, it would
be ideal to have a way to re-use a previously generated artifact rather
than generating it on the fly. That's the element that opens the door to
version control and tracking.
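
The behaviour asked for here, a generated artifact that lives on disk and is re-used instead of being regenerated every time, could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name `load_or_generate`, the JSON layout and the `generate` callback are hypothetical and are not part of bitbake or of the patch series.

```python
import hashlib
import json
from pathlib import Path


def load_or_generate(lockfile: Path, artifact: Path, generate) -> list:
    """Return the fetch URIs for `lockfile`, re-using `artifact` when possible.

    `generate` is a hypothetical callback that turns a lock file path into a
    list of URI strings. The artifact is plain JSON so that it can be viewed,
    diffed, edited and, if desired, committed.
    """
    digest = hashlib.sha256(lockfile.read_bytes()).hexdigest()

    # Re-use the stored result only if it was generated from the same input.
    if artifact.exists():
        saved = json.loads(artifact.read_text())
        if saved.get("lockfile_sha256") == digest:
            return saved["uris"]

    # Otherwise regenerate and write the result to disk for later runs.
    uris = generate(lockfile)
    payload = {"lockfile_sha256": digest, "uris": uris}
    artifact.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n")
    return uris
```

Whether the artifact path points into a layer (and so can be tracked) or into a build directory is a policy decision outside this sketch.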

We'll agree to disagree on what is or isn't efficient or complicated.
Luckily, this is all opt-in, so I'll never really have to use it. I'm just
sharing what it would take to get me to consider it based on what I've
learned/suffered in my time maintaining quite a few go recipes.

Cheers,

Bruce


> Even if we put all license information inside the inc file. Who should review the changes ? What tooling is used to review the change (license content)? If we blindly trust the inc file generator, the inc file is useless and we can generate the information on-the-fly.
>>
>> We won't blindly trust a generator. There are multiple gate-keeping
>> steps, some of which already work, and some should still be
>> implemented:
>>
>> - when creating a recipe with devtool, devtool should discover all
>> licenses and generate appropriate recipe metadata. For classic unix-y
>> components this has to rely on 'guessing', but things like crates have
>> deterministic licensing metadata (a field in Cargo.toml, and LICENSE-*
>> files if I remember right). We can also propose adding such
>> determinism upstream if it's not currently good enough.
>>
>> - when updating a recipe with devtool to a new upstream release, it
>> uses the file:// entries in LIC_FILES_CHKSUM to generate a diff of
>> previous license texts and the new ones, and writes that as a comment
>> into the updated recipe. The diff is reviewed by a human performing
>> the update, and condensed into an update to the LICENSE field (if
>> needed), and an explanation of what changed in the License-Update tag
>> in the commit message. This could be further automated if upstream has
>> deterministic ways to specify licenses, e.g. LICENSE =
>> "&".join(all_license_ids).
>>
>> - when sending the resulting patch for review, there's a mailing list
>> bot (patchtest), which will check that any update in license checksums
>> is accompanied by an explanation in License-Update tag. There are also
>> humans which will check that the licensing changes are sensible.
>> Otherwise we do trust that submitters spot important changes in
>> licensing (from the diff in the previous step or by manual comparison,
>> if they want) and summarise them in LICENSE correctly.
>>
>> - finally there are various license checks that run in recipe_qa task
>> and implemented in insane.bbclass. They could be extended to verify
>> that every dependency has a matching license entry in the recipe and
>> so on. Anything that can be caught by looking at the source tree and
>> the license metadata.
>>
>> This works for individual project but become complicated for dependencies
>> because you have to handle the same change multiple times. But lets stop
>> the discussion for now because license is out of scope of this series.
>>
>> Nevertheless I will move my implementation to oe-core and add a task to generate an inc file as starting point.
>>
>> That would be much appreciated. The more I think about it the more I'm
>> convinced we should have it standardized in core.
>>
>> What do you mean by standardized?
>>
>>
>>
>>
>
>
>

-- 
- Thou shalt not follow the NULL pointer, for chaos and madness await thee
at its end
- "Use the force Harry" - Gandalf, Star Trek II


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-17  4:19                       ` Bruce Ashfield
@ 2025-01-17  5:37                         ` Alexander Kanavin
  2025-01-17  7:45                         ` Stefan Herbrechtsmeier
  1 sibling, 0 replies; 66+ messages in thread
From: Alexander Kanavin @ 2025-01-17  5:37 UTC (permalink / raw)
  To: Bruce Ashfield
  Cc: Stefan Herbrechtsmeier, richard.purdie, bitbake-devel,
	Stefan Herbrechtsmeier

On Fri, 17 Jan 2025 at 05:20, Bruce Ashfield <bruce.ashfield@gmail.com> wrote:

> *sigh*. I'm quite aware of what can and cannot be done. That's not what I meant. I'm obviously not talking about something in WORKDIR. I'm just saying that if something is written to disk, then depending on how things are implemented it can be viewed, debugged and manipulated. If it is always generated, held internally to the classes and used, I have no options to do that sort of debug. Similarly, anything that is generated, it would be ideal if there was a way to re-use a previously generated artifact and not generate it on the fly .. that's the element that opens the door to version control and tracking.
>
> We'll agree to disagree on what is or isn't efficient or complicated. Luckily, this is all opt-in, so I'll never really have to use it. I'm just sharing what it would take to get me to consider it based on what I've learned/suffered in my time maintaining quite a few go recipes.

I beg to differ, as someone who maintains a few rust/cargo recipes.

I haven't once found this ability to track SRC_URIs in recipes useful.
It's always been auto-generated noise and I'd be very willing to
consider an implementation that keeps it neatly hidden, if this
implementation is fully oe-core based.

So Stefan, don't let this discourage you.

Alex


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-17  4:19                       ` Bruce Ashfield
  2025-01-17  5:37                         ` Alexander Kanavin
@ 2025-01-17  7:45                         ` Stefan Herbrechtsmeier
  2025-01-17 14:09                           ` Bruce Ashfield
  1 sibling, 1 reply; 66+ messages in thread
From: Stefan Herbrechtsmeier @ 2025-01-17  7:45 UTC (permalink / raw)
  To: bitbake-devel

[-- Attachment #1: Type: text/plain, Size: 6323 bytes --]

Am 17.01.2025 um 05:19 schrieb Bruce Ashfield via lists.openembedded.org:
> On Mon, Jan 13, 2025 at 2:11 AM Stefan Herbrechtsmeier 
> <stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
>
>     Am 10.01.2025 um 21:24 schrieb Bruce Ashfield:
>>     On Fri, Jan 10, 2025 at 10:04 AM Stefan Herbrechtsmeier via
>>     lists.openembedded.org <http://lists.openembedded.org>
>>     <stefan.herbrechtsmeier-oss=weidmueller.com@lists.openembedded.org>
>>     wrote:
>>
>>         Am 10.01.2025 um 14:26 schrieb Alexander Kanavin:
>>>         On Fri, 10 Jan 2025 at 12:32, Stefan Herbrechtsmeier
>>>         <stefan.herbrechtsmeier-oss@weidmueller.com> <mailto:stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
>>>>         What is your opinion regarding gitsm. Should we remove the bitbake fetcher and use a update task to generate a inc file with the source uris and source revisions?
>>>         That ship has sailed. We can't remove gitsm, it has users, and they
>>>         will be very angry.
>>
>>         This makes it impossible to fix wrong design decision or
>>         remove code with a low code quality.
>>
>>>>         Do you really review the changes of the inc file?
>>>>         I understand the points but I have the feeling that they are more theoretically for package manager dependencies or could be solved in an other way (ex. caching)
>>>         But do you? I have to restate the point: a solution that can be placed
>>>         inside a layer is much more scalable and maintainable than adding code
>>>         to bitbake. That's why I'm leaning towards drawing the line at
>>>         existing fetchers that are wget/git convenience wrappers, and shifting
>>>         dependency/lockfile management to layers. It's ultimately RP's call,
>>>         but he does seek feedback :)
>>         I'm working on it.
>>
>>>         I'm fine with large SRC_URI/sha256 diffs when recipes get updated to
>>>         new versions. And since you asked, no, no one looks at them, they're
>>>         auto-generated noise that we learned to block out, just as we learned
>>>         to quickly skim over recipe patch changes that are just line number
>>>         churn and similar non-functional changes.
>>
>>         Instead of an inc file the generated SRC_URIs could be saved
>>         inside the work directory of the recipe. This will eliminate
>>         the noise and avoid a manual run of an update task after a
>>         recipe changes.
>>
>>
>>     Except for those that want the .inc file changes to be version
>>     controlled (as well as SRC_URI changes), but maybe I'm
>>     misunderstanding what you described above
>
>     Why should somebody version control the generated SRC_URI?
>
>
> Why wouldn't they ? I'm talking about when the SRC_URI is generated to 
> git fetches (or whatever), that is part of the recipe and version 
> controlled.
>
> My point is that this is not throw away / transient information for 
> many use cases. It is something that can be tracked between updates to 
> the recipes.
>
>>     A generated temporary/build file is definitely more visible than
>>     something that is programmatically done and held internally
>>     during recipe processing and build.  It opens the door for
>>     extension and doing version control on it.  So I don't object to
>>     the concept, I just don't think I have all the details straight
>>     in my head.
>
>     A generated build file will be saved in the work directory of the
>     recipe like any other generated build file. It is impossible to
>     add it to the version control system. The update task create a
>     version controlled generated source file. I don't understand why
>     the version control is needed because the source of the generator
>     and the generator are version controlled. Especially if the output
>     is ignored during patch review. I think it is much more
>     straightforward to patch the source (lock file) because it is
>     complicated to handle manual changes during regeneration of a
>     generated file.
>
> *sigh*. I'm quite aware of what can and cannot be done. That's not 
> what I meant. I'm obviously not talking about something in WORKDIR. 
> I'm just saying that if something is written to disk, then depending 
> on how things are implemented it can be viewed, debugged and 
> manipulated. If it is always generated, held internally to the classes 
> and used, I have no options to do that sort of debug. Similarly, 
> anything that is generated, it would be ideal if there was a way to 
> re-use a previously generated artifact and not generate it on the fly 
> .. that's the element that opens the door to version control and tracking.

Why do we need to track the generated file if the source is version 
controlled and the generated file is cached like any other task output? 
I am working on a prototype with the following steps:

1. Fetch the sources from the recipe (do_fetch)
2. Unpack the sources from the recipe (do_unpack)
3. Apply patches which are marked as early to patch the lock file 
   (do_patch_early)
4. Resolve dependencies from the lock file and write them into a file 
   (do_vendor_resolve)
5. Fetch dependencies (do_vendor_fetch)
6. Unpack dependencies into a package manager cache (do_vendor_unpack)
7. Create a vendor directory below the source folder (do_vendor)
8. Apply patches (do_patch)
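
The task order listed above can be modelled as a small dependency graph. The sketch below is plain Python, not bitbake metadata: the task names are taken from the list, while `TASK_DEPS` and `task_order` are hypothetical names used only for illustration. In a real class the ordering would presumably be declared with bitbake's `addtask` rather than computed like this.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must have finished before it runs.
TASK_DEPS = {
    "do_fetch": set(),
    "do_unpack": {"do_fetch"},
    "do_patch_early": {"do_unpack"},
    "do_vendor_resolve": {"do_patch_early"},
    "do_vendor_fetch": {"do_vendor_resolve"},
    "do_vendor_unpack": {"do_vendor_fetch"},
    "do_vendor": {"do_vendor_unpack"},
    "do_patch": {"do_vendor"},
}


def task_order(deps):
    """Return one valid execution order for the given dependency graph."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for name in task_order(TASK_DEPS):
        print(name)
```

Because the graph is a single chain, there is exactly one valid order, and it matches the list: the lock file is patched before it is resolved, and the regular patches are applied only after the vendor directory exists.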

The go, rust and npm fetchers work. The go vendor folder works. I'm 
still working on the vendor directory for crate, a solution for npm 
without JavaScript and the integration of the dynamic sources into the 
SBOM, archiver and so on.
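
As an illustration of what the resolve step could look like for Go, the following Python sketch turns the contents of a go.sum file into a list of download entries. It relies on two documented facts: each go.sum line has the form `<module> <version>[/go.mod] <hash>`, and the Go module proxy protocol serves archives at `<proxy>/<module>/@v/<version>.zip` with uppercase letters escaped as `!` plus the lowercase letter. Everything else is an assumption: the function names, the dictionary layout and the use of proxy.golang.org are hypothetical and may differ from the URL schema used in the patch series.

```python
def escape_go_path(value: str) -> str:
    """Apply the Go module proxy case-encoding ('A' becomes '!a')."""
    return "".join("!" + c.lower() if c.isupper() else c for c in value)


def resolve_go_sum(text: str, proxy: str = "https://proxy.golang.org") -> list:
    """Return one entry per module archive listed in go.sum content."""
    deps = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        module, version, checksum = fields
        if version.endswith("/go.mod"):
            continue  # this hash covers only go.mod, not the source archive
        url = "{}/{}/@v/{}.zip".format(
            proxy, escape_go_path(module), escape_go_path(version)
        )
        deps.append(
            {"name": module, "version": version, "url": url, "checksum": checksum}
        )
    return deps


if __name__ == "__main__":
    # The hashes below are placeholders, not real checksums.
    sample = (
        "github.com/BurntSushi/toml v1.3.2 h1:placeholder=\n"
        "github.com/BurntSushi/toml v1.3.2/go.mod h1:placeholder=\n"
    )
    for dep in resolve_go_sum(sample):
        print(dep["url"])
```

The returned list could then be written to a file in the work directory, which keeps the resolved URLs viewable and cacheable.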

Do you have a recommendation for an example project for each of the Rust, 
Go and npm fetchers?

> We'll agree to disagree on what is or isn't efficient or complicated. 
> Luckily, this is all opt-in, so I'll never really have to use it. I'm 
> just sharing what it would take to get me to consider it based on what 
> I've learned/suffered in my time maintaining quite a few go recipes.

My problem is understanding the reasons or use cases behind the inc file 
for generated content and its version control. I understand that it must 
be possible to manipulate the fetched dependencies, to cache the 
generated fetcher URIs, to make the fetcher URIs viewable and to 
manipulate the fetched dependency sources.

Regards
   Stefan


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust)
  2025-01-17  7:45                         ` Stefan Herbrechtsmeier
@ 2025-01-17 14:09                           ` Bruce Ashfield
  0 siblings, 0 replies; 66+ messages in thread
From: Bruce Ashfield @ 2025-01-17 14:09 UTC (permalink / raw)
  To: stefan.herbrechtsmeier-oss; +Cc: bitbake-devel

[-- Attachment #1: Type: text/plain, Size: 8326 bytes --]

On Fri, Jan 17, 2025 at 2:45 AM Stefan Herbrechtsmeier via
lists.openembedded.org <stefan.herbrechtsmeier-oss=
weidmueller.com@lists.openembedded.org> wrote:

> Am 17.01.2025 um 05:19 schrieb Bruce Ashfield via lists.openembedded.org:
>
> On Mon, Jan 13, 2025 at 2:11 AM Stefan Herbrechtsmeier <
> stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
>
>> On 10.01.2025 at 21:24, Bruce Ashfield wrote:
>>
>> On Fri, Jan 10, 2025 at 10:04 AM Stefan Herbrechtsmeier via
>> lists.openembedded.org <stefan.herbrechtsmeier-oss=
>> weidmueller.com@lists.openembedded.org> wrote:
>>
>>> On 10.01.2025 at 14:26, Alexander Kanavin wrote:
>>>
>>> On Fri, 10 Jan 2025 at 12:32, Stefan Herbrechtsmeier <stefan.herbrechtsmeier-oss@weidmueller.com> wrote:
>>>
>>> What is your opinion regarding gitsm? Should we remove the bitbake fetcher and use an update task to generate an inc file with the source URIs and source revisions?
>>>
>>> That ship has sailed. We can't remove gitsm, it has users, and they
>>> will be very angry.
>>>
>>> This makes it impossible to fix wrong design decisions or remove code
>>> of low quality.
>>>
>>> Do you really review the changes of the inc file?
>>> I understand the points but I have the feeling that they are more theoretical for package manager dependencies or could be solved in another way (e.g. caching)
>>>
>>> But do you? I have to restate the point: a solution that can be placed
>>> inside a layer is much more scalable and maintainable than adding code
>>> to bitbake. That's why I'm leaning towards drawing the line at
>>> existing fetchers that are wget/git convenience wrappers, and shifting
>>> dependency/lockfile management to layers. It's ultimately RP's call,
>>> but he does seek feedback :)
>>>
>>> I'm working on it.
>>>
>>> I'm fine with large SRC_URI/sha256 diffs when recipes get updated to
>>> new versions. And since you asked, no, no one looks at them, they're
>>> auto-generated noise that we learned to block out, just as we learned
>>> to quickly skim over recipe patch changes that are just line number
>>> churn and similar non-functional changes.
>>>
>>> Instead of an inc file the generated SRC_URIs could be saved inside the
>>> work directory of the recipe. This will eliminate the noise and avoid a
>>> manual run of an update task after a recipe changes.
>>>
>>
>> Except for those that want the .inc file changes to be version controlled
>> (as well as SRC_URI changes), but maybe I'm misunderstanding what you
>> described above
>>
>> Why should somebody version control the generated SRC_URI?
>>
>>
> Why wouldn't they? I'm talking about when the SRC_URI is generated to
> git fetches (or whatever), that is part of the recipe and version
> controlled.
>
> My point is that this is not throw away / transient information for many
> use cases. It is something that can be tracked between updates to the
> recipes.
>
>
>
>> A generated temporary/build file is definitely more visible than
>> something that is programmatically done and held internally during recipe
>> processing and build.  It opens the door for extension and doing version
>> control on it.  So I don't object to the concept, I just don't think I have
>> all the details straight in my head.
>>
>> A generated build file will be saved in the work directory of the recipe
>> like any other generated build file. It is impossible to add it to the
>> version control system. The update task creates a version controlled
>> generated source file. I don't understand why the version control is needed
>> because the source of the generator and the generator are version
>> controlled. Especially if the output is ignored during patch review. I
>> think it is much more straightforward to patch the source (lock file)
>> because it is complicated to handle manual changes during regeneration of a
>> generated file.
>>
> *sigh*. I'm quite aware of what can and cannot be done. That's not what I
> meant. I'm obviously not talking about something in WORKDIR. I'm just
> saying that if something is written to disk, then depending on how things
> are implemented it can be viewed, debugged and manipulated. If it is always
> generated, held internally to the classes and used, I have no options to do
> that sort of debug. Similarly, anything that is generated, it would be
> ideal if there was a way to re-use a previously generated artifact and not
> generate it on the fly .. that's the element that opens the door to version
> control and tracking.
>
> Why do we need to track the generated file if the source is version
> controlled and the generated file is cached like any other task output? I'm
> working on a prototype with the following steps:
>
> 1. Fetch the sources from the recipe (do_fetch)
> 2. Unpack the sources from the recipe (do_unpack)
> 2. Apply patches which are marked as early to patch the lock file
> (do_patch_early)
> 3. Resolve dependencies from the lock file and write it into a file
> (do_vendor_resolve)
> 4. Fetch dependencies (do_vendor_fetch)
> 5. Unpack dependencies into a package manager cache (do_vendor_unpack)
> 6. Create a vendor directory below the source folder (do_vendor)
> 7. Apply patches (do_patch)
>
I just track the vendor resolution over time. I've used it many times to
figure out what has gone wrong with the go recipes that I maintain when the
upstream repositories have done something odd with tags, etc, when I'm
doing recipe upgrades.

I use that same file to bump SRCREVs on the vendor dependency fetches when
picking upstream fixes, etc. because I'm typically working on dependencies
that don't have upstream releases that contain what I need and rather than
patch a vendor'd file, I just bump the individual dependency or point it
somewhere else (typically local to my machine) to fix the problem.
It's the workflow I've developed after needing to wade into very large go
recipes that went to go mod fetched vendor directories quite early on and
it ensured that I'm not relying on any proxies, infrastructure or much that
is hidden, so I'm able to debug, archive and be relatively sure that I can
keep things working over time. I'm not even remotely saying this workflow
is for everyone, I'm just trying to see if I could use some of this to
resolve those base fetches and be able to use the outputs of it (what I
currently have in .inc files) as part of my recipes.

The .inc files are the ones that have the fetches listed/resolved, and
those are the ones that are part of my recipe, so they are version
controlled along with the main recipe.
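
As an illustration of tracking resolved fetches between recipe updates, here is a minimal Python sketch. It is not taken from the recipes discussed here: the one-line-per-dependency "<module> <revision>" format, the function names and the example.com module paths in the usage note are all hypothetical stand-ins for whatever a real .inc file holds.

```python
def parse_resolved(text):
    """Parse lines of the (assumed) form '<module> <revision>' into a dict."""
    resolved = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        module, revision = line.split()
        resolved[module] = revision
    return resolved


def diff_resolved(old, new):
    """Return (added, removed, changed) between two resolution snapshots."""
    added = {m: r for m, r in new.items() if m not in old}
    removed = {m: r for m, r in old.items() if m not in new}
    changed = {
        m: (old[m], new[m])
        for m in old.keys() & new.keys()
        if old[m] != new[m]
    }
    return added, removed, changed
```

Comparing a snapshot that lists "example.com/b 2222" with a later one that lists "example.com/b 3333" reports that module under changed, which is the kind of information a version controlled file exposes in an ordinary diff.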

Cheers,

Bruce


> The go, rust and npm fetchers work. The go vendor folder works. I'm still
> working on the vendor directory for crate, a solution for npm without
> JavaScript and the integration of the dynamic sources into the SBOM,
> archiver and so on.
>
> Do you have a recommendation for an example project for the Rust, Go and
> npm fetcher?
>
> We'll agree to disagree on what is or isn't efficient or complicated.
> Luckily, this is all opt-in, so I'll never really have to use it. I'm just
> sharing what it would take to get me to consider it based on what I've
> learned/suffered in my time maintaining quite a few go recipes.
>
> My problem is understanding the reasons or use cases behind the inc file
> for generated content and its version control. I understand that it must
> be possible to manipulate the fetched dependencies, to cache the generated
> fetcher URIs, to make the fetcher URIs viewable and to manipulate the
> fetched dependency sources.
>
> Regards
>   Stefan
>

-- 
- Thou shalt not follow the NULL pointer, for chaos and madness await thee
at its end
- "Use the force Harry" - Gandalf, Star Trek II



Thread overview: 66+ messages
2024-12-20 11:25 [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Stefan Herbrechtsmeier
2024-12-20 11:25 ` [RFC PATCH 01/21] tests: fetch: update npmsw tests to new lockfile format Stefan Herbrechtsmeier
2024-12-20 11:25 ` [RFC PATCH 02/21] fetch2: npmsw: remove old lockfile format support Stefan Herbrechtsmeier
2024-12-20 11:25 ` [RFC PATCH 03/21] tests: fetch: replace [url] with urls for npm Stefan Herbrechtsmeier
2024-12-20 11:25 ` [RFC PATCH 04/21] fetch2: do not prefix embedded checksums Stefan Herbrechtsmeier
2024-12-20 11:25 ` [RFC PATCH 05/21] fetch2: read checksum from SRC_URI flag for npm Stefan Herbrechtsmeier
2024-12-20 11:25 ` [RFC PATCH 06/21] fetch2: introduce common package manager metadata Stefan Herbrechtsmeier
2024-12-20 11:25 ` [RFC PATCH 07/21] fetch2: add unpack support for npm archives Stefan Herbrechtsmeier
2024-12-23 11:56   ` [bitbake-devel] " Richard Purdie
2025-01-02 12:39     ` Stefan Herbrechtsmeier
2025-01-02 13:59       ` Richard Purdie
2024-12-20 11:25 ` [RFC PATCH 08/21] utils: add Go mod h1 checksum support Stefan Herbrechtsmeier
2024-12-23 10:01   ` [bitbake-devel] " Richard Purdie
2025-01-02  8:27     ` Stefan Herbrechtsmeier
2024-12-20 11:26 ` [RFC PATCH 09/21] fetch2: add destdir to FetchData Stefan Herbrechtsmeier
2024-12-23  9:56   ` [bitbake-devel] " Richard Purdie
2025-01-02  8:04     ` Stefan Herbrechtsmeier
2024-12-20 11:26 ` [RFC PATCH 10/21] fetch: npm: rework Stefan Herbrechtsmeier
2024-12-20 11:26 ` [RFC PATCH 11/21] tests: fetch: adapt style in npm(sw) class Stefan Herbrechtsmeier
2024-12-20 11:26 ` [RFC PATCH 12/21] tests: fetch: move npmsw test cases into npmsw test class Stefan Herbrechtsmeier
2024-12-20 11:26 ` [RFC PATCH 13/21] tests: fetch: adapt npm test cases Stefan Herbrechtsmeier
2024-12-20 11:26 ` [RFC PATCH 14/21] fetch: add dependency mixin Stefan Herbrechtsmeier
2024-12-20 11:26 ` [RFC PATCH 15/21] tests: fetch: add test cases for dependency fetcher Stefan Herbrechtsmeier
2024-12-20 11:26 ` [RFC PATCH 16/21] fetch: npmsw: migrate to dependency mixin Stefan Herbrechtsmeier
2024-12-20 11:26 ` [RFC PATCH 17/21] tests: fetch: adapt npmsw test cases Stefan Herbrechtsmeier
2024-12-20 11:26 ` [RFC PATCH 18/21] fetch: add gosum fetcher Stefan Herbrechtsmeier
2024-12-20 11:26 ` [RFC PATCH 19/21] tests: fetch: add test cases for gosum Stefan Herbrechtsmeier
2024-12-20 11:26 ` [RFC PATCH 20/21] fetch: add cargolock fetcher Stefan Herbrechtsmeier
2024-12-20 11:26 ` [RFC PATCH 21/21] tests: fetch: add test cases for cargolock Stefan Herbrechtsmeier
2024-12-23 10:03 ` [bitbake-devel] [RFC PATCH 00/21] Concept for tightly coupled package manager (Node.js, Go, Rust) Richard Purdie
2024-12-25 15:17   ` Alexander Kanavin
2025-01-06 14:42     ` Stefan Herbrechtsmeier
2025-01-09 10:40       ` Alexander Kanavin
2025-01-09 14:00         ` Stefan Herbrechtsmeier
2025-01-09 19:40           ` Alexander Kanavin
2025-01-10 11:32             ` Stefan Herbrechtsmeier
2025-01-10 13:26               ` Alexander Kanavin
2025-01-10 15:04                 ` Stefan Herbrechtsmeier
2025-01-10 16:07                   ` Alexander Kanavin
2025-01-10 20:24                   ` Bruce Ashfield
2025-01-13  7:11                     ` Stefan Herbrechtsmeier
2025-01-17  4:19                       ` Bruce Ashfield
2025-01-17  5:37                         ` Alexander Kanavin
2025-01-17  7:45                         ` Stefan Herbrechtsmeier
2025-01-17 14:09                           ` Bruce Ashfield
     [not found]       ` <18190013516DD62F.1999@lists.openembedded.org>
2025-01-09 10:50         ` Alexander Kanavin
2025-01-09 14:18           ` Stefan Herbrechtsmeier
2025-01-02  8:55   ` Stefan Herbrechtsmeier
2025-01-02  9:32     ` Richard Purdie
2025-01-02 10:51       ` Stefan Herbrechtsmeier
2025-01-02 13:50       ` Stefan Herbrechtsmeier
2025-01-02 14:07         ` Richard Purdie
2025-01-02 15:11           ` Stefan Herbrechtsmeier
2025-01-06 11:04 ` Richard Purdie
2025-01-06 14:35   ` Stefan Herbrechtsmeier
2025-01-06 15:30     ` Richard Purdie
2025-01-07  9:47       ` Stefan Herbrechtsmeier
2025-01-07 11:01         ` Richard Purdie
2025-01-07 16:13           ` Stefan Herbrechtsmeier
2025-01-07 16:58             ` Bruce Ashfield
2025-01-07 17:46               ` Stefan Herbrechtsmeier
2025-01-08 15:43                 ` Bruce Ashfield
2025-01-09 11:51                   ` Stefan Herbrechtsmeier
2025-01-09 11:53 ` Martin Jansa
2025-01-09 14:26   ` Stefan Herbrechtsmeier
     [not found] ` <1812DEFF37B8C65E.26783@lists.openembedded.org>
2025-01-10  7:12   ` [bitbake-devel] [RFC PATCH 06/21] fetch2: introduce common package manager metadata Stefan Herbrechtsmeier
