public inbox for openembedded-core@lists.openembedded.org
From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier-oss@weidmueller.com>
To: skandigraun@gmail.com, openembedded-core@lists.openembedded.org
Cc: Tom Geelen <t.f.g.geelen@gmail.com>
Subject: Re: [OE-core] [RFC PATCH] cargo_common.bbclass: use source replacement instead of dependency patching
Date: Tue, 7 Oct 2025 16:59:10 +0200
Message-ID: <211549e4-c4c2-464d-9b63-88c27d5bdf18@weidmueller.com>
In-Reply-To: <20251003213000.2256939-1-skandigraun@gmail.com>

On 03.10.2025 at 23:30, Gyorgy Sarvari via lists.openembedded.org wrote:
> Cargo.toml files usually contain a list of dependencies in one of two forms:
> either a crate name that can be fetched from some registry (like crates.io), or
> as a source crate, which is most often fetched from a git repository.
>
> Normally cargo handles fetching the crates from both the registry and from git,
> however with Yocto this task is taken over by Bitbake.
>
> After fetching these crates, they are made available to cargo by adding the location
> to $CARGO_HOME/config.toml. The source crates are of interest here: each git repository
> that can be found in the SRC_URI is added as one source crate.
>
> This works most of the time, as long as the repository really contains one crate only.
>
> However in case the repository is a cargo workspace, it contains multiple crates in
> different subfolders, and in order to allow cargo to process them, they need to be
> listed separately. This is not happening with the current implementation of cargo_common.
>
> This change introduces the following:
> - instead of patching the dependencies, use source replacement (the primary motivation for
>    this was that maturin seems to ignore source crate patches from config.toml)
> - the above also allows keeping the original Cargo.lock untouched (the original implementation
>    deleted git repository lines from it)
> - it adds a new folder, currently ${UNPACKDIR}/yocto-vendored-source-crates. During processing
>    the separate crate folders are copied into this folder, and it is used as the central
>    vendoring folder. This is needed for source replacements: the folder that is used for
>    vendoring needs to contain the crates separately, one crate in one folder. Each folder
>    has the name of the crate that it contains. Workspaces are not included here (unless the
>    given manifest is a workspace AND a package at once)
> - previously the SRC_URI had to contain a "name" and a "destsuffix" parameter to be considered
>    a rust crate. The name is now derived from the Cargo.toml file, not from the SRC_URI.
>    Having destsuffix is still mandatory though.
>
> The change does not handle nested workspaces, only the top level Cargo.toml is processed.

I use a similar approach for my Cargo.lock fetcher. In my case the code
finds the crate on the fly inside the git repository because the
Cargo.lock doesn't contain the subpath.


> Signed-off-by: Gyorgy Sarvari <skandigraun@gmail.com>
> Cc: Tom Geelen <t.f.g.geelen@gmail.com>
>
> ---
>   meta/classes-recipe/cargo_common.bbclass | 158 ++++++++++++++++-------
>   1 file changed, 108 insertions(+), 50 deletions(-)
>
> diff --git a/meta/classes-recipe/cargo_common.bbclass b/meta/classes-recipe/cargo_common.bbclass
> index c9eb2d09a5..79c1351298 100644
> --- a/meta/classes-recipe/cargo_common.bbclass
> +++ b/meta/classes-recipe/cargo_common.bbclass
> @@ -129,6 +129,44 @@ cargo_common_do_configure () {
>   python cargo_common_do_patch_paths() {
>       import shutil
>   
> +    def is_rust_crate_folder(path):
> +        cargo_toml_path = os.path.join(path, 'Cargo.toml')
> +        return os.path.exists(cargo_toml_path)
> +
> +    def load_toml_file(toml_path):
> +        import tomllib
> +        with open(toml_path, 'rb') as f:
> +            toml = tomllib.load(f)
> +        return toml
> +
> +    def get_matching_repo_from_lockfile(lockfile_repos, repo, revision):
> +        for lf_repo in lockfile_repos.keys():
> +            if repo in lf_repo and lf_repo.endswith(revision):

Does this work if the URL contains a "rev" query parameter? This
happens if the same git repository is used with different revisions.
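
To make the case concrete, here is a small sketch that splits a
Cargo.lock git source string into repository URL, query parameters and
commit. It assumes the usual "git+<url>?rev=<rev>#<commit>" layout; the
helper name split_lockfile_source and the example URL are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def split_lockfile_source(source):
    """Split a Cargo.lock git source into (repo_url, query, commit)."""
    if not source.startswith('git+'):
        raise ValueError('not a git source: %s' % source)
    parts = urlsplit(source[len('git+'):])
    # Rebuild the URL without query and fragment so it can be compared
    # against the repository URL derived from SRC_URI.
    repo_url = '%s://%s%s' % (parts.scheme, parts.netloc, parts.path)
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return repo_url, query, parts.fragment
```

With such a split, two entries for the same repository but different
"rev" values can be told apart by the query and the commit instead of
by a substring match on the whole string.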

> +                lockfile_repos[lf_repo] = True
> +                return lf_repo.split("#")[0]
> +        bb.fatal('Cannot find %s (%s) repository from SRC_URI in Cargo.lock file' % (repo, revision))
> +
> +    def create_cargo_checksum(folder_path):
> +        checksum_path = os.path.join(folder_path, '.cargo-checksum.json')
> +        if os.path.exists(checksum_path):
> +            return
> +
> +        import hashlib, json
> +
> +        checksum = {'files': {}}
> +        for root, _, files in os.walk(folder_path):
> +            for f in files:
> +                full_path = os.path.join(root, f)
> +                relative_path = os.path.relpath(full_path, folder_path)
> +                if relative_path.startswith(".git/"):
> +                    continue
> +                with open(full_path, 'rb') as f2:
> +                    file_sha = hashlib.sha256(f2.read()).hexdigest()
> +                checksum["files"][relative_path] = file_sha

Do we really need the calculation of the checksum?
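
The alternative I have in mind is a stub file without per-file hashes,
sketched below. Whether cargo accepts an empty "files" map and a null
"package" entry for a directory source is an assumption that would have
to be verified; the function name write_stub_cargo_checksum is
hypothetical.

```python
import json
import os


def write_stub_cargo_checksum(folder_path):
    """Write a .cargo-checksum.json without per-file hashes, unless one exists already."""
    checksum_path = os.path.join(folder_path, '.cargo-checksum.json')
    if os.path.exists(checksum_path):
        return checksum_path
    # Assumption (unverified here): cargo only checks the files listed
    # in "files", so an empty map skips the hashing work.
    with open(checksum_path, 'w') as f:
        json.dump({'files': {}, 'package': None}, f)
    return checksum_path
```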

> +
> +        with open(checksum_path, 'w') as f:
> +            json.dump(checksum, f)
> +
>       cargo_config = os.path.join(d.getVar("CARGO_HOME"), "config.toml")
>       if not os.path.exists(cargo_config):
>           return
> @@ -137,66 +175,86 @@ python cargo_common_do_patch_paths() {
>       if len(src_uri) == 0:
>           return
>   
> -    patches = dict()
> +    lockfile = d.getVar("CARGO_LOCK_PATH")
> +    if not os.path.exists(lockfile):
> +        bb.fatal(f"{lockfile} file doesn't exist")
> +
> +    lockfile = load_toml_file(lockfile)
> +
> +    # key is the repo url, value is a boolean, which is used later
> +    # to indicate if there is a matching repository in SRC_URI also
> +    lockfile_git_repos = {}
> +    for p in lockfile['package']:
> +        if 'source' in p and p['source'].startswith('git+'):
> +            lockfile_git_repos[p['source']] = False
> +
> +    sources = dict()
>       workdir = d.getVar('UNPACKDIR')
>       fetcher = bb.fetch2.Fetch(src_uri, d)
> +
> +    vendor_folder = os.path.join(workdir, 'yocto-vendored-source-crates')
> +
> +    os.makedirs(vendor_folder)
> +
>       for url in fetcher.urls:
>           ud = fetcher.ud[url]
> -        if ud.type == 'git' or ud.type == 'gitsm':
> -            name = ud.parm.get('name')
> -            destsuffix = ud.parm.get('destsuffix')
> -            if name is not None and destsuffix is not None:
> -                if ud.user:
> -                    repo = '%s://%s@%s%s' % (ud.proto, ud.user, ud.host, ud.path)
> -                else:
> -                    repo = '%s://%s%s' % (ud.proto, ud.host, ud.path)
> -                path = '%s = { path = "%s" }' % (name, os.path.join(workdir, destsuffix))
> -                patches.setdefault(repo, []).append(path)
> +        if ud.type != 'git' and ud.type != 'gitsm':
> +            continue
>   
> -    with open(cargo_config, "a+") as config:
> -        for k, v in patches.items():
> -            print('\n[patch."%s"]' % k, file=config)
> -            for name in v:
> -                print(name, file=config)
> +        destsuffix = ud.parm.get('destsuffix')
> +        crate_folder = os.path.join(workdir, destsuffix)
>   
> -    if not patches:
> -        return
> +        if destsuffix is None or not is_rust_crate_folder(crate_folder):
> +            continue
>   
> -    # Cargo.lock file is needed for to be sure that artifacts
> -    # downloaded by the fetch steps are those expected by the
> -    # project and that the possible patches are correctly applied.
> -    # Moreover since we do not want any modification
> -    # of this file (for reproducibility purpose), we prevent it by
> -    # using --frozen flag (in CARGO_BUILD_FLAGS) and raise a clear error
> -    # here is better than letting cargo tell (in case the file is missing)
> -    # "Cargo.lock should be modified but --frozen was given"
> +        if ud.user:
> +            repo = '%s://%s@%s%s' % (ud.proto, ud.user, ud.host, ud.path)
> +        else:
> +            repo = '%s://%s%s' % (ud.proto, ud.host, ud.path)
>   
> -    lockfile = d.getVar("CARGO_LOCK_PATH")
> -    if not os.path.exists(lockfile):
> -        bb.fatal(f"{lockfile} file doesn't exist")
> +        sources[destsuffix] = (repo, ud.revision, crate_folder)
> +
> +        cargo_toml_path = os.path.join(workdir, destsuffix, 'Cargo.toml')
> +        cargo_toml = load_toml_file(cargo_toml_path)
> +
> +        if 'workspace' in cargo_toml:
> +            members = cargo_toml['workspace']['members']
> +            for member in members:
> +                member_crate_folder = os.path.join(workdir, destsuffix, member)
> +                member_crate_cargo_toml = os.path.join(member_crate_folder, 'Cargo.toml')
> +                member_cargo_toml = load_toml_file(member_crate_cargo_toml)
> +                member_crate_name = member_cargo_toml['package']['name']
> +                shutil.copytree(member_crate_folder, os.path.join(vendor_folder, member_crate_name))
> +
> +        if 'package' in cargo_toml:
> +            crate_folder = os.path.join(workdir, destsuffix)
> +            crate_name = cargo_toml['package']['name']
> +            shutil.copytree(crate_folder, os.path.join(vendor_folder, crate_name))
> +
> +    for d in os.scandir(vendor_folder):
> +        if d.is_dir():
> +            create_cargo_checksum(d.path)
> +
> +
> +    with open(cargo_config, "a+") as config:
> +        print('\n[source."yocto-vendored-sources"]', file=config)
> +        print('directory = "%s"' % vendor_folder, file=config)
> +
> +        for destsuffix, (repo, revision, repo_path) in sources.items():
> +            lockfile_repo = get_matching_repo_from_lockfile(lockfile_git_repos, repo, revision)
> +            print('\n[source."%s"]' % lockfile_repo, file=config)
> +            print('git = "%s"' % repo, file=config)
> +            print('rev = "%s"' % revision, file=config)
> +            print('replace-with = "yocto-vendored-sources"', file=config)
> +
> +    # check if there are any git repos in the lock file that were not visited
> +    # in the previous loop, when the source replacement was created, and warn about it
> +    for lf_repo, found_in_src_uri in lockfile_git_repos.items():
> +        if not found_in_src_uri:
> +            bb.warn(f"{lf_repo} is present in lockfile, but not found in SRC_URI")
>   
> -    # There are patched files and so Cargo.lock should be modified but we use
> -    # --frozen so let's handle that modifications here.
> -    #
> -    # Note that a "better" (more elegant ?) would have been to use cargo update for
> -    # patched packages:
> -    #  cargo update --offline -p package_1 -p package_2
> -    # But this is not possible since it requires that cargo local git db
> -    # to be populated and this is not the case as we fetch git repo ourself.
> -
> -    lockfile_orig = lockfile + ".orig"
> -    if not os.path.exists(lockfile_orig):
> -        shutil.copy(lockfile, lockfile_orig)
> -
> -    newlines = []
> -    with open(lockfile_orig, "r") as f:
> -        for line in f.readlines():
> -            if not line.startswith("source = \"git"):
> -                newlines.append(line)
> -
> -    with open(lockfile, "w") as f:
> -        f.writelines(newlines)
>   }
> +
>   do_configure[postfuncs] += "cargo_common_do_patch_paths"
>   
>   do_compile:prepend () {

