From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier-oss@weidmueller.com>
To: Gyorgy Sarvari <skandigraun@gmail.com>,
openembedded-core@lists.openembedded.org
Cc: Tom Geelen <t.f.g.geelen@gmail.com>
Subject: Re: [OE-core] [RFC PATCH] cargo_common.bbclass: use source replacement instead of dependency patching
Date: Fri, 10 Oct 2025 08:27:50 +0200
Message-ID: <6f3eedff-44c0-48ca-86dc-c1ea8aecc9e0@weidmueller.com>
In-Reply-To: <239994b8-47de-4394-bcf0-16dc91ca654e@gmail.com>
On 09.10.2025 at 16:30, Gyorgy Sarvari wrote:
> On 10/9/25 11:31, Stefan Herbrechtsmeier wrote:
>> On 08.10.2025 at 13:01, Gyorgy Sarvari wrote:
>>> On 10/7/25 16:59, Stefan Herbrechtsmeier wrote:
>>>> On 03.10.2025 at 23:30, Gyorgy Sarvari via lists.openembedded.org wrote:
>>>>> Cargo.toml files usually contain a list of dependencies in one of two forms:
>>>>> either a crate name that can be fetched from some registry (like crates.io), or
>>>>> as a source crate, which is most often fetched from a git repository.
>>>>>
>>>>> Normally cargo handles fetching the crates from both the registry and from git,
>>>>> however with Yocto this task is taken over by Bitbake.
>>>>>
>>>>> After fetching these crates, they are made available to cargo by adding the location
>>>>> to $CARGO_HOME/config.toml. The source crates are of interest here: each git repository
>>>>> that can be found in the SRC_URI is added as one source crate.
>>>>>
>>>>> This works most of the time, as long as the repository really contains one crate only.
>>>>>
>>>>> However in case the repository is a cargo workspace, it contains multiple crates in
>>>>> different subfolders, and in order to allow cargo to process them, they need to be
>>>>> listed separately. This is not happening with the current implementation of cargo_common.
>>>>>
>>>>> This change introduces the following:
>>>>> - instead of patching the dependencies, use source replacement (the primary motivation for
>>>>> this was that maturin seems to ignore source crate patches from config.toml)
>>>>> - the above also allows to keep the original Cargo.lock untouched (the original implementation
>>>>> deleted git repository lines from it)
>>>>> - it adds a new folder, currently ${UNPACKDIR}/yocto-vendored-source-crates. During processing
>>>>> the separate crate folders are copied into this folder, and it is used as the central
>>>>> vendoring folder. This is needed for source replacements: the folder that is used for
>>>>> vendoring needs to contain the crates separately, one crate in one folder. Each folder
>>>>> has the name of the crate that it contains. Workspaces are not included here (unless the
>>>>> given manifest is a workspace AND a package at once)
>>>>> - previously the SRC_URI had to contain a "name" and a "destsuffix" parameter to be considered
>>>>> to be a rust crate. The name is now derived from the Cargo.toml file, not from the SRC_URI.
>>>>> Having destsuffix is still mandatory though.
>>>>>
>>>>> The change does not handle nested workspaces, only the top level Cargo.toml is processed.
>>>> I use a similar approach for my Cargo.lock fetcher. In my case the code
>>>> finds the crate on the fly inside the git repository because the
>>>> Cargo.lock doesn't contain the subpath.
>>> By any chance, did you manage to solve the workspace problem? If you
>>> have a working solution, feel free to submit it, I wouldn't mind if I
>>> wouldn't have to debug mine :D
>> I haven't tested a workspace project. Do you have an example project?
>>
> I have attached a sample recipe (that is very much based on Tom Geelen's
> initial work). It depends on at least 2 workspaces.
Thanks for the sample. After switching to my cargolock fetcher and
cargo_vendor class, the project builds without problems. Your git URLs
need a parameter to inform the config generator that the source contains
a rev query parameter. Additionally, you need to add the revision to the
name and destsuffix/subdir, because it is possible to use crates with
different revisions from the same repository.
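A minimal sketch of the naming point, assuming the revision is simply abbreviated and appended. The helper name unique_git_entry, the 12-character abbreviation and the reuse of the name as destsuffix are all hypothetical. The parameter that tells the config generator about the rev query is deliberately left out, because its name is not given here.

```python
def unique_git_entry(crate_name, revision, short_len=12):
    """Return name/destsuffix values that differ per revision of one repository.

    Hypothetical helper: appends an abbreviated revision so that two checkouts
    of the same repository do not collide in the unpack directory.
    """
    if len(revision) < short_len:
        raise ValueError("revision is shorter than the requested abbreviation")
    name = f"{crate_name}-{revision[:short_len]}"
    return {"name": name, "destsuffix": name}


# Two revisions of the same repository end up in different folders.
first = unique_git_entry("rs-async-zip", "285e48742b74ab109887d62e1ae79e7c15fd4878")
second = unique_git_entry("rs-async-zip", "0" * 40)  # made-up second revision
```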
>>>>> Signed-off-by: Gyorgy Sarvari <skandigraun@gmail.com>
>>>>> Cc: Tom Geelen <t.f.g.geelen@gmail.com>
>>>>>
>>>>> ---
>>>>> meta/classes-recipe/cargo_common.bbclass | 158 ++++++++++++++++-------
>>>>> 1 file changed, 108 insertions(+), 50 deletions(-)
>>>>>
>>>>> diff --git a/meta/classes-recipe/cargo_common.bbclass b/meta/classes-recipe/cargo_common.bbclass
>>>>> index c9eb2d09a5..79c1351298 100644
>>>>> --- a/meta/classes-recipe/cargo_common.bbclass
>>>>> +++ b/meta/classes-recipe/cargo_common.bbclass
>>>>> @@ -129,6 +129,44 @@ cargo_common_do_configure () {
>>>>>  python cargo_common_do_patch_paths() {
>>>>>      import shutil
>>>>>
>>>>> +    def is_rust_crate_folder(path):
>>>>> +        cargo_toml_path = os.path.join(path, 'Cargo.toml')
>>>>> +        return os.path.exists(cargo_toml_path)
>>>>> +
>>>>> +    def load_toml_file(toml_path):
>>>>> +        import tomllib
>>>>> +        with open(toml_path, 'rb') as f:
>>>>> +            toml = tomllib.load(f)
>>>>> +        return toml
>>>>> +
>>>>> +    def get_matching_repo_from_lockfile(lockfile_repos, repo, revision):
>>>>> +        for lf_repo in lockfile_repos.keys():
>>>>> +            if repo in lf_repo and lf_repo.endswith(revision):
>>>> Does this work if the URL contains a "rev" query parameter? This
>>>> happens if the same git repository is used with different revisions.
>>> I *think* yes, since I query the revision from the fetcher, instead of
>>> parsing it myself (and I use both the repo and revision for matching the
>>> cargo.lock repos). But will test it specifically, and make it work if it
>>> wouldn't work out of the box. Thanks for calling my attention to this.
>> The problem is that the source replacement key contains a query
>> parameter. The query isn't supported by the git fetcher. That means
>> you have to remove the query from the SRC_URI but add it back in the
>> source entry in the config.toml.
> You mean for dynamic fetching, from Cargo.lock? This patch still relies
> on the user adding these dependencies to the SRC_URI.
> Otherwise I might be misunderstanding your question...
Please check the source inside the Cargo.lock:
https://github.com/astral-sh/uv/blob/0.8.19/Cargo.lock#L302
It contains a rev query parameter. This query parameter must be part of
the source key inside the config.toml:
[source."git+https://github.com/astral-sh/rs-async-zip?rev=285e48742b74ab109887d62e1ae79e7c15fd4878"]
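As a rough sketch of that mapping, the key can be taken from the Cargo.lock source string by cutting off the "#<commit>" fragment, while the query is split out again for the git/rev fields. The helper name source_replacement_entry is hypothetical, and the commit fragment in the example string is assumed to equal the rev value.

```python
from urllib.parse import parse_qs, urlsplit


def source_replacement_entry(lock_source, replace_with):
    """Build a config.toml [source] stanza for one git source from Cargo.lock.

    Hypothetical helper: the key keeps the query (e.g. ?rev=...), only the
    '#<commit>' fragment is dropped.
    """
    if not lock_source.startswith('git+'):
        raise ValueError('not a git source: %s' % lock_source)
    key = lock_source.split('#', 1)[0]
    parts = urlsplit(key[len('git+'):])
    # The plain repository URL must not carry the query parameter.
    git_url = parts._replace(query='', fragment='').geturl()
    query = parse_qs(parts.query)
    lines = ['[source."%s"]' % key, 'git = "%s"' % git_url]
    for ref in ('rev', 'branch', 'tag'):
        if ref in query:
            lines.append('%s = "%s"' % (ref, query[ref][0]))
    lines.append('replace-with = "%s"' % replace_with)
    return '\n'.join(lines)


example = source_replacement_entry(
    'git+https://github.com/astral-sh/rs-async-zip'
    '?rev=285e48742b74ab109887d62e1ae79e7c15fd4878'
    '#285e48742b74ab109887d62e1ae79e7c15fd4878',
    'yocto-vendored-sources')
```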
>>>>> +                lockfile_repos[lf_repo] = True
>>>>> +                return lf_repo.split("#")[0]
>>>>> +        bb.fatal('Cannot find %s (%s) repository from SRC_URI in Cargo.lock file' % (repo, revision))
>>>>> +
>>>>> +    def create_cargo_checksum(folder_path):
>>>>> +        checksum_path = os.path.join(folder_path, '.cargo-checksum.json')
>>>>> +        if os.path.exists(checksum_path):
>>>>> +            return
>>>>> +
>>>>> +        import hashlib, json
>>>>> +
>>>>> +        checksum = {'files': {}}
>>>>> +        for root, _, files in os.walk(folder_path):
>>>>> +            for f in files:
>>>>> +                full_path = os.path.join(root, f)
>>>>> +                relative_path = os.path.relpath(full_path, folder_path)
>>>>> +                if relative_path.startswith(".git/"):
>>>>> +                    continue
>>>>> +                with open(full_path, 'rb') as f2:
>>>>> +                    file_sha = hashlib.sha256(f2.read()).hexdigest()
>>>>> +                checksum["files"][relative_path] = file_sha
>>>> Do we really need the calculation of the checksum?
>>> For source replacement AFAIK it is mandatory, otherwise cargo complains.
>>> (But I'd be happy to stand corrected)
>> Have you tested an empty dictionary for "files" and NULL for "package"?
>>
> Are these valid states? Currently the checksum calculation happens for
> crate folders that have been actually copied to the vendor folder. And
> that happens only, in case there is at least a Cargo.toml manifest in
> that folder, so the files dict shouldn't be empty. Otherwise the
> checksum sub iterates through all the files it can find, it doesn't try
> to validate it against any manifests.
Do we need the validation by cargo? The crate fetcher skips the
validation with an empty dict, and the same works for git sources.
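A minimal sketch of that variant, assuming the file only needs an empty "files" dict and a null "package" as suggested earlier in this thread. The helper name write_empty_cargo_checksum is hypothetical, and whether cargo accepts this for every source type is not verified by the snippet itself.

```python
import json
import os


def write_empty_cargo_checksum(crate_dir):
    """Write a .cargo-checksum.json that lists no per-file hashes.

    Hypothetical helper: with an empty 'files' dict there are no entries to
    validate, so no hashes need to be computed for the vendored crate.
    """
    checksum_path = os.path.join(crate_dir, '.cargo-checksum.json')
    with open(checksum_path, 'w') as f:
        # None is serialized as JSON null.
        json.dump({'files': {}, 'package': None}, f)
    return checksum_path
```

This would replace the os.walk and sha256 loop in create_cargo_checksum with a constant write.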
>>>>> +
>>>>> +        with open(checksum_path, 'w') as f:
>>>>> +            json.dump(checksum, f)
>>>>> +
>>>>>      cargo_config = os.path.join(d.getVar("CARGO_HOME"), "config.toml")
>>>>>      if not os.path.exists(cargo_config):
>>>>>          return
>>>>> @@ -137,66 +175,86 @@ python cargo_common_do_patch_paths() {
>>>>>      if len(src_uri) == 0:
>>>>>          return
>>>>>
>>>>> -    patches = dict()
>>>>> +    lockfile = d.getVar("CARGO_LOCK_PATH")
>>>>> +    if not os.path.exists(lockfile):
>>>>> +        bb.fatal(f"{lockfile} file doesn't exist")
>>>>> +
>>>>> +    lockfile = load_toml_file(lockfile)
>>>>> +
>>>>> +    # key is the repo url, value is a boolean, which is used later
>>>>> +    # to indicate if there is a matching repository in SRC_URI also
>>>>> +    lockfile_git_repos = {}
>>>>> +    for p in lockfile['package']:
>>>>> +        if 'source' in p and p['source'].startswith('git+'):
>>>>> +            lockfile_git_repos[p['source']] = False
>>>>> +
>>>>> +    sources = dict()
>>>>>      workdir = d.getVar('UNPACKDIR')
>>>>>      fetcher = bb.fetch2.Fetch(src_uri, d)
>>>>> +
>>>>> +    vendor_folder = os.path.join(workdir, 'yocto-vendored-source-crates')
>>>>> +
>>>>> +    os.makedirs(vendor_folder)
>>>>> +
>>>>>      for url in fetcher.urls:
>>>>>          ud = fetcher.ud[url]
>>>>> -        if ud.type == 'git' or ud.type == 'gitsm':
>>>>> -            name = ud.parm.get('name')
>>>>> -            destsuffix = ud.parm.get('destsuffix')
>>>>> -            if name is not None and destsuffix is not None:
>>>>> -                if ud.user:
>>>>> -                    repo = '%s://%s@%s%s' % (ud.proto, ud.user, ud.host, ud.path)
>>>>> -                else:
>>>>> -                    repo = '%s://%s%s' % (ud.proto, ud.host, ud.path)
>>>>> -                path = '%s = { path = "%s" }' % (name, os.path.join(workdir, destsuffix))
>>>>> -                patches.setdefault(repo, []).append(path)
>>>>> +        if ud.type != 'git' and ud.type != 'gitsm':
>>>>> +            continue
>>>>>
>>>>> -    with open(cargo_config, "a+") as config:
>>>>> -        for k, v in patches.items():
>>>>> -            print('\n[patch."%s"]' % k, file=config)
>>>>> -            for name in v:
>>>>> -                print(name, file=config)
>>>>> +        destsuffix = ud.parm.get('destsuffix')
>>>>> +        crate_folder = os.path.join(workdir, destsuffix)
>>>>>
>>>>> -    if not patches:
>>>>> -        return
>>>>> +        if destsuffix is None or not is_rust_crate_folder(crate_folder):
>>>>> +            continue
>>>>>
>>>>> -    # Cargo.lock file is needed for to be sure that artifacts
>>>>> -    # downloaded by the fetch steps are those expected by the
>>>>> -    # project and that the possible patches are correctly applied.
>>>>> -    # Moreover since we do not want any modification
>>>>> -    # of this file (for reproducibility purpose), we prevent it by
>>>>> -    # using --frozen flag (in CARGO_BUILD_FLAGS) and raise a clear error
>>>>> -    # here is better than letting cargo tell (in case the file is missing)
>>>>> -    # "Cargo.lock should be modified but --frozen was given"
>>>>> +        if ud.user:
>>>>> +            repo = '%s://%s@%s%s' % (ud.proto, ud.user, ud.host, ud.path)
>>>>> +        else:
>>>>> +            repo = '%s://%s%s' % (ud.proto, ud.host, ud.path)
>>>>>
>>>>> -    lockfile = d.getVar("CARGO_LOCK_PATH")
>>>>> -    if not os.path.exists(lockfile):
>>>>> -        bb.fatal(f"{lockfile} file doesn't exist")
>>>>> +        sources[destsuffix] = (repo, ud.revision, crate_folder)
>>>>> +
>>>>> +        cargo_toml_path = os.path.join(workdir, destsuffix, 'Cargo.toml')
>>>>> +        cargo_toml = load_toml_file(cargo_toml_path)
>>>>> +
>>>>> +        if 'workspace' in cargo_toml:
>>>>> +            members = cargo_toml['workspace']['members']
>>>>> +            for member in members:
>>>>> +                member_crate_folder = os.path.join(workdir, destsuffix, member)
>>>>> +                member_crate_cargo_toml = os.path.join(member_crate_folder, 'Cargo.toml')
>>>>> +                member_cargo_toml = load_toml_file(member_crate_cargo_toml)
>>>>> +                member_crate_name = member_cargo_toml['package']['name']
>>>>> +                shutil.copytree(member_crate_folder, os.path.join(vendor_folder, member_crate_name))
>>>>> +
>>>>> +        if 'package' in cargo_toml:
>>>>> +            crate_folder = os.path.join(workdir, destsuffix)
>>>>> +            crate_name = cargo_toml['package']['name']
>>>>> +            shutil.copytree(crate_folder, os.path.join(vendor_folder, crate_name))
>>>>> +
>>>>> +    for d in os.scandir(vendor_folder):
>>>>> +        if d.is_dir():
>>>>> +            create_cargo_checksum(d.path)
>>>>> +
>>>>> +
>>>>> +    with open(cargo_config, "a+") as config:
>>>>> +        print('\n[source."yocto-vendored-sources"]', file=config)
>>>>> +        print('directory = "%s"' % vendor_folder, file=config)
>>>>> +
>>>>> +        for destsuffix, (repo, revision, repo_path) in sources.items():
>>>>> +            lockfile_repo = get_matching_repo_from_lockfile(lockfile_git_repos, repo, revision)
>>>>> +            print('\n[source."%s"]' % lockfile_repo, file=config)
>>>>> +            print('git = "%s"' % repo, file=config)
>>>>> +            print('rev = "%s"' % revision, file=config)
>>>>> +            print('replace-with = "yocto-vendored-sources"', file=config)
>>>>> +
>>>>> +    # check if there are any git repos in the lock file that were not visited
>>>>> +    # in the previous loop, when the source replacement was created, and warn about it
>>>>> +    for lf_repo, found_in_src_uri in lockfile_git_repos.items():
>>>>> +        if not found_in_src_uri:
>>>>> +            bb.warn(f"{lf_repo} is present in lockfile, but not found in SRC_URI")
>>>>>
>>>>> -    # There are patched files and so Cargo.lock should be modified but we use
>>>>> -    # --frozen so let's handle that modifications here.
>>>>> -    #
>>>>> -    # Note that a "better" (more elegant ?) would have been to use cargo update for
>>>>> -    # patched packages:
>>>>> -    # cargo update --offline -p package_1 -p package_2
>>>>> -    # But this is not possible since it requires that cargo local git db
>>>>> -    # to be populated and this is not the case as we fetch git repo ourself.
>>>>> -
>>>>> -    lockfile_orig = lockfile + ".orig"
>>>>> -    if not os.path.exists(lockfile_orig):
>>>>> -        shutil.copy(lockfile, lockfile_orig)
>>>>> -
>>>>> -    newlines = []
>>>>> -    with open(lockfile_orig, "r") as f:
>>>>> -        for line in f.readlines():
>>>>> -            if not line.startswith("source = \"git"):
>>>>> -                newlines.append(line)
>>>>> -
>>>>> -    with open(lockfile, "w") as f:
>>>>> -        f.writelines(newlines)
>>>>>  }
>>>>> +
>>>>>  do_configure[postfuncs] += "cargo_common_do_patch_paths"
>>>>>
>>>>>  do_compile:prepend () {
>>>>>