public inbox for distributions@lists.linux.dev
* Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables
@ 2025-01-23 13:14 Michał Górny
  2025-01-23 13:50 ` Bruno Haible
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Michał Górny @ 2025-01-23 13:14 UTC (permalink / raw)
  To: distributions

[-- Attachment #1: Type: text/plain, Size: 5179 bytes --]

Hello,

As a packager, I often find it necessary to ensure two important aspects
of the build process:

1) that the build process itself doesn't access the Internet,
in particular that it doesn't download any files on its own,

2) that the build process uses system shared libraries and other
dependencies whenever possible, rather than vendored (or downloaded)
copies.

Unfortunately, these requirements don't always align with what
the upstream considers best for their own use and what they consider
the best defaults for their users.  Fortunately, we are often able to
reach an agreement and get the options to adjust the build system
behavior.  However, these options are often defined per project
and they aren't necessarily consistent across different projects.

Given the recent success of how the NO_COLOR variable [1] became a
de facto standard, I was wondering if we could perhaps attempt to
standardize environment variables for these two aspects as well.
I suppose many other distribution packagers are facing the same
problems, so I think this mailing list is a good place to discuss this.

What I'd like to propose are two environment variables:

1) NO_NETWORK -- if it's set to a non-empty value, it requests that
programs don't access the (TCP/IP) network.

2) USE_SYSTEM_DEPS -- if it's set to a non-empty value, it requests that
the build system does not use any vendored dependency for which it
supports using a system version instead, and that it links to shared
libraries whenever possible.
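
As a minimal sketch (not part of the proposal text itself), the "set to
a non-empty value" test could look like this in Python.  The helper name
env_flag is hypothetical:

```python
import os


def env_flag(name, environ=None):
    """Return True if the variable is set to a non-empty value."""
    env = os.environ if environ is None else environ
    return bool(env.get(name, ""))


# Both proposed variables use the same "non-empty means on" rule.
no_network = env_flag("NO_NETWORK")
use_system_deps = env_flag("USE_SYSTEM_DEPS")
```

Under this wording any non-empty value counts, so NO_NETWORK=0 would
also request no network access; only an unset or empty variable leaves
the default behavior in place.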

Some examples and thoughts below, followed by rationale.


For NO_NETWORK, my primary goal is to have build systems not issue
commands that fetch stuff from the Internet.  For example, if a build
system supports automatically fetching and building missing
dependencies, setting NO_NETWORK would imply that it would fail instead.
Technically, this could also be extended to tools like wget, effectively
blocking Internet access on multiple layers -- but that's not strictly
necessary.
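
A rough sketch of the "fail instead of fetching" behavior, assuming a
build helper written in Python.  The function fetch_dependency and the
exception NetworkDisabledError are hypothetical names used only for
illustration:

```python
import os
import urllib.request


class NetworkDisabledError(RuntimeError):
    """Raised when a download is requested while NO_NETWORK is set."""


def fetch_dependency(url, dest, environ=None):
    """Download url to dest unless NO_NETWORK requests otherwise."""
    env = os.environ if environ is None else environ
    if env.get("NO_NETWORK"):
        # Fail loudly rather than silently skipping the dependency.
        raise NetworkDisabledError(
            f"NO_NETWORK is set; refusing to download {url}"
        )
    # Only reached when network access has not been disabled.
    urllib.request.urlretrieve(url, dest)
    return dest
```

The point of raising an error is that the packager sees exactly which
resource was missing and can provide it through the package manager.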

Another use case would be test suites with tests accessing remote
servers -- the variable could automatically cause them to be skipped.
It could also be used, e.g., by the pytest-socket plugin to
automatically cut the test suite off from the Internet when loaded.
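
For illustration, skipping such tests could be sketched with the
standard unittest module as below.  The decorator name requires_network
and the test class are hypothetical; this is not how pytest-socket or
any particular project implements it:

```python
import os
import unittest


def requires_network(environ=None):
    """Decorator factory: skip the test when NO_NETWORK is non-empty."""
    env = os.environ if environ is None else environ
    return unittest.skipIf(bool(env.get("NO_NETWORK")), "NO_NETWORK is set")


class RemoteTests(unittest.TestCase):
    @requires_network()
    def test_remote_server(self):
        # A real test would contact a remote server here.
        self.assertTrue(True)
```

Note that the variable is read when the test class is defined, so it
needs to be set before the test module is imported.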

I think it would also make sense to imply not accessing local network
services, in particular local system services.  An example of that are
test suites that connect to the local database daemons rather than
starting their own isolated copies.

As for the rationale, my focus would be on security and
reliability.  Package management systems in general implement
streamlined procedures for fetching resources, including verification of
authenticity, use of local mirrors, caching and so on.  Fetching
resources directly bypasses this.  This could expose information about
what is happening on a particular machine, cause unnecessary server
load, cause fees due to data plan use, cause failures due to shoddy
Internet connection -- similarly for tests.  I've written a more
detailed explanation once in the Gentoo devmanual [1].


For USE_SYSTEM_DEPS, the primary goal is to build against system
dependencies.  For example, some upstreams either prefer using vendored
dependencies or fall back to them when the system dependencies aren't
found.  However, in Gentoo we really do want stuff to use system
dependencies -- and if we fail to specify them appropriately, we'd
rather see an error than an implicit fallback to a vendored dependency.
So if USE_SYSTEM_DEPS is set, the build system should enable using
system dependencies whenever supported, and disable all possible
fallbacks to vendored dependencies.
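
The intended decision logic can be sketched roughly as follows, assuming
a build script written in Python.  The function choose_dependency, its
parameters and the exception name are hypothetical and only illustrate
the rule; they apply to a dependency that supports both variants:

```python
import os


class MissingSystemDependency(RuntimeError):
    """Raised instead of silently falling back to a vendored copy."""


def choose_dependency(name, system_available, prefer_vendored=False,
                      environ=None):
    """Return "system" or "vendored" for a dependency supporting both."""
    env = os.environ if environ is None else environ
    if env.get("USE_SYSTEM_DEPS"):
        if not system_available:
            # An error is preferred over an implicit fallback.
            raise MissingSystemDependency(
                f"USE_SYSTEM_DEPS is set but system {name} was not found"
            )
        return "system"
    # Without the variable, the upstream default applies unchanged.
    if prefer_vendored or not system_available:
        return "vendored"
    return "system"
```

With the variable unset, nothing changes for upstream's own users; with
it set, a missing system dependency becomes a visible build failure.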

The wording proposed above tries to account for the special case when
the package does not support a system dependency version at all -- e.g.
when they are patching the vendored dependency.  It's not ideal, but I'd
like to avoid blocking a major improvement just because we can't get it
perfect.

The rationale for avoiding vendored dependencies has been repeated many
times; we have one on the Gentoo Wiki [2], and Fedora has one as well [3].
The main focus is ensuring that our security team is able to address
security issues, but being able to fix bugs and simply avoiding
duplication is also helpful.

I've included mention of shared libraries since some upstreams seem to
prefer static linking for similar reasons that they prefer vendoring. 
While technically this could be considered a separate issue (and perhaps
deserving a separate flag), I don't think there really are distributions
that want unvendoring but not dynamic linking, so it's simpler to have
a single "distro packager mode" variable.  Again, it assumes that there
will be cases when dynamic linking isn't possible, particularly when
there are no shared libraries.


[1] https://devmanual.gentoo.org/ebuild-writing/functions/src_test/index.html#tests-that-require-network-or-service-access
[2] https://wiki.gentoo.org/wiki/Why_not_bundle_dependencies
[3] https://fedoraproject.org/wiki/Bundled_Libraries

-- 
Best regards,
Michał Górny


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables
  2025-01-23 13:14 Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables Michał Górny
@ 2025-01-23 13:50 ` Bruno Haible
  2025-01-23 14:09   ` Eli Schwartz
  2025-01-23 14:26   ` Michał Górny
  2025-01-23 13:53 ` Simon Josefsson
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Bruno Haible @ 2025-01-23 13:50 UTC (permalink / raw)
  To: distributions, Michał Górny

Michał Górny wrote:
> 2) USE_SYSTEM_DEPS -- if it's set to a non-empty value, it requests that
> the build system does not use any vendored dependency for which it
> supports using a system version instead, and that it links to shared
> libraries whenever possible.

This contradicts the GNU Coding Standards [1]. For GNU packages,
configuration of such things should be done through --with-* and --without-*
options. NOT through environment variables.

Proposing something that contradicts the GNU Coding Standards is a non-starter.

Bruno

[1] https://www.gnu.org/prep/standards/html_node/Configuration.html




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables
  2025-01-23 13:14 Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables Michał Górny
  2025-01-23 13:50 ` Bruno Haible
@ 2025-01-23 13:53 ` Simon Josefsson
  2025-01-23 14:37   ` Michał Górny
  2025-01-23 14:04 ` Bernhard M. Wiedemann
  2025-01-23 15:19 ` Celeste Liu
  3 siblings, 1 reply; 13+ messages in thread
From: Simon Josefsson @ 2025-01-23 13:53 UTC (permalink / raw)
  To: Michał Górny; +Cc: distributions

[-- Attachment #1: Type: text/plain, Size: 2709 bytes --]

Michał Górny <mgorny@gentoo.org> writes:

> 1) NO_NETWORK -- if it's set to a non-empty value, it requests that
> programs don't access the (TCP/IP) network.

To me this sounds like an obviously good idea, and I would consider
using it in both my upstream and packaging work.

> 2) USE_SYSTEM_DEPS -- if it's set to a non-empty value, it requests that
> the build system does not use any vendored dependency for which it
> supports using a system version instead, and that it links to shared
> libraries whenever possible.
...
> For USE_SYSTEM_DEPS, the primary goal is to build against system
> dependencies.  For example, some upstreams either prefer using vendored
> dependencies or fallback to them when the system dependencies aren't
> found.  However, in Gentoo we really do want stuff to use system
> dependencies -- and if we miss to specify them appropriately, we'd
> rather see an error than an implicit fallback to a vendored dependency.
> So if USE_SYSTEM_DEPS is set, the build system should enable using
> system dependencies whenever supported, and disable all possible
> fallbacks to vendored dependencies.

This sounds really complicated, and speaking as an upstream that would
receive a request like this, my initial reaction would be that an
environment variable like that is a bad idea and that it is better
handled by the packager for each distribution.  Why?  In many of my
upstream packages, I have ./configure checks that inspect system
characteristics and change the behaviour of my code as appropriate.  I
believe this is the correct approach to handling system differences.  I
also believe it is not possible to adhere to a variable like that,
because it assumes there is one single ideal "system" version of
dependencies.  This isn't the case for almost anything.  Even trivial
functions like strverscmp() in low-level libc have had behavioural bugs
in them.  Should USE_SYSTEM_DEPS cause the project to assume that
strverscmp() works correctly or not?  You can repeat this question for
even more trivial matters up to big things where there is no single
right answer at all; consider having to support multiple OpenSSL APIs,
for example.  What OpenSSL version is a "system dependency"?  The only
reasonable thing upstreams can do about this is to detect system
differences and act accordingly.  Distributions that package these
packages usually just use the defaults, but if there is a bug for your
particular system (like a strverscmp() or OpenSSL check returning
incorrect results), you have to patch things depending on what your
environment looks like.  Upstreams don't have the context knowledge you
do to do the right thing here.

/Simon

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 1251 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables
  2025-01-23 13:14 Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables Michał Górny
  2025-01-23 13:50 ` Bruno Haible
  2025-01-23 13:53 ` Simon Josefsson
@ 2025-01-23 14:04 ` Bernhard M. Wiedemann
  2025-01-23 14:43   ` Michał Górny
  2025-01-23 15:19 ` Celeste Liu
  3 siblings, 1 reply; 13+ messages in thread
From: Bernhard M. Wiedemann @ 2025-01-23 14:04 UTC (permalink / raw)
  To: Michał Górny, distributions

On 1/23/25 2:14 PM, Michał Górny wrote:
> Hello,
> 
> As a packager, I often find it necessary to ensure two important aspects
> of the build process:
> 
> 1) that the build process itself doesn't access the Internet,
> in particular that it doesn't download any files on its own,
> 
> 2) that the build process uses system shared libraries and other
> dependencies whenever possible, rather than vendored (or downloaded)
> copies.

For openSUSE, we build in Open-Build-Service (OBS) that runs the build 
in KVM without Internet-access. It is the only way to be sure.
That means that everything that is used during the build must be checked 
into the system, so it is known and versioned.
In practice, many rust and golang packages come with large vendor tarballs.

We try to un-bundle libraries to use our system versions instead, but 
what if there are too many incompatible versions required?

What is the problem with
configure --use-system-libfoo?

Ciao
Bernhard M.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables
  2025-01-23 13:50 ` Bruno Haible
@ 2025-01-23 14:09   ` Eli Schwartz
  2025-01-23 14:26   ` Michał Górny
  1 sibling, 0 replies; 13+ messages in thread
From: Eli Schwartz @ 2025-01-23 14:09 UTC (permalink / raw)
  To: Bruno Haible, distributions, Michał Górny


[-- Attachment #1.1: Type: text/plain, Size: 1889 bytes --]

On 1/23/25 8:50 AM, Bruno Haible wrote:
> Michał Górny wrote:
>> 2) USE_SYSTEM_DEPS -- if it's set to a non-empty value, it requests that
>> the build system does not use any vendored dependency for which it
>> supports using a system version instead, and that it links to shared
>> libraries whenever possible.
> 
> This contradicts the GNU Coding Standards [1]. For GNU packages,
> configuration of such things should be done through --with-* and --without-*
> options. NOT through environment variables.
> 
> Proposing something that contradicts the GNU Coding Standards is a non-starter.
> 
> Bruno
> 
> [1] https://www.gnu.org/prep/standards/html_node/Configuration.html


Meson will not implement this environment variable either, for the same
reason.

We won't implement either one, in fact (for us, network is relevant
since meson has an automatic feature to download vendored dependencies
on demand). We have a configuration option accepted on argv to control both.

...

In general I think we are all quite aware of the reasons for avoiding
the network and in general build systems shouldn't attempt to
communicate via TCP/IP whether an environment variable is set or not...
unless the success of the build system directly hinges on network, and
failing to network directly means failing to build. That is the case for
meson with vendored deps, so configuring to use system deps already
means that no network connections will be made (but if you do use
vendored deps, all network downloads are securely verified via secure
hashes). So it suffices to simply design your build system to default to
system deps (meson already does this).

It's quite unclear to me why this should hinge on an environment
variable of all things. If you need an environment variable to control
this, you already failed somewhere else.


-- 
Eli Schwartz

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables
  2025-01-23 13:50 ` Bruno Haible
  2025-01-23 14:09   ` Eli Schwartz
@ 2025-01-23 14:26   ` Michał Górny
  2025-01-23 17:47     ` Bruno Haible
  1 sibling, 1 reply; 13+ messages in thread
From: Michał Górny @ 2025-01-23 14:26 UTC (permalink / raw)
  To: Bruno Haible, distributions

[-- Attachment #1: Type: text/plain, Size: 2360 bytes --]

On Thu, 2025-01-23 at 14:50 +0100, Bruno Haible wrote:
> Michał Górny wrote:
> > 2) USE_SYSTEM_DEPS -- if it's set to a non-empty value, it requests that
> > the build system does not use any vendored dependency for which it
> > supports using a system version instead, and that it links to shared
> > libraries whenever possible.
> 
> This contradicts the GNU Coding Standards [1]. For GNU packages,
> configuration of such things should be done through --with-* and --without-*
> options. NOT through environment variables.
> 
> Proposing something that contradicts the GNU Coding Standards is a non-starter.
> 

The problem with explicit options is that 1) they are different for
every build system, 2) they require explicit support checks, and 3) they
assume you have an explicit control over the build command-line.

The first problem means that we simply can't standardize on anything
like that.  Even if we agreed on a single name, you'd have
--disable-network and --without-network for autoconf, -Dnetwork=false
for Meson, -DNETWORK=OFF for CMake and probably a dozen more.

The second problem is that we can't simply enable it unconditionally
and be done with it.  In Gentoo, we already do the messy thing of
checking --help output to determine if we can pass stuff like
--disable-shared, and that has already backfired.

The third problem goes much deeper.  I largely come from a Python
background.  The most problematic packages I work with generally involve
something like a PEP 517 backend calling setup.py calling CMake, which
in turn may involve some subprojects.  Having an explicit option means
that I need to ask everyone to add an explicit option at every layer,
and ensure that the option is passed down.  And that's assuming that we
are dealing with layers that actually can accept options, which is not
always true.  We aren't living in a perfect world.
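
To illustrate the layering point: child processes inherit the
environment by default, so a variable set once at the top reaches the
innermost layer even when the layers in between know nothing about it.
A small sketch, with run_layers as a hypothetical helper that stands in
for the backend / setup.py / CMake chain:

```python
import os
import subprocess
import sys

# Innermost layer: the only one that actually looks at NO_NETWORK.
INNER = "import os; print(os.environ.get('NO_NETWORK', ''))"

# Middle layer: knows nothing about NO_NETWORK, just spawns the inner one.
MIDDLE = (
    "import subprocess, sys; "
    f"subprocess.run([sys.executable, '-c', {INNER!r}], check=True)"
)


def run_layers(environ):
    """Run middle -> inner and return what the inner layer observed."""
    result = subprocess.run(
        [sys.executable, "-c", MIDDLE],
        env=environ, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


seen = run_layers(dict(os.environ, NO_NETWORK="1"))
```

An explicit command-line option, by contrast, would have to be accepted
and forwarded by every layer in the chain.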

Just to be clear, I do not mind at all having explicit options to do
this.  I'm just saying that having these options default based
on environment variables would be very helpful, and make things much
easier.  They mean we can just unconditionally set NO_NETWORK=1
and USE_SYSTEM_DEPS=1, and with some luck, it will just work and make
our lives easier, and in the worst case it won't do any harm.
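
A minimal sketch of "explicit options that default based on environment
variables", using Python's argparse.  The option names --network and
--system-deps and the function build_parser are hypothetical; real build
systems would use their own spelling:

```python
import argparse
import os


def build_parser(environ=None):
    """Build a parser whose defaults follow the proposed variables."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="configure-sketch")
    parser.add_argument(
        "--network",
        action=argparse.BooleanOptionalAction,
        # Network access is allowed unless NO_NETWORK is non-empty.
        default=not env.get("NO_NETWORK"),
        help="allow the build to download files",
    )
    parser.add_argument(
        "--system-deps",
        action=argparse.BooleanOptionalAction,
        # System dependencies are forced only if USE_SYSTEM_DEPS is set.
        default=bool(env.get("USE_SYSTEM_DEPS")),
        help="require system dependencies, never vendored fallbacks",
    )
    return parser
```

The environment only changes the default here; an explicit --network or
--no-network on the command line still takes precedence.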

-- 
Best regards,
Michał Górny


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables
  2025-01-23 13:53 ` Simon Josefsson
@ 2025-01-23 14:37   ` Michał Górny
  0 siblings, 0 replies; 13+ messages in thread
From: Michał Górny @ 2025-01-23 14:37 UTC (permalink / raw)
  To: Simon Josefsson; +Cc: distributions

[-- Attachment #1: Type: text/plain, Size: 4141 bytes --]

On Thu, 2025-01-23 at 14:53 +0100, Simon Josefsson wrote:
> Michał Górny <mgorny@gentoo.org> writes:
> 
> > 1) NO_NETWORK -- if it's set to a non-empty value, it requests that
> > programs don't access the (TCP/IP) network.
> 
> To me this sounds like an obviously good idea, and would consider using
> both in my upstream and packaging work.
> 
> > 2) USE_SYSTEM_DEPS -- if it's set to a non-empty value, it requests that
> > the build system does not use any vendored dependency for which it
> > supports using a system version instead, and that it links to shared
> > libraries whenever possible.
> ...
> > For USE_SYSTEM_DEPS, the primary goal is to build against system
> > dependencies.  For example, some upstreams either prefer using vendored
> > dependencies or fallback to them when the system dependencies aren't
> > found.  However, in Gentoo we really do want stuff to use system
> > dependencies -- and if we miss to specify them appropriately, we'd
> > rather see an error than an implicit fallback to a vendored dependency.
> > So if USE_SYSTEM_DEPS is set, the build system should enable using
> > system dependencies whenever supported, and disable all possible
> > fallbacks to vendored dependencies.
> 
> This sounds really complicated, and speaking as an upstream that would
> receive a request like this, my initial reaction would be that an
> environment variable like that is a bad idea and that it is better to
> handle it by the packager for each distribution.  Why?  In many of my
> upstream packages, I have ./configure checks that inspects system
> characteristics and changes behaviour of my code as appropriate.  I
> believe this is the correct approach to handle system differences.  I
> also believe it is not possible to adhear to a variable like that,
> because it assumes there is one single ideal "system" version of
> dependencies.  This isn't the case for almost anything.  Even trivial
> functions like strverscmp() in low-level libc has had behavioural bugs
> in them.  Should a USE_SYSTEM_DEPS cause the project to assume that
> strverscmp() works correctly or not?  You can repeat this question for
> even more trivial matters up to big things where there is no single
> right answer at all, consider having to support multiple OpenSSL APIs
> for example.  What OpenSSL version is a "system dependency"?  The only
> reasonable response upstreams can do for this is to detect system
> differences, and act accordingly.  Distributions who package these
> packages usually just use the defaults, but if there is a bug for your
> particular system (like strverscmp() or OpenSSL check returns
> incorrectly), you have to patch things depending on how your environment
> looks like.  Upstreams doesn't have the context knowledge you do to do
> the right thing here.
> 

Well, I haven't said it's going to be easy.  This will really need to be
decided on a case-by-case basis.

Sure, if we are talking about things such as libc differences,
i.e. things that cannot be trivially fixed by installing a missing
dependency or upgrading one, USE_SYSTEM_DEPS wouldn't be applicable
here.

However, if I understand the OpenSSL case correctly, then it would fall
under USE_SYSTEM_DEPS.  If you determine that my system has a broken
OpenSSL library (or there is some other problem related to the check), 
I want to know it, and explicitly decide how to address it.

If the problem is that our OpenSSL is too old, we probably need to bump
the dependency.  If there is something wrong with the check, we want to
look into fixing it.  All that is useful information, and if the package
chose to instead work around the problem, we may not even notice that
something is wrong.  And this is not just theory -- more than once I've
accidentally noticed bugs that were present in our packages for a long
time, but nobody noticed them because the build system chose to
work around them.

And now imagine a situation where a problem with OpenSSL check results
in using a vendored vulnerable version of OpenSSL.

-- 
Best regards,
Michał Górny


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables
  2025-01-23 14:04 ` Bernhard M. Wiedemann
@ 2025-01-23 14:43   ` Michał Górny
  0 siblings, 0 replies; 13+ messages in thread
From: Michał Górny @ 2025-01-23 14:43 UTC (permalink / raw)
  To: Bernhard M. Wiedemann, distributions

[-- Attachment #1: Type: text/plain, Size: 1717 bytes --]

On Thu, 2025-01-23 at 15:04 +0100, Bernhard M. Wiedemann wrote:
> On 1/23/25 2:14 PM, Michał Górny wrote:
> > Hello,
> > 
> > As a packager, I often find it necessary to ensure two important aspects
> > of the build process:
> > 
> > 1) that the build process itself doesn't access the Internet,
> > in particular that it doesn't download any files on its own,
> > 
> > 2) that the build process uses system shared libraries and other
> > dependencies whenever possible, rather than vendored (or downloaded)
> > copies.
> 
> For openSUSE, we build in Open-Build-Service (OBS) that runs the build 
> in KVM without Internet-access. It is the only way to be sure.

I think all major distributions use some kind of Internet sandboxing
now.  Still, I think the more correct solution is for packages not to
attempt to access the Internet in the first place, rather than for us to
rely on being able to successfully sandbox them.

> We try to un-bundle libraries to use our system versions instead, but 
> what if there are too many incompatible versions required?

As I've said in the other reply, then the decision should be up
to the packager.  Yes, there will be cases when we will decide that
unvendoring is impossible or "not worth the drawbacks".  Still,
disabling vendoring by default and enabling it when necessary is better
than having to disable it every time separately.

> What is the problem with
> configure --use-system-libfoo

I've already explained that in the reply to Bruno Haible [1].

[1] https://lore.kernel.org/distributions/aad2b06f-df80-4751-a667-8fc4c4e10482@suse.de/T/#me20568c9c4267a16b1e096fa12224e980042f77a

-- 
Best regards,
Michał Górny


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables
  2025-01-23 13:14 Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables Michał Górny
                   ` (2 preceding siblings ...)
  2025-01-23 14:04 ` Bernhard M. Wiedemann
@ 2025-01-23 15:19 ` Celeste Liu
  2025-01-23 15:38   ` Michał Górny
  2025-01-23 19:42   ` Simon Josefsson
  3 siblings, 2 replies; 13+ messages in thread
From: Celeste Liu @ 2025-01-23 15:19 UTC (permalink / raw)
  To: mgorny, distributions; +Cc: Celeste Liu

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 7502 bytes --]

On 2025-01-23 21:14, Michał Górny wrote:
> Hello,
> 
> As a packager, I often find it necessary to ensure two important aspects
> of the build process:
> 
> 1) that the build process itself doesn't access the Internet,
> in particular that it doesn't download any files on its own,
> 
> 2) that the build process uses system shared libraries and other
> dependencies whenever possible, rather than vendored (or downloaded)
> copies.
> 
> Unfortunately, these requirements don't always align with what
> the upstream considers best for their own use and what they consider
> the best defaults for their users.  Fortunately, we are often able to
> reach an agreement and get the options to adjust the build system
> behavior.  However, these options are often defined per project
> and they aren't necessarily consistent across different projects.
> 
> Given the recent success of how the NO_COLOR variable [1] became a de-
> facto standard, I was wondering if we could perhaps attempt to
> standardize environment variables for these two aspects as well.
> I suppose many other distribution packagers are facing the same
> problems, so I think this mailing list is a good place to discuss this.
> 
> What I'd like to propose are two environment variables:
> 
> 1) NO_NETWORK -- if it's set to a non-empty value, it requests that
> programs don't access the (TCP/IP) network.

It may be better to name it NO_INTERNET. "Network" is a confusing word in the
Linux world. It can refer to anything from just the Internet to any protocol in
the network subsystem, even AF_NETLINK... I have been asked many times why udev
is broken when a program runs in a netns. systemd also has to point this out in
its documentation of PrivateNetwork. So using the more limited word "Internet"
would avoid this.

> 
> 2) USE_SYSTEM_DEPS -- if it's set to a non-empty value, it requests that
> the build system does not use any vendored dependency for which it
> supports using a system version instead, and that it links to shared
> libraries whenever possible.
> 
> Some examples and thoughts below, followed by rationale.
> 
> 
> For NO_NETWORK, my primary goal is to have build systems not issue
> commands that fetch stuff from the Internet.  For example, if a build
> system supports automatically fetching and building missing
> dependencies, setting NO_NETWORK would imply that it would fail instead.
> Technically, this could also be extended to tools like wget, effectively
> blocking Internet access on multiple layers -- but that's not strictly
> necessary.
> 
> Another use case would be test suites with tests accessing remote
> servers -- the variable could automatically cause them to be skipped. 
> It could be also used e.g. by pytest-socket plugin to automatically cut
> the test suite from the Internet when loaded.
> 
> I think it would also make sense to imply not accessing local network
> services, in particular local system services.  An example of that are
> test suites that connect to the local database daemons rather than
> starting their own isolated copies.
> 
> As for the rationale, my focus would be on security, 
> and reliability.  Package management systems in general implement
> streamlined procedures for fetching resources, including verification of
> authenticity, use of local mirrors, caching and so on.  Fetching
> resources directly bypasses this.  This could expose information about
> what is happening on a particular machine, cause unnecessary server
> load, cause fees due to data plan use, cause failures due to shoddy
> Internet connection -- similarly for tests.  I've written a more
> detailed explanation once in the Gentoo devmanual [1].
> 
> 
> For USE_SYSTEM_DEPS, the primary goal is to build against system
> dependencies.  For example, some upstreams either prefer using vendored
> dependencies or fallback to them when the system dependencies aren't
> found.  However, in Gentoo we really do want stuff to use system
> dependencies -- and if we miss to specify them appropriately, we'd
> rather see an error than an implicit fallback to a vendored dependency.
> So if USE_SYSTEM_DEPS is set, the build system should enable using
> system dependencies whenever supported, and disable all possible
> fallbacks to vendored dependencies.

Some build systems (e.g. Meson) have infrastructure for switching between the
system library and the vendored library. That's good; we only need to expose the
switch via an environment variable.

But many build systems, especially "modern" ones like Go, Cargo (Rust) and NPM
(Node.js), are not good at this:

Cargo users normally use a feature gate to control whether to use the system
library, but the gate name and the gate direction are not standardized: some use
'vendored-xxx' and others use 'system-xxx'. Also, in Cargo we can only control
the behavior of the packages we depend on directly, not of indirect
dependencies. If we want to control the behavior of indirect dependencies, the
only way is to hope that ALL the package authors add a feature gate that passes
the switch down to their dependencies.

In NPM, the situation is even worse. The NPM ecosystem prefers to bundle
everything. They have a --build-from-source in node-gyp, but not all packages
use it, and it only affects the libraries that will be loaded by nodejs. In
fact, many nodejs packages, especially those with some web content, may download
a copy of chromium. The switch for that is not standardized and does not even
exist in some projects.

Golang doesn't have any infrastructure for switching. A library there is like a
union: it either uses the system library or uses the vendored version.

So the first step may be to build a basic standardized way to use system
resources (including linking libraries and using executables) in the ecosystems
of these languages and build systems.

> 
> The wording proposed above tries to account for the special case when
> the package does not support a system dependency version at all -- e.g.
> when they are patching the vendored dependency.  It's not ideal, but I'd
> like to avoid blocking a major improvement just because we can't get it
> perfect.
> 
> The rationale for avoiding vendored dependencies have been repeated many
> times, we have one on Gentoo Wiki [2], Fedora has one as well [3].
> The main focus is ensuring that our security team is able to address
> security issues, but being able to fix bugs and simply avoiding
> duplication is also helpful.
> 
> I've included mention of shared libraries since some upstreams seem to
> prefer static linking for similar reasons that they prefer vendoring. 
> While technically this could be considered a separate issue (and perhaps
> deserving a separate flag), I don't think there really are distributions
> who want unvendoring but not dynamic linking, so it's simpler to have
> a single "distro packager mode" variable.  Again, it assumes that there
> will be cases when dynamic linking isn't possible, particularly when
> there are no shared libraries.
> 
> 
> [1] https://devmanual.gentoo.org/ebuild-writing/functions/src_test/index.html#tests-that-require-network-or-service-access
> [2] https://wiki.gentoo.org/wiki/Why_not_bundle_dependencies
> [3] https://fedoraproject.org/wiki/Bundled_Libraries
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables
  2025-01-23 15:19 ` Celeste Liu
@ 2025-01-23 15:38   ` Michał Górny
  2025-01-23 16:13     ` Celeste Liu
  2025-01-23 19:42   ` Simon Josefsson
  1 sibling, 1 reply; 13+ messages in thread
From: Michał Górny @ 2025-01-23 15:38 UTC (permalink / raw)
  To: Celeste Liu, distributions

[-- Attachment #1: Type: text/plain, Size: 3786 bytes --]

On Thu, 2025-01-23 at 23:19 +0800, Celeste Liu wrote:
> On 2025-01-23 21:14, Michał Górny wrote:
> > 1) NO_NETWORK -- if it's set to a non-empty value, it requests that
> > programs don't access the (TCP/IP) network.
> 
> It may be better to be named NO_INTERNET. Network is a confusing word in Linux 
> world. They can refer to something from only Internet to any protocols in 
> network subsystem, even include AF_NETLINK... I have been asked why my udev is 
> broken when my program is in a netns many times. Systemd also have to inform 
> this point in their document of PrivateNetwork. So use more limited word 
> Internet to avoid this.

To be honest, I've been trying to follow the term I've subjectively
judged to be the most common.  Some projects also use "socket".

On the other hand, as network namespacing shows, this isn't really
limited to Internet — we also want to cut stuff from accessing services
on local network as well.

> > 
> > 2) USE_SYSTEM_DEPS -- if it's set to a non-empty value, it requests that
> > the build system does not use any vendored dependency for which it
> > supports using a system version instead, and that it links to shared
> > libraries whenever possible.
> > […]
> 
> Some build system (e.g. Meson) have infrastructure of switch between system 
> library and vendor library, it's good, we only need to expose the switch via 
> environment variable.
> 
> But in many build system, especially in "modern" build system like Go, Cargo 
> (Rust) and NPM (Node.js), they are not good on this infrastructure:
> 
> Cargo users normally use feature gate to control whether use system library, but 
> the gate name and the gate direction are not standardized, someone use 
> 'vendored-xxx' and some others use 'system-xxx', and in Cargo we can only 
> control the package behavior we faced directly, not indirect dependencies. if we 
> want to control the behavior of dependencies, the only way is hoping ALL the 
> package author make a feature gate to pass this switch to its dependency.

Well, I have some Cargo experience, so I'm going to focus on this.
In my experience, feature-gating is not the only way this is done.  Some
packages (e.g. zstd-sys) use custom environment variables instead. 
Others just default to using a system library, with fallback to vendored
version.
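
As a rough illustration of the "system library first, vendored fallback"
pattern, and of how USE_SYSTEM_DEPS could turn that fallback off, here is a
minimal Python sketch. The function name and its arguments are hypothetical,
and failing hard when no system copy is found is only one possible reading of
the proposal, not something the thread has settled.

```python
import os


def choose_dependency_source(system_available, has_vendored_copy, environ=None):
    """Return "system" or "vendored" for one dependency (hypothetical helper).

    Assumption: a non-empty USE_SYSTEM_DEPS forbids the vendored fallback.
    """
    environ = os.environ if environ is None else environ
    strict = bool(environ.get("USE_SYSTEM_DEPS", ""))

    if system_available:
        # Prefer the system copy whenever it can be found.
        return "system"
    if strict:
        # The packager asked for system dependencies only, so do not
        # silently fall back to the bundled copy.
        raise RuntimeError("USE_SYSTEM_DEPS is set but no system copy was found")
    if has_vendored_copy:
        return "vendored"
    raise RuntimeError("dependency is not available in any form")
```

With the variable unset, a missing system library quietly selects the vendored
copy. With it set, the same situation becomes a build error that the packager
sees immediately.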

> In NPM, the situation is even worse. NPM ecosystem prefer to bundle everything. 
> They have a --build-from-source in node-gyp, but not all package use it and it 
> only affect the library that will be load by nodejs. In fact, many nodejs 
> packages, especially which have some web contents, may download a copy of 
> chromium. The switch for it is not standardized and even not existed in some 
> projects.
> 
> For Golang, it doesn't have any infrastructure for switch. Their library is like 
> an union: either use system library or use vendored version.
> 
> So the first step may be to build a basic standardized way to use system 
> resources (include link library and use some executable files) in these language 
> and build system's ecosystem.

I am thinking of this proposal as a prelude to that.  What I really
would like to achieve here is to set some standard variable names, so we
could work on individual ecosystems and build systems with standards to
back that work.  In other words, I'd like to avoid having every package
come up with their own custom ways of doing this — and I'm worried that
if there's no "standard" behind my effort, different projects will be
more likely to choose their own variable names (say, NO_INTERNET vs.
NO_NETWORK vs. DISABLE_INTERNET…), or go for project-local variable
(FROBNICATE_NO_NETWORK).
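
For illustration only, honoring a shared variable name inside a project could
be as small as the Python sketch below. The "set to a non-empty value" test
follows the wording of the proposal. The helper names, the exception class and
the injected downloader callable are all hypothetical.

```python
import os


class NetworkDisabledError(RuntimeError):
    """Raised when a download is attempted although NO_NETWORK is set."""


def network_allowed(environ=None):
    # Per the proposal, only a non-empty value counts as "set".
    environ = os.environ if environ is None else environ
    return environ.get("NO_NETWORK", "") == ""


def fetch_source(url, downloader, environ=None):
    """Call downloader(url) unless NO_NETWORK forbids it (hypothetical helper)."""
    if not network_allowed(environ):
        raise NetworkDisabledError(f"NO_NETWORK is set, refusing to fetch {url}")
    return downloader(url)
```

Because the check lives in one place, a project-local name such as
FROBNICATE_NO_NETWORK would not be needed for the same effect.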


-- 
Best regards,
Michał Górny


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 512 bytes --]


* Re: Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables
  2025-01-23 15:38   ` Michał Górny
@ 2025-01-23 16:13     ` Celeste Liu
  0 siblings, 0 replies; 13+ messages in thread
From: Celeste Liu @ 2025-01-23 16:13 UTC (permalink / raw)
  To: Michał Górny, distributions

On 2025-01-23 23:38, Michał Górny wrote:
> On Thu, 2025-01-23 at 23:19 +0800, Celeste Liu wrote:
>> On 2025-01-23 21:14, Michał Górny wrote:
>>> 1) NO_NETWORK -- if it's set to a non-empty value, it requests that
>>> programs don't access the (TCP/IP) network.
>>
>> It may be better to be named NO_INTERNET. Network is a confusing word in Linux 
>> world. They can refer to something from only Internet to any protocols in 
>> network subsystem, even include AF_NETLINK... I have been asked why my udev is 
>> broken when my program is in a netns many times. Systemd also have to inform 
>> this point in their document of PrivateNetwork. So use more limited word 
>> Internet to avoid this.
> 
> To be honest, I've been trying to follow the term I've subjectively
> judged to be the most common.  Some projects also use "socket".
> 
> On the other hand, as network namespacing shows, this isn't really
> limited to Internet — we also want to cut stuff from accessing services
> on local network as well.

Yeah. I normally think of isolation from the Internet and isolation from other
parts of the same OS as two separate things, so I get confused.

> 
>>>
>>> 2) USE_SYSTEM_DEPS -- if it's set to a non-empty value, it requests that
>>> the build system does not use any vendored dependency for which it
>>> supports using a system version instead, and that it links to shared
>>> libraries whenever possible.
>>> […]
>>
>> Some build system (e.g. Meson) have infrastructure of switch between system 
>> library and vendor library, it's good, we only need to expose the switch via 
>> environment variable.
>>
>> But in many build system, especially in "modern" build system like Go, Cargo 
>> (Rust) and NPM (Node.js), they are not good on this infrastructure:
>>
>> Cargo users normally use feature gate to control whether use system library, but 
>> the gate name and the gate direction are not standardized, someone use 
>> 'vendored-xxx' and some others use 'system-xxx', and in Cargo we can only 
>> control the package behavior we faced directly, not indirect dependencies. if we 
>> want to control the behavior of dependencies, the only way is hoping ALL the 
>> package author make a feature gate to pass this switch to its dependency.
> 
> Well, I have some Cargo experience, so I'm going to focus on this.
> In my experience, feature-gating is not the only way this is done.  Some
> packages (e.g. zstd-sys) use custom environment variables instead. 
> Others just default to using a system library, with fallback to vendored
> version.
> 
>> In NPM, the situation is even worse. NPM ecosystem prefer to bundle everything. 
>> They have a --build-from-source in node-gyp, but not all package use it and it 
>> only affect the library that will be load by nodejs. In fact, many nodejs 
>> packages, especially which have some web contents, may download a copy of 
>> chromium. The switch for it is not standardized and even not existed in some 
>> projects.
>>
>> For Golang, it doesn't have any infrastructure for switch. Their library is like 
>> an union: either use system library or use vendored version.
>>
>> So the first step may be to build a basic standardized way to use system 
>> resources (include link library and use some executable files) in these language 
>> and build system's ecosystem.
> 
> I am thinking of this proposal as a prelude to that.  What I really
> would like to achieve here is to set some standard variable names, so we
> could work on individual ecosystems and build systems with standards to
> back that work.  In other words, I'd like to avoid having every package
> come up with their own custom ways of doing this — and I'm worried that
> if there's no "standard" behind my effort, different projects will be
> more likely to choose their own variable names (say, NO_INTERNET vs.
> NO_NETWORK vs. DISABLE_INTERNET…), or go for project-local variable
> (FROBNICATE_NO_NETWORK).

I mean that we don't need to ask every project to adopt
NO_NETWORK/USE_SYSTEM_DEPS. We can just work with the build system
developers, help them unify the way system resources and network access are
handled in their build system's ecosystem, and finally add
NO_NETWORK/USE_SYSTEM_DEPS as a way to control that behavior. I.e. I think
it's better to bring the build systems up to Meson's level first, and then
add NO_NETWORK/USE_SYSTEM_DEPS to them. Project developers can just treat it
as a nice new build system feature, without having to think much about the
rationale or the internal implementation. Build system developers do what
they are good at (improving the build tool). Distro maintainers don't need to
communicate with every project's maintainers. Project developers don't need
to implement the dependency handling themselves. Everyone will be happy.


* Re: Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables
  2025-01-23 14:26   ` Michał Górny
@ 2025-01-23 17:47     ` Bruno Haible
  0 siblings, 0 replies; 13+ messages in thread
From: Bruno Haible @ 2025-01-23 17:47 UTC (permalink / raw)
  To: distributions, Michał Górny

Michał Górny wrote:
> The problem with explicit options

The problems with your environment variable proposal are that:

  1) You can't just shove your desired way of doing things into
     build systems that originated from 1995 to 2025, that are file-based
     or network-based (like npm or cargo), that have unit tests in the
     source files or in specific files, etc.

  2) Environment variable values don't appear in the logs, while command
     line invocations typically do. Thus a developer would wonder "why
     does this build behave differently than on my machine?", creating
     LOTS of headaches.
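
The logging concern in point 2 can be made concrete with a small Python
sketch: a build tool that honors the proposed variables could also print their
state at startup, so the values end up in the build log. This is only an
assumption about how a tool might behave. The function name and the message
format are hypothetical.

```python
import os

# The two variable names proposed in this thread.
STANDARD_VARS = ("NO_NETWORK", "USE_SYSTEM_DEPS")


def describe_build_env(environ=None):
    """Return one log line describing the proposed variables."""
    environ = os.environ if environ is None else environ
    parts = []
    for name in STANDARD_VARS:
        value = environ.get(name, "")
        state = f"set to {value!r}" if value else "unset or empty"
        parts.append(f"{name} is {state}")
    return "build environment: " + "; ".join(parts)


if __name__ == "__main__":
    print(describe_build_env())
```

This does not help with tools that read the variables silently, which is the
case the objection is about.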

> is that 1) they are different for every build system
> The first problem means that we simply can't standardize on anything
> like that.  Even if we agreed on a single name, you'd have --disable-
> network and --without-network for autoconf, -Dnetwork=false for Meson,
> -DNETWORK=OFF for CMake and probably a dozen more.

Yeah. Face it. There is no magic wand that will make wide varieties
of build systems work alike.
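
To show what the per-build-system spelling problem looks like from the
packaging side, here is a Python sketch of a wrapper that translates one
environment variable into explicit options. The option spellings are the
hypothetical ones from the quoted text, not options that any particular
package is known to provide, and the function name is made up.

```python
import os

# Hypothetical option spellings, copied from the examples quoted above.
DISABLE_NETWORK_ARGS = {
    "autoconf": ["--disable-network"],
    "meson": ["-Dnetwork=false"],
    "cmake": ["-DNETWORK=OFF"],
}


def extra_configure_args(build_system, environ=None):
    """Return extra configure arguments implied by NO_NETWORK."""
    environ = os.environ if environ is None else environ
    if build_system not in DISABLE_NETWORK_ARGS:
        # Every further build system needs its own table entry.
        raise ValueError(f"no known spelling for build system {build_system!r}")
    if not environ.get("NO_NETWORK"):
        return []
    return list(DISABLE_NETWORK_ARGS[build_system])
```

The table has to grow with every build system, and each entry still assumes
that the package in question actually implements the option.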

> 2) they require explicit support checks,
> The second problem is that we can't simply enable it unconditionally
> and be done with it.  In Gentoo, we already do the messy thing of
> checking --help output to determine if we can pass stuff like --disable-
> shared, and that has already backfired.

Yes. A fact of life as well: Not all packages can be configured in the
same way. And yes, although the './configure --help' of some package
may show an option, occasionally such an option does not work.

If you want a distro with very little per-package configuration, look at
T2SDE.

> and 3) they
> assume you have an explicit control over the build command-line.
> 
> The third problem goes much deeper.  I largely come from a Python
> background.  The most problematic packages I work with generally involve
> something like a PEP517 backend calling setup.py calling CMake, which
> in turn may involve some subprojects.  Having an explicit option means
> that I need to ask everyone to add an explicit option at every layer,
> and ensure that the option is passed down.

Yes: If someone has added layers over a package's build system ('ebuild'
or whatever), that layer ought to make sure it doesn't prohibit configuration
possibilities on the way. That layer is not proprietary; so, you ought to
be able to fix that.
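
The layering point can be illustrated with a short Python sketch. An
environment variable set at the top is inherited by default by each nested
process, whereas a command-line option has to be forwarded explicitly at every
layer. The two inline scripts stand in for the intermediate layers and are
purely hypothetical.

```python
import os
import subprocess
import sys

# Innermost layer: reports what it sees in its environment.
INNER = "import os; print(os.environ.get('NO_NETWORK', ''))"

# Middle layer: starts the inner layer without passing anything explicitly.
MIDDLE = (
    "import subprocess, sys; "
    f"subprocess.run([sys.executable, '-c', {INNER!r}], check=True)"
)


def innermost_sees(value):
    """Set NO_NETWORK at the top and return what the innermost layer reads."""
    env = dict(os.environ, NO_NETWORK=value)
    result = subprocess.run(
        [sys.executable, "-c", MIDDLE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(innermost_sees("1"))
```

Inheritance is only the default. A layer that starts its children with a
scrubbed environment would still need to be fixed, just as a layer that drops
command-line options would.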

Bruno





* Re: Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables
  2025-01-23 15:19 ` Celeste Liu
  2025-01-23 15:38   ` Michał Górny
@ 2025-01-23 19:42   ` Simon Josefsson
  1 sibling, 0 replies; 13+ messages in thread
From: Simon Josefsson @ 2025-01-23 19:42 UTC (permalink / raw)
  To: Celeste Liu; +Cc: mgorny, distributions

[-- Attachment #1: Type: text/plain, Size: 1606 bytes --]

Celeste Liu <uwu@coelacanthus.name> writes:

>> 1) NO_NETWORK -- if it's set to a non-empty value, it requests that
>> programs don't access the (TCP/IP) network.
> 
> It may be better to be named NO_INTERNET. Network is a confusing word in Linux 
> world. They can refer to something from only Internet to any protocols in 
> network subsystem, even include AF_NETLINK... I have been asked why my udev is 
> broken when my program is in a netns many times. Systemd also have to inform 
> this point in their document of PrivateNetwork. So use more limited word 
> Internet to avoid this.

You are right.  While my initial reaction to NO_NETWORK was positive, I
have realized that there are subtle issues that are really hard to
resolve.  I considered adding support for NO_NETWORK to GNU InetUtils
but I am beginning to feel that even NO_NETWORK has the same critical
concerns that USE_SYSTEM_DEPS has: What exactly should NO_NETWORK mean
to a package?  What can it assume and what MUST it fail on?

- Can it use localhost IP connectivity?

- TCP?

- Multicast?

- Can it rely on non-DNS /etc/hosts name resolution working?

- Can it rely on /etc/services being usable to look up network service
  names?

- Is it allowed to inspect the routing table on the system?

etc

Disabling all of that functionality when NO_NETWORK is set is probably not
what was intended, right?

When doing self-checks for 'ftp' and 'ftpd' it makes sense to start your
newly built ftpd and test interoperability against your newly built ftp
binary.  I don't think the intention is to forbid this.
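
To make the open questions concrete, here is a Python sketch of just one
possible policy: literal loopback addresses stay allowed, so a self-check such
as ftp against ftpd on localhost still works, and everything else is refused.
This is an assumption for the sake of discussion, not something the proposal
specifies. The function name is hypothetical.

```python
import ipaddress


def allowed_under_no_network(host):
    """One possible reading of NO_NETWORK: loopback only (hypothetical)."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address. Name resolution is one of the open
        # questions above, so this sketch conservatively refuses it.
        return False
    return addr.is_loopback
```

Under this reading multicast and LAN addresses are refused along with
Internet hosts, while questions like /etc/hosts lookups and routing table
inspection remain undecided.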

/Simon

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 1251 bytes --]


end of thread

Thread overview: 13+ messages
2025-01-23 13:14 Standardizing NO_NETWORK and USE_SYSTEM_DEPS environment variables Michał Górny
2025-01-23 13:50 ` Bruno Haible
2025-01-23 14:09   ` Eli Schwartz
2025-01-23 14:26   ` Michał Górny
2025-01-23 17:47     ` Bruno Haible
2025-01-23 13:53 ` Simon Josefsson
2025-01-23 14:37   ` Michał Górny
2025-01-23 14:04 ` Bernhard M. Wiedemann
2025-01-23 14:43   ` Michał Górny
2025-01-23 15:19 ` Celeste Liu
2025-01-23 15:38   ` Michał Górny
2025-01-23 16:13     ` Celeste Liu
2025-01-23 19:42   ` Simon Josefsson
