* Representing Debian Metadata in Git
@ 2024-08-20 7:10 Simon Richter
2024-08-21 21:37 ` Chris Hofstaedtler
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Simon Richter @ 2024-08-20 7:10 UTC (permalink / raw)
To: git, debian-devel
Hi,
there's a bit of a discussion within Debian on collaborating using Git.
One of the long-standing issues is that there are multiple ways Debian
packaging can be represented in a git tree, and none of them are optimal.
The problem at hand is that the packaging workflow consists of
1. importing an upstream release
2. optionally stripping out undistributable parts
3. adding packaging metadata
4. optionally adding a patch stack
The workflow for upgrading a package is
1. import a new upstream release
2. apply and possibly modify the exclusion list
3. apply the packaging metadata, updating it in the process
4. rebase the patch stack
Right now, git is used mainly as a network file system, and only tagged
releases are expected to be consistent enough to compile, because often
going from one consistent state to another as an atomic operation would
require multiple changes to be applied in the same commit.
The imported archive is represented either directly as a tree (which may
be imported from the upstream project if no files are undistributable
for Debian), or via a mechanism that can reproduce a compressed archive
that is bitwise identical to the upstream release, from a tree and some
additional patch data.
The patch stack is stored as a set of patches inside a directory, and
rebased using quilt.
An alternate representation stores the patch stack as a branch that is
rebased using git, and then exported to single files.
The Debian changelog is stored as a file inside Git, but some automation
exists to update this from Git commit messages.
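As a minimal sketch of what such automation does (not any existing tool's implementation), the following renders one changelog stanza from a list of commit subject lines. The function name `changelog_entry`, the fixed `urgency=medium`, and the sample package data are assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def changelog_entry(package, version, distribution, subjects, maintainer, when):
    """Render one Debian changelog stanza from commit subject lines.

    Hypothetical helper: real tooling also handles urgency, bug closures
    and grouping by author, all of which are left out here.
    """
    lines = [f"{package} ({version}) {distribution}; urgency=medium", ""]
    # Each commit subject becomes one changelog bullet.
    lines += [f"  * {subject}" for subject in subjects]
    # The trailer line uses an RFC 2822 style date, two spaces after the name.
    lines += ["", f" -- {maintainer}  {format_datetime(when)}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(changelog_entry(
        "hello", "2.10-4", "unstable",
        ["Fix FTBFS with newer compiler", "Refresh patches"],
        "Jane Doe <jane@example.org>",
        datetime(2024, 8, 20, 7, 10, tzinfo=timezone.utc),
    ))
```

The sketch shows why the changelog is really derived data: everything in the stanza except the version and distribution can be computed from the commit history.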
Debian changelog entries refer to bugs in the Debian Bug Tracking
system. There is a desire to also incorporate forges (currently, GitLab)
and refer to the forges' issue tracker from commit messages (where the
issue tracker is used for team collaboration, while the Debian BTS is
used for user-visible bugs).
All of this is very silly: we're essentially storing metadata as data
because we cannot express in Git what we're actually doing, and people's
conflicting priorities have led to conflicting solutions.
I'd like to xkcd 927 this now, and find a common mapping.
From a requirements perspective, I'd like to be able to
- express patches as commits:
- allow cherry-picking upstream commits as Debian patches
- allow cherry-picking Debian patches for upstream submission
- generate the Debian changelog from changes committed to Git
- express filter steps for generating the upstream archive(s) from a
tree‑ish and some metadata
- store upstream signatures inside Git
- keep a history of patches, including patches applied to previously
released packages
A possible implementation would be a type of Git "user extension" object
that contains
- an extension name
- an object type (interpreted by the extension)
- type-tagged references to other objects
- other type-tagged data
Validity of the object would be determined by the extension, and git
would treat this object as mostly opaque (i.e. whenever one is
encountered, the extension needs to be called). The only exception would
be references, because we need to be able to transfer these objects and
all their dependencies efficiently (so the extension would generate a
list of references that should be recursively packed or omitted).
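To make the proposed object shape concrete, here is a rough in-memory model. It is purely illustrative: Git has no such object type today, and the names `Ref`, `ExtensionObject` and `referenced_oids` are hypothetical. Only the four listed fields and the rule that references stay visible to Git come from the proposal above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ref:
    kind: str  # type tag chosen by the extension, e.g. "tree" or "patch-stack"
    oid: str   # hex object id of the referenced object


@dataclass
class ExtensionObject:
    extension: str                            # extension name, e.g. "debian"
    obj_type: str                             # interpreted by the extension, e.g. "upload"
    refs: list = field(default_factory=list)  # type-tagged references to other objects
    data: dict = field(default_factory=dict)  # other type-tagged data, opaque to Git

    def referenced_oids(self):
        """Object ids a packer would need to follow for transfer.

        This is the only part of the object that would not be opaque to Git.
        """
        return [ref.oid for ref in self.refs]


if __name__ == "__main__":
    upload = ExtensionObject(
        extension="debian",
        obj_type="upload",
        refs=[Ref("tree", "a" * 40), Ref("parent", "b" * 40)],
        data={"version": "2.10-4"},
    )
    print(upload.referenced_oids())
```

Keeping the references in a separate, uniformly typed list is what would let Git pack and fetch these objects without calling the extension for reachability.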
On top of that, we could represent a Debian package through special
objects, such as
- debian::debian-dir (a tree-like object referenced from the root
tree, contains a tree for plain files plus links to special objects for
generated items, such as patch stacks)
- debian::upstream-archive (a tree-like object that marks the boundary
between objects imported from upstream, and objects that are part of
packaging, and gives instructions for regenerating the upstream archives
without storing them as blobs)
- debian::update-upstream (a commit-like object to move to a new
upstream-archive object, this contains the upstream version number that
the following upload object must use)
- debian::changelog-entry (a commit-like object that adds an item to
the Debian changelog)
- debian::upload (a commit-like object that adds a version to the
Debian changelog)
- debian::rebase-patches (a commit-like object that links the patch
stacks before and after a rebase)
- ...
Changes to packaging would still be represented as commit objects
containing a tree, but that tree would contain a special entry for the
"debian" subdirectory that points to the last packaging change.
This is very high-level so far, because I'd like to get some feedback
first on whether it makes sense to pursue this further. This would use
up the last unused three-bit object type in Git, so it will have to be
very generic on this side to not block future development -- and it
would require a lot of design effort on the Debian side as well to
hammer out the details.
Any feelings/objections/missed requirements?
Simon
* Re: Representing Debian Metadata in Git
From: Chris Hofstaedtler @ 2024-08-21 21:37 UTC (permalink / raw)
To: Simon Richter; +Cc: git, debian-devel
Hi Simon,
* Simon Richter <sjr@debian.org> [240820 09:11]:
> One of the long-standing issues is that there are multiple ways Debian
> packaging can be represented in a git tree, and none of them are optimal.
[..]
> A possible implementation would be a type of Git "user extension" object
> that contains
>
> - an extension name
> - an object type (interpreted by the extension)
> - type-tagged references to other objects
> - other type-tagged data
[..]
> Any feelings/objections/missed requirements?
In the current DEP14/DEP18 discussions a lot of discussion was had
about how we should represent Debian things in git; your mail also
goes into this direction.
My *feeling* is we should do the opposite - that is, represent less
Debian stuff in git, and especially do it in less Debian-specific
ways. IOW, no git extensions, no setup with multiple branches that
contain more or less unrelated things, etc.
I think we should move more towards a setup that is easily
understood by people not closely following our Debian-specific
things. We should avoid surprising things, again that would include
the multiple branches and any git extensions.
Before pushing for new ways of representing Debian stuff in git, I
think it would be a good idea to learn from all the other distros
and distro-like systems successfully using git [1]. Debian is not
the only distro that wants to use git to capture changes and
encourage contributions to its packages.
Chris
[1] alpine, homebrew, freebsd ports come to mind immediately. nixos
and others too.
* RE: Representing Debian Metadata in Git
From: rsbecker @ 2024-08-21 23:11 UTC (permalink / raw)
To: 'Chris Hofstaedtler', 'Simon Richter'; +Cc: git, debian-devel
On Wednesday, August 21, 2024 5:38 PM, Chris Hofstaedtler wrote:
>* Simon Richter <sjr@debian.org> [240820 09:11]:
>> One of the long-standing issues is that there are multiple ways Debian
>> packaging can be represented in a git tree, and none of them are optimal.
>[..]
>> A possible implementation would be a type of Git "user extension"
>> object that contains
>>
>> - an extension name
>> - an object type (interpreted by the extension)
>> - type-tagged references to other objects
>> - other type-tagged data
>[..]
>
>> Any feelings/objections/missed requirements?
>
>In the current DEP14/DEP18 discussions a lot of discussion was had about how we
>should represent Debian things in git; your mail also goes into this direction.
>
>My *feeling* is we should do the opposite - that is, represent less Debian stuff in git,
>and especially do it in less Debian-specific ways. IOW, no git extensions, no setup
>with multiple branches that contain more or less unrelated things, etc.
>
>I think we should move more towards a setup that is easily understood by people
>not closely following our Debian-specific things. We should avoid surprising things,
>again that would include the multiple branches and any git extensions.
>
>Before pushing for new ways of representing Debian stuff in git, I think it would be a
>good idea to learn from all the other distros and distro-like systems successfully
>using git [1]. Debian is not the only distro that wants to use git to capture changes
>and encourage contributions to its packages.
On the other side (perhaps), git is increasingly being used in the Ops setting for
DevOps and DevSecOps. Production configurations for high-value applications are
moving to storing those configurations into git for tracing and audit. Git is an
enabler for good production operations practices. My $0.02 (and my customers')
--Randall
* Re: Representing Debian Metadata in Git
From: Chris Hofstaedtler @ 2024-08-21 23:51 UTC (permalink / raw)
To: rsbecker; +Cc: 'Simon Richter', git, debian-devel
* rsbecker@nexbridge.com <rsbecker@nexbridge.com> [240822 01:21]:
> >> Any feelings/objections/missed requirements?
> >
> >In the current DEP14/DEP18 discussions a lot of discussion was had about how we
> >should represent Debian things in git; your mail also goes into this direction.
> >
> >My *feeling* is we should do the opposite - that is, represent less Debian stuff in git,
> >and especially do it in less Debian-specific ways. IOW, no git extensions, no setup
> >with multiple branches that contain more or less unrelated things, etc.
>[..]
> On the other side (perhaps), git is increasingly being used in the Ops setting for
> DevOps and DevSecOps. Production configurations for high-value applications are
> moving to storing those configurations into git for tracing and audit. Git is an
> enabler for good production operations practices.
Don't get me wrong. Yes, we should use git to do what git is good
for (tracking changes, etc).
We should not invent new ways of using git that no one else uses.
I'd like to reduce the delta of "how Debian uses git" to "how
everyone else uses git" to, hopefully, zero.
Chris
* Re: Representing Debian Metadata in Git
From: Blair Noctis @ 2024-08-22 19:27 UTC (permalink / raw)
To: Simon Richter; +Cc: git, debian-devel
On 2024-08-20 15:10, Simon Richter wrote:
(...)
> Right now, git is used mainly as a network file system, and only tagged
> releases are expected to be consistent enough to compile, because often
> going from one consistent state to another as an atomic operation would
> require multiple changes to be applied in the same commit.
>
> The imported archive is represented either directly as a tree (which may
> be imported from the upstream project if no files are undistributable
> for Debian), or via a mechanism that can reproduce a compressed archive
> that is bitwise identical to the upstream release, from a tree and some
> additional patch data.
>
> The patch stack is stored as a set of patches inside a directory, and
> rebased using quilt.
>
> An alternate representation stores the patch stack as a branch that is
> rebased using git, and then exported to single files.
>
> The Debian changelog is stored as a file inside Git, but some automation
> exists to update this from Git commit messages.
>
> Debian changelog entries refer to bugs in the Debian Bug Tracking
> system. There is a desire to also incorporate forges (currently, GitLab)
> and refer to the forges' issue tracker from commit messages (where the
> issue tracker is used for team collaboration, while the Debian BTS is
> used for user-visible bugs).
>
> All of this is very silly: we're essentially storing metadata as data
> because we cannot express in Git what we're actually doing, and people's
> conflicting priorities have led to conflicting solutions.
>
> I'd like to xkcd 927 this now, and find a common mapping.
Here's my very likely very naive 2 cents: we are basically maintaining a
fork for each non-native package.
Being a fork, a "Debianized" package can also live like other "upstream"
forks: keep its own branch based on the original, make the necessary changes
and record them as commits; merge the original into its own branch, dealing
with conflicts; maintain its own changelog; rinse and repeat.
Debian-specific metadata can be represented structurally in commit
messages, or if necessary, (still) in a plain debian/ subdirectory that
won't conflict with upstream.
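A minimal sketch of the commit-message approach, assuming metadata is written as `Key: value` trailers in the last paragraph of the message. The trailer names `Closes` and `Forwarded` are only examples, and `parse_trailers` is a hypothetical helper that is much simpler than Git's own trailer parsing.

```python
import re

# A trailer line looks like "Key: value"; keys are restricted to letters,
# digits and dashes in this sketch.
TRAILER = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.+)$")


def parse_trailers(message):
    """Return {key: [values]} from the final paragraph of a commit message.

    Returns an empty dict when the message has no body or when the last
    paragraph contains anything that is not a trailer line.
    """
    paragraphs = message.strip().split("\n\n")
    if len(paragraphs) < 2:
        return {}
    trailers = {}
    for line in paragraphs[-1].splitlines():
        match = TRAILER.match(line)
        if not match:
            return {}
        trailers.setdefault(match.group(1), []).append(match.group(2))
    return trailers


if __name__ == "__main__":
    msg = (
        "Fix crash on startup\n\n"
        "The config parser dereferenced a NULL pointer.\n\n"
        "Closes: #123456\n"
        "Forwarded: not-needed\n"
    )
    print(parse_trailers(msg))
```

With metadata in this form, a changelog generator or bug-closing hook only has to read the log, and nothing outside plain commits is needed.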
Then,
> From a requirements perspective, I'd like to be able to
>
> - express patches as commits:
> - allow cherry-picking upstream commits as Debian patches
> - allow cherry-picking Debian patches for upstream submission
> - generate the Debian changelog from changes committed to Git
> - express filter steps for generating the upstream archive(s) from a
> tree‑ish and some metadata
> - store upstream signatures inside Git
> - keep a history of patches, including patches applied to previously
> released packages
these are naturally met; and
(...)
> Changes to packaging would still be represented as commit objects
> containing a tree, but that tree would contain a special entry for the
> "debian" subdirectory that points to the last packaging change.
no longer needed.
> This is very high-level so far, because I'd like to get some feedback
> first on whether it makes sense to pursue this further. This would use
> up the last unused three-bit object type in Git, so it will have to be
> very generic on this side to not block future development -- and it
> would require a lot of design effort on the Debian side as well to
> hammer out the details.
Even less thought out, but probably easier to implement once the design
is finished. ;)
--
Sdrager,
Blair Noctis
* Re: Representing Debian Metadata in Git
From: Sean Whitton @ 2024-08-23 2:20 UTC (permalink / raw)
To: Simon Richter; +Cc: git, debian-devel
Hello,
I think that more than you realise of this already exists :)
On Tue 20 Aug 2024 at 04:10pm +09, Simon Richter wrote:
> From a requirements perspective, I'd like to be able to
>
> - express patches as commits:
> - allow cherry-picking upstream commits as Debian patches
> - allow cherry-picking Debian patches for upstream submission
git-debrebase and git-dpm already achieve this.
> - express filter steps for generating the upstream archive(s) from a tree‑ish
> and some metadata
Files-Excluded in d/copyright is for this.
I guess that you disprefer that because it's part of the tree, though.
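As a rough sketch of such a filter step, the following drops paths from a file list using glob patterns like those listed in that field. The helper name `filter_tree` and the sample paths are assumptions, and the matching only approximates what the real tooling does.

```python
from fnmatch import fnmatchcase


def is_excluded(path, patterns):
    """True if the path matches a pattern or lies below an excluded directory."""
    for pattern in patterns:
        if fnmatchcase(path, pattern):
            return True
        # Treat a pattern naming a directory as covering everything beneath it.
        if path.startswith(pattern.rstrip("/") + "/"):
            return True
    return False


def filter_tree(paths, patterns):
    """Return the paths that survive the exclusion list, in original order."""
    return [path for path in paths if not is_excluded(path, patterns)]


if __name__ == "__main__":
    upstream = ["src/main.c", "doc/rfc1234.txt", "vendor/lib/a.js", "README"]
    print(filter_tree(upstream, ["doc/rfc*.txt", "vendor"]))
```

Because the result depends only on the upstream tree and the pattern list, the filtered archive can be regenerated on demand instead of being stored.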
> - store upstream signatures inside Git
Well, there's signatures on their tags.
> - keep a history of patches, including patches applied to previously released
> packages
This is already there with git-debrebase and git-dpm, though it is a bit
fiddly to dig it out.
--
Sean Whitton