* Notes from the Git Contributor's Summit, 2023
@ 2023-10-02 15:15 Taylor Blau
  2023-10-02 15:17 ` [TOPIC 0/12] Welcome / Conservancy Update Taylor Blau
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: Taylor Blau @ 2023-10-02 15:15 UTC (permalink / raw)
  To: git

It was great to see folks virtually last week at the Contributor's
Summit!

I took a couple of days off at the end of last week, but polished up the
notes we took during the Contributor's Summit to share with the list.

The notes are available (as read-only) in Google Docs, too, for folks
who prefer to view them there, at the following link:

    https://docs.google.com/document/d/1GKoYtVhpdr_N2BAonYsxVTpPToP1CgCS9um0K7Gx9gQ

At the Contributor's Summit, we discussed the following topics:

  - Welcome / Conservancy Update (Taylor Blau)
  - Next-gen reference backends (Patrick Steinhardt)
  - Libification Goals and Progress (Emily Shaffer)
  - Designing a Makefile for multiple libraries (Calvin Wan)
  - Scaling Git from a forge's perspective (Taylor Blau)
  - Replacing git LFS using multiple promisor remotes (Christian Couder)
  - Clarifying backwards compatibility and when we break it (Emily Shaffer)
  - Authentication to new hosts without setup (M Hickford)
  - Update on jj, including at Google (Martin von Zweigbergk)
  - Code churn and cleanups (Calvin Wan)
  - Project management practices (Emily Shaffer)
  - Improving new contrib onboarding (Jonathan Nieder)

The list of all topics proposed (and the number of votes they received)
is here:

    https://docs.google.com/spreadsheets/d/1EnhmTeEqRBlEI2pMAO3oZ4rO1xEwBzYp2vS4CMtvge8

I'll send the broken-out notes for each topic in a response to this
message for posterity, and so folks can continue the discussion on the
list.

Like last year, if you have any feedback on how the Contributor's Summit
went (especially as it relates to the virtual format we had this year),
please feel free to share it with me here, or off-list.

I hope to see everybody in person next year!

Thanks,
Taylor


* [TOPIC 0/12] Welcome / Conservancy Update
  2023-10-02 15:15 Notes from the Git Contributor's Summit, 2023 Taylor Blau
@ 2023-10-02 15:17 ` Taylor Blau
  2023-10-02 15:17 ` [TOPIC 1/12] Next-gen reference backends Taylor Blau
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2023-10-02 15:17 UTC (permalink / raw)
  To: git

(Presenter: Taylor Blau, Notetaker: Keanen Wold)

* Software Freedom Conservancy status report
* We have about $89k in the Git project account (up ~$20k from last year)
   * Biggest expense is Heroku - FusionAuth has been covering the bill
      * There's on and off work on porting from a Rails app to a static site:
        https://github.com/git/git-scm.com/issues/942
   * Dan Moore from FusionAuth has been providing donations
   * Ideally we are able to move away from using Heroku, but in the meantime
     we'll have coverage either from (a) FusionAuth, or (b) Heroku's new
     open-source credit system
* We have more money than we have plans for; we're looking for ideas on how
  to spend it, such as funding people to visit our conferences and
  sponsoring students to learn more about Git
* Trademark considerations for people using "Git" in their product names
   * We do have general counsel and are trying to think more about what the Git
     trademark means
   * Question - are there other Conservancy projects that have trademark
     issues?
      * They hold all trademarks for their projects
      * Git has had the most problems with people/products using Git in their
        name
      * They reach out with letters, etc. and have not had to take legal action
        in most cases
   * Question - how do we enforce the rules when we have GitHub and GitLab?
      * The trademark has exemptions for Hub and Lab
      * We need to hold the line for the trademark for new companies, etc. using
        the name otherwise we lose our leverage to protect the name
   * Question - have the trademark 'offenses' been growing?
      * It's been pretty stable
      * We're looking to be fair
   * Additional questions can be sent to Pono


* [TOPIC 1/12] Next-gen reference backends
  2023-10-02 15:15 Notes from the Git Contributor's Summit, 2023 Taylor Blau
  2023-10-02 15:17 ` [TOPIC 0/12] Welcome / Conservancy Update Taylor Blau
@ 2023-10-02 15:17 ` Taylor Blau
  2023-10-02 15:18 ` [TOPIC 02/12] Libification Goals and Progress Taylor Blau
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2023-10-02 15:17 UTC (permalink / raw)
  To: git

(Presenter: Patrick Steinhardt, Notetaker: Karthik Nayak)

* Summary: There have been multiple proposals for reference backends on the
  mailing list. Trying to converge to one solution.
* Problem: At GitLab we have certain repos with large numbers of references.
  Some repos have multi-million refs, which causes scalability issues.
   * The current files backend uses a combination of loose files and
     packed-refs (see the illustration at the end of these notes).
   * Deletion performance is bad.
   * Reference lookups are slow.
   * Storage space is also large.
   * There are some patches which improved the situation. e.g. skip-list for
     packed-refs by Taylor.
   * Atomic updates are currently not possible.
   * This issue is not unique to GitLab.
* Two solutions proposed:
   * Reftables: Originally implemented in JGit (Shawn Pearce, 2017)
      * Google was storing the data in a table with one ref per row. This data
        was encrypted, which changes the ordering.
      * This led to the realization that the ref storage itself was not
        optimal, so, based on existing solutions at Google, Shawn made a
        proposal that was implemented in JGit.
      * This solved the ref storage problem at Google.
      * Adoption of the JGit implementation was low because of the
        compatibility requirement with CGit.
      * A new patch series was submitted that swaps out packed-refs for
        reftables while keeping the existing file-based loose refs.
   * Incremental take on reference backend (aka. packed-refs v2) by Derrick
      * Uses pre-existing infrastructure in the git project. Makes it a more
        natural extension.
      * First part was to support a multi-backend structure
      * Second part was packed references v2 in the Git project
* Question: How do we take it forward from here?
   * Emily: If the existing backend existed as a library, it might be easier
     to replace and experiment with.
      * Jeff: A lot of work in that direction has already been landed. But there
        is still some bleed of the implementation in other parts of the code.
        Might be messy to clean up.
      * Patrick: Different implementations by different hosting providers with
        different requirements might cause issues for clients.
   * Deletion performance is not the only issue faced (at GitLab); there are
     also deadlocks around this.
   * brian: If you have a large number of remote tracking refs you face the same
     perf issues.
   * Patrick: Any preference for which solution to go forward with? GitLab is
     interested in picking this up, mostly going forward with reftables.
   * Reftables does support tombstoning, which should solve the problem with
     multiple deletions.
      * There is still a problem with refs being a prefix of other refs.
   * Is there a world where loose refs are removed completely and replaced
     with reftables?
      * Debugging is much easier with loose refs; reftables is a binary
        format. Might need additional tooling here. This has already been
        proven to work at Google.
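
For context, a minimal illustration (added to these notes; paths and object
IDs are made up) of the "files" backend discussed above. Loose refs are one
file per ref, while packed-refs is a single sorted text file that must be
rewritten wholesale to delete a packed ref:

    $ find .git/refs -type f            # loose refs: one file per ref
    .git/refs/heads/main
    .git/refs/heads/old-topic

    $ head -n 2 .git/packed-refs        # packed refs: one sorted text file
    # pack-refs with: peeled fully-peeled sorted
    3f786850e387550fdab836ed7e6dc881de23001b refs/heads/main

    # Deleting a packed ref rewrites all of .git/packed-refs under a lock,
    # which is why mass deletions scale poorly on multi-million-ref repos:
    $ git update-ref -d refs/heads/old-topic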


* [TOPIC 02/12] Libification Goals and Progress
  2023-10-02 15:15 Notes from the Git Contributor's Summit, 2023 Taylor Blau
  2023-10-02 15:17 ` [TOPIC 0/12] Welcome / Conservancy Update Taylor Blau
  2023-10-02 15:17 ` [TOPIC 1/12] Next-gen reference backends Taylor Blau
@ 2023-10-02 15:18 ` Taylor Blau
  2023-10-02 15:18 ` [TOPIC 3/12] Designing a Makefile for multiple libraries Taylor Blau
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2023-10-02 15:18 UTC (permalink / raw)
  To: git

(Presenter: Emily Shaffer, Notetaker: Taylor Blau)

* The effort is to isolate some parts of Git into smaller, independently
  buildable libraries. Can unit test it, swap out implementations, etc.
* Calvin Wan has been working on extracting a common set of interfaces, refining
  the types, etc. This is in pursuit of a "standard library" implementation for
  Git. Close to being shippable.
* Josh Steadmon spent some time in the second half of the year suggesting a unit
  testing framework in order to test the library interfaces beyond our standard
  shell tests.
* Goals:
   * Google has a couple of ways to proceed with their libification effort.
     Community input is solicited:
      * Interfaces for VFS / callable by IDE integration to avoid shelling out
      * Target libification for the sake of Git itself. Code clean-up, making
        the code more predictable / testable. Example being submodules, which
        are messy and difficult to reason about. References backend, etc.
* Is there an appetite for libification? Is there some particular component
  that would especially benefit from clean-up, being made more testable,
  hot-swappable, etc.?
* (From Emily's comment above) If others are implementing the basic references
  backend via a different implementation, how do we make sure that we are
  building compatible parts? Goal would be to have Git's unit tests pass against
  a different API.
* (Patrick Steinhardt) For reference backends especially: would like to be able
  to split between "policy" and "mechanism". This would avoid the issue
  discussed in the last session where different e.g. refs backend
  implementations have different behavior.
   * Emily: white-box tests for the API to make sure that different
     implementations meet the policy
* (Jonathan Nieder) For reference backends in particular, the current
  implementation has an odd "layering" scheme - packed-refs today is an
  incomplete backend using the same interface as the complete "loose and packed
  refs" backend, serves as a mechanism without fulfilling the policy
  requirements. The approach above seems like a positive change.
* (Emily) Are also looking into a similar project around the object store, but
  have found that it is deeply intertwined throughout the rest of the code base.
  Difficult to reason about, even without a library interface. Can we make any
  given change safely?
   * Hunch is that it is still useful to target that sort of thing, even if
     there aren't clear boundaries.
   * In the interim, can still be part of the same compilation unit, just
     creating a clearer boundary.
* (Emily) For hosting providers and others building things on top of git, are
  there parts of git functionality that you'd like to have libified so you can
  get benefits without having to wait for feature lag?
* (brian) not interested in using Git as a library in GitHub's codebase because
  of license incompatibility. Would like to experiment with different algorithms
  for packing and delta-fication in Rust as a highly parallel system. Would be
  nice to be able to swap out something that is C-compatible. Has made changes
  to libgit2 that caused it to segfault, and doesn't want to write more
  segfaults.
* (Taylor) There's an effort going on in GitHub to reduce our dependency on
  libgit2, precisely for the feature lag reason Emily mentions. I don't think
  we're planning on using it as a library soon, but we rely on the Git
  command-line interface through fork/exec
* (Emily) Is licensing the only obstacle to using Git as a library, or are there
  other practical concerns?
* (Jeff Hostetler) Pulled libgit2-sharp out of Visual Studio. Issues with
  crashing, but also running into classical issues with large repositories.
  Memory consumption was a real issue at the time. Safer to have memory
  segmented across multiple processes so that processes can't touch other
  processes' memory space.
* (Emily) Interesting: thinks that performance overhead would outweigh the
  memory issues.
* (Patrick) To reiterate from GitLab's point of view: we are in the same boat as
  Microsoft and GitHub. Have used libgit2 extensively in the past, but was able
  to drop support last month. No plans to use Git as a library in the future.
  Having a process boundary is useful, avoids memory leaks, bugs in Git spilling
  out to GitLab. Still have an "upstream-first" policy. Benefits everybody by
  spreading the maintenance burden and ensuring that others can benefit from
  such functionality.
* (Emily) If we had the capacity to write portions of Git's code in Rust (memory
  safety, performance, use it as a library), would we want to use it?
   * (Junio) I notice in the participant list people like Randall who work on
     NonStop. I'd worry about the effect on minority stakeholders, portability.
   * (Junio) Not fundamentally opposed to the direction.
* (Elijah) did not parallelize the C implementation of the new ORT backend.
  Wanted to rewrite it in Rust, cleaned up headers as a side-effect, and looked
  at other bits. Merge backends are already pluggable, could have a "normal" one
  in addition to a Rust backend.
* (Emily) If we already have something in C that establishes an existing API
  boundary, that makes it more tenable to rewrite it in Rust. Could say that the
  C version is deprecated and make future changes to Rust.
* (brian) Thinks they would be in favor of that; is personally happy to say
  that operating systems need to accept support for newer languages
  eventually. All of the main Debian architectures in use have Rust ports.
  Would make it easier to do unit testing. Could
  add parallelization and optimization without worrying about race conditions,
  which would be a benefit. Is happy to implement unit tests with Rust's nicer
  ecosystem.
* (Taylor) Is it just NonStop?
* (Elijah) Randall mentioned that they have a contractual agreement that is
  supposed to expire at some point
  (https://lore.kernel.org/git/004601d8ed6b$13a2f580$3ae8e080$@nexbridge.com/).
  Could we have a transition plan that:
   * Keeps NonStop users happy until their contract expires.
   * Allows the rest of us to get up to speed with Rust.
* (Jonathan Nieder) doing this in a "self-contained module" mode with fallback C
  implementation gives us the opportunity to back out in the future (at least in
  the early periods while we're still learning).
* (Jonathan Tan) back to process isolation: is the short lifetime of the process
  important?
* (Taylor Blau) seems like an impossible goal to be able to do multi-command
  executions in a single process, the code is just not designed for it.
* (Junio) is anybody using the `git cat-file --batch-command` mode that
  switches between batch and batch-check? (See the example at the end of
  these notes.)
* (Patrick Steinhardt) they are longer lived, but only "middle" long-lived.
  GitLab limits the maximum runtime, on the order of ~minutes, at which point
  they are reaped.
* (Taylor Blau) lots of problems besides memory leaks would become an issue
* (Jeff Hostetler) would be nice to keep memory-hungry components pinned across
  multiple command-equivalents.
* (Taylor Blau): same issue as reading configuration.
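
For reference, this is the `git cat-file --batch-command` mode Junio asked
about (added in Git 2.36): it reads commands on stdin and switches between
batch-style "contents" output and batch-check-style "info" output per
request. The object IDs and sizes below are illustrative:

    $ printf 'info HEAD\ncontents HEAD:COPYING\n' |
          git cat-file --batch-command
    3f786850e387550fdab836ed7e6dc881de23001b commit 253
    89e6c98d92887913cadf06b2adb97f26cde4849b blob 18765
    ...contents of HEAD:COPYING follow...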


* [TOPIC 3/12] Designing a Makefile for multiple libraries
  2023-10-02 15:15 Notes from the Git Contributor's Summit, 2023 Taylor Blau
                   ` (2 preceding siblings ...)
  2023-10-02 15:18 ` [TOPIC 02/12] Libification Goals and Progress Taylor Blau
@ 2023-10-02 15:18 ` Taylor Blau
  2023-10-02 15:19 ` [TOPIC 4/12] Scaling Git from a forge's perspective Taylor Blau
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2023-10-02 15:18 UTC (permalink / raw)
  To: git

(Presenter: Calvin Wan, Notetaker: Keanen Wold)

* Looking for help with Makefile use and how he's building libraries
* Wants Makefile rules that are repeatable for future libraries
* Wants fast failure when a library breaks
* Current approach that isn't working so well
   * Each library has its own section - using directives to section off the
     libraries
* Request
   * Are there makefile experts who can help?
* (Jonathan) do you have an example?
   * (Calvin) using `ifdef GIT_STD_LIBRARY` to tweak what goes in LIB_OBJS.
     This approach doesn't scale (see the sketch at the end of these notes).
* (Peff) for every C file you have two copies?
   * No, for every reference they are using the same file
* (Junio) For libgit.a will we need something different? If so, why?
   * Stubs, how do they come into play?
   * If we had a makefile for a library, we're trying to understand how we have a subset
* (Jonathan) Do I end up with two different .o files?
   * Yes, there is a subset of shared and not shared files
   * Some of the objects are the same, the stubs are different.
   * The problems are the stubs which are shared
* (Calvin?) ideally we want the .o files to be the same
   * Yes
* (Peff) if you are worried about writing the same rules again and again, there
  should be a solution
   * Yes, it will likely have to be a custom pattern
   * Does anyone have a solution that has worked before? A simple solution, or
     our own custom templating?
* (Phillip) can we build the file in standard git so we're not creating the file
  for two different libraries?
* (Emily) if we are changing the behavior using standard git builds and library
  builds...
* (Jonathan) in the past other projects used recursive "make" to reflect module
  structure in the build structure, which has lots of downsides (Peter Miller,
  Recursive make considered harmful).
   * We can define our own structure of what we want the Makefile to look like.
     Linux kernel files are perhaps a good example. There's not necessarily one
     standard everyone follows, it tends to be very codebase specific
   * For better or worse, "make" is a "build your own" kind of build system
* (Emily) why are we not using a different build system, such as CMake?
   * What are the technical reasons for make?
* (Junio) How do the libraries and make relate to each other? Avoiding
  compile-time conditional behavior seems desirable anyway - git as a consumer
  of the library can also benefit from cleaner error handling.
   * (Emily) cleanup related to the library might mean moving exit()/die()
     closer to the built-in. Do we consider that potentially valuable instead of
     useless churn?
      * (Junio) yes
* (Jakub) It's easier to die when something is wrong at the top level
   * (Peff) It depends on what level of error handling we want to get to. The
     reality of C is every single malloc can fail. Do we need to check every
     error?
   * (brian) standard error handling mechanism would be helpful.
* (Emily) for libgit2 does the caller handle the memory?
   * (brian) a dummy value (git_buf_oom) where you can check if it's out of
     memory
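
As a rough sketch (not actual code from git's Makefile; the object names are
made up), the scaling problem is that every new library needs another
conditional around the shared object list, whereas per-library object lists
plus a generic archive rule keep the per-library cost to one variable:

    # Current approach: conditionally tweak the one shared list.
    ifdef GIT_STD_LIBRARY
    LIB_OBJS += stubs/trace2.o
    else
    LIB_OBJS += trace2.o
    endif

    # Alternative: each library declares its own objects and reuses one
    # generic archive rule (recipe line must start with a tab).
    STD_LIB_OBJS = usage.o strbuf.o wrapper.o
    libgit-std.a: $(STD_LIB_OBJS)
    	$(AR) rcs $@ $^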


* [TOPIC 4/12] Scaling Git from a forge's perspective
  2023-10-02 15:15 Notes from the Git Contributor's Summit, 2023 Taylor Blau
                   ` (3 preceding siblings ...)
  2023-10-02 15:18 ` [TOPIC 3/12] Designing a Makefile for multiple libraries Taylor Blau
@ 2023-10-02 15:19 ` Taylor Blau
  2023-10-02 15:19 ` [TOPIC 5/12] Replacing Git LFS using multiple promisor remotes Taylor Blau
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2023-10-02 15:19 UTC (permalink / raw)
  To: git

(Presenter: Taylor Blau, Notetaker: Karthik Nayak)

* Things on my mind!
* There's been a bunch of work from the forges over the last few years -
  bitmaps, commit-graphs, etc.
* Q: What should we do next? Curious to hear from everyone, including Keanen's
  team.
* Boundary-based bitmap traversals - already spoke about these last year. They
  help if you have lots of tips that you're excluding from the rev-list query.
  It's on the backlog to check the perf of this.
   * Patrick: still haven't activated it in production. Faced some issues the
     last time it was activated. We do plan to experiment with this
     (https://gitlab.com/gitlab-org/gitaly/-/issues/5537)
   * Taylor: Curious about the impact.
* In almost all cases they perform better, in some cases equal, and in very
  few worse.
* (Jonathan Nieder) Two open-ended questions:
   * Different forges run into the same problems. Maybe it's worth comparing
     notes. Do we have a good way to do this? In the Git Discord there is a
     server operator channel, but it has only two messages.
      * Taylor and Patrick have conversations over this via email exchange.
      * Keanen: Used to have a quarterly meeting. Attendance is low.
      * From an opportunistic perspective, when people want to do this,
        currently seems like 1:1 conversations take place, but there hasn't been
        a wider-group forum
      * Server operator monthly might be fun to revive
      * Git contributor summit is where this generally happens. :)
   * At the last Git Merge there was a talk by Stolee about Git as a database,
     and how that view can guide you in scaling as a user. Potential roadmap: how
     a git server could do some of that automatically. Potential idea? For
     example, sharding by time? Like gc automatically generating a pack to serve
     shallow clones for recent history.
      * Extending cruft-pack implementation to more organically have a threshold
        on the number of bytes. The current scheme of rewriting the entire
        cruft-pack might not be the best for big repos.
      * Patrick: We currently have such a mechanism for geometric repacking
        (see the command sketch at the end of these notes).
* (Taylor Blau) Geometric repacking was done a number of years ago, to more
  gradually compress the repository from many to few packfiles. We still have
  periodic cases where the repository is reduced to 2 packs: one cruft pack and
  one of the objects. If you had some set of packs which contained disjoint
  objects (no duplicates), could we extend verbatim pack reuse to work with
  these multiple packs? Has anyone had similar issues?
   * Jonathan: One problem is how to know whether a pack has a non-redundant
     reachable object or not without worrying about things like TTL. In git,
     there is "push quarantine" code, if the hook rejects it, it doesn't get
     added to the repo. In JGit there is nothing similar yet, so someone could
     push a bunch of objects, which get stored even though they're rejected by a
     pre-receive hook. Which could end up with packs with unreachable objects.
     With history rewriting we also run into complexity about knowing what packs
     are "live".
      * Patrick: Deterministically pruning objects from the repository is hard
        to solve. In GitLab it's a problem where replicas of the repository
        contain objects which probably need to be deleted.
      * Jeff H: Can we have a classification of refs wherein some refs are
        transient and some are long-term?
         * Jeff King: There are a bunch of heuristic inputs which can help with
           this, e.g. older objects are less likely to change than newer ones.
         * Taylor: Order by recency, so older ones are in one bitmap and newer
           changeable ones could be one clump of bitmaps.
* Minh: I have a question about Taylor's proposal of a single pack composed of
  multiple disjoint packs. Midx can notice duplicate objects. Does that help
  with knowing what can be streamed through?
   * Taylor: The pack reuse code is a bit too naive at this point, but
     conceptually this would work. We already have tools for working with packs
     like this. But this does give more flexibility.
* Taylor: GitHub recently switched to merge-ort for test merges - tremendous
  improvements, but it sometimes creates a bunch of loose objects. Could we
  have an option for merge-ort to sidestep loose objects (write to fast-import
  or write a pack directly)?
   * Things slow down when writing to the filesystem so much.
   * Jonathan Tan: one thing we've discussed is having support in git for a pack
     handle representing a still-open pack file that you can append to and read
     from in the context of an operation.
   * Dscho: that sounds like the sanest thing to do. There's a robust
     invariant that you need an idx for a pack file in order to work with it
     efficiently, which requires the pack file to be closed. So there are some
     things to figure out there; I'm interested to follow it.
   * Junio: There was a patch sent to the list to restrict the streaming
     interface. I wonder if that moves in the opposite direction of what we're
     describing.
   * brian: In sha256 work I noticed it only currently works on blobs. But I
     don't think adapting it to other object types would be a major departure.
     As long as we don't make the interop harder, I don't see a big problem with
     doing that. Conversion happens at the pack-indexing time.
   * Elijah: Did I understand correctly that this produces a lot of cruft
     objects?
   * Dscho: Yes. We perform test merges and then no ref points to them.
   * Elijah: Nice. "git log --remerge-diff" similarly produces objects that
     don't need to be stored when it performs test merges; that code path is
     careful not to commit them to the object store. You might be able to reuse
     some of that code.
   * Dscho: Thanks! I'll take a look.
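
For reference, the geometric and cruft repacking strategies mentioned above
are driven by existing git-repack(1) options; the thresholds below are
illustrative:

    # Geometric repacking: combine packs only until their sizes form a
    # geometric progression, rather than rewriting everything each run.
    $ git repack --geometric=2 -d --write-midx

    # Cruft packs: collect unreachable objects into a dedicated pack (with
    # per-object mtimes) instead of exploding them into loose objects.
    $ git repack --cruft --cruft-expiration=2.weeks.ago -d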


* [TOPIC 5/12] Replacing Git LFS using multiple promisor remotes
  2023-10-02 15:15 Notes from the Git Contributor's Summit, 2023 Taylor Blau
                   ` (4 preceding siblings ...)
  2023-10-02 15:19 ` [TOPIC 4/12] Scaling Git from a forge's perspective Taylor Blau
@ 2023-10-02 15:19 ` Taylor Blau
  2023-10-02 15:20 ` [TOPIC 6/12] Clarifying backwards compatibility and when we break it Taylor Blau
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2023-10-02 15:19 UTC (permalink / raw)
  To: git

(Presenter: Christian Couder, Notetaker: Jonathan Nieder)

* Idea: Git LFS has some downsides
   * Not integrated into Git, that's a problem in itself
   * Not easy to change decisions after the fact about what blobs to offload
     into LFS storage
* So I started work some years ago on multiple promisor remotes as an
  alternative to Git LFS
* Works! Requires some pieces
   * Filtering objects when repacking (git repack --filter, due to be merged
     hopefully soon; see the sketch at the end of these notes)
* I'm curious about issues related to Git LFS - what leads people not to use Git
  LFS and to do things in other, less efficient ways?
* Choices
   * We can discuss details of a demo I worked on a few years ago
   * We can discuss Git LFS, how it works, and how we can do better
* brian: Sounds like this is a mostly server-side improvement. How does this
  work on the client side to avoid needing old versions of huge files?
   * Christian: On the client side, you can get those files when you need them
     (using partial clone), and repack --filter allows you to remove your local
     copy when you don't need them any more
   * There could be more options and commands to manage that kind of removal
* Terry: with multiple promisor remotes, does gc write the large files as their
  own separate packfiles? What does the setup look like in practice?
   * Christian: You can do that. But you can also use a remote helper to access
     the remotes where the large files live. Such a cache server can be a plain
     http server hosting the large files, and the remote helper can know how to
     do a basic HTTP GET or RANGE request to get that file.
   * It can also work if the separate remote can be a git remote, specialized in
     handling large files.
   * Terry: So it can behave more like an LFS server, but as a native part of
     the git protocol. How flexible is it?
   * Christian: yes. Remote helpers can be scripts, they don't need to know a
     lot of things when they're just being used to get a few objects.
* Jonathan Tan: is it important for this use case that the server serve regular
  files instead of git packfiles?
   * Christian: not so important, but it can be useful because some people may
     want to access their large objects in different ways. As they're large,
     it's expensive to store them; using the same server to store them for all
     purposes can make things less expensive. E.g. "just stick the file on
     Google Drive".
* Taylor: in concept, this seems like a sensible direction. My concern would be
  immaturity of partial clone client behavior in these multiple-promisor
  scenarios
   * I don't think we have a lot of these users at GitHub. Have others had heavy
     use of partial clone? Have there been heavy issues on the client side?
   * Terry: Within the Android world, partial clone is heavily used by users and
     CI/CD and it's working well.
   * jrnieder: Two qualifications to add, we've been using it with blob filters
     and not tree filters. Haven't been using multiple promisor remotes.
   * Patrick: What's nice about LFS is that it's able to easily offload objects
     to a CDN. Reduce strain on the Git server itself. We might need a protocol
     addition here to redirect to a CDN.
* Jonathan Tan: if we have a protocol addition (server-side option for blob-only
  fetch or something), we can use a remote helper to do the appropriate logic,
  not necessarily involving a git server
   * The issue, though, is that Git expects packfiles, as the way it stores
     things in its object store.
   * As long as the CDN supports serving packfiles, this would all be doable
     using current Git.
   * If the file format differs, may need more work.
* jrn: Going back to Terry's question on the distinction between this and
  using an LFS server: one key difference is that with Git LFS, the identifier
  is not the object ID; it's some other hash. Are there any other fundamental
  differences?
   * Christian: With Git LFS, if you want to change which blobs are stored
     with LFS, you have to rewrite the history.
   * Using the git object ID gives you that flexibility
* brian: One thing Git LFS has that Git doesn't is deduping
   * On macOS and Windows and btrfs on Linux, having only one underlying copy of
     the file
   * That's possible because we store the file uncompressed
   * That's a feature some people would like to have at some point. Not out of
     the question to do in Git; it would require a change to how objects are
     stored in the git object store
* jrn: Is anyone using the demonstrated setup?
   * Christian: Doesn't seem so. It was considered interesting when demoed in
     GitLab.
* Jonathan Tan: is the COW thing brian mentioned part of what this would be
  intended to support?
   * Christian: Ultimately that would be possible.
   * brian: To replace Git LFS, you need the ability to store uncompressed
     objects in the git object store. E.g. game textures. Avoids waste of CPU
     and lets you use reflinks (ioctl to share extents).
   * Patrick: objects need the header prefix to denote the object type.
   * brian: Yes, you'd need the blobs + metadata. That's part of what Git LFS
     gives us within GitHub, avoiding having to spend CPU on compressing these
     large objects to serve to the user.
* jrn: Going back to the discussion with multiple promisors: when people turn
  on multiple promisors by mistake, the level of flexibility has been a
  problem. It causes a lot of failed/slow requests - git is very optimistic
  and tries to fetch objects from everywhere. The approach that Jonathan
  suggested, where the helper is responsible for choosing where to get objects
  from, might help mitigate these issues.
   * Christian: yes
* Minh: can the server say "here are most of the objects you asked for, but
  these other objects I'd encourage you to get from elsewhere"?
   * Christian: you can configure the same promisor remote on the server. If the
     client doesn't use the promisor remote and only contacts the main server,
     the server will contact the promisor remote, get the object, and send it to
     the client. It's not very efficient, but it works. Another downside is that
     if this happens, that object from the promisor remote is now also on the
     server, so you need to remove it if you don't want to keep it there.
   * Minh: it seems someone has to pack the object with the header and compute
     the git blob id for it, which is itself expensive
   * Christian: if the promisor remote is a regular git server, then yes, the
     objects will be compressed in git packfile format. But if it's a plain HTTP
     server and you access with a helper, it doesn't need to. But of course, if
     the objects are ever fetched by the main server, then it's in packfile or
     loose object format there.
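
A minimal sketch of the kind of setup described above (the remote name and
URLs are hypothetical, and `git repack --filter` is from the in-flight series
mentioned at the top of these notes):

    # Partial clone: omit big blobs up front; they are fetched on demand
    # from the promisor remote only when actually needed.
    $ git clone --filter=blob:limit=1m https://example.com/repo.git

    # A second promisor remote, e.g. backed by a remote helper that does
    # plain HTTP GET/RANGE requests against a large-file store:
    $ git -C repo config remote.largefiles.url https://cdn.example.com/repo
    $ git -C repo config remote.largefiles.promisor true

    # ...and repack --filter evicts large blobs from local storage again
    # once they are no longer needed:
    $ git -C repo repack -a -d --filter=blob:limit=1m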


* [TOPIC 6/12]  Clarifying backwards compatibility and when we break it
  2023-10-02 15:15 Notes from the Git Contributor's Summit, 2023 Taylor Blau
                   ` (5 preceding siblings ...)
  2023-10-02 15:19 ` [TOPIC 5/12] Replacing Git LFS using multiple promisor remotes Taylor Blau
@ 2023-10-02 15:20 ` Taylor Blau
  2023-10-02 15:21 ` [TOPIC 7/12] Authentication to new hosts without setup Taylor Blau
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2023-10-02 15:20 UTC (permalink / raw)
  To: git

(Presenter: Emily Shaffer, Notetaker: Taylor Blau)

* (Emily) In the last year, there were a handful of scenarios where we had
  issues with backwards compatibility.
   * E.g. config based hooks. As a result of this change, text coloration was
     broken from user-provided hooks. Missing tests, but still viewed it as
     backwards-compatibility-breaking.
   * E.g. deleted internal "git submodule--helper" that looked like a plumbing
     command, which other projects depended on. Was being used in the wild, even
     though we didn't expect it.
   * E.g. bare repository embedding. Interested in fixing that as a security
     concern
     (https://offensi.com/2019/12/16/4-google-cloud-shell-bugs-explained-bug-3/,
     https://github.com/justinsteven/advisories/blob/main/2022_git_buried_bare_repos_and_fsmonitor_various_abuses.md).
     Weren't able to do so in a widespread fashion, since many projects are
     using it for testing setups (i.e. test fixtures).
* (Emily) When do we consider odd behavior a bug, versus something that's part
  of our backwards compatibility guarantee?
* (Emily) What do we want backwards compatibility to look like for libraries?
  How do we want to handle this in the future?
* (Minh) Is there documentation on how this should behave?
   * (Emily): Typically try to guarantee backwards compatibility via integration
     tests. Have changed documented behavior in the past when it seems "just
     wrong". Is the documentation the source of truth?
   * (Jonathan Nieder): In the case of browsers, using a specification for
     compatibility is a useful tool. What will work at the same time across
     different implementations? When there is a single implementation (e.g. git)
     it is easier to capture your intention with the implementation instead of a
     specification.
   * (Jonathan Nieder): Converting that documentation into a specification can
     hurt readability or inhibit its other uses.
   * (Junio): Tend to ensure that observable behavior is locked in via
     integration tests. Tests are the source of truth, along with
     implementation. Documentation is often lying. Unlike the browser example,
     we don't have an external specification to rely on. Intention from
     developers is captured in the proposed log message.
* (Minh) Should we be testing more, then?
   * (Emily) That's part of it, but some older behavior (e.g. from the original
     implementation) has less information in the commit message as a result of
     project culture at the time.
   * (Junio) Working-as-designed, but the design outlived its usefulness.
* (Jonathan Nieder) E.g. 'git-add' versus 'git add'. Outcry after we changed
     behavior, so we rolled it back. Much later we got to a place where people
     weren't relying on this behavior as much.
* (Jonathan Nieder) There is another kind of documentation besides
  specification. E.g. the kernel has a documented guarantee about compatibility:
  "the kernel never breaks userspace". This doesn't mean that we can't have
  observable behavior changes. Only that they maintain "depended-upon" behavior
  that the kernel is reasonably responsible for providing. Can only determine
  this surface area by rolling out changes and seeing if folks complain.
* (Jonathan Nieder) It sometimes feels like we have adopted a similar
  philosophy, but the kernel has an easier job since POSIX, System V, etc have
  defined the overall shape of the syscall interface.
* (Elijah) Difficult to distinguish between bug fixes and breaking backwards
  compatibility. When we break existing test cases, document a rationale for
  doing so in the proposed patch message. Cases where documentation was just
  wrong. Often comes down to a judgment call.
* (Minh) Is the consensus to keep tests up-to-date, and add more tests when
  behavior is unclear?
* (Jonathan Nieder) Problem is that there can be differences of opinion on what
  are safe compatibility guarantees to break.
* (Emily) Also the case that there are tests that are in the integration suite
  that are enforcing things that weren't meant to be compatibility guarantees.
  E.g. enforcing error messages. How do we cope with legacy behavior and legacy
  tests when making a judgment call? There is some general documentation about
  backwards compatibility: plumbing commands are frozen, porcelain commands
  are not (see the example at the end of these notes). Should we expand that
  documentation to clarify how to decide?
* (brian) This would be helpful, but not sure what it would look like. Kernel's
  approach may be too rigid for Git. Sometimes useful to break backwards
  compatibility. E.g. "we have it, but it isn't a good choice." Users depend on
  those error messages. When we make a change that is overwhelmingly beneficial,
  can't please everybody all of the time.
* (Jonathan Nieder) Back to guarantees for library code. Kind of view the
  plumbing/porcelain decision as a failed experiment. Of course scripters are
  going to use the plumbing. Want a better backwards compatibility guarantee.
  People are going to want to add more functionality there and lock in
  additional behavior. When people write scripts, they write using the commands
  that they understand how to use. End-user commands gain for-scripting
  functionality.
* (Junio) Worse is when new features are added only to porcelain, and plumbing
  code is left behind.
* (Jonathan Nieder) In a way, we made it harder on ourselves. If porcelains are
  written as scripts, you need plumbing commands to expose the functionality
  they need. Now porcelains use function calls, so the well maintained interface
  is more on the (internal) library side
* Libification moves us in a good direction, since it provides an alternative to
  the CLI as a well-defined programmatic access method.
* (Jonathan Nieder) If we succeed at this, the command-line backwards
  compatibility guarantee for porcelain commands can break down a bit to the
  extent that users start to adopt the library code as their interface to Git.
* (Emily) If we have suitable replacements in the library, can we deprecate the
  plumbing variant of that functionality eventually? Freeze a particular
  plumbing command instead of adding to it
* (Taylor) Can't break existing behavior, shouldn't have to force users to
  upgrade to library code for existing behavior. Apologies if this is what you
  were saying.
* (Jakub) Auto-generated CLI shim, like cloud providers often provide for their
  APIs?
* (Jonathan Tan) Might be hard to create scriptable interfaces for library
  commands. A library allows us to pass pointers and function callbacks,
  neither of which we can accomplish via the shell.
* (Minh) Is there an understanding that the library has to implement 100% of the
  functionality of plumbing commands?
* (Emily) Not convinced that we need a one-to-one match between the library and
  command-line interface. Want to expose the same intent, not necessarily exact
  incantations.
* (Jonathan Nieder) Let me try and summarize. Question resonates with people, no
  one has a silver bullet. Maybe some agreement for using more tests, but the
  general approach to figuring out our compatibility guarantees remains an open
  discussion.
* (brian, via chat) One final thought: maybe we could look at what Semantic
  Versioning defines a breaking change as, since they've defined this in a very
  public way.
* (Phillip, via chat) Thinking back to yesterday there were people saying that
  they chose the cli over a library because of concerns about memory leaks and
  the library crashing/dying as well as licensing concerns. If we were to add
  new functionality only in libraries we'd need to make sure that they were
  robust.
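
To make the plumbing/porcelain distinction concrete (the object IDs below are
made up): porcelain output is meant for humans and may change between
releases, while plumbing output is part of the compatibility guarantee that
scripts are expected to rely on:

    $ git branch                # porcelain: human-oriented, may change
    * main
      topic

    $ git for-each-ref --format='%(refname:short) %(objectname)' refs/heads
    main 3f786850e387550fdab836ed7e6dc881de23001b
    topic 89e6c98d92887913cadf06b2adb97f26cde4849b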


* [TOPIC 7/12] Authentication to new hosts without setup
  2023-10-02 15:15 Notes from the Git Contributor's Summit, 2023 Taylor Blau
                   ` (6 preceding siblings ...)
  2023-10-02 15:20 ` [TOPIC 6/12] Clarifying backwards compatibility and when we break it Taylor Blau
@ 2023-10-02 15:21 ` Taylor Blau
  2023-10-02 15:21 ` [TOPIC 8/12] Update on jj, including at Google Taylor Blau
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2023-10-02 15:21 UTC (permalink / raw)
  To: git

(Presenter: M Hickford, Notetaker: Lessley Dennington)
(Slides: https://docs.google.com/presentation/d/127xue1Sr19J1m6wk1KwY9-5G1lPxbyHOgaIi2Ro12ts/edit?usp=sharing)

* (Hickford) I interact with many Git "hosts" (GitHub, GitLab,
  gitlab.freedesktop.org, etc.). I had 15 Personal Access Tokens (PATs) around,
  which was tedious. I was using Git Credential Manager, which has an option to
  authenticate via web browser, which creates a token. I released
  git-credential-oauth with this feature; you can use it with a storage
  helper. I'm going to show an example of authenticating to a host I've never
  used before (Gitea). Demonstrates signing into Gitea via web browser and
  cloning his fork of project xorm/xorm. Since the repo is public, no
  authentication is necessary. Makes a commit and pushes. Auth flow is
  triggered, provides consent. Authentication was successful. There was no need
  for PATs or SSH keys. Git-credential-oauth supports GitHub, GitLab, Gitea,
  and Gitee out of the box (see the configuration sketch at the end of these
  notes). It works using the new(ish) password_expiry_utc attribute and
  wwwauth[] headers.
* (brian) Thinks it's a great idea because it's convenient. github.com/github
  requires SAML/SSO and the browser, and this should work just fine. It wouldn't
  be great to have in C, but as a helper it's super convenient.
* (Hickford) Ruled out a C implementation due to the challenges. Goal was to
  remove a barrier to entry for contributors to OSS trying to make bug fixes and
  having to set up/deal with PATs/SSH keys.
* (Jakub) Still work to do with creating a fork, pushing.
* (brian) GCM does this but represents a greater barrier to entry for less Git
  literate users. Less beneficial for Git power users.
   * Edit: Lessley and brian spoke after the meeting, and Lessley realized the
     above was not recorded correctly. git-credential-oauth and GCM both remove
     the need for users to manually set up PATs/SSH keys (which was what was
     being considered as the high barrier to entry).
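
For readers who want to try this, one plausible setup pairing
git-credential-oauth with a storage helper, per the project's documentation
(the cache timeout is arbitrary; check the README for the current
recommendation):

    # Try stored/cached credentials first; fall back to the browser-based
    # OAuth flow when nothing stored matches.
    $ git config --global --add credential.helper "cache --timeout 7200"
    $ git config --global --add credential.helper oauth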


* [TOPIC 8/12] Update on jj, including at Google
  2023-10-02 15:15 Notes from the Git Contributor's Summit, 2023 Taylor Blau
                   ` (7 preceding siblings ...)
  2023-10-02 15:21 ` [TOPIC 7/12] Authentication to new hosts without setup Taylor Blau
@ 2023-10-02 15:21 ` Taylor Blau
  2023-10-02 15:21 ` [TOPIC 9/12] Code churn and cleanups Taylor Blau
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2023-10-02 15:21 UTC (permalink / raw)
  To: git

(Presenter: Martin von Zweigbergk, Notetaker: Glen Choo)

* (Martin) The jj team at Google has been growing. The support for different
  commit "backends" has been expanding - we can now store "commits in the
  cloud" using the Google-internal backend.
   * "Revset" engine. Revset is a language for selecting commits (e.g. "select
     all commits by me"; see the examples at the end of these notes). We now
     have an implementation that scales to Google's millions of commits.
     Commit id prefixes are resolved against the "local" commits (not the full
     Google mainline).
   * Conflicts are now stored tree-level, instead of per-file level. Conflict
     detection is much faster since jj doesn't need to traverse the tree.
   * Exporting jj commits to internal code review tool (Critique).
* (Martin) What's left?
   * Renames: do we track renames? Do we detect them?
* (Elijah) If conflicts are tree-level, can you store partially-resolved
  conflicts?
   * (Martin) Yes, we store trees for each side of the conflict and resolve the
     conflicts only when needed.
* (Jrnieder) Are there lessons from jj development that Git would benefit from?
  What can Git do to make jj's life easier, and vice-versa?
   * (Martin) Conflicts-in-code work extremely well. I think Git could adopt
     that, but it would be very messy to migrate the UX to that. The operation
     log (a global view of all of the refs at a given "operation") is also a big
     improvement over e.g. the reflog.
   * (Martin) jj uses libgit2 (with Rust bindings) under the hood, so we're
     missing functionality like partial clone.
   * (Taylor) do you shell out to git, or only use libgit2? If you did shell
     out, are there other missing Git functions that you'd want?
      * (Martin) Only libgit2. Can't think of other features jj would want.
      * Until merge-ort existed, worktreeless merge would be an example.
      * (Glen) When jj pushes things to a Git server, it loses information. If
        the server understood obsolescence markers, that would be a huge
        improvement for jj.
      * (Martin) Yes, jj uses a change-id to associate different amended
        versions of the same change, similar to Gerrit - it would be nice for
        Git to support the same thing.
* (Junio) Did you have to make any breaking changes that affect your users?
   * (Martin) We make many. We're a small project, and people accept that it
     needs to break to get a nicer UX, which is a nice thing about being early
     in a project.
   * Format-wise, we try not to break the repo format - in terms of newer
     versions of jj being able to work with older versions of repositories.
     Older versions of jj are not expected to always be able to read repos
     written to by a newer version.
      * (Jonathan) "svn upgrade" style?
      * (Martin) Yes, except we immediately do the upgrade automatically.
      * (Jonathan) So the moment you use the new version of jj, you lose the
        ability to roll back.
      * (Martin) Yes. Errors out (crashes) when reading the format it doesn't
        understand.
      * One of these was annoying for users, we may be at the point where we
        need something more formal.
   * (Junio) In 2005, we did two huge breaking changes in the repo format. There
     were lots of users, but we did it anyway. One was about object naming: we
     used to compress first, then hash, which was a bad way of doing it - we
     swapped the order so we could compress better and faster without changing
     object names.
* (Elijah) If we rewrote parts of Git in Rust, would we be able to share code?
   * (Martin) Possibly, but it would require a lot of rewriting to make that
     work.
* (Emily) Greenfield features in jj, e.g. preventing users from rewriting
  "public" commits/history. Are there other ideas we would like to try in jj
  that are harder to do in Git?
   * concept of https://wiki.mercurial-scm.org/Phases makes some things (like
     safe interactive rebase) easier
   * (Terry) The usual practice is to have policies on branches (some branches
     are more experimental, some have stringent quality requirements, etc), but
     those are implemented on the hosting provider, not the VCS.
* (Terry) jj has lots of glowing reviews! Power users are happy with it, using
  jj locally. If anything is not supported in jj, they can use Git instead. Is
  there a roadmap for simplifying the experience for non-power users, having it
  automatically take care of things like when to run gc, etc?
   * (Martin) Re: gc, jj doesn't implement it yet.
   * (Terry) More general UX. If I'm a developer using git repositories and want
     to use jj, when do I get to a place where I have a nice end-to-end
     workflow?
   * (Martin) I already use jj, I don't have the "colocated repo" so I only run
     jj commands, can't run git commands. For blame I fall back to the hosting
     provider's web UI. :) That's something to add.
   * (Jrnieder) My impression from the jj discord is that the UX is very
     dependent on their code review tool. Amending/rebasing and sending to
     GitHub seems to work OK. Losing the obsolescence information when pushing
     to Gerrit works quite poorly.
* (Minh) Does jj store commits in Git form? Can it translate between different
  commit representations?
   * (Martin) It can store commits in Git form. The demand for on-the-fly
     conversion has come up.
* (Taylor) How does jj represent non-Git concepts in Git format, like having
  multiple trees in a commit?
   * (Martin) It stores extra metadata outside of the Git commits, and also it
     stores its own shape in Git format, e.g. for multiple trees, each tree is
     its own directory.
* (Minh) How do you optimize searches like "commits written by me"? Full text
  index?
   * (Martin) It's implementation-specific. On local repos, it just iterates
     commits.
   * (Martin) The revset language is quite expressive, e.g. you can specify AND
     and OR. The language is also separate from implementation.
* (Jakub) There are other tools that implement a query language for Git. It
  could be worth considering implementing one natively. (See Git Rev News
  archives.)
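
A taste of the revset language discussed above, per jj's documentation around
this time (treat the exact syntax as illustrative):

    $ jj log -r 'mine()'          # commits authored by me
    $ jj log -r 'mine() & ::@'    # ...restricted to ancestors of the
                                  # working-copy commit (@)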


* [TOPIC 9/12] Code churn and cleanups
  2023-10-02 15:15 Notes from the Git Contributor's Summit, 2023 Taylor Blau
                   ` (8 preceding siblings ...)
  2023-10-02 15:21 ` [TOPIC 8/12] Update on jj, including at Google Taylor Blau
@ 2023-10-02 15:21 ` Taylor Blau
  2023-10-02 15:22 ` [TOPIC 10/12] Project management practices Taylor Blau
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2023-10-02 15:21 UTC (permalink / raw)
  To: git

(Presenter: Calvin Wan, Notetaker: Taylor Blau)

* Question: When is refactoring worth the churn? The refactoring may or may not
  contribute to a different goal (e.g. libification). Other factors:
   * Should those refactor series be included with the feature?
   * Should they be split up?
   * Do they make sense as isolated units?
* Some examples: Elijah's cache.h cleanup series, which was obviously good.
  Others of dubious value.
* (Elijah) May have done the cache.h series a year or two earlier, but wasn't
  sure that it was obviously good.
* (Jonathan Tan) First have to define the churn. Two kinds:
   * Having reviewers look at it in the first place, since there are no
     objective user-facing improvements.
   * Causes additional toil in revision history.
* (Jonathan Tan) Let's start with reviewer churn. What constitutes "good" or
  "clean" code is subjective, so authors and reviewers may spend a large amount
  of time debating whether or not the refactoring meets that criteria. Can be
  avoided when the feature is on top in the same series.
   * (Junio) Speaking cynically: the new feature may be taking a subjective
     change, or the rejection of it, hostage.
   * (Calvin) In other words, refactorings are of lower value than features?
   * (Junio) After you implement some features, you may discover opportunities
     for clean-up after the fact.
   * (Jonathan Nieder) In the course of solving a given problem, may come up
     with a lot of different changes that all help. If you generate a long patch
     series, you are over-constraining the maintainer in determining how to slot
      those changes in. Also makes applying to a maintenance branch, rolling
      back particular pieces, etc. harder.
      * If I make a one-line bug fix and notice "this code was hard to
        understand, here's a refactoring that makes it more obvious", it's often
        more helpful to the project for the one-line bug fix to come first in
        the series and the refactoring to be a followup or later part of the
        series.
   * (Taylor) One thing that helps is motivating a refactoring. Saying "here's
     what this refactoring makes easier".
   * (Martin) What is "refactoring for its own sake"? For example, is removing
     global state something that we want without additional justification?
   * (Emily) Can we split the difference? Can we send cleanup patches with less
     context? With more context? Should we be better about committing to a
     feature and presumptively merging clean-up patches along the way?
   * (Junio) I rarely hear from reviewers the signals that would allow me to do
     this. "I have reviewed this series, and patches 1-4 look ready, I'd be
     happy with those landing and being built on top of".
   * (Emily) Could change our habits to add "LGTMs" part of the way through the
     series.
   * (Jonathan Tan) We often need to add a feature to "sweeten the deal". The
     feature proves that the refactoring is good. Doesn't add to the overall
     value, but makes it cost less to review the refactoring. Perhaps that the
     presence of the feature is proof enough, even if it isn't merged right
     away.
   * (Terry) Sounds like the question is, "what is the value proposition for
     refactoring?" Usually to lower tech debt. Challenge: maybe every
     refactoring should stand on its own?
      * In implementing a feature, I might notice "the object database interface
        is causing problems in this way". Then my cover letter can spell out
        those problems and how the refactoring addresses them.
      * It's hard work to describe why something isn't good, especially in a
        legacy codebase with some tech debt and some old changes missing clear
        commit messages. It's work but I think it's worthwhile. It builds an
        understanding in the community of how that subsystem should function.
   * (Elijah) My series might be an example of that, didn't have a feature
     associated with it. Helped with libification effort, and started with a
     smaller series to illustrate the direction. Guessing that there are certain
     types of refactoring that we already consider to be good.
   * (Jonathan Nieder) Could we have a wiki page that lists helpful
     refactorings that would be likely to be accepted on their own?
   * (Jonathan Tan) I'd like to challenge Terry's challenge. It's a laudable
     goal, but a subsequent patch implementing the feature is worth 1,000 words.
   * (Jonathan Nieder) If we want to be doing more refactoring, then we're going
     to have to develop different skills as developers and reviewers. Reviewing
     refactoring is more like reviewing technical writing. Having examples to
     illustrate the idea can help, even if those examples are changes that
     aren't changes we want to make right now to Git.
   * (Terry) Some people are visual learners, some people are auditory learners,
     and so on. Having a change in place on top of a refactoring is worth 1,000
     words. But if you write well, maybe you don't need the latter patch.
   * (Taylor) I think I agree with both these things - I like having the
     documentation and explanation, but I also see Jonathan Tan's point about
     examples being helpful.
   * We should become more comfortable with throwing away work. Suppose I've
     made a refactoring and we decide not to ship the change it was meant to
     support. Is it worth the reviewer's time to take it anyway?
      * We need to make the cover letters clearer, make the case for it being
        worth the time.
   * (Calvin) I think I agree with Taylor. To re-describe: our cost is code
     churn and reviewer time. A feature patch gives a 100% guarantee that the
     preceding changes are worthwhile; there is a discount factor when you
     don't have a feature to illustrate the value. When no feature patch
     exists, authors need to be clearer about what the value is.
      * Reviewers can encourage the author to give better examples of how the
        change will pay off.
   * (Glen) Are there things we could do to help newer contributors in this
     regard? Should we have a more opinionated style guide?
      * (Taylor) Separate CodingGuidelines into semantic requirements and more
        subjective "here are kinds of refactorings we like"
   * (Jonathan Nieder) For newer contributors: better/more worked examples of
     how experienced contributors justify their refactoring changes. E.g. "here
     are some series in the past that were harder to review because of the lack
     of this change". If people had examples to emulate, they would be doing it
     more.
   * (Emily) Difficult to synthesize commit messages without examples,
     especially for non-native English speakers, people who aren't great
     writers, etc.
* (Jonathan Tan) The other kind of churn is in looking back at history and
  seeing what has happened in a file. One thing I worry about is that there may
  be another feature in the future that forces us to partially or entirely
  revert the refactoring. That reduces the probability of the refactoring being
  "good" in the first place.
* (Terry) Emily's point about inclusivity: that work (writing a persuasive
  essay, emulating examples) is tedious and difficult, and it may not be
  natural to everybody. As a project, we should be creating those examples.
  Reviewers should help newer contributors along the way.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [TOPIC 10/12] Project management practices
  2023-10-02 15:15 Notes from the Git Contributor's Summit, 2023 Taylor Blau
                   ` (9 preceding siblings ...)
  2023-10-02 15:21 ` [TOPIC 9/12] Code churn and cleanups Taylor Blau
@ 2023-10-02 15:22 ` Taylor Blau
  2023-10-02 15:22 ` [TOPIC 11/12] Improving new contributor on-boarding Taylor Blau
  2023-10-02 15:22 ` [TOPIC 12/12] Overflow discussion Taylor Blau
  12 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2023-10-02 15:22 UTC (permalink / raw)
  To: git

(Presenter: Emily Shaffer, Notetaker: Jonathan Nieder)

* high hopes and low expectations! Let's play nice
* Project management related tools and practices we may be used to from $DAYJOB
   * Bug trackers
   * Long-term and short-term project planning
   * Roadmaps
   * Documented project direction
   * Tools, like wiki
* Some things other open source projects do
   * Some do things like two-week sprints with contributors, more structured
     development of some features
* Some things that happen in regular workspaces
   * Ad hoc whiteboarding through a problem
   * Live chat with history
      * We have IRC but it's dwindling outside of IRC standup
      * There's informal discord
* Lots of tools! Are there ones we want to get benefit from?
* Example: bug tracker
   * Lots of interest, but don't have a shared understanding of what we want
     from a bug tracker
* If you could pick something from your day job and apply it to the Git project,
  what would you look for and what would you want from it?
* (Taylor) "Let's have a quick VC"
   * All being at the same organization within GitHub, people are very willing
     to just jump into a Zoom meeting and talk through a thing, whiteboard
   * There's benefit to having things documented on-list, but I think we could
     walk that back a little
   * (Emily) One thing I've liked with the contributor summit and similar events
     is sending something to the list so people who weren't there can still
     follow and respond
   * Would "Taylor and I just talked about cruft packs" emails be something we
     want more of?
   * (Taylor) Yes and no. Sometimes a conversation is for getting ideas,
     sometimes for making a decision. They deserve different approaches.
   * (Emily) By the way, I've been surprised at how open people are to VCing
     when I try it. For conversations about cruft packs and about the
     config-based hooks series, I tried "let's have a VC"; Ævar was very open
     to it in that example and it worked well
   * So it might be something we should just try more often
* (Emily) At a Git contributor summit, some attendees mentioned wishing they
  could see a published Git project direction
   * Various companies are putting dedicated time into the Git project, and
     those plans aren't published anywhere
      * Page with "GitHub cares a lot about cruft packs, here's how you can
        help"
      * Is that something we could write down somewhere more static than the
        mailing list?
   * (Junio) How quickly would that become stale?
   * (Emily) It depends on how you build it into your own processes. Every
     quarter we write quarterly objectives and key results for my team, publish
     that to everyone at Google. I could also publish that to the Git mailing
     list, part of that normal process could be posting it on a Wiki page.
   * (Jonathan Tan) One thing I'd find difficult in publishing such a thing is
     understanding the priority of line items. Currently I learn about people's
     priorities from patches and from what they say in venues like a contributor
     summit
      * Contributor summit is once/year, if you're saying something, it's
        probably important to you
   * Things that change once/quarter are harder to judge. How much are you
     dedicating to them?
   * (Taylor) Suppose this existed. What would people use it to answer?
     Mutual exclusion of work?
   * (Emily) There have been some good cross-company collaborations in the past,
     such as partial clones. Noticing the opportunity to work together on such
     things is the kind of thing I'm thinking of.
   * (Taylor) Back in 2014-15, GitHub had an awesome tradition of there being a
     "radar issue", people could comment on this big long thread on what they
     want to hack on.
      * I think that's a little different than publishing a committed roadmap,
        with pressure and accountability. "What is Jonathan Tan interested in"
      * Could be as simple as every quarter we send an email to the list and all
        reply to it.
   * (Jakub) We could attach a roadmap to the same place Git Rev News is; you
     can get news and a roadmap in the same place.
   * (jrn) Sharing your plans and priorities helps people know what they can
     expect you to care about. E.g. if your work is all reliability, all the
     time, then a new UX change may not be exciting right now, whereas
     reliability-focused work is a good place for collaboration.
   * (Calvin) Having timestamps, e.g. a pinned message on Discord, helps you
     know if something is stale.
   * (Jeff Hostetler) Make a repository, publish there "I'm working on this".
     Send a pull request, get feedback. Nice and compact, has timestamps, stays
     in our ecosystem.
* (Emily) I don't like the trend of projects being managed only on Discord.
  But I'm wondering: what changes would make the git community Discord more of
  an official channel in the same way the git mailing list is?
   * https://discord.gg/aUCkDVUqqu
   * (Elijah) There's a Git Discord?
   * (Taylor) We just needed to make Elijah aware of it. ;-)
   * (Jeff Hostetler) I think discord is a bit childish, git repos are something
     professional we all use every day.
   * (jrn) There's a bit of a shift IRC -> Slack -> Discord in a lot of projects
   * (Emily) A big benefit of IRC is that it's a decentralized protocol.
     Having a part of our infrastructure be a centralized, nontransferable
     thing is scary to me, but maybe there are technical ways to address that:
     exporting logs, a Matrix bridge to IRC, etc.?
   * (Taylor) I think a barrier to use of chat can be fear of decentralization
     of information; it's convenient that the git mailing list is a one-stop
     shop
      * (Jeff Hostetler) +1, having too many things to check
      * (Emily) I think this is also why we're hesitant about other things like
        bug trackers etc
* (Jonathan) Bug tracking
   * (Emily) This year we moved crbug.com/git (Monorail) to
     git.issues.gerritcodereview.com. There's 80ish issues there. Our team
     within Google uses it. But of course in reality no one else is making use
     of that issue tracker. If there were somewhere else to put bugs instead,
     we'd use it - I don't think it's too important where that is, as long as we
     can do it somewhere.
   * (Junio) Someone needs to curate it.
   * (Emily) It would be possible for us to curate, triage
     git.issues.gerritcodereview.com if people start using it.
   * (Junio) Not limited to bugs, but we from time to time talk about other
     aspects of tracking. Things like patchwork. We talk about mechanisms, but
     not so much about enforcing use of those mechanisms.
   * One practice I like at work is that anyone can write a CL, and then
     people are forced to review or look at the patch in a reasonable amount
     of time.
   * It can be frustrating as a maintainer, because I don't want to be reviewing
     and looking at all the patches on the list myself. And I don't like having
     to queue patches not looked at by anybody.
   * (Emily) This makes me wonder if we should be having conversations about
     things like "whose turn is it to take action on this patch".

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [TOPIC 11/12] Improving new contributor on-boarding
  2023-10-02 15:15 Notes from the Git Contributor's Summit, 2023 Taylor Blau
                   ` (10 preceding siblings ...)
  2023-10-02 15:22 ` [TOPIC 10/12] Project management practices Taylor Blau
@ 2023-10-02 15:22 ` Taylor Blau
  2023-10-02 15:22 ` [TOPIC 12/12] Overflow discussion Taylor Blau
  12 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2023-10-02 15:22 UTC (permalink / raw)
  To: git

(Presenter: Jonathan Nieder, Notetakers: Xing Huang, Ronald Bhuleskar)

* (Jonathan Nieder) Not as structured of a conversation, but I see a lot of
  interest, let's see how the conversation goes. Any open source project can be
  scary for newcomers; the git project in particular has some unique aspects to
  its workflow, such as the mailing list that rejects HTML-formatted mail, etc.
  I think overall we are welcoming. Ideally, we would like to attract all types
  of contributors, in part because they help different kinds of users have more
  of a voice.
* I am interested in how to make the onboarding process easier for new
  contributors; what can we do to make things easier? MyFirstContribution works
  well as a tutorial doc, but what is the next step for someone after they send
  their first patch and get their first review in reply? How do you find a
  mentor? Things like how to interpret a reviewer's tone can be hard to
  navigate.
* (Emily) We can mark a patch as a beginner's patch - the golang (?) project,
  for example, assigns a mentor to newcomers. We have a mentorship list that's
  inactive; could we use the same volunteers from there to give more hands-on
  mentoring?
* (Jonathan Tan) We could use a guideline on what's expected in terms of code
  quality.
* (Taylor) Folks who are newer contributors or haven't contributed much, do you
  have perspectives to share?
   * (Vincenzo) Finding a starting point, a problem to tackle, was difficult.
   * The #leftoverbits search term is listed in our
     Documentation/ReviewingGuidelines.txt, but Taylor suspects no newcomers
     are looking into it.
   * People in the project can look out for the next event and meet others
     face to face, which makes the relationship less daunting.
   * (Phillip) There is a lot of information for new contributors to digest in
     CodingGuidelines, SubmittingPatches and MyFirstContribution. How do we
     find a balance between providing useful guidelines and overwhelming them?
   * (Jacob Stopak) As a newcomer, sent an idea that was too big to solve
     completely myself, but I would have liked to know where it was going, what
     is my part, what others will help with, and to be able to participate more
     in its implementation instead of it being done by others.
* (Jonathan Nieder) The mailing list is noisy; someone interested in a
  specific topic will find the list flooded with lots of other things, unless
  they are specifically cc'd on the right threads. There's no easy middle
  ground between "my life is in the list" and "I only see what is sent to me".
* (Jakub) There's a bit of a middle ground - you can use a newsreader.
* (Jonathan) In a project with a bug tracker, it's easier to know who
  something is assigned to, who the collaborators are, and what to expect
  moving forward. The information is in one place. In the Git project, if
  someone sends a patch on something I'm interested in, I have to interpret why
  they're doing that - do they want to take this over? Are they giving me a
  suggestion?
* (Han Young) Han finds the contributor guide lacking in detail; he finds
  READMEs and Discord complementary to his newcomer experience.
* (Emily) Which of these ideas would make the most sense to implement?
   * Auto assign 1:1 mentors to new contributors
   * Split up the doc a bit more
   * Wiki: Where to start
   * Have more conferences
   * Have a bug tracker
   * Process documentation: What to do when a review comes in, next steps beyond
     what MyFirstContribution describes.
* (Taylor) The mentor assignment bit is what excites me the most
   * Most new contributors use GitGitGadget, it could notice new contributors
     and find a mentor for them
   * The key there would be documenting what that relationship should look like.
     Helps with clear guidelines on avoiding the kind of hijacking case Jacob
     mentioned (sorry about that!)
* (Jonathan Nieder) Great thing to do if we have a pool of mentors available.
  This culture is appreciated.
   * (Emily) Such culture is ingrained in Google in the form of community
     contribution. (Junio) Hmm, where are the reviewers? :)
* (Glen) Discord or other informal channels are easier for mini-mentoring.
* (Jeff Hostetler) GitGitGadget has also been doing mini-mentoring recently at
  a small scale, polishing patches before the author submits.
   * (Emily) Mostly GitHubbers? Should others pitch in?
   * (Jeff Hostetler) I think I'm auto-subscribed because I have write access to
     the repo.
   * (Junio) I've done some reviews there (it shouldn't be limited to GitHub
     folks).
* (Jacob) Thanks much for the documentation, step-by-step instructions are great
   * I used the instructions on how to send patches with "git send-email" (a
     rough sketch of that flow is below). I didn't use GitGitGadget because it
     wasn't clear to me what it is.
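   * For reference, that flow looks roughly like this (the list address is
     real; the branch name is just an illustration, and the one-time SMTP
     setup from Documentation/MyFirstContribution.txt is assumed):

         # Turn the commits on a topic branch into mailable patch files.
         git format-patch origin/master..my-topic

         # Send the generated patches to the mailing list for review.
         git send-email --to=git@vger.kernel.org 00*.patch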

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [TOPIC 12/12] Overflow discussion
  2023-10-02 15:15 Notes from the Git Contributor's Summit, 2023 Taylor Blau
                   ` (11 preceding siblings ...)
  2023-10-02 15:22 ` [TOPIC 11/12] Improving new contributor on-boarding Taylor Blau
@ 2023-10-02 15:22 ` Taylor Blau
  12 siblings, 0 replies; 14+ messages in thread
From: Taylor Blau @ 2023-10-02 15:22 UTC (permalink / raw)
  To: git

* trackers - bug, review, etc
   * "whose turn" for patches
* (Minh) With the multi-pack index, when you repack a large number of packs,
  you can partially rewrite the pack index for only the things that have
  changed, but you can't do that with bitmaps? Is this assumption correct?
   * (Taylor) Yes: bitmaps get rewritten from scratch against the packs they
     belong to, but it's close to an incremental approach as long as there is
     an existing bitmap. (See the sketch below.)
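   * A minimal sketch of the commands in question (flags as documented in
     git-repack(1) and git-multi-pack-index(1); how incremental the bitmap
     write ends up being depends on the packs already present):

         # Pack loose objects into a new pack (leaving existing packs alone)
         # and write a multi-pack-index across the surviving packs.
         git repack -d --write-midx

         # Write (or rewrite) the reachability bitmap on top of the MIDX.
         git multi-pack-index write --bitmap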
* Back to the project management practices that Junio mentioned: we seem not
  to shy away from discussing what kind of tool will help us (bug tracker,
  etc.) but have more trouble with what practice to put in place to enforce
  it. E.g.: with a public bug tracker, who responds to user issues, what does
  priority mean, etc.? I wonder whether people who are motivated could form a
  small group to define this and bring back a proposal to the list.
   * (Josh) As Junio mentioned, a lot of patches are getting ignored - and
     this is mostly directed at our day jobs: can we get cross-company
     commitment to review/volunteer/run a bug tracker to explicitly help the
     community?
   * (Emily) Can view this as a donation, "donating project management
     headcount".
   * (Jonathan) In the Linux kernel there's a "contribution maturity model"
     document: a common definition of what it means to be doing your part,
     which allows companies to assess themselves.
   * (Taylor) Open Source donation was something that happened recently
   * (Emily) The Linux Foundation sponsors work to measure contribution: which
     company contributes how many patches/reviews
   * (Pono) Can help here to define qualitative metrics. Tools: CHAOSS, which
     plugs into a repository. Can work with someone to lay this out.
   * (Phillip) It's easier to find reviewers in some areas than others.
     Different companies have different areas of interest.
   * (Emily) Yeah, we've noticed this at Google. Example: Submodule
   * (Josh) We need specific people to volunteer, and a way to recognize that
     volunteering effort in a smaller group. How about a "Git reviewer
     conspiracy" to honor people?
   * (Jonathan Nieder) Make a small group; Jonathan can volunteer in it and
     Taylor is happy to help too. (4 people volunteered - jrn, nasamuffin,
     keanen, ttaylorr)
* (Terry) semver was brought up in the compatibility discussion
   * I'd recommend looking at the Eclipse project's practices. It's Java based
     and has a very clear language-based definition of what an API or ABI
     break is. But they also have a notion of "public" versus "public-internal" --
     public-internal interfaces can be things that are known to be unstable, and
     when you use them you know you'll be doing work to keep working on top of
     it. They also built a bunch of tools for checking when you break API/ABI.
     This was very successful.
   * Teams at Google building a web service don't have to deal with nearly as
     much of that - you can roll forward and back quickly - but that's not the
     case for things running on people's desktops, where you need to take a
     more principled approach to API evolution.
* (Emily) library API stability (or lack thereof)
* (Minh) Regarding the SHA-256 migration - Git forges
   * First-mover topic - once one forge moves, others will have to scramble.
     Should we coordinate? (The repository-side switch is sketched after this
     item.)
   * (Patrick) Gitaly supports SHA-256; unofficially, it already works for
     importing code into GitLab. But we need to adapt a lot of the frontend to
     support it.
   * (Taylor) GitHub is in an earlier state but is also interested in picking
     this stuff up.
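   * For context, a minimal sketch of that repository-side switch (the
     --object-format option has existed, marked experimental, since Git 2.29;
     interop with SHA-1 peers is the part that still needs work):

         # Create a repository whose object names use SHA-256.
         git init --object-format=sha256 repo

         # Confirm the object format in use; this reports "sha256".
         git -C repo rev-parse --show-object-format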
* (Emily) Backward compatibility discussion - Library API Stability
   * We have put off version-over-version API guarantees
   * From talking with the LLVM team at Google, we learned that the LLVM
     project adopts a similar attitude towards API backward compatibility: you
     should be an active contributor if you don't want the API to break under
     you.
   * (Jonathan Nieder) Maintaining C++ compatibility is hard, and a fully
     expressive API in C isn't easy, so there's a nice dividing line there. In
     Git it's all C; we'd want annotations/signals where people can
     distinguish between "#include this header for a stable API" and "#include
     this other header for a use-at-your-own-risk shifting foundation".
   * (Terry) LLVM is for static analysis, but the Git project should probably
     provide a higher level of API guarantee, as these two projects are at
     different levels.
   * (Jeff Hostetler) Is there a roadmap with milestones around things like
     "at this point, you can work with multiple independent object database
     instances"?
      * (Emily) Yes, that's part of the holy grail of what we're trying to
        accomplish, and it's needed for submodules.
   * (Pono) Licensing? We're okay with the current license; it's not a concern
     for Google, but it's a concern for other people using it.
      * (Jonathan) License is part of the interface; as soon as we have
        potential callers for whom GPL is not suitable, this conversation will
        be easier. "Shall we relicense this subsystem to support this caller?"

^ permalink raw reply	[flat|nested] 14+ messages in thread
