From: Junio C Hamano <gitster@pobox.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: git@vger.kernel.org, John Cai <johncai86@gmail.com>,
Patrick Steinhardt <ps@pks.im>, Taylor Blau <me@ttaylorr.com>,
Eric Sunshine <sunshine@sunshineco.com>,
Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [PATCH v3 5/5] doc: add technical design doc for large object promisors
Date: Tue, 10 Dec 2024 20:43:03 +0900 [thread overview]
Message-ID: <xmqqjzc7lq60.fsf@gitster.g> (raw)
In-Reply-To: <20241206124248.160494-6-christian.couder@gmail.com> (Christian Couder's message of "Fri, 6 Dec 2024 13:42:48 +0100")
Christian Couder <christian.couder@gmail.com> writes:
> +We will call a "Large Object Promisor", or "LOP" in short, a promisor
> +remote which is used to store only large blobs and which is separate
> +from the main remote that should store the other Git objects and the
> +rest of the repos.
> +
> +By extension, we will also call "Large Object Promisor", or LOP, the
> +effort described in this document to add a set of features to make it
> +easier to handle large blobs/files in Git by using LOPs.
> +
> +This effort would especially improve things on the server side, and
> +especially for large blobs that are already compressed in a binary
> +format.
The implementation on the server side can be hidden and be improved
as long as we have a reasonable wire protocol. As it stands, even
with the promisor-remote referral extension, the data coming from
LOP still is expected to be a pack stream, which I am not sure is a
good match. Is the expectation (yes, I know the document later says
it won't go into storage layer, but still, in order to get the
details of the protocol extension right, we MUST have some idea on
the characteristics the storage layer has so that the protocol would
work well with the storage implementation with such characteristics)
that we give up on deltifying these LOP objects (which might be a
sensible assumption, if they are incompressible large binary gunk),
we store each object in LOP as base representation inside a pack
stream (i.e. the in-pack "undeltified representation" defined in
Documentation/gitformat-pack.txt), so that to send these LOP objects
is just the matter of preparing the pack header (PACK + version +
numobjects) and then concatenating these objects while computing the
running checksum to place in the trailer of the pack stream? Could
it still be too expensive for the server side, having to compute the
running sum, and we might want to update the object transfer part of
the pack stream definition somehow to reduce the load on the server
side?
> +- We will not discuss those client side improvements here, as they
> + would require changes in different parts of Git than this effort.
> ++
> +So we don't pretend to fully replace Git LFS with only this effort,
> +but we nevertheless believe that it can significantly improve the
> +current situation on the server side, and that other separate
> +efforts could also improve the situation on the client side.
We still need to come up with a minimally working client side
components, if our goal were to only improve the server side, in
order to demonstrate the benefit of the effort.
> +In other words, the goal of this document is not to talk about all the
> +possible ways to optimize how Git could handle large blobs, but to
> +describe how a LOP based solution could work well and alleviate a
> +number of current issues in the context of Git clients and servers
> +sharing Git objects.
But if you do not discuss even a single way, and handwave "we'll
have this magical object storage that would solve all the problems
for us", then we cannot really tell if the problem is solved by us,
or by handwaved away by assuming the magical object storage. We'd
need at least one working example.
> +6) A protocol negotiation should happen when a client clones
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When a client clones from a main repo, there should be a protocol
> +negotiation so that the server can advertise one or more LOPs and so
> +that the client and the server can discuss if the client could
> +directly use a LOP the server is advertising. If the client and the
> +server can agree on that, then the client would be able to get the
> +large blobs directly from the LOP and the server would not need to
> +fetch those blobs from the LOP to be able to serve the client.
> +
> +Note
> +++++
> +
> +For fetches instead of clones, see the "What about fetches?" FAQ entry
> +below.
> +
> +Rationale
> ++++++++++
> +
> +Security, configurability and efficiency of setting things up.
It is unclear how it improves security and configurability if we
limit the protocol exchange only at the clone time (implying that
later either side cannot change it). It will lead to security
issues if we assume that it is impossible for one side to "lie" to
the other side what they earlier agreed on (unless we somehow make
it actually impossible to lie to the other side, of course).
> +7) A client can offload to a LOP
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When a client is using a LOP that is also a LOP of its main remote,
> +the client should be able to offload some large blobs it has fetched,
> +but might not need anymore, to the LOP.
For a client that _creates_ a large object, the situation would be
the same, right? After it creates several versions of the opening
segment of, say, a movie, the latest version may be still wanted,
but the creating client may want to offload earlier versions.
next prev parent reply other threads:[~2024-12-10 11:43 UTC|newest]
Thread overview: 110+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-31 13:40 [PATCH 0/4] Introduce a "promisor-remote" capability Christian Couder
2024-07-31 13:40 ` [PATCH 1/4] version: refactor strbuf_sanitize() Christian Couder
2024-07-31 17:18 ` Junio C Hamano
2024-08-20 11:29 ` Christian Couder
2024-07-31 13:40 ` [PATCH 2/4] strbuf: refactor strbuf_trim_trailing_ch() Christian Couder
2024-07-31 17:29 ` Junio C Hamano
2024-07-31 21:49 ` Taylor Blau
2024-08-20 11:29 ` Christian Couder
2024-08-20 11:29 ` Christian Couder
2024-07-31 13:40 ` [PATCH 3/4] Add 'promisor-remote' capability to protocol v2 Christian Couder
2024-07-31 15:40 ` Taylor Blau
2024-08-20 11:32 ` Christian Couder
2024-08-20 17:01 ` Junio C Hamano
2024-09-10 16:32 ` Christian Couder
2024-07-31 16:16 ` Taylor Blau
2024-08-20 11:32 ` Christian Couder
2024-08-20 16:55 ` Junio C Hamano
2024-09-10 16:32 ` Christian Couder
2024-09-10 17:46 ` Junio C Hamano
2024-07-31 18:25 ` Junio C Hamano
2024-07-31 19:34 ` Junio C Hamano
2024-08-20 12:21 ` Christian Couder
2024-08-05 13:48 ` Patrick Steinhardt
2024-08-19 20:00 ` Junio C Hamano
2024-09-10 16:31 ` Christian Couder
2024-07-31 13:40 ` [PATCH 4/4] promisor-remote: check advertised name or URL Christian Couder
2024-07-31 18:35 ` Junio C Hamano
2024-09-10 16:32 ` Christian Couder
2024-07-31 16:01 ` [PATCH 0/4] Introduce a "promisor-remote" capability Junio C Hamano
2024-07-31 16:17 ` Taylor Blau
2024-09-10 16:29 ` [PATCH v2 " Christian Couder
2024-09-10 16:29 ` [PATCH v2 1/4] version: refactor strbuf_sanitize() Christian Couder
2024-09-10 16:29 ` [PATCH v2 2/4] strbuf: refactor strbuf_trim_trailing_ch() Christian Couder
2024-09-10 16:29 ` [PATCH v2 3/4] Add 'promisor-remote' capability to protocol v2 Christian Couder
2024-09-30 7:56 ` Patrick Steinhardt
2024-09-30 13:28 ` Christian Couder
2024-10-01 10:14 ` Patrick Steinhardt
2024-10-01 18:47 ` Junio C Hamano
2024-11-06 14:04 ` Patrick Steinhardt
2024-11-28 5:47 ` Junio C Hamano
2024-11-28 15:31 ` Christian Couder
2024-11-29 1:31 ` Junio C Hamano
2024-09-10 16:30 ` [PATCH v2 4/4] promisor-remote: check advertised name or URL Christian Couder
2024-09-30 7:57 ` Patrick Steinhardt
2024-09-26 18:09 ` [PATCH v2 0/4] Introduce a "promisor-remote" capability Junio C Hamano
2024-09-27 9:15 ` Christian Couder
2024-09-27 22:48 ` Junio C Hamano
2024-09-27 23:31 ` rsbecker
2024-09-28 10:56 ` Kristoffer Haugsbakk
2024-09-30 7:57 ` Patrick Steinhardt
2024-09-30 9:17 ` Christian Couder
2024-09-30 16:52 ` Junio C Hamano
2024-10-01 10:14 ` Patrick Steinhardt
2024-09-30 16:34 ` Junio C Hamano
2024-09-30 21:26 ` brian m. carlson
2024-09-30 22:27 ` Junio C Hamano
2024-10-01 10:13 ` Patrick Steinhardt
2024-12-06 12:42 ` [PATCH v3 0/5] " Christian Couder
2024-12-06 12:42 ` [PATCH v3 1/5] version: refactor strbuf_sanitize() Christian Couder
2024-12-07 6:21 ` Junio C Hamano
2025-01-27 15:07 ` Christian Couder
2024-12-06 12:42 ` [PATCH v3 2/5] strbuf: refactor strbuf_trim_trailing_ch() Christian Couder
2024-12-07 6:35 ` Junio C Hamano
2025-01-27 15:07 ` Christian Couder
2024-12-16 11:47 ` karthik nayak
2024-12-06 12:42 ` [PATCH v3 3/5] Add 'promisor-remote' capability to protocol v2 Christian Couder
2024-12-07 7:59 ` Junio C Hamano
2025-01-27 15:08 ` Christian Couder
2024-12-06 12:42 ` [PATCH v3 4/5] promisor-remote: check advertised name or URL Christian Couder
2024-12-06 12:42 ` [PATCH v3 5/5] doc: add technical design doc for large object promisors Christian Couder
2024-12-10 1:28 ` Junio C Hamano
2025-01-27 15:12 ` Christian Couder
2024-12-10 11:43 ` Junio C Hamano [this message]
2024-12-16 9:00 ` Patrick Steinhardt
2025-01-27 15:11 ` Christian Couder
2025-01-27 18:02 ` Junio C Hamano
2025-02-18 11:42 ` Christian Couder
2024-12-09 8:04 ` [PATCH v3 0/5] Introduce a "promisor-remote" capability Junio C Hamano
2024-12-09 10:40 ` Christian Couder
2024-12-09 10:42 ` Christian Couder
2024-12-09 23:01 ` Junio C Hamano
2025-01-27 15:05 ` Christian Couder
2025-01-27 19:38 ` Junio C Hamano
2025-01-27 15:16 ` [PATCH v4 0/6] " Christian Couder
2025-01-27 15:16 ` [PATCH v4 1/6] version: replace manual ASCII checks with isprint() for clarity Christian Couder
2025-01-27 15:16 ` [PATCH v4 2/6] version: refactor redact_non_printables() Christian Couder
2025-01-27 15:16 ` [PATCH v4 3/6] version: make redact_non_printables() non-static Christian Couder
2025-01-30 10:51 ` Patrick Steinhardt
2025-02-18 11:42 ` Christian Couder
2025-01-27 15:16 ` [PATCH v4 4/6] Add 'promisor-remote' capability to protocol v2 Christian Couder
2025-01-30 10:51 ` Patrick Steinhardt
2025-02-18 11:41 ` Christian Couder
2025-01-27 15:17 ` [PATCH v4 5/6] promisor-remote: check advertised name or URL Christian Couder
2025-01-27 23:48 ` Junio C Hamano
2025-01-28 0:01 ` Junio C Hamano
2025-01-30 10:51 ` Patrick Steinhardt
2025-02-18 11:41 ` Christian Couder
2025-02-18 11:42 ` Christian Couder
2025-01-27 15:17 ` [PATCH v4 6/6] doc: add technical design doc for large object promisors Christian Couder
2025-01-27 21:14 ` [PATCH v4 0/6] Introduce a "promisor-remote" capability Junio C Hamano
2025-02-18 11:40 ` Christian Couder
2025-02-18 11:32 ` [PATCH v5 0/3] " Christian Couder
2025-02-18 11:32 ` [PATCH v5 1/3] Add 'promisor-remote' capability to protocol v2 Christian Couder
2025-02-18 11:32 ` [PATCH v5 2/3] promisor-remote: check advertised name or URL Christian Couder
2025-02-18 11:32 ` [PATCH v5 3/3] doc: add technical design doc for large object promisors Christian Couder
2025-02-21 8:33 ` Patrick Steinhardt
2025-03-03 16:58 ` Junio C Hamano
2025-02-18 19:07 ` [PATCH v5 0/3] Introduce a "promisor-remote" capability Junio C Hamano
2025-02-21 8:34 ` Patrick Steinhardt
2025-02-21 18:40 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqjzc7lq60.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=chriscool@tuxfamily.org \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=johncai86@gmail.com \
--cc=me@ttaylorr.com \
--cc=ps@pks.im \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).