From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: Christian Couder <christian.couder@gmail.com>,
git@vger.kernel.org, John Cai <johncai86@gmail.com>,
Taylor Blau <me@ttaylorr.com>,
Eric Sunshine <sunshine@sunshineco.com>,
Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [PATCH v3 5/5] doc: add technical design doc for large object promisors
Date: Mon, 16 Dec 2024 10:00:36 +0100
Message-ID: <Z1_sNJoYHVfVsn51@pks.im>
In-Reply-To: <xmqqjzc7lq60.fsf@gitster.g>
On Tue, Dec 10, 2024 at 08:43:03PM +0900, Junio C Hamano wrote:
> Christian Couder <christian.couder@gmail.com> writes:
> > +In other words, the goal of this document is not to talk about all the
> > +possible ways to optimize how Git could handle large blobs, but to
> > +describe how a LOP based solution could work well and alleviate a
> > +number of current issues in the context of Git clients and servers
> > +sharing Git objects.
>
> But if you do not discuss even a single way, and handwave "we'll
> have this magical object storage that would solve all the problems
> for us", then we cannot really tell if the problem is solved by us,
> or handwaved away by assuming the magical object storage. We'd
> need at least one working example.
It's something we're working on in parallel with the effort to slowly
move towards pluggable object databases. We aren't yet totally clear
on how exactly to store such objects, but there are a couple of ideas:
  - Store large objects verbatim in a separate path without any kind of
    compression at all. It solves the problem of wasting compute time
    during compression, but does not solve the problem of having to
    store blobs multiple times even if only a tiny part of them changes.

  - Use a rolling hash function to split up large objects into smaller
    hunks that can be deduplicated. This solves the issue of only small
    parts of the binary file changing as we'd only have to store the
    hunk that has changed.
This has been discussed e.g. in [1], and I've been talking with some
people about rolling hash functions.
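
As a rough illustration of the second idea, here is a minimal Python
sketch of content-defined chunking with a gear-style rolling hash.
Nothing in it is Git code or a proposed on-disk format: the names
split_chunks() and store(), the chunk size limits, the seeded gear
table and the use of SHA-256 as a chunk ID are all assumptions made up
for this example.

```python
import hashlib
import random

# Hypothetical parameters, chosen for illustration only.
MIN_CHUNK = 256                      # never cut a chunk shorter than this
MAX_CHUNK = 8192                     # always cut once a chunk gets this long
BOUNDARY_MASK = ((1 << 10) - 1) << 54  # top 10 bits: ~1 cut per 1024 positions
MASK64 = (1 << 64) - 1

# Fixed pseudo-random table so that chunking is deterministic across runs.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def split_chunks(data: bytes):
    """Yield content-defined chunks of data using a gear rolling hash."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shifting left drops bytes older than 64 positions out of the hash,
        # so the cut decision depends only on nearby content.
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        if (h & BOUNDARY_MASK) == 0 or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]


def store(data: bytes, chunk_store: dict) -> list:
    """Store data as chunks keyed by SHA-256 and return the chunk IDs."""
    ids = []
    for chunk in split_chunks(data):
        cid = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(cid, chunk)  # identical chunks are stored once
        ids.append(cid)
    return ids


if __name__ == "__main__":
    rng = random.Random(1)
    blob = bytes(rng.getrandbits(8) for _ in range(200_000))
    edited = blob[:100_000] + b"PATCH" + blob[100_000:]
    chunk_store = {}
    old_ids = store(blob, chunk_store)
    new_ids = store(edited, chunk_store)
    added = set(new_ids) - set(old_ids)
    print(f"{len(old_ids)} chunks originally, {len(added)} new after the edit")
```

Because cut points depend on nearby content rather than on absolute
offsets, inserting a few bytes in the middle of a blob should
typically change only the chunks around the edit. The boundaries
resynchronize afterwards, so the remaining chunks deduplicate against
the ones already stored.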
In any case, getting to pluggable ODBs is likely a multi-year effort, so
I wonder how detailed we should be in the context of the document here.
We might want to mention that there are ideas and maybe even provide
some pointers, but I think it makes sense to defer the technical
discussion of what exactly this could look like to the future. Mostly
because I think it's going to be a rather big discussion on its own.
Patrick
[1]: https://lore.kernel.org/git/xmqqbkdometi.fsf@gitster.g/