* [PATCH] doc: detail LMAP binary format specification
From: irsalshydiq @ 2026-04-16 5:55 UTC
To: git; +Cc: irsalshydiq
The experimental Rust implementation in 'loose.rs' introduces a new
binary format called 'LMAP' for mapping between different object ID
formats (e.g., SHA-1 and SHA-256).
However, the technical documentation for this format was missing from
the 'Documentation/technical/' directory, making it difficult for
C developers to understand the format's layout and logic.
Add 'Documentation/technical/loose-object-map.adoc' to provide a byte-level
specification of the 'LMAP' format and register it in both build systems.
Signed-off-by: irsalshydiq <ichalprov@gmail.com>
---
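For C developers reading the specification in this patch, the header
layout and the Shortened Index lookup can be sketched as below. This is
only an illustration under stated assumptions, not code from 'loose.rs'
or from Git's C sources: every name in it (lmap_header, read_be32,
lmap_parse_header, lmap_find_short) is hypothetical, and it assumes the
header size field is at least 20 + 16 * (number of formats) + 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; none are taken from loose.rs or Git. */
#define LMAP_FIXED_HEADER_LEN 20 /* five 4-byte fields */
#define LMAP_FORMAT_ENTRY_LEN 16 /* 4-byte ID, 4-byte length, 8-byte offset */

struct lmap_header {
	uint32_t version;
	uint32_t header_size;
	uint32_t nr_items;
	uint32_t nr_formats;
};

/* Read a 32-bit big-endian (network order) integer from a byte buffer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Parse the five fixed header fields. Returns 0 on success, or -1 if the
 * buffer is too short, the signature or version does not match, or the
 * sizes cannot hold the Object Format Table and the Trailer Offset.
 */
static int lmap_parse_header(const unsigned char *buf, size_t len,
			     struct lmap_header *out)
{
	uint64_t minimum;

	if (len < LMAP_FIXED_HEADER_LEN || memcmp(buf, "LMAP", 4))
		return -1;
	out->version = read_be32(buf + 4);
	out->header_size = read_be32(buf + 8);
	out->nr_items = read_be32(buf + 12);
	out->nr_formats = read_be32(buf + 16);
	if (out->version != 1 || out->nr_formats < 2)
		return -1;
	/* Assumption: fixed fields + one entry per format + trailer offset. */
	minimum = LMAP_FIXED_HEADER_LEN +
		  (uint64_t)out->nr_formats * LMAP_FORMAT_ENTRY_LEN + 8;
	if (out->header_size < minimum || len < out->header_size)
		return -1;
	return 0;
}

/*
 * Binary-search a Shortened Index holding nr entries of short_len bytes
 * each for the first short_len bytes of oid. Returns the matching
 * position, or -1 if no entry has that prefix.
 */
static int64_t lmap_find_short(const unsigned char *index, uint32_t nr,
			       uint32_t short_len, const unsigned char *oid)
{
	uint32_t lo = 0, hi = nr;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(oid, index + (size_t)mid * short_len,
				 short_len);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

A position returned by lmap_find_short() only says that the prefix
matched. A caller would presumably still compare the full object ID
against the Full OID Table at the same position, since an object ID that
is not in the file can share a shortened prefix with one that is.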
Documentation/Makefile | 1 +
Documentation/technical/loose-object-map.adoc | 72 +++++++++++++++++++
Documentation/technical/meson.build | 1 +
3 files changed, 74 insertions(+)
create mode 100644 Documentation/technical/loose-object-map.adoc
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2699f0b24a..8ad908d62c 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -126,6 +126,7 @@ TECH_DOCS += technical/directory-rename-detection
TECH_DOCS += technical/hash-function-transition
TECH_DOCS += technical/large-object-promisors
TECH_DOCS += technical/long-running-process-protocol
+TECH_DOCS += technical/loose-object-map
TECH_DOCS += technical/multi-pack-index
TECH_DOCS += technical/packfile-uri
TECH_DOCS += technical/pack-heuristics
diff --git a/Documentation/technical/loose-object-map.adoc b/Documentation/technical/loose-object-map.adoc
new file mode 100644
index 0000000000..6e8dbd6c6f
--- /dev/null
+++ b/Documentation/technical/loose-object-map.adoc
@@ -0,0 +1,72 @@
+Loose Object Map (LMAP) format
+==============================
+
+The loose object map file (LMAP) provides a way to map between different object
+ID formats (e.g., SHA-1 and SHA-256) for loose objects. It is designed for
+efficient lookup and storage of these mappings.
+
+All multi-byte integers are in network byte order (big-endian).
+
+== File Layout
+
+- A header (20 bytes)
+- An Object Format Table (16 bytes per format)
+- A Trailer Offset (8 bytes)
+- Data Sections (variable length, 4-byte aligned)
+- A Trailer (variable length, defined by hash algorithm)
+
+=== Header
+
+- 4-byte signature: `LMAP`
+- 4-byte version number: The current version is 1.
+- 4-byte header size: The total size in bytes of the header region, that is, these 20 bytes of fixed fields plus the Object Format Table and the Trailer Offset.
+- 4-byte number of items: The number of object IDs mapped in this file.
+- 4-byte number of object formats: The number of different hash algorithms supported in this file (minimum 2).
+
+=== Object Format Table
+
+For each object format (as specified in the header), there is a 16-byte entry:
+
+- 4-byte Format ID: The identifier for the hash algorithm (e.g., `0x73686131` for SHA-1, `0x73323536` for SHA-256).
+- 4-byte Shortened Length: The minimum number of bytes needed to unambiguously identify an object ID in this format within this file.
+- 8-byte Data Offset: The absolute offset from the beginning of the file to the start of the data section for this format.
+
+=== Trailer Offset
+
+- 8-byte Trailer Offset: The absolute offset from the beginning of the file to the start of the Trailer.
+
+=== Data Sections
+
+Each object format has a corresponding data section starting at the offset provided in the Object Format Table. Each data section is aligned to a 4-byte boundary.
+
+==== Format 1 (Storage Format) Data Section
+
+The first format listed is considered the "storage" or "main" format. Its data section contains:
+
+1. **Shortened Index**: `(number of items) * (shortened length)` bytes.
+ This table contains the first `shortened length` bytes of each object ID, sorted lexicographically. This allows for binary search lookup.
+
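+2. **Full OID Table**: `(number of items) * (hash length)` bytes, where hash length is the raw size of an object ID in this format (20 bytes for SHA-1, 32 bytes for SHA-256).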
+ The full object IDs for the storage format, in the same order as the Shortened Index.
+
+3. **Metadata Table**: `(number of items) * 4` bytes.
+ A table of 32-bit integers representing the type of each object:
+ - 0: Reserved (e.g., null OID, empty tree/blob)
+ - 1: Loose Object
+ - 2: Shallow Commit
+ - 3: Submodule Commit
+
+==== Subsequent Format (Compatibility) Data Section
+
+For each subsequent format, the data section contains:
+
+1. **Shortened Index**: Laid out like the storage format's Shortened Index, but built from the compatibility algorithm's OIDs and using this format's own shortened length.
+
+2. **Full OID Table**: The full object IDs in the compatibility algorithm.
+
+3. **Mapping Table**: `(number of items) * 4` bytes.
+ A table of 32-bit integers. The entry at index `i` is the index in the **storage format's** tables of the object whose compatibility object ID sits at index `i` of this section.
+
+=== Trailer
+
+- Variable length (the hash length of the storage format): The hash of all preceding bytes in the file, calculated using the storage ("main") hash algorithm.
diff --git a/Documentation/technical/meson.build b/Documentation/technical/meson.build
index ec07088c57..dc1249f9aa 100644
--- a/Documentation/technical/meson.build
+++ b/Documentation/technical/meson.build
@@ -16,6 +16,7 @@ articles = [
'large-object-promisors.adoc',
'long-running-process-protocol.adoc',
'multi-pack-index.adoc',
+ 'loose-object-map.adoc',
'packfile-uri.adoc',
'pack-heuristics.adoc',
'parallel-checkout.adoc',
--
2.50.1 (Apple Git-155)