* [PATCH] doc: detail LMAP binary format specification
@ 2026-04-16 5:55 irsalshydiq
From: irsalshydiq @ 2026-04-16 5:55 UTC
To: git; +Cc: irsalshydiq

The experimental Rust implementation in 'loose.rs' introduces a new
binary format called 'LMAP' for mapping between different object ID
formats (e.g., SHA-1 and SHA-256).

However, this format is not documented in 'Documentation/technical/',
which makes it difficult for C developers to understand its layout
and the logic behind it.

Add 'Documentation/technical/loose-object-map.adoc' with a byte-level
specification of the 'LMAP' format, and register the new document in
both build systems (Makefile and Meson).

Signed-off-by: irsalshydiq <ichalprov@gmail.com>
---
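For reviewers, here is a minimal C sketch of reading the fixed header, the Object Format Table and the Trailer Offset described in loose-object-map.adoc. It assumes the whole file is already in memory. It is not derived from loose.rs. All names (`struct lmap_header`, `struct lmap_format`, `lmap_be32()`, `lmap_be64()`, `lmap_parse_header()`) are hypothetical. The sanity checks are assumptions drawn from the layout text, not documented requirements: data sections 4-byte aligned, placed after the header and before the trailer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory forms of the on-disk fields (not Git's API). */
struct lmap_format {
	uint32_t format_id;   /* e.g. 0x73686131 ("sha1") */
	uint32_t short_len;   /* Shortened Length */
	uint64_t data_offset; /* absolute offset of this format's data section */
};

struct lmap_header {
	uint32_t version;
	uint32_t header_size;
	uint32_t nr_items;
	uint32_t nr_formats;
	uint64_t trailer_offset;
};

/* Big-endian (network byte order) readers. */
static uint32_t lmap_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t lmap_be64(const unsigned char *p)
{
	return ((uint64_t)lmap_be32(p) << 32) | lmap_be32(p + 4);
}

/*
 * Parse the 20-byte header, the Object Format Table and the Trailer
 * Offset from a file held entirely in buf. Returns 0 on success, -1 if
 * the data looks malformed. The offset checks are assumptions.
 */
static int lmap_parse_header(const unsigned char *buf, size_t len,
			     struct lmap_header *hdr,
			     struct lmap_format *formats, uint32_t max_formats)
{
	size_t table_end;
	uint32_t i;

	/* Fixed header: signature, version, header size, items, formats. */
	if (len < 20 || memcmp(buf, "LMAP", 4))
		return -1;
	hdr->version = lmap_be32(buf + 4);
	hdr->header_size = lmap_be32(buf + 8);
	hdr->nr_items = lmap_be32(buf + 12);
	hdr->nr_formats = lmap_be32(buf + 16);
	if (hdr->version != 1 || hdr->nr_formats < 2 ||
	    hdr->nr_formats > max_formats)
		return -1;

	/* 16 bytes per format, then the 8-byte Trailer Offset. */
	table_end = 20 + (size_t)hdr->nr_formats * 16;
	if (len < table_end + 8 || hdr->header_size < table_end + 8)
		return -1;
	hdr->trailer_offset = lmap_be64(buf + table_end);
	if (hdr->trailer_offset > len)
		return -1;

	for (i = 0; i < hdr->nr_formats; i++) {
		const unsigned char *e = buf + 20 + (size_t)i * 16;

		formats[i].format_id = lmap_be32(e);
		formats[i].short_len = lmap_be32(e + 4);
		formats[i].data_offset = lmap_be64(e + 8);
		/* Assumed: aligned, after the header, not past the trailer. */
		if (formats[i].data_offset % 4 ||
		    formats[i].data_offset < hdr->header_size ||
		    formats[i].data_offset > hdr->trailer_offset)
			return -1;
	}
	return 0;
}
```

A reader would then combine each entry's Data Offset and Shortened Length with the number of items to locate the tables inside that format's data section.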
Documentation/Makefile | 1 +
Documentation/technical/loose-object-map.adoc | 72 +++++++++++++++++++
Documentation/technical/meson.build | 1 +
3 files changed, 74 insertions(+)
create mode 100644 Documentation/technical/loose-object-map.adoc
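For reviewers, here is a minimal C sketch of using the LMAP data sections to translate a compatibility object ID into its storage-format object ID. It binary-searches the compatibility Shortened Index, confirms the hit against the Full OID Table, then follows the Mapping Table entry into the storage format's Full OID Table. It assumes three things, reading the layout text rather than loose.rs:
- the compatibility Shortened Index is sorted like the storage one;
- both Full OID Tables are in the same order as their Shortened Indexes;
- the three tables of a section are packed back to back with no padding.

`struct lmap_section`, `lmap_find()`, `lmap_u32()` and `lmap_compat_to_storage()` are illustrative names, not Git's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical view of one format's data section (not Git's API).
 * Assumes the three tables are packed back to back with no padding.
 */
struct lmap_section {
	const unsigned char *data; /* file base + this format's Data Offset */
	uint32_t nr;               /* number of items from the header */
	uint32_t short_len;        /* Shortened Length from the format table */
	uint32_t hash_len;         /* digest size: 20 (SHA-1) or 32 (SHA-256) */
};

/* Entry i of the Shortened Index. */
static const unsigned char *lmap_short(const struct lmap_section *s, uint32_t i)
{
	return s->data + (size_t)i * s->short_len;
}

/* Entry i of the Full OID Table, which follows the Shortened Index. */
static const unsigned char *lmap_full(const struct lmap_section *s, uint32_t i)
{
	return s->data + (size_t)s->nr * s->short_len + (size_t)i * s->hash_len;
}

/* Entry i of the trailing 32-bit table: Metadata (storage) or Mapping (compat). */
static uint32_t lmap_u32(const struct lmap_section *s, uint32_t i)
{
	const unsigned char *p = s->data +
		(size_t)s->nr * ((size_t)s->short_len + s->hash_len) + (size_t)i * 4;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Binary search for a full object ID; returns its index or -1. */
static int64_t lmap_find(const struct lmap_section *s, const unsigned char *oid)
{
	uint32_t lo = 0, hi = s->nr;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(oid, lmap_short(s, mid), s->short_len);

		if (!cmp) /* prefixes are unique, so this is the only candidate */
			return memcmp(oid, lmap_full(s, mid), s->hash_len) ?
			       -1 : (int64_t)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}

/* Map a compatibility object ID to its storage object ID, or NULL. */
static const unsigned char *lmap_compat_to_storage(
		const struct lmap_section *storage,
		const struct lmap_section *compat,
		const unsigned char *compat_oid)
{
	int64_t i = lmap_find(compat, compat_oid);
	uint32_t j;

	if (i < 0)
		return NULL;
	j = lmap_u32(compat, (uint32_t)i); /* Mapping Table entry */
	if (j >= storage->nr)
		return NULL; /* index out of range: treat as corrupt */
	return lmap_full(storage, j);
}
```

Applying the same `lmap_find()` to the storage section, followed by `lmap_u32()`, would return an object's Metadata Table type (for example 1 for a loose object).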
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2699f0b24a..8ad908d62c 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -126,6 +126,7 @@ TECH_DOCS += technical/directory-rename-detection
TECH_DOCS += technical/hash-function-transition
TECH_DOCS += technical/large-object-promisors
TECH_DOCS += technical/long-running-process-protocol
+TECH_DOCS += technical/loose-object-map
TECH_DOCS += technical/multi-pack-index
TECH_DOCS += technical/packfile-uri
TECH_DOCS += technical/pack-heuristics
diff --git a/Documentation/technical/loose-object-map.adoc b/Documentation/technical/loose-object-map.adoc
new file mode 100644
index 0000000000..6e8dbd6c6f
--- /dev/null
+++ b/Documentation/technical/loose-object-map.adoc
@@ -0,0 +1,72 @@
+Loose Object Map (LMAP) format
+==============================
+
+The loose object map file (LMAP) records the correspondence between object
+IDs in different formats (e.g., SHA-1 and SHA-256) for loose objects. Its
+sorted index tables allow an object ID to be looked up by binary search.
+
+All multi-byte integers are in network byte order (big-endian).
+
+== File Layout
+
+- A fixed header (20 bytes)
+- An Object Format Table (16 bytes per format)
+- A Trailer Offset (8 bytes)
+- Data Sections (variable length, 4-byte aligned)
+- A Trailer (variable length, defined by hash algorithm)
+
+=== Header
+
+- 4-byte signature: `LMAP`
+- 4-byte version number: The current version is 1.
+- 4-byte header size: The size in bytes of the whole header region, i.e. this 20-byte header plus the Object Format Table and the Trailer Offset.
+- 4-byte number of items: The number of object IDs mapped in this file.
+- 4-byte number of object formats: The number of different hash algorithms supported in this file (minimum 2).
+
+=== Object Format Table
+
+For each object format counted in the header, there is one 16-byte entry:
+
+- 4-byte Format ID: The identifier for the hash algorithm (e.g., `0x73686131` for SHA-1 and `0x73323536` for SHA-256, which spell `sha1` and `s256` in ASCII).
+- 4-byte Shortened Length: The minimum number of bytes needed to unambiguously identify an object ID in this format within this file.
+- 8-byte Data Offset: The absolute offset from the beginning of the file to the start of the data section for this format.
+
+=== Trailer Offset
+
+- 8-byte Trailer Offset: The absolute offset from the beginning of the file to the start of the Trailer.
+
+=== Data Sections
+
+Each object format has its own data section, which starts at the Data Offset given in that format's Object Format Table entry and is aligned to a 4-byte boundary.
+
+==== Format 1 (Storage Format) Data Section
+
+The first format listed is considered the "storage" or "main" format. Its data section contains:
+
+1. **Shortened Index**: `(number of items) * (shortened length)` bytes.
+ This table contains the first `shortened length` bytes of each object ID, sorted in bytewise lexicographic order so that lookups can use binary search.
+
+2. **Full OID Table**: `(number of items) * (hash length)` bytes.
+ The full object IDs in the storage format (20 bytes each for SHA-1, 32 bytes each for SHA-256), in the same order as the Shortened Index.
+
+3. **Metadata Table**: `(number of items) * 4` bytes.
+ A table of 32-bit integers representing the type of each object:
+ - 0: Reserved (e.g., null OID, empty tree/blob)
+ - 1: Loose Object
+ - 2: Shallow Commit
+ - 3: Submodule Commit
+
+==== Subsequent Format (Compatibility) Data Section
+
+For each subsequent format, the data section contains:
+
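+1. **Shortened Index**: `(number of items) * (shortened length)` bytes, laid out as in the storage format but holding prefixes of the compatibility algorithm's object IDs.
+
+2. **Full OID Table**: `(number of items) * (hash length)` bytes holding the full object IDs in the compatibility algorithm, in the same order as this format's Shortened Index.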
+
+3. **Mapping Table**: `(number of items) * 4` bytes.
+ A table of 32-bit integers. The entry at index `i` gives the index, in the **storage format's** tables, of the object that the `i`-th compatibility object ID maps to.
+
+=== Trailer
+
+- The hash of all preceding bytes in the file, computed with the main (storage) hash algorithm; its length is that algorithm's digest size (20 bytes for SHA-1, 32 bytes for SHA-256).
diff --git a/Documentation/technical/meson.build b/Documentation/technical/meson.build
index ec07088c57..dc1249f9aa 100644
--- a/Documentation/technical/meson.build
+++ b/Documentation/technical/meson.build
@@ -16,6 +16,7 @@ articles = [
'large-object-promisors.adoc',
'long-running-process-protocol.adoc',
+ 'loose-object-map.adoc',
'multi-pack-index.adoc',
'packfile-uri.adoc',
'pack-heuristics.adoc',
'parallel-checkout.adoc',
--
2.50.1 (Apple Git-155)