From: "Marco Costalba" <mcostalba@gmail.com>
To: "Nicolas Pitre" <nico@cam.org>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
"Peter Eriksen" <s022018@student.dtu.dk>,
git@vger.kernel.org
Subject: Re: Understanding version 4 packs
Date: Mon, 26 Mar 2007 19:10:05 +0200
Message-ID: <e5bfff550703261010u67aa1207j1c6f0200bb7744a@mail.gmail.com>
In-Reply-To: <alpine.LFD.0.83.0703261015110.3041@xanadu.home>
On 3/26/07, Nicolas Pitre <nico@cam.org> wrote:
> On Mon, 26 Mar 2007, Marco Costalba wrote:
>
> > Experimenting with file names cache in qgit I have found a big saving
> > splitting the paths in base name and file name and indexing both:
> >
> > drivers\usb\host\ehci.h
> > drivers\usb\host\ehci-pci.c
> > drivers\usb\host\ohci-pci.c
> > kernel\sched.c
> >
> > became:
> >
> > dir names table
> >
> > 0 drivers\usb\host
> > 1 kernel
> >
> >
> > file name table
> >
> > 0 ehci.h
> > 1 ehci-pci.c
> > 2 ohci-pci.c
> >
> > In this way a big saving is achieved in case of directories deep in
> > the tree (long paths) and a lot of files.
>
> Sure, but if you also consider drivers/usb/Makefile and drivers/Kconfig
> for example then you start losing on space saving.
In your example you'd have:

drivers/usb/Makefile
drivers/Kconfig

which becomes:

dir names table

0 drivers
1 drivers/usb

file name table

0 Makefile
1 Kconfig

I fail to see where the loss in space saving is. Moreover, you probably
have many paths under both 'drivers' and 'drivers/usb', and for each
added path the prefix ('drivers' or 'drivers/usb') would not need to be
stored again.
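
The split described above can be sketched in a few lines of Python. This
is only an illustration: split_paths() is a hypothetical helper, not
anything from qgit or git, and the third path is added here just to show
both tables being reused. Indexes come out in order of first appearance,
so they differ from the numbering in the tables above.

```python
import posixpath

def split_paths(paths):
    """Index directory names and file names in two separate tables."""
    dir_table, file_table = [], []
    dir_index, file_index = {}, {}
    entries = []
    for path in paths:
        dirname, basename = posixpath.split(path)
        if dirname not in dir_index:
            dir_index[dirname] = len(dir_table)
            dir_table.append(dirname)
        if basename not in file_index:
            file_index[basename] = len(file_table)
            file_table.append(basename)
        # Each path is now a pair (dir index, file index).
        entries.append((dir_index[dirname], file_index[basename]))
    return dir_table, file_table, entries

paths = [
    "drivers/usb/Makefile",
    "drivers/Kconfig",
    "drivers/usb/Kconfig",  # extra path (assumption): adds no new table rows
]
dirs, files, entries = split_paths(paths)
print(dirs)     # ['drivers/usb', 'drivers']
print(files)    # ['Makefile', 'Kconfig']
print(entries)  # [(0, 0), (1, 1), (0, 1)]
```

The third path costs only one more (dir, file) pair; neither table
grows.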
To clarify, OBJ_DICT_TREE data *currently* looks like:

+------------+-------+-------+-------+-------+----
| NR_ENTRIES | name1 | hash1 | name2 | hash2 | ...
+------------+-------+-------+-------+-------+----
     vint     2 bytes 4 bytes 2 bytes 4 bytes

where name1 is an index into the packfile's sole EXTOBJ_FILENAME_TABLE.
The possible improvement is to define OBJ_DICT_TREE as:

+------------+-------+-------+-------+-------+-------+-------+----
| NR_ENTRIES | dir1  | file1 | hash1 | dir2  | file2 | hash2 | ...
+------------+-------+-------+-------+-------+-------+-------+----
     vint     2 bytes 2 bytes 4 bytes 2 bytes 2 bytes 4 bytes

where dir1 is an index into a new EXTOBJ_DIRNAME_TABLE and file1 is an
index into a new EXTOBJ_FILENAME_TABLE.
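
To make the per-entry cost of the two layouts concrete, here is a rough
sketch using Python's struct module. The function names are made up for
illustration, big-endian byte order is an assumption, and the NR_ENTRIES
vint is left out; the field widths (2-byte indexes, 4-byte hash field)
are the ones discussed in this mail.

```python
import struct

def encode_current(entries):
    """Current layout: (name index, hash field) = 2 + 4 bytes per entry."""
    return b"".join(struct.pack(">HI", name, h) for name, h in entries)

def encode_split(entries):
    """Split layout: (dir index, file index, hash field) = 2 + 2 + 4 bytes."""
    return b"".join(struct.pack(">HHI", d, f, h) for d, f, h in entries)

# Two tree entries in each layout; the index and hash values are arbitrary.
current = encode_current([(0, 100), (1, 101)])
split = encode_split([(0, 0, 100), (1, 1, 101)])
print(len(current), len(split))  # 12 16
```

So the split layout pays 2 extra bytes per tree entry, which has to be
won back in the name tables.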
EXTOBJ_FILENAME_TABLE is defined as it is currently (but is much smaller
in size!!) and keeps only the file names, not the full paths, while
EXTOBJ_DIRNAME_TABLE is defined like EXTOBJ_FILENAME_TABLE but without
the MODE field (associated with files only) and is used to store the dir
names.

Decoupling dir names from file names could save space because the
combined length of the proposed EXTOBJ_FILENAME_TABLE and
EXTOBJ_DIRNAME_TABLE is less than that of the current
EXTOBJ_FILENAME_TABLE.
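
A quick way to check that size claim on a set of paths is sketched
below. It assumes each distinct name is stored once as a NUL-terminated
string and ignores the MODE field; table_bytes() and compare() are
hypothetical helpers, and the real table encoding may well differ.

```python
import posixpath

def table_bytes(names):
    # Assumption: every distinct name is stored once, NUL-terminated.
    return sum(len(n.encode()) + 1 for n in set(names))

def compare(paths):
    """Return (bytes for one full-path table, bytes for dir + file tables)."""
    full = table_bytes(paths)
    dirs = table_bytes(posixpath.dirname(p) for p in paths)
    files = table_bytes(posixpath.basename(p) for p in paths)
    return full, dirs + files

paths = [
    "drivers/usb/host/ehci.h",
    "drivers/usb/host/ehci-pci.c",
    "drivers/usb/host/ohci-pci.c",
    "kernel/sched.c",
]
print(compare(paths))  # (95, 61)
```

This counts only the name tables; it does not include the extra index
stored in each OBJ_DICT_TREE entry.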
Marco
P.S.: Of course now you'd store 2+2 bytes in OBJ_DICT_TREE instead of 2
for the 'name' index.

To avoid this while keeping the idea of decoupling dir and file names and
still using 2 bytes in OBJ_DICT_TREE, a possible layout of
EXTOBJ_FILENAME_TABLE could be:

+------------+------+------------+------+------------+------+------+------------+------+----
| NR_ENTRIES | dirA | file name1 | ofs1 | file name2 | ofs2 | dirB | file name3 | ofs3 | ...
+------------+------+------------+------+------------+------+------+------------+------+----

where ofs1 and ofs2 are 2-byte values pointing to dirA, ofs3 points
to dirB, and so on.

The tree layout of the above example is:

dirA \ file name1
dirA \ file name2
dirB \ file name3

With this approach you get both the saving for directories with many
files and still 2 bytes per 'name' index in OBJ_DICT_TREE (which points
to the 'file name' field). This approach saves space as soon as
directory names are longer than 2 chars.
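
The interleaved layout could be prototyped roughly as follows. Several
details here are assumptions for the sake of a runnable sketch:
NUL-terminated names, big-endian 2-byte offsets measured from the start
of the table, the 'name' index being a byte offset to the file name
field, and NR_ENTRIES omitted. build_table() and resolve() are
hypothetical names.

```python
import struct

def build_table(groups):
    """groups: list of (dirname, [filenames]). Returns (blob, {path: offset})."""
    blob = bytearray()
    name_ofs = {}
    for dirname, filenames in groups:
        dir_ofs = len(blob)                # where this dir name starts
        blob += dirname.encode() + b"\0"
        for fname in filenames:
            # The offset a tree entry would store for this file.
            name_ofs[dirname + "/" + fname] = len(blob)
            # File name, then a 2-byte back pointer to its directory.
            blob += fname.encode() + b"\0" + struct.pack(">H", dir_ofs)
    return bytes(blob), name_ofs

def read_cstr(blob, ofs):
    """Read a NUL-terminated string; return it and the offset just past it."""
    end = blob.index(b"\0", ofs)
    return blob[ofs:end].decode(), end + 1

def resolve(blob, ofs):
    """Rebuild the full path from the offset of a 'file name' field."""
    fname, pos = read_cstr(blob, ofs)
    (dir_ofs,) = struct.unpack_from(">H", blob, pos)
    dirname, _ = read_cstr(blob, dir_ofs)
    return dirname + "/" + fname

blob, name_ofs = build_table([
    ("drivers/usb/host", ["ehci.h", "ehci-pci.c"]),
    ("kernel", ["sched.c"]),
])
for path, ofs in name_ofs.items():
    assert resolve(blob, ofs) == path
```

Each file pays 2 bytes for its back pointer instead of repeating the
directory name. With 2-byte offsets, struct.pack() raises an error once
a directory starts beyond 65535 bytes into the table, so a larger table
would need a wider field.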