From: "Jon Smirl" <jonsmirl@gmail.com>
To: "Junio C Hamano" <junkio@cox.net>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
"Git Mailing List" <git@vger.kernel.org>
Subject: Re: A tracking tree for the active work space
Date: Sun, 11 Mar 2007 16:35:50 -0400
Message-ID: <9e4733910703111335j20c0acf4wa12c2d410580898b@mail.gmail.com>
In-Reply-To: <7vhcsrwn8d.fsf@assigned-by-dhcp.cox.net>
On 3/11/07, Junio C Hamano <junkio@cox.net> wrote:
> "Jon Smirl" <jonsmirl@gmail.com> writes:
>
> > Reading the other thread on tracking temporary changes made me think
> > of using inotify with git. The basic idea would be to have a daemon running
> > that uses inotify to listen for changes in the working tree. As these
> > changes happen they get committed to a tracking tree.
>
> I think it is an interesting idea, but can be used with any SCM
> not just git ;-).
As for the part about 'git grep', Shawn and I have been talking off
and on about experimenting with an inverted index for a packfile
format. The basic idea is that you tokenize the input, turning a
source file into a list of tokens, and then diff the token lists the
way you would normally diff lines of text. There is a universal
dictionary of tokens; a token's id is its position in the dictionary.
Tokenized text is one of the most compact compression schemes known.
It can get even more compact by tokenizing common phrases and using
variable-length token ids. Compression schemes like this are used in
web search engines. Of course, you keep a check in place for input that
doesn't tokenize (binary) and fall back to gzip.
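To make the scheme above concrete, here is a minimal sketch in Python. It is not an existing git format; everything in it is hypothetical: the tokenizer rule (word runs, whitespace runs, single punctuation), the in-memory TokenDictionary standing in for the "universal dictionary", the LEB128-style varint encoding for the variable-length ids, and zlib as the fallback for input that is not valid UTF-8 or contains NUL bytes. Phrase tokenization is not shown. The token-list diff uses difflib.SequenceMatcher on the id sequences.

```python
import re
import zlib
from difflib import SequenceMatcher

# Hypothetical tokenizer: runs of word chars, runs of whitespace, or single
# punctuation marks. Every character matches one alternative, so the tokens
# join back into the original text exactly.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")


class TokenDictionary:
    """Hypothetical shared dictionary: a token's id is its position in the list."""

    def __init__(self):
        self.tokens = []  # id -> token
        self.ids = {}     # token -> id

    def id_for(self, token):
        if token not in self.ids:
            self.ids[token] = len(self.tokens)
            self.tokens.append(token)
        return self.ids[token]


def encode_varint(n):
    """Variable-length id: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data):
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def compress(data, dictionary):
    """Return ("tok", varint ids) for text, or ("zlib", deflated bytes) as the fallback."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = None
    if text is None or "\x00" in text:
        return "zlib", zlib.compress(data)
    ids = [dictionary.id_for(t) for t in TOKEN_RE.findall(text)]
    return "tok", b"".join(encode_varint(i) for i in ids)


def decompress(kind, payload, dictionary):
    if kind == "zlib":
        return zlib.decompress(payload)
    return "".join(dictionary.tokens[i] for i in decode_varints(payload)).encode("utf-8")


def token_diff(old_ids, new_ids):
    """Diff two token-id sequences the way one would diff lines; drop unchanged runs."""
    matcher = SequenceMatcher(None, old_ids, new_ids, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]
```

Because frequent tokens get small ids in this sketch only if they are seen early, a real format would presumably assign ids by frequency so the common tokens take a single byte.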
To build 'git grep' on top of this, you make a bitmap for each token in
the dictionary and set the bit for every file that contains the token.
Gzip these indexes; there are algorithms for doing AND/OR operations on
the compressed indexes without expanding them. grep is almost instant
over gigabytes of text if indexes like this are available.
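A minimal Python sketch of the per-token bitmap index, under stated assumptions: bitmaps are plain Python ints (bit i set means file i contains the token), and the hypothetical tokenizer keeps only word tokens. In place of gzip, which cannot be ANDed directly, it uses a simple run-length form (a list of [start, end) runs of set bits) to show an AND computed on the compressed representation without expanding it; production systems use schemes such as WAH or Roaring bitmaps for this. An index like this only narrows the candidate files; a real grep would still scan those files for the exact pattern.

```python
import re


def build_bitmap_index(files):
    """Map each token to an int bitmap; bit i is set if files[i] contains the token."""
    index = {}
    for i, text in enumerate(files):
        for token in set(re.findall(r"\w+", text)):
            index[token] = index.get(token, 0) | (1 << i)
    return index


def query_all(index, tokens):
    """Files containing every token: AND of the token bitmaps."""
    result = None
    for t in tokens:
        bits = index.get(t, 0)
        result = bits if result is None else result & bits
    return result or 0


def query_any(index, tokens):
    """Files containing at least one token: OR of the token bitmaps."""
    result = 0
    for t in tokens:
        result |= index.get(t, 0)
    return result


def files_from_bitmap(bits):
    return [i for i in range(bits.bit_length()) if bits >> i & 1]


def to_runs(bits):
    """Run-length form: list of (start, end) half-open runs of set bits."""
    runs, i = [], 0
    while bits >> i:
        if bits >> i & 1:
            start = i
            while bits >> i & 1:
                i += 1
            runs.append((start, i))
        else:
            i += 1
    return runs


def and_runs(a, b):
    """AND two run lists by merging intervals, never expanding to individual bits."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out
```

Usage: with files ["int main(void) { return 0; }", "static int count;", "void helper(void) {}"], querying for both "int" and "void" returns file 0, and ANDing the run lists of those two tokens gives [(0, 1)], the same answer computed on the run-length form.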
Keeping everything up to date on a dual core system is pretty much
free since that second core is rarely doing anything while you are
editing.
--
Jon Smirl
jonsmirl@gmail.com