git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Marco Costalba" <mcostalba@gmail.com>
To: "Alex Riesen" <raa.lkml@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Git Mailing List" <git@vger.kernel.org>
Subject: Re: [PATCH] Add --show-size to git log to print message size
Date: Sun, 15 Jul 2007 12:06:53 +0200	[thread overview]
Message-ID: <e5bfff550707150306t3196f723ia3071ac301fb3f24@mail.gmail.com> (raw)
In-Reply-To: <20070715093529.GD2568@steel.home>

On 7/15/07, Alex Riesen <raa.lkml@gmail.com> wrote:
> Marco Costalba, Sat, Jul 14, 2007 22:46:39 +0200:
> > Finding the delimiting '\0' it means to loop across the whole buffers
> > and _this_ is the expensive and not needed part. If just after the
>
> It is _not_ expensive. It could be made expensive, though. By using
> QString and QByteArray, for instance.
>

The searching we are talking about is this (Rev::indexData() in
git_startup.cpp):

int end = ba.indexOf('\0', idx); // this is the slowest find

the starting point 'idx' is at the beginning of the log message.


Qt implemantation of indexOf() is this (src/corelib/tools/qbytearray.cpp):

int QByteArray::indexOf(char ch, int from) const
{
    if (from < 0)
        from = qMax(from + d->size, 0);
    if (from < d->size) {
        const char *n = d->data + from - 1;
        const char *e = d->data + d->size;
        while (++n != e)
        if (*n == ch)
            return  n - d->data;
    }
    return -1;
}

Hope this clears any doubts regarding (supposed) slowness of Qt classes.

> > first line would be possible to point to the beginning of the next
> > revision this seeking for '\0' would be not necessary anymore.
>
> But this will make your reading different: you have to handle the case
> when the next revision is not _fully_ read in yet, but you already
> know its size.
>

Reading and creating revision is made as a streaming, it means that
when there is new data  from git a new Rev struct (well it's a class
indeed, but there's no diference) is created and populated with index
data: offset of the rev, parents number, offset of log message and so
on.

If, *while parsing the data* a truncated rev is found (we are at EOF
and no '\0' is found) the whole rev is discarded and deleted, we wait
for some more data and restart the process.

Because the above event is quite rare given the size of the buffers
where git row data is stored, no really loss of speed occurs and we
have the (big) advantage of indexing *while* searching for '\0', so to
scan data only once.

This is how it works now.

With the proposed patch will be easier to find a truncated rev,
because as soon as we know the rev size, after reading it from the
stream, we check:

             if (revision_offset + size > byte_array_size)
                       truncated_rev;


>
> P.S. BTW, why do you have some 20 source files marked executable in
> your qgit4 repository?
>

Importing from Windows: ntfs does not handles file attributes
correctly, I should clean up permissions but I'm lazy ;-)


Marco

P.S: I have an experimental branch where the above is implemented, I
cannot publish now because it requires the --show-size change in git,
but after initial testing I have found that with the above applied the
overhead of qgit on git-log it's about of only 16%.

It means that if git-log runs in say 3 seconds (warm cache), qgit with
the same git log arguments runs in about 3.5 seconds.

With cold cache overhead is also less because disk access is accounted
on the git side ;-)

  reply	other threads:[~2007-07-15 10:06 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-14 16:52 [PATCH] Add --show-size to git log to print message size Marco Costalba
2007-07-14 19:03 ` Junio C Hamano
2007-07-14 20:46   ` Marco Costalba
2007-07-15  9:35     ` Alex Riesen
2007-07-15 10:06       ` Marco Costalba [this message]
2007-07-15 10:48         ` Alex Riesen
2007-07-15 11:32           ` Marco Costalba
2007-07-15 12:29             ` Marco Costalba
2007-07-15 12:35               ` Sean
2007-07-15 14:58                 ` Marco Costalba
2007-07-15 15:04                   ` Sean
2007-07-15 15:58                     ` Marco Costalba
2007-07-15 16:16                       ` Sean
2007-07-15 16:27                         ` Marco Costalba
2007-07-15 16:34                           ` Sean
2007-07-15 16:54                             ` Marco Costalba
2007-07-15 18:14               ` Linus Torvalds
2007-07-15 18:45                 ` Marco Costalba
2007-07-16 12:04   ` Marco Costalba
2007-07-16 12:31     ` Alex Riesen
2007-07-16 17:50       ` Junio C Hamano
2007-07-16 17:55         ` Marco Costalba
2007-07-16 18:02           ` Marco Costalba
2007-07-16 22:37             ` Junio C Hamano
2007-07-16 17:50   ` Marco Costalba
2007-07-17  7:49 ` Andy Parkins
2007-07-17 16:36   ` Marco Costalba
2007-07-25  4:03 ` Junio C Hamano
2007-07-25  9:38   ` Marco Costalba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e5bfff550707150306t3196f723ia3071ac301fb3f24@mail.gmail.com \
    --to=mcostalba@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=raa.lkml@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).