From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Marco Costalba" Subject: Re: [PATCH] Add --show-size to git log to print message size Date: Sun, 15 Jul 2007 12:06:53 +0200 Message-ID: References: <7vodiehko7.fsf@assigned-by-dhcp.cox.net> <20070715093529.GD2568@steel.home> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: "Junio C Hamano" , "Git Mailing List" To: "Alex Riesen" X-From: git-owner@vger.kernel.org Sun Jul 15 12:06:59 2007 Return-path: Envelope-to: gcvg-git@gmane.org Received: from vger.kernel.org ([209.132.176.167]) by lo.gmane.org with esmtp (Exim 4.50) id 1IA10F-0004Rg-6h for gcvg-git@gmane.org; Sun, 15 Jul 2007 12:06:59 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757760AbXGOKGz (ORCPT ); Sun, 15 Jul 2007 06:06:55 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1757851AbXGOKGz (ORCPT ); Sun, 15 Jul 2007 06:06:55 -0400 Received: from wa-out-1112.google.com ([209.85.146.177]:52494 "EHLO wa-out-1112.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757648AbXGOKGy (ORCPT ); Sun, 15 Jul 2007 06:06:54 -0400 Received: by wa-out-1112.google.com with SMTP id v27so1218006wah for ; Sun, 15 Jul 2007 03:06:53 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=dvmPdFbuX6CZr/z+x/rh3FNrB5ng5A59F6uKEING6WhQv9bWCpJQz7Ym5ewSW8wNvD62mzdcW5Zj1tEyfq1yh6LJoAvn0PvSk1qqWy5sPpALhUgG5daAAr/3BF98KSu9/p0Kub4pfIsXzQck74hvYVBAVIsGpY/WGGtxO8HzHUQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=rXX3NlR/NwoVayX0q2UdkLUe+YRvpJ6ayokzl/zFXO7iB/Ga5ioZJqPFfTBnSNiLpNkUthuGfFiyUuxHd5qUY13c+vgOEerXcl65ExFe9tBOHk/9t/Ehkjy576kLQ/PRdYJQWSoSH+uKO5H5wYLpn8QdIA+FocmqW0jbigW4I6E= Received: by 10.114.103.1 with SMTP id a1mr3177191wac.1184494013175; Sun, 15 Jul 2007 03:06:53 -0700 (PDT) Received: by 10.114.56.6 with HTTP; Sun, 15 Jul 2007 03:06:53 -0700 (PDT) In-Reply-To: <20070715093529.GD2568@steel.home> Content-Disposition: inline Sender: git-owner@vger.kernel.org Precedence: bulk X-Mailing-List: git@vger.kernel.org Archived-At: On 7/15/07, Alex Riesen wrote: > Marco Costalba, Sat, Jul 14, 2007 22:46:39 +0200: > > Finding the delimiting '\0' it means to loop across the whole buffers > > and _this_ is the expensive and not needed part. If just after the > > It is _not_ expensive. It could be made expensive, though. By using > QString and QByteArray, for instance. > The searching we are talking about is this (Rev::indexData() in git_startup.cpp): int end = ba.indexOf('\0', idx); // this is the slowest find the starting point 'idx' is at the beginning of the log message. Qt implemantation of indexOf() is this (src/corelib/tools/qbytearray.cpp): int QByteArray::indexOf(char ch, int from) const { if (from < 0) from = qMax(from + d->size, 0); if (from < d->size) { const char *n = d->data + from - 1; const char *e = d->data + d->size; while (++n != e) if (*n == ch) return n - d->data; } return -1; } Hope this clears any doubts regarding (supposed) slowness of Qt classes. > > first line would be possible to point to the beginning of the next > > revision this seeking for '\0' would be not necessary anymore. > > But this will make your reading different: you have to handle the case > when the next revision is not _fully_ read in yet, but you already > know its size. > Reading and creating revision is made as a streaming, it means that when there is new data from git a new Rev struct (well it's a class indeed, but there's no diference) is created and populated with index data: offset of the rev, parents number, offset of log message and so on. If, *while parsing the data* a truncated rev is found (we are at EOF and no '\0' is found) the whole rev is discarded and deleted, we wait for some more data and restart the process. Because the above event is quite rare given the size of the buffers where git row data is stored, no really loss of speed occurs and we have the (big) advantage of indexing *while* searching for '\0', so to scan data only once. This is how it works now. With the proposed patch will be easier to find a truncated rev, because as soon as we know the rev size, after reading it from the stream, we check: if (revision_offset + size > byte_array_size) truncated_rev; > > P.S. BTW, why do you have some 20 source files marked executable in > your qgit4 repository? > Importing from Windows: ntfs does not handles file attributes correctly, I should clean up permissions but I'm lazy ;-) Marco P.S: I have an experimental branch where the above is implemented, I cannot publish now because it requires the --show-size change in git, but after initial testing I have found that with the above applied the overhead of qgit on git-log it's about of only 16%. It means that if git-log runs in say 3 seconds (warm cache), qgit with the same git log arguments runs in about 3.5 seconds. With cold cache overhead is also less because disk access is accounted on the git side ;-)