Git development
 help / color / mirror / Atom feed
* [PATCH updated] Add "--dirstat" for some directory statistics
@ 2008-02-12 21:26 Linus Torvalds
  2008-02-12 21:56 ` Linus Torvalds
  2008-02-13  7:55 ` Abdelrazak Younes
  0 siblings, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2008-02-12 21:26 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List


This adds a new form of overview diffstat output, doing something that I 
have occasionally ended up doing manually (and badly, because it's 
actually pretty nasty to do), and that I think is very useful for an 
project like the kernel that has a fairly deep and well-separated 
directory structure with semantic meaning.

What I mean by that is that it's often interesting to see exactly which 
sub-directories are impacted by a patch, and to what degree - even if you 
don't perhaps care so much about the individual files themselves.

What makes the concept more interesting is that the "impact" is often 
hierarchical: in the kernel, for example, something could either have a 
very localized impact to "fs/ext3/" and then it's interesting to see that 
such a patch changes mostly that subdirectory, but you could have another 
patch that changes some generic VFS-layer issue which affects _many_ 
subdirectories that are all under "fs/", but none - or perhaps just a 
couple of them - of the individual filesystems are interesting in 
themselves.

So what commonly happens is that you may have big changes in a specific
sub-subdirectory, but still also significant separate changes to the
subdirectory leading up to that - maybe you have significant VFS-level
changes, but *also* changes under that VFS layer in the NFS-specific
directories, for example. In that case, you do want the low-level parts
that are significant to show up, but then the insignificant ones should
show up as under the more generic top-level directory.

This patch shows all of that with "--dirstat". The output can be either
something simple like

        commit 81772fe...
        Author: Thomas Gleixner <tglx@linutronix.de>
        Date:   Sun Feb 10 23:57:36 2008 +0100

            x86: remove over noisy debug printk

            pageattr-test.c contains a noisy debug printk that people reported.
            The condition under which it prints (randomly tapping into a mem_map[]
            hole and not being able to c_p_a() there) is valid behavior and not
            interesting to report.

            Remove it.

            Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
            Acked-by: Ingo Molnar <mingo@elte.hu>
            Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

         100.0% arch/x86/mm/

or something much more complex like

        commit e231c2e...
        Author: David Howells <dhowells@redhat.com>
        Date:   Thu Feb 7 00:15:26 2008 -0800

            Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p)

	  20.5% crypto/
	   7.6% fs/afs/
	   7.6% fs/fuse/
	   7.6% fs/gfs2/
	   5.1% fs/jffs2/
	   5.1% fs/nfs/
	   5.1% fs/nfsd/
	   7.6% fs/reiserfs/
	  15.3% fs/
	   7.6% net/rxrpc/
	  10.2% security/keys/

where that latter example is an example of significant work in some 
individual fs/*/ subdirectories (like the patches to reiserfs accounting 
for 7.6% of the whole), but then discounting those individual filesystems, 
there's also 15.3% other "random" things that weren't worth reporting on 
their oen left over under fs/ in general (either in that directory itself, 
or in subdirectories of fs/ that didn't have enough changes to be reported 
individually).

I'd like to stress that the "15.3% fs/" mentioned above is the stuff that 
is under fs/ but that was _not_ significant enough to report on its own.  
So the above does _not_ mean that 15.3% of the work was under fs/ per se, 
because that 15.3% does *not* include the already-reported 7.6% of afs, 
7.6% of fuse etc.

If you want to enable "cumulative" directory statistics, you can use the
"--cumulative" flag, which adds up percentages recursively even when
they have been already reported for a sub-directory.  That cumulative
output is disabled if *all* of the changes in one subdirectory come from
a deeper subdirectory, to avoid repeating subdirectories all the way to
the root.

For an example of the cumulative reporting, the above commit becomes

	commit e231c2e...
	Author: David Howells <dhowells@redhat.com>
	Date:   Thu Feb 7 00:15:26 2008 -0800

	    Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p)

	  20.5% crypto/
	   7.6% fs/afs/
	   7.6% fs/fuse/
	   7.6% fs/gfs2/
	   5.1% fs/jffs2/
	   5.1% fs/nfs/
	   5.1% fs/nfsd/
	   7.6% fs/reiserfs/
	  61.5% fs/
	   7.6% net/rxrpc/
	  10.2% security/keys/

in which the commit percentages now obviously add up to much more than 
100%: now the changes that were already reported for the sub-directories 
under fs/ are then cumulatively included in the whole percentage of fs/ 
(ie now shows 61.5% as opposed to the 15.3% without the cumulative 
reporting).

The default reporting limit has been arbitrarily set at 3%, which seems
to be a pretty good cut-off, but you can specify the cut-off manually by
giving it as an option parameter (eg "--dirstat=5" makes the cut-off be
at 5% instead)

NOTE! The percentages are purely about the total lines added and removed,
not anything smarter (or dumber) than that. Also note that you should not
generally expect things to add up to 100%: not only does it round down, we
don't report leftover scraps (they add up to the top-level change count,
but we don't even bother reporting that, it only reports subdirectories).

Quite frankly, as a top-level manager this is really convenient for me,
but it's going to be very boring for git itself since there are few
subdirectories. Also, don't expect things to make tons of sense if you
combine this with "-M" and there are cross-directory renames etc.

But even for git itself, you can get some fun statistics. Try out

        git log --dirstat

and see the occasional mentions of things like Documentation/, git-gui/,
gitweb/ and gitk-git/. Or try out something like

        git diff --dirstat v1.5.0..v1.5.4

which does kind of git an overview that shows *something*. But in general,
the output is more exciting for big projects with deeper structure, and
doing a

        git diff --dirstat v2.6.24..v2.6.25-rc1

on the kernel is what I actually wrote this for!

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

This is a re-send of all my changes rolled up into one single patch. One 
of the reasons is that the "report percentages in tenths of percent" 
simply changed the examples in the original commit message. Another is 
that I actually made the "--cumulative" flag much more useful in practice 
by making it avoid reporting the subdirectories where all the percentages 
came from just a single already-reported sub-subdirectory.

The --cumulative thing still has the odd property of having all the 
percentages add up to more than 100%, but quite often it is now what you 
really want, and you can, for example, get output like this:

	  27.5% security/selinux/include/
	 100.0% security/selinux/

which means that everything is in "security/selinux", but within that 
directory we also show that 27.5% was within the include subdirectory. 
Note how it now no longer gives a "100.0% security/" report, since that 
would be purely redundant information.

Another example of this new semantic:

	  23.8% include/asm-sparc/
	  76.1% include/asm-sparc64/
	 100.0% include/

here, "100% include/" _is_ reported, because while the 100% did come from 
the already reported subdirectories, it didn't come from a _single_ one, 
so it basically means that you don't ever have to mentally add up things, 
and we only avoid reporting upper-level directory statistics when there 
really is no point to it.

With this change, "--cumulative" suddenly makes tons of sense to me. For 
example, for the 2.6.24..2.6.25-rc1 changes in the kernel (with a 2% 
cutoff), it now gives:

 - "git diff --dirstat=2 --cumulative v2.6.24..v2.6.25-rc1" output:

	   2.4% Documentation/
	   3.2% arch/arm/
	   2.0% arch/cris/
	   2.3% arch/powerpc/boot/
	   2.3% arch/powerpc/configs/
	   6.9% arch/powerpc/
	   3.2% arch/sh/
	   2.3% arch/x86/kernel/
	   6.9% arch/x86/
	  31.3% arch/
	   2.0% drivers/char/
	   2.0% drivers/infiniband/
	   2.9% drivers/media/video/
	   3.7% drivers/media/
	   2.4% drivers/net/wireless/iwlwifi/
	   7.3% drivers/net/wireless/
	  14.3% drivers/net/
	   2.8% drivers/scsi/
	  37.2% drivers/
	   3.5% fs/
	   2.0% include/asm-x86/
	  10.3% include/
	   6.0% net/
	   3.7% sound/

and now you can see at a glance that 31.3 was in arch/, and 37.2% was in 
drivers/. So this is a case where --cumulative actually makes sense (but 
it probably still doesn't make sense if you do a "git log --dirstat" and 
look at just individual commits that tend to be more about a single 
change).

Anyway, it's a bit larger now, but my explanations and blathering is
still much bigger than the patch itself ;)

 diff.c |   90 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 diff.h |    3 ++
 2 files changed, 92 insertions(+), 1 deletions(-)

diff --git a/diff.c b/diff.c
index cd8bc4d..ad0ff28 100644
--- a/diff.c
+++ b/diff.c
@@ -990,6 +990,85 @@ static void show_numstat(struct diffstat_t* data, struct diff_options *options)
 	}
 }
 
+struct diffstat_dir {
+	struct diffstat_file **files;
+	int nr, percent, cumulative;
+};
+
+static long gather_dirstat(struct diffstat_dir *dir, unsigned long changed, const char *base, int baselen)
+{
+	unsigned long this_dir = 0;
+	unsigned int sources = 0;
+
+	while (dir->nr) {
+		struct diffstat_file *f = *dir->files;
+		int namelen = strlen(f->name);
+		unsigned long this;
+		char *slash;
+
+		if (namelen < baselen)
+			break;
+		if (memcmp(f->name, base, baselen))
+			break;
+		slash = strchr(f->name + baselen, '/');
+		if (slash) {
+			int newbaselen = slash + 1 - f->name;
+			this = gather_dirstat(dir, changed, f->name, newbaselen);
+			sources++;
+		} else {
+			this = f->added + f->deleted;
+			dir->files++;
+			dir->nr--;
+			sources += 2;
+		}
+		this_dir += this;
+	}
+
+	/*
+	 * We don't report dirstat's for 
+	 *  - the top level
+	 *  - or cases where everything came from a single directory
+	 *    under this directory (sources == 1).
+	 */
+	if (baselen && sources != 1) {
+		int permille = this_dir * 1000 / changed;
+		if (permille) {
+			int percent = permille / 10;
+			if (percent >= dir->percent) {
+				printf("%4d.%01d%% %.*s\n", percent, permille % 10, baselen, base);
+				if (!dir->cumulative)
+					return 0;
+			}
+		}
+	}
+	return this_dir;
+}
+
+static void show_dirstat(struct diffstat_t *data, struct diff_options *options)
+{
+	int i;
+	unsigned long changed;
+	struct diffstat_dir dir;
+
+	/* Calculate total changes */
+	changed = 0;
+	for (i = 0; i < data->nr; i++) {
+		changed += data->files[i]->added;
+		changed += data->files[i]->deleted;
+	}
+
+	/* This can happen even with many files, if everything was renames */
+	if (!changed)
+		return;
+
+	/* Show all directories with more than x% of the changes */
+	dir.files = data->files;
+	dir.nr = data->nr;
+	dir.percent = options->dirstat_percent;
+	dir.cumulative = options->output_format & DIFF_FORMAT_CUMULATIVE;
+	gather_dirstat(&dir, changed, "", 0);
+}
+
 static void free_diffstat_info(struct diffstat_t *diffstat)
 {
 	int i;
@@ -2058,6 +2137,7 @@ void diff_setup(struct diff_options *options)
 	options->line_termination = '\n';
 	options->break_opt = -1;
 	options->rename_limit = -1;
+	options->dirstat_percent = 3;
 	options->context = 3;
 	options->msg_sep = "";
 
@@ -2099,6 +2179,7 @@ int diff_setup_done(struct diff_options *options)
 					    DIFF_FORMAT_NUMSTAT |
 					    DIFF_FORMAT_DIFFSTAT |
 					    DIFF_FORMAT_SHORTSTAT |
+					    DIFF_FORMAT_DIRSTAT |
 					    DIFF_FORMAT_SUMMARY |
 					    DIFF_FORMAT_PATCH);
 
@@ -2110,6 +2191,7 @@ int diff_setup_done(struct diff_options *options)
 				      DIFF_FORMAT_NUMSTAT |
 				      DIFF_FORMAT_DIFFSTAT |
 				      DIFF_FORMAT_SHORTSTAT |
+				      DIFF_FORMAT_DIRSTAT |
 				      DIFF_FORMAT_SUMMARY |
 				      DIFF_FORMAT_CHECKDIFF))
 		DIFF_OPT_SET(options, RECURSIVE);
@@ -2220,6 +2302,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		options->output_format |= DIFF_FORMAT_NUMSTAT;
 	else if (!strcmp(arg, "--shortstat"))
 		options->output_format |= DIFF_FORMAT_SHORTSTAT;
+	else if (opt_arg(arg, 'X', "dirstat", &options->dirstat_percent))
+		options->output_format |= DIFF_FORMAT_DIRSTAT;
+	else if (!strcmp(arg, "--cumulative"))
+		options->output_format |= DIFF_FORMAT_CUMULATIVE;
 	else if (!strcmp(arg, "--check"))
 		options->output_format |= DIFF_FORMAT_CHECKDIFF;
 	else if (!strcmp(arg, "--summary"))
@@ -2938,7 +3024,7 @@ void diff_flush(struct diff_options *options)
 		separator++;
 	}
 
-	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT)) {
+	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT|DIFF_FORMAT_DIRSTAT)) {
 		struct diffstat_t diffstat;
 
 		memset(&diffstat, 0, sizeof(struct diffstat_t));
@@ -2948,6 +3034,8 @@ void diff_flush(struct diff_options *options)
 			if (check_pair_status(p))
 				diff_flush_stat(p, options, &diffstat);
 		}
+		if (output_format & DIFF_FORMAT_DIRSTAT)
+			show_dirstat(&diffstat, options);
 		if (output_format & DIFF_FORMAT_NUMSTAT)
 			show_numstat(&diffstat, options);
 		if (output_format & DIFF_FORMAT_DIFFSTAT)
diff --git a/diff.h b/diff.h
index 073d5cb..8c6bb54 100644
--- a/diff.h
+++ b/diff.h
@@ -30,6 +30,8 @@ typedef void (*diff_format_fn_t)(struct diff_queue_struct *q,
 #define DIFF_FORMAT_SUMMARY	0x0008
 #define DIFF_FORMAT_PATCH	0x0010
 #define DIFF_FORMAT_SHORTSTAT	0x0020
+#define DIFF_FORMAT_DIRSTAT	0x0040
+#define DIFF_FORMAT_CUMULATIVE	0x0080
 
 /* These override all above */
 #define DIFF_FORMAT_NAME	0x0100
@@ -80,6 +82,7 @@ struct diff_options {
 	int pickaxe_opts;
 	int rename_score;
 	int rename_limit;
+	int dirstat_percent;
 	int setup;
 	int abbrev;
 	const char *msg_sep;

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH updated] Add "--dirstat" for some directory statistics
  2008-02-12 21:26 [PATCH updated] Add "--dirstat" for some directory statistics Linus Torvalds
@ 2008-02-12 21:56 ` Linus Torvalds
  2008-02-13  0:54   ` Junio C Hamano
  2008-02-13  7:55 ` Abdelrazak Younes
  1 sibling, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2008-02-12 21:56 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List



On Tue, 12 Feb 2008, Linus Torvalds wrote:
> 
> But even for git itself, you can get some fun statistics. Try out
> 
>         git log --dirstat
> 
> and see the occasional mentions of things like Documentation/, git-gui/,
> gitweb/ and gitk-git/. Or try out something like
> 
>         git diff --dirstat v1.5.0..v1.5.4

Side note: this latter one is actually a good example of two 
different issues.

First off, it's an example of where --cumulative really does make sense, 
since it reports:

	  10.0% Documentation/
	   4.9% contrib/
	   5.7% git-gui/lib/
	  13.9% git-gui/macosx/
	   8.7% git-gui/po/
	  31.4% git-gui/
	   4.6% gitk-git/
	  13.1% t/

ie it makes it clear that almost a third of changes between 1.5.0 and 
1.5.4 was in git-gui/.

But secondly, it's an example of how the whole --dirstat is totally 
buggered in certain circumstances, because the above is simply not *true*: 
yes, a lot of the changes were actually due to git-gui, and all those *.po 
files, but the git-gui/macosx/ count is ridiculously off.

Why? That directory has a binary file, which doesn't have a "line count", 
so --dirstat actually ends up using the byte-count instead there, which 
inflates the macosx numbers a lot (same is true of git-gui/lib/, just to a 
smaller degree).

I don't think that's necessarily a bug, but it's certainly misleading. 
Does it matter? Not to me, because the kernel generally doesn't have those 
kinds of issues (no binary blobs that ever really change), but I did want 
to point out that there's room for perhaps improving things. Maybe we 
could add a byte count in general to the diffstat data structures and 
always just count bytes changed rather than lines changed. Or maybe we 
should consider binary files to have something like "32 bytes per virtual 
line" or something.

So _I_ don't care, but if somebody does, maybe they want to do something 
about it. I thought I'd point it out since I just noticed.

					Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH updated] Add "--dirstat" for some directory statistics
  2008-02-12 21:56 ` Linus Torvalds
@ 2008-02-13  0:54   ` Junio C Hamano
  2008-02-13  1:10     ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2008-02-13  0:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Why? That directory has a binary file, which doesn't have a "line count", 
> so --dirstat actually ends up using the byte-count instead there, which 
> inflates the macosx numbers a lot (same is true of git-gui/lib/, just to a 
> smaller degree).
>
> I don't think that's necessarily a bug, but it's certainly misleading. 
> Does it matter? Not to me, because the kernel generally doesn't have those 
> kinds of issues (no binary blobs that ever really change), but I did want 
> to point out that there's room for perhaps improving things. Maybe we 
> could add a byte count in general to the diffstat data structures and 
> always just count bytes changed rather than lines changed. Or maybe we 
> should consider binary files to have something like "32 bytes per virtual 
> line" or something.

You can drop the "line count" idea and instead count the "amount
of damage" done to the preimage, just like we do in the rename
similarity computation.

The attached patch is just an outline.  There may need special
cases for unmerged paths.

Comparison with your examples is a bit interesting:

    (non cumulative)            (linus, non cumulative)
      18.6% crypto/              20.5% crypto/             
       7.7% fs/afs/               7.6% fs/afs/             
      10.9% fs/fuse/              7.6% fs/fuse/            
       7.4% fs/gfs2/              7.6% fs/gfs2/            
       5.3% fs/jffs2/             5.1% fs/jffs2/       
       4.9% fs/nfs/               5.1% fs/nfs/             
       4.5% fs/nfsd/              5.1% fs/nfsd/            
       7.4% fs/reiserfs/          7.6% fs/reiserfs/    
      14.4% fs/                  15.3% fs/         
       7.4% net/rxrpc/            7.6% net/rxrpc/      
      10.8% security/keys/       10.2% security/keys/  

So it does not change much for text-only project.  But the big
binary file difference is now in the noise:

    (git)                       (linus, git)                 
      12.4% Documentation/       10.0% Documentation/        
       7.3% contrib/              4.9% contrib/              
       6.0% git-gui/lib/          5.7% git-gui/lib/  
      12.9% git-gui/po/          13.9% git-gui/macosx/       
      23.0% git-gui/              8.7% git-gui/po/   
       7.3% gitk-git/            31.4% git-gui/              
      14.1% t/                    4.6% gitk-git/     
                                 13.1% t/                

git-gui/lib is about 220k while macosx is 30k, when counted with
"find $there -type f | xargs cat | wc -c", so this reflects the
reality much better.

I also suspect --cumulative should imply --dirstat but that is a
different patch if ever.

 diff.c |   80 ++++++++++++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 61 insertions(+), 19 deletions(-)

diff --git a/diff.c b/diff.c
index dd374d4..8ad04ac 100644
--- a/diff.c
+++ b/diff.c
@@ -990,18 +990,23 @@ static void show_numstat(struct diffstat_t* data, struct diff_options *options)
 	}
 }
 
-struct diffstat_dir {
-	struct diffstat_file **files;
-	int nr, percent, cumulative;
+struct dirstat_file {
+	const char *name;
+	unsigned long changed;
 };
 
-static long gather_dirstat(struct diffstat_dir *dir, unsigned long changed, const char *base, int baselen)
+struct dirstat_dir {
+	struct dirstat_file *files;
+	int alloc, nr, percent, cumulative;
+};
+
+static long gather_dirstat(struct dirstat_dir *dir, unsigned long changed, const char *base, int baselen)
 {
 	unsigned long this_dir = 0;
 	unsigned int sources = 0;
 
 	while (dir->nr) {
-		struct diffstat_file *f = *dir->files;
+		struct dirstat_file *f = dir->files;
 		int namelen = strlen(f->name);
 		unsigned long this;
 		char *slash;
@@ -1016,7 +1021,7 @@ static long gather_dirstat(struct diffstat_dir *dir, unsigned long changed, cons
 			this = gather_dirstat(dir, changed, f->name, newbaselen);
 			sources++;
 		} else {
-			this = f->added + f->deleted;
+			this = f->changed;
 			dir->files++;
 			dir->nr--;
 			sources += 2;
@@ -1044,17 +1049,58 @@ static long gather_dirstat(struct diffstat_dir *dir, unsigned long changed, cons
 	return this_dir;
 }
 
-static void show_dirstat(struct diffstat_t *data, struct diff_options *options)
+static void show_dirstat(struct diff_options *options)
 {
 	int i;
 	unsigned long changed;
-	struct diffstat_dir dir;
+	struct dirstat_dir dir;
+	struct diff_queue_struct *q = &diff_queued_diff;
+
+	dir.files = NULL;
+	dir.alloc = 0;
+	dir.nr = 0;
+	dir.percent = options->dirstat_percent;
+	dir.cumulative = options->output_format & DIFF_FORMAT_CUMULATIVE;
 
-	/* Calculate total changes */
 	changed = 0;
-	for (i = 0; i < data->nr; i++) {
-		changed += data->files[i]->added;
-		changed += data->files[i]->deleted;
+	for (i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+		const char *name;
+		unsigned long copied, added, damage;
+
+		name = p->one->path ? p->one->path : p->two->path;
+
+		if (DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two)) {
+			diff_populate_filespec(p->one, 0);
+			diff_populate_filespec(p->two, 0);
+			diffcore_count_changes(p->one, p->two, NULL, NULL, 0,
+					       &copied, &added);
+			diff_free_filespec_data(p->one);
+			diff_free_filespec_data(p->two);
+		} else if (DIFF_FILE_VALID(p->one)) {
+			diff_populate_filespec(p->one, 1);
+			copied = added = 0;
+			diff_free_filespec_data(p->one);
+		} else if (DIFF_FILE_VALID(p->two)) {
+			diff_populate_filespec(p->two, 1);
+			copied = 0;
+			added = p->two->size;
+			diff_free_filespec_data(p->two);
+		} else
+			continue;
+
+		/*
+		 * Original minus copied is the removed material,
+		 * added is the new material.  They are both damages
+		 * made to the preimage.
+		 */
+		damage = (p->one->size - copied) + added;
+
+		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
+		dir.files[dir.nr].name = name;
+		dir.files[dir.nr].changed = damage;
+		changed += damage;
+		dir.nr++;
 	}
 
 	/* This can happen even with many files, if everything was renames */
@@ -1062,10 +1108,6 @@ static void show_dirstat(struct diffstat_t *data, struct diff_options *options)
 		return;
 
 	/* Show all directories with more than x% of the changes */
-	dir.files = data->files;
-	dir.nr = data->nr;
-	dir.percent = options->dirstat_percent;
-	dir.cumulative = options->output_format & DIFF_FORMAT_CUMULATIVE;
 	gather_dirstat(&dir, changed, "", 0);
 }
 
@@ -3024,7 +3066,7 @@ void diff_flush(struct diff_options *options)
 		separator++;
 	}
 
-	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT|DIFF_FORMAT_DIRSTAT)) {
+	if (output_format & (DIFF_FORMAT_DIFFSTAT|DIFF_FORMAT_SHORTSTAT|DIFF_FORMAT_NUMSTAT)) {
 		struct diffstat_t diffstat;
 
 		memset(&diffstat, 0, sizeof(struct diffstat_t));
@@ -3034,8 +3076,6 @@ void diff_flush(struct diff_options *options)
 			if (check_pair_status(p))
 				diff_flush_stat(p, options, &diffstat);
 		}
-		if (output_format & DIFF_FORMAT_DIRSTAT)
-			show_dirstat(&diffstat, options);
 		if (output_format & DIFF_FORMAT_NUMSTAT)
 			show_numstat(&diffstat, options);
 		if (output_format & DIFF_FORMAT_DIFFSTAT)
@@ -3045,6 +3085,8 @@ void diff_flush(struct diff_options *options)
 		free_diffstat_info(&diffstat);
 		separator++;
 	}
+	if (output_format & DIFF_FORMAT_DIRSTAT)
+		show_dirstat(options);
 
 	if (output_format & DIFF_FORMAT_SUMMARY && !is_summary_empty(q)) {
 		for (i = 0; i < q->nr; i++)

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH updated] Add "--dirstat" for some directory statistics
  2008-02-13  0:54   ` Junio C Hamano
@ 2008-02-13  1:10     ` Linus Torvalds
  2008-02-13  1:27       ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2008-02-13  1:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List



On Tue, 12 Feb 2008, Junio C Hamano wrote:
> 
> You can drop the "line count" idea and instead count the "amount
> of damage" done to the preimage, just like we do in the rename
> similarity computation.
> 
> The attached patch is just an outline.  There may need special
> cases for unmerged paths.

Ouch. I like the concept and the result, but an not a huge fan of the the 
implementation. The problem with this is that it makes the whole damage 
computation something separate from the earlier phases.

So it actually seems to add a lot of cost. Right now there is basically 
zero added cost to just adding a "--dirstat" to one of my common 
operations, which is

	git diff -M --stat --summary

(that's what I do on merges). 

Wouldn't it be nicer to merge it with one of the stages we did earlier by 
simply saving the data. If not the --stat phase, then the -M phase? 

			Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH updated] Add "--dirstat" for some directory statistics
  2008-02-13  1:10     ` Linus Torvalds
@ 2008-02-13  1:27       ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2008-02-13  1:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, 12 Feb 2008, Junio C Hamano wrote:
>> 
>> You can drop the "line count" idea and instead count the "amount
>> of damage" done to the preimage, just like we do in the rename
>> similarity computation.
>> 
>> The attached patch is just an outline.  There may need special
>> cases for unmerged paths.
>
> Ouch. I like the concept and the result, but an not a huge fan of the the 
> implementation. The problem with this is that it makes the whole damage 
> computation something separate from the earlier phases.
>
> So it actually seems to add a lot of cost. Right now there is basically 
> zero added cost to just adding a "--dirstat" to one of my common 
> operations, which is
>
> 	git diff -M --stat --summary
>
> (that's what I do on merges). 
>
> Wouldn't it be nicer to merge it with one of the stages we did earlier by 
> simply saving the data. If not the --stat phase, then the -M phase? 

We could add one ulong to diff_filepair struct to record
"damage", and populate it inside diffcore_rename().

Obviously:

 - You would need to diffcore_count_changes() in separate phase
   if you do not use -M;

 - Even if you use -M, we do not run diffcore_count_changes()
   for unrenamed paths during diffcore_rename(), so you would
   need to do that separately.

By the way, if you run "git diff --summary --dirstat -M" (that
is, without --stat), we will not have to run diffstat at all.
Instead, we only run diffcore_count_changes(), which can be
reused for renamed paths if we keep it in diff_filepair struct.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH updated] Add "--dirstat" for some directory statistics
  2008-02-12 21:26 [PATCH updated] Add "--dirstat" for some directory statistics Linus Torvalds
  2008-02-12 21:56 ` Linus Torvalds
@ 2008-02-13  7:55 ` Abdelrazak Younes
  2008-02-13 13:48   ` Johannes Schindelin
  2008-02-13 14:07   ` Wincent Colaiuta
  1 sibling, 2 replies; 8+ messages in thread
From: Abdelrazak Younes @ 2008-02-13  7:55 UTC (permalink / raw)
  To: git

Linus Torvalds wrote:
> 	   7.6% fs/afs/
> 	   7.6% fs/fuse/
> 	   7.6% fs/gfs2/
> 	   5.1% fs/jffs2/
> 	   5.1% fs/nfs/
> 	   5.1% fs/nfsd/
> 	   7.6% fs/reiserfs/
> 	  15.3% fs/
[...]
> For an example of the cumulative reporting, the above commit becomes
[...]
> 	   7.6% fs/afs/
> 	   7.6% fs/fuse/
> 	   7.6% fs/gfs2/
> 	   5.1% fs/jffs2/
> 	   5.1% fs/nfs/
> 	   5.1% fs/nfsd/
> 	   7.6% fs/reiserfs/
> 	  61.5% fs/


May I suggest this instead so to get rid of the cumulative option?

  	   7.6% fs/afs
  	   7.6% fs/fuse
  	   7.6% fs/gfs2
  	   5.1% fs/jffs2
  	   5.1% fs/nfs
  	   5.1% fs/nfsd
  	   7.6% fs/reiserfs
  	  15.3% fs/
  	  61.5% fs

A trailing slash would mean "no recursive, only this directory" and no 
trailing slash means well the opposite :-)

Abdel.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH updated] Add "--dirstat" for some directory statistics
  2008-02-13  7:55 ` Abdelrazak Younes
@ 2008-02-13 13:48   ` Johannes Schindelin
  2008-02-13 14:07   ` Wincent Colaiuta
  1 sibling, 0 replies; 8+ messages in thread
From: Johannes Schindelin @ 2008-02-13 13:48 UTC (permalink / raw)
  To: Abdelrazak Younes; +Cc: git

Hi,

On Wed, 13 Feb 2008, Abdelrazak Younes wrote:

> Linus Torvalds wrote:
> > 	   7.6% fs/afs/
> > 	   7.6% fs/fuse/
> > 	   7.6% fs/gfs2/
> > 	   5.1% fs/jffs2/
> > 	   5.1% fs/nfs/
> > 	   5.1% fs/nfsd/
> > 	   7.6% fs/reiserfs/
> > 	  15.3% fs/
> [...]
> > For an example of the cumulative reporting, the above commit becomes
> [...]
> > 	   7.6% fs/afs/
> > 	   7.6% fs/fuse/
> > 	   7.6% fs/gfs2/
> > 	   5.1% fs/jffs2/
> > 	   5.1% fs/nfs/
> > 	   5.1% fs/nfsd/
> > 	   7.6% fs/reiserfs/
> > 	  61.5% fs/
> 
> 
> May I suggest this instead so to get rid of the cumulative option?
> 
>  	   7.6% fs/afs
>  	   7.6% fs/fuse
>  	   7.6% fs/gfs2
>  	   5.1% fs/jffs2
>  	   5.1% fs/nfs
>  	   5.1% fs/nfsd
>  	   7.6% fs/reiserfs
>  	  15.3% fs/
>  	  61.5% fs
> 
> A trailing slash would mean "no recursive, only this directory" and no
> trailing slash means well the opposite :-)

My 2cents: I find that highly counterintuitive (yes, I am a frequent user 
of "du").

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH updated] Add "--dirstat" for some directory statistics
  2008-02-13  7:55 ` Abdelrazak Younes
  2008-02-13 13:48   ` Johannes Schindelin
@ 2008-02-13 14:07   ` Wincent Colaiuta
  1 sibling, 0 replies; 8+ messages in thread
From: Wincent Colaiuta @ 2008-02-13 14:07 UTC (permalink / raw)
  To: Abdelrazak Younes; +Cc: git

El 13/2/2008, a las 8:55, Abdelrazak Younes escribió:

> May I suggest this instead so to get rid of the cumulative option?
>
> 	   7.6% fs/afs
> 	   7.6% fs/fuse
> 	   7.6% fs/gfs2
> 	   5.1% fs/jffs2
> 	   5.1% fs/nfs
> 	   5.1% fs/nfsd
> 	   7.6% fs/reiserfs
> 	  15.3% fs/
> 	  61.5% fs
>
> A trailing slash would mean "no recursive, only this directory" and  
> no trailing slash means well the opposite :-)

I think that's getting a little _too_ subtle.

Cheers,
Wincent

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-02-13 14:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-12 21:26 [PATCH updated] Add "--dirstat" for some directory statistics Linus Torvalds
2008-02-12 21:56 ` Linus Torvalds
2008-02-13  0:54   ` Junio C Hamano
2008-02-13  1:10     ` Linus Torvalds
2008-02-13  1:27       ` Junio C Hamano
2008-02-13  7:55 ` Abdelrazak Younes
2008-02-13 13:48   ` Johannes Schindelin
2008-02-13 14:07   ` Wincent Colaiuta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox