public inbox for linux-kernel@vger.kernel.org
* Re: Making diff(1) of linux kernels faster
@ 2001-10-18 12:39 Marco C. Mason
  0 siblings, 0 replies; 23+ messages in thread
From: Marco C. Mason @ 2001-10-18 12:39 UTC
  To: linux-kernel; +Cc: p_gortmaker

Paul--

Paul Gortmaker wrote:
> +          if (recursive && preread_dir)
> +           {
> +              preread(inf[0].name);
> +              preread(inf[1].name);
> +            }

While looking over your patch, I notice that you preload *both*
directories.  (At least, that's what the code above appears to do.)
Have you tried just preloading one?  This may still give you the speed
benefit (as you'll most likely reduce the seeking) and put less pressure
on the memory system.

--marco



^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: Making diff(1) of linux kernels faster
@ 2001-10-22 16:39 Andries.Brouwer
  0 siblings, 0 replies; 23+ messages in thread
From: Andries.Brouwer @ 2001-10-22 16:39 UTC
  To: linux-kernel, moz; +Cc: p_gortmaker

>> Who's the maintainer for "diff" these days?

> afaict, there is no maintainer.
> The stated maintainer has been ignoring patches for years.

-rw-r--r--   1 aeb        312312 Oct  2  1994 diffutils-2.7.tar.gz

Yes, if that is the latest, that is old.

I wouldn't mind adding diff to util-linux until
the FSF maintainer wakes up.

(I am using a modified diff myself - one that doesn't give
a lot of output for diff after a successful cp -a,
and does not get into a loop when /etc/net is a symlink to /etc.)
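Andries' modified diff is not shown in the thread. One common way to avoid the /etc/net -> /etc loop, sketched here as an assumption rather than his actual fix, is to remember the (device, inode) pair of every directory on the current path and refuse to descend into a directory that is already one of its own ancestors. The names `walk`, `count_dirs` and `struct ancestor` are hypothetical, and the walker only counts directories to keep the example small.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* One directory on the current path, identified by device and inode. */
struct ancestor {
    dev_t dev;
    ino_t ino;
    const struct ancestor *parent;
};

/* Returns 1 if (dev, ino) is already one of the directories on the path. */
static int on_path(const struct ancestor *a, dev_t dev, ino_t ino)
{
    for (; a != NULL; a = a->parent)
        if (a->dev == dev && a->ino == ino)
            return 1;
    return 0;
}

/* Counts directories reachable from `dir`, following symlinks the way
   stat() does, but never re-entering a directory that is its own
   ancestor (e.g. /etc/net -> /etc). */
static long walk(const char *dir, const struct ancestor *parent)
{
    struct stat st;
    if (stat(dir, &st) != 0 || !S_ISDIR(st.st_mode))
        return 0;
    if (on_path(parent, st.st_dev, st.st_ino))
        return 0;                       /* cycle: stop here */

    struct ancestor self = { st.st_dev, st.st_ino, parent };
    long count = 1;
    DIR *d = opendir(dir);
    if (d == NULL)
        return count;

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;                   /* path too long: skip */
        count += walk(path, &self);
    }
    closedir(d);
    return count;
}

/* Hypothetical entry point: number of distinct directories under root. */
long count_dirs(const char *root)
{
    return walk(root, NULL);
}
```

Only ancestors are tracked, not every directory ever seen, so a directory reached through two different non-looping symlinks is still visited twice; that matches what a plain recursive diff would compare.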

Andries

[But try the FSF first. Is it Paul Eggert?]

* Re: Making diff(1) of linux kernels faster
@ 2001-10-18 14:48 Sean Neakums
  0 siblings, 0 replies; 23+ messages in thread
From: Sean Neakums @ 2001-10-18 14:48 UTC
  To: Horst von Brand; +Cc: linux-kernel

begin  Horst von Brand quotation:
> willy tarreau <wtarreau@yahoo.fr> said:
> 
> > Be very careful not to modify a multi-linked file, or
> > it will be damaged in all trees and won't be seen by
> > diff. Your editor must unlink before saving.
> 
> Most don't. ed(1), vi(1) and emacs(1) are careful to write to the
> very same file. jed(1) is the only outlier I'm aware of...

If Emacs is configured to save backups (it is shipped with this option
on by default) the existing file is renamed to the backup name and the
new, changed file is saved in a fresh file.  Thus the trick of diffing
two co-linked trees of files should work as expected.
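The rename-to-backup behaviour Sean describes can be reproduced in a few lines of C. This is a sketch, not Emacs' code: it assumes a backup name of `path~` (Emacs' default single-backup naming, which is configurable) and uses a hypothetical helper `save_with_backup`. Because the old file is renamed out of the way and the new contents go into a fresh inode, every other hard link keeps the old contents.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: save `data` to `path` the way a backup-making
   editor can.  The old file is renamed to "path~", then a brand-new
   file is created at `path`.  Other hard links still point at the old
   inode, so the other trees keep the original contents.
   Returns 0 on success, -1 on error. */
int save_with_backup(const char *path, const char *data)
{
    char backup[4096];
    int n = snprintf(backup, sizeof backup, "%s~", path);
    if (n < 0 || (size_t)n >= sizeof backup)
        return -1;
    if (rename(path, backup) != 0)      /* old inode now named path~ */
        return -1;

    /* O_EXCL: we expect `path` to be gone after the rename. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(data);
    ssize_t w = write(fd, data, len);
    if (close(fd) != 0 || w < 0 || (size_t)w != len)
        return -1;
    return 0;
}
```

With backups turned off, the same editor may write in place instead, which is the case willy warns about.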

Emacs users should look in the info node "(emacs)Backup Copying" for
complete information on this.

-- 
 /////////////////  |                  | The spark of a pin
<sneakums@zork.net> |  (require 'gnu)  | dropping, falling feather-like.
 \\\\\\\\\\\\\\\\\  |                  | There is too much noise.

* Re: Making diff(1) of linux kernels faster
@ 2001-10-17 17:57 willy tarreau
  2001-10-18  0:25 ` Horst von Brand
  0 siblings, 1 reply; 23+ messages in thread
From: willy tarreau @ 2001-10-17 17:57 UTC
  To: Paul Gortmaker; +Cc: linux-kernel

Hi Paul !

congratulations on this improvement, it seems really
interesting. BTW, I personally use hard links between
kernels to make the effective data set smaller, and
I'd like to explain here how I proceed, since there
are often people who seem completely amazed by this
method, which I learned here on LKML a few years ago:

# cd /usr/src
# tar Ixf anydir/linux-2.4.12.tar.bz2
# cp -dRflp linux linux-2.4.12
>>> this way, only directory entries are duplicated, so
>>> there is very little overhead
# (cd linux && bzcat anydir/patch-2.4.13pre1.bz2 | patch -Np1)
# cp -dRflp linux linux-2.4.13pre1
>>> now, only the files affected by the patch are
>>> duplicated
>>> then, you can work inside the linux dir and build
>>> your patches very quickly, since only a few files
>>> actually differ between your new tree and the old
>>> ones.
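The data set shrinks because a file the patch did not touch is literally the same inode in every tree, and a comparison tool can recognise that with two stat() calls without reading any data. Whether diffutils 2.7 takes exactly this shortcut is not stated in the thread; `same_inode` below is a hypothetical helper that only illustrates the check.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both paths name the same inode
   (e.g. a file hard-linked between two "cp -l" trees), 0 if they are
   different files, -1 if either stat() fails.  When it returns 1 the
   contents are identical by definition, so nothing has to be read. */
int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Two separate files with equal bytes still return 0, so this can only prove equality, never inequality.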

Be very careful not to modify a multiply-linked file in
place, or the change will show up in all trees at once
and diff won't see it. Your editor must unlink the file
before saving.
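If an editor that writes in place has to be used, one workaround (a sketch, not something proposed in the thread) is to give the file a private copy before editing it: copy the data into a temporary file in the same directory and rename() it over the original. `break_link` is a hypothetical helper; it only handles regular files and treats a short write as an error.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: if the regular file `path` has more than one
   hard link, replace it with a private copy so that later in-place
   writes (as ed or vi do) only change this tree.  The copy goes to a
   temporary file in the same directory (hence the same filesystem, as
   rename() requires) and is renamed over `path`.
   Returns 0 on success or if nothing needed doing, -1 on error. */
int break_link(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    if (st.st_nlink <= 1)
        return 0;                             /* already private */

    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    int out = mkstemp(tmp);
    if (out < 0)
        return -1;
    int in = open(path, O_RDONLY);
    if (in < 0) {
        close(out);
        unlink(tmp);
        return -1;
    }

    char buf[8192];
    ssize_t r;
    int ok = 1;
    while ((r = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)r) != r) {  /* short write = error */
            ok = 0;
            break;
        }
    }
    if (r < 0)
        ok = 0;
    close(in);
    if (fchmod(out, st.st_mode & 07777) != 0)  /* keep permissions */
        ok = 0;
    if (close(out) != 0)
        ok = 0;
    if (!ok || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```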

I hope it will help someone as it has helped me for a
while now. I nearly always get sub-second diffs, even
without much RAM.

Cheers,
Willy


* Making diff(1) of linux kernels faster
@ 2001-10-14  8:58 Paul Gortmaker
  2001-10-14  9:51 ` john slee
  2001-10-14 15:48 ` Linus Torvalds
  0 siblings, 2 replies; 23+ messages in thread
From: Paul Gortmaker @ 2001-10-14  8:58 UTC
  To: Linus Torvalds; +Cc: linux-kernel


A while ago somebody with too much memory was gloating that they 
would do a "find ... xargs cat>/dev/null" on several 2.4.x trees
so that diff wouldn't thrash the disk with a million seeks  :-)

Well, I taught diff to read each tree sequentially first, and the results
were quite surprising (linux-2.2 kernel, two identical 8 MB trees, on
some older hardware, average times reported; the new diff option is "-z").

   diff -urN, nothing cached:  36 seconds
   diff -urzN, nothing cached:  7.5 seconds  (about 1/5 !!!!!)

   diff -urN, all cached:  1.04 seconds
   diff -urzN, all cached: 1.66 seconds

So, with a cold cache, my patch cut the time by a factor of about 5(!!)
(36 s down to 7.5 s, i.e. 4.8x), and the amount of audible death growls
from the disk is also reduced.  In the warm case you pay a penalty
(1.66 s vs. 1.04 s, roughly 60% slower), since the simple hack doesn't
try to keep the file data around while priming the cache, so every file
is read twice.

Now if only I had enough RAM to personally test how much it helps
against a couple of 2.4.x kernel trees...  Other stats welcome.

Paul.

diff -ruz orig/diffutils-2.7/diff.c diffutils-2.7/diff.c
--- orig/diffutils-2.7/diff.c	Thu Sep 22 12:47:00 1994
+++ diffutils-2.7/diff.c	Sun Oct 14 03:59:33 2001
@@ -206,6 +206,7 @@
   {"exclude", 1, 0, 'x'},
   {"exclude-from", 1, 0, 'X'},
   {"side-by-side", 0, 0, 'y'},
+  {"zoom", 0, 0, 'z'},
   {"unified", 2, 0, 'U'},
   {"left-column", 0, 0, 129},
   {"suppress-common-lines", 0, 0, 130},
@@ -244,7 +245,7 @@
   /* Decode the options.  */
 
   while ((c = getopt_long (argc, argv,
-			   "0123456789abBcC:dD:efF:hHiI:lL:nNpPqrsS:tTuU:vwW:x:X:y",
+			   "0123456789abBcC:dD:efF:hHiI:lL:nNpPqrsS:tTuU:vwW:x:X:yz",
 			   longopts, 0)) != EOF)
     {
       switch (c)
@@ -493,6 +494,11 @@
 	  specify_style (OUTPUT_SDIFF);
 	  break;
 
+	case 'z':
+	  /* Pre-read each tree sequentially to prime cache, avoid seeks. */
+	  preread_tree = 1;
+	  break;
+
 	case 'W':
 	  /* Set the line width for OUTPUT_SDIFF.  */
 	  if (ck_atoi (optarg, &width) || width <= 0)
@@ -736,6 +742,7 @@
 "-S FILE  --starting-file=FILE  Start with FILE when comparing directories.\n",
 "--horizon-lines=NUM  Keep NUM lines of the common prefix and suffix.",
 "-d  --minimal  Try hard to find a smaller set of changes.",
+"-z  --zoom  Assume both trees (with -r) will fit into machine core.",
 "-H  --speed-large-files  Assume large files and many scattered small changes.\n",
 "-v  --version  Output version info.",
 "--help  Output this help.",
@@ -990,6 +997,15 @@
 	}
       else
 	{
+
+          /* Sometimes faster to load each tree into OS's cache 1st */
+
+          if (depth == 0 && recursive && preread_tree)
+	    {
+              preread(inf[0].name);
+              preread(inf[1].name);
+            }
+		
 	  val = diff_dirs (inf, compare_files, depth);
 	}
 
diff -ruz orig/diffutils-2.7/diff.h diffutils-2.7/diff.h
--- orig/diffutils-2.7/diff.h	Thu Sep 22 12:47:00 1994
+++ diffutils-2.7/diff.h	Fri Oct 12 11:50:43 2001
@@ -93,6 +93,9 @@
 /* File labels for `-c' output headers (-L).  */
 EXTERN char *file_label[2];
 
+/* 1 if trees should be read sequentially to avoid seeks during recursive. */
+EXTERN int	preread_tree;
+
 struct regexp_list
 {
   struct re_pattern_buffer buf;
diff -ruz orig/diffutils-2.7/io.c diffutils-2.7/io.c
--- orig/diffutils-2.7/io.c	Thu Sep 22 12:47:00 1994
+++ diffutils-2.7/io.c	Fri Oct 12 11:51:55 2001
@@ -182,6 +182,64 @@
       current->buffer = xrealloc (current->buffer, current->bufsize);
     }
 }
+
+/* Preload the OS's cache with all files of one branch for recursive diffs */
+
+void
+preread (dir)
+	const char *dir;
+{
+
+  DIR *d;
+  struct dirent *dent;
+
+  d = opendir(dir);
+  if (d == NULL) return;
+
+  while ((dent = readdir(d)) != NULL)
+    {
+
+      char *name, *path;
+      struct file_data *f;
+
+      name = dent->d_name;
+      if (name[0] == '.' && (name[1] == 0 || (name[1] == '.' && name[2] == 0)))
+            continue;
+
+      f = xmalloc(sizeof(struct file_data));
+      memset(f, 0, sizeof(struct file_data));
+
+      path = xmalloc(strlen(dir)+strlen(name)+2);
+      strcpy(path, dir);
+      strcat(path, "/");
+      strcat(path, name);
+
+      if (stat(path, &f->stat) != 0)
+        {
+           free(f);
+           free(path);
+           continue;
+        }
+	
+      if (S_ISDIR(f->stat.st_mode))
+           preread(path);
+      else if (S_ISREG(f->stat.st_mode))
+        {
+          f->desc = open(path, O_RDONLY);
+          if (f->desc != -1)
+            {
+              slurp(f); 
+              if (f->bufsize != 0)
+                free(f->buffer);
+              close(f->desc);
+            }
+        } 
+      free(path);
+      free(f); 
+  }
+  closedir(d);
+}
+
 \f
 /* Split the file into lines, simultaneously computing the equivalence class for
    each line. */
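For experimenting outside diffutils, here is a self-contained sketch of the same read-everything-once idea. It does not use diffutils' slurp() or struct file_data; instead it streams each regular file through one reusable buffer, so no file data is kept and nothing large is allocated per file. It also uses lstat() where the patch uses stat(), so symlinks (including an /etc/net -> /etc style loop) are not followed; that is a deliberate difference from the patch. `preread_sketch` is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read every regular file under `dir` once, in directory order, and
   throw the data away.  The only point is to pull the contents into
   the OS page cache so that a later recursive diff does not seek back
   and forth between two trees.  Returns the number of bytes read. */
long long preread_sketch(const char *dir)
{
    long long total = 0;
    DIR *d = opendir(dir);
    if (d == NULL)
        return 0;

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        const char *name = e->d_name;
        if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
            continue;

        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dir, name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;                     /* path too long: skip it */

        struct stat st;
        if (lstat(path, &st) != 0)        /* lstat: never follow symlinks */
            continue;

        if (S_ISDIR(st.st_mode)) {
            total += preread_sketch(path);
        } else if (S_ISREG(st.st_mode)) {
            int fd = open(path, O_RDONLY);
            if (fd < 0)
                continue;
            static char buf[65536];       /* shared scratch; data discarded */
            ssize_t r;
            while ((r = read(fd, buf, sizeof buf)) > 0)
                total += r;
            close(fd);
        }
    }
    closedir(d);
    return total;
}
```

Calling it once per tree, e.g. preread_sketch("linux-2.4.12") and then preread_sketch("linux-2.4.13pre1"), mirrors what -z does before diff_dirs() runs.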




end of thread

Thread overview: 23+ messages
2001-10-18 12:39 Making diff(1) of linux kernels faster Marco C. Mason
  -- strict thread matches above, loose matches on Subject: below --
2001-10-22 16:39 Andries.Brouwer
2001-10-18 14:48 Sean Neakums
2001-10-17 17:57 willy tarreau
2001-10-18  0:25 ` Horst von Brand
2001-10-18  8:02   ` Nick Craig-Wood
2001-10-18  9:55     ` Wojtek Pilorz
2001-10-18 11:18       ` vda
2001-10-14  8:58 Paul Gortmaker
2001-10-14  9:51 ` john slee
2001-10-14 15:48 ` Linus Torvalds
2001-10-17 12:25   ` Paul Gortmaker
2001-10-17 16:59     ` Linus Torvalds
2001-10-17 16:44       ` Marcelo Tosatti
2001-10-17 18:21         ` Linus Torvalds
2001-10-17 20:21           ` Andrea Arcangeli
2001-10-17 19:06             ` Marcelo Tosatti
2001-10-17 21:23             ` chris
2001-10-17 21:30               ` Andrea Arcangeli
2001-10-17 21:45               ` Linus Torvalds
2001-10-17 17:12       ` John Levon
2001-10-17 19:19       ` Benjamin LaHaise
2001-10-17 18:50     ` Andreas Schwab
