git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] Function stripspace now gets a buffer instead file descriptors.
@ 2007-07-11 18:50 Carlos Rica
  2007-07-11 19:17 ` Johannes Schindelin
  2007-07-11 22:24 ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Carlos Rica @ 2007-07-11 18:50 UTC (permalink / raw)
  To: git, Junio C Hamano, Johannes Schindelin, Kristian Høgsberg

An implementation easier to call from builtins. It is designed
to be used from the upcoming builtin-tag.c and builtin-commit.c,
because both need to remove unwanted spaces from messages.

Signed-off-by: Carlos Rica <jasampler@gmail.com>
---

  This new implementation is a response to some Junio's comments
  on the version of builtin-tag.c which Kristian released:
  http://article.gmane.org/gmane.comp.version-control.git/49601

  The aim on not making this function internal yet
  is to be able to test it easily using the extensive
  test suite designed for git-stripspace. Later,
  this function could be in the file named "editor.c",
  along with the "launch_editor" function called when the
  program asks the user for a message not given from
  command-line.

 builtin-stripspace.c |   61 ++++++++++++++++++++++++++++++-------------------
 builtin.h            |    2 +-
 2 files changed, 38 insertions(+), 25 deletions(-)

diff --git a/builtin-stripspace.c b/builtin-stripspace.c
index d8358e2..949b640 100644
--- a/builtin-stripspace.c
+++ b/builtin-stripspace.c
@@ -2,12 +2,11 @@
 #include "cache.h"

 /*
- * Remove trailing spaces from a line.
+ * Returns the length of a line removing trailing spaces.
  *
  * If the line ends with newline, it will be removed too.
- * Returns the new length of the string.
  */
-static int cleanup(char *line, int len)
+static size_t cleanup(char *line, size_t len)
 {
 	if (len) {
 		if (line[len - 1] == '\n')
@@ -19,7 +18,6 @@ static int cleanup(char *line, int len)
 				break;
 			len--;
 		}
-		line[len] = 0;
 	}
 	return len;
 }
@@ -28,52 +26,67 @@ static int cleanup(char *line, int len)
  * Remove empty lines from the beginning and end
  * and also trailing spaces from every line.
  *
+ * Note that the buffer will not be null-terminated.
+ *
  * Turn multiple consecutive empty lines between paragraphs
  * into just one empty line.
  *
  * If the input has only empty lines and spaces,
  * no output will be produced.
  *
+ * If last line has a newline at the end, it will be removed.
+ *
  * Enable skip_comments to skip every line starting with "#".
  */
-void stripspace(FILE *in, FILE *out, int skip_comments)
+size_t stripspace(char *buffer, size_t length, int skip_comments)
 {
 	int empties = -1;
-	int alloc = 1024;
-	char *line = xmalloc(alloc);
+	size_t i, j, len, newlen;
+	char *eol;

-	while (fgets(line, alloc, in)) {
-		int len = strlen(line);
+	for (i = j = 0; i < length; i += len, j += newlen) {
+		eol = memchr(buffer + i, '\n', length - i);
+		len = eol ? eol - (buffer + i) + 1 : length - i;

-		while (len == alloc - 1 && line[len - 1] != '\n') {
-			alloc = alloc_nr(alloc);
-			line = xrealloc(line, alloc);
-			fgets(line + len, alloc - len, in);
-			len += strlen(line + len);
-		}
-
-		if (skip_comments && line[0] == '#')
+		if (skip_comments && len && buffer[i] == '#') {
+			newlen = 0;
 			continue;
-		len = cleanup(line, len);
+		}
+		newlen = cleanup(buffer + i, len);

 		/* Not just an empty line? */
-		if (len) {
+		if (newlen) {
+			if (empties != -1)
+				buffer[j++] = '\n';
 			if (empties > 0)
-				fputc('\n', out);
+				buffer[j++] = '\n';
 			empties = 0;
-			fputs(line, out);
-			fputc('\n', out);
+			memmove(buffer + j, buffer + i, newlen);
 			continue;
 		}
 		if (empties < 0)
 			continue;
 		empties++;
 	}
-	free(line);
+
+	return j;
 }

 int cmd_stripspace(int argc, const char **argv, const char *prefix)
 {
-	stripspace(stdin, stdout, 0);
+	char *buffer;
+	unsigned long size;
+
+	size = 1024;
+	buffer = xmalloc(size);
+	if (read_pipe(0, &buffer, &size))
+		die("could not read the input");
+
+	size = stripspace(buffer, size, 0);
+	write_or_die(1, buffer, size);
+	if (size)
+		putc('\n', stdout);
+
+	free(buffer);
 	return 0;
 }
diff --git a/builtin.h b/builtin.h
index 661a92f..4cc228d 100644
--- a/builtin.h
+++ b/builtin.h
@@ -7,7 +7,7 @@ extern const char git_version_string[];
 extern const char git_usage_string[];

 extern void help_unknown_cmd(const char *cmd);
-extern void stripspace(FILE *in, FILE *out, int skip_comments);
+extern size_t stripspace(char *buffer, size_t length, int skip_comments);
 extern int write_tree(unsigned char *sha1, int missing_ok, const char *prefix);
 extern void prune_packed_objects(int);

-- 
1.5.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Function stripspace now gets a buffer instead file descriptors.
  2007-07-11 18:50 [PATCH 1/2] Function stripspace now gets a buffer instead file descriptors Carlos Rica
@ 2007-07-11 19:17 ` Johannes Schindelin
  2007-07-11 22:24 ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Johannes Schindelin @ 2007-07-11 19:17 UTC (permalink / raw)
  To: Carlos Rica; +Cc: git, Junio C Hamano, Kristian Høgsberg

Hi,

On Wed, 11 Jul 2007, Carlos Rica wrote:

> @@ -28,52 +26,67 @@ static int cleanup(char *line, int len)
>   * Remove empty lines from the beginning and end
>   * and also trailing spaces from every line.
>   *
> + * Note that the buffer will not be null-terminated.
> + *
>   * Turn multiple consecutive empty lines between paragraphs
>   * into just one empty line.
>   *
>   * If the input has only empty lines and spaces,
>   * no output will be produced.
>   *
> + * If last line has a newline at the end, it will be removed.
> + *

Please let me comment about the rationale for both changes: The 
stripspace() function (which this hunk is about) is more useful if it does 
not allocate a new buffer, but works in-place.

And since it knows the new length already, it can just as well return the 
length, and _not_ NUL terminate (which would mean that we have to 
reallocate if we used read_pipe() to get the buffer).

The reason for the missing newline at the end is the same: since we accept 
buffers with a missing newline at the end, we would have to reallocate in 
that case.

So for the sake of simplicity, we neither NUL-terminate, nor \n terminate 
the buffer, and leave that to the callers.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Function stripspace now gets a buffer instead file descriptors.
  2007-07-11 18:50 [PATCH 1/2] Function stripspace now gets a buffer instead file descriptors Carlos Rica
  2007-07-11 19:17 ` Johannes Schindelin
@ 2007-07-11 22:24 ` Junio C Hamano
  2007-07-11 23:20   ` Bill Lear
  2007-07-12  0:14   ` Carlos Rica
  1 sibling, 2 replies; 7+ messages in thread
From: Junio C Hamano @ 2007-07-11 22:24 UTC (permalink / raw)
  To: Carlos Rica; +Cc: git, Johannes Schindelin, Kristian Høgsberg

Carlos Rica <jasampler@gmail.com> writes:

> diff --git a/builtin-stripspace.c b/builtin-stripspace.c
> index d8358e2..949b640 100644
> --- a/builtin-stripspace.c
> +++ b/builtin-stripspace.c
> @@ -2,12 +2,11 @@
>  #include "cache.h"
>
>  /*
> - * Remove trailing spaces from a line.
> + * Returns the length of a line removing trailing spaces.

This did not parse well for me; perhaps a comma before
"removing" would make it easier to read?

> @@ -28,52 +26,67 @@ static int cleanup(char *line, int len)
>   * Remove empty lines from the beginning and end
>   * and also trailing spaces from every line.
>   *
> + * Note that the buffer will not be null-terminated.
> + *

The name of the sentinel character '\0' is NUL, not null (which
is a different word, used to call a pointer that points
nowhere).  The buffer will not be "NUL-terminated".

> -void stripspace(FILE *in, FILE *out, int skip_comments)
> +size_t stripspace(char *buffer, size_t length, int skip_comments)
>  {
> +		newlen = cleanup(buffer + i, len);
>
>  		/* Not just an empty line? */
> +		if (newlen) {
> +			if (empties != -1)
> +				buffer[j++] = '\n';
>  			if (empties > 0)
> +				buffer[j++] = '\n';
>  			empties = 0;
> +			memmove(buffer + j, buffer + i, newlen);
>  			continue;
>  		}

It somehow strikes me odd that, given this:

	buffer       j        i
        **********************texttext  \n.......

you would first do this with cleanup():

	buffer       j        i
        **********************texttext\n.......

and then do this with this memmove():

	buffer       j        i
        *************texttext\n.......

Would it become simpler if cleanup() knew where the final text
goes (i.e. buffer+j)?

>  int cmd_stripspace(int argc, const char **argv, const char *prefix)
>  {
> -	stripspace(stdin, stdout, 0);
> +	char *buffer;
> +	unsigned long size;
> +
> +	size = 1024;
> +	buffer = xmalloc(size);
> +	if (read_pipe(0, &buffer, &size))
> +		die("could not read the input");

The command used to be capable of streaming and filtering a few
hundred gigabytes of text on a machine with small address space,
as it operated one line at a time, but now it cannot as it has
to hold everything in core before starting.

I do not think we miss that loss of capability too much, but I
wonder if we can be a bit more clever about it, perhaps feeding
a chunk at a time.  Not a very strong request, but just
wondering if it is an easy change.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Function stripspace now gets a buffer instead file descriptors.
  2007-07-11 22:24 ` Junio C Hamano
@ 2007-07-11 23:20   ` Bill Lear
  2007-07-11 23:41     ` Carlos Rica
  2007-07-12  0:14   ` Carlos Rica
  1 sibling, 1 reply; 7+ messages in thread
From: Bill Lear @ 2007-07-11 23:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Carlos Rica, git, Johannes Schindelin, Kristian Høgsberg

On Wednesday, July 11, 2007 at 15:24:03 (-0700) Junio C Hamano writes:
>Carlos Rica <jasampler@gmail.com> writes:
>...
>> + * Returns the length of a line removing trailing spaces.
>
>This did not parse well for me; perhaps a comma before
>"removing" would make it easier to read?

Or:

Returns the length of a line, after removing trailing spaces.


Bill

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Function stripspace now gets a buffer instead file descriptors.
  2007-07-11 23:20   ` Bill Lear
@ 2007-07-11 23:41     ` Carlos Rica
  2007-07-12  0:03       ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Carlos Rica @ 2007-07-11 23:41 UTC (permalink / raw)
  To: Bill Lear
  Cc: Junio C Hamano, git, Johannes Schindelin, Kristian Høgsberg

Now the function cleanup is not removing spaces at all, but only
counting where the line of text ends.

Before, in the previous version, the function cleanup was taking a
string as argument, and then it needed to modify that string.  Now, it
just returns the new length, "not counting" the spaces and newline at
the end of the buffer passed. Its name and comments then could be
different, but I didn't know which ones.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Function stripspace now gets a buffer instead file descriptors.
  2007-07-11 23:41     ` Carlos Rica
@ 2007-07-12  0:03       ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2007-07-12  0:03 UTC (permalink / raw)
  To: Carlos Rica
  Cc: Bill Lear, Junio C Hamano, git, Johannes Schindelin,
	Kristian Høgsberg

"Carlos Rica" <jasampler@gmail.com> writes:

> Now the function cleanup is not removing spaces at all, but only
> counting where the line of text ends.
>
> Before, in the previous version, the function cleanup was taking a
> string as argument, and then it needed to modify that string.  Now, it
> just returns the new length, "not counting" the spaces and newline at
> the end of the buffer passed. Its name and comments then could be
> different, but I didn't know which ones.

Ah, you are right.  I misread the patch --- cleanup() does not
even touch the buffer.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Function stripspace now gets a buffer instead file descriptors.
  2007-07-11 22:24 ` Junio C Hamano
  2007-07-11 23:20   ` Bill Lear
@ 2007-07-12  0:14   ` Carlos Rica
  1 sibling, 0 replies; 7+ messages in thread
From: Carlos Rica @ 2007-07-12  0:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin, Kristian Høgsberg

2007/7/12, Junio C Hamano <gitster@pobox.com>:
> Carlos Rica <jasampler@gmail.com> writes:
> > @@ -28,52 +26,67 @@ static int cleanup(char *line, int len)
> >   * Remove empty lines from the beginning and end
> >   * and also trailing spaces from every line.
> >   *
> > + * Note that the buffer will not be null-terminated.
> > + *
>
> The name of the sentinel character '\0' is NUL, not null (which
> is a different word, used to call a pointer that points
> nowhere).  The buffer will not be "NUL-terminated".

Thank you Junio, I will use it on the future.

> >  int cmd_stripspace(int argc, const char **argv, const char *prefix)
> >  {
> > -     stripspace(stdin, stdout, 0);
> > +     char *buffer;
> > +     unsigned long size;
> > +
> > +     size = 1024;
> > +     buffer = xmalloc(size);
> > +     if (read_pipe(0, &buffer, &size))
> > +             die("could not read the input");
>
> The command used to be capable of streaming and filtering a few
> hundred gigabytes of text on a machine with small address space,
> as it operated one line at a time, but now it cannot as it has
> to hold everything in core before starting.
>
> I do not think we miss that loss of capability too much, but I
> wonder if we can be a bit more clever about it, perhaps feeding
> a chunk at a time.  Not a very strong request, but just
> wondering if it is an easy change.

I did those changes because I was needing those tests that
I had written before in order to develop the function. After that,
we now can restore the previous function with file descriptors to
make it capable of filter a few hundred gigabytes of text, provided
that the text does not have long long lines on it.

Indeed, the implementation for composing a tag (header, cleaned
message and optional signature) in "builtin-tag.c", now pass it to
the function write_sha1_file as a buffer on memory, so it won't support
sizes bigger than memory available on the system. Messages should
not be so big, but I don't know how to limit those.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2007-07-12  0:15 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-11 18:50 [PATCH 1/2] Function stripspace now gets a buffer instead file descriptors Carlos Rica
2007-07-11 19:17 ` Johannes Schindelin
2007-07-11 22:24 ` Junio C Hamano
2007-07-11 23:20   ` Bill Lear
2007-07-11 23:41     ` Carlos Rica
2007-07-12  0:03       ` Junio C Hamano
2007-07-12  0:14   ` Carlos Rica

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).