git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] Added generic string handling code.
@ 2007-05-20  2:24 Timo Sirainen
  2007-05-20 10:01 ` Alex Riesen
  2007-05-22 13:40 ` Petr Baudis
  0 siblings, 2 replies; 7+ messages in thread
From: Timo Sirainen @ 2007-05-20  2:24 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 4647 bytes --]

Aren't you already tired of using the crappy string handling functions
that libc provides? I see a lot of really ugly code in git that exists
just because this.

I also see a lot of potential buffer overflows because either no
overflow checking is done, or it's done wrong. Perhaps it doesn't matter
now if you're manually inspecting each patch before feeding to git, but
I fear that in future someone's automated git handler will be
responsible for getting malicious code added into Linux, just because of
a simple buffer overflow that could have been easily avoided.

So here's my try on starting with something simple. Unlike almost all
other string handling libraries, it doesn't allocate the memory
dynamically. This makes it really easy to convert existing code to use
it. I'm including some example changes in the other patches. Besides
making the code safer, it can also make it faster, especially those
strcat() replacements.

I'm aware of strbuf.[ch], but I wasn't sure if I should have merged this
code with it or what. It had this "eof" field which I think makes it
more like a "file reader string" and not a "string buffer". So I just
added new str.[ch] files.

---
 Makefile |    4 ++--
 str.c    |   40 ++++++++++++++++++++++++++++++++++++++++
 str.h    |   32 ++++++++++++++++++++++++++++++++
 3 files changed, 74 insertions(+), 2 deletions(-)
 create mode 100644 str.c
 create mode 100644 str.h

diff --git a/Makefile b/Makefile
index 29243c6..f61ad50 100644
--- a/Makefile
+++ b/Makefile
@@ -294,7 +294,7 @@ XDIFF_LIB=xdiff/lib.a
 LIB_H = \
 	archive.h blob.h cache.h commit.h csum-file.h delta.h grep.h \
 	diff.h object.h pack.h pkt-line.h quote.h refs.h list-objects.h sideband.h \
-	run-command.h strbuf.h tag.h tree.h git-compat-util.h revision.h \
+	run-command.h strbuf.h str.h tag.h tree.h git-compat-util.h revision.h \
 	tree-walk.h log-tree.h dir.h path-list.h unpack-trees.h builtin.h \
 	utf8.h reflog-walk.h patch-ids.h attr.h decorate.h progress.h mailmap.h
 
@@ -312,7 +312,7 @@ LIB_OBJS = \
 	object.o pack-check.o pack-write.o patch-delta.o path.o pkt-line.o \
 	sideband.o reachable.o reflog-walk.o \
 	quote.o read-cache.o refs.o run-command.o dir.o object-refs.o \
-	server-info.o setup.o sha1_file.o sha1_name.o strbuf.o \
+	server-info.o setup.o sha1_file.o sha1_name.o strbuf.o str.o \
 	tag.o tree.o usage.o config.o environment.o ctype.o copy.o \
 	revision.o pager.o tree-walk.o xdiff-interface.o \
 	write_or_die.o trace.o list-objects.o grep.o match-trees.o \
diff --git a/str.c b/str.c
new file mode 100644
index 0000000..d46e7f4
--- /dev/null
+++ b/str.c
@@ -0,0 +1,40 @@
+#include "str.h"
+
+void _str_append(struct string *str, const char *cstr)
+{
+	unsigned int avail = str->size - str->len;
+	unsigned int len = strlen(cstr);
+
+	if (len >= avail) {
+		len = avail - 1;
+		str->overflowed = 1;
+	}
+	memcpy(str->buf + str->len, cstr, len);
+	str->len += len;
+	str->buf[str->len] = '\0';
+}
+
+void _str_printfa(struct string *str, const char *fmt, ...)
+{
+	unsigned int avail = str->size - str->len;
+	va_list va;
+	int ret;
+
+	va_start(va, fmt);
+	ret = vsnprintf(str->buf + str->len, avail, fmt, va);
+	if (ret < avail)
+		str->len += ret;
+	else {
+		str->len += avail - 1;
+		str->overflowed = 1;
+	}
+	va_end(va);
+}
+
+void _str_truncate(struct string *str, unsigned int len)
+{
+	if (len >= str->size)
+		len = str->size - 1;
+	str->len = len;
+	str->buf[len] = '\0';
+}
diff --git a/str.h b/str.h
new file mode 100644
index 0000000..99d3215
--- /dev/null
+++ b/str.h
@@ -0,0 +1,32 @@
+#ifndef STR_H
+#define STR_H
+
+#include "git-compat-util.h"
+
+struct string {
+	unsigned int size;
+	unsigned int len:31;
+	unsigned int overflowed:1;
+	char buf[];
+};
+
+#define stringbuf(name, size) \
+	union { \
+	  struct string string; \
+	  char string_buf[sizeof(struct string) + (size) + 1]; \
+	} name = { { (size)+1, 0, 0 } }
+
+extern void _str_append(struct string *str, const char *cstr);
+extern void _str_printfa(struct string *str, const char *fmt, ...)
+	__attribute__((format (printf, 2, 3)));
+extern void _str_truncate(struct string *str, unsigned int len);
+
+#define str_append(str, cstr) _str_append(&(str).string, cstr)
+#define str_printfa(str, fmt, ...) _str_printfa(&(str).string, fmt, __VA_ARGS__)
+#define str_truncate(str, len) _str_truncate(&(str).string, len)
+
+#define str_c(str) ((str).string.buf)
+#define str_len(str) ((str).string.len)
+#define str_overflowed(str) ((str).string.overflowed)
+
+#endif
-- 
1.5.1.4



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] Added generic string handling code.
  2007-05-20  2:24 [PATCH 1/3] Added generic string handling code Timo Sirainen
@ 2007-05-20 10:01 ` Alex Riesen
  2007-05-20 10:27   ` Timo Sirainen
  2007-05-22 13:40 ` Petr Baudis
  1 sibling, 1 reply; 7+ messages in thread
From: Alex Riesen @ 2007-05-20 10:01 UTC (permalink / raw)
  To: Timo Sirainen; +Cc: git

Timo Sirainen, Sun, May 20, 2007 04:24:29 +0200:
> So here's my try on starting with something simple. Unlike almost all
> other string handling libraries, it doesn't allocate the memory
> dynamically.

Sometimes you _need_ dinamic memory allocation.

> This makes it really easy to convert existing code to use it. I'm
> including some example changes in the other patches. Besides making
> the code safer, it can also make it faster, especially those
> strcat() replacements.

It is also bigger, heavier on stack and sometimes slower because of
more function calls involved.

Aside from that, I like it. I wouldn't use it universally, but
there were times when I wished it has been be done this way.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] Added generic string handling code.
  2007-05-20 10:01 ` Alex Riesen
@ 2007-05-20 10:27   ` Timo Sirainen
  0 siblings, 0 replies; 7+ messages in thread
From: Timo Sirainen @ 2007-05-20 10:27 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1245 bytes --]

On 20.5.2007, at 13.01, Alex Riesen wrote:

> Timo Sirainen, Sun, May 20, 2007 04:24:29 +0200:
>> So here's my try on starting with something simple. Unlike almost all
>> other string handling libraries, it doesn't allocate the memory
>> dynamically.
>
> Sometimes you _need_ dinamic memory allocation.

It's easy to use the same str_*() functions to implement dynamic  
memory allocation. I think I could have done a bit different naming,  
like maybe:

extern struct string *str_alloc(unsigned int len);
extern void str_append(struct string *str, const char *cstr);

#define static_string(name, size) ..
#define sstr_append(str, cstr) str_append(&(str).string, cstr)


>> This makes it really easy to convert existing code to use it. I'm
>> including some example changes in the other patches. Besides making
>> the code safer, it can also make it faster, especially those
>> strcat() replacements.
>
> It is also bigger, heavier on stack and sometimes slower because of
> more function calls involved.

I would hardly call 8 extra bytes on stack heavier. Also if this was  
used everywhere I wouldn't be surprised if it made the code faster,  
because it would remove a lot of overflow checking code so more code  
will fit into L1 cache.




[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 186 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] Added generic string handling code.
  2007-05-20  2:24 [PATCH 1/3] Added generic string handling code Timo Sirainen
  2007-05-20 10:01 ` Alex Riesen
@ 2007-05-22 13:40 ` Petr Baudis
  2007-05-23 10:49   ` Timo Sirainen
  1 sibling, 1 reply; 7+ messages in thread
From: Petr Baudis @ 2007-05-22 13:40 UTC (permalink / raw)
  To: Timo Sirainen; +Cc: git

On Sun, May 20, 2007 at 04:24:29AM CEST, Timo Sirainen wrote:
> diff --git a/str.c b/str.c
> new file mode 100644
> index 0000000..d46e7f4
> --- /dev/null
> +++ b/str.c
> @@ -0,0 +1,40 @@
> +#include "str.h"
> +
> +void _str_append(struct string *str, const char *cstr)

_ is reserved namespace.

> +{
> +	unsigned int avail = str->size - str->len;
> +	unsigned int len = strlen(cstr);
> +
> +	if (len >= avail) {
> +		len = avail - 1;
> +		str->overflowed = 1;
> +	}
> +	memcpy(str->buf + str->len, cstr, len);
> +	str->len += len;
> +	str->buf[str->len] = '\0';

You can copy len + 1 and avoid this assignment.

> +}
> +
> +void _str_printfa(struct string *str, const char *fmt, ...)

printfA?

> +{
> +	unsigned int avail = str->size - str->len;
> +	va_list va;
> +	int ret;
> +
> +	va_start(va, fmt);
> +	ret = vsnprintf(str->buf + str->len, avail, fmt, va);
> +	if (ret < avail)
> +		str->len += ret;
> +	else {
> +		str->len += avail - 1;
> +		str->overflowed = 1;
> +	}
> +	va_end(va);
> +}

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] Added generic string handling code.
  2007-05-22 13:40 ` Petr Baudis
@ 2007-05-23 10:49   ` Timo Sirainen
  2007-05-23 13:24     ` Petr Baudis
  0 siblings, 1 reply; 7+ messages in thread
From: Timo Sirainen @ 2007-05-23 10:49 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1510 bytes --]

On Tue, 2007-05-22 at 15:40 +0200, Petr Baudis wrote:
> On Sun, May 20, 2007 at 04:24:29AM CEST, Timo Sirainen wrote:
> > diff --git a/str.c b/str.c
> > new file mode 100644
> > index 0000000..d46e7f4
> > --- /dev/null
> > +++ b/str.c
> > @@ -0,0 +1,40 @@
> > +#include "str.h"
> > +
> > +void _str_append(struct string *str, const char *cstr)
> 
> _ is reserved namespace.

I remember __ is, but was _ too? A lot of programs are using that. :)

> > +{
> > +	unsigned int avail = str->size - str->len;
> > +	unsigned int len = strlen(cstr);
> > +
> > +	if (len >= avail) {
> > +		len = avail - 1;
> > +		str->overflowed = 1;
> > +	}
> > +	memcpy(str->buf + str->len, cstr, len);
> > +	str->len += len;
> > +	str->buf[str->len] = '\0';
> 
> You can copy len + 1 and avoid this assignment.

Not if the string overflowed.

> > +}
> > +
> > +void _str_printfa(struct string *str, const char *fmt, ...)
> 
> printfA?

"append". I think I got it originally from glib:

/* These aliases are included for compatibility. */
#define g_string_sprintf g_string_printf
#define g_string_sprintfa g_string_append_printf

If there's a chance that this string handling code would get used, I
could write another patch with a bit clearer names and support for
dynamically growing strings too. Something like:

STATIC_STRING(name, 1234);
sstr_append(name, "hello"); // or static_str_append()?

struct string *dyn = str_new(1024); // initial length
str_append(dyn, "hello");


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] Added generic string handling code.
  2007-05-23 10:49   ` Timo Sirainen
@ 2007-05-23 13:24     ` Petr Baudis
  2007-05-23 13:56       ` Timo Sirainen
  0 siblings, 1 reply; 7+ messages in thread
From: Petr Baudis @ 2007-05-23 13:24 UTC (permalink / raw)
  To: Timo Sirainen; +Cc: git

On Wed, May 23, 2007 at 12:49:46PM CEST, Timo Sirainen wrote:
> On Tue, 2007-05-22 at 15:40 +0200, Petr Baudis wrote:
> > On Sun, May 20, 2007 at 04:24:29AM CEST, Timo Sirainen wrote:
> > > diff --git a/str.c b/str.c
> > > new file mode 100644
> > > index 0000000..d46e7f4
> > > --- /dev/null
> > > +++ b/str.c
> > > @@ -0,0 +1,40 @@
> > > +#include "str.h"
> > > +
> > > +void _str_append(struct string *str, const char *cstr)
> > 
> > _ is reserved namespace.
> 
> I remember __ is, but was _ too? A lot of programs are using that. :)

C99 7.1.3 says that

   -- All identifiers that begin with an underscore are always reserved
for use as identifiers with file scope in both the ordinary and tag name
spaces.

Hmm, this _is_ file scope, right?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] Added generic string handling code.
  2007-05-23 13:24     ` Petr Baudis
@ 2007-05-23 13:56       ` Timo Sirainen
  0 siblings, 0 replies; 7+ messages in thread
From: Timo Sirainen @ 2007-05-23 13:56 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 668 bytes --]

On Wed, 2007-05-23 at 15:24 +0200, Petr Baudis wrote:
> > > > +void _str_append(struct string *str, const char *cstr)
> > > 
> > > _ is reserved namespace.
> > 
> > I remember __ is, but was _ too? A lot of programs are using that. :)
> 
> C99 7.1.3 says that
> 
>    -- All identifiers that begin with an underscore are always reserved
> for use as identifiers with file scope in both the ordinary and tag name
> spaces.
> 
> Hmm, this _is_ file scope, right?

Right. I'll start changing my practices. Although grepping
under /usr/include shows that there are a lot of other software that
doesn't respect it either (X11, MySQL, OpenSSL at least).


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2007-05-23 13:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-20  2:24 [PATCH 1/3] Added generic string handling code Timo Sirainen
2007-05-20 10:01 ` Alex Riesen
2007-05-20 10:27   ` Timo Sirainen
2007-05-22 13:40 ` Petr Baudis
2007-05-23 10:49   ` Timo Sirainen
2007-05-23 13:24     ` Petr Baudis
2007-05-23 13:56       ` Timo Sirainen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).