* [PATCH] git-mailinfo: Fix getting the subject from the body
@ 2008-07-10 21:41 Lukas Sandström
[not found] ` <7vod55o0tx.fsf@gitster.siamese.dyndns.org>
2008-07-12 9:36 ` [PATCH] git-mailinfo: Fix getting the subject from the body Junio C Hamano
0 siblings, 2 replies; 14+ messages in thread
From: Lukas Sandström @ 2008-07-10 21:41 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Lukas Sandström, Git Mailing List
"Subject: " isn't in the static array "header", and thus
memcmp("Subject: ", header[i], 7) will never match.
Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
---
This has been broken since 2007-03-12, with commit
87ab799234639c26ea10de74782fa511cb3ca606
so it might not be very important.
builtin-mailinfo.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index 962aa34..2d1520f 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -334,7 +334,7 @@ static int check_header(char *line, unsigned linesize, char **hdr_data, int over
return 1;
if (!memcmp("[PATCH]", line, 7) && isspace(line[7])) {
for (i = 0; header[i]; i++) {
- if (!memcmp("Subject: ", header[i], 9)) {
+ if (!memcmp("Subject", header[i], 7)) {
if (! handle_header(line, hdr_data[i], 0)) {
return 1;
}
--
1.5.4.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] git-mailinfo: Fix getting the subject from the body
[not found] ` <7vod55o0tx.fsf@gitster.siamese.dyndns.org>
@ 2008-07-10 22:37 ` Lukas Sandström
2008-07-10 23:25 ` Junio C Hamano
0 siblings, 1 reply; 14+ messages in thread
From: Lukas Sandström @ 2008-07-10 22:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List, Lukas Sandström
Junio C Hamano wrote:
> Lukas Sandström <lukass@etek.chalmers.se> writes:
>
>> "Subject: " isn't in the static array "header", and thus
>> memcmp("Subject: ", header[i], 7) will never match.
>>
>> Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
>> ---
>>
>> This has been broken since 2007-03-12, with commit
>> 87ab799234639c26ea10de74782fa511cb3ca606
>> so it might not be very important.
>
> Wow, thanks. Perhaps we would want some additional test scripts?
Yes, apparently.
After looking at this part some more, I see that there is no guarantee
that hdr_data[i] != NULL in this codepath, and then we won't use the
subject anyway.
I'm currently working on rewriting git-mailinfo to use strbuf's insted
of the preallocated buffers currently used. Do you want me to send a
patch allocating hdr_data[i], or can you wait for my strbuf-conversion
patch? It'll propably be ready for review tonight.
/Lukas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] git-mailinfo: Fix getting the subject from the body
2008-07-10 22:37 ` Lukas Sandström
@ 2008-07-10 23:25 ` Junio C Hamano
2008-07-10 23:41 ` [PATCH] Add some useful functions for strbuf manipulation Lukas Sandström
0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2008-07-10 23:25 UTC (permalink / raw)
To: Lukas Sandström; +Cc: Git Mailing List
Lukas Sandström <lukass@etek.chalmers.se> writes:
> I'm currently working on rewriting git-mailinfo to use strbuf's insted
> of the preallocated buffers currently used. Do you want me to send a
> patch allocating hdr_data[i], or can you wait for my strbuf-conversion
> patch? It'll propably be ready for review tonight.
Heh, after getting burned by that NUL thingy, I was waiting for somebody
to step up. Thanks.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] Add some useful functions for strbuf manipulation.
2008-07-10 23:25 ` Junio C Hamano
@ 2008-07-10 23:41 ` Lukas Sandström
2008-07-10 23:43 ` [PATCH/RFC] git-mailinfo: use strbuf's instead of fixed buffers Lukas Sandström
0 siblings, 1 reply; 14+ messages in thread
From: Lukas Sandström @ 2008-07-10 23:41 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Lukas Sandström, Git Mailing List
Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
---
Junio C Hamano wrote:
> Heh, after getting burned by that NUL thingy, I was waiting for somebody
> to step up. Thanks.
Here we go then. Two freshly baked patches.
Note that this is a pretty straight buffers -> strbuf's conversion,
but I think it will be a good start for further hardening of mailinfo.
strbuf.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
strbuf.h | 6 +++++
2 files changed, 76 insertions(+), 0 deletions(-)
diff --git a/strbuf.c b/strbuf.c
index 4aed752..28d6776 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -60,6 +60,18 @@ void strbuf_grow(struct strbuf *sb, size_t extra)
ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
}
+void strbuf_trim(struct strbuf *sb)
+{
+ char *b = sb->buf;
+ while (sb->len > 0 && isspace((unsigned char)sb->buf[sb->len - 1]))
+ sb->len--;
+ while(sb->len > 0 && isspace(*b)) {
+ b++;
+ sb->len--;
+ }
+ memmove(sb->buf, b, sb->len);
+ sb->buf[sb->len] = '\0';
+}
void strbuf_rtrim(struct strbuf *sb)
{
while (sb->len > 0 && isspace((unsigned char)sb->buf[sb->len - 1]))
@@ -67,6 +79,64 @@ void strbuf_rtrim(struct strbuf *sb)
sb->buf[sb->len] = '\0';
}
+void strbuf_ltrim(struct strbuf *sb)
+{
+ char *b = sb->buf;
+ while(sb->len > 0 && isspace(*b)) {
+ b++;
+ sb->len--;
+ }
+ memmove(sb->buf, b, sb->len);
+ sb->buf[sb->len] = '\0';
+}
+
+void strbuf_tolower(struct strbuf *sb)
+{
+ int i;
+ for (i = 0; i < sb->len; i++)
+ sb->buf[i] = tolower(sb->buf[i]);
+}
+
+struct strbuf ** strbuf_split(struct strbuf *sb, int delim)
+{
+ int alloc = 2, pos = 0;
+ char *n, *p;
+ struct strbuf **ret;
+ struct strbuf *t;
+
+ ret = xcalloc(alloc, sizeof(struct strbuf *));
+ p = n = sb->buf;
+ while (n < sb->buf + sb->len) {
+ int len;
+ n = memchr(n, delim, sb->len - (n - sb->buf));
+ if (pos + 1 >= alloc) {
+ alloc = alloc * 2;
+ ret = xrealloc(ret, sizeof(struct strbuf *) * alloc);
+ }
+ if (!n)
+ n = sb->buf + sb->len - 1;
+ len = n - p + 1;
+ t = xmalloc(sizeof(struct strbuf));
+ strbuf_init(t, len);
+ strbuf_add(t, p, len);
+ ret[pos] = t;
+ ret[++pos] = NULL;
+ p = ++n;
+ }
+ return ret;
+}
+
+void strbuf_list_free(struct strbuf ** sbs)
+{
+ struct strbuf **s = sbs;
+
+ while(*s) {
+ strbuf_release(*s);
+ free(*s++);
+ }
+ free(sbs);
+}
+
int strbuf_cmp(struct strbuf *a, struct strbuf *b)
{
int cmp;
diff --git a/strbuf.h b/strbuf.h
index faec229..577d14e 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -77,8 +77,14 @@ static inline void strbuf_setlen(struct strbuf *sb, size_t len) {
#define strbuf_reset(sb) strbuf_setlen(sb, 0)
/*----- content related -----*/
+extern void strbuf_trim(struct strbuf *);
extern void strbuf_rtrim(struct strbuf *);
+extern void strbuf_ltrim(struct strbuf *);
extern int strbuf_cmp(struct strbuf *, struct strbuf *);
+extern void strbuf_tolower(struct strbuf *);
+
+extern struct strbuf ** strbuf_split(struct strbuf*, int delim);
+extern void strbuf_list_free(struct strbuf **);
/*----- add data in your buffer -----*/
static inline void strbuf_addch(struct strbuf *sb, int c) {
--
1.5.4.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH/RFC] git-mailinfo: use strbuf's instead of fixed buffers
2008-07-10 23:41 ` [PATCH] Add some useful functions for strbuf manipulation Lukas Sandström
@ 2008-07-10 23:43 ` Lukas Sandström
2008-07-12 6:10 ` Junio C Hamano
0 siblings, 1 reply; 14+ messages in thread
From: Lukas Sandström @ 2008-07-10 23:43 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Lukas Sandström, Git Mailing List
Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
---
builtin-mailinfo.c | 705 +++++++++++++++++++++++++---------------------------
1 files changed, 333 insertions(+), 372 deletions(-)
diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index 2d1520f..254a97c 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -5,14 +5,15 @@
#include "cache.h"
#include "builtin.h"
#include "utf8.h"
+#include "strbuf.h"
static FILE *cmitmsg, *patchfile, *fin, *fout;
static int keep_subject;
static const char *metainfo_charset;
-static char line[1000];
-static char name[1000];
-static char email[1000];
+static struct strbuf line = STRBUF_INIT;
+static struct strbuf name = STRBUF_INIT;
+static struct strbuf email = STRBUF_INIT;
static enum {
TE_DONTCARE, TE_QP, TE_BASE64,
@@ -21,74 +22,74 @@ static enum {
TYPE_TEXT, TYPE_OTHER,
} message_type;
-static char charset[256];
+static struct strbuf charset = STRBUF_INIT;
static int patch_lines;
-static char **p_hdr_data, **s_hdr_data;
+static struct strbuf **p_hdr_data, **s_hdr_data;
#define MAX_HDR_PARSED 10
#define MAX_BOUNDARIES 5
-static char *sanity_check(char *name, char *email)
+static void sanity_check(struct strbuf *out, struct strbuf *name, struct strbuf *email)
{
- int len = strlen(name);
- if (len < 3 || len > 60)
- return email;
- if (strchr(name, '@') || strchr(name, '<') || strchr(name, '>'))
- return email;
- return name;
+ struct strbuf o = STRBUF_INIT;
+ if (name->len < 3 || name->len > 60)
+ strbuf_addbuf(&o, email);
+ if (strchr(name->buf, '@') || strchr(name->buf, '<') ||
+ strchr(name->buf, '>'))
+ strbuf_addbuf(&o, email);
+ strbuf_addbuf(&o, name);
+ strbuf_reset(out);
+ strbuf_addbuf(out, &o);
+ strbuf_release(&o);
}
-static int bogus_from(char *line)
+static int bogus_from(const struct strbuf *line)
{
/* John Doe <johndoe> */
- char *bra, *ket, *dst, *cp;
+ char *bra, *ket;
/* This is fallback, so do not bother if we already have an
* e-mail address.
*/
- if (*email)
+ if (email.len)
return 0;
- bra = strchr(line, '<');
+ bra = strchr(line->buf, '<');
if (!bra)
return 0;
ket = strchr(bra, '>');
if (!ket)
return 0;
- for (dst = email, cp = bra+1; cp < ket; )
- *dst++ = *cp++;
- *dst = 0;
- for (cp = line; isspace(*cp); cp++)
- ;
- for (bra--; isspace(*bra); bra--)
- *bra = 0;
- cp = sanity_check(cp, email);
- strcpy(name, cp);
+ strbuf_reset(&email);
+ strbuf_add(&email, bra + 1, ket - bra - 1);
+
+ strbuf_reset(&name);
+ strbuf_add(&name, line->buf, bra - line->buf);
+ strbuf_trim(&name);
+ sanity_check(&name, &name, &email);
return 1;
}
-static int handle_from(char *in_line)
+static int handle_from(struct strbuf *from)
{
- char line[1000];
char *at;
- char *dst;
+ size_t el;
- strcpy(line, in_line);
- at = strchr(line, '@');
+ at = strchr(from->buf, '@');
if (!at)
- return bogus_from(line);
+ return bogus_from(from);
/*
* If we already have one email, don't take any confusing lines
*/
- if (*email && strchr(at+1, '@'))
+ if (email.len && strchr(at + 1, '@'))
return 0;
/* Pick up the string around '@', possibly delimited with <>
- * pair; that is the email part. White them out while copying.
+ * pair; that is the email part.
*/
- while (at > line) {
+ while (at > from->buf) {
char c = at[-1];
if (isspace(c))
break;
@@ -98,56 +99,35 @@ static int handle_from(char *in_line)
}
at--;
}
- dst = email;
- for (;;) {
- unsigned char c = *at;
- if (!c || c == '>' || isspace(c)) {
- if (c == '>')
- *at = ' ';
- break;
- }
- *at++ = ' ';
- *dst++ = c;
- }
- *dst++ = 0;
-
+ el = strcspn(at, " \n\t\r\v\f>");
+ strbuf_reset(&email);
+ strbuf_add(&email, at, el);
+ strbuf_remove(from, at - from->buf, el + 1);
/* The remainder is name. It could be "John Doe <john.doe@xz>"
* or "john.doe@xz (John Doe)", but we have whited out the
* email part, so trim from both ends, possibly removing
* the () pair at the end.
*/
- at = line + strlen(line);
- while (at > line) {
- unsigned char c = *--at;
- if (!isspace(c)) {
- at[(c == ')') ? 0 : 1] = 0;
- break;
- }
- }
- at = line;
- for (;;) {
- unsigned char c = *at;
- if (!c || !isspace(c)) {
- if (c == '(')
- at++;
- break;
- }
- at++;
- }
- at = sanity_check(at, email);
- strcpy(name, at);
+ strbuf_trim(from);
+ if (*from->buf == '(')
+ strbuf_remove(&name, 0, 1);
+ if (*(from->buf + from->len - 1) == ')')
+ strbuf_setlen(from, from->len - 1);
+
+ sanity_check(&name, from, &email);
return 1;
}
-static int handle_header(char *line, char *data, int ofs)
+static void handle_header(struct strbuf **out, const struct strbuf *line)
{
- if (!line || !data)
- return 1;
-
- strcpy(data, line+ofs);
+ if (!*out) {
+ *out = xmalloc(sizeof(struct strbuf));
+ strbuf_init(*out, line->len);
+ } else
+ strbuf_reset(*out);
- return 0;
+ strbuf_addbuf(*out, (struct strbuf *)line); /* const warning */
}
/* NOTE NOTE NOTE. We do not claim we do full MIME. We just attempt
@@ -156,13 +136,13 @@ static int handle_header(char *line, char *data, int ofs)
* case insensitively.
*/
-static int slurp_attr(const char *line, const char *name, char *attr)
+static int slurp_attr(const char *line, const char *name, struct strbuf *attr)
{
const char *ends, *ap = strcasestr(line, name);
size_t sz;
if (!ap) {
- *attr = 0;
+ strbuf_setlen(attr, 0);
return 0;
}
ap += strlen(name);
@@ -173,180 +153,176 @@ static int slurp_attr(const char *line, const char *name, char *attr)
else
ends = "; \t";
sz = strcspn(ap, ends);
- memcpy(attr, ap, sz);
- attr[sz] = 0;
+ strbuf_add(attr, ap, sz);
return 1;
}
struct content_type {
- char *boundary;
- int boundary_len;
+ struct strbuf *boundary;
};
static struct content_type content[MAX_BOUNDARIES];
static struct content_type *content_top = content;
-static int handle_content_type(char *line)
+static int handle_content_type(struct strbuf *line)
{
- char boundary[256];
+ struct strbuf *boundary = xmalloc(sizeof(struct strbuf));
+ strbuf_init(boundary, line->len);
- if (strcasestr(line, "text/") == NULL)
+ if (!strcasestr(line->buf, "text/"))
message_type = TYPE_OTHER;
- if (slurp_attr(line, "boundary=", boundary + 2)) {
- memcpy(boundary, "--", 2);
+ if (slurp_attr(line->buf, "boundary=", boundary)) {
+ strbuf_insert(boundary, 0, "--", 2);
if (content_top++ >= &content[MAX_BOUNDARIES]) {
fprintf(stderr, "Too many boundaries to handle\n");
exit(1);
}
- content_top->boundary_len = strlen(boundary);
- content_top->boundary = xmalloc(content_top->boundary_len+1);
- strcpy(content_top->boundary, boundary);
- }
- if (slurp_attr(line, "charset=", charset)) {
- int i, c;
- for (i = 0; (c = charset[i]) != 0; i++)
- charset[i] = tolower(c);
+ content_top->boundary = boundary;
+ } else {
+ strbuf_release(boundary);
+ free(boundary);
}
+ if (slurp_attr(line->buf, "charset=", &charset))
+ strbuf_tolower(&charset);
return 0;
}
-static int handle_content_transfer_encoding(char *line)
+static int handle_content_transfer_encoding(struct strbuf *line)
{
- if (strcasestr(line, "base64"))
+ if (strcasestr(line->buf, "base64"))
transfer_encoding = TE_BASE64;
- else if (strcasestr(line, "quoted-printable"))
+ else if (strcasestr(line->buf, "quoted-printable"))
transfer_encoding = TE_QP;
else
transfer_encoding = TE_DONTCARE;
return 0;
}
-static int is_multipart_boundary(const char *line)
-{
- return (!memcmp(line, content_top->boundary, content_top->boundary_len));
-}
-
-static int eatspace(char *line)
+static int is_multipart_boundary(struct strbuf *line)
{
- int len = strlen(line);
- while (len > 0 && isspace(line[len-1]))
- line[--len] = 0;
- return len;
+ return !strbuf_cmp(line, content_top->boundary);
}
-static char *cleanup_subject(char *subject)
+static void cleanup_subject(struct strbuf *subject)
{
- for (;;) {
- char *p;
- int len, remove;
- switch (*subject) {
+ char *pos;
+ size_t remove;
+ while (subject->len) {
+ switch (*subject->buf) {
case 'r': case 'R':
- if (!memcmp("e:", subject+1, 2)) {
- subject += 3;
+ if (subject->len <= 3)
+ break;
+ if (!memcmp(subject->buf + 1, "e:", 2)) {
+ strbuf_remove(subject, 0, 3);
continue;
}
break;
case ' ': case '\t': case ':':
- subject++;
+ strbuf_remove(subject, 0, 1);
continue;
-
+ break;
case '[':
- p = strchr(subject, ']');
- if (!p) {
- subject++;
- continue;
- }
- len = strlen(p);
- remove = p - subject;
- if (remove <= len *2) {
- subject = p+1;
- continue;
- }
+ if ((pos = strchr(subject->buf, ']'))) {
+ remove = pos - subject->buf + 1;
+ /* Don't remove too much. */
+ if (remove <= (subject->len - remove + 1) * 2) {
+ strbuf_remove(subject, 0, remove);
+ continue;
+ }
+ } else
+ strbuf_remove(subject, 0, 1);
break;
}
- eatspace(subject);
- return subject;
+ strbuf_trim(subject);
+ return;
}
}
-static void cleanup_space(char *buf)
+static void cleanup_space(struct strbuf *sb)
{
- unsigned char c;
- while ((c = *buf) != 0) {
- buf++;
- if (isspace(c)) {
- buf[-1] = ' ';
- c = *buf;
- while (isspace(c)) {
- int len = strlen(buf);
- memmove(buf, buf+1, len);
- c = *buf;
- }
+ size_t pos, cnt;
+ for (pos = 0; pos < sb->len; pos++) {
+ if (isspace(sb->buf[pos])) {
+ sb->buf[pos] = ' ';
+ for (cnt = 0; isspace(sb->buf[pos + cnt + 1]); cnt++);
+ strbuf_remove(sb, pos + 1, cnt);
}
}
}
-static void decode_header(char *it, unsigned itsize);
+static void decode_header(struct strbuf *line);
static const char *header[MAX_HDR_PARSED] = {
"From","Subject","Date",
};
-static int check_header(char *line, unsigned linesize, char **hdr_data, int overwrite)
+static int cmp_header(const struct strbuf *line, const char *hdr)
{
- int i;
+ int len = strlen(hdr);
+ return !strncasecmp(line->buf, hdr, len) && line->len > len &&
+ line->buf[len] == ':' && isspace(line->buf[len + 1]);
+}
+static int check_header(const struct strbuf *line, struct strbuf *hdr_data[], int overwrite)
+{
+ int i, ret = 0, len;
+ struct strbuf sb = STRBUF_INIT;
/* search for the interesting parts */
for (i = 0; header[i]; i++) {
int len = strlen(header[i]);
- if ((!hdr_data[i] || overwrite) &&
- !strncasecmp(line, header[i], len) &&
- line[len] == ':' && isspace(line[len + 1])) {
+ if ((!hdr_data[i] || overwrite) && cmp_header(line, header[i])) {
/* Unwrap inline B and Q encoding, and optionally
* normalize the meta information to utf8.
*/
- decode_header(line + len + 2, linesize - len - 2);
- hdr_data[i] = xmalloc(1000 * sizeof(char));
- if (! handle_header(line, hdr_data[i], len + 2)) {
- return 1;
- }
+ strbuf_add(&sb, line->buf + len + 2, line->len - len -2);
+ decode_header(&sb);
+ handle_header(&hdr_data[i], &sb);
+ ret = 1;
+ goto check_header_out;
}
}
/* Content stuff */
- if (!strncasecmp(line, "Content-Type", 12) &&
- line[12] == ':' && isspace(line[12 + 1])) {
- decode_header(line + 12 + 2, linesize - 12 - 2);
- if (! handle_content_type(line)) {
- return 1;
- }
- }
- if (!strncasecmp(line, "Content-Transfer-Encoding", 25) &&
- line[25] == ':' && isspace(line[25 + 1])) {
- decode_header(line + 25 + 2, linesize - 25 - 2);
- if (! handle_content_transfer_encoding(line)) {
- return 1;
- }
+ if (cmp_header(line, "Content-Type")) {
+ len = strlen("Content-Type: ");
+ strbuf_reset(&sb);
+ strbuf_add(&sb, line->buf + len, line->len - len);
+ decode_header(&sb);
+ strbuf_insert(&sb, 0, "Content-Type: ", len);
+ if (!handle_content_type(&sb))
+ ret = 1;
+ goto check_header_out;
+ }
+ if (cmp_header(line, "Content-Transfer-Encoding")) {
+ len = strlen("Content-Transfer-Encoding: ");
+ strbuf_reset(&sb);
+ strbuf_add(&sb, line->buf + len, line->len - len);
+ decode_header(&sb);
+ if (!handle_content_transfer_encoding(&sb))
+ ret = 1;
+ goto check_header_out;
}
/* for inbody stuff */
- if (!memcmp(">From", line, 5) && isspace(line[5]))
- return 1;
- if (!memcmp("[PATCH]", line, 7) && isspace(line[7])) {
+ if (!prefixcmp(line->buf, ">From") && isspace(line->buf[5]))
+ ret = 1;
+ goto check_header_out;
+ if (!prefixcmp(line->buf, "[PATCH]") && isspace(line->buf[7])) {
for (i = 0; header[i]; i++) {
if (!memcmp("Subject", header[i], 7)) {
- if (! handle_header(line, hdr_data[i], 0)) {
- return 1;
- }
+ handle_header(&hdr_data[i], line);
+ ret = 1;
+ goto check_header_out;
}
}
}
- /* no match */
- return 0;
+check_header_out:
+ strbuf_release(&sb);
+ return ret;
}
-static int is_rfc2822_header(char *line)
+static int is_rfc2822_header(const struct strbuf *line)
{
/*
* The section that defines the loosest possible
@@ -357,15 +333,15 @@ static int is_rfc2822_header(char *line)
* ftext = %d33-57 / %59-126
*/
int ch;
- char *cp = line;
+ char *cp = line->buf;
/* Count mbox From headers as headers */
- if (!memcmp(line, "From ", 5) || !memcmp(line, ">From ", 6))
+ if (line->len >= 6 && (!memcmp(cp, "From ", 5) || !memcmp(cp, ">From ", 6)))
return 1;
while ((ch = *cp++)) {
if (ch == ':')
- return cp != line;
+ return 1;
if ((33 <= ch && ch <= 57) ||
(59 <= ch && ch <= 126))
continue;
@@ -374,34 +350,20 @@ static int is_rfc2822_header(char *line)
return 0;
}
-/*
- * sz is size of 'line' buffer in bytes. Must be reasonably
- * long enough to hold one physical real-world e-mail line.
- */
-static int read_one_header_line(char *line, int sz, FILE *in)
+static int read_one_header_line(struct strbuf *line, FILE *in)
{
- int len;
-
- /*
- * We will read at most (sz-1) bytes and then potentially
- * re-add NUL after it. Accessing line[sz] after this is safe
- * and we can allow len to grow up to and including sz.
- */
- sz--;
-
/* Get the first part of the line. */
- if (!fgets(line, sz, in))
+ if (strbuf_getline(line, in, '\n'))
return 0;
/*
* Is it an empty line or not a valid rfc2822 header?
* If so, stop here, and return false ("not a header")
*/
- len = eatspace(line);
- if (!len || !is_rfc2822_header(line)) {
+ strbuf_rtrim(line);
+ if (!line->len || !is_rfc2822_header(line)) {
/* Re-add the newline */
- line[len] = '\n';
- line[len + 1] = '\0';
+ strbuf_addch(line, '\n');
return 0;
}
@@ -410,65 +372,53 @@ static int read_one_header_line(char *line, int sz, FILE *in)
* Yuck, 2822 header "folding"
*/
for (;;) {
- int peek, addlen;
- static char continuation[1000];
+ int peek;
+ struct strbuf continuation = STRBUF_INIT;
peek = fgetc(in); ungetc(peek, in);
if (peek != ' ' && peek != '\t')
break;
- if (!fgets(continuation, sizeof(continuation), in))
+ if (strbuf_getline(&continuation, in, '\n'))
break;
- addlen = eatspace(continuation);
- if (len < sz - 1) {
- if (addlen >= sz - len)
- addlen = sz - len - 1;
- memcpy(line + len, continuation, addlen);
- line[len] = '\n';
- len += addlen;
- }
+ continuation.buf[0] = '\n';
+ strbuf_rtrim(&continuation);
+ strbuf_addbuf(line, &continuation);
}
- line[len] = 0;
return 1;
}
-static int decode_q_segment(char *in, char *ot, unsigned otsize, char *ep, int rfc2047)
+static struct strbuf *decode_q_segment(const struct strbuf *q_seg, int rfc2047)
{
- char *otbegin = ot;
- char *otend = ot + otsize;
+ const char *in = q_seg->buf;
int c;
- while ((c = *in++) != 0 && (in <= ep)) {
- if (ot == otend) {
- *--ot = '\0';
- return -1;
- }
+ struct strbuf *out = xmalloc(sizeof(struct strbuf));
+ strbuf_init(out, q_seg->len);
+
+ while ((c = *in++) != 0) {
if (c == '=') {
int d = *in++;
if (d == '\n' || !d)
break; /* drop trailing newline */
- *ot++ = ((hexval(d) << 4) | hexval(*in++));
+ strbuf_addch(out, (hexval(d) << 4) | hexval(*in++));
continue;
}
if (rfc2047 && c == '_') /* rfc2047 4.2 (2) */
c = 0x20;
- *ot++ = c;
+ strbuf_addch(out, c);
}
- *ot = 0;
- return (ot - otbegin);
+ return out;
}
-static int decode_b_segment(char *in, char *ot, unsigned otsize, char *ep)
+static struct strbuf *decode_b_segment(const struct strbuf *b_seg)
{
/* Decode in..ep, possibly in-place to ot */
int c, pos = 0, acc = 0;
- char *otbegin = ot;
- char *otend = ot + otsize;
+ const char *in = b_seg->buf;
+ struct strbuf *out = xmalloc(sizeof(struct strbuf));
+ strbuf_init(out, b_seg->len);
- while ((c = *in++) != 0 && (in <= ep)) {
- if (ot == otend) {
- *--ot = '\0';
- return -1;
- }
+ while ((c = *in++) != 0) {
if (c == '+')
c = 62;
else if (c == '/')
@@ -493,21 +443,20 @@ static int decode_b_segment(char *in, char *ot, unsigned otsize, char *ep)
acc = (c << 2);
break;
case 1:
- *ot++ = (acc | (c >> 4));
+ strbuf_addch(out, (acc | (c >> 4)));
acc = (c & 15) << 4;
break;
case 2:
- *ot++ = (acc | (c >> 2));
+ strbuf_addch(out, (acc | (c >> 2)));
acc = (c & 3) << 6;
break;
case 3:
- *ot++ = (acc | c);
+ strbuf_addch(out, (acc | c));
acc = pos = 0;
break;
}
}
- *ot = 0;
- return (ot - otbegin);
+ return out;
}
/*
@@ -521,16 +470,16 @@ static int decode_b_segment(char *in, char *ot, unsigned otsize, char *ep)
* Otherwise, we default to assuming it is Latin1 for historical
* reasons.
*/
-static const char *guess_charset(const char *line, const char *target_charset)
+static const char *guess_charset(const struct strbuf *line, const char *target_charset)
{
if (is_encoding_utf8(target_charset)) {
- if (is_utf8(line))
+ if (is_utf8(line->buf))
return NULL;
}
return "latin1";
}
-static void convert_to_utf8(char *line, unsigned linesize, const char *charset)
+static void convert_to_utf8(struct strbuf *line, const char *charset)
{
char *out;
@@ -542,112 +491,119 @@ static void convert_to_utf8(char *line, unsigned linesize, const char *charset)
if (!strcmp(metainfo_charset, charset))
return;
- out = reencode_string(line, metainfo_charset, charset);
+ out = reencode_string(line->buf, metainfo_charset, charset);
if (!out)
die("cannot convert from %s to %s\n",
charset, metainfo_charset);
- strlcpy(line, out, linesize);
- free(out);
+ strbuf_attach(line, out, strlen(out), strlen(out));
}
-static int decode_header_bq(char *it, unsigned itsize)
+static int decode_header_bq(struct strbuf *it)
{
char *in, *out, *ep, *cp, *sp;
- char outbuf[1000];
+ struct strbuf outbuf = STRBUF_INIT, *dec;
+ struct strbuf charset_q = STRBUF_INIT, piecebuf = STRBUF_INIT;
int rfc2047 = 0;
- in = it;
- out = outbuf;
- while ((ep = strstr(in, "=?")) != NULL) {
- int sz, encoding;
- char charset_q[256], piecebuf[256];
+ in = it->buf;
+ while (in - it->buf <= it->len && (ep = strstr(in, "=?")) != NULL) {
+ int encoding;
+ strbuf_reset(&charset_q);
+ strbuf_reset(&piecebuf);
rfc2047 = 1;
if (in != ep) {
- sz = ep - in;
- memcpy(out, in, sz);
- out += sz;
- in += sz;
+ strbuf_add(&outbuf, in, ep - in);
+ in = ep;
}
/* E.g.
* ep : "=?iso-2022-jp?B?GyR...?= foo"
* ep : "=?ISO-8859-1?Q?Foo=FCbar?= baz"
*/
ep += 2;
- cp = strchr(ep, '?');
- if (!cp)
- return rfc2047; /* no munging */
- for (sp = ep; sp < cp; sp++)
- charset_q[sp - ep] = tolower(*sp);
- charset_q[cp - ep] = 0;
+
+ if (ep - it->buf >= it->len || !(cp = strchr(ep, '?')))
+ goto decode_header_bq_out;
+
+ if (cp + 3 - it->buf > it->len)
+ goto decode_header_bq_out;
+ strbuf_add(&charset_q, ep, cp - ep);
+ strbuf_tolower(&charset_q);
+
encoding = cp[1];
if (!encoding || cp[2] != '?')
- return rfc2047; /* no munging */
+ goto decode_header_bq_out;
ep = strstr(cp + 3, "?=");
if (!ep)
- return rfc2047; /* no munging */
+ goto decode_header_bq_out;
+ strbuf_add(&piecebuf, cp + 3, ep - cp - 3);
switch (tolower(encoding)) {
default:
- return rfc2047; /* no munging */
+ goto decode_header_bq_out;
case 'b':
- sz = decode_b_segment(cp + 3, piecebuf, sizeof(piecebuf), ep);
+ dec = decode_b_segment(&piecebuf);
break;
case 'q':
- sz = decode_q_segment(cp + 3, piecebuf, sizeof(piecebuf), ep, 1);
+ dec = decode_q_segment(&piecebuf, 1);
break;
}
- if (sz < 0)
- return rfc2047;
if (metainfo_charset)
- convert_to_utf8(piecebuf, sizeof(piecebuf), charset_q);
+ convert_to_utf8(dec, charset_q.buf);
- sz = strlen(piecebuf);
- if (outbuf + sizeof(outbuf) <= out + sz)
- return rfc2047; /* no munging */
- strcpy(out, piecebuf);
- out += sz;
+ strbuf_addbuf(&outbuf, dec);
+ strbuf_release(dec);
+ free(dec);
in = ep + 2;
}
- strcpy(out, in);
- strlcpy(it, outbuf, itsize);
+ strbuf_addstr(&outbuf, in);
+ strbuf_reset(it);
+ strbuf_addbuf(it, &outbuf);
+decode_header_bq_out:
+ strbuf_release(&outbuf);
+ strbuf_release(&charset_q);
+ strbuf_release(&piecebuf);
return rfc2047;
}
-static void decode_header(char *it, unsigned itsize)
+static void decode_header(struct strbuf *it)
{
-
- if (decode_header_bq(it, itsize))
+ if (decode_header_bq(it))
return;
/* otherwise "it" is a straight copy of the input.
* This can be binary guck but there is no charset specified.
*/
if (metainfo_charset)
- convert_to_utf8(it, itsize, "");
+ convert_to_utf8(it, "");
}
-static int decode_transfer_encoding(char *line, unsigned linesize, int inputlen)
+static void decode_transfer_encoding(struct strbuf *line)
{
- char *ep;
+ struct strbuf *ret;
+ int len;
switch (transfer_encoding) {
case TE_QP:
- ep = line + inputlen;
- return decode_q_segment(line, line, linesize, ep, 0);
+ ret = decode_q_segment(line, 0);
+ break;
case TE_BASE64:
- ep = line + inputlen;
- return decode_b_segment(line, line, linesize, ep);
+ ret = decode_b_segment(line);
+ break;
case TE_DONTCARE:
default:
- return inputlen;
+ return;
}
+ strbuf_reset(line);
+ strbuf_addbuf(line, ret);
+ strbuf_release(ret);
+ free(ret);
}
-static int handle_filter(char *line, unsigned linesize, int linelen);
+static int handle_filter(struct strbuf *line);
static int find_boundary(void)
{
- while(fgets(line, sizeof(line), fin) != NULL) {
- if (is_multipart_boundary(line))
+ while(!strbuf_getline(&line, fin, '\n')) {
+ if (is_multipart_boundary(&line))
return 1;
}
return 0;
@@ -655,11 +611,15 @@ static int find_boundary(void)
static int handle_boundary(void)
{
- char newline[]="\n";
+ struct strbuf newline = STRBUF_INIT;
+
+ strbuf_addch(&newline, '\n');
again:
- if (!memcmp(line+content_top->boundary_len, "--", 2)) {
+ if (line.len >= content_top->boundary->len + 2 &&
+ !memcmp(line.buf + content_top->boundary->len, "--", 2)) {
/* we hit an end boundary */
/* pop the current boundary off the stack */
+ strbuf_release(content_top->boundary);
free(content_top->boundary);
/* technically won't happen as is_multipart_boundary()
@@ -670,7 +630,8 @@ again:
"can't recover\n");
exit(1);
}
- handle_filter(newline, sizeof(newline), strlen(newline));
+ handle_filter(&newline);
+ strbuf_release(&newline);
/* skip to the next boundary */
if (!find_boundary())
@@ -680,39 +641,44 @@ again:
/* set some defaults */
transfer_encoding = TE_DONTCARE;
- charset[0] = 0;
+ strbuf_reset(&charset);
message_type = TYPE_TEXT;
/* slurp in this section's info */
- while (read_one_header_line(line, sizeof(line), fin))
- check_header(line, sizeof(line), p_hdr_data, 0);
+ while (read_one_header_line(&line, fin))
+ check_header(&line, p_hdr_data, 0);
+ strbuf_release(&newline);
/* eat the blank line after section info */
- return (fgets(line, sizeof(line), fin) != NULL);
+ return (strbuf_getline(&line, fin, '\n') == 0);
}
-static inline int patchbreak(const char *line)
+static inline int patchbreak(const struct strbuf *line)
{
+ size_t i;
+
/* Beginning of a "diff -" header? */
- if (!memcmp("diff -", line, 6))
+ if (!prefixcmp(line->buf, "diff -"))
return 1;
/* CVS "Index: " line? */
- if (!memcmp("Index: ", line, 7))
+ if (!prefixcmp(line->buf, "Index: "))
return 1;
/*
* "--- <filename>" starts patches without headers
* "---<sp>*" is a manual separator
*/
- if (!memcmp("---", line, 3)) {
- line += 3;
+ if (line->len < 4)
+ return 0;
+
+ if (!prefixcmp(line->buf, "---")) {
/* space followed by a filename? */
- if (line[0] == ' ' && !isspace(line[1]))
+ if (line->buf[3] == ' ' && !isspace(line->buf[4]))
return 1;
/* Just whitespace? */
- for (;;) {
- unsigned char c = *line++;
+ for (i = 3; i < line->len; i++) {
+ unsigned char c = line->buf[i];
if (c == '\n')
return 1;
if (!isspace(c))
@@ -723,32 +689,25 @@ static inline int patchbreak(const char *line)
return 0;
}
-
-static int handle_commit_msg(char *line, unsigned linesize)
+static int handle_commit_msg(struct strbuf *line)
{
static int still_looking = 1;
- char *endline = line + linesize;
+ char *c;
if (!cmitmsg)
return 0;
if (still_looking) {
- char *cp = line;
- if (isspace(*line)) {
- for (cp = line + 1; *cp; cp++) {
- if (!isspace(*cp))
- break;
- }
- if (!*cp)
- return 0;
- }
- if ((still_looking = check_header(cp, endline - cp, s_hdr_data, 0)) != 0)
+ strbuf_ltrim(line);
+ if (!line->len)
+ return 0;
+ if ((still_looking = check_header(line, s_hdr_data, 0)) != 0)
return 0;
}
/* normalize the log message to UTF-8. */
if (metainfo_charset)
- convert_to_utf8(line, endline - line, charset);
+ convert_to_utf8(line, charset.buf);
if (patchbreak(line)) {
fclose(cmitmsg);
@@ -756,18 +715,18 @@ static int handle_commit_msg(char *line, unsigned linesize)
return 1;
}
- fputs(line, cmitmsg);
+ fputs(line->buf, cmitmsg);
return 0;
}
-static int handle_patch(char *line, int len)
+static int handle_patch(const struct strbuf *line)
{
- fwrite(line, 1, len, patchfile);
+ fwrite(line->buf, 1, line->len, patchfile);
patch_lines++;
return 0;
}
-static int handle_filter(char *line, unsigned linesize, int linelen)
+static int handle_filter(struct strbuf *line)
{
static int filter = 0;
@@ -776,11 +735,11 @@ static int handle_filter(char *line, unsigned linesize, int linelen)
*/
switch (filter) {
case 0:
- if (!handle_commit_msg(line, linesize))
+ if (!handle_commit_msg(line))
break;
filter++;
case 1:
- if (!handle_patch(line, linelen))
+ if (!handle_patch(line))
break;
filter++;
default:
@@ -793,101 +752,105 @@ static int handle_filter(char *line, unsigned linesize, int linelen)
static void handle_body(void)
{
int rc = 0;
- static char newline[2000];
- static char *np = newline;
- int len = strlen(line);
+ int len = 0;
+ struct strbuf prev = STRBUF_INIT;
/* Skip up to the first boundary */
if (content_top->boundary) {
if (!find_boundary())
- return;
+ goto handle_body_out;
}
do {
+ strbuf_setlen(&line, line.len + len);
+
/* process any boundary lines */
- if (content_top->boundary && is_multipart_boundary(line)) {
+ if (content_top->boundary && is_multipart_boundary(&line)) {
/* flush any leftover */
- if (np != newline)
- handle_filter(newline, sizeof(newline),
- np - newline);
+ if (line.len)
+ handle_filter(&line);
+
if (!handle_boundary())
- return;
- len = strlen(line);
+ goto handle_body_out;
}
/* Unwrap transfer encoding */
- len = decode_transfer_encoding(line, sizeof(line), len);
- if (len < 0) {
- error("Malformed input line");
- return;
- }
+ decode_transfer_encoding(&line);
switch (transfer_encoding) {
case TE_BASE64:
case TE_QP:
{
- char *op = line;
+ struct strbuf **lines, **it, *sb;
+
+ /* Prepend any previous partial lines */
+ strbuf_insert(&line, 0, prev.buf, prev.len);
+ strbuf_reset(&prev);
/* binary data most likely doesn't have newlines */
if (message_type != TYPE_TEXT) {
- rc = handle_filter(line, sizeof(line), len);
+ rc = handle_filter(&line);
break;
}
-
/*
* This is a decoded line that may contain
* multiple new lines. Pass only one chunk
* at a time to handle_filter()
*/
- do {
- while (op < line + len && *op != '\n')
- *np++ = *op++;
- *np = *op;
- if (*np != 0) {
- /* should be sitting on a new line */
- *(++np) = 0;
- op++;
- rc = handle_filter(newline, sizeof(newline), np - newline);
- np = newline;
- }
- } while (op < line + len);
+ lines = strbuf_split(&line, '\n');
+ strbuf_reset(&line);
+ for (it = lines; (sb = *it); it++) {
+ if (*(it + 1) == NULL) /* The last token */
+ if (sb->buf[sb->len - 1] != '\n') {
+ /* Partial line, save it for later. */
+ strbuf_addbuf(&prev, sb);
+ break;
+ }
+ rc = handle_filter(sb);
+ }
/*
- * The partial chunk is saved in newline and will be
+ * The partial chunk is saved in "prev" and will be
* appended by the next iteration of read_line_with_nul().
*/
+ strbuf_list_free(lines);
break;
}
default:
- rc = handle_filter(line, sizeof(line), len);
+ rc = handle_filter(&line);
+ strbuf_reset(&line);
}
if (rc)
/* nothing left to filter */
break;
- } while ((len = read_line_with_nul(line, sizeof(line), fin)));
+ if (strbuf_avail(&line) < 100)
+ strbuf_grow(&line, 100);
+ } while ((len = read_line_with_nul(line.buf, strbuf_avail(&line), fin)));
+handle_body_out:
+ strbuf_release(&prev);
return;
}
-static void output_header_lines(FILE *fout, const char *hdr, char *data)
+static void output_header_lines(FILE *fout, const char *hdr, const struct strbuf *data)
{
+ char *sp = data->buf;
while (1) {
- char *ep = strchr(data, '\n');
+ char *ep = strchr(sp, '\n');
int len;
if (!ep)
- len = strlen(data);
+ len = strlen(sp);
else
- len = ep - data;
- fprintf(fout, "%s: %.*s\n", hdr, len, data);
+ len = ep - sp;
+ fprintf(fout, "%s: %.*s\n", hdr, len, sp);
if (!ep)
break;
- data = ep + 1;
+ sp = ep + 1;
}
}
static void handle_info(void)
{
- char *sub;
- char *hdr;
+ struct strbuf *hdr;
int i;
for (i = 0; header[i]; i++) {
@@ -901,20 +864,18 @@ static void handle_info(void)
continue;
if (!memcmp(header[i], "Subject", 7)) {
- if (keep_subject)
- sub = hdr;
- else {
- sub = cleanup_subject(hdr);
- cleanup_space(sub);
+ if (!keep_subject) {
+ cleanup_subject(hdr);
+ cleanup_space(hdr);
}
- output_header_lines(fout, "Subject", sub);
+ output_header_lines(fout, "Subject", hdr);
} else if (!memcmp(header[i], "From", 4)) {
handle_from(hdr);
- fprintf(fout, "Author: %s\n", name);
- fprintf(fout, "Email: %s\n", email);
+ fprintf(fout, "Author: %s\n", name.buf);
+ fprintf(fout, "Email: %s\n", email.buf);
} else {
cleanup_space(hdr);
- fprintf(fout, "%s: %s\n", header[i], hdr);
+ fprintf(fout, "%s: %s\n", header[i], hdr->buf);
}
}
fprintf(fout, "\n");
@@ -941,8 +902,8 @@ static int mailinfo(FILE *in, FILE *out, int ks, const char *encoding,
return -1;
}
- p_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(char *));
- s_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(char *));
+ p_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*p_hdr_data));
+ s_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*s_hdr_data));
do {
peek = fgetc(in);
@@ -950,8 +911,8 @@ static int mailinfo(FILE *in, FILE *out, int ks, const char *encoding,
ungetc(peek, in);
/* process the email header */
- while (read_one_header_line(line, sizeof(line), fin))
- check_header(line, sizeof(line), p_hdr_data, 1);
+ while (read_one_header_line(&line, fin))
+ check_header(&line, p_hdr_data, 1);
handle_body();
handle_info();
--
1.5.4.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re:! [PATCH/RFC] git-mailinfo: use strbuf's instead of fixed buffers
2008-07-10 23:43 ` [PATCH/RFC] git-mailinfo: use strbuf's instead of fixed buffers Lukas Sandström
@ 2008-07-12 6:10 ` Junio C Hamano
2008-07-13 18:17 ` ! " Lukas Sandström
0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2008-07-12 6:10 UTC (permalink / raw)
To: Lukas Sandström; +Cc: Git Mailing List
Lukas Sandström <lukass@etek.chalmers.se> writes:
> -static char *sanity_check(char *name, char *email)
> +static void sanity_check(struct strbuf *out, struct strbuf *name, struct strbuf *email)
> {
> - int len = strlen(name);
> - if (len < 3 || len > 60)
> - return email;
> - if (strchr(name, '@') || strchr(name, '<') || strchr(name, '>'))
> - return email;
> - return name;
> + struct strbuf o = STRBUF_INIT;
> + if (name->len < 3 || name->len > 60)
> + strbuf_addbuf(&o, email);
> + if (strchr(name->buf, '@') || strchr(name->buf, '<') ||
> + strchr(name->buf, '>'))
> + strbuf_addbuf(&o, email);
> + strbuf_addbuf(&o, name);
> + strbuf_reset(out);
> + strbuf_addbuf(out, &o);
> + strbuf_release(&o);
This does not look like a correct conversion. When name is too short or
too long, we do not even look at name and return email straight. Perhaps
this would be more faithful conversion:
struct strbuf *src = name;
if (name->len < 3 ||
60 < name->len ||
strchr(name->buf, '@') ||
strchr(name->buf, '<') ||
strchr(name->buf, '>'))
src = email;
else if (name == out)
return;
strbuf_reset(out);
strbuf_addbuf(out, src);
It is not your fault, but sanity_check() is a grave misnomer for this
function. This does "get_sane_name" (i.e. we have name and email but if
name does not look right, use email instead).
> -static int bogus_from(char *line)
> +static int bogus_from(const struct strbuf *line)
> {
> /* John Doe <johndoe> */
> - char *bra, *ket, *dst, *cp;
>
> + char *bra, *ket;
> /* This is fallback, so do not bother if we already have an
> * e-mail address.
> */
> - if (*email)
> + if (email.len)
> return 0;
>
> - bra = strchr(line, '<');
> + bra = strchr(line->buf, '<');
> if (!bra)
> return 0;
> ket = strchr(bra, '>');
> if (!ket)
> return 0;
>
> - for (dst = email, cp = bra+1; cp < ket; )
> - *dst++ = *cp++;
> - *dst = 0;
> - for (cp = line; isspace(*cp); cp++)
> - ;
> - for (bra--; isspace(*bra); bra--)
> - *bra = 0;
> - cp = sanity_check(cp, email);
> - strcpy(name, cp);
> + strbuf_reset(&email);
> + strbuf_add(&email, bra + 1, ket - bra - 1);
> +
> + strbuf_reset(&name);
> + strbuf_add(&name, line->buf, bra - line->buf);
> + strbuf_trim(&name);
> + sanity_check(&name, &name, &email);
> return 1;
> }
Conversion looks correct but its return value does not make much sense
(again, not your fault). bogus_from() is given a bogus looking from line
(it is not about checking if it is bogus), and returns 0 if we already
have e-mail address, if the from line does not have bra-ket for grabbing
e-mail address for, but returns 1 if we managed to get name and email
pairs. The inconsistency does not matter only because its sole caller
handle_from() returns its return value, and its caller discards it. We
may be better off declaring this function and handle_from() as void.
> -static int handle_from(char *in_line)
> +static int handle_from(struct strbuf *from)
> ...
> + el = strcspn(at, " \n\t\r\v\f>");
> + strbuf_reset(&email);
> + strbuf_add(&email, at, el);
> + strbuf_remove(from, at - from->buf, el + 1);
> /* The remainder is name. It could be "John Doe <john.doe@xz>"
> * or "john.doe@xz (John Doe)", but we have whited out the
> * email part, so trim from both ends, possibly removing
> * the () pair at the end.
> */
Now, it should read "but we have removed the email part", I think.
> + strbuf_trim(from);
> + if (*from->buf == '(')
> + strbuf_remove(&name, 0, 1);
> + if (*(from->buf + from->len - 1) == ')')
Can from be empty at this point before this check?
> + strbuf_setlen(from, from->len - 1);
> +
> + sanity_check(&name, from, &email);
> return 1;
> }
We used to copy the data from the argument (in_line) before munging it in
this function, but now we are modifying it in place (from). Does this
upset our caller, or the original code was just doing an extra unnecessary
copy?
> -static int handle_header(char *line, char *data, int ofs)
> +static void handle_header(struct strbuf **out, const struct strbuf *line)
> {
> - if (!line || !data)
> - return 1;
> -
> - strcpy(data, line+ofs);
> + if (!*out) {
> + *out = xmalloc(sizeof(struct strbuf));
> + strbuf_init(*out, line->len);
> + } else
> + strbuf_reset(*out);
>
> - return 0;
> + strbuf_addbuf(*out, (struct strbuf *)line); /* const warning */
> }
I think its second parameter can safely become "const struct strbuf *";
perhaps we should fix the definition of strbuf_addbuf() in your first
patch?
> @@ -173,180 +153,176 @@ static int slurp_attr(const char *line, const char *name, char *attr)
> else
> ends = "; \t";
> sz = strcspn(ap, ends);
> - memcpy(attr, ap, sz);
> - attr[sz] = 0;
> + strbuf_add(attr, ap, sz);
> return 1;
> }
>
> struct content_type {
> - char *boundary;
> - int boundary_len;
> + struct strbuf *boundary;
> };
>
> static struct content_type content[MAX_BOUNDARIES];
Wouldn't it make more sense to get rid of "struct content_type" altogether
and use "struct strbuf *content[MAX_BOUNDARIES]" directly?
I'll review from handle_content_type() til the rest of the file
separately, as my concentration is wearing out..
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] git-mailinfo: Fix getting the subject from the body
2008-07-10 21:41 [PATCH] git-mailinfo: Fix getting the subject from the body Lukas Sandström
[not found] ` <7vod55o0tx.fsf@gitster.siamese.dyndns.org>
@ 2008-07-12 9:36 ` Junio C Hamano
2008-07-12 21:45 ` Lukas Sandström
2008-07-15 3:13 ` Don Zickus
1 sibling, 2 replies; 14+ messages in thread
From: Junio C Hamano @ 2008-07-12 9:36 UTC (permalink / raw)
To: Lukas Sandström; +Cc: Git Mailing List, Don Zickus
Lukas Sandström <lukass@etek.chalmers.se> writes:
> "Subject: " isn't in the static array "header", and thus
> memcmp("Subject: ", header[i], 7) will never match.
>
> Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
> ---
>
> This has been broken since 2007-03-12, with commit
> 87ab799234639c26ea10de74782fa511cb3ca606
> so it might not be very important.
>
> builtin-mailinfo.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
> index 962aa34..2d1520f 100644
> --- a/builtin-mailinfo.c
> +++ b/builtin-mailinfo.c
> @@ -334,7 +334,7 @@ static int check_header(char *line, unsigned linesize, char **hdr_data, int over
> return 1;
> if (!memcmp("[PATCH]", line, 7) && isspace(line[7])) {
> for (i = 0; header[i]; i++) {
> - if (!memcmp("Subject: ", header[i], 9)) {
> + if (!memcmp("Subject", header[i], 7)) {
> if (! handle_header(line, hdr_data[i], 0)) {
> return 1;
> }
Actually, I do not think your patch alone makes any difference, and the
original code looks somewhat bogus. If there is no "Subject: " in the
same section of the message (either in e-mail header in which case
hdr_data == p_hdr_data[], or in the message body part in which case
hdr_data == s_hdr_data[]), hdr_data[1] will be NULL, because the only
place that allocates the storage for the data is the first loop of this
function that deals with real-RFC2822-header-looking lines.
You'd probably need something like this on top of your patch to actually
activate the code.
Another thing I noticed and found puzzling is the handling of ">From "
line that is shown in the context below. check_header() is supposed to
return true when it handled header (i.e. not part of the commit message)
and return false when line is not part of the header. As ">From " is part
of the commit log message, shouldn't it return zero?
Don, this part was what you introduced. Has this codepath ever been
exercised in the real life?
builtin-mailinfo.c | 2 ++
t/t5100-mailinfo.sh | 2 +-
t/t5100/sample.mbox | 35 +++++++++++++++++++++++++++++++++++
3 files changed, 38 insertions(+), 1 deletions(-)
diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index 2d1520f..13f0502 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -332,12 +332,14 @@ static int check_header(char *line, unsigned linesize, char **hdr_data, int over
/* for inbody stuff */
if (!memcmp(">From", line, 5) && isspace(line[5]))
return 1;
if (!memcmp("[PATCH]", line, 7) && isspace(line[7])) {
for (i = 0; header[i]; i++) {
if (!memcmp("Subject", header[i], 7)) {
+ if (!hdr_data[i])
+ hdr_data[i] = xmalloc(linesize + 20);
if (! handle_header(line, hdr_data[i], 0)) {
return 1;
}
}
}
}
diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index 2d1520f..13f0502 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -335,6 +335,8 @@ static int check_header(char *line, unsigned linesize, char **hdr_data, int over
if (!memcmp("[PATCH]", line, 7) && isspace(line[7])) {
for (i = 0; header[i]; i++) {
if (!memcmp("Subject", header[i], 7)) {
+ if (!hdr_data[i])
+ hdr_data[i] = xmalloc(linesize + 20);
if (! handle_header(line, hdr_data[i], 0)) {
return 1;
}
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 577ecc2..e9f3e72 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -11,7 +11,7 @@ test_expect_success 'split sample box' \
'git mailsplit -o. ../t5100/sample.mbox >last &&
last=`cat last` &&
echo total is $last &&
- test `cat last` = 9'
+ test `cat last` = 10'
for mail in `echo 00*`
do
diff --git a/t/t5100/sample.mbox b/t/t5100/sample.mbox
index 0476b96..aba57f9 100644
--- a/t/t5100/sample.mbox
+++ b/t/t5100/sample.mbox
@@ -430,3 +430,38 @@ index b426a14..97756ec 100644
=20
=20
2. When the environment variable 'GIT_EXTERNAL_DIFF' is set, the
+From b9704a518e21158433baa2cc2d591fea687967f6 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Lukas=20Sandstr=C3=B6m?= <lukass@etek.chalmers.se>
+Date: Thu, 10 Jul 2008 23:41:33 +0200
+Subject: Re: discussion that lead to this patch
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+[PATCH] git-mailinfo: Fix getting the subject from the body
+
+"Subject: " isn't in the static array "header", and thus
+memcmp("Subject: ", header[i], 7) will never match.
+
+Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
+Signed-off-by: Junio C Hamano <gitster@pobox.com>
+---
+ builtin-mailinfo.c | 2 +-
+ 1 files changed, 1 insertions(+), 1 deletions(-)
+
+diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
+index 962aa34..2d1520f 100644
+--- a/builtin-mailinfo.c
++++ b/builtin-mailinfo.c
+@@ -334,7 +334,7 @@ static int check_header(char *line, unsigned linesize, char **hdr_data, int over
+ return 1;
+ if (!memcmp("[PATCH]", line, 7) && isspace(line[7])) {
+ for (i = 0; header[i]; i++) {
+- if (!memcmp("Subject: ", header[i], 9)) {
++ if (!memcmp("Subject", header[i], 7)) {
+ if (! handle_header(line, hdr_data[i], 0)) {
+ return 1;
+ }
+--
+1.5.6.2.455.g1efb2
+
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] git-mailinfo: Fix getting the subject from the body
2008-07-12 9:36 ` [PATCH] git-mailinfo: Fix getting the subject from the body Junio C Hamano
@ 2008-07-12 21:45 ` Lukas Sandström
2008-07-15 3:13 ` Don Zickus
1 sibling, 0 replies; 14+ messages in thread
From: Lukas Sandström @ 2008-07-12 21:45 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List, Don Zickus
Junio C Hamano wrote:
> Lukas Sandström <lukass@etek.chalmers.se> writes:
>
>> "Subject: " isn't in the static array "header", and thus
>> memcmp("Subject: ", header[i], 7) will never match.
>>
>> Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
>> ---
>>
>> This has been broken since 2007-03-12, with commit
>> 87ab799234639c26ea10de74782fa511cb3ca606
>> so it might not be very important.
>>
>> builtin-mailinfo.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
>> index 962aa34..2d1520f 100644
>> --- a/builtin-mailinfo.c
>> +++ b/builtin-mailinfo.c
>> @@ -334,7 +334,7 @@ static int check_header(char *line, unsigned linesize, char **hdr_data, int over
>> return 1;
>> if (!memcmp("[PATCH]", line, 7) && isspace(line[7])) {
>> for (i = 0; header[i]; i++) {
>> - if (!memcmp("Subject: ", header[i], 9)) {
>> + if (!memcmp("Subject", header[i], 7)) {
>> if (! handle_header(line, hdr_data[i], 0)) {
>> return 1;
>> }
>
> Actually, I do not think your patch alone makes any difference, and the
> original code looks somewhat bogus. If there is no "Subject: " in the
> same section of the message (either in e-mail header in which case
> hdr_data == p_hdr_data[], or in the message body part in which case
> hdr_data == s_hdr_data[]), hdr_data[1] will be NULL, because the only
> place that allocates the storage for the data is the first loop of this
> function that deals with real-RFC2822-header-looking lines.
>
> You'd probably need something like this on top of your patch to actually
> activate the code.
Right, I noticed that too. It's fixed in the strbuf conversion, I think.
Lukas Sandström <lukass@etek.chalmers.se> wrote:
> After looking at this part some more, I see that there is no guarantee
> that hdr_data[i] != NULL in this codepath, and then we won't use the
> subject anyway.
I'll be hiking the next week, in case you wonder why I'm not responding.
I'll try to get another version of the patches out before I leave.
/Lukas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ! [PATCH/RFC] git-mailinfo: use strbuf's instead of fixed buffers
2008-07-12 6:10 ` Junio C Hamano
@ 2008-07-13 18:17 ` Lukas Sandström
2008-07-13 18:28 ` [PATCH] Make some strbuf_*() struct strbuf arguments const Lukas Sandström
0 siblings, 1 reply; 14+ messages in thread
From: Lukas Sandström @ 2008-07-13 18:17 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
Junio C Hamano wrote:
> Lukas Sandström <lukass@etek.chalmers.se> writes:
>
>> -static char *sanity_check(char *name, char *email)
>> +static void sanity_check(struct strbuf *out, struct strbuf *name, struct strbuf *email)
>> {
>> - int len = strlen(name);
>> - if (len < 3 || len > 60)
>> - return email;
>> - if (strchr(name, '@') || strchr(name, '<') || strchr(name, '>'))
>> - return email;
>> - return name;
>> + struct strbuf o = STRBUF_INIT;
>> + if (name->len < 3 || name->len > 60)
>> + strbuf_addbuf(&o, email);
>> + if (strchr(name->buf, '@') || strchr(name->buf, '<') ||
>> + strchr(name->buf, '>'))
>> + strbuf_addbuf(&o, email);
>> + strbuf_addbuf(&o, name);
>> + strbuf_reset(out);
>> + strbuf_addbuf(out, &o);
>> + strbuf_release(&o);
>
> This does not look like a correct conversion. When name is too short or
> too long, we do not even look at name and return email straight. Perhaps
> this would be more faithful conversion:
>
> struct strbuf *src = name;
> if (name->len < 3 ||
> 60 < name->len ||
> strchr(name->buf, '@') ||
> strchr(name->buf, '<') ||
> strchr(name->buf, '>'))
> src = email;
> else if (name == out)
> return;
> strbuf_reset(out);
> strbuf_addbuf(out, src);
>
> It is not your fault, but sanity_check() is a grave misnomer for this
> function. This does "get_sane_name" (i.e. we have name and email but if
> name does not look right, use email instead).
Right. I changed the name and used your implementation.
>> -static int bogus_from(char *line)
>> +static int bogus_from(const struct strbuf *line)
>> {
[ ... ]
>> return 1;
>> }
>
> Conversion looks correct but its return value does not make much sense
> (again, not your fault). bogus_from() is given a bogus looking from line
> (it is not about checking if it is bogus), and returns 0 if we already
> have e-mail address, if the from line does not have bra-ket for grabbing
> e-mail address for, but returns 1 if we managed to get name and email
> pairs. The inconsistency does not matter only because its sole caller
> handle_from() returns its return value, and its caller discards it. We
> may be better off declaring this function and handle_from() as void.
>
Done.
>> /* The remainder is name. It could be "John Doe <john.doe@xz>"
>> * or "john.doe@xz (John Doe)", but we have whited out the
>> * email part, so trim from both ends, possibly removing
>> * the () pair at the end.
>> */
>
> Now, it should read "but we have removed the email part", I think.
Done.
>> + strbuf_trim(from);
>> + if (*from->buf == '(')
>> + strbuf_remove(&name, 0, 1);
>> + if (*(from->buf + from->len - 1) == ')')
>
> Can from be empty at this point before this check?
>
Yes. Added a test for that.
>> + strbuf_setlen(from, from->len - 1);
>> +
>> + sanity_check(&name, from, &email);
>> return 1;
>> }
>
> We used to copy the data from the argument (in_line) before munging it in
> this function, but now we are modifying it in place (from). Does this
> upset our caller, or the original code was just doing an extra unnecessary
> copy?
No, it's ok with the current code, since when handle_from() is called, it's
the last time we look at the header[i] field. I changed it to copy its argument
anyway, for future-proofing.
>
>> -static int handle_header(char *line, char *data, int ofs)
>> +static void handle_header(struct strbuf **out, const struct strbuf *line)
>> {
>> - if (!line || !data)
>> - return 1;
>> -
>> - strcpy(data, line+ofs);
>> + if (!*out) {
>> + *out = xmalloc(sizeof(struct strbuf));
>> + strbuf_init(*out, line->len);
>> + } else
>> + strbuf_reset(*out);
>>
>> - return 0;
>> + strbuf_addbuf(*out, (struct strbuf *)line); /* const warning */
>> }
>
> I think its second parameter can safely become "const struct strbuf *";
> perhaps we should fix the definition of strbuf_addbuf() in your first
> patch?
>
Done.
>> @@ -173,180 +153,176 @@ static int slurp_attr(const char *line, const char *name, char *attr)
>> else
>> ends = "; \t";
>> sz = strcspn(ap, ends);
>> - memcpy(attr, ap, sz);
>> - attr[sz] = 0;
>> + strbuf_add(attr, ap, sz);
>> return 1;
>> }
>>
>> struct content_type {
>> - char *boundary;
>> - int boundary_len;
>> + struct strbuf *boundary;
>> };
>>
>> static struct content_type content[MAX_BOUNDARIES];
>
> Wouldn't it make more sense to get rid of "struct content_type" altogether
> and use "struct strbuf *content[MAX_BOUNDARIES]" directly?
Sure.
I'll send a patch updated with your comments shortly.
/Lukas
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] Make some strbuf_*() struct strbuf arguments const.
2008-07-13 18:17 ` ! " Lukas Sandström
@ 2008-07-13 18:28 ` Lukas Sandström
2008-07-13 18:29 ` [PATCH] Add some useful functions for strbuf manipulation Lukas Sandström
0 siblings, 1 reply; 14+ messages in thread
From: Lukas Sandström @ 2008-07-13 18:28 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Lukas Sandström, Git Mailing List
Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
---
strbuf.c | 2 +-
strbuf.h | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/strbuf.c b/strbuf.c
index 4aed752..7767170 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -67,7 +67,7 @@ void strbuf_rtrim(struct strbuf *sb)
sb->buf[sb->len] = '\0';
}
-int strbuf_cmp(struct strbuf *a, struct strbuf *b)
+int strbuf_cmp(const struct strbuf *a, const struct strbuf *b)
{
int cmp;
if (a->len < b->len) {
diff --git a/strbuf.h b/strbuf.h
index faec229..a1b0143 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -61,7 +61,7 @@ static inline void strbuf_swap(struct strbuf *a, struct strbuf *b) {
}
/*----- strbuf size related -----*/
-static inline size_t strbuf_avail(struct strbuf *sb) {
+static inline size_t strbuf_avail(const struct strbuf *sb) {
return sb->alloc ? sb->alloc - sb->len - 1 : 0;
}
@@ -78,7 +78,7 @@ static inline void strbuf_setlen(struct strbuf *sb, size_t len) {
/*----- content related -----*/
extern void strbuf_rtrim(struct strbuf *);
-extern int strbuf_cmp(struct strbuf *, struct strbuf *);
+extern int strbuf_cmp(const struct strbuf *, const struct strbuf *);
/*----- add data in your buffer -----*/
static inline void strbuf_addch(struct strbuf *sb, int c) {
@@ -98,7 +98,7 @@ extern void strbuf_add(struct strbuf *, const void *, size_t);
static inline void strbuf_addstr(struct strbuf *sb, const char *s) {
strbuf_add(sb, s, strlen(s));
}
-static inline void strbuf_addbuf(struct strbuf *sb, struct strbuf *sb2) {
+static inline void strbuf_addbuf(struct strbuf *sb, const struct strbuf *sb2) {
strbuf_add(sb, sb2->buf, sb2->len);
}
extern void strbuf_adddup(struct strbuf *sb, size_t pos, size_t len);
--
1.5.4.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH] Add some useful functions for strbuf manipulation.
2008-07-13 18:28 ` [PATCH] Make some strbuf_*() struct strbuf arguments const Lukas Sandström
@ 2008-07-13 18:29 ` Lukas Sandström
2008-07-13 18:30 ` [PATCH] git-mailinfo: use strbuf's instead of fixed buffers Lukas Sandström
0 siblings, 1 reply; 14+ messages in thread
From: Lukas Sandström @ 2008-07-13 18:29 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Lukas Sandström, Git Mailing List
Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
---
strbuf.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
strbuf.h | 6 +++++
2 files changed, 76 insertions(+), 0 deletions(-)
diff --git a/strbuf.c b/strbuf.c
index 7767170..6294940 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -60,6 +60,18 @@ void strbuf_grow(struct strbuf *sb, size_t extra)
ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
}
+void strbuf_trim(struct strbuf *sb)
+{
+ char *b = sb->buf;
+ while (sb->len > 0 && isspace((unsigned char)sb->buf[sb->len - 1]))
+ sb->len--;
+ while(sb->len > 0 && isspace(*b)) {
+ b++;
+ sb->len--;
+ }
+ memmove(sb->buf, b, sb->len);
+ sb->buf[sb->len] = '\0';
+}
void strbuf_rtrim(struct strbuf *sb)
{
while (sb->len > 0 && isspace((unsigned char)sb->buf[sb->len - 1]))
@@ -67,6 +79,64 @@ void strbuf_rtrim(struct strbuf *sb)
sb->buf[sb->len] = '\0';
}
+void strbuf_ltrim(struct strbuf *sb)
+{
+ char *b = sb->buf;
+ while(sb->len > 0 && isspace(*b)) {
+ b++;
+ sb->len--;
+ }
+ memmove(sb->buf, b, sb->len);
+ sb->buf[sb->len] = '\0';
+}
+
+void strbuf_tolower(struct strbuf *sb)
+{
+ int i;
+ for (i = 0; i < sb->len; i++)
+ sb->buf[i] = tolower(sb->buf[i]);
+}
+
+struct strbuf ** strbuf_split(const struct strbuf *sb, int delim)
+{
+ int alloc = 2, pos = 0;
+ char *n, *p;
+ struct strbuf **ret;
+ struct strbuf *t;
+
+ ret = xcalloc(alloc, sizeof(struct strbuf *));
+ p = n = sb->buf;
+ while (n < sb->buf + sb->len) {
+ int len;
+ n = memchr(n, delim, sb->len - (n - sb->buf));
+ if (pos + 1 >= alloc) {
+ alloc = alloc * 2;
+ ret = xrealloc(ret, sizeof(struct strbuf *) * alloc);
+ }
+ if (!n)
+ n = sb->buf + sb->len - 1;
+ len = n - p + 1;
+ t = xmalloc(sizeof(struct strbuf));
+ strbuf_init(t, len);
+ strbuf_add(t, p, len);
+ ret[pos] = t;
+ ret[++pos] = NULL;
+ p = ++n;
+ }
+ return ret;
+}
+
+void strbuf_list_free(struct strbuf ** sbs)
+{
+ struct strbuf **s = sbs;
+
+ while(*s) {
+ strbuf_release(*s);
+ free(*s++);
+ }
+ free(sbs);
+}
+
int strbuf_cmp(const struct strbuf *a, const struct strbuf *b)
{
int cmp;
diff --git a/strbuf.h b/strbuf.h
index a1b0143..5e65a3f 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -77,8 +77,14 @@ static inline void strbuf_setlen(struct strbuf *sb, size_t len) {
#define strbuf_reset(sb) strbuf_setlen(sb, 0)
/*----- content related -----*/
+extern void strbuf_trim(struct strbuf *);
extern void strbuf_rtrim(struct strbuf *);
+extern void strbuf_ltrim(struct strbuf *);
extern int strbuf_cmp(const struct strbuf *, const struct strbuf *);
+extern void strbuf_tolower(struct strbuf *);
+
+extern struct strbuf ** strbuf_split(const struct strbuf*, int delim);
+extern void strbuf_list_free(struct strbuf **);
/*----- add data in your buffer -----*/
static inline void strbuf_addch(struct strbuf *sb, int c) {
--
1.5.4.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH] git-mailinfo: use strbuf's instead of fixed buffers
2008-07-13 18:29 ` [PATCH] Add some useful functions for strbuf manipulation Lukas Sandström
@ 2008-07-13 18:30 ` Lukas Sandström
2008-07-13 21:37 ` Junio C Hamano
0 siblings, 1 reply; 14+ messages in thread
From: Lukas Sandström @ 2008-07-13 18:30 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Lukas Sandström, Git Mailing List
Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
---
builtin-mailinfo.c | 750 ++++++++++++++++++++++++----------------------------
1 files changed, 348 insertions(+), 402 deletions(-)
diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index 2d1520f..329ac43 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -5,14 +5,15 @@
#include "cache.h"
#include "builtin.h"
#include "utf8.h"
+#include "strbuf.h"
static FILE *cmitmsg, *patchfile, *fin, *fout;
static int keep_subject;
static const char *metainfo_charset;
-static char line[1000];
-static char name[1000];
-static char email[1000];
+static struct strbuf line = STRBUF_INIT;
+static struct strbuf name = STRBUF_INIT;
+static struct strbuf email = STRBUF_INIT;
static enum {
TE_DONTCARE, TE_QP, TE_BASE64,
@@ -21,74 +22,77 @@ static enum {
TYPE_TEXT, TYPE_OTHER,
} message_type;
-static char charset[256];
+static struct strbuf charset = STRBUF_INIT;
static int patch_lines;
-static char **p_hdr_data, **s_hdr_data;
+static struct strbuf **p_hdr_data, **s_hdr_data;
#define MAX_HDR_PARSED 10
#define MAX_BOUNDARIES 5
-static char *sanity_check(char *name, char *email)
+static void get_sane_name(struct strbuf *out, struct strbuf *name, struct strbuf *email)
{
- int len = strlen(name);
- if (len < 3 || len > 60)
- return email;
- if (strchr(name, '@') || strchr(name, '<') || strchr(name, '>'))
- return email;
- return name;
+ struct strbuf *src = name;
+ if (name->len < 3 || 60 < name->len || strchr(name->buf, '@') ||
+ strchr(name->buf, '<') || strchr(name->buf, '>'))
+ src = email;
+ else if (name == out)
+ return;
+ strbuf_reset(out);
+ strbuf_addbuf(out, src);
}
-static int bogus_from(char *line)
+static void parse_bogus_from(const struct strbuf *line)
{
/* John Doe <johndoe> */
- char *bra, *ket, *dst, *cp;
+ char *bra, *ket;
/* This is fallback, so do not bother if we already have an
* e-mail address.
*/
- if (*email)
- return 0;
+ if (email.len)
+ return;
- bra = strchr(line, '<');
+ bra = strchr(line->buf, '<');
if (!bra)
- return 0;
+ return;
ket = strchr(bra, '>');
if (!ket)
- return 0;
+ return;
- for (dst = email, cp = bra+1; cp < ket; )
- *dst++ = *cp++;
- *dst = 0;
- for (cp = line; isspace(*cp); cp++)
- ;
- for (bra--; isspace(*bra); bra--)
- *bra = 0;
- cp = sanity_check(cp, email);
- strcpy(name, cp);
- return 1;
+ strbuf_reset(&email);
+ strbuf_add(&email, bra + 1, ket - bra - 1);
+
+ strbuf_reset(&name);
+ strbuf_add(&name, line->buf, bra - line->buf);
+ strbuf_trim(&name);
+ get_sane_name(&name, &name, &email);
}
-static int handle_from(char *in_line)
+static void handle_from(const struct strbuf *from)
{
- char line[1000];
char *at;
- char *dst;
+ size_t el;
+ struct strbuf f;
- strcpy(line, in_line);
- at = strchr(line, '@');
+ strbuf_init(&f, from->len);
+ strbuf_addbuf(&f, from);
+
+ at = strchr(f.buf, '@');
if (!at)
- return bogus_from(line);
+ return parse_bogus_from(from);
/*
* If we already have one email, don't take any confusing lines
*/
- if (*email && strchr(at+1, '@'))
- return 0;
+ if (email.len && strchr(at + 1, '@')) {
+ strbuf_release(&f);
+ return;
+ }
/* Pick up the string around '@', possibly delimited with <>
- * pair; that is the email part. White them out while copying.
+ * pair; that is the email part.
*/
- while (at > line) {
+ while (at > f.buf) {
char c = at[-1];
if (isspace(c))
break;
@@ -98,56 +102,35 @@ static int handle_from(char *in_line)
}
at--;
}
- dst = email;
- for (;;) {
- unsigned char c = *at;
- if (!c || c == '>' || isspace(c)) {
- if (c == '>')
- *at = ' ';
- break;
- }
- *at++ = ' ';
- *dst++ = c;
- }
- *dst++ = 0;
+ el = strcspn(at, " \n\t\r\v\f>");
+ strbuf_reset(&email);
+ strbuf_add(&email, at, el);
+ strbuf_remove(&f, at - f.buf, el + 1);
/* The remainder is name. It could be "John Doe <john.doe@xz>"
- * or "john.doe@xz (John Doe)", but we have whited out the
+ * or "john.doe@xz (John Doe)", but we have removed the
* email part, so trim from both ends, possibly removing
* the () pair at the end.
*/
- at = line + strlen(line);
- while (at > line) {
- unsigned char c = *--at;
- if (!isspace(c)) {
- at[(c == ')') ? 0 : 1] = 0;
- break;
- }
- }
+ strbuf_trim(&f);
+ if (f.buf[0] == '(')
+ strbuf_remove(&name, 0, 1);
+ if (f.len && f.buf[f.len - 1] == ')')
+ strbuf_setlen(&f, f.len - 1);
- at = line;
- for (;;) {
- unsigned char c = *at;
- if (!c || !isspace(c)) {
- if (c == '(')
- at++;
- break;
- }
- at++;
- }
- at = sanity_check(at, email);
- strcpy(name, at);
- return 1;
+ get_sane_name(&name, &f, &email);
+ strbuf_release(&f);
}
-static int handle_header(char *line, char *data, int ofs)
+static void handle_header(struct strbuf **out, const struct strbuf *line)
{
- if (!line || !data)
- return 1;
-
- strcpy(data, line+ofs);
+ if (!*out) {
+ *out = xmalloc(sizeof(struct strbuf));
+ strbuf_init(*out, line->len);
+ } else
+ strbuf_reset(*out);
- return 0;
+ strbuf_addbuf(*out, line);
}
/* NOTE NOTE NOTE. We do not claim we do full MIME. We just attempt
@@ -156,13 +139,13 @@ static int handle_header(char *line, char *data, int ofs)
* case insensitively.
*/
-static int slurp_attr(const char *line, const char *name, char *attr)
+static int slurp_attr(const char *line, const char *name, struct strbuf *attr)
{
const char *ends, *ap = strcasestr(line, name);
size_t sz;
if (!ap) {
- *attr = 0;
+ strbuf_setlen(attr, 0);
return 0;
}
ap += strlen(name);
@@ -173,180 +156,171 @@ static int slurp_attr(const char *line, const char *name, char *attr)
else
ends = "; \t";
sz = strcspn(ap, ends);
- memcpy(attr, ap, sz);
- attr[sz] = 0;
+ strbuf_add(attr, ap, sz);
return 1;
}
-struct content_type {
- char *boundary;
- int boundary_len;
-};
-
-static struct content_type content[MAX_BOUNDARIES];
+static struct strbuf *content[MAX_BOUNDARIES];
-static struct content_type *content_top = content;
+static struct strbuf **content_top = content;
-static int handle_content_type(char *line)
+static void handle_content_type(struct strbuf *line)
{
- char boundary[256];
+ struct strbuf *boundary = xmalloc(sizeof(struct strbuf));
+ strbuf_init(boundary, line->len);
- if (strcasestr(line, "text/") == NULL)
+ if (!strcasestr(line->buf, "text/"))
message_type = TYPE_OTHER;
- if (slurp_attr(line, "boundary=", boundary + 2)) {
- memcpy(boundary, "--", 2);
+ if (slurp_attr(line->buf, "boundary=", boundary)) {
+ strbuf_insert(boundary, 0, "--", 2);
if (content_top++ >= &content[MAX_BOUNDARIES]) {
fprintf(stderr, "Too many boundaries to handle\n");
exit(1);
}
- content_top->boundary_len = strlen(boundary);
- content_top->boundary = xmalloc(content_top->boundary_len+1);
- strcpy(content_top->boundary, boundary);
+ *content_top = boundary;
+ boundary = NULL;
}
- if (slurp_attr(line, "charset=", charset)) {
- int i, c;
- for (i = 0; (c = charset[i]) != 0; i++)
- charset[i] = tolower(c);
+ if (slurp_attr(line->buf, "charset=", &charset))
+ strbuf_tolower(&charset);
+
+ if (boundary) {
+ strbuf_release(boundary);
+ free(boundary);
}
- return 0;
}
-static int handle_content_transfer_encoding(char *line)
+static void handle_content_transfer_encoding(const struct strbuf *line)
{
- if (strcasestr(line, "base64"))
+ if (strcasestr(line->buf, "base64"))
transfer_encoding = TE_BASE64;
- else if (strcasestr(line, "quoted-printable"))
+ else if (strcasestr(line->buf, "quoted-printable"))
transfer_encoding = TE_QP;
else
transfer_encoding = TE_DONTCARE;
- return 0;
-}
-
-static int is_multipart_boundary(const char *line)
-{
- return (!memcmp(line, content_top->boundary, content_top->boundary_len));
}
-static int eatspace(char *line)
+static int is_multipart_boundary(const struct strbuf *line)
{
- int len = strlen(line);
- while (len > 0 && isspace(line[len-1]))
- line[--len] = 0;
- return len;
+ return !strbuf_cmp(line, *content_top);
}
-static char *cleanup_subject(char *subject)
+static void cleanup_subject(struct strbuf *subject)
{
- for (;;) {
- char *p;
- int len, remove;
- switch (*subject) {
+ char *pos;
+ size_t remove;
+ while (subject->len) {
+ switch (*subject->buf) {
case 'r': case 'R':
- if (!memcmp("e:", subject+1, 2)) {
- subject += 3;
+ if (subject->len <= 3)
+ break;
+ if (!memcmp(subject->buf + 1, "e:", 2)) {
+ strbuf_remove(subject, 0, 3);
continue;
}
break;
case ' ': case '\t': case ':':
- subject++;
+ strbuf_remove(subject, 0, 1);
continue;
-
case '[':
- p = strchr(subject, ']');
- if (!p) {
- subject++;
- continue;
- }
- len = strlen(p);
- remove = p - subject;
- if (remove <= len *2) {
- subject = p+1;
- continue;
- }
+ if ((pos = strchr(subject->buf, ']'))) {
+ remove = pos - subject->buf + 1;
+ /* Don't remove too much. */
+ if (remove <= (subject->len - remove + 1) * 2) {
+ strbuf_remove(subject, 0, remove);
+ continue;
+ }
+ } else
+ strbuf_remove(subject, 0, 1);
break;
}
- eatspace(subject);
- return subject;
+ strbuf_trim(subject);
+ return;
}
}
-static void cleanup_space(char *buf)
+static void cleanup_space(struct strbuf *sb)
{
- unsigned char c;
- while ((c = *buf) != 0) {
- buf++;
- if (isspace(c)) {
- buf[-1] = ' ';
- c = *buf;
- while (isspace(c)) {
- int len = strlen(buf);
- memmove(buf, buf+1, len);
- c = *buf;
- }
+ size_t pos, cnt;
+ for (pos = 0; pos < sb->len; pos++) {
+ if (isspace(sb->buf[pos])) {
+ sb->buf[pos] = ' ';
+ for (cnt = 0; isspace(sb->buf[pos + cnt + 1]); cnt++);
+ strbuf_remove(sb, pos + 1, cnt);
}
}
}
-static void decode_header(char *it, unsigned itsize);
+static void decode_header(struct strbuf *line);
static const char *header[MAX_HDR_PARSED] = {
"From","Subject","Date",
};
-static int check_header(char *line, unsigned linesize, char **hdr_data, int overwrite)
+static inline int cmp_header(const struct strbuf *line, const char *hdr)
{
- int i;
+ int len = strlen(hdr);
+ return !strncasecmp(line->buf, hdr, len) && line->len > len &&
+ line->buf[len] == ':' && isspace(line->buf[len + 1]);
+}
+static int check_header(const struct strbuf *line,
+ struct strbuf *hdr_data[], int overwrite)
+{
+ int i, ret = 0, len;
+ struct strbuf sb = STRBUF_INIT;
/* search for the interesting parts */
for (i = 0; header[i]; i++) {
int len = strlen(header[i]);
- if ((!hdr_data[i] || overwrite) &&
- !strncasecmp(line, header[i], len) &&
- line[len] == ':' && isspace(line[len + 1])) {
+ if ((!hdr_data[i] || overwrite) && cmp_header(line, header[i])) {
/* Unwrap inline B and Q encoding, and optionally
* normalize the meta information to utf8.
*/
- decode_header(line + len + 2, linesize - len - 2);
- hdr_data[i] = xmalloc(1000 * sizeof(char));
- if (! handle_header(line, hdr_data[i], len + 2)) {
- return 1;
- }
+ strbuf_add(&sb, line->buf + len + 2, line->len - len -2);
+ decode_header(&sb);
+ handle_header(&hdr_data[i], &sb);
+ ret = 1;
+ goto check_header_out;
}
}
/* Content stuff */
- if (!strncasecmp(line, "Content-Type", 12) &&
- line[12] == ':' && isspace(line[12 + 1])) {
- decode_header(line + 12 + 2, linesize - 12 - 2);
- if (! handle_content_type(line)) {
- return 1;
- }
- }
- if (!strncasecmp(line, "Content-Transfer-Encoding", 25) &&
- line[25] == ':' && isspace(line[25 + 1])) {
- decode_header(line + 25 + 2, linesize - 25 - 2);
- if (! handle_content_transfer_encoding(line)) {
- return 1;
- }
+ if (cmp_header(line, "Content-Type")) {
+ len = strlen("Content-Type: ");
+ strbuf_add(&sb, line->buf + len, line->len - len);
+ decode_header(&sb);
+ strbuf_insert(&sb, 0, "Content-Type: ", len);
+ handle_content_type(&sb);
+ ret = 1;
+ goto check_header_out;
+ }
+ if (cmp_header(line, "Content-Transfer-Encoding")) {
+ len = strlen("Content-Transfer-Encoding: ");
+ strbuf_add(&sb, line->buf + len, line->len - len);
+ decode_header(&sb);
+ handle_content_transfer_encoding(&sb);
+ ret = 1;
+ goto check_header_out;
}
/* for inbody stuff */
- if (!memcmp(">From", line, 5) && isspace(line[5]))
- return 1;
- if (!memcmp("[PATCH]", line, 7) && isspace(line[7])) {
+ if (!prefixcmp(line->buf, ">From") && isspace(line->buf[5]))
+ ret = 1; /* Should this return 0? */
+ goto check_header_out;
+ if (!prefixcmp(line->buf, "[PATCH]") && isspace(line->buf[7])) {
for (i = 0; header[i]; i++) {
if (!memcmp("Subject", header[i], 7)) {
- if (! handle_header(line, hdr_data[i], 0)) {
- return 1;
- }
+ handle_header(&hdr_data[i], line);
+ ret = 1;
+ goto check_header_out;
}
}
}
- /* no match */
- return 0;
+check_header_out:
+ strbuf_release(&sb);
+ return ret;
}
-static int is_rfc2822_header(char *line)
+static int is_rfc2822_header(const struct strbuf *line)
{
/*
* The section that defines the loosest possible
@@ -357,15 +331,15 @@ static int is_rfc2822_header(char *line)
* ftext = %d33-57 / %59-126
*/
int ch;
- char *cp = line;
+ char *cp = line->buf;
/* Count mbox From headers as headers */
- if (!memcmp(line, "From ", 5) || !memcmp(line, ">From ", 6))
+ if (!prefixcmp(cp, "From ") || !prefixcmp(cp, ">From "))
return 1;
while ((ch = *cp++)) {
if (ch == ':')
- return cp != line;
+ return 1;
if ((33 <= ch && ch <= 57) ||
(59 <= ch && ch <= 126))
continue;
@@ -374,34 +348,20 @@ static int is_rfc2822_header(char *line)
return 0;
}
-/*
- * sz is size of 'line' buffer in bytes. Must be reasonably
- * long enough to hold one physical real-world e-mail line.
- */
-static int read_one_header_line(char *line, int sz, FILE *in)
+static int read_one_header_line(struct strbuf *line, FILE *in)
{
- int len;
-
- /*
- * We will read at most (sz-1) bytes and then potentially
- * re-add NUL after it. Accessing line[sz] after this is safe
- * and we can allow len to grow up to and including sz.
- */
- sz--;
-
/* Get the first part of the line. */
- if (!fgets(line, sz, in))
+ if (strbuf_getline(line, in, '\n'))
return 0;
/*
* Is it an empty line or not a valid rfc2822 header?
* If so, stop here, and return false ("not a header")
*/
- len = eatspace(line);
- if (!len || !is_rfc2822_header(line)) {
+ strbuf_rtrim(line);
+ if (!line->len || !is_rfc2822_header(line)) {
/* Re-add the newline */
- line[len] = '\n';
- line[len + 1] = '\0';
+ strbuf_addch(line, '\n');
return 0;
}
@@ -410,65 +370,53 @@ static int read_one_header_line(char *line, int sz, FILE *in)
* Yuck, 2822 header "folding"
*/
for (;;) {
- int peek, addlen;
- static char continuation[1000];
+ int peek;
+ struct strbuf continuation = STRBUF_INIT;
peek = fgetc(in); ungetc(peek, in);
if (peek != ' ' && peek != '\t')
break;
- if (!fgets(continuation, sizeof(continuation), in))
+ if (strbuf_getline(&continuation, in, '\n'))
break;
- addlen = eatspace(continuation);
- if (len < sz - 1) {
- if (addlen >= sz - len)
- addlen = sz - len - 1;
- memcpy(line + len, continuation, addlen);
- line[len] = '\n';
- len += addlen;
- }
+ continuation.buf[0] = '\n';
+ strbuf_rtrim(&continuation);
+ strbuf_addbuf(line, &continuation);
}
- line[len] = 0;
return 1;
}
-static int decode_q_segment(char *in, char *ot, unsigned otsize, char *ep, int rfc2047)
+static struct strbuf *decode_q_segment(const struct strbuf *q_seg, int rfc2047)
{
- char *otbegin = ot;
- char *otend = ot + otsize;
+ const char *in = q_seg->buf;
int c;
- while ((c = *in++) != 0 && (in <= ep)) {
- if (ot == otend) {
- *--ot = '\0';
- return -1;
- }
+ struct strbuf *out = xmalloc(sizeof(struct strbuf));
+ strbuf_init(out, q_seg->len);
+
+ while ((c = *in++) != 0) {
if (c == '=') {
int d = *in++;
if (d == '\n' || !d)
break; /* drop trailing newline */
- *ot++ = ((hexval(d) << 4) | hexval(*in++));
+ strbuf_addch(out, (hexval(d) << 4) | hexval(*in++));
continue;
}
if (rfc2047 && c == '_') /* rfc2047 4.2 (2) */
c = 0x20;
- *ot++ = c;
+ strbuf_addch(out, c);
}
- *ot = 0;
- return (ot - otbegin);
+ return out;
}
-static int decode_b_segment(char *in, char *ot, unsigned otsize, char *ep)
+static struct strbuf *decode_b_segment(const struct strbuf *b_seg)
{
/* Decode in..ep, possibly in-place to ot */
int c, pos = 0, acc = 0;
- char *otbegin = ot;
- char *otend = ot + otsize;
+ const char *in = b_seg->buf;
+ struct strbuf *out = xmalloc(sizeof(struct strbuf));
+ strbuf_init(out, b_seg->len);
- while ((c = *in++) != 0 && (in <= ep)) {
- if (ot == otend) {
- *--ot = '\0';
- return -1;
- }
+ while ((c = *in++) != 0) {
if (c == '+')
c = 62;
else if (c == '/')
@@ -493,21 +441,20 @@ static int decode_b_segment(char *in, char *ot, unsigned otsize, char *ep)
acc = (c << 2);
break;
case 1:
- *ot++ = (acc | (c >> 4));
+ strbuf_addch(out, (acc | (c >> 4)));
acc = (c & 15) << 4;
break;
case 2:
- *ot++ = (acc | (c >> 2));
+ strbuf_addch(out, (acc | (c >> 2)));
acc = (c & 3) << 6;
break;
case 3:
- *ot++ = (acc | c);
+ strbuf_addch(out, (acc | c));
acc = pos = 0;
break;
}
}
- *ot = 0;
- return (ot - otbegin);
+ return out;
}
/*
@@ -521,16 +468,16 @@ static int decode_b_segment(char *in, char *ot, unsigned otsize, char *ep)
* Otherwise, we default to assuming it is Latin1 for historical
* reasons.
*/
-static const char *guess_charset(const char *line, const char *target_charset)
+static const char *guess_charset(const struct strbuf *line, const char *target_charset)
{
if (is_encoding_utf8(target_charset)) {
- if (is_utf8(line))
+ if (is_utf8(line->buf))
return NULL;
}
return "latin1";
}
-static void convert_to_utf8(char *line, unsigned linesize, const char *charset)
+static void convert_to_utf8(struct strbuf *line, const char *charset)
{
char *out;
@@ -542,112 +489,119 @@ static void convert_to_utf8(char *line, unsigned linesize, const char *charset)
if (!strcmp(metainfo_charset, charset))
return;
- out = reencode_string(line, metainfo_charset, charset);
+ out = reencode_string(line->buf, metainfo_charset, charset);
if (!out)
die("cannot convert from %s to %s\n",
charset, metainfo_charset);
- strlcpy(line, out, linesize);
- free(out);
+ strbuf_attach(line, out, strlen(out), strlen(out));
}
-static int decode_header_bq(char *it, unsigned itsize)
+static int decode_header_bq(struct strbuf *it)
{
char *in, *out, *ep, *cp, *sp;
- char outbuf[1000];
+ struct strbuf outbuf = STRBUF_INIT, *dec;
+ struct strbuf charset_q = STRBUF_INIT, piecebuf = STRBUF_INIT;
int rfc2047 = 0;
- in = it;
- out = outbuf;
- while ((ep = strstr(in, "=?")) != NULL) {
- int sz, encoding;
- char charset_q[256], piecebuf[256];
+ in = it->buf;
+ while (in - it->buf <= it->len && (ep = strstr(in, "=?")) != NULL) {
+ int encoding;
+ strbuf_reset(&charset_q);
+ strbuf_reset(&piecebuf);
rfc2047 = 1;
if (in != ep) {
- sz = ep - in;
- memcpy(out, in, sz);
- out += sz;
- in += sz;
+ strbuf_add(&outbuf, in, ep - in);
+ in = ep;
}
/* E.g.
* ep : "=?iso-2022-jp?B?GyR...?= foo"
* ep : "=?ISO-8859-1?Q?Foo=FCbar?= baz"
*/
ep += 2;
- cp = strchr(ep, '?');
- if (!cp)
- return rfc2047; /* no munging */
- for (sp = ep; sp < cp; sp++)
- charset_q[sp - ep] = tolower(*sp);
- charset_q[cp - ep] = 0;
+
+ if (ep - it->buf >= it->len || !(cp = strchr(ep, '?')))
+ goto decode_header_bq_out;
+
+ if (cp + 3 - it->buf > it->len)
+ goto decode_header_bq_out;
+ strbuf_add(&charset_q, ep, cp - ep);
+ strbuf_tolower(&charset_q);
+
encoding = cp[1];
if (!encoding || cp[2] != '?')
- return rfc2047; /* no munging */
+ goto decode_header_bq_out;
ep = strstr(cp + 3, "?=");
if (!ep)
- return rfc2047; /* no munging */
+ goto decode_header_bq_out;
+ strbuf_add(&piecebuf, cp + 3, ep - cp - 3);
switch (tolower(encoding)) {
default:
- return rfc2047; /* no munging */
+ goto decode_header_bq_out;
case 'b':
- sz = decode_b_segment(cp + 3, piecebuf, sizeof(piecebuf), ep);
+ dec = decode_b_segment(&piecebuf);
break;
case 'q':
- sz = decode_q_segment(cp + 3, piecebuf, sizeof(piecebuf), ep, 1);
+ dec = decode_q_segment(&piecebuf, 1);
break;
}
- if (sz < 0)
- return rfc2047;
if (metainfo_charset)
- convert_to_utf8(piecebuf, sizeof(piecebuf), charset_q);
+ convert_to_utf8(dec, charset_q.buf);
- sz = strlen(piecebuf);
- if (outbuf + sizeof(outbuf) <= out + sz)
- return rfc2047; /* no munging */
- strcpy(out, piecebuf);
- out += sz;
+ strbuf_addbuf(&outbuf, dec);
+ strbuf_release(dec);
+ free(dec);
in = ep + 2;
}
- strcpy(out, in);
- strlcpy(it, outbuf, itsize);
+ strbuf_addstr(&outbuf, in);
+ strbuf_reset(it);
+ strbuf_addbuf(it, &outbuf);
+decode_header_bq_out:
+ strbuf_release(&outbuf);
+ strbuf_release(&charset_q);
+ strbuf_release(&piecebuf);
return rfc2047;
}
-static void decode_header(char *it, unsigned itsize)
+static void decode_header(struct strbuf *it)
{
-
- if (decode_header_bq(it, itsize))
+ if (decode_header_bq(it))
return;
/* otherwise "it" is a straight copy of the input.
* This can be binary guck but there is no charset specified.
*/
if (metainfo_charset)
- convert_to_utf8(it, itsize, "");
+ convert_to_utf8(it, "");
}
-static int decode_transfer_encoding(char *line, unsigned linesize, int inputlen)
+static void decode_transfer_encoding(struct strbuf *line)
{
- char *ep;
+ struct strbuf *ret;
+ int len;
switch (transfer_encoding) {
case TE_QP:
- ep = line + inputlen;
- return decode_q_segment(line, line, linesize, ep, 0);
+ ret = decode_q_segment(line, 0);
+ break;
case TE_BASE64:
- ep = line + inputlen;
- return decode_b_segment(line, line, linesize, ep);
+ ret = decode_b_segment(line);
+ break;
case TE_DONTCARE:
default:
- return inputlen;
+ return;
}
+ strbuf_reset(line);
+ strbuf_addbuf(line, ret);
+ strbuf_release(ret);
+ free(ret);
}
-static int handle_filter(char *line, unsigned linesize, int linelen);
+static void handle_filter(struct strbuf *line);
static int find_boundary(void)
{
- while(fgets(line, sizeof(line), fin) != NULL) {
- if (is_multipart_boundary(line))
+ while(!strbuf_getline(&line, fin, '\n')) {
+ if (is_multipart_boundary(&line))
return 1;
}
return 0;
@@ -655,12 +609,17 @@ static int find_boundary(void)
static int handle_boundary(void)
{
- char newline[]="\n";
+ struct strbuf newline = STRBUF_INIT;
+
+ strbuf_addch(&newline, '\n');
again:
- if (!memcmp(line+content_top->boundary_len, "--", 2)) {
+ if (line.len >= (*content_top)->len + 2 &&
+ !memcmp(line.buf + (*content_top)->len, "--", 2)) {
/* we hit an end boundary */
/* pop the current boundary off the stack */
- free(content_top->boundary);
+ strbuf_release(*content_top);
+ free(*content_top);
+ *content_top = NULL;
/* technically won't happen as is_multipart_boundary()
will fail first. But just in case..
@@ -670,7 +629,8 @@ again:
"can't recover\n");
exit(1);
}
- handle_filter(newline, sizeof(newline), strlen(newline));
+ handle_filter(&newline);
+ strbuf_release(&newline);
/* skip to the next boundary */
if (!find_boundary())
@@ -680,39 +640,44 @@ again:
/* set some defaults */
transfer_encoding = TE_DONTCARE;
- charset[0] = 0;
+ strbuf_reset(&charset);
message_type = TYPE_TEXT;
/* slurp in this section's info */
- while (read_one_header_line(line, sizeof(line), fin))
- check_header(line, sizeof(line), p_hdr_data, 0);
+ while (read_one_header_line(&line, fin))
+ check_header(&line, p_hdr_data, 0);
+ strbuf_release(&newline);
/* eat the blank line after section info */
- return (fgets(line, sizeof(line), fin) != NULL);
+ return (strbuf_getline(&line, fin, '\n') == 0);
}
-static inline int patchbreak(const char *line)
+static inline int patchbreak(const struct strbuf *line)
{
+ size_t i;
+
/* Beginning of a "diff -" header? */
- if (!memcmp("diff -", line, 6))
+ if (!prefixcmp(line->buf, "diff -"))
return 1;
/* CVS "Index: " line? */
- if (!memcmp("Index: ", line, 7))
+ if (!prefixcmp(line->buf, "Index: "))
return 1;
/*
* "--- <filename>" starts patches without headers
* "---<sp>*" is a manual separator
*/
- if (!memcmp("---", line, 3)) {
- line += 3;
+ if (line->len < 4)
+ return 0;
+
+ if (!prefixcmp(line->buf, "---")) {
/* space followed by a filename? */
- if (line[0] == ' ' && !isspace(line[1]))
+ if (line->buf[3] == ' ' && !isspace(line->buf[4]))
return 1;
/* Just whitespace? */
- for (;;) {
- unsigned char c = *line++;
+ for (i = 3; i < line->len; i++) {
+ unsigned char c = line->buf[i];
if (c == '\n')
return 1;
if (!isspace(c))
@@ -723,32 +688,25 @@ static inline int patchbreak(const char *line)
return 0;
}
-
-static int handle_commit_msg(char *line, unsigned linesize)
+static int handle_commit_msg(struct strbuf *line)
{
static int still_looking = 1;
- char *endline = line + linesize;
+ char *c;
if (!cmitmsg)
return 0;
if (still_looking) {
- char *cp = line;
- if (isspace(*line)) {
- for (cp = line + 1; *cp; cp++) {
- if (!isspace(*cp))
- break;
- }
- if (!*cp)
- return 0;
- }
- if ((still_looking = check_header(cp, endline - cp, s_hdr_data, 0)) != 0)
+ strbuf_ltrim(line);
+ if (!line->len)
+ return 0;
+ if ((still_looking = check_header(line, s_hdr_data, 0)) != 0)
return 0;
}
/* normalize the log message to UTF-8. */
if (metainfo_charset)
- convert_to_utf8(line, endline - line, charset);
+ convert_to_utf8(line, charset.buf);
if (patchbreak(line)) {
fclose(cmitmsg);
@@ -756,142 +714,132 @@ static int handle_commit_msg(char *line, unsigned linesize)
return 1;
}
- fputs(line, cmitmsg);
+ fputs(line->buf, cmitmsg);
return 0;
}
-static int handle_patch(char *line, int len)
+static void handle_patch(const struct strbuf *line)
{
- fwrite(line, 1, len, patchfile);
+ fwrite(line->buf, 1, line->len, patchfile);
patch_lines++;
- return 0;
}
-static int handle_filter(char *line, unsigned linesize, int linelen)
+static void handle_filter(struct strbuf *line)
{
static int filter = 0;
- /* filter tells us which part we left off on
- * a non-zero return indicates we hit a filter point
- */
+ /* filter tells us which part we left off on */
switch (filter) {
case 0:
- if (!handle_commit_msg(line, linesize))
+ if (!handle_commit_msg(line))
break;
filter++;
case 1:
- if (!handle_patch(line, linelen))
- break;
- filter++;
- default:
- return 1;
+ handle_patch(line);
+ break;
}
-
- return 0;
}
static void handle_body(void)
{
- int rc = 0;
- static char newline[2000];
- static char *np = newline;
- int len = strlen(line);
+ int len = 0;
+ struct strbuf prev = STRBUF_INIT;
/* Skip up to the first boundary */
- if (content_top->boundary) {
+ if (*content_top) {
if (!find_boundary())
- return;
+ goto handle_body_out;
}
do {
+ strbuf_setlen(&line, line.len + len);
+
/* process any boundary lines */
- if (content_top->boundary && is_multipart_boundary(line)) {
+ if (*content_top && is_multipart_boundary(&line)) {
/* flush any leftover */
- if (np != newline)
- handle_filter(newline, sizeof(newline),
- np - newline);
+ if (line.len)
+ handle_filter(&line);
+
if (!handle_boundary())
- return;
- len = strlen(line);
+ goto handle_body_out;
}
/* Unwrap transfer encoding */
- len = decode_transfer_encoding(line, sizeof(line), len);
- if (len < 0) {
- error("Malformed input line");
- return;
- }
+ decode_transfer_encoding(&line);
switch (transfer_encoding) {
case TE_BASE64:
case TE_QP:
{
- char *op = line;
+ struct strbuf **lines, **it, *sb;
+
+ /* Prepend any previous partial lines */
+ strbuf_insert(&line, 0, prev.buf, prev.len);
+ strbuf_reset(&prev);
/* binary data most likely doesn't have newlines */
if (message_type != TYPE_TEXT) {
- rc = handle_filter(line, sizeof(line), len);
+ handle_filter(&line);
break;
}
-
/*
* This is a decoded line that may contain
* multiple new lines. Pass only one chunk
* at a time to handle_filter()
*/
- do {
- while (op < line + len && *op != '\n')
- *np++ = *op++;
- *np = *op;
- if (*np != 0) {
- /* should be sitting on a new line */
- *(++np) = 0;
- op++;
- rc = handle_filter(newline, sizeof(newline), np - newline);
- np = newline;
- }
- } while (op < line + len);
+ lines = strbuf_split(&line, '\n');
+ for (it = lines; (sb = *it); it++) {
+ if (*(it + 1) == NULL) /* The last line */
+ if (sb->buf[sb->len - 1] != '\n') {
+ /* Partial line, save it for later. */
+ strbuf_addbuf(&prev, sb);
+ break;
+ }
+ handle_filter(sb);
+ }
/*
- * The partial chunk is saved in newline and will be
+ * The partial chunk is saved in "prev" and will be
* appended by the next iteration of read_line_with_nul().
*/
+ strbuf_list_free(lines);
break;
}
default:
- rc = handle_filter(line, sizeof(line), len);
+ handle_filter(&line);
}
- if (rc)
- /* nothing left to filter */
- break;
- } while ((len = read_line_with_nul(line, sizeof(line), fin)));
- return;
+ strbuf_reset(&line);
+ if (strbuf_avail(&line) < 100)
+ strbuf_grow(&line, 100);
+ } while ((len = read_line_with_nul(line.buf, strbuf_avail(&line), fin)));
+
+handle_body_out:
+ strbuf_release(&prev);
}
-static void output_header_lines(FILE *fout, const char *hdr, char *data)
+static void output_header_lines(FILE *fout, const char *hdr, const struct strbuf *data)
{
+ const char *sp = data->buf;
while (1) {
- char *ep = strchr(data, '\n');
+ char *ep = strchr(sp, '\n');
int len;
if (!ep)
- len = strlen(data);
+ len = strlen(sp);
else
- len = ep - data;
- fprintf(fout, "%s: %.*s\n", hdr, len, data);
+ len = ep - sp;
+ fprintf(fout, "%s: %.*s\n", hdr, len, sp);
if (!ep)
break;
- data = ep + 1;
+ sp = ep + 1;
}
}
static void handle_info(void)
{
- char *sub;
- char *hdr;
+ struct strbuf *hdr;
int i;
for (i = 0; header[i]; i++) {
-
/* only print inbody headers if we output a patch file */
if (patch_lines && s_hdr_data[i])
hdr = s_hdr_data[i];
@@ -901,20 +849,18 @@ static void handle_info(void)
continue;
if (!memcmp(header[i], "Subject", 7)) {
- if (keep_subject)
- sub = hdr;
- else {
- sub = cleanup_subject(hdr);
- cleanup_space(sub);
+ if (!keep_subject) {
+ cleanup_subject(hdr);
+ cleanup_space(hdr);
}
- output_header_lines(fout, "Subject", sub);
+ output_header_lines(fout, "Subject", hdr);
} else if (!memcmp(header[i], "From", 4)) {
handle_from(hdr);
- fprintf(fout, "Author: %s\n", name);
- fprintf(fout, "Email: %s\n", email);
+ fprintf(fout, "Author: %s\n", name.buf);
+ fprintf(fout, "Email: %s\n", email.buf);
} else {
cleanup_space(hdr);
- fprintf(fout, "%s: %s\n", header[i], hdr);
+ fprintf(fout, "%s: %s\n", header[i], hdr->buf);
}
}
fprintf(fout, "\n");
@@ -941,8 +887,8 @@ static int mailinfo(FILE *in, FILE *out, int ks, const char *encoding,
return -1;
}
- p_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(char *));
- s_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(char *));
+ p_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*p_hdr_data));
+ s_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*s_hdr_data));
do {
peek = fgetc(in);
@@ -950,8 +896,8 @@ static int mailinfo(FILE *in, FILE *out, int ks, const char *encoding,
ungetc(peek, in);
/* process the email header */
- while (read_one_header_line(line, sizeof(line), fin))
- check_header(line, sizeof(line), p_hdr_data, 1);
+ while (read_one_header_line(&line, fin))
+ check_header(&line, p_hdr_data, 1);
handle_body();
handle_info();
--
1.5.4.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] git-mailinfo: use strbuf's instead of fixed buffers
2008-07-13 18:30 ` [PATCH] git-mailinfo: use strbuf's instead of fixed buffers Lukas Sandström
@ 2008-07-13 21:37 ` Junio C Hamano
0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2008-07-13 21:37 UTC (permalink / raw)
To: Lukas Sandström; +Cc: Git Mailing List
Lukas Sandström <lukass@etek.chalmers.se> writes:
> +static void parse_bogus_from(const struct strbuf *line)
> {
> ...
> +static void handle_from(const struct strbuf *from)
> {
> ...
> if (!at)
> - return bogus_from(line);
> + return parse_bogus_from(from);
That's GCCism isn't it? I'll queue this probably for 'pu' with local
fixups, so no need to resend for this particular issue, though.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] git-mailinfo: Fix getting the subject from the body
2008-07-12 9:36 ` [PATCH] git-mailinfo: Fix getting the subject from the body Junio C Hamano
2008-07-12 21:45 ` Lukas Sandström
@ 2008-07-15 3:13 ` Don Zickus
1 sibling, 0 replies; 14+ messages in thread
From: Don Zickus @ 2008-07-15 3:13 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Lukas Sandström, Git Mailing List
On Sat, Jul 12, 2008 at 02:36:57AM -0700, Junio C Hamano wrote:
> Another thing I noticed and found puzzling is the handling of ">From "
> line that is shown in the context below. check_header() is supposed to
> return true when it handled header (i.e. not part of the commit message)
> and return false when line is not part of the header. As ">From " is part
> of the commit log message, shouldn't it return zero?
>
> Don, this part was what you introduced. Has this codepath ever been
> exercised in the real life?
Heh. Most emails I deal with usually wind up causing the code to stop
looking for header info (still_looking=0). So I never ran into that
scenario. And I never really tried to rely on inbody stuff.
I thought I was mimicing the original code, guess not.
Now that I think about it, I did run into a situation last year where
git-mailinfo parsed the '>From' as an inbody header instead of a commit
msg. I just put a stupid hack in my scripts to work around, thinking it
was my scripts.
Anyway if it returns zero, wouldn't it be better to just remove the check
to begin with? I kinda forgot why it is there in the first place (my
changes just copied it from somewhere else).
Cheers,
Don
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2008-07-15 3:18 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-10 21:41 [PATCH] git-mailinfo: Fix getting the subject from the body Lukas Sandström
[not found] ` <7vod55o0tx.fsf@gitster.siamese.dyndns.org>
2008-07-10 22:37 ` Lukas Sandström
2008-07-10 23:25 ` Junio C Hamano
2008-07-10 23:41 ` [PATCH] Add some useful functions for strbuf manipulation Lukas Sandström
2008-07-10 23:43 ` [PATCH/RFC] git-mailinfo: use strbuf's instead of fixed buffers Lukas Sandström
2008-07-12 6:10 ` Junio C Hamano
2008-07-13 18:17 ` ! " Lukas Sandström
2008-07-13 18:28 ` [PATCH] Make some strbuf_*() struct strbuf arguments const Lukas Sandström
2008-07-13 18:29 ` [PATCH] Add some useful functions for strbuf manipulation Lukas Sandström
2008-07-13 18:30 ` [PATCH] git-mailinfo: use strbuf's instead of fixed buffers Lukas Sandström
2008-07-13 21:37 ` Junio C Hamano
2008-07-12 9:36 ` [PATCH] git-mailinfo: Fix getting the subject from the body Junio C Hamano
2008-07-12 21:45 ` Lukas Sandström
2008-07-15 3:13 ` Don Zickus
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).