* strbuf `filter' API @ 2007-10-05 20:19 Pierre Habouzit [not found] ` <1191615571-15946-2-git-send-email-madcoder@debian.org> 0 siblings, 1 reply; 8+ messages in thread From: Pierre Habouzit @ 2007-10-05 20:19 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Ooookay, I wasn't more thrilled about the previous patch than linus is. Here is a possible API to hide the muck in the strbuf.c module, and have a way nicer to read code in convert.c. It remais as efficient as the previous code, tries to reuse the buffer as much as possible to avoid reallocations, and deal with even more cases (it's possible to have the "source" buffer be inside the destination strbuf now, which could sometimes help a lot). I based this on top of master (because it's where my previous patch was) but it should apply on next properly. It depends upon the previous patch though. I'd say that the previous patch should go in master ASAP, and that those two could be considered for next (having merged master in it first btw). Cheers, -- ·O· Pierre Habouzit ··O madcoder@debian.org OOO http://www.madism.org ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <1191615571-15946-2-git-send-email-madcoder@debian.org>]
* Re: [PATCH 1/2] Have a filter_start/filter_end API. [not found] ` <1191615571-15946-2-git-send-email-madcoder@debian.org> @ 2007-10-06 9:06 ` Alex Riesen 2007-10-07 14:53 ` Pierre Habouzit 0 siblings, 1 reply; 8+ messages in thread From: Alex Riesen @ 2007-10-06 9:06 UTC (permalink / raw) To: Pierre Habouzit; +Cc: Linus Torvalds, Junio C Hamano, git Pierre Habouzit, Fri, Oct 05, 2007 22:19:30 +0200: > Those are helpers to build functions that transform a buffer into a > strbuf, allowing the "buffer" to be taken from the destination buffer. They are horrible. And very specialized for these "filter" routines. To the point where I would move them into the file where they are used (convert.c only, isn't it?) If you continue to insist the code is generic enough to justify its residence in strbuf.c, continue reading. First off, what was wrong with dumb void strbuf_make_room(struct strbuf *, size_t newsize); again? > +void *strbuf_start_filter(struct strbuf *sb, const char *src, ssize_t hint) > +{ > + if (src < sb->buf || src >= sb->buf + sb->alloc) { > + if (hint > 0 && (size_t)hint >= sb->alloc) > + strbuf_grow(sb, hint - sb->len); > + return NULL; > + } > + > + if (hint < 0) > + return strbuf_detach(sb, NULL); what is that for? Why can't the caller just use strbuf_detach? (He already has to pass negative hint somehow, which should be a concious action). > + trailing space (two HTs) > + if ((size_t)hint >= sb->alloc) { > + void *tmp = strbuf_detach(sb, NULL); > + strbuf_grow(sb, hint); > + return tmp; > + } > + > + return NULL; > +} How can one know when it sb is safe to use after strbuf_end_filter? It is for the first "if", for example. free() wont free the buf in sb. Oh, right, one can check if returned pointer !NULL. Which just adds more code to handle your API. What actually happens to sb? Is it detached? Is it reallocated? When it is detached and when it is reallocated? Why is the returned pointer useful only for giving it to strbuf_end_filter? Take for example your change in crlf_to_git: @@ -85,6 +85,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len, { struct text_stat stats; char *dst; + void *tmp; if ((action == CRLF_BINARY) || !auto_crlf || !len) return 0; @@ -110,9 +111,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len, return 0; } - /* only grow if not in place */ - if (strbuf_avail(buf) + buf->len < len) - strbuf_grow(buf, len - buf->len); + tmp = strbuf_start_filter(buf, src, len); dst = buf->buf; if (action == CRLF_GUESS) { /* @@ -133,13 +132,14 @@ static int crlf_to_git(const char *path, const char *src, size_t len, } while (--len); } strbuf_setlen(buf, dst - buf->buf); + strbuf_end_filter(tmp); return 1; } And try to rewrite it with the strbuf_make_room: @@ -110,9 +111,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len, return 0; } - /* only grow if not in place */ - if (strbuf_avail(buf) + buf->len < len) - strbuf_grow(buf, len - buf->len); + strbuf_make_room(buf, len); dst = buf->buf; if (action == CRLF_GUESS) { /* The change looks rather simple > +/*----- filter API -----*/ > +extern void *strbuf_start_filter(struct strbuf *, const char *, ssize_t); > +extern void strbuf_end_filter(void *p); I find the naming very confusing: what filtering takes place where? If strbuf_end_filter is just free, why is it needed at all? For the sake of wrapping free()? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] Have a filter_start/filter_end API. 2007-10-06 9:06 ` [PATCH 1/2] Have a filter_start/filter_end API Alex Riesen @ 2007-10-07 14:53 ` Pierre Habouzit 2007-10-07 16:07 ` Alex Riesen 0 siblings, 1 reply; 8+ messages in thread From: Pierre Habouzit @ 2007-10-07 14:53 UTC (permalink / raw) To: Alex Riesen; +Cc: Linus Torvalds, Junio C Hamano, git [-- Attachment #1: Type: text/plain, Size: 5549 bytes --] On Sat, Oct 06, 2007 at 09:06:21AM +0000, Alex Riesen wrote: > Pierre Habouzit, Fri, Oct 05, 2007 22:19:30 +0200: > > Those are helpers to build functions that transform a buffer into a > > strbuf, allowing the "buffer" to be taken from the destination buffer. > > They are horrible. And very specialized for these "filter" routines. > To the point where I would move them into the file where they are used > (convert.c only, isn't it?) For now they are, but moving them into convert.c is at the very least awkward. > If you continue to insist the code is generic enough to justify its > residence in strbuf.c, continue reading. > > First off, what was wrong with dumb > > void strbuf_make_room(struct strbuf *, size_t newsize); > > again? If newsize is >= sb->alloc then the area is reallocated, the pointer may move, and the "src" pointer would then be invalidated. > > +void *strbuf_start_filter(struct strbuf *sb, const char *src, ssize_t hint) > > +{ > > + if (src < sb->buf || src >= sb->buf + sb->alloc) { > > + if (hint > 0 && (size_t)hint >= sb->alloc) > > + strbuf_grow(sb, hint - sb->len); > > + return NULL; > > + } > > + > > + if (hint < 0) > > + return strbuf_detach(sb, NULL); > what is that for? Why can't the caller just use strbuf_detach? (He > already has to pass negative hint somehow, which should be a concious > action). The idea is to have a unified API to deal with both the cases where the filtering is known not to work in place by the caller, or for the cases where it could if enough space is allocated but that a realloc is needed. > > + if ((size_t)hint >= sb->alloc) { > > + void *tmp = strbuf_detach(sb, NULL); > > + strbuf_grow(sb, hint); > > + return tmp; > > + } > > + > > + return NULL; > > +} > > How can one know when it sb is safe to use after strbuf_end_filter? We could document it, that's not an issue. > It is for the first "if", for example. free() wont free the buf in sb. > Oh, right, one can check if returned pointer !NULL. Which just adds > more code to handle your API. I don't get that part. free(NULL) is totally ok. > What actually happens to sb? Is it detached? Is it reallocated? > When it is detached and when it is reallocated? It is detached if the filter does not works in place (caller says that with '-1' as a hint) or if it works in place but needs a buffer reallocation. > Why is the returned pointer useful only for giving it to > strbuf_end_filter? Because the filter works on a {src, len} buffer, that points into the buffer that we must free later in strbuf_end_filter. > Take for example your change in crlf_to_git: > @@ -85,6 +85,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len, > { > struct text_stat stats; > char *dst; > + void *tmp; > > if ((action == CRLF_BINARY) || !auto_crlf || !len) > return 0; > @@ -110,9 +111,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len, > return 0; > } > > - /* only grow if not in place */ > - if (strbuf_avail(buf) + buf->len < len) > - strbuf_grow(buf, len - buf->len); > + tmp = strbuf_start_filter(buf, src, len); > dst = buf->buf; > if (action == CRLF_GUESS) { > /* > @@ -133,13 +132,14 @@ static int crlf_to_git(const char *path, const char *src, size_t len, > } while (--len); > } > strbuf_setlen(buf, dst - buf->buf); > + strbuf_end_filter(tmp); > return 1; > } > > And try to rewrite it with the strbuf_make_room: > > @@ -110,9 +111,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len, > return 0; > } > > - /* only grow if not in place */ > - if (strbuf_avail(buf) + buf->len < len) > - strbuf_grow(buf, len - buf->len); > + strbuf_make_room(buf, len); > dst = buf->buf; > if (action == CRLF_GUESS) { > /* > > The change looks rather simple but is wrong because you may just break "src". It's not the case here because len always fits, but it's IMHO the kind of knowledge of the internals of strbufs we should not rely on. > > +/*----- filter API -----*/ > > +extern void *strbuf_start_filter(struct strbuf *, const char *, ssize_t); > > +extern void strbuf_end_filter(void *p); > > I find the naming very confusing: what filtering takes place where? > If strbuf_end_filter is just free, why is it needed at all? For the > sake of wrapping free()? Those are not filters, but help writing filters that would have this prototype: int some_filter(const char *src, size_t len, struct strbuf *dst); Allowing [src..src+len[ to be a subpart of `dst'. The usual semantics of such filters is that it returns 0 if it did nothing, 1 if it wrote in dst. This way, you can write filter_all being filter1, filter2, filter3 done one after the other this way: int filter_all(const char *src, size_t len, struct strbuf *dst) { int res = 0; res |= filter1(src, len, dst); if (res) { src = dst->buf; len = dst->len; } res |= filter2(src, len, dst); if (res) { src = dst->buf; len = dst->len; } return res | filter3(src, len, dst); } Note that I did not created this semantics, it was how convert.c worked already, in a even more convoluted way before. -- ·O· Pierre Habouzit ··O madcoder@debian.org OOO http://www.madism.org [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] Have a filter_start/filter_end API. 2007-10-07 14:53 ` Pierre Habouzit @ 2007-10-07 16:07 ` Alex Riesen 2007-10-07 16:52 ` Pierre Habouzit 0 siblings, 1 reply; 8+ messages in thread From: Alex Riesen @ 2007-10-07 16:07 UTC (permalink / raw) To: Pierre Habouzit, Linus Torvalds, Junio C Hamano, git Pierre Habouzit, Sun, Oct 07, 2007 16:53:55 +0200: > On Sat, Oct 06, 2007 at 09:06:21AM +0000, Alex Riesen wrote: > > If you continue to insist the code is generic enough to justify its > > residence in strbuf.c, continue reading. > > > > First off, what was wrong with dumb > > > > void strbuf_make_room(struct strbuf *, size_t newsize); > > > > again? > > If newsize is >= sb->alloc then the area is reallocated, the pointer > may move, and the "src" pointer would then be invalidated. So what? You already _have_ to know it points inside the strbuf, so you can't expect it to be valid after any serious strbuf operation. > > what is that for? Why can't the caller just use strbuf_detach? (He > > already has to pass negative hint somehow, which should be a concious > > action). > > The idea is to have a unified API to deal with both the cases where > the filtering is known not to work in place by the caller, or for the > cases where it could if enough space is allocated but that a realloc is > needed. this just makes it convoluted and opaque (as in "not transparent") > > > + if ((size_t)hint >= sb->alloc) { > > > + void *tmp = strbuf_detach(sb, NULL); > > > + strbuf_grow(sb, hint); > > > + return tmp; > > > + } > > > + > > > + return NULL; > > > +} > > > > How can one know when it sb is safe to use after strbuf_end_filter? > > We could document it, that's not an issue. The fact that you _have_to_ document is. > > It is for the first "if", for example. free() wont free the buf in sb. > > Oh, right, one can check if returned pointer !NULL. Which just adds > > more code to handle your API. > > I don't get that part. free(NULL) is totally ok. Not that. One have to store the result of start_filter and check it > > What actually happens to sb? Is it detached? Is it reallocated? > > When it is detached and when it is reallocated? > > It is detached if the filter does not works in place (caller says that > with '-1' as a hint) or if it works in place but needs a buffer > reallocation. Too many if's. Ugly > Note that I did not created this semantics, it was how convert.c > worked already, in a even more convoluted way before. And why shouldn't these semantics kept to convert.c? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] Have a filter_start/filter_end API. 2007-10-07 16:07 ` Alex Riesen @ 2007-10-07 16:52 ` Pierre Habouzit 2007-10-07 21:50 ` Alex Riesen 0 siblings, 1 reply; 8+ messages in thread From: Pierre Habouzit @ 2007-10-07 16:52 UTC (permalink / raw) To: Alex Riesen; +Cc: Linus Torvalds, Junio C Hamano, git [-- Attachment #1: Type: text/plain, Size: 3367 bytes --] On Sun, Oct 07, 2007 at 04:07:07PM +0000, Alex Riesen wrote: > Pierre Habouzit, Sun, Oct 07, 2007 16:53:55 +0200: > > On Sat, Oct 06, 2007 at 09:06:21AM +0000, Alex Riesen wrote: > > > If you continue to insist the code is generic enough to justify its > > > residence in strbuf.c, continue reading. > > > > > > First off, what was wrong with dumb > > > > > > void strbuf_make_room(struct strbuf *, size_t newsize); > > > > > > again? > > > > If newsize is >= sb->alloc then the area is reallocated, the pointer > > may move, and the "src" pointer would then be invalidated. > > So what? You already _have_ to know it points inside the strbuf, so > you can't expect it to be valid after any serious strbuf operation. Then you can't write a filter, because you need to reallocate the buffer before even starting to read the input buffer sometimes. If you reallocate the strbuf, and that your original buffer was in there, you lose. > > The idea is to have a unified API to deal with both the cases where > > the filtering is known not to work in place by the caller, or for the > > cases where it could if enough space is allocated but that a realloc is > > needed. > > this just makes it convoluted and opaque (as in "not transparent") I'm totally open to better alternatives. We could probably easily get rid of strbuf_end_filter, as whichever way to deal with the issue I try to solve in a better way, in the end, it will always be just a "free". So, maybe there is a way to rename strbuf_start_filter so that it's more straightforward. The way to use the API is: @ char *to_free = NULL; @ if ((src is inside dst && need_realloc) || operation is not in-place) @ to_free = strbuf_detach(dst, NULL); @ strbuf_make_room(dst, needed_size); // do whatever you want with src and dst free(to_free); strbuf_start_filter tries to implement the block marked with `@'. Of course: * need_realloc == (needed_size >= dst->alloc) * src is inside dst means: src > dst->buf && src < dst->buf + dst->alloc Though, those are both things that I find ugly to "know" in convert.c. How things are allocated in strbufs is one of the few things we don't want to assume anywhere outside of the strbuf module. > > > It is for the first "if", for example. free() wont free the buf in sb. > > > Oh, right, one can check if returned pointer !NULL. Which just adds > > > more code to handle your API. > > > > I don't get that part. free(NULL) is totally ok. > > Not that. One have to store the result of start_filter and check it Why check it ? You don't have to check. You have to keep it until you're done with "src". Whichever the result was. The return of strbuf_start_filter intends to stash a pointer to be freed for the cases where "src" points into the destination buffer. > > Note that I did not created this semantics, it was how convert.c > > worked already, in a even more convoluted way before. > > And why shouldn't these semantics kept to convert.c? I missed where having this live in convert.c rather than in strbuf.c makes it less ugly or better in any sense. -- ·O· Pierre Habouzit ··O madcoder@debian.org OOO http://www.madism.org [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] Have a filter_start/filter_end API. 2007-10-07 16:52 ` Pierre Habouzit @ 2007-10-07 21:50 ` Alex Riesen 2007-10-08 7:29 ` Pierre Habouzit 0 siblings, 1 reply; 8+ messages in thread From: Alex Riesen @ 2007-10-07 21:50 UTC (permalink / raw) To: Pierre Habouzit, Linus Torvalds, Junio C Hamano, git Pierre Habouzit, Sun, Oct 07, 2007 18:52:18 +0200: > So, maybe there is a way to rename strbuf_start_filter so that it's > more straightforward. The way to use the API is: > > @ char *to_free = NULL; > @ if ((src is inside dst && need_realloc) || operation is not in-place) > @ to_free = strbuf_detach(dst, NULL); > @ strbuf_make_room(dst, needed_size); Why do you have to play these games with src being inside/outside? Don't you know where src already is? AFAICS in convert.c crlf_to_git - you do. And everywhere else you know this fact too. > // do whatever you want with src and dst > > free(to_free); > > strbuf_start_filter tries to implement the block marked with `@'. Of > course: > * need_realloc == (needed_size >= dst->alloc) > * src is inside dst means: > src > dst->buf && src < dst->buf + dst->alloc in convert.c it is never "inside". It is _exactly_ the ->buf. No need for generic "inside" check. Just test if src == ->buf. And AFAICS, there is no other users of the function. > Though, those are both things that I find ugly to "know" in convert.c. > How things are allocated in strbufs is one of the few things we don't > want to assume anywhere outside of the strbuf module. src is outside of strbuf scope. It is not internal to struct strbuf. The caller must already know if it is inside of the given strbuf instance. need_realloc is covered by make_room, isn't it? I'd suggest just fix the caller, it is simple in convert.c: just use ret, which contains exactly this information. If you insist on editing in-place, which makes your routines really need the in-placeability informaion. Just give it to them, better explicitely. All of this makes the routines very convert.c specific, which is the reason why I argument to have them just there and nowhere else. Alternatively, one can memdup ->buf (as it is the input for next filter) every time a filter modifies it (which is safe, but simple, slow, requires memory, and may fragment heap): /* simplified */ int convert_to_git(const char *path, const char *src, size_t len, struct strbuf *dst) { int crlf = CRLF_GUESS; int ident = 0, ret = 0; char *filter = NULL; const char *tmp = src; ret |= apply_filter(path, tmp, len, dst, filter); if (ret) { tmp = xmemdupz(dst->buf, dst->len); len = dst->len; } if (crlf_to_git(path, tmp, len, dst, crlf)) { ret |= 1; if (tmp != src) free((void*)tmp); tmp = xmemdupz(dst->buf, dst->len); len = dst->len; } ret |= ident_to_git(path, tmp, len, dst, ident); if (tmp != src) free((void*)tmp); return ret; } It is just to show that nothing like strbuf_start_filter is _needed_ at all. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] Have a filter_start/filter_end API. 2007-10-07 21:50 ` Alex Riesen @ 2007-10-08 7:29 ` Pierre Habouzit 2007-10-08 18:48 ` Alex Riesen 0 siblings, 1 reply; 8+ messages in thread From: Pierre Habouzit @ 2007-10-08 7:29 UTC (permalink / raw) To: Alex Riesen; +Cc: Linus Torvalds, Junio C Hamano, git [-- Attachment #1: Type: text/plain, Size: 1664 bytes --] On Sun, Oct 07, 2007 at 09:50:41PM +0000, Alex Riesen wrote: > Pierre Habouzit, Sun, Oct 07, 2007 18:52:18 +0200: > > Though, those are both things that I find ugly to "know" in convert.c. > > How things are allocated in strbufs is one of the few things we don't > > want to assume anywhere outside of the strbuf module. > > src is outside of strbuf scope. It is not internal to struct strbuf. > The caller must already know if it is inside of the given strbuf > instance. > > need_realloc is covered by make_room, isn't it? Internally yes, but make_room may move the buffer, if that happens, there is nothing we can do, in the case where we point inside (or at the begining of - fwiw it's the same here) the buffer > I'd suggest just fix the caller, it is simple in convert.c: just use > ret, which contains exactly this information. If you insist on editing > in-place, which makes your routines really need the in-placeability > informaion. Just give it to them, better explicitely. All of this > makes the routines very convert.c specific, which is the reason why I > argument to have them just there and nowhere else. > > Alternatively, one can memdup ->buf (as it is the input for next > filter) every time a filter modifies it (which is safe, but simple, > slow, requires memory, and may fragment heap): This is exactly what we are trying to avoid with the current form. Given how you try to micro-optimize strbuf_cmp I'm a bit lost here… -- ·O· Pierre Habouzit ··O madcoder@debian.org OOO http://www.madism.org [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] Have a filter_start/filter_end API. 2007-10-08 7:29 ` Pierre Habouzit @ 2007-10-08 18:48 ` Alex Riesen 0 siblings, 0 replies; 8+ messages in thread From: Alex Riesen @ 2007-10-08 18:48 UTC (permalink / raw) To: Pierre Habouzit, Linus Torvalds, Junio C Hamano, git Pierre Habouzit, Mon, Oct 08, 2007 09:29:47 +0200: > On Sun, Oct 07, 2007 at 09:50:41PM +0000, Alex Riesen wrote: > > Pierre Habouzit, Sun, Oct 07, 2007 18:52:18 +0200: > > > Though, those are both things that I find ugly to "know" in convert.c. > > > How things are allocated in strbufs is one of the few things we don't > > > want to assume anywhere outside of the strbuf module. > > > > src is outside of strbuf scope. It is not internal to struct strbuf. > > The caller must already know if it is inside of the given strbuf > > instance. > > > > need_realloc is covered by make_room, isn't it? > > Internally yes, but make_room may move the buffer, if that happens, > there is nothing we can do, in the case where we point inside (or at the > begining of - fwiw it's the same here) the buffer update the outside pointer you can. But I actually lost all interest: personally I never use the filters and deeply despise the reasons caused their existence. > > I'd suggest just fix the caller, it is simple in convert.c: just use > > ret, which contains exactly this information. If you insist on editing > > in-place, which makes your routines really need the in-placeability > > informaion. Just give it to them, better explicitely. All of this > > makes the routines very convert.c specific, which is the reason why I > > argument to have them just there and nowhere else. > > > > Alternatively, one can memdup ->buf (as it is the input for next > > filter) every time a filter modifies it (which is safe, but simple, > > slow, requires memory, and may fragment heap): > > This is exactly what we are trying to avoid with the current form. I take this suggestion back. Do not use memdup, as it is slow, requires lots of memory and may fragment heap. Sorry for repeating. > Given how you try to micro-optimize strbuf_cmp I'm a bit lost here… > I didn't. It just happened. If I _wanted_ to optimize anything I wouldn't start with a function which is used exactly one time in a program with a name like "rerere" which is not even used in default configuration. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2007-10-08 18:48 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-10-05 20:19 strbuf `filter' API Pierre Habouzit [not found] ` <1191615571-15946-2-git-send-email-madcoder@debian.org> 2007-10-06 9:06 ` [PATCH 1/2] Have a filter_start/filter_end API Alex Riesen 2007-10-07 14:53 ` Pierre Habouzit 2007-10-07 16:07 ` Alex Riesen 2007-10-07 16:52 ` Pierre Habouzit 2007-10-07 21:50 ` Alex Riesen 2007-10-08 7:29 ` Pierre Habouzit 2007-10-08 18:48 ` Alex Riesen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).