* memmove broken @ 2003-07-03 17:20 Joakim Tjernlund 2003-07-04 6:17 ` Paul Mackerras 0 siblings, 1 reply; 15+ messages in thread From: Joakim Tjernlund @ 2003-07-03 17:20 UTC (permalink / raw) To: Linuxppc-Dev@Lists. Linuxppc. Org Hi I just tried to used memmove in zlib(zlib_inflate_fast) to avoid expensive byte copies. That didn't work and the reason is that the ppc memmove can handle overlapping memory it seems. memmove(dst, dst-x, n) where x is between 1 and 39, n >= x will cause corruption for me. Jocke ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: memmove broken 2003-07-03 17:20 memmove broken Joakim Tjernlund @ 2003-07-04 6:17 ` Paul Mackerras 2003-07-04 8:32 ` Joakim Tjernlund 0 siblings, 1 reply; 15+ messages in thread From: Paul Mackerras @ 2003-07-04 6:17 UTC (permalink / raw) To: joakim.tjernlund; +Cc: Linuxppc-Dev@Lists. Linuxppc. Org Joakim Tjernlund writes: > I just tried to used memmove in zlib(zlib_inflate_fast) to avoid expensive > byte copies. That didn't work and the reason is that the ppc memmove can handle > overlapping memory it seems. I assume you mean "can't handle"? If so, this is interesting because it is supposed to work. :) > memmove(dst, dst-x, n) where x is between 1 and 39, n >= x will cause corruption for me. Gack. It should be using backwards_memcpy in arch/ppc/lib/string.S. I'll have to take a look at it. Paul. ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: memmove broken 2003-07-04 6:17 ` Paul Mackerras @ 2003-07-04 8:32 ` Joakim Tjernlund 2003-07-04 9:37 ` Paul Mackerras 0 siblings, 1 reply; 15+ messages in thread From: Joakim Tjernlund @ 2003-07-04 8:32 UTC (permalink / raw) To: Paul Mackerras; +Cc: Linuxppc-Dev@Lists. Linuxppc. Org > Joakim Tjernlund writes: > > > I just tried to used memmove in zlib(zlib_inflate_fast) to avoid expensive > > byte copies. That didn't work and the reason is that the ppc memmove can handle > > overlapping memory it seems. > > I assume you mean "can't handle"? Yes :) > > If so, this is interesting because it is supposed to work. :) > > > memmove(dst, dst-x, n) where x is between 1 and 39, n >= x will cause corruption for me. > > Gack. It should be using backwards_memcpy in arch/ppc/lib/string.S. > I'll have to take a look at it. Paul, I just realized that there is something strange going on in zlib. zlib_inflate_fast() does a standard byte copy: do{ *q++ = *r++; while(--c); /* c >= 3 */ q is always greater than r and this works. This is something else than a regular mem copy. I havn't figured out yet what is happening here but I don't think there is a problem with backwards_memcpy() Jocke ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: memmove broken 2003-07-04 8:32 ` Joakim Tjernlund @ 2003-07-04 9:37 ` Paul Mackerras 2003-07-04 11:07 ` Joakim Tjernlund 0 siblings, 1 reply; 15+ messages in thread From: Paul Mackerras @ 2003-07-04 9:37 UTC (permalink / raw) To: joakim.tjernlund; +Cc: Linuxppc-Dev@Lists. Linuxppc. Org Joakim Tjernlund writes: > I just realized that there is something strange going on in zlib. > zlib_inflate_fast() does a standard byte copy: > do{ > *q++ = *r++; > while(--c); /* c >= 3 */ > > q is always greater than r and this works. This is something else than a > regular mem copy. I havn't figured out yet what is happening here but I don't > think there is a problem with backwards_memcpy() Ah. Yes. This is actually quite clever. If you have for example 256 zero bytes to compress, zlib will output a single literal zero byte and then a block reference of length 255 and offset -1, saying that the block from 1..255 is the same as the block from 0..254. So the fact that it copies forwards byte by byte is essential. So you couldn't use memcpy() either since it copies in larger chunks than 1 byte. Paul. ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: memmove broken 2003-07-04 9:37 ` Paul Mackerras @ 2003-07-04 11:07 ` Joakim Tjernlund 2003-07-04 23:38 ` Paul Mackerras 0 siblings, 1 reply; 15+ messages in thread From: Joakim Tjernlund @ 2003-07-04 11:07 UTC (permalink / raw) To: Paul Mackerras; +Cc: Linuxppc-Dev@Lists. Linuxppc. Org, Jorn Engel > Joakim Tjernlund writes: > > > I just realized that there is something strange going on in zlib. > > zlib_inflate_fast() does a standard byte copy: > > do{ > > *q++ = *r++; > > while(--c); /* c >= 3 */ > > > > q is always greater than r and this works. This is something else than a > > regular mem copy. I havn't figured out yet what is happening here but I don't > > think there is a problem with backwards_memcpy() > > Ah. Yes. This is actually quite clever. If you have for example 256 > zero bytes to compress, zlib will output a single literal zero byte > and then a block reference of length 255 and offset -1, saying that > the block from 1..255 is the same as the block from 0..254. So the > fact that it copies forwards byte by byte is essential. So you > couldn't use memcpy() either since it copies in larger chunks than 1 > byte. I see, thanks. Currenly this works, but it is PPC specific: static inline void memcpy_update(Byte **dest, Byte **src, size_t *n) { if(*dest - *src <= 8){ /* memcpy on PPC will do 8 bytes in a chunk */ **dest = **src; --(*n); do { *++(*dest) = *++(*src); } while (--(*n)); (*dest)++; (*src)++; } else { memcpy(*dest, *src, *n); *dest += *n; *src += *n; *n = 0; } Jocke ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: memmove broken 2003-07-04 11:07 ` Joakim Tjernlund @ 2003-07-04 23:38 ` Paul Mackerras 2003-07-05 7:21 ` Jörn Engel 0 siblings, 1 reply; 15+ messages in thread From: Paul Mackerras @ 2003-07-04 23:38 UTC (permalink / raw) To: joakim.tjernlund; +Cc: Linuxppc-Dev@Lists. Linuxppc. Org, Jorn Engel Joakim Tjernlund writes: > Currenly this works, but it is PPC specific: > static inline void memcpy_update(Byte **dest, Byte **src, size_t *n) > { > if(*dest - *src <= 8){ /* memcpy on PPC will do 8 bytes in a chunk */ > **dest = **src; > --(*n); > do { > *++(*dest) = *++(*src); > } while (--(*n)); > (*dest)++; (*src)++; > } else { > memcpy(*dest, *src, *n); > *dest += *n; > *src += *n; > *n = 0; > } Why is it an issue? Is the performance of the byte-by-byte loop really a limiting factor for you? In general the behaviour of memcpy with overlapping regions is very implementation-dependent. If we use the code above then that would place constraints on what we can do in memcpy - and very non-obvious constraints at that. For example if we decided to use altivec loads and stores in memcpy, it would then do 16 or 32 bytes in a chunk and thus break zlib again. Paul. ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: memmove broken 2003-07-04 23:38 ` Paul Mackerras @ 2003-07-05 7:21 ` Jörn Engel 2003-07-05 9:55 ` Holger Bettag 2003-07-05 13:01 ` Paul Mackerras 0 siblings, 2 replies; 15+ messages in thread From: Jörn Engel @ 2003-07-05 7:21 UTC (permalink / raw) To: Paul Mackerras; +Cc: joakim.tjernlund, Linuxppc-Dev@Lists. Linuxppc. Org On Sat, 5 July 2003 09:38:46 +1000, Paul Mackerras wrote: > Joakim Tjernlund writes: > > > Currenly this works, but it is PPC specific: > > static inline void memcpy_update(Byte **dest, Byte **src, size_t *n) > > { > > if(*dest - *src <= 8){ /* memcpy on PPC will do 8 bytes in a chunk */ > > **dest = **src; > > --(*n); > > do { > > *++(*dest) = *++(*src); > > } while (--(*n)); > > (*dest)++; (*src)++; > > } else { > > memcpy(*dest, *src, *n); > > *dest += *n; > > *src += *n; > > *n = 0; > > } > > Why is it an issue? Is the performance of the byte-by-byte loop > really a limiting factor for you? Not a limiting factor, but it should be noticable. What is more important - in my eyes - is that we can replace magic with obvious code. > In general the behaviour of memcpy with overlapping regions is very > implementation-dependent. If we use the code above then that would > place constraints on what we can do in memcpy - and very non-obvious > constraints at that. For example if we decided to use altivec loads > and stores in memcpy, it would then do 16 or 32 bytes in a chunk and > thus break zlib again. That should not be a problem. memcpy has a very defined behaviour, as long as source and destination don't overlap at all or as the sourse is smaller, than the distination. Cool. We should be able to do something like this in the zlib: if (repeat one byte over and over) /* undefined behaviour for memcpy */ memset(); else { if (copy is wrapped) { memcpy(wrapped part); tweak pointers; } memcpy(); With that, source should always be smaller than destination for memcpy, so the implementation details don't matter, as long as memcpy doesn't do a reverse memcpy. Plus, the zlib code is shorter and tells the reader exactly what it does, without the need for extra comments. Jörn -- Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats. -- Howard Aiken quoted by Ken Iverson quoted by Jim Horning quoted by Raph Levien, 1979 ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: memmove broken 2003-07-05 7:21 ` Jörn Engel @ 2003-07-05 9:55 ` Holger Bettag 2003-07-05 13:01 ` Paul Mackerras 1 sibling, 0 replies; 15+ messages in thread From: Holger Bettag @ 2003-07-05 9:55 UTC (permalink / raw) To: Jörn Engel; +Cc: linuxppc-dev On Sat, 5 Jul 2003, [iso-8859-1] Jörn Engel wrote: [...] > > Why is it an issue? Is the performance of the byte-by-byte loop > > really a limiting factor for you? > > Not a limiting factor, but it should be noticable. What is more > important - in my eyes - is that we can replace magic with obvious > code. > Doing runlength coding with LZ-style compressors is not magic, it's actually a fairly old trick. [...] > memcpy has a very defined behaviour, as > long as source and destination don't overlap at all or as the sourse > is smaller, than the distination. Cool. > The direction of memcpy is not guaranteed. Non-overlap is the only situation when memcpy has portable behaviour. [...] > With that, source should always be smaller than destination for > memcpy, so the implementation details don't matter, as long as memcpy > doesn't do a reverse memcpy. > Not even a guarantee of direction helps, because memcpy might copy in larger units than single chars. This also breaks the dependency chain of an originally byte-wise copy. (If you copy in units of 32 bits, the first char of the source block will definitely not be replicated over the successive three chars, as a byte copy would do.) > Plus, the zlib code is shorter and tells the reader exactly what it > does, without the need for extra comments. > There is only one portable and correct way to use memcpy in an LZ decompressor: the source data must be completely within the "past" part of data (already decompressed, immutable), and the destination address must be completely within the "future" part (the undefined memory buffer ahead of the pointer to the current byte). So you have to check if the ending address of the source block is smaller than the current address (the "present moment"). In that case, memcpy will work correctly no matter how it is implemented. Otherwise, you must use a byte-wise copy, because the semantics of the compressed stream rely on that. If you want to optimize further, you can shorten source blocks to the largest legal size (i.e. up to "present"), and recursively call memcpy several times. This preserves the data dependencies as expected by the decompressor, but allows you to call memcpy with exponentially increasing block sizes. Another optimization would be to check for a source block that is only a single byte into the past, and then use memset to replicate it. Anything else violates the assumptions made by the compressor/decompressor. Holger ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: memmove broken 2003-07-05 7:21 ` Jörn Engel 2003-07-05 9:55 ` Holger Bettag @ 2003-07-05 13:01 ` Paul Mackerras 2003-07-05 14:57 ` Joakim Tjernlund 2003-07-05 17:01 ` Jörn Engel 1 sibling, 2 replies; 15+ messages in thread From: Paul Mackerras @ 2003-07-05 13:01 UTC (permalink / raw) To: Jörn Engel; +Cc: joakim.tjernlund, Linuxppc-Dev@Lists. Linuxppc. Org Jörn Engel writes: > > Why is it an issue? Is the performance of the byte-by-byte loop > > really a limiting factor for you? > > Not a limiting factor, but it should be noticable. What is more > important - in my eyes - is that we can replace magic with obvious > code. You are proposing to replace simple, working code with considerably more complex code. You need to be able to point to a specific real situation where your change makes a significant difference if you want to get it accepted. If the behaviour of the copy loop is non-obvious then the thing to do is to add a comment rather than make it more complicated. > That should not be a problem. memcpy has a very defined behaviour, as > long as source and destination don't overlap at all or as the sourse > is smaller, than the distination. Cool. The behaviour is defined as long as source and destination don't overlap. You get no guarantees from having source < destination. It could quite legitimately copy backwards in all situations, or work exactly the same as memmove. > We should be able to do something like this in the zlib: > if (repeat one byte over and over) /* undefined behaviour for memcpy */ > memset(); > else { > if (copy is wrapped) { > memcpy(wrapped part); > tweak pointers; > } > memcpy(); > > With that, source should always be smaller than destination for > memcpy, so the implementation details don't matter, as long as memcpy > doesn't do a reverse memcpy. > > Plus, the zlib code is shorter and tells the reader exactly what it > does, without the need for extra comments. Well no, I think it still isn't right. What if the length is large but dst - src == 2 (e.g. if you compress "abababababab") or 3 (e.g. if you compress "abcabcabcabcabc")? Or 4 or 5 or ...? You will end up with an awfully large number of special cases. Paul. ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: memmove broken 2003-07-05 13:01 ` Paul Mackerras @ 2003-07-05 14:57 ` Joakim Tjernlund 2003-07-05 17:01 ` Jörn Engel 1 sibling, 0 replies; 15+ messages in thread From: Joakim Tjernlund @ 2003-07-05 14:57 UTC (permalink / raw) To: Paul Mackerras, J?rn Engel; +Cc: Linuxppc-Dev@Lists. Linuxppc. Org > > We should be able to do something like this in the zlib: > > if (repeat one byte over and over) /* undefined behaviour for memcpy */ > > memset(); > > else { > > if (copy is wrapped) { > > memcpy(wrapped part); > > tweak pointers; > > } > > memcpy(); > > > > With that, source should always be smaller than destination for > > memcpy, so the implementation details don't matter, as long as memcpy > > doesn't do a reverse memcpy. > > > > Plus, the zlib code is shorter and tells the reader exactly what it > > does, without the need for extra comments. > > Well no, I think it still isn't right. What if the length is large > but dst - src == 2 (e.g. if you compress "abababababab") or 3 (e.g. if > you compress "abcabcabcabcabc")? Or 4 or 5 or ...? You will end up > with an awfully large number of special cases. Hi, sorry for being a bit late to reply, but as of now I am on vacation. This works and does not depend on any arch or memcpy/memmove impl. details: static inline Byte * memmove_update(Byte *dest, Byte *src, size_t n) { /* n is always >= 3 and dest > src */ Byte *ret = dest + n; if (dest - src >= n) memmove(dest, src, n); /* no overlapping here */ else if(dest - src == 1) memset(dest, *src, n); /* special case of the pattern copy loop below, can be removed */ else { /* 1 > dest - src < n */ /* copy patten at *src to *dest, pattern len is dest - src */ /* one could use 2 byte chunks here, since dest - src is >= 2 */ *dest = *src; --n; do { *++dest = *++src; } while (--n); } return ret; } simple enough? Jocke ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: memmove broken 2003-07-05 13:01 ` Paul Mackerras 2003-07-05 14:57 ` Joakim Tjernlund @ 2003-07-05 17:01 ` Jörn Engel 2003-07-07 10:32 ` Joakim Tjernlund 1 sibling, 1 reply; 15+ messages in thread From: Jörn Engel @ 2003-07-05 17:01 UTC (permalink / raw) To: Paul Mackerras; +Cc: joakim.tjernlund, Linuxppc-Dev@Lists. Linuxppc. Org On Sat, 5 July 2003 23:01:15 +1000, Paul Mackerras wrote: > > > > Not a limiting factor, but it should be noticable. What is more > > important - in my eyes - is that we can replace magic with obvious > > code. > > You are proposing to replace simple, working code with considerably > more complex code. You need to be able to point to a specific real > situation where your change makes a significant difference if you want > to get it accepted. Agreed. And even then, I don't want to submit it before 2.7. > If the behaviour of the copy loop is non-obvious then the thing to do > is to add a comment rather than make it more complicated. This may be highly subjective, but I consider the existing code to be more complicated. *q++ = *r++; c--; *q++ = *r++; c--; Those two lines are found three times in the current code. Why? Loop unrolling, nothing but an optimization, and not a very good one, if you look at current cpus. > The behaviour is defined as long as source and destination don't > overlap. You get no guarantees from having source < destination. It > could quite legitimately copy backwards in all situations, or work > exactly the same as memmove. Also agreed. It is hard to imagine a sane implementation that doesn't give these guarantees, but that doesn't mean much. > Well no, I think it still isn't right. What if the length is large > but dst - src == 2 (e.g. if you compress "abababababab") or 3 (e.g. if > you compress "abcabcabcabcabc")? Or 4 or 5 or ...? You will end up > with an awfully large number of special cases. That would really make my approach more complicated. Back to the drawing board then... Jörn -- Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. -- Rob Pike ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: memmove broken 2003-07-05 17:01 ` Jörn Engel @ 2003-07-07 10:32 ` Joakim Tjernlund 2003-07-07 12:06 ` Jörn Engel 0 siblings, 1 reply; 15+ messages in thread From: Joakim Tjernlund @ 2003-07-07 10:32 UTC (permalink / raw) To: J?rn Engel, Paul Mackerras; +Cc: Linuxppc-Dev@Lists. Linuxppc. Org > > You are proposing to replace simple, working code with considerably > > more complex code. You need to be able to point to a specific real > > situation where your change makes a significant difference if you want > > to get it accepted. > > Agreed. And even then, I don't want to submit it before 2.7. Why? Just look at 2.4 and you will find much boulder stuff going in there than this. Jocke ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: memmove broken 2003-07-07 10:32 ` Joakim Tjernlund @ 2003-07-07 12:06 ` Jörn Engel 2003-07-07 12:58 ` Joakim Tjernlund 0 siblings, 1 reply; 15+ messages in thread From: Jörn Engel @ 2003-07-07 12:06 UTC (permalink / raw) To: Joakim Tjernlund; +Cc: Paul Mackerras, Linuxppc-Dev@Lists. Linuxppc. Org On Mon, 7 July 2003 12:32:21 +0200, Joakim Tjernlund wrote: > > > Agreed. And even then, I don't want to submit it before 2.7. > > Why? Just look at 2.4 and you will find much boulder stuff going > in there than this. Other people's bad example is no justification for my misbehaviour. ;) This patch doesn't fix anything that is wrong, and it doesn't add functionality that isn't there right now. Performance is the least important aspect of computing, unless it has an impact on functionality as well. (audio without glitches and such) Besides, Marcelo didn't even take my fixes for the zlib yet, so there is more important stuff in the queue. Jörn -- Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. -- Rob Pike ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: memmove broken 2003-07-07 12:06 ` Jörn Engel @ 2003-07-07 12:58 ` Joakim Tjernlund 2003-07-07 13:16 ` Jörn Engel 0 siblings, 1 reply; 15+ messages in thread From: Joakim Tjernlund @ 2003-07-07 12:58 UTC (permalink / raw) To: J?rn Engel; +Cc: Paul Mackerras, Linuxppc-Dev@Lists. Linuxppc. Org > On Mon, 7 July 2003 12:32:21 +0200, Joakim Tjernlund wrote: > > > > > Agreed. And even then, I don't want to submit it before 2.7. > > > > Why? Just look at 2.4 and you will find much boulder stuff going > > in there than this. > > Other people's bad example is no justification for my misbehaviour. ;) > > This patch doesn't fix anything that is wrong, and it doesn't add > functionality that isn't there right now. [SNIP] Nor does most of you zlib patches either, but most of them are in 2.5. Point being, this is not a major change(so far). If nothing wrong is found and it is noticeably faster, I think it should go into 2.5. Jocke ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: memmove broken 2003-07-07 12:58 ` Joakim Tjernlund @ 2003-07-07 13:16 ` Jörn Engel 0 siblings, 0 replies; 15+ messages in thread From: Jörn Engel @ 2003-07-07 13:16 UTC (permalink / raw) To: Joakim Tjernlund; +Cc: Paul Mackerras, Linuxppc-Dev@Lists. Linuxppc. Org On Mon, 7 July 2003 14:58:33 +0200, Joakim Tjernlund wrote: > > > > This patch doesn't fix anything that is wrong, and it doesn't add > > functionality that isn't there right now. > > Nor does most of you zlib patches either, but most of them are in 2.5. > Point being, this is not a major change(so far). If nothing wrong is found and > it is noticeably faster, I think it should go into 2.5. Point taken. If tests and benchmarks agree, it should go in. Jörn -- A defeated army first battles and then seeks victory. -- Sun Tzu ** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2003-07-07 13:16 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2003-07-03 17:20 memmove broken Joakim Tjernlund 2003-07-04 6:17 ` Paul Mackerras 2003-07-04 8:32 ` Joakim Tjernlund 2003-07-04 9:37 ` Paul Mackerras 2003-07-04 11:07 ` Joakim Tjernlund 2003-07-04 23:38 ` Paul Mackerras 2003-07-05 7:21 ` Jörn Engel 2003-07-05 9:55 ` Holger Bettag 2003-07-05 13:01 ` Paul Mackerras 2003-07-05 14:57 ` Joakim Tjernlund 2003-07-05 17:01 ` Jörn Engel 2003-07-07 10:32 ` Joakim Tjernlund 2003-07-07 12:06 ` Jörn Engel 2003-07-07 12:58 ` Joakim Tjernlund 2003-07-07 13:16 ` Jörn Engel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).