git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: "Junio C Hamano" <gitster@pobox.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] builtin/mv.c: use correct type to compute size of an array element
Date: Thu, 7 Jul 2022 21:11:28 +0200	[thread overview]
Message-ID: <cb866b8c-dcc6-557f-da23-1c1972619a8a@web.de> (raw)
In-Reply-To: <xmqq1quw23r8.fsf@gitster.g>

Am 07.07.22 um 20:10 schrieb Junio C Hamano:
> While our eyes are on array.cocci, I have a few observations on it.
>
> This is not meant specifically to you, Ævar, but comments by those
> more familiar with Coccinelle (and I am sure the bar to pass is
> fairly low, as I am not all that familiar) are very much
> appreciated.
>
>     @@
>     expression dst, src, n, E;
>     @@
>       memcpy(dst, src, n * sizeof(
>     - E[...]
>     + *(E)
>       ))
>
> This seems to force us prefer sizeof(*(E)) over sizeof(E[i]), when
> it is used to compute the byte size of memcpy() operation.  There is
> no reason to prefer one over the other, but I presume it is there
> only for convenience for the other rules in this file (I recall
> vaguely reading somewhere that these rules do not "execute" from top
> to bottom, so I wonder how effective it is?).

It halves the number of syntax variants to deal with.

>
>     @@
>     type T;
>     T *ptr;
>     T[] arr;
>     expression E, n;
>     @@
>     (
>       memcpy(ptr, E,
>     - n * sizeof(*(ptr))
>     + n * sizeof(T)
>       )
>     |
>       memcpy(arr, E,
>     - n * sizeof(*(arr))
>     + n * sizeof(T)
>       )
>     |
>       memcpy(E, ptr,
>     - n * sizeof(*(ptr))
>     + n * sizeof(T)
>       )
>     |
>       memcpy(E, arr,
>     - n * sizeof(*(arr))
>     + n * sizeof(T)
>       )
>     )
>
> Likewise, but this one is a lot worse.
>
> Taken alone, sizeof(*(ptr)) is far more preferrable than sizeof(T),
> because the code will be more maintainable.
>
>     Side Note.  I know builtin/mv.c had this type mismatch between
>     the variable and sizeof() from the beginning when 11be42a4 (Make
>     git-mv a builtin, 2006-07-26) introduced both the variable
>     declaration "const char **source" and memmove() on it, but a
>     code that starts out with "char **src" with memmove() that moves
>     part of src[] and uses sizeof(char *) to compute the byte size
>     of the move would become broken the same way when a later
>     developer tightens the declaration to use "const char **src"
>     without realizing that they have to update the type used in
>     sizeof().
>
> So even though I am guessing that this is to allow the later rules
> to worry only about sizeof(T), I am a bit unhappy to see the rule.
> If an existing code matched this rule and get rewritten to use
> sizeof(T), not sizeof(*(ptr)), but did not match the other rules to
> be rewritten to use COPY_ARRAY(), the overall effect would be that
> the automation made the code worse.

True.

>
>     @@
>     type T;
>     T *dst_ptr;
>     T *src_ptr;
>     T[] dst_arr;
>     T[] src_arr;
>     expression n;
>     @@
>     (
>     - memcpy(dst_ptr, src_ptr, (n) * sizeof(T))
>     + COPY_ARRAY(dst_ptr, src_ptr, n)
>     |
>     - memcpy(dst_ptr, src_arr, (n) * sizeof(T))
>     + COPY_ARRAY(dst_ptr, src_arr, n)
>     |
>     - memcpy(dst_arr, src_ptr, (n) * sizeof(T))
>     + COPY_ARRAY(dst_arr, src_ptr, n)
>     |
>     - memcpy(dst_arr, src_arr, (n) * sizeof(T))
>     + COPY_ARRAY(dst_arr, src_arr, n)
>     )
>
> I take it that thanks to the earlier "meh -- between sizeof(*p) and
> sizeof(p[0]) there is no reason to prefer one over the other" and
> "oh, no, we should prefer sizeof(*p) not sizeof(typeof(*p)) but this
> one is the other way around" rules, this one only has to deal with
> sizeof(T).
>
> Am I reading it correctly?

Yes.  Without the ugly normalization step in the middle could either
use twelve cases instead of four here or use inline alternatives,
e.g.:

type T;
T *dst_ptr;
T *src_ptr;
T[] dst_arr;
T[] src_arr;
expression n;
@@
(
- memcpy(dst_ptr, src_ptr, (n) * \( sizeof(*(dst_ptr)) \| sizeof(*(src_ptr)) \| sizeof(T) \) )
+ COPY_ARRAY(dst_ptr, src_ptr, n)
|
- memcpy(dst_ptr, src_arr, (n) * \( sizeof(*(dst_ptr)) \| sizeof(*(src_arr)) \| sizeof(T) \) )
+ COPY_ARRAY(dst_ptr, src_arr, n)
|
- memcpy(dst_arr, src_ptr, (n) * \( sizeof(*(dst_arr)) \| sizeof(*(src_ptr)) \| sizeof(T) \) )
+ COPY_ARRAY(dst_arr, src_ptr, n)
|
- memcpy(dst_arr, src_arr, (n) * \( sizeof(*(dst_arr)) \| sizeof(*(src_arr)) \| sizeof(T) \) )
+ COPY_ARRAY(dst_arr, src_arr, n)
)

I seem to remember that rules like this missed some cases, but perhaps
that's no longer an issue with the latest Coccinelle version?

>
>     @@
>     type T;
>     T *dst;
>     T *src;
>     expression n;
>     @@
>     (
>     - memmove(dst, src, (n) * sizeof(*dst));
>     + MOVE_ARRAY(dst, src, n);
>     |
>     - memmove(dst, src, (n) * sizeof(*src));
>     + MOVE_ARRAY(dst, src, n);
>     |
>     - memmove(dst, src, (n) * sizeof(T));
>     + MOVE_ARRAY(dst, src, n);
>     )
>
> What I find interesting is that this one seems to be able to do the
> necessary rewrite without having to do the "turn everything into
> sizeof(T) first" trick.  If this approach works well, I'd rather see
> the COPY_ARRAY() done without the first two preliminary rewrite
> rules.

It doesn't support arrays (T[]).  That doesn't matter in practice
because we don't have such cases (yet?).

>
> I wonder if the pattern in the first rule catches sizeof(dst[0])
> instead of sizeof(*dst), though.

It doesn't.

>
>     @@
>     type T;
>     T *ptr;
>     expression n;
>     @@
>     - ptr = xmalloc((n) * sizeof(*ptr));
>     + ALLOC_ARRAY(ptr, n);
>
>     @@
>     type T;
>     T *ptr;
>     expression n;
>     @@
>     - ptr = xmalloc((n) * sizeof(T));
>     + ALLOC_ARRAY(ptr, n);
>
> Is it a no-op rewrite if we replace the above two rules with
> something like:
>
> .   @@
> .   type T;
> .   T *ptr;
> .   expression n;
> .   @@
> .   (
> .   - ptr = xmalloc((n) * sizeof(*ptr));
> .   + ALLOC_ARRAY(ptr, n);
> .   |
> .   - ptr = xmalloc((n) * sizeof(T));
> .   + ALLOC_ARRAY(ptr, n);
> .   )

I think so.

>
> or even following the pattern of the next one ...
>
> .   @@
> .   type T;
> .   T *ptr;
> .   expression n;
> .   @@
> .   - ptr = xmalloc((n) * \( sizeof(*ptr) \| sizeof(T) \))
> .   + ALLOC_ARRAY(ptr, n);
>
> ... I have to wonder?  I like the simplicity of this pattern.

In theory this is equivalent.

>
>     @@
>     type T;
>     T *ptr;
>     expression n != 1;
>     @@
>     - ptr = xcalloc(n, \( sizeof(*ptr) \| sizeof(T) \) )
>     + CALLOC_ARRAY(ptr, n)
>
> And this leaves xcalloc(1, ...) alone because it is a way to get a
> cleared block of memory that may not be an array at all.  Shouldn't
> we have "n != 1" for xmalloc rule as well, I wonder, if only for
> consistency?

I agree that a single-element array is a bit awkward, so allowing the
explicit sizeof in that case is less iffy.  ALLOC/CALLOC macros for
single items might make that automation more palatable..

René

  reply	other threads:[~2022-07-07 19:11 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-07  2:02 [PATCH] builtin/mv.c: use correct type to compute size of an array element Junio C Hamano
2022-07-07  5:52 ` [PATCH v2] builtin/mv.c: use the MOVE_ARRAY() macro instead of memmove() Junio C Hamano
2022-07-10  1:33   ` [PATCH v3] " Junio C Hamano
2022-07-18 20:30     ` Derrick Stolee
2022-07-07 12:11 ` [PATCH] builtin/mv.c: use correct type to compute size of an array element Ævar Arnfjörð Bjarmason
2022-07-07 18:10   ` Junio C Hamano
2022-07-07 19:11     ` René Scharfe [this message]
2022-07-09  8:16       ` René Scharfe
2022-07-10  5:38         ` Junio C Hamano
2022-07-10 10:05           ` [PATCH] cocci: avoid normalization rules for memcpy René Scharfe
2022-07-10 14:45             ` Ævar Arnfjörð Bjarmason
2022-07-10 16:32               ` Ævar Arnfjörð Bjarmason
2022-07-10 19:30               ` Junio C Hamano
2022-07-11 17:11               ` René Scharfe
2022-07-11 20:05                 ` Ævar Arnfjörð Bjarmason
2022-07-07 18:27   ` [PATCH] builtin/mv.c: use correct type to compute size of an array element René Scharfe
2022-07-07 18:42 ` Jeff King
2022-07-07 20:25   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cb866b8c-dcc6-557f-da23-1c1972619a8a@web.de \
    --to=l.s.r@web.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).