* [Cocci] Various minor parsing bugs
@ 2015-06-18 2:33 Daniel Richard G.
2015-06-23 12:46 ` Julia Lawall
0 siblings, 1 reply; 11+ messages in thread
From: Daniel Richard G. @ 2015-06-18 2:33 UTC (permalink / raw)
To: cocci
Hello list,
I am trialing the use of Coccinelle 1.0.0 on a large commercial
codebase. So far, I've encountered what appear to be a handful of
parsing bugs that have led to inconsistencies in the semantic-
patched output.
I have reduced the associated source code to a set of four minimal test
cases, contained in the attached tarball. For each bug, there is a C
source file, a Coccinelle patch, and possibly a "control" source file (a
minor edit to the test-case source that parses correctly, better
illustrating the issue).
My general use case is transforming a OEM codebase, with static and
global variables galore, into a thread-safe library. This, then,
necessitates adding a new "context" parameter to every function defined
in the codebase, and adding the new argument to every call to these same
functions. The semantic patches associated with these bug-cases will
generally attempt to add a new "void *ctx" parameter/argument to one or
more functions, and fail to do so due to the parsing issues.
Here are the [apparent] bugs I've found so far, then:
Bug 1: The patch adds a new "ctx" argument to calls to foo() and qux().
If qux() is in the argument list of a call to foo(), however, then the
foo() call is not modified.
Bug 2: Very strange case where parsing is affected by whitespace.
Bug 3: "break" at the end of a do{}while() loop throws off Coccinelle.
(It is pointless, but it should be handled.)
Bug 4: Coccinelle can't handle an expression whose content is affected
by a cpp conditional. (In my scenario, I would like it to ignore the
WIN32 side.) Can this be addressed with an appropriate macro-file
definition?
There were also a couple of issues that appeared to be parse errors, but
were actually due to Coccinelle not recognizing C99 integer types (e.g.
"int32_t"). Would these not be reasonable to add to the program's
default initializations?
--Daniel
--
Daniel Richard G. || skunk at iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cocci-bugs.tar.gz
Type: application/gzip
Size: 939 bytes
Desc: not available
URL: <https://systeme.lip6.fr/pipermail/cocci/attachments/20150617/494ae51e/attachment.bin>
^ permalink raw reply [flat|nested] 11+ messages in thread* [Cocci] Various minor parsing bugs
2015-06-18 2:33 [Cocci] Various minor parsing bugs Daniel Richard G.
@ 2015-06-23 12:46 ` Julia Lawall
2015-06-23 22:33 ` Daniel Richard G.
0 siblings, 1 reply; 11+ messages in thread
From: Julia Lawall @ 2015-06-23 12:46 UTC (permalink / raw)
To: cocci
On Wed, 17 Jun 2015, Daniel Richard G. wrote:
> Hello list,
>
> I am trialing the use of Coccinelle 1.0.0 on a large commercial
> codebase. So far, I've encountered what appear to be a handful of
> parsing bugs that have led to inconsistencies in the semantic-
> patched output.
>
> I have reduced the associated source code to a set of four minimal test
> cases, contained in the attached tarball. For each bug, there is a C
> source file, a Coccinelle patch, and possibly a "control" source file (a
> minor edit to the test-case source that parses correctly, better
> illustrating the issue).
>
> My general use case is transforming a OEM codebase, with static and
> global variables galore, into a thread-safe library. This, then,
> necessitates adding a new "context" parameter to every function defined
> in the codebase, and adding the new argument to every call to these same
> functions. The semantic patches associated with these bug-cases will
> generally attempt to add a new "void *ctx" parameter/argument to one or
> more functions, and fail to do so due to the parsing issues.
>
> Here are the [apparent] bugs I've found so far, then:
>
> Bug 1: The patch adds a new "ctx" argument to calls to foo() and qux().
> If qux() is in the argument list of a call to foo(), however, then the
> foo() call is not modified.
Coccinelle picks one branch of the disjunction for matching against each
"statement" (loop header, etc). It also tried to enforce the
prioritization of things early in a disjunction over things later. This
means that in your example it can't match both something with arguments
and something without.
On the other hand, the following works fine:
@expression@
identifier F =~ "^(bar|qux)$";
@@
F(
+ctx,
...)
> Bug 2: Very strange case where parsing is affected by whitespace.
No idea. I'll look into it.
> Bug 3: "break" at the end of a do{}while() loop throws off Coccinelle.
> (It is pointless, but it should be handled.)
OK, I'll have to look into this one too.
> Bug 4: Coccinelle can't handle an expression whose content is affected
> by a cpp conditional. (In my scenario, I would like it to ignore the
> WIN32 side.) Can this be addressed with an appropriate macro-file
> definition?
The problem is that the #ifdef is being ignored. It sees x += _timezone
timezone / 3600; Unless you know the names of all the variables you want
to ignore, I don't see an easy solution.
> There were also a couple of issues that appeared to be parse errors, but
> were actually due to Coccinelle not recognizing C99 integer types (e.g.
> "int32_t"). Would these not be reasonable to add to the program's
> default initializations?
Just add #define int32_t int to standard.h.
Sorry not to have responded sooner. Messages from nonmembers are
moderated, and since most such messages are spam, I don't pay a lot of
attention to them...
julia
^ permalink raw reply [flat|nested] 11+ messages in thread* [Cocci] Various minor parsing bugs
2015-06-23 12:46 ` Julia Lawall
@ 2015-06-23 22:33 ` Daniel Richard G.
2015-06-24 5:40 ` Julia Lawall
2015-06-24 7:36 ` Julia Lawall
0 siblings, 2 replies; 11+ messages in thread
From: Daniel Richard G. @ 2015-06-23 22:33 UTC (permalink / raw)
To: cocci
Hello Julia,
On Tue, 2015 Jun 23 14:46+0200, Julia Lawall wrote:
> >
> > Bug 1: The patch adds a new "ctx" argument to calls to foo() and
> > qux(). If qux() is in the argument list of a call to foo(), however,
> > then the foo() call is not modified.
>
> Coccinelle picks one branch of the disjunction for matching against
> each "statement" (loop header, etc). It also tried to enforce the
> prioritization of things early in a disjunction over things later.
> This means that in your example it can't match both something with
> arguments and something without.
>
> On the other hand, the following works fine:
>
> @expression@
> identifier F =~ "^(bar|qux)$";
> @@
> F(
> +ctx,
> ...)
Ohh, so Coccinelle knows to drop the comma if the matched function has
no arguments. That is helpful!
But I see that the original semantic patch will do
- x += bar(1, 0) + qux();
+ x += bar(ctx, 1, 0) + qux(ctx);
The argument is added to qux() as before, but bar() now gets it too,
within the same expression/statement. I'm not understanding the "one
branch" limitation in view of this...
> > Bug 4: Coccinelle can't handle an expression whose content is
> > affected by a cpp conditional. (In my scenario, I would like it to
> > ignore the WIN32 side.) Can this be addressed with an appropriate
> > macro-file definition?
>
> The problem is that the #ifdef is being ignored. It sees x +=
> _timezone timezone / 3600; Unless you know the names of all the
> variables you want to ignore, I don't see an easy solution.
Coccinelle already does some preprocessor handling; there isn't a way to
tell it to look at only one side of a particular conditional? The
example I gave is a trivial one---the expression could be written out
twice---but there are others less amenable to such a rewrite.
> > There were also a couple of issues that appeared to be parse errors,
> > but were actually due to Coccinelle not recognizing C99 integer
> > types (e.g. "int32_t"). Would these not be reasonable to add to the
> > program's default initializations?
>
> Just add #define int32_t int to standard.h.
Right; I used --macro-file so not to depend on site-specific changes.
But aren't these types ubiquitous enough (if not in the Linux kernel
tree) that they would merit inclusion in the upstream standard.h? With
all the other definitions in that file, it would seem a bit odd to leave
out something as basic as this.
(Is there any advantage in writing "typedef int int32_t;" rather than
a #define?)
> Sorry not to have responded sooner. Messages from nonmembers are
> moderated, and since most such messages are spam, I don't pay a lot of
> attention to them...
I suspected as much; thank you for pulling this needle from the haystack
:]
--Daniel
--
Daniel Richard G. || skunk at iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
^ permalink raw reply [flat|nested] 11+ messages in thread* [Cocci] Various minor parsing bugs
2015-06-23 22:33 ` Daniel Richard G.
@ 2015-06-24 5:40 ` Julia Lawall
2015-06-24 7:36 ` Julia Lawall
1 sibling, 0 replies; 11+ messages in thread
From: Julia Lawall @ 2015-06-24 5:40 UTC (permalink / raw)
To: cocci
On Tue, 23 Jun 2015, Daniel Richard G. wrote:
> Hello Julia,
>
> On Tue, 2015 Jun 23 14:46+0200, Julia Lawall wrote:
> > >
> > > Bug 1: The patch adds a new "ctx" argument to calls to foo() and
> > > qux(). If qux() is in the argument list of a call to foo(), however,
> > > then the foo() call is not modified.
> >
> > Coccinelle picks one branch of the disjunction for matching against
> > each "statement" (loop header, etc). It also tried to enforce the
> > prioritization of things early in a disjunction over things later.
> > This means that in your example it can't match both something with
> > arguments and something without.
> >
> > On the other hand, the following works fine:
> >
> > @expression@
> > identifier F =~ "^(bar|qux)$";
> > @@
> > F(
> > +ctx,
> > ...)
>
> Ohh, so Coccinelle knows to drop the comma if the matched function has
> no arguments. That is helpful!
>
> But I see that the original semantic patch will do
>
> - x += bar(1, 0) + qux();
> + x += bar(ctx, 1, 0) + qux(ctx);
>
> The argument is added to qux() as before, but bar() now gets it too,
> within the same expression/statement. I'm not understanding the "one
> branch" limitation in view of this...
OK, perhaps I was not quite correct. Overall, it seems to be related to
the goal of matching disjunctions on expressions from left to right. For
example, if you have a pattern like:
(
- &x
+ y
|
- x
+ z
)
and the code you have is &x, then you expect to get y. Maybe it is
throwing away matches with later terms that overlap with matches for
earlier ones. I would have thought that only smaller matches would get
thrown away, but whenever things overlap there is a danger of inconsistent
modifications, so maybe any overlap at all is the better strategy.
> > > Bug 4: Coccinelle can't handle an expression whose content is
> > > affected by a cpp conditional. (In my scenario, I would like it to
> > > ignore the WIN32 side.) Can this be addressed with an appropriate
> > > macro-file definition?
> >
> > The problem is that the #ifdef is being ignored. It sees x +=
> > _timezone timezone / 3600; Unless you know the names of all the
> > variables you want to ignore, I don't see an easy solution.
>
> Coccinelle already does some preprocessor handling; there isn't a way to
> tell it to look at only one side of a particular conditional? The
> example I gave is a trivial one---the expression could be written out
> twice---but there are others less amenable to such a rewrite.
There are some special cases where it ignores one side, but in general it
doesn't. I think that to make this decision in general you have to parse,
and the ifdefs block parsing.
> > > There were also a couple of issues that appeared to be parse errors,
> > > but were actually due to Coccinelle not recognizing C99 integer
> > > types (e.g. "int32_t"). Would these not be reasonable to add to the
> > > program's default initializations?
> >
> > Just add #define int32_t int to standard.h.
>
> Right; I used --macro-file so not to depend on site-specific changes.
> But aren't these types ubiquitous enough (if not in the Linux kernel
> tree) that they would merit inclusion in the upstream standard.h? With
> all the other definitions in that file, it would seem a bit odd to leave
> out something as basic as this.
>
> (Is there any advantage in writing "typedef int int32_t;" rather than
> a #define?)
Mostly Coccinelle has heuristics for identifying types. I'll have to
check again why they were not triggered in your case.
julia
> > Sorry not to have responded sooner. Messages from nonmembers are
> > moderated, and since most such messages are spam, I don't pay a lot of
> > attention to them...
>
> I suspected as much; thank you for pulling this needle from the haystack
> :]
>
>
> --Daniel
>
>
> --
> Daniel Richard G. || skunk at iSKUNK.ORG
> My ASCII-art .sig got a bad case of Times New Roman.
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Cocci] Various minor parsing bugs
2015-06-23 22:33 ` Daniel Richard G.
2015-06-24 5:40 ` Julia Lawall
@ 2015-06-24 7:36 ` Julia Lawall
2015-06-25 3:10 ` Daniel Richard G.
1 sibling, 1 reply; 11+ messages in thread
From: Julia Lawall @ 2015-06-24 7:36 UTC (permalink / raw)
To: cocci
For #ifdef, are the terms in the two branches proper expressions normally
(like in your example)? Perhaps we could do something for that. Or are
there things like
#ifdef XXX
3 +
#else
4 -
#endif
5
julia
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Cocci] Various minor parsing bugs
2015-06-24 7:36 ` Julia Lawall
@ 2015-06-25 3:10 ` Daniel Richard G.
2015-06-25 6:11 ` SF Markus Elfring
2015-06-26 12:50 ` Iago Abal
0 siblings, 2 replies; 11+ messages in thread
From: Daniel Richard G. @ 2015-06-25 3:10 UTC (permalink / raw)
To: cocci
On Wed, 2015 Jun 24 07:40+0200, Julia Lawall wrote:
>
> OK, perhaps I was not quite correct. Overall, it seems to be related to
> the goal of matching disjunctions on expressions from left to right. For
> example, if you have a pattern like:
>
> (
> - &x
> + y
> |
> - x
> + z
> )
>
> and the code you have is &x, then you expect to get y. Maybe it is
> throwing away matches with later terms that overlap with matches for
> earlier ones. I would have thought that only smaller matches would
> get thrown away, but whenever things overlap there is a danger of
> inconsistent modifications, so maybe any overlap at all is the better
> strategy.
Hmm. I think I may need to find a better example with which to examine
this behavior, one where the disjunction turns out to be necessary
after all. (I'm a bit uneasy with not knowing how overlaps interact
with disjunctions, and I suspect the issue will crop up again in a
different form.)
> > Coccinelle already does some preprocessor handling; there isn't a
> > way to tell it to look at only one side of a particular conditional?
> > The example I gave is a trivial one---the expression could be
> > written out twice---but there are others less amenable to such a
> > rewrite.
>
> There are some special cases where it ignores one side, but in general
> it doesn't. I think that to make this decision in general you have to
> parse, and the ifdefs block parsing.
Could some ifdefs be selectively evaluated/ignored for parsing purposes?
> Mostly Coccinelle has heuristics for identifying types. I'll have to
> check again why they were not triggered in your case.
If it helps, here are the minimal cases I put together where the C99
type flummoxed Coccinelle (when not mentioned in a macro file):
#include <stdlib.h>
#include <stdint.h>
size_t foo1(void *ptr)
{
return bar((int32_t)(size_t)ptr);
}
void foo2(char *array)
{
array[(int32_t)'.'] = '.';
}
On Wed, 2015 Jun 24 09:36+0200, Julia Lawall wrote:
> For #ifdef, are the terms in the two branches proper expressions
> normally (like in your example)? Perhaps we could do something for
> that. Or are there things like
>
> #ifdef XXX
> 3 +
> #else
> 4 -
> #endif
> 5
Just about, yes. There are idioms like
int
#ifdef WIN32
WINAPI WinMain([...])
#else
main(int argc, char *argv[])
#endif
{
and others where the conditional/surrounding parts can't stand on their
own. The wisdom of writing code in this way may be questionable, but I
would be happy for Coccinelle to evaluate the conditional itself, and
process only one side. (In my case, the WIN32 side is irrelevant.)
--Daniel
--
Daniel Richard G. || skunk at iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
^ permalink raw reply [flat|nested] 11+ messages in thread* [Cocci] Various minor parsing bugs
2015-06-25 3:10 ` Daniel Richard G.
@ 2015-06-25 6:11 ` SF Markus Elfring
2015-06-26 4:33 ` Daniel Richard G.
2015-06-26 12:50 ` Iago Abal
1 sibling, 1 reply; 11+ messages in thread
From: SF Markus Elfring @ 2015-06-25 6:11 UTC (permalink / raw)
To: cocci
> Could some ifdefs be selectively evaluated/ignored for parsing purposes?
Would you like to consider any software improvements for the skipping
of source code analysis around specific parts?
https://github.com/coccinelle/coccinelle/issues/20
Regards,
Markus
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Cocci] Various minor parsing bugs
2015-06-25 6:11 ` SF Markus Elfring
@ 2015-06-26 4:33 ` Daniel Richard G.
0 siblings, 0 replies; 11+ messages in thread
From: Daniel Richard G. @ 2015-06-26 4:33 UTC (permalink / raw)
To: cocci
On Thu, 2015 Jun 25 08:11+0200, SF Markus Elfring wrote:
> > Could some ifdefs be selectively evaluated/ignored for parsing
> > purposes?
>
> Would you like to consider any software improvements for the skipping
> of source code analysis around specific parts?
> https://github.com/coccinelle/coccinelle/issues/20
If by "consider" you mean "support," then sure. If "implement," then,
well, OCaml isn't my strong suit :]
The magic coccinelle:skip comments are an interesting feature I was not
aware of; that should prove useful. But I would like the ability to
ignore certain sections of code that is based on existing preprocessor
conditionals, since it would be counterproductive to require adding
extra annotations that amount to the same thing.
--Daniel
--
Daniel Richard G. || skunk at iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Cocci] Various minor parsing bugs
2015-06-25 3:10 ` Daniel Richard G.
2015-06-25 6:11 ` SF Markus Elfring
@ 2015-06-26 12:50 ` Iago Abal
2015-07-04 8:13 ` Daniel Richard G.
1 sibling, 1 reply; 11+ messages in thread
From: Iago Abal @ 2015-06-26 12:50 UTC (permalink / raw)
To: cocci
Hi Daniel,
On Thu, Jun 25, 2015 at 5:10 AM, Daniel Richard G. <skunk@iskunk.org> wrote:
> Just about, yes. There are idioms like
>
> int
> #ifdef WIN32
> WINAPI WinMain([...])
> #else
> main(int argc, char *argv[])
> #endif
> {
>
> and others where the conditional/surrounding parts can't stand on their
> own. The wisdom of writing code in this way may be questionable, but I
> would be happy for Coccinelle to evaluate the conditional itself, and
> process only one side. (In my case, the WIN32 side is irrelevant.)
>
Try passing --undefined WIN32 to spatch, this should work for you.
Analogously, you can use --defined <SYMBOL>. This evaluates the condition
of an #ifdef or #ifndef, and may convert them into #if 1 or #if 0. For #if
0, Coccinelle has an option --if0-passing (enabled by default) that will
ignore code under #if 0. The only constraint is that the code under the #if
0 must still be lexically valid C code, because --if0-passing works after
lexing.
Using your example and passing --undefined WIN32 I get this from Coccinelle:
PARSING: foo.c
(ONCE) CPP-commenting a #if 0 or #if LINUX_VERSION or __cplusplus
passed:#ifdef WIN32
passed:WINAPI WinMain ( [ ... ] )
passed:#else
passed:#endif
-- iago
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://systeme.lip6.fr/pipermail/cocci/attachments/20150626/707dca5b/attachment.html>
^ permalink raw reply [flat|nested] 11+ messages in thread* [Cocci] Various minor parsing bugs
2015-06-26 12:50 ` Iago Abal
@ 2015-07-04 8:13 ` Daniel Richard G.
2015-07-04 8:16 ` Julia Lawall
0 siblings, 1 reply; 11+ messages in thread
From: Daniel Richard G. @ 2015-07-04 8:13 UTC (permalink / raw)
To: cocci
Hi Iago,
Apologies for the delay---my employer shifted me to some more urgent
work for the past few days.
On Fri, 2015 Jun 26 14:50+0200, Iago Abal wrote:
> Hi Daniel,
>
> Try passing --undefined WIN32 to spatch, this should work for you.
> Analogously, you can use --defined <SYMBOL>. This evaluates the
> condition of an #ifdef or #ifndef, and may convert them into #if 1 or
> #if 0. For #if , Coccinelle has an option --if0-passing (enabled by
> default) that will ignore code under #if 0. The only constraint is
> that the code under the #if must still be lexically valid C code,
> because --if0-passing works after lexing.
Yep, that appears to do the trick :)
This works pretty well, as I can also specify e.g. "--defined COCCI" and
then use "#ifndef COCCI" to hide tricky segments of code from
Coccinelle---not unlike the magic-comment strings but in a form more
recognizable to other developers.
(I had noticed the "#if 0" behavior previously, but something else was
needed if I still wanted regular C compilers to see the code.)
I had seen --(un)defined in the "spatch --help" output, but without any
description:
--defined
--undefined
Julia, may I suggest the following text for these?
--defined <symbol> treat cpp symbol as defined in #ifdef
--undefined <symbol> treat cpp symbol as undefined in #ifdef
If this minimal documentation had been in place, I might have been able
to discover this functionality on my own.
Iago, thank you again for filling me in!
--Daniel
--
Daniel Richard G. || skunk at iSKUNK.ORG
My ASCII-art .sig got a bad case of Times New Roman.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Cocci] Various minor parsing bugs
2015-07-04 8:13 ` Daniel Richard G.
@ 2015-07-04 8:16 ` Julia Lawall
0 siblings, 0 replies; 11+ messages in thread
From: Julia Lawall @ 2015-07-04 8:16 UTC (permalink / raw)
To: cocci
> I had seen --(un)defined in the "spatch --help" output, but without any
> description:
>
> --defined
> --undefined
>
> Julia, may I suggest the following text for these?
>
> --defined <symbol> treat cpp symbol as defined in #ifdef
> --undefined <symbol> treat cpp symbol as undefined in #ifdef
>
> If this minimal documentation had been in place, I might have been able
> to discover this functionality on my own.
Sure. I'll update the help text. Thanks for the suggestion.
julia
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2015-07-04 8:16 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-18 2:33 [Cocci] Various minor parsing bugs Daniel Richard G.
2015-06-23 12:46 ` Julia Lawall
2015-06-23 22:33 ` Daniel Richard G.
2015-06-24 5:40 ` Julia Lawall
2015-06-24 7:36 ` Julia Lawall
2015-06-25 3:10 ` Daniel Richard G.
2015-06-25 6:11 ` SF Markus Elfring
2015-06-26 4:33 ` Daniel Richard G.
2015-06-26 12:50 ` Iago Abal
2015-07-04 8:13 ` Daniel Richard G.
2015-07-04 8:16 ` Julia Lawall
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.