linux-sparse.vger.kernel.org archive mirror
* '$' as "valid" character in identifiers
@ 2007-05-23 21:43 Michael Stefaniuc
  2007-05-23 22:00 ` Michael Stefaniuc
  2007-05-23 22:10 ` Linus Torvalds
  0 siblings, 2 replies; 13+ messages in thread
From: Michael Stefaniuc @ 2007-05-23 21:43 UTC (permalink / raw)
  To: Sparse Mailing-list

echo 'int dollar$ = 1;' > /tmp/dollar.c
gcc -c -Wall -Wextra -o /tmp/dollar.o /tmp/dollar.c
echo $?
0
No comment ...

I didn't find '$' as valid identifier character in "The C Programming
Language (ANSI C)" nor does http://c0x.coding-guidelines.com/5.2.1.html
allow it as valid char in the "source character set".

I found it because sparse tripped over that in the Wine source code.
Yes, I already sent a patch to fix that in Wine.

Not sure if sparse should change its behavior here;
cgcc -c -Wall -Wextra -o /tmp/dollar.o /tmp/dollar.c
/tmp/dollar.c:1:11: error: Expected ; at end of declaration
/tmp/dollar.c:1:11: error: got $
is an adequate response.

bye
	michael
-- 
Michael Stefaniuc               Tel.: +49-711-96437-199
Sr. Network Engineer            Fax.: +49-711-96437-111
Red Hat GmbH                    Email: mstefani@redhat.com
Hauptstaetterstr. 58            http://www.redhat.de/
D-70178 Stuttgart

* Re: '$' as "valid" character in identifiers
  2007-05-23 21:43 '$' as "valid" character in identifiers Michael Stefaniuc
@ 2007-05-23 22:00 ` Michael Stefaniuc
  2007-05-23 22:10 ` Linus Torvalds
  1 sibling, 0 replies; 13+ messages in thread
From: Michael Stefaniuc @ 2007-05-23 22:00 UTC (permalink / raw)
  To: Sparse Mailing-list

Michael Stefaniuc wrote:
> echo 'int dollar$ = 1;' > /tmp/dollar.c
> gcc -c -Wall -Wextra -o /tmp/dollar.o /tmp/dollar.c
> echo $?
> 0
> No comment ...
> 
> I didn't find '$' as valid identifier character in "The C Programming
> Language (ANSI C)" nor does http://c0x.coding-guidelines.com/5.2.1.html
> allow it as valid char in the "source character set".
> 
> I found it because sparse tripped over that in the Wine source code.
> Yes, I already sent a patch to fix that in Wine.
> 
> Not sure if sparse should change its behavior here;
> cgcc -c -Wall -Wextra -o /tmp/dollar.o /tmp/dollar.c
> /tmp/dollar.c:1:11: error: Expected ; at end of declaration
> /tmp/dollar.c:1:11: error: got $
> is an adequate response.
On second thought, something like:
/tmp/dollar.c:1:11: error: Invalid character '%c' in identifier
would be a better answer and make it more obvious what the problem is.

bye
	michael
-- 
Michael Stefaniuc               Tel.: +49-711-96437-199
Sr. Network Engineer            Fax.: +49-711-96437-111
Red Hat GmbH                    Email: mstefani@redhat.com
Hauptstaetterstr. 58            http://www.redhat.de/
D-70178 Stuttgart

* Re: '$' as "valid" character in identifiers
  2007-05-23 21:43 '$' as "valid" character in identifiers Michael Stefaniuc
  2007-05-23 22:00 ` Michael Stefaniuc
@ 2007-05-23 22:10 ` Linus Torvalds
  2007-05-24 10:04   ` Al Viro
  1 sibling, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2007-05-23 22:10 UTC (permalink / raw)
  To: Michael Stefaniuc; +Cc: Sparse Mailing-list



On Wed, 23 May 2007, Michael Stefaniuc wrote:
> 
> I didn't find '$' as valid identifier character in "The C Programming
> Language (ANSI C)" nor does http://c0x.coding-guidelines.com/5.2.1.html
> allow it as valid char in the "source character set".

I think it was a common extension for some strange operating systems 
(read: VMS), where system symbols have "$" embedded in the name. So you'd 
have names like "sys$function()" for system functions.
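
As an illustration of that naming style, here is a minimal sketch. The
name sys$example_service is made up for this example (it is not a real
VMS service), and the code relies on the compiler accepting '$' in
identifiers as an extension, which gcc does by default on most targets.

```c
/* Hypothetical VMS-style name; relies on the '$'-in-identifiers
 * extension, so this is not strictly conforming C. */
int sys$example_service(int request)
{
	return request + 1;
}

/* An ordinary caller: the '$' is just another identifier character here. */
int call_example_service(void)
{
	return sys$example_service(41);
}
```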

It's possible others did it too - gcc says it's "traditional", but the 
only case I've seen it is from VMS (and thus from DEC->Compaq->HP C 
compilers).

But I certainly wouldn't object to sparse supporting it, although I would 
suggest that it at least warn by default.

		Linus

* Re: '$' as "valid" character in identifiers
  2007-05-23 22:10 ` Linus Torvalds
@ 2007-05-24 10:04   ` Al Viro
  2007-05-24 11:14     ` Derek M Jones
  2007-05-24 14:26     ` Neil Booth
  0 siblings, 2 replies; 13+ messages in thread
From: Al Viro @ 2007-05-24 10:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Michael Stefaniuc, Sparse Mailing-list

On Wed, May 23, 2007 at 03:10:57PM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 23 May 2007, Michael Stefaniuc wrote:
> > 
> > I didn't find '$' as valid identifier character in "The C Programming
> > Language (ANSI C)" nor does http://c0x.coding-guidelines.com/5.2.1.html
> > allow it as valid char in the "source character set".
> 
> I think it was a common extension for some strange operating systems 
> (read: VMS), where system symbols have "$" embedded in the name. So you'd 
> have names like "sys$function()" for system functions.
> 
> It's possible others did it too - gcc says it's "traditional", but the 
> only case I've seen it is from VMS (and thus from DEC->Compaq->HP C 
> compilers).
> 
> But I certainly wouldn't object to sparse supporting it, although I would 
> suggest that it at least warn by default.

The question is how do they treat $ in preprocessor tokens.  Is it a full
equivalent of letter?  I.e. is $x a valid identifier?  If it is, that's
easy - all we need is to add it to cclass[] in tokenize.c as a letter and be
done with that.  If not (i.e. if it can only appear after the first
letter), we probably want to either classify it as digit or split the
"Digit" bit in two and modify the code checking for it.  In any case,
we need to figure out what to do with

#define A(x,y) x##y
A(a,$b)

Either $b is an identifier, or it had better be a valid pp-number; otherwise,
we'll get the second argument split in two tokens and get a$ b out of that
macro.
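
The splitting problem can be sketched with a toy tokenizer. This is only
an illustration under stated assumptions: classify() and count_tokens()
are hypothetical names, the class table is far cruder than the real
cclass[] in tokenize.c, and identifiers and pp-numbers are lumped
together as "a run of letters and digits".

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical character classes; sparse's real cclass[] table differs. */
enum { CLASS_OTHER = 0, CLASS_LETTER = 1, CLASS_DIGIT = 2 };

int classify(unsigned char c, int dollar_is_letter)
{
	if (isalpha(c) || c == '_')
		return CLASS_LETTER;
	if (isdigit(c))
		return CLASS_DIGIT;
	if (c == '$' && dollar_is_letter)
		return CLASS_LETTER;
	return CLASS_OTHER;
}

/* Count tokens in s.  A letter or digit starts a run of letters and
 * digits (a crude stand-in for identifiers and pp-numbers); any other
 * non-space character is a one-character token. */
size_t count_tokens(const char *s, int dollar_is_letter)
{
	size_t n = 0;

	while (*s) {
		if (isspace((unsigned char)*s)) {
			s++;
			continue;
		}
		if (classify((unsigned char)*s, dollar_is_letter) != CLASS_OTHER) {
			while (*s && classify((unsigned char)*s, dollar_is_letter) != CLASS_OTHER)
				s++;
		} else {
			s++;
		}
		n++;
	}
	return n;
}
```

With '$' classed as a letter, "$b" is a single token and pastes cleanly
onto "a"; without it, the same text is two tokens and only the '$' takes
part in the paste.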

So far it's easy, but... generating a warning will be the nasty part.  We
certainly don't want it to be generated in tokenizer; after all,

#define A(x) #x
A($)

is legitimate, so we can't do anything until we are past preprocessor.
In any case, generating a warning on each instance of such an identifier
would be overkill.  So we'd have to do it somewhere around insertion
into symbol table.  And doing essentially strchr() in there is not
a nice thing - it's a hot path...
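
A rough sketch of "warn once, at insertion time", assuming a toy table.
intern_symbol() and dollar_warning_count() are hypothetical names, the
linear lookup stands in for a real hash table, and none of this is
sparse's actual symbol code. The strchr() call is the per-identifier
scan being worried about above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, deliberately tiny symbol table. */
#define MAX_SYMBOLS 64

static const char *symbols[MAX_SYMBOLS];
static size_t nr_symbols;
static unsigned int dollar_warnings;

/* Return 1 if name was newly inserted, 0 if it was already present,
 * -1 if the table is full.  The table keeps the caller's pointer, so
 * name must outlive the table. */
int intern_symbol(const char *name)
{
	size_t i;

	for (i = 0; i < nr_symbols; i++)
		if (strcmp(symbols[i], name) == 0)
			return 0;	/* seen before: no second warning */
	if (nr_symbols == MAX_SYMBOLS)
		return -1;
	/* One scan per new identifier, on the insertion path. */
	if (strchr(name, '$')) {
		fprintf(stderr, "warning: '$' in identifier '%s'\n", name);
		dollar_warnings++;
	}
	symbols[nr_symbols++] = name;
	return 1;
}

unsigned int dollar_warning_count(void)
{
	return dollar_warnings;
}
```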

* Re: '$' as "valid" character in identifiers
  2007-05-24 10:04   ` Al Viro
@ 2007-05-24 11:14     ` Derek M Jones
  2007-05-24 12:35       ` Al Viro
  2007-05-24 14:26     ` Neil Booth
  1 sibling, 1 reply; 13+ messages in thread
From: Derek M Jones @ 2007-05-24 11:14 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

Al,

> The question is how do they treat $ in preprocessor tokens.  Is it a full
> equivalent of letter?  I.e. is $x a valid identifier?  If it is, that's
> easy - all we need is to add it cclass[] in tokenize.c as a letter and be
> done with that.  If not (i.e. if it can only appear after the first
> letter), we probably want to either classify it as digit or split the
> "Digit" bit in two and modify the code checking for it.  In any case,
> we need to figure out what to do with
> 
> #define A(x,y) x##y
> A(a,$b)
> 
> Either $b is an identifier, or it would better be a valid pp-number; otherwise,
> we'll get the second argument split in two tokens and get a$ b out of that
> macro.

Item 10 of http://www.open-std.org/jtc1/sc22/wg14/www/docs/n861.htm
gives some history and possible solutions.

If an implementation supports $ in identifiers, then it is an extension.
Implementation extensions are blessed in C99 provided they don't change
the behavior of strictly conforming programs.  Since $ is not in the
basic source character set a program that contains them is not strictly
conforming.

If sparse supports $ then it just has to do what the implementation it
is mimicking does.  There is no C Standard behavior as such to worry about.

-- 
Derek M. Jones                              tel: +44 (0) 1252 520 667
Knowledge Software Ltd                      mailto:derek@knosof.co.uk
Applications Standards Conformance Testing    http://www.knosof.co.uk

* Re: '$' as "valid" character in identifiers
  2007-05-24 11:14     ` Derek M Jones
@ 2007-05-24 12:35       ` Al Viro
  2007-05-24 13:18         ` Derek M Jones
  0 siblings, 1 reply; 13+ messages in thread
From: Al Viro @ 2007-05-24 12:35 UTC (permalink / raw)
  To: Derek M Jones; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

On Thu, May 24, 2007 at 12:14:03PM +0100, Derek M Jones wrote:
> Al,
> 
> >The question is how do they treat $ in preprocessor tokens.  Is it a full
> >equivalent of letter?  I.e. is $x a valid identifier?  If it is, that's
> >easy - all we need is to add it cclass[] in tokenize.c as a letter and be
> >done with that.  If not (i.e. if it can only appear after the first
> >letter), we probably want to either classify it as digit or split the
> >"Digit" bit in two and modify the code checking for it.  In any case,
> >we need to figure out what to do with
> >
> >#define A(x,y) x##y
> >A(a,$b)
> >
> >Either $b is an identifier, or it would better be a valid pp-number; 
> >otherwise,
> >we'll get the second argument split in two tokens and get a$ b out of that
> >macro.
> 
> Item 10 of http://www.open-std.org/jtc1/sc22/wg14/www/docs/n861.htm
> gives some history and possible solutions.

Irrelevant, AFAICS.
 
> If an implementation supports $ in identifiers, then it is an extension.
> Implementation extensions are blessed in C99 provided they don't change
> the behavior of strictly conforming programs.  Since $ is not in the
> basic source character set a program that contains them is not strictly
> conforming.
>
> If sparse supports $ then it just has to do what the implementation it
> is mimicking does.  There is no C Standard behavior as such to worry about.

And now for reality: of course if we set out to imitate the implementation
allowing $, we'd better imitate it.  The question is what to watch out
for and how to avoid buggering the tokenizer in process.

The question in n861.10 has nothing whatsobleedingever to do with that.
It makes sure that valid macro definition with extended character set will
not be misparsed in smaller character set and will generate an error instead.
We do not enforce 6.10.3p3 (we ought to; the fix is trivial, I'll send it
today), but that has nothing to do with the testcase I'd mentioned:

#define A(x,y) x##y
A(a,$b)

needs $b to be interpreted as a single token if we want existing code in
preprocess.c to do the expected thing.  Otherwise it would produce two
tokens - a$ and b.  IOW, tokenizer needs to get a single token when it
sees $b and the question is which kind of token we'll be returning.
If $ acts as a letter, it's not a problem at all (existing logic will
return ident).  If it acts as a digit (i.e. it can't be the first character
of identifier in the implementation we are imitating) the things are trickier,
since we'll need the code parsing pp-numbers to handle that stuff.  Which
might take more work since simply classifying $ as digit could change
behaviour in other parts of tokenizer.
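
The two candidate behaviours can be written down as a small predicate.
This is a sketch only: enum dollar_policy and is_identifier() are
hypothetical names, and it checks whole strings rather than showing how
the tokenizer or the pp-number code would need to change.

```c
#include <ctype.h>

enum dollar_policy {
	DOLLAR_REJECT,		/* '$' is never part of an identifier */
	DOLLAR_AS_LETTER,	/* '$' may appear anywhere, including first */
	DOLLAR_NON_INITIAL	/* '$' may appear only after the first character */
};

/* Return 1 if s is a valid identifier under the given policy, else 0. */
int is_identifier(const char *s, enum dollar_policy policy)
{
	const char *p = s;

	if (!*p)
		return 0;
	for (; *p; p++) {
		unsigned char c = (unsigned char)*p;

		if (isalpha(c) || c == '_')
			continue;
		if (isdigit(c) && p != s)
			continue;
		if (c == '$' && (policy == DOLLAR_AS_LETTER ||
				 (policy == DOLLAR_NON_INITIAL && p != s)))
			continue;
		return 0;
	}
	return 1;
}
```

Under DOLLAR_AS_LETTER, "$b" is an identifier; under DOLLAR_NON_INITIAL
it is not, which is the case where something else would have to keep it
together as one token.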

Tokenizer implementation resembles the structure of relevant part of
standard.  That (and not worrying about interpretation of wanted behaviour
in terms of modifications of standard) is what it's all about - modifications
of tokenizer itself would better be minimally intrusive.

I don't have access to VMS boxen (thanks $DEITY); gcc implementation seems
to accept '$' as equivalent to letter.  Resulting assembler won't pass
as(1) if it's the first character in identifier, though, so we don't get
any useful information out of the experiment[1].

IOW, we need documentation of the native compilers to find out which kind
of behaviour is expected.

[1] other than "with gcc on x86 with AT&T assembler syntax an identifier
starting with $ silently lands you in nasal demon country", that is.
No idea whether the toolchain in question uses AT&T or Intel syntax, no idea
what restrictions the native compilers might have...

* Re: '$' as "valid" character in identifiers
  2007-05-24 12:35       ` Al Viro
@ 2007-05-24 13:18         ` Derek M Jones
  2007-05-24 14:10           ` Al Viro
  0 siblings, 1 reply; 13+ messages in thread
From: Derek M Jones @ 2007-05-24 13:18 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

Al,

> And now for reality: of course if we set out to imitate the implementation
> allowing $, we'd better imitate it.  The question is what to watch out
> for and how to avoid buggering the tokenizer in process.

If sparse is going to imitate a VAX implementation then how $ is glued
is probably the least of the implementation worries.

VAX C supported a whole host of extensions (eg, the ability to
glue comments, 8 & 9 in octal constants).

I have a pdf of the DEC C language reference manual for Tru64 which
people are welcome to a copy of.

> I don't have access to VMS boxen (thanks $DEITY);

Continuing in the vein of Irrelevant, AFAICS.  You can access various
Crays here: http://www.cray-cyber.org/access/index.php

Perhaps there is a similar site available for Vaxes?

> IOW, we need documentation of the native compilers to find out which kind
> of behaviour is expected.

I collect old C compiler manuals, if anybody locates a ps or pdf of
an original VAX C compiler manual, please forward me a copy.

-- 
Derek M. Jones                              tel: +44 (0) 1252 520 667
Knowledge Software Ltd                      mailto:derek@knosof.co.uk
Applications Standards Conformance Testing    http://www.knosof.co.uk

* Re: '$' as "valid" character in identifiers
  2007-05-24 13:18         ` Derek M Jones
@ 2007-05-24 14:10           ` Al Viro
  2007-05-24 14:43             ` Derek M Jones
  2007-05-24 14:50             ` Michael Stefaniuc
  0 siblings, 2 replies; 13+ messages in thread
From: Al Viro @ 2007-05-24 14:10 UTC (permalink / raw)
  To: Derek M Jones; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

On Thu, May 24, 2007 at 02:18:11PM +0100, Derek M Jones wrote:
> Al,
> 
> >And now for reality: of course if we set out to imitate the implementation
> >allowing $, we'd better imitate it.  The question is what to watch out
> >for and how to avoid buggering the tokenizer in process.
> 
> If sparse is going to imitate a VAX implementation then how $ is glued
> is probably the least of the implementation worries.

I suspect that the real issue is whatever stuff Windows uses (and no,
I don't have Windows boxen either).  Anybody who wants to work on
code that last compiled on VAX is not going to be happy with what
sparse will say about it, anyway.  More realistic case is a codebase
with some VMS ancestry that got moved to Windows.

* Re: '$' as "valid" character in identifiers
  2007-05-24 10:04   ` Al Viro
  2007-05-24 11:14     ` Derek M Jones
@ 2007-05-24 14:26     ` Neil Booth
  2007-05-24 14:35       ` Neil Booth
  1 sibling, 1 reply; 13+ messages in thread
From: Neil Booth @ 2007-05-24 14:26 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

Al Viro wrote:-

> > I think it was a common extension for some strange operating systems 
> > (read: VMS), where system symbols have "$" embedded in the name. So you'd 
> > have names like "sys$function()" for system functions.
> > 
> > It's possible others did it too - gcc says it's "traditional", but the 
> > only case I've seen it is from VMS (and thus from DEC->Compaq->HP C 
> > compilers).
> > 
> > But I certainly wouldn't object to sparse supporting it, although I would 
> > suggest that it at least warn by default.
> 
> The question is how do they treat $ in preprocessor tokens.  Is it a full
> equivalent of letter?  I.e. is $x a valid identifier?  If it is, that's

Apparently yes:

http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V50_HTML/ARH9NATE/DOCU_026.HTM

My personal opinion is we don't want to encourage $ in identifiers,
and so I'd urge you to drop the idea :).  Too many poor mis-featured
extensions in GCC already.  At least if you do go ahead it seems the
implementation is trivial (assembler aside).

Neil.

* Re: '$' as "valid" character in identifiers
  2007-05-24 14:26     ` Neil Booth
@ 2007-05-24 14:35       ` Neil Booth
  2007-05-24 14:36         ` Neil Booth
  0 siblings, 1 reply; 13+ messages in thread
From: Neil Booth @ 2007-05-24 14:35 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

Neil Booth wrote:-

> Apparently yes:
> 
> http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V50_HTML/ARH9NATE/DOCU_026.HTM

This one is more explicit:

http://www.kednos.com/pli/docs/REFERENCE_MANUAL/6291pro.html

Neil.

* Re: '$' as "valid" character in identifiers
  2007-05-24 14:35       ` Neil Booth
@ 2007-05-24 14:36         ` Neil Booth
  0 siblings, 0 replies; 13+ messages in thread
From: Neil Booth @ 2007-05-24 14:36 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

> This one is more explicit:
> 
> http://www.kednos.com/pli/docs/REFERENCE_MANUAL/6291pro.html

Bah, never mind!  PL/I.

Neil.

* Re: '$' as "valid" character in identifiers
  2007-05-24 14:10           ` Al Viro
@ 2007-05-24 14:43             ` Derek M Jones
  2007-05-24 14:50             ` Michael Stefaniuc
  1 sibling, 0 replies; 13+ messages in thread
From: Derek M Jones @ 2007-05-24 14:43 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

Al,

> I suspect that the real issue is whatever stuff Windows uses (and no,
> I don't have Windows boxen either).  Anybody who wants to work on

I do wish I had kept all my old Microsoft C compiler manuals.
Microsoft do a very poor job of documenting the extensions they
support (which has changed over time).  This may be intentional
as a way of limiting use of those features (apparently when
British Rail want to kill off a service they first remove all
details of it from the time tables; passenger numbers then drop
and they can kill the service off on the basis of low passenger
numbers travelling that route).

Does anybody here have any old (ie, versions 1-5) Microsoft C
compiler manuals?

-- 
Derek M. Jones                              tel: +44 (0) 1252 520 667
Knowledge Software Ltd                      mailto:derek@knosof.co.uk
Applications Standards Conformance Testing    http://www.knosof.co.uk

* Re: '$' as "valid" character in identifiers
  2007-05-24 14:10           ` Al Viro
  2007-05-24 14:43             ` Derek M Jones
@ 2007-05-24 14:50             ` Michael Stefaniuc
  1 sibling, 0 replies; 13+ messages in thread
From: Michael Stefaniuc @ 2007-05-24 14:50 UTC (permalink / raw)
  To: Al Viro; +Cc: Derek M Jones, Linus Torvalds, Sparse Mailing-list

Al Viro wrote:
> On Thu, May 24, 2007 at 02:18:11PM +0100, Derek M Jones wrote:
>>> And now for reality: of course if we set out to imitate the implementation
>>> allowing $, we'd better imitate it.  The question is what to watch out
>>> for and how to avoid buggering the tokenizer in process.
>> If sparse is going to imitate a VAX implementation then how $ is glued
>> is probably the least of the implementation worries.
> 
> I suspect that the real issue is whatever stuff Windows uses (and no,
> I don't have Windows boxen either).  Anybody who wants to work on
I do not know what VC++ is doing but the Win32 API has _no_ $ in 
identifiers whatsoever. The code over which sparse tripped was a 
variable defined in an else block in the winedump utility. And that is 
not even a Win32 application but a pure standard C one. I sent a patch 
to Wine to fix that (today's commit session hasn't happened yet).

From Wine's point of view we do not need to handle the '$'.

> code that last compiled on VAX is not going to be happy with what
> sparse will say about it, anyway.  More realistic case is a codebase
> with some VMS ancestry that got moved to Windows.
Wine is not supported on VMS ;)

bye
	michael
-- 
Michael Stefaniuc               Tel.: +49-711-96437-199
Sr. Network Engineer            Fax.: +49-711-96437-111
Red Hat GmbH                    Email: mstefani@redhat.com
Hauptstaetterstr. 58            http://www.redhat.de/
D-70178 Stuttgart
