* '$' as "valid" character in identifiers
From: Michael Stefaniuc @ 2007-05-23 21:43 UTC
To: Sparse Mailing-list

echo 'int dollar$ = 1;' > /tmp/dollar.c
gcc -c -Wall -Wextra -o /tmp/dollar.o /tmp/dollar.c
echo $?
0

No comment ...

I didn't find '$' as valid identifier character in "The C Programming
Language (ANSI C)" nor does http://c0x.coding-guidelines.com/5.2.1.html
allow it as valid char in the "source character set".

I found it because sparse tripped over that in the Wine source code.
Yes, I already sent a patch to fix that in Wine.

Not sure if sparse should change its behavior here;
cgcc -c -Wall -Wextra -o /tmp/dollar.o /tmp/dollar.c
/tmp/dollar.c:1:11: error: Expected ; at end of declaration
/tmp/dollar.c:1:11: error: got $
is an adequate response.

bye
    michael
--
Michael Stefaniuc                       Tel.: +49-711-96437-199
Sr. Network Engineer                    Fax.: +49-711-96437-111
Red Hat GmbH                            Email: mstefani@redhat.com
Hauptstaetterstr. 58                    http://www.redhat.de/
D-70178 Stuttgart
* Re: '$' as "valid" character in identifiers
From: Michael Stefaniuc @ 2007-05-23 22:00 UTC
To: Sparse Mailing-list

Michael Stefaniuc wrote:
> echo 'int dollar$ = 1;' > /tmp/dollar.c
> gcc -c -Wall -Wextra -o /tmp/dollar.o /tmp/dollar.c
> echo $?
> 0
> No comment ...
>
> I didn't find '$' as valid identifier character in "The C Programming
> Language (ANSI C)" nor does http://c0x.coding-guidelines.com/5.2.1.html
> allow it as valid char in the "source character set".
>
> I found it because sparse tripped over that in the Wine source code.
> Yes, I already sent a patch to fix that in Wine.
>
> Not sure if sparse should change its behavior here;
> cgcc -c -Wall -Wextra -o /tmp/dollar.o /tmp/dollar.c
> /tmp/dollar.c:1:11: error: Expected ; at end of declaration
> /tmp/dollar.c:1:11: error: got $
> is an adequate response.

On second thought, something like:
/tmp/dollar.c:1:11: error: Invalid character '%c' in identifier
would be a better answer and make it more obvious what the problem is.

bye
    michael
--
Michael Stefaniuc
* Re: '$' as "valid" character in identifiers
From: Linus Torvalds @ 2007-05-23 22:10 UTC
To: Michael Stefaniuc; +Cc: Sparse Mailing-list

On Wed, 23 May 2007, Michael Stefaniuc wrote:
>
> I didn't find '$' as valid identifier character in "The C Programming
> Language (ANSI C)" nor does http://c0x.coding-guidelines.com/5.2.1.html
> allow it as valid char in the "source character set".

I think it was a common extension for some strange operating systems
(read: VMS), where system symbols have "$" embedded in the name. So you'd
have names like "sys$function()" for system functions.

It's possible others did it too - gcc says it's "traditional", but the
only case I've seen it is from VMS (and thus from DEC->Compaq->HP C
compilers).

But I certainly wouldn't object to sparse supporting it, although I would
suggest that it at least warn by default.

        Linus
* Re: '$' as "valid" character in identifiers
From: Al Viro @ 2007-05-24 10:04 UTC
To: Linus Torvalds; +Cc: Michael Stefaniuc, Sparse Mailing-list

On Wed, May 23, 2007 at 03:10:57PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 23 May 2007, Michael Stefaniuc wrote:
> >
> > I didn't find '$' as valid identifier character in "The C Programming
> > Language (ANSI C)" nor does http://c0x.coding-guidelines.com/5.2.1.html
> > allow it as valid char in the "source character set".
>
> I think it was a common extension for some strange operating systems
> (read: VMS), where system symbols have "$" embedded in the name. So you'd
> have names like "sys$function()" for system functions.
>
> It's possible others did it too - gcc says it's "traditional", but the
> only case I've seen it is from VMS (and thus from DEC->Compaq->HP C
> compilers).
>
> But I certainly wouldn't object to sparse supporting it, although I would
> suggest that it at least warn by default.

The question is how do they treat $ in preprocessor tokens. Is it a full
equivalent of letter? I.e. is $x a valid identifier? If it is, that's
easy - all we need is to add it to cclass[] in tokenize.c as a letter and
be done with that. If not (i.e. if it can only appear after the first
letter), we probably want to either classify it as digit or split the
"Digit" bit in two and modify the code checking for it. In any case,
we need to figure out what to do with

    #define A(x,y) x##y
    A(a,$b)

Either $b is an identifier, or it would better be a valid pp-number;
otherwise, we'll get the second argument split in two tokens and get
a$ b out of that macro.

So far it's easy, but... generating a warning will be the nasty part.
We certainly don't want it to be generated in tokenizer; after all,

    #define A(x) #x
    A($)

is legitimate, so we can't do anything until we are past preprocessor.
In any case, generating a warning on each instance of such identifier
would be overkill. So we'd have to do it somewhere around insertion
into symbol table. And doing essentially strchr() in there is not a nice
thing - it's a hot path...
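One possible way around the hot-path concern above, sketched under stated assumptions: the tokenizer already looks at every character of an identifier, so it could record a flag while scanning, and the warning could then be issued once per identifier when that identifier is bound to a symbol. Everything named here (`struct ident`, `intern()`, `bind_symbol()`, `warnings_issued`) is hypothetical and does not reflect sparse's actual data structures or function names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interned identifier; sparse's real struct ident differs. */
struct ident {
    struct ident *next;
    unsigned has_dollar:1;   /* recorded by the tokenizer while scanning */
    unsigned warned:1;       /* set once the warning has been issued */
    char name[];
};

static struct ident *table;      /* toy single-bucket "hash table" */
static int warnings_issued;

/*
 * Intern a name. The caller (the tokenizer, in this sketch) passes
 * has_dollar, which it learned while scanning the characters, so no
 * strchr()-like pass is needed on the lookup path.
 */
static struct ident *intern(const char *name, int has_dollar)
{
    for (struct ident *i = table; i; i = i->next)
        if (!strcmp(i->name, name))
            return i;

    size_t len = strlen(name);
    struct ident *i = calloc(1, sizeof *i + len + 1);
    assert(i);
    memcpy(i->name, name, len + 1);
    i->has_dollar = has_dollar != 0;
    i->next = table;
    table = i;
    return i;
}

/*
 * Called after preprocessing, when an identifier is bound to a symbol.
 * A '$' that was only ever stringified never reaches this point.
 */
static void bind_symbol(struct ident *id)
{
    if (id->has_dollar && !id->warned) {
        id->warned = 1;
        warnings_issued++;
        fprintf(stderr, "warning: '$' in identifier '%s'\n", id->name);
    }
}
```

In this sketch, interning `sys$function` twice and binding it twice produces a single warning, and the per-lookup cost is one bit test rather than a scan of the name.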
* Re: '$' as "valid" character in identifiers
From: Derek M Jones @ 2007-05-24 11:14 UTC
To: Al Viro; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

Al,

> The question is how do they treat $ in preprocessor tokens. Is it a full
> equivalent of letter? I.e. is $x a valid identifier? If it is, that's
> easy - all we need is to add it cclass[] in tokenize.c as a letter and be
> done with that. If not (i.e. if it can only appear after the first
> letter), we probably want to either classify it as digit or split the
> "Digit" bit in two and modify the code checking for it. In any case,
> we need to figure out what to do with
>
> #define A(x,y) x##y
> A(a,$b)
>
> Either $b is an identifier, or it would better be a valid pp-number; otherwise,
> we'll get the second argument split in two tokens and get a$ b out of that
> macro.

Item 10 of http://www.open-std.org/jtc1/sc22/wg14/www/docs/n861.htm
gives some history and possible solutions.

If an implementation supports $ in identifiers, then it is an extension.
Implementation extensions are blessed in C99 provided they don't change
the behavior of strictly conforming programs. Since $ is not in the
basic source character set, a program that contains it is not strictly
conforming.

If sparse supports $ then it just has to do what the implementation it
is mimicking does. There is no C Standard behavior as such to worry about.

--
Derek M. Jones                         tel: +44 (0) 1252 520 667
Knowledge Software Ltd                 mailto:derek@knosof.co.uk
Applications Standards Conformance Testing   http://www.knosof.co.uk
* Re: '$' as "valid" character in identifiers
From: Al Viro @ 2007-05-24 12:35 UTC
To: Derek M Jones; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

On Thu, May 24, 2007 at 12:14:03PM +0100, Derek M Jones wrote:
> Al,
>
> >The question is how do they treat $ in preprocessor tokens. Is it a full
> >equivalent of letter? I.e. is $x a valid identifier? If it is, that's
> >easy - all we need is to add it cclass[] in tokenize.c as a letter and be
> >done with that. If not (i.e. if it can only appear after the first
> >letter), we probably want to either classify it as digit or split the
> >"Digit" bit in two and modify the code checking for it. In any case,
> >we need to figure out what to do with
> >
> >#define A(x,y) x##y
> >A(a,$b)
> >
> >Either $b is an identifier, or it would better be a valid pp-number;
> >otherwise, we'll get the second argument split in two tokens and get
> >a$ b out of that macro.
>
> Item 10 of http://www.open-std.org/jtc1/sc22/wg14/www/docs/n861.htm
> gives some history and possible solutions.

Irrelevant, AFAICS.

> If an implementation supports $ in identifiers, then it is an extension.
> Implementation extensions are blessed in C99 provided they don't change
> the behavior of strictly conforming programs. Since $ is not in the
> basic source character set a program that contains them is not strictly
> conforming.
>
> If sparse supports $ then it just has to do what the implementation it
> is mimicking does. There is no C Standard behavior as such to worry about.

And now for reality: of course if we set out to imitate the implementation
allowing $, we'd better imitate it. The question is what to watch out
for and how to avoid buggering the tokenizer in process.

The question in n861.10 has nothing whatsobleedingever to do with that.
It makes sure that valid macro definition with extended character set
will not be misparsed in smaller character set and will generate an error
instead. We do not enforce 6.10.3p3 (we ought to; the fix is trivial,
I'll send it today), but that has nothing to do with the testcase I'd
mentioned:

    #define A(x,y) x##y
    A(a,$b)

needs $b to be interpreted as a single token if we want existing code in
preprocess.c to do the expected thing. Otherwise it would produce two
tokens - a$ and b.

IOW, tokenizer needs to get a single token when it sees $b and the
question is which kind of token we'll be returning. If $ acts as a letter,
it's not a problem at all (existing logic will return ident). If it acts
as a digit (i.e. it can't be the first character of identifier in the
implementation we are imitating) the things are trickier, since we'll need
the code parsing pp-numbers to handle that stuff. Which might take more
work since simply classifying $ as digit could change behaviour in other
parts of tokenizer.

Tokenizer implementation resembles the structure of relevant part of
standard. That (and not worrying about interpretation of wanted behaviour
in terms of modifications of standard) is what it's all about -
modifications of tokenizer itself would better be minimally intrusive.

I don't have access to VMS boxen (thanks $DEITY); gcc implementation seems
to accept '$' as equivalent to letter. Resulting assembler won't pass
as(1) if it's the first character in identifier, though, so we don't get
any useful information out of the experiment[1].

IOW, we need documentation of the native compilers to find out which kind
of behaviour is expected.

[1] other than "with gcc on x86 with AT&T assembler syntax an identifier
starting with $ silently lands you in nasal demon country", that is. No
idea whether the toolchain in question uses AT&T or Intel syntax, no idea
what restrictions the native compilers might have...
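To make the three possibilities discussed above concrete, here is a minimal sketch of a toy tokenizer, assuming a simplified class table. It is not sparse's tokenize.c: the `cls[]` table, the `LETTER`/`DIGIT`/`IDCONT` bits, `init_classes()` and `count_tokens()` are all hypothetical names, pp-numbers are not handled, and every non-identifier character is treated as a one-character token. The only point is to show when `$b` comes out as one token and when it is split in two.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical character classes; not sparse's actual cclass[] layout. */
enum { LETTER = 1, DIGIT = 2, IDCONT = 4 };

static unsigned char cls[256];

/*
 * dollar_mode:
 *   0 = '$' is ordinary punctuation (standard C behaviour),
 *   1 = '$' is a full equivalent of a letter,
 *   2 = '$' may continue an identifier but not start one.
 */
static void init_classes(int dollar_mode)
{
    assert(dollar_mode >= 0 && dollar_mode <= 2);
    memset(cls, 0, sizeof cls);
    for (int c = 'a'; c <= 'z'; c++)
        cls[c] = LETTER;
    for (int c = 'A'; c <= 'Z'; c++)
        cls[c] = LETTER;
    cls['_'] = LETTER;
    for (int c = '0'; c <= '9'; c++)
        cls[c] = DIGIT;
    if (dollar_mode == 1)
        cls['$'] = LETTER;
    else if (dollar_mode == 2)
        cls['$'] = IDCONT;
}

/*
 * Count tokens in s. An identifier starts with a LETTER and continues
 * with LETTER, DIGIT or IDCONT characters; any other non-space character
 * is a token by itself. Spaces are skipped.
 */
static int count_tokens(const char *s)
{
    int n = 0;

    while (*s) {
        unsigned char c = (unsigned char)*s;

        if (c == ' ') {
            s++;
            continue;
        }
        if (cls[c] & LETTER) {
            while (cls[(unsigned char)*s] & (LETTER | DIGIT | IDCONT))
                s++;
        } else {
            s++;
        }
        n++;
    }
    return n;
}
```

After `init_classes(1)`, `count_tokens("$b")` is 1, so a macro argument `$b` stays a single token. In modes 0 and 2 it is 2, which is the split that leads to `a$ b` in the testcase above; mode 2 would still accept `sys$function` as one identifier.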
* Re: '$' as "valid" character in identifiers
From: Derek M Jones @ 2007-05-24 13:18 UTC
To: Al Viro; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

Al,

> And now for reality: of course if we set out to imitate the implementation
> allowing $, we'd better imitate it. The question is what to watch out
> for and how to avoid buggering the tokenizer in process.

If sparse is going to imitate a VAX implementation then how $ is glued
is probably the least of the implementation worries. VAX C supported a
whole host of extensions (e.g., the ability to glue comments, 8 & 9 in
octal constants).

I have a pdf of the DEC C language reference manual for Tru64 which
people are welcome to a copy of.

> I don't have access to VMS boxen (thanks $DEITY);

Continuing in the vein of Irrelevant, AFAICS. You can access various
Crays here: http://www.cray-cyber.org/access/index.php
Perhaps there is a similar site available for Vaxes?

> IOW, we need documentation of the native compilers to find out which kind
> of behaviour is expected.

I collect old C compiler manuals; if anybody locates a ps or pdf of an
original VAX C compiler manual, please forward me a copy.

--
Derek M. Jones
* Re: '$' as "valid" character in identifiers
From: Al Viro @ 2007-05-24 14:10 UTC
To: Derek M Jones; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

On Thu, May 24, 2007 at 02:18:11PM +0100, Derek M Jones wrote:
> Al,
>
> >And now for reality: of course if we set out to imitate the implementation
> >allowing $, we'd better imitate it. The question is what to watch out
> >for and how to avoid buggering the tokenizer in process.
>
> If sparse is going to imitate a VAX implementation then how $ is glued
> is probably the least of the implemention worries.

I suspect that the real issue is whatever stuff Windows uses (and no,
I don't have Windows boxen either). Anybody who wants to work on
code that last compiled on VAX is not going to be happy with what
sparse will say about it, anyway. More realistic case is a codebase
with some VMS ancestry that got moved to Windows.
* Re: '$' as "valid" character in identifiers
From: Derek M Jones @ 2007-05-24 14:43 UTC
To: Al Viro; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

Al,

> I suspect that the real issue is whatever stuff Windows uses (and no,
> I don't have Windows boxen either). Anybody who wants to work on

I do wish I had kept all my old Microsoft C compiler manuals.

Microsoft do a very poor job of documenting the extensions they
support (which has changed over time). This may be intentional as a way
of limiting use of those features (apparently when British Rail want to
kill off a service they first remove all details of it from the
timetables; passenger numbers then drop and they can kill the service
off on the basis of low passenger numbers travelling that route).

Does anybody here have any old (i.e., versions 1-5) Microsoft C
compiler manuals?

--
Derek M. Jones
* Re: '$' as "valid" character in identifiers
From: Michael Stefaniuc @ 2007-05-24 14:50 UTC
To: Al Viro; +Cc: Derek M Jones, Linus Torvalds, Sparse Mailing-list

Al Viro wrote:
> On Thu, May 24, 2007 at 02:18:11PM +0100, Derek M Jones wrote:
>>> And now for reality: of course if we set out to imitate the implementation
>>> allowing $, we'd better imitate it. The question is what to watch out
>>> for and how to avoid buggering the tokenizer in process.
>> If sparse is going to imitate a VAX implementation then how $ is glued
>> is probably the least of the implemention worries.
>
> I suspect that the real issue is whatever stuff Windows uses (and no,
> I don't have Windows boxen either). Anybody who wants to work on

I do not know what VC++ is doing, but the Win32 API has _no_ $ in
identifiers whatsoever. The code over which sparse tripped was a variable
defined in an else block in the winedump utility. And that is not even a
Win32 application but a pure standard C one. I sent a patch to Wine to
fix that (today's commit session didn't happen yet). From Wine's point of
view we do not need to handle the '$'.

> code that last compiled on VAX is not going to be happy with what
> sparse will say about it, anyway. More realistic case is a codebase
> with some VMS ancestry that got moved to Windows.

Wine is not supported on VMS ;)

bye
    michael
--
Michael Stefaniuc
* Re: '$' as "valid" character in identifiers
From: Neil Booth @ 2007-05-24 14:26 UTC
To: Al Viro; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

Al Viro wrote:-

> > I think it was a common extension for some strange operating systems
> > (read: VMS), where system symbols have "$" embedded in the name. So you'd
> > have names like "sys$function()" for system functions.
> >
> > It's possible others did it too - gcc says it's "traditional", but the
> > only case I've seen it is from VMS (and thus from DEC->Compaq->HP C
> > compilers).
> >
> > But I certainly wouldn't object to sparse supporting it, although I would
> > suggest that it at least warn by default.
>
> The question is how do they treat $ in preprocessor tokens. Is it a full
> equivalent of letter? I.e. is $x a valid identifier? If it is, that's

Apparently yes:

http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V50_HTML/ARH9NATE/DOCU_026.HTM

My personal opinion is we don't want to encourage $ in identifiers,
and so I'd urge you to drop the idea :). Too many poor mis-featured
extensions in GCC already.

At least if you do go ahead it seems the implementation is trivial
(assembler aside).

Neil.
* Re: '$' as "valid" character in identifiers
From: Neil Booth @ 2007-05-24 14:35 UTC
To: Al Viro; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

Neil Booth wrote:-

> Apparently yes:
>
> http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V50_HTML/ARH9NATE/DOCU_026.HTM

This one is more explicit:

http://www.kednos.com/pli/docs/REFERENCE_MANUAL/6291pro.html

Neil.
* Re: '$' as "valid" character in identifiers
From: Neil Booth @ 2007-05-24 14:36 UTC
To: Al Viro; +Cc: Linus Torvalds, Michael Stefaniuc, Sparse Mailing-list

> This one is more explicit:
>
> http://www.kednos.com/pli/docs/REFERENCE_MANUAL/6291pro.html

Bah, never mind! PL/I.

Neil.
end of thread (newest message: 2007-05-24 14:50 UTC)

Thread overview: 13+ messages
2007-05-23 21:43 '$' as "valid" character in identifiers Michael Stefaniuc
2007-05-23 22:00 ` Michael Stefaniuc
2007-05-23 22:10 ` Linus Torvalds
2007-05-24 10:04   ` Al Viro
2007-05-24 11:14     ` Derek M Jones
2007-05-24 12:35       ` Al Viro
2007-05-24 13:18         ` Derek M Jones
2007-05-24 14:10           ` Al Viro
2007-05-24 14:43             ` Derek M Jones
2007-05-24 14:50             ` Michael Stefaniuc
2007-05-24 14:26     ` Neil Booth
2007-05-24 14:35       ` Neil Booth
2007-05-24 14:36         ` Neil Booth