public inbox for linux-kernel@vger.kernel.org
From: Gabriel Paubert <paubert@iram.es>
To: tom st denis <tomstdenis@yahoo.com>
Cc: Christoph Hellwig <hch@infradead.org>, linux-kernel@vger.kernel.org
Subject: [OT] Re: 0xdeadbeef vs 0xdeadbeefL
Date: Thu, 8 Jul 2004 11:32:50 +0200
Message-ID: <20040708093249.GC32629@iram.es>
In-Reply-To: <20040707185340.42091.qmail@web41112.mail.yahoo.com>

On Wed, Jul 07, 2004 at 11:53:40AM -0700, tom st denis wrote:
> --- Christoph Hellwig <hch@infradead.org> wrote:
> > On Wed, Jul 07, 2004 at 11:41:50AM -0700, tom st denis wrote:
> > > Um, actually "char" like "int" and "long" in C99 is signed.  So
> > while
> > > you can write 
> > > 
> > > signed int x = -3;
> > > 
> > > You don't have to.  in fact if you "have" to then your compiler is
> > > broken.  Now I know that GCC offers "unsigned chars" but that's an
> > > EXTENSION not part of the actual standard.  
> > 
> > ------------------------------ snip -----------------------------
> >  [#15]  The  three types char, signed char, and unsigned char
> >         are   collectively   called   the   character   types.   The
> >         implementation  shall  define  char  to have the same range,
> > 	representation,  and  behavior  as  either  signed  char or
> > 	unsigned char.35)
> > ------------------------------ snip -----------------------------
> 
> Right.  Didn't know that.  Whoa.  So in essence "char" is not a safe
> type.

It depends on what you use it for, but typically it is not.

The _very_ common mistake is assigning the result of fgetc/getc/getchar
(which are defined to return either an _unsigned_ char converted to an
int, or EOF) to a plain char and then comparing it with -1 to check for
EOF:

1) it will never detect EOF if plain char is unsigned (PPC)

2) it will stop on a ÿ (that's a y with a diaeresis, 0xFF in Latin-1)
if plain char is signed (Intel). This character is infrequent in the
languages I use, but it does occasionally occur.

Of course people who only use plain 7-bit ASCII never hit the bug,
but as soon as you go into Latin-$n encodings you may hit it (and I'm
restricting myself to character sets based on the Latin alphabet).
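
A minimal sketch of the two failure modes above and of the usual fix.
The helper names are hypothetical, and explicitly signed and unsigned
types stand in for what plain char does on each platform. It assumes
EOF is -1 (as it commonly is) and that out-of-range conversion to
signed char wraps, which is implementation-defined but what mainstream
compilers do.

```c
#include <stdio.h>

/* Hypothetical helpers for illustration; not part of any library. */

/* Plain char signed (e.g. x86): a getc() result stored in a char. */
static int looks_like_eof_signed(int c)
{
    /* Implementation-defined conversion; 0xFF typically becomes -1. */
    signed char ch = (signed char)c;
    return ch == EOF;          /* true for byte 0xFF: false EOF, case 2) */
}

/* Plain char unsigned (e.g. PPC): a getc() result stored in a char. */
static int looks_like_eof_unsigned(int c)
{
    unsigned char ch = (unsigned char)c;  /* EOF (-1) becomes 255 */
    return ch == EOF;          /* 255 never equals a negative EOF, case 1) */
}

/* Usual idiom: keep the result in an int until EOF has been ruled out. */
static long count_bytes(FILE *fp)
{
    long n = 0;
    int c;                     /* int, not char */

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

With an int, every byte value 0..255 stays distinct from EOF, so the
loop in count_bytes() neither stops early on 0xFF nor runs forever.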

And no, the solution is not to pass -fsigned-char or -funsigned-char
as an option to GCC. Most of the time that only changes the kind of bugs
that are hidden in the code, and 2) above is statistically harder to
hit than 1).
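
As a rough illustration of why the flag only moves the problem around,
here is a sketch that reports which choice the compiler made. Both
function names are hypothetical. Building it with either flag changes
the answers, but code that depends on either answer is still fragile.

```c
#include <limits.h>

/* Hypothetical helper: 1 if plain char is signed for this build. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/*
 * Value seen after storing byte 0xFF in a plain char and widening it
 * to int: typically -1 where char is signed, 255 where it is unsigned.
 */
static int widen_0xff(void)
{
    char ch = (char)0xFF;
    return ch;
}
```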

> 
> > > As for writing portable code, um, jacka#!, BitKeeper, you know,
> > that
> > > thingy that hosts the Linux kernel?  Yeah it uses LibTomCrypt.  Why
> > not
> > > goto http://libtomcrypt.org and find out who the author is.  Oh
> > yeah,
> > > that would be me.  Why not email Wayne Scott [who has code in
> > > LibTomCrypt btw...] and ask him about it?

Yes, I know, and I use BK. But given that you insult me for knowing
the C rules better than you do, I'm seriously considering switching
to Subversion or Arch instead.

Argh, I've mentioned BK. There should be a Godwin's law equivalent
for BitKeeper on lkml ;-)

> > > 
> > > Who elses uses LibTomCrypt?  Oh yeah, Sony, Gracenote, IBM [um Joy
> > > Latten can chip in about that], Intel, various schools including
> > > Harvard, Stanford, MIT, BYU, ...
> > 
> > Tons of people use windows aswell.  You just showed that you don't
> > know
> > C well enough, so maybe someone should better do an audit for your
> > code ;-)
> 
> To be honest I didn't know that above.  That's why I'm always explicit.
>  [btw my code builds in MSVC, BCC and ICC as well].
> 
> You don't need to know such details to be able to develop in C.  I'm
> sure if you walked into [say] Redhat and gave an "on the spot C quiz"
> about obscure rules they would fail.  You have to use some common sense
> and apply the more relevant rules.  

Well, I consider the rules about plain char to be among the most
relevant, since I've been hit by them _way_ _more_ than by just about
any other poorly known C rule.

And finally, I'd personally prefer char to be unsigned, for several
reasons:
- its name, which suggests that it is an enumeration of symbols.
- strcmp and friends do their comparisons using _unsigned_ char,
  despite the fact that the prototypes declare plain char parameters.
- the aforementioned fgetc/getc/getchar issues.
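
The strcmp point can be sketched as follows. The two helpers are
hypothetical: one compares first characters as plain char, the other
converts to unsigned char first, which is how strcmp is specified to
interpret the characters it compares.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical helper: result depends on whether plain char is signed. */
static int first_char_greater_plain(const char *a, const char *b)
{
    return a[0] > b[0];
}

/* Hypothetical helper: same ordering as strcmp on the first character. */
static int first_char_greater_unsigned(const char *a, const char *b)
{
    return (unsigned char)a[0] > (unsigned char)b[0];
}
```

For the strings "\xff" and "a", strcmp and the unsigned helper agree
that the first sorts after the second, while the plain char helper
disagrees on platforms where char is signed.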
  
  
BTW, this signed/unsigned mess is the reason for some oddities, such as
the tables with 384 entries in libc/ctype/ctype.h:


/* These are defined in ctype-info.c.
   The declarations here must match those in localeinfo.h.

   In the thread-specific locale model (see `uselocale' in <locale.h>)
   we cannot use global variables for these as was done in the past.
   Instead, the following accessor functions return the address of
   each variable, which is local to the current thread if multithreaded.

   These point into arrays of 384, so they can be indexed by any `unsigned
   char' value [0,255]; by EOF (-1); or by any `signed char' value
   [-128,-1).  ISO C requires that the ctype functions work for `unsigned
   char' values and for EOF; we also support negative `signed char' values
   for broken old programs.  
   
 [snipped]
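
For portable code that does not rely on that extra table space, the
usual approach is to convert to unsigned char before calling a ctype
function. A minimal sketch, with count_alpha being a hypothetical
helper rather than anything from libc:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts alphabetic characters in a string. */
static size_t count_alpha(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* The cast keeps the argument in 0..UCHAR_MAX, as ISO C requires. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```

Without the cast, a byte such as 0xFF becomes a negative argument
where plain char is signed, which ISO C does not define (unless the
value happens to equal EOF).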

Not specifying the signedness of plain char is one of C's original
mistakes, and the one that statistically affects me the most.

	Gabriel (the only good char is the unsigned char)
 
