Pointer to a char

linux-c-programming.vger.kernel.org archive mirror

* Pointer to a char
@ 2012-09-18  9:29 Randi Botse
  2012-09-18 10:29 ` Phil Sutter
  2012-09-19  1:04 ` Jon Mayo
  0 siblings, 2 replies; 7+ messages in thread
From: Randi Botse @ 2012-09-18  9:29 UTC (permalink / raw)
  To: linux-c-programming

Hi, I have been coding in C for 3 years, but I'm still not clear on this one.
Consider this code.

...
char *p;
unsigned int i = 0xcccccccc;
unsigned int j;

p = (char *)  &i;
printf("%.2x %.2x %.2x %.2x\n", *p, p[1], p[2], p[3]);

memcpy(&j, p, sizeof(unsigned int));
printf("%x\n", j);
...

Output:

ffffffcc ffffffcc ffffffcc ffffffcc
cccccccc

My questions are:

1. Why does it print "ffffffcc ffffffcc ffffffcc ffffffcc"? (If p is an
unsigned char *, it correctly prints "cc cc cc cc".)
2. Why is the data behind the char pointer p copied to j correctly? Why
doesn't every element of p overflow, since it is a signed char?

Regards.


* Re: Pointer to a char
  2012-09-18  9:29 Pointer to a char Randi Botse
@ 2012-09-18 10:29 ` Phil Sutter
  2012-09-18 10:33   ` Duan Fugang-B38611
  2012-09-19  1:04 ` Jon Mayo
  1 sibling, 1 reply; 7+ messages in thread
From: Phil Sutter @ 2012-09-18 10:29 UTC (permalink / raw)
  To: Randi Botse; +Cc: linux-c-programming

Hi,

On Tue, Sep 18, 2012 at 04:29:32PM +0700, Randi Botse wrote:
> ...
> char *p;
> unsigned int i = 0xcccccccc;
> unsigned int j;
> 
> p = (char *)  &i;
> printf("%.2x %.2x %.2x %.2x\n", *p, p[1], p[2], p[3]);
> 
> memcpy(&j, p, sizeof(unsigned int));
> printf("%x\n", j);
> ...
> 
> Output:
> 
> ffffffcc ffffffcc ffffffcc ffffffcc
> 0xcccccccc
> 
> 
> My questions are:
> 
> 1. Why it prints "ffffffcc ffffffcc ffffffcc ffffffcc"? (if p is
> unsigned char* then it will print correctly "cc cc cc cc")

This is because of the two's complement representation in which signed
values are stored internally. Since %x converts an integer, the passed
char is promoted to int and sign-extended, which in two's complement
means that the leading bit is replicated to fill the upper bits. (0xC is
1100 in binary, so the leading bit of 0xcc is set.)

> 2. Why pointer to char p copied to j correctly, why not every member
> in p overflow? since it is a signed char.

I am not quite sure about what the question is here (maybe caused by the
lack of verbs in your sentence). Keep in mind that memcpy() only copies
the memory, irrespective of the pointer type passed. Also,
sizeof(unsigned int) == sizeof(int).

HTH, Phil


* RE: Pointer to a char
  2012-09-18 10:29 ` Phil Sutter
@ 2012-09-18 10:33   ` Duan Fugang-B38611
  0 siblings, 0 replies; 7+ messages in thread
From: Duan Fugang-B38611 @ 2012-09-18 10:33 UTC (permalink / raw)
  To: Phil Sutter, Randi Botse; +Cc: linux-c-programming

Thanks, Phil,

Thanks for the detailed explanation.


Best Regards,
Andy


* Re: Pointer to a char
  2012-09-18  9:29 Pointer to a char Randi Botse
  2012-09-18 10:29 ` Phil Sutter
@ 2012-09-19  1:04 ` Jon Mayo
  2012-09-19  7:59   ` Randi Botse
  1 sibling, 1 reply; 7+ messages in thread
From: Jon Mayo @ 2012-09-19  1:04 UTC (permalink / raw)
  To: Randi Botse; +Cc: linux-c-programming

On Tue, Sep 18, 2012 at 2:29 AM, Randi Botse <nightdecoder@gmail.com> wrote:
> Hi, having coding in C for 3 years but I'm still not clear with this one.
> Consider this code.
>
> ...
> char *p;
> unsigned int i = 0xcccccccc;
> unsigned int j;
>
> p = (char *)  &i;
> printf("%.2x %.2x %.2x %.2x\n", *p, p[1], p[2], p[3]);
>

printf (and other var arg functions) don't take char, short or float.
they take int or double and a few other types.
those [signed] chars are going to get sign extended when they are
converted to signed int. (0xcc = -52 )

> memcpy(&j, p, sizeof(unsigned int));

the data at i, pointed to by p has not changed, so this memcpy works.
The only thing that is weird is how you interpreted the data (in your
printf above).

> printf("%x\n", j);
> ...
>
> Output:
>
> ffffffcc ffffffcc ffffffcc ffffffcc
> 0xcccccccc
>
>
> My questions are:
>
> 1. Why it prints "ffffffcc ffffffcc ffffffcc ffffffcc"? (if p is
> unsigned char* then it will print correctly "cc cc cc cc")
> 2. Why pointer to char p copied to j correctly, why not every member
> in p overflow? since it is a signed char.
>
> Regards.

* Re: Pointer to a char
  2012-09-19  1:04 ` Jon Mayo
@ 2012-09-19  7:59   ` Randi Botse
  2012-09-19  8:47     ` Leon Shaw
  2012-09-19 18:09     ` Jon Mayo
  0 siblings, 2 replies; 7+ messages in thread
From: Randi Botse @ 2012-09-19  7:59 UTC (permalink / raw)
  To: linux-c-programming

Hi Phil, Jon

Thanks, now I'm clear on this: assignment doesn't care about the type modifier.

Code such as

unsigned int j = 0xffeeddcc;
int i = j;

Both have the same value, depending on how they are interpreted (is
this assumption correct?)

Because,

printf("%u", i) will be different from printf("%i", i)
- but -
printf("%u", i) will be the same as printf("%u", j)


Actually, I'm asking this because I often see a pointer cast to char *.

Let me show you with this example.
Consider some structures...

struct a_data {
    unsigned char f1[4];
    unsigned char f2[6];
    unsigned short f3[2];
};

and other structs named b_data, c_data, etc.

Then there is a general function to process all types of structures,
maybe something like this:

int process_data(char *buffer, size_t len);

Then we cast, for example, a pointer to an a_data struct to char * as follows:

struct a_data a;
process_data((char*) &a, sizeof(a));

I thought that since it was cast to char *, the cast is a "problem" because
every signed char in the buffer has a range of CHAR_MIN to CHAR_MAX,
so values from CHAR_MAX to UCHAR_MAX will be broken (signed char
overflow).

I think process_data() should be declared as

int process_data(unsigned char *buffer, size_t len)

This declaration seems correct and works for me.

However, now I conceptually understand why this works.

Thanks.


* Re: Pointer to a char
  2012-09-19  7:59   ` Randi Botse
@ 2012-09-19  8:47     ` Leon Shaw
  2012-09-19 18:09     ` Jon Mayo
  1 sibling, 0 replies; 7+ messages in thread
From: Leon Shaw @ 2012-09-19  8:47 UTC (permalink / raw)
  To: Randi Botse; +Cc: linux-c-programming

On Wed, Sep 19, 2012 at 3:59 PM, Randi Botse <nightdecoder@gmail.com> wrote:
> Hi Phil, Jon
>
> Thanks, now I'm clear with this, assignment doesn't care with type modifier.
>
> Code such as
>
> unsigned int j = 0xffeeddcc;
> int i = j;
>
> Both has the same value depending on how them interpreted (is this
> assumption correct?)
>

According to C99, when applying integer conversion, "if the new type
is signed and the value cannot be represented in it, either the result
is implementation-defined or an implementation-defined signal is
raised". But most implementations keep the same memory representation.

> Because,
>
> printf("%u", i) will be different to printf("%i", i)
> - but -
> printf("%u", i) wlll be same as printf("%u", j)
>
>
> Actually why asking this because I often see a pointer to a char* cast
>
> Let me show you with this example.
> Consider some structures...
>
> struct a_data {
>     unsigned char f1[4];
>     unsigned char f2[6];
>     unsigned short f3[2];
> };
>
> and another struct named b_data, c_data, etc.
>
> Then there is a general function to process all type of structure,
> maybe something like this:
>
> int process_data(char *buffer, size_t len);
>
> Then if we cast for example a pointer to a_data struct to a char* as follow:
>
> struct a_data a;
> process_data((char*) &a, sizeof(a));
>
> I though since it was cast to char*, the cast is "problem" because
> every signed char buffer will have a range CHAR_MIN to CHAR_MAX,
> therefore value of CHAR_MAX to UCHAR_MAX will broken (signed char
> overflow)
>

Actually, whether char is signed or unsigned is
implementation-defined, though, normally, it is signed. SCHAR_MAX+1 ~
UCHAR_MAX can be mapped to SCHAR_MIN ~ -1.
For a pointer that denotes a memory region, what type it points to
doesn't cause much problem as long as you don't simply dereference it.
In such cases, void * might be less confusing.

Regards,
Leon


> I think process_data() should be declared with
>
> int process_data(unsigned char *buffer, size_t len)
>
> this declaration in seem correct and work for me.
>
> However, now I'm conceptually understand why this works.
>
> Thanks.

* Re: Pointer to a char
  2012-09-19  7:59   ` Randi Botse
  2012-09-19  8:47     ` Leon Shaw
@ 2012-09-19 18:09     ` Jon Mayo
  1 sibling, 0 replies; 7+ messages in thread
From: Jon Mayo @ 2012-09-19 18:09 UTC (permalink / raw)
  To: Randi Botse; +Cc: linux-c-programming

On Wed, Sep 19, 2012 at 12:59 AM, Randi Botse <nightdecoder@gmail.com> wrote:
> Hi Phil, Jon
>
> Thanks, now I'm clear with this, assignment doesn't care with type modifier.
>
> Code such as
>
> unsigned int j = 0xffeeddcc;
> int i = j;
>
> Both has the same value depending on how them interpreted (is this
> assumption correct?)
>
> Because,
>
> printf("%u", i) will be different to printf("%i", i)
> - but -
> printf("%u", i) wlll be same as printf("%u", j)
>
>

most architectures will work that way. some are a little nutty, but
standard C allows for implementation defined behavior when you
interpret a data type the wrong way. (it gets pretty specific about
signed versus unsigned representations)

I will readily admit that years of FORTH programming has warped my
mind and I no longer worry too much about signed int and unsigned int.
I tend to think more in terms of how big a data type is. The 'union'
keyword is especially useful for dealing with different ways to
interpret the same sized piece of memory.

float is often the same size as int. so this potentially works on some
platforms:

float f = 1;
int i = *(int*)&f;
printf("%u", i);

it would print some weird number that shows you how dramatically an
internal representation can differ if you manage to interpret it
incorrectly. (this trick is often used to dump float values in
hexadecimal "%x" for debugging purposes)

> Actually why asking this because I often see a pointer to a char* cast
>
> Let me show you with this example.
> Consider some structures...
>
> struct a_data {
>     unsigned char f1[4];
>     unsigned char f2[6];
>     unsigned short f3[2];
> };
>
> and another struct named b_data, c_data, etc.
>
> Then there is a general function to process all type of structure,
> maybe something like this:
>
> int process_data(char *buffer, size_t len);
>

I would have made process_data take a void * instead, so people
wouldn't have to hack around C's simple type checking with casts.

casting struct a_data* to char* doesn't change the value of the
pointer. if you ignore compiler warnings it will work without the
cast.

now inside process_data, the char* type is useful, because the pointer
math will use sizeof(char) [which is always 1] for calculating
offsets. while your sizeof(struct a_data) will be around 14 bytes.
Some people don't like to use void* here, because the compiler will
not like pointer math done on a void* as sizeof(void) doesn't make
sense. Old compilers hacked around this by treating it as 1. New
compilers will prefer that you cast or load the void* into a char*
(which is how i usually implement these sorts of functions)

> Then if we cast for example a pointer to a_data struct to a char* as follow:
>
> struct a_data a;
> process_data((char*) &a, sizeof(a));
>
> I though since it was cast to char*, the cast is "problem" because
> every signed char buffer will have a range CHAR_MIN to CHAR_MAX,
> therefore value of CHAR_MAX to UCHAR_MAX will broken (signed char
> overflow)
>

casting to a pointer won't alter the data. it just changes how you
would interpret the data when dereferencing it. if process_data
doesn't dereference, then there is probably not a problem.

(also char can be signed or unsigned. in gcc you could use something
like -funsigned-char to override the default setting. which can
potentially break a lot of assumptions in your system and library
headers)

> I think process_data() should be declared with
>
> int process_data(unsigned char *buffer, size_t len)
>

you should use:
signed char *  - if you need signed
unsigned char * - if you need unsigned
char * - if you don't care either way. as long as the pointer points
to something char-sized.
void * - if you don't even care about what type it points to. (maybe a struct)

note- this rule is different than signed/unsigned int. int is always signed.

I use char* when dealing with strings, because I won't be using them
in situations where negative values could be a problem. but one
terrible issue you can run into is a simple function like this:

int isupper(char c)
{
    const int upper_table[256] = { ... }; /* UCHAR_MAX + 1 is more appropriate than 256 here. */
    return upper_table[c]; /* oops, what if c is negative? that would be a
                              terrible array index. */
    /* we would actually want to cast c to unsigned char, or at least
       check c >= 0 && c < upper_table_len */
}

> this declaration in seem correct and work for me.
>
> However, now I'm conceptually understand why this works.
>
> Thanks.
