* More on the Sun Disk Label Issue
@ 2006-04-10 22:48 Jim Gifford
2006-04-10 23:03 ` David S. Miller
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: Jim Gifford @ 2006-04-10 22:48 UTC
To: sparclinux
My team has determined that the cause of the issue is GCC 4.1: when
util-linux and the kernel are compiled with GCC 4.1, the checksums go out
of whack. We haven't been able to isolate it any further.
Here's what we know so far.
If we compile a Kernel(2.6.12-2.6.15) or util-linux(2.12r, 2.13-pre7) with GCC 4.1 we get
this message during boot-up.
sda:Dev sda Sun disklabel: Csum bad, label corrupted
unknown partition table
If we boot the machine via netboot, run fdisk, and attempt to fix the issue, the
following error occurs:
"Device contains neither a valid DOS partition table, nor Sun, SGI or
OSF disklabel
Building a new sun disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable."
Then I configured the label and saved it, and got the following error
message:
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
SCSI device sda: 35378533 512-byte hdwr sectors (18114 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
sda:Dev sda Sun disklabel: Csum bad, label corrupted
unknown partition table
SCSI device sda: 35378533 512-byte hdwr sectors (18114 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
sda:Dev sda Sun disklabel: Csum bad, label corrupted
unknown partition table
Syncing disks.
I've tested using parted to create the label, and that seems to
work for the current session. I'm investigating this now.
* Re: More on the Sun Disk Label Issue
From: David S. Miller @ 2006-04-10 23:03 UTC
To: sparclinux
From: Jim Gifford <maillist@jg555.com>
Date: Mon, 10 Apr 2006 15:48:26 -0700
> If we compile a Kernel(2.6.12-2.6.15) or util-linux(2.12r, 2.13-pre7) with GCC 4.1 we get
> this message during boot-up.
> sda:Dev sda Sun disklabel: Csum bad, label corrupted
> unknown partition table
GCC 4.1 might be laying out the Sun partition table data structure
differently, or it might be miscompiling the checksum calculation
loop.
Try compiling and running the following test program both
with a working compiler and with a non-working one, compare
the results. It should be a small enough test case for a
gcc expert to look at it:
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's big-endian tagged types. */
typedef unsigned short __be16;
typedef unsigned int __be32;

struct sun_disklabel {
	unsigned char	info[128];	/* Informative text string */
	unsigned char	spare0[14];
	struct sun_info {
		unsigned char	spare1;
		unsigned char	id;
		unsigned char	spare2;
		unsigned char	flags;
	} infos[8];
	unsigned char	spare[246];	/* Boot information etc. */
	__be16		rspeed;		/* Disk rotational speed */
	__be16		pcylcount;	/* Physical cylinder count */
	__be16		sparecyl;	/* extra sects per cylinder */
	unsigned char	spare2[4];	/* More magic... */
	__be16		ilfact;		/* Interleave factor */
	__be16		ncyl;		/* Data cylinder count */
	__be16		nacyl;		/* Alt. cylinder count */
	__be16		ntrks;		/* Tracks per cylinder */
	__be16		nsect;		/* Sectors per track */
	unsigned char	spare3[4];	/* Even more magic... */
	struct sun_partition {
		__be32	start_cylinder;
		__be32	num_sectors;
	} partitions[8];
	__be16		magic;		/* Magic number */
	__be16		csum;		/* Label xor'd checksum */
};

/* Print the byte offset of one member; %zu is the standard conversion for size_t. */
#define dump_one_offset(name) \
	printf("offsetof: " #name " %zu\n", offsetof(struct sun_disklabel, name))

static void dump_offsets(void)
{
	printf("sizeof: %zu\n", sizeof(struct sun_disklabel));
	dump_one_offset(info);
	dump_one_offset(spare0);
	dump_one_offset(infos);
	dump_one_offset(spare);
	dump_one_offset(rspeed);
	dump_one_offset(pcylcount);
	dump_one_offset(sparecyl);
	dump_one_offset(spare2);
	dump_one_offset(ilfact);
	dump_one_offset(ncyl);
	dump_one_offset(nacyl);
	dump_one_offset(ntrks);
	dump_one_offset(nsect);
	dump_one_offset(spare3);
	dump_one_offset(partitions);
	dump_one_offset(magic);
	dump_one_offset(csum);
}

int main(void)
{
	struct sun_disklabel label;
	__be16 csum, *ush;

	dump_offsets();

	/* Fill the label with 0xff, then XOR the 16-bit words together,
	 * walking down from the last word (csum) to the first, as the
	 * checksum loop in the kernel and util-linux does. */
	memset(&label, 0xff, sizeof(label));
	ush = ((__be16 *) (&label + 1)) - 1;
	for (csum = 0; ush >= ((__be16 *) &label); )
		csum ^= *ush--;
	printf("Test checksum is %x\n", csum);
	exit(0);
}
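As a companion to the test program, here is a minimal sketch of the rule the checksum loop enforces. It assumes (as the kernel's "Csum bad" message implies) that a label is accepted only when the XOR of all 256 16-bit words, csum included, is zero. The helper names are hypothetical, not taken from the kernel or util-linux. They work on the raw 512-byte sector and read each word big-endian by index. sun_label_set_csum() shows the matching write side: store in csum the XOR of the other 255 words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SUN_LABEL_SIZE  512
#define SUN_CSUM_OFFSET 510	/* offsetof(struct sun_disklabel, csum) */

/* Hypothetical helper: XOR of all 256 big-endian 16-bit words in the label,
 * walked by index so no pointer outside the buffer is ever formed. */
static uint16_t sun_label_xor(const unsigned char *label)
{
	uint16_t x = 0;

	for (size_t i = 0; i < SUN_LABEL_SIZE; i += 2)
		x ^= (uint16_t)((label[i] << 8) | label[i + 1]);
	return x;
}

/* Hypothetical helper: set csum so the XOR over the whole label becomes zero. */
static void sun_label_set_csum(unsigned char *label)
{
	uint16_t c;

	label[SUN_CSUM_OFFSET] = 0;
	label[SUN_CSUM_OFFSET + 1] = 0;
	c = sun_label_xor(label);	/* XOR of the other 255 words */
	label[SUN_CSUM_OFFSET] = (unsigned char)(c >> 8);
	label[SUN_CSUM_OFFSET + 1] = (unsigned char)(c & 0xff);
}

/* Nonzero when the label would pass an "XOR of everything is zero" check. */
static int sun_label_csum_ok(const unsigned char *label)
{
	return sun_label_xor(label) == 0;
}
```

With the all-0xff buffer from the test program, sun_label_xor() also returns 0, because the 256 words of 0xffff cancel in pairs.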
* Re: More on the Sun Disk Label Issue
From: Jim Gifford @ 2006-04-11 3:24 UTC
To: sparclinux
Dave,
Both compilers give exactly the same output:
sizeof: 512
offsetof: info 0
offsetof: spare0 128
offsetof: infos 142
offsetof: spare 174
offsetof: rspeed 420
offsetof: pcylcount 422
offsetof: sparecyl 424
offsetof: spare2 426
offsetof: ilfact 430
offsetof: ncyl 432
offsetof: nacyl 434
offsetof: ntrks 436
offsetof: nsect 438
offsetof: spare3 440
offsetof: partitions 444
offsetof: magic 508
offsetof: csum 510
Test checksum is 0
I think the problem may be with the way fdisk is creating the
labels. Here is how they are doing it.
typedef struct {
	unsigned char info[128];	/* Informative text string */
	unsigned char spare0[14];
	struct sun_info {
		unsigned char spare1;
		unsigned char id;
		unsigned char spare2;
		unsigned char flags;
	} infos[8];
	unsigned char spare1[246];	/* Boot information etc. */
	unsigned short rspeed;		/* Disk rotational speed */
	unsigned short pcylcount;	/* Physical cylinder count */
	unsigned short sparecyl;	/* extra sects per cylinder */
	unsigned char spare2[4];	/* More magic... */
	unsigned short ilfact;		/* Interleave factor */
	unsigned short ncyl;		/* Data cylinder count */
	unsigned short nacyl;		/* Alt. cylinder count */
	unsigned short ntrks;		/* Tracks per cylinder */
	unsigned short nsect;		/* Sectors per track */
	unsigned char spare3[4];	/* Even more magic... */
	struct sun_partition {
		__u32 start_cylinder;
		__u32 num_sectors;
	} partitions[8];
	unsigned short magic;		/* Magic number */
	unsigned short csum;		/* Label xor'd checksum */
} sun_partition;
* Re: More on the Sun Disk Label Issue
From: David S. Miller @ 2006-04-11 6:03 UTC
To: sparclinux
From: Jim Gifford <maillist@jg555.com>
Date: Mon, 10 Apr 2006 20:24:40 -0700
> I think the problem may be with the way fdisk is creating the
> labels. Here is how they are doing it.
It's how they are "doing it" eh?
You've shown a data structure, data structures don't "do" anything,
whereas code executes and "does" things.
What about this data structure layout makes you think fdisk is
doing something wrong?
Please don't just dump information into the discussion without saying
what you think it's showing us. What's wrong with it?
It's declared almost identically to the kernel one, the only
difference is that the kernel copy uses the big-endian tagged types
which are not available in userspace. None of the type sizes or
alignment requirements differ, so the structure layout should be
identical.
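The claim that the two declarations lay out identically can also be checked at compile time instead of by comparing printed offsets. The following is a sketch using C11 static_assert. It assumes uint16_t and uint32_t have the same size and alignment as the kernel's __be16/__be32 and fdisk's unsigned short/__u32. The struct names kernel_sun_disklabel and fdisk_sun_label are made up for the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Kernel-style layout; uint16_t/uint32_t stand in for __be16/__be32. */
struct kernel_sun_disklabel {
	unsigned char info[128];
	unsigned char spare0[14];
	struct { unsigned char spare1, id, spare2, flags; } infos[8];
	unsigned char spare[246];
	uint16_t rspeed, pcylcount, sparecyl;
	unsigned char spare2[4];
	uint16_t ilfact, ncyl, nacyl, ntrks, nsect;
	unsigned char spare3[4];
	struct { uint32_t start_cylinder, num_sectors; } partitions[8];
	uint16_t magic, csum;
};

/* fdisk-style layout: plain unsigned short, uint32_t standing in for __u32. */
struct fdisk_sun_label {
	unsigned char info[128];
	unsigned char spare0[14];
	struct { unsigned char spare1, id, spare2, flags; } infos[8];
	unsigned char spare1[246];
	unsigned short rspeed, pcylcount, sparecyl;
	unsigned char spare2[4];
	unsigned short ilfact, ncyl, nacyl, ntrks, nsect;
	unsigned char spare3[4];
	struct { uint32_t start_cylinder, num_sectors; } partitions[8];
	unsigned short magic, csum;
};

/* Fail the build if a member lands at a different offset in the two layouts. */
#define SAME_OFFSET(kfield, ffield) \
	static_assert(offsetof(struct kernel_sun_disklabel, kfield) == \
		      offsetof(struct fdisk_sun_label, ffield), #kfield)

SAME_OFFSET(info, info);
SAME_OFFSET(spare0, spare0);
SAME_OFFSET(infos, infos);
SAME_OFFSET(spare, spare1);
SAME_OFFSET(rspeed, rspeed);
SAME_OFFSET(spare2, spare2);
SAME_OFFSET(ilfact, ilfact);
SAME_OFFSET(nsect, nsect);
SAME_OFFSET(spare3, spare3);
SAME_OFFSET(partitions, partitions);
SAME_OFFSET(magic, magic);
SAME_OFFSET(csum, csum);
static_assert(sizeof(struct kernel_sun_disklabel) == 512, "kernel label size");
static_assert(sizeof(struct fdisk_sun_label) == 512, "fdisk label size");
```

If any assertion fails, the file does not compile, which would point at a real layout difference. With both declarations as posted in this thread, the offsets should agree with the ones Jim printed above.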
Furthermore, you stated that if the only thing you change in the
test is to use a kernel compiled with gcc-4.1, that fails. How
can fdisk have any influence upon such a case?
People really need to learn how to report bugs properly and help the
process move forward, instead of backward. Many people who report
sparc64 bugs do things which waste time and hinder the debugging
process. Cheers to the folks who do not fall into this category, but
sadly most do and since I'm the only person working on the sparc64
kernel actively this is incredibly frustrating.
* Re: More on the Sun Disk Label Issue
From: Jim Gifford @ 2006-04-11 6:12 UTC
To: sparclinux
David S. Miller wrote:
> From: Jim Gifford <maillist@jg555.com>
> Date: Mon, 10 Apr 2006 20:24:40 -0700
>
>
>> I think the problem may be with the way fdisk is creating the
>> labels. Here is how they are doing it.
>>
>
> It's how they are "doing it" eh?
>
> You've shown a data structure, data structures don't "do" anything,
> whereas code executes and "does" things.
>
> What about this data structure layout makes you think fdisk is
> doing something wrong?
>
> Please don't just dump information into the discussion without saying
> what you think it's showing us. What's wrong with it?
>
Dave, the reason I say this is that parted works. That's what's frustrating:
trying to determine what is actually at fault, the kernel or something
else.
> People really need to learn how to report bugs properly and help the
> process move forward, instead of backward. Many people who report
> sparc64 bugs do things which waste time and hinder the debugging
> process. Cheers to the folks who do not fall into this category, but
> sadly most do and since I'm the only person working on the sparc64
> kernel actively this is incredibly frustrating.
>
The bottom line somehow the writing of the header is not being done
correctly. If it's fdisk or something related to the kernel, that's what
I'm trying to determine.
* Re: More on the Sun Disk Label Issue
From: David S. Miller @ 2006-04-11 6:36 UTC
To: sparclinux
From: Jim Gifford <maillist@jg555.com>
Date: Mon, 10 Apr 2006 23:12:07 -0700
> The bottom line somehow the writing of the header is not being done
> correctly. If it's fdisk or something related to the kernel, that's what
> I'm trying to determine.
So let's start from the very beginning.
If you have an existing system that works, and you run a gcc-4.1
compiled kernel on it, does that work?
* Re: More on the Sun Disk Label Issue
From: Jim Gifford @ 2006-04-11 6:42 UTC
To: sparclinux
David S. Miller wrote:
> From: Jim Gifford <maillist@jg555.com>
> Date: Mon, 10 Apr 2006 23:12:07 -0700
>
>
>> The bottom line somehow the writing of the header is not being done
>> correctly. If it's fdisk or something related to the kernel, that's what
>> I'm trying to determine.
>>
>
> So let's start from the very beginning.
>
> If you have an existing system that works, and you run a gcc-4.1
> compiled kernel on it, does that work?
>
It works, but if the partition was created by an fdisk compiled with
GCC 4.1, it gives the checksum error during boot-up. It just hangs;
the kernel works fine, but if you are using udev, this error prevents
the kernel from creating devices in /dev.
With everything I've discovered, and not my team, tonight. The problem
is actually a util-linux issue and not a kernel issue. The issue is that
util-linux doesn't generate a proper sun disk label, when it's built
with GCC 4.1.
* Re: More on the Sun Disk Label Issue
From: David S. Miller @ 2006-04-11 6:52 UTC
To: sparclinux
From: Jim Gifford <maillist@jg555.com>
Date: Mon, 10 Apr 2006 23:42:31 -0700
> With everything I've discovered, and not my team, tonight. The problem
> is actually a util-linux issue and not a kernel issue. The issue is that
> util-linux doesn't generate a proper sun disk label, when it's built
> with GCC 4.1.
Ok, that's a great set of datapoints.
There are some things you can try to narrow this down further.
First, add some debugging to fdisk/fdisksunlabel.c in function
create_sunlabel(). Make sure it sets "other_endian" to zero.
Next thing you can try is to build just the fdisk/fdisksunlabel.o
object with gcc-4.0, then do the final link as normal. If this works,
you know for sure that fdisk/fdisksunlabel.c is being compiled
differently and we can narrow down our focus into there.
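For the first suggestion, it may help to spell out what the flag is expected to mean. The sketch below assumes other_endian records whether the label's big-endian byte order differs from the host's. The helper names are hypothetical and are not taken from fdisksunlabel.c; 0xDABE is the Sun label magic value the kernel checks for. On a big-endian sparc64 host both helpers yield 0, the value Dave expects create_sunlabel() to set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SUN_LABEL_MAGIC 0xDABE	/* stored big-endian on disk */

/* Returns 1 on a little-endian host, 0 on a big-endian one (e.g. sparc64). */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first_byte;

	memcpy(&first_byte, &probe, 1);
	return first_byte == 1;
}

/* Hypothetical helper: given the two on-disk magic bytes, report whether a
 * native 16-bit load sees them in order (0), swapped (1), or neither (-1). */
static int magic_is_other_endian(const unsigned char magic_bytes[2])
{
	const uint16_t swapped =
		(uint16_t)((SUN_LABEL_MAGIC >> 8) | ((SUN_LABEL_MAGIC & 0xff) << 8));
	uint16_t native;

	memcpy(&native, magic_bytes, sizeof native);
	if (native == SUN_LABEL_MAGIC)
		return 0;
	if (native == swapped)
		return 1;
	return -1;
}
```

When a label is being created there is no magic on disk to read yet, so presumably only the host byte order can decide the flag, which is why a 1 on sparc64 would be suspicious.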
* Re: More on the Sun Disk Label Issue
From: Jim Gifford @ 2006-04-15 0:47 UTC
To: sparclinux
Dave,
One of my team members solved the issue. Here are the patches
we used to correct it.
Great work Joe.
For util-linux
http://www.linuxfromscratch.org/patches/downloads/util-linux/util-linux-2.12r-gcc41_sun_disklabel_fixes-1.patch
For Kernel
http://www.linuxfromscratch.org/patches/downloads/linux/linux-2.6.16.5-sun_disklabel_fix-1.patch
* Re: More on the Sun Disk Label Issue
From: David S. Miller @ 2006-04-15 2:44 UTC
To: sparclinux
From: Jim Gifford <maillist@jg555.com>
Date: Fri, 14 Apr 2006 17:47:47 -0700
> http://www.linuxfromscratch.org/patches/downloads/util-linux/util-linux-2.12r-gcc41_sun_disklabel_fixes-1.patch
Looks like you're simply working around a bug in the compiler.
That code has been in the kernel and util-linux for 10+ years.
Submit a bug report for gcc and get that bug fixed instead of papering
around the problem.
Thanks.
* Re: More on the Sun Disk Label Issue
From: Jim Gifford @ 2006-04-15 2:48 UTC
To: sparclinux
David S. Miller wrote:
> Looks like you're simply working around a bug in the compiler.
> That code has been in the kernel and util-linux for 10+ years.
>
> Submit a bug report for gcc and get that bug fixed instead of papering
> around the problem.
>
> Thanks.
>
Will do.
* Re: More on the Sun Disk Label Issue
From: David S. Miller @ 2006-04-15 2:56 UTC
To: sparclinux
From: "David S. Miller" <davem@davemloft.net>
Date: Fri, 14 Apr 2006 19:44:44 -0700 (PDT)
> From: Jim Gifford <maillist@jg555.com>
> Date: Fri, 14 Apr 2006 17:47:47 -0700
>
> > http://www.linuxfromscratch.org/patches/downloads/util-linux/util-linux-2.12r-gcc41_sun_disklabel_fixes-1.patch
>
> Looks like you're simply working around a bug in the compiler.
> That code has been in the kernel and util-linux for 10+ years.
>
> Submit a bug report for gcc and get that bug fixed instead of papering
> around the problem.
When I see stuff like this, I literally want to cry....
I seriously question the correctness of your change:
- for (csum = 0; ush >= (unsigned short *)sunlabel;) csum ^= *ush--;
+ while (ush < (unsigned short *)sunlabel) csum ^= *ush--;
That can't be correct: we're _DECREMENTING_ the pointer from the top
of the structure (minus one "unsigned short") _DOWN_ to the base at
"sunlabel", yet you've changed the pointer test into "less-than". "ush"
will _NEVER_ be less-than sunlabel, so the loop will exit immediately
and we won't compute a checksum at all.
If anything it should be:
+ while (ush >= (unsigned short *)sunlabel) csum ^= *ush--;
Look at how we initialize "ush":
ush = ((unsigned short *) (sunlabel + 1)) - 1;
That's the address "sunlabel" plus "sizeof(*sunlabel)" bytes, minus
"sizeof(unsigned short)", and then we march down from the top computing
the checksum one unsigned short at a time.
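The effect of the patched condition is easy to confirm in isolation. In the sketch below (function names are hypothetical), visits_patched() applies the patch's "ush < base" test to a pointer that starts at the last word and counts how many times the body runs. xor_top_down() is one way, assumed here rather than proposed anywhere in this thread, to keep the same top-down walk while starting one past the end and pre-decrementing. That keeps the pointer inside the range ISO C defines for pointer arithmetic: the array plus one past its end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Patched condition: ush starts at the last word, above base, so
 * "ush < base" is false on the first test and the body never runs. */
static unsigned visits_patched(const uint16_t *base, size_t nwords)
{
	const uint16_t *ush = base + nwords - 1;
	unsigned visits = 0;

	while (ush < base) {
		visits++;
		ush--;
	}
	return visits;
}

/* Same top-down XOR walk, but it stops at base instead of stepping
 * one element below it. */
static uint16_t xor_top_down(const uint16_t *base, size_t nwords,
			     unsigned *visits)
{
	const uint16_t *ush = base + nwords;	/* one past the end */
	uint16_t csum = 0;

	*visits = 0;
	while (ush != base) {
		csum ^= *--ush;
		(*visits)++;
	}
	return csum;
}
```

For a 256-word label, visits_patched() reports 0 iterations, while xor_top_down() visits all 256 words, matching the behavior Dave describes for the two conditions.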
But even my "correct version" is fundamentally the wrong fix, because
the code is correct as-is, so you're likely triggering some gcc bug that
needs to be investigated and fixed.
You really need to properly figure out what the bug is in gcc's loop
optimizer that the existing code is triggering, instead of papering
around it with incorrect patches to the source.
Patching this correct code is not the way to fix this problem.
Why do people do stuff like this without thinking? :-/