* Big Endian fix patch (was: Re: Couple more fixes for Linux raid metadata 1.x support)
@ 2010-07-27 1:00 Doug Nazar
2010-07-27 15:26 ` Lennart Sorensen
0 siblings, 1 reply; 25+ messages in thread
From: Doug Nazar @ 2010-07-27 1:00 UTC (permalink / raw)
To: lsorense, grub-devel
[-- Attachment #1: Type: text/plain, Size: 984 bytes --]
On Mon, Jul 26, 2010 at 01:20:34PM -0400, Lennart Sorensen wrote:
> On Mon, Jul 26, 2010 at 01:20:34PM -0400, Lennart Sorensen wrote:
> >/ Now this is on a powerpc64 system, so it is big endian. That number by/
> >/ the way is FFFFFFFFFFFFFF00./
> >
> >/ I wonder if some part of the 1.x raid handling code has an endianess bug./
> >
> >/ Got any guesses I can try before I just go convert back to 0.9 raids?/
> >/ I really hate giving up on things that ought to work and loose the/
> >/ debugging opportunity./
>
> Turns out it very much was endian issues.
>
> Here is a patch that fixes it for me.
>
> I am using your two recent patches as well.
I'd worried about endianess while testing. I've been spending the last 2
days trying to setup a QEMU powerpc64 image to run some regression
testing but it's taking forever (combination of old P4 computer and QEMU
giving me random errors from the cdrom drive).
I'll run it through the grinder in a bit.
Doug
[-- Attachment #2: Type: text/html, Size: 1363 bytes --]
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch (was: Re: Couple more fixes for Linux raid metadata 1.x support)
2010-07-27 1:00 Big Endian fix patch (was: Re: Couple more fixes for Linux raid metadata 1.x support) Doug Nazar
@ 2010-07-27 15:26 ` Lennart Sorensen
2010-07-27 23:58 ` Big Endian fix patch Doug Nazar
2010-07-28 8:51 ` Doug Nazar
0 siblings, 2 replies; 25+ messages in thread
From: Lennart Sorensen @ 2010-07-27 15:26 UTC (permalink / raw)
To: Doug Nazar; +Cc: grub-devel, lsorense
On Mon, Jul 26, 2010 at 09:00:53PM -0400, Doug Nazar wrote:
> I'd worried about endianess while testing. I've been spending the last 2
> days trying to setup a QEMU powerpc64 image to run some regression
> testing but it's taking forever (combination of old P4 computer and QEMU
> giving me random errors from the cdrom drive).
It is faster and simpler to use an iso image with qemu rather than the
real cdrom drive. Seems much more reliable that way.
> I'll run it through the grinder in a bit.
I have an IBM p520 here (dual core power6+ 4.2GHz). Nothing takes long
here. :)
So with my patch I am booting of the md raid1 very successfully (other
than manually having to fix grub-mkimage run since grub-install doesn't
understand the machine yet. I am working on fixing that now.)
Quite simply, _everything_ that accesses the superblock structure
numbers has to use endianess conversion since 1.x md superblocks are
always little endian. 0.9 is host endian all the time and hence simpler
(but also not portable).
And the checksum needs folding according to the kernel before comparing
(which I added). Otherwise different architectures may have different
results.
--
Len Sorensen
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-27 15:26 ` Lennart Sorensen
@ 2010-07-27 23:58 ` Doug Nazar
2010-07-28 14:52 ` Lennart Sorensen
2010-07-28 8:51 ` Doug Nazar
1 sibling, 1 reply; 25+ messages in thread
From: Doug Nazar @ 2010-07-27 23:58 UTC (permalink / raw)
To: Lennart Sorensen; +Cc: grub-devel
On 2010-07-27 11:26 AM, Lennart Sorensen wrote:
> It is faster and simpler to use an iso image with qemu rather than the
> real cdrom drive. Seems much more reliable that way.
It is an ISO image. That's what has me steamed. I'm not sure how I'm
still getting errors. First I was doing a Gentoo install but there were
a few too many decisions I wasn't sure about for PPC and I was having
problems cross-compiling some packages. So I grabbed an Ubuntu image and
it would randomly error out on different packages. I'm not too familiar
with Ubuntu but finally figured out how to nfs mount the image over the
cdrom mount point. Then found that adding extra drives caused the
openbios aliases to change and the boot command wouldn't work with the
full paths. After checking the openbios source found it ignores
boot-device, boot-file etc. so I ended up open-coding the boot-command
as a load & go to get grub to boot.
> I have an IBM p520 here (dual core power6+ 4.2GHz). Nothing takes long
> here. :)
>
Yes, I definitely need some new toys....<sigh>
> So with my patch I am booting of the md raid1 very successfully (other
> than manually having to fix grub-mkimage run since grub-install doesn't
> understand the machine yet. I am working on fixing that now.)
I didn't have a big problem with grub-mkimage. Used the command
./grub-mkimage -O powerpc-ieee1275 -d . -o grubof.modules -p
"(/pci/ata-io/ata-3/disk@0,4)/" *.mod
My biggest issue was cross-compiling grub. A couple commands are host
run so I had to borrow a copy from my x86 tree. Would there be any
interest in extending grub to do this automatically?
(Although re-reading your sentence I think you meant "having to run
grub-mkimage manually"....)
> Quite simply, _everything_ that accesses the superblock structure
> numbers has to use endianess conversion since 1.x md superblocks are
> always little endian. 0.9 is host endian all the time and hence simpler
> (but also not portable).
>
> And the checksum needs folding according to the kernel before comparing
> (which I added). Otherwise different architectures may have different
> results.
Yeah, I'd noticed that some of the original code didn't do this, but I
didn't want to make changes to it until I had a test environment.
I just finished building my test arrays. It consists of 13 partitions
used to build 4 arrays with a combination of metadata types, raid types,
& spares. Then building an lvm on top of that. I'll start running some
tests and randomly failing some members.
Does grub display correctly on the p520? I'm having display issues
(looks like the lines are too long) and I'm not sure yet if it's grub or
openbios. Also, a current version of openbios doesn't just drop in to QEMU.
But first some breakfast......;-)
Doug
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-27 15:26 ` Lennart Sorensen
2010-07-27 23:58 ` Big Endian fix patch Doug Nazar
@ 2010-07-28 8:51 ` Doug Nazar
2010-07-28 15:00 ` Lennart Sorensen
2010-09-13 19:54 ` Vladimir 'φ-coder/phcoder' Serbinenko
1 sibling, 2 replies; 25+ messages in thread
From: Doug Nazar @ 2010-07-28 8:51 UTC (permalink / raw)
To: Lennart Sorensen; +Cc: grub-devel
[-- Attachment #1: Type: text/plain, Size: 2035 bytes --]
On 2010-07-27 11:26 AM, Lennart Sorensen wrote:
> On Mon, Jul 26, 2010 at 09:00:53PM -0400, Doug Nazar wrote:
>
>> I'll run it through the grinder in a bit.
> So with my patch I am booting of the md raid1 very successfully (other
> than manually having to fix grub-mkimage run since grub-install doesn't
> understand the machine yet. I am working on fixing that now.)
Ok, finally made some progress. Ran into several issues, some of them
obviously QEMU/OpenBios that I'm not sure if GRUB should work around.
With your patch the raid mostly worked, small problem with too many
devices because OpenBios creates several aliases for some devices. Also
I think you missed the endianess when setting the level.
This patch includes the following:
- Fix the ofdisk_hash system. We weren't making a copy of the devpath so
never found the cached item again.
- Extend the ofdisk_hash to cache the disk size
- Scan for a disk size using seek (probably want to set a different
start size). Required for metadata 1.0 arrays
- Optimize checking of raid level
- If we find a duplicate disk (claims to be same index in the array),
skip it or else level 0 arrays wont be found
- QEMU/OpenBios doesn't seem to like if the prev & name parameters of
ieee1275_next_property are the same pointer which caused no devices to
be found
The issues that I came across which are in QEMU/OpenBios:
- The rows are misreported. screen-#rows is set to 75 when in fact there
are only 60 rows. Worked around using -prom-env parameter
- Aliases don't take into account the index (i.e. disk@1). I ended up with
disk /pci/pci-ata/ata-1/disk
hd /pci/pci-ata/ata-1/disk
ide0 /pci/pci-ata/ata-1/disk
ide1 /pci/mac-io/ata-3/disk
ide2 /pci/mac-io/ata-3/disk
when ide1 should be disk@0 and ide2 should be disk@1
- boot command hangs when passed wrong disk or used from boot-command.
Worked around by using load & go
Things do work, and fixing QEMU/OpenBios is a bit further down the
rabbit hole than I want to go. ;-)
Thanks,
Doug
[-- Attachment #2: grub-fix-qemu-ppc.diff --]
[-- Type: text/plain, Size: 6169 bytes --]
=== modified file 'disk/ieee1275/ofdisk.c'
--- disk/ieee1275/ofdisk.c 2010-05-31 19:01:01 +0000
+++ disk/ieee1275/ofdisk.c 2010-07-28 07:51:12 +0000
@@ -26,6 +26,7 @@
struct ofdisk_hash_ent
{
char *devpath;
+ grub_disk_addr_t size;
struct ofdisk_hash_ent *next;
};
@@ -63,7 +64,8 @@
if (p)
{
- p->devpath = devpath;
+ p->devpath = grub_strdup(devpath);
+ p->size = 0;
p->next = *head;
*head = p;
}
@@ -201,11 +203,72 @@
grub_dprintf ("disk", "Opened `%s' as handle %p.\n", op->devpath,
(void *) (unsigned long) dev_ihandle);
- /* XXX: There is no property to read the number of blocks. There
- should be a property `#blocks', but it is not there. Perhaps it
- is possible to use seek for this. */
- disk->total_sectors = GRUB_DISK_SIZE_UNKNOWN;
-
+ if (op->size != 0)
+ {
+ disk->total_sectors = op->size;
+ }
+ else
+ {
+ grub_disk_addr_t curr = 1LLU * 1024 * 1024 * 1024;
+ grub_disk_addr_t size = curr;
+ grub_ssize_t status;
+ int seek_top = 1;
+
+ disk->total_sectors = GRUB_DISK_SIZE_UNKNOWN;
+
+ while (1)
+ {
+ grub_int8_t success = 0;
+ char buf[32];
+
+ grub_ieee1275_seek (dev_ihandle, curr, &status);
+ if (status >= 0)
+ {
+ grub_ieee1275_read (dev_ihandle, buf, 1, &actual);
+ if (actual == 1)
+ success = 1;
+ }
+
+ if (seek_top)
+ {
+ if (success)
+ {
+ /* grow */
+ size = curr;
+ curr = curr * 2;
+ }
+ else
+ {
+ seek_top = 0;
+ }
+ }
+
+ if (!seek_top)
+ {
+ size /= 2;
+ if (size < 512)
+ size = 512;
+
+ if (success)
+ {
+ curr += size;
+ }
+ else
+ {
+ if (size == 512)
+ break;
+
+ curr -= size;
+ }
+ }
+
+ }
+
+ if (size > 1024)
+ disk->total_sectors = curr / 512;
+ op->size = disk->total_sectors;
+ }
+
disk->id = (unsigned long) op;
/* XXX: Read this, somehow. */
=== modified file 'disk/mdraid_linux.c'
--- disk/mdraid_linux.c 2010-07-28 08:15:40 +0000
+++ disk/mdraid_linux.c 2010-07-28 08:16:59 +0000
@@ -322,6 +322,7 @@
grub_disk_addr_t *start_sector)
{
grub_uint64_t sb_size;
+ grub_int32_t level;
struct grub_raid_super_1x *real_sb;
if (grub_le_to_cpu32(sb->major_version) != 1)
@@ -341,14 +342,15 @@
if (grub_le_to_cpu32(sb->feature_map) & MD_FEATURE_RESHAPE_ACTIVE)
return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET, "reshape active");
+ level = grub_le_to_cpu32(sb->level);
/* Multipath. */
- if ((int) grub_le_to_cpu32(sb->level) == -4)
- sb->level = 1;
+ if (level == -4)
+ level = 1;
- if (grub_le_to_cpu32(sb->level) != 0 && grub_le_to_cpu32(sb->level) != 1 && grub_le_to_cpu32(sb->level) != 4 &&
- grub_le_to_cpu32(sb->level) != 5 && grub_le_to_cpu32(sb->level) != 6 && grub_le_to_cpu32(sb->level) != 10)
+ if (level != 0 && level != 1 && level != 4 &&
+ level != 5 && level != 6 && level != 10)
return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET,
- "Unsupported RAID level: %d", grub_le_to_cpu32(sb->level));
+ "Unsupported RAID level: %d", level);
/* 1.x superblocks don't have a fixed size on disk. So we have to
read it again now that we now the max device count. */
@@ -390,7 +392,7 @@
}
array->number = 0;
- array->level = grub_le_to_cpu32 (real_sb->level);
+ array->level = level;
array->layout = grub_le_to_cpu32 (real_sb->layout);
array->total_devs = grub_le_to_cpu32 (real_sb->raid_disks);
array->disk_size = grub_le_to_cpu64 (real_sb->size);
=== modified file 'disk/raid.c'
--- disk/raid.c 2010-07-24 09:59:10 +0000
+++ disk/raid.c 2010-07-28 07:38:32 +0000
@@ -511,10 +511,13 @@
array->total_devs);
if (array->device[new_array->index] != NULL)
+ {
/* We found multiple devices with the same number. Again,
this shouldn't happen. */
grub_dprintf ("raid", "Found two disks with the number %d?!?\n",
new_array->number);
+ return grub_error (GRUB_ERR_OUT_OF_RANGE, "duplicate disk number");
+ }
if (new_array->disk_size < array->disk_size)
array->disk_size = new_array->disk_size;
=== modified file 'kern/ieee1275/openfw.c'
--- kern/ieee1275/openfw.c 2010-07-02 12:47:14 +0000
+++ kern/ieee1275/openfw.c 2010-07-28 08:09:40 +0000
@@ -122,7 +122,7 @@
grub_devalias_iterate (int (*hook) (struct grub_ieee1275_devalias *alias))
{
grub_ieee1275_phandle_t aliases;
- char *aliasname, *devtype;
+ char *aliasname, *devtype, *prev;
grub_ssize_t actual;
struct grub_ieee1275_devalias alias;
int ret = 0;
@@ -133,21 +133,30 @@
aliasname = grub_malloc (IEEE1275_MAX_PROP_LEN);
if (!aliasname)
return 0;
+ prev = grub_malloc (IEEE1275_MAX_PROP_LEN);
+ if (!prev)
+ {
+ grub_free (aliasname);
+ return 0;
+ }
devtype = grub_malloc (IEEE1275_MAX_PROP_LEN);
if (!devtype)
- {
- grub_free (aliasname);
- return 0;
- }
-
+ {
+ grub_free (prev);
+ grub_free (aliasname);
+ return 0;
+ }
+
/* Find the first property. */
+ prev[0] = '\0';
aliasname[0] = '\0';
- while (grub_ieee1275_next_property (aliases, aliasname, aliasname) > 0)
+ while (grub_ieee1275_next_property (aliases, prev, aliasname) > 0)
{
grub_ieee1275_phandle_t dev;
grub_ssize_t pathlen;
char *devpath;
+ char *temp = prev;
grub_dprintf ("devalias", "devalias name = %s\n", aliasname);
@@ -155,7 +164,7 @@
/* The property `name' is a special case we should skip. */
if (!grub_strcmp (aliasname, "name"))
- continue;
+ goto skip;
/* Sun's OpenBoot often doesn't zero terminate the device alias
strings, so we will add a NULL byte at the end explicitly. */
@@ -165,6 +174,7 @@
if (! devpath)
{
grub_free (devtype);
+ grub_free (prev);
grub_free (aliasname);
return 0;
}
@@ -199,9 +209,14 @@
grub_free (devpath);
if (ret)
break;
+skip:
+ prev = aliasname;
+ aliasname = temp;
+ aliasname[0] = '\0';
}
grub_free (devtype);
+ grub_free (prev);
grub_free (aliasname);
return ret;
}
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-27 23:58 ` Big Endian fix patch Doug Nazar
@ 2010-07-28 14:52 ` Lennart Sorensen
2010-07-28 15:29 ` Doug Nazar
0 siblings, 1 reply; 25+ messages in thread
From: Lennart Sorensen @ 2010-07-28 14:52 UTC (permalink / raw)
To: The development of GNU GRUB; +Cc: Lennart Sorensen
On Tue, Jul 27, 2010 at 07:58:01PM -0400, Doug Nazar wrote:
> It is an ISO image. That's what has me steamed. I'm not sure how I'm
> still getting errors. First I was doing a Gentoo install but there were
> a few too many decisions I wasn't sure about for PPC and I was having
> problems cross-compiling some packages. So I grabbed an Ubuntu image and
> it would randomly error out on different packages. I'm not too familiar
> with Ubuntu but finally figured out how to nfs mount the image over the
> cdrom mount point. Then found that adding extra drives caused the
> openbios aliases to change and the boot command wouldn't work with the
> full paths. After checking the openbios source found it ignores
> boot-device, boot-file etc. so I ended up open-coding the boot-command
> as a load & go to get grub to boot.
I have booted Debian powerpc on qemu before without any issues.
It just worked. Well initially the install couldn't boot because it
was using quik at the time (oldworld mac emulation), and the ramdisk
image was too large for quik. I fixed quik to support larger ramdisks
(and ext3 and such just by using a new libe2fs which also broke it).
> Yes, I definitely need some new toys....<sigh>
Well we needed a faster build server for powerpc and the MPC8315 350MHz
with 128MB ram really wasn't doing it. A dual core 4.2Ghz Power6+
with 4GB ram will be a lot better.
>> So with my patch I am booting of the md raid1 very successfully (other
>> than manually having to fix grub-mkimage run since grub-install doesn't
>> understand the machine yet. I am working on fixing that now.)
> I didn't have a big problem with grub-mkimage. Used the command
> ./grub-mkimage -O powerpc-ieee1275 -d . -o grubof.modules -p
> "(/pci/ata-io/ata-3/disk@0,4)/" *.mod
Without a devalias, grub can't find ANY disks on the IBM firmware.
With a devalias created, it works fine. I suppose this qualifies as a
bug in grub.
> My biggest issue was cross-compiling grub. A couple commands are host
> run so I had to borrow a copy from my x86 tree. Would there be any
> interest in extending grub to do this automatically?
Well I haven't had to do that.
> (Although re-reading your sentence I think you meant "having to run
> grub-mkimage manually"....)
Yes, since grub-install sure didn't do it right. It has no clue how an
IBM powerpc works.
> Yeah, I'd noticed that some of the original code didn't do this, but I
> didn't want to make changes to it until I had a test environment.
>
> I just finished building my test arrays. It consists of 13 partitions
> used to build 4 arrays with a combination of metadata types, raid types,
> & spares. Then building an lvm on top of that. I'll start running some
> tests and randomly failing some members.
>
> Does grub display correctly on the p520? I'm having display issues
> (looks like the lines are too long) and I'm not sure yet if it's grub or
> openbios. Also, a current version of openbios doesn't just drop in to
> QEMU.
It is not too bad. I haven't noticed anything too awful in the display.
It messes up a lot when it starts scrolling the screen though, so I have
to 'clear' and then continue to keep it legible. The menu looks fine.
> But first some breakfast......;-)
--
Len Sorensen
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 8:51 ` Doug Nazar
@ 2010-07-28 15:00 ` Lennart Sorensen
2010-07-28 15:51 ` Lennart Sorensen
2010-07-28 15:52 ` Doug Nazar
2010-09-13 19:54 ` Vladimir 'φ-coder/phcoder' Serbinenko
1 sibling, 2 replies; 25+ messages in thread
From: Lennart Sorensen @ 2010-07-28 15:00 UTC (permalink / raw)
To: The development of GNU GRUB; +Cc: Lennart Sorensen
On Wed, Jul 28, 2010 at 04:51:24AM -0400, Doug Nazar wrote:
> Ok, finally made some progress. Ran into several issues, some of them
> obviously QEMU/OpenBios that I'm not sure if GRUB should work around.
> With your patch the raid mostly worked, small problem with too many
> devices because OpenBios creates several aliases for some devices. Also
> I think you missed the endianess when setting the level.
OK, no idea how I got by without that. It is currently working for me.
Weird.
I only use raid1 of course, so that is all I tested with.
> This patch includes the following:
>
> - Fix the ofdisk_hash system. We weren't making a copy of the devpath so
> never found the cached item again.
Could this have anything to do with why I can't see disks without
devaliases assigned?
> - Extend the ofdisk_hash to cache the disk size
> - Scan for a disk size using seek (probably want to set a different
> start size). Required for metadata 1.0 arrays
> - Optimize checking of raid level
> - If we find a duplicate disk (claims to be same index in the array),
> skip it or else level 0 arrays wont be found
> - QEMU/OpenBios doesn't seem to like if the prev & name parameters of
> ieee1275_next_property are the same pointer which caused no devices to
> be found
QEMU/openbios has many bugs unfortunately. If I could make any sense
of the code I would try to fix some of them, but I simply can't follow
that code.
> The issues that I came across which are in QEMU/OpenBios:
>
> - The rows are misreported. screen-#rows is set to 75 when in fact there
> are only 60 rows. Worked around using -prom-env parameter
> - Aliases don't take into account the index (i.e. disk@1). I ended up with
> disk /pci/pci-ata/ata-1/disk
> hd /pci/pci-ata/ata-1/disk
> ide0 /pci/pci-ata/ata-1/disk
> ide1 /pci/mac-io/ata-3/disk
> ide2 /pci/mac-io/ata-3/disk
>
> when ide1 should be disk@0 and ide2 should be disk@1
> - boot command hangs when passed wrong disk or used from boot-command.
> Worked around by using load & go
>
> Things do work, and fixing QEMU/OpenBios is a bit further down the
> rabbit hole than I want to go. ;-)
>
> Thanks,
> Doug
>
> === modified file 'disk/ieee1275/ofdisk.c'
> --- disk/ieee1275/ofdisk.c 2010-05-31 19:01:01 +0000
> +++ disk/ieee1275/ofdisk.c 2010-07-28 07:51:12 +0000
> @@ -26,6 +26,7 @@
> struct ofdisk_hash_ent
> {
> char *devpath;
> + grub_disk_addr_t size;
> struct ofdisk_hash_ent *next;
> };
>
> @@ -63,7 +64,8 @@
>
> if (p)
> {
> - p->devpath = devpath;
> + p->devpath = grub_strdup(devpath);
> + p->size = 0;
> p->next = *head;
> *head = p;
> }
> @@ -201,11 +203,72 @@
> grub_dprintf ("disk", "Opened `%s' as handle %p.\n", op->devpath,
> (void *) (unsigned long) dev_ihandle);
>
> - /* XXX: There is no property to read the number of blocks. There
> - should be a property `#blocks', but it is not there. Perhaps it
> - is possible to use seek for this. */
> - disk->total_sectors = GRUB_DISK_SIZE_UNKNOWN;
> -
> + if (op->size != 0)
> + {
> + disk->total_sectors = op->size;
> + }
> + else
> + {
> + grub_disk_addr_t curr = 1LLU * 1024 * 1024 * 1024;
> + grub_disk_addr_t size = curr;
> + grub_ssize_t status;
> + int seek_top = 1;
> +
> + disk->total_sectors = GRUB_DISK_SIZE_UNKNOWN;
> +
> + while (1)
> + {
> + grub_int8_t success = 0;
> + char buf[32];
> +
> + grub_ieee1275_seek (dev_ihandle, curr, &status);
> + if (status >= 0)
> + {
> + grub_ieee1275_read (dev_ihandle, buf, 1, &actual);
> + if (actual == 1)
> + success = 1;
> + }
> +
> + if (seek_top)
> + {
> + if (success)
> + {
> + /* grow */
> + size = curr;
> + curr = curr * 2;
> + }
> + else
> + {
> + seek_top = 0;
> + }
> + }
> +
> + if (!seek_top)
> + {
> + size /= 2;
> + if (size < 512)
> + size = 512;
> +
> + if (success)
> + {
> + curr += size;
> + }
> + else
> + {
> + if (size == 512)
> + break;
> +
> + curr -= size;
> + }
> + }
> +
> + }
> +
> + if (size > 1024)
> + disk->total_sectors = curr / 512;
> + op->size = disk->total_sectors;
> + }
> +
> disk->id = (unsigned long) op;
>
> /* XXX: Read this, somehow. */
>
> === modified file 'disk/mdraid_linux.c'
> --- disk/mdraid_linux.c 2010-07-28 08:15:40 +0000
> +++ disk/mdraid_linux.c 2010-07-28 08:16:59 +0000
> @@ -322,6 +322,7 @@
> grub_disk_addr_t *start_sector)
> {
> grub_uint64_t sb_size;
> + grub_int32_t level;
> struct grub_raid_super_1x *real_sb;
>
> if (grub_le_to_cpu32(sb->major_version) != 1)
> @@ -341,14 +342,15 @@
> if (grub_le_to_cpu32(sb->feature_map) & MD_FEATURE_RESHAPE_ACTIVE)
> return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET, "reshape active");
>
> + level = grub_le_to_cpu32(sb->level);
> /* Multipath. */
> - if ((int) grub_le_to_cpu32(sb->level) == -4)
> - sb->level = 1;
> + if (level == -4)
> + level = 1;
Oh I see. I missed the endian conversion on setting the level. Given I
wasn't using multipath (or whatever -4 is) I would not have hit that.
The one time conversion to a local variable and reuse of that does look
a lot better.
> - if (grub_le_to_cpu32(sb->level) != 0 && grub_le_to_cpu32(sb->level) != 1 && grub_le_to_cpu32(sb->level) != 4 &&
> - grub_le_to_cpu32(sb->level) != 5 && grub_le_to_cpu32(sb->level) != 6 && grub_le_to_cpu32(sb->level) != 10)
> + if (level != 0 && level != 1 && level != 4 &&
> + level != 5 && level != 6 && level != 10)
> return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET,
> - "Unsupported RAID level: %d", grub_le_to_cpu32(sb->level));
> + "Unsupported RAID level: %d", level);
>
> /* 1.x superblocks don't have a fixed size on disk. So we have to
> read it again now that we now the max device count. */
> @@ -390,7 +392,7 @@
> }
>
> array->number = 0;
> - array->level = grub_le_to_cpu32 (real_sb->level);
> + array->level = level;
> array->layout = grub_le_to_cpu32 (real_sb->layout);
> array->total_devs = grub_le_to_cpu32 (real_sb->raid_disks);
> array->disk_size = grub_le_to_cpu64 (real_sb->size);
>
> === modified file 'disk/raid.c'
> --- disk/raid.c 2010-07-24 09:59:10 +0000
> +++ disk/raid.c 2010-07-28 07:38:32 +0000
> @@ -511,10 +511,13 @@
> array->total_devs);
>
> if (array->device[new_array->index] != NULL)
> + {
> /* We found multiple devices with the same number. Again,
> this shouldn't happen. */
> grub_dprintf ("raid", "Found two disks with the number %d?!?\n",
> new_array->number);
> + return grub_error (GRUB_ERR_OUT_OF_RANGE, "duplicate disk number");
> + }
>
> if (new_array->disk_size < array->disk_size)
> array->disk_size = new_array->disk_size;
>
> === modified file 'kern/ieee1275/openfw.c'
> --- kern/ieee1275/openfw.c 2010-07-02 12:47:14 +0000
> +++ kern/ieee1275/openfw.c 2010-07-28 08:09:40 +0000
> @@ -122,7 +122,7 @@
> grub_devalias_iterate (int (*hook) (struct grub_ieee1275_devalias *alias))
> {
> grub_ieee1275_phandle_t aliases;
> - char *aliasname, *devtype;
> + char *aliasname, *devtype, *prev;
> grub_ssize_t actual;
> struct grub_ieee1275_devalias alias;
> int ret = 0;
> @@ -133,21 +133,30 @@
> aliasname = grub_malloc (IEEE1275_MAX_PROP_LEN);
> if (!aliasname)
> return 0;
> + prev = grub_malloc (IEEE1275_MAX_PROP_LEN);
> + if (!prev)
> + {
> + grub_free (aliasname);
> + return 0;
> + }
> devtype = grub_malloc (IEEE1275_MAX_PROP_LEN);
> if (!devtype)
> - {
> - grub_free (aliasname);
> - return 0;
> - }
> -
> + {
> + grub_free (prev);
> + grub_free (aliasname);
> + return 0;
> + }
> +
> /* Find the first property. */
> + prev[0] = '\0';
> aliasname[0] = '\0';
>
> - while (grub_ieee1275_next_property (aliases, aliasname, aliasname) > 0)
> + while (grub_ieee1275_next_property (aliases, prev, aliasname) > 0)
> {
> grub_ieee1275_phandle_t dev;
> grub_ssize_t pathlen;
> char *devpath;
> + char *temp = prev;
>
> grub_dprintf ("devalias", "devalias name = %s\n", aliasname);
>
> @@ -155,7 +164,7 @@
>
> /* The property `name' is a special case we should skip. */
> if (!grub_strcmp (aliasname, "name"))
> - continue;
> + goto skip;
>
> /* Sun's OpenBoot often doesn't zero terminate the device alias
> strings, so we will add a NULL byte at the end explicitly. */
> @@ -165,6 +174,7 @@
> if (! devpath)
> {
> grub_free (devtype);
> + grub_free (prev);
> grub_free (aliasname);
> return 0;
> }
> @@ -199,9 +209,14 @@
> grub_free (devpath);
> if (ret)
> break;
> +skip:
> + prev = aliasname;
> + aliasname = temp;
> + aliasname[0] = '\0';
> }
>
> grub_free (devtype);
> + grub_free (prev);
> grub_free (aliasname);
> return ret;
> }
I will give this patch a try on top of mine then and see how it behaves.
--
Len Sorensen
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 14:52 ` Lennart Sorensen
@ 2010-07-28 15:29 ` Doug Nazar
0 siblings, 0 replies; 25+ messages in thread
From: Doug Nazar @ 2010-07-28 15:29 UTC (permalink / raw)
To: grub-devel
On 2010-07-28 10:52 AM, Lennart Sorensen wrote:
>
> I have booted Debian powerpc on qemu before without any issues.
> It just worked. Well initially the install couldn't boot because it
> was using quik at the time (oldworld mac emulation), and the ramdisk
> image was too large for quik. I fixed quik to support larger ramdisks
> (and ext3 and such just by using a new libe2fs which also broke it).
>
This is my first foray into PPC. I've worked with SPARC so I have some
familiarity with OF but OpenBios just didn't seem to handle what I
remembered and the help/docs leave a lot to be desired. QEMU/Ubuntu
mentioned different machine types and several other things that I wasn't
sure about. Probably makes a lot of sense if you're coming from real
hardware.
> Without a devalias, grub can't find ANY disks on the IBM firmware.
> With a devalias created, it works fine. I suppose this qualifies as a
> bug in grub.
>
Right now the iterate function only scans the /aliases node for block
devices. I'd rather it scan the real devices however I still want the
aliases to show up in 'ls'. Still thinking about how to do this without
too much surgery.
> It is not too bad. I haven't noticed anything too awful in the display.
> It messes up a lot when it starts scrolling the screen though, so I have
> to 'clear' and then continue to keep it legible. The menu looks fine.
That's what I was seeing. Turns out the screen-#rows default was way
off. Resetting that before booting grub made everything pretty.
I didn't dig too deep into Grub's OF console handling to see how it
handled the output. I saw it read the rows & columns from OF and just
assumed it was an OpenBios bug. Perhaps it uses a different font or
something and needs to recalc the height. I'll take a look.
Doug
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 15:00 ` Lennart Sorensen
@ 2010-07-28 15:51 ` Lennart Sorensen
2010-07-28 16:30 ` Doug Nazar
2010-07-28 15:52 ` Doug Nazar
1 sibling, 1 reply; 25+ messages in thread
From: Lennart Sorensen @ 2010-07-28 15:51 UTC (permalink / raw)
To: Lennart Sorensen; +Cc: The development of GNU GRUB
On Wed, Jul 28, 2010 at 11:00:37AM -0400, Lennart Sorensen wrote:
> On Wed, Jul 28, 2010 at 04:51:24AM -0400, Doug Nazar wrote:
> > Ok, finally made some progress. Ran into several issues, some of them
> > obviously QEMU/OpenBios that I'm not sure if GRUB should work around.
> > With your patch the raid mostly worked, small problem with too many
> > devices because OpenBios creates several aliases for some devices. Also
> > I think you missed the endianess when setting the level.
>
> OK, no idea how I got by without that. It is currently working for me.
> Weird.
>
> I only use raid1 of course, so that is all I tested with.
>
> > This patch includes the following:
> >
> > - Fix the ofdisk_hash system. We weren't making a copy of the devpath so
> > never found the cached item again.
>
> Could this have anything to do with why I can't see disks without
> devaliases assigned?
>
> > - Extend the ofdisk_hash to cache the disk size
> > - Scan for a disk size using seek (probably want to set a different
> > start size). Required for metadata 1.0 arrays
> > - Optimize checking of raid level
> > - If we find a duplicate disk (claims to be same index in the array),
> > skip it or else level 0 arrays wont be found
> > - QEMU/OpenBios doesn't seem to like if the prev & name parameters of
> > ieee1275_next_property are the same pointer which caused no devices to
> > be found
>
> QEMU/openbios has many bugs unfortunately. If I could make any sense
> of the code I would try to fix some of them, but I simply can't follow
> that code.
>
> > The issues that I came across which are in QEMU/OpenBios:
> >
> > - The rows are misreported. screen-#rows is set to 75 when in fact there
> > are only 60 rows. Worked around using -prom-env parameter
> > - Aliases don't take into account the index (i.e. disk@1). I ended up with
> > disk /pci/pci-ata/ata-1/disk
> > hd /pci/pci-ata/ata-1/disk
> > ide0 /pci/pci-ata/ata-1/disk
> > ide1 /pci/mac-io/ata-3/disk
> > ide2 /pci/mac-io/ata-3/disk
> >
> > when ide1 should be disk@0 and ide2 should be disk@1
> > - boot command hangs when passed wrong disk or used from boot-command.
> > Worked around by using load & go
> >
> > Things do work, and fixing QEMU/OpenBios is a bit further down the
> > rabbit hole than I want to go. ;-)
Well the patch didn't break anything for me, but didn't seem to change
anything either. So so far not a problem.
I was wondering a bit why tab completion for ls doesn't work on md 1.x devices. If I do:
ls (md/0<tab>, I get
ls (md/0,
Well since I don't have any partitions, that doesn't help anything).
Also if I do:
ls (md/0)/<tab> I get nothing. ls (md/0)/ does work, but the tab
completion refuses to believe it could work. I miss tab completition. :)
--
Len Sorensen
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 15:00 ` Lennart Sorensen
2010-07-28 15:51 ` Lennart Sorensen
@ 2010-07-28 15:52 ` Doug Nazar
2010-07-28 15:55 ` Lennart Sorensen
1 sibling, 1 reply; 25+ messages in thread
From: Doug Nazar @ 2010-07-28 15:52 UTC (permalink / raw)
To: grub-devel
On 2010-07-28 11:00 AM, Lennart Sorensen wrote:
>
> OK, no idea how I got by without that. It is currently working for me.
> Weird.
>
> I only use raid1 of course, so that is all I tested with.
>
It only mattered for multipath. Which is a kind of raid1 setup. Grub
doesn't handle it too well since the underlying device is the same and
has the same disk number. Before we'd overwrite the old path with the
last path found and incorrectly increase nr_devs. With my patch we just
ignore additional drives. So we'll die if we lose the path while booting
but otherwise it'll find at least one of the paths.
>> - Fix the ofdisk_hash system. We weren't making a copy of the devpath so
>> never found the cached item again.
> Could this have anything to do with why I can't see disks without
> devaliases assigned?
>
No. This would only affect the disk cache subsystem (since the hash
struct pointer was used as the disk id) and we'd be comparing the
devpath with random memory.
>
> QEMU/openbios has many bugs unfortunately. If I could make any sense
> of the code I would try to fix some of them, but I simply can't follow
> that code.
>
I've got a basic idea how it all hangs together internally and I started
to fix a few things in OpenBios but discovered the current source
doesn't work with QEMU 0.12.5. "boot" stopped working and if I used
"load" it started to work then complained that it was trying to
overwrite OpenBios. Since I have no idea how the memory is supposed to
be laid out I kinda backed away. Besides, Forth is not my first choice
of languages to figure out.
> I will give this patch a try on top of mine then and see how it behaves.
>
Thanks for testing,
Doug
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 15:52 ` Doug Nazar
@ 2010-07-28 15:55 ` Lennart Sorensen
0 siblings, 0 replies; 25+ messages in thread
From: Lennart Sorensen @ 2010-07-28 15:55 UTC (permalink / raw)
To: The development of GNU GRUB
On Wed, Jul 28, 2010 at 11:52:00AM -0400, Doug Nazar wrote:
> It only mattered for multipath. Which is a kind of raid1 setup. Grub
> doesn't handle it too well since the underlying device is the same and
> has the same disk number. Before we'd overwrite the old path with the
> last path found and incorrectly increase nr_devs. With my patch we just
> ignore additional drives. So we'll die if we lose the path while booting
> but otherwise it'll find at least one of the paths.
OK, that's what I thought.
> No. This would only affect the disk cache subsystem (since the hash
> struct pointer was used as the disk id) and we'd be comparing the
> devpath with random memory.
OK. I can live with having to create a devalias for now.
> I've got a basic idea how it all hangs together internally and I started
> to fix a few things in OpenBios but discovered the current source
> doesn't work with QEMU 0.12.5. "boot" stopped working and if I used
> "load" it started to work then complained that it was trying to
> overwrite OpenBios. Since I have no idea how the memory is supposed to
> be laid out I kinda backed away. Besides, Forth is not my first choice
> of languages to figure out.
Yeah that's what I decided too last time I tried to fix a bug in OpenBios
for qemu.
> Thanks for testing,
No problem.
--
Len Sorensen
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 15:51 ` Lennart Sorensen
@ 2010-07-28 16:30 ` Doug Nazar
2010-07-28 17:01 ` Lennart Sorensen
0 siblings, 1 reply; 25+ messages in thread
From: Doug Nazar @ 2010-07-28 16:30 UTC (permalink / raw)
To: grub-devel
On 2010-07-28 11:51 AM, Lennart Sorensen wrote:
> I was wondering a bit why tab completion for ls doesn't work on md 1.x
> devices. If I do:
> ls (md/0<tab>, I get
> ls (md/0,
> Well since I don't have any partitions, that doesn't help anything).
>
> Also if I do:
> ls (md/0)/<tab> I get nothing. ls (md/0)/ does work, but the tab
> completion refuses to believe it could work. I miss tab completition. :)
>
Think I spotted this by code review.
normal/completion.c:250
dir = grub_strchr (current_word, '/');
It looks for a starting '/' but I bet it's hitting the '/' in the
new-style md names. It should probably be
dir = grub_strchr (current_word + grub_strlen(device), '/');
or something similar. I've just started setting up a test since all my
arrays were under an LVM. It'll take me a little while to test with this
slow setup.
Doug
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 16:30 ` Doug Nazar
@ 2010-07-28 17:01 ` Lennart Sorensen
2010-07-28 17:12 ` Doug Nazar
0 siblings, 1 reply; 25+ messages in thread
From: Lennart Sorensen @ 2010-07-28 17:01 UTC (permalink / raw)
To: The development of GNU GRUB
On Wed, Jul 28, 2010 at 12:30:47PM -0400, Doug Nazar wrote:
> Think I spotted this by code review.
>
> normal/completion.c:250
> dir = grub_strchr (current_word, '/');
>
> It looks for a starting '/' but I bet it's hitting the '/' in the
> new-style md names. It should probably be
>
> dir = grub_strchr (current_word + grub_strlen(device), '/');
>
> or something similar. I've just started setting up a test since all my
> arrays were under an LVM. It'll take me a little while to test with this
> slow setup.
Well I don't think that helped (unless I messed up the compile).
--
Len Sorensen
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 17:01 ` Lennart Sorensen
@ 2010-07-28 17:12 ` Doug Nazar
2010-07-28 17:40 ` Lennart Sorensen
2010-07-28 17:42 ` Lennart Sorensen
0 siblings, 2 replies; 25+ messages in thread
From: Doug Nazar @ 2010-07-28 17:12 UTC (permalink / raw)
To: grub-devel
[-- Attachment #1: Type: text/plain, Size: 214 bytes --]
On 2010-07-28 1:01 PM, Lennart Sorensen wrote:
> Well I don't think that helped (unless I messed up the compile).
>
The following works for me. Just finished testing it. Damn these 15
minute boot times.
Doug
[-- Attachment #2: grub-fix-tab-completion-for-new-md-names.diff --]
[-- Type: text/plain, Size: 365 bytes --]
=== modified file 'normal/completion.c'
--- normal/completion.c 2010-06-11 20:31:16 +0000
+++ normal/completion.c 2010-07-28 17:02:42 +0000
@@ -247,7 +247,7 @@
goto fail;
}
- dir = grub_strchr (current_word, '/');
+ dir = grub_strchr (current_word + grub_strlen(device) + 2, '/');
last_dir = grub_strrchr (current_word, '/');
if (dir)
{
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 17:12 ` Doug Nazar
@ 2010-07-28 17:40 ` Lennart Sorensen
2010-07-28 17:42 ` Lennart Sorensen
1 sibling, 0 replies; 25+ messages in thread
From: Lennart Sorensen @ 2010-07-28 17:40 UTC (permalink / raw)
To: The development of GNU GRUB
On Wed, Jul 28, 2010 at 01:12:16PM -0400, Doug Nazar wrote:
> On 2010-07-28 1:01 PM, Lennart Sorensen wrote:
>> Well I don't think that helped (unless I messed up the compile).
>>
>
> The following works for me. Just finished testing it. Damn these 15
> minute boot times.
>
> Doug
>
> === modified file 'normal/completion.c'
> --- normal/completion.c 2010-06-11 20:31:16 +0000
> +++ normal/completion.c 2010-07-28 17:02:42 +0000
> @@ -247,7 +247,7 @@
> goto fail;
> }
>
> - dir = grub_strchr (current_word, '/');
> + dir = grub_strchr (current_word + grub_strlen(device) + 2, '/');
> last_dir = grub_strrchr (current_word, '/');
> if (dir)
> {
>
I was wondering if it needed + 2 to account for the parens. I wasn't
sure the current_word always contained a device name though.
--
Len Sorensen
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 17:12 ` Doug Nazar
2010-07-28 17:40 ` Lennart Sorensen
@ 2010-07-28 17:42 ` Lennart Sorensen
2010-07-28 17:52 ` Lennart Sorensen
2010-07-28 18:01 ` Doug Nazar
1 sibling, 2 replies; 25+ messages in thread
From: Lennart Sorensen @ 2010-07-28 17:42 UTC (permalink / raw)
To: The development of GNU GRUB
On Wed, Jul 28, 2010 at 01:12:16PM -0400, Doug Nazar wrote:
> On 2010-07-28 1:01 PM, Lennart Sorensen wrote:
>> Well I don't think that helped (unless I messed up the compile).
>>
>
> The following works for me. Just finished testing it. Damn these 15
> minute boot times.
>
> Doug
>
> === modified file 'normal/completion.c'
> --- normal/completion.c 2010-06-11 20:31:16 +0000
> +++ normal/completion.c 2010-07-28 17:02:42 +0000
> @@ -247,7 +247,7 @@
> goto fail;
> }
>
> - dir = grub_strchr (current_word, '/');
> + dir = grub_strchr (current_word + grub_strlen(device) + 2, '/');
> last_dir = grub_strrchr (current_word, '/');
> if (dir)
> {
>
I wish the completion would realize that an md device without partitions
(so no partition table) should not complete with a ','. Not sure how
doable that is.
--
Len Sorensen
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 17:42 ` Lennart Sorensen
@ 2010-07-28 17:52 ` Lennart Sorensen
2010-07-28 18:17 ` Doug Nazar
2010-07-28 18:01 ` Doug Nazar
1 sibling, 1 reply; 25+ messages in thread
From: Lennart Sorensen @ 2010-07-28 17:52 UTC (permalink / raw)
To: The development of GNU GRUB
On Wed, Jul 28, 2010 at 01:42:19PM -0400, Lennart Sorensen wrote:
> On Wed, Jul 28, 2010 at 01:12:16PM -0400, Doug Nazar wrote:
> > On 2010-07-28 1:01 PM, Lennart Sorensen wrote:
> >> Well I don't think that helped (unless I messed up the compile).
> >>
> >
> > The following works for me. Just finished testing it. Damn these 15
> > minute boot times.
> >
> > Doug
> >
>
> > === modified file 'normal/completion.c'
> > --- normal/completion.c 2010-06-11 20:31:16 +0000
> > +++ normal/completion.c 2010-07-28 17:02:42 +0000
> > @@ -247,7 +247,7 @@
> > goto fail;
> > }
> >
> > - dir = grub_strchr (current_word, '/');
> > + dir = grub_strchr (current_word + grub_strlen(device) + 2, '/');
> > last_dir = grub_strrchr (current_word, '/');
> > if (dir)
> > {
> >
>
> I wish the completion would realize that an md device without partitions
> (so no partition table) should not complete with a ','. Not sure how
> doable that is.
Another thing that puzzles me is that when I boot linux, I always get
a message that says the state is wrong with the superblock on the sda
partitions, but the sdb partitions have the right state. I have no idea
if grub messes with this, or if the system isn't shutting down the raid
properly on reboots. It manages to assemble the raid and fix it
automatically, but the error is strange since I have never seen that
before. Of course I haven't really used 1.x raid before either, so
maybe that's part of it.
--
Len Sorensen
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 17:42 ` Lennart Sorensen
2010-07-28 17:52 ` Lennart Sorensen
@ 2010-07-28 18:01 ` Doug Nazar
2010-07-28 18:50 ` Lennart Sorensen
1 sibling, 1 reply; 25+ messages in thread
From: Doug Nazar @ 2010-07-28 18:01 UTC (permalink / raw)
To: grub-devel
On 2010-07-28 1:42 PM, Lennart Sorensen wrote:
> I wish the completion would realize that an md device without partitions
> (so no partition table) should not complete with a ','. Not sure how
> doable that is.
>
I was just looking at that. In disk/raid.c:129 we set the flag that
tells the system to try looking for partitions. I've been debating
whether to add code to disk/raid.c to detect if there actually is a
partition table (by probing for all known partition types) or fix the
tab completion code to offer a ')' also instead of assuming you want a
partition.
I'm not sure of the GRUB design philosophy, but the first option might
be considered a layering violation. Maybe grub_device_open() should
check and then clear the flag if it doesn't have a table.
Doug
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 17:52 ` Lennart Sorensen
@ 2010-07-28 18:17 ` Doug Nazar
2010-07-28 18:49 ` Lennart Sorensen
0 siblings, 1 reply; 25+ messages in thread
From: Doug Nazar @ 2010-07-28 18:17 UTC (permalink / raw)
To: grub-devel
On 2010-07-28 1:52 PM, Lennart Sorensen wrote:
>
> Another thing that puzzles me is that when I boot linux, I always get
> a message that says the state is wrong with the superblock on the sda
> partitions, but the sdb partitions have the right state. I have no idea
> if grub messes with this, or if the system isn't shutting down the raid
> properly on reboots. It manages to assemble the raid and fix it
> automatically, but the error is strange since I have never seen that
> before. Of course I haven't really used 1.x raid before either, so
> maybe that's part of it.
>
It really shouldn't be doing that. Grub shouldn't be doing any writes at
all during boot (unless you're using some of the more exotic commands or
save_env) and the raid code specifically errors on writes to it.
If it was linux I'd expect both drives to be wrong unless there is
something silly in the init scripts.
When you say it fixes it, is that a full re-sync or something else?
Could you check the superblocks? In grub use the following on each
device for 1.2 arrays.
hexdump -s 4096 -n 256 (dev,p)
Doug
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 18:17 ` Doug Nazar
@ 2010-07-28 18:49 ` Lennart Sorensen
2010-07-28 20:10 ` Doug Nazar
2010-07-28 20:46 ` Doug Nazar
0 siblings, 2 replies; 25+ messages in thread
From: Lennart Sorensen @ 2010-07-28 18:49 UTC (permalink / raw)
To: The development of GNU GRUB
On Wed, Jul 28, 2010 at 02:17:10PM -0400, Doug Nazar wrote:
> It really shouldn't be doing that. Grub shouldn't be doing any writes at
> all during boot (unless you're using some of the more exotic commands or
> save_env) and the raid code specifically errors on writes to it.
>
> If it was linux I'd expect both drives to be wrong unless there is
> something silly in the init scripts.
>
> When you say it fixes it, is that a full re-sync or something else?
> Could you check the superblocks? In grub use the following on each
> device for 1.2 arrays.
Well it doesn't have to resync. It seems like it looks at both drives,
decides the first is wrong, the second is right, and then goes from there.
I suspect it is something in Linux doing it.
It says:
Loading, please wait...
mdadm: device 1 in /dev/md/0 has wrong state in superblock, but /dev/sdb2 seems ok
mdadm: /dev/md/0 has been started with 2 drives.
mdadm: device 1 in /dev/md/1 has wrong state in superblock, but /dev/sdb3 seems ok
mdadm: /dev/md/1 has been started with 2 drives.
mdadm: device 1 in /dev/md/2 has wrong state in superblock, but /dev/sdb4 seems ok
mdadm: /dev/md/2 has been started with 2 drives.
> hexdump -s 4096 -n 256 (dev,p)
grub> hexdump -s 4096 -n 256 (hd1,2)
00001000 fc 4e 2b a9 01 00 00 00 00 00 00 00 00 00 00 00 |.N+.............|
00001010 5d 42 7f 8b 3f 7d 35 69 13 c7 e1 e6 f8 ef 04 e3 |]B..?}5i........|
00001020 72 63 65 6e 67 30 33 3a 30 00 00 00 00 00 00 00 |rceng03:0.......|
00001030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001040 3d ba 4d 4c 00 00 00 00 01 00 00 00 00 00 00 00 |=.ML............|
00001050 e8 cf 1d 00 00 00 00 00 00 00 00 00 02 00 00 00 |................|
00001060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001080 18 00 00 00 00 00 00 00 e8 cf 1d 00 00 00 00 00 |................|
00001090 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000010a0 00 00 00 00 00 00 00 00 27 71 5c d1 ee 42 35 27 |........'q\..B5'|
000010b0 51 ca 3d f6 05 7f 3c 12 00 00 00 00 00 00 00 00 |Q.=...<.........|
000010c0 dc 78 50 4c 00 00 00 00 20 00 00 00 00 00 00 00 |.xPL.... .......|
000010d0 ff ff ff ff ff ff ff ff ca 2b 46 aa 80 01 00 00 |.........+F.....|
000010e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000010f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
grub> hexdump -s 4096 -n 256 (hd2,2)
00001000 fc 4e 2b a9 01 00 00 00 00 00 00 00 00 00 00 00 |.N+.............|
00001010 5d 42 7f 8b 3f 7d 35 69 13 c7 e1 e6 f8 ef 04 e3 |]B..?}5i........|
00001020 72 63 65 6e 67 30 33 3a 30 00 00 00 00 00 00 00 |rceng03:0.......|
00001030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001040 3d ba 4d 4c 00 00 00 00 01 00 00 00 00 00 00 00 |=.ML............|
00001050 e8 cf 1d 00 00 00 00 00 00 00 00 00 02 00 00 00 |................|
00001060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001080 18 00 00 00 00 00 00 00 e8 cf 1d 00 00 00 00 00 |................|
00001090 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000010a0 01 00 00 00 00 00 00 00 c5 5e 45 dc 14 b4 c6 c4 |.........^E.....|
000010b0 f5 bf 73 bc 05 9b c1 21 00 00 00 00 00 00 00 00 |..s....!........|
000010c0 dc 78 50 4c 00 00 00 00 20 00 00 00 00 00 00 00 |.xPL.... .......|
000010d0 ff ff ff ff ff ff ff ff 34 9c 7b 28 80 01 00 00 |........4.{(....|
000010e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000010f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
That's the first raid1.
--
Len Sorensen
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 18:01 ` Doug Nazar
@ 2010-07-28 18:50 ` Lennart Sorensen
0 siblings, 0 replies; 25+ messages in thread
From: Lennart Sorensen @ 2010-07-28 18:50 UTC (permalink / raw)
To: The development of GNU GRUB
On Wed, Jul 28, 2010 at 02:01:46PM -0400, Doug Nazar wrote:
> I was just looking at that. In disk/raid.c:129 we set the flag that
> tells the system to try looking for partitions. I've been debating
> whether to add code to disk/raid.c to detect if there actually is a
> partition table (by probing for all known partition types) or fix the
> tab completion code to offer a ')' also instead of assuming you want a
> partition.
>
> I'm not sure of the GRUB design philosophy, but the first option might
> be considered a layering violation. Maybe grub_device_open() should
> check and then clear the flag if it doesn't have a table.
Certainly is seems USB keys and similar may hit the same problem.
Sometimes they have partitions and sometimes they don't. So not really
the raid code's fault.
--
Len Sorensen
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 18:49 ` Lennart Sorensen
@ 2010-07-28 20:10 ` Doug Nazar
2010-07-28 22:25 ` Lennart Sorensen
2010-07-28 20:46 ` Doug Nazar
1 sibling, 1 reply; 25+ messages in thread
From: Doug Nazar @ 2010-07-28 20:10 UTC (permalink / raw)
To: grub-devel
On 2010-07-28 2:49 PM, Lennart Sorensen wrote:
>
> mdadm: device 1 in /dev/md/0 has wrong state in superblock, but /dev/sdb2 seems
Ok, I think this is a bug in mdadm. I couldn't replicate it on a x86 box
but then tried it in the ppc image and it got the same message. Not sure
if it's because of an endian issue or version issue my x86 boxes have
version 3.1.2 while the Ubuntu ppc is running 2.6.7.1.
The error seems kinda strange as devices are 0 numbered so sdb2 is
probably device 1 (I assume sda2 is device 0). Unfortunately the mdadm
source has half a dozen structs with the same members so it's almost
impossible to grep for where things are. It looks like that error
happens when it's detecting if the drives are in the right roles.
There's a __cpu_to_le32 that seems suspicious to me (super1.c:676).
What version of mdadm do you have?
> That's the first raid1.
>
The superblocks are correct.
Doug
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 18:49 ` Lennart Sorensen
2010-07-28 20:10 ` Doug Nazar
@ 2010-07-28 20:46 ` Doug Nazar
2010-07-29 15:30 ` Lennart Sorensen
1 sibling, 1 reply; 25+ messages in thread
From: Doug Nazar @ 2010-07-28 20:46 UTC (permalink / raw)
To: grub-devel
[-- Attachment #1: Type: text/plain, Size: 241 bytes --]
On 2010-07-28 2:49 PM, Lennart Sorensen wrote:
>
> mdadm: device 1 in /dev/md/0 has wrong state in superblock, but /dev/sdb2 seems ok
> mdadm: /dev/md/0 has been started with 2 drives.
Ok, it was an endian issue. Here's the patch.
Doug
[-- Attachment #2: mdadm-assemble-roles-endian-fix.diff --]
[-- Type: text/plain, Size: 460 bytes --]
--- mdadm-3.1.2/super1.c 2010-03-09 18:26:44.000000000 -0500
+++ mdadm-3.1.2.dev/super1.c 2010-07-28 16:41:41.000000000 -0400
@@ -673,10 +673,10 @@
int d = info->disk.number;
int want;
if (info->disk.state == 6)
- want = __cpu_to_le32(info->disk.raid_disk);
+ want = info->disk.raid_disk;
else
want = 0xFFFF;
- if (sb->dev_roles[d] != want) {
+ if (__le16_to_cpu(sb->dev_roles[d]) != want) {
sb->dev_roles[d] = want;
rv = 1;
}
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 20:10 ` Doug Nazar
@ 2010-07-28 22:25 ` Lennart Sorensen
0 siblings, 0 replies; 25+ messages in thread
From: Lennart Sorensen @ 2010-07-28 22:25 UTC (permalink / raw)
To: The development of GNU GRUB
On Wed, Jul 28, 2010 at 04:10:58PM -0400, Doug Nazar wrote:
> On 2010-07-28 2:49 PM, Lennart Sorensen wrote:
>>
>> mdadm: device 1 in /dev/md/0 has wrong state in superblock, but /dev/sdb2 seems
> Ok, I think this is a bug in mdadm. I couldn't replicate it on a x86 box
> but then tried it in the ppc image and it got the same message. Not sure
> if it's because of an endian issue or version issue my x86 boxes have
> version 3.1.2 while the Ubuntu ppc is running 2.6.7.1.
>
> The error seems kinda strange as devices are 0 numbered so sdb2 is
> probably device 1 (I assume sda2 is device 0). Unfortunately the mdadm
> source has half a dozen structs with the same members so it's almost
> impossible to grep for where things are. It looks like that error
> happens when it's detecting if the drives are in the right roles.
> There's a __cpu_to_le32 that seems suspicious to me (super1.c:676).
>
> What version of mdadm do you have?
3.0.3.
>> That's the first raid1.
>>
> The superblocks are correct.
Well that's good. They do appear to work.
--
Len Sorensen
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 20:46 ` Doug Nazar
@ 2010-07-29 15:30 ` Lennart Sorensen
0 siblings, 0 replies; 25+ messages in thread
From: Lennart Sorensen @ 2010-07-29 15:30 UTC (permalink / raw)
To: The development of GNU GRUB
On Wed, Jul 28, 2010 at 04:46:39PM -0400, Doug Nazar wrote:
> On 2010-07-28 2:49 PM, Lennart Sorensen wrote:
>>
>> mdadm: device 1 in /dev/md/0 has wrong state in superblock, but /dev/sdb2 seems ok
>> mdadm: /dev/md/0 has been started with 2 drives.
>
> Ok, it was an endian issue. Here's the patch.
>
> Doug
>
> --- mdadm-3.1.2/super1.c 2010-03-09 18:26:44.000000000 -0500
> +++ mdadm-3.1.2.dev/super1.c 2010-07-28 16:41:41.000000000 -0400
> @@ -673,10 +673,10 @@
> int d = info->disk.number;
> int want;
> if (info->disk.state == 6)
> - want = __cpu_to_le32(info->disk.raid_disk);
> + want = info->disk.raid_disk;
> else
> want = 0xFFFF;
> - if (sb->dev_roles[d] != want) {
> + if (__le16_to_cpu(sb->dev_roles[d]) != want) {
> sb->dev_roles[d] = want;
> rv = 1;
> }
Great. That works. Have you submitted that to the mdadm developers?
--
Len Sorensen
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Big Endian fix patch
2010-07-28 8:51 ` Doug Nazar
2010-07-28 15:00 ` Lennart Sorensen
@ 2010-09-13 19:54 ` Vladimir 'φ-coder/phcoder' Serbinenko
1 sibling, 0 replies; 25+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2010-09-13 19:54 UTC (permalink / raw)
To: grub-devel, Doug Nazar
[-- Attachment #1: Type: text/plain, Size: 2953 bytes --]
On 07/28/2010 10:51 AM, Doug Nazar wrote:
> On 2010-07-27 11:26 AM, Lennart Sorensen wrote:
>> On Mon, Jul 26, 2010 at 09:00:53PM -0400, Doug Nazar wrote:
>>
>>> I'll run it through the grinder in a bit.
>> So with my patch I am booting of the md raid1 very successfully (other
>> than manually having to fix grub-mkimage run since grub-install doesn't
>> understand the machine yet. I am working on fixing that now.)
>
> Ok, finally made some progress. Ran into several issues, some of them
> obviously QEMU/OpenBios that I'm not sure if GRUB should work around.
> With your patch the raid mostly worked, small problem with too many
> devices because OpenBios creates several aliases for some devices.
> Also I think you missed the endianess when setting the level.
>
> This patch includes the following:
>
> - Fix the ofdisk_hash system. We weren't making a copy of the devpath
> so never found the cached item again.
> - Extend the ofdisk_hash to cache the disk size
> - Scan for a disk size using seek (probably want to set a different
> start size). Required for metadata 1.0 arrays
> - Optimize checking of raid level
> - If we find a duplicate disk (claims to be same index in the array),
> skip it or else level 0 arrays wont be found
> - QEMU/OpenBios doesn't seem to like if the prev & name parameters of
> ieee1275_next_property are the same pointer which caused no devices to
> be found
>
I merged ofdisk branch so I believe most of these fixes aren't necessary
anymore except determining the disk size. Some firmwares seem to react
badly to size scanning (like my Netra X-1 does). How long does it take
on your system? Perhaps we should introduce a separate function
-get_disk_size_really_need_it on disk device which is used for the case
like this when total_sectors is set to UNKNOWN but it's needed for
something more substantial then reporting to user or sanity checks.
Could you adjust your patch accordingly?
> The issues that I came across which are in QEMU/OpenBios:
>
> - The rows are misreported. screen-#rows is set to 75 when in fact
> there are only 60 rows. Worked around using -prom-env parameter
> - Aliases don't take into account the index (i.e. disk@1). I ended up
> with
> disk /pci/pci-ata/ata-1/disk
> hd /pci/pci-ata/ata-1/disk
> ide0 /pci/pci-ata/ata-1/disk
> ide1 /pci/mac-io/ata-3/disk
> ide2 /pci/mac-io/ata-3/disk
>
> when ide1 should be disk@0 and ide2 should be disk@1
> - boot command hangs when passed wrong disk or used from boot-command.
> Worked around by using load & go
>
> Things do work, and fixing QEMU/OpenBios is a bit further down the
> rabbit hole than I want to go. ;-)
>
> Thanks,
> Doug
>
>
> _______________________________________________
> Grub-devel mailing list
> Grub-devel@gnu.org
> http://lists.gnu.org/mailman/listinfo/grub-devel
>
--
Regards
Vladimir 'φ-coder/phcoder' Serbinenko
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 294 bytes --]
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2010-09-13 19:55 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-07-27 1:00 Big Endian fix patch (was: Re: Couple more fixes for Linux raid metadata 1.x support) Doug Nazar
2010-07-27 15:26 ` Lennart Sorensen
2010-07-27 23:58 ` Big Endian fix patch Doug Nazar
2010-07-28 14:52 ` Lennart Sorensen
2010-07-28 15:29 ` Doug Nazar
2010-07-28 8:51 ` Doug Nazar
2010-07-28 15:00 ` Lennart Sorensen
2010-07-28 15:51 ` Lennart Sorensen
2010-07-28 16:30 ` Doug Nazar
2010-07-28 17:01 ` Lennart Sorensen
2010-07-28 17:12 ` Doug Nazar
2010-07-28 17:40 ` Lennart Sorensen
2010-07-28 17:42 ` Lennart Sorensen
2010-07-28 17:52 ` Lennart Sorensen
2010-07-28 18:17 ` Doug Nazar
2010-07-28 18:49 ` Lennart Sorensen
2010-07-28 20:10 ` Doug Nazar
2010-07-28 22:25 ` Lennart Sorensen
2010-07-28 20:46 ` Doug Nazar
2010-07-29 15:30 ` Lennart Sorensen
2010-07-28 18:01 ` Doug Nazar
2010-07-28 18:50 ` Lennart Sorensen
2010-07-28 15:52 ` Doug Nazar
2010-07-28 15:55 ` Lennart Sorensen
2010-09-13 19:54 ` Vladimir 'φ-coder/phcoder' Serbinenko
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.