* Re: Bug#478238: grub-probe: fails to find drive for /dev/sda10
[not found] ` <20080506133126.GG5055@thorin>
@ 2008-05-11 11:35 ` Török Edwin
2008-05-12 15:44 ` Robert Millan
0 siblings, 1 reply; 4+ messages in thread
From: Török Edwin @ 2008-05-11 11:35 UTC (permalink / raw)
To: grub-devel
[-- Attachment #1: Type: text/plain, Size: 3823 bytes --]
[sending to grub-devel@ as requested]
Robert Millan wrote:
> On Sun, May 04, 2008 at 05:01:32PM +0300, Török Edwin wrote:
>
>>>> Device Boot Start End Blocks Id System
>>>> /dev/sda1 * 1 1275 10241406 7 HPFS/NTFS
>>>> /dev/sda2 1276 2248 7815622+ a6 OpenBSD
>>>> /dev/sda3 2249 5289 24426832+ f W95 Ext'd (LBA)
>>>> /dev/sda4 6080 7296 9775552+ bf Solaris
>>>> /dev/sda5 2249 2371 987966 82 Linux swap / Solaris
>>>> /dev/sda6 2372 3587 9767488+ 83 Linux
>>>> /dev/sda7 3588 3600 104391 83 Linux
>>>> /dev/sda8 3601 4863 10145016 8e Linux LVM
>>>> /dev/sda9 4864 5228 2931831 a6 OpenBSD
>>>> /dev/sda10 5229 5289 489951 83 Linux
>>>>
>> [...]
>> grub> ls (hd0,10)
>> error: unknown device
>> grub> ls (hd0,11)
>> error: unknown device
>> grub>
>>
>
> I tried reproducing your setup, but I can't hit the same bug. This starts to
> look really nasty. Just spotted this:
>
> /build/buildd/grub2-1.96+20080426/partmap/pc.c:141: partition 0: flag 0x80, type 0x7, start 0x3f, len 0x1388afc
> [...]
> /build/buildd/grub2-1.96+20080426/partmap/pc.c:141: partition 0: flag 0x0, type 0x82, start 0x2270f07, len 0x1e267c
>
> for which I can't find any explanation other than memory corruption. Also,
> due to a missing fflush() call the output is somewhat scrambled, which makes
> it harder to track (I fixed this already in upstream).
>
> Could you:
>
> - Apply the attached patch & run grub-probe again (this time output
> will be a bit more readable)
>
There was no patch attached, however I did a 'cvs diff -u -D2008-04-30',
and applied that patch.
I found what the problem is, and it also explains why you couldn't
reproduce the problem.
/dev/sda9 is not a valid OpenBSD partition, and in partmap/pc.c:176 the
iteration fails with an error: invalid disk label magic 0x%x.
If I replace that return with a continue, it works.
The problem is that grub2 stops looking for more partitions as soon as
it encountered the invalid partition,
grub 0.97 was working perfectly and I never noticed the partition has
the wrong type!
Also if I change the partition type to 83 (as it should be) an unpatched
grub-probe can find that /boot is on /dev/sda10:
# grub-probe -t device /boot
/dev/sda10
I think grub2 should handle errors more gracefully, eventually mark the
partition as invalid, and keep going.
grub-probe was looking for /dev/sda10, and it shouldn't be affected by
/dev/sda9 being corrupted/invalid.
Think of it this way: if a partition gets corrupted, that shouldn't
prevent from booting, assuming the boot and root partitions are
still ok.
Compare what grub-emu says when sda9 has wrong type:
grub> ls (hd0,10)
error: unknown device
And this is what it says when sda9 has the correct type:
grub> ls (hd0,10)
Partition hd0,10: Filesystem type ext2, Label debian_BOOT
> - Send it to grub-devel@gnu.org
>
Done
> ?
>
> Maybe someone there has an idea, but if it's memory corruption and we can't
> reproduce it, tracing the problem remotely isn't going to work very well.
>
It wasn't memory corruption, however I have run valgrind and it has
shown some leaks, plus call to stat() with NULL parameter.
The attached patch fixes some valgrind warnings. Some leaks still
remain, I attached the new valgrind logs.
P.S.: grub2 seems to work now, I am able to boot with it with the
text-mode menu. The default graphics mode doesn't work I will open a
separate bug about that.
Best regards,
--Edwin
[-- Attachment #2: grub2.patch --]
[-- Type: text/x-diff, Size: 1033 bytes --]
diff -ur grub2-1.96+20080429/kern/disk.c ../grub2-1.96+20080429/kern/disk.c
--- grub2-1.96+20080429/kern/disk.c 2008-02-08 14:22:51.000000000 +0200
+++ ../grub2-1.96+20080429/kern/disk.c 2008-05-11 13:58:02.270673755 +0300
@@ -317,7 +317,10 @@
/* Reset the timer. */
grub_last_time = grub_get_rtc ();
- grub_free (disk->partition);
+ if(disk->partition) {
+ grub_free (disk->partition->data);
+ grub_free (disk->partition);
+ }
grub_free ((void *) disk->name);
grub_free (disk);
}
diff -ur grub2-1.96+20080429/util/grub-probe.c ../grub2-1.96+20080429/util/grub-probe.c
--- grub2-1.96+20080429/util/grub-probe.c 2008-05-11 13:59:14.934811935 +0300
+++ ../grub2-1.96+20080429/util/grub-probe.c 2008-05-11 13:46:21.729236855 +0300
@@ -190,9 +190,10 @@
struct stat st;
grub_fs_t fs;
- stat (path, &st);
+ if(path)
+ stat (path, &st);
- if (st.st_mode == S_IFREG)
+ if (path && st.st_mode == S_IFREG)
{
/* Regular file. Verify that we can read it properly. */
[-- Attachment #3: vallog --]
[-- Type: text/plain, Size: 3800 bytes --]
==25071== Memcheck, a memory error detector.
==25071== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==25071== Using LibVEX rev 1804, a library for dynamic binary translation.
==25071== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==25071== Using valgrind-3.3.0-Debian, a dynamic binary instrumentation framework.
==25071== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==25071== For more details, rerun with: -v
==25071==
==25071== My PID = 25071, parent PID = 5663. Prog and args are:
==25071== ./grub-probe
==25071== -d
==25071== /dev/sda10
==25071==
==25071== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==25071== This could cause spurious value errors to appear.
==25071== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25071== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==25071== This could cause spurious value errors to appear.
==25071== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25071== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==25071== This could cause spurious value errors to appear.
==25071== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25071==
==25071== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 10 from 1)
==25071== malloc/free: in use at exit: 611,077 bytes in 176 blocks.
==25071== malloc/free: 901 allocs, 725 frees, 2,397,201 bytes allocated.
==25071== For counts of detected errors, rerun with: -v
==25071== searching for pointers to 176 not-freed blocks.
==25071== checked 662,256 bytes.
==25071==
==25071== 4,096 bytes in 1 blocks are possibly lost in loss record 3 of 5
==25071== at 0x4006AB8: malloc (vg_replace_malloc.c:207)
==25071== by 0x804AFE4: xmalloc (misc.c:81)
==25071== by 0x804B41A: grub_malloc (misc.c:222)
==25071== by 0x804C3EB: grub_disk_cache_store (disk.c:162)
==25071== by 0x804CDC1: grub_disk_read (disk.c:461)
==25071== by 0x8069A72: grub_lvm_scan_device (lvm.c:288)
==25071== by 0x804C014: iterate_partition.2134 (device.c:132)
==25071== by 0x8066C9C: pc_partition_map_iterate (pc.c:153)
==25071== by 0x804F3AD: grub_partition_iterate (partition.c:126)
==25071== by 0x804C09D: iterate_disk.2131 (device.c:101)
==25071== by 0x80498FA: call_hook (biosdisk.c:132)
==25071== by 0x804992B: grub_util_biosdisk_iterate (biosdisk.c:141)
==25071==
==25071==
==25071== 41,136 (41,132 direct, 4 indirect) bytes in 12 blocks are definitely lost in loss record 4 of 5
==25071== at 0x4006AB8: malloc (vg_replace_malloc.c:207)
==25071== by 0x804AFE4: xmalloc (misc.c:81)
==25071== by 0x804B41A: grub_malloc (misc.c:222)
==25071== by 0x804C3EB: grub_disk_cache_store (disk.c:162)
==25071== by 0x804CDC1: grub_disk_read (disk.c:461)
==25071== by 0x8066D4E: pc_partition_map_iterate (pc.c:165)
==25071== by 0x804F3AD: grub_partition_iterate (partition.c:126)
==25071== by 0x804C09D: iterate_disk.2131 (device.c:101)
==25071== by 0x80498FA: call_hook (biosdisk.c:132)
==25071== by 0x804992B: grub_util_biosdisk_iterate (biosdisk.c:141)
==25071== by 0x804C4CC: grub_disk_dev_iterate (disk.c:205)
==25071== by 0x804BF63: grub_device_iterate (device.c:138)
==25071==
==25071== LEAK SUMMARY:
==25071== definitely lost: 41,132 bytes in 12 blocks.
==25071== indirectly lost: 4 bytes in 1 blocks.
==25071== possibly lost: 4,096 bytes in 1 blocks.
==25071== still reachable: 565,845 bytes in 162 blocks.
==25071== suppressed: 0 bytes in 0 blocks.
==25071== Reachable blocks (those to which a pointer was found) are not shown.
==25071== To see them, rerun with: --leak-check=full --show-reachable=yes
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Bug#478238: grub-probe: fails to find drive for /dev/sda10
2008-05-11 11:35 ` Bug#478238: grub-probe: fails to find drive for /dev/sda10 Török Edwin
@ 2008-05-12 15:44 ` Robert Millan
2008-05-12 16:02 ` Török Edwin
0 siblings, 1 reply; 4+ messages in thread
From: Robert Millan @ 2008-05-12 15:44 UTC (permalink / raw)
To: The development of GRUB 2
[-- Attachment #1: Type: text/plain, Size: 750 bytes --]
On Sun, May 11, 2008 at 02:35:41PM +0300, Török Edwin wrote:
>
> /dev/sda9 is not a valid OpenBSD partition, and in partmap/pc.c:176 the
> iteration fails with an error: invalid disk label magic 0x%x.
> If I replace that return with a continue, it works.
>
> The problem is that grub2 stops looking for more partitions as soon as
> it encountered the invalid partition,
I think a correct fix for this belongs in grub_partition_iterate(). It should
only let its hook determine abortability rather than mandate that invalid
partitions should cause abortion. Please, can you test the attached patch?
--
Robert Millan
<GPLv2> I know my rights; I want my phone call!
<DRM> What use is a phone call… if you are unable to speak?
(as seen on /.)
[-- Attachment #2: partition_iterate.diff --]
[-- Type: text/x-diff, Size: 524 bytes --]
diff -x configure -x config.h.in -x CVS -x '*~' -x '*.mk' -urp ../grub2/kern/partition.c ./kern/partition.c
--- ../grub2/kern/partition.c 2007-07-22 01:32:26.000000000 +0200
+++ ./kern/partition.c 2008-05-12 17:41:51.000000000 +0200
@@ -89,7 +89,7 @@ grub_partition_iterate (struct grub_disk
const grub_partition_t partition))
{
grub_partition_map_t partmap = 0;
- int ret = 0;
+ int ret = 1;
auto int part_map_iterate (const grub_partition_map_t p);
auto int part_map_iterate_hook (grub_disk_t d,
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Bug#478238: grub-probe: fails to find drive for /dev/sda10
2008-05-12 15:44 ` Robert Millan
@ 2008-05-12 16:02 ` Török Edwin
2008-05-12 18:43 ` Robert Millan
0 siblings, 1 reply; 4+ messages in thread
From: Török Edwin @ 2008-05-12 16:02 UTC (permalink / raw)
To: The development of GRUB 2
[-- Attachment #1: Type: text/plain, Size: 1208 bytes --]
Robert Millan wrote:
> On Sun, May 11, 2008 at 02:35:41PM +0300, Török Edwin wrote:
>
>> /dev/sda9 is not a valid OpenBSD partition, and in partmap/pc.c:176 the
>> iteration fails with an error: invalid disk label magic 0x%x.
>> If I replace that return with a continue, it works.
>>
>> The problem is that grub2 stops looking for more partitions as soon as
>> it encountered the invalid partition,
>>
>
> I think a correct fix for this belongs in grub_partition_iterate(). It should
> only let its hook determine abortability rather than mandate that invalid
> partitions should cause abortion. Please, can you test the attached patch?
>
I tested the patch, however it still says:
grub-probe: error: Cannot find a GRUB drive for /dev/sda10. Check your
device.map.
I notice that /dev/sda10 appears in the -vv output though:
home/egrub-probe: info: opening the device `/dev/sda9'
grub-probe: info: opening the device `/dev/sda9'
grub-probe: info: /dev/sda10 starts from 83987883
grub-probe: info: opening the device hd0
I attached the full output of grub-probe -d /dev/sda10 -vv, for the case
when /dev/sda9 has the wrong partition type.
Best regards,
--Edwin
[-- Attachment #2: log.bz2 --]
[-- Type: application/x-bzip, Size: 2238 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Bug#478238: grub-probe: fails to find drive for /dev/sda10
2008-05-12 16:02 ` Török Edwin
@ 2008-05-12 18:43 ` Robert Millan
0 siblings, 0 replies; 4+ messages in thread
From: Robert Millan @ 2008-05-12 18:43 UTC (permalink / raw)
To: The development of GRUB 2
On Mon, May 12, 2008 at 07:02:46PM +0300, Török Edwin wrote:
> Robert Millan wrote:
> > On Sun, May 11, 2008 at 02:35:41PM +0300, Török Edwin wrote:
> >
> >> /dev/sda9 is not a valid OpenBSD partition, and in partmap/pc.c:176 the
> >> iteration fails with an error: invalid disk label magic 0x%x.
> >> If I replace that return with a continue, it works.
> >>
> >> The problem is that grub2 stops looking for more partitions as soon as
> >> it encountered the invalid partition,
> >>
> >
> > I think a correct fix for this belongs in grub_partition_iterate(). It should
> > only let its hook determine abortability rather than mandate that invalid
> > partitions should cause abortion. Please, can you test the attached patch?
> >
>
> I tested the patch, however it still says:
> grub-probe: error: Cannot find a GRUB drive for /dev/sda10. Check your
> device.map.
>
> I notice that /dev/sda10 appears in the -vv output though:
> home/egrub-probe: info: opening the device `/dev/sda9'
> grub-probe: info: opening the device `/dev/sda9'
> grub-probe: info: /dev/sda10 starts from 83987883
> grub-probe: info: opening the device hd0
>
> I attached the full output of grub-probe -d /dev/sda10 -vv, for the case
> when /dev/sda9 has the wrong partition type.
It's better is you use grub-emu for this test. It's less complex.
--
Robert Millan
<GPLv2> I know my rights; I want my phone call!
<DRM> What use is a phone call… if you are unable to speak?
(as seen on /.)
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2008-05-12 18:44 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <481590B8.7000405@gmail.com>
[not found] ` <20080429134626.GB8328@thorin>
[not found] ` <481DC1BC.6020002@gmail.com>
[not found] ` <20080506133126.GG5055@thorin>
2008-05-11 11:35 ` Bug#478238: grub-probe: fails to find drive for /dev/sda10 Török Edwin
2008-05-12 15:44 ` Robert Millan
2008-05-12 16:02 ` Török Edwin
2008-05-12 18:43 ` Robert Millan
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.