All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Bug#478238: grub-probe: fails to find drive for /dev/sda10
       [not found]     ` <20080506133126.GG5055@thorin>
@ 2008-05-11 11:35       ` Török Edwin
  2008-05-12 15:44         ` Robert Millan
  0 siblings, 1 reply; 4+ messages in thread
From: Török Edwin @ 2008-05-11 11:35 UTC (permalink / raw)
  To: grub-devel

[-- Attachment #1: Type: text/plain, Size: 3823 bytes --]

[sending to grub-devel@ as requested]

Robert Millan wrote:
> On Sun, May 04, 2008 at 05:01:32PM +0300, Török Edwin wrote:
>   
>>>>    Device Boot      Start         End      Blocks   Id  System
>>>> /dev/sda1   *           1        1275    10241406    7  HPFS/NTFS
>>>> /dev/sda2            1276        2248     7815622+  a6  OpenBSD
>>>> /dev/sda3            2249        5289    24426832+   f  W95 Ext'd (LBA)
>>>> /dev/sda4            6080        7296     9775552+  bf  Solaris
>>>> /dev/sda5            2249        2371      987966   82  Linux swap / Solaris
>>>> /dev/sda6            2372        3587     9767488+  83  Linux
>>>> /dev/sda7            3588        3600      104391   83  Linux
>>>> /dev/sda8            3601        4863    10145016   8e  Linux LVM
>>>> /dev/sda9            4864        5228     2931831   a6  OpenBSD
>>>> /dev/sda10           5229        5289      489951   83  Linux
>>>>         
>> [...]
>> grub> ls (hd0,10)
>> error: unknown device
>> grub> ls (hd0,11)
>> error: unknown device
>> grub>
>>     
>
> I tried reproducing your setup, but I can't hit the same bug.  This starts to
> look really nasty.  Just spotted this:
>
>   /build/buildd/grub2-1.96+20080426/partmap/pc.c:141: partition 0: flag 0x80, type 0x7, start 0x3f, len 0x1388afc
>   [...]
>   /build/buildd/grub2-1.96+20080426/partmap/pc.c:141: partition 0: flag 0x0, type 0x82, start 0x2270f07, len 0x1e267c
>
> for which I can't find any explanation other than memory corruption.  Also,
> due to a missing fflush() call the output is somewhat scrambled, which makes
> it harder to track (I fixed this already in upstream).
>
> Could you:
>
>   - Apply the attached patch & run grub-probe again (this time output
>     will be a bit more readable)
>   

There was no patch attached, however I did a 'cvs diff -u -D2008-04-30',
and applied that patch.
I found what the problem is, and it also explains why you couldn't
reproduce the problem.

/dev/sda9 is not a valid OpenBSD partition, and in partmap/pc.c:176 the
iteration fails with an error: invalid disk label magic 0x%x.
If I replace that return with a continue, it works.

The problem is that grub2 stops looking for more partitions as soon as
it encountered the invalid partition,
grub 0.97 was working perfectly and I never noticed the partition has
the wrong type!

Also if I change the partition type to 83 (as it should be) an unpatched
grub-probe can find that /boot is on /dev/sda10:
# grub-probe -t device /boot
/dev/sda10

I think grub2 should handle errors more gracefully, eventually mark the
partition as invalid, and keep going.
grub-probe was looking for /dev/sda10, and it shouldn't be affected by
/dev/sda9 being corrupted/invalid.
Think of it this way: if a partition gets corrupted, that shouldn't
prevent from booting, assuming the boot and root partitions are
still ok.

Compare what grub-emu says when sda9 has wrong type:

grub> ls (hd0,10)
error: unknown device

And this is what it says when sda9 has the correct type:
grub> ls (hd0,10)
      Partition hd0,10: Filesystem type ext2, Label debian_BOOT



>   - Send it to grub-devel@gnu.org
>   
Done
>   ?
>
> Maybe someone there has an idea, but if it's memory corruption and we can't
> reproduce it, tracing the problem remotely isn't going to work very well.
>   

It wasn't memory corruption, however I have run valgrind and it has
shown some leaks, plus call to stat() with NULL parameter.
The attached patch fixes some valgrind warnings. Some leaks still
remain, I attached the new valgrind logs.

P.S.: grub2 seems to work now, I am able to boot with it with the
text-mode menu. The default graphics mode doesn't work I will open a
separate bug about that.

Best regards,
--Edwin



[-- Attachment #2: grub2.patch --]
[-- Type: text/x-diff, Size: 1033 bytes --]

diff -ur grub2-1.96+20080429/kern/disk.c ../grub2-1.96+20080429/kern/disk.c
--- grub2-1.96+20080429/kern/disk.c	2008-02-08 14:22:51.000000000 +0200
+++ ../grub2-1.96+20080429/kern/disk.c	2008-05-11 13:58:02.270673755 +0300
@@ -317,7 +317,10 @@
   /* Reset the timer.  */
   grub_last_time = grub_get_rtc ();
 
-  grub_free (disk->partition);
+  if(disk->partition) {
+	  grub_free (disk->partition->data);
+	  grub_free (disk->partition);
+  }
   grub_free ((void *) disk->name);
   grub_free (disk);
 }
diff -ur grub2-1.96+20080429/util/grub-probe.c ../grub2-1.96+20080429/util/grub-probe.c
--- grub2-1.96+20080429/util/grub-probe.c	2008-05-11 13:59:14.934811935 +0300
+++ ../grub2-1.96+20080429/util/grub-probe.c	2008-05-11 13:46:21.729236855 +0300
@@ -190,9 +190,10 @@
       struct stat st;
       grub_fs_t fs;
 
-      stat (path, &st);
+      if(path)
+	      stat (path, &st);
 
-      if (st.st_mode == S_IFREG)
+      if (path && st.st_mode == S_IFREG)
 	{
 	  /* Regular file.  Verify that we can read it properly.  */
 


[-- Attachment #3: vallog --]
[-- Type: text/plain, Size: 3800 bytes --]

==25071== Memcheck, a memory error detector.
==25071== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==25071== Using LibVEX rev 1804, a library for dynamic binary translation.
==25071== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==25071== Using valgrind-3.3.0-Debian, a dynamic binary instrumentation framework.
==25071== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==25071== For more details, rerun with: -v
==25071== 
==25071== My PID = 25071, parent PID = 5663.  Prog and args are:
==25071==    ./grub-probe
==25071==    -d
==25071==    /dev/sda10
==25071== 
==25071== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==25071==    This could cause spurious value errors to appear.
==25071==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25071== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==25071==    This could cause spurious value errors to appear.
==25071==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25071== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==25071==    This could cause spurious value errors to appear.
==25071==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25071== 
==25071== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 10 from 1)
==25071== malloc/free: in use at exit: 611,077 bytes in 176 blocks.
==25071== malloc/free: 901 allocs, 725 frees, 2,397,201 bytes allocated.
==25071== For counts of detected errors, rerun with: -v
==25071== searching for pointers to 176 not-freed blocks.
==25071== checked 662,256 bytes.
==25071== 
==25071== 4,096 bytes in 1 blocks are possibly lost in loss record 3 of 5
==25071==    at 0x4006AB8: malloc (vg_replace_malloc.c:207)
==25071==    by 0x804AFE4: xmalloc (misc.c:81)
==25071==    by 0x804B41A: grub_malloc (misc.c:222)
==25071==    by 0x804C3EB: grub_disk_cache_store (disk.c:162)
==25071==    by 0x804CDC1: grub_disk_read (disk.c:461)
==25071==    by 0x8069A72: grub_lvm_scan_device (lvm.c:288)
==25071==    by 0x804C014: iterate_partition.2134 (device.c:132)
==25071==    by 0x8066C9C: pc_partition_map_iterate (pc.c:153)
==25071==    by 0x804F3AD: grub_partition_iterate (partition.c:126)
==25071==    by 0x804C09D: iterate_disk.2131 (device.c:101)
==25071==    by 0x80498FA: call_hook (biosdisk.c:132)
==25071==    by 0x804992B: grub_util_biosdisk_iterate (biosdisk.c:141)
==25071== 
==25071== 
==25071== 41,136 (41,132 direct, 4 indirect) bytes in 12 blocks are definitely lost in loss record 4 of 5
==25071==    at 0x4006AB8: malloc (vg_replace_malloc.c:207)
==25071==    by 0x804AFE4: xmalloc (misc.c:81)
==25071==    by 0x804B41A: grub_malloc (misc.c:222)
==25071==    by 0x804C3EB: grub_disk_cache_store (disk.c:162)
==25071==    by 0x804CDC1: grub_disk_read (disk.c:461)
==25071==    by 0x8066D4E: pc_partition_map_iterate (pc.c:165)
==25071==    by 0x804F3AD: grub_partition_iterate (partition.c:126)
==25071==    by 0x804C09D: iterate_disk.2131 (device.c:101)
==25071==    by 0x80498FA: call_hook (biosdisk.c:132)
==25071==    by 0x804992B: grub_util_biosdisk_iterate (biosdisk.c:141)
==25071==    by 0x804C4CC: grub_disk_dev_iterate (disk.c:205)
==25071==    by 0x804BF63: grub_device_iterate (device.c:138)
==25071== 
==25071== LEAK SUMMARY:
==25071==    definitely lost: 41,132 bytes in 12 blocks.
==25071==    indirectly lost: 4 bytes in 1 blocks.
==25071==      possibly lost: 4,096 bytes in 1 blocks.
==25071==    still reachable: 565,845 bytes in 162 blocks.
==25071==         suppressed: 0 bytes in 0 blocks.
==25071== Reachable blocks (those to which a pointer was found) are not shown.
==25071== To see them, rerun with: --leak-check=full --show-reachable=yes

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Bug#478238: grub-probe: fails to find drive for /dev/sda10
  2008-05-11 11:35       ` Bug#478238: grub-probe: fails to find drive for /dev/sda10 Török Edwin
@ 2008-05-12 15:44         ` Robert Millan
  2008-05-12 16:02           ` Török Edwin
  0 siblings, 1 reply; 4+ messages in thread
From: Robert Millan @ 2008-05-12 15:44 UTC (permalink / raw)
  To: The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 750 bytes --]

On Sun, May 11, 2008 at 02:35:41PM +0300, Török Edwin wrote:
> 
> /dev/sda9 is not a valid OpenBSD partition, and in partmap/pc.c:176 the
> iteration fails with an error: invalid disk label magic 0x%x.
> If I replace that return with a continue, it works.
> 
> The problem is that grub2 stops looking for more partitions as soon as
> it encountered the invalid partition,

I think a correct fix for this belongs in grub_partition_iterate().  It should
only let its hook determine abortability rather than mandate that invalid
partitions should cause abortion.  Please, can you test the attached patch?

-- 
Robert Millan

<GPLv2> I know my rights; I want my phone call!
<DRM> What use is a phone call… if you are unable to speak?
(as seen on /.)

[-- Attachment #2: partition_iterate.diff --]
[-- Type: text/x-diff, Size: 524 bytes --]

diff -x configure -x config.h.in -x CVS -x '*~' -x '*.mk' -urp ../grub2/kern/partition.c ./kern/partition.c
--- ../grub2/kern/partition.c	2007-07-22 01:32:26.000000000 +0200
+++ ./kern/partition.c	2008-05-12 17:41:51.000000000 +0200
@@ -89,7 +89,7 @@ grub_partition_iterate (struct grub_disk
 				     const grub_partition_t partition))
 {
   grub_partition_map_t partmap = 0;
-  int ret = 0;
+  int ret = 1;
   
   auto int part_map_iterate (const grub_partition_map_t p);
   auto int part_map_iterate_hook (grub_disk_t d,

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Bug#478238: grub-probe: fails to find drive for /dev/sda10
  2008-05-12 15:44         ` Robert Millan
@ 2008-05-12 16:02           ` Török Edwin
  2008-05-12 18:43             ` Robert Millan
  0 siblings, 1 reply; 4+ messages in thread
From: Török Edwin @ 2008-05-12 16:02 UTC (permalink / raw)
  To: The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 1208 bytes --]

Robert Millan wrote:
> On Sun, May 11, 2008 at 02:35:41PM +0300, Török Edwin wrote:
>   
>> /dev/sda9 is not a valid OpenBSD partition, and in partmap/pc.c:176 the
>> iteration fails with an error: invalid disk label magic 0x%x.
>> If I replace that return with a continue, it works.
>>
>> The problem is that grub2 stops looking for more partitions as soon as
>> it encountered the invalid partition,
>>     
>
> I think a correct fix for this belongs in grub_partition_iterate().  It should
> only let its hook determine abortability rather than mandate that invalid
> partitions should cause abortion.  Please, can you test the attached patch?
>   

I tested the patch, however it still says:
grub-probe: error: Cannot find a GRUB drive for /dev/sda10.  Check your
device.map.

I notice that /dev/sda10 appears in the -vv output though:
home/egrub-probe: info: opening the device `/dev/sda9'
grub-probe: info: opening the device `/dev/sda9'
grub-probe: info: /dev/sda10 starts from 83987883
grub-probe: info: opening the device hd0

I attached the full output of grub-probe -d /dev/sda10 -vv, for the case
when /dev/sda9 has the wrong partition type.

Best regards,
--Edwin

[-- Attachment #2: log.bz2 --]
[-- Type: application/x-bzip, Size: 2238 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Bug#478238: grub-probe: fails to find drive for /dev/sda10
  2008-05-12 16:02           ` Török Edwin
@ 2008-05-12 18:43             ` Robert Millan
  0 siblings, 0 replies; 4+ messages in thread
From: Robert Millan @ 2008-05-12 18:43 UTC (permalink / raw)
  To: The development of GRUB 2

On Mon, May 12, 2008 at 07:02:46PM +0300, Török Edwin wrote:
> Robert Millan wrote:
> > On Sun, May 11, 2008 at 02:35:41PM +0300, Török Edwin wrote:
> >   
> >> /dev/sda9 is not a valid OpenBSD partition, and in partmap/pc.c:176 the
> >> iteration fails with an error: invalid disk label magic 0x%x.
> >> If I replace that return with a continue, it works.
> >>
> >> The problem is that grub2 stops looking for more partitions as soon as
> >> it encountered the invalid partition,
> >>     
> >
> > I think a correct fix for this belongs in grub_partition_iterate().  It should
> > only let its hook determine abortability rather than mandate that invalid
> > partitions should cause abortion.  Please, can you test the attached patch?
> >   
> 
> I tested the patch, however it still says:
> grub-probe: error: Cannot find a GRUB drive for /dev/sda10.  Check your
> device.map.
> 
> I notice that /dev/sda10 appears in the -vv output though:
> home/egrub-probe: info: opening the device `/dev/sda9'
> grub-probe: info: opening the device `/dev/sda9'
> grub-probe: info: /dev/sda10 starts from 83987883
> grub-probe: info: opening the device hd0
> 
> I attached the full output of grub-probe -d /dev/sda10 -vv, for the case
> when /dev/sda9 has the wrong partition type.

It's better is you use grub-emu for this test.  It's less complex.

-- 
Robert Millan

<GPLv2> I know my rights; I want my phone call!
<DRM> What use is a phone call… if you are unable to speak?
(as seen on /.)



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-05-12 18:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <481590B8.7000405@gmail.com>
     [not found] ` <20080429134626.GB8328@thorin>
     [not found]   ` <481DC1BC.6020002@gmail.com>
     [not found]     ` <20080506133126.GG5055@thorin>
2008-05-11 11:35       ` Bug#478238: grub-probe: fails to find drive for /dev/sda10 Török Edwin
2008-05-12 15:44         ` Robert Millan
2008-05-12 16:02           ` Török Edwin
2008-05-12 18:43             ` Robert Millan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.