grub-devel.gnu.org archive mirror
* [Patch] Robustly search for ZFS labels & uberblocks
@ 2011-09-19 18:45 Zachary Bedell
  2011-09-28 21:20 ` Vladimir 'φ-coder/phcoder' Serbinenko
  2011-11-03 14:45 ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 2 replies; 26+ messages in thread
From: Zachary Bedell @ 2011-09-19 18:45 UTC
  To: The development of GNU GRUB

[-- Attachment #1: Type: text/plain, Size: 3506 bytes --]

Greetings,

While working with ZFSonLinux, I ended up with a number of somewhat inconsistent pools which failed either when running grub-probe or at boot time.  In all cases, these pools were importable by the ZFS driver and completed a `zpool scrub` with no errors reported, but Grub still wasn't able to read them.  In most (but not all) cases, running `zpool scrub` made the pool accessible to Grub again, though obviously that was of little help for boot-time failures.

I'm including an (unfortunately rather large) patch to Grub's ZFS code which fixes several edge cases where Grub is unable to read a pool which is otherwise valid as far as the full ZFS driver is concerned.  Changes include:

ashift:

* Support non-default values of the ashift pool attribute.  When ashift is increased beyond 10, the size of the uberblocks changes.  The scan for uberblocks must account for this and skip over the extra padding.  Based on a patch to Grub 0.97 by Hans Rosenfeld at http://www.illumos.org/issues/1303
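
To make the ashift point concrete, here is a rough standalone sketch (not code from the patch; the function names are made up for illustration) of how the slot size and slot count in the 128K uberblock ring depend on ashift.  It assumes the usual minimum uberblock shift of 10 and mirrors the UBERBLOCK_SIZE/UBERBLOCK_COUNT macros in the attached diff:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: 1 KiB minimum uberblock slot, 128 KiB ring per label. */
#define UBERBLOCK_SHIFT       10
#define UBERBLOCK_RING_SHIFT  17

/* Shift of one uberblock slot: max (ashift, UBERBLOCK_SHIFT). */
static unsigned
ub_slot_shift (unsigned ashift)
{
  return ashift > UBERBLOCK_SHIFT ? ashift : UBERBLOCK_SHIFT;
}

/* Size in bytes of one slot, padding included. */
static uint64_t
ub_slot_size (unsigned ashift)
{
  assert (ub_slot_shift (ashift) <= UBERBLOCK_RING_SHIFT);
  return 1ULL << ub_slot_shift (ashift);
}

/* Number of slots that fit in the ring at this ashift. */
static unsigned
ub_slot_count (unsigned ashift)
{
  assert (ub_slot_shift (ashift) <= UBERBLOCK_RING_SHIFT);
  return 1u << (UBERBLOCK_RING_SHIFT - ub_slot_shift (ashift));
}

/* Byte offset of slot i from the start of the ring. */
static uint64_t
ub_slot_offset (unsigned ashift, unsigned i)
{
  assert (i < ub_slot_count (ashift));
  return (uint64_t) i << ub_slot_shift (ashift);
}
```

With ashift=9 this gives 128 slots of 1 KiB each; with ashift=12 it gives 32 slots of 4 KiB, so a scanner that steps 1 KiB at a time would land in padding for most of its reads.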


Label parsing changes:

* Access the two end-device labels at a label-size-aligned location rather than at device_end-(2*label_size).  The on-disk spec document incorrectly describes the end-label locations.  Adapted from Grub 0.97 behavior.

* Scan all readable labels for uberblocks and accept the one with the highest txg/creation date.  Previously the UB scan would stop if a valid UB was found in Label 0, without checking for newer UBs in the other labels.  All labels must be scanned because, in certain circumstances, a more recent uberblock may be found in one of the other labels.  This fixes a case where Grub would be unable to access a ZFS system unless the pool was scrubbed first (even though ZFS itself saw no problems & scrub reported zero errors).

* Verify that the uberblock txg is not less than that of the label before accepting the UB, so that partially written UBs are not considered.

* Verify that the underlying data pointed to by the uberblock's ub_rootbp has valid checksums before accepting the UB.  This gives some possibility of booting from a technically invalid pool and effectively falls back to older uberblocks in a case where the correct uberblock is damaged.

* Store the pool uuid found during zfs_mount in grub_zfs_data in order to avoid duplicating that logic in zfs_uuid.
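
As an illustration of the end-label change, the following standalone sketch (not taken from the patch; it assumes a 256 KiB label and four labels per vdev, and the helper names are made up) computes where each label starts once the device size has been rounded down to a label-size boundary:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout constants: sizeof (vdev_label_t) and VDEV_LABELS. */
#define LABEL_SIZE  (256ULL << 10)
#define NUM_LABELS  4

/* Round x down to a power-of-two boundary (same idea as P2ALIGN). */
static uint64_t
align_down (uint64_t x, uint64_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return x & ~(align - 1);
}

/* Byte offset of label l: labels 0 and 1 sit at the front of the device,
   labels 2 and 3 at the end of the label-aligned device size. */
static uint64_t
label_offset (uint64_t device_bytes, int l)
{
  uint64_t aligned = align_down (device_bytes, LABEL_SIZE);

  assert (l >= 0 && l < NUM_LABELS);
  assert (aligned >= NUM_LABELS * LABEL_SIZE);

  if (l < NUM_LABELS / 2)
    return (uint64_t) l * LABEL_SIZE;
  return aligned - (uint64_t) (NUM_LABELS - l) * LABEL_SIZE;
}
```

For a device of 10 MiB + 100 KiB, the aligned size is 10 MiB, so label 2 starts at byte 9961472.  Computing from the raw device end instead would give 10063872, which is 100 KiB too far in.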


Data integrity:

* Verify the checksum in zio_read_data before accepting a block.  Attempt to read the backup DVAs if the checksum on the first fails.  Previously backup DVAs were checked only if the physical read failed, not if bad data was read without error.  This resulted in pools which were valid and readable by ZFS (and transparently healed by ZFS' read behavior) being rejected by Grub.
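
The fallback behaviour can be sketched in isolation as follows.  This is a simplified model, not the patch's code: `struct copy`, `toy_checksum` and `pick_valid_copy` are hypothetical names, and a byte sum stands in for the real ZFS checksum algorithms.  The point is only that a copy which reads fine but fails its checksum is treated the same as a copy that cannot be read:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DVA copy of a block. */
struct copy
{
  int readable;            /* 0 simulates a physical read error */
  const uint8_t *bytes;    /* data returned by the read */
  size_t len;
};

/* Toy checksum (plain byte sum), used here instead of a real ZFS checksum. */
static uint32_t
toy_checksum (const uint8_t *p, size_t len)
{
  uint32_t sum = 0;
  for (size_t i = 0; i < len; i++)
    sum += p[i];
  return sum;
}

/* Return the index of the first copy that both reads successfully and
   matches the expected checksum, or -1 if no copy qualifies. */
static int
pick_valid_copy (const struct copy *copies, size_t n, uint32_t expected)
{
  assert (copies != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    {
      if (!copies[i].readable)
        continue;          /* read error: try the next copy */
      if (toy_checksum (copies[i].bytes, copies[i].len) == expected)
        return (int) i;
      /* silent corruption: also try the next copy */
    }
  return -1;
}
```

Under the old behaviour, the loop would have returned as soon as a read succeeded, even if the data was corrupt.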


Logging/Debugging:

* Provide more debugging output describing inconsistencies found in pool.

* Remove superfluous debugging output to reduce bootup time in verbose mode to a manageable level (~30min down to ~5min to boot w/ 'debug=all' in grub.cfg)



I do apologize for the monolithic patch.  I took a second look at the changes to try to pull certain elements apart, but I ended up in several situations where the test pools I have snapshotted in VMs each hit two or more of the issues above, making testing in isolation difficult.  The total change set is able to probe and boot from all of the odd (as well as all of the normal) pools that I have on hand.

If remixing this patch in some way would help in integrating it, I'd welcome any feedback on how to better present the changes.

Best regards,
Zac Bedell



[-- Attachment #2: grub-zfs-ashift-label.patch --]
[-- Type: application/octet-stream, Size: 28708 bytes --]

diff --git a/grub-core/fs/zfs/zfs.c b/grub-core/fs/zfs/zfs.c
index 1eea13b..d03e19b 100644
--- a/grub-core/fs/zfs/zfs.c
+++ b/grub-core/fs/zfs/zfs.c
@@ -75,9 +75,21 @@ GRUB_MOD_LICENSE ("GPLv3+");
 static grub_dl_t my_mod;
 #endif
 
+/*
+ * Macros to get fields in a bp or DVA.
+ */
 #define	P2PHASE(x, align)		((x) & ((align) - 1))
 #define	DVA_OFFSET_TO_PHYS_SECTOR(offset) \
 	((offset + VDEV_LABEL_START_SIZE) >> SPA_MINBLOCKSHIFT)
+	
+/*
+ * return x rounded down to an align boundary
+ * eg, P2ALIGN(1200, 1024) == 1024 (1*align)
+ * eg, P2ALIGN(1024, 1024) == 1024 (1*align)
+ * eg, P2ALIGN(0x1234, 0x100) == 0x1200 (0x12*align)
+ * eg, P2ALIGN(0x5600, 0x100) == 0x5600 (0x56*align)
+ */
+#define	P2ALIGN(x, align)		((x) & -(align))
 
 /*
  * FAT ZAP data structures
@@ -147,6 +159,14 @@ struct grub_zfs_data
   grub_uint64_t file_start;
   grub_uint64_t file_end;
 
+  /* XXX: ashift is per vdev, not per pool.  We currently only ever touch
+   * a single vdev, but when/if raid-z or stripes are supported, this
+   * may need revision.
+   */
+  grub_uint64_t vdev_ashift;
+  grub_uint64_t label_txg;
+  grub_uint64_t pool_guid;
+
   /* cache for a dnode block */
   dnode_phys_t *dnode_buf;
   dnode_phys_t *dnode_mdn;
@@ -192,7 +212,11 @@ static decomp_entry_t decomp_table[ZIO_COMPRESS_FUNCTIONS] = {
 
 static grub_err_t zio_read_data (blkptr_t * bp, grub_zfs_endian_t endian,
 				 void *buf, struct grub_zfs_data *data);
-
+				
+static grub_err_t
+zio_read (blkptr_t * bp, grub_zfs_endian_t endian, void **buf, 
+	  grub_size_t *size, struct grub_zfs_data *data);
+	
 /*
  * Our own version of log2().  Same thing as highbit()-1.
  */
@@ -271,6 +295,7 @@ zio_checksum_verify (zio_cksum_t zc, grub_uint32_t checksum,
       || (actual_cksum.zc_word[2] != zc.zc_word[2]) 
       || (actual_cksum.zc_word[3] != zc.zc_word[3]))
     {
+      /*
       grub_dprintf ("zfs", "checksum %d verification failed\n", checksum);
       grub_dprintf ("zfs", "actual checksum %16llx %16llx %16llx %16llx\n",
 		    (unsigned long long) actual_cksum.zc_word[0], 
@@ -282,6 +307,7 @@ zio_checksum_verify (zio_cksum_t zc, grub_uint32_t checksum,
 		    (unsigned long long) zc.zc_word[1],
 		    (unsigned long long) zc.zc_word[2], 
 		    (unsigned long long) zc.zc_word[3]);
+	  */
       return grub_error (GRUB_ERR_BAD_FS, "checksum verification failed");
     }
 
@@ -332,17 +358,22 @@ vdev_uberblock_compare (uberblock_t * ub1, uberblock_t * ub2)
  * Three pieces of information are needed to verify an uberblock: the magic
  * number, the version number, and the checksum.
  *
- * Currently Implemented: version number, magic number
+ * Currently Implemented: version number, magic number, label txg
  * Need to Implement: checksum
  *
  */
 static grub_err_t
-uberblock_verify (uberblock_phys_t * ub, int offset)
+uberblock_verify (uberblock_t * uber, int offset, struct grub_zfs_data *data)
 {
-  uberblock_t *uber = &ub->ubp_uberblock;
   grub_err_t err;
   grub_zfs_endian_t endian = UNKNOWN_ENDIAN;
   zio_cksum_t zc;
+  void *osp = NULL;
+  grub_size_t ospsize;
+
+  if (uber->ub_txg < data->label_txg)
+    return grub_error (GRUB_ERR_BAD_FS, 
+	"ignoring partially written label: uber_txg < label_txg");
 
   if (grub_zfs_to_cpu64 (uber->ub_magic, LITTLE_ENDIAN) == UBERBLOCK_MAGIC
       && grub_zfs_to_cpu64 (uber->ub_version, LITTLE_ENDIAN) > 0 
@@ -358,10 +389,21 @@ uberblock_verify (uberblock_phys_t * ub, int offset)
     return grub_error (GRUB_ERR_BAD_FS, "invalid uberblock magic");
 
   grub_memset (&zc, 0, sizeof (zc));
-
   zc.zc_word[0] = grub_cpu_to_zfs64 (offset, endian);
   err = zio_checksum_verify (zc, ZIO_CHECKSUM_LABEL, endian,
-			     (char *) ub, UBERBLOCK_SIZE);
+			     (char *) uber, UBERBLOCK_SIZE(data->vdev_ashift));
+
+  if (!err)
+    {
+      // Check that the data pointed by the rootbp is usable.
+      void *osp = NULL;
+      grub_size_t ospsize;
+      err = zio_read (&uber->ub_rootbp, endian, &osp, &ospsize, data);
+      grub_free (osp);
+
+      if (!err && ospsize < OBJSET_PHYS_SIZE_V14)
+		return grub_error (GRUB_ERR_BAD_FS, "uberblock rootbp points to invalid data");
+    }
 
   return err;
 }
@@ -372,32 +414,40 @@ uberblock_verify (uberblock_phys_t * ub, int offset)
  *    Success - Pointer to the best uberblock.
  *    Failure - NULL
  */
-static uberblock_phys_t *
-find_bestub (uberblock_phys_t * ub_array, grub_disk_addr_t sector)
+static uberblock_t *
+find_bestub (char *ub_array, struct grub_zfs_data *data)
 {
-  uberblock_phys_t *ubbest = NULL;
-  int i;
-  grub_disk_addr_t offset;
+  const grub_disk_addr_t sector = data->vdev_phys_sector;
+  uberblock_t *ubbest = NULL;
+  uberblock_t *ubnext;
+  unsigned int i, offset, pickedub = 0;
   grub_err_t err = GRUB_ERR_NONE;
 
-  for (i = 0; i < (VDEV_UBERBLOCK_RING >> VDEV_UBERBLOCK_SHIFT); i++)
+  const unsigned int UBCOUNT = UBERBLOCK_COUNT (data->vdev_ashift);
+  const grub_uint64_t UBBYTES = UBERBLOCK_SIZE (data->vdev_ashift);
+
+  for (i = 0; i < UBCOUNT; i++)
     {
-      offset = (sector << SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE
-	+ (i << VDEV_UBERBLOCK_SHIFT);
+      ubnext = (uberblock_t *) (i * UBBYTES + ub_array);
+      offset = (sector << SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE + (i * UBBYTES);
 
-      err = uberblock_verify (&ub_array[i], offset);
+      err = uberblock_verify (ubnext, offset, data);
       if (err)
-	{
-	  grub_errno = GRUB_ERR_NONE;
-	  continue;
-	}
-      if (ubbest == NULL 
-	  || vdev_uberblock_compare (&(ub_array[i].ubp_uberblock),
-				     &(ubbest->ubp_uberblock)) > 0)
-	ubbest = &ub_array[i];
+		{
+	      grub_errno = GRUB_ERR_NONE;
+	      continue;
+		}
+      if (ubbest == NULL || vdev_uberblock_compare (ubnext, ubbest) > 0)
+		{
+	  	  ubbest = ubnext;
+	      pickedub = i;
+		}
     }
   if (!ubbest)
     grub_errno = err;
+  else
+    grub_dprintf ("zfs", "Found best uberblock at idx %d, txg %llu\n",
+		  pickedub, (unsigned long long) ubbest->ub_txg);
 
   return (ubbest);
 }
@@ -412,9 +462,6 @@ get_psize (blkptr_t * bp, grub_zfs_endian_t endian)
 static grub_uint64_t
 dva_get_offset (dva_t * dva, grub_zfs_endian_t endian)
 {
-  grub_dprintf ("zfs", "dva=%llx, %llx\n", 
-		(unsigned long long) dva->dva_word[0], 
-		(unsigned long long) dva->dva_word[1]);
   return grub_zfs_to_cpu64 ((dva)->dva_word[1], 
 			    endian) << SPA_MINBLOCKSHIFT;
 }
@@ -440,11 +487,9 @@ zio_read_gang (blkptr_t * bp, grub_zfs_endian_t endian, dva_t * dva, void *buf,
   zio_gb = grub_malloc (SPA_GANGBLOCKSIZE);
   if (!zio_gb)
     return grub_errno;
-  grub_dprintf ("zfs", endian == LITTLE_ENDIAN ? "little-endian gang\n"
-		:"big-endian gang\n");
+
   offset = dva_get_offset (dva, endian);
   sector = DVA_OFFSET_TO_PHYS_SECTOR (offset);
-  grub_dprintf ("zfs", "offset=%llx\n", (unsigned long long) offset);
 
   /* read in the gang block header */
   err = grub_disk_read (data->disk, sector, 0, SPA_GANGBLOCKSIZE,
@@ -515,10 +560,20 @@ zio_read_data (blkptr_t * bp, grub_zfs_endian_t endian, void *buf,
 	  sector = DVA_OFFSET_TO_PHYS_SECTOR (offset);
 	  err = grub_disk_read (data->disk, sector, 0, psize, buf); 
 	}
-      if (!err)
-	return GRUB_ERR_NONE;
-      grub_errno = GRUB_ERR_NONE;
-    }
+
+    if (!err)
+  	  {
+	  	// Check the underlying checksum before we rule this DVA as "good"
+	  	grub_uint32_t checkalgo = (grub_zfs_to_cpu64 ((bp)->blk_prop, endian) >> 40) & 0xff;
+
+	  	err = zio_checksum_verify (bp->blk_cksum, checkalgo, endian, buf, psize);
+	  	if (!err)
+          return GRUB_ERR_NONE;
+	  }
+	
+       // If read failed or checksum bad, reset the error.  Hopefully we've got some more DVA's to try.
+       grub_errno = GRUB_ERR_NONE;
+     }
 
   if (!err)
     err = grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid DVA");
@@ -539,12 +594,9 @@ zio_read (blkptr_t * bp, grub_zfs_endian_t endian, void **buf,
   unsigned int comp;
   char *compbuf = NULL;
   grub_err_t err;
-  zio_cksum_t zc = bp->blk_cksum;
-  grub_uint32_t checksum;
 
   *buf = NULL;
 
-  checksum = (grub_zfs_to_cpu64((bp)->blk_prop, endian) >> 40) & 0xff;
   comp = (grub_zfs_to_cpu64((bp)->blk_prop, endian)>>32) & 0xff;
   lsize = (BP_IS_HOLE(bp) ? 0 :
 	   (((grub_zfs_to_cpu64 ((bp)->blk_prop, endian) & 0xffff) + 1)
@@ -571,7 +623,6 @@ zio_read (blkptr_t * bp, grub_zfs_endian_t endian, void **buf,
   else
     compbuf = *buf = grub_malloc (lsize);
 
-  grub_dprintf ("zfs", "endian = %d\n", endian);
   err = zio_read_data (bp, endian, compbuf, data);
   if (err)
     {
@@ -580,15 +631,6 @@ zio_read (blkptr_t * bp, grub_zfs_endian_t endian, void **buf,
       return err;
     }
 
-  err = zio_checksum_verify (zc, checksum, endian, compbuf, psize);
-  if (err)
-    {
-      grub_dprintf ("zfs", "incorrect checksum\n");
-      grub_free (compbuf);
-      *buf = NULL;
-      return err;
-    }
-
   if (comp != ZIO_COMPRESS_OFF)
     {
       *buf = grub_malloc (lsize);
@@ -635,7 +677,6 @@ dmu_read (dnode_end_t * dn, grub_uint64_t blkid, void **buf,
   endian = dn->endian;
   for (level = dn->dn.dn_nlevels - 1; level >= 0; level--)
     {
-      grub_dprintf ("zfs", "endian = %d\n", endian);
       idx = (blkid >> (epbs * level)) & ((1 << epbs) - 1);
       *bp = bp_array[idx];
       if (bp_array != dn->dn.dn_blkptr)
@@ -661,12 +702,10 @@ dmu_read (dnode_end_t * dn, grub_uint64_t blkid, void **buf,
 	}
       if (level == 0)
 	{
-	  grub_dprintf ("zfs", "endian = %d\n", endian);
 	  err = zio_read (bp, endian, buf, 0, data);
 	  endian = (grub_zfs_to_cpu64 (bp->blk_prop, endian) >> 63) & 1;
 	  break;
 	}
-      grub_dprintf ("zfs", "endian = %d\n", endian);
       err = zio_read (bp, endian, &tmpbuf, 0, data);
       endian = (grub_zfs_to_cpu64 (bp->blk_prop, endian) >> 63) & 1;
       if (err)
@@ -717,9 +756,6 @@ mzap_iterate (mzap_phys_t * zapobj, grub_zfs_endian_t endian, int objsize,
   chunks = objsize / MZAP_ENT_LEN - 1;
   for (i = 0; i < chunks; i++)
     {
-      grub_dprintf ("zfs", "zap: name = %s, value = %llx, cd = %x\n",
-		    mzap_ent[i].mze_name, (long long)mzap_ent[i].mze_value,
-		    (int)mzap_ent[i].mze_cd);
       if (hook (mzap_ent[i].mze_name, 
 		grub_zfs_to_cpu64 (mzap_ent[i].mze_value, endian)))
 	return 1;
@@ -849,8 +885,6 @@ zap_leaf_lookup (zap_leaf_phys_t * l, grub_zfs_endian_t endian,
       if (grub_zfs_to_cpu64 (le->le_hash,endian) != h)
 	continue;
 
-      grub_dprintf ("zfs", "fzap: length %d\n", (int) le->le_name_length);
-
       if (zap_leaf_array_equal (l, endian, blksft, 
 				grub_zfs_to_cpu16 (le->le_name_chunk,endian),
 				grub_zfs_to_cpu16 (le->le_name_length, endian),
@@ -1050,22 +1084,16 @@ zap_lookup (dnode_end_t * zap_dnode, char *name, grub_uint64_t * val,
     return err;
   block_type = grub_zfs_to_cpu64 (*((grub_uint64_t *) zapbuf), endian);
 
-  grub_dprintf ("zfs", "zap read\n");
-
   if (block_type == ZBT_MICRO)
     {
-      grub_dprintf ("zfs", "micro zap\n");
       err = (mzap_lookup (zapbuf, endian, size, name, val));
-      grub_dprintf ("zfs", "returned %d\n", err);      
       grub_free (zapbuf);
       return err;
     }
   else if (block_type == ZBT_HEADER)
     {
-      grub_dprintf ("zfs", "fat zap\n");
       /* this is a fat zap */
       err = (fzap_lookup (zap_dnode, zapbuf, name, val, data));
-      grub_dprintf ("zfs", "returned %d\n", err);      
       grub_free (zapbuf);
       return err;
     }
@@ -1092,18 +1120,14 @@ zap_iterate (dnode_end_t * zap_dnode,
     return 0;
   block_type = grub_zfs_to_cpu64 (*((grub_uint64_t *) zapbuf), endian);
 
-  grub_dprintf ("zfs", "zap read\n");
-
   if (block_type == ZBT_MICRO)
     {
-      grub_dprintf ("zfs", "micro zap\n");
       ret = mzap_iterate (zapbuf, endian, size, hook);
       grub_free (zapbuf);
       return ret;
     }
   else if (block_type == ZBT_HEADER)
     {
-      grub_dprintf ("zfs", "fat zap\n");
       /* this is a fat zap */
       ret = fzap_iterate (zap_dnode, zapbuf, hook, data);
       grub_free (zapbuf);
@@ -1150,12 +1174,9 @@ dnode_get (dnode_end_t * mdn, grub_uint64_t objnum, grub_uint8_t type,
       return GRUB_ERR_NONE;
     }
 
-  grub_dprintf ("zfs", "endian = %d, blkid=%llx\n", mdn->endian, 
-		(unsigned long long) blkid);
   err = dmu_read (mdn, blkid, &dnbuf, &endian, data);
   if (err)
     return err;
-  grub_dprintf ("zfs", "alive\n");
 
   grub_free (data->dnode_buf);
   grub_free (data->dnode_mdn);
@@ -1426,27 +1447,19 @@ get_filesystem_dnode (dnode_end_t * mosmdn, char *fsname,
   grub_uint64_t objnum;
   grub_err_t err;
 
-  grub_dprintf ("zfs", "endian = %d\n", mosmdn->endian);
-
   err = dnode_get (mosmdn, DMU_POOL_DIRECTORY_OBJECT, 
 		   DMU_OT_OBJECT_DIRECTORY, mdn, data);
   if (err)
     return err;
 
-  grub_dprintf ("zfs", "alive\n");
-
   err = zap_lookup (mdn, DMU_POOL_ROOT_DATASET, &objnum, data);
   if (err)
     return err;
 
-  grub_dprintf ("zfs", "alive\n");
-
   err = dnode_get (mosmdn, objnum, DMU_OT_DSL_DIR, mdn, data);
   if (err)
     return err;
 
-  grub_dprintf ("zfs", "alive\n");
-
   while (*fsname)
     {
       grub_uint64_t childobj;
@@ -1491,8 +1504,6 @@ make_mdn (dnode_end_t * mdn, struct grub_zfs_data *data)
   grub_size_t ospsize;
   grub_err_t err;
 
-  grub_dprintf ("zfs", "endian = %d\n", mdn->endian);
-
   bp = &(((dsl_dataset_phys_t *) DN_BONUS (&mdn->dn))->ds_bp);
   err = zio_read (bp, mdn->endian, &osp, &ospsize, data);
   if (err)
@@ -1558,7 +1569,6 @@ dnode_get_fullpath (const char *fullpath, dnode_end_t * mdn,
       grub_dprintf ("zfs", "fsname = '%s' snapname='%s' filename = '%s'\n", 
 		    fsname, snapname, filename);
     }
-  grub_dprintf ("zfs", "alive\n");
   err = get_filesystem_dnode (&(data->mos), fsname, dn, data);
   if (err)
     {
@@ -1567,12 +1577,8 @@ dnode_get_fullpath (const char *fullpath, dnode_end_t * mdn,
       return err;
     }
 
-  grub_dprintf ("zfs", "alive\n");
-
   headobj = grub_zfs_to_cpu64 (((dsl_dir_phys_t *) DN_BONUS (&dn->dn))->dd_head_dataset_obj, dn->endian);
 
-  grub_dprintf ("zfs", "endian = %d\n", mdn->endian);
-
   err = dnode_get (&(data->mos), headobj, DMU_OT_DSL_DATASET, mdn, data);
   if (err)
     {
@@ -1580,7 +1586,6 @@ dnode_get_fullpath (const char *fullpath, dnode_end_t * mdn,
       grub_free (snapname);
       return err;
     }
-  grub_dprintf ("zfs", "endian = %d\n", mdn->endian);
 
   if (snapname)
     {
@@ -1606,8 +1611,6 @@ dnode_get_fullpath (const char *fullpath, dnode_end_t * mdn,
     *mdnobj = headobj;
 
   make_mdn (mdn, data);
-  
-  grub_dprintf ("zfs", "endian = %d\n", mdn->endian);
 
   if (*isfs)
     {
@@ -1864,11 +1867,9 @@ zfs_fetch_nvlist (struct grub_zfs_data * data, char **nvlist)
 static grub_err_t
 check_pool_label (struct grub_zfs_data *data)
 {
-  grub_uint64_t pool_state, txg = 0;
-  char *nvlist;
-#if 0
-  char *nv;
-#endif
+  grub_uint64_t pool_state;
+  char *nvlist;			// for the pool
+  char *vdevnvlist;		// for the vdev
   grub_uint64_t diskguid;
   grub_uint64_t version;
   int found;
@@ -1878,8 +1879,6 @@ check_pool_label (struct grub_zfs_data *data)
   if (err)
     return err;
 
-  grub_dprintf ("zfs", "check 2 passed\n");
-
   found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_STATE,
 					 &pool_state);
   if (! found)
@@ -1889,16 +1888,16 @@ check_pool_label (struct grub_zfs_data *data)
 	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_STATE " not found");
       return grub_errno;
     }
-  grub_dprintf ("zfs", "check 3 passed\n");
 
   if (pool_state == POOL_STATE_DESTROYED)
     {
       grub_free (nvlist);
       return grub_error (GRUB_ERR_BAD_FS, "zpool is marked as destroyed");
     }
-  grub_dprintf ("zfs", "check 4 passed\n");
 
-  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_TXG, &txg);
+  data->label_txg = 0;
+  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_TXG,
+				   	&data->label_txg);
   if (!found)
     {
       grub_free (nvlist);
@@ -1906,15 +1905,13 @@ check_pool_label (struct grub_zfs_data *data)
 	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_TXG " not found");
       return grub_errno;
     }
-  grub_dprintf ("zfs", "check 6 passed\n");
 
   /* not an active device */
-  if (txg == 0)
+  if (data->label_txg == 0)
     {
       grub_free (nvlist);
       return grub_error (GRUB_ERR_BAD_FS, "zpool isn't active");
     }
-  grub_dprintf ("zfs", "check 7 passed\n");
 
   found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_VERSION,
 					 &version);
@@ -1925,42 +1922,79 @@ check_pool_label (struct grub_zfs_data *data)
 	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_VERSION " not found");
       return grub_errno;
     }
-  grub_dprintf ("zfs", "check 8 passed\n");
 
   if (version > SPA_VERSION)
     {
       grub_free (nvlist);
       return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET,
-			 "too new version %llu > %llu",
+			 "SPA version too new %llu > %llu",
 			 (unsigned long long) version,
 			 (unsigned long long) SPA_VERSION);
     }
-  grub_dprintf ("zfs", "check 9 passed\n");
-#if 0
-  if (nvlist_lookup_value (nvlist, ZPOOL_CONFIG_VDEV_TREE, &nv,
-			   DATA_TYPE_NVLIST, NULL))
+
+  vdevnvlist = grub_zfs_nvlist_lookup_nvlist (nvlist, ZPOOL_CONFIG_VDEV_TREE);
+  if (!vdevnvlist)
     {
-      grub_free (vdev);
-      return (GRUB_ERR_BAD_FS);
+      grub_free (nvlist);
+      if (!grub_errno)
+	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_VDEV_TREE " not found");
+      return grub_errno;
+    }
+
+  found = grub_zfs_nvlist_lookup_uint64 (vdevnvlist, ZPOOL_CONFIG_ASHIFT,
+					 &data->vdev_ashift);
+  grub_free (vdevnvlist);
+  if (!found)
+    {
+      grub_free (nvlist);
+      if (!grub_errno)
+	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_ASHIFT " not found");
+      return grub_errno;
     }
-  grub_dprintf ("zfs", "check 10 passed\n");
-#endif
 
   found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_GUID, &diskguid);
   if (! found)
     {
       grub_free (nvlist);
       if (! grub_errno)
-	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_GUID " not found");
+        grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_GUID " not found");
+      return grub_errno;
+    }
+
+  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_GUID, &data->pool_guid);
+  if (! found)
+    {
+      grub_free (nvlist);
+      if (! grub_errno)
+        grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_GUID " not found");
       return grub_errno;
     }
-  grub_dprintf ("zfs", "check 11 passed\n");
 
   grub_free (nvlist);
 
+  grub_dprintf ("zfs", "ZFS Pool GUID: %llu (%016llx) Label: GUID: %llu (%016llx), txg: %llu, SPA v%llu, ashift: %llu\n",
+	(unsigned long long) data->pool_guid,
+    (unsigned long long) data->pool_guid,
+	(unsigned long long) diskguid,
+    (unsigned long long) diskguid,
+    (unsigned long long) data->label_txg,
+    (unsigned long long) version,
+    (unsigned long long) data->vdev_ashift);
+
   return GRUB_ERR_NONE;
 }
 
+/*
+ * vdev_label_start returns the physical disk offset (in bytes) of
+ * label "l".
+ */
+static grub_uint64_t vdev_label_start (grub_uint64_t psize, int l)
+{
+  return (l * sizeof (vdev_label_t) + (l < VDEV_LABELS / 2 ?
+				       0 : psize -
+				       VDEV_LABELS * sizeof (vdev_label_t)));
+}
+
 static void
 zfs_unmount (struct grub_zfs_data *data)
 {
@@ -1979,13 +2013,13 @@ static struct grub_zfs_data *
 zfs_mount (grub_device_t dev)
 {
   struct grub_zfs_data *data = 0;
-  int label = 0;
-  uberblock_phys_t *ub_array, *ubbest = NULL;
-  vdev_boot_header_t *bh;
+  int label = 0, bestlabel = -1;
+  char *ub_array;
+  uberblock_t *ubbest;
+  uberblock_t *ubcur = NULL;
   void *osp = 0;
   grub_size_t ospsize;
   grub_err_t err;
-  int vdevnum;
 
   if (! dev->disk)
     {
@@ -1997,11 +2031,6 @@ zfs_mount (grub_device_t dev)
   if (!data)
     return 0;
   grub_memset (data, 0, sizeof (*data));
-#if 0
-  /* if it's our first time here, zero the best uberblock out */
-  if (data->best_drive == 0 && data->best_part == 0 && find_best_root)
-    grub_memset (&current_uberblock, 0, sizeof (uberblock_t));
-#endif
 
   data->disk = dev->disk;
 
@@ -2012,100 +2041,129 @@ zfs_mount (grub_device_t dev)
       return 0;
     }
 
-  bh = grub_malloc (VDEV_BOOT_HEADER_SIZE);
-  if (!bh)
+  ubbest = grub_malloc (sizeof (*ubbest));
+  if (!ubbest)
     {
       zfs_unmount (data);
-      grub_free (ub_array);
       return 0;
     }
+  grub_memset (ubbest, 0, sizeof (*ubbest));
 
-  vdevnum = VDEV_LABELS;
+  // Establish some constants for where things are on the device:
 
-  /* Don't check back labels on CDROM.  */
-  if (grub_disk_get_size (dev->disk) == GRUB_DISK_SIZE_UNKNOWN)
-    vdevnum = VDEV_LABELS / 2;
+  /*
+   * some eltorito stacks don't give us a size and
+   * we end up setting the size to MAXUINT, further
+   * some of these devices stop working once a single
+   * read past the end has been issued. Checking
+   * for a maximum part_length and skipping the backup
+   * labels at the end of the slice/partition/device
+   * avoids breaking down on such devices.
+   */
+  const int vdevnum =
+    grub_disk_get_size (dev->disk) == GRUB_DISK_SIZE_UNKNOWN ?
+    VDEV_LABELS / 2 : VDEV_LABELS;
+
+  // Size in bytes of the device (disk or partition) aligned to label size
+  grub_uint64_t device_size = 
+	grub_disk_get_size (dev->disk) << GRUB_DISK_SECTOR_BITS;
+  const grub_uint64_t alignedbytes =
+    P2ALIGN (device_size, (grub_uint64_t) sizeof (vdev_label_t));
 
-  for (label = 0; ubbest == NULL && label < vdevnum; label++)
+  for (label = 0; label < vdevnum; label++)
     {
-      grub_zfs_endian_t ub_endian = UNKNOWN_ENDIAN;
-      grub_dprintf ("zfs", "label %d\n", label);
+      grub_uint64_t labelstartbytes = vdev_label_start (alignedbytes, label);
+      grub_uint64_t labelstart = labelstartbytes >> GRUB_DISK_SECTOR_BITS;
 
-      data->vdev_phys_sector
-	= label * (sizeof (vdev_label_t) >> SPA_MINBLOCKSHIFT)
-	+ ((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE) >> SPA_MINBLOCKSHIFT)
-	+ (label < VDEV_LABELS / 2 ? 0 : grub_disk_get_size (dev->disk)
-	   - VDEV_LABELS * (sizeof (vdev_label_t) >> SPA_MINBLOCKSHIFT));
+      grub_dprintf ("zfs", "reading label %d at sector %llu (byte %llu)\n",
+		    label, (unsigned long long) labelstart, 
+		(unsigned long long) labelstartbytes);
 
-      /* Read in the uberblock ring (128K). */
-      err = grub_disk_read (data->disk, data->vdev_phys_sector
-			    + (VDEV_PHYS_SIZE >> SPA_MINBLOCKSHIFT),
-			    0, VDEV_UBERBLOCK_RING, (char *) ub_array);
+      data->vdev_phys_sector = labelstart +
+		((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE) >> GRUB_DISK_SECTOR_BITS);
+
+      err = check_pool_label (data);
       if (err)
-	{
-	  grub_errno = GRUB_ERR_NONE;
-	  continue;
-	}
-      grub_dprintf ("zfs", "label ok %d\n", label);
+	    {
+		  grub_dprintf ("zfs", "error checking label %d\n", label);
+	      grub_errno = GRUB_ERR_NONE;
+	      continue;
+	    }
 
-      ubbest = find_bestub (ub_array, data->vdev_phys_sector);
-      if (!ubbest)
-	{
-	  grub_dprintf ("zfs", "No uberblock found\n");
-	  grub_errno = GRUB_ERR_NONE;
-	  continue;
-	}
-      ub_endian = (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.ub_magic, 
-				     LITTLE_ENDIAN) == UBERBLOCK_MAGIC 
-		   ? LITTLE_ENDIAN : BIG_ENDIAN);
-      err = zio_read (&ubbest->ubp_uberblock.ub_rootbp, 
-		      ub_endian,
-		      &osp, &ospsize, data);
+      /* Read in the uberblock ring (128K). */
+      err = grub_disk_read (data->disk,
+			    data->vdev_phys_sector +
+			    (VDEV_PHYS_SIZE >> GRUB_DISK_SECTOR_BITS),
+			    0, VDEV_UBERBLOCK_RING, ub_array);
       if (err)
-	{
-	  grub_dprintf ("zfs", "couldn't zio_read\n"); 
-	  grub_errno = GRUB_ERR_NONE;
-	  continue;
-	}
+	    {
+		  grub_dprintf ("zfs", "error reading uberblock ring for label %d\n", label);
+	      grub_errno = GRUB_ERR_NONE;
+	      continue;
+	    }
 
-      if (ospsize < OBJSET_PHYS_SIZE_V14)
-	{
-	  grub_dprintf ("zfs", "osp too small\n"); 
-	  grub_free (osp);
-	  continue;
-	}
-      grub_dprintf ("zfs", "ubbest %p\n", ubbest);
+      ubcur = find_bestub (ub_array, data);
+      if (!ubcur)
+	    {
+	      grub_dprintf ("zfs", "No good uberblocks found in label %d\n", label);
+	      grub_errno = GRUB_ERR_NONE;
+	      continue;
+	    }
 
-      err = check_pool_label (data);
-      if (err)
+      if (vdev_uberblock_compare (ubcur, ubbest) > 0)
+	    {
+		  // Looks like the block is good, so use it.
+	      grub_memcpy (ubbest, ubcur, sizeof (*ubbest));
+	      bestlabel = label;
+	      grub_dprintf ("zfs", "Current best uberblock found in label %d\n", label);
+	    }
+    }
+  grub_free (ub_array);
+
+  // We zero'd the structure to begin with.  If we never assigned to it, 
+  // magic will still be zero.
+  if (!ubbest->ub_magic)
+    {
+      grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid ZFS label");
+      zfs_unmount (data);
+      grub_free (ubbest);
+      return 0;
+    }
+
+  grub_dprintf ("zfs", "ubbest %p in label %d\n", ubbest, bestlabel);
+
+  grub_zfs_endian_t ub_endian =
+    grub_zfs_to_cpu64 (ubbest->ub_magic, LITTLE_ENDIAN) == UBERBLOCK_MAGIC
+    ? LITTLE_ENDIAN : BIG_ENDIAN;
+  err = zio_read (&ubbest->ub_rootbp, ub_endian, &osp, &ospsize, data);
+
+  if (err)
 	{
-	  grub_errno = GRUB_ERR_NONE;
-	  continue;
+	  grub_error (GRUB_ERR_BAD_FS, "couldn't zio_read object directory");
+	  zfs_unmount (data);
+	  grub_free (ubbest);
+	  return 0;
 	}
-#if 0
-      if (find_best_root &&
-	  vdev_uberblock_compare (&ubbest->ubp_uberblock,
-				  &(current_uberblock)) <= 0)
-	continue;
-#endif
-      /* Got the MOS. Save it at the memory addr MOS. */
-      grub_memmove (&(data->mos.dn), &((objset_phys_t *) osp)->os_meta_dnode,
-		    DNODE_SIZE);
-      data->mos.endian = (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.ub_rootbp.blk_prop, ub_endian) >> 63) & 1;
-      grub_memmove (&(data->current_uberblock),
-		    &ubbest->ubp_uberblock, sizeof (uberblock_t));
-      grub_free (ub_array);
-      grub_free (bh);
-      grub_free (osp);
-      return data;  
+
+  if (ospsize < OBJSET_PHYS_SIZE_V14)
+    {
+	  grub_error (GRUB_ERR_BAD_FS, "osp too small");
+	  zfs_unmount (data);
+	  grub_free (osp);
+	  grub_free (ubbest);
+	  return 0;
     }
-  grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid label");
-  zfs_unmount (data);
-  grub_free (ub_array);
-  grub_free (bh);
-  grub_free (osp);
 
-  return 0;
+  /* Got the MOS. Save it at the memory addr MOS. */
+  grub_memmove (&(data->mos.dn), &((objset_phys_t *) osp)->os_meta_dnode, DNODE_SIZE);
+  data->mos.endian =
+		(grub_zfs_to_cpu64 (ubbest->ub_rootbp.blk_prop, ub_endian) >> 63) & 1;
+  grub_memmove (&(data->current_uberblock), ubbest, sizeof (uberblock_t));
+
+  grub_free (osp);
+  grub_free (ubbest);
+  
+  return (data);
 }
 
 grub_err_t
@@ -2149,33 +2207,18 @@ zfs_label (grub_device_t device, char **label)
 static grub_err_t 
 zfs_uuid (grub_device_t device, char **uuid)
 {
-  char *nvlist;
-  int found;
   struct grub_zfs_data *data;
-  grub_uint64_t guid;
-  grub_err_t err;
-
-  *uuid = 0;
 
   data = zfs_mount (device);
   if (! data)
     return grub_errno;
 
-  err = zfs_fetch_nvlist (data, &nvlist);
-  if (err)
-    {
-      zfs_unmount (data);
-      return err;
-    }
-
-  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_GUID, &guid);
-  if (! found)
-    return grub_errno;
-  grub_free (nvlist);
-  *uuid = grub_xasprintf ("%016llx", (long long unsigned) guid);
+  *uuid = grub_xasprintf ("%016llx", (long long unsigned) data->pool_guid);
   zfs_unmount (data);
+
   if (! *uuid)
     return grub_errno;
+
   return GRUB_ERR_NONE;
 }
 
diff --git a/include/grub/zfs/uberblock_impl.h b/include/grub/zfs/uberblock_impl.h
index 1bf7f2b..94b5ad7 100644
--- a/include/grub/zfs/uberblock_impl.h
+++ b/include/grub/zfs/uberblock_impl.h
@@ -23,6 +23,8 @@
 #ifndef _SYS_UBERBLOCK_IMPL_H
 #define	_SYS_UBERBLOCK_IMPL_H
 
+#define UBMAX(a,b) ((a) > (b) ? (a) : (b))
+
 /*
  * The uberblock version is incremented whenever an incompatible on-disk
  * format change is made to the SPA, DMU, or ZAP.
@@ -45,16 +47,10 @@ typedef struct uberblock {
 	blkptr_t	ub_rootbp;	/* MOS objset_phys_t		*/
 } uberblock_t;
 
-#define	UBERBLOCK_SIZE		(1ULL << UBERBLOCK_SHIFT)
-#define	VDEV_UBERBLOCK_SHIFT	UBERBLOCK_SHIFT
-
-/* XXX Uberblock_phys_t is no longer in the kernel zfs */
-typedef struct uberblock_phys {
-	uberblock_t	ubp_uberblock;
-	char		ubp_pad[UBERBLOCK_SIZE - sizeof (uberblock_t) -
-				sizeof (zio_eck_t)];
-	zio_eck_t	ubp_zec;
-} uberblock_phys_t;
-
+#define	VDEV_UBERBLOCK_SHIFT(as)	UBMAX(as, UBERBLOCK_SHIFT)
+#define	UBERBLOCK_SIZE(as)		(1ULL << VDEV_UBERBLOCK_SHIFT(as))
+ 
+// Number of uberblocks that can fit in the ring at a given ashift
+#define UBERBLOCK_COUNT(as) (VDEV_UBERBLOCK_RING >> VDEV_UBERBLOCK_SHIFT(as))
 
 #endif	/* _SYS_UBERBLOCK_IMPL_H */
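
As a worked example of the ashift-dependent macros above, here is a minimal standalone sketch. The values of UBERBLOCK_SHIFT (10) and VDEV_UBERBLOCK_RING (128 KiB) are assumptions based on the usual ZFS on-disk layout and are not defined in this hunk; ub_slot_offset is a hypothetical helper used only for illustration.

```c
#include <stdint.h>

/* Assumed values from the usual ZFS on-disk layout; not part of this hunk. */
#define UBERBLOCK_SHIFT      10           /* smallest uberblock slot: 1 KiB */
#define VDEV_UBERBLOCK_RING  (128 << 10)  /* 128 KiB uberblock ring per label */

#define UBMAX(a, b) ((a) > (b) ? (a) : (b))
#define VDEV_UBERBLOCK_SHIFT(as) UBMAX (as, UBERBLOCK_SHIFT)
#define UBERBLOCK_SIZE(as)       (1ULL << VDEV_UBERBLOCK_SHIFT (as))
#define UBERBLOCK_COUNT(as)      (VDEV_UBERBLOCK_RING >> VDEV_UBERBLOCK_SHIFT (as))

/* Hypothetical helper: byte offset of slot i inside the ring. */
static uint64_t
ub_slot_offset (unsigned ashift, unsigned i)
{
  return (uint64_t) i * UBERBLOCK_SIZE (ashift);
}
```

Under these assumptions an ashift of 9 or 10 gives 128 slots of 1 KiB, while an ashift of 12 gives 32 slots of 4 KiB, so a scan that steps by a fixed 1 KiB would land inside the padding of each slot.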

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2011-09-19 18:45 [Patch] Robustly search for ZFS labels & uberblocks Zachary Bedell
@ 2011-09-28 21:20 ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-01-19 11:36   ` Richard Laager
  2011-11-03 14:45 ` Vladimir 'φ-coder/phcoder' Serbinenko
  1 sibling, 1 reply; 26+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2011-09-28 21:20 UTC (permalink / raw)
  To: The development of GNU GRUB; +Cc: Zachary Bedell

[-- Attachment #1: Type: text/plain, Size: 4025 bytes --]

Could you please split this patch? In particular, the dprintf removals
take up too much of this patch and make it unreadable. Note that we
don't comment out code; we remove it.
On 19.09.2011 20:45, Zachary Bedell wrote:
> Greetings,
>
> In working with ZFSonLinux, I found myself with a number of somewhat inconstant pools which failed either when running grub-probe or at boot time.  In all cases, these pools were importable by the ZFS driver and completed a `zpool scrub` with no errors reported, but still Grub wasn't able to touch the pools.  In most (but not all) cases, running `zpool scrub` made the pool accessible to Grub again, though obviously that was less than helpful for boot-time failures.
>
> I'm including an (unfortunately rather large) patch to Grub's ZFS code which fixes several edge cases where Grub is unable to read a pool which is otherwise valid as far as the full ZFS driver is concerned.  Changes include:
>
> ashift:
>
> * Support non-default values of ashift pool attribute.  When ashift is increased beyond 10, the size of the uberblocks changes.  Scan to find uberblocks must account for this and skip over the extra padding.  Based on patch to Grub 0.97 by Hans Rosenfeld at http://www.illumos.org/issues/1303
>
>
> Label parsing changes:
>
> * Access the two end-device labels at label-size aligned location rather than device_end-(2*label_size).  On-disk spec document incorrectly describes end-label locations.  Adapted from Grub 0.97 behavior.
>
> * Scan all readable labels for uberblocks and accept the one with the highest txg/create date.  Previously UB scan would stop if a valid UB was found in Label 0 without checking for newer UB's in the other labels.  All labels must be scanned as a more recent block may be found in the other labels in certain circumstances.  This fixes a case where Grub would be unable to access a ZFS system unless the pool was scrubbed first (but ZFS itself saw no problems & scrub reported zero errors).
>
> * Verify that uberblock txg is greater than that of the label before accepting the UB so that partially written UB's are not considered.
>
> * Verify that underlying data pointed by uberblock ub_rootbp has valid checksums before accepting the UB.  This gives some possibility of booting from a technically invalid pool and effectively falls back to older uberblocks in a case where the correct uberblock is damaged.
>
> * Store pool uuid found during zfs_mount to grub_zfs_data in order to prevent duplicated logic in zfs_uuid.
>
>
> Data integrity:
>
> * Verify checksum in grub_read_data before accepting block.  Attempt to read backup DVA's if checksum on first fails.  Previously backup DVA's were checked only if the physical read failed, not if bad data was read without error.  This resulted in pools which were valid and readable by ZFS (and transparently healed by ZFS' read behavior) being rejected by Grub.
>
>
> Logging/Debugging:
>
> * Provide more debugging output describing inconsistencies found in pool.
>
> * Remove superfluous debugging output to reduce bootup time in verbose mode to a manageable level (~30min down to ~5min to boot w/ 'debug=all' in grub.cfg)
>
>
>
> I do apologize for the monolithic patch.  I took a second look at the changes to try to pull certain elements apart, but I ended up in several situations where the test pools I have snapshotted in VM's each hit two or more of the issues above making testing in isolation difficult.  The total change set is able to probe and boot from all of the odd (as well as all of the normal) pools that I have on hand.
>
> If remixing this patch in some way would help in integrating it, I'd welcome any feedback on how to better present the changes.
>
> Best regards,
> Zac Bedell


-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 294 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2011-09-19 18:45 [Patch] Robustly search for ZFS labels & uberblocks Zachary Bedell
  2011-09-28 21:20 ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2011-11-03 14:45 ` Vladimir 'φ-coder/phcoder' Serbinenko
  1 sibling, 0 replies; 26+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2011-11-03 14:45 UTC (permalink / raw)
  To: grub-devel

[-- Attachment #1: Type: text/plain, Size: 36080 bytes --]

On 19.09.2011 20:45, Zachary Bedell wrote:
> Greetings,
>
> In working with ZFSonLinux, I found myself with a number of somewhat inconstant pools which failed either when running grub-probe or at boot time.  In all cases, these pools were importable by the ZFS driver and completed a `zpool scrub` with no errors reported, but still Grub wasn't able to touch the pools.  In most (but not all) cases, running `zpool scrub` made the pool accessible to Grub again, though obviously that was less than helpful for boot-time failures.
>
The patch is pretty difficult to read due to spacing changes. Please use
-bB or indent.
> I'm including an (unfortunately rather large) patch to Grub's ZFS code which fixes several edge cases where Grub is unable to read a pool which is otherwise valid as far as the full ZFS driver is concerned.  Changes include:
>
> ashift:
>
> * Support non-default values of ashift pool attribute.  When ashift is increased beyond 10, the size of the uberblocks changes.  Scan to find uberblocks must account for this and skip over the extra padding.  Based on patch to Grub 0.97 by Hans Rosenfeld at http://www.illumos.org/issues/1303
>
>
> Label parsing changes:
>
> * Access the two end-device labels at label-size aligned location rather than device_end-(2*label_size).  On-disk spec document incorrectly describes end-label locations.  Adapted from Grub 0.97 behavior.
>
> * Scan all readable labels for uberblocks and accept the one with the highest txg/create date.  Previously UB scan would stop if a valid UB was found in Label 0 without checking for newer UB's in the other labels.  All labels must be scanned as a more recent block may be found in the other labels in certain circumstances.  This fixes a case where Grub would be unable to access a ZFS system unless the pool was scrubbed first (but ZFS itself saw no problems & scrub reported zero errors).
>
> * Verify that uberblock txg is greater than that of the label before accepting the UB so that partially written UB's are not considered.
>
> * Verify that underlying data pointed by uberblock ub_rootbp has valid checksums before accepting the UB.  This gives some possibility of booting from a technically invalid pool and effectively falls back to older uberblocks in a case where the correct uberblock is damaged.
>
> * Store pool uuid found during zfs_mount to grub_zfs_data in order to prevent duplicated logic in zfs_uuid.
>
>
> Data integrity:
>
> * Verify checksum in grub_read_data before accepting block.  Attempt to read backup DVA's if checksum on first fails.  Previously backup DVA's were checked only if the physical read failed, not if bad data was read without error.  This resulted in pools which were valid and readable by ZFS (and transparently healed by ZFS' read behavior) being rejected by Grub.
>
>
> Logging/Debugging:
>
> * Provide more debugging output describing inconsistencies found in pool.
>
> * Remove superfluous debugging output to reduce bootup time in verbose mode to a manageable level (~30min down to ~5min to boot w/ 'debug=all' in grub.cfg)
>
>
>
> I do apologize for the monolithic patch.  I took a second look at the changes to try to pull certain elements apart, but I ended up in several situations where the test pools I have snapshotted in VM's each hit two or more of the issues above making testing in isolation difficult.  The total change set is able to probe and boot from all of the odd (as well as all of the normal) pools that I have on hand.
>
> If remixing this patch in some way would help in integrating it, I'd welcome any feedback on how to better present the changes.
>
> Best regards,
> Zac Bedell
>
>
> grub-zfs-ashift-label.patch
>
>
> diff --git a/grub-core/fs/zfs/zfs.c b/grub-core/fs/zfs/zfs.c
> index 1eea13b..d03e19b 100644
> --- a/grub-core/fs/zfs/zfs.c
> +++ b/grub-core/fs/zfs/zfs.c
> @@ -75,9 +75,21 @@ GRUB_MOD_LICENSE ("GPLv3+");
>  static grub_dl_t my_mod;
>  #endif
>  
> +/*
> + * Macros to get fields in a bp or DVA.
> + */
>  #define	P2PHASE(x, align)		((x) & ((align) - 1))
>  #define	DVA_OFFSET_TO_PHYS_SECTOR(offset) \
>  	((offset + VDEV_LABEL_START_SIZE) >> SPA_MINBLOCKSHIFT)
> +	
> +/*
> + * return x rounded down to an align boundary
> + * eg, P2ALIGN(1200, 1024) == 1024 (1*align)
> + * eg, P2ALIGN(1024, 1024) == 1024 (1*align)
> + * eg, P2ALIGN(0x1234, 0x100) == 0x1200 (0x12*align)
> + * eg, P2ALIGN(0x5600, 0x100) == 0x5600 (0x56*align)
> + */
> +#define	P2ALIGN(x, align)		((x) & -(align))
>  
We already have such macros; they're called ALIGN_UP and ALIGN_DOWN.
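
To illustrate why an align-down helper can replace P2ALIGN, here is a minimal sketch. ALIGN_DOWN_SKETCH is a stand-in written for this example; the exact definition of ALIGN_DOWN in GRUB's headers may differ, and both forms assume the alignment is a power of two.

```c
#include <stdint.h>

/* The macro proposed in the patch. */
#define P2ALIGN(x, align) ((x) & -(align))

/* Stand-in for an align-down helper (an assumption, not GRUB's definition). */
#define ALIGN_DOWN_SKETCH(x, align) ((x) & ~((uint64_t) (align) - 1))

/* Thin wrappers so both forms are evaluated on 64-bit unsigned values. */
static uint64_t
p2align_u64 (uint64_t x, uint64_t align)
{
  return P2ALIGN (x, align);
}

static uint64_t
align_down_u64 (uint64_t x, uint64_t align)
{
  return ALIGN_DOWN_SKETCH (x, align);
}
```

For unsigned operands, -(align) and ~(align - 1) are the same bit pattern, so the two wrappers return identical results for the examples listed in the patch comment.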
>  /*
>   * FAT ZAP data structures
> @@ -147,6 +159,14 @@ struct grub_zfs_data
>    grub_uint64_t file_start;
>    grub_uint64_t file_end;
>  
> +  /* XXX: ashift is per vdev, not per pool.  We currently only ever touch
> +   * a single vdev, but when/if raid-z or stripes are supported, this
> +   * may need revision.
> +   */
> +  grub_uint64_t vdev_ashift;
> +  grub_uint64_t label_txg;
> +  grub_uint64_t pool_guid;
> +
We have multidevice support and we have a special structure for it.
Moreover, it seems that you need it only during the initial scan, so
there is no need to store it permanently.
>    /* cache for a dnode block */
>    dnode_phys_t *dnode_buf;
>    dnode_phys_t *dnode_mdn;
> @@ -192,7 +212,11 @@ static decomp_entry_t decomp_table[ZIO_COMPRESS_FUNCTIONS] = {
>  
>  static grub_err_t zio_read_data (blkptr_t * bp, grub_zfs_endian_t endian,
>  				 void *buf, struct grub_zfs_data *data);
> -
> +				
> +static grub_err_t
> +zio_read (blkptr_t * bp, grub_zfs_endian_t endian, void **buf, 
> +	  grub_size_t *size, struct grub_zfs_data *data);
> +	
>  /*
>   * Our own version of log2().  Same thing as highbit()-1.
>   */
> @@ -271,6 +295,7 @@ zio_checksum_verify (zio_cksum_t zc, grub_uint32_t checksum,
>        || (actual_cksum.zc_word[2] != zc.zc_word[2]) 
>        || (actual_cksum.zc_word[3] != zc.zc_word[3]))
>      {
> +      /*
>        grub_dprintf ("zfs", "checksum %d verification failed\n", checksum);
>        grub_dprintf ("zfs", "actual checksum %16llx %16llx %16llx %16llx\n",
>  		    (unsigned long long) actual_cksum.zc_word[0], 
> @@ -282,6 +307,7 @@ zio_checksum_verify (zio_cksum_t zc, grub_uint32_t checksum,
>  		    (unsigned long long) zc.zc_word[1],
>  		    (unsigned long long) zc.zc_word[2], 
>  		    (unsigned long long) zc.zc_word[3]);
> +	  */
Please don't comment out the code. This particular one is very useful
actually.
>        return grub_error (GRUB_ERR_BAD_FS, "checksum verification failed");
>      }
>  
> @@ -332,17 +358,22 @@ vdev_uberblock_compare (uberblock_t * ub1, uberblock_t * ub2)
>   * Three pieces of information are needed to verify an uberblock: the magic
>   * number, the version number, and the checksum.
>   *
> - * Currently Implemented: version number, magic number
> + * Currently Implemented: version number, magic number, label txg
>   * Need to Implement: checksum
>   *
>   */
>  static grub_err_t
> -uberblock_verify (uberblock_phys_t * ub, int offset)
> +uberblock_verify (uberblock_t * uber, int offset, struct grub_zfs_data *data)
>  {
> -  uberblock_t *uber = &ub->ubp_uberblock;
>    grub_err_t err;
>    grub_zfs_endian_t endian = UNKNOWN_ENDIAN;
>    zio_cksum_t zc;
> +  void *osp = NULL;
> +  grub_size_t ospsize;
> +
> +  if (uber->ub_txg < data->label_txg)
> +    return grub_error (GRUB_ERR_BAD_FS, 
> +	"ignoring partially written label: uber_txg < label_txg");
>  
>    if (grub_zfs_to_cpu64 (uber->ub_magic, LITTLE_ENDIAN) == UBERBLOCK_MAGIC
>        && grub_zfs_to_cpu64 (uber->ub_version, LITTLE_ENDIAN) > 0 
> @@ -358,10 +389,21 @@ uberblock_verify (uberblock_phys_t * ub, int offset)
>      return grub_error (GRUB_ERR_BAD_FS, "invalid uberblock magic");
>  
>    grub_memset (&zc, 0, sizeof (zc));
> -
>    zc.zc_word[0] = grub_cpu_to_zfs64 (offset, endian);
>    err = zio_checksum_verify (zc, ZIO_CHECKSUM_LABEL, endian,
> -			     (char *) ub, UBERBLOCK_SIZE);
> +			     (char *) uber, UBERBLOCK_SIZE(data->vdev_ashift));
> +
> +  if (!err)
> +    {
> +      // Check that the data pointed by the rootbp is usable.
> +      void *osp = NULL;
> +      grub_size_t ospsize;
> +      err = zio_read (&uber->ub_rootbp, endian, &osp, &ospsize, data);
> +      grub_free (osp);
> +
> +      if (!err && ospsize < OBJSET_PHYS_SIZE_V14)
> +		return grub_error (GRUB_ERR_BAD_FS, "uberblock rootbp points to invalid data");
> +    }
>  
This should be if (err) return err; ....
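
A minimal sketch of the early-return shape, assuming stand-in types and helpers: struct fake_ub, verify_checksum, read_rootbp and MIN_OSP_SIZE are all hypothetical and exist only so the control flow can be shown in isolation.

```c
#include <stddef.h>

typedef int err_t;
enum { ERR_NONE = 0, ERR_BAD_FS = 1, ERR_CKSUM = 2, ERR_READ = 3 };

#define MIN_OSP_SIZE 1024 /* hypothetical minimum object set size */

/* Hypothetical stand-in: records what each step should report. */
struct fake_ub
{
  err_t cksum_result;
  err_t read_result;
  size_t osp_size;
};

static err_t
verify_checksum (const struct fake_ub *ub)
{
  return ub->cksum_result;
}

static err_t
read_rootbp (const struct fake_ub *ub, size_t *size)
{
  *size = ub->osp_size;
  return ub->read_result;
}

/* Each step returns as soon as it fails; no nested if (!err) blocks. */
static err_t
verify_sketch (const struct fake_ub *ub)
{
  size_t ospsize = 0;
  err_t err;

  err = verify_checksum (ub);
  if (err)
    return err;

  err = read_rootbp (ub, &ospsize);
  if (err)
    return err;

  if (ospsize < MIN_OSP_SIZE)
    return ERR_BAD_FS;

  return ERR_NONE;
}
```

With this shape the size check is only reached when both earlier steps succeeded, and the final return is always the success path.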
>    return err;
>  }
> @@ -372,32 +414,40 @@ uberblock_verify (uberblock_phys_t * ub, int offset)
>   *    Success - Pointer to the best uberblock.
>   *    Failure - NULL
>   */
> -static uberblock_phys_t *
> -find_bestub (uberblock_phys_t * ub_array, grub_disk_addr_t sector)
> +static uberblock_t *
> +find_bestub (char *ub_array, struct grub_zfs_data *data)
>  {
> -  uberblock_phys_t *ubbest = NULL;
> -  int i;
> -  grub_disk_addr_t offset;
> +  const grub_disk_addr_t sector = data->vdev_phys_sector;
> +  uberblock_t *ubbest = NULL;
> +  uberblock_t *ubnext;
> +  unsigned int i, offset, pickedub = 0;
>    grub_err_t err = GRUB_ERR_NONE;
>  
> -  for (i = 0; i < (VDEV_UBERBLOCK_RING >> VDEV_UBERBLOCK_SHIFT); i++)
> +  const unsigned int UBCOUNT = UBERBLOCK_COUNT (data->vdev_ashift);
> +  const grub_uint64_t UBBYTES = UBERBLOCK_SIZE (data->vdev_ashift);
> +
> +  for (i = 0; i < UBCOUNT; i++)
>      {
> -      offset = (sector << SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE
> -	+ (i << VDEV_UBERBLOCK_SHIFT);
> +      ubnext = (uberblock_t *) (i * UBBYTES + ub_array);
> +      offset = (sector << SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE + (i * UBBYTES);
>  
> -      err = uberblock_verify (&ub_array[i], offset);
> +      err = uberblock_verify (ubnext, offset, data);
>        if (err)
> -	{
> -	  grub_errno = GRUB_ERR_NONE;
> -	  continue;
> -	}
> -      if (ubbest == NULL 
> -	  || vdev_uberblock_compare (&(ub_array[i].ubp_uberblock),
> -				     &(ubbest->ubp_uberblock)) > 0)
> -	ubbest = &ub_array[i];
> +		{
> +	      grub_errno = GRUB_ERR_NONE;
> +	      continue;
> +		}
> +      if (ubbest == NULL || vdev_uberblock_compare (ubnext, ubbest) > 0)
> +		{
> +	  	  ubbest = ubnext;
> +	      pickedub = i;
> +		}
>      }
>    if (!ubbest)
>      grub_errno = err;
> +  else
> +    grub_dprintf ("zfs", "Found best uberblock at idx %d, txg %llu\n",
> +		  pickedub, (unsigned long long) ubbest->ub_txg);
>  
Again the same error. Don't put such if's.
>    return (ubbest);
>  }
> @@ -412,9 +462,6 @@ get_psize (blkptr_t * bp, grub_zfs_endian_t endian)
>  static grub_uint64_t
>  dva_get_offset (dva_t * dva, grub_zfs_endian_t endian)
>  {
> -  grub_dprintf ("zfs", "dva=%llx, %llx\n", 
> -		(unsigned long long) dva->dva_word[0], 
> -		(unsigned long long) dva->dva_word[1]);
>    return grub_zfs_to_cpu64 ((dva)->dva_word[1], 
>  			    endian) << SPA_MINBLOCKSHIFT;
>  }
Please don't mix dprintf removals with actual changes. It just makes
the patch more difficult to read.
> @@ -440,11 +487,9 @@ zio_read_gang (blkptr_t * bp, grub_zfs_endian_t endian, dva_t * dva, void *buf,
>    zio_gb = grub_malloc (SPA_GANGBLOCKSIZE);
>    if (!zio_gb)
>      return grub_errno;
> -  grub_dprintf ("zfs", endian == LITTLE_ENDIAN ? "little-endian gang\n"
> -		:"big-endian gang\n");
> +
>    offset = dva_get_offset (dva, endian);
>    sector = DVA_OFFSET_TO_PHYS_SECTOR (offset);
> -  grub_dprintf ("zfs", "offset=%llx\n", (unsigned long long) offset);
>  
>    /* read in the gang block header */
>    err = grub_disk_read (data->disk, sector, 0, SPA_GANGBLOCKSIZE,
> @@ -515,10 +560,20 @@ zio_read_data (blkptr_t * bp, grub_zfs_endian_t endian, void *buf,
>  	  sector = DVA_OFFSET_TO_PHYS_SECTOR (offset);
>  	  err = grub_disk_read (data->disk, sector, 0, psize, buf); 
>  	}
> -      if (!err)
> -	return GRUB_ERR_NONE;
> -      grub_errno = GRUB_ERR_NONE;
> -    }
> +
> +    if (!err)
> +  	  {
> +	  	// Check the underlying checksum before we rule this DVA as "good"
> +	  	grub_uint32_t checkalgo = (grub_zfs_to_cpu64 ((bp)->blk_prop, endian) >> 40) & 0xff;
> +
> +	  	err = zio_checksum_verify (bp->blk_cksum, checkalgo, endian, buf, psize);
> +	  	if (!err)
> +          return GRUB_ERR_NONE;
> +	  }
> +	
> +       // If read failed or checksum bad, reset the error.  Hopefully we've got some more DVA's to try.
You need to correctly handle the case when you don't. See e.g. the
mirror reading code. Also, we don't use C++ style comments. I fail to
see the structure of, and the need for, this checksum move.
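
A minimal sketch of one way to handle the "no more copies" case, assuming stand-in types: struct fake_copy and read_any_copy are hypothetical, and the limit of three DVAs per block pointer is taken from the usual ZFS layout rather than from this hunk.

```c
#include <stddef.h>

typedef int err_t;
enum { ERR_NONE = 0, ERR_READ = 1, ERR_CKSUM = 2, ERR_NO_DVA = 3 };

#define NDVAS 3 /* assumed: a block pointer carries up to three DVAs */

/* Hypothetical stand-in for one copy of a block. */
struct fake_copy
{
  int present;        /* non-zero if this DVA slot is in use */
  err_t read_result;  /* outcome of the physical read */
  err_t cksum_result; /* outcome of the checksum verification */
};

/* Try each copy in turn.  Return ERR_NONE and the index of the first copy
   that both reads and verifies; otherwise return the last failure seen,
   or ERR_NO_DVA if no slot was in use at all.  */
static err_t
read_any_copy (const struct fake_copy copies[NDVAS], int *used)
{
  err_t last = ERR_NO_DVA;
  int i;

  for (i = 0; i < NDVAS; i++)
    {
      if (!copies[i].present)
        continue;

      last = copies[i].read_result;
      if (last)
        continue;

      last = copies[i].cksum_result;
      if (last)
        continue;

      *used = i;
      return ERR_NONE;
    }

  return last;
}
```

The point of keeping the last error is that the caller still sees a real failure when every copy has been tried, instead of a cleared error state.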
> +       grub_errno = GRUB_ERR_NONE;
> +     }
>  
>    if (!err)
>      err = grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid DVA");
> @@ -539,12 +594,9 @@ zio_read (blkptr_t * bp, grub_zfs_endian_t endian, void **buf,
>    unsigned int comp;
>    char *compbuf = NULL;
>    grub_err_t err;
> -  zio_cksum_t zc = bp->blk_cksum;
> -  grub_uint32_t checksum;
>  
>    *buf = NULL;
>  
> -  checksum = (grub_zfs_to_cpu64((bp)->blk_prop, endian) >> 40) & 0xff;
>    comp = (grub_zfs_to_cpu64((bp)->blk_prop, endian)>>32) & 0xff;
>    lsize = (BP_IS_HOLE(bp) ? 0 :
>  	   (((grub_zfs_to_cpu64 ((bp)->blk_prop, endian) & 0xffff) + 1)
> @@ -571,7 +623,6 @@ zio_read (blkptr_t * bp, grub_zfs_endian_t endian, void **buf,
>    else
>      compbuf = *buf = grub_malloc (lsize);
>  
> -  grub_dprintf ("zfs", "endian = %d\n", endian);
>    err = zio_read_data (bp, endian, compbuf, data);
>    if (err)
>      {
> @@ -580,15 +631,6 @@ zio_read (blkptr_t * bp, grub_zfs_endian_t endian, void **buf,
>        return err;
>      }
>  
> -  err = zio_checksum_verify (zc, checksum, endian, compbuf, psize);
> -  if (err)
> -    {
> -      grub_dprintf ("zfs", "incorrect checksum\n");
> -      grub_free (compbuf);
> -      *buf = NULL;
> -      return err;
> -    }
> -
>    if (comp != ZIO_COMPRESS_OFF)
>      {
>        *buf = grub_malloc (lsize);
> @@ -635,7 +677,6 @@ dmu_read (dnode_end_t * dn, grub_uint64_t blkid, void **buf,
>    endian = dn->endian;
>    for (level = dn->dn.dn_nlevels - 1; level >= 0; level--)
>      {
> -      grub_dprintf ("zfs", "endian = %d\n", endian);
>        idx = (blkid >> (epbs * level)) & ((1 << epbs) - 1);
>        *bp = bp_array[idx];
>        if (bp_array != dn->dn.dn_blkptr)
> @@ -661,12 +702,10 @@ dmu_read (dnode_end_t * dn, grub_uint64_t blkid, void **buf,
>  	}
>        if (level == 0)
>  	{
> -	  grub_dprintf ("zfs", "endian = %d\n", endian);
>  	  err = zio_read (bp, endian, buf, 0, data);
>  	  endian = (grub_zfs_to_cpu64 (bp->blk_prop, endian) >> 63) & 1;
>  	  break;
>  	}
> -      grub_dprintf ("zfs", "endian = %d\n", endian);
>        err = zio_read (bp, endian, &tmpbuf, 0, data);
>        endian = (grub_zfs_to_cpu64 (bp->blk_prop, endian) >> 63) & 1;
>        if (err)
> @@ -717,9 +756,6 @@ mzap_iterate (mzap_phys_t * zapobj, grub_zfs_endian_t endian, int objsize,
>    chunks = objsize / MZAP_ENT_LEN - 1;
>    for (i = 0; i < chunks; i++)
>      {
> -      grub_dprintf ("zfs", "zap: name = %s, value = %llx, cd = %x\n",
> -		    mzap_ent[i].mze_name, (long long)mzap_ent[i].mze_value,
> -		    (int)mzap_ent[i].mze_cd);
>        if (hook (mzap_ent[i].mze_name, 
>  		grub_zfs_to_cpu64 (mzap_ent[i].mze_value, endian)))
>  	return 1;
> @@ -849,8 +885,6 @@ zap_leaf_lookup (zap_leaf_phys_t * l, grub_zfs_endian_t endian,
>        if (grub_zfs_to_cpu64 (le->le_hash,endian) != h)
>  	continue;
>  
> -      grub_dprintf ("zfs", "fzap: length %d\n", (int) le->le_name_length);
> -
>        if (zap_leaf_array_equal (l, endian, blksft, 
>  				grub_zfs_to_cpu16 (le->le_name_chunk,endian),
>  				grub_zfs_to_cpu16 (le->le_name_length, endian),
> @@ -1050,22 +1084,16 @@ zap_lookup (dnode_end_t * zap_dnode, char *name, grub_uint64_t * val,
>      return err;
>    block_type = grub_zfs_to_cpu64 (*((grub_uint64_t *) zapbuf), endian);
>  
> -  grub_dprintf ("zfs", "zap read\n");
> -
>    if (block_type == ZBT_MICRO)
>      {
> -      grub_dprintf ("zfs", "micro zap\n");
>        err = (mzap_lookup (zapbuf, endian, size, name, val));
> -      grub_dprintf ("zfs", "returned %d\n", err);      
>        grub_free (zapbuf);
>        return err;
>      }
>    else if (block_type == ZBT_HEADER)
>      {
> -      grub_dprintf ("zfs", "fat zap\n");
>        /* this is a fat zap */
>        err = (fzap_lookup (zap_dnode, zapbuf, name, val, data));
> -      grub_dprintf ("zfs", "returned %d\n", err);      
>        grub_free (zapbuf);
>        return err;
>      }
> @@ -1092,18 +1120,14 @@ zap_iterate (dnode_end_t * zap_dnode,
>      return 0;
>    block_type = grub_zfs_to_cpu64 (*((grub_uint64_t *) zapbuf), endian);
>  
> -  grub_dprintf ("zfs", "zap read\n");
> -
>    if (block_type == ZBT_MICRO)
>      {
> -      grub_dprintf ("zfs", "micro zap\n");
>        ret = mzap_iterate (zapbuf, endian, size, hook);
>        grub_free (zapbuf);
>        return ret;
>      }
>    else if (block_type == ZBT_HEADER)
>      {
> -      grub_dprintf ("zfs", "fat zap\n");
>        /* this is a fat zap */
>        ret = fzap_iterate (zap_dnode, zapbuf, hook, data);
>        grub_free (zapbuf);
> @@ -1150,12 +1174,9 @@ dnode_get (dnode_end_t * mdn, grub_uint64_t objnum, grub_uint8_t type,
>        return GRUB_ERR_NONE;
>      }
>  
> -  grub_dprintf ("zfs", "endian = %d, blkid=%llx\n", mdn->endian, 
> -		(unsigned long long) blkid);
>    err = dmu_read (mdn, blkid, &dnbuf, &endian, data);
>    if (err)
>      return err;
> -  grub_dprintf ("zfs", "alive\n");
>  
>    grub_free (data->dnode_buf);
>    grub_free (data->dnode_mdn);
> @@ -1426,27 +1447,19 @@ get_filesystem_dnode (dnode_end_t * mosmdn, char *fsname,
>    grub_uint64_t objnum;
>    grub_err_t err;
>  
> -  grub_dprintf ("zfs", "endian = %d\n", mosmdn->endian);
> -
>    err = dnode_get (mosmdn, DMU_POOL_DIRECTORY_OBJECT, 
>  		   DMU_OT_OBJECT_DIRECTORY, mdn, data);
>    if (err)
>      return err;
>  
> -  grub_dprintf ("zfs", "alive\n");
> -
>    err = zap_lookup (mdn, DMU_POOL_ROOT_DATASET, &objnum, data);
>    if (err)
>      return err;
>  
> -  grub_dprintf ("zfs", "alive\n");
> -
>    err = dnode_get (mosmdn, objnum, DMU_OT_DSL_DIR, mdn, data);
>    if (err)
>      return err;
>  
> -  grub_dprintf ("zfs", "alive\n");
> -
>    while (*fsname)
>      {
>        grub_uint64_t childobj;
> @@ -1491,8 +1504,6 @@ make_mdn (dnode_end_t * mdn, struct grub_zfs_data *data)
>    grub_size_t ospsize;
>    grub_err_t err;
>  
> -  grub_dprintf ("zfs", "endian = %d\n", mdn->endian);
> -
>    bp = &(((dsl_dataset_phys_t *) DN_BONUS (&mdn->dn))->ds_bp);
>    err = zio_read (bp, mdn->endian, &osp, &ospsize, data);
>    if (err)
> @@ -1558,7 +1569,6 @@ dnode_get_fullpath (const char *fullpath, dnode_end_t * mdn,
>        grub_dprintf ("zfs", "fsname = '%s' snapname='%s' filename = '%s'\n", 
>  		    fsname, snapname, filename);
>      }
> -  grub_dprintf ("zfs", "alive\n");
>    err = get_filesystem_dnode (&(data->mos), fsname, dn, data);
>    if (err)
>      {
> @@ -1567,12 +1577,8 @@ dnode_get_fullpath (const char *fullpath, dnode_end_t * mdn,
>        return err;
>      }
>  
> -  grub_dprintf ("zfs", "alive\n");
> -
>    headobj = grub_zfs_to_cpu64 (((dsl_dir_phys_t *) DN_BONUS (&dn->dn))->dd_head_dataset_obj, dn->endian);
>  
> -  grub_dprintf ("zfs", "endian = %d\n", mdn->endian);
> -
>    err = dnode_get (&(data->mos), headobj, DMU_OT_DSL_DATASET, mdn, data);
>    if (err)
>      {
> @@ -1580,7 +1586,6 @@ dnode_get_fullpath (const char *fullpath, dnode_end_t * mdn,
>        grub_free (snapname);
>        return err;
>      }
> -  grub_dprintf ("zfs", "endian = %d\n", mdn->endian);
>  
>    if (snapname)
>      {
> @@ -1606,8 +1611,6 @@ dnode_get_fullpath (const char *fullpath, dnode_end_t * mdn,
>      *mdnobj = headobj;
>  
>    make_mdn (mdn, data);
> -  
> -  grub_dprintf ("zfs", "endian = %d\n", mdn->endian);
>  
>    if (*isfs)
>      {
> @@ -1864,11 +1867,9 @@ zfs_fetch_nvlist (struct grub_zfs_data * data, char **nvlist)
>  static grub_err_t
>  check_pool_label (struct grub_zfs_data *data)
>  {
> -  grub_uint64_t pool_state, txg = 0;
> -  char *nvlist;
> -#if 0
> -  char *nv;
> -#endif
> +  grub_uint64_t pool_state;
> +  char *nvlist;			// for the pool
> +  char *vdevnvlist;		// for the vdev
>    grub_uint64_t diskguid;
>    grub_uint64_t version;
>    int found;
> @@ -1878,8 +1879,6 @@ check_pool_label (struct grub_zfs_data *data)
>    if (err)
>      return err;
>  
> -  grub_dprintf ("zfs", "check 2 passed\n");
> -
>    found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_STATE,
>  					 &pool_state);
>    if (! found)
> @@ -1889,16 +1888,16 @@ check_pool_label (struct grub_zfs_data *data)
>  	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_STATE " not found");
>        return grub_errno;
>      }
> -  grub_dprintf ("zfs", "check 3 passed\n");
>  
>    if (pool_state == POOL_STATE_DESTROYED)
>      {
>        grub_free (nvlist);
>        return grub_error (GRUB_ERR_BAD_FS, "zpool is marked as destroyed");
>      }
> -  grub_dprintf ("zfs", "check 4 passed\n");
>  
> -  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_TXG, &txg);
> +  data->label_txg = 0;
> +  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_TXG,
> +				   	&data->label_txg);
>    if (!found)
>      {
>        grub_free (nvlist);
> @@ -1906,15 +1905,13 @@ check_pool_label (struct grub_zfs_data *data)
>  	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_TXG " not found");
>        return grub_errno;
>      }
> -  grub_dprintf ("zfs", "check 6 passed\n");
>  
>    /* not an active device */
> -  if (txg == 0)
> +  if (data->label_txg == 0)
>      {
>        grub_free (nvlist);
>        return grub_error (GRUB_ERR_BAD_FS, "zpool isn't active");
>      }
> -  grub_dprintf ("zfs", "check 7 passed\n");
>  
>    found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_VERSION,
>  					 &version);
> @@ -1925,42 +1922,79 @@ check_pool_label (struct grub_zfs_data *data)
>  	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_VERSION " not found");
>        return grub_errno;
>      }
> -  grub_dprintf ("zfs", "check 8 passed\n");
>  
>    if (version > SPA_VERSION)
>      {
>        grub_free (nvlist);
>        return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET,
> -			 "too new version %llu > %llu",
> +			 "SPA version too new %llu > %llu",
>  			 (unsigned long long) version,
>  			 (unsigned long long) SPA_VERSION);
>      }
> -  grub_dprintf ("zfs", "check 9 passed\n");
> -#if 0
> -  if (nvlist_lookup_value (nvlist, ZPOOL_CONFIG_VDEV_TREE, &nv,
> -			   DATA_TYPE_NVLIST, NULL))
> +
> +  vdevnvlist = grub_zfs_nvlist_lookup_nvlist (nvlist, ZPOOL_CONFIG_VDEV_TREE);
> +  if (!vdevnvlist)
>      {
> -      grub_free (vdev);
> -      return (GRUB_ERR_BAD_FS);
> +      grub_free (nvlist);
> +      if (!grub_errno)
> +	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_VDEV_TREE " not found");
> +      return grub_errno;
> +    }
> +
> +  found = grub_zfs_nvlist_lookup_uint64 (vdevnvlist, ZPOOL_CONFIG_ASHIFT,
> +					 &data->vdev_ashift);
> +  grub_free (vdevnvlist);
> +  if (!found)
> +    {
> +      grub_free (nvlist);
> +      if (!grub_errno)
> +	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_ASHIFT " not found");
> +      return grub_errno;
>      }
> -  grub_dprintf ("zfs", "check 10 passed\n");
> -#endif
>  
>    found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_GUID, &diskguid);
>    if (! found)
>      {
>        grub_free (nvlist);
>        if (! grub_errno)
> -	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_GUID " not found");
> +        grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_GUID " not found");
> +      return grub_errno;
> +    }
> +
> +  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_GUID, &data->pool_guid);
> +  if (! found)
> +    {
> +      grub_free (nvlist);
> +      if (! grub_errno)
> +        grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_GUID " not found");
>        return grub_errno;
>      }
> -  grub_dprintf ("zfs", "check 11 passed\n");
>  
>    grub_free (nvlist);
>  
> +  grub_dprintf ("zfs", "ZFS Pool GUID: %llu (%016llx) Label: GUID: %llu (%016llx), txg: %llu, SPA v%llu, ashift: %llu\n",
> +	(unsigned long long) data->pool_guid,
> +    (unsigned long long) data->pool_guid,
> +	(unsigned long long) diskguid,
> +    (unsigned long long) diskguid,
> +    (unsigned long long) data->label_txg,
> +    (unsigned long long) version,
> +    (unsigned long long) data->vdev_ashift);
> +
>    return GRUB_ERR_NONE;
>  }
>  
> +/*
> + * vdev_label_start returns the physical disk offset (in bytes) of
> + * label "l".
> + */
> +static grub_uint64_t vdev_label_start (grub_uint64_t psize, int l)
> +{
> +  return (l * sizeof (vdev_label_t) + (l < VDEV_LABELS / 2 ?
> +				       0 : psize -
> +				       VDEV_LABELS * sizeof (vdev_label_t)));
> +}
> +
>  static void
>  zfs_unmount (struct grub_zfs_data *data)
>  {
> @@ -1979,13 +2013,13 @@ static struct grub_zfs_data *
>  zfs_mount (grub_device_t dev)
>  {
>    struct grub_zfs_data *data = 0;
> -  int label = 0;
> -  uberblock_phys_t *ub_array, *ubbest = NULL;
> -  vdev_boot_header_t *bh;
> +  int label = 0, bestlabel = -1;
> +  char *ub_array;
> +  uberblock_t *ubbest;
> +  uberblock_t *ubcur = NULL;
>    void *osp = 0;
>    grub_size_t ospsize;
>    grub_err_t err;
> -  int vdevnum;
>  
>    if (! dev->disk)
>      {
> @@ -1997,11 +2031,6 @@ zfs_mount (grub_device_t dev)
>    if (!data)
>      return 0;
>    grub_memset (data, 0, sizeof (*data));
> -#if 0
> -  /* if it's our first time here, zero the best uberblock out */
> -  if (data->best_drive == 0 && data->best_part == 0 && find_best_root)
> -    grub_memset (&current_uberblock, 0, sizeof (uberblock_t));
> -#endif
>  
>    data->disk = dev->disk;
>  
> @@ -2012,100 +2041,129 @@ zfs_mount (grub_device_t dev)
>        return 0;
>      }
>  
> -  bh = grub_malloc (VDEV_BOOT_HEADER_SIZE);
> -  if (!bh)
> +  ubbest = grub_malloc (sizeof (*ubbest));
> +  if (!ubbest)
>      {
>        zfs_unmount (data);
> -      grub_free (ub_array);
>        return 0;
>      }
> +  grub_memset (ubbest, 0, sizeof (*ubbest));
>  
> -  vdevnum = VDEV_LABELS;
> +  // Establish some constants for where things are on the device:
>  
> -  /* Don't check back labels on CDROM.  */
> -  if (grub_disk_get_size (dev->disk) == GRUB_DISK_SIZE_UNKNOWN)
> -    vdevnum = VDEV_LABELS / 2;
> +  /*
> +   * some eltorito stacks don't give us a size and
> +   * we end up setting the size to MAXUINT, further
> +   * some of these devices stop working once a single
> +   * read past the end has been issued. Checking
> +   * for a maximum part_length and skipping the backup
> +   * labels at the end of the slice/partition/device
> +   * avoids breaking down on such devices.
> +   */
> +  const int vdevnum =
> +    grub_disk_get_size (dev->disk) == GRUB_DISK_SIZE_UNKNOWN ?
> +    VDEV_LABELS / 2 : VDEV_LABELS;
> +
> +  // Size in bytes of the device (disk or partition) aligned to label size
> +  grub_uint64_t device_size = 
> +	grub_disk_get_size (dev->disk) << GRUB_DISK_SECTOR_BITS;
> +  const grub_uint64_t alignedbytes =
> +    P2ALIGN (device_size, (grub_uint64_t) sizeof (vdev_label_t));
It would be better to move this into vdev_label_start.
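
A rough sketch of what folding the alignment into the label-offset helper could look like. This is not code from the patch: the `sketch_` names are made up for illustration, and the constants (four labels of 256 KiB each) are assumptions standing in for VDEV_LABELS and sizeof (vdev_label_t).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, standing in for VDEV_LABELS and sizeof (vdev_label_t). */
#define SKETCH_VDEV_LABELS 4
#define SKETCH_LABEL_SIZE  (256ULL * 1024)

/* Round x down to a multiple of align (align must be a power of two). */
#define SKETCH_P2ALIGN(x, align) ((x) & -(align))

/*
 * Hypothetical helper: byte offset of label l on a device of
 * device_size bytes.  The device size is aligned down to the label
 * size here, so the caller does not have to do it.
 */
static uint64_t
sketch_label_start (uint64_t device_size, int l)
{
  uint64_t psize = SKETCH_P2ALIGN (device_size, (uint64_t) SKETCH_LABEL_SIZE);
  uint64_t off;

  assert (l >= 0 && l < SKETCH_VDEV_LABELS);
  /* The device must be able to hold all four labels. */
  assert (psize >= SKETCH_VDEV_LABELS * SKETCH_LABEL_SIZE);

  off = (uint64_t) l * SKETCH_LABEL_SIZE;
  /* The last two labels are counted back from the aligned end. */
  if (l >= SKETCH_VDEV_LABELS / 2)
    off += psize - SKETCH_VDEV_LABELS * SKETCH_LABEL_SIZE;
  return off;
}
```

With a device of 1 GiB plus a 100 KiB unaligned tail, the four labels would land at bytes 0, 262144, 1073217536 and 1073479680, so the tail is ignored when locating the end labels.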
>  
> -  for (label = 0; ubbest == NULL && label < vdevnum; label++)
> +  for (label = 0; label < vdevnum; label++)
>      {
> -      grub_zfs_endian_t ub_endian = UNKNOWN_ENDIAN;
> -      grub_dprintf ("zfs", "label %d\n", label);
> +      grub_uint64_t labelstartbytes = vdev_label_start (alignedbytes, label);
> +      grub_uint64_t labelstart = labelstartbytes >> GRUB_DISK_SECTOR_BITS;
>  
> -      data->vdev_phys_sector
> -	= label * (sizeof (vdev_label_t) >> SPA_MINBLOCKSHIFT)
> -	+ ((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE) >> SPA_MINBLOCKSHIFT)
> -	+ (label < VDEV_LABELS / 2 ? 0 : grub_disk_get_size (dev->disk)
> -	   - VDEV_LABELS * (sizeof (vdev_label_t) >> SPA_MINBLOCKSHIFT));
> +      grub_dprintf ("zfs", "reading label %d at sector %llu (byte %llu)\n",
> +		    label, (unsigned long long) labelstart, 
> +		(unsigned long long) labelstartbytes);
>  
> -      /* Read in the uberblock ring (128K). */
> -      err = grub_disk_read (data->disk, data->vdev_phys_sector
> -			    + (VDEV_PHYS_SIZE >> SPA_MINBLOCKSHIFT),
> -			    0, VDEV_UBERBLOCK_RING, (char *) ub_array);
> +      data->vdev_phys_sector = labelstart +
> +		((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE) >> GRUB_DISK_SECTOR_BITS);
> +
> +      err = check_pool_label (data);
>        if (err)
> -	{
> -	  grub_errno = GRUB_ERR_NONE;
> -	  continue;
> -	}
> -      grub_dprintf ("zfs", "label ok %d\n", label);
> +	    {
> +		  grub_dprintf ("zfs", "error checking label %d\n", label);
> +	      grub_errno = GRUB_ERR_NONE;
> +	      continue;
> +	    }
>  
> -      ubbest = find_bestub (ub_array, data->vdev_phys_sector);
> -      if (!ubbest)
> -	{
> -	  grub_dprintf ("zfs", "No uberblock found\n");
> -	  grub_errno = GRUB_ERR_NONE;
> -	  continue;
> -	}
> -      ub_endian = (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.ub_magic, 
> -				     LITTLE_ENDIAN) == UBERBLOCK_MAGIC 
> -		   ? LITTLE_ENDIAN : BIG_ENDIAN);
> -      err = zio_read (&ubbest->ubp_uberblock.ub_rootbp, 
> -		      ub_endian,
> -		      &osp, &ospsize, data);
> +      /* Read in the uberblock ring (128K). */
> +      err = grub_disk_read (data->disk,
> +			    data->vdev_phys_sector +
> +			    (VDEV_PHYS_SIZE >> GRUB_DISK_SECTOR_BITS),
> +			    0, VDEV_UBERBLOCK_RING, ub_array);
>        if (err)
> -	{
> -	  grub_dprintf ("zfs", "couldn't zio_read\n"); 
> -	  grub_errno = GRUB_ERR_NONE;
> -	  continue;
> -	}
> +	    {
> +		  grub_dprintf ("zfs", "error reading uberblock ring for label %d\n", label);
> +	      grub_errno = GRUB_ERR_NONE;
> +	      continue;
> +	    }
>  
> -      if (ospsize < OBJSET_PHYS_SIZE_V14)
> -	{
> -	  grub_dprintf ("zfs", "osp too small\n"); 
> -	  grub_free (osp);
> -	  continue;
> -	}
> -      grub_dprintf ("zfs", "ubbest %p\n", ubbest);
> +      ubcur = find_bestub (ub_array, data);
> +      if (!ubcur)
> +	    {
> +	      grub_dprintf ("zfs", "No good uberblocks found in label %d\n", label);
> +	      grub_errno = GRUB_ERR_NONE;
> +	      continue;
> +	    }
>  
> -      err = check_pool_label (data);
> -      if (err)
> +      if (vdev_uberblock_compare (ubcur, ubbest) > 0)
> +	    {
> +		  // Looks like the block is good, so use it.
> +	      grub_memcpy (ubbest, ubcur, sizeof (*ubbest));
> +	      bestlabel = label;
> +	      grub_dprintf ("zfs", "Current best uberblock found in label %d\n", label);
> +	    }
> +    }
> +  grub_free (ub_array);
> +
> +  // We zero'd the structure to begin with.  If we never assigned to it, 
> +  // magic will still be zero.
> +  if (!ubbest->ub_magic)
> +    {
> +      grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid ZFS label");
> +      zfs_unmount (data);
> +      grub_free (ubbest);
> +      return 0;
> +    }
> +
> +  grub_dprintf ("zfs", "ubbest %p in label %d\n", ubbest, bestlabel);
> +
> +  grub_zfs_endian_t ub_endian =
> +    grub_zfs_to_cpu64 (ubbest->ub_magic, LITTLE_ENDIAN) == UBERBLOCK_MAGIC
> +    ? LITTLE_ENDIAN : BIG_ENDIAN;
> +  err = zio_read (&ubbest->ub_rootbp, ub_endian, &osp, &ospsize, data);
> +
> +  if (err)
>  	{
> -	  grub_errno = GRUB_ERR_NONE;
> -	  continue;
> +	  grub_error (GRUB_ERR_BAD_FS, "couldn't zio_read object directory");
> +	  zfs_unmount (data);
> +	  grub_free (ubbest);
> +	  return 0;
>  	}
> -#if 0
> -      if (find_best_root &&
> -	  vdev_uberblock_compare (&ubbest->ubp_uberblock,
> -				  &(current_uberblock)) <= 0)
> -	continue;
> -#endif
> -      /* Got the MOS. Save it at the memory addr MOS. */
> -      grub_memmove (&(data->mos.dn), &((objset_phys_t *) osp)->os_meta_dnode,
> -		    DNODE_SIZE);
> -      data->mos.endian = (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.ub_rootbp.blk_prop, ub_endian) >> 63) & 1;
> -      grub_memmove (&(data->current_uberblock),
> -		    &ubbest->ubp_uberblock, sizeof (uberblock_t));
> -      grub_free (ub_array);
> -      grub_free (bh);
> -      grub_free (osp);
> -      return data;  
> +
> +  if (ospsize < OBJSET_PHYS_SIZE_V14)
> +    {
> +	  grub_error (GRUB_ERR_BAD_FS, "osp too small");
> +	  zfs_unmount (data);
> +	  grub_free (osp);
> +	  grub_free (ubbest);
> +	  return 0;
>      }
> -  grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid label");
> -  zfs_unmount (data);
> -  grub_free (ub_array);
> -  grub_free (bh);
> -  grub_free (osp);
>  
> -  return 0;
> +  /* Got the MOS. Save it at the memory addr MOS. */
> +  grub_memmove (&(data->mos.dn), &((objset_phys_t *) osp)->os_meta_dnode, DNODE_SIZE);
> +  data->mos.endian =
> +		(grub_zfs_to_cpu64 (ubbest->ub_rootbp.blk_prop, ub_endian) >> 63) & 1;
> +  grub_memmove (&(data->current_uberblock), ubbest, sizeof (uberblock_t));
> +
> +  grub_free (osp);
> +  grub_free (ubbest);
> +  
> +  return (data);
>  }
>  
>  grub_err_t
> @@ -2149,33 +2207,18 @@ zfs_label (grub_device_t device, char **label)
>  static grub_err_t 
>  zfs_uuid (grub_device_t device, char **uuid)
>  {
> -  char *nvlist;
> -  int found;
>    struct grub_zfs_data *data;
> -  grub_uint64_t guid;
> -  grub_err_t err;
> -
> -  *uuid = 0;
>  
>    data = zfs_mount (device);
>    if (! data)
>      return grub_errno;
>  
> -  err = zfs_fetch_nvlist (data, &nvlist);
> -  if (err)
> -    {
> -      zfs_unmount (data);
> -      return err;
> -    }
> -
> -  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_GUID, &guid);
> -  if (! found)
> -    return grub_errno;
> -  grub_free (nvlist);
> -  *uuid = grub_xasprintf ("%016llx", (long long unsigned) guid);
> +  *uuid = grub_xasprintf ("%016llx", (long long unsigned) data->pool_guid);
>    zfs_unmount (data);
> +
>    if (! *uuid)
>      return grub_errno;
> +
>    return GRUB_ERR_NONE;
>  }
>  
> diff --git a/include/grub/zfs/uberblock_impl.h b/include/grub/zfs/uberblock_impl.h
> index 1bf7f2b..94b5ad7 100644
> --- a/include/grub/zfs/uberblock_impl.h
> +++ b/include/grub/zfs/uberblock_impl.h
> @@ -23,6 +23,8 @@
>  #ifndef _SYS_UBERBLOCK_IMPL_H
>  #define	_SYS_UBERBLOCK_IMPL_H
>  
> +#define UBMAX(a,b) ((a) > (b) ? (a) : (b))
> +
>  /*
>   * The uberblock version is incremented whenever an incompatible on-disk
>   * format change is made to the SPA, DMU, or ZAP.
> @@ -45,16 +47,10 @@ typedef struct uberblock {
>  	blkptr_t	ub_rootbp;	/* MOS objset_phys_t		*/
>  } uberblock_t;
>  
> -#define	UBERBLOCK_SIZE		(1ULL << UBERBLOCK_SHIFT)
> -#define	VDEV_UBERBLOCK_SHIFT	UBERBLOCK_SHIFT
> -
> -/* XXX Uberblock_phys_t is no longer in the kernel zfs */
> -typedef struct uberblock_phys {
> -	uberblock_t	ubp_uberblock;
> -	char		ubp_pad[UBERBLOCK_SIZE - sizeof (uberblock_t) -
> -				sizeof (zio_eck_t)];
> -	zio_eck_t	ubp_zec;
> -} uberblock_phys_t;
> -
> +#define	VDEV_UBERBLOCK_SHIFT(as)	UBMAX(as, UBERBLOCK_SHIFT)
> +#define	UBERBLOCK_SIZE(as)		(1ULL << VDEV_UBERBLOCK_SHIFT(as))
> + 
> +// Number of uberblocks that can fit in the ring at a given ashift
> +#define UBERBLOCK_COUNT(as) (VDEV_UBERBLOCK_RING >> VDEV_UBERBLOCK_SHIFT(as))
>  
>  #endif	/* _SYS_UBERBLOCK_IMPL_H */


-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko




* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2011-09-28 21:20 ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2012-01-19 11:36   ` Richard Laager
  2012-01-22 14:18     ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Laager @ 2012-01-19 11:36 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko,
	Zachary Bedell
  Cc: grub-devel


[-- Attachment #1.1: Type: text/plain, Size: 1387 bytes --]

Original mailing list thread:
http://lists.gnu.org/archive/html/grub-devel/2011-09/msg00031.html

On 2011-09-28, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> Could you please split this patch? In particular the removing of dprintf
> takes too much of this patch and makes it unreadable. Note that we don't
> comment out the code, we only remove it.

[I am not the original patch writer.]

I've edited the patch to get rid of the dprintf removals. I also fixed a
pile of whitespace discrepancies. The patch is much easier to read now.
This updated patch is attached. It applies cleanly to revision 3473.
Unfortunately, I'm not familiar enough with the GRUB codebase to merge
it with anything much newer.

According to the commit message in the URL above, there are 7
non-logging changes in this patch. Even knowing which of those, if any,
have already been addressed (or rendered moot) in trunk would be
extremely helpful, as it might bring the amount of work required to
within my grasp.

phcoder: Any chance you could take a look at this?

I think ashift=12 support is still missing in trunk. gentoofan on IRC
provided this (completely untested) patch to trunk:
http://paste.pocoo.org/show/535465/
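
For reference, a small sketch of the arithmetic that ashift support comes down to, mirroring the UBERBLOCK_SIZE(as) and UBERBLOCK_COUNT(as) macros in the attached patch. The `sketch_` names are illustrative only, and the constants (a minimum uberblock shift of 10 and a 128K ring) are assumptions taken from the patch description and comments.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_UBERBLOCK_SHIFT 10             /* assumed minimum slot: 1 KiB */
#define SKETCH_UBERBLOCK_RING  (128ULL << 10) /* assumed ring size: 128 KiB */

/* Each uberblock slot is at least 1 KiB and grows with ashift. */
static unsigned
sketch_ub_shift (unsigned ashift)
{
  return ashift > SKETCH_UBERBLOCK_SHIFT ? ashift : SKETCH_UBERBLOCK_SHIFT;
}

/* Size in bytes of one uberblock slot, padding included. */
static uint64_t
sketch_ub_size (unsigned ashift)
{
  assert (ashift < 64);
  return 1ULL << sketch_ub_shift (ashift);
}

/* Number of slots that fit in the ring at this ashift. */
static uint64_t
sketch_ub_count (unsigned ashift)
{
  assert (ashift < 64);
  return SKETCH_UBERBLOCK_RING >> sketch_ub_shift (ashift);
}
```

Under these assumptions, ashift=9 gives 128 slots of 1024 bytes, while ashift=12 gives 32 slots of 4096 bytes, which is why a scan that steps through the ring in fixed 1 KiB strides misreads such a pool.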

pendor: Do you still have copies of the zpools that were triggering
these errors? Do they contain confidential data? How big are they?

Thanks,
Richard

[-- Attachment #1.2: grub-zfs-ashift-label-2.patch --]
[-- Type: text/x-patch, Size: 19519 bytes --]

=== modified file 'grub-core/fs/zfs/zfs.c'
--- grub-core/fs/zfs/zfs.c	2011-06-23 22:31:29 +0000
+++ grub-core/fs/zfs/zfs.c	2012-01-19 11:28:09 +0000
@@ -75,11 +75,23 @@
 static grub_dl_t my_mod;
 #endif
 
+/*
+ * Macros to get fields in a bp or DVA.
+ */
 #define	P2PHASE(x, align)		((x) & ((align) - 1))
 #define	DVA_OFFSET_TO_PHYS_SECTOR(offset) \
 	((offset + VDEV_LABEL_START_SIZE) >> SPA_MINBLOCKSHIFT)
 
 /*
+ * return x rounded down to an align boundary
+ * eg, P2ALIGN(1200, 1024) == 1024 (1*align)
+ * eg, P2ALIGN(1024, 1024) == 1024 (1*align)
+ * eg, P2ALIGN(0x1234, 0x100) == 0x1200 (0x12*align)
+ * eg, P2ALIGN(0x5600, 0x100) == 0x5600 (0x56*align)
+ */
+#define	P2ALIGN(x, align)		((x) & -(align))
+
+/*
  * FAT ZAP data structures
  */
 #define	ZFS_CRC64_POLY 0xC96C5795D7870F42ULL	/* ECMA-182, reflected form */
@@ -147,6 +159,14 @@
   grub_uint64_t file_start;
   grub_uint64_t file_end;
 
+  /* XXX: ashift is per vdev, not per pool.  We currently only ever touch
+   * a single vdev, but when/if raid-z or stripes are supported, this
+   * may need revision.
+   */
+  grub_uint64_t vdev_ashift;
+  grub_uint64_t label_txg;
+  grub_uint64_t pool_guid;
+
   /* cache for a dnode block */
   dnode_phys_t *dnode_buf;
   dnode_phys_t *dnode_mdn;
@@ -193,6 +213,10 @@
 static grub_err_t zio_read_data (blkptr_t * bp, grub_zfs_endian_t endian,
 				 void *buf, struct grub_zfs_data *data);
 
+static grub_err_t
+zio_read (blkptr_t * bp, grub_zfs_endian_t endian, void **buf,
+	  grub_size_t *size, struct grub_zfs_data *data);
+
 /*
  * Our own version of log2().  Same thing as highbit()-1.
  */
@@ -332,17 +356,22 @@
  * Three pieces of information are needed to verify an uberblock: the magic
  * number, the version number, and the checksum.
  *
- * Currently Implemented: version number, magic number
+ * Currently Implemented: version number, magic number, label txg
  * Need to Implement: checksum
  *
  */
 static grub_err_t
-uberblock_verify (uberblock_phys_t * ub, int offset)
+uberblock_verify (uberblock_t * uber, int offset, struct grub_zfs_data *data)
 {
-  uberblock_t *uber = &ub->ubp_uberblock;
   grub_err_t err;
   grub_zfs_endian_t endian = UNKNOWN_ENDIAN;
   zio_cksum_t zc;
+  void *osp = NULL;
+  grub_size_t ospsize;
+
+  if (uber->ub_txg < data->label_txg)
+    return grub_error (GRUB_ERR_BAD_FS,
+	"ignoring partially written label: uber_txg < label_txg");
 
   if (grub_zfs_to_cpu64 (uber->ub_magic, LITTLE_ENDIAN) == UBERBLOCK_MAGIC
       && grub_zfs_to_cpu64 (uber->ub_version, LITTLE_ENDIAN) > 0 
@@ -361,7 +390,19 @@
 
   zc.zc_word[0] = grub_cpu_to_zfs64 (offset, endian);
   err = zio_checksum_verify (zc, ZIO_CHECKSUM_LABEL, endian,
-			     (char *) ub, UBERBLOCK_SIZE);
+			     (char *) uber, UBERBLOCK_SIZE(data->vdev_ashift));
+
+  if (!err)
+    {
+      /* Check that the data pointed by the rootbp is usable. */
+      void *osp = NULL;
+      grub_size_t ospsize;
+      err = zio_read (&uber->ub_rootbp, endian, &osp, &ospsize, data);
+      grub_free (osp);
+
+      if (!err && ospsize < OBJSET_PHYS_SIZE_V14)
+	return grub_error (GRUB_ERR_BAD_FS, "uberblock rootbp points to invalid data");
+    }
 
   return err;
 }
@@ -372,32 +413,40 @@
  *    Success - Pointer to the best uberblock.
  *    Failure - NULL
  */
-static uberblock_phys_t *
-find_bestub (uberblock_phys_t * ub_array, grub_disk_addr_t sector)
+static uberblock_t *
+find_bestub (char *ub_array, struct grub_zfs_data *data)
 {
-  uberblock_phys_t *ubbest = NULL;
-  int i;
-  grub_disk_addr_t offset;
+  const grub_disk_addr_t sector = data->vdev_phys_sector;
+  uberblock_t *ubbest = NULL;
+  uberblock_t *ubnext;
+  unsigned int i, offset, pickedub = 0;
   grub_err_t err = GRUB_ERR_NONE;
 
-  for (i = 0; i < (VDEV_UBERBLOCK_RING >> VDEV_UBERBLOCK_SHIFT); i++)
+  const unsigned int UBCOUNT = UBERBLOCK_COUNT (data->vdev_ashift);
+  const grub_uint64_t UBBYTES = UBERBLOCK_SIZE (data->vdev_ashift);
+
+  for (i = 0; i < UBCOUNT; i++)
     {
-      offset = (sector << SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE
-	+ (i << VDEV_UBERBLOCK_SHIFT);
+      ubnext = (uberblock_t *) (i * UBBYTES + ub_array);
+      offset = (sector << SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE + (i * UBBYTES);
 
-      err = uberblock_verify (&ub_array[i], offset);
+      err = uberblock_verify (ubnext, offset, data);
       if (err)
 	{
 	  grub_errno = GRUB_ERR_NONE;
 	  continue;
 	}
-      if (ubbest == NULL 
-	  || vdev_uberblock_compare (&(ub_array[i].ubp_uberblock),
-				     &(ubbest->ubp_uberblock)) > 0)
-	ubbest = &ub_array[i];
+      if (ubbest == NULL || vdev_uberblock_compare (ubnext, ubbest) > 0)
+	{
+	  ubbest = ubnext;
+	  pickedub = i;
+	}
     }
   if (!ubbest)
     grub_errno = err;
+  else
+    grub_dprintf ("zfs", "Found best uberblock at idx %d, txg %llu\n",
+		  pickedub, (unsigned long long) ubbest->ub_txg);
 
   return (ubbest);
 }
@@ -515,8 +564,18 @@
 	  sector = DVA_OFFSET_TO_PHYS_SECTOR (offset);
 	  err = grub_disk_read (data->disk, sector, 0, psize, buf); 
 	}
+
       if (!err)
-	return GRUB_ERR_NONE;
+	{
+	  /* Check the underlying checksum before we rule this DVA as "good" */
+	  grub_uint32_t checkalgo = (grub_zfs_to_cpu64 ((bp)->blk_prop, endian) >> 40) & 0xff;
+	  err = zio_checksum_verify (bp->blk_cksum, checkalgo, endian, buf, psize);
+	  if (!err)
+	    return GRUB_ERR_NONE;
+	}
+
+      /* If read failed or checksum bad, reset the error.  Hopefully we've got
+       * some more DVA's to try. */
       grub_errno = GRUB_ERR_NONE;
     }
 
@@ -539,12 +598,9 @@
   unsigned int comp;
   char *compbuf = NULL;
   grub_err_t err;
-  zio_cksum_t zc = bp->blk_cksum;
-  grub_uint32_t checksum;
 
   *buf = NULL;
 
-  checksum = (grub_zfs_to_cpu64((bp)->blk_prop, endian) >> 40) & 0xff;
   comp = (grub_zfs_to_cpu64((bp)->blk_prop, endian)>>32) & 0xff;
   lsize = (BP_IS_HOLE(bp) ? 0 :
 	   (((grub_zfs_to_cpu64 ((bp)->blk_prop, endian) & 0xffff) + 1)
@@ -580,15 +636,6 @@
       return err;
     }
 
-  err = zio_checksum_verify (zc, checksum, endian, compbuf, psize);
-  if (err)
-    {
-      grub_dprintf ("zfs", "incorrect checksum\n");
-      grub_free (compbuf);
-      *buf = NULL;
-      return err;
-    }
-
   if (comp != ZIO_COMPRESS_OFF)
     {
       *buf = grub_malloc (lsize);
@@ -1864,11 +1911,9 @@
 static grub_err_t
 check_pool_label (struct grub_zfs_data *data)
 {
-  grub_uint64_t pool_state, txg = 0;
-  char *nvlist;
-#if 0
-  char *nv;
-#endif
+  grub_uint64_t pool_state;
+  char *nvlist;			/* for the pool */
+  char *vdevnvlist;		/* for the vdev */
   grub_uint64_t diskguid;
   grub_uint64_t version;
   int found;
@@ -1898,7 +1943,9 @@
     }
   grub_dprintf ("zfs", "check 4 passed\n");
 
-  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_TXG, &txg);
+  data->label_txg = 0;
+  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_TXG,
+					 &data->label_txg);
   if (!found)
     {
       grub_free (nvlist);
@@ -1909,7 +1956,7 @@
   grub_dprintf ("zfs", "check 6 passed\n");
 
   /* not an active device */
-  if (txg == 0)
+  if (data->label_txg == 0)
     {
       grub_free (nvlist);
       return grub_error (GRUB_ERR_BAD_FS, "zpool isn't active");
@@ -1931,20 +1978,32 @@
     {
       grub_free (nvlist);
       return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET,
-			 "too new version %llu > %llu",
+			 "SPA version too new %llu > %llu",
 			 (unsigned long long) version,
 			 (unsigned long long) SPA_VERSION);
     }
   grub_dprintf ("zfs", "check 9 passed\n");
-#if 0
-  if (nvlist_lookup_value (nvlist, ZPOOL_CONFIG_VDEV_TREE, &nv,
-			   DATA_TYPE_NVLIST, NULL))
+
+  vdevnvlist = grub_zfs_nvlist_lookup_nvlist (nvlist, ZPOOL_CONFIG_VDEV_TREE);
+  if (!vdevnvlist)
     {
-      grub_free (vdev);
-      return (GRUB_ERR_BAD_FS);
+      grub_free (nvlist);
+      if (!grub_errno)
+	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_VDEV_TREE " not found");
+      return grub_errno;
     }
   grub_dprintf ("zfs", "check 10 passed\n");
-#endif
+
+  found = grub_zfs_nvlist_lookup_uint64 (vdevnvlist, ZPOOL_CONFIG_ASHIFT,
+					 &data->vdev_ashift);
+  grub_free (vdevnvlist);
+  if (!found)
+    {
+      grub_free (nvlist);
+      if (!grub_errno)
+	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_ASHIFT " not found");
+      return grub_errno;
+    }
 
   found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_GUID, &diskguid);
   if (! found)
@@ -1956,11 +2015,42 @@
     }
   grub_dprintf ("zfs", "check 11 passed\n");
 
+  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_GUID, &data->pool_guid);
+  if (! found)
+    {
+      grub_free (nvlist);
+      if (! grub_errno)
+        grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_GUID " not found");
+      return grub_errno;
+    }
+   grub_dprintf ("zfs", "check 12 passed\n");
+
+
   grub_free (nvlist);
 
+  grub_dprintf ("zfs", "ZFS Pool GUID: %llu (%016llx) Label: GUID: %llu (%016llx), txg: %llu, SPA v%llu, ashift: %llu\n",
+    (unsigned long long) data->pool_guid,
+    (unsigned long long) data->pool_guid,
+    (unsigned long long) diskguid,
+    (unsigned long long) diskguid,
+    (unsigned long long) data->label_txg,
+    (unsigned long long) version,
+    (unsigned long long) data->vdev_ashift);
+
   return GRUB_ERR_NONE;
 }
 
+/*
+ * vdev_label_start returns the physical disk offset (in bytes) of
+ * label "l".
+ */
+static grub_uint64_t vdev_label_start (grub_uint64_t psize, int l)
+{
+  return (l * sizeof (vdev_label_t) + (l < VDEV_LABELS / 2 ?
+				       0 : psize -
+				       VDEV_LABELS * sizeof (vdev_label_t)));
+}
+
 static void
 zfs_unmount (struct grub_zfs_data *data)
 {
@@ -1979,13 +2069,13 @@
 zfs_mount (grub_device_t dev)
 {
   struct grub_zfs_data *data = 0;
-  int label = 0;
-  uberblock_phys_t *ub_array, *ubbest = NULL;
-  vdev_boot_header_t *bh;
+  int label = 0, bestlabel = -1;
+  char *ub_array;
+  uberblock_t *ubbest;
+  uberblock_t *ubcur = NULL;
   void *osp = 0;
   grub_size_t ospsize;
   grub_err_t err;
-  int vdevnum;
 
   if (! dev->disk)
     {
@@ -1997,11 +2087,6 @@
   if (!data)
     return 0;
   grub_memset (data, 0, sizeof (*data));
-#if 0
-  /* if it's our first time here, zero the best uberblock out */
-  if (data->best_drive == 0 && data->best_part == 0 && find_best_root)
-    grub_memset (&current_uberblock, 0, sizeof (uberblock_t));
-#endif
 
   data->disk = dev->disk;
 
@@ -2012,100 +2097,129 @@
       return 0;
     }
 
-  bh = grub_malloc (VDEV_BOOT_HEADER_SIZE);
-  if (!bh)
+  ubbest = grub_malloc (sizeof (*ubbest));
+  if (!ubbest)
     {
       zfs_unmount (data);
-      grub_free (ub_array);
       return 0;
     }
-
-  vdevnum = VDEV_LABELS;
-
-  /* Don't check back labels on CDROM.  */
-  if (grub_disk_get_size (dev->disk) == GRUB_DISK_SIZE_UNKNOWN)
-    vdevnum = VDEV_LABELS / 2;
-
-  for (label = 0; ubbest == NULL && label < vdevnum; label++)
+  grub_memset (ubbest, 0, sizeof (*ubbest));
+
+  /* Establish some constants for where things are on the device: */
+
+  /*
+   *  Some eltorito stacks don't give us a size and we end up setting the
+   * size to MAXUINT, further some of these devices stop working once a single
+   * read past the end has been issued. Checking for a maximum part_length and
+   * skipping the backup labels at the end of the slice/partition/device
+   * avoids breaking down on such devices.
+   */
+  const int vdevnum =
+    grub_disk_get_size (dev->disk) == GRUB_DISK_SIZE_UNKNOWN ?
+    VDEV_LABELS / 2 : VDEV_LABELS;
+
+  /* Size in bytes of the device (disk or partition) aligned to label size */
+  grub_uint64_t device_size =
+	grub_disk_get_size (dev->disk) << GRUB_DISK_SECTOR_BITS;
+  const grub_uint64_t alignedbytes =
+    P2ALIGN (device_size, (grub_uint64_t) sizeof (vdev_label_t));
+
+  for (label = 0; label < vdevnum; label++)
     {
-      grub_zfs_endian_t ub_endian = UNKNOWN_ENDIAN;
-      grub_dprintf ("zfs", "label %d\n", label);
-
-      data->vdev_phys_sector
-	= label * (sizeof (vdev_label_t) >> SPA_MINBLOCKSHIFT)
-	+ ((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE) >> SPA_MINBLOCKSHIFT)
-	+ (label < VDEV_LABELS / 2 ? 0 : grub_disk_get_size (dev->disk)
-	   - VDEV_LABELS * (sizeof (vdev_label_t) >> SPA_MINBLOCKSHIFT));
+      grub_uint64_t labelstartbytes = vdev_label_start (alignedbytes, label);
+      grub_uint64_t labelstart = labelstartbytes >> GRUB_DISK_SECTOR_BITS;
+
+      grub_dprintf ("zfs", "reading label %d at sector %llu (byte %llu)\n",
+		    label, (unsigned long long) labelstart,
+		    (unsigned long long) labelstartbytes);
+
+      data->vdev_phys_sector = labelstart +
+		((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE) >> GRUB_DISK_SECTOR_BITS);
+
+      err = check_pool_label (data);
+      if (err)
+	{
+	  grub_dprintf ("zfs", "error checking label %d\n", label);
+	  grub_errno = GRUB_ERR_NONE;
+	  continue;
+	}
 
       /* Read in the uberblock ring (128K). */
-      err = grub_disk_read (data->disk, data->vdev_phys_sector
-			    + (VDEV_PHYS_SIZE >> SPA_MINBLOCKSHIFT),
-			    0, VDEV_UBERBLOCK_RING, (char *) ub_array);
-      if (err)
-	{
-	  grub_errno = GRUB_ERR_NONE;
-	  continue;
-	}
-      grub_dprintf ("zfs", "label ok %d\n", label);
-
-      ubbest = find_bestub (ub_array, data->vdev_phys_sector);
-      if (!ubbest)
-	{
-	  grub_dprintf ("zfs", "No uberblock found\n");
-	  grub_errno = GRUB_ERR_NONE;
-	  continue;
-	}
-      ub_endian = (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.ub_magic, 
-				     LITTLE_ENDIAN) == UBERBLOCK_MAGIC 
-		   ? LITTLE_ENDIAN : BIG_ENDIAN);
-      err = zio_read (&ubbest->ubp_uberblock.ub_rootbp, 
-		      ub_endian,
-		      &osp, &ospsize, data);
-      if (err)
-	{
-	  grub_dprintf ("zfs", "couldn't zio_read\n"); 
-	  grub_errno = GRUB_ERR_NONE;
-	  continue;
-	}
-
-      if (ospsize < OBJSET_PHYS_SIZE_V14)
-	{
-	  grub_dprintf ("zfs", "osp too small\n"); 
-	  grub_free (osp);
-	  continue;
-	}
-      grub_dprintf ("zfs", "ubbest %p\n", ubbest);
-
-      err = check_pool_label (data);
-      if (err)
-	{
-	  grub_errno = GRUB_ERR_NONE;
-	  continue;
-	}
-#if 0
-      if (find_best_root &&
-	  vdev_uberblock_compare (&ubbest->ubp_uberblock,
-				  &(current_uberblock)) <= 0)
-	continue;
-#endif
-      /* Got the MOS. Save it at the memory addr MOS. */
-      grub_memmove (&(data->mos.dn), &((objset_phys_t *) osp)->os_meta_dnode,
-		    DNODE_SIZE);
-      data->mos.endian = (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.ub_rootbp.blk_prop, ub_endian) >> 63) & 1;
-      grub_memmove (&(data->current_uberblock),
-		    &ubbest->ubp_uberblock, sizeof (uberblock_t));
-      grub_free (ub_array);
-      grub_free (bh);
-      grub_free (osp);
-      return data;  
+      err = grub_disk_read (data->disk,
+			    data->vdev_phys_sector +
+			    (VDEV_PHYS_SIZE >> GRUB_DISK_SECTOR_BITS),
+			    0, VDEV_UBERBLOCK_RING, ub_array);
+      if (err)
+	{
+	  grub_dprintf ("zfs", "error reading uberblock ring for label %d\n", label);
+	  grub_errno = GRUB_ERR_NONE;
+	  continue;
+	}
+
+      ubcur = find_bestub (ub_array, data);
+      if (!ubcur)
+	{
+	  grub_dprintf ("zfs", "No good uberblocks found in label %d\n", label);
+	  grub_errno = GRUB_ERR_NONE;
+	  continue;
+	}
+
+      if (vdev_uberblock_compare (ubcur, ubbest) > 0)
+	{
+	  /* Looks like the block is good, so use it. */
+	  grub_memcpy (ubbest, ubcur, sizeof (*ubbest));
+	  bestlabel = label;
+	  grub_dprintf ("zfs", "Current best uberblock found in label %d\n", label);
+	}
     }
-  grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid label");
-  zfs_unmount (data);
   grub_free (ub_array);
-  grub_free (bh);
+
+  /*
+   * We zero'd the structure to begin with.  If we never assigned to it,
+   * magic will still be zero.
+   */
+  if (!ubbest->ub_magic)
+    {
+      grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid ZFS label");
+      zfs_unmount (data);
+      grub_free (ubbest);
+      return 0;
+    }
+
+  grub_dprintf ("zfs", "ubbest %p in label %d\n", ubbest, bestlabel);
+
+  grub_zfs_endian_t ub_endian =
+    grub_zfs_to_cpu64 (ubbest->ub_magic, LITTLE_ENDIAN) == UBERBLOCK_MAGIC
+    ? LITTLE_ENDIAN : BIG_ENDIAN;
+  err = zio_read (&ubbest->ub_rootbp, ub_endian, &osp, &ospsize, data);
+
+  if (err)
+    {
+      grub_error (GRUB_ERR_BAD_FS, "couldn't zio_read object directory");
+      zfs_unmount (data);
+      grub_free (ubbest);
+      return 0;
+    }
+
+  if (ospsize < OBJSET_PHYS_SIZE_V14)
+    {
+      grub_error (GRUB_ERR_BAD_FS, "osp too small");
+      zfs_unmount (data);
+      grub_free (osp);
+      grub_free (ubbest);
+      return 0;
+    }
+
+  /* Got the MOS. Save it at the memory addr MOS. */
+  grub_memmove (&(data->mos.dn), &((objset_phys_t *) osp)->os_meta_dnode, DNODE_SIZE);
+  data->mos.endian =
+		(grub_zfs_to_cpu64 (ubbest->ub_rootbp.blk_prop, ub_endian) >> 63) & 1;
+  grub_memmove (&(data->current_uberblock), ubbest, sizeof (uberblock_t));
+
   grub_free (osp);
+  grub_free (ubbest);
 
-  return 0;
+  return (data);
 }
 
 grub_err_t
@@ -2149,33 +2263,18 @@
 static grub_err_t 
 zfs_uuid (grub_device_t device, char **uuid)
 {
-  char *nvlist;
-  int found;
   struct grub_zfs_data *data;
-  grub_uint64_t guid;
-  grub_err_t err;
-
-  *uuid = 0;
 
   data = zfs_mount (device);
   if (! data)
     return grub_errno;
 
-  err = zfs_fetch_nvlist (data, &nvlist);
-  if (err)
-    {
-      zfs_unmount (data);
-      return err;
-    }
-
-  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_GUID, &guid);
-  if (! found)
-    return grub_errno;
-  grub_free (nvlist);
-  *uuid = grub_xasprintf ("%016llx", (long long unsigned) guid);
+  *uuid = grub_xasprintf ("%016llx", (long long unsigned) data->pool_guid);
   zfs_unmount (data);
+
   if (! *uuid)
     return grub_errno;
+
   return GRUB_ERR_NONE;
 }
 

=== modified file 'include/grub/zfs/uberblock_impl.h'
--- include/grub/zfs/uberblock_impl.h	2010-12-01 21:55:26 +0000
+++ include/grub/zfs/uberblock_impl.h	2012-01-19 11:28:09 +0000
@@ -23,6 +23,8 @@
 #ifndef _SYS_UBERBLOCK_IMPL_H
 #define	_SYS_UBERBLOCK_IMPL_H
 
+#define UBMAX(a,b) ((a) > (b) ? (a) : (b))
+
 /*
  * The uberblock version is incremented whenever an incompatible on-disk
  * format change is made to the SPA, DMU, or ZAP.
@@ -45,16 +47,10 @@
 	blkptr_t	ub_rootbp;	/* MOS objset_phys_t		*/
 } uberblock_t;
 
-#define	UBERBLOCK_SIZE		(1ULL << UBERBLOCK_SHIFT)
-#define	VDEV_UBERBLOCK_SHIFT	UBERBLOCK_SHIFT
-
-/* XXX Uberblock_phys_t is no longer in the kernel zfs */
-typedef struct uberblock_phys {
-	uberblock_t	ubp_uberblock;
-	char		ubp_pad[UBERBLOCK_SIZE - sizeof (uberblock_t) -
-				sizeof (zio_eck_t)];
-	zio_eck_t	ubp_zec;
-} uberblock_phys_t;
-
+#define	VDEV_UBERBLOCK_SHIFT(as)	UBMAX(as, UBERBLOCK_SHIFT)
+#define	UBERBLOCK_SIZE(as)		(1ULL << VDEV_UBERBLOCK_SHIFT(as))
+
+/* Number of uberblocks that can fit in the ring at a given ashift */
+#define UBERBLOCK_COUNT(as) (VDEV_UBERBLOCK_RING >> VDEV_UBERBLOCK_SHIFT(as))
 
 #endif	/* _SYS_UBERBLOCK_IMPL_H */



* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-19 11:36   ` Richard Laager
@ 2012-01-22 14:18     ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-01-22 20:31       ` Richard Laager
                         ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2012-01-22 14:18 UTC (permalink / raw)
  To: Richard Laager; +Cc: grub-devel, Zachary Bedell

[-- Attachment #1: Type: text/plain, Size: 18895 bytes --]

I've cleaned this patch up (attached). Can you test it on those 4K filesystems,
or tell me how to create one without actually having such a disk?
On 19.01.2012 12:36, Richard Laager wrote:
>   static grub_err_t
> -uberblock_verify (uberblock_phys_t * ub, int offset)
> +uberblock_verify (uberblock_t * uber, int offset, struct grub_zfs_data *data)
>   {
> -  uberblock_t *uber = &ub->ubp_uberblock;
>     grub_err_t err;
>     grub_zfs_endian_t endian = UNKNOWN_ENDIAN;
>     zio_cksum_t zc;
> +  void *osp = NULL;
> +  grub_size_t ospsize;
> +
> +  if (uber->ub_txg < data->label_txg)
> +    return grub_error (GRUB_ERR_BAD_FS,
> +	"ignoring partially written label: uber_txg < label_txg");
>   
I've left this part out, since the outer loop already looks for the newest txg.
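
To illustrate the outer-loop selection referred to here, a minimal sketch, not the patch's code. It assumes the comparison orders uberblocks by txg first and by timestamp second; the struct and the `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two uberblock fields that matter here. */
struct sketch_ub
{
  uint64_t txg;
  uint64_t timestamp;
};

/* Assumed ordering: higher txg wins, timestamp breaks ties. */
static int
sketch_ub_compare (const struct sketch_ub *a, const struct sketch_ub *b)
{
  if (a->txg != b->txg)
    return a->txg < b->txg ? -1 : 1;
  if (a->timestamp != b->timestamp)
    return a->timestamp < b->timestamp ? -1 : 1;
  return 0;
}

/*
 * candidates[i] is the best uberblock found in label i, or NULL if
 * that label was unreadable.  Returns the index of the label holding
 * the newest uberblock, or -1 if no label had one.
 */
static int
sketch_pick_label (const struct sketch_ub *const *candidates, size_t n)
{
  int best = -1;
  size_t i;

  assert (candidates != NULL || n == 0);
  for (i = 0; i < n; i++)
    {
      if (!candidates[i])
	continue;
      if (best < 0 || sketch_ub_compare (candidates[i], candidates[best]) > 0)
	best = (int) i;
    }
  return best;
}
```

Every label is visited, so a stale uberblock in label 0 cannot hide a newer one in a later label.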
>     if (grub_zfs_to_cpu64 (uber->ub_magic, LITTLE_ENDIAN) == UBERBLOCK_MAGIC
>         && grub_zfs_to_cpu64 (uber->ub_version, LITTLE_ENDIAN) > 0
> @@ -361,7 +390,19 @@
>
>     zc.zc_word[0] = grub_cpu_to_zfs64 (offset, endian);
>     err = zio_checksum_verify (zc, ZIO_CHECKSUM_LABEL, endian,
> -			     (char *) ub, UBERBLOCK_SIZE);
> +			     (char *) uber, UBERBLOCK_SIZE(data->vdev_ashift));
> +
> +  if (!err)
> +    {
> +      /* Check that the data pointed by the rootbp is usable. */
> +      void *osp = NULL;
> +      grub_size_t ospsize;
> +      err = zio_read (&uber->ub_rootbp, endian, &osp, &ospsize, data);
> +      grub_free (osp);
> +
> +      if (!err && ospsize < OBJSET_PHYS_SIZE_V14)
> +	return grub_error (GRUB_ERR_BAD_FS, "uberblock rootbp points to invalid data");
> +    }
>   
Left this one out: why would it happen that the uberblock checksum is valid 
but the other data in it isn't?
>     return err;
>   }
> @@ -372,32 +413,40 @@
>    *    Success - Pointer to the best uberblock.
>    *    Failure - NULL
>    */
> -static uberblock_phys_t *
> -find_bestub (uberblock_phys_t * ub_array, grub_disk_addr_t sector)
> +static uberblock_t *
> +find_bestub (char *ub_array, struct grub_zfs_data *data)
>   {
> -  uberblock_phys_t *ubbest = NULL;
> -  int i;
> -  grub_disk_addr_t offset;
> +  const grub_disk_addr_t sector = data->vdev_phys_sector;
> +  uberblock_t *ubbest = NULL;
> +  uberblock_t *ubnext;
> +  unsigned int i, offset, pickedub = 0;
>     grub_err_t err = GRUB_ERR_NONE;
>
> -  for (i = 0; i<  (VDEV_UBERBLOCK_RING>>  VDEV_UBERBLOCK_SHIFT); i++)
> +  const unsigned int UBCOUNT = UBERBLOCK_COUNT (data->vdev_ashift);
> +  const grub_uint64_t UBBYTES = UBERBLOCK_SIZE (data->vdev_ashift);
> +
> +  for (i = 0; i<  UBCOUNT; i++)
>       {
> -      offset = (sector<<  SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE
> -	+ (i<<  VDEV_UBERBLOCK_SHIFT);
> +      ubnext = (uberblock_t *) (i * UBBYTES + ub_array);
> +      offset = (sector<<  SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE + (i * UBBYTES);
>   
This is wrong with respect to alignment invariants.
> -      err = uberblock_verify (&ub_array[i], offset);
> +      err = uberblock_verify (ubnext, offset, data);
>         if (err)
>   	{
>   	  grub_errno = GRUB_ERR_NONE;
>   	  continue;
>   	}
> -      if (ubbest == NULL
> -	  || vdev_uberblock_compare (&(ub_array[i].ubp_uberblock),
> -				&(ubbest->ubp_uberblock))>  0)
> -	ubbest =&ub_array[i];
> +      if (ubbest == NULL || vdev_uberblock_compare (ubnext, ubbest)>  0)
> +	{
> +	  ubbest = ubnext;
> +	  pickedub = i;
> +	}
Skipped, it sets the variable but never uses it.
>       }
>     if (!ubbest)
>       grub_errno = err;
> +  else
> +    grub_dprintf ("zfs", "Found best uberblock at idx %d, txg %llu\n",
> +		  pickedub, (unsigned long long) ubbest->ub_txg);
>   
Skipped, not needed.
>     return (ubbest);
>   }
> @@ -515,8 +564,18 @@
>   	  sector = DVA_OFFSET_TO_PHYS_SECTOR (offset);
>   	  err = grub_disk_read (data->disk, sector, 0, psize, buf);
>   	}
> +
>         if (!err)
> -	return GRUB_ERR_NONE;
> +	{
> +	  /* Check the underlying checksum before we rule this DVA as "good" */
> +	  grub_uint32_t checkalgo = (grub_zfs_to_cpu64 ((bp)->blk_prop, endian)>>  40)&  0xff;
> +	  err = zio_checksum_verify (bp->blk_cksum, checkalgo, endian, buf, psize);
> +	  if (!err)
> +	    return GRUB_ERR_NONE;
> +	}
> +
> +      /* If read failed or checksum bad, reset the error.  Hopefully we've got
> +       * some more DVA's to try. */
Skipped, already commented on this one.
>         grub_errno = GRUB_ERR_NONE;
>       }
>
> @@ -539,12 +598,9 @@
>     unsigned int comp;
>     char *compbuf = NULL;
>     grub_err_t err;
> -  zio_cksum_t zc = bp->blk_cksum;
> -  grub_uint32_t checksum;
>
>     *buf = NULL;
>
> -  checksum = (grub_zfs_to_cpu64((bp)->blk_prop, endian)>>  40)&  0xff;
>     comp = (grub_zfs_to_cpu64((bp)->blk_prop, endian)>>32)&  0xff;
>     lsize = (BP_IS_HOLE(bp) ? 0 :
>   	   (((grub_zfs_to_cpu64 ((bp)->blk_prop, endian)&  0xffff) + 1)
> @@ -580,15 +636,6 @@
>         return err;
>       }
>
> -  err = zio_checksum_verify (zc, checksum, endian, compbuf, psize);
> -  if (err)
> -    {
> -      grub_dprintf ("zfs", "incorrect checksum\n");
> -      grub_free (compbuf);
> -      *buf = NULL;
> -      return err;
> -    }
> -
>     if (comp != ZIO_COMPRESS_OFF)
>       {
>         *buf = grub_malloc (lsize);
> @@ -1864,11 +1911,9 @@
>   static grub_err_t
>   check_pool_label (struct grub_zfs_data *data)
>   {
> -  grub_uint64_t pool_state, txg = 0;
> -  char *nvlist;
> -#if 0
> -  char *nv;
> -#endif
> +  grub_uint64_t pool_state;
> +  char *nvlist;			/* for the pool */
> +  char *vdevnvlist;		/* for the vdev */
>     grub_uint64_t diskguid;
>     grub_uint64_t version;
>     int found;
> @@ -1898,7 +1943,9 @@
>       }
>     grub_dprintf ("zfs", "check 4 passed\n");
>
> -  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_TXG,&txg);
> +  data->label_txg = 0;
> +  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_TXG,
> +					&data->label_txg);
Not needed. Skipped.
>     if (!found)
>       {
>         grub_free (nvlist);
> @@ -1909,7 +1956,7 @@
>     grub_dprintf ("zfs", "check 6 passed\n");
>
>     /* not an active device */
> -  if (txg == 0)
> +  if (data->label_txg == 0)
>       {
>         grub_free (nvlist);
>         return grub_error (GRUB_ERR_BAD_FS, "zpool isn't active");
> @@ -1931,20 +1978,32 @@
>       {
>         grub_free (nvlist);
>         return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET,
> -			 "too new version %llu>  %llu",
> +			 "SPA version too new %llu>  %llu",
>   			 (unsigned long long) version,
>   			 (unsigned long long) SPA_VERSION);
>       }
>     grub_dprintf ("zfs", "check 9 passed\n");
> -#if 0
> -  if (nvlist_lookup_value (nvlist, ZPOOL_CONFIG_VDEV_TREE,&nv,
> -			   DATA_TYPE_NVLIST, NULL))
> +
> +  vdevnvlist = grub_zfs_nvlist_lookup_nvlist (nvlist, ZPOOL_CONFIG_VDEV_TREE);
> +  if (!vdevnvlist)
>       {
> -      grub_free (vdev);
> -      return (GRUB_ERR_BAD_FS);
> +      grub_free (nvlist);
> +      if (!grub_errno)
> +	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_VDEV_TREE " not found");
> +      return grub_errno;
>       }
>     grub_dprintf ("zfs", "check 10 passed\n");
> -#endif
> +
> +  found = grub_zfs_nvlist_lookup_uint64 (vdevnvlist, ZPOOL_CONFIG_ASHIFT,
> +					&data->vdev_ashift);
> +  grub_free (vdevnvlist);
> +  if (!found)
> +    {
> +      grub_free (nvlist);
> +      if (!grub_errno)
> +	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_ASHIFT " not found");
> +      return grub_errno;
> +    }
>
>     found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_GUID,&diskguid);
>     if (! found)
> @@ -1956,11 +2015,42 @@
>       }
>     grub_dprintf ("zfs", "check 11 passed\n");
>
> +  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_GUID,&data->pool_guid);
> +  if (! found)
> +    {
> +      grub_free (nvlist);
> +      if (! grub_errno)
> +        grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_GUID " not found");
> +      return grub_errno;
> +    }
> +   grub_dprintf ("zfs", "check 12 passed\n");
> +
> +
This check is already in. Skipped.
>     grub_free (nvlist);
>
> +  grub_dprintf ("zfs", "ZFS Pool GUID: %llu (%016llx) Label: GUID: %llu (%016llx), txg: %llu, SPA v%llu, ashift: %llu\n",
> +    (unsigned long long) data->pool_guid,
> +    (unsigned long long) data->pool_guid,
> +    (unsigned long long) diskguid,
> +    (unsigned long long) diskguid,
> +    (unsigned long long) data->label_txg,
> +    (unsigned long long) version,
> +    (unsigned long long) data->vdev_ashift);
> +
Useless. Skipped.
>     return GRUB_ERR_NONE;
>   }
>
> +/*
> + * vdev_label_start returns the physical disk offset (in bytes) of
> + * label "l".
> + */
> +static grub_uint64_t vdev_label_start (grub_uint64_t psize, int l)
> +{
> +  return (l * sizeof (vdev_label_t) + (l<  VDEV_LABELS / 2 ?
> +				       0 : psize -
> +				       VDEV_LABELS * sizeof (vdev_label_t)));
> +}
> +
>   static void
>   zfs_unmount (struct grub_zfs_data *data)
>   {
> @@ -1979,13 +2069,13 @@
>   zfs_mount (grub_device_t dev)
>   {
>     struct grub_zfs_data *data = 0;
> -  int label = 0;
> -  uberblock_phys_t *ub_array, *ubbest = NULL;
> -  vdev_boot_header_t *bh;
> +  int label = 0, bestlabel = -1;
> +  char *ub_array;
> +  uberblock_t *ubbest;
> +  uberblock_t *ubcur = NULL;
>     void *osp = 0;
>     grub_size_t ospsize;
>     grub_err_t err;
> -  int vdevnum;
>
>     if (! dev->disk)
>       {
> @@ -1997,11 +2087,6 @@
>     if (!data)
>       return 0;
>     grub_memset (data, 0, sizeof (*data));
> -#if 0
> -  /* if it's our first time here, zero the best uberblock out */
> -  if (data->best_drive == 0&&  data->best_part == 0&&  find_best_root)
> -    grub_memset (&current_uberblock, 0, sizeof (uberblock_t));
> -#endif
>
>     data->disk = dev->disk;
>
> @@ -2012,100 +2097,129 @@
>         return 0;
>       }
>
> -  bh = grub_malloc (VDEV_BOOT_HEADER_SIZE);
> -  if (!bh)
> +  ubbest = grub_malloc (sizeof (*ubbest));
> +  if (!ubbest)
>       {
>         zfs_unmount (data);
> -      grub_free (ub_array);
>         return 0;
>       }
> -
> -  vdevnum = VDEV_LABELS;
> -
> -  /* Don't check back labels on CDROM.  */
> -  if (grub_disk_get_size (dev->disk) == GRUB_DISK_SIZE_UNKNOWN)
> -    vdevnum = VDEV_LABELS / 2;
> -
> -  for (label = 0; ubbest == NULL&&  label<  vdevnum; label++)
> +  grub_memset (ubbest, 0, sizeof (*ubbest));
> +
> +  /* Establish some constants for where things are on the device: */
> +
> +  /*
> +   *  Some eltorito stacks don't give us a size and we end up setting the
> +   * size to MAXUINT, further some of these devices stop working once a single
> +   * read past the end has been issued. Checking for a maximum part_length and
> +   * skipping the backup labels at the end of the slice/partition/device
> +   * avoids breaking down on such devices.
> +   */
> +  const int vdevnum =
> +    grub_disk_get_size (dev->disk) == GRUB_DISK_SIZE_UNKNOWN ?
> +    VDEV_LABELS / 2 : VDEV_LABELS;
> +
> +  /* Size in bytes of the device (disk or partition) aligned to label size */
> +  grub_uint64_t device_size =
> +	grub_disk_get_size (dev->disk)<<  GRUB_DISK_SECTOR_BITS;
> +  const grub_uint64_t alignedbytes =
> +    P2ALIGN (device_size, (grub_uint64_t) sizeof (vdev_label_t));
> +
> +  for (label = 0; label<  vdevnum; label++)
>       {
> -      grub_zfs_endian_t ub_endian = UNKNOWN_ENDIAN;
> -      grub_dprintf ("zfs", "label %d\n", label);
> -
> -      data->vdev_phys_sector
> -	= label * (sizeof (vdev_label_t)>>  SPA_MINBLOCKSHIFT)
> -	+ ((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE)>>  SPA_MINBLOCKSHIFT)
> -	+ (label<  VDEV_LABELS / 2 ? 0 : grub_disk_get_size (dev->disk)
> -	   - VDEV_LABELS * (sizeof (vdev_label_t)>>  SPA_MINBLOCKSHIFT));
> +      grub_uint64_t labelstartbytes = vdev_label_start (alignedbytes, label);
> +      grub_uint64_t labelstart = labelstartbytes>>  GRUB_DISK_SECTOR_BITS;
> +
> +      grub_dprintf ("zfs", "reading label %d at sector %llu (byte %llu)\n",
> +		    label, (unsigned long long) labelstart,
> +		    (unsigned long long) labelstartbytes);
> +
> +      data->vdev_phys_sector = labelstart +
> +		((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE)>>  GRUB_DISK_SECTOR_BITS);
> +
> +      err = check_pool_label (data);
> +      if (err)
> +	{
> +	  grub_dprintf ("zfs", "error checking label %d\n", label);
> +	  grub_errno = GRUB_ERR_NONE;
> +	  continue;
> +	}
>   
This seems like a lot of moving around but I don't see any real change 
other than aligning the size down.
>         /* Read in the uberblock ring (128K). */
> -      err = grub_disk_read (data->disk, data->vdev_phys_sector
> -			    + (VDEV_PHYS_SIZE>>  SPA_MINBLOCKSHIFT),
> -			    0, VDEV_UBERBLOCK_RING, (char *) ub_array);
> -      if (err)
> -	{
> -	  grub_errno = GRUB_ERR_NONE;
> -	  continue;
> -	}
> -      grub_dprintf ("zfs", "label ok %d\n", label);
> -
> -      ubbest = find_bestub (ub_array, data->vdev_phys_sector);
> -      if (!ubbest)
> -	{
> -	  grub_dprintf ("zfs", "No uberblock found\n");
> -	  grub_errno = GRUB_ERR_NONE;
> -	  continue;
> -	}
> -      ub_endian = (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.ub_magic,
> -				     LITTLE_ENDIAN) == UBERBLOCK_MAGIC
> -		   ? LITTLE_ENDIAN : BIG_ENDIAN);
> -      err = zio_read (&ubbest->ubp_uberblock.ub_rootbp,
> -		      ub_endian,
> -		&osp,&ospsize, data);
> -      if (err)
> -	{
> -	  grub_dprintf ("zfs", "couldn't zio_read\n");
> -	  grub_errno = GRUB_ERR_NONE;
> -	  continue;
> -	}
> -
> -      if (ospsize<  OBJSET_PHYS_SIZE_V14)
> -	{
> -	  grub_dprintf ("zfs", "osp too small\n");
> -	  grub_free (osp);
> -	  continue;
> -	}
> -      grub_dprintf ("zfs", "ubbest %p\n", ubbest);
> -
> -      err = check_pool_label (data);
> -      if (err)
> -	{
> -	  grub_errno = GRUB_ERR_NONE;
> -	  continue;
> -	}
> -#if 0
> -      if (find_best_root&&
> -	  vdev_uberblock_compare (&ubbest->ubp_uberblock,
> -				&(current_uberblock))<= 0)
> -	continue;
> -#endif
> -      /* Got the MOS. Save it at the memory addr MOS. */
> -      grub_memmove (&(data->mos.dn),&((objset_phys_t *) osp)->os_meta_dnode,
> -		    DNODE_SIZE);
> -      data->mos.endian = (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.ub_rootbp.blk_prop, ub_endian)>>  63)&  1;
> -      grub_memmove (&(data->current_uberblock),
> -		&ubbest->ubp_uberblock, sizeof (uberblock_t));
> -      grub_free (ub_array);
> -      grub_free (bh);
> -      grub_free (osp);
> -      return data;
> +      err = grub_disk_read (data->disk,
> +			    data->vdev_phys_sector +
> +			    (VDEV_PHYS_SIZE>>  GRUB_DISK_SECTOR_BITS),
> +			    0, VDEV_UBERBLOCK_RING, ub_array);
> +      if (err)
> +	{
> +	  grub_dprintf ("zfs", "error reading uberblock ring for label %d\n", label);
> +	  grub_errno = GRUB_ERR_NONE;
> +	  continue;
> +	}
> +
> +      ubcur = find_bestub (ub_array, data);
> +      if (!ubcur)
> +	{
> +	  grub_dprintf ("zfs", "No good uberblocks found in label %d\n", label);
> +	  grub_errno = GRUB_ERR_NONE;
> +	  continue;
> +	}
> +
> +      if (vdev_uberblock_compare (ubcur, ubbest)>  0)
> +	{
> +	  /* Looks like the block is good, so use it. */
> +	  grub_memcpy (ubbest, ubcur, sizeof (*ubbest));
> +	  bestlabel = label;
> +	  grub_dprintf ("zfs", "Current best uberblock found in label %d\n", label);
> +	}
>       }
> -  grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid label");
> -  zfs_unmount (data);
>     grub_free (ub_array);
> -  grub_free (bh);
> +
> +  /*
> +   * We zero'd the structure to begin with.  If we never assigned to it,
> +   * magic will still be zero.
> +   */
> +  if (!ubbest->ub_magic)
> +    {
> +      grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid ZFS label");
> +      zfs_unmount (data);
> +      grub_free (ubbest);
> +      return 0;
> +    }
> +
> +  grub_dprintf ("zfs", "ubbest %p in label %d\n", ubbest, bestlabel);
> +
> +  grub_zfs_endian_t ub_endian =
> +    grub_zfs_to_cpu64 (ubbest->ub_magic, LITTLE_ENDIAN) == UBERBLOCK_MAGIC
> +    ? LITTLE_ENDIAN : BIG_ENDIAN;
> +  err = zio_read (&ubbest->ub_rootbp, ub_endian,&osp,&ospsize, data);
> +
> +  if (err)
> +    {
> +      grub_error (GRUB_ERR_BAD_FS, "couldn't zio_read object directory");
> +      zfs_unmount (data);
> +      grub_free (ubbest);
> +      return 0;
> +    }
> +
> +  if (ospsize<  OBJSET_PHYS_SIZE_V14)
> +    {
> +      grub_error (GRUB_ERR_BAD_FS, "osp too small");
> +      zfs_unmount (data);
> +      grub_free (osp);
> +      grub_free (ubbest);
> +      return 0;
> +    }
> +
> +  /* Got the MOS. Save it at the memory addr MOS. */
> +  grub_memmove (&(data->mos.dn),&((objset_phys_t *) osp)->os_meta_dnode, DNODE_SIZE);
> +  data->mos.endian =
> +		(grub_zfs_to_cpu64 (ubbest->ub_rootbp.blk_prop, ub_endian)>>  63)&  1;
> +  grub_memmove (&(data->current_uberblock), ubbest, sizeof (uberblock_t));
> +
>     grub_free (osp);
> +  grub_free (ubbest);
>
> -  return 0;
> +  return (data);
>   }
>
>   grub_err_t
> @@ -2149,33 +2263,18 @@
>   static grub_err_t
>   zfs_uuid (grub_device_t device, char **uuid)
>   {
> -  char *nvlist;
> -  int found;
>     struct grub_zfs_data *data;
> -  grub_uint64_t guid;
> -  grub_err_t err;
> -
> -  *uuid = 0;
>
>     data = zfs_mount (device);
>     if (! data)
>       return grub_errno;
>
> -  err = zfs_fetch_nvlist (data,&nvlist);
> -  if (err)
> -    {
> -      zfs_unmount (data);
> -      return err;
> -    }
> -
> -  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_GUID,&guid);
> -  if (! found)
> -    return grub_errno;
> -  grub_free (nvlist);
> -  *uuid = grub_xasprintf ("%016llx", (long long unsigned) guid);
> +  *uuid = grub_xasprintf ("%016llx", (long long unsigned) data->pool_guid);
>     zfs_unmount (data);
> +
>     if (! *uuid)
>       return grub_errno;
> +
>     return GRUB_ERR_NONE;
>   }
>
>
> === modified file 'include/grub/zfs/uberblock_impl.h'
> --- include/grub/zfs/uberblock_impl.h	2010-12-01 21:55:26 +0000
> +++ include/grub/zfs/uberblock_impl.h	2012-01-19 11:28:09 +0000
> @@ -23,6 +23,8 @@
>   #ifndef _SYS_UBERBLOCK_IMPL_H
>   #define	_SYS_UBERBLOCK_IMPL_H
>
> +#define UBMAX(a,b) ((a)>  (b) ? (a) : (b))
> +
>   /*
>    * The uberblock version is incremented whenever an incompatible on-disk
>    * format change is made to the SPA, DMU, or ZAP.
> @@ -45,16 +47,10 @@
>   	blkptr_t	ub_rootbp;	/* MOS objset_phys_t		*/
>   } uberblock_t;
>
> -#define	UBERBLOCK_SIZE		(1ULL<<  UBERBLOCK_SHIFT)
> -#define	VDEV_UBERBLOCK_SHIFT	UBERBLOCK_SHIFT
> -
> -/* XXX Uberblock_phys_t is no longer in the kernel zfs */
> -typedef struct uberblock_phys {
> -	uberblock_t	ubp_uberblock;
> -	char		ubp_pad[UBERBLOCK_SIZE - sizeof (uberblock_t) -
> -				sizeof (zio_eck_t)];
> -	zio_eck_t	ubp_zec;
> -} uberblock_phys_t;
> -
> +#define	VDEV_UBERBLOCK_SHIFT(as)	UBMAX(as, UBERBLOCK_SHIFT)
> +#define	UBERBLOCK_SIZE(as)		(1ULL<<  VDEV_UBERBLOCK_SHIFT(as))
> +
> +/* Number of uberblocks that can fit in the ring at a given ashift */
> +#define UBERBLOCK_COUNT(as) (VDEV_UBERBLOCK_RING>>  VDEV_UBERBLOCK_SHIFT(as))
>
>   #endif	/* _SYS_UBERBLOCK_IMPL_H */
>


-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko


[-- Attachment #2: zfs.diff --]
[-- Type: text/x-diff, Size: 6079 bytes --]

=== modified file 'grub-core/fs/zfs/zfs.c'
--- grub-core/fs/zfs/zfs.c	2012-01-14 14:44:34 +0000
+++ grub-core/fs/zfs/zfs.c	2012-01-22 13:44:45 +0000
@@ -192,6 +192,7 @@
   enum { DEVICE_LEAF, DEVICE_MIRROR, DEVICE_RAIDZ } type;
   grub_uint64_t id;
   grub_uint64_t guid;
+  unsigned ashift;
 
   /* Valid only for non-leafs.  */
   unsigned n_children;
@@ -199,7 +200,6 @@
 
   /* Valid only for RAIDZ.  */
   unsigned nparity;
-  unsigned ashift;
 
   /* Valid only for leaf devices.  */
   grub_device_t dev;
@@ -466,12 +466,12 @@
  * Three pieces of information are needed to verify an uberblock: the magic
  * number, the version number, and the checksum.
  *
- * Currently Implemented: version number, magic number
- * Need to Implement: checksum
+ * Currently Implemented: version number, magic number, checksum
  *
  */
 static grub_err_t
-uberblock_verify (uberblock_phys_t * ub, grub_uint64_t offset)
+uberblock_verify (uberblock_phys_t * ub, grub_uint64_t offset,
+		  grub_size_t s)
 {
   uberblock_t *uber = &ub->ubp_uberblock;
   grub_err_t err;
@@ -498,7 +498,7 @@
 
   zc.zc_word[0] = grub_cpu_to_zfs64 (offset, endian);
   err = zio_checksum_verify (zc, ZIO_CHECKSUM_LABEL, endian,
-			     (char *) ub, UBERBLOCK_SIZE);
+			     (char *) ub, s);
 
   return err;
 }
@@ -510,28 +510,37 @@
  *    Failure - NULL
  */
 static uberblock_phys_t *
-find_bestub (uberblock_phys_t * ub_array, grub_disk_addr_t sector)
+find_bestub (uberblock_phys_t * ub_array,
+	     const struct grub_zfs_device_desc *desc)
 {
-  uberblock_phys_t *ubbest = NULL;
+  uberblock_phys_t *ubbest = NULL, *ubptr;
   int i;
   grub_disk_addr_t offset;
   grub_err_t err = GRUB_ERR_NONE;
-
-  for (i = 0; i < (VDEV_UBERBLOCK_RING >> VDEV_UBERBLOCK_SHIFT); i++)
+  int ub_shift;
+
+  ub_shift = desc->ashift;
+  if (ub_shift < VDEV_UBERBLOCK_SHIFT)
+    ub_shift = VDEV_UBERBLOCK_SHIFT;
+
+  for (i = 0; i < (VDEV_UBERBLOCK_RING >> ub_shift); i++)
     {
-      offset = (sector << SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE
-	+ (i << VDEV_UBERBLOCK_SHIFT);
+      offset = (desc->vdev_phys_sector << SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE
+	+ (i << ub_shift);
 
-      err = uberblock_verify (&ub_array[i], offset);
+      ubptr = (uberblock_phys_t *) ((grub_properly_aligned_t *) ub_array
+				    + ((i << ub_shift)
+				       / sizeof (grub_properly_aligned_t)));
+      err = uberblock_verify (ubptr, offset, 1 << ub_shift);
       if (err)
 	{
 	  grub_errno = GRUB_ERR_NONE;
 	  continue;
 	}
       if (ubbest == NULL 
-	  || vdev_uberblock_compare (&(ub_array[i].ubp_uberblock),
+	  || vdev_uberblock_compare (&(ubptr->ubp_uberblock),
 				     &(ubbest->ubp_uberblock)) > 0)
-	ubbest = &ub_array[i];
+	ubbest = ubptr;
     }
   if (!ubbest)
     grub_errno = err;
@@ -582,7 +591,8 @@
 		     const char *nvlist,
 		     struct grub_zfs_device_desc *fill,
 		     struct grub_zfs_device_desc *insert,
-		     int *inserted)
+		     int *inserted,
+		     unsigned ashift)
 {
   char *type;
 
@@ -603,6 +613,19 @@
       return grub_error (GRUB_ERR_BAD_FS, "couldn't find vdev id");
     }
 
+  {
+    grub_uint64_t par;
+    if (grub_zfs_nvlist_lookup_uint64 (nvlist, "ashift", &par))
+      fill->ashift = par;
+    else if (ashift != 0xffffffff)
+      fill->ashift = ashift;
+    else
+      {
+	grub_free (type);
+	return grub_error (GRUB_ERR_BAD_FS, "couldn't find ashift");
+      }
+  }
+
   if (grub_strcmp (type, VDEV_TYPE_DISK) == 0
       || grub_strcmp (type, VDEV_TYPE_FILE) == 0)
     {
@@ -616,6 +639,7 @@
 	  fill->original = insert->original;
 	  if (!data->device_original)
 	    data->device_original = fill;
+	  insert->ashift = fill->ashift;
 	  *inserted = 1;
 	}
 
@@ -641,12 +665,6 @@
 	      return grub_error (GRUB_ERR_BAD_FS, "couldn't find raidz parity");
 	    }
 	  fill->nparity = par;
-	  if (!grub_zfs_nvlist_lookup_uint64 (nvlist, "ashift", &par))
-	    {
-	      grub_free (type);
-	      return grub_error (GRUB_ERR_BAD_FS, "couldn't find raidz ashift");
-	    }
-	  fill->ashift = par;
 	}
 
       nelm = grub_zfs_nvlist_lookup_nvlist_array_get_nelm (nvlist,
@@ -675,7 +693,7 @@
 	    (nvlist, ZPOOL_CONFIG_CHILDREN, i);
 
 	  err = fill_vdev_info_real (data, child, &fill->children[i], insert,
-				     inserted);
+				     inserted, fill->ashift);
 
 	  grub_free (child);
 
@@ -710,7 +728,7 @@
   for (i = 0; i < data->n_devices_attached; i++)
     if (data->devices_attached[i].id == id)
       return fill_vdev_info_real (data, nvlist, &data->devices_attached[i],
-				  diskdesc, inserted);
+				  diskdesc, inserted, 0xffffffff);
 
   data->n_devices_attached++;
   if (data->n_devices_attached > data->n_devices_allocated)
@@ -733,7 +751,7 @@
 
   return fill_vdev_info_real (data, nvlist,
 			      &data->devices_attached[data->n_devices_attached - 1],
-			      diskdesc, inserted);
+			      diskdesc, inserted, 0xffffffff);
 }
 
 /*
@@ -908,7 +926,8 @@
       desc.vdev_phys_sector
 	= label * (sizeof (vdev_label_t) >> SPA_MINBLOCKSHIFT)
 	+ ((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE) >> SPA_MINBLOCKSHIFT)
-	+ (label < VDEV_LABELS / 2 ? 0 : grub_disk_get_size (dev->disk)
+	+ (label < VDEV_LABELS / 2 ? 0 : 
+	   ALIGN_DOWN (grub_disk_get_size (dev->disk), sizeof (vdev_label_t))
 	   - VDEV_LABELS * (sizeof (vdev_label_t) >> SPA_MINBLOCKSHIFT));
 
       /* Read in the uberblock ring (128K). */
@@ -922,7 +941,14 @@
 	}
       grub_dprintf ("zfs", "label ok %d\n", label);
 
-      ubbest = find_bestub (ub_array, desc.vdev_phys_sector);
+      err = check_pool_label (data, &desc, inserted);
+      if (err || !*inserted)
+	{
+	  grub_errno = GRUB_ERR_NONE;
+	  continue;
+	}
+
+      ubbest = find_bestub (ub_array, &desc);
       if (!ubbest)
 	{
 	  grub_dprintf ("zfs", "No uberblock found\n");
@@ -936,12 +962,6 @@
 	grub_memmove (&(data->current_uberblock),
 		      &ubbest->ubp_uberblock, sizeof (uberblock_t));
 
-      err = check_pool_label (data, &desc, inserted);
-      if (err)
-	{
-	  grub_errno = GRUB_ERR_NONE;
-	  continue;
-	}
 #if 0
       if (find_best_root &&
 	  vdev_uberblock_compare (&ubbest->ubp_uberblock,


* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-22 14:18     ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2012-01-22 20:31       ` Richard Laager
  2012-01-24  7:12       ` Richard Laager
  2012-01-27 19:04       ` Zachary Bedell
  2 siblings, 0 replies; 26+ messages in thread
From: Richard Laager @ 2012-01-22 20:31 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko
  Cc: grub-devel, Zachary Bedell

[-- Attachment #1: Type: text/plain, Size: 530 bytes --]

On Sun, 2012-01-22 at 15:18 +0100, Vladimir 'φ-coder/phcoder' Serbinenko
wrote:
> I've cleaned this patch up (attached). Can you test it on these 4K FSes 
> or tell me how to create one w/o really having such a disk?

I'll hopefully be able to do a test install today or tomorrow.

With ZFS on Linux, you can create one with:
 zpool create -o ashift=12 POOLNAME DEVICE

This page covers FreeBSD. Search for the "3. ZFS" header:
 http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html
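
For reference, here is roughly what ashift=12 changes in the uberblock ring, following the UBERBLOCK_SIZE / UBERBLOCK_COUNT macros from the patch quoted in this thread. This is only a sketch: the helper names are made up for illustration, and the constants (a 1 KiB minimum slot and a 128 KiB ring) are the ones mentioned in the thread, not copied from the GRUB headers.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as described in this thread (assumed, not taken from headers). */
#define UBERBLOCK_SHIFT      10            /* minimum slot size: 1 KiB */
#define VDEV_UBERBLOCK_RING  (128u << 10)  /* uberblock ring: 128 KiB per label */

/* Hypothetical helper: log2 of one uberblock slot for a given ashift. */
static unsigned ub_slot_shift(unsigned ashift)
{
  assert(ashift <= 17);  /* a slot cannot be larger than the 128 KiB ring */
  return ashift > UBERBLOCK_SHIFT ? ashift : UBERBLOCK_SHIFT;
}

/* Size in bytes of one slot: each uberblock is padded out to this size. */
static uint64_t ub_slot_size(unsigned ashift)
{
  return 1ULL << ub_slot_shift(ashift);
}

/* Number of slots the scan has to visit in one label's ring. */
static unsigned ub_slot_count(unsigned ashift)
{
  return VDEV_UBERBLOCK_RING >> ub_slot_shift(ashift);
}
```

With these definitions, ashift=9 gives 128 slots of 1 KiB, while ashift=12 gives 32 slots of 4 KiB. A scan that keeps stepping by 1 KiB on an ashift=12 pool lands inside the padding three times out of four.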

-- 
Richard

* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-22 14:18     ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-01-22 20:31       ` Richard Laager
@ 2012-01-24  7:12       ` Richard Laager
  2012-01-27 19:04       ` Zachary Bedell
  2 siblings, 0 replies; 26+ messages in thread
From: Richard Laager @ 2012-01-24  7:12 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko
  Cc: grub-devel, Zachary Bedell

[-- Attachment #1: Type: text/plain, Size: 87 bytes --]

I tested in a virtual machine on a pool with ashift=12 set. It works!

-- 
Richard

* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-22 14:18     ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-01-22 20:31       ` Richard Laager
  2012-01-24  7:12       ` Richard Laager
@ 2012-01-27 19:04       ` Zachary Bedell
  2012-01-27 22:22         ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-01-28  2:50         ` Richard Laager
  2 siblings, 2 replies; 26+ messages in thread
From: Zachary Bedell @ 2012-01-27 19:04 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko
  Cc: grub-devel, Richard Laager

Thanks for plodding through my mess.  It looks like the changes are committed to trunk at this point, right?  I'm trying to get a test together, but I've had to forward-port the other changes necessary to support building on Linux.  I think I have those changes done in such a way that they won't conflict with other platforms.  I'll try submitting those changes again once I'm sure everything is working.

I think I still have access to one of the pools that caused issues.  In any case, I have a pretty good memory of what the actual on-disk issues were, and I'm pretty sure I can take a hex editor to a disk image and break them appropriately.  Assuming I'm successful with that, I'll try to make minimal image files that can be posted somewhere for testing.

The pools in question were originally subject to an almost contrived level of abuse.  I was testing running swap on a ZVOL, and running into repeated kernel panics in the ZFSonLinux code when the pool was under heavy I/O.  Not a recipe for good data integrity...  In any case, the pools were always such that the ZFS driver could import them.  In the majority of cases, they silently self-healed; in a few (especially where the good label was at the end of the device), a `zpool scrub` was necessary.  My metric for this stuff has been: if the ZFS tools report errors, I'm not as worried about whether Grub can read the pool, but if ZFS says okay & Grub breaks, that worries me.
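
To make the "good label at the end of the device" case concrete, here is a sketch of where the four labels sit once the device size is aligned down to a label boundary, modelled on the vdev_label_start function in the quoted patch. The helper names are hypothetical, and the 256 KiB label size is an assumption standing in for sizeof (vdev_label_t).

```c
#include <assert.h>
#include <stdint.h>

#define VDEV_LABELS      4
#define VDEV_LABEL_SIZE  (256ULL << 10)  /* assumed value of sizeof (vdev_label_t) */

/* Round x down to a multiple of a. */
static uint64_t align_down(uint64_t x, uint64_t a)
{
  return x - (x % a);
}

/* Byte offset of label l (0..3) on a device of device_bytes bytes.
   Labels 0 and 1 sit at the front; labels 2 and 3 sit at the end of the
   label-aligned size, not at the raw end of the device. */
static uint64_t label_offset(uint64_t device_bytes, int l)
{
  uint64_t psize = align_down(device_bytes, VDEV_LABEL_SIZE);

  assert(l >= 0 && l < VDEV_LABELS);
  assert(psize >= VDEV_LABELS * VDEV_LABEL_SIZE);

  return (uint64_t) l * VDEV_LABEL_SIZE
         + (l < VDEV_LABELS / 2 ? 0 : psize - VDEV_LABELS * VDEV_LABEL_SIZE);
}
```

On a device whose size is not a multiple of 256 KiB, the two back labels end up before the position that device_end - 2 * label_size would give, which is why an unaligned computation misses them.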

A few comments below...

On Jan 22, 2012, at 9:18 AM, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> I've cleaned this patch up (attached). Can you test it on these 4K FSes or tell me how to create one w/o really having such a disk?
> On 19.01.2012 12:36, Richard Laager wrote:
>>  static grub_err_t
>> -uberblock_verify (uberblock_phys_t * ub, int offset)
>> +uberblock_verify (uberblock_t * uber, int offset, struct grub_zfs_data *data)
>>  {
>> -  uberblock_t *uber =&ub->ubp_uberblock;
>>    grub_err_t err;
>>    grub_zfs_endian_t endian = UNKNOWN_ENDIAN;
>>    zio_cksum_t zc;
>> +  void *osp = NULL;
>> +  grub_size_t ospsize;
>> +
>> +  if (uber->ub_txg<  data->label_txg)
>> +    return grub_error (GRUB_ERR_BAD_FS,
>> +	"ignoring partially written label: uber_txg<  label_txg");
>>  
> I've left this part out since the outer loop already looks for the newest txg.

Gotta double check this, but I think you're right.  Was a duplicate check.
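
As a reminder of what "looks for the newest txg" amounts to, here is a small sketch of the ordering: the higher txg wins, and the timestamp breaks ties. The struct is a reduced, hypothetical stand-in for uberblock_t, and ub_compare / ub_best are illustrative names, not the functions in zfs.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced, hypothetical view of an uberblock: only the ordering fields. */
struct ub
{
  uint64_t txg;
  uint64_t timestamp;
};

/* Negative, zero or positive as a is older than, equal to or newer than b. */
static int ub_compare(const struct ub *a, const struct ub *b)
{
  if (a->txg != b->txg)
    return a->txg < b->txg ? -1 : 1;
  if (a->timestamp != b->timestamp)
    return a->timestamp < b->timestamp ? -1 : 1;
  return 0;
}

/* Scan every candidate instead of stopping at the first valid one. */
static const struct ub *ub_best(const struct ub *cands, size_t n)
{
  const struct ub *best = NULL;

  assert(cands != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (best == NULL || ub_compare(&cands[i], best) > 0)
      best = &cands[i];
  return best;
}
```

Because the maximum is taken over all candidates, an older uberblock can never win while a newer valid one exists, so a separate "older than the label txg" rejection adds nothing once every label is scanned.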

>>    if (grub_zfs_to_cpu64 (uber->ub_magic, LITTLE_ENDIAN) == UBERBLOCK_MAGIC
>>        &&  grub_zfs_to_cpu64 (uber->ub_version, LITTLE_ENDIAN)>  0
>> @@ -361,7 +390,19 @@
>> 
>>    zc.zc_word[0] = grub_cpu_to_zfs64 (offset, endian);
>>    err = zio_checksum_verify (zc, ZIO_CHECKSUM_LABEL, endian,
>> -			     (char *) ub, UBERBLOCK_SIZE);
>> +			     (char *) uber, UBERBLOCK_SIZE(data->vdev_ashift));
>> +
>> +  if (!err)
>> +    {
>> +      /* Check that the data pointed by the rootbp is usable. */
>> +      void *osp = NULL;
>> +      grub_size_t ospsize;
>> +      err = zio_read (&uber->ub_rootbp, endian,&osp,&ospsize, data);
>> +      grub_free (osp);
>> +
>> +      if (!err&&  ospsize<  OBJSET_PHYS_SIZE_V14)
>> +	return grub_error (GRUB_ERR_BAD_FS, "uberblock rootbp points to invalid data");
>> +    }
>>  
> Left this one out: why would it happen that the uberblock checksum is valid but the other data in it isn't?

This one I'm a little worried about.  I had a case where the pool had a technically valid UB (and it was the "best" one according to what I understand to be the proper logic).  The DVA in the UB was right, but the data that DVA pointed to was junk.  Best I could tell, the original Grub checking mechanism would fall back to the alternate DVAs if the physical read failed, but not if the checksum of the data pointed to by the DVA (as opposed to the UB's own checksum) was bad.  At that point, Grub would toss the entire UB, and unless it happened to get lucky and find another one that was good enough, it would lose the pool.
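
The behaviour I was after can be sketched like this: treat a bad checksum exactly like a failed read and move on to the next copy. Everything here is a hypothetical model (struct copy, toy_sum and pick_copy are made-up names, and the checksum is a toy, not one of the real ZFS algorithms); it only illustrates the control flow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NDVAS 3  /* a block pointer carries up to three DVAs */

/* Hypothetical stand-in for one on-disk copy of a block. */
struct copy
{
  int readable;          /* 0 simulates a failed physical read */
  const uint8_t *bytes;
  size_t len;
};

/* Toy checksum for illustration only; the real code uses the algorithm
   recorded in the block pointer. */
static uint32_t toy_sum(const uint8_t *p, size_t n)
{
  uint32_t s = 0;
  while (n--)
    s = s * 31u + *p++;
  return s;
}

/* Index of the first copy that both reads and verifies, or -1 if none. */
static int pick_copy(const struct copy *c, uint32_t expected)
{
  assert(c != NULL);
  for (int i = 0; i < NDVAS; i++)
    {
      if (!c[i].readable)
        continue;  /* read error: try the next DVA */
      if (toy_sum(c[i].bytes, c[i].len) != expected)
        continue;  /* bad checksum: also try the next DVA */
      return i;
    }
  return -1;
}
```

If the checksum is only verified after the DVA loop has already returned the first readable copy, a readable-but-junk first copy fails the whole block even though a later copy is fine.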

I'm going to try to get my head back into this over the weekend.  I got pulled away from ZFS/Linux/Gentoo/Grub stuff for a while, and have been spending this week bringing all the Gentoo overlay stuff I built up to date with the latest of everything.  I've got Grub building & detecting ZFS at this point, so I just need to figure out why Dracut broke, and I think I'm fully back in business for testing.

Thanks again,
Zac

-- end of comments --

>>    return err;
>>  }
>> @@ -372,32 +413,40 @@
>>   *    Success - Pointer to the best uberblock.
>>   *    Failure - NULL
>>   */
>> -static uberblock_phys_t *
>> -find_bestub (uberblock_phys_t * ub_array, grub_disk_addr_t sector)
>> +static uberblock_t *
>> +find_bestub (char *ub_array, struct grub_zfs_data *data)
>>  {
>> -  uberblock_phys_t *ubbest = NULL;
>> -  int i;
>> -  grub_disk_addr_t offset;
>> +  const grub_disk_addr_t sector = data->vdev_phys_sector;
>> +  uberblock_t *ubbest = NULL;
>> +  uberblock_t *ubnext;
>> +  unsigned int i, offset, pickedub = 0;
>>    grub_err_t err = GRUB_ERR_NONE;
>> 
>> -  for (i = 0; i<  (VDEV_UBERBLOCK_RING>>  VDEV_UBERBLOCK_SHIFT); i++)
>> +  const unsigned int UBCOUNT = UBERBLOCK_COUNT (data->vdev_ashift);
>> +  const grub_uint64_t UBBYTES = UBERBLOCK_SIZE (data->vdev_ashift);
>> +
>> +  for (i = 0; i<  UBCOUNT; i++)
>>      {
>> -      offset = (sector<<  SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE
>> -	+ (i<<  VDEV_UBERBLOCK_SHIFT);
>> +      ubnext = (uberblock_t *) (i * UBBYTES + ub_array);
>> +      offset = (sector<<  SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE + (i * UBBYTES);
>>  
> This is wrong with respect to alignment invariants.
>> -      err = uberblock_verify (&ub_array[i], offset);
>> +      err = uberblock_verify (ubnext, offset, data);
>>        if (err)
>>  	{
>>  	  grub_errno = GRUB_ERR_NONE;
>>  	  continue;
>>  	}
>> -      if (ubbest == NULL
>> -	  || vdev_uberblock_compare (&(ub_array[i].ubp_uberblock),
>> -				&(ubbest->ubp_uberblock))>  0)
>> -	ubbest =&ub_array[i];
>> +      if (ubbest == NULL || vdev_uberblock_compare (ubnext, ubbest)>  0)
>> +	{
>> +	  ubbest = ubnext;
>> +	  pickedub = i;
>> +	}
> Skipped, it sets the variable but never uses it.
>>      }
>>    if (!ubbest)
>>      grub_errno = err;
>> +  else
>> +    grub_dprintf ("zfs", "Found best uberblock at idx %d, txg %llu\n",
>> +		  pickedub, (unsigned long long) ubbest->ub_txg);
>>  
> Skipped, not needed.
>>    return (ubbest);
>>  }
>> @@ -515,8 +564,18 @@
>>  	  sector = DVA_OFFSET_TO_PHYS_SECTOR (offset);
>>  	  err = grub_disk_read (data->disk, sector, 0, psize, buf);
>>  	}
>> +
>>        if (!err)
>> -	return GRUB_ERR_NONE;
>> +	{
>> +	  /* Check the underlying checksum before we rule this DVA as "good" */
>> +	  grub_uint32_t checkalgo = (grub_zfs_to_cpu64 ((bp)->blk_prop, endian)>>  40)&  0xff;
>> +	  err = zio_checksum_verify (bp->blk_cksum, checkalgo, endian, buf, psize);
>> +	  if (!err)
>> +	    return GRUB_ERR_NONE;
>> +	}
>> +
>> +      /* If read failed or checksum bad, reset the error.  Hopefully we've got
>> +       * some more DVA's to try. */
> Skipped, already commented on this one.
>>        grub_errno = GRUB_ERR_NONE;
>>      }
>> 
>> @@ -539,12 +598,9 @@
>>    unsigned int comp;
>>    char *compbuf = NULL;
>>    grub_err_t err;
>> -  zio_cksum_t zc = bp->blk_cksum;
>> -  grub_uint32_t checksum;
>> 
>>    *buf = NULL;
>> 
>> -  checksum = (grub_zfs_to_cpu64((bp)->blk_prop, endian)>>  40)&  0xff;
>>    comp = (grub_zfs_to_cpu64((bp)->blk_prop, endian)>>32)&  0xff;
>>    lsize = (BP_IS_HOLE(bp) ? 0 :
>>  	   (((grub_zfs_to_cpu64 ((bp)->blk_prop, endian)&  0xffff) + 1)
>> @@ -580,15 +636,6 @@
>>        return err;
>>      }
>> 
>> -  err = zio_checksum_verify (zc, checksum, endian, compbuf, psize);
>> -  if (err)
>> -    {
>> -      grub_dprintf ("zfs", "incorrect checksum\n");
>> -      grub_free (compbuf);
>> -      *buf = NULL;
>> -      return err;
>> -    }
>> -
>>    if (comp != ZIO_COMPRESS_OFF)
>>      {
>>        *buf = grub_malloc (lsize);
>> @@ -1864,11 +1911,9 @@
>>  static grub_err_t
>>  check_pool_label (struct grub_zfs_data *data)
>>  {
>> -  grub_uint64_t pool_state, txg = 0;
>> -  char *nvlist;
>> -#if 0
>> -  char *nv;
>> -#endif
>> +  grub_uint64_t pool_state;
>> +  char *nvlist;			/* for the pool */
>> +  char *vdevnvlist;		/* for the vdev */
>>    grub_uint64_t diskguid;
>>    grub_uint64_t version;
>>    int found;
>> @@ -1898,7 +1943,9 @@
>>      }
>>    grub_dprintf ("zfs", "check 4 passed\n");
>> 
>> -  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_TXG,&txg);
>> +  data->label_txg = 0;
>> +  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_TXG,
>> +					&data->label_txg);
> Not needed. Skipped.
>>    if (!found)
>>      {
>>        grub_free (nvlist);
>> @@ -1909,7 +1956,7 @@
>>    grub_dprintf ("zfs", "check 6 passed\n");
>> 
>>    /* not an active device */
>> -  if (txg == 0)
>> +  if (data->label_txg == 0)
>>      {
>>        grub_free (nvlist);
>>        return grub_error (GRUB_ERR_BAD_FS, "zpool isn't active");
>> @@ -1931,20 +1978,32 @@
>>      {
>>        grub_free (nvlist);
>>        return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET,
>> -			 "too new version %llu>  %llu",
>> +			 "SPA version too new %llu>  %llu",
>>  			 (unsigned long long) version,
>>  			 (unsigned long long) SPA_VERSION);
>>      }
>>    grub_dprintf ("zfs", "check 9 passed\n");
>> -#if 0
>> -  if (nvlist_lookup_value (nvlist, ZPOOL_CONFIG_VDEV_TREE,&nv,
>> -			   DATA_TYPE_NVLIST, NULL))
>> +
>> +  vdevnvlist = grub_zfs_nvlist_lookup_nvlist (nvlist, ZPOOL_CONFIG_VDEV_TREE);
>> +  if (!vdevnvlist)
>>      {
>> -      grub_free (vdev);
>> -      return (GRUB_ERR_BAD_FS);
>> +      grub_free (nvlist);
>> +      if (!grub_errno)
>> +	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_VDEV_TREE " not found");
>> +      return grub_errno;
>>      }
>>    grub_dprintf ("zfs", "check 10 passed\n");
>> -#endif
>> +
>> +  found = grub_zfs_nvlist_lookup_uint64 (vdevnvlist, ZPOOL_CONFIG_ASHIFT,
>> +					&data->vdev_ashift);
>> +  grub_free (vdevnvlist);
>> +  if (!found)
>> +    {
>> +      grub_free (nvlist);
>> +      if (!grub_errno)
>> +	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_ASHIFT " not found");
>> +      return grub_errno;
>> +    }
>> 
>>    found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_GUID,&diskguid);
>>    if (! found)
>> @@ -1956,11 +2015,42 @@
>>      }
>>    grub_dprintf ("zfs", "check 11 passed\n");
>> 
>> +  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_GUID,&data->pool_guid);
>> +  if (! found)
>> +    {
>> +      grub_free (nvlist);
>> +      if (! grub_errno)
>> +        grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_GUID " not found");
>> +      return grub_errno;
>> +    }
>> +   grub_dprintf ("zfs", "check 12 passed\n");
>> +
>> +
> This check is already in. Skipped.
>>    grub_free (nvlist);
>> 
>> +  grub_dprintf ("zfs", "ZFS Pool GUID: %llu (%016llx) Label: GUID: %llu (%016llx), txg: %llu, SPA v%llu, ashift: %llu\n",
>> +    (unsigned long long) data->pool_guid,
>> +    (unsigned long long) data->pool_guid,
>> +    (unsigned long long) diskguid,
>> +    (unsigned long long) diskguid,
>> +    (unsigned long long) data->label_txg,
>> +    (unsigned long long) version,
>> +    (unsigned long long) data->vdev_ashift);
>> +
> Useless. Skipped.
>>    return GRUB_ERR_NONE;
>>  }
>> 
>> +/*
>> + * vdev_label_start returns the physical disk offset (in bytes) of
>> + * label "l".
>> + */
>> +static grub_uint64_t vdev_label_start (grub_uint64_t psize, int l)
>> +{
>> +  return (l * sizeof (vdev_label_t) + (l<  VDEV_LABELS / 2 ?
>> +				       0 : psize -
>> +				       VDEV_LABELS * sizeof (vdev_label_t)));
>> +}
>> +
>>  static void
>>  zfs_unmount (struct grub_zfs_data *data)
>>  {
>> @@ -1979,13 +2069,13 @@
>>  zfs_mount (grub_device_t dev)
>>  {
>>    struct grub_zfs_data *data = 0;
>> -  int label = 0;
>> -  uberblock_phys_t *ub_array, *ubbest = NULL;
>> -  vdev_boot_header_t *bh;
>> +  int label = 0, bestlabel = -1;
>> +  char *ub_array;
>> +  uberblock_t *ubbest;
>> +  uberblock_t *ubcur = NULL;
>>    void *osp = 0;
>>    grub_size_t ospsize;
>>    grub_err_t err;
>> -  int vdevnum;
>> 
>>    if (! dev->disk)
>>      {
>> @@ -1997,11 +2087,6 @@
>>    if (!data)
>>      return 0;
>>    grub_memset (data, 0, sizeof (*data));
>> -#if 0
>> -  /* if it's our first time here, zero the best uberblock out */
>> -  if (data->best_drive == 0&&  data->best_part == 0&&  find_best_root)
>> -    grub_memset (&current_uberblock, 0, sizeof (uberblock_t));
>> -#endif
>> 
>>    data->disk = dev->disk;
>> 
>> @@ -2012,100 +2097,129 @@
>>        return 0;
>>      }
>> 
>> -  bh = grub_malloc (VDEV_BOOT_HEADER_SIZE);
>> -  if (!bh)
>> +  ubbest = grub_malloc (sizeof (*ubbest));
>> +  if (!ubbest)
>>      {
>>        zfs_unmount (data);
>> -      grub_free (ub_array);
>>        return 0;
>>      }
>> -
>> -  vdevnum = VDEV_LABELS;
>> -
>> -  /* Don't check back labels on CDROM.  */
>> -  if (grub_disk_get_size (dev->disk) == GRUB_DISK_SIZE_UNKNOWN)
>> -    vdevnum = VDEV_LABELS / 2;
>> -
>> -  for (label = 0; ubbest == NULL&&  label<  vdevnum; label++)
>> +  grub_memset (ubbest, 0, sizeof (*ubbest));
>> +
>> +  /* Establish some constants for where things are on the device: */
>> +
>> +  /*
>> +   *  Some eltorito stacks don't give us a size and we end up setting the
>> +   * size to MAXUINT, further some of these devices stop working once a single
>> +   * read past the end has been issued. Checking for a maximum part_length and
>> +   * skipping the backup labels at the end of the slice/partition/device
>> +   * avoids breaking down on such devices.
>> +   */
>> +  const int vdevnum =
>> +    grub_disk_get_size (dev->disk) == GRUB_DISK_SIZE_UNKNOWN ?
>> +    VDEV_LABELS / 2 : VDEV_LABELS;
>> +
>> +  /* Size in bytes of the device (disk or partition) aligned to label size */
>> +  grub_uint64_t device_size =
>> +	grub_disk_get_size (dev->disk)<<  GRUB_DISK_SECTOR_BITS;
>> +  const grub_uint64_t alignedbytes =
>> +    P2ALIGN (device_size, (grub_uint64_t) sizeof (vdev_label_t));
>> +
>> +  for (label = 0; label<  vdevnum; label++)
>>      {
>> -      grub_zfs_endian_t ub_endian = UNKNOWN_ENDIAN;
>> -      grub_dprintf ("zfs", "label %d\n", label);
>> -
>> -      data->vdev_phys_sector
>> -	= label * (sizeof (vdev_label_t)>>  SPA_MINBLOCKSHIFT)
>> -	+ ((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE)>>  SPA_MINBLOCKSHIFT)
>> -	+ (label<  VDEV_LABELS / 2 ? 0 : grub_disk_get_size (dev->disk)
>> -	   - VDEV_LABELS * (sizeof (vdev_label_t)>>  SPA_MINBLOCKSHIFT));
>> +      grub_uint64_t labelstartbytes = vdev_label_start (alignedbytes, label);
>> +      grub_uint64_t labelstart = labelstartbytes>>  GRUB_DISK_SECTOR_BITS;
>> +
>> +      grub_dprintf ("zfs", "reading label %d at sector %llu (byte %llu)\n",
>> +		    label, (unsigned long long) labelstart,
>> +		    (unsigned long long) labelstartbytes);
>> +
>> +      data->vdev_phys_sector = labelstart +
>> +		((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE)>>  GRUB_DISK_SECTOR_BITS);
>> +
>> +      err = check_pool_label (data);
>> +      if (err)
>> +	{
>> +	  grub_dprintf ("zfs", "error checking label %d\n", label);
>> +	  grub_errno = GRUB_ERR_NONE;
>> +	  continue;
>> +	}
>>  
> This seems like a lot of moving around but I don't see any real change other than aligning the size down.
>>        /* Read in the uberblock ring (128K). */
>> -      err = grub_disk_read (data->disk, data->vdev_phys_sector
>> -			    + (VDEV_PHYS_SIZE>>  SPA_MINBLOCKSHIFT),
>> -			    0, VDEV_UBERBLOCK_RING, (char *) ub_array);
>> -      if (err)
>> -	{
>> -	  grub_errno = GRUB_ERR_NONE;
>> -	  continue;
>> -	}
>> -      grub_dprintf ("zfs", "label ok %d\n", label);
>> -
>> -      ubbest = find_bestub (ub_array, data->vdev_phys_sector);
>> -      if (!ubbest)
>> -	{
>> -	  grub_dprintf ("zfs", "No uberblock found\n");
>> -	  grub_errno = GRUB_ERR_NONE;
>> -	  continue;
>> -	}
>> -      ub_endian = (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.ub_magic,
>> -				     LITTLE_ENDIAN) == UBERBLOCK_MAGIC
>> -		   ? LITTLE_ENDIAN : BIG_ENDIAN);
>> -      err = zio_read (&ubbest->ubp_uberblock.ub_rootbp,
>> -		      ub_endian,
>> -		&osp,&ospsize, data);
>> -      if (err)
>> -	{
>> -	  grub_dprintf ("zfs", "couldn't zio_read\n");
>> -	  grub_errno = GRUB_ERR_NONE;
>> -	  continue;
>> -	}
>> -
>> -      if (ospsize<  OBJSET_PHYS_SIZE_V14)
>> -	{
>> -	  grub_dprintf ("zfs", "osp too small\n");
>> -	  grub_free (osp);
>> -	  continue;
>> -	}
>> -      grub_dprintf ("zfs", "ubbest %p\n", ubbest);
>> -
>> -      err = check_pool_label (data);
>> -      if (err)
>> -	{
>> -	  grub_errno = GRUB_ERR_NONE;
>> -	  continue;
>> -	}
>> -#if 0
>> -      if (find_best_root&&
>> -	  vdev_uberblock_compare (&ubbest->ubp_uberblock,
>> -				&(current_uberblock))<= 0)
>> -	continue;
>> -#endif
>> -      /* Got the MOS. Save it at the memory addr MOS. */
>> -      grub_memmove (&(data->mos.dn),&((objset_phys_t *) osp)->os_meta_dnode,
>> -		    DNODE_SIZE);
>> -      data->mos.endian = (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.ub_rootbp.blk_prop, ub_endian)>>  63)&  1;
>> -      grub_memmove (&(data->current_uberblock),
>> -		&ubbest->ubp_uberblock, sizeof (uberblock_t));
>> -      grub_free (ub_array);
>> -      grub_free (bh);
>> -      grub_free (osp);
>> -      return data;
>> +      err = grub_disk_read (data->disk,
>> +			    data->vdev_phys_sector +
>> +			    (VDEV_PHYS_SIZE>>  GRUB_DISK_SECTOR_BITS),
>> +			    0, VDEV_UBERBLOCK_RING, ub_array);
>> +      if (err)
>> +	{
>> +	  grub_dprintf ("zfs", "error reading uberblock ring for label %d\n", label);
>> +	  grub_errno = GRUB_ERR_NONE;
>> +	  continue;
>> +	}
>> +
>> +      ubcur = find_bestub (ub_array, data);
>> +      if (!ubcur)
>> +	{
>> +	  grub_dprintf ("zfs", "No good uberblocks found in label %d\n", label);
>> +	  grub_errno = GRUB_ERR_NONE;
>> +	  continue;
>> +	}
>> +
>> +      if (vdev_uberblock_compare (ubcur, ubbest)>  0)
>> +	{
>> +	  /* Looks like the block is good, so use it. */
>> +	  grub_memcpy (ubbest, ubcur, sizeof (*ubbest));
>> +	  bestlabel = label;
>> +	  grub_dprintf ("zfs", "Current best uberblock found in label %d\n", label);
>> +	}
>>      }
>> -  grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid label");
>> -  zfs_unmount (data);
>>    grub_free (ub_array);
>> -  grub_free (bh);
>> +
>> +  /*
>> +   * We zero'd the structure to begin with.  If we never assigned to it,
>> +   * magic will still be zero.
>> +   */
>> +  if (!ubbest->ub_magic)
>> +    {
>> +      grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid ZFS label");
>> +      zfs_unmount (data);
>> +      grub_free (ubbest);
>> +      return 0;
>> +    }
>> +
>> +  grub_dprintf ("zfs", "ubbest %p in label %d\n", ubbest, bestlabel);
>> +
>> +  grub_zfs_endian_t ub_endian =
>> +    grub_zfs_to_cpu64 (ubbest->ub_magic, LITTLE_ENDIAN) == UBERBLOCK_MAGIC
>> +    ? LITTLE_ENDIAN : BIG_ENDIAN;
>> +  err = zio_read (&ubbest->ub_rootbp, ub_endian,&osp,&ospsize, data);
>> +
>> +  if (err)
>> +    {
>> +      grub_error (GRUB_ERR_BAD_FS, "couldn't zio_read object directory");
>> +      zfs_unmount (data);
>> +      grub_free (ubbest);
>> +      return 0;
>> +    }
>> +
>> +  if (ospsize<  OBJSET_PHYS_SIZE_V14)
>> +    {
>> +      grub_error (GRUB_ERR_BAD_FS, "osp too small");
>> +      zfs_unmount (data);
>> +      grub_free (osp);
>> +      grub_free (ubbest);
>> +      return 0;
>> +    }
>> +
>> +  /* Got the MOS. Save it at the memory addr MOS. */
>> +  grub_memmove (&(data->mos.dn),&((objset_phys_t *) osp)->os_meta_dnode, DNODE_SIZE);
>> +  data->mos.endian =
>> +		(grub_zfs_to_cpu64 (ubbest->ub_rootbp.blk_prop, ub_endian)>>  63)&  1;
>> +  grub_memmove (&(data->current_uberblock), ubbest, sizeof (uberblock_t));
>> +
>>    grub_free (osp);
>> +  grub_free (ubbest);
>> 
>> -  return 0;
>> +  return (data);
>>  }
>> 
>>  grub_err_t
>> @@ -2149,33 +2263,18 @@
>>  static grub_err_t
>>  zfs_uuid (grub_device_t device, char **uuid)
>>  {
>> -  char *nvlist;
>> -  int found;
>>    struct grub_zfs_data *data;
>> -  grub_uint64_t guid;
>> -  grub_err_t err;
>> -
>> -  *uuid = 0;
>> 
>>    data = zfs_mount (device);
>>    if (! data)
>>      return grub_errno;
>> 
>> -  err = zfs_fetch_nvlist (data,&nvlist);
>> -  if (err)
>> -    {
>> -      zfs_unmount (data);
>> -      return err;
>> -    }
>> -
>> -  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_GUID,&guid);
>> -  if (! found)
>> -    return grub_errno;
>> -  grub_free (nvlist);
>> -  *uuid = grub_xasprintf ("%016llx", (long long unsigned) guid);
>> +  *uuid = grub_xasprintf ("%016llx", (long long unsigned) data->pool_guid);
>>    zfs_unmount (data);
>> +
>>    if (! *uuid)
>>      return grub_errno;
>> +
>>    return GRUB_ERR_NONE;
>>  }
>> 
>> 
>> === modified file 'include/grub/zfs/uberblock_impl.h'
>> --- include/grub/zfs/uberblock_impl.h	2010-12-01 21:55:26 +0000
>> +++ include/grub/zfs/uberblock_impl.h	2012-01-19 11:28:09 +0000
>> @@ -23,6 +23,8 @@
>>  #ifndef _SYS_UBERBLOCK_IMPL_H
>>  #define	_SYS_UBERBLOCK_IMPL_H
>> 
>> +#define UBMAX(a,b) ((a)>  (b) ? (a) : (b))
>> +
>>  /*
>>   * The uberblock version is incremented whenever an incompatible on-disk
>>   * format change is made to the SPA, DMU, or ZAP.
>> @@ -45,16 +47,10 @@
>>  	blkptr_t	ub_rootbp;	/* MOS objset_phys_t		*/
>>  } uberblock_t;
>> 
>> -#define	UBERBLOCK_SIZE		(1ULL<<  UBERBLOCK_SHIFT)
>> -#define	VDEV_UBERBLOCK_SHIFT	UBERBLOCK_SHIFT
>> -
>> -/* XXX Uberblock_phys_t is no longer in the kernel zfs */
>> -typedef struct uberblock_phys {
>> -	uberblock_t	ubp_uberblock;
>> -	char		ubp_pad[UBERBLOCK_SIZE - sizeof (uberblock_t) -
>> -				sizeof (zio_eck_t)];
>> -	zio_eck_t	ubp_zec;
>> -} uberblock_phys_t;
>> -
>> +#define	VDEV_UBERBLOCK_SHIFT(as)	UBMAX(as, UBERBLOCK_SHIFT)
>> +#define	UBERBLOCK_SIZE(as)		(1ULL<<  VDEV_UBERBLOCK_SHIFT(as))
>> +
>> +/* Number of uberblocks that can fit in the ring at a given ashift */
>> +#define UBERBLOCK_COUNT(as) (VDEV_UBERBLOCK_RING>>  VDEV_UBERBLOCK_SHIFT(as))
>> 
>>  #endif	/* _SYS_UBERBLOCK_IMPL_H */
>> 
> 
> 
> -- 
> Regards
> Vladimir 'φ-coder/phcoder' Serbinenko
> 
> <zfs.diff>
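
The label placement in the zfs_mount hunk above (align the device size down to a label boundary, then count the two back labels from that aligned end) can be sketched as follows.  The names p2align and label_start are hypothetical, and the 256 KiB label size is an assumption standing in for sizeof (vdev_label_t).

```c
#include <assert.h>
#include <stdint.h>

#define VDEV_LABELS	4
#define LABEL_SIZE	(256ULL << 10)	/* assumed sizeof (vdev_label_t) */

/* Round X down to a multiple of ALIGN, which must be a power of two.  */
static uint64_t
p2align (uint64_t x, uint64_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return x & ~(align - 1);
}

/* Byte offset of label L on a device of DEVICE_BYTES bytes.  Labels 0 and 1
   sit at the front; labels 2 and 3 sit at the label-aligned end.  */
static uint64_t
label_start (uint64_t device_bytes, int l)
{
  uint64_t psize = p2align (device_bytes, LABEL_SIZE);

  assert (l >= 0 && l < VDEV_LABELS);
  assert (psize >= VDEV_LABELS * LABEL_SIZE);
  return (uint64_t) l * LABEL_SIZE
    + (l < VDEV_LABELS / 2 ? 0 : psize - VDEV_LABELS * LABEL_SIZE);
}
```

For a device of 10 MiB plus 100 KiB, the aligned size is 10 MiB and label 2 starts at byte 9961472, whereas device_end - 2 * label_size would give 10063872, which is not where the label was written.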




* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-27 19:04       ` Zachary Bedell
@ 2012-01-27 22:22         ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-01-28  2:50         ` Richard Laager
  1 sibling, 0 replies; 26+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2012-01-27 22:22 UTC (permalink / raw)
  To: Zachary Bedell; +Cc: grub-devel, Richard Laager

On 27.01.2012 20:04, Zachary Bedell wrote:
> Thanks for plodding through my mess.  It looks like the changes are committed to trunk at this point, right?  I'm trying to get a test together, but I've had to forward port the other changes necessary to support building on Linux.  I think I have those changes done in such a way that they won't conflict with other platforms.  I'll try submitting those changes again once I'm sure everything is working.
>
> I think I still have access to one of the pools that caused issues.  In any case, I have a pretty good memory of what the actual on-disk issues were, and I'm pretty sure I can take a hex editor to a disk image and break them appropriately.  Assuming I'm successful with that, I'll try to make minimal image files that can be posted somewhere for testing.
>
> The pools in question were originally subject to an almost contrived level of abuse.  I was testing running swap on a ZVOL, and running into repeated kernel panics in the ZFSonLinux code when the pool was under heavy I/O.  Not a recipe for good data integrity...  In any case, the pools were always such that the ZFS driver could import them.  In the majority of cases, they silently self-healed; in a few (especially where the good label was at the end of the device), a `zpool scrub` was necessary.  My metric for this stuff has been: if the ZFS tools report errors, I'm not as worried about whether Grub can read the pool, but if ZFS says okay & Grub breaks, that worries me.
I agree. In most cases GRUB can even access a heavily corrupted FS that
is not mountable otherwise, because it doesn't care about most features.
I have a patch for supporting zfs-fuse, including multi-device ZFS
installs (they need special care in grub-probe), but I can't easily
retrieve it from my repo due to data loss (the files are still intact, so
in the worst case I have to rebuild my repo and lose some history).
> Few comments below...
>
> On Jan 22, 2012, at 9:18 AM, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
>> I've cleaned this patch up (attached). Can you test it on these 4K FSes or tell me how to create one w/o really having such a disk?
>> On 19.01.2012 12:36, Richard Laager wrote:
>>>   static grub_err_t
>>> -uberblock_verify (uberblock_phys_t * ub, int offset)
>>> +uberblock_verify (uberblock_t * uber, int offset, struct grub_zfs_data *data)
>>>   {
>>> -  uberblock_t *uber =&ub->ubp_uberblock;
>>>     grub_err_t err;
>>>     grub_zfs_endian_t endian = UNKNOWN_ENDIAN;
>>>     zio_cksum_t zc;
>>> +  void *osp = NULL;
>>> +  grub_size_t ospsize;
>>> +
>>> +  if (uber->ub_txg<   data->label_txg)
>>> +    return grub_error (GRUB_ERR_BAD_FS,
>>> +	"ignoring partially written label: uber_txg<   label_txg");
>>>
>> I've left this part out since outside loop already looks for the newest txg.
> Gotta double check this, but I think you're right.  Was a duplicate check.
>
>>>     if (grub_zfs_to_cpu64 (uber->ub_magic, LITTLE_ENDIAN) == UBERBLOCK_MAGIC
>>>         &&   grub_zfs_to_cpu64 (uber->ub_version, LITTLE_ENDIAN)>   0
>>> @@ -361,7 +390,19 @@
>>>
>>>     zc.zc_word[0] = grub_cpu_to_zfs64 (offset, endian);
>>>     err = zio_checksum_verify (zc, ZIO_CHECKSUM_LABEL, endian,
>>> -			     (char *) ub, UBERBLOCK_SIZE);
>>> +			     (char *) uber, UBERBLOCK_SIZE(data->vdev_ashift));
>>> +
>>> +  if (!err)
>>> +    {
>>> +      /* Check that the data pointed by the rootbp is usable. */
>>> +      void *osp = NULL;
>>> +      grub_size_t ospsize;
>>> +      err = zio_read (&uber->ub_rootbp, endian,&osp,&ospsize, data);
>>> +      grub_free (osp);
>>> +
>>> +      if (!err&&   ospsize<   OBJSET_PHYS_SIZE_V14)
>>> +	return grub_error (GRUB_ERR_BAD_FS, "uberblock rootbp points to invalid data");
>>> +    }
>>>
>> Left this one out: why would it happen that uberblock checksum is valid but the other data in it isn't?
> This one I'm a little worried about.  I had a case where the pool had a technically valid UB (and it was the "best" one according to what I understand to be the proper logic).  The DVA in the UB was right, but the data that DVA pointed to was junk.  As best I could tell, the original Grub checking mechanism would fall back to the alternate DVAs if the physical read failed, but not if the checksum of the data the DVA pointed to (as opposed to the UB's own checksum) was bad.  At that point, Grub would toss the entire UB, and unless it happened to get lucky and find another one that was good enough, it would lose the pool.
>
> I'm going to try to get my head back into this over the weekend.  I got pulled away from ZFS/Linux/Gentoo/Grub stuff for a while, and have been spending this week bringing all the Gentoo overlay stuff I built up to date with latest of everything.  I've got Grub building & detecting ZFS at this point, so I just need to figure out why Dracut broke, and I think I'm fully back in business for testing.
>
> Thanks again,
> Zac
>
> -- end of comments --
>
>>>     return err;
>>>   }
>>> @@ -372,32 +413,40 @@
>>>    *    Success - Pointer to the best uberblock.
>>>    *    Failure - NULL
>>>    */
>>> -static uberblock_phys_t *
>>> -find_bestub (uberblock_phys_t * ub_array, grub_disk_addr_t sector)
>>> +static uberblock_t *
>>> +find_bestub (char *ub_array, struct grub_zfs_data *data)
>>>   {
>>> -  uberblock_phys_t *ubbest = NULL;
>>> -  int i;
>>> -  grub_disk_addr_t offset;
>>> +  const grub_disk_addr_t sector = data->vdev_phys_sector;
>>> +  uberblock_t *ubbest = NULL;
>>> +  uberblock_t *ubnext;
>>> +  unsigned int i, offset, pickedub = 0;
>>>     grub_err_t err = GRUB_ERR_NONE;
>>>
>>> -  for (i = 0; i<   (VDEV_UBERBLOCK_RING>>   VDEV_UBERBLOCK_SHIFT); i++)
>>> +  const unsigned int UBCOUNT = UBERBLOCK_COUNT (data->vdev_ashift);
>>> +  const grub_uint64_t UBBYTES = UBERBLOCK_SIZE (data->vdev_ashift);
>>> +
>>> +  for (i = 0; i<   UBCOUNT; i++)
>>>       {
>>> -      offset = (sector<<   SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE
>>> -	+ (i<<   VDEV_UBERBLOCK_SHIFT);
>>> +      ubnext = (uberblock_t *) (i * UBBYTES + ub_array);
>>> +      offset = (sector<<   SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE + (i * UBBYTES);
>>>
>> This is wrong with respect to alignment invariants.
>>> -      err = uberblock_verify (&ub_array[i], offset);
>>> +      err = uberblock_verify (ubnext, offset, data);
>>>         if (err)
>>>   	{
>>>   	  grub_errno = GRUB_ERR_NONE;
>>>   	  continue;
>>>   	}
>>> -      if (ubbest == NULL
>>> -	  || vdev_uberblock_compare (&(ub_array[i].ubp_uberblock),
>>> -				&(ubbest->ubp_uberblock))>   0)
>>> -	ubbest =&ub_array[i];
>>> +      if (ubbest == NULL || vdev_uberblock_compare (ubnext, ubbest)>   0)
>>> +	{
>>> +	  ubbest = ubnext;
>>> +	  pickedub = i;
>>> +	}
>> Skipped, it sets the variable but never uses it.
>>>       }
>>>     if (!ubbest)
>>>       grub_errno = err;
>>> +  else
>>> +    grub_dprintf ("zfs", "Found best uberblock at idx %d, txg %llu\n",
>>> +		  pickedub, (unsigned long long) ubbest->ub_txg);
>>>
>> Skipped, not needed.
>>>     return (ubbest);
>>>   }
>>> @@ -515,8 +564,18 @@
>>>   	  sector = DVA_OFFSET_TO_PHYS_SECTOR (offset);
>>>   	  err = grub_disk_read (data->disk, sector, 0, psize, buf);
>>>   	}
>>> +
>>>         if (!err)
>>> -	return GRUB_ERR_NONE;
>>> +	{
>>> +	  /* Check the underlying checksum before we rule this DVA as "good" */
>>> +	  grub_uint32_t checkalgo = (grub_zfs_to_cpu64 ((bp)->blk_prop, endian)>>   40)&   0xff;
>>> +	  err = zio_checksum_verify (bp->blk_cksum, checkalgo, endian, buf, psize);
>>> +	  if (!err)
>>> +	    return GRUB_ERR_NONE;
>>> +	}
>>> +
>>> +      /* If read failed or checksum bad, reset the error.  Hopefully we've got
>>> +       * some more DVA's to try. */
>> Skipped, already commented on this one.
>>>         grub_errno = GRUB_ERR_NONE;
>>>       }
>>>
>>> @@ -539,12 +598,9 @@
>>>     unsigned int comp;
>>>     char *compbuf = NULL;
>>>     grub_err_t err;
>>> -  zio_cksum_t zc = bp->blk_cksum;
>>> -  grub_uint32_t checksum;
>>>
>>>     *buf = NULL;
>>>
>>> -  checksum = (grub_zfs_to_cpu64((bp)->blk_prop, endian)>>   40)&   0xff;
>>>     comp = (grub_zfs_to_cpu64((bp)->blk_prop, endian)>>32)&   0xff;
>>>     lsize = (BP_IS_HOLE(bp) ? 0 :
>>>   	   (((grub_zfs_to_cpu64 ((bp)->blk_prop, endian)&   0xffff) + 1)
>>> @@ -580,15 +636,6 @@
>>>         return err;
>>>       }
>>>
>>> -  err = zio_checksum_verify (zc, checksum, endian, compbuf, psize);
>>> -  if (err)
>>> -    {
>>> -      grub_dprintf ("zfs", "incorrect checksum\n");
>>> -      grub_free (compbuf);
>>> -      *buf = NULL;
>>> -      return err;
>>> -    }
>>> -
>>>     if (comp != ZIO_COMPRESS_OFF)
>>>       {
>>>         *buf = grub_malloc (lsize);
>>> @@ -1864,11 +1911,9 @@
>>>   static grub_err_t
>>>   check_pool_label (struct grub_zfs_data *data)
>>>   {
>>> -  grub_uint64_t pool_state, txg = 0;
>>> -  char *nvlist;
>>> -#if 0
>>> -  char *nv;
>>> -#endif
>>> +  grub_uint64_t pool_state;
>>> +  char *nvlist;			/* for the pool */
>>> +  char *vdevnvlist;		/* for the vdev */
>>>     grub_uint64_t diskguid;
>>>     grub_uint64_t version;
>>>     int found;
>>> @@ -1898,7 +1943,9 @@
>>>       }
>>>     grub_dprintf ("zfs", "check 4 passed\n");
>>>
>>> -  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_TXG,&txg);
>>> +  data->label_txg = 0;
>>> +  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_TXG,
>>> +					&data->label_txg);
>> Not needed. Skipped.
>>>     if (!found)
>>>       {
>>>         grub_free (nvlist);
>>> @@ -1909,7 +1956,7 @@
>>>     grub_dprintf ("zfs", "check 6 passed\n");
>>>
>>>     /* not an active device */
>>> -  if (txg == 0)
>>> +  if (data->label_txg == 0)
>>>       {
>>>         grub_free (nvlist);
>>>         return grub_error (GRUB_ERR_BAD_FS, "zpool isn't active");
>>> @@ -1931,20 +1978,32 @@
>>>       {
>>>         grub_free (nvlist);
>>>         return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET,
>>> -			 "too new version %llu>   %llu",
>>> +			 "SPA version too new %llu>   %llu",
>>>   			 (unsigned long long) version,
>>>   			 (unsigned long long) SPA_VERSION);
>>>       }
>>>     grub_dprintf ("zfs", "check 9 passed\n");
>>> -#if 0
>>> -  if (nvlist_lookup_value (nvlist, ZPOOL_CONFIG_VDEV_TREE,&nv,
>>> -			   DATA_TYPE_NVLIST, NULL))
>>> +
>>> +  vdevnvlist = grub_zfs_nvlist_lookup_nvlist (nvlist, ZPOOL_CONFIG_VDEV_TREE);
>>> +  if (!vdevnvlist)
>>>       {
>>> -      grub_free (vdev);
>>> -      return (GRUB_ERR_BAD_FS);
>>> +      grub_free (nvlist);
>>> +      if (!grub_errno)
>>> +	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_VDEV_TREE " not found");
>>> +      return grub_errno;
>>>       }
>>>     grub_dprintf ("zfs", "check 10 passed\n");
>>> -#endif
>>> +
>>> +  found = grub_zfs_nvlist_lookup_uint64 (vdevnvlist, ZPOOL_CONFIG_ASHIFT,
>>> +					&data->vdev_ashift);
>>> +  grub_free (vdevnvlist);
>>> +  if (!found)
>>> +    {
>>> +      grub_free (nvlist);
>>> +      if (!grub_errno)
>>> +	grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_ASHIFT " not found");
>>> +      return grub_errno;
>>> +    }
>>>
>>>     found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_GUID,&diskguid);
>>>     if (! found)
>>> @@ -1956,11 +2015,42 @@
>>>       }
>>>     grub_dprintf ("zfs", "check 11 passed\n");
>>>
>>> +  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_GUID,&data->pool_guid);
>>> +  if (! found)
>>> +    {
>>> +      grub_free (nvlist);
>>> +      if (! grub_errno)
>>> +        grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_GUID " not found");
>>> +      return grub_errno;
>>> +    }
>>> +   grub_dprintf ("zfs", "check 12 passed\n");
>>> +
>>> +
>> This check is already in. Skipped.
>>>     grub_free (nvlist);
>>>
>>> +  grub_dprintf ("zfs", "ZFS Pool GUID: %llu (%016llx) Label: GUID: %llu (%016llx), txg: %llu, SPA v%llu, ashift: %llu\n",
>>> +    (unsigned long long) data->pool_guid,
>>> +    (unsigned long long) data->pool_guid,
>>> +    (unsigned long long) diskguid,
>>> +    (unsigned long long) diskguid,
>>> +    (unsigned long long) data->label_txg,
>>> +    (unsigned long long) version,
>>> +    (unsigned long long) data->vdev_ashift);
>>> +
>> Useless. Skipped.
>>>     return GRUB_ERR_NONE;
>>>   }
>>>
>>> +/*
>>> + * vdev_label_start returns the physical disk offset (in bytes) of
>>> + * label "l".
>>> + */
>>> +static grub_uint64_t vdev_label_start (grub_uint64_t psize, int l)
>>> +{
>>> +  return (l * sizeof (vdev_label_t) + (l<   VDEV_LABELS / 2 ?
>>> +				       0 : psize -
>>> +				       VDEV_LABELS * sizeof (vdev_label_t)));
>>> +}
>>> +
>>>   static void
>>>   zfs_unmount (struct grub_zfs_data *data)
>>>   {
>>> @@ -1979,13 +2069,13 @@
>>>   zfs_mount (grub_device_t dev)
>>>   {
>>>     struct grub_zfs_data *data = 0;
>>> -  int label = 0;
>>> -  uberblock_phys_t *ub_array, *ubbest = NULL;
>>> -  vdev_boot_header_t *bh;
>>> +  int label = 0, bestlabel = -1;
>>> +  char *ub_array;
>>> +  uberblock_t *ubbest;
>>> +  uberblock_t *ubcur = NULL;
>>>     void *osp = 0;
>>>     grub_size_t ospsize;
>>>     grub_err_t err;
>>> -  int vdevnum;
>>>
>>>     if (! dev->disk)
>>>       {
>>> @@ -1997,11 +2087,6 @@
>>>     if (!data)
>>>       return 0;
>>>     grub_memset (data, 0, sizeof (*data));
>>> -#if 0
>>> -  /* if it's our first time here, zero the best uberblock out */
>>> -  if (data->best_drive == 0&&   data->best_part == 0&&   find_best_root)
>>> -    grub_memset (&current_uberblock, 0, sizeof (uberblock_t));
>>> -#endif
>>>
>>>     data->disk = dev->disk;
>>>
>>> @@ -2012,100 +2097,129 @@
>>>         return 0;
>>>       }
>>>
>>> -  bh = grub_malloc (VDEV_BOOT_HEADER_SIZE);
>>> -  if (!bh)
>>> +  ubbest = grub_malloc (sizeof (*ubbest));
>>> +  if (!ubbest)
>>>       {
>>>         zfs_unmount (data);
>>> -      grub_free (ub_array);
>>>         return 0;
>>>       }
>>> -
>>> -  vdevnum = VDEV_LABELS;
>>> -
>>> -  /* Don't check back labels on CDROM.  */
>>> -  if (grub_disk_get_size (dev->disk) == GRUB_DISK_SIZE_UNKNOWN)
>>> -    vdevnum = VDEV_LABELS / 2;
>>> -
>>> -  for (label = 0; ubbest == NULL&&   label<   vdevnum; label++)
>>> +  grub_memset (ubbest, 0, sizeof (*ubbest));
>>> +
>>> +  /* Establish some constants for where things are on the device: */
>>> +
>>> +  /*
>>> +   *  Some eltorito stacks don't give us a size and we end up setting the
>>> +   * size to MAXUINT, further some of these devices stop working once a single
>>> +   * read past the end has been issued. Checking for a maximum part_length and
>>> +   * skipping the backup labels at the end of the slice/partition/device
>>> +   * avoids breaking down on such devices.
>>> +   */
>>> +  const int vdevnum =
>>> +    grub_disk_get_size (dev->disk) == GRUB_DISK_SIZE_UNKNOWN ?
>>> +    VDEV_LABELS / 2 : VDEV_LABELS;
>>> +
>>> +  /* Size in bytes of the device (disk or partition) aligned to label size */
>>> +  grub_uint64_t device_size =
>>> +	grub_disk_get_size (dev->disk)<<   GRUB_DISK_SECTOR_BITS;
>>> +  const grub_uint64_t alignedbytes =
>>> +    P2ALIGN (device_size, (grub_uint64_t) sizeof (vdev_label_t));
>>> +
>>> +  for (label = 0; label<   vdevnum; label++)
>>>       {
>>> -      grub_zfs_endian_t ub_endian = UNKNOWN_ENDIAN;
>>> -      grub_dprintf ("zfs", "label %d\n", label);
>>> -
>>> -      data->vdev_phys_sector
>>> -	= label * (sizeof (vdev_label_t)>>   SPA_MINBLOCKSHIFT)
>>> -	+ ((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE)>>   SPA_MINBLOCKSHIFT)
>>> -	+ (label<   VDEV_LABELS / 2 ? 0 : grub_disk_get_size (dev->disk)
>>> -	   - VDEV_LABELS * (sizeof (vdev_label_t)>>   SPA_MINBLOCKSHIFT));
>>> +      grub_uint64_t labelstartbytes = vdev_label_start (alignedbytes, label);
>>> +      grub_uint64_t labelstart = labelstartbytes>>   GRUB_DISK_SECTOR_BITS;
>>> +
>>> +      grub_dprintf ("zfs", "reading label %d at sector %llu (byte %llu)\n",
>>> +		    label, (unsigned long long) labelstart,
>>> +		    (unsigned long long) labelstartbytes);
>>> +
>>> +      data->vdev_phys_sector = labelstart +
>>> +		((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE)>>   GRUB_DISK_SECTOR_BITS);
>>> +
>>> +      err = check_pool_label (data);
>>> +      if (err)
>>> +	{
>>> +	  grub_dprintf ("zfs", "error checking label %d\n", label);
>>> +	  grub_errno = GRUB_ERR_NONE;
>>> +	  continue;
>>> +	}
>>>
>> This seems like a lot of moving around but I don't see any real change other than aligning the size down.
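To make the effect of "aligning the size down" concrete, here is a minimal sketch of the label offset arithmetic. It is not the patch's vdev_label_start(), which is not shown in this hunk; LABEL_SIZE (256 KiB, the assumed value of sizeof (vdev_label_t)) and label_start() are illustrative names. The removed code computed the two back labels from the raw device size, so on a device whose size is not a multiple of the label size they would be sought at an offset shifted by the remainder.

```c
#include <stdint.h>

#define LABEL_SIZE   (256ULL << 10)  /* assumed sizeof (vdev_label_t): 256 KiB */
#define VDEV_LABELS  4               /* two labels at the front, two at the back */

/* Round X down to a multiple of ALIGN (ALIGN must be a power of two). */
#define P2ALIGN(x, align) ((x) & -(align))

/* Hypothetical helper: byte offset of label LABEL on a device whose size
   has already been aligned down to LABEL_SIZE.  */
static uint64_t
label_start (uint64_t aligned_size, int label)
{
  uint64_t off = (uint64_t) label * LABEL_SIZE;

  /* Labels 2 and 3 live in the last two label-sized slots.  */
  if (label >= VDEV_LABELS / 2)
    off += aligned_size - VDEV_LABELS * LABEL_SIZE;
  return off;
}
```

With a device of 1 GiB plus 1000 stray bytes, the aligned size is exactly 1 GiB and the back labels start at 1 GiB - 512 KiB and 1 GiB - 256 KiB.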
>>>         /* Read in the uberblock ring (128K). */
>>> -      err = grub_disk_read (data->disk, data->vdev_phys_sector
>>> -			    + (VDEV_PHYS_SIZE>>   SPA_MINBLOCKSHIFT),
>>> -			    0, VDEV_UBERBLOCK_RING, (char *) ub_array);
>>> -      if (err)
>>> -	{
>>> -	  grub_errno = GRUB_ERR_NONE;
>>> -	  continue;
>>> -	}
>>> -      grub_dprintf ("zfs", "label ok %d\n", label);
>>> -
>>> -      ubbest = find_bestub (ub_array, data->vdev_phys_sector);
>>> -      if (!ubbest)
>>> -	{
>>> -	  grub_dprintf ("zfs", "No uberblock found\n");
>>> -	  grub_errno = GRUB_ERR_NONE;
>>> -	  continue;
>>> -	}
>>> -      ub_endian = (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.ub_magic,
>>> -				     LITTLE_ENDIAN) == UBERBLOCK_MAGIC
>>> -		   ? LITTLE_ENDIAN : BIG_ENDIAN);
>>> -      err = zio_read (&ubbest->ubp_uberblock.ub_rootbp,
>>> -		      ub_endian,
>>> -		&osp,&ospsize, data);
>>> -      if (err)
>>> -	{
>>> -	  grub_dprintf ("zfs", "couldn't zio_read\n");
>>> -	  grub_errno = GRUB_ERR_NONE;
>>> -	  continue;
>>> -	}
>>> -
>>> -      if (ospsize<   OBJSET_PHYS_SIZE_V14)
>>> -	{
>>> -	  grub_dprintf ("zfs", "osp too small\n");
>>> -	  grub_free (osp);
>>> -	  continue;
>>> -	}
>>> -      grub_dprintf ("zfs", "ubbest %p\n", ubbest);
>>> -
>>> -      err = check_pool_label (data);
>>> -      if (err)
>>> -	{
>>> -	  grub_errno = GRUB_ERR_NONE;
>>> -	  continue;
>>> -	}
>>> -#if 0
>>> -      if (find_best_root&&
>>> -	  vdev_uberblock_compare (&ubbest->ubp_uberblock,
>>> -				&(current_uberblock))<= 0)
>>> -	continue;
>>> -#endif
>>> -      /* Got the MOS. Save it at the memory addr MOS. */
>>> -      grub_memmove (&(data->mos.dn),&((objset_phys_t *) osp)->os_meta_dnode,
>>> -		    DNODE_SIZE);
>>> -      data->mos.endian = (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.ub_rootbp.blk_prop, ub_endian)>>   63)&   1;
>>> -      grub_memmove (&(data->current_uberblock),
>>> -		&ubbest->ubp_uberblock, sizeof (uberblock_t));
>>> -      grub_free (ub_array);
>>> -      grub_free (bh);
>>> -      grub_free (osp);
>>> -      return data;
>>> +      err = grub_disk_read (data->disk,
>>> +			    data->vdev_phys_sector +
>>> +			    (VDEV_PHYS_SIZE>>   GRUB_DISK_SECTOR_BITS),
>>> +			    0, VDEV_UBERBLOCK_RING, ub_array);
>>> +      if (err)
>>> +	{
>>> +	  grub_dprintf ("zfs", "error reading uberblock ring for label %d\n", label);
>>> +	  grub_errno = GRUB_ERR_NONE;
>>> +	  continue;
>>> +	}
>>> +
>>> +      ubcur = find_bestub (ub_array, data);
>>> +      if (!ubcur)
>>> +	{
>>> +	  grub_dprintf ("zfs", "No good uberblocks found in label %d\n", label);
>>> +	  grub_errno = GRUB_ERR_NONE;
>>> +	  continue;
>>> +	}
>>> +
>>> +      if (vdev_uberblock_compare (ubcur, ubbest)>   0)
>>> +	{
>>> +	  /* Looks like the block is good, so use it. */
>>> +	  grub_memcpy (ubbest, ubcur, sizeof (*ubbest));
>>> +	  bestlabel = label;
>>> +	  grub_dprintf ("zfs", "Current best uberblock found in label %d\n", label);
>>> +	}
>>>       }
>>> -  grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid label");
>>> -  zfs_unmount (data);
>>>     grub_free (ub_array);
>>> -  grub_free (bh);
>>> +
>>> +  /*
>>> +   * We zero'd the structure to begin with.  If we never assigned to it,
>>> +   * magic will still be zero.
>>> +   */
>>> +  if (!ubbest->ub_magic)
>>> +    {
>>> +      grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid ZFS label");
>>> +      zfs_unmount (data);
>>> +      grub_free (ubbest);
>>> +      return 0;
>>> +    }
>>> +
>>> +  grub_dprintf ("zfs", "ubbest %p in label %d\n", ubbest, bestlabel);
>>> +
>>> +  grub_zfs_endian_t ub_endian =
>>> +    grub_zfs_to_cpu64 (ubbest->ub_magic, LITTLE_ENDIAN) == UBERBLOCK_MAGIC
>>> +    ? LITTLE_ENDIAN : BIG_ENDIAN;
>>> +  err = zio_read (&ubbest->ub_rootbp, ub_endian,&osp,&ospsize, data);
>>> +
>>> +  if (err)
>>> +    {
>>> +      grub_error (GRUB_ERR_BAD_FS, "couldn't zio_read object directory");
>>> +      zfs_unmount (data);
>>> +      grub_free (ubbest);
>>> +      return 0;
>>> +    }
>>> +
>>> +  if (ospsize<   OBJSET_PHYS_SIZE_V14)
>>> +    {
>>> +      grub_error (GRUB_ERR_BAD_FS, "osp too small");
>>> +      zfs_unmount (data);
>>> +      grub_free (osp);
>>> +      grub_free (ubbest);
>>> +      return 0;
>>> +    }
>>> +
>>> +  /* Got the MOS. Save it at the memory addr MOS. */
>>> +  grub_memmove (&(data->mos.dn),&((objset_phys_t *) osp)->os_meta_dnode, DNODE_SIZE);
>>> +  data->mos.endian =
>>> +		(grub_zfs_to_cpu64 (ubbest->ub_rootbp.blk_prop, ub_endian)>>   63)&   1;
>>> +  grub_memmove (&(data->current_uberblock), ubbest, sizeof (uberblock_t));
>>> +
>>>     grub_free (osp);
>>> +  grub_free (ubbest);
>>>
>>> -  return 0;
>>> +  return (data);
>>>   }
>>>
>>>   grub_err_t
>>> @@ -2149,33 +2263,18 @@
>>>   static grub_err_t
>>>   zfs_uuid (grub_device_t device, char **uuid)
>>>   {
>>> -  char *nvlist;
>>> -  int found;
>>>     struct grub_zfs_data *data;
>>> -  grub_uint64_t guid;
>>> -  grub_err_t err;
>>> -
>>> -  *uuid = 0;
>>>
>>>     data = zfs_mount (device);
>>>     if (! data)
>>>       return grub_errno;
>>>
>>> -  err = zfs_fetch_nvlist (data,&nvlist);
>>> -  if (err)
>>> -    {
>>> -      zfs_unmount (data);
>>> -      return err;
>>> -    }
>>> -
>>> -  found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_GUID,&guid);
>>> -  if (! found)
>>> -    return grub_errno;
>>> -  grub_free (nvlist);
>>> -  *uuid = grub_xasprintf ("%016llx", (long long unsigned) guid);
>>> +  *uuid = grub_xasprintf ("%016llx", (long long unsigned) data->pool_guid);
>>>     zfs_unmount (data);
>>> +
>>>     if (! *uuid)
>>>       return grub_errno;
>>> +
>>>     return GRUB_ERR_NONE;
>>>   }
>>>
>>>
>>> === modified file 'include/grub/zfs/uberblock_impl.h'
>>> --- include/grub/zfs/uberblock_impl.h	2010-12-01 21:55:26 +0000
>>> +++ include/grub/zfs/uberblock_impl.h	2012-01-19 11:28:09 +0000
>>> @@ -23,6 +23,8 @@
>>>   #ifndef _SYS_UBERBLOCK_IMPL_H
>>>   #define	_SYS_UBERBLOCK_IMPL_H
>>>
>>> +#define UBMAX(a,b) ((a)>   (b) ? (a) : (b))
>>> +
>>>   /*
>>>    * The uberblock version is incremented whenever an incompatible on-disk
>>>    * format change is made to the SPA, DMU, or ZAP.
>>> @@ -45,16 +47,10 @@
>>>   	blkptr_t	ub_rootbp;	/* MOS objset_phys_t		*/
>>>   } uberblock_t;
>>>
>>> -#define	UBERBLOCK_SIZE		(1ULL<<   UBERBLOCK_SHIFT)
>>> -#define	VDEV_UBERBLOCK_SHIFT	UBERBLOCK_SHIFT
>>> -
>>> -/* XXX Uberblock_phys_t is no longer in the kernel zfs */
>>> -typedef struct uberblock_phys {
>>> -	uberblock_t	ubp_uberblock;
>>> -	char		ubp_pad[UBERBLOCK_SIZE - sizeof (uberblock_t) -
>>> -				sizeof (zio_eck_t)];
>>> -	zio_eck_t	ubp_zec;
>>> -} uberblock_phys_t;
>>> -
>>> +#define	VDEV_UBERBLOCK_SHIFT(as)	UBMAX(as, UBERBLOCK_SHIFT)
>>> +#define	UBERBLOCK_SIZE(as)		(1ULL<<   VDEV_UBERBLOCK_SHIFT(as))
>>> +
>>> +/* Number of uberblocks that can fit in the ring at a given ashift */
>>> +#define UBERBLOCK_COUNT(as) (VDEV_UBERBLOCK_RING>>   VDEV_UBERBLOCK_SHIFT(as))
>>>
>>>   #endif	/* _SYS_UBERBLOCK_IMPL_H */
>>>
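As a worked example of the ashift-dependent macros above, the following sketch restates them as functions. The constants are assumptions taken from this thread (a 128 KiB uberblock ring and a minimum uberblock shift of 10); the function names are illustrative only.

```c
#include <stdint.h>

#define UBERBLOCK_SHIFT      10           /* assumed minimum uberblock size: 1 KiB */
#define VDEV_UBERBLOCK_RING  (128 << 10)  /* assumed ring size per label: 128 KiB */

/* Effective uberblock shift: never smaller than UBERBLOCK_SHIFT.  */
static unsigned
ub_shift (unsigned ashift)
{
  return ashift > UBERBLOCK_SHIFT ? ashift : UBERBLOCK_SHIFT;
}

/* Size in bytes of one uberblock slot at the given ashift.  */
static uint64_t
ub_size (unsigned ashift)
{
  return 1ULL << ub_shift (ashift);
}

/* Number of uberblock slots that fit in the ring at the given ashift.  */
static unsigned
ub_count (unsigned ashift)
{
  return VDEV_UBERBLOCK_RING >> ub_shift (ashift);
}
```

At ashift 9 the ring holds 128 slots of 1 KiB each; at ashift 12 it holds 32 slots of 4 KiB each, so a scan that always steps by 1 KiB would land in padding.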
>>
>> -- 
>> Regards
>> Vladimir 'φ-coder/phcoder' Serbinenko
>>
>> <zfs.diff>
>


-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-27 19:04       ` Zachary Bedell
  2012-01-27 22:22         ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2012-01-28  2:50         ` Richard Laager
  2012-01-28 12:51           ` Vladimir 'φ-coder/phcoder' Serbinenko
  1 sibling, 1 reply; 26+ messages in thread
From: Richard Laager @ 2012-01-28  2:50 UTC (permalink / raw)
  To: Zachary Bedell; +Cc: grub-devel


[-- Attachment #1.1: Type: text/plain, Size: 665 bytes --]

On Fri, 2012-01-27 at 14:04 -0500, Zachary Bedell wrote:
> I've had to forward port the other changes necessary to support build on Linux.

Attached is the work I've done on this front. This includes the "allow
spaces in zpools" patch I previously submitted to grub-devel. And, the
changes in 10_linux.in are from dajhorn's Ubuntu packages.

I have not yet made anything boot on RAIDZ. I don't know if that's
caused by this patch.

If the changes to getroot.c are accepted upstream, I'd recommend moving
find_device_from_pool() higher up in that file, rather than using the
prototype, but the prototype keeps the patch smaller for now.

-- 
Richard

[-- Attachment #1.2: zfs-on-linux.patch --]
[-- Type: text/x-patch, Size: 6803 bytes --]

Index: grub/util/grub.d/10_linux.in
===================================================================
--- grub.orig/util/grub.d/10_linux.in	2012-01-24 23:44:10.530591000 -0600
+++ grub/util/grub.d/10_linux.in	2012-01-24 23:44:10.706928000 -0600
@@ -56,8 +56,10 @@
   LINUX_ROOT_DEVICE=UUID=${GRUB_DEVICE_UUID}
 fi
 
-if [ "x`${grub_probe} --device ${GRUB_DEVICE} --target=fs 2>/dev/null || true`" = xbtrfs ] \
-    || [ "x`stat -f --printf=%T /`" = xbtrfs ]; then
+LINUX_ROOT_FS=`${grub_probe} --device ${GRUB_DEVICE} --target=fs 2>/dev/null || true`
+LINUX_ROOT_STAT=`stat -f --printf=%T / || true`
+
+if [ "x${LINUX_ROOT_FS}" = xbtrfs -o "x${LINUX_ROOT_STAT}" = xbtrfs ]; then
   rootsubvol="`make_system_path_relative_to_its_root /`"
   rootsubvol="${rootsubvol#/}"
   if [ "x${rootsubvol}" != x ]; then
@@ -76,6 +78,10 @@
     GRUB_CMDLINE_EXTRA="$GRUB_CMDLINE_EXTRA crashkernel=384M-2G:64M,2G-:128M"
 fi
 
+if [ "x${LINUX_ROOT_FS}" = xzfs ]; then
+  GRUB_CMDLINE_LINUX="boot=zfs \$bootfs ${GRUB_CMDLINE_LINUX}"
+fi
+
 linux_entry ()
 {
   os="$1"
@@ -114,6 +120,12 @@
     fi
     printf '%s\n' "${prepare_boot_cache}"
   fi
+  if [ "x${LINUX_ROOT_FS}" = xzfs ]; then
+    cat << EOF
+	insmod zfsinfo
+	zfs-bootfs (\$root) bootfs
+EOF
+  fi
   if [ "x$5" != "xquiet" ]; then
     message="$(gettext_printf "Loading Linux %s ..." ${version})"
     cat << EOF
Index: grub/util/getroot.c
===================================================================
--- grub.orig/util/getroot.c	2012-01-24 23:44:04.105772000 -0600
+++ grub/util/getroot.c	2012-01-27 20:48:50.875006000 -0600
@@ -52,6 +52,8 @@
 #endif
 
 #ifdef __linux__
+# include <stdio.h>
+# include <mntent.h>
 # include <sys/types.h>
 # include <sys/wait.h>
 #endif
@@ -115,6 +117,8 @@
   return path;
 }
 
+static char *find_device_from_pool (const char *poolname);
+
 #ifdef __linux__
 
 #define ESCAPED_PATH_MAX (4 * PATH_MAX)
@@ -263,7 +267,32 @@
       if (!*entries[i].device)
 	continue;
 
-      ret = strdup (entries[i].device);
+      if (strcmp (entries[i].fstype, "zfs") == 0)
+	{
+	  char *poolname = entries[i].device;
+	  char *poolname_i = poolname;
+	  char *poolname_j = poolname;
+	  /* Replace \040 with a space.  Cut at the first slash. */
+	  while (*poolname_j)
+	    {
+	      if (*poolname_j == '/')
+		break;
+	      if (strncmp (poolname_j, "\\040", 4) == 0)
+		{
+		  *poolname_i = ' ';
+		  poolname_i++;
+		  poolname_j += 4;
+		  continue;
+		}
+	      *poolname_i = *poolname_j;
+	      poolname_i++;
+	      poolname_j++;
+	    }
+	  *poolname_i = '\0';
+	  ret = find_device_from_pool (poolname);
+	}
+      else
+	ret = strdup (entries[i].device);
       if (relroot)
 	*relroot = strdup (entries[i].enc_root);
       break;
@@ -280,13 +309,25 @@
 static char *
 find_root_device_from_libzfs (const char *dir)
 {
-  char *device = NULL;
+  char *device;
   char *poolname;
   char *poolfs;
 
   grub_find_zpool_from_dir (dir, &poolname, &poolfs);
   if (! poolname)
     return NULL;
+  if (poolfs)
+    free (poolfs);
+
+  device = find_device_from_pool(poolname);
+  free(poolname);
+  return device;
+}
+
+static char *
+find_device_from_pool (const char *poolname)
+{
+  char *device = NULL;
 
 #if defined(HAVE_LIBZFS) && defined(HAVE_LIBNVPAIR)
   {
@@ -357,7 +398,7 @@
     char cksum[257], notes[257];
     unsigned int dummy;
 
-    cmd = xasprintf ("zpool status %s", poolname);
+    cmd = xasprintf ("zpool status \"%s\"", poolname);
     fp = popen (cmd, "r");
     free (cmd);
 
@@ -382,7 +423,10 @@
 		st++;
 	      break;
 	    case 1:
-	      if (!strcmp (name, poolname))
+	      /* Use strncmp() because poolname can technically have trailing
+	         spaces, which the sscanf() above will not catch.  Since we've
+	         asked about this pool specifically, this should be safe. */
+	      if (!strncmp (name, poolname, strlen(name)))
 		st++;
 	      break;
 	    case 2:
@@ -395,17 +439,71 @@
 	
 	free (line);
       }
+
+#ifdef __linux__
+    /* The name returned by zpool isn't necessarily directly under /dev. */
+    {
+      const char *disk_naming_schemes[] = {
+	"/dev/disk/by-id/%s",
+	"/dev/disk/by-path/%s",
+	"/dev/disk/by-uuid/%s",
+	"/dev/disk/by-partuuid/%s",
+	"/dev/disk/by-label/%s",
+	"/dev/disk/by-partlabel/%s",
+	"/dev/%s",
+	NULL
+      };
+      const char **disk_naming_scheme = disk_naming_schemes;
+
+      for (; *disk_naming_scheme ; disk_naming_scheme++)
+	{
+	  struct stat sb;
+	  device = xasprintf (*disk_naming_scheme, name);
+	  if (stat (device, &sb) == 0)
+	    {
+	      char *real_device;
+	      const char *c;
+	      char *first_partition;
+
+	      /* Resolve the symlink to something like /dev/sda. */
+	      real_device = realpath (device, NULL);
+	      free (device);
+
+	      /* It ends in a number; assume it's a partition and stop. */
+	      for (c = real_device ; *(c+1) ; c++);
+	      if (*c >= '0' && *c <= '9')
+		{
+		  device = real_device;
+		  break;
+		}
+
+	      /* Otherwise, it might be a partitioned wholedisk vdev. */
+	      first_partition = xasprintf ("%s1", real_device);
+	      if (stat (first_partition, &sb) == 0)
+		{
+		  free (real_device);
+		  device = first_partition;
+		  break;
+		}
+
+	      /* The device is not partitioned. */
+	      free (device);
+	      device = real_device;
+	      break;
+	    }
+	  free (device);
+	  device = NULL;
+	}
+    }
+#else
     device = xasprintf ("/dev/%s", name);
+#endif /* !__linux__ */
 
  fail:
     pclose (fp);
   }
 #endif
 
-  free (poolname);
-  if (poolfs)
-    free (poolfs);
-
   return device;
 }
 
@@ -708,10 +806,10 @@
 #ifdef __linux__
   if (!os_dev)
     os_dev = grub_find_root_device_from_mountinfo (dir, NULL);
-#endif /* __linux__ */
-
+#else
   if (!os_dev)
     os_dev = find_root_device_from_libzfs (dir);
+#endif /* !__linux__ */
 
   if (os_dev)
     {
@@ -1484,6 +1582,37 @@
 	    break;
 	  }
       }
+
+    fclose (mnttab);
+  }
+#elif defined(__linux__)
+  {
+    struct stat st;
+    struct mntent *mnt;
+    FILE *mnttab;
+
+    if (stat (dir, &st) != 0)
+      return;
+
+    mnttab = fopen ("/proc/mounts", "r");
+    if (! mnttab)
+      mnttab = fopen ("/etc/mtab", "r");
+    if (! mnttab)
+      return;
+
+    while ((mnt = getmntent (mnttab)) != NULL)
+      {
+	struct stat mnt_st;
+	if (strcmp (mnt->mnt_type, "zfs") != 0)
+	  continue;
+	if (stat (mnt->mnt_dir, &mnt_st) != 0)
+	  continue;
+	if (mnt_st.st_dev == st.st_dev)
+	  {
+	    *poolname = xstrdup (mnt->mnt_fsname);
+	    break;
+	  }
+      }
 
     fclose (mnttab);
   }

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-28  2:50         ` Richard Laager
@ 2012-01-28 12:51           ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-01-28 16:50             ` Richard Laager
                               ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2012-01-28 12:51 UTC (permalink / raw)
  To: The development of GNU GRUB; +Cc: Richard Laager, Zachary Bedell

[-- Attachment #1: Type: text/plain, Size: 6412 bytes --]

On 28.01.2012 03:50, Richard Laager wrote:
> On Fri, 2012-01-27 at 14:04 -0500, Zachary Bedell wrote:
>> >  I've had to forward port the other changes necessary to support build on Linux.

I attach my zfs-related changes. I'll commit them before your patch, and 
they already cover most of the issues.
> Attached is the work I've done on this front. This includes the "allow
> spaces in zpools" patch I previously submitted to grub-devel. And, the
> changes in 10_linux.in are from dajhorn's Ubuntu packages.
I already commented on the 10_linux.in changes. They are pretty sloppy 
(mostly "it works for me and I don't care about other legitimate 
configs").
>
>
> zfs-on-linux.patch
>
>
> Index: grub/util/grub.d/10_linux.in
> ===================================================================
> --- grub.orig/util/grub.d/10_linux.in	2012-01-24 23:44:10.530591000 -0600
> +++ grub/util/grub.d/10_linux.in	2012-01-24 23:44:10.706928000 -0600
> @@ -56,8 +56,10 @@
>     LINUX_ROOT_DEVICE=UUID=${GRUB_DEVICE_UUID}
>   fi
>
> -if [ "x`${grub_probe} --device ${GRUB_DEVICE} --target=fs 2>/dev/null || true`" = xbtrfs ] \
> -    || [ "x`stat -f --printf=%T /`" = xbtrfs ]; then
> +LINUX_ROOT_FS=`${grub_probe} --device ${GRUB_DEVICE} --target=fs 2>/dev/null || true`
> +LINUX_ROOT_STAT=`stat -f --printf=%T / || true`
> +
> +if [ "x${LINUX_ROOT_FS}" = xbtrfs -o "x${LINUX_ROOT_STAT}" = xbtrfs ]; then
>     rootsubvol="`make_system_path_relative_to_its_root /`"
>     rootsubvol="${rootsubvol#/}"
>     if [ "x${rootsubvol}" != x ]; then
> @@ -76,6 +78,10 @@
>       GRUB_CMDLINE_EXTRA="$GRUB_CMDLINE_EXTRA crashkernel=384M-2G:64M,2G-:128M"
>   fi
>
> +if [ "x${LINUX_ROOT_FS}" = xzfs ]; then
> +  GRUB_CMDLINE_LINUX="boot=zfs \$bootfs ${GRUB_CMDLINE_LINUX}"
> +fi
> +
>   linux_entry ()
>   {
>     os="$1"
> @@ -114,6 +120,12 @@
>       fi
>       printf '%s\n' "${prepare_boot_cache}"
>     fi
> +  if [ "x${LINUX_ROOT_FS}" = xzfs ]; then
> +    cat<<  EOF
> +	insmod zfsinfo
> +	zfs-bootfs (\$root) bootfs
This makes 3 wrong assumptions in a row:
- That / and /boot are the same. They may be different.
- That Linux is in the root subvolume. It may be in a non-root subvolume, 
and then the subvolid points to the wrong one.
- That / is accessible to GRUB. It may be inaccessible to GRUB altogether.
In short: this part of the command line has to be generated at 
grub-mkconfig time and have a stable representation. I'd recommend UUID 
and subvolume name.
> +EOF
> +  fi
>     if [ "x$5" != "xquiet" ]; then
>       message="$(gettext_printf "Loading Linux %s ..." ${version})"
>       cat<<  EOF
> Index: grub/util/getroot.c
> ===================================================================
> --- grub.orig/util/getroot.c	2012-01-24 23:44:04.105772000 -0600
> +++ grub/util/getroot.c	2012-01-27 20:48:50.875006000 -0600
> @@ -52,6 +52,8 @@
>   #endif
>
>   #ifdef __linux__
> +# include<stdio.h>
> +# include<mntent.h>
>   # include<sys/types.h>
>   # include<sys/wait.h>
>   #endif
> @@ -115,6 +117,8 @@
>     return path;
>   }
>
> +static char *find_device_from_pool (const char *poolname);
> +
>   #ifdef __linux__
>
>   #define ESCAPED_PATH_MAX (4 * PATH_MAX)
> @@ -263,7 +267,32 @@
>         if (!*entries[i].device)
>   	continue;
>
> -      ret = strdup (entries[i].device);
> +      if (strcmp (entries[i].fstype, "zfs") == 0)
> +	{
> +	  char *poolname = entries[i].device;
> +	  char *poolname_i = poolname;
> +	  char *poolname_j = poolname;
> +	  /* Replace \040 with a space.  Cut at the first slash. */
> +	  while (*poolname_j)
> +	    {
> +	      if (*poolname_j == '/')
> +		break;
> +	      if (strncmp (poolname_j, "\\040", 4) == 0)
> +		{
> +		  *poolname_i = ' ';
> +		  poolname_i++;
> +		  poolname_j += 4;
> +		  continue;
> +		}
> +	      *poolname_i = *poolname_j;
> +	      poolname_i++;
> +	      poolname_j++;
> +	    }
> +	  *poolname_i = '\0';
> +	  ret = find_device_from_pool (poolname);
> +	}
> +      else
> +	ret = strdup (entries[i].device);
This is the same as what I've done for zfs-fuse except that:
- Unescaping has already been done before this point for many revisions now.
- It forgets subvolume handling.
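For reference, a generic in-place decoder for the octal escapes used in mountinfo fields (\040 for space, \011 for tab, \134 for backslash) can be sketched as follows. This is an illustration only, not the unescape routine already in getroot.c; unescape_octal() is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: decode every backslash followed by three octal
   digits (\000 to \377) in place.  Other characters are copied as-is.  */
static void
unescape_octal (char *s)
{
  char *out = s;

  while (*s)
    {
      /* Short-circuit evaluation stops at the NUL terminator, so this
	 never reads past the end of the string.  */
      if (s[0] == '\\'
	  && s[1] >= '0' && s[1] <= '3'
	  && s[2] >= '0' && s[2] <= '7'
	  && s[3] >= '0' && s[3] <= '7')
	{
	  *out++ = (char) (((s[1] - '0') << 6)
			   | ((s[2] - '0') << 3)
			   | (s[3] - '0'));
	  s += 4;
	}
      else
	*out++ = *s++;
    }
  *out = '\0';
}
```

Unlike the \040-only loop quoted above, this handles any three-digit octal escape, which matters for names containing tabs or backslashes.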
>         if (relroot)
>   	*relroot = strdup (entries[i].enc_root);
>         break;
> @@ -280,13 +309,25 @@
>   static char *
>   find_root_device_from_libzfs (const char *dir)
>   {
> -  char *device = NULL;
> +  char *device;
>     char *poolname;
>     char *poolfs;
>
>     grub_find_zpool_from_dir (dir,&poolname,&poolfs);
>     if (! poolname)
>       return NULL;
> +  if (poolfs)
> +    free (poolfs);
> +
> +  device = find_device_from_pool(poolname);
> +  free(poolname);
> +  return device;
> +}
> +
> +static char *
> +find_device_from_pool (const char *poolname)
> +{
> +  char *device = NULL;
>   
Same as my fuse work.
>   #if defined(HAVE_LIBZFS)&&  defined(HAVE_LIBNVPAIR)
>     {
> @@ -357,7 +398,7 @@
>       char cksum[257], notes[257];
>       unsigned int dummy;
>
> -    cmd = xasprintf ("zpool status %s", poolname);
> +    cmd = xasprintf ("zpool status \"%s\"", poolname);
>       fp = popen (cmd, "r");
>       free (cmd);
>
> @@ -382,7 +423,10 @@
>   		st++;
>   	      break;
>   	    case 1:
> -	      if (!strcmp (name, poolname))
> +	      /* Use strncmp() because poolname can technically have trailing
> +	         spaces, which the sscanf() above will not catch.  Since we've
> +	         asked about this pool specifically, this should be safe. */
> +	      if (!strncmp (name, poolname, strlen(name)))
>   		st++;
>   	      break;
>   	    case 2:
> @@ -395,17 +439,71 @@
>   	
>   	free (line);
>         }
> +
> +#ifdef __linux__
> +    /* The name returned by zpool isn't necessarily directly under /dev. */
> +    {
> +      const char *disk_naming_schemes[] = {
> +	"/dev/disk/by-id/%s",
> +	"/dev/disk/by-path/%s",
> +	"/dev/disk/by-uuid/%s",
> +	"/dev/disk/by-partuuid/%s",
> +	"/dev/disk/by-label/%s",
> +	"/dev/disk/by-partlabel/%s",
> +	"/dev/%s",
Such a list is difficult to maintain. It's better to scan /dev/disk if 
/dev/<name> is unavailable.

+    mnttab = fopen ("/proc/mounts", "r");
+    if (! mnttab)
+      mnttab = fopen ("/etc/mtab", "r");
+    if (! mnttab)
+      return;
/etc/mtab is unreliable. As for /proc/mounts: is there any reason to suppose that /proc/mounts would work when /proc/self/mountinfo doesn't?
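For comparison, /proc/self/mountinfo carries everything /proc/mounts does plus the root of the mount within its filesystem. A minimal sketch of pulling those fields out of one line, assuming the format documented in proc(5); parse_mountinfo_line() is a hypothetical helper and each output buffer must hold at least 256 bytes:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the root within the filesystem, the mount
   point, the filesystem type and the mount source from one mountinfo
   line.  Returns 0 on success, -1 on a malformed line.  */
static int
parse_mountinfo_line (const char *line, char *root, char *mnt,
		      char *fstype, char *source)
{
  const char *sep;

  /* Skip mount ID, parent ID and major:minor; fields 4 and 5 follow.  */
  if (sscanf (line, "%*d %*d %*s %255s %255s", root, mnt) != 2)
    return -1;

  /* The optional fields end at a lone "-".  Spaces inside paths are
     escaped as \040, so " - " cannot occur earlier in the line.  */
  sep = strstr (line, " - ");
  if (!sep)
    return -1;

  if (sscanf (sep + 3, "%255s %255s", fstype, source) != 2)
    return -1;
  return 0;
}
```

For a ZFS mount the source field is the dataset name (pool plus any child filesystem), which is what the pool lookup needs.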

>
> _______________________________________________
> Grub-devel mailing list
> Grub-devel@gnu.org
> https://lists.gnu.org/mailman/listinfo/grub-devel


-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko


[-- Attachment #2: zfs.diff --]
[-- Type: text/x-diff, Size: 40652 bytes --]

=== modified file 'include/grub/emu/getroot.h'
--- include/grub/emu/getroot.h	2011-04-25 12:52:07 +0000
+++ include/grub/emu/getroot.h	2012-01-27 15:41:19 +0000
@@ -30,7 +30,7 @@
 };
 
 char *grub_find_device (const char *dir, dev_t dev);
-char *grub_guess_root_device (const char *dir);
+char **grub_guess_root_devices (const char *dir);
 int grub_util_get_dev_abstraction (const char *os_dev);
 char *grub_util_get_grub_dev (const char *os_dev);
 char *grub_make_system_path_relative_to_its_root (const char *path);

=== modified file 'include/grub/emu/misc.h'
--- include/grub/emu/misc.h	2011-12-13 13:51:41 +0000
+++ include/grub/emu/misc.h	2012-01-27 15:23:40 +0000
@@ -80,6 +80,4 @@
 int grub_device_mapper_supported (void);
 #endif
 
-char *grub_find_root_device_from_mountinfo (const char *dir, char **relroot);
-
 #endif /* GRUB_EMU_MISC_H */

=== modified file 'util/getroot.c'
--- util/getroot.c	2011-12-23 18:25:24 +0000
+++ util/getroot.c	2012-01-27 16:57:59 +0000
@@ -115,6 +115,149 @@
   return path;
 }
 
+static char **
+find_root_devices_from_poolname (char *poolname)
+{
+  char **devices = 0;
+  size_t ndevices = 0;
+  size_t devices_allocated = 0;
+
+#if defined(HAVE_LIBZFS) && defined(HAVE_LIBNVPAIR)
+  zpool_handle_t *zpool;
+  libzfs_handle_t *libzfs;
+  nvlist_t *config, *vdev_tree;
+  nvlist_t **children, **path;
+  unsigned int nvlist_count;
+  unsigned int i;
+  char *device = 0;
+
+  libzfs = grub_get_libzfs_handle ();
+  if (! libzfs)
+    return NULL;
+
+  zpool = zpool_open (libzfs, poolname);
+  config = zpool_get_config (zpool, NULL);
+
+  if (nvlist_lookup_nvlist (config, "vdev_tree", &vdev_tree) != 0)
+    error (1, errno, "nvlist_lookup_nvlist (\"vdev_tree\")");
+
+  if (nvlist_lookup_nvlist_array (vdev_tree, "children", &children, &nvlist_count) != 0)
+    error (1, errno, "nvlist_lookup_nvlist_array (\"children\")");
+  assert (nvlist_count > 0);
+
+  while (nvlist_lookup_nvlist_array (children[0], "children",
+				     &children, &nvlist_count) == 0)
+    assert (nvlist_count > 0);
+
+  for (i = 0; i < nvlist_count; i++)
+    {
+      if (nvlist_lookup_string (children[i], "path", &device) != 0)
+	error (1, errno, "nvlist_lookup_string (\"path\")");
+
+      struct stat st;
+      if (stat (device, &st) == 0)
+	{
+#ifdef __sun__
+	  if (grub_memcmp (device, "/dev/dsk/", sizeof ("/dev/dsk/") - 1)
+	      == 0)
+	    device = xasprintf ("/dev/rdsk/%s",
+				device + sizeof ("/dev/dsk/") - 1);
+	  else if (grub_memcmp (device, "/devices", sizeof ("/devices") - 1)
+		   == 0
+		   && grub_memcmp (device + strlen (device) - 4,
+				   ",raw", 4) != 0)
+	    device = xasprintf ("%s,raw", device);
+	  else
+#endif
+	    device = xstrdup (device);
+	  if (ndevices >= devices_allocated)
+	    {
+	      devices_allocated = 2 * (devices_allocated + 8);
+	      devices = xrealloc (devices, sizeof (devices[0])
+				  * devices_allocated);
+	    }
+	  devices[ndevices++] = device;
+	}
+
+      device = NULL;
+    }
+
+  zpool_close (zpool);
+#else
+  char *cmd;
+  FILE *fp;
+  int ret;
+  char *line;
+  size_t len;
+  int st;
+
+  char name[PATH_MAX + 1], state[257], readlen[257], writelen[257];
+  char cksum[257], notes[257];
+  unsigned int dummy;
+
+  cmd = xasprintf ("zpool status %s", poolname);
+  fp = popen (cmd, "r");
+  free (cmd);
+
+  st = 0;
+  while (1)
+    {
+      line = NULL;
+      ret = getline (&line, &len, fp);
+      if (ret == -1)
+	break;
+	
+      if (sscanf (line, " %s %256s %256s %256s %256s %256s",
+		  name, state, readlen, writelen, cksum, notes) >= 5)
+	switch (st)
+	  {
+	  case 0:
+	    if (!strcmp (name, "NAME")
+		&& !strcmp (state, "STATE")
+		&& !strcmp (readlen, "READ")
+		&& !strcmp (writelen, "WRITE")
+		&& !strcmp (cksum, "CKSUM"))
+	      st++;
+	    break;
+	  case 1:
+	    if (!strcmp (name, poolname))
+	      st++;
+	    break;
+	  case 2:
+	    if (strcmp (name, "mirror") && !sscanf (name, "mirror-%u", &dummy)
+		&& !sscanf (name, "raidz%u", &dummy)
+		&& !strcmp (state, "ONLINE"))
+	      {
+		char *tmp;
+		if (ndevices >= devices_allocated)
+		  {
+		    devices_allocated = 2 * (devices_allocated + 8);
+		    devices = xrealloc (devices, sizeof (devices[0])
+					* devices_allocated);
+		  }
+		devices[ndevices++] = xasprintf ("/dev/%s", name);
+	      }
+	    break;
+	  }
+	
+      free (line);
+    }
+
+  pclose (fp);
+#endif
+  if (devices)
+    {
+      if (ndevices >= devices_allocated)
+	{
+	  devices_allocated = 2 * (devices_allocated + 8);
+	  devices = xrealloc (devices, sizeof (devices[0])
+			      * devices_allocated);
+	}
+      devices[ndevices++] = 0;
+    }
+  return devices;
+}
+
 #ifdef __linux__
 
 #define ESCAPED_PATH_MAX (4 * PATH_MAX)
@@ -154,13 +297,13 @@
   *optr = 0;
 }
 
-char *
-grub_find_root_device_from_mountinfo (const char *dir, char **relroot)
+static char **
+grub_find_root_devices_from_mountinfo (const char *dir, char **relroot)
 {
   FILE *fp;
   char *buf = NULL;
   size_t len = 0;
-  char *ret = NULL;
+  char **ret = NULL;
   int entry_len = 0, entry_max = 4;
   struct mountinfo_entry *entries;
   struct mountinfo_entry parent_entry = { 0, 0, 0, "", "", "", "" };
@@ -263,9 +406,33 @@
       if (!*entries[i].device)
 	continue;
 
-      ret = strdup (entries[i].device);
-      if (relroot)
-	*relroot = strdup (entries[i].enc_root);
+      if (grub_strcmp (entries[i].fstype, "fuse.zfs") == 0)
+	{
+	  char *slash;
+	  slash = strchr (entries[i].device, '/');
+	  if (slash)
+	    *slash = 0;
+	  ret = find_root_devices_from_poolname (entries[i].device);
+	  if (slash)
+	    *slash = '/';
+	  if (relroot)
+	    {
+	      if (!slash)
+		*relroot = xasprintf ("/@%s", entries[i].enc_root);
+	      else if (strchr (slash + 1, '@'))
+		*relroot = xasprintf ("/%s%s", slash + 1, entries[i].enc_root);
+	      else
+		*relroot = xasprintf ("/%s@%s", slash + 1, entries[i].enc_root);
+	    }
+	}
+      else
+	{
+	  ret = xmalloc (2 * sizeof (ret[0]));
+	  ret[0] = strdup (entries[i].device);
+	  ret[1] = 0;
+	  if (relroot)
+	    *relroot = strdup (entries[i].enc_root);
+	}
       break;
     }
 
@@ -277,10 +444,10 @@
 
 #endif /* __linux__ */
 
-static char *
-find_root_device_from_libzfs (const char *dir)
+static char **
+find_root_devices_from_libzfs (const char *dir)
 {
-  char *device = NULL;
+  char **device = NULL;
   char *poolname;
   char *poolfs;
 
@@ -288,119 +455,7 @@
   if (! poolname)
     return NULL;
 
-#if defined(HAVE_LIBZFS) && defined(HAVE_LIBNVPAIR)
-  {
-    zpool_handle_t *zpool;
-    libzfs_handle_t *libzfs;
-    nvlist_t *config, *vdev_tree;
-    nvlist_t **children, **path;
-    unsigned int nvlist_count;
-    unsigned int i;
-
-    libzfs = grub_get_libzfs_handle ();
-    if (! libzfs)
-      return NULL;
-
-    zpool = zpool_open (libzfs, poolname);
-    config = zpool_get_config (zpool, NULL);
-
-    if (nvlist_lookup_nvlist (config, "vdev_tree", &vdev_tree) != 0)
-      error (1, errno, "nvlist_lookup_nvlist (\"vdev_tree\")");
-
-    if (nvlist_lookup_nvlist_array (vdev_tree, "children", &children, &nvlist_count) != 0)
-      error (1, errno, "nvlist_lookup_nvlist_array (\"children\")");
-    assert (nvlist_count > 0);
-
-    while (nvlist_lookup_nvlist_array (children[0], "children",
-				       &children, &nvlist_count) == 0)
-      assert (nvlist_count > 0);
-
-    for (i = 0; i < nvlist_count; i++)
-      {
-	if (nvlist_lookup_string (children[i], "path", &device) != 0)
-	  error (1, errno, "nvlist_lookup_string (\"path\")");
-
-	struct stat st;
-	if (stat (device, &st) == 0)
-	  {
-#ifdef __sun__
-	    if (grub_memcmp (device, "/dev/dsk/", sizeof ("/dev/dsk/") - 1)
-		== 0)
-	      device = xasprintf ("/dev/rdsk/%s",
-				  device + sizeof ("/dev/dsk/") - 1);
-	    else if (grub_memcmp (device, "/devices", sizeof ("/devices") - 1)
-		     == 0
-		     && grub_memcmp (device + strlen (device) - 4,
-				     ",raw", 4) != 0)
-	      device = xasprintf ("%s,raw", device);
-	    else
-#endif
-	      device = xstrdup (device);
-	    break;
-	  }
-
-	device = NULL;
-      }
-
-    zpool_close (zpool);
-  }
-#else
-  {
-    char *cmd;
-    FILE *fp;
-    int ret;
-    char *line;
-    size_t len;
-    int st;
-
-    char name[PATH_MAX + 1], state[257], readlen[257], writelen[257];
-    char cksum[257], notes[257];
-    unsigned int dummy;
-
-    cmd = xasprintf ("zpool status %s", poolname);
-    fp = popen (cmd, "r");
-    free (cmd);
-
-    st = 0;
-    while (st < 3)
-      {
-	line = NULL;
-	ret = getline (&line, &len, fp);
-	if (ret == -1)
-	  goto fail;
-	
-	if (sscanf (line, " %s %256s %256s %256s %256s %256s",
-		    name, state, readlen, writelen, cksum, notes) >= 5)
-	  switch (st)
-	    {
-	    case 0:
-	      if (!strcmp (name, "NAME")
-		  && !strcmp (state, "STATE")
-		  && !strcmp (readlen, "READ")
-		  && !strcmp (writelen, "WRITE")
-		  && !strcmp (cksum, "CKSUM"))
-		st++;
-	      break;
-	    case 1:
-	      if (!strcmp (name, poolname))
-		st++;
-	      break;
-	    case 2:
-	      if (strcmp (name, "mirror") && !sscanf (name, "mirror-%u", &dummy)
-		  && !sscanf (name, "raidz%u", &dummy)
-		  && !strcmp (state, "ONLINE"))
-		st++;
-	      break;
-	    }
-	
-	free (line);
-      }
-    device = xasprintf ("/dev/%s", name);
-
- fail:
-    pclose (fp);
-  }
-#endif
+  device = find_root_devices_from_poolname (poolname);
 
   free (poolname);
   if (poolfs)
@@ -641,10 +696,10 @@
 
 #endif /* __CYGWIN__ */
 
-char *
-grub_guess_root_device (const char *dir)
+char **
+grub_guess_root_devices (const char *dir)
 {
-  char *os_dev = NULL;
+  char **os_dev = NULL;
 #ifdef __GNU__
   file_t file;
   mach_port_t *ports;
@@ -678,9 +733,11 @@
   if (data[name_len - 1] != '\0')
     grub_util_error (_("Storage name for `%s' not NUL-terminated"), dir);
 
-  os_dev = xmalloc (strlen ("/dev/") + data_len);
-  memcpy (os_dev, "/dev/", strlen ("/dev/"));
-  memcpy (os_dev + strlen ("/dev/"), data, data_len);
+  os_dev = xmalloc (2 * sizeof (os_dev[0]));
+  os_dev[0] = xmalloc (strlen ("/dev/") + data_len);
+  memcpy (os_dev[0], "/dev/", strlen ("/dev/"));
+  memcpy (os_dev[0] + strlen ("/dev/"), data, data_len);
+  os_dev[1] = 0;
 
   if (ports && num_ports > 0)
     {
@@ -707,48 +764,56 @@
 
 #ifdef __linux__
   if (!os_dev)
-    os_dev = grub_find_root_device_from_mountinfo (dir, NULL);
+    os_dev = grub_find_root_devices_from_mountinfo (dir, NULL);
 #endif /* __linux__ */
 
   if (!os_dev)
-    os_dev = find_root_device_from_libzfs (dir);
-
-  if (os_dev)
-    {
-      char *tmp = os_dev;
-      os_dev = canonicalize_file_name (os_dev);
-      free (tmp);
-    }
-
-  if (os_dev)
-    {
-      int dm = (strncmp (os_dev, "/dev/dm-", sizeof ("/dev/dm-") - 1) == 0);
-      int root = (strcmp (os_dev, "/dev/root") == 0);
-      if (!dm && !root)
+    os_dev = find_root_devices_from_libzfs (dir);
+
+  if (os_dev)
+    {
+      char **cur;
+      for (cur = os_dev; *cur; cur++)
+	{
+	  char *tmp = *cur;
+	  int root, dm;
+	  *cur = canonicalize_file_name (*cur);
+	  free (tmp);
+	  root = (strcmp (*cur, "/dev/root") == 0);
+	  dm = (strncmp (*cur, "/dev/dm-", sizeof ("/dev/dm-") - 1) == 0);
+	  if (!dm && !root)
+	    continue;
+	  if (stat (*cur, &st) < 0)
+	    break;
+	  free (*cur);
+	  dev = st.st_rdev;
+	  *cur = grub_find_device (dm ? "/dev/mapper" : "/dev", dev);
+	}
+      if (!*cur)
 	return os_dev;
-      if (stat (os_dev, &st) >= 0)
-	{
-	  free (os_dev);
-	  dev = st.st_rdev;
-	  return grub_find_device (dm ? "/dev/mapper" : "/dev", dev);
-	}
+      for (cur = os_dev; *cur; cur++)
+	free (*cur);
       free (os_dev);
+      os_dev = 0;
     }
 
   if (stat (dir, &st) < 0)
     grub_util_error (_("cannot stat `%s'"), dir);
 
   dev = st.st_dev;
+
+  os_dev = xmalloc (2 * sizeof (os_dev[0]));
   
 #ifdef __CYGWIN__
   /* Cygwin specific function.  */
-  os_dev = grub_find_device (dir, dev);
+  os_dev[0] = grub_find_device (dir, dev);
 
 #else
 
   /* This might be truly slow, but is there any better way?  */
-  os_dev = grub_find_device ("/dev", dev);
+  os_dev[0] = grub_find_device ("/dev", dev);
 #endif
+  os_dev[1] = 0;
 #endif /* !__GNU__ */
 
   return os_dev;
@@ -1565,7 +1630,7 @@
 #ifdef __linux__
 	      {
 		char *bind;
-		grub_free (grub_find_root_device_from_mountinfo (buf2, &bind));
+		grub_free (grub_find_root_devices_from_mountinfo (buf2, &bind));
 		if (bind && bind[0] && bind[1])
 		  {
 		    buf3 = bind;
@@ -1598,7 +1663,7 @@
 #ifdef __linux__
   {
     char *bind;
-    grub_free (grub_find_root_device_from_mountinfo (buf2, &bind));
+    grub_free (grub_find_root_devices_from_mountinfo (buf2, &bind));
     if (bind && bind[0] && bind[1])
       {
 	char *temp = buf3;

=== modified file 'util/grub-install.in'
--- util/grub-install.in	2012-01-27 12:12:00 +0000
+++ util/grub-install.in	2012-01-27 18:29:29 +0000
@@ -473,7 +473,7 @@
 fi
 
 # Create the core image. First, auto-detect the filesystem module.
-fs_module="`"$grub_probe" --device-map="${device_map}" --target=fs --device "${grub_device}"`"
+fs_module="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=fs --device `"
 if test "x$fs_module" = x ; then
     echo "Auto-detection of a filesystem of ${grub_device} failed." 1>&2
     echo "Try with --recheck." 1>&2
@@ -485,7 +485,7 @@
 # this command is allowed to fail (--target=fs already grants us that the
 # filesystem will be accessible).
 partmap_module=
-for x in `"$grub_probe" --device-map="${device_map}" --target=partmap --device "${grub_device}" 2> /dev/null`; do
+for x in `echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=partmap --device 2> /dev/null`; do
    case "$x" in
        netbsd | openbsd) 
 	   partmap_module="$partmap_module part_bsd";;
@@ -496,7 +496,7 @@
 done
 
 # Device abstraction module, if any (lvm, raid).
-devabstraction_module="`"$grub_probe" --device-map="${device_map}" --target=abstraction --device "${grub_device}"`"
+devabstraction_module="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=abstraction --device`"
 
 if [ "x$disk_module" = xata ]; then
     disk_module=pata
@@ -538,14 +538,14 @@
       fi
       install_drive="`echo "${install_drive}" | sed -e 's/^(\(\([^,\\\\]\|\\\\\\\\\|\\\\,\)*\)\(\(,[a-zA-Z0-9]*\)*\))$/\1/'`"
     fi
-    grub_drive="`"$grub_probe" --device-map="${device_map}" --target=drive --device "${grub_device}"`" || exit 1
+    grub_drive="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=drive --device`" || exit 1
 
     # Strip partition number
     grub_partition="`echo "${grub_drive}" | sed -e 's/^(\(\([^,\\\\]\|\\\\\\\\\|\\\\,\)*\)\(\(,[a-zA-Z0-9]*\)*\))$/\3/'`"
     grub_drive="`echo "${grub_drive}" | sed -e 's/^(\(\([^,\\\\]\|\\\\\\\\\|\\\\,\)*\)\(\(,[a-zA-Z0-9]*\)*\))$/\1/'`"
     if ([ "x$disk_module" != x ] && [ "x$disk_module" != xbiosdisk ]) || [ "x${grub_drive}" != "x${install_drive}" ] || ([ "x$platform" != xefi ] && [ "x$platform" != xpc ] && [ x"${platform}" != x"ieee1275" ]); then
         # generic method (used on coreboot and ata mod)
-        uuid="`"$grub_probe" --device-map="${device_map}" --target=fs_uuid --device "${grub_device}"`"
+        uuid="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=fs_uuid --device`"
         if [ "x${uuid}" = "x" ] ; then
           if [ "x$platform" != xefi ] && [ "x$platform" != xpc ] && [ x"${platform}" != x"ieee1275" ]; then
              echo "UUID needed with $platform, but the filesystem containing ${grubdir} does not support UUIDs." 1>&2
@@ -559,15 +559,15 @@
         fi
 
 	if [ x"$disk_module" != x ] && [ x"$disk_module" != xbiosdisk ]; then
-            hints="`"$grub_probe" --device-map="${device_map}" --target=baremetal_hints --device "${grub_device}"`"
+            hints="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=baremetal_hints --device`"
 	elif [ x"$platform" = xpc ]; then
-            hints="`"$grub_probe" --device-map="${device_map}" --target=bios_hints --device "${grub_device}"`"
+            hints="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=bios_hints --device`"
 	elif [ x"$platform" = xefi ]; then
-            hints="`"$grub_probe" --device-map="${device_map}" --target=efi_hints --device "${grub_device}"`"
+            hints="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=efi_hints --device`"
 	elif [ x"$platform" = xieee1275 ]; then
-            hints="`"$grub_probe" --device-map="${device_map}" --target=ieee1275_hints --device "${grub_device}"`"
+            hints="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=ieee1275_hints --device`"
 	elif [ x"$platform" = xloongson ] || [ x"$platform" = xqemu ] || [ x"$platform" = xcoreboot ] || [ x"$platform" = xmultiboot ] || [ x"$platform" = xqemu-mips ]; then
-            hints="`"$grub_probe" --device-map="${device_map}" --target=baremetal_hints --device "${grub_device}"`"
+            hints="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=baremetal_hints --device`"
 	else
             echo "No hints available for your platform. Expect reduced performance"
 	    hints=
@@ -587,7 +587,7 @@
     fi
 else
     if [ x$GRUB_CRYPTODISK_ENABLE = xy ]; then
-	for uuid in "`"${grub_probe}" --device "${grub_device}" --target=cryptodisk_uuid`"; do
+	for uuid in "`echo "${grub_device}" | xargs "${grub_probe}"  --target=cryptodisk_uuid --device`"; do
 	    echo "cryptomount -u $uuid" >> "${grubdir}/load.cfg"
 	done
 	config_opt="-c ${grubdir}/load.cfg "

=== modified file 'util/grub-probe.c'
--- util/grub-probe.c	2012-01-23 18:33:40 +0000
+++ util/grub-probe.c	2012-01-27 18:16:06 +0000
@@ -300,299 +300,343 @@
     printf ("raid6rec ");
 }
 
+static inline void
+delim (int zdelim)
+{
+  if (zdelim)
+    putchar ('\0');
+  else
+    putchar ('\n');
+}
+
 static void
-probe (const char *path, char *device_name)
+probe (const char *path, char **device_names, int zdelim)
 {
-  char *drive_name = NULL;
+  char **drives_names = NULL;
+  char **curdev, **curdrive;
   char *grub_path = NULL;
-  char *filebuf_via_grub = NULL, *filebuf_via_sys = NULL;
-  grub_device_t dev = NULL;
-  grub_fs_t fs;
+  int ndev = 0;
 
-  if (path == NULL)
-    {
-#if defined(__FreeBSD__) || defined(__FreeBSD_kernel__) || defined(__NetBSD__) || defined(__sun__)
-      if (! grub_util_check_char_device (device_name))
-        grub_util_error (_("%s is not a character device"), device_name);
-#else
-      if (! grub_util_check_block_device (device_name))
-        grub_util_error (_("%s is not a block device"), device_name);
-#endif
-    }
-  else
+  if (path != NULL)
     {
       grub_path = canonicalize_file_name (path);
-      device_name = grub_guess_root_device (grub_path);
+      device_names = grub_guess_root_devices (grub_path);
+      free (grub_path);
     }
 
-  if (! device_name)
+  if (! device_names)
     grub_util_error (_("cannot find a device for %s (is /dev mounted?)"), path);
 
   if (print == PRINT_DEVICE)
     {
-      printf ("%s\n", device_name);
-      goto end;
-    }
-
-  drive_name = grub_util_get_grub_dev (device_name);
-  if (! drive_name)
-    grub_util_error (_("cannot find a GRUB drive for %s.  Check your device.map"),
-		     device_name);
-
-  if (print == PRINT_DRIVE)
-    {
-      printf ("(%s)\n", drive_name);
-      goto end;
-    }
-
-  grub_util_info ("opening %s", drive_name);
-  dev = grub_device_open (drive_name);
-  if (! dev)
-    grub_util_error ("%s", _(grub_errmsg));
-
-  if (print == PRINT_HINT_STR)
-    {
-      const char *osdev = grub_util_biosdisk_get_osdev (dev->disk);
-      const char *orig_path = grub_util_devname_to_ofpath (osdev);
-      char *biosname, *bare, *efi;
-      const char *map;
-
-      if (orig_path)
-	{
+      for (curdev = device_names; *curdev; curdev++)
+	{
+	  printf ("%s", *curdev);
+	  delim (zdelim);
+	}
+      return;
+    }
+
+  for (curdev = device_names; *curdev; curdev++)
+    {
+      grub_util_pull_device (*curdev);
+      ndev++;
+    }
+  
+  drives_names = xmalloc (sizeof (drives_names[0]) * (ndev + 1)); 
+
+  for (curdev = device_names, curdrive = drives_names; *curdev; curdev++,
+       curdrive++)
+    {
+      *curdrive = grub_util_get_grub_dev (*curdev);
+      if (! *curdrive)
+	grub_util_error (_("cannot find a GRUB drive for %s.  Check your device.map"),
+			 *curdev);
+    }
+  *curdrive = 0;
+
+  if (print == PRINT_FS || print == PRINT_FS_UUID
+      || print == PRINT_FS_LABEL)
+    {
+      grub_device_t dev = NULL;
+      grub_fs_t fs;
+
+      grub_util_info ("opening %s", drives_names[0]);
+      dev = grub_device_open (drives_names[0]);
+      if (! dev)
+	grub_util_error ("%s", _(grub_errmsg));
+      
+      fs = grub_fs_probe (dev);
+      if (! fs)
+	grub_util_error ("%s", _(grub_errmsg));
+
+      if (print == PRINT_FS)
+	{
+	  printf ("%s", fs->name);
+	  delim (zdelim);
+	}
+      else if (print == PRINT_FS_UUID)
+	{
+	  char *uuid;
+	  if (! fs->uuid)
+	    grub_util_error (_("%s does not support UUIDs"), fs->name);
+
+	  if (fs->uuid (dev, &uuid) != GRUB_ERR_NONE)
+	    grub_util_error ("%s", grub_errmsg);
+
+	  printf ("%s", uuid);
+	  delim (zdelim);
+	}
+      else if (print == PRINT_FS_LABEL)
+	{
+	  char *label;
+	  if (! fs->label)
+	    grub_util_error (_("%s does not support labels"), fs->name);
+
+	  if (fs->label (dev, &label) != GRUB_ERR_NONE)
+	    grub_util_error ("%s", _(grub_errmsg));
+
+	  printf ("%s", label);
+	  delim (zdelim);
+	}
+      goto end;
+    }
+
+  for (curdrive = drives_names, curdev = device_names; *curdrive;
+       curdrive++, curdev++)
+    {
+      grub_device_t dev = NULL;
+
+      grub_util_info ("opening %s", *curdrive);
+      dev = grub_device_open (*curdrive);
+      if (! dev)
+	grub_util_error ("%s", _(grub_errmsg));
+
+      if (print == PRINT_HINT_STR)
+	{
+	  const char *osdev = grub_util_biosdisk_get_osdev (dev->disk);
+	  const char *orig_path = grub_util_devname_to_ofpath (osdev);
+	  char *biosname, *bare, *efi;
+	  const char *map;
+
+	  if (orig_path)
+	    {
+	      char *ofpath = escape_of_path (orig_path);
+	      printf ("--hint-ieee1275='");
+	      print_full_name (ofpath, dev);
+	      printf ("' ");
+	      free (ofpath);
+	    }
+
+	  biosname = guess_bios_drive (*curdev);
+	  if (biosname)
+	    {
+	      printf ("--hint-bios=");
+	      print_full_name (biosname, dev);
+	      printf (" ");
+	    }
+	  free (biosname);
+
+	  efi = guess_efi_drive (*curdev);
+	  if (efi)
+	    {
+	      printf ("--hint-efi=");
+	      print_full_name (efi, dev);
+	      printf (" ");
+	    }
+	  free (efi);
+
+	  bare = guess_baremetal_drive (*curdev);
+	  if (bare)
+	    {
+	      printf ("--hint-baremetal=");
+	      print_full_name (bare, dev);
+	      printf (" ");
+	    }
+	  free (bare);
+
+	  /* FIXME: Add ARC hint.  */
+
+	  map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
+	  if (map)
+	    {
+	      printf ("--hint='");
+	      print_full_name (map, dev);
+	      printf ("' ");
+	    }
+	  printf ("\n");
+
+	  grub_device_close (dev);
+	  continue;
+	}
+
+      if (print == PRINT_COMPATIBILITY_HINT)
+	{
+	  const char *map;
+	  char *biosname;
+	  map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
+	  if (map)
+	    {
+	      print_full_name (map, dev);
+	      delim (zdelim);
+	      grub_device_close (dev);
+	      /* Compatibility hint is one device only.  */
+	      break;
+	    }
+	  biosname = guess_bios_drive (*curdev);
+	  if (biosname)
+	    print_full_name (biosname, dev);
+	  delim (zdelim);
+	  free (biosname);
+	  grub_device_close (dev);
+	  /* Compatibility hint is one device only.  */
+	  if (biosname)
+	    break;
+	  continue;
+	}
+
+      if (print == PRINT_BIOS_HINT)
+	{
+	  char *biosname;
+	  biosname = guess_bios_drive (*curdev);
+	  if (biosname)
+	    print_full_name (biosname, dev);
+	  delim (zdelim);
+	  free (biosname);
+	  grub_device_close (dev);
+	  continue;
+	}
+      if (print == PRINT_IEEE1275_HINT)
+	{
+	  const char *osdev = grub_util_biosdisk_get_osdev (dev->disk);
+	  const char *orig_path = grub_util_devname_to_ofpath (osdev);
 	  char *ofpath = escape_of_path (orig_path);
-	  printf ("--hint-ieee1275='");
+	  const char *map;
+
+	  map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
+	  if (map)
+	    {
+	      printf (" ");
+	      print_full_name (map, dev);
+	    }
+
+	  printf (" ");
 	  print_full_name (ofpath, dev);
-	  printf ("' ");
+
+	  delim (zdelim);
 	  free (ofpath);
-	}
-
-      biosname = guess_bios_drive (device_name);
-      if (biosname)
-	{
-	  printf ("--hint-bios=");
-	  print_full_name (biosname, dev);
-	  printf (" ");
-	}
-      free (biosname);
-
-      efi = guess_efi_drive (device_name);
-      if (efi)
-	{
-	  printf ("--hint-efi=");
-	  print_full_name (efi, dev);
-	  printf (" ");
-	}
-      free (efi);
-
-      bare = guess_baremetal_drive (device_name);
-      if (bare)
-	{
-	  printf ("--hint-baremetal=");
-	  print_full_name (bare, dev);
-	  printf (" ");
-	}
-      free (bare);
-
-      /* FIXME: Add ARC hint.  */
-
-      map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
-      if (map)
-	{
-	  printf ("--hint='");
-	  print_full_name (map, dev);
-	  printf ("' ");
-	}
-      printf ("\n");
-
-      goto end;
-    }
-
-  if (print == PRINT_COMPATIBILITY_HINT)
-    {
-      const char *map;
-      char *biosname;
-      map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
-      if (map)
-	{
-	  print_full_name (map, dev);
-	  printf ("\n");
-	  goto end;
-	}
-      biosname = guess_bios_drive (device_name);
-      if (biosname)
-	print_full_name (biosname, dev);
-      printf ("\n");
-      free (biosname);
-      goto end;
-    }
-
-  if (print == PRINT_BIOS_HINT)
-    {
-      char *biosname;
-      biosname = guess_bios_drive (device_name);
-      if (biosname)
-	print_full_name (biosname, dev);
-      printf ("\n");
-      free (biosname);
-      goto end;
-    }
-  if (print == PRINT_IEEE1275_HINT)
-    {
-      const char *osdev = grub_util_biosdisk_get_osdev (dev->disk);
-      const char *orig_path = grub_util_devname_to_ofpath (osdev);
-      char *ofpath = escape_of_path (orig_path);
-      const char *map;
-
-      map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
-      if (map)
-	{
-	  printf (" ");
-	  print_full_name (map, dev);
-	}
-
-      printf (" ");
-      print_full_name (ofpath, dev);
-
-      printf ("\n");
-      free (ofpath);
-      goto end;
-    }
-  if (print == PRINT_EFI_HINT)
-    {
-      char *biosname;
-      char *name;
-      const char *map;
-      biosname = guess_efi_drive (device_name);
-
-      map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
-      if (map)
-	{
-	  printf (" ");
-	  print_full_name (map, dev);
-	}
-      if (biosname)
-	{
-	  printf (" ");
-	  print_full_name (biosname, dev);
-	}
-
-      printf ("\n");
-      free (biosname);
-      goto end;
-    }
-
-  if (print == PRINT_BAREMETAL_HINT)
-    {
-      char *biosname;
-      char *name;
-      const char *map;
-
-      biosname = guess_baremetal_drive (device_name);
-
-      map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
-      if (map)
-	{
-	  printf (" ");
-	  print_full_name (map, dev);
-	}
-      if (biosname)
-	{
-	  printf (" ");
-	  print_full_name (biosname, dev);
-	}
-
-      printf ("\n");
-      free (biosname);
-      goto end;
-    }
-
-  if (print == PRINT_ARC_HINT)
-    {
-      const char *map;
-
-      map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
-      if (map)
-	{
-	  printf (" ");
-	  print_full_name (map, dev);
-	}
-      printf ("\n");
-
-      /* FIXME */
-
-      goto end;
-    }
-
-  if (print == PRINT_ABSTRACTION)
-    {
-      probe_abstraction (dev->disk);
-      printf ("\n");
-      goto end;
-    }
-
-  if (print == PRINT_CRYPTODISK_UUID)
-    {
-      probe_cryptodisk_uuid (dev->disk);
-      printf ("\n");
-      goto end;
-    }
-
-  if (print == PRINT_PARTMAP)
-    {
-      /* Check if dev->disk itself is contained in a partmap.  */
-      probe_partmap (dev->disk);
-      printf ("\n");
-      goto end;
-    }
-
-  if (print == PRINT_MSDOS_PARTTYPE)
-    {
-      if (dev->disk->partition
-	  && strcmp(dev->disk->partition->partmap->name, "msdos") == 0)
-        printf ("%02x", dev->disk->partition->msdostype);
-
-      printf ("\n");
-      goto end;
-    }
-
-  fs = grub_fs_probe (dev);
-  if (! fs)
-    grub_util_error ("%s", _(grub_errmsg));
-
-  if (print == PRINT_FS)
-    {
-      printf ("%s\n", fs->name);
-    }
-  else if (print == PRINT_FS_UUID)
-    {
-      char *uuid;
-      if (! fs->uuid)
-	grub_util_error (_("%s does not support UUIDs"), fs->name);
-
-      if (fs->uuid (dev, &uuid) != GRUB_ERR_NONE)
-	grub_util_error ("%s", grub_errmsg);
-
-      printf ("%s\n", uuid);
-    }
-  else if (print == PRINT_FS_LABEL)
-    {
-      char *label;
-      if (! fs->label)
-	grub_util_error (_("%s does not support labels"), fs->name);
-
-      if (fs->label (dev, &label) != GRUB_ERR_NONE)
-	grub_util_error ("%s", _(grub_errmsg));
-
-      printf ("%s\n", label);
+	  grub_device_close (dev);
+	  continue;
+	}
+      if (print == PRINT_EFI_HINT)
+	{
+	  char *biosname;
+	  char *name;
+	  const char *map;
+	  biosname = guess_efi_drive (*curdev);
+
+	  map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
+	  if (map)
+	    {
+	      printf (" ");
+	      print_full_name (map, dev);
+	    }
+	  if (biosname)
+	    {
+	      printf (" ");
+	      print_full_name (biosname, dev);
+	    }
+
+	  delim (zdelim);
+	  free (biosname);
+	  grub_device_close (dev);
+	  continue;
+	}
+
+      if (print == PRINT_BAREMETAL_HINT)
+	{
+	  char *biosname;
+	  char *name;
+	  const char *map;
+
+	  biosname = guess_baremetal_drive (*curdev);
+
+	  map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
+	  if (map)
+	    {
+	      printf (" ");
+	      print_full_name (map, dev);
+	    }
+	  if (biosname)
+	    {
+	      printf (" ");
+	      print_full_name (biosname, dev);
+	    }
+
+	  delim (zdelim);
+	  free (biosname);
+	  grub_device_close (dev);
+	  continue;
+	}
+
+      if (print == PRINT_ARC_HINT)
+	{
+	  const char *map;
+
+	  map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
+	  if (map)
+	    {
+	      printf (" ");
+	      print_full_name (map, dev);
+	    }
+	  delim (zdelim);
+
+	  /* FIXME */
+	  grub_device_close (dev);
+	  continue;
+	}
+
+      if (print == PRINT_ABSTRACTION)
+	{
+	  probe_abstraction (dev->disk);
+	  delim (zdelim);
+	  grub_device_close (dev);
+	  continue;
+	}
+
+      if (print == PRINT_CRYPTODISK_UUID)
+	{
+	  probe_cryptodisk_uuid (dev->disk);
+	  delim (zdelim);
+	  grub_device_close (dev);
+	  continue;
+	}
+
+      if (print == PRINT_PARTMAP)
+	{
+	  /* Check if dev->disk itself is contained in a partmap.  */
+	  probe_partmap (dev->disk);
+	  delim (zdelim);
+	  grub_device_close (dev);
+	  continue;
+	}
+
+      if (print == PRINT_MSDOS_PARTTYPE)
+	{
+	  if (dev->disk->partition
+	      && strcmp(dev->disk->partition->partmap->name, "msdos") == 0)
+	    printf ("%02x", dev->disk->partition->msdostype);
+
+	  delim (zdelim);
+	  grub_device_close (dev);
+	  continue;
+	}
     }
 
  end:
-  if (dev)
-    grub_device_close (dev);
-  free (grub_path);
-  free (filebuf_via_grub);
-  free (filebuf_via_sys);
-  free (drive_name);
+  for (curdrive = drives_names; *curdrive; curdrive++)
+    free (*curdrive);
+  free (drives_names);
 }
 
 static struct option options[] =
@@ -637,7 +681,7 @@
 main (int argc, char *argv[])
 {
   char *dev_map = 0;
-  char *argument;
+  int zero_delim = 0;
 
   set_program_name (argv[0]);
 
@@ -646,7 +690,7 @@
   /* Check for options.  */
   while (1)
     {
-      int c = getopt_long (argc, argv, "dm:t:hVv", options, 0);
+      int c = getopt_long (argc, argv, "dm:t:hVv0", options, 0);
 
       if (c == -1)
 	break;
@@ -705,6 +749,10 @@
 	    usage (0);
 	    break;
 
+	  case '0':
+	    zero_delim = 1;
+	    break;
+
 	  case 'V':
 	    printf ("%s (%s) %s\n", program_name, PACKAGE_NAME, PACKAGE_VERSION);
 	    return 0;
@@ -729,14 +777,12 @@
       usage (1);
     }
 
-  if (optind + 1 != argc)
+  if (optind + 1 != argc && !argument_is_device)
     {
       fprintf (stderr, _("Unknown extra argument `%s'.\n"), argv[optind + 1]);
       usage (1);
     }
 
-  argument = argv[optind];
-
   /* Initialize the emulated biosdisk driver.  */
   grub_util_biosdisk_init (dev_map ? : DEFAULT_DEVICE_MAP);
 
@@ -755,9 +801,9 @@
 
   /* Do it.  */
   if (argument_is_device)
-    probe (NULL, argument);
+    probe (NULL, argv + optind, zero_delim);
   else
-    probe (argument, NULL);
+    probe (argv[optind], NULL, zero_delim);
 
   /* Free resources.  */
   grub_gcry_fini_all ();

=== modified file 'util/grub-setup.c'
--- util/grub-setup.c	2012-01-24 13:39:29 +0000
+++ util/grub-setup.c	2012-01-27 18:31:20 +0000
@@ -133,15 +139,16 @@
 static void
 setup (const char *dir,
        const char *boot_file, const char *core_file,
-       const char *root, const char *dest, int must_embed, int force,
+       const char *dest, int force,
        int fs_probe, int allow_floppy)
 {
   char *boot_path, *core_path, *core_path_dev, *core_path_dev_full;
   char *boot_img, *core_img;
+  char *root = 0;
   size_t boot_size, core_size;
   grub_uint16_t core_sectors;
-  grub_device_t root_dev, dest_dev;
-  struct grub_boot_blocklist *first_block, *block;
+  grub_device_t root_dev = 0, dest_dev;
+  struct grub_boot_blocklist *first_block, *block, *last_block;
   char *tmp_img;
   int i;
   grub_disk_addr_t first_sector;
@@ -235,17 +241,56 @@
 						- sizeof (*block));
   grub_util_info ("root is `%s', dest is `%s'", root, dest);
 
-  /* Open the root device and the destination device.  */
-  grub_util_info ("Opening root");
-  root_dev = grub_device_open (root);
-  if (! root_dev)
-    grub_util_error ("%s", _(grub_errmsg));
-
   grub_util_info ("Opening dest");
   dest_dev = grub_device_open (dest);
   if (! dest_dev)
     grub_util_error ("%s", _(grub_errmsg));
 
+  {
+    char **root_devices = grub_guess_root_devices (dir);
+    char **cur;
+    int found = 0;
+
+    for (cur = root_devices; *cur; cur++)
+      {
+	char *drive;
+	grub_device_t try_dev;
+
+	drive = grub_util_get_grub_dev (*cur);
+	if (!drive)
+	  continue;
+	try_dev = grub_device_open (drive);
+	if (! try_dev)
+	  continue;
+	if (!found && try_dev->disk->id == dest_dev->disk->id
+	    && try_dev->disk->dev->id == dest_dev->disk->dev->id)
+	  {
+	    if (root_dev)
+	      grub_device_close (root_dev);
+	    free (root);
+	    root_dev = try_dev;
+	    root = drive;
+	    found = 1;
+	    continue;
+	  }
+	if (!root_dev)
+	  {
+	    root_dev = try_dev;
+	    root = drive;
+	    continue;
+	  }
+	grub_device_close (try_dev);	
+	free (drive);
+      }
+    if (!root_dev)
+      {
+	grub_util_error ("guessing the root device failed, because of `%s'",
+			 grub_errmsg);
+      }
+    grub_util_info ("guessed root_dev `%s' from "
+		    "dir `%s'", root_dev->disk->name, dir);
+  }
+
   grub_util_info ("setting the root device to `%s'", root);
   if (grub_env_set ("root", root) != GRUB_ERR_NONE)
     grub_util_error ("%s", _(grub_errmsg));
@@ -485,16 +530,24 @@
 
 unable_to_embed:
 
-  if (must_embed)
-    grub_util_error (_("embedding is not possible, but this is required when "
-		       "the root device is on a RAID array or LVM volume"));
-
 #ifdef GRUB_MACHINE_PCBIOS
-  if (dest_dev->disk->id != root_dev->disk->id)
+  if (dest_dev->disk->id != root_dev->disk->id
+      || dest_dev->disk->dev->id != root_dev->disk->dev->id)
     grub_util_error (_("embedding is not possible, but this is required for "
 		       "cross-disk install"));
 #endif
 
+  {
+    grub_fs_t fs;
+    fs = grub_fs_probe (root_dev);
+    if (!fs)
+      grub_util_error (_("can't determine filesystem"));
+
+    if (!fs->blocklist_install)
+      grub_util_error (_("filesystem '%s' doesn't support blocklists"),
+		       fs->name);
+  }
+
   grub_util_warn (_("Embedding is not possible.  GRUB can only be installed in this "
 		    "setup by using blocklists.  However, blocklists are UNRELIABLE and "
 		    "their use is discouraged."));
@@ -617,11 +783,12 @@
     boot_devpath = (char *) (boot_img
 			     + GRUB_BOOT_AOUT_HEADER_SIZE
 			     + GRUB_BOOT_MACHINE_BOOT_DEVPATH);
-    if (file->device->disk->id != dest_dev->disk->id)
+    if (dest_dev->disk->id != root_dev->disk->id
+	|| dest_dev->disk->dev->id != root_dev->disk->dev->id)
       {
 	const char *dest_ofpath;
 	dest_ofpath
-	  = grub_util_devname_to_ofpath (grub_util_biosdisk_get_osdev (file->device->disk));
+	  = grub_util_devname_to_ofpath (grub_util_biosdisk_get_osdev (root_dev->disk));
 	grub_util_info ("dest_ofpath is `%s'", dest_ofpath);
 	strncpy (boot_devpath, dest_ofpath, GRUB_BOOT_MACHINE_BOOT_DEVPATH_END
 		 - GRUB_BOOT_MACHINE_BOOT_DEVPATH - 1);
@@ -722,7 +959,6 @@
   char *core_file;
   char *dir;
   char *dev_map;
-  char *root_dev;
   int  force;
   int  fs_probe;
   int allow_floppy;
@@ -783,13 +1019,6 @@
         arguments->dev_map = xstrdup (arg);
         break;
 
-      case 'r':
-        if (arguments->root_dev)
-          free (arguments->root_dev);
-
-        arguments->root_dev = xstrdup (arg);
-        break;
-
       case 'f':
         arguments->force = 1;
         break;
@@ -853,7 +1082,6 @@
 {
   char *root_dev = NULL;
   char *dest_dev = NULL;
-  int must_embed = 0;
   struct arguments arguments;
 
   set_program_name (argv[0]);
@@ -917,80 +1145,12 @@
       grub_util_info ("Using `%s' as GRUB device", dest_dev);
     }
 
-  if (arguments.root_dev)
-    {
-      root_dev = get_device_name (arguments.root_dev);
-
-      if (! root_dev)
-        grub_util_error (_("invalid root device `%s'"), arguments.root_dev);
-
-      root_dev = xstrdup (root_dev);
-    }
-  else
-    {
-      char *root_device =
-        grub_guess_root_device (arguments.dir ? : DEFAULT_DIRECTORY);
-
-      root_dev = grub_util_get_grub_dev (root_device);
-      if (! root_dev)
-	{
-	  grub_util_info ("guessing the root device failed, because of `%s'",
-			  grub_errmsg);
-          grub_util_error (_("cannot guess the root device. Specify the option "
-                             "`--root-device'"));
-	}
-      grub_util_info ("guessed root device `%s' and root_dev `%s' from "
-                      "dir `%s'", root_device, root_dev,
-                      arguments.dir ? : DEFAULT_DIRECTORY);
-    }
-
-#if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
-  if (grub_util_lvm_isvolume (root_dev))
-    must_embed = 1;
-#endif
-
-#ifdef __linux__
-  if (root_dev[0] == 'm' && root_dev[1] == 'd'
-      && ((root_dev[2] >= '0' && root_dev[2] <= '9') || root_dev[2] == '/'))
-    {
-      /* FIXME: we can avoid this on RAID1.  */
-      must_embed = 1;
-    }
-
-  if (dest_dev[0] == 'm' && dest_dev[1] == 'd'
-      && ((dest_dev[2] >= '0' && dest_dev[2] <= '9') || dest_dev[2] == '/'))
-    {
-      char **devicelist;
-      int i;
-
-      if (arguments.device[0] == '/')
-	devicelist = grub_util_raid_getmembers (arguments.device, 1);
-      else
-	{
-	  char *devname;
-	  devname = xasprintf ("/dev/%s", dest_dev);
-	  devicelist = grub_util_raid_getmembers (dest_dev, 1);
-	  free (devname);
-	}
-
-      for (i = 0; devicelist[i]; i++)
-        {
-          setup (arguments.dir ? : DEFAULT_DIRECTORY,
-                 arguments.boot_file ? : DEFAULT_BOOT_FILE,
-                 arguments.core_file ? : DEFAULT_CORE_FILE,
-                 root_dev, grub_util_get_grub_dev (devicelist[i]), 1,
-                 arguments.force, arguments.fs_probe,
-		 arguments.allow_floppy);
-        }
-    }
-  else
-#endif
-    /* Do the real work.  */
-    setup (arguments.dir ? : DEFAULT_DIRECTORY,
-           arguments.boot_file ? : DEFAULT_BOOT_FILE,
-           arguments.core_file ? : DEFAULT_CORE_FILE,
-           root_dev, dest_dev, must_embed, arguments.force,
-	   arguments.fs_probe, arguments.allow_floppy);
+  /* Do the real work.  */
+  setup (arguments.dir ? : DEFAULT_DIRECTORY,
+	 arguments.boot_file ? : DEFAULT_BOOT_FILE,
+	 arguments.core_file ? : DEFAULT_CORE_FILE,
+	 dest_dev, arguments.force,
+	 arguments.fs_probe, arguments.allow_floppy);
 
   /* Free resources.  */
   grub_fini_all ();
@@ -999,7 +1159,6 @@
   free (arguments.boot_file);
   free (arguments.core_file);
   free (arguments.dir);
-  free (arguments.root_dev);
   free (arguments.dev_map);
   free (arguments.device);
   free (root_dev);
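
The root-device selection that this patch moves into setup() can be restated outside GRUB as a small sketch. It assumes the intent visible in the grub-setup.c hunk: among the candidates returned by grub_guess_root_devices, prefer the first one on the same disk as the destination (same disk id and same driver id), and otherwise fall back to the first candidate that could be opened. The types `disk_ident` and `candidate` and the function `pick_root` are hypothetical names used only for illustration; they are not GRUB APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two ids the patch compares:
   disk->id and disk->dev->id.  */
struct disk_ident
{
  unsigned long disk_id;
  int driver_id;
};

/* Hypothetical model of one guessed root device.  */
struct candidate
{
  const char *os_dev;		/* OS path, e.g. "/dev/sda1".  */
  int openable;			/* Nonzero if opening the device succeeded.  */
  struct disk_ident ident;
};

static int
same_disk (const struct disk_ident *a, const struct disk_ident *b)
{
  return a->disk_id == b->disk_id && a->driver_id == b->driver_id;
}

/* Walk a NULL-terminated candidate list.  Return the first openable
   candidate on the destination disk; if there is none, return the
   first openable candidate; if nothing is openable, return NULL.  */
static const struct candidate *
pick_root (const struct candidate *const *cands,
	   const struct disk_ident *dest)
{
  const struct candidate *fallback = NULL;
  const struct candidate *const *cur;

  assert (cands != NULL && dest != NULL);

  for (cur = cands; *cur; cur++)
    {
      if (!(*cur)->openable)
	continue;
      if (same_disk (&(*cur)->ident, dest))
	return *cur;
      if (!fallback)
	fallback = *cur;
    }
  return fallback;
}
```

Returning early on the first match gives the same result as the `found` flag in the patch, because later candidates are ignored once a match on the destination disk has been recorded.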

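The new -0 option in grub-probe makes delim() emit a NUL byte instead of a newline after each record. The sketch below assumes the purpose is to let a consumer split the output safely even when a device name contains whitespace; the patch itself does not state this. `put_delim` mirrors delim() from the patch but writes to an arbitrary stream, while `print_records` and `split_records` are hypothetical helpers that are not part of GRUB.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same choice as delim() in the patch, on a caller-supplied stream.  */
static void
put_delim (FILE *out, int zdelim)
{
  fputc (zdelim ? '\0' : '\n', out);
}

/* Print each entry of a NULL-terminated list followed by the
   delimiter.  Returns the number of records written.  */
static size_t
print_records (FILE *out, const char *const *names, int zdelim)
{
  size_t n = 0;

  for (; *names; names++, n++)
    {
      fputs (*names, out);
      put_delim (out, zdelim);
    }
  return n;
}

/* Split LEN bytes of BUF in place on DELIM.  Stores up to MAX record
   pointers in OUT and returns how many were stored.  Trailing bytes
   without a terminating delimiter are ignored.  */
static size_t
split_records (char *buf, size_t len, char delim, char **out, size_t max)
{
  size_t n = 0;
  char *p = buf;
  char *end = buf + len;

  assert (buf != NULL && out != NULL);

  while (p < end && n < max)
    {
      char *q = memchr (p, delim, (size_t) (end - p));
      if (!q)
	break;
      *q = '\0';
      out[n++] = p;
      p = q + 1;
    }
  return n;
}
```

With a newline delimiter the same reader would still work for names without embedded newlines; the NUL form only matters for names that a line-oriented or whitespace-splitting consumer would otherwise break apart.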


* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-28 12:51           ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2012-01-28 16:50             ` Richard Laager
  2012-01-28 17:06               ` Darik Horn
  2012-01-28 18:33             ` Richard Laager
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Richard Laager @ 2012-01-28 16:50 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko; +Cc: grub-devel

[-- Attachment #1: Type: text/plain, Size: 173 bytes --]

I tried to apply your patch to revision 3781 and got this:

util/grub-setup.c:541:12:
    error 'struct grub_fs' has no member named 'blocklist_install'

-- 
Richard



* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-28 16:50             ` Richard Laager
@ 2012-01-28 17:06               ` Darik Horn
  2012-01-28 17:39                 ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 1 reply; 26+ messages in thread
From: Darik Horn @ 2012-01-28 17:06 UTC (permalink / raw)
  To: The development of GNU GRUB

2012/1/28 Richard Laager <rlaager@wiktel.com>:
> I tried to apply your patch to revision 3781 and got this:
>
> util/grub-setup.c:541:12:
>    error 'struct grub_fs' has no member named 'blocklist_install'

This is not caused by the proposed zfs patch.  The grub trunk fails to
build on Debian, Ubuntu, and OpenIndiana.

Which grub branch or which distribution should we use for testing?

-- 
Darik Horn <dajhorn@vanadac.com>



* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-28 17:06               ` Darik Horn
@ 2012-01-28 17:39                 ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 0 replies; 26+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2012-01-28 17:39 UTC (permalink / raw)
  To: The development of GNU GRUB; +Cc: Darik Horn

On 28.01.2012 18:06, Darik Horn wrote:
> 2012/1/28 Richard Laager<rlaager@wiktel.com>:
>> I tried to apply your patch to revision 3781 and got this:
>>
>> util/grub-setup.c:541:12:
>>     error 'struct grub_fs' has no member named 'blocklist_install'
> This is not caused by the proposed zfs patch.
It is. blocklist_install from an unrelated change slipped into zfs.diff.
No such thing exists in trunk.
>    The grub trunk fails to
> build on Debian, Ubuntu, and OpenIndiana.
What's the error? Builds here on Debian and Fedora just fine.
> Which grub branch or which distribution should we use for testing?
>


-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko




* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-28 12:51           ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-01-28 16:50             ` Richard Laager
@ 2012-01-28 18:33             ` Richard Laager
  2012-01-28 19:21               ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-01-29 22:42               ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-01-28 18:40             ` Darik Horn
  2012-01-30  1:22             ` Richard Laager
  3 siblings, 2 replies; 26+ messages in thread
From: Richard Laager @ 2012-01-28 18:33 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko
  Cc: grub-devel, Zachary Bedell

[-- Attachment #1: Type: text/plain, Size: 2502 bytes --]

On Sat, 2012-01-28 at 13:51 +0100, Vladimir 'φ-coder/phcoder' Serbinenko
wrote:
> > Index: grub/util/grub.d/10_linux.in
> > ===================================================================
> > --- grub.orig/util/grub.d/10_linux.in	2012-01-24 23:44:10.530591000 -0600
> > +++ grub/util/grub.d/10_linux.in	2012-01-24 23:44:10.706928000 -0600
> > @@ -56,8 +56,10 @@
> >     LINUX_ROOT_DEVICE=UUID=${GRUB_DEVICE_UUID}
> >   fi
> >
> > -if [ "x`${grub_probe} --device ${GRUB_DEVICE} --target=fs 2>/dev/null || true`" = xbtrfs ] \
> > -    || [ "x`stat -f --printf=%T /`" = xbtrfs ]; then
> > +LINUX_ROOT_FS=`${grub_probe} --device ${GRUB_DEVICE} --target=fs 2>/dev/null || true`
> > +LINUX_ROOT_STAT=`stat -f --printf=%T / || true`
> > +
> > +if [ "x${LINUX_ROOT_FS}" = xbtrfs -o "x${LINUX_ROOT_STAT}" = xbtrfs ]; then
> >     rootsubvol="`make_system_path_relative_to_its_root /`"
> >     rootsubvol="${rootsubvol#/}"
> >     if [ "x${rootsubvol}" != x ]; then
> > @@ -76,6 +78,10 @@
> >       GRUB_CMDLINE_EXTRA="$GRUB_CMDLINE_EXTRA crashkernel=384M-2G:64M,2G-:128M"
> >   fi
> >
> > +if [ "x${LINUX_ROOT_FS}" = xzfs ]; then
> > +  GRUB_CMDLINE_LINUX="boot=zfs \$bootfs ${GRUB_CMDLINE_LINUX}"
> > +fi
> > +
> >   linux_entry ()
> >   {
> >     os="$1"
> > @@ -114,6 +120,12 @@
> >       fi
> >       printf '%s\n' "${prepare_boot_cache}"
> >     fi
> > +  if [ "x${LINUX_ROOT_FS}" = xzfs ]; then
> > +    cat << EOF
> > +	insmod zfsinfo
> > +	zfs-bootfs (\$root) bootfs
> This makes 3 wrong assumptions in a row:
> - / and /boot may be different.

Despite the variable being called LINUX_ROOT_FS, this is really the
output from grub-probe --device ${GRUB_DEVICE}. When / != /boot, is
$GRUB_DEVICE the device of / or /boot?

> - Linux may be in a non-root subvolume. Then the subvolid points to 
> wrong one.

By "Linux", you're talking about the kernel, as opposed to the root
filesystem, correct?

What do you mean by "non-root subvolume"? That sounds like a btrfs term,
not a ZFS term, so I don't follow.

> - / may be unaccessible to GRUB altogether.

Are you talking about at grub-install time or boot time? Can you provide
an example of when this might happen, so I can understand.

> In short: this command line part has to be generated on grub-mkconfig 
> time and have a stable representation. I'd recommend UUID and subvolume 
> name.

By "this command line part", are you talking about the path to the
kernel?

-- 
Richard



* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-28 12:51           ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-01-28 16:50             ` Richard Laager
  2012-01-28 18:33             ` Richard Laager
@ 2012-01-28 18:40             ` Darik Horn
  2012-01-28 19:27               ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-01-30  1:22             ` Richard Laager
  3 siblings, 1 reply; 26+ messages in thread
From: Darik Horn @ 2012-01-28 18:40 UTC (permalink / raw)
  To: The development of GNU GRUB; +Cc: Richard Laager, Zachary Bedell

2012/1/28 Vladimir 'φ-coder/phcoder' Serbinenko <phcoder@gmail.com>:
>
> I already commented on 10_linux.in changes. They are pretty sloppy. (mostly
> it is "it works for me and I don't care about other legitimate configs")

How do you want these things changed?  I searched the email archive
and couldn't find where you gave an example of another legitimate
configuration, or suggested how to improve any of the related
submissions by Robert Millan, Richard Laager, or Zachary Bedell.

Keep in mind that ZoL cannot resolve the grub mdnobj number, or use
the Solaris bootpath or devid.  The only thing that matters on the
Linux command line is the pool name, and everything else is for
interface compatibility with Solaris.

Also note that this code is in the latest ZFS patch:

  if (strcmp (name, "mirror") && !sscanf (name, "mirror-%u", &dummy)
    && !sscanf (name, "raidz%u", &dummy)

That format doesn't match the `zpool status` output on my current
Solaris and Linux computers.  The vdev names now have the RAID level
number and a dash character like this:

  raidz1-0
  raidz1-1
  raidz2-0
  raidz3-0

Additionally, OpenIndiana currently changes an unqualified `zpool
create tank raidz ...` into "raidz1".
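
As an illustration of the naming scheme above, here is a minimal shell
sketch that separates grouping vdevs from leaf devices by name.  The
function name is_group_vdev is hypothetical and is not part of GRUB or
ZFS, and the patterns cover only the names mentioned in this message,
not every heading that `zpool status` can print.

```sh
# Hypothetical helper, not part of GRUB or ZFS.
# Succeeds (exit status 0) when the name looks like a grouping vdev such as
# "mirror", "mirror-0", "raidz", "raidz1" or "raidz1-0"; fails for anything
# else, which is then assumed to be a leaf device such as "sda1".
is_group_vdev () {
  case "$1" in
    mirror | mirror-[0-9]* | raidz | raidz[0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With this, `is_group_vdev raidz1-0` succeeds and `is_group_vdev sda1`
fails, so a caller could skip the grouping rows and keep only the leaf
rows.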

-- 
Darik Horn <dajhorn@vanadac.com>



* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-28 18:33             ` Richard Laager
@ 2012-01-28 19:21               ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-01-29 22:42               ` Vladimir 'φ-coder/phcoder' Serbinenko
  1 sibling, 0 replies; 26+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2012-01-28 19:21 UTC (permalink / raw)
  To: Richard Laager; +Cc: grub-devel, Zachary Bedell

On 28.01.2012 19:33, Richard Laager wrote:
> On Sat, 2012-01-28 at 13:51 +0100, Vladimir 'φ-coder/phcoder' Serbinenko
> wrote:
>>> Index: grub/util/grub.d/10_linux.in
>>> ===================================================================
>>> --- grub.orig/util/grub.d/10_linux.in	2012-01-24 23:44:10.530591000 -0600
>>> +++ grub/util/grub.d/10_linux.in	2012-01-24 23:44:10.706928000 -0600
>>> @@ -56,8 +56,10 @@
>>>      LINUX_ROOT_DEVICE=UUID=${GRUB_DEVICE_UUID}
>>>    fi
>>>
>>> -if [ "x`${grub_probe} --device ${GRUB_DEVICE} --target=fs 2>/dev/null || true`" = xbtrfs ] \
>>> -    || [ "x`stat -f --printf=%T /`" = xbtrfs ]; then
>>> +LINUX_ROOT_FS=`${grub_probe} --device ${GRUB_DEVICE} --target=fs 2>/dev/null || true`
>>> +LINUX_ROOT_STAT=`stat -f --printf=%T / || true`
>>> +
>>> +if [ "x${LINUX_ROOT_FS}" = xbtrfs -o "x${LINUX_ROOT_STAT}" = xbtrfs ]; then
>>>      rootsubvol="`make_system_path_relative_to_its_root /`"
>>>      rootsubvol="${rootsubvol#/}"
>>>      if [ "x${rootsubvol}" != x ]; then
>>> @@ -76,6 +78,10 @@
>>>        GRUB_CMDLINE_EXTRA="$GRUB_CMDLINE_EXTRA crashkernel=384M-2G:64M,2G-:128M"
>>>    fi
>>>
>>> +if [ "x${LINUX_ROOT_FS}" = xzfs ]; then
>>> +  GRUB_CMDLINE_LINUX="boot=zfs \$bootfs ${GRUB_CMDLINE_LINUX}"
>>> +fi
>>> +
>>>    linux_entry ()
>>>    {
>>>      os="$1"
>>> @@ -114,6 +120,12 @@
>>>        fi
>>>        printf '%s\n' "${prepare_boot_cache}"
>>>      fi
>>> +  if [ "x${LINUX_ROOT_FS}" = xzfs ]; then
>>> +    cat << EOF
>>> +	insmod zfsinfo
>>> +	zfs-bootfs (\$root) bootfs
>> This makes 3 wrong assumptions in a row:
>> - / and /boot may be different.
> Despite the variable being called LINUX_ROOT_FS, this is really the
> output from grub-probe --device ${GRUB_DEVICE}. When / != /boot, is
> $GRUB_DEVICE the device of / or /boot?
Yes. But "zfs-bootfs (\$root) bootfs" will fill "bootfs" with the info 
for /boot.
>> - Linux may be in a non-root subvolume. Then the subvolid points to
>> wrong one.
> By "Linux", you're talking about the kernel, as opposed to the root
> filesystem, correct?
Sorry, I used the wrong term. I meant "OS root".
>
> What do you mean by "non-root subvolume"? That sounds like a btrfs term,
> not a ZFS term, so I don't follow.
I refuse to use ZFS terminology because it misuses terms for marketing
reasons (calling a subvolume a "filesystem" just to say "we handle
N*1000000 filesystems on the server" is pure marketing).
In short, my system may be in mypool/OS/Debian and not just mypool/.
>
>> - / may be unaccessible to GRUB altogether.
> Are you talking about at grub-install time or boot time? Can you provide
> an example of when this might happen, so I can understand.
Boot time. An easy example: ZFS on LUKS with a separate /boot and GRUB
LUKS support disabled (as is the default).
>> In short: this command line part has to be generated on grub-mkconfig
>> time and have a stable representation. I'd recommend UUID and subvolume
>> name.
> By "this command line part", are you talking about the path to the
> kernel?
The part "boot=zfs $bootfs"
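
To make the mypool/OS/Debian example concrete, here is a minimal shell
sketch of splitting a dataset name into the pool name and the path
inside the pool, as a script running at grub-mkconfig time might do
before writing a fixed value into the kernel command line.  The helper
names zfs_pool_of and zfs_path_in_pool are hypothetical; this is only an
illustration and is not what the patch or GRUB does.

```sh
# Hypothetical helpers, not part of GRUB or ZFS.

# Print the pool name: everything before the first "/".
zfs_pool_of () {
  printf '%s\n' "${1%%/*}"
}

# Print the dataset path inside the pool, with a leading "/".
# A bare pool name maps to "/".
zfs_path_in_pool () {
  case "$1" in
    */*) printf '/%s\n' "${1#*/}" ;;
    *)   printf '/\n' ;;
  esac
}
```

For example, `zfs_pool_of mypool/OS/Debian` prints `mypool` and
`zfs_path_in_pool mypool/OS/Debian` prints `/OS/Debian`.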
>


-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko




* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-28 18:40             ` Darik Horn
@ 2012-01-28 19:27               ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 0 replies; 26+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2012-01-28 19:27 UTC (permalink / raw)
  To: The development of GNU GRUB; +Cc: Darik Horn, Richard Laager, Zachary Bedell

On 28.01.2012 19:40, Darik Horn wrote:
> 2012/1/28 Vladimir 'φ-coder/phcoder' Serbinenko<phcoder@gmail.com>:
>> I already commented on 10_linux.in changes. They are pretty sloppy. (mostly
>> it is "it works for me and I don't care about other legitimate configs")
> How do you want these things changed?  I searched the email archive
> and couldn't find where you gave an example of another legitimate
> configuration, or suggested how to improve any of the related
> submissions by Robert Millan, Richard Laager, or Zachary Bedell.
Even if I didn't do so before (I didn't recheck), I did in another
message in this thread.
> Keep in mind that ZoL cannot resolve the grub mdnobj number, or use
> the Solaris bootpath or devid.  The only thing that matters on the
> Linux command line is the pool name, and everything else is for
> interface compatibility with Solaris.
Even this is already one piece of info which will be resolved wrong.
(Also note that if you need only this particular info you should 
probably consider zfsinfo. In this case it doesn't matter because it's 
still wrong though)
> Also note that this code is in the latest ZFS patch:
>
>    if (strcmp (name, "mirror") && !sscanf (name, "mirror-%u", &dummy)
>      && !sscanf (name, "raidz%u", &dummy)
>
> That format doesn't match the `zpool status` output on my current
> Solaris and Linux computers.  The vdev names now have the RAID level
> number and a dash character like this:
>
>    raidz1-0
>    raidz1-1
>    raidz2-0
>    raidz3-0
Does adding

"&& !sscanf (name, "raidz1-%u", &dummy) && !sscanf (name, "raidz2-%u", &dummy) && !sscanf (name, "raidz3-%u", &dummy)"
resolve this problem?


-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko




* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-28 18:33             ` Richard Laager
  2012-01-28 19:21               ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2012-01-29 22:42               ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-01-31  8:45                 ` Richard Laager
  1 sibling, 1 reply; 26+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2012-01-29 22:42 UTC (permalink / raw)
  To: Richard Laager; +Cc: grub-devel, Zachary Bedell

[-- Attachment #1: Type: text/plain, Size: 2565 bytes --]

On 28.01.2012 19:33, Richard Laager wrote:
> On Sat, 2012-01-28 at 13:51 +0100, Vladimir 'φ-coder/phcoder' Serbinenko
> wrote:
>>> Index: grub/util/grub.d/10_linux.in
>>> ===================================================================
>>> --- grub.orig/util/grub.d/10_linux.in	2012-01-24 23:44:10.530591000 -0600
>>> +++ grub/util/grub.d/10_linux.in	2012-01-24 23:44:10.706928000 -0600
>>> @@ -56,8 +56,10 @@
>>>      LINUX_ROOT_DEVICE=UUID=${GRUB_DEVICE_UUID}
>>>    fi
>>>
>>> -if [ "x`${grub_probe} --device ${GRUB_DEVICE} --target=fs 2>/dev/null || true`" = xbtrfs ] \
>>> -    || [ "x`stat -f --printf=%T /`" = xbtrfs ]; then
>>> +LINUX_ROOT_FS=`${grub_probe} --device ${GRUB_DEVICE} --target=fs 2>/dev/null || true`
>>> +LINUX_ROOT_STAT=`stat -f --printf=%T / || true`
>>> +
>>> +if [ "x${LINUX_ROOT_FS}" = xbtrfs -o "x${LINUX_ROOT_STAT}" = xbtrfs ]; then
>>>      rootsubvol="`make_system_path_relative_to_its_root /`"
>>>      rootsubvol="${rootsubvol#/}"
>>>      if [ "x${rootsubvol}" != x ]; then
>>> @@ -76,6 +78,10 @@
>>>        GRUB_CMDLINE_EXTRA="$GRUB_CMDLINE_EXTRA crashkernel=384M-2G:64M,2G-:128M"
>>>    fi
>>>
>>> +if [ "x${LINUX_ROOT_FS}" = xzfs ]; then
>>> +  GRUB_CMDLINE_LINUX="boot=zfs \$bootfs ${GRUB_CMDLINE_LINUX}"
>>> +fi
>>> +
>>>    linux_entry ()
>>>    {
>>>      os="$1"
>>> @@ -114,6 +120,12 @@
>>>        fi
>>>        printf '%s\n' "${prepare_boot_cache}"
>>>      fi
>>> +  if [ "x${LINUX_ROOT_FS}" = xzfs ]; then
>>> +    cat << EOF
>>> +	insmod zfsinfo
>>> +	zfs-bootfs (\$root) bootfs
>> This makes 3 wrong assumptions in a row:
>> - / and /boot may be different.
> Despite the variable being called LINUX_ROOT_FS, this is really the
> output from grub-probe --device ${GRUB_DEVICE}. When / != /boot, is
> $GRUB_DEVICE the device of / or /boot?
>
>> - Linux may be in a non-root subvolume. Then the subvolid points to
>> wrong one.
> By "Linux", you're talking about the kernel, as opposed to the root
> filesystem, correct?
>
> What do you mean by "non-root subvolume"? That sounds like a btrfs term,
> not a ZFS term, so I don't follow.
>
>> - / may be unaccessible to GRUB altogether.
> Are you talking about at grub-install time or boot time? Can you provide
> an example of when this might happen, so I can understand.
>
>> In short: this command line part has to be generated on grub-mkconfig
>> time and have a stable representation. I'd recommend UUID and subvolume
>> name.
> By "this command line part", are you talking about the path to the
> kernel?
>


-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko


[-- Attachment #2: zfs.diff --]
[-- Type: text/x-diff, Size: 39020 bytes --]

=== modified file 'include/grub/emu/getroot.h'
--- include/grub/emu/getroot.h	2011-04-25 12:52:07 +0000
+++ include/grub/emu/getroot.h	2012-01-29 20:58:08 +0000
@@ -30,7 +30,7 @@
 };
 
 char *grub_find_device (const char *dir, dev_t dev);
-char *grub_guess_root_device (const char *dir);
+char **grub_guess_root_devices (const char *dir);
 int grub_util_get_dev_abstraction (const char *os_dev);
 char *grub_util_get_grub_dev (const char *os_dev);
 char *grub_make_system_path_relative_to_its_root (const char *path);

=== modified file 'include/grub/emu/misc.h'
--- include/grub/emu/misc.h	2011-12-13 13:51:41 +0000
+++ include/grub/emu/misc.h	2012-01-29 20:58:08 +0000
@@ -80,6 +80,4 @@
 int grub_device_mapper_supported (void);
 #endif
 
-char *grub_find_root_device_from_mountinfo (const char *dir, char **relroot);
-
 #endif /* GRUB_EMU_MISC_H */

=== modified file 'util/getroot.c'
--- util/getroot.c	2012-01-29 20:49:44 +0000
+++ util/getroot.c	2012-01-29 20:58:08 +0000
@@ -180,6 +180,149 @@
   return path;
 }
 
+static char **
+find_root_devices_from_poolname (char *poolname)
+{
+  char **devices = 0;
+  size_t ndevices = 0;
+  size_t devices_allocated = 0;
+
+#if defined(HAVE_LIBZFS) && defined(HAVE_LIBNVPAIR)
+  zpool_handle_t *zpool;
+  libzfs_handle_t *libzfs;
+  nvlist_t *config, *vdev_tree;
+  nvlist_t **children, **path;
+  unsigned int nvlist_count;
+  unsigned int i;
+  char *device = 0;
+
+  libzfs = grub_get_libzfs_handle ();
+  if (! libzfs)
+    return NULL;
+
+  zpool = zpool_open (libzfs, poolname);
+  config = zpool_get_config (zpool, NULL);
+
+  if (nvlist_lookup_nvlist (config, "vdev_tree", &vdev_tree) != 0)
+    error (1, errno, "nvlist_lookup_nvlist (\"vdev_tree\")");
+
+  if (nvlist_lookup_nvlist_array (vdev_tree, "children", &children, &nvlist_count) != 0)
+    error (1, errno, "nvlist_lookup_nvlist_array (\"children\")");
+  assert (nvlist_count > 0);
+
+  while (nvlist_lookup_nvlist_array (children[0], "children",
+				     &children, &nvlist_count) == 0)
+    assert (nvlist_count > 0);
+
+  for (i = 0; i < nvlist_count; i++)
+    {
+      if (nvlist_lookup_string (children[i], "path", &device) != 0)
+	error (1, errno, "nvlist_lookup_string (\"path\")");
+
+      struct stat st;
+      if (stat (device, &st) == 0)
+	{
+#ifdef __sun__
+	  if (grub_memcmp (device, "/dev/dsk/", sizeof ("/dev/dsk/") - 1)
+	      == 0)
+	    device = xasprintf ("/dev/rdsk/%s",
+				device + sizeof ("/dev/dsk/") - 1);
+	  else if (grub_memcmp (device, "/devices", sizeof ("/devices") - 1)
+		   == 0
+		   && grub_memcmp (device + strlen (device) - 4,
+				   ",raw", 4) != 0)
+	    device = xasprintf ("%s,raw", device);
+	  else
+#endif
+	    device = xstrdup (device);
+	  if (ndevices >= devices_allocated)
+	    {
+	      devices_allocated = 2 * (devices_allocated + 8);
+	      devices = xrealloc (devices, sizeof (devices[0])
+				  * devices_allocated);
+	    }
+	  devices[ndevices++] = device;
+	}
+
+      device = NULL;
+    }
+
+  zpool_close (zpool);
+#else
+  char *cmd;
+  FILE *fp;
+  int ret;
+  char *line;
+  size_t len;
+  int st;
+
+  char name[PATH_MAX + 1], state[257], readlen[257], writelen[257];
+  char cksum[257], notes[257];
+  unsigned int dummy;
+
+  cmd = xasprintf ("zpool status %s", poolname);
+  fp = popen (cmd, "r");
+  free (cmd);
+
+  st = 0;
+  while (1)
+    {
+      line = NULL;
+      ret = getline (&line, &len, fp);
+      if (ret == -1)
+	break;
+	
+      if (sscanf (line, " %s %256s %256s %256s %256s %256s",
+		  name, state, readlen, writelen, cksum, notes) >= 5)
+	switch (st)
+	  {
+	  case 0:
+	    if (!strcmp (name, "NAME")
+		&& !strcmp (state, "STATE")
+		&& !strcmp (readlen, "READ")
+		&& !strcmp (writelen, "WRITE")
+		&& !strcmp (cksum, "CKSUM"))
+	      st++;
+	    break;
+	  case 1:
+	    if (!strcmp (name, poolname))
+	      st++;
+	    break;
+	  case 2:
+	    if (strcmp (name, "mirror") && !sscanf (name, "mirror-%u", &dummy)
+		&& !sscanf (name, "raidz%u", &dummy)
+		&& !strcmp (state, "ONLINE"))
+	      {
+		char *tmp;
+		if (ndevices >= devices_allocated)
+		  {
+		    devices_allocated = 2 * (devices_allocated + 8);
+		    devices = xrealloc (devices, sizeof (devices[0])
+					* devices_allocated);
+		  }
+		devices[ndevices++] = xasprintf ("/dev/%s", name);
+	      }
+	    break;
+	  }
+	
+      free (line);
+    }
+
+  pclose (fp);
+#endif
+  if (devices)
+    {
+      if (ndevices >= devices_allocated)
+	{
+	  devices_allocated = 2 * (devices_allocated + 8);
+	  devices = xrealloc (devices, sizeof (devices[0])
+			      * devices_allocated);
+	}
+      devices[ndevices++] = 0;
+    }
+  return devices;
+}
+
 #ifdef __linux__
 
 #define ESCAPED_PATH_MAX (4 * PATH_MAX)
@@ -219,13 +362,13 @@
   *optr = 0;
 }
 
-char *
-grub_find_root_device_from_mountinfo (const char *dir, char **relroot)
+static char **
+grub_find_root_devices_from_mountinfo (const char *dir, char **relroot)
 {
   FILE *fp;
   char *buf = NULL;
   size_t len = 0;
-  char *ret = NULL;
+  char **ret = NULL;
   int entry_len = 0, entry_max = 4;
   struct mountinfo_entry *entries;
   struct mountinfo_entry parent_entry = { 0, 0, 0, "", "", "", "" };
@@ -328,9 +471,33 @@
       if (!*entries[i].device)
 	continue;
 
-      ret = strdup (entries[i].device);
-      if (relroot)
-	*relroot = strdup (entries[i].enc_root);
+      if (grub_strcmp (entries[i].fstype, "fuse.zfs") == 0)
+	{
+	  char *slash;
+	  slash = strchr (entries[i].device, '/');
+	  if (slash)
+	    *slash = 0;
+	  ret = find_root_devices_from_poolname (entries[i].device);
+	  if (slash)
+	    *slash = '/';
+	  if (relroot)
+	    {
+	      if (!slash)
+		*relroot = xasprintf ("/@%s", entries[i].enc_root);
+	      else if (strchr (slash + 1, '@'))
+		*relroot = xasprintf ("/%s%s", slash + 1, entries[i].enc_root);
+	      else
+		*relroot = xasprintf ("/%s@%s", slash + 1, entries[i].enc_root);
+	    }
+	}
+      else
+	{
+	  ret = xmalloc (2 * sizeof (ret[0]));
+	  ret[0] = strdup (entries[i].device);
+	  ret[1] = 0;
+	  if (relroot)
+	    *relroot = strdup (entries[i].enc_root);
+	}
       break;
     }
 
@@ -342,10 +509,10 @@
 
 #endif /* __linux__ */
 
-static char *
-find_root_device_from_libzfs (const char *dir)
+static char **
+find_root_devices_from_libzfs (const char *dir)
 {
-  char *device = NULL;
+  char **device = NULL;
   char *poolname;
   char *poolfs;
 
@@ -353,119 +520,7 @@
   if (! poolname)
     return NULL;
 
-#if defined(HAVE_LIBZFS) && defined(HAVE_LIBNVPAIR)
-  {
-    zpool_handle_t *zpool;
-    libzfs_handle_t *libzfs;
-    nvlist_t *config, *vdev_tree;
-    nvlist_t **children, **path;
-    unsigned int nvlist_count;
-    unsigned int i;
-
-    libzfs = grub_get_libzfs_handle ();
-    if (! libzfs)
-      return NULL;
-
-    zpool = zpool_open (libzfs, poolname);
-    config = zpool_get_config (zpool, NULL);
-
-    if (nvlist_lookup_nvlist (config, "vdev_tree", &vdev_tree) != 0)
-      error (1, errno, "nvlist_lookup_nvlist (\"vdev_tree\")");
-
-    if (nvlist_lookup_nvlist_array (vdev_tree, "children", &children, &nvlist_count) != 0)
-      error (1, errno, "nvlist_lookup_nvlist_array (\"children\")");
-    assert (nvlist_count > 0);
-
-    while (nvlist_lookup_nvlist_array (children[0], "children",
-				       &children, &nvlist_count) == 0)
-      assert (nvlist_count > 0);
-
-    for (i = 0; i < nvlist_count; i++)
-      {
-	if (nvlist_lookup_string (children[i], "path", &device) != 0)
-	  error (1, errno, "nvlist_lookup_string (\"path\")");
-
-	struct stat st;
-	if (stat (device, &st) == 0)
-	  {
-#ifdef __sun__
-	    if (grub_memcmp (device, "/dev/dsk/", sizeof ("/dev/dsk/") - 1)
-		== 0)
-	      device = xasprintf ("/dev/rdsk/%s",
-				  device + sizeof ("/dev/dsk/") - 1);
-	    else if (grub_memcmp (device, "/devices", sizeof ("/devices") - 1)
-		     == 0
-		     && grub_memcmp (device + strlen (device) - 4,
-				     ",raw", 4) != 0)
-	      device = xasprintf ("%s,raw", device);
-	    else
-#endif
-	      device = xstrdup (device);
-	    break;
-	  }
-
-	device = NULL;
-      }
-
-    zpool_close (zpool);
-  }
-#else
-  {
-    char *cmd;
-    FILE *fp;
-    int ret;
-    char *line;
-    size_t len;
-    int st;
-
-    char name[PATH_MAX + 1], state[257], readlen[257], writelen[257];
-    char cksum[257], notes[257];
-    unsigned int dummy;
-
-    cmd = xasprintf ("zpool status %s", poolname);
-    fp = popen (cmd, "r");
-    free (cmd);
-
-    st = 0;
-    while (st < 3)
-      {
-	line = NULL;
-	ret = getline (&line, &len, fp);
-	if (ret == -1)
-	  goto fail;
-	
-	if (sscanf (line, " %s %256s %256s %256s %256s %256s",
-		    name, state, readlen, writelen, cksum, notes) >= 5)
-	  switch (st)
-	    {
-	    case 0:
-	      if (!strcmp (name, "NAME")
-		  && !strcmp (state, "STATE")
-		  && !strcmp (readlen, "READ")
-		  && !strcmp (writelen, "WRITE")
-		  && !strcmp (cksum, "CKSUM"))
-		st++;
-	      break;
-	    case 1:
-	      if (!strcmp (name, poolname))
-		st++;
-	      break;
-	    case 2:
-	      if (strcmp (name, "mirror") && !sscanf (name, "mirror-%u", &dummy)
-		  && !sscanf (name, "raidz%u", &dummy)
-		  && !strcmp (state, "ONLINE"))
-		st++;
-	      break;
-	    }
-	
-	free (line);
-      }
-    device = xasprintf ("/dev/%s", name);
-
- fail:
-    pclose (fp);
-  }
-#endif
+  device = find_root_devices_from_poolname (poolname);
 
   free (poolname);
   if (poolfs)
@@ -706,10 +761,10 @@
 
 #endif /* __CYGWIN__ */
 
-char *
-grub_guess_root_device (const char *dir)
+char **
+grub_guess_root_devices (const char *dir)
 {
-  char *os_dev = NULL;
+  char **os_dev = NULL;
 #ifdef __GNU__
   file_t file;
   mach_port_t *ports;
@@ -743,9 +798,11 @@
   if (data[name_len - 1] != '\0')
     grub_util_error (_("Storage name for `%s' not NUL-terminated"), dir);
 
-  os_dev = xmalloc (strlen ("/dev/") + data_len);
-  memcpy (os_dev, "/dev/", strlen ("/dev/"));
-  memcpy (os_dev + strlen ("/dev/"), data, data_len);
+  os_dev = xmalloc (2 * sizeof (os_dev[0]));
+  os_dev[0] = xmalloc (strlen ("/dev/") + data_len);
+  memcpy (os_dev[0], "/dev/", strlen ("/dev/"));
+  memcpy (os_dev[0] + strlen ("/dev/"), data, data_len);
+  os_dev[1] = 0;
 
   if (ports && num_ports > 0)
     {
@@ -772,48 +829,56 @@
 
 #ifdef __linux__
   if (!os_dev)
-    os_dev = grub_find_root_device_from_mountinfo (dir, NULL);
+    os_dev = grub_find_root_devices_from_mountinfo (dir, NULL);
 #endif /* __linux__ */
 
   if (!os_dev)
-    os_dev = find_root_device_from_libzfs (dir);
-
-  if (os_dev)
-    {
-      char *tmp = os_dev;
-      os_dev = canonicalize_file_name (os_dev);
-      free (tmp);
-    }
-
-  if (os_dev)
-    {
-      int dm = (strncmp (os_dev, "/dev/dm-", sizeof ("/dev/dm-") - 1) == 0);
-      int root = (strcmp (os_dev, "/dev/root") == 0);
-      if (!dm && !root)
+    os_dev = find_root_devices_from_libzfs (dir);
+
+  if (os_dev)
+    {
+      char **cur;
+      for (cur = os_dev; *cur; cur++)
+	{
+	  char *tmp = *cur;
+	  int root, dm;
+	  *cur = canonicalize_file_name (*cur);
+	  free (tmp);
+	  root = (strcmp (*cur, "/dev/root") == 0);
+	  dm = (strncmp (*cur, "/dev/dm-", sizeof ("/dev/dm-") - 1) == 0);
+	  if (!dm && !root)
+	    continue;
+	  if (stat (*cur, &st) < 0)
+	    break;
+	  free (*cur);
+	  dev = st.st_rdev;
+	  *cur = grub_find_device (dm ? "/dev/mapper" : "/dev", dev);
+	}
+      if (!*cur)
 	return os_dev;
-      if (stat (os_dev, &st) >= 0)
-	{
-	  free (os_dev);
-	  dev = st.st_rdev;
-	  return grub_find_device (dm ? "/dev/mapper" : "/dev", dev);
-	}
+      for (cur = os_dev; *cur; cur++)
+	free (*cur);
       free (os_dev);
+      os_dev = 0;
     }
 
   if (stat (dir, &st) < 0)
     grub_util_error (_("cannot stat `%s'"), dir);
 
   dev = st.st_dev;
+
+  os_dev = xmalloc (2 * sizeof (os_dev[0]));
   
 #ifdef __CYGWIN__
   /* Cygwin specific function.  */
-  os_dev = grub_find_device (dir, dev);
+  os_dev[0] = grub_find_device (dir, dev);
 
 #else
 
   /* This might be truly slow, but is there any better way?  */
-  os_dev = grub_find_device ("/dev", dev);
+  os_dev[0] = grub_find_device ("/dev", dev);
 #endif
+  os_dev[1] = 0;
 #endif /* !__GNU__ */
 
   return os_dev;
@@ -2412,7 +2477,7 @@
 #ifdef __linux__
 	      {
 		char *bind;
-		grub_free (grub_find_root_device_from_mountinfo (buf2, &bind));
+		grub_free (grub_find_root_devices_from_mountinfo (buf2, &bind));
 		if (bind && bind[0] && bind[1])
 		  {
 		    buf3 = bind;
@@ -2445,7 +2510,7 @@
 #ifdef __linux__
   {
     char *bind;
-    grub_free (grub_find_root_device_from_mountinfo (buf2, &bind));
+    grub_free (grub_find_root_devices_from_mountinfo (buf2, &bind));
     if (bind && bind[0] && bind[1])
       {
 	char *temp = buf3;

=== modified file 'util/grub-install.in'
--- util/grub-install.in	2012-01-27 12:12:00 +0000
+++ util/grub-install.in	2012-01-29 20:58:08 +0000
@@ -473,7 +473,7 @@
 fi
 
 # Create the core image. First, auto-detect the filesystem module.
-fs_module="`"$grub_probe" --device-map="${device_map}" --target=fs --device "${grub_device}"`"
+fs_module="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=fs --device `"
 if test "x$fs_module" = x ; then
     echo "Auto-detection of a filesystem of ${grub_device} failed." 1>&2
     echo "Try with --recheck." 1>&2
@@ -485,7 +485,7 @@
 # this command is allowed to fail (--target=fs already grants us that the
 # filesystem will be accessible).
 partmap_module=
-for x in `"$grub_probe" --device-map="${device_map}" --target=partmap --device "${grub_device}" 2> /dev/null`; do
+for x in `echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=partmap --device 2> /dev/null`; do
    case "$x" in
        netbsd | openbsd) 
 	   partmap_module="$partmap_module part_bsd";;
@@ -496,7 +496,7 @@
 done
 
 # Device abstraction module, if any (lvm, raid).
-devabstraction_module="`"$grub_probe" --device-map="${device_map}" --target=abstraction --device "${grub_device}"`"
+devabstraction_module="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=abstraction --device`"
 
 if [ "x$disk_module" = xata ]; then
     disk_module=pata
@@ -538,14 +538,14 @@
       fi
       install_drive="`echo "${install_drive}" | sed -e 's/^(\(\([^,\\\\]\|\\\\\\\\\|\\\\,\)*\)\(\(,[a-zA-Z0-9]*\)*\))$/\1/'`"
     fi
-    grub_drive="`"$grub_probe" --device-map="${device_map}" --target=drive --device "${grub_device}"`" || exit 1
+    grub_drive="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=drive --device`" || exit 1
 
     # Strip partition number
     grub_partition="`echo "${grub_drive}" | sed -e 's/^(\(\([^,\\\\]\|\\\\\\\\\|\\\\,\)*\)\(\(,[a-zA-Z0-9]*\)*\))$/\3/'`"
     grub_drive="`echo "${grub_drive}" | sed -e 's/^(\(\([^,\\\\]\|\\\\\\\\\|\\\\,\)*\)\(\(,[a-zA-Z0-9]*\)*\))$/\1/'`"
     if ([ "x$disk_module" != x ] && [ "x$disk_module" != xbiosdisk ]) || [ "x${grub_drive}" != "x${install_drive}" ] || ([ "x$platform" != xefi ] && [ "x$platform" != xpc ] && [ x"${platform}" != x"ieee1275" ]); then
         # generic method (used on coreboot and ata mod)
-        uuid="`"$grub_probe" --device-map="${device_map}" --target=fs_uuid --device "${grub_device}"`"
+        uuid="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=fs_uuid --device`"
         if [ "x${uuid}" = "x" ] ; then
           if [ "x$platform" != xefi ] && [ "x$platform" != xpc ] && [ x"${platform}" != x"ieee1275" ]; then
              echo "UUID needed with $platform, but the filesystem containing ${grubdir} does not support UUIDs." 1>&2
@@ -559,15 +559,15 @@
         fi
 
 	if [ x"$disk_module" != x ] && [ x"$disk_module" != xbiosdisk ]; then
-            hints="`"$grub_probe" --device-map="${device_map}" --target=baremetal_hints --device "${grub_device}"`"
+            hints="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=baremetal_hints --device`"
 	elif [ x"$platform" = xpc ]; then
-            hints="`"$grub_probe" --device-map="${device_map}" --target=bios_hints --device "${grub_device}"`"
+            hints="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=bios_hints --device`"
 	elif [ x"$platform" = xefi ]; then
-            hints="`"$grub_probe" --device-map="${device_map}" --target=efi_hints --device "${grub_device}"`"
+            hints="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=efi_hints --device`"
 	elif [ x"$platform" = xieee1275 ]; then
-            hints="`"$grub_probe" --device-map="${device_map}" --target=ieee1275_hints --device "${grub_device}"`"
+            hints="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=ieee1275_hints --device`"
 	elif [ x"$platform" = xloongson ] || [ x"$platform" = xqemu ] || [ x"$platform" = xcoreboot ] || [ x"$platform" = xmultiboot ] || [ x"$platform" = xqemu-mips ]; then
-            hints="`"$grub_probe" --device-map="${device_map}" --target=baremetal_hints --device "${grub_device}"`"
+            hints="`echo "${grub_device}" | xargs "$grub_probe" --device-map="${device_map}" --target=baremetal_hints --device`"
 	else
             echo "No hints available for your platform. Expect reduced performance"
 	    hints=
@@ -587,7 +587,7 @@
     fi
 else
     if [ x$GRUB_CRYPTODISK_ENABLE = xy ]; then
-	for uuid in "`"${grub_probe}" --device "${grub_device}" --target=cryptodisk_uuid`"; do
+	for uuid in "`echo "${grub_device}" | xargs "${grub_probe}"  --target=cryptodisk_uuid --device`"; do
 	    echo "cryptomount -u $uuid" >> "${grubdir}/load.cfg"
 	done
 	config_opt="-c ${grubdir}/load.cfg "

=== modified file 'util/grub-probe.c'
--- util/grub-probe.c	2012-01-29 20:49:44 +0000
+++ util/grub-probe.c	2012-01-29 21:08:56 +0000
@@ -313,307 +313,342 @@
 }
 
 static void
-probe (const char *path, char *device_name)
+probe (const char *path, char **device_names, char delim)
 {
-  char *drive_name = NULL;
+  char **drives_names = NULL;
+  char **curdev, **curdrive;
   char *grub_path = NULL;
-  char *filebuf_via_grub = NULL, *filebuf_via_sys = NULL;
-  grub_device_t dev = NULL;
-  grub_fs_t fs;
+  int ndev = 0;
 
-  if (path == NULL)
-    {
-#if defined(__FreeBSD__) || defined(__FreeBSD_kernel__) || defined(__NetBSD__) || defined(__sun__)
-      if (! grub_util_check_char_device (device_name))
-        grub_util_error (_("%s is not a character device"), device_name);
-#else
-      if (! grub_util_check_block_device (device_name))
-        grub_util_error (_("%s is not a block device"), device_name);
-#endif
-    }
-  else
+  if (path != NULL)
     {
       grub_path = canonicalize_file_name (path);
-      device_name = grub_guess_root_device (grub_path);
+      device_names = grub_guess_root_devices (grub_path);
+      free (grub_path);
     }
 
-  if (! device_name)
+  if (! device_names)
     grub_util_error (_("cannot find a device for %s (is /dev mounted?)"), path);
 
   if (print == PRINT_DEVICE)
     {
-      printf ("%s\n", device_name);
-      goto end;
-    }
-
-  drive_name = grub_util_get_grub_dev (device_name);
-  if (! drive_name)
-    grub_util_error (_("cannot find a GRUB drive for %s.  Check your device.map"),
-		     device_name);
-
-  if (print == PRINT_DRIVE)
-    {
-      printf ("(%s)\n", drive_name);
-      goto end;
-    }
-
-  grub_util_info ("opening %s", drive_name);
-  dev = grub_device_open (drive_name);
-  if (! dev)
-    grub_util_error ("%s", _(grub_errmsg));
-
-  if (print == PRINT_HINT_STR)
-    {
-      const char *osdev = grub_util_biosdisk_get_osdev (dev->disk);
-      const char *ofpath = osdev ? grub_util_devname_to_ofpath (osdev) : 0;
-      char *biosname, *bare, *efi;
-      const char *map;
-
-      if (ofpath)
-	{
-	  printf ("--hint-ieee1275='");
-	  print_full_name (ofpath, dev);
-	  printf ("' ");
-	}
-
-      biosname = guess_bios_drive (device_name);
-      if (biosname)
-	{
-	  printf ("--hint-bios=");
-	  print_full_name (biosname, dev);
-	  printf (" ");
-	}
-      free (biosname);
-
-      efi = guess_efi_drive (device_name);
-      if (efi)
-	{
-	  printf ("--hint-efi=");
-	  print_full_name (efi, dev);
-	  printf (" ");
-	}
-      free (efi);
-
-      bare = guess_baremetal_drive (device_name);
-      if (bare)
-	{
-	  printf ("--hint-baremetal=");
-	  print_full_name (bare, dev);
-	  printf (" ");
-	}
-      free (bare);
-
-      /* FIXME: Add ARC hint.  */
-
-      map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
-      if (map)
-	{
-	  printf ("--hint='");
-	  print_full_name (map, dev);
-	  printf ("' ");
-	}
-      printf ("\n");
-
-      goto end;
-    }
-
-  if ((print == PRINT_COMPATIBILITY_HINT || print == PRINT_BIOS_HINT
-       || print == PRINT_IEEE1275_HINT || print == PRINT_BAREMETAL_HINT
-       || print == PRINT_EFI_HINT || print == PRINT_ARC_HINT)
-      && dev->disk->dev->id != GRUB_DISK_DEVICE_HOSTDISK_ID)
-    {
-      print_full_name (dev->disk->name, dev);
-      printf ("\n");
-      goto end;
-    }
-
-  if (print == PRINT_COMPATIBILITY_HINT)
-    {
-      const char *map;
-      char *biosname;
-      map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
-      if (map)
-	{
-	  print_full_name (map, dev);
+      for (curdev = device_names; *curdev; curdev++)
+	{
+	  printf ("%s", *curdev);
+	  putchar (delim);
+	}
+      return;
+    }
+
+  for (curdev = device_names; *curdev; curdev++)
+    {
+      grub_util_pull_device (*curdev);
+      ndev++;
+    }
+  
+  drives_names = xmalloc (sizeof (drives_names[0]) * (ndev + 1)); 
+
+  for (curdev = device_names, curdrive = drives_names; *curdev; curdev++,
+       curdrive++)
+    {
+      *curdrive = grub_util_get_grub_dev (*curdev);
+      if (! *curdrive)
+	grub_util_error (_("cannot find a GRUB drive for %s.  Check your device.map"),
+			 *curdev);
+    }
+  *curdrive = 0;
+
+  if (print == PRINT_FS || print == PRINT_FS_UUID
+      || print == PRINT_FS_LABEL)
+    {
+      grub_device_t dev = NULL;
+      grub_fs_t fs;
+
+      grub_util_info ("opening %s", drives_names[0]);
+      dev = grub_device_open (drives_names[0]);
+      if (! dev)
+	grub_util_error ("%s", _(grub_errmsg));
+      
+      fs = grub_fs_probe (dev);
+      if (! fs)
+	grub_util_error ("%s", _(grub_errmsg));
+
+      if (print == PRINT_FS)
+	{
+	  printf ("%s", fs->name);
+	  putchar (delim);
+	}
+      else if (print == PRINT_FS_UUID)
+	{
+	  char *uuid;
+	  if (! fs->uuid)
+	    grub_util_error (_("%s does not support UUIDs"), fs->name);
+
+	  if (fs->uuid (dev, &uuid) != GRUB_ERR_NONE)
+	    grub_util_error ("%s", grub_errmsg);
+
+	  printf ("%s", uuid);
+	  putchar (delim);
+	}
+      else if (print == PRINT_FS_LABEL)
+	{
+	  char *label;
+	  if (! fs->label)
+	    grub_util_error (_("%s does not support labels"), fs->name);
+
+	  if (fs->label (dev, &label) != GRUB_ERR_NONE)
+	    grub_util_error ("%s", _(grub_errmsg));
+
+	  printf ("%s", label);
+	  putchar (delim);
+	}
+      goto end;
+    }
+
+  for (curdrive = drives_names, curdev = device_names; *curdrive;
+       curdrive++, curdev++)
+    {
+      grub_device_t dev = NULL;
+
+      grub_util_info ("opening %s", *curdrive);
+      dev = grub_device_open (*curdrive);
+      if (! dev)
+	grub_util_error ("%s", _(grub_errmsg));
+
+      if (print == PRINT_HINT_STR)
+	{
+	  const char *osdev = grub_util_biosdisk_get_osdev (dev->disk);
+	  const char *ofpath = osdev ? grub_util_devname_to_ofpath (osdev) : 0;
+	  char *biosname, *bare, *efi;
+	  const char *map;
+
+	  if (ofpath)
+	    {
+	      printf ("--hint-ieee1275='");
+	      print_full_name (ofpath, dev);
+	      printf ("' ");
+	    }
+
+	  biosname = guess_bios_drive (*curdev);
+	  if (biosname)
+	    {
+	      printf ("--hint-bios=");
+	      print_full_name (biosname, dev);
+	      printf (" ");
+	    }
+	  free (biosname);
+
+	  efi = guess_efi_drive (*curdev);
+	  if (efi)
+	    {
+	      printf ("--hint-efi=");
+	      print_full_name (efi, dev);
+	      printf (" ");
+	    }
+	  free (efi);
+
+	  bare = guess_baremetal_drive (*curdev);
+	  if (bare)
+	    {
+	      printf ("--hint-baremetal=");
+	      print_full_name (bare, dev);
+	      printf (" ");
+	    }
+	  free (bare);
+
+	  /* FIXME: Add ARC hint.  */
+
+	  map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
+	  if (map)
+	    {
+	      printf ("--hint='");
+	      print_full_name (map, dev);
+	      printf ("' ");
+	    }
 	  printf ("\n");
-	  goto end;
-	}
-      biosname = guess_bios_drive (device_name);
-      if (biosname)
-	print_full_name (biosname, dev);
-      printf ("\n");
-      free (biosname);
-      goto end;
-    }
-
-  if (print == PRINT_BIOS_HINT)
-    {
-      char *biosname;
-      biosname = guess_bios_drive (device_name);
-      if (biosname)
-	print_full_name (biosname, dev);
-      printf ("\n");
-      free (biosname);
-      goto end;
-    }
-  if (print == PRINT_IEEE1275_HINT)
-    {
-      const char *osdev = grub_util_biosdisk_get_osdev (dev->disk);
-      const char *ofpath = grub_util_devname_to_ofpath (osdev);
-      const char *map;
-
-      map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
-      if (map)
-	{
-	  printf (" ");
-	  print_full_name (map, dev);
-	}
-
-      if (ofpath)
-	{
-	  printf (" ");
-	  print_full_name (ofpath, dev);
-	}
-
-      printf ("\n");
-      goto end;
-    }
-  if (print == PRINT_EFI_HINT)
-    {
-      char *biosname;
-      char *name;
-      const char *map;
-      biosname = guess_efi_drive (device_name);
-
-      map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
-      if (map)
-	{
-	  printf (" ");
-	  print_full_name (map, dev);
-	}
-      if (biosname)
-	{
-	  printf (" ");
-	  print_full_name (biosname, dev);
-	}
-
-      printf ("\n");
-      free (biosname);
-      goto end;
-    }
-
-  if (print == PRINT_BAREMETAL_HINT)
-    {
-      char *biosname;
-      char *name;
-      const char *map;
-
-      biosname = guess_baremetal_drive (device_name);
-
-      map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
-      if (map)
-	{
-	  printf (" ");
-	  print_full_name (map, dev);
-	}
-      if (biosname)
-	{
-	  printf (" ");
-	  print_full_name (biosname, dev);
-	}
-
-      printf ("\n");
-      free (biosname);
-      goto end;
-    }
-
-  if (print == PRINT_ARC_HINT)
-    {
-      const char *map;
-
-      map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
-      if (map)
-	{
-	  printf (" ");
-	  print_full_name (map, dev);
-	}
-      printf ("\n");
-
-      /* FIXME */
-
-      goto end;
-    }
-
-  if (print == PRINT_ABSTRACTION)
-    {
-      probe_abstraction (dev->disk);
-      printf ("\n");
-      goto end;
-    }
-
-  if (print == PRINT_CRYPTODISK_UUID)
-    {
-      probe_cryptodisk_uuid (dev->disk);
-      printf ("\n");
-      goto end;
-    }
-
-  if (print == PRINT_PARTMAP)
-    {
-      /* Check if dev->disk itself is contained in a partmap.  */
-      probe_partmap (dev->disk);
-      printf ("\n");
-      goto end;
-    }
-
-  if (print == PRINT_MSDOS_PARTTYPE)
-    {
-      if (dev->disk->partition
-	  && strcmp(dev->disk->partition->partmap->name, "msdos") == 0)
-        printf ("%02x", dev->disk->partition->msdostype);
-
-      printf ("\n");
-      goto end;
-    }
-
-  fs = grub_fs_probe (dev);
-  if (! fs)
-    grub_util_error ("%s", _(grub_errmsg));
-
-  if (print == PRINT_FS)
-    {
-      printf ("%s\n", fs->name);
-    }
-  else if (print == PRINT_FS_UUID)
-    {
-      char *uuid;
-      if (! fs->uuid)
-	grub_util_error (_("%s does not support UUIDs"), fs->name);
-
-      if (fs->uuid (dev, &uuid) != GRUB_ERR_NONE)
-	grub_util_error ("%s", grub_errmsg);
-
-      printf ("%s\n", uuid);
-    }
-  else if (print == PRINT_FS_LABEL)
-    {
-      char *label;
-      if (! fs->label)
-	grub_util_error (_("%s does not support labels"), fs->name);
-
-      if (fs->label (dev, &label) != GRUB_ERR_NONE)
-	grub_util_error ("%s", _(grub_errmsg));
-
-      printf ("%s\n", label);
+
+	  grub_device_close (dev);
+	  continue;
+	}
+      
+      if ((print == PRINT_COMPATIBILITY_HINT || print == PRINT_BIOS_HINT
+	   || print == PRINT_IEEE1275_HINT || print == PRINT_BAREMETAL_HINT
+	   || print == PRINT_EFI_HINT || print == PRINT_ARC_HINT)
+	  && dev->disk->dev->id != GRUB_DISK_DEVICE_HOSTDISK_ID)
+	{
+	  print_full_name (dev->disk->name, dev);
+	  putchar (delim);
+	  continue;
+	}
+
+      if (print == PRINT_COMPATIBILITY_HINT)
+	{
+	  const char *map;
+	  char *biosname;
+	  map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
+	  if (map)
+	    {
+	      print_full_name (map, dev);
+	      putchar (delim);
+	      grub_device_close (dev);
+	      /* Compatibility hint is one device only.  */
+	      break;
+	    }
+	  biosname = guess_bios_drive (*curdev);
+	  if (biosname)
+	    {
+	      print_full_name (biosname, dev);
+	      putchar (delim);
+	    }
+	  free (biosname);
+	  grub_device_close (dev);
+	  /* Compatibility hint is one device only.  */
+	  if (biosname)
+	    break;
+	  continue;
+	}
+
+      if (print == PRINT_BIOS_HINT)
+	{
+	  char *biosname;
+	  biosname = guess_bios_drive (*curdev);
+	  if (biosname)
+	    {
+	      print_full_name (biosname, dev);
+	      putchar (delim);
+	    }
+	  free (biosname);
+	  grub_device_close (dev);
+	  continue;
+	}
+      if (print == PRINT_IEEE1275_HINT)
+	{
+	  const char *osdev = grub_util_biosdisk_get_osdev (dev->disk);
+	  const char *ofpath = grub_util_devname_to_ofpath (osdev);
+	  const char *map;
+
+	  map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
+	  if (map)
+	    {
+	      print_full_name (map, dev);
+	      putchar (delim);
+	    }
+
+	  if (ofpath)
+	    {
+	      print_full_name (ofpath, dev);
+	      putchar (delim);
+	    }
+
+	  grub_device_close (dev);
+	  continue;
+	}
+      if (print == PRINT_EFI_HINT)
+	{
+	  char *biosname;
+	  char *name;
+	  const char *map;
+	  biosname = guess_efi_drive (*curdev);
+
+	  map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
+	  if (map)
+	    {
+	      print_full_name (map, dev);
+	      putchar (delim);
+	    }
+	  if (biosname)
+	    {
+	      print_full_name (biosname, dev);
+	      putchar (delim);
+	    }
+
+	  free (biosname);
+	  grub_device_close (dev);
+	  continue;
+	}
+
+      if (print == PRINT_BAREMETAL_HINT)
+	{
+	  char *biosname;
+	  char *name;
+	  const char *map;
+
+	  biosname = guess_baremetal_drive (*curdev);
+
+	  map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
+	  if (map)
+	    {
+	      print_full_name (map, dev);
+	      putchar (delim);
+	    }
+	  if (biosname)
+	    {
+	      print_full_name (biosname, dev);
+	      putchar (delim);
+	    }
+
+	  free (biosname);
+	  grub_device_close (dev);
+	  continue;
+	}
+
+      if (print == PRINT_ARC_HINT)
+	{
+	  const char *map;
+
+	  map = grub_util_biosdisk_get_compatibility_hint (dev->disk);
+	  if (map)
+	    {
+	      print_full_name (map, dev);
+	      putchar (delim);
+	    }
+
+	  /* FIXME */
+	  grub_device_close (dev);
+	  continue;
+	}
+
+      if (print == PRINT_ABSTRACTION)
+	{
+	  probe_abstraction (dev->disk);
+	  putchar (delim);
+	  grub_device_close (dev);
+	  continue;
+	}
+
+      if (print == PRINT_CRYPTODISK_UUID)
+	{
+	  probe_cryptodisk_uuid (dev->disk);
+	  putchar (delim);
+	  grub_device_close (dev);
+	  continue;
+	}
+
+      if (print == PRINT_PARTMAP)
+	{
+	  /* Check if dev->disk itself is contained in a partmap.  */
+	  probe_partmap (dev->disk);
+	  putchar (delim);
+	  grub_device_close (dev);
+	  continue;
+	}
+
+      if (print == PRINT_MSDOS_PARTTYPE)
+	{
+	  if (dev->disk->partition
+	      && strcmp(dev->disk->partition->partmap->name, "msdos") == 0)
+	    printf ("%02x", dev->disk->partition->msdostype);
+
+	  putchar (delim);
+	  grub_device_close (dev);
+	  continue;
+	}
     }
 
  end:
-  if (dev)
-    grub_device_close (dev);
-  free (grub_path);
-  free (filebuf_via_grub);
-  free (filebuf_via_sys);
-  free (drive_name);
+  for (curdrive = drives_names; *curdrive; curdrive++)
+    free (*curdrive);
+  free (drives_names);
 }
 
 static struct option options[] =
@@ -658,7 +693,8 @@
 main (int argc, char *argv[])
 {
   char *dev_map = 0;
-  char *argument;
+  int zero_delim = 0;
+  char delim;
 
   set_program_name (argv[0]);
 
@@ -667,7 +703,7 @@
   /* Check for options.  */
   while (1)
     {
-      int c = getopt_long (argc, argv, "dm:t:hVv", options, 0);
+      int c = getopt_long (argc, argv, "dm:t:hVv0", options, 0);
 
       if (c == -1)
 	break;
@@ -726,6 +762,10 @@
 	    usage (0);
 	    break;
 
+	  case '0':
+	    zero_delim = 1;
+	    break;
+
 	  case 'V':
 	    printf ("%s (%s) %s\n", program_name, PACKAGE_NAME, PACKAGE_VERSION);
 	    return 0;
@@ -750,14 +790,12 @@
       usage (1);
     }
 
-  if (optind + 1 != argc)
+  if (optind + 1 != argc && !argument_is_device)
     {
       fprintf (stderr, _("Unknown extra argument `%s'.\n"), argv[optind + 1]);
       usage (1);
     }
 
-  argument = argv[optind];
-
   /* Initialize the emulated biosdisk driver.  */
   grub_util_biosdisk_init (dev_map ? : DEFAULT_DEVICE_MAP);
 
@@ -774,11 +812,28 @@
   grub_mdraid1x_init ();
   grub_lvm_init ();
 
+  if (print == PRINT_COMPATIBILITY_HINT || print == PRINT_BIOS_HINT
+      || print == PRINT_IEEE1275_HINT || print == PRINT_BAREMETAL_HINT
+      || print == PRINT_EFI_HINT || print == PRINT_ARC_HINT)
+    delim = ' ';
+  else
+    delim = '\n';
+
+  if (zero_delim)
+    delim = '\0';
+
   /* Do it.  */
   if (argument_is_device)
-    probe (NULL, argument);
+    probe (NULL, argv + optind, delim);
   else
-    probe (argument, NULL);
+    probe (argv[optind], NULL, delim);
+
+  if (!zero_delim && (print == PRINT_COMPATIBILITY_HINT
+		      || print == PRINT_BIOS_HINT
+		      || print == PRINT_IEEE1275_HINT
+		      || print == PRINT_BAREMETAL_HINT
+		      || print == PRINT_EFI_HINT || print == PRINT_ARC_HINT))
+    putchar ('\n');
 
   /* Free resources.  */
   grub_gcry_fini_all ();

=== modified file 'util/grub-setup.c'
--- util/grub-setup.c	2012-01-29 13:28:01 +0000
+++ util/grub-setup.c	2012-01-29 21:13:11 +0000
@@ -133,15 +133,16 @@
 static void
 setup (const char *dir,
        const char *boot_file, const char *core_file,
-       const char *root, const char *dest, int force,
+       const char *dest, int force,
        int fs_probe, int allow_floppy)
 {
   char *boot_path, *core_path, *core_path_dev, *core_path_dev_full;
   char *boot_img, *core_img;
+  char *root = 0;
   size_t boot_size, core_size;
   grub_uint16_t core_sectors;
-  grub_device_t root_dev, dest_dev;
-  struct grub_boot_blocklist *first_block, *block;
+  grub_device_t root_dev = 0, dest_dev;
+  struct grub_boot_blocklist *first_block, *block, *last_block;
   char *tmp_img;
   int i;
   grub_disk_addr_t first_sector;
@@ -235,17 +236,56 @@
 						- sizeof (*block));
   grub_util_info ("root is `%s', dest is `%s'", root, dest);
 
-  /* Open the root device and the destination device.  */
-  grub_util_info ("Opening root");
-  root_dev = grub_device_open (root);
-  if (! root_dev)
-    grub_util_error ("%s", _(grub_errmsg));
-
   grub_util_info ("Opening dest");
   dest_dev = grub_device_open (dest);
   if (! dest_dev)
     grub_util_error ("%s", _(grub_errmsg));
 
+  {
+    char **root_devices = grub_guess_root_devices (dir);
+    char **cur;
+    int found = 0;
+
+    for (cur = root_devices; *cur; cur++)
+      {
+	char *drive;
+	grub_device_t try_dev;
+
+	drive = grub_util_get_grub_dev (*cur);
+	if (!drive)
+	  continue;
+	try_dev = grub_device_open (drive);
+	if (! try_dev)
+	  continue;
+	if (!found && try_dev->disk->id == dest_dev->disk->id
+	    && try_dev->disk->dev->id == dest_dev->disk->dev->id)
+	  {
+	    if (root_dev)
+	      grub_device_close (root_dev);
+	    free (root);
+	    root_dev = try_dev;
+	    root = drive;
+	    found = 1;
+	    continue;
+	  }
+	if (!root_dev)
+	  {
+	    root_dev = try_dev;
+	    root = drive;
+	    continue;
+	  }
+	grub_device_close (try_dev);	
+	free (drive);
+      }
+    if (!root_dev)
+      {
+	grub_util_error ("guessing the root device failed, because of `%s'",
+			 grub_errmsg);
+      }
+    grub_util_info ("guessed root_dev `%s' from "
+		    "dir `%s'", root_dev->disk->name, dir);
+  }
+
   grub_util_info ("setting the root device to `%s'", root);
   if (grub_env_set ("root", root) != GRUB_ERR_NONE)
     grub_util_error ("%s", _(grub_errmsg));
@@ -636,11 +676,12 @@
     boot_devpath = (char *) (boot_img
 			     + GRUB_BOOT_AOUT_HEADER_SIZE
 			     + GRUB_BOOT_MACHINE_BOOT_DEVPATH);
-    if (file->device->disk->id != dest_dev->disk->id)
+    if (dest_dev->disk->id != root_dev->disk->id
+	|| dest_dev->disk->dev->id != root_dev->disk->dev->id)
       {
 	const char *dest_ofpath;
 	dest_ofpath
-	  = grub_util_devname_to_ofpath (grub_util_biosdisk_get_osdev (file->device->disk));
+	  = grub_util_devname_to_ofpath (grub_util_biosdisk_get_osdev (root_dev->disk));
 	grub_util_info ("dest_ofpath is `%s'", dest_ofpath);
 	strncpy (boot_devpath, dest_ofpath, GRUB_BOOT_MACHINE_BOOT_DEVPATH_END
 		 - GRUB_BOOT_MACHINE_BOOT_DEVPATH - 1);
@@ -741,7 +782,6 @@
   char *core_file;
   char *dir;
   char *dev_map;
-  char *root_dev;
   int  force;
   int  fs_probe;
   int allow_floppy;
@@ -802,13 +842,6 @@
         arguments->dev_map = xstrdup (arg);
         break;
 
-      case 'r':
-        if (arguments->root_dev)
-          free (arguments->root_dev);
-
-        arguments->root_dev = xstrdup (arg);
-        break;
-
       case 'f':
         arguments->force = 1;
         break;
@@ -935,38 +968,11 @@
       grub_util_info ("Using `%s' as GRUB device", dest_dev);
     }
 
-  if (arguments.root_dev)
-    {
-      root_dev = get_device_name (arguments.root_dev);
-
-      if (! root_dev)
-        grub_util_error (_("invalid root device `%s'"), arguments.root_dev);
-
-      root_dev = xstrdup (root_dev);
-    }
-  else
-    {
-      char *root_device =
-        grub_guess_root_device (arguments.dir ? : DEFAULT_DIRECTORY);
-
-      root_dev = grub_util_get_grub_dev (root_device);
-      if (! root_dev)
-	{
-	  grub_util_info ("guessing the root device failed, because of `%s'",
-			  grub_errmsg);
-          grub_util_error (_("cannot guess the root device. Specify the option "
-                             "`--root-device'"));
-	}
-      grub_util_info ("guessed root device `%s' and root_dev `%s' from "
-                      "dir `%s'", root_device, root_dev,
-                      arguments.dir ? : DEFAULT_DIRECTORY);
-    }
-
   /* Do the real work.  */
   setup (arguments.dir ? : DEFAULT_DIRECTORY,
 	 arguments.boot_file ? : DEFAULT_BOOT_FILE,
 	 arguments.core_file ? : DEFAULT_CORE_FILE,
-	 root_dev, dest_dev, arguments.force,
+	 dest_dev, arguments.force,
 	 arguments.fs_probe, arguments.allow_floppy);
 
   /* Free resources.  */
@@ -976,7 +982,6 @@
   free (arguments.boot_file);
   free (arguments.core_file);
   free (arguments.dir);
-  free (arguments.root_dev);
   free (arguments.dev_map);
   free (arguments.device);
   free (root_dev);



* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-28 12:51           ` Vladimir 'φ-coder/phcoder' Serbinenko
                               ` (2 preceding siblings ...)
  2012-01-28 18:40             ` Darik Horn
@ 2012-01-30  1:22             ` Richard Laager
  2012-01-30  1:43               ` Vladimir 'φ-coder/phcoder' Serbinenko
  3 siblings, 1 reply; 26+ messages in thread
From: Richard Laager @ 2012-01-30  1:22 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko
  Cc: grub-devel, Zachary Bedell

[-- Attachment #1: Type: text/plain, Size: 347 bytes --]

On Sat, 2012-01-28 at 13:51 +0100, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> As for /proc/mounts : is there a reason to suppose that /proc/mounts
> would work when /proc/self/mountinfo doesn't

/proc/self/mountinfo doesn't exist on older Linux kernels. What is
upstream GRUB's minimum required version of the Linux kernel?

-- 
Richard



* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-30  1:22             ` Richard Laager
@ 2012-01-30  1:43               ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 0 replies; 26+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2012-01-30  1:43 UTC (permalink / raw)
  To: Richard Laager; +Cc: grub-devel, Zachary Bedell

On 30.01.2012 02:22, Richard Laager wrote:
> On Sat, 2012-01-28 at 13:51 +0100, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
>> As for /proc/mounts : is there a reason to suppose that /proc/mounts
>> would work when /proc/self/mountinfo doesn't
> /proc/self/mountinfo doesn't exist on older Linux kernels. What is
> upstream GRUB's minimum required version of the Linux kernel?
>
Any kernel in the 2.x or 3.x line should work. However, on older ones we
just scan /dev to find a device with the same major/minor as the one we
got from stat (). It's possible to use mtab or /proc/mounts as a hint
while still checking for a major/minor match, but it's not important.
Also, I don't suppose anyone will use ZFS with older kernels.


-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko




* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-29 22:42               ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2012-01-31  8:45                 ` Richard Laager
  2012-02-02 11:13                   ` Richard Laager
  2012-02-03  9:52                   ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 2 replies; 26+ messages in thread
From: Richard Laager @ 2012-01-31  8:45 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko
  Cc: grub-devel, Zachary Bedell


[-- Attachment #1.1: Type: text/plain, Size: 485 bytes --]

Attached is a stack of trivial patches that apply *on top of* your
zfs.diff from 2012-01-29. They each deal with one logical change and
should be very easy to review.

After your original patch and this stack have been dealt with, I'll
submit an updated patch for native ZFS on Linux support, which is much
shorter than before.

                                                                     -- 
                                                                 Richard

[-- Attachment #1.2: zfs-on-linux-rlaager0.patch --]
[-- Type: text/x-patch, Size: 1271 bytes --]

Eliminate stray trailing spaces.

Index: grub/util/grub-probe.c
===================================================================
--- grub.orig/util/grub-probe.c	2012-01-31 01:17:56.913537617 -0600
+++ grub/util/grub-probe.c	2012-01-31 01:18:20.531207000 -0600
@@ -269,7 +269,7 @@
   else
     printf ("%s", dname);
   free (dname);
-} 
+}
 
 static void
 probe_abstraction (grub_disk_t disk)
@@ -345,8 +345,8 @@
       grub_util_pull_device (*curdev);
       ndev++;
     }
-  
-  drives_names = xmalloc (sizeof (drives_names[0]) * (ndev + 1)); 
+
+  drives_names = xmalloc (sizeof (drives_names[0]) * (ndev + 1));
 
   for (curdev = device_names, curdrive = drives_names; *curdev; curdev++,
        curdrive++)
@@ -368,7 +368,7 @@
       dev = grub_device_open (drives_names[0]);
       if (! dev)
 	grub_util_error ("%s", _(grub_errmsg));
-      
+
       fs = grub_fs_probe (dev);
       if (! fs)
 	grub_util_error ("%s", _(grub_errmsg));
@@ -470,7 +470,7 @@
 	  grub_device_close (dev);
 	  continue;
 	}
-      
+
       if ((print == PRINT_COMPATIBILITY_HINT || print == PRINT_BIOS_HINT
 	   || print == PRINT_IEEE1275_HINT || print == PRINT_BAREMETAL_HINT
 	   || print == PRINT_EFI_HINT || print == PRINT_ARC_HINT)

[-- Attachment #1.3: zfs-on-linux-rlaager1.patch --]
[-- Type: text/x-patch, Size: 782 bytes --]

Change device to devices in find_root_devices_from_libzfs

Index: grub/util/getroot.c
===================================================================
--- grub.orig/util/getroot.c	2012-01-31 00:34:31.014033000 -0600
+++ grub/util/getroot.c	2012-01-31 00:36:16.026730000 -0600
@@ -512,7 +512,7 @@
 static char **
 find_root_devices_from_libzfs (const char *dir)
 {
-  char **device = NULL;
+  char **devices = NULL;
   char *poolname;
   char *poolfs;
 
@@ -520,13 +520,13 @@
   if (! poolname)
     return NULL;
 
-  device = find_root_devices_from_poolname (poolname);
+  devices = find_root_devices_from_poolname (poolname);
 
   free (poolname);
   if (poolfs)
     free (poolfs);
 
-  return device;
+  return devices;
 }
 
 #ifdef __MINGW32__

[-- Attachment #1.4: zfs-on-linux-rlaager2.patch --]
[-- Type: text/x-patch, Size: 794 bytes --]

Change strlen to sizeof
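
For readers unfamiliar with the idiom, here is a small standalone sketch
of why sizeof ("/dev/") - 1 can stand in for strlen ("/dev/"). The helper
make_dev_path and the DEV_PREFIX macro are hypothetical and exist only
for this illustration. Unlike the hunk below, the sketch appends the
terminating NUL itself instead of assuming it is part of the copied data.

```c
#include <stdlib.h>
#include <string.h>

#define DEV_PREFIX "/dev/"

/* Hypothetical helper: return a newly allocated, NUL-terminated string
   made of "/dev/" followed by the first NAME_LEN bytes of NAME, or NULL
   if allocation fails.  */
static char *
make_dev_path (const char *name, size_t name_len)
{
  /* sizeof on a string literal counts the terminating NUL, so subtract
     one to get the length.  This is a compile-time constant.  */
  const size_t prefix_len = sizeof (DEV_PREFIX) - 1;
  char *path = malloc (prefix_len + name_len + 1);

  if (! path)
    return NULL;
  memcpy (path, DEV_PREFIX, prefix_len);
  memcpy (path + prefix_len, name, name_len);
  path[prefix_len + name_len] = '\0';
  return path;
}
```

The idiom is only valid for string literals and arrays; applied to a
char pointer, sizeof yields the size of the pointer instead.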

Index: grub/util/getroot.c
===================================================================
--- grub.orig/util/getroot.c	2012-01-31 00:36:16.026730000 -0600
+++ grub/util/getroot.c	2012-01-31 00:36:25.296311000 -0600
@@ -799,9 +799,9 @@
     grub_util_error (_("Storage name for `%s' not NUL-terminated"), dir);
 
   os_dev = xmalloc (2 * sizeof (os_dev[0]));
-  os_dev[0] = xmalloc (strlen ("/dev/") + data_len);
-  memcpy (os_dev[0], "/dev/", strlen ("/dev/"));
-  memcpy (os_dev[0] + strlen ("/dev/"), data, data_len);
+  os_dev[0] = xmalloc (sizeof ("/dev/") - 1 + data_len);
+  memcpy (os_dev[0], "/dev/", sizeof ("/dev/") - 1);
+  memcpy (os_dev[0] + sizeof ("/dev/") - 1, data, data_len);
   os_dev[1] = 0;
 
   if (ports && num_ports > 0)

[-- Attachment #1.5: zfs-on-linux-rlaager3.patch --]
[-- Type: text/x-patch, Size: 749 bytes --]

Avoid crashing when canonicalize_file_name() fails

I could reproduce this with: grub-probe /dev/sda1

Index: grub/util/getroot.c
===================================================================
--- grub.orig/util/getroot.c	2012-01-30 00:29:30.754018000 -0600
+++ grub/util/getroot.c	2012-01-30 23:31:58.941211000 -0600
@@ -842,7 +931,12 @@
 	{
 	  char *tmp = *cur;
 	  int root, dm;
-	  *cur = canonicalize_file_name (*cur);
+	  *cur = canonicalize_file_name (tmp);
+	  if (*cur == NULL)
+	    {
+	      grub_util_error (_("failed to get canonical path of %s"), tmp);
+	      break;
+	    }
 	  free (tmp);
 	  root = (strcmp (*cur, "/dev/root") == 0);
 	  dm = (strncmp (*cur, "/dev/dm-", sizeof ("/dev/dm-") - 1) == 0);

[-- Attachment #1.6: zfs-on-linux-rlaager4.patch --]
[-- Type: text/x-patch, Size: 478 bytes --]

Drop an unused variable.

Index: grub/util/getroot.c
===================================================================
--- grub.orig/util/getroot.c	2012-01-31 00:50:08.674484255 -0600
+++ grub/util/getroot.c	2012-01-31 00:50:24.200438000 -0600
@@ -293,7 +293,6 @@
 		&& !sscanf (name, "raidz%u", &dummy)
 		&& !strcmp (state, "ONLINE"))
 	      {
-		char *tmp;
 		if (ndevices >= devices_allocated)
 		  {
 		    devices_allocated = 2 * (devices_allocated + 8);

[-- Attachment #1.7: zfs-on-linux-rlaager5.patch --]
[-- Type: text/x-patch, Size: 2285 bytes --]

Add braces around and indent the `zpool status` parsing loop

The switch is a single statement, so this doesn't change the behavior
of the code, but it makes the structure clearer.  Also, if someone were
to add a debugging printf() after the switch, it would stay inside the
if as intended. ;)
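
A standalone sketch of the pitfall, using hypothetical functions that
are not part of util/getroot.c: each one returns how many times the
statement added after the switch ran, and stores in *HANDLED how many
entries the switch processed.

```c
/* Without braces the `if' governs only the switch, so a statement added
   after the switch runs on every loop iteration.  */
static int
hits_without_braces (const int *nfields, int n, int *handled)
{
  int hits = 0, i;

  *handled = 0;
  for (i = 0; i < n; i++)
    {
      if (nfields[i] >= 5)
        switch (nfields[i])
          {
          default:
            (*handled)++;
            break;
          }
      hits++;                   /* Not governed by the `if'.  */
    }
  return hits;
}

/* With braces the added statement is part of the `if' body.  */
static int
hits_with_braces (const int *nfields, int n, int *handled)
{
  int hits = 0, i;

  *handled = 0;
  for (i = 0; i < n; i++)
    {
      if (nfields[i] >= 5)
        {
          switch (nfields[i])
            {
            default:
              (*handled)++;
              break;
            }
          hits++;               /* Runs only when the `if' holds.  */
        }
    }
  return hits;
}
```

For the input { 6, 2, 5, 0 } both versions handle two entries, but the
unbraced one executes the added statement four times and the braced one
twice.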

Index: grub/util/getroot.c
===================================================================
--- grub.orig/util/getroot.c	2012-01-31 00:50:24.000000000 -0600
+++ grub/util/getroot.c	2012-01-31 00:51:15.352398000 -0600
@@ -274,36 +274,38 @@
 	
       if (sscanf (line, " %s %256s %256s %256s %256s %256s",
 		  name, state, readlen, writelen, cksum, notes) >= 5)
-	switch (st)
-	  {
-	  case 0:
-	    if (!strcmp (name, "NAME")
-		&& !strcmp (state, "STATE")
-		&& !strcmp (readlen, "READ")
-		&& !strcmp (writelen, "WRITE")
-		&& !strcmp (cksum, "CKSUM"))
-	      st++;
-	    break;
-	  case 1:
-	    if (!strcmp (name, poolname))
-	      st++;
-	    break;
-	  case 2:
-	    if (strcmp (name, "mirror") && !sscanf (name, "mirror-%u", &dummy)
-		&& !sscanf (name, "raidz%u", &dummy)
-		&& !strcmp (state, "ONLINE"))
-	      {
-		if (ndevices >= devices_allocated)
-		  {
-		    devices_allocated = 2 * (devices_allocated + 8);
-		    devices = xrealloc (devices, sizeof (devices[0])
-					* devices_allocated);
-		  }
-		devices[ndevices++] = xasprintf ("/dev/%s", name);
-	      }
-	    break;
-	  }
-	
+	{
+	  switch (st)
+	    {
+	    case 0:
+	      if (!strcmp (name, "NAME")
+		  && !strcmp (state, "STATE")
+		  && !strcmp (readlen, "READ")
+		  && !strcmp (writelen, "WRITE")
+		  && !strcmp (cksum, "CKSUM"))
+		st++;
+	      break;
+	    case 1:
+	      if (!strcmp (name, poolname))
+		st++;
+	      break;
+	    case 2:
+	      if (strcmp (name, "mirror") && !sscanf (name, "mirror-%u", &dummy)
+		  && !sscanf (name, "raidz%u", &dummy)
+		  && !strcmp (state, "ONLINE"))
+		{
+		  if (ndevices >= devices_allocated)
+		    {
+		      devices_allocated = 2 * (devices_allocated + 8);
+		      devices = xrealloc (devices, sizeof (devices[0])
+					  * devices_allocated);
+		    }
+		  devices[ndevices++] = xasprintf ("/dev/%s", name);
+	        }
+	      break;
+	    }
+	}
+
       free (line);
     }
 

[-- Attachment #1.8: zfs-on-linux-rlaager6.patch --]
[-- Type: text/x-patch, Size: 786 bytes --]

Handle pool names with trailing spaces

Index: grub/util/getroot.c
===================================================================
--- grub.orig/util/getroot.c	2012-01-31 01:05:12.387005000 -0600
+++ grub/util/getroot.c	2012-01-31 01:05:13.196100000 -0600
@@ -260,7 +260,7 @@
   char cksum[257], notes[257];
   unsigned int dummy;
 
-  cmd = xasprintf ("zpool status %s", poolname);
+  cmd = xasprintf ("zpool status \"%s\"", poolname);
   fp = popen (cmd, "r");
   free (cmd);
 
@@ -286,7 +286,8 @@
 		st++;
 	      break;
 	    case 1:
-	      if (!strcmp (name, poolname))
+              /* strncmp is used because pools can technically have trailing spaces. */
+	      if (!strncmp (name, poolname, strlen (name)))
 		st++;
 	      break;
 	    case 2:

[-- Attachment #1.9: zfs-on-linux-rlaager7.patch --]
[-- Type: text/x-patch, Size: 648 bytes --]

Handle all raidz types in `zpool status` output

Index: grub/util/getroot.c
===================================================================
--- grub.orig/util/getroot.c	2012-01-31 01:05:13.000000000 -0600
+++ grub/util/getroot.c	2012-01-31 01:06:53.856704000 -0600
@@ -293,6 +293,9 @@
 	    case 2:
 	      if (strcmp (name, "mirror") && !sscanf (name, "mirror-%u", &dummy)
 		  && !sscanf (name, "raidz%u", &dummy)
+		  && !sscanf (name, "raidz1-%u", &dummy)
+		  && !sscanf (name, "raidz2-%u", &dummy)
+		  && !sscanf (name, "raidz3-%u", &dummy)
 		  && !strcmp (state, "ONLINE"))
 		{
 		  if (ndevices >= devices_allocated)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-31  8:45                 ` Richard Laager
@ 2012-02-02 11:13                   ` Richard Laager
  2012-02-03 10:02                     ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-02-03  9:52                   ` Vladimir 'φ-coder/phcoder' Serbinenko
  1 sibling, 1 reply; 26+ messages in thread
From: Richard Laager @ 2012-02-02 11:13 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko
  Cc: grub-devel, Zachary Bedell


[-- Attachment #1.1: Type: text/plain, Size: 68 bytes --]

Attached are two more patches to add to the stack.

-- 
Richard

[-- Attachment #1.2: zfs-on-linux-rlaager10.patch --]
[-- Type: text/x-patch, Size: 548 bytes --]

Index: grub/util/grub-probe.c
===================================================================
--- grub.orig/util/grub-probe.c	2012-02-02 03:36:38.827815635 -0600
+++ grub/util/grub-probe.c	2012-02-02 03:39:22.727085000 -0600
@@ -323,6 +323,11 @@
   if (path != NULL)
     {
       grub_path = canonicalize_file_name (path);
+      if (! grub_path)
+	{
+	  grub_util_error (_("failed to get canonical path of %s"), path);
+	  return;
+	}
       device_names = grub_guess_root_devices (grub_path);
       free (grub_path);
     }

[-- Attachment #1.3: zfs-on-linux-rlaager11.patch --]
[-- Type: text/x-patch, Size: 572 bytes --]

Index: grub/util/grub-probe.c
===================================================================
--- grub.orig/util/grub-probe.c	2012-02-02 04:04:16.154167324 -0600
+++ grub/util/grub-probe.c	2012-02-02 04:07:11.916090000 -0600
@@ -363,6 +363,16 @@
     }
   *curdrive = 0;
 
+  if (print == PRINT_DRIVE)
+    {
+      for (curdrive = drives_names; *curdrive; curdrive++)
+	{
+	  printf ("(%s)", *curdrive);
+	  putchar (delim);
+	}
+      return;
+    }
+
   if (print == PRINT_FS || print == PRINT_FS_UUID
       || print == PRINT_FS_LABEL)
     {

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-01-31  8:45                 ` Richard Laager
  2012-02-02 11:13                   ` Richard Laager
@ 2012-02-03  9:52                   ` Vladimir 'φ-coder/phcoder' Serbinenko
  2012-02-03 11:20                     ` Richard Laager
  1 sibling, 1 reply; 26+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2012-02-03  9:52 UTC (permalink / raw)
  To: Richard Laager; +Cc: grub-devel, Zachary Bedell

On 31.01.2012 09:45, Richard Laager wrote:
> Attached is a stack of trivial patches that apply *on top of* your
> zfs.diff from 2012-01-29. They each deal with one logical change and
> should be very easy to review.
>
> After your original patch and this stack have been dealt with, I'll
> submit an updated patch for native ZFS on Linux support, which is much
> shorter than before.
>
> -- 
> Richard
>
> zfs-on-linux-rlaager0.patch
>
>
> Eliminate stray trailing spaces.
We don't do this manually; from time to time we run it automatically on 
the whole codebase.

>
> zfs-on-linux-rlaager1.patch
>
>
> Change device to devices in find_root_devices_from_libzfs
>
>
>
committed
> zfs-on-linux-rlaager2.patch
>
>
> Change strlen to sizeof
It doesn't matter, since GCC folds strlen of a constant string into a 
constant. But applied for conformity with the rest of the codebase.
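
A minimal sketch of why the two spellings are equivalent for a string
literal. The helper names and the DM_PREFIX macro are hypothetical and
not taken from GRUB; only the "/dev/dm-" prefix check mirrors the
getroot.c context shown in the patches.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro for illustration; getroot.c spells the literal inline. */
#define DM_PREFIX "/dev/dm-"

/* Prefix test using sizeof.  sizeof on a string literal counts the
   terminating NUL, hence the "- 1".  */
static int
has_dm_prefix_sizeof (const char *name)
{
  assert (name != NULL);
  return strncmp (name, DM_PREFIX, sizeof (DM_PREFIX) - 1) == 0;
}

/* Same test using strlen.  Compilers commonly fold this call on a
   literal into a constant, but the language does not require it.  */
static int
has_dm_prefix_strlen (const char *name)
{
  assert (name != NULL);
  return strncmp (name, DM_PREFIX, strlen (DM_PREFIX)) == 0;
}
```

Both helpers give the same answer for any input; the sizeof form simply
guarantees a compile-time length regardless of optimization level.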
>
> zfs-on-linux-rlaager3.patch
>
>
> Avoid crashing when canonicalize_file_name() fails
>
> I could reproduce this with: grub-probe /dev/sda1
>
>
>
Committed
> zfs-on-linux-rlaager4.patch
>
>
> Drop an unused variable.
>
>
>
committed.
> zfs-on-linux-rlaager5.patch
>
>
> Add braces around and indent the `zpool status` parsing loop
Not needed. And we don't add braces that are not needed.

>
> zfs-on-linux-rlaager6.patch
>
>
> Handle pool names with trailing spaces
What about the ones with spaces in the middle? It feels like the logic 
is broken elsewhere, and using strncmp is just a workaround which works 
only for one particular case.
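
A small sketch of the concern, under the assumption that the first
column is read with a whitespace-delimited sscanf conversion as in the
parsing loop. The helper names are hypothetical; prefix_match only
reproduces the comparison from the rlaager6 patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the first whitespace-delimited column of a status line.
   NAME must have room for at least 257 bytes.  Returns 1 on success.  */
static int
first_column (const char *line, char *name)
{
  assert (line != NULL && name != NULL);
  return sscanf (line, " %256s", name) == 1;
}

/* The comparison from the rlaager6 patch: NAME only has to be a
   prefix of POOLNAME.  */
static int
prefix_match (const char *name, const char *poolname)
{
  return strncmp (name, poolname, strlen (name)) == 0;
}
```

With a pool called "my pool", first_column yields just "my", which the
prefix test accepts. The same test also accepts "tank" when the pool is
really "tank2", so it is looser than an exact comparison.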
> zfs-on-linux-rlaager7.patch
>
>
> Handle all raidz types in `zpool status` output
>
>
Committed.

-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko




* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-02-02 11:13                   ` Richard Laager
@ 2012-02-03 10:02                     ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 0 replies; 26+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2012-02-03 10:02 UTC (permalink / raw)
  To: Richard Laager; +Cc: grub-devel, Zachary Bedell

On 02.02.2012 12:13, Richard Laager wrote:
> Attached are two more patches to add to the stack.
Committed (see below for comments). Next time, please attach a ChangeLog 
entry and use -p with diff.
> -- Richard
>
> zfs-on-linux-rlaager10.patch
>
>
> Index: grub/util/grub-probe.c
> ===================================================================
> --- grub.orig/util/grub-probe.c	2012-02-02 03:36:38.827815635 -0600
> +++ grub/util/grub-probe.c	2012-02-02 03:39:22.727085000 -0600
> @@ -323,6 +323,11 @@
>     if (path != NULL)
>       {
>         grub_path = canonicalize_file_name (path);
> +      if (! grub_path)
> +	{
> +	  grub_util_error (_("failed to get canonical path of %s"), path);
> +	  return;
> +	}
No need to return. grub_util_error never returns.
> Index: grub/util/grub-probe.c
> ===================================================================
> --- grub.orig/util/grub-probe.c	2012-02-02 04:04:16.154167324 -0600
> +++ grub/util/grub-probe.c	2012-02-02 04:07:11.916090000 -0600
> @@ -363,6 +363,16 @@
>       }
>     *curdrive = 0;
>
> +  if (print == PRINT_DRIVE)
> +    {
> +      for (curdrive = drives_names; *curdrive; curdrive++)
> +	{
> +	  printf ("(%s)", *curdrive);
> +	  putchar (delim);
> +	}
> +      return;
It should be goto end;
> +    }
> +
>     if (print == PRINT_FS || print == PRINT_FS_UUID
>         || print == PRINT_FS_LABEL)
>       {
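
A sketch of why an early exit would go through a shared label,
presumably so that the cleanup at the end of the function still runs.
Everything here is hypothetical (probe_sketch, the tracked_* helpers and
the allocation counter); it does not reproduce grub-probe.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts outstanding allocations so that a leak is observable.  */
static int live_allocations;

static char *
tracked_strdup (const char *s)
{
  char *copy = malloc (strlen (s) + 1);
  assert (copy != NULL);
  strcpy (copy, s);
  live_allocations++;
  return copy;
}

static void
tracked_free (char *p)
{
  free (p);
  live_allocations--;
}

/* Copy N names, optionally stop early, and always release the copies.
   Returns the number of names handled on the early path, else 0.  */
static int
probe_sketch (const char *const *names, size_t n, int drives_only)
{
  int handled = 0;
  size_t i;
  char **copies = calloc (n, sizeof (copies[0]));

  assert (copies != NULL || n == 0);
  for (i = 0; i < n; i++)
    copies[i] = tracked_strdup (names[i]);

  if (drives_only)
    {
      handled = (int) n;
      goto end;   /* A bare "return" here would skip the frees below.  */
    }

  /* Further probing would go here in a fuller example.  */

 end:
  for (i = 0; i < n; i++)
    tracked_free (copies[i]);
  free (copies);
  return handled;
}
```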


-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko




* Re: [Patch] Robustly search for ZFS labels & uberblocks
  2012-02-03  9:52                   ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2012-02-03 11:20                     ` Richard Laager
  0 siblings, 0 replies; 26+ messages in thread
From: Richard Laager @ 2012-02-03 11:20 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko; +Cc: grub-devel


[-- Attachment #1.1: Type: text/plain, Size: 519 bytes --]

On Fri, 2012-02-03 at 10:52 +0100, Vladimir 'φ-coder/phcoder' Serbinenko
wrote:
> > zfs-on-linux-rlaager6.patch
> >
> >
> > Handle pool names with trailing spaces
> What about the ones with spaces in the middle? It feels like the logic 
> is broken elsewhere, and using strncmp is just a workaround which works 
> only for one particular case.

I've attached a new patch. I've tested it with both cases.

I've also attached a patch that handles `zpool status` outputting full
device paths.
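
For readers skimming the thread, the full-path handling amounts to the
following sketch. The helper name vdev_to_device_path is hypothetical,
and it uses plain malloc and snprintf instead of GRUB's xstrdup and
xasprintf so that it stands alone.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Map a vdev name from `zpool status` to a device path.  Names that
   already start with '/' are copied as-is; others get "/dev/"
   prepended.  Returns a malloc'd string, or NULL on allocation
   failure.  */
static char *
vdev_to_device_path (const char *name)
{
  size_t len;
  char *path;

  assert (name != NULL);
  /* sizeof ("/dev/") covers the prefix plus the terminating NUL.  */
  len = strlen (name) + sizeof ("/dev/");
  path = malloc (len);
  if (! path)
    return NULL;

  if (name[0] == '/')
    snprintf (path, len, "%s", name);
  else
    snprintf (path, len, "/dev/%s", name);
  return path;
}
```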

-- 
Richard

[-- Attachment #1.2: zfs-devices.patch --]
[-- Type: text/x-patch, Size: 573 bytes --]

Handle vdevs with full paths

Index: util/getroot.c
===================================================================
--- util/getroot.c	2012-02-03 04:21:29.199249594 -0600
+++ util/getroot.c	2012-02-03 05:02:55.693211000 -0600
@@ -302,7 +301,10 @@
 		    devices = xrealloc (devices, sizeof (devices[0])
 					* devices_allocated);
 		  }
-		devices[ndevices++] = xasprintf ("/dev/%s", name);
+		if (name[0] == '/')
+		  devices[ndevices++] = xstrdup (name);
+		else
+		  devices[ndevices++] = xasprintf ("/dev/%s", name);
 	      }
 	    break;
 	  }

[-- Attachment #1.3: zfs-poolname-spaces.patch --]
[-- Type: text/x-patch, Size: 1079 bytes --]

Handle pool names with spaces

Index: util/getroot.c
===================================================================
--- util/getroot.c	2012-02-03 04:21:29.199249594 -0600
+++ util/getroot.c	2012-02-03 05:02:55.693211000 -0600
@@ -260,7 +260,7 @@
   char cksum[257], notes[257];
   unsigned int dummy;
 
-  cmd = xasprintf ("zpool status %s", poolname);
+  cmd = xasprintf ("zpool status \"%s\"", poolname);
   fp = popen (cmd, "r");
   free (cmd);
 
@@ -285,8 +285,7 @@
 	      st++;
 	    break;
 	  case 1:
-	    if (!strcmp (name, poolname))
-	      st++;
+	    st++;
 	    break;
 	  case 2:
 	    if (strcmp (name, "mirror") && !sscanf (name, "mirror-%u", &dummy)
@@ -420,6 +422,9 @@
       if (sscanf (sep, "%s %s", entry.fstype, entry.device) != 2)
 	continue;
 
+      unescape (entry.fstype);
+      unescape (entry.device);
+
       /* Using the mount IDs, find out where this fits in the list of
 	 visible mount entries we've seen so far.  There are three
 	 interesting cases.  Firstly, it may be inserted at the end: this is

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


end of thread

Thread overview: 26+ messages
2011-09-19 18:45 [Patch] Robustly search for ZFS labels & uberblocks Zachary Bedell
2011-09-28 21:20 ` Vladimir 'φ-coder/phcoder' Serbinenko
2012-01-19 11:36   ` Richard Laager
2012-01-22 14:18     ` Vladimir 'φ-coder/phcoder' Serbinenko
2012-01-22 20:31       ` Richard Laager
2012-01-24  7:12       ` Richard Laager
2012-01-27 19:04       ` Zachary Bedell
2012-01-27 22:22         ` Vladimir 'φ-coder/phcoder' Serbinenko
2012-01-28  2:50         ` Richard Laager
2012-01-28 12:51           ` Vladimir 'φ-coder/phcoder' Serbinenko
2012-01-28 16:50             ` Richard Laager
2012-01-28 17:06               ` Darik Horn
2012-01-28 17:39                 ` Vladimir 'φ-coder/phcoder' Serbinenko
2012-01-28 18:33             ` Richard Laager
2012-01-28 19:21               ` Vladimir 'φ-coder/phcoder' Serbinenko
2012-01-29 22:42               ` Vladimir 'φ-coder/phcoder' Serbinenko
2012-01-31  8:45                 ` Richard Laager
2012-02-02 11:13                   ` Richard Laager
2012-02-03 10:02                     ` Vladimir 'φ-coder/phcoder' Serbinenko
2012-02-03  9:52                   ` Vladimir 'φ-coder/phcoder' Serbinenko
2012-02-03 11:20                     ` Richard Laager
2012-01-28 18:40             ` Darik Horn
2012-01-28 19:27               ` Vladimir 'φ-coder/phcoder' Serbinenko
2012-01-30  1:22             ` Richard Laager
2012-01-30  1:43               ` Vladimir 'φ-coder/phcoder' Serbinenko
2011-11-03 14:45 ` Vladimir 'φ-coder/phcoder' Serbinenko
