From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from list by lists.gnu.org with archive (Exim 4.71) id 1RLyY7-0000hO-6s for mharc-grub-devel@gnu.org; Thu, 03 Nov 2011 10:45:47 -0400 Received: from eggs.gnu.org ([140.186.70.92]:57043) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RLyY1-0000gx-3t for grub-devel@gnu.org; Thu, 03 Nov 2011 10:45:44 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RLyXx-0004TA-RN for grub-devel@gnu.org; Thu, 03 Nov 2011 10:45:41 -0400 Received: from mail-wy0-f169.google.com ([74.125.82.169]:53476) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RLyXx-0004T3-8a for grub-devel@gnu.org; Thu, 03 Nov 2011 10:45:37 -0400 Received: by wyg24 with SMTP id 24so1620146wyg.0 for ; Thu, 03 Nov 2011 07:45:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=message-id:date:from:user-agent:mime-version:to:subject:references :in-reply-to:x-enigmail-version:content-type; bh=IfDA4yyfkscsp5bsmN49Q2V+KKtNXdIRB4+IhBa33yI=; b=BcH4qyjnBDkWda4LphMvxg/JWYZ+3a/4MNWBXm30C5C/cd6InDvaa17ofZIfIIZG7h Z+ohFHvcx8ZaToqnGvm9xa3bqMyi/7oFxSZI7muWSQXf0lyjdP1/sbSNA2DNTxhMU7X8 kNZ0zLCG8VOwC2yGwVGHRzXXTyorzpbYMjCq4= Received: by 10.216.220.223 with SMTP id o73mr338669wep.89.1320331536188; Thu, 03 Nov 2011 07:45:36 -0700 (PDT) Received: from debian.x201.phnet (gprs45.swisscom-mobile.ch. [193.247.250.45]) by mx.google.com with ESMTPS id a27sm10550561wbp.16.2011.11.03.07.45.33 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 03 Nov 2011 07:45:35 -0700 (PDT) Message-ID: <4EB2A8FE.5060509@gmail.com> Date: Thu, 03 Nov 2011 15:45:18 +0100 From: =?UTF-8?B?VmxhZGltaXIgJ8+GLWNvZGVyL3BoY29kZXInIFNlcmJpbmVua28=?= User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20111010 Iceowl/1.0b2 Icedove/3.1.15 MIME-Version: 1.0 To: grub-devel@gnu.org Subject: Re: [Patch] Robustly search for ZFS labels & uberblocks References: In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="------------enig60F57D4F7186C095BFD8FA22" X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Received-From: 74.125.82.169 X-BeenThere: grub-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: The development of GNU GRUB List-Id: The development of GNU GRUB List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Nov 2011 14:45:44 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig60F57D4F7186C095BFD8FA22 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On 19.09.2011 20:45, Zachary Bedell wrote: > Greetings, > > In working with ZFSonLinux, I found myself with a number of somewhat in= constant pools which failed either when running grub-probe or at boot tim= e. In all cases, these pools were importable by the ZFS driver and compl= eted a `zpool scrub` with no errors reported, but still Grub wasn't able = to touch the pools. In most (but not all) cases, running `zpool scrub` m= ade the pool accessible to Grub again, though obviously that was less tha= n helpful for boot-time failures. > Patch is pretty difficult to read due to spacing changes. Please use -bB or indent > I'm including an (unfortunately rather large) patch to Grub's ZFS code = which fixes several edge cases where Grub is unable to read a pool which = is otherwise valid as far as the full ZFS driver is concerned. Changes i= nclude: > > ashift: > > * Support non-default values of ashift pool attribute. When ashift is = increased beyond 10, the size of the uberblocks changes. Scan to find ub= erblocks must account for this and skip over the extra padding. Based on= patch to Grub 0.97 by Hans Rosenfeld at http://www.illumos.org/issues/13= 03 > > > Label parsing changes: > > * Access the two end-device labels at label-size aligned location rathe= r than device_end-(2*label_size). On-disk spec document incorrectly desc= ribes end-label locations. Adapted from Grub 0.97 behavior. > > * Scan all readable labels for uberblocks and accept the one with the h= ighest txg/create date. Previously UB scan would stop if a valid UB was = found in Label 0 without checking for newer UB's in the other labels. Al= l labels must be scanned as a more recent block may be found in the other= labels in certain circumstances. This fixes a case where Grub would be = unable to access a ZFS system unless the pool was scrubbed first (but ZFS= itself saw no problems & scrub reported zero errors). > > * Verify that uberblock txg is greater than that of the label before ac= cepting the UB so that partially written UB's are not considered. > > * Verify that underlying data pointed by uberblock ub_rootbp has valid = checksums before accepting the UB. This gives some possibility of bootin= g from a technically invalid pool and effectively falls back to older ube= rblocks in a case where the correct uberblock is damaged. > > * Store pool uuid found during zfs_mount to grub_zfs_data in order to p= revent duplicated logic in zfs_uuid. > > > Data integrity: > > * Verify checksum in grub_read_data before accepting block. Attempt to= read backup DVA's if checksum on first fails. Previously backup DVA's w= ere checked only if the physical read failed, not if bad data was read wi= thout error. This resulted in pools which were valid and readable by ZFS= (and transparently healed by ZFS' read behavior) being rejected by Grub.= > > > Logging/Debugging: > > * Provide more debugging output describing inconsistencies found in poo= l. > > * Remove superflous debugging output to reduce bootup time in verbose m= ode to a mangeable level (~30min down to ~5min to boot w/ 'debug=3Dall' i= n grub.cfg) > > > > I do apologies for the monolithic patch. I took a second look at the c= hanges to try to pull certain elements apart, but I ended up in several s= ituations where the test pools I have snapshotted in VM's each hit two or= more of the issues above making testing in isolation difficult. The tot= al change set is able to probe and boot from all of the odd (as well as a= ll of the normal) pools that I have on hand. =20 > > If remixing this patch in some way would help in integrating it, I'd we= lcome any feedback on how to better present the changes. > > Best regards, > Zac Bedell > > > grub-zfs-ashift-label.patch > > > diff --git a/grub-core/fs/zfs/zfs.c b/grub-core/fs/zfs/zfs.c > index 1eea13b..d03e19b 100644 > --- a/grub-core/fs/zfs/zfs.c > +++ b/grub-core/fs/zfs/zfs.c > @@ -75,9 +75,21 @@ GRUB_MOD_LICENSE ("GPLv3+"); > static grub_dl_t my_mod; > #endif > =20 > +/* > + * Macros to get fields in a bp or DVA. > + */ > #define P2PHASE(x, align) ((x) & ((align) - 1)) > #define DVA_OFFSET_TO_PHYS_SECTOR(offset) \ > ((offset + VDEV_LABEL_START_SIZE) >> SPA_MINBLOCKSHIFT) > +=09 > +/* > + * return x rounded down to an align boundary > + * eg, P2ALIGN(1200, 1024) =3D=3D 1024 (1*align) > + * eg, P2ALIGN(1024, 1024) =3D=3D 1024 (1*align) > + * eg, P2ALIGN(0x1234, 0x100) =3D=3D 0x1200 (0x12*align) > + * eg, P2ALIGN(0x5600, 0x100) =3D=3D 0x5600 (0x56*align) > + */ > +#define P2ALIGN(x, align) ((x) & -(align)) > =20 We already have such macros, they're called ALIGN_UP and ALIGN_DOWN. > /* > * FAT ZAP data structures > @@ -147,6 +159,14 @@ struct grub_zfs_data > grub_uint64_t file_start; > grub_uint64_t file_end; > =20 > + /* XXX: ashift is per vdev, not per pool. We currently only ever to= uch > + * a single vdev, but when/if raid-z or stripes are supported, this > + * may need revision. > + */ > + grub_uint64_t vdev_ashift; > + grub_uint64_t label_txg; > + grub_uint64_t pool_guid; > + We have multidevice support and we have a special structure for it. Moreover it seems that you need it only on initial scan and so no need to permanently store it. > /* cache for a dnode block */ > dnode_phys_t *dnode_buf; > dnode_phys_t *dnode_mdn; > @@ -192,7 +212,11 @@ static decomp_entry_t decomp_table[ZIO_COMPRESS_FU= NCTIONS] =3D { > =20 > static grub_err_t zio_read_data (blkptr_t * bp, grub_zfs_endian_t endi= an, > void *buf, struct grub_zfs_data *data); > - > + =09 > +static grub_err_t > +zio_read (blkptr_t * bp, grub_zfs_endian_t endian, void **buf,=20 > + grub_size_t *size, struct grub_zfs_data *data); > +=09 > /* > * Our own version of log2(). Same thing as highbit()-1. > */ > @@ -271,6 +295,7 @@ zio_checksum_verify (zio_cksum_t zc, grub_uint32_t = checksum, > || (actual_cksum.zc_word[2] !=3D zc.zc_word[2])=20 > || (actual_cksum.zc_word[3] !=3D zc.zc_word[3])) > { > + /* > grub_dprintf ("zfs", "checksum %d verification failed\n", checks= um); > grub_dprintf ("zfs", "actual checksum %16llx %16llx %16llx %16ll= x\n", > (unsigned long long) actual_cksum.zc_word[0],=20 > @@ -282,6 +307,7 @@ zio_checksum_verify (zio_cksum_t zc, grub_uint32_t = checksum, > (unsigned long long) zc.zc_word[1], > (unsigned long long) zc.zc_word[2],=20 > (unsigned long long) zc.zc_word[3]); > + */ Please don't comment out the code. This particular one is very useful actually. > return grub_error (GRUB_ERR_BAD_FS, "checksum verification faile= d"); > } > =20 > @@ -332,17 +358,22 @@ vdev_uberblock_compare (uberblock_t * ub1, uberbl= ock_t * ub2) > * Three pieces of information are needed to verify an uberblock: the = magic > * number, the version number, and the checksum. > * > - * Currently Implemented: version number, magic number > + * Currently Implemented: version number, magic number, label txg > * Need to Implement: checksum > * > */ > static grub_err_t > -uberblock_verify (uberblock_phys_t * ub, int offset) > +uberblock_verify (uberblock_t * uber, int offset, struct grub_zfs_data= *data) > { > - uberblock_t *uber =3D &ub->ubp_uberblock; > grub_err_t err; > grub_zfs_endian_t endian =3D UNKNOWN_ENDIAN; > zio_cksum_t zc; > + void *osp =3D NULL; > + grub_size_t ospsize; > + > + if (uber->ub_txg < data->label_txg) > + return grub_error (GRUB_ERR_BAD_FS,=20 > + "ignoring partially written label: uber_txg < label_txg"); > =20 > if (grub_zfs_to_cpu64 (uber->ub_magic, LITTLE_ENDIAN) =3D=3D UBERBLO= CK_MAGIC > && grub_zfs_to_cpu64 (uber->ub_version, LITTLE_ENDIAN) > 0=20 > @@ -358,10 +389,21 @@ uberblock_verify (uberblock_phys_t * ub, int offs= et) > return grub_error (GRUB_ERR_BAD_FS, "invalid uberblock magic"); > =20 > grub_memset (&zc, 0, sizeof (zc)); > - > zc.zc_word[0] =3D grub_cpu_to_zfs64 (offset, endian); > err =3D zio_checksum_verify (zc, ZIO_CHECKSUM_LABEL, endian, > - (char *) ub, UBERBLOCK_SIZE); > + (char *) uber, UBERBLOCK_SIZE(data->vdev_ashift)); > + > + if (!err) > + { > + // Check that the data pointed by the rootbp is usable. > + void *osp =3D NULL; > + grub_size_t ospsize; > + err =3D zio_read (&uber->ub_rootbp, endian, &osp, &ospsize, data= ); > + grub_free (osp); > + > + if (!err && ospsize < OBJSET_PHYS_SIZE_V14) > + return grub_error (GRUB_ERR_BAD_FS, "uberblock rootbp points to inva= lid data"); > + } > =20 This should be if (err) return err; .... > return err; > } > @@ -372,32 +414,40 @@ uberblock_verify (uberblock_phys_t * ub, int offs= et) > * Success - Pointer to the best uberblock. > * Failure - NULL > */ > -static uberblock_phys_t * > -find_bestub (uberblock_phys_t * ub_array, grub_disk_addr_t sector) > +static uberblock_t * > +find_bestub (char *ub_array, struct grub_zfs_data *data) > { > - uberblock_phys_t *ubbest =3D NULL; > - int i; > - grub_disk_addr_t offset; > + const grub_disk_addr_t sector =3D data->vdev_phys_sector; > + uberblock_t *ubbest =3D NULL; > + uberblock_t *ubnext; > + unsigned int i, offset, pickedub =3D 0; > grub_err_t err =3D GRUB_ERR_NONE; > =20 > - for (i =3D 0; i < (VDEV_UBERBLOCK_RING >> VDEV_UBERBLOCK_SHIFT); i++= ) > + const unsigned int UBCOUNT =3D UBERBLOCK_COUNT (data->vdev_ashift); > + const grub_uint64_t UBBYTES =3D UBERBLOCK_SIZE (data->vdev_ashift); > + > + for (i =3D 0; i < UBCOUNT; i++) > { > - offset =3D (sector << SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE > - + (i << VDEV_UBERBLOCK_SHIFT); > + ubnext =3D (uberblock_t *) (i * UBBYTES + ub_array); > + offset =3D (sector << SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE + (i *= UBBYTES); > =20 > - err =3D uberblock_verify (&ub_array[i], offset); > + err =3D uberblock_verify (ubnext, offset, data); > if (err) > - { > - grub_errno =3D GRUB_ERR_NONE; > - continue; > - } > - if (ubbest =3D=3D NULL=20 > - || vdev_uberblock_compare (&(ub_array[i].ubp_uberblock), > - &(ubbest->ubp_uberblock)) > 0) > - ubbest =3D &ub_array[i]; > + { > + grub_errno =3D GRUB_ERR_NONE; > + continue; > + } > + if (ubbest =3D=3D NULL || vdev_uberblock_compare (ubnext, ubbest= ) > 0) > + { > + ubbest =3D ubnext; > + pickedub =3D i; > + } > } > if (!ubbest) > grub_errno =3D err; > + else > + grub_dprintf ("zfs", "Found best uberblock at idx %d, txg %llu\n",= > + pickedub, (unsigned long long) ubbest->ub_txg); > =20 Again the same error. Don't put such if's. > return (ubbest); > } > @@ -412,9 +462,6 @@ get_psize (blkptr_t * bp, grub_zfs_endian_t endian)= > static grub_uint64_t > dva_get_offset (dva_t * dva, grub_zfs_endian_t endian) > { > - grub_dprintf ("zfs", "dva=3D%llx, %llx\n",=20 > - (unsigned long long) dva->dva_word[0],=20 > - (unsigned long long) dva->dva_word[1]); > return grub_zfs_to_cpu64 ((dva)->dva_word[1],=20 > endian) << SPA_MINBLOCKSHIFT; > } Please don't mix dprintf removals with actual changes. Just makes more difficult to read. > @@ -440,11 +487,9 @@ zio_read_gang (blkptr_t * bp, grub_zfs_endian_t en= dian, dva_t * dva, void *buf, > zio_gb =3D grub_malloc (SPA_GANGBLOCKSIZE); > if (!zio_gb) > return grub_errno; > - grub_dprintf ("zfs", endian =3D=3D LITTLE_ENDIAN ? "little-endian ga= ng\n" > - :"big-endian gang\n"); > + > offset =3D dva_get_offset (dva, endian); > sector =3D DVA_OFFSET_TO_PHYS_SECTOR (offset); > - grub_dprintf ("zfs", "offset=3D%llx\n", (unsigned long long) offset)= ; > =20 > /* read in the gang block header */ > err =3D grub_disk_read (data->disk, sector, 0, SPA_GANGBLOCKSIZE, > @@ -515,10 +560,20 @@ zio_read_data (blkptr_t * bp, grub_zfs_endian_t e= ndian, void *buf, > sector =3D DVA_OFFSET_TO_PHYS_SECTOR (offset); > err =3D grub_disk_read (data->disk, sector, 0, psize, buf);=20 > } > - if (!err) > - return GRUB_ERR_NONE; > - grub_errno =3D GRUB_ERR_NONE; > - } > + > + if (!err) > + { > + // Check the underlying checksum before we rule this DVA as "good"= > + grub_uint32_t checkalgo =3D (grub_zfs_to_cpu64 ((bp)->blk_prop, en= dian) >> 40) & 0xff; > + > + err =3D zio_checksum_verify (bp->blk_cksum, checkalgo, endian, buf= , psize); > + if (!err) > + return GRUB_ERR_NONE; > + } > +=09 > + // If read failed or checksum bad, reset the error. Hopefully = we've got some more DVA's to try. You need to correctly handle the case when you don't. See e.g. the mirror reading code. Also we don't use C++ style comments. I fail to see the structure and need of this checksum move. > + grub_errno =3D GRUB_ERR_NONE; > + } > =20 > if (!err) > err =3D grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid DVA");= > @@ -539,12 +594,9 @@ zio_read (blkptr_t * bp, grub_zfs_endian_t endian,= void **buf, > unsigned int comp; > char *compbuf =3D NULL; > grub_err_t err; > - zio_cksum_t zc =3D bp->blk_cksum; > - grub_uint32_t checksum; > =20 > *buf =3D NULL; > =20 > - checksum =3D (grub_zfs_to_cpu64((bp)->blk_prop, endian) >> 40) & 0xf= f; > comp =3D (grub_zfs_to_cpu64((bp)->blk_prop, endian)>>32) & 0xff; > lsize =3D (BP_IS_HOLE(bp) ? 0 : > (((grub_zfs_to_cpu64 ((bp)->blk_prop, endian) & 0xffff) + 1) > @@ -571,7 +623,6 @@ zio_read (blkptr_t * bp, grub_zfs_endian_t endian, = void **buf, > else > compbuf =3D *buf =3D grub_malloc (lsize); > =20 > - grub_dprintf ("zfs", "endian =3D %d\n", endian); > err =3D zio_read_data (bp, endian, compbuf, data); > if (err) > { > @@ -580,15 +631,6 @@ zio_read (blkptr_t * bp, grub_zfs_endian_t endian,= void **buf, > return err; > } > =20 > - err =3D zio_checksum_verify (zc, checksum, endian, compbuf, psize); > - if (err) > - { > - grub_dprintf ("zfs", "incorrect checksum\n"); > - grub_free (compbuf); > - *buf =3D NULL; > - return err; > - } > - > if (comp !=3D ZIO_COMPRESS_OFF) > { > *buf =3D grub_malloc (lsize); > @@ -635,7 +677,6 @@ dmu_read (dnode_end_t * dn, grub_uint64_t blkid, vo= id **buf, > endian =3D dn->endian; > for (level =3D dn->dn.dn_nlevels - 1; level >=3D 0; level--) > { > - grub_dprintf ("zfs", "endian =3D %d\n", endian); > idx =3D (blkid >> (epbs * level)) & ((1 << epbs) - 1); > *bp =3D bp_array[idx]; > if (bp_array !=3D dn->dn.dn_blkptr) > @@ -661,12 +702,10 @@ dmu_read (dnode_end_t * dn, grub_uint64_t blkid, = void **buf, > } > if (level =3D=3D 0) > { > - grub_dprintf ("zfs", "endian =3D %d\n", endian); > err =3D zio_read (bp, endian, buf, 0, data); > endian =3D (grub_zfs_to_cpu64 (bp->blk_prop, endian) >> 63) & 1; > break; > } > - grub_dprintf ("zfs", "endian =3D %d\n", endian); > err =3D zio_read (bp, endian, &tmpbuf, 0, data); > endian =3D (grub_zfs_to_cpu64 (bp->blk_prop, endian) >> 63) & 1;= > if (err) > @@ -717,9 +756,6 @@ mzap_iterate (mzap_phys_t * zapobj, grub_zfs_endian= _t endian, int objsize, > chunks =3D objsize / MZAP_ENT_LEN - 1; > for (i =3D 0; i < chunks; i++) > { > - grub_dprintf ("zfs", "zap: name =3D %s, value =3D %llx, cd =3D %= x\n", > - mzap_ent[i].mze_name, (long long)mzap_ent[i].mze_value, > - (int)mzap_ent[i].mze_cd); > if (hook (mzap_ent[i].mze_name,=20 > grub_zfs_to_cpu64 (mzap_ent[i].mze_value, endian))) > return 1; > @@ -849,8 +885,6 @@ zap_leaf_lookup (zap_leaf_phys_t * l, grub_zfs_endi= an_t endian, > if (grub_zfs_to_cpu64 (le->le_hash,endian) !=3D h) > continue; > =20 > - grub_dprintf ("zfs", "fzap: length %d\n", (int) le->le_name_leng= th); > - > if (zap_leaf_array_equal (l, endian, blksft,=20 > grub_zfs_to_cpu16 (le->le_name_chunk,endian), > grub_zfs_to_cpu16 (le->le_name_length, endian), > @@ -1050,22 +1084,16 @@ zap_lookup (dnode_end_t * zap_dnode, char *name= , grub_uint64_t * val, > return err; > block_type =3D grub_zfs_to_cpu64 (*((grub_uint64_t *) zapbuf), endia= n); > =20 > - grub_dprintf ("zfs", "zap read\n"); > - > if (block_type =3D=3D ZBT_MICRO) > { > - grub_dprintf ("zfs", "micro zap\n"); > err =3D (mzap_lookup (zapbuf, endian, size, name, val)); > - grub_dprintf ("zfs", "returned %d\n", err); =20 > grub_free (zapbuf); > return err; > } > else if (block_type =3D=3D ZBT_HEADER) > { > - grub_dprintf ("zfs", "fat zap\n"); > /* this is a fat zap */ > err =3D (fzap_lookup (zap_dnode, zapbuf, name, val, data)); > - grub_dprintf ("zfs", "returned %d\n", err); =20 > grub_free (zapbuf); > return err; > } > @@ -1092,18 +1120,14 @@ zap_iterate (dnode_end_t * zap_dnode, > return 0; > block_type =3D grub_zfs_to_cpu64 (*((grub_uint64_t *) zapbuf), endia= n); > =20 > - grub_dprintf ("zfs", "zap read\n"); > - > if (block_type =3D=3D ZBT_MICRO) > { > - grub_dprintf ("zfs", "micro zap\n"); > ret =3D mzap_iterate (zapbuf, endian, size, hook); > grub_free (zapbuf); > return ret; > } > else if (block_type =3D=3D ZBT_HEADER) > { > - grub_dprintf ("zfs", "fat zap\n"); > /* this is a fat zap */ > ret =3D fzap_iterate (zap_dnode, zapbuf, hook, data); > grub_free (zapbuf); > @@ -1150,12 +1174,9 @@ dnode_get (dnode_end_t * mdn, grub_uint64_t objn= um, grub_uint8_t type, > return GRUB_ERR_NONE; > } > =20 > - grub_dprintf ("zfs", "endian =3D %d, blkid=3D%llx\n", mdn->endian,=20 > - (unsigned long long) blkid); > err =3D dmu_read (mdn, blkid, &dnbuf, &endian, data); > if (err) > return err; > - grub_dprintf ("zfs", "alive\n"); > =20 > grub_free (data->dnode_buf); > grub_free (data->dnode_mdn); > @@ -1426,27 +1447,19 @@ get_filesystem_dnode (dnode_end_t * mosmdn, cha= r *fsname, > grub_uint64_t objnum; > grub_err_t err; > =20 > - grub_dprintf ("zfs", "endian =3D %d\n", mosmdn->endian); > - > err =3D dnode_get (mosmdn, DMU_POOL_DIRECTORY_OBJECT,=20 > DMU_OT_OBJECT_DIRECTORY, mdn, data); > if (err) > return err; > =20 > - grub_dprintf ("zfs", "alive\n"); > - > err =3D zap_lookup (mdn, DMU_POOL_ROOT_DATASET, &objnum, data); > if (err) > return err; > =20 > - grub_dprintf ("zfs", "alive\n"); > - > err =3D dnode_get (mosmdn, objnum, DMU_OT_DSL_DIR, mdn, data); > if (err) > return err; > =20 > - grub_dprintf ("zfs", "alive\n"); > - > while (*fsname) > { > grub_uint64_t childobj; > @@ -1491,8 +1504,6 @@ make_mdn (dnode_end_t * mdn, struct grub_zfs_data= *data) > grub_size_t ospsize; > grub_err_t err; > =20 > - grub_dprintf ("zfs", "endian =3D %d\n", mdn->endian); > - > bp =3D &(((dsl_dataset_phys_t *) DN_BONUS (&mdn->dn))->ds_bp); > err =3D zio_read (bp, mdn->endian, &osp, &ospsize, data); > if (err) > @@ -1558,7 +1569,6 @@ dnode_get_fullpath (const char *fullpath, dnode_e= nd_t * mdn, > grub_dprintf ("zfs", "fsname =3D '%s' snapname=3D'%s' filename =3D= '%s'\n",=20 > fsname, snapname, filename); > } > - grub_dprintf ("zfs", "alive\n"); > err =3D get_filesystem_dnode (&(data->mos), fsname, dn, data); > if (err) > { > @@ -1567,12 +1577,8 @@ dnode_get_fullpath (const char *fullpath, dnode_= end_t * mdn, > return err; > } > =20 > - grub_dprintf ("zfs", "alive\n"); > - > headobj =3D grub_zfs_to_cpu64 (((dsl_dir_phys_t *) DN_BONUS (&dn->dn= ))->dd_head_dataset_obj, dn->endian); > =20 > - grub_dprintf ("zfs", "endian =3D %d\n", mdn->endian); > - > err =3D dnode_get (&(data->mos), headobj, DMU_OT_DSL_DATASET, mdn, d= ata); > if (err) > { > @@ -1580,7 +1586,6 @@ dnode_get_fullpath (const char *fullpath, dnode_e= nd_t * mdn, > grub_free (snapname); > return err; > } > - grub_dprintf ("zfs", "endian =3D %d\n", mdn->endian); > =20 > if (snapname) > { > @@ -1606,8 +1611,6 @@ dnode_get_fullpath (const char *fullpath, dnode_e= nd_t * mdn, > *mdnobj =3D headobj; > =20 > make_mdn (mdn, data); > - =20 > - grub_dprintf ("zfs", "endian =3D %d\n", mdn->endian); > =20 > if (*isfs) > { > @@ -1864,11 +1867,9 @@ zfs_fetch_nvlist (struct grub_zfs_data * data, c= har **nvlist) > static grub_err_t > check_pool_label (struct grub_zfs_data *data) > { > - grub_uint64_t pool_state, txg =3D 0; > - char *nvlist; > -#if 0 > - char *nv; > -#endif > + grub_uint64_t pool_state; > + char *nvlist; // for the pool > + char *vdevnvlist; // for the vdev > grub_uint64_t diskguid; > grub_uint64_t version; > int found; > @@ -1878,8 +1879,6 @@ check_pool_label (struct grub_zfs_data *data) > if (err) > return err; > =20 > - grub_dprintf ("zfs", "check 2 passed\n"); > - > found =3D grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_S= TATE, > &pool_state); > if (! found) > @@ -1889,16 +1888,16 @@ check_pool_label (struct grub_zfs_data *data) > grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_STATE " not found"); > return grub_errno; > } > - grub_dprintf ("zfs", "check 3 passed\n"); > =20 > if (pool_state =3D=3D POOL_STATE_DESTROYED) > { > grub_free (nvlist); > return grub_error (GRUB_ERR_BAD_FS, "zpool is marked as destroye= d"); > } > - grub_dprintf ("zfs", "check 4 passed\n"); > =20 > - found =3D grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_T= XG, &txg); > + data->label_txg =3D 0; > + found =3D grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_T= XG, > + &data->label_txg); > if (!found) > { > grub_free (nvlist); > @@ -1906,15 +1905,13 @@ check_pool_label (struct grub_zfs_data *data) > grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_TXG " not found"); > return grub_errno; > } > - grub_dprintf ("zfs", "check 6 passed\n"); > =20 > /* not an active device */ > - if (txg =3D=3D 0) > + if (data->label_txg =3D=3D 0) > { > grub_free (nvlist); > return grub_error (GRUB_ERR_BAD_FS, "zpool isn't active"); > } > - grub_dprintf ("zfs", "check 7 passed\n"); > =20 > found =3D grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_VERSIO= N, > &version); > @@ -1925,42 +1922,79 @@ check_pool_label (struct grub_zfs_data *data) > grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_VERSION " not found"); > return grub_errno; > } > - grub_dprintf ("zfs", "check 8 passed\n"); > =20 > if (version > SPA_VERSION) > { > grub_free (nvlist); > return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET, > - "too new version %llu > %llu", > + "SPA version too new %llu > %llu", > (unsigned long long) version, > (unsigned long long) SPA_VERSION); > } > - grub_dprintf ("zfs", "check 9 passed\n"); > -#if 0 > - if (nvlist_lookup_value (nvlist, ZPOOL_CONFIG_VDEV_TREE, &nv, > - DATA_TYPE_NVLIST, NULL)) > + > + vdevnvlist =3D grub_zfs_nvlist_lookup_nvlist (nvlist, ZPOOL_CONFIG_V= DEV_TREE); > + if (!vdevnvlist) > { > - grub_free (vdev); > - return (GRUB_ERR_BAD_FS); > + grub_free (nvlist); > + if (!grub_errno) > + grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_VDEV_TREE " not found"); > + return grub_errno; > + } > + > + found =3D grub_zfs_nvlist_lookup_uint64 (vdevnvlist, ZPOOL_CONFIG_AS= HIFT, > + &data->vdev_ashift); > + grub_free (vdevnvlist); > + if (!found) > + { > + grub_free (nvlist); > + if (!grub_errno) > + grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_ASHIFT " not found"); > + return grub_errno; > } > - grub_dprintf ("zfs", "check 10 passed\n"); > -#endif > =20 > found =3D grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_GUID, = &diskguid); > if (! found) > { > grub_free (nvlist); > if (! grub_errno) > - grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_GUID " not found"); > + grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_GUID " not found"); > + return grub_errno; > + } > + > + found =3D grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_G= UID, &data->pool_guid); > + if (! found) > + { > + grub_free (nvlist); > + if (! grub_errno) > + grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_GUID " not foun= d"); > return grub_errno; > } > - grub_dprintf ("zfs", "check 11 passed\n"); > =20 > grub_free (nvlist); > =20 > + grub_dprintf ("zfs", "ZFS Pool GUID: %llu (%016llx) Label: GUID: %ll= u (%016llx), txg: %llu, SPA v%llu, ashift: %llu\n", > + (unsigned long long) data->pool_guid, > + (unsigned long long) data->pool_guid, > + (unsigned long long) diskguid, > + (unsigned long long) diskguid, > + (unsigned long long) data->label_txg, > + (unsigned long long) version, > + (unsigned long long) data->vdev_ashift); > + > return GRUB_ERR_NONE; > } > =20 > +/* > + * vdev_label_start returns the physical disk offset (in bytes) of > + * label "l". > + */ > +static grub_uint64_t vdev_label_start (grub_uint64_t psize, int l) > +{ > + return (l * sizeof (vdev_label_t) + (l < VDEV_LABELS / 2 ? > + 0 : psize - > + VDEV_LABELS * sizeof (vdev_label_t))); > +} > + > static void > zfs_unmount (struct grub_zfs_data *data) > { > @@ -1979,13 +2013,13 @@ static struct grub_zfs_data * > zfs_mount (grub_device_t dev) > { > struct grub_zfs_data *data =3D 0; > - int label =3D 0; > - uberblock_phys_t *ub_array, *ubbest =3D NULL; > - vdev_boot_header_t *bh; > + int label =3D 0, bestlabel =3D -1; > + char *ub_array; > + uberblock_t *ubbest; > + uberblock_t *ubcur =3D NULL; > void *osp =3D 0; > grub_size_t ospsize; > grub_err_t err; > - int vdevnum; > =20 > if (! dev->disk) > { > @@ -1997,11 +2031,6 @@ zfs_mount (grub_device_t dev) > if (!data) > return 0; > grub_memset (data, 0, sizeof (*data)); > -#if 0 > - /* if it's our first time here, zero the best uberblock out */ > - if (data->best_drive =3D=3D 0 && data->best_part =3D=3D 0 && find_be= st_root) > - grub_memset (¤t_uberblock, 0, sizeof (uberblock_t)); > -#endif > =20 > data->disk =3D dev->disk; > =20 > @@ -2012,100 +2041,129 @@ zfs_mount (grub_device_t dev) > return 0; > } > =20 > - bh =3D grub_malloc (VDEV_BOOT_HEADER_SIZE); > - if (!bh) > + ubbest =3D grub_malloc (sizeof (*ubbest)); > + if (!ubbest) > { > zfs_unmount (data); > - grub_free (ub_array); > return 0; > } > + grub_memset (ubbest, 0, sizeof (*ubbest)); > =20 > - vdevnum =3D VDEV_LABELS; > + // Establish some constants for where things are on the device: > =20 > - /* Don't check back labels on CDROM. */ > - if (grub_disk_get_size (dev->disk) =3D=3D GRUB_DISK_SIZE_UNKNOWN) > - vdevnum =3D VDEV_LABELS / 2; > + /* > + * some eltorito stacks don't give us a size and > + * we end up setting the size to MAXUINT, further > + * some of these devices stop working once a single > + * read past the end has been issued. Checking > + * for a maximum part_length and skipping the backup > + * labels at the end of the slice/partition/device > + * avoids breaking down on such devices. > + */ > + const int vdevnum =3D > + grub_disk_get_size (dev->disk) =3D=3D GRUB_DISK_SIZE_UNKNOWN ? > + VDEV_LABELS / 2 : VDEV_LABELS; > + > + // Size in bytes of the device (disk or partition) aligned to label = size > + grub_uint64_t device_size =3D=20 > + grub_disk_get_size (dev->disk) << GRUB_DISK_SECTOR_BITS; > + const grub_uint64_t alignedbytes =3D > + P2ALIGN (device_size, (grub_uint64_t) sizeof (vdev_label_t)); This is better to be moved to vdev_label_start. > =20 > - for (label =3D 0; ubbest =3D=3D NULL && label < vdevnum; label++) > + for (label =3D 0; label < vdevnum; label++) > { > - grub_zfs_endian_t ub_endian =3D UNKNOWN_ENDIAN; > - grub_dprintf ("zfs", "label %d\n", label); > + grub_uint64_t labelstartbytes =3D vdev_label_start (alignedbytes= , label); > + grub_uint64_t labelstart =3D labelstartbytes >> GRUB_DISK_SECTOR= _BITS; > =20 > - data->vdev_phys_sector > - =3D label * (sizeof (vdev_label_t) >> SPA_MINBLOCKSHIFT) > - + ((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE) >> SPA_MINBLOCKSHIFT) > - + (label < VDEV_LABELS / 2 ? 0 : grub_disk_get_size (dev->disk) > - - VDEV_LABELS * (sizeof (vdev_label_t) >> SPA_MINBLOCKSHIFT)); > + grub_dprintf ("zfs", "reading label %d at sector %llu (byte %llu= )\n", > + label, (unsigned long long) labelstart,=20 > + (unsigned long long) labelstartbytes); > =20 > - /* Read in the uberblock ring (128K). */ > - err =3D grub_disk_read (data->disk, data->vdev_phys_sector > - + (VDEV_PHYS_SIZE >> SPA_MINBLOCKSHIFT), > - 0, VDEV_UBERBLOCK_RING, (char *) ub_array); > + data->vdev_phys_sector =3D labelstart + > + ((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE) >> GRUB_DISK_SECTOR_BITS);= > + > + err =3D check_pool_label (data); > if (err) > - { > - grub_errno =3D GRUB_ERR_NONE; > - continue; > - } > - grub_dprintf ("zfs", "label ok %d\n", label); > + { > + grub_dprintf ("zfs", "error checking label %d\n", label); > + grub_errno =3D GRUB_ERR_NONE; > + continue; > + } > =20 > - ubbest =3D find_bestub (ub_array, data->vdev_phys_sector); > - if (!ubbest) > - { > - grub_dprintf ("zfs", "No uberblock found\n"); > - grub_errno =3D GRUB_ERR_NONE; > - continue; > - } > - ub_endian =3D (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.ub_magic= ,=20 > - LITTLE_ENDIAN) =3D=3D UBERBLOCK_MAGIC=20 > - ? LITTLE_ENDIAN : BIG_ENDIAN); > - err =3D zio_read (&ubbest->ubp_uberblock.ub_rootbp,=20 > - ub_endian, > - &osp, &ospsize, data); > + /* Read in the uberblock ring (128K). */ > + err =3D grub_disk_read (data->disk, > + data->vdev_phys_sector + > + (VDEV_PHYS_SIZE >> GRUB_DISK_SECTOR_BITS), > + 0, VDEV_UBERBLOCK_RING, ub_array); > if (err) > - { > - grub_dprintf ("zfs", "couldn't zio_read\n");=20 > - grub_errno =3D GRUB_ERR_NONE; > - continue; > - } > + { > + grub_dprintf ("zfs", "error reading uberblock ring for label %d\n"= , label); > + grub_errno =3D GRUB_ERR_NONE; > + continue; > + } > =20 > - if (ospsize < OBJSET_PHYS_SIZE_V14) > - { > - grub_dprintf ("zfs", "osp too small\n");=20 > - grub_free (osp); > - continue; > - } > - grub_dprintf ("zfs", "ubbest %p\n", ubbest); > + ubcur =3D find_bestub (ub_array, data); > + if (!ubcur) > + { > + grub_dprintf ("zfs", "No good uberblocks found in label %d\n", = label); > + grub_errno =3D GRUB_ERR_NONE; > + continue; > + } > =20 > - err =3D check_pool_label (data); > - if (err) > + if (vdev_uberblock_compare (ubcur, ubbest) > 0) > + { > + // Looks like the block is good, so use it. > + grub_memcpy (ubbest, ubcur, sizeof (*ubbest)); > + bestlabel =3D label; > + grub_dprintf ("zfs", "Current best uberblock found in label %d\= n", label); > + } > + } > + grub_free (ub_array); > + > + // We zero'd the structure to begin with. If we never assigned to i= t,=20 > + // magic will still be zero. > + if (!ubbest->ub_magic) > + { > + grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid ZFS label");= > + zfs_unmount (data); > + grub_free (ubbest); > + return 0; > + } > + > + grub_dprintf ("zfs", "ubbest %p in label %d\n", ubbest, bestlabel); > + > + grub_zfs_endian_t ub_endian =3D > + grub_zfs_to_cpu64 (ubbest->ub_magic, LITTLE_ENDIAN) =3D=3D UBERBLO= CK_MAGIC > + ? LITTLE_ENDIAN : BIG_ENDIAN; > + err =3D zio_read (&ubbest->ub_rootbp, ub_endian, &osp, &ospsize, dat= a); > + > + if (err) > { > - grub_errno =3D GRUB_ERR_NONE; > - continue; > + grub_error (GRUB_ERR_BAD_FS, "couldn't zio_read object directory");= > + zfs_unmount (data); > + grub_free (ubbest); > + return 0; > } > -#if 0 > - if (find_best_root && > - vdev_uberblock_compare (&ubbest->ubp_uberblock, > - &(current_uberblock)) <=3D 0) > - continue; > -#endif > - /* Got the MOS. Save it at the memory addr MOS. */ > - grub_memmove (&(data->mos.dn), &((objset_phys_t *) osp)->os_meta= _dnode, > - DNODE_SIZE); > - data->mos.endian =3D (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.u= b_rootbp.blk_prop, ub_endian) >> 63) & 1; > - grub_memmove (&(data->current_uberblock), > - &ubbest->ubp_uberblock, sizeof (uberblock_t)); > - grub_free (ub_array); > - grub_free (bh); > - grub_free (osp); > - return data; =20 > + > + if (ospsize < OBJSET_PHYS_SIZE_V14) > + { > + grub_error (GRUB_ERR_BAD_FS, "osp too small"); > + zfs_unmount (data); > + grub_free (osp); > + grub_free (ubbest); > + return 0; > } > - grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid label"); > - zfs_unmount (data); > - grub_free (ub_array); > - grub_free (bh); > - grub_free (osp); > =20 > - return 0; > + /* Got the MOS. Save it at the memory addr MOS. */ > + grub_memmove (&(data->mos.dn), &((objset_phys_t *) osp)->os_meta_dno= de, DNODE_SIZE); > + data->mos.endian =3D > + (grub_zfs_to_cpu64 (ubbest->ub_rootbp.blk_prop, ub_endian) >> 63) & = 1; > + grub_memmove (&(data->current_uberblock), ubbest, sizeof (uberblock_= t)); > + > + grub_free (osp); > + grub_free (ubbest); > + =20 > + return (data); > } > =20 > grub_err_t > @@ -2149,33 +2207,18 @@ zfs_label (grub_device_t device, char **label) > static grub_err_t=20 > zfs_uuid (grub_device_t device, char **uuid) > { > - char *nvlist; > - int found; > struct grub_zfs_data *data; > - grub_uint64_t guid; > - grub_err_t err; > - > - *uuid =3D 0; > =20 > data =3D zfs_mount (device); > if (! data) > return grub_errno; > =20 > - err =3D zfs_fetch_nvlist (data, &nvlist); > - if (err) > - { > - zfs_unmount (data); > - return err; > - } > - > - found =3D grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_G= UID, &guid); > - if (! found) > - return grub_errno; > - grub_free (nvlist); > - *uuid =3D grub_xasprintf ("%016llx", (long long unsigned) guid); > + *uuid =3D grub_xasprintf ("%016llx", (long long unsigned) data->pool= _guid); > zfs_unmount (data); > + > if (! *uuid) > return grub_errno; > + > return GRUB_ERR_NONE; > } > =20 > diff --git a/include/grub/zfs/uberblock_impl.h b/include/grub/zfs/uberb= lock_impl.h > index 1bf7f2b..94b5ad7 100644 > --- a/include/grub/zfs/uberblock_impl.h > +++ b/include/grub/zfs/uberblock_impl.h > @@ -23,6 +23,8 @@ > #ifndef _SYS_UBERBLOCK_IMPL_H > #define _SYS_UBERBLOCK_IMPL_H > =20 > +#define UBMAX(a,b) ((a) > (b) ? (a) : (b)) > + > /* > * The uberblock version is incremented whenever an incompatible on-di= sk > * format change is made to the SPA, DMU, or ZAP. > @@ -45,16 +47,10 @@ typedef struct uberblock { > blkptr_t ub_rootbp; /* MOS objset_phys_t */ > } uberblock_t; > =20 > -#define UBERBLOCK_SIZE (1ULL << UBERBLOCK_SHIFT) > -#define VDEV_UBERBLOCK_SHIFT UBERBLOCK_SHIFT > - > -/* XXX Uberblock_phys_t is no longer in the kernel zfs */ > -typedef struct uberblock_phys { > - uberblock_t ubp_uberblock; > - char ubp_pad[UBERBLOCK_SIZE - sizeof (uberblock_t) - > - sizeof (zio_eck_t)]; > - zio_eck_t ubp_zec; > -} uberblock_phys_t; > - > +#define VDEV_UBERBLOCK_SHIFT(as) UBMAX(as, UBERBLOCK_SHIFT) > +#define UBERBLOCK_SIZE(as) (1ULL << VDEV_UBERBLOCK_SHIFT(as)) > +=20 > +// Number of uberblocks that can fit in the ring at a given ashift > +#define UBERBLOCK_COUNT(as) (VDEV_UBERBLOCK_RING >> VDEV_UBERBLOCK_SHI= FT(as)) > =20 > #endif /* _SYS_UBERBLOCK_IMPL_H */ > > > _______________________________________________ > Grub-devel mailing list > Grub-devel@gnu.org > https://lists.gnu.org/mailman/listinfo/grub-devel --=20 Regards Vladimir '=CF=86-coder/phcoder' Serbinenko --------------enig60F57D4F7186C095BFD8FA22 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iF4EAREKAAYFAk6yqP4ACgkQNak7dOguQgnT5QD8DU3snqYxBYHxNO6M0Qt8a5X0 91LSrweYHrefhC2jxCkBALYIxK0wwGkCPNN2L/B7AAF7tiRbSKAVT1iYiWjDoVkz =rZaF -----END PGP SIGNATURE----- --------------enig60F57D4F7186C095BFD8FA22--