* [PATCH] blkid detection for ZFS
From: Andreas Dilger @ 2008-02-15  1:07 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

[-- Attachment #1: Type: text/plain, Size: 1295 bytes --]

Attached is a patch to detect ZFS in libblkid.  It isn't by any means
complete, because it doesn't report the LABEL or UUID of the device,
nor does it name any of the constituent filesystems.  The latter is
quite complex to implement and may be beyond the scope of libblkid.

Some input is welcome here also...  There is a UUID (GUID) for the whole
"pool" (aggregation of devices that ZFS filesystems might live on), a
UUID for the "virtual device" (vdev) (akin to an MD RAID set) that a disk
is part of, and also a separate UUID for each device.  There is a LABEL
(pool name) for the whole pool, but not one for an individual filesystem.

I'm thinking of making the blkid UUID be the GUID of the whole pool, as
any device in the pool would be sufficient to locate all of the component
devices.  This means all devices in the same pool will return the same
UUID, but for identification that should be fine I think...  I haven't
checked for pathologies in libblkid regarding that yet.
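
To make that concrete, the result I have in mind would look roughly like
this (the device names and GUID below are made up; the point is that both
members of one pool report the same pool GUID):

  # blkid /dev/sdb1 /dev/sdc1
  /dev/sdb1: TYPE="zfs" UUID="10143032846014581239"
  /dev/sdc1: TYPE="zfs" UUID="10143032846014581239"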

On a related note - on Solaris the ZFS filesystems always live in a GPT
partition table, and I note that libblkid doesn't identify this.  Is that
something we want to start adding to libblkid (e.g. GPT, DOS, LVM, etc)?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: e2fsprogs-blkid-zfs.patch --]
[-- Type: text/plain; NAME=e2fsprogs-blkid-zfs.patch; charset=utf-8, Size: 6200 bytes --]

Index: e2fsprogs-cfs/lib/blkid/probe.c
===================================================================
--- e2fsprogs-cfs.orig/lib/blkid/probe.c
+++ e2fsprogs-cfs/lib/blkid/probe.c
@@ -647,6 +647,21 @@ static int probe_jfs(struct blkid_probe 
 	return 0;
 }
 
+static int probe_zfs(struct blkid_probe *probe, struct blkid_magic *id,
+		     unsigned char *buf)
+{
+	struct zfs_uberblock *zub;
+	char *vdev_label;
+	const char *pool_name = 0;
+
+	zub = (struct zfs_uberblock *)buf;
+
+	/* TODO: read nvpair data for pool name, pool GUID (complex) */
+	//blkid_set_tag(probe->dev, "LABEL", pool_name, 0);
+	//set_uuid(probe->dev, pool_guid, 0);
+	return 0;
+}
+
 static int probe_luks(struct blkid_probe *probe,
 		       struct blkid_magic *id __BLKID_ATTR((unused)),
 		       unsigned char *buf)
@@ -896,15 +911,6 @@ static int probe_hfsplus(struct blkid_pr
 }
 
 /*
- * BLKID_BLK_OFFS is at least as large as the highest bim_kboff defined
- * in the type_array table below + bim_kbalign.
- *
- * When probing for a lot of magics, we handle everything in 1kB buffers so
- * that we don't have to worry about reading each combination of block sizes.
- */
-#define BLKID_BLK_OFFS	64	/* currently reiserfs */
-
-/*
  * Various filesystem magics that we can check for.  Note that kboff and
  * sboff are in kilobytes and bytes respectively.  All magics are in
  * byte strings so we don't worry about endian issues.
@@ -954,6 +960,8 @@ static struct blkid_magic type_array[] =
   { "iso9660",	32,	 1,  5, "CD001",		probe_iso9660 },
   { "iso9660",	32,	 9,  5, "CDROM",		probe_iso9660 },
   { "jfs",	32,	 0,  4, "JFS1",			probe_jfs },
+  { "zfs",     128,	 0,  8, "\0\0\0\0\x00\xba\xb1\x0c", probe_zfs },
+  { "zfs",     128,	 0,  8, "\x0c\xb1\xba\x00\0\0\0\0", probe_zfs },
   { "hfsplus",	 1,	 0,  2, "BD",			probe_hfsplus },
   { "hfsplus",	 1,	 0,  2, "H+",			0 },
   { "hfs",	 1,	 0,  2, "BD",			0 },
@@ -1074,7 +1082,7 @@ try_again:
 		if (!buf)
 			continue;
 
-		if (memcmp(id->bim_magic, buf + (id->bim_sboff&0x3ff),
+		if (memcmp(id->bim_magic, buf + (id->bim_sboff & 0x3ff),
 			   id->bim_len))
 			continue;
 
@@ -1104,7 +1112,7 @@ try_again:
 		dev = 0;
 		goto found_type;
 	}
-		
+
 found_type:
 	if (dev && type) {
 		dev->bid_devno = st.st_rdev;
@@ -1113,7 +1121,7 @@ found_type:
 		cache->bic_flags |= BLKID_BIC_FL_CHANGED;
 
 		blkid_set_tag(dev, "TYPE", type, 0);
-				
+
 		DBG(DEBUG_PROBE, printf("%s: devno 0x%04llx, type %s\n",
 			   dev->bid_name, (long long)st.st_rdev, type));
 	}
Index: e2fsprogs-cfs/lib/blkid/probe.h
===================================================================
--- e2fsprogs-cfs.orig/lib/blkid/probe.h
+++ e2fsprogs-cfs/lib/blkid/probe.h
@@ -190,6 +190,16 @@ struct jfs_super_block {
 	unsigned char	js_loguuid[16];
 };
 
+#define UBERBLOCK_MAGIC         0x00bab10c              /* oo-ba-bloc!  */
+struct zfs_uberblock {
+	__u64		ub_magic;	/* UBERBLOCK_MAGIC		*/
+	__u64		ub_version;	/* ZFS_VERSION			*/
+	__u64		ub_txg;		/* txg of last sync		*/
+	__u64		ub_guid_sum;	/* sum of all vdev guids	*/
+	__u64		ub_timestamp;	/* UTC time of last sync	*/
+	char		ub_rootbp;	/* MOS objset_phys_t		*/
+};
+
 struct romfs_super_block {
 	unsigned char	ros_magic[8];
 	__u32		ros_dummy1[2];
Index: e2fsprogs-cfs/lib/blkid/tests/zfs.results
===================================================================
--- /dev/null
+++ e2fsprogs-cfs/lib/blkid/tests/zfs.results
@@ -0,0 +1 @@
+TYPE='zfs'
Index: e2fsprogs-cfs/lib/blkid/tests/zfs.img.bz2
===================================================================
--- /dev/null
+++ e2fsprogs-cfs/lib/blkid/tests/zfs.img.bz2
@@ -0,0 +1,18 @@
[binary content of zfs.img.bz2 (bzip2-compressed test image) omitted -- mangled by the archive]


* Re: [PATCH] blkid detection for ZFS
From: Theodore Tso @ 2008-02-20 12:57 UTC
  To: Andreas Dilger; +Cc: linux-ext4

On Thu, Feb 14, 2008 at 06:07:40PM -0700, Andreas Dilger wrote:
> Some input is welcome here also...  There is a UUID (GUID) for the whole
> "pool" (aggregation of devices that ZFS filesystems might live on), a
> UUID for the "virtual device" (vdev) (akin to MD RAID set) that a disk
> is part of and also a separate UUID for each device.  There is a LABEL
> (pool name) for the whole pool, but not one for an individual filesystem.

Are there devices that are made available for the vdev and the
pool?  I assume not for the pool since that's a filesystem entity, but
what about the vdev?

In general, blkid is all about mapping the UUID of what lives on the
device to the device filename, for the benefit of programs like mount
and fsck.

I don't know enough about ZFS in terms of how you would mount a
filesystem which is part of a pool.  How is the filesystem specified
to the "mount" command?

> I'm thinking of making the blkid UUID be the GUID of the whole pool, as
> any device in the pool would be sufficient to locate all of the component
> devices.  This means all devices in the same pool will return the same
> UUID, but for identification that should be fine I think...  I haven't
> checked for pathologies in libblkid regarding that yet.

What I did with the freshly checked-in code to identify LVM2 volumes
is that if /dev/sda1 is part of a physical volume, then blkid returns
the UUID of the PV for /dev/sda1.  The filesystem UUIDs end up
getting associated via blkid entries for /dev/mapper/FOO, and we don't
bother trying to associate the UUID for the raidset with anything,
since it's not really associated with a physical block device.

So it would seem to me that it would be better to make the UUID for
a particular ZFS physical disk be the UUID for that disk, and not for
the whole pool.  The question really, though, is what actually would
be most useful --- who is going to actually use blkid on a Solaris
system with ZFS?  It may be that the right answer is to put the pool
UUID as a separate tag; blkid supports more than just the standard
LABEL, UUID, TYPE, etc. tags.  You could easily stash the pool UUID in
a POOL_GUID tag, if it would be useful for some blkid callers.
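
Just to illustrate what I mean, a rough sketch (the nvlist parsing that
would produce pool_name and pool_guid is completely hand-waved here, and
the helper name is made up):

	/* Sketch: expose the pool identity via extra tags.  Assumes some
	 * helper (not in your patch) has already parsed pool_name and
	 * pool_guid strings out of the vdev label's nvlist. */
	static void zfs_report_pool(struct blkid_probe *probe,
				    const char *pool_name,
				    const char *pool_guid)
	{
		if (pool_name)
			blkid_set_tag(probe->dev, "LABEL", pool_name, 0);
		if (pool_guid)
			blkid_set_tag(probe->dev, "POOL_GUID", pool_guid, 0);
	}

Passing a vlength of 0 makes blkid_set_tag take strlen() of the value.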

> On a related note - on Solaris the ZFS filesystems always live in a GPT
> partition table, and I note that libblkid doesn't identify this.  Is that
> something we want to start adding to libblkid (e.g. GPT, DOS, LVM, etc)?

What do you mean by not identifying the GPT partition table?  At the
moment we haven't been identifying the whole disk partition tables,
mainly because there isn't much use for it, especially for the DOS MBR
(no uuid or label to speak of).

I just checked in a patch from Eric to detect LVM2 PV's, because it
was useful for the Anaconda developers.  I wouldn't have any
objections accepting a patch which detected the whole-disk device and
returned the GPT label/UUID information, but I probably wouldn't code
it myself.  Still, if someone thought it was *useful* and would use
it, and thus felt called to write a patch, I'd certainly accept it.

Regards,

						- Ted


* Re: [PATCH] blkid detection for ZFS
From: Andreas Dilger @ 2008-02-21  9:20 UTC
  To: Theodore Tso; +Cc: linux-ext4

On Feb 20, 2008  07:57 -0500, Theodore Ts'o wrote:
> On Thu, Feb 14, 2008 at 06:07:40PM -0700, Andreas Dilger wrote:
> > Some input is welcome here also...  There is a UUID (GUID) for the whole
> > "pool" (aggregation of devices that ZFS filesystems might live on), a
> > UUID for the "virtual device" (vdev) (akin to MD RAID set) that a disk
> > is part of and also a separate UUID for each device.  There is a LABEL
> > (pool name) for the whole pool, but not one for an individual filesystem.
> 
> Are there devices that are made available for the vdev and the
> pool?  I assume not for the pool since that's a filesystem entity, but
> what about the vdev?
> 
> In general, blkid is all about mapping the UUID of what lives on the
> device to the device filename, for the benefit of programs like mount
> and fsck.
> 
> I don't know enough about ZFS in terms of how you would mount a
> filesystem which is part of a pool.  How is the filesystem specified
> to the "mount" command?

Good question.  For Lustre (Linux or Solaris), we want to be able to find
the pool by name, and then use ZFS tools to "import" the pool and make the
filesystems available to Lustre.  The current ZFS tools (as ported to
Linux) scan all of /dev/* directly, but I'd much prefer to use libblkid
for that since it knows about PVs, RAID devices, etc.

Filesystems in a ZFS pool are specified via "{poolname}/{fsname}", but
to get "fsname" from disk is much more involved than I want to get into,
since it almost involves importing the pool and parsing a whole tree
of parameters and indexes.

I'd be pretty happy to just know from "blkid" that a given device is
used by ZFS for "lustrepool" or "fusepool" or whatever it is called.
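
Roughly, the consumer I have in mind is no more than the following sketch
against the existing public libblkid calls (it assumes the probe will
eventually set LABEL to the pool name, which my posted patch doesn't do
yet, and "lustrepool" is just an example):

	#include <blkid/blkid.h>
	#include <stdio.h>

	int main(void)
	{
		blkid_cache cache;
		char *dev;

		if (blkid_get_cache(&cache, NULL) < 0)
			return 1;
		/* any one member device is enough to go import the pool from */
		dev = blkid_get_devname(cache, "LABEL", "lustrepool");
		if (dev)
			printf("pool lustrepool found on %s\n", dev);
		blkid_put_cache(cache);
		return 0;
	}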

> So it would seem to me that it would be better to make the UUID for
> a particular ZFS physical disk be the UUID for that disk, and not for
> the whole pool.  The question really, though, is what actually would
> be most useful --- who is going to actually use blkid on a Solaris
> system with ZFS?  It may be that the right answer is to put the pool
> UUID as a separate tag; blkid supports more than just the standard
> LABEL, UUID, TYPE, etc. tags.  You could easily stash the pool UUID in
> a POOL_GUID tag, if it would be useful for some blkid callers.

OK, maybe I'll go that route, since I won't strictly be having UUIDs
or LABELs that directly map to filesystems.

> 
> > On a related note - on Solaris the ZFS filesystems always live in a GPT
> > partition table, and I note that libblkid doesn't identify this.  Is that
> > something we want to start adding to libblkid (e.g. GPT, DOS, LVM, etc)?
> 
> What do you mean by not identifying the GPT partition table?  At the
> moment we haven't been identifying the whole disk partition tables,
> mainly because there isn't much use for it, especially for the DOS MBR
> (no uuid or label to speak of).
> 
> I just checked in a patch from Eric to detect LVM2 PV's, because it
> was useful for the Anaconda developers.  I wouldn't have any
> objections accepting a patch which detected the whole-disk device and
> returned the GPT label/UUID information, but I probably wouldn't code
> it myself.  Still, if someone thought it was *useful* and would use
> it, and thus felt called to write a patch, I'd certainly accept it.

That was my question.  I didn't see the LVM2 identification patch until
after my email, but this makes it fairly clear that identification of
block devices isn't verboten.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

