From: "Lennart Sorensen"
Date: Thu, 24 Oct 2013 09:39:02 -0400
To: The development of GNU GRUB
Subject: Re: grub mishandles corrupt/missing primary GPT
Message-ID: <20131024133902.GS13097@csclub.uwaterloo.ca>
In-Reply-To: <6400C778-E3A3-42FD-ABF7-E15DF635FC40@colorremedies.com>
References: <9DB3EF6D-6E26-4A9F-BB2D-07CCEF378D7A@colorremedies.com> <52687CC7.4010605@gmail.com> <6400C778-E3A3-42FD-ABF7-E15DF635FC40@colorremedies.com>

On Wed, Oct 23, 2013 at 09:07:21PM
-0600, Chris Murphy wrote:
> While technically a violation of the UEFI spec, I think this can be worked around by considering the disk GPT if the first entry in the MBR is type 0xEE. I don't know of a hybrid MBR implementation where an entry other than the first is 0xEE.

Well everyone other than Microsoft seems to understand how useful support for hybrid setups can be and hence support them.

> But if there is no 0xEE entry at all, this is identical to a formerly GPT disk repartitioned as MBR by a utility that doesn't know anything about GPT, and thus doesn't erase the stale GPT data - and therefore must be treated as MBR.

That is true. That does not mean there must ONLY be a 0xEE entry.

> So perhaps this test is difficult because it's GPT on BIOS, with a limited space BIOS boot partition. However, I think on UEFI computers this should still work with one valid GPT, rather than not boot at all. There's a lot more space for this there.

Certainly if using the BIOS boot partition, there really isn't much of a space excuse anymore, unless you run into limitations on how much ram you can use in early boot.

> Both primary and backup GPTs are preserved in this case since the primary is in LBA 1 and 2, and only LBA 0 is overwritten with the new MBR.
>
> UEFI spec says if the MBR signature of 0xaa55 is intact, and there isn't an 0xEE entry, and the partition entries are rational (physically on disk and don't overlap), then the two GPTs are considered stale and the disk is MBR.
>
> The primary header contains the location of the backup GPT. If the header is sufficiently corrupt, and the backup GPT can't be located, then that's the same as an invalid backup GPT, and in that case fail.
>
> My point is we shouldn't fail when there is a valid locatable backup GPT. The whole point of having a second GPT is obviated with the current behavior.

Sometimes backups are designed in and never used.
I don't recall ever seeing any indication Microsoft ever used the second copy of the FAT for anything other than filesystem repair tools.

> I don't think we can work around this kind of hardware vendor sabotage. If it looks like a valid GPT, but is actually stale, if it's used and contains incorrect information, then boot fails. Better to try than not try at all.
>
> It's certainly uncommon. A Google search: corrupt "primary gpt" only turns up 1900 results. But it is possible.
>
> And this isn't the only mishandling I'm finding, so it's not like GRUB is unique. In fact just now by changing only a single byte in the primary GPT table (I changed the E to an F in the BIOS boot partition type UUID), the kernel suddenly has no idea what disklabel the disk is, and fails to mount rootfs. So I need to track that down too, but it seems like it knows the primary GPT table is corrupt, but then fails to use the backup GPT for some reason.
>
> An argument against GRUB doing all of this work: maybe the bootloader should be able to blindly trust the primary GPT table with no validity checks? And instead rely on (presently non-existent) checks by the underlying OS to fix this problem? Something like an fsck_gpt, seeing as nothing else is in a good position to both check and fix such GPTs other than a partition tool.

Perhaps. Certainly simpler.

I do wonder how Windows handles booting with a corrupt primary GPT. Would you happen to know? (A quick Google search didn't find an answer to the question unfortunately).

> The UEFI spec says "Software should ask a user for confirmation before restoring the primary GPT" and yet it also requires the unspecified software fix the primary GPT if corrupt. The spec actually uses the word "must". So per usual, the spec has rather lofty demands.

So it must fix it after asking the user for confirmation?

-- 
Len Sorensen
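
For illustration only, the decision order argued over in this thread can be sketched in Python. This is a rough sketch under stated assumptions, not GRUB's or the kernel's actual code. The function names and the "mbr"/"gpt-primary"/"gpt-backup"/"fail" labels are made up for this example. It assumes 512-byte sectors, accepts a 0xEE entry in any of the four MBR slots (so hybrid MBRs count as GPT), and expects the caller to have already read the backup header, typically from the last LBA of the disk. It validates only the header CRC32; the separate CRC over the partition entry array and the "rational partition entries" check are left out.

```python
import struct
import zlib

SECTOR = 512  # assumption: 512-byte logical sectors


def mbr_partition_types(mbr):
    """Return the four MBR type bytes, or [] if the 0xAA55 signature is missing."""
    # The 16-bit signature 0xAA55 is stored little-endian: 0x55 then 0xAA.
    if len(mbr) < SECTOR or mbr[510:512] != b"\x55\xaa":
        return []
    # Four 16-byte entries start at offset 446; the type byte is at entry offset 4.
    return [mbr[446 + 16 * i + 4] for i in range(4)]


def looks_like_gpt(mbr):
    """Treat the disk as GPT if any MBR entry has type 0xEE (hybrid-tolerant)."""
    return 0xEE in mbr_partition_types(mbr)


def gpt_header_valid(hdr):
    """Check the 'EFI PART' signature and the header's own CRC32."""
    if len(hdr) < 92 or hdr[0:8] != b"EFI PART":
        return False
    header_size = struct.unpack_from("<I", hdr, 12)[0]
    if header_size < 92 or header_size > len(hdr):
        return False
    stored_crc = struct.unpack_from("<I", hdr, 16)[0]
    # The CRC is computed over the header with its CRC field set to zero.
    zeroed = bytes(hdr[:16]) + b"\x00\x00\x00\x00" + bytes(hdr[20:header_size])
    return (zlib.crc32(zeroed) & 0xFFFFFFFF) == stored_crc


def choose_partition_table(mbr, primary_hdr, backup_hdr):
    """Pick which table to trust: 'mbr', 'gpt-primary', 'gpt-backup' or 'fail'."""
    if not looks_like_gpt(mbr):
        # No 0xEE entry: any GPT data on the disk is considered stale.
        return "mbr"
    if gpt_header_valid(primary_hdr):
        return "gpt-primary"
    if gpt_header_valid(backup_hdr):
        # Primary is corrupt but a valid backup was found: use it rather than fail.
        return "gpt-backup"
    return "fail"
```

The point of the ordering is that a corrupt primary header only leads to failure when the backup is also unusable, which is the behavior being asked for above. Whether the result should then be written back to repair the primary is a separate question that this sketch does not address.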