* Soft RAID and EFI systems
@ 2014-01-31 17:02 Francis Moreau
2014-02-01 22:04 ` Martin Wilck
` (2 more replies)
0 siblings, 3 replies; 27+ messages in thread
From: Francis Moreau @ 2014-01-31 17:02 UTC (permalink / raw)
To: linux-raid
Hello,
On EFI systems I'd like to RAID mirror /boot partition.
For HW RAID there's no issue since access to the disk or any
partitions should be totally transparent to the bios.
For Fake RAID, I'm not sure but I would say that the bios is able to
read the RAID metadata as well.
For (md) Soft RAID, I don't know. I would say that the bios is
unlikely to understand the md metadata stored in the /boot partition
so it won't work.
Is that correct?
If so, does that mean I can't mirror the /boot partition on EFI systems?
Maybe creating a RAID device using the whole disk (not the partition
device) would work ?
Thanks for your enlightenment.
--
Francis
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Soft RAID and EFI systems
2014-01-31 17:02 Soft RAID and EFI systems Francis Moreau
@ 2014-02-01 22:04 ` Martin Wilck
2014-02-02 21:39 ` Francis Moreau
2014-02-02 20:39 ` Chris Murphy
2014-02-03 9:56 ` David Brown
2 siblings, 1 reply; 27+ messages in thread
From: Martin Wilck @ 2014-02-01 22:04 UTC (permalink / raw)
To: Francis Moreau; +Cc: linux-raid
Hi Francis,
> For Fake RAID, I'm not sure but I would say that the bios is able to
> read the RAID metadata as well.
> For (md) Soft RAID, I don't know. I would say that the bios is
> unlikely to understand the md metadata stored in the /boot partition
> so it won't work.
there is no big difference between EFI and legacy systems in this area.
Native MD metadata isn't understood by any BIOS I've heard of.
(Well, I guess you could actually try to port mdadm to the UEFI
environment and use it for booting; under UEFI it is possible, in
principle at least, to run your own applications, unlike on a legacy BIOS.)
DDF or IMSM metadata can be read and written by BIOS fake RAID under
UEFI just as well as in legacy mode. You just need to make sure that
your system vendor delivers a firmware with all the required UEFI tools.
> If so does that mean I can't mirror /boot partition on EFI systems ?
> Maybe creating a RAID device using the whole disk (not the partition
> device) would work ?
Mirroring the whole disk is the only thing the BIOS can possibly do,
but chances are slim that it will work with native MD metadata; see above.
Martin
* Re: Soft RAID and EFI systems
2014-01-31 17:02 Soft RAID and EFI systems Francis Moreau
2014-02-01 22:04 ` Martin Wilck
@ 2014-02-02 20:39 ` Chris Murphy
2014-02-02 21:34 ` Francis Moreau
2014-02-03 9:56 ` David Brown
2 siblings, 1 reply; 27+ messages in thread
From: Chris Murphy @ 2014-02-02 20:39 UTC (permalink / raw)
To: Francis Moreau; +Cc: linux-raid
On Jan 31, 2014, at 10:02 AM, Francis Moreau <francis.moro@gmail.com> wrote:
> Hello,
>
> On EFI systems I'd like to RAID mirror /boot partition.
Yes, this is non-obvious with existing tools, I think. With BIOS it's as simple as grub-install /dev/sdX for each member device in the md raid1 array, and then grub-mkconfig -o /boot/grub/grub.cfg, which creates a single (mirrored) copy of the grub.cfg.
For UEFI, we need:
1. ESP created per member device
2. grubx64.efi on each ESP. Grub-install depends on the ESP being mounted in order to install to it, unlike MBR gap or BIOS Boot installs. So you have to manually umount/mount each ESP to do a grub-install; or copy one grubx64.efi to each ESP.
3. grub.cfg at /boot/grub just like on BIOS.
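Step 2 can be sketched roughly as follows (untested; device names, mount point, and the exact grub-install invocation are assumptions — on Fedora the command is grub2-install):

```shell
# Hypothetical layout: the ESP is partition 1 on each md raid1
# member disk. Mount each ESP in turn and install GRUB's EFI image.
for dev in /dev/sda1 /dev/sdb1; do
    mount "$dev" /boot/efi
    grub-install --target=x86_64-efi --efi-directory=/boot/efi
    umount /boot/efi
done
```

Alternatively, run grub-install once and copy the resulting grubx64.efi to the other ESPs by hand.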
For those distros doing Secure Boot, it's complicated because there is no such thing as grub-install. There's a one-size-fits-all signed grubx64.efi which typically searches for grub.cfg in the same directory as the grubx64.efi file. That means your grub.cfg isn't mirrored, and any time you do a kernel update you have to manually update all the grub.cfgs on each ESP. Messy. That's the way it is on Fedora right now, and I just filed some bugs on this.
Anyway, another way to do this is a simpler grub.cfg on each ESP that's never updated again, that forwards to the /boot/grub/grub.cfg which is updated with kernel updates. That simple grub.cfg is described as:
# forward to real config
search --no-floppy --fs-uuid --set=root --hint-bios=hd$d,gpt2 --hint-efi=hd$d,gpt2 --hint-baremetal=ahci$d,gpt2 d7bc9d0e-7706-44f9-b1a7-ff24b7c360a7
configfile /boot/grub2/grub.cfg
# $d is the number for each md member disk, so that the hint only looks for the fs-uuid on the local disk first. I think even a single such grub.cfg that's not customized per disk will work, but it does need to be copied to each ESP.
> For Fake RAID, I'm not sure but I would say that the bios is able to
> read the RAID metadata as well.
I'm not sure whether the fake raid is going to present a single ESP to the firmware or not. If the EFI System Partitions are raided via fake raid, then you have only one grubx64.efi and one grub.cfg, so there should be no problem? I haven't tested it.
>
> For (md) Soft RAID, I don't know. I would say that the bios is
> unlikely to understand the md metadata stored in the /boot partition
> so it won't work.
The UEFI firmware has no way of directly reading /boot: aside from the md metadata, it's likely ext4, which the firmware won't understand. UEFI firmware looks for an OS loader on the EFI System Partition (ESP), and that OS loader will be grub, which understands both md raid1 metadata and ext4.
> Is that correct ?
> If so does that mean I can't mirror /boot partition on EFI systems ?
You can, it's just non-obvious (in my opinion) with the existing tools and installers.
> Maybe creating a RAID device using the whole disk (not the partition
> device) would work ?
No. The firmware absolutely is looking for an ESP. If you want this to be resilient across boots, then each disk needs an ESP.
And there's something I left out from above, which is that merely copying grubx64.efi might not be enough because of all the NVRAM business. Ideally the NVRAM contains an entry pointing to each disk's ESP so that if one isn't available it falls back to another. The other way to do this on the ESP is with /EFI/BOOT/BOOTX64.EFI, which is simply a renamed grubx64.efi (on non-Secure Boot systems) that the firmware will load if there isn't an NVRAM entry.
On Fedora, they are using several EFI applications that also enable fallback booting absent NVRAM entries. So as long as those files are on all ESPs, we don't strictly need NVRAM entries for each ESP. We just get some extra clutter and a slight delay at boot.
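Creating one NVRAM entry per disk's ESP can be done with efibootmgr (a sketch; device names and the loader path are hypothetical and distro-specific):

```shell
# One boot entry per md member disk; the firmware walks down the
# BootOrder list, so if the first disk is absent it falls back to
# the entry pointing at the second disk's ESP.
efibootmgr -c -d /dev/sda -p 1 -L "Linux (disk 1)" -l '\EFI\fedora\grubx64.efi'
efibootmgr -c -d /dev/sdb -p 1 -L "Linux (disk 2)" -l '\EFI\fedora\grubx64.efi'
```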
Anyway, yes, it's actually complicated for the mortal user compared to BIOS booting from degraded raid1; it's not you.
Chris Murphy
* Re: Soft RAID and EFI systems
2014-02-02 20:39 ` Chris Murphy
@ 2014-02-02 21:34 ` Francis Moreau
2014-02-02 22:30 ` Chris Murphy
0 siblings, 1 reply; 27+ messages in thread
From: Francis Moreau @ 2014-02-02 21:34 UTC (permalink / raw)
To: Chris Murphy; +Cc: linux-raid
Hi Chris,
First of all, thanks for your answer :)
On 02/02/2014 09:39 PM, Chris Murphy wrote:
>
> On Jan 31, 2014, at 10:02 AM, Francis Moreau <francis.moro@gmail.com> wrote:
>
>> Hello,
>>
>> On EFI systems I'd like to RAID mirror /boot partition.
>
> Yes this is non-obvious with existing tools I think. With BIOS it's as simple as grub-install /dev/sdX for each member device in the md raid1 array, and then grub-mkconfig -o /boot/grub/grub.cfg which creates a single (mirrored) copy of the grub.cfg.
>
> For UEFI, we need:
> 1. ESP created per member device
> 2. grubx64.efi on each ESP. Grub-install depends on the ESP being mounted in order to install to it, unlike MBR gap or BIOS Boot installs. So you have to manually umount/mount each ESP to do a grub-install; or copy one grubx64.efi to each ESP.
> 3. grub.cfg at /boot/grub just like on BIOS.
>
That's funny, because one of the reasons I want to use UEFI firmware is
to get rid of grub (I don't like it, and it has become such a
bloated beast): since /boot is vfat and has its own partition, I prefer
to use a much simpler bootloader such as gummiboot.
> For those distros doing Secure Boot, its complicated because there is no such thing as grub-install. There's a one size fits all signed grubx64.efi which typically searches for grub.cfg in the same directory as the grubx64.efi file. That means your grub.cfg isn't mirrored, and any time you do a kernel update you have to manually update all the grub.cfgs on each ESP. Messy. That's the way it is on Fedora right now and I just filed some bugs on this.
Could you give me a pointer to the bugs you filed? I would be
interested.
>
> Anyway, another way to do this is a simpler grub.cfg on each ESP that's never updated again, that forwards to the /boot/grub/grub.cfg which is updated with kernel updates. That simple grub.cfg is described as:
>
> # forward to real config
> search --no-floppy --fs-uuid --set=root --hint-bios=hd$d,gpt2 --hint-efi=hd$d,gpt2 --hint-baremetal=ahci$d,gpt2 d7bc9d0e-7706-44f9-b1a7-ff24b7c360a7
> configfile /boot/grub2/grub.cfg
>
> # $d is the number for each md member disk, so that the hint is only looking for the fs-uuid on the local disk first. But I think even a single such grub.cfg that's not customized per disk will work, but it does need to be copied to each ESP.
>
Interesting, I didn't know such a trick was possible.
>
>> For Fake RAID, I'm not sure but I would say that the bios is able to
>> read the RAID metadata as well.
>
> I'm not sure if the fake raid is going to present a single ESP to the firmware or not? If the EFI System partitions are raided via fake raid, then you have only one grubx64.efi and one grub.cfg so there should be no problem? I haven't test it.
I haven't and can't because of lack of HW, at least for now. But I would
think there's no problem on such systems. That would give a first reason
to use them.
>
>>
>> For (md) Soft RAID, I don't know. I would say that the bios is
>> unlikely to understand the md metadata stored in the /boot partition
>> so it won't work.
>
> The UEFI firmware has no way of directly reading /boot aside from the md metadata, which is that it's likely ext4 which the firmware won't understand. UEFI firmware is looking for an OS loader on the EFI System Partition (ESP) and that OS Loader will be grub, which will understand both md raid1 metadata and ext4.
>
Actually, IIRC, one metadata format (0.9 perhaps) is stored at the end
of the partition, which would allow the UEFI firmware to access the /boot
partition transparently, no?
Thanks !
* Re: Soft RAID and EFI systems
2014-02-01 22:04 ` Martin Wilck
@ 2014-02-02 21:39 ` Francis Moreau
2014-02-02 21:56 ` Martin Wilck
0 siblings, 1 reply; 27+ messages in thread
From: Francis Moreau @ 2014-02-02 21:39 UTC (permalink / raw)
To: Martin Wilck; +Cc: linux-raid
Hi Martin,
On 02/01/2014 11:04 PM, Martin Wilck wrote:
> Hi Francis,
>
>> For Fake RAID, I'm not sure but I would say that the bios is able to
>> read the RAID metadata as well.
>> For (md) Soft RAID, I don't know. I would say that the bios is
>> unlikely to understand the md metadata stored in the /boot partition
>> so it won't work.
>
> there is no big difference between EFI and legacy systems in this area.
Well, the main difference I can see is that EFI firmware accesses a
filesystem in order to load the bootloader, whereas the BIOS doesn't.
In the case of BIOS you can still rely on a bootloader such as grub to
access a partition with MD RAID on it.
Thanks
* Re: Soft RAID and EFI systems
2014-02-02 21:39 ` Francis Moreau
@ 2014-02-02 21:56 ` Martin Wilck
0 siblings, 0 replies; 27+ messages in thread
From: Martin Wilck @ 2014-02-02 21:56 UTC (permalink / raw)
To: Francis Moreau, linux-raid
On 2. Februar 2014 22:39:31 MEZ, Francis Moreau <francis.moro@gmail.com> wrote:
>Hi Martin,
>
>On 02/01/2014 11:04 PM, Martin Wilck wrote:
>> Hi Francis,
>>
>>> For Fake RAID, I'm not sure but I would say that the bios is able to
>>> read the RAID metadata as well.
>>> For (md) Soft RAID, I don't know. I would say that the bios is
>>> unlikely to understand the md metadata stored in the /boot partition
>>> so it won't work.
>>
>> there is no big difference between EFI and legacy systems in this
>area.
>
>Well the main difference I can see is that EFI firmwares access a
>filesystem where as BIOS doesn't in order to load the bootloader.
The EFI BIOS needs a driver to access the RAID array. But so does the legacy BIOS for accessing the MBR.
If you really want to do RAID on partitions, you can put your md-capable EFI boot loader on the EFI system partition. It will work just like in the legacy case. The EFI system partition itself is of course non-RAID in this setup, but the same holds for your legacy MBR.
Martin
>
>In the case of BIOS you can still rely on a bootloader such as grub to
>access a partition with MD RAID on it.
>
>Thanks
--
* Re: Soft RAID and EFI systems
2014-02-02 21:34 ` Francis Moreau
@ 2014-02-02 22:30 ` Chris Murphy
2014-02-02 22:57 ` Phil Turmel
2014-02-04 8:32 ` Francis Moreau
0 siblings, 2 replies; 27+ messages in thread
From: Chris Murphy @ 2014-02-02 22:30 UTC (permalink / raw)
To: Francis Moreau; +Cc: linux-raid
On Feb 2, 2014, at 2:34 PM, Francis Moreau <francis.moro@gmail.com> wrote:
>
> That's funny because one of the reasons I want to use UEFI firmware is
> to get rid of grub (I don't like it and the way it has become such a
> bloated beast): since /boot is vfat and has its own partition, I prefer
> use a much simpler bootloader such as gummyboot.
It might be possible to do what you want with mdadm metadata version 1.0. Typically bootable raid1 is ext4 on md raid1 using metadata format 1.0, and an internal bitmap. When the partitions are not assembled, they each appear as separate ext4 partitions. If FAT32 on md raid1 with metadata 1.0 still looks like FAT32 as a separate partition, and the mdadm v1.0 metadata at the end of the partition doesn't confuse the firmware, what should happen is any ESP can boot the system. Once the kernel and initramfs are loaded, mdadm will locate the mdadm metadata on each partition and assemble them into a single md device, and fstab mounts the md device at /boot. So prior to boot they are separate ESPs, and after boot it's a single ESP (mirrored). But I haven't tested this arrangement with ESPs and UEFI.
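A minimal sketch of that arrangement (untested, hypothetical device names):

```shell
# metadata=1.0 puts the md superblock at the END of each member, so
# each partition still starts with plain FAT32 the firmware can read.
mdadm --create /dev/md0 --level=1 --metadata=1.0 \
      --raid-devices=2 /dev/sda1 /dev/sdb1
mkfs.vfat -F 32 /dev/md0

# Once Linux is up, mount the assembled array, not the raw members:
echo '/dev/md0  /boot  vfat  defaults  0 2' >> /etc/fstab
```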
The easiest scenario I've found for resilient boot on EFI systems is, well, not easy. First, I put shim and grub package files onto each ESP along with the previously posted grub.cfg snippet. Those grub.cfgs are one-time, non-updatable files that point to /boot/grub2/grub.cfg (produced with grub2-mkconfig on Fedora) on Btrfs raid1. That's about as reliable as it gets, because the only dependencies are grub (which understands Btrfs multiple devices) and dracut baking the btrfs module into the initramfs. It becomes essentially foolproof if btrfs is compiled into the kernel. Other combinations are easier to break. I basically want ESPs that aren't being modified if at all avoidable, because FAT32 breaks easily if anything is being written to it when there is a crash or power failure.
>> For those distros doing Secure Boot, its complicated because there is no such thing as grub-install. There's a one size fits all signed grubx64.efi which typically searches for grub.cfg in the same directory as the grubx64.efi file. That means your grub.cfg isn't mirrored, and any time you do a kernel update you have to manually update all the grub.cfgs on each ESP. Messy. That's the way it is on Fedora right now and I just filed some bugs on this.
>
> Could you give me a pointer on the bug you filled out, I would be
> interested.
https://bugzilla.redhat.com/show_bug.cgi?id=1048999
https://bugzilla.redhat.com/show_bug.cgi?id=1022316
https://bugzilla.redhat.com/show_bug.cgi?id=1060576
Chris Murphy
* Re: Soft RAID and EFI systems
2014-02-02 22:30 ` Chris Murphy
@ 2014-02-02 22:57 ` Phil Turmel
2014-02-03 7:19 ` Martin Wilck
2014-02-04 8:41 ` Francis Moreau
2014-02-04 8:32 ` Francis Moreau
1 sibling, 2 replies; 27+ messages in thread
From: Phil Turmel @ 2014-02-02 22:57 UTC (permalink / raw)
To: Chris Murphy, Francis Moreau; +Cc: linux-raid
On 02/02/2014 05:30 PM, Chris Murphy wrote:
>
> On Feb 2, 2014, at 2:34 PM, Francis Moreau <francis.moro@gmail.com>
> wrote:
>>
>> That's funny because one of the reasons I want to use UEFI firmware
>> is to get rid of grub (I don't like it and the way it has become
>> such a bloated beast): since /boot is vfat and has its own
>> partition, I prefer use a much simpler bootloader such as
>> gummyboot.
Ditching the bootloader is possible:
http://kroah.com/log/blog/2013/09/02/booting-a-self-signed-linux-kernel/
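The approach in that post boils down to gluing the kernel command line, kernel, and initramfs onto an EFI stub with objcopy (a sketch; paths, section addresses, and the stub file — which comes from gummiboot/systemd — are assumptions):

```shell
# Kernel command line baked into the image.
echo 'root=/dev/md0 ro quiet' > cmdline.txt

# Combine stub + cmdline + kernel + initramfs into one EFI binary.
objcopy \
    --add-section .cmdline=cmdline.txt      --change-section-vma .cmdline=0x30000 \
    --add-section .linux=/boot/vmlinuz      --change-section-vma .linux=0x2000000 \
    --add-section .initrd=/boot/initrd.img  --change-section-vma .initrd=0x3000000 \
    linuxx64.efi.stub combined.efi

# Sign combined.efi (e.g. with sbsign) and copy it to each ESP.
```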
It seems to me that you should be able to create a raid1 v1.0 MD array
of your EFI system partitions, and put the combined and signed
kernel/initramfs onto it (mirrored to all member drives).
Then set the UEFI bios to try each device's ESP in turn.
Untested ... :-)
Phil
* Re: Soft RAID and EFI systems
2014-02-02 22:57 ` Phil Turmel
@ 2014-02-03 7:19 ` Martin Wilck
2014-02-04 8:41 ` Francis Moreau
1 sibling, 0 replies; 27+ messages in thread
From: Martin Wilck @ 2014-02-03 7:19 UTC (permalink / raw)
To: linux-raid
On 2. Februar 2014 23:57:23 MEZ, Phil Turmel <philip@turmel.org> wrote:
>It seems to me that you should be able to create a raid1 v1.0 MD array
>of your EFI support partitions, and put the combined and signed
>kernel/initramfs onto it (mirrored to all member drives).
>
>Then set the UEFI bios to try each device's ESP in turn.
>
Smart idea ... MD RAID on the whole disk still seems more reasonable to me.
Martin
>Untested ... :-)
>
>Phil
--
This message was sent from my Android phone with K-9 Mail.
* Re: Soft RAID and EFI systems
2014-01-31 17:02 Soft RAID and EFI systems Francis Moreau
2014-02-01 22:04 ` Martin Wilck
2014-02-02 20:39 ` Chris Murphy
@ 2014-02-03 9:56 ` David Brown
2014-02-04 8:22 ` Francis Moreau
2 siblings, 1 reply; 27+ messages in thread
From: David Brown @ 2014-02-03 9:56 UTC (permalink / raw)
To: Francis Moreau, linux-raid
On 31/01/14 18:02, Francis Moreau wrote:
> Hello,
>
> On EFI systems I'd like to RAID mirror /boot partition.
>
> For HW RAID there's no issue since access to the disk or any
> partitions should be totally transparent to the bios.
>
> For Fake RAID, I'm not sure but I would say that the bios is able to
> read the RAID metadata as well.
>
> For (md) Soft RAID, I don't know. I would say that the bios is
> unlikely to understand the md metadata stored in the /boot partition
> so it won't work.
> Is that correct ?
> If so does that mean I can't mirror /boot partition on EFI systems ?
> Maybe creating a RAID device using the whole disk (not the partition
> device) would work ?
>
> Thanks for your enlightments.
>
I can't answer your question as such, but in case you don't know,
VirtualBox virtual machines can be configured with EFI instead of a
normal BIOS. I don't know how complete the EFI emulation is, but
it might give you a quick and easy way to test out the different ideas.
* Re: Soft RAID and EFI systems
2014-02-03 9:56 ` David Brown
@ 2014-02-04 8:22 ` Francis Moreau
0 siblings, 0 replies; 27+ messages in thread
From: Francis Moreau @ 2014-02-04 8:22 UTC (permalink / raw)
To: David Brown, linux-raid
On 02/03/2014 10:56 AM, David Brown wrote:
> On 31/01/14 18:02, Francis Moreau wrote:
>> Hello,
>>
>> On EFI systems I'd like to RAID mirror /boot partition.
>>
>> For HW RAID there's no issue since access to the disk or any
>> partitions should be totally transparent to the bios.
>>
>> For Fake RAID, I'm not sure but I would say that the bios is able to
>> read the RAID metadata as well.
>>
>> For (md) Soft RAID, I don't know. I would say that the bios is
>> unlikely to understand the md metadata stored in the /boot partition
>> so it won't work.
>> Is that correct ?
>> If so does that mean I can't mirror /boot partition on EFI systems ?
>> Maybe creating a RAID device using the whole disk (not the partition
>> device) would work ?
>>
>> Thanks for your enlightments.
>>
>
> I can't answer your question as such, but in case you don't know
> VirtualBox virtual machines can be configured with an EFI instead of a
> normal bios. I don't know how complete the emulation of EFI is, but
> that might give you a quick and easy way to test out the different ideas.
>
Yes, but I'll do some testing with qemu first; it seems it's
possible to find and use an EFI firmware for qemu.
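For the record, a sketch of such a test setup (the OVMF firmware path is an assumption; package names and paths vary by distro):

```shell
# Two small disk images to play the role of the md raid1 members.
qemu-img create -f qcow2 disk1.img 2G
qemu-img create -f qcow2 disk2.img 2G

# Boot qemu with the OVMF EFI firmware instead of the legacy BIOS.
qemu-system-x86_64 \
    -bios /usr/share/ovmf/OVMF.fd \
    -m 1024 \
    -drive file=disk1.img,if=virtio \
    -drive file=disk2.img,if=virtio
```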
Thanks.
* Re: Soft RAID and EFI systems
2014-02-02 22:30 ` Chris Murphy
2014-02-02 22:57 ` Phil Turmel
@ 2014-02-04 8:32 ` Francis Moreau
2014-02-04 8:57 ` David Brown
2014-02-04 14:50 ` Chris Murphy
1 sibling, 2 replies; 27+ messages in thread
From: Francis Moreau @ 2014-02-04 8:32 UTC (permalink / raw)
To: Chris Murphy; +Cc: linux-raid
On 02/02/2014 11:30 PM, Chris Murphy wrote:
>
> On Feb 2, 2014, at 2:34 PM, Francis Moreau <francis.moro@gmail.com> wrote:
>>
>> That's funny because one of the reasons I want to use UEFI firmware is
>> to get rid of grub (I don't like it and the way it has become such a
>> bloated beast): since /boot is vfat and has its own partition, I prefer
>> use a much simpler bootloader such as gummyboot.
>
> It might be possible to do what you want with mdadm metadata version 1.0. Typically bootable raid1 is ext4 on md raid1 using metadata format 1.0, and an internal bitmap. When the partitions are not assembled, they each appear as separate ext4 partitions. If FAT32 on md raid1 with metadata 1.0 still looks like FAT32 as a separate partition, and the mdadm v1.0 metadata at the end of the partition doesn't confuse the firmware, what should happen is any ESP can boot the system. Once the kernel and initramfs are loaded, mdadm will locate the mdadm metadata on each partition and assemble them into a single md device, and fstab mounts the md device at /boot. So prior to boot they are separate ESPs, and after boot it's a single ESP (mirrored). But I haven't tested this arrangement with ESPs and UEFI.
I'll test this configuration and see if it works soon.
>
> The easiest scenario I've found for resilient boot on EFI systems is, well, not easy. First, I put shim and grub package files onto each ESP along with the previously posted grub.cfg snippet. Those grub.cfgs are one time, non-updatable files, that point to /boot/grub2/grub.cfg (produced with grub2-mkconfig on Fedora) on Btrfs raid1. That's about as reliable as it gets because the only dependencies are grub (which understands Btrfs multiple devices) and dracut baking the btrfs module into initramfs. It gets essentially fool proof if btrfs is compiled into the kernel. Other combinations are easier to break. I basically want ESPs that aren't being modified if at all avoidable because FAT32 breaks easily if anything is being written to it and there is a crash or power failure.
>
I agree that FAT32 can break during a power failure; that's the reason why
I'm trying to make it mirrored. But I want to get rid of grub as much as
possible, so I would prefer to use the first solution.
>
>
>>> For those distros doing Secure Boot, its complicated because there is no such thing as grub-install. There's a one size fits all signed grubx64.efi which typically searches for grub.cfg in the same directory as the grubx64.efi file. That means your grub.cfg isn't mirrored, and any time you do a kernel update you have to manually update all the grub.cfgs on each ESP. Messy. That's the way it is on Fedora right now and I just filed some bugs on this.
>>
>> Could you give me a pointer on the bug you filled out, I would be
>> interested.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1048999
> https://bugzilla.redhat.com/show_bug.cgi?id=1022316
> https://bugzilla.redhat.com/show_bug.cgi?id=1060576
Thanks.
* Re: Soft RAID and EFI systems
2014-02-02 22:57 ` Phil Turmel
2014-02-03 7:19 ` Martin Wilck
@ 2014-02-04 8:41 ` Francis Moreau
2014-02-04 8:48 ` David Brown
1 sibling, 1 reply; 27+ messages in thread
From: Francis Moreau @ 2014-02-04 8:41 UTC (permalink / raw)
To: Phil Turmel, Chris Murphy; +Cc: linux-raid
On 02/02/2014 11:57 PM, Phil Turmel wrote:
> On 02/02/2014 05:30 PM, Chris Murphy wrote:
>>
>> On Feb 2, 2014, at 2:34 PM, Francis Moreau <francis.moro@gmail.com>
>> wrote:
>>>
>>> That's funny because one of the reasons I want to use UEFI firmware
>>> is to get rid of grub (I don't like it and the way it has become
>>> such a bloated beast): since /boot is vfat and has its own
>>> partition, I prefer use a much simpler bootloader such as
>>> gummyboot.
>
> Ditching the bootloader is possible:
>
> http://kroah.com/log/blog/2013/09/02/booting-a-self-signed-linux-kernel/
>
Well, yeah, it's possible but not currently usable IMHO. It means that you
need to build your own kernel, include the initramfs image in it, and
redo the whole process if you want to change a single option on the
kernel command line.
> It seems to me that you should be able to create a raid1 v1.0 MD array
> of your EFI support partitions, and put the combined and signed
> kernel/initramfs onto it (mirrored to all member drives).
Do both v0.9 and v1.0 MD formats put their metadata at the end of a
partition? I thought only v0.9 did that.
>
> Then set the UEFI bios to try each device's ESP in turn.
>
> Untested ... :-)
I'll do :)
Thanks
* Re: Soft RAID and EFI systems
2014-02-04 8:41 ` Francis Moreau
@ 2014-02-04 8:48 ` David Brown
2014-02-04 8:53 ` Francis Moreau
` (2 more replies)
0 siblings, 3 replies; 27+ messages in thread
From: David Brown @ 2014-02-04 8:48 UTC (permalink / raw)
To: Francis Moreau, Phil Turmel, Chris Murphy; +Cc: linux-raid
On 04/02/14 09:41, Francis Moreau wrote:
> On 02/02/2014 11:57 PM, Phil Turmel wrote:
>> On 02/02/2014 05:30 PM, Chris Murphy wrote:
>>>
>>> On Feb 2, 2014, at 2:34 PM, Francis Moreau <francis.moro@gmail.com>
>>> wrote:
>>>>
>>>> That's funny because one of the reasons I want to use UEFI firmware
>>>> is to get rid of grub (I don't like it and the way it has become
>>>> such a bloated beast): since /boot is vfat and has its own
>>>> partition, I prefer use a much simpler bootloader such as
>>>> gummyboot.
>>
>> Ditching the bootloader is possible:
>>
>> http://kroah.com/log/blog/2013/09/02/booting-a-self-signed-linux-kernel/
>>
>
> Well yeah it's possible but not currently usable IMHO. It means that you
> need to build your own kernel, include in this kernel the initramfs
> image and you need to redo the whole process if you want to change a
> single option in the kernel command line.
>
>> It seems to me that you should be able to create a raid1 v1.0 MD array
>> of your EFI support partitions, and put the combined and signed
>> kernel/initramfs onto it (mirrored to all member drives).
>
> Do both v0.9 and v1.0 MD formats put their metadata at the end of a
> partition? I thought only v0.9 did that.
Both the 0.9 and 1.0 formats put their metadata at the end of the
partition (1.1 and 1.2 put it near the start). This means that a plain
raid1 mirror (with as many disks as you like, as long
as they are simple mirrors and not raid10) looks just like a normal
partition for other tools. As long as it is read-only, tools that are
not raid-aware can use it. For example, grub and lilo can happily boot
from a 0.9 metadata raid1 array just like from a normal partition.
(Actually, modern grub understands a lot of md raid formats.) The same
thing should apply to EFI, as long as it does not attempt to write to
the partition.
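One way to check where a given member keeps its superblock is mdadm's examine output (a sketch; device name hypothetical, and the exact field names depend on the metadata version):

```shell
# Report the superblock location and metadata version for one member.
# End-of-device formats (0.9, 1.0) leave the start of the partition
# untouched, which is what lets non-raid-aware firmware read it.
mdadm --examine /dev/sda1
```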
>
>>
>> Then set the UEFI bios to try each device's ESP in turn.
>>
>> Untested ... :-)
>
> I'll do :)
>
> Thanks
>
* Re: Soft RAID and EFI systems
2014-02-04 8:48 ` David Brown
@ 2014-02-04 8:53 ` Francis Moreau
2014-02-04 12:27 ` Phil Turmel
2014-02-04 15:13 ` Chris Murphy
2 siblings, 0 replies; 27+ messages in thread
From: Francis Moreau @ 2014-02-04 8:53 UTC (permalink / raw)
To: David Brown, Phil Turmel, Chris Murphy; +Cc: linux-raid
On 02/04/2014 09:48 AM, David Brown wrote:
> On 04/02/14 09:41, Francis Moreau wrote:
>> On 02/02/2014 11:57 PM, Phil Turmel wrote:
>>> On 02/02/2014 05:30 PM, Chris Murphy wrote:
>>>>
>>>> On Feb 2, 2014, at 2:34 PM, Francis Moreau <francis.moro@gmail.com>
>>>> wrote:
>>>>>
>>>>> That's funny because one of the reasons I want to use UEFI firmware
>>>>> is to get rid of grub (I don't like it and the way it has become
>>>>> such a bloated beast): since /boot is vfat and has its own
>>>>> partition, I prefer use a much simpler bootloader such as
>>>>> gummyboot.
>>>
>>> Ditching the bootloader is possible:
>>>
>>> http://kroah.com/log/blog/2013/09/02/booting-a-self-signed-linux-kernel/
>>>
>>
>> Well yeah it's possible but not currently usable IMHO. It means that you
>> need to build your own kernel, include in this kernel the initramfs
>> image and you need to redo the whole process if you want to change a
>> single option in the kernel command line.
>>
>>> It seems to me that you should be able to create a raid1 v1.0 MD array
>>> of your EFI support partitions, and put the combined and signed
>>> kernel/initramfs onto it (mirrored to all member drives).
>>
>> Do both v0.9 and v1.0 MD formats put their metadata at the end of a
>> partition? I thought only v0.9 did that.
>
> Both the 0.9 and 1.0 formats put their metadata at the end of the
> partition (1.1 and 1.2 put it near the start). This means that a plain
> raid1 mirror (with as many disks as you like, as long
> as they are simple mirrors and not raid10) looks just like a normal
> partition for other tools. As long as it is read-only, tools that are
> not raid-aware can use it. For example, grub and lilo can happily boot
> from a 0.9 metadata raid1 array just like from a normal partition.
> (Actually, modern grub understands a lot of md raid formats.) The same
> thing should apply to EFI, as long as it does not attempt to write to
> the partition.
>
Hmm, I need to check whether EFI specifies that the ESP is never written
to by the firmware. If not, it might be risky to rely on this.
* Re: Soft RAID and EFI systems
2014-02-04 8:32 ` Francis Moreau
@ 2014-02-04 8:57 ` David Brown
2014-02-04 9:06 ` Francis Moreau
2014-02-04 15:40 ` Chris Murphy
2014-02-04 14:50 ` Chris Murphy
1 sibling, 2 replies; 27+ messages in thread
From: David Brown @ 2014-02-04 8:57 UTC (permalink / raw)
To: Francis Moreau, Chris Murphy; +Cc: linux-raid
On 04/02/14 09:32, Francis Moreau wrote:
> On 02/02/2014 11:30 PM, Chris Murphy wrote:
>>
>> On Feb 2, 2014, at 2:34 PM, Francis Moreau <francis.moro@gmail.com>
>> wrote:
>>>
>>> That's funny because one of the reasons I want to use UEFI
>>> firmware is to get rid of grub (I don't like it and the way it
>>> has become such a bloated beast): since /boot is vfat and has its
>>> own partition, I prefer use a much simpler bootloader such as
>>> gummyboot.
>>
>> It might be possible to do what you want with mdadm metadata
>> version 1.0. Typically bootable raid1 is ext4 on md raid1 using
>> metadata format 1.0, and an internal bitmap. When the partitions
>> are not assembled, they each appear as separate ext4 partitions. If
>> FAT32 on md raid1 with metadata 1.0 still looks like FAT32 as a
>> separate partition, and the mdadm v1.0 metadata at the end of the
>> partition doesn't confuse the firmware, what should happen is any
>> ESP can boot the system. Once the kernel and initramfs are loaded,
>> mdadm will locate the mdadm metadata on each partition and assemble
>> them into a single md device, and fstab mounts the md device at
>> /boot. So prior to boot they are separate ESPs, and after boot it's
>> a single ESP (mirrored). But I haven't tested this arrangement with
>> ESPs and UEFI.
>
> I'll test this configuration and see if it works soon.
>
>>
>> The easiest scenario I've found for resilient boot on EFI systems
>> is, well, not easy. First, I put shim and grub package files onto
>> each ESP along with the previously posted grub.cfg snippet. Those
>> grub.cfgs are one time, non-updatable files, that point to
>> /boot/grub2/grub.cfg (produced with grub2-mkconfig on Fedora) on
>> Btrfs raid1. That's about as reliable as it gets because the only
>> dependencies are grub (which understands Btrfs multiple devices)
>> and dracut baking the btrfs module into initramfs. It gets
>> essentially fool proof if btrfs is compiled into the kernel. Other
>> combinations are easier to break. I basically want ESPs that aren't
>> being modified if at all avoidable because FAT32 breaks easily if
>> anything is being written to it and there is a crash or power
>> failure.
>>
>
> I agree that FAT32 can break during power failure, that's the reason
> why I'm trying to make it mirrored. But I want to get rid of grub as
> much as possible so I would prefer to use the first solution.
Mirroring will not help FAT32 during power failure - you have a good
chance of getting two copies of the same error. And if your power fail
hits during writes, you also have a good chance of the two disks having
/different/ errors and inconsistencies. The problem lies in FAT32
having no log, and no barriers or ordering when it makes changes -
updates to the file data, the directory structure, and the FAT table can
happen in different orders, and a power failure can leave one part
updated and the other part with old data. Raid cannot help with this
problem.
The most important way to protect your FAT32 system is simply to avoid
writing to it except when absolutely necessary. If it is mounted
read-only, and only updated when changing grub or updating the kernel,
then just make sure you don't power-cycle your machine at that time.
The smaller the critical window, the smaller the chances of problems.
If you need to do updates more regularly, then your best bet is to have
independent FAT32 partitions on the two disks. Make your updates on one
disk, and when it is finished copy the changes onto the other disk.
Then you always have a good copy - if you get a crash while the first
disk is being updated, then when you re-start the computer, use its boot
menu to choose booting from the second disk.
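The manual two-ESP scheme above can be sketched roughly like this (mount
points and device names are hypothetical):

```shell
# Update the first disk's ESP, then mirror it by hand to the second
# disk's ESP. rsync -rt (not -a) because FAT32 has no ownership or
# Unix permissions. Paths and device names are hypothetical.
mount /dev/sdb1 /mnt/esp2
rsync -rt --delete /boot/efi/ /mnt/esp2/
umount /mnt/esp2
```

If the machine dies mid-update, the second ESP is still a consistent
copy of the previous state.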
>
>>
>>
>>>> For those distros doing Secure Boot, its complicated because
>>>> there is no such thing as grub-install. There's a one size fits
>>>> all signed grubx64.efi which typically searches for grub.cfg in
>>>> the same directory as the grubx64.efi file. That means your
>>>> grub.cfg isn't mirrored, and any time you do a kernel update
>>>> you have to manually update all the grub.cfgs on each ESP.
>>>> Messy. That's the way it is on Fedora right now and I just
>>>> filed some bugs on this.
>>>
>>> Could you give me a pointer on the bug you filled out, I would
>>> be interested.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1048999
>> https://bugzilla.redhat.com/show_bug.cgi?id=1022316
>> https://bugzilla.redhat.com/show_bug.cgi?id=1060576
>
> Thanks.
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Soft RAID and EFI systems
2014-02-04 8:57 ` David Brown
@ 2014-02-04 9:06 ` Francis Moreau
2014-02-04 9:35 ` David Brown
2014-02-04 15:27 ` Chris Murphy
2014-02-04 15:40 ` Chris Murphy
1 sibling, 2 replies; 27+ messages in thread
From: Francis Moreau @ 2014-02-04 9:06 UTC (permalink / raw)
To: David Brown, Chris Murphy; +Cc: linux-raid
On 02/04/2014 09:57 AM, David Brown wrote:
> On 04/02/14 09:32, Francis Moreau wrote:
>> On 02/02/2014 11:30 PM, Chris Murphy wrote:
>>>
>>> On Feb 2, 2014, at 2:34 PM, Francis Moreau <francis.moro@gmail.com>
>>> wrote:
>>>>
>>>> That's funny because one of the reasons I want to use UEFI
>>>> firmware is to get rid of grub (I don't like it and the way it
>>>> has become such a bloated beast): since /boot is vfat and has its
>>>> own partition, I prefer use a much simpler bootloader such as
>>>> gummyboot.
>>>
>>> It might be possible to do what you want with mdadm metadata
>>> version 1.0. Typically bootable raid1 is ext4 on md raid1 using
>>> metadata format 1.0, and an internal bitmap. When the partitions
>>> are not assembled, they each appear as separate ext4 partitions. If
>>> FAT32 on md raid1 with metadata 1.0 still looks like FAT32 as a
>>> separate partition, and the mdadm v1.0 metadata at the end of the
>>> partition doesn't confuse the firmware, what should happen is any
>>> ESP can boot the system. Once the kernel and initramfs are loaded,
>>> mdadm will locate the mdadm metadata on each partition and assemble
>>> them into a single md device, and fstab mounts the md device at
>>> /boot. So prior to boot they are separate ESPs, and after boot it's
>>> a single ESP (mirrored). But I haven't tested this arrangement with
>>> ESPs and UEFI.
>>
>> I'll test this configuration and see if it works soon.
>>
>>>
>>> The easiest scenario I've found for resilient boot on EFI systems
>>> is, well, not easy. First, I put shim and grub package files onto
>>> each ESP along with the previously posted grub.cfg snippet. Those
>>> grub.cfgs are one time, non-updatable files, that point to
>>> /boot/grub2/grub.cfg (produced with grub2-mkconfig on Fedora) on
>>> Btrfs raid1. That's about as reliable as it gets because the only
>>> dependencies are grub (which understands Btrfs multiple devices)
>>> and dracut baking the btrfs module into initramfs. It gets
>>> essentially fool proof if btrfs is compiled into the kernel. Other
>>> combinations are easier to break. I basically want ESPs that aren't
>>> being modified if at all avoidable because FAT32 breaks easily if
>>> anything is being written to it and there is a crash or power
>>> failure.
>>>
>>
>> I agree that FAT32 can break during power failure, that's the reason
>> why I'm trying to make it mirrored. But I want to get rid of grub as
>> much as possible so I would prefer to use the first solution.
>
> Mirroring will not help FAT32 during power failure - you have a good
> chance of getting two copies of the same error. And if your power fail
> hits during writes, you also have a good chance of the two disks having
> /different/ errors and inconsistencies. The problem lies in FAT32
> having no log, and no barriers or ordering when it makes changes -
> updates to the file data, the directory structure, and the FAT table can
> happen in different orders, and a power failure can leave one part
> updated and the other part with old data. Raid cannot help with this
> problem.
Ok, so basically RAID helps only in case of disk failure, right?
It seems odd to have chosen FAT32 in the first place then.
>
> The most important way to protect your FAT32 system is simply to avoid
> writing to it except when absolutely necessary. If it is mounted
> read-only, and only updated when changing grub or updating the kernel,
> then just make sure you don't power-cycle your machine at that time.
Well, the problem is that you never know when power failures happen at
least for me with a small server without any power backup.
> The smaller the critical window, the smaller the chances of problems.
>
> If you need to do updates more regularly, then your best bet is to have
> independent FAT32 partitions on the two disks. Make your updates on one
> disk, and when it is finished copy the changes onto the other disk.
> Then you always have a good copy - if you get a crash while the first
> disk is being updated, then when you re-start the computer, use its boot
> menu to choose booting from the second disk.
That seems the best thing to do then.
Thanks.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Soft RAID and EFI systems
2014-02-04 9:06 ` Francis Moreau
@ 2014-02-04 9:35 ` David Brown
2014-02-04 9:45 ` Francis Moreau
2014-02-04 15:27 ` Chris Murphy
1 sibling, 1 reply; 27+ messages in thread
From: David Brown @ 2014-02-04 9:35 UTC (permalink / raw)
To: Francis Moreau, Chris Murphy; +Cc: linux-raid
On 04/02/14 10:06, Francis Moreau wrote:
> On 02/04/2014 09:57 AM, David Brown wrote:
>> On 04/02/14 09:32, Francis Moreau wrote:
>>> On 02/02/2014 11:30 PM, Chris Murphy wrote:
>>>>
>>>> On Feb 2, 2014, at 2:34 PM, Francis Moreau <francis.moro@gmail.com>
>>>> wrote:
>>>>>
>>>>> That's funny because one of the reasons I want to use UEFI
>>>>> firmware is to get rid of grub (I don't like it and the way it
>>>>> has become such a bloated beast): since /boot is vfat and has its
>>>>> own partition, I prefer use a much simpler bootloader such as
>>>>> gummyboot.
>>>>
>>>> It might be possible to do what you want with mdadm metadata
>>>> version 1.0. Typically bootable raid1 is ext4 on md raid1 using
>>>> metadata format 1.0, and an internal bitmap. When the partitions
>>>> are not assembled, they each appear as separate ext4 partitions. If
>>>> FAT32 on md raid1 with metadata 1.0 still looks like FAT32 as a
>>>> separate partition, and the mdadm v1.0 metadata at the end of the
>>>> partition doesn't confuse the firmware, what should happen is any
>>>> ESP can boot the system. Once the kernel and initramfs are loaded,
>>>> mdadm will locate the mdadm metadata on each partition and assemble
>>>> them into a single md device, and fstab mounts the md device at
>>>> /boot. So prior to boot they are separate ESPs, and after boot it's
>>>> a single ESP (mirrored). But I haven't tested this arrangement with
>>>> ESPs and UEFI.
>>>
>>> I'll test this configuration and see if it works soon.
>>>
>>>>
>>>> The easiest scenario I've found for resilient boot on EFI systems
>>>> is, well, not easy. First, I put shim and grub package files onto
>>>> each ESP along with the previously posted grub.cfg snippet. Those
>>>> grub.cfgs are one time, non-updatable files, that point to
>>>> /boot/grub2/grub.cfg (produced with grub2-mkconfig on Fedora) on
>>>> Btrfs raid1. That's about as reliable as it gets because the only
>>>> dependencies are grub (which understands Btrfs multiple devices)
>>>> and dracut baking the btrfs module into initramfs. It gets
>>>> essentially fool proof if btrfs is compiled into the kernel. Other
>>>> combinations are easier to break. I basically want ESPs that aren't
>>>> being modified if at all avoidable because FAT32 breaks easily if
>>>> anything is being written to it and there is a crash or power
>>>> failure.
>>>>
>>>
>>> I agree that FAT32 can break during power failure, that's the reason
>>> why I'm trying to make it mirrored. But I want to get rid of grub as
>>> much as possible so I would prefer to use the first solution.
>>
>> Mirroring will not help FAT32 during power failure - you have a good
>> chance of getting two copies of the same error. And if your power fail
>> hits during writes, you also have a good chance of the two disks having
>> /different/ errors and inconsistencies. The problem lies in FAT32
>> having no log, and no barriers or ordering when it makes changes -
>> updates to the file data, the directory structure, and the FAT table can
>> happen in different orders, and a power failure can leave one part
>> updated and the other part with old data. Raid cannot help with this
>> problem.
>
> Ok, so basically RAID helps only in case of disk failure, right ?
Exactly correct (where "disk failure" includes both complete failure of
the disk, and unrecoverable read errors). Raid does not help against
corruption due to power fails (if you have a raid card with a battery
backup, and a filesystem with journalling, it should help here), and it
does not help against the most common cause of data loss - human error!
>
> It seems odd to have chosen FAT32 in the first place then.
FAT32 is the worst possible choice of a filesystem, except for three
aspects - it is quite simple and can be implemented in a small amount of
code (such as in EFI or a bootloader), it is usable on small disks or
partitions, and it is supported by brain-dead OS's that don't understand
better alternatives (NTFS has journalling, but is a monster to implement
in something the size of EFI).
It's a crap filesystem, but it is the "industry standard" for small
disks and small systems.
>
>>
>> The most important way to protect your FAT32 system is simply to avoid
>> writing to it except when absolutely necessary. If it is mounted
>> read-only, and only updated when changing grub or updating the kernel,
>> then just make sure you don't power-cycle your machine at that time.
>
> Well, the problem is that you never know when power failures happen at
> least for me with a small server without any power backup.
The answer here is staring you in the face... get a UPS. A small one
is not expensive - you only need it to run the server for a couple of
minutes. Even though journalled filesystems can keep their /metadata/
consistency after a power failure, they don't normally guarantee /data/
consistency, and certainly cannot guarantee /application level/
consistency. You get that from doing a proper shutdown. And remember
also that after an unclean shutdown, restarts involve long consistency
checks at the raid level and at the filesystem level - a UPS will let
you avoid that.
>
>> The smaller the critical window, the smaller the chances of problems.
>>
>> If you need to do updates more regularly, then your best bet is to have
>> independent FAT32 partitions on the two disks. Make your updates on one
>> disk, and when it is finished copy the changes onto the other disk.
>> Then you always have a good copy - if you get a crash while the first
>> disk is being updated, then when you re-start the computer, use its boot
>> menu to choose booting from the second disk.
>
> That seems the best thing to do then.
>
> Thanks.
>
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Soft RAID and EFI systems
2014-02-04 9:35 ` David Brown
@ 2014-02-04 9:45 ` Francis Moreau
0 siblings, 0 replies; 27+ messages in thread
From: Francis Moreau @ 2014-02-04 9:45 UTC (permalink / raw)
To: David Brown, Chris Murphy; +Cc: linux-raid
On 02/04/2014 10:35 AM, David Brown wrote:
[...]
>>
>> It seems odd to have chosen FAT32 in the first place then.
>
> FAT32 is the worst possible choice of a filesystem, except for three
> aspects - it is quite simple and can be implemented in a small amount of
> code (such as in EFI or a bootloader), it is usable on small disks or
> partitions, and it is supported by brain-dead OS's that don't understand
> better alternatives (NTFS has journalling, but is a monster to implement
> in something the size of EFI).
>
> It's a crap filesystem, but it is the "industry standard" for small
> disks and small systems.
If only read-only support is needed, there are some alternatives to FAT32.
But I agree FAT32 is the well-known industry standard.
>
>>
>>>
>>> The most important way to protect your FAT32 system is simply to avoid
>>> writing to it except when absolutely necessary. If it is mounted
>>> read-only, and only updated when changing grub or updating the kernel,
>>> then just make sure you don't power-cycle your machine at that time.
>>
>> Well, the problem is that you never know when power failures happen at
>> least for me with a small server without any power backup.
>
> The answer here is staring you in the face... get a UPS. A small one
> is not expensive - you only need it to run the server for a couple of
> minutes. Even though journalled filesystems can keep their /metadata/
> consistency after a power failure, they don't normally guarantee /data/
> consistency, and certainly cannot guarantee /application level/
> consistency. You get that from doing a proper shutdown. And remember
> also that after an unclean shutdown, restarts involve long consistency
> checks at the raid level and at the filesystem level - a UPS will let
> you avoid that.
I understand your point.
Thanks.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Soft RAID and EFI systems
2014-02-04 8:48 ` David Brown
2014-02-04 8:53 ` Francis Moreau
@ 2014-02-04 12:27 ` Phil Turmel
2014-02-04 15:13 ` Chris Murphy
2 siblings, 0 replies; 27+ messages in thread
From: Phil Turmel @ 2014-02-04 12:27 UTC (permalink / raw)
To: David Brown, Francis Moreau, Chris Murphy; +Cc: linux-raid
On 02/04/2014 03:48 AM, David Brown wrote:
> On 04/02/14 09:41, Francis Moreau wrote:
>> Are both v0.9 and v1.0 MD put their metadata at the end of a partition
>> ? I thought only v0.9 would do that.
>
> Yes, it is only 0.9 format that is at the end of the partition. This
> means that a plain raid1 mirror (with as many disks as you like, as long
> as they are simple mirrors and not raid10) looks just like a normal
> partition for other tools. As long as it is read-only, tools that are
> not raid-aware can use it. For example, grub and lilo can happily boot
> from a 0.9 metadata raid1 array just like from a normal partition.
> (Actually, modern grub understands a lot of md raid formats.) The same
> thing should apply to EFI, as long as it does not attempt to write to
> the partition.
No, both 0.9 and 1.0 have metadata after the data area. v1.1 and 1.2
have their metadata before the data area.
I've been using v1.0 on my /boot mirrors for a long time.
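For reference, such a v1.0 mirror is created by passing the metadata
version explicitly; a sketch with hypothetical device names (not a
command taken from this thread):

```shell
# metadata 1.0 keeps the superblock at the end of each member, so a
# non-raid-aware firmware or bootloader still sees a plain filesystem.
# Device names are hypothetical.
mdadm --create /dev/md0 --level=1 --metadata=1.0 --bitmap=internal \
      --raid-devices=2 /dev/sda2 /dev/sdb2
mkfs.ext4 /dev/md0
```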
Phil
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Soft RAID and EFI systems
2014-02-04 8:32 ` Francis Moreau
2014-02-04 8:57 ` David Brown
@ 2014-02-04 14:50 ` Chris Murphy
2014-02-07 8:00 ` Francis Moreau
1 sibling, 1 reply; 27+ messages in thread
From: Chris Murphy @ 2014-02-04 14:50 UTC (permalink / raw)
To: Francis Moreau; +Cc: linux-raid
On Feb 4, 2014, at 1:32 AM, Francis Moreau <francis.moro@gmail.com> wrote:
> On 02/02/2014 11:30 PM, Chris Murphy wrote:
>>
>> On Feb 2, 2014, at 2:34 PM, Francis Moreau <francis.moro@gmail.com> wrote:
>>>
>>> That's funny because one of the reasons I want to use UEFI firmware is
>>> to get rid of grub (I don't like it and the way it has become such a
>>> bloated beast): since /boot is vfat and has its own partition, I prefer
>>> use a much simpler bootloader such as gummyboot.
>>
>> It might be possible to do what you want with mdadm metadata version 1.0. Typically bootable raid1 is ext4 on md raid1 using metadata format 1.0, and an internal bitmap. When the partitions are not assembled, they each appear as separate ext4 partitions. If FAT32 on md raid1 with metadata 1.0 still looks like FAT32 as a separate partition, and the mdadm v1.0 metadata at the end of the partition doesn't confuse the firmware, what should happen is any ESP can boot the system. Once the kernel and initramfs are loaded, mdadm will locate the mdadm metadata on each partition and assemble them into a single md device, and fstab mounts the md device at /boot. So prior to boot they are separate ESPs, and after boot it's a single ESP (mirrored). But I haven't tested this arrangement with ESPs and UEFI.
>
> I'll test this configuration and see if it works soon.
>
>>
>> The easiest scenario I've found for resilient boot on EFI systems is, well, not easy. First, I put shim and grub package files onto each ESP along with the previously posted grub.cfg snippet. Those grub.cfgs are one time, non-updatable files, that point to /boot/grub2/grub.cfg (produced with grub2-mkconfig on Fedora) on Btrfs raid1. That's about as reliable as it gets because the only dependencies are grub (which understands Btrfs multiple devices) and dracut baking the btrfs module into initramfs. It gets essentially fool proof if btrfs is compiled into the kernel. Other combinations are easier to break. I basically want ESPs that aren't being modified if at all avoidable because FAT32 breaks easily if anything is being written to it and there is a crash or power failure.
>>
>
> I agree that FAT32 can break during power failure, that's the reason why
> I'm trying to make it mirrored. But I want to get rid of grub as much as
> possible so I would prefer to use the first solution.
Having gone down the grub2-efi road, I don't blame you at all. However, I really think to be useful gummiboot is going to need filesystem extensions. I'm biased, if it were one filesystem, I'd pick Btrfs because so much can be gained with less work, and then put /boot files there. But because ext4 would be a gateway to supporting pretty much everything else, that's also understandable.
I think the ESP really ought to be predominantly read-only, and in terms of resilience we're better off updating each disk's ESP one at a time asynchronously, so that a crash or power failure hopefully only munges one device's ESP. If we get such an event with a raid1'd ESP all bets are off - if there's corruption it's likely on all ESPs (by design) and it's even possible they each have slightly different problems.
A compromise option for you is rEFInd, which like gummiboot is much much simpler than GRUB. But it does have filesystem extensions so you can put only rEFInd on each ESP. And then put boot files on something else like ext4, or even ext4 on md raid if you choose e.g. metadata 1.0.
And yet still another option, maybe even simpler, is extlinux for EFI. This work is somewhat new, and I haven't tried it. I know that grub (Fedora), gummiboot and rEFInd are using EFISTUB as the actual bootloader. I suspect the same is the case for extlinux which would make things easier.
Chris Murphy
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Soft RAID and EFI systems
2014-02-04 8:48 ` David Brown
2014-02-04 8:53 ` Francis Moreau
2014-02-04 12:27 ` Phil Turmel
@ 2014-02-04 15:13 ` Chris Murphy
2014-02-04 15:29 ` Chris Murphy
2014-02-07 7:42 ` Francis Moreau
2 siblings, 2 replies; 27+ messages in thread
From: Chris Murphy @ 2014-02-04 15:13 UTC (permalink / raw)
To: David Brown; +Cc: Francis Moreau, Phil Turmel, linux-raid
On Feb 4, 2014, at 1:48 AM, David Brown <david.brown@hesbynett.no> wrote:
> On 04/02/14 09:41, Francis Moreau wrote:
>> On 02/02/2014 11:57 PM, Phil Turmel wrote:
>>> On 02/02/2014 05:30 PM, Chris Murphy wrote:
>>>>
>>>> On Feb 2, 2014, at 2:34 PM, Francis Moreau <francis.moro@gmail.com>
>>>> wrote:
>>>>>
>>>>> That's funny because one of the reasons I want to use UEFI firmware
>>>>> is to get rid of grub (I don't like it and the way it has become
>>>>> such a bloated beast): since /boot is vfat and has its own
>>>>> partition, I prefer use a much simpler bootloader such as
>>>>> gummyboot.
>>>
>>> Ditching the bootloader is possible:
>>>
>>> http://kroah.com/log/blog/2013/09/02/booting-a-self-signed-linux-kernel/
>>>
>>
>> Well yeah it's possible but not currently usable IMHO. It means that you
>> need to build your own kernel, include in this kernel the initramfs
>> image and you need to redo the whole process if you want to change a
>> single option in the kernel command line.
>>
>>> It seems to me that you should be able to create a raid1 v1.0 MD array
>>> of your EFI support partitions, and put the combined and signed
>>> kernel/initramfs onto it (mirrored to all member drives).
>>
>> Are both v0.9 and v1.0 MD put their metadata at the end of a partition
>> ? I thought only v0.9 would do that.
>
> Yes, it is only 0.9 format that is at the end of the partition.
Both 0.90 and 1.00 are at the end of the partition.
On a 550MiB disk with the last offset 0x225f0000, I get the start of metadata:
0.90
225f0000
1.00
225fe000
In both cases the resulting md device sizes are identical. So if anything 1.00 is very slightly farther back than 0.90, and is a newer metadata version. The Fedora installer uses metadata 1.00 along with an internal bitmap when creating bootable raid1 arrays.
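Those two offsets follow from the documented placement rules (a sketch
of the rules, not mdadm's actual code): the 0.90 superblock sits at the
last 64 KiB boundary minus 64 KiB, and the 1.0 superblock is
4 KiB-aligned at least 8 KiB from the end:

```shell
size=$((550 * 1024 * 1024))                              # 550 MiB device
printf '0.90: %#x\n' $(( (size & ~0xFFFF) - 0x10000 ))   # 0x225f0000
printf '1.0:  %#x\n' $(( (size - 0x2000) & ~0xFFF ))     # 0x225fe000
```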
This mdadm warning "possible for there to be confusion about whether the superblock applies to a whole device or just the last partition" would only apply if you attempt to make whole physical drives md member devices, which then runs into GPT problems. For one the 1.00 metadata would actually reside within the last 34 sectors of the physical device which is where the backup GPT belongs. Even if you use metadata 0.90 to avoid that, you now have a primary and backup GPT that agree when there is no raid1 active, but then the primary and backup GPT disagree when it is active. So you can't have it both ways with GPT formatted disks on UEFI: both ways meaning a disk that appears valid whether md is active or not.
Chris Murphy
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Soft RAID and EFI systems
2014-02-04 9:06 ` Francis Moreau
2014-02-04 9:35 ` David Brown
@ 2014-02-04 15:27 ` Chris Murphy
1 sibling, 0 replies; 27+ messages in thread
From: Chris Murphy @ 2014-02-04 15:27 UTC (permalink / raw)
To: Francis Moreau; +Cc: David Brown, linux-raid
On Feb 4, 2014, at 2:06 AM, Francis Moreau <francis.moro@gmail.com> wrote:
>>
> Ok, so basically RAID helps only in case of disk failure, right ?
Correct. (Although the Btrfs raid1 implementation can detect and correct for corruption, including phantom writes, so it makes the question "what is raid and isn't raid?" more open ended.)
>
> It seems odd to have chosen FAT32 in the first place then.
I'm not sure what else they could have picked. FAT32 is brain dead simple and has the least amount of plausible IP attached to it, and Microsoft was willing to set aside those IP concerns for anything having to do with EFI. FAT32 has changed hardly at all in 15 years. Yet NTFS, ext, HFS have all gone through quite a bit of change.
FWIW, the UEFI spec says FAT12 or FAT16 for removable media. And FAT32 for boot devices like hard drives. However, by default dosfstools chooses bitness based on the size of the partition being formatted, rather than removable vs not. So I've found that e.g. Fedora's installer creates a 200MB EFI System partition, and formats it using mkdosfs without options, and therefore results in FAT16 EFI System partitions.
https://bugzilla.redhat.com/show_bug.cgi?id=1046577
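Forcing the bitness avoids that; a sketch with a hypothetical device
name:

```shell
# dosfstools picks FAT12/16/32 from the partition size unless told
# otherwise; -F 32 forces FAT32 as the UEFI spec expects for fixed
# disks. Device name is hypothetical.
mkfs.fat -F 32 /dev/sda1
```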
Chris Murphy
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Soft RAID and EFI systems
2014-02-04 15:13 ` Chris Murphy
@ 2014-02-04 15:29 ` Chris Murphy
2014-02-07 7:42 ` Francis Moreau
1 sibling, 0 replies; 27+ messages in thread
From: Chris Murphy @ 2014-02-04 15:29 UTC (permalink / raw)
To: linux-raid List
On Feb 4, 2014, at 8:13 AM, Chris Murphy <lists@colorremedies.com> wrote:
>
> This mdadm warning "possible for there to be confusion about whether the superblock applies to a whole device or just the last partition" would only apply if you attempt to make whole physical drives md member devices, which then runs into GPT problems.
And that warning applies to the older 0.90 metadata.
Chris Murphy
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Soft RAID and EFI systems
2014-02-04 8:57 ` David Brown
2014-02-04 9:06 ` Francis Moreau
@ 2014-02-04 15:40 ` Chris Murphy
1 sibling, 0 replies; 27+ messages in thread
From: Chris Murphy @ 2014-02-04 15:40 UTC (permalink / raw)
To: David Brown; +Cc: Francis Moreau, linux-raid
On Feb 4, 2014, at 1:57 AM, David Brown <david.brown@hesbynett.no> wrote:
> The most important way to protect your FAT32 system is simply to avoid
> writing to it except when absolutely necessary. If it is mounted
> read-only, and only updated when changing grub or updating the kernel,
> then just make sure you don't power-cycle your machine at that time.
> The smaller the critical window, the smaller the chances of problems.
I agree. I even question why most linux distros persistently mount the EFI System partition at /boot/efi for no apparently good reason. Windows and OS X do not keep the ESP mounted even read-only. It's pretty much never updated. If we're constantly updating the ESP for things like grub.cfg modifications, I think the implementation is flawed.
On EFI, grubx64.efi/core.img needs to look for grub.cfg /boot/grub2 not /boot/efi/EFI/fedora
https://bugzilla.redhat.com/show_bug.cgi?id=1048999
Chris Murphy
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Soft RAID and EFI systems
2014-02-04 15:13 ` Chris Murphy
2014-02-04 15:29 ` Chris Murphy
@ 2014-02-07 7:42 ` Francis Moreau
1 sibling, 0 replies; 27+ messages in thread
From: Francis Moreau @ 2014-02-07 7:42 UTC (permalink / raw)
To: Chris Murphy, David Brown; +Cc: Phil Turmel, linux-raid
Hi Chris,
On 02/04/2014 04:13 PM, Chris Murphy wrote:
>
> On Feb 4, 2014, at 1:48 AM, David Brown <david.brown@hesbynett.no> wrote:
>
>> On 04/02/14 09:41, Francis Moreau wrote:
>>> On 02/02/2014 11:57 PM, Phil Turmel wrote:
>>>> On 02/02/2014 05:30 PM, Chris Murphy wrote:
>>>>>
>>>>> On Feb 2, 2014, at 2:34 PM, Francis Moreau <francis.moro@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> That's funny because one of the reasons I want to use UEFI firmware
>>>>>> is to get rid of grub (I don't like it and the way it has become
>>>>>> such a bloated beast): since /boot is vfat and has its own
>>>>>> partition, I prefer use a much simpler bootloader such as
>>>>>> gummyboot.
>>>>
>>>> Ditching the bootloader is possible:
>>>>
>>>> http://kroah.com/log/blog/2013/09/02/booting-a-self-signed-linux-kernel/
>>>>
>>>
>>> Well yeah it's possible but not currently usable IMHO. It means that you
>>> need to build your own kernel, include in this kernel the initramfs
>>> image and you need to redo the whole process if you want to change a
>>> single option in the kernel command line.
>>>
>>>> It seems to me that you should be able to create a raid1 v1.0 MD array
>>>> of your EFI support partitions, and put the combined and signed
>>>> kernel/initramfs onto it (mirrored to all member drives).
>>>
>>> Are both v0.9 and v1.0 MD put their metadata at the end of a partition
>>> ? I thought only v0.9 would do that.
>>
>> Yes, it is only 0.9 format that is at the end of the partition.
>
> Both 0.90 and 1.00 are at the end of the partition.
>
> On a 550MiB disk with the last offset 0x225f0000, I get the start of metadata:
>
> 0.90
> 225f0000
>
> 1.00
> 225fe000
>
> In both cases the resulting md device sizes are identical. So if anything 1.00 is very slightly farther back than 0.90, and is a newer metadata version. The Fedora installer uses metadata 1.00 along with an internal bitmap when creating bootable raid1 arrays.
>
> This mdadm warning "possible for there to be confusion about whether the superblock applies to a whole device or just the last partition" would only apply if you attempt to make whole physical drives md member devices, which then runs into GPT problems. For one the 1.00 metadata would actually reside within the last 34 sectors of the physical device which is where the backup GPT belongs. Even if you use metadata 0.90 to avoid that, you now have a primary and backup GPT that agree when there is no raid1 active, but then the primary and backup GPT disagree when it is active. So you can't have it both ways with GPT formatted disks on UEFI: both ways meaning a disk that appears valid whether md is active or not.
>
Hmm, I didn't think about this previously, but why would one use whole
physical drives as md member devices, especially for the boot disk (the
one used to store the bootloader)?
Thanks
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Soft RAID and EFI systems
2014-02-04 14:50 ` Chris Murphy
@ 2014-02-07 8:00 ` Francis Moreau
0 siblings, 0 replies; 27+ messages in thread
From: Francis Moreau @ 2014-02-07 8:00 UTC (permalink / raw)
To: Chris Murphy; +Cc: linux-raid
On 02/04/2014 03:50 PM, Chris Murphy wrote:
>
> On Feb 4, 2014, at 1:32 AM, Francis Moreau <francis.moro@gmail.com> wrote:
>
>> On 02/02/2014 11:30 PM, Chris Murphy wrote:
>>>
>>> On Feb 2, 2014, at 2:34 PM, Francis Moreau <francis.moro@gmail.com> wrote:
>>>>
>>>> That's funny because one of the reasons I want to use UEFI firmware is
>>>> to get rid of grub (I don't like it and the way it has become such a
>>>> bloated beast): since /boot is vfat and has its own partition, I prefer
>>>> to use a much simpler bootloader such as gummiboot.
>>>
>>> It might be possible to do what you want with mdadm metadata version 1.0. Typically bootable raid1 is ext4 on md raid1 using metadata format 1.0, and an internal bitmap. When the partitions are not assembled, they each appear as separate ext4 partitions. If FAT32 on md raid1 with metadata 1.0 still looks like FAT32 as a separate partition, and the mdadm v1.0 metadata at the end of the partition doesn't confuse the firmware, what should happen is any ESP can boot the system. Once the kernel and initramfs are loaded, mdadm will locate the mdadm metadata on each partition and assemble them into a single md device, and fstab mounts the md device at /boot. So prior to boot they are separate ESPs, and after boot it's a single ESP (mirrored). But I haven't tested this arrangement with ESPs and UEFI.
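A sketch of the (untested) arrangement described above. The device names /dev/sda1 and /dev/sdb1 are placeholder ESPs; the commands are printed rather than executed here, since mdadm --create needs root and real member partitions:

```shell
# Print (don't run) the commands for mirroring two ESPs with md raid1,
# metadata 1.0 (superblock at the end, so firmware still sees plain FAT32).
cat <<'EOF'
mdadm --create /dev/md/esp --level=1 --raid-devices=2 \
      --metadata=1.0 --bitmap=internal /dev/sda1 /dev/sdb1
mkfs.vfat -F 32 /dev/md/esp
EOF
```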
>>
>> I'll test this configuration and see if it works soon.
>>
>>>
>>> The easiest scenario I've found for resilient boot on EFI systems is, well, not easy. First, I put shim and grub package files onto each ESP along with the previously posted grub.cfg snippet. Those grub.cfgs are one-time, non-updatable files that point to /boot/grub2/grub.cfg (produced with grub2-mkconfig on Fedora) on Btrfs raid1. That's about as reliable as it gets, because the only dependencies are grub (which understands Btrfs multiple devices) and dracut baking the btrfs module into the initramfs. It gets essentially foolproof if btrfs is compiled into the kernel. Other combinations are easier to break. I basically want ESPs that aren't being modified if at all avoidable, because FAT32 breaks easily if anything is being written to it and there is a crash or power failure.
>>>
>>
>> I agree that FAT32 can break during power failure, that's the reason why
>> I'm trying to make it mirrored. But I want to get rid of grub as much as
>> possible so I would prefer to use the first solution.
>
> Having gone down the grub2-efi road, I don't blame you at all. However, I really think to be useful gummiboot is going to need filesystem extensions. I'm biased, if it were one filesystem, I'd pick Btrfs because so much can be gained with less work, and then put /boot files there. But because ext4 would be a gateway to supporting pretty much everything else, that's also understandable.
>
Hmm, the ESP must be vfat, so you'll always have a (small) number of files
(bootloader configuration files) that can be rewritten and will live in
the ESP.
And some bootloaders, such as syslinux (and probably gummiboot too), do not
(currently) have the ability to access files outside their own partition,
so they need the kernel and initramfs files to be in the ESP.
> I think the ESP really ought to be predominately read only, and in terms of resilience we're better off updating each disk's ESP one at a time asynchronously, so that a crash or power fail hopefully only munges one device's ESP. If we get such an event with a raid1'd ESP all bets are off - if there's corruption it's likely on all ESPs (by design) and it's even possible they each have slightly different problems.
>
> A compromise option for you is rEFInd, which like gummiboot is much much simpler than GRUB. But it does have filesystem extensions so you can put only rEFInd on each ESP. And then put boot files on something else like ext4, or even ext4 on md raid if you choose e.g. metadata 1.0.
>
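A hedged sketch of the rEFInd layout Chris describes: rEFInd plus its ext4 filesystem driver on each ESP, with kernels on an ext4 /boot and a manual boot stanza in refind.conf. The volume label, kernel version, and UUID below are placeholders, not values from this thread:

```
# /EFI/refind/refind.conf (fragment) -- all names here are examples
scan_all_linux_kernels false

menuentry "Linux" {
    volume  BOOT                  # label of the ext4 /boot volume
    loader  /vmlinuz-3.13.0
    initrd  /initramfs-3.13.0.img
    options "root=UUID=xxxxxxxx ro"
}
```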
I didn't know rEFInd, thanks for pointing it out. It seems that it
currently supports only ReiserFS and ext2 filesystems, though.
> And yet still another option, maybe even simpler, is extlinux for EFI. This work is somewhat new, and I haven't tried it. I know that grub (Fedora), gummiboot and rEFInd are using EFISTUB as the actual bootloader. I suspect the same is the case for extlinux which would make things easier.
>
There's also syslinux, which can be used on UEFI systems. It supports a
reasonable number of filesystems but comes with some limitations too:
https://wiki.archlinux.org/index.php/syslinux#Limitations_of_UEFI_Syslinux
Thanks.
Thread overview: 27+ messages
2014-01-31 17:02 Soft RAID and EFI systems Francis Moreau
2014-02-01 22:04 ` Martin Wilck
2014-02-02 21:39 ` Francis Moreau
2014-02-02 21:56 ` Martin Wilck
2014-02-02 20:39 ` Chris Murphy
2014-02-02 21:34 ` Francis Moreau
2014-02-02 22:30 ` Chris Murphy
2014-02-02 22:57 ` Phil Turmel
2014-02-03 7:19 ` Martin Wilck
2014-02-04 8:41 ` Francis Moreau
2014-02-04 8:48 ` David Brown
2014-02-04 8:53 ` Francis Moreau
2014-02-04 12:27 ` Phil Turmel
2014-02-04 15:13 ` Chris Murphy
2014-02-04 15:29 ` Chris Murphy
2014-02-07 7:42 ` Francis Moreau
2014-02-04 8:32 ` Francis Moreau
2014-02-04 8:57 ` David Brown
2014-02-04 9:06 ` Francis Moreau
2014-02-04 9:35 ` David Brown
2014-02-04 9:45 ` Francis Moreau
2014-02-04 15:27 ` Chris Murphy
2014-02-04 15:40 ` Chris Murphy
2014-02-04 14:50 ` Chris Murphy
2014-02-07 8:00 ` Francis Moreau
2014-02-03 9:56 ` David Brown
2014-02-04 8:22 ` Francis Moreau