Re: 64-btrfs.rules and degraded boot

From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Andrei Borzenkov <arvidjaar@gmail.com>
Cc: Chris Murphy <lists@colorremedies.com>,
	Kai Krakow <hurikhan77@gmail.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: 64-btrfs.rules and degraded boot
Date: Wed, 6 Jul 2016 08:48:40 -0400
Message-ID: <93cdc463-8f53-5cf6-055c-05b5359ad814@gmail.com>
In-Reply-To: <1E3215A5-EAA9-425D-AE08-B81B57D3043E@gmail.com>

On 2016-07-06 08:39, Andrei Borzenkov wrote:
>
>
> Sent from iPhone
>
>> On 6 July 2016, at 15:14, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
>>
>>> On 2016-07-06 07:55, Andrei Borzenkov wrote:
>>> On Wed, Jul 6, 2016 at 2:45 PM, Austin S. Hemmelgarn
>>> <ahferroin7@gmail.com> wrote:
>>>> On 2016-07-06 05:51, Andrei Borzenkov wrote:
>>>>>
>>>>> On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com>
>>>>> wrote:
>>>>>>
>>>>>> I started a systemd-devel@ thread since that's where most udev stuff
>>>>>> gets talked about.
>>>>>>
>>>>>>
>>>>>> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html
>>>>>
>>>>> Before discussing how to implement it in systemd, we need to decide
>>>>> what to implement. I.e.
>>>>>
>>>>> 1) do you always want to mount filesystem in degraded mode if not
>>>>> enough devices are present or only if explicit hint is given?
>>>>> 2) do you want to restrict degraded handling to root only or to other
>>>>> filesystems as well? Note that there could be more early boot
>>>>> filesystems that absolutely need the same treatment (enter separate
>>>>> /usr), and there are also normal filesystems that may need to be
>>>>> mounted even when degraded.
>>>>> 3) can we query btrfs whether it is mountable in degraded mode?
>>>>> according to documentation, "btrfs device ready" (which udev builtin
>>>>> follows) checks "if it has ALL of it’s devices in cache for mounting".
>>>>> This is required for proper systemd ordering of services.
>>>>
>>>>
>>>> To be entirely honest, if it were me, I'd want systemd to fsck off.  If the
>>>> kernel mount(2) call succeeds, then the filesystem was ready enough to
>>>> mount, and if it doesn't, then it wasn't, end of story.
>>>
>>> How should user space know when to try mount? What user space is
>>> supposed to do during boot if mount fails? Do you suggest
>>>
>>> while true; do
>>>  mount /dev/foo && exit 0
>>> done
>>>
>>> as part of startup sequence? And note that nowhere is systemd involved so far.
>> Nowhere there, except if you have a filesystem in fstab (or a mount unit, which I hate for other reasons that I will not go into right now), and you mount it and systemd thinks the device isn't ready, it unmounts it _immediately_.  In the case of boot, it's because of systemd thinking the device isn't ready that you can't mount degraded with a missing device.  In the case of the root filesystem at least, the initramfs is expected to handle this, and most of them do poll in some way, or have other methods of determining this.  I occasionally have issues with it with dracut without systemd, but that's due to a separate bug there involving the device mapper.
>>
>
> How does this systemd bashing answer my question: how does user space know when it can call mount at startup?
You mentioned that systemd wasn't involved, which is patently false if 
it's being used as your init system, and I was admittedly mostly 
responding to that.

Now, to answer the primary question which I forgot to answer:
Userspace doesn't.  Systemd doesn't either but assumes it does and 
checks in a flawed way.  Dracut's polling loop assumes it does but 
sometimes fails in a different way.  There is no way other than calling 
mount right now to know for sure if the mount will succeed, and that 
actually applies to a certain degree to any filesystem (because any 
number of things that are outside of even the kernel's control might 
happen while trying to mount the device).
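
To make that concrete, here is a rough sketch of what "just call mount"
could look like as a bounded loop rather than the endless one quoted
above. Everything in it is an assumption for illustration: the
try_mount name, the MOUNT_CMD override, the attempt count and the delay
are made up, and this is not what systemd or dracut actually do. The
only real interface it relies on is mount itself and the btrfs
"degraded" mount option.

```shell
#!/bin/sh
# Sketch only: try a normal mount a bounded number of times, then fall
# back to a degraded mount. MOUNT_CMD, the attempt count and the delay
# are illustrative assumptions, not existing initramfs behaviour.

MOUNT_CMD=${MOUNT_CMD:-mount}

# try_mount DEVICE MOUNTPOINT [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 as soon as any mount attempt succeeds; otherwise returns the
# exit status of the final degraded attempt.
try_mount() {
    dev=$1
    dir=$2
    attempts=${3:-10}
    delay=${4:-1}

    i=0
    while [ "$i" -lt "$attempts" ]; do
        # The mount call itself is the only authoritative readiness check.
        if "$MOUNT_CMD" "$dev" "$dir"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done

    # Last resort: ask btrfs to mount with devices missing.
    "$MOUNT_CMD" -o degraded "$dev" "$dir"
}
```

Something like "try_mount /dev/foo /mnt 30 1" would then give the
missing device about 30 seconds to show up before going degraded.
Whether falling back automatically is ever the right policy is exactly
the open question in this thread, so treat the fallback as a
placeholder for that decision.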
>
>
>>>
>>>> The whole concept
>>>> of trying to track in userspace something the kernel itself tracks and knows
>>>> a whole lot more about is absolutely stupid.
>>>
>>> It need not be user space. If kernel notifies user space when
>>> filesystem is mountable, problem solved. It could be udev event,
>>> netlink, whatever. Until kernel does it, user space need to either
>>> poll or somehow track it based on available events.
>> This I agree could be done better, but it absolutely should not be in userspace, the notification needs to come from the kernel, but that leads to the problem of knowing whether or not the FS can mount degraded, or only ro, or any number of other situations.
>>>
>>>> It makes some sense when
>>>> dealing with LVM or MD, because that is potentially a security issue
>>>> (someone could inject a bogus device node that you then mount instead of
>>>> your desired target),
>>>
>>> I do not understand it at all. MD and LVM have exactly the same problem
>>> - they need to know when they can assemble MD/VG. I miss what it has
>>> to do with security, sorry.
>> If you don't track whether or not the device is assembled, then someone could create an arbitrary device node with the same name and then get you to mount that, possibly causing all kinds of issues depending on any number of other factors.
>
> Device node is created as soon as array is seen for the first time. If you imply someone may replace it, what prevents doing it at any arbitrary time in the future?
It's still possible, but it's not as easy because replacing it after
it's mounted would require a remount to have any effect.  The most
reliable time to do something like this is during boot before the mount.
LVM and/or MD may or may not replace the node properly when they start
(I don't have enough background on MD and haven't tested with LVM), but
if that's after the fake node has already been mounted, then it won't
help much, except for helping cover up the attack.
>
>>>
>>>> but it makes no sense here, because there's no way to
>>>> prevent the equivalent from happening in BTRFS.
>>>>
>>>> As far as the udev rules, I'm pretty certain that _we_ ship those with
>>>> btrfs-progs,
>>>
>>> No, you do not. You ship rule to rename devices to be more
>>> "user-friendly". But the rule in question has always been part of
>>> udev.
>> Ah, you're right, I was mistaken about this.
>>>
>>>> I have no idea why they're packaged with udev in CentOS (oh
>>>> wait, I bet they package every single possible udev rule in that package
>>>> just in case, don't they?).
>>
