* btrfs and containers
@ 2016-03-07 22:55 Tobias Hunger
2016-03-07 23:45 ` Chris Murphy
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Tobias Hunger @ 2016-03-07 22:55 UTC (permalink / raw)
To: linux-btrfs
Hi,
I have been running systemd-nspawn containers on top of a btrfs
filesystem for a while now.
This works great: Snapshots are a huge help to manage containers!
But today I ran btrfs subvol list . *inside* a container. To my
surprise I got a list of *all* subvolumes on that drive. That is
basically a complete list of containers running on the machine. I do
not want to have that kind of information exposed to my containers.
Is there a way to stop btrfs from listing subvolumes "above" the
current location? So that "btrfs subvol list /" in a container will
only show subvolumes that are set up in the container?
Best Regards,
Tobias
* Re: btrfs and containers
2016-03-07 22:55 btrfs and containers Tobias Hunger
@ 2016-03-07 23:45 ` Chris Murphy
2016-03-08 19:58 ` Liu Bo
2016-03-08 12:12 ` Austin S. Hemmelgarn
2016-03-09 21:10 ` Marc MERLIN
2 siblings, 1 reply; 16+ messages in thread
From: Chris Murphy @ 2016-03-07 23:45 UTC (permalink / raw)
To: Tobias Hunger; +Cc: Btrfs BTRFS
On Mon, Mar 7, 2016 at 3:55 PM, Tobias Hunger <tobias.hunger@gmail.com> wrote:
> Hi,
>
> I have been running systemd-nspawn containers on top of a btrfs
> filesystem for a while now.
>
> This works great: Snapshots are a huge help to manage containers!
>
> But today I ran btrfs subvol list . *inside* a container. To my
> surprise I got a list of *all* subvolumes on that drive. That is
> basically a complete list of containers running on the machine. I do
> not want to have that kind of information exposed to my containers.
>
> Is there a way to stop btrfs from listing subvolumes "above" the
> current location? So that "btrfs subvol list /" in a container will
> only show subvolumes that are set up in the container?
I'm not sure whether this is something that goes in Btrfs proper,
since this is presumably a privileged container? The same thing
happens with Docker containers. One way to do this is if it's not
privileged, as non-root can't list subvolumes. I think some work is
needed to make it possible for users to list subvolumes they own.
Right now a user can create a subvolume but then not list or get
information on it. By default they can't delete it either unless a
special mount option is used. So I think there's work that's needed
one way or another, and maybe in more than one part.
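For reference, the mount option alluded to here is presumably
user_subvol_rm_allowed. A minimal sketch of enabling it on an already
mounted filesystem follows. The mount point /srv/containers and the
helper names are made up for illustration, and with DRY_RUN=1 the
command is only printed, not executed.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# allow_user_subvol_delete MOUNTPOINT (hypothetical helper):
# remount a btrfs filesystem so that owners may delete their own
# subvolumes. Needs root when actually executed.
allow_user_subvol_delete() {
    run mount -o remount,user_subvol_rm_allowed "$1"
}
```

Running "DRY_RUN=1 allow_user_subvol_delete /srv/containers" shows the
mount command without touching the system. Note that this only covers
deletion; listing is still restricted as described above.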
--
Chris Murphy
* Re: btrfs and containers
2016-03-07 22:55 btrfs and containers Tobias Hunger
2016-03-07 23:45 ` Chris Murphy
@ 2016-03-08 12:12 ` Austin S. Hemmelgarn
2016-03-09 21:10 ` Marc MERLIN
2 siblings, 0 replies; 16+ messages in thread
From: Austin S. Hemmelgarn @ 2016-03-08 12:12 UTC (permalink / raw)
To: Tobias Hunger, linux-btrfs
On 2016-03-07 17:55, Tobias Hunger wrote:
> Hi,
>
> I have been running systemd-nspawn containers on top of a btrfs
> filesystem for a while now.
>
> This works great: Snapshots are a huge help to manage containers!
>
> But today I ran btrfs subvol list . *inside* a container. To my
> surprise I got a list of *all* subvolumes on that drive. That is
> basically a complete list of containers running on the machine. I do
> not want to have that kind of information exposed to my containers.
>
> Is there a way to stop btrfs from listing subvolumes "above" the
> current location? So that "btrfs subvol list /" in a container will
> only show subvolumes that are set up in the container?
>
There is not currently a way to do this. My personal recommendation
until there is would be to use LVM or something similar and have each
container on its own FS (this has other advantages too, like being able
to use seed devices to quickly spin up containers in a known state).
Ideally though, we should be checking the current root directory when in
a mount namespace, and not list subvolumes outside that tree.
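A rough sketch of the seed-device idea, assuming LVM logical volumes
named /dev/vg0/base (a prepared template filesystem) and /dev/vg0/ct1
(an empty volume for the new container). The device names, the mount
point, and the helper functions are all hypothetical, and with
DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# make_seed DEVICE: flag an unmounted btrfs filesystem as a seed.
make_seed() {
    run btrfstune -S 1 "$1"
}

# spawn_from_seed SEED NEWDEV MOUNTPOINT: bring up a writable
# filesystem on NEWDEV whose initial content comes from the seed.
spawn_from_seed() {
    run mount "$1" "$3" &&              # a seed device mounts read-only
    run btrfs device add "$2" "$3" &&   # new writes will land on NEWDEV
    run mount -o remount,rw "$3"        # make the result writable
}
```

Each container then sees only its own filesystem, so "btrfs subvol
list" inside it has nothing from its neighbours to report.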
* Re: btrfs and containers
2016-03-07 23:45 ` Chris Murphy
@ 2016-03-08 19:58 ` Liu Bo
2016-03-08 21:28 ` Chris Murphy
0 siblings, 1 reply; 16+ messages in thread
From: Liu Bo @ 2016-03-08 19:58 UTC (permalink / raw)
To: Chris Murphy; +Cc: Tobias Hunger, Btrfs BTRFS
On Mon, Mar 07, 2016 at 04:45:09PM -0700, Chris Murphy wrote:
> On Mon, Mar 7, 2016 at 3:55 PM, Tobias Hunger <tobias.hunger@gmail.com> wrote:
> > Hi,
> >
> > I have been running systemd-nspawn containers on top of a btrfs
> > filesystem for a while now.
> >
> > This works great: Snapshots are a huge help to manage containers!
> >
> > But today I ran btrfs subvol list . *inside* a container. To my
> > surprise I got a list of *all* subvolumes on that drive. That is
> > basically a complete list of containers running on the machine. I do
> > not want to have that kind of information exposed to my containers.
> >
> > Is there a way to stop btrfs from listing subvolumes "above" the
> > current location? So that "btrfs subvol list /" in a container will
> > only show subvolumes that are set up in the container?
That's a good question.
It looks like "btrfs subvolume list -o" matches the needs here.
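As a sketch of that suggestion: -o asks btrfs-progs to print only the
subvolumes below the given path. It is an option chosen by the caller,
so a privileged process in the container can simply leave it off; it
narrows the output but is not an access control. The helper name below
is hypothetical, and DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# list_below PATH (hypothetical helper): list only the subvolumes that
# live below PATH, defaulting to the current root directory.
list_below() {
    run btrfs subvolume list -o "${1:-/}"
}
```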
>
> I'm not sure whether this is something that goes in Btrfs proper,
> since this is presumably a privileged container? The same thing
> happens with Docker containers. One way to do this is if it's not
> privileged, as non-root can't list subvolumes. I think some work is
> needed to make it possible for users to list subvolumes they own.
> Right now a user can create a subvolume but then not list or get
> information on it. By default they can't delete it either unless a
> special mount option is used. So I think there's work that's needed
> one way or another, and maybe in more than one part.
Unfortunately, the various uses of 'btrfs subvolume list' are built on top of
the TREE_SEARCH ioctl, which requires CAP_SYS_ADMIN.
So what we need here might be to teach 'btrfs sub list' to recognize
the container's CAP_SYS_XXX (if this is possible).
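To see which side of that check a given shell is on, one can look at
the effective capability mask. A minimal sketch, assuming a Linux /proc
and the usual numbering in which CAP_SYS_ADMIN is capability 21; the
function names are made up, and user namespaces can make the kernel's
actual decision more subtle than this bit test suggests.

```shell
#!/bin/sh
# CAP_SYS_ADMIN is capability number 21 (see linux/capability.h).
CAP_SYS_ADMIN_BIT=21

# mask_has_sys_admin HEXMASK: succeed if bit 21 is set in a hex
# capability mask such as the CapEff field of /proc/<pid>/status.
mask_has_sys_admin() {
    [ $(( (0x$1 >> $CAP_SYS_ADMIN_BIT) & 1 )) -eq 1 ]
}

# current_capeff: print the effective capability mask inherited from
# this shell, or nothing if /proc is not available.
current_capeff() {
    sed -n 's/^CapEff:[[:space:]]*//p' /proc/self/status 2>/dev/null
}

mask=$(current_capeff)
if [ -z "$mask" ]; then
    echo "cannot read CapEff; is /proc mounted?"
elif mask_has_sys_admin "$mask"; then
    echo "CAP_SYS_ADMIN present: the TREE_SEARCH capability check can pass"
else
    echo "CAP_SYS_ADMIN absent: expect 'Operation not permitted'"
fi
```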
Thanks,
-liubo
>
> --
> Chris Murphy
* Re: btrfs and containers
2016-03-08 19:58 ` Liu Bo
@ 2016-03-08 21:28 ` Chris Murphy
2016-03-09 12:15 ` Austin S. Hemmelgarn
0 siblings, 1 reply; 16+ messages in thread
From: Chris Murphy @ 2016-03-08 21:28 UTC (permalink / raw)
To: bo.li.liu; +Cc: Chris Murphy, Tobias Hunger, Btrfs BTRFS
On Tue, Mar 8, 2016 at 12:58 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Mon, Mar 07, 2016 at 04:45:09PM -0700, Chris Murphy wrote:
>> On Mon, Mar 7, 2016 at 3:55 PM, Tobias Hunger <tobias.hunger@gmail.com> wrote:
>> > Hi,
>> >
>> > I have been running systemd-nspawn containers on top of a btrfs
>> > filesystem for a while now.
>> >
>> > This works great: Snapshots are a huge help to manage containers!
>> >
>> > But today I ran btrfs subvol list . *inside* a container. To my
>> > surprise I got a list of *all* subvolumes on that drive. That is
>> > basically a complete list of containers running on the machine. I do
>> > not want to have that kind of information exposed to my containers.
>> >
>> > Is there a way to stop btrfs from listing subvolumes "above" the
>> > current location? So that "btrfs subvol list /" in a container will
>> > only show subvolumes that are set up in the container?
>
> That's a good question.
>
> It looks like "btrfs subvolume list -o" matches the needs here.
>
>>
>> I'm not sure whether this is something that goes in Btrfs proper,
>> since this is presumably a privileged container? The same thing
>> happens with Docker containers. One way to do this is if it's not
>> privileged, as non-root can't list subvolumes. I think some work is
>> needed to make it possible for users to list subvolumes they own.
>> Right now a user can create a subvolume but then not list or get
>> information on it. By default they can't delete it either unless a
>> special mount option is used. So I think there's work that's needed
>> one way or another, and maybe in more than one part.
>
> Unfortunately, the various uses of 'btrfs subvolume list' are built on top of
> the TREE_SEARCH ioctl, which requires CAP_SYS_ADMIN.
>
> So what we need here might be to teach 'btrfs sub list' to recognize
> container's CAP_SYS_XXX (if this is possible).
Yes, it's a bit peculiar that I can create subvolumes and snapshot them,
but can't 'btrfs sub list/show'.
It's an open question why the user needs a subvolume, but I'm not
thinking of a human user necessarily but rather some service, maybe
it's httpd. Or maybe with the xdg-app stuff the Gnome folks are
working on it makes sense to encapsulate applications and their
updates in their own subvolume. *shrug* I'm open to the idea that the
use case needs to be more compelling and detailed in order to get the
implementation right.
--
Chris Murphy
* Re: btrfs and containers
2016-03-08 21:28 ` Chris Murphy
@ 2016-03-09 12:15 ` Austin S. Hemmelgarn
2016-03-10 2:55 ` Duncan
0 siblings, 1 reply; 16+ messages in thread
From: Austin S. Hemmelgarn @ 2016-03-09 12:15 UTC (permalink / raw)
To: Chris Murphy, bo.li.liu; +Cc: Tobias Hunger, Btrfs BTRFS
On 2016-03-08 16:28, Chris Murphy wrote:
> On Tue, Mar 8, 2016 at 12:58 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
>> On Mon, Mar 07, 2016 at 04:45:09PM -0700, Chris Murphy wrote:
>>> On Mon, Mar 7, 2016 at 3:55 PM, Tobias Hunger <tobias.hunger@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I have been running systemd-nspawn containers on top of a btrfs
>>>> filesystem for a while now.
>>>>
>>>> This works great: Snapshots are a huge help to manage containers!
>>>>
>>>> But today I ran btrfs subvol list . *inside* a container. To my
>>>> surprise I got a list of *all* subvolumes on that drive. That is
>>>> basically a complete list of containers running on the machine. I do
>>>> not want to have that kind of information exposed to my containers.
>>>>
>>>> Is there a way to stop btrfs from listing subvolumes "above" the
>>>> current location? So that "btrfs subvol list /" in a container will
>>>> only show subvolumes that are set up in the container?
>>
>> That's a good question.
>>
>> It looks like "btrfs subvolume list -o" matches the needs here.
>>
>>>
>>> I'm not sure whether this is something that goes in Btrfs proper,
>>> since this is presumably a privileged container? The same thing
>>> happens with Docker containers. One way to do this is if it's not
>>> privileged, as non-root can't list subvolumes. I think some work is
>>> needed to make it possible for users to list subvolumes they own.
>>> Right now a user can create a subvolume but then not list or get
>>> information on it. By default they can't delete it either unless a
>>> special mount option is used. So I think there's work that's needed
>>> one way or another, and maybe in more than one part.
>>
>> Unfortunately, the various uses of 'btrfs subvolume list' are built on top of
>> the TREE_SEARCH ioctl, which requires CAP_SYS_ADMIN.
>>
>> So what we need here might be to teach 'btrfs sub list' to recognize
>> container's CAP_SYS_XXX (if this is possible).
>
>
> Yes, it's a bit peculiar I can create subvolumes and snapshot them,
> but can't 'btrfs sub list/show'
>
> It's an open question why the user needs a subvolume, but I'm not
> thinking of a human user necessarily but rather some service, maybe
> it's httpd. Or maybe with the xdg-app stuff the Gnome folks are
> working on it makes sense to encapsulate applications and their
> updates in their own subvolume. *shrug* I'm open to the idea that the
> use case needs to be more compelling and detailed in order to get the
> implementation right.
>
It's probably worth tossing out there that I use them on a regular basis
as a normal user (not root or some service) for:
1. Local copies of VCS repositories.
2. Build directories.
3. Staging areas for a variety of things.
4. Specifically isolating certain parts of my home directory from backups.
1-3 are mostly because of the fact that deleting a subvolume is insanely
fast compared to recursive deletion of a directory, although 4 is
somewhat significant for those as well.
In general I can see them being useful for any number of things from a
service perspective, although I feel that snapshots are likely more
useful there (the ability to atomically save the state of a set of files
is extremely useful for a lot of things).
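The workflow described above can be sketched as follows. Paths and
helper names are hypothetical, and with DRY_RUN=1 the commands are only
printed. As noted earlier in the thread, the delete step only works for
a non-root user when the filesystem is mounted with the option that
permits it.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# scratch_create DIR: make a subvolume to use as a build or staging area.
scratch_create() {
    run btrfs subvolume create "$1"
}

# scratch_freeze SRC DEST: atomically save the current state of SRC as
# a read-only snapshot at DEST.
scratch_freeze() {
    run btrfs subvolume snapshot -r "$1" "$2"
}

# scratch_drop DIR: throw the whole area away in one operation instead
# of a recursive delete.
scratch_drop() {
    run btrfs subvolume delete "$1"
}
```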
* Re: btrfs and containers
2016-03-07 22:55 btrfs and containers Tobias Hunger
2016-03-07 23:45 ` Chris Murphy
2016-03-08 12:12 ` Austin S. Hemmelgarn
@ 2016-03-09 21:10 ` Marc MERLIN
2016-03-09 21:21 ` Chris Murphy
2 siblings, 1 reply; 16+ messages in thread
From: Marc MERLIN @ 2016-03-09 21:10 UTC (permalink / raw)
To: Tobias Hunger; +Cc: linux-btrfs
On Mon, Mar 07, 2016 at 11:55:47PM +0100, Tobias Hunger wrote:
> Hi,
>
> I have been running systemd-nspawn containers on top of a btrfs
> filesystem for a while now.
>
> This works great: Snapshots are a huge help to manage containers!
>
> But today I ran btrfs subvol list . *inside* a container. To my
> surprise I got a list of *all* subvolumes on that drive. That is
> basically a complete list of containers running on the machine. I do
> not want to have that kind of information exposed to my containers.
I have a very stripped down docker image that actually mounts a portion
of my root filesystem read only.
While it's running out of a btrfs filesystem, you can't run btrfs
commands against it:
05233e5c91f0:/# btrfs fi show
05233e5c91f0:/# btrfs subvol list /
ERROR: can't perform the search - Operation not permitted
05233e5c91f0:/# btrfs subvol list .
ERROR: can't perform the search - Operation not permitted
I didn't do anything special, it's just working that way.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: btrfs and containers
2016-03-09 21:10 ` Marc MERLIN
@ 2016-03-09 21:21 ` Chris Murphy
2016-03-09 21:45 ` Marc MERLIN
0 siblings, 1 reply; 16+ messages in thread
From: Chris Murphy @ 2016-03-09 21:21 UTC (permalink / raw)
To: Marc MERLIN; +Cc: Tobias Hunger, Btrfs BTRFS
On Wed, Mar 9, 2016 at 2:10 PM, Marc MERLIN <marc@merlins.org> wrote:
> On Mon, Mar 07, 2016 at 11:55:47PM +0100, Tobias Hunger wrote:
>> Hi,
>>
>> I have been running systemd-nspawn containers on top of a btrfs
>> filesystem for a while now.
>>
>> This works great: Snapshots are a huge help to manage containers!
>>
>> But today I ran btrfs subvol list . *inside* a container. To my
>> surprise I got a list of *all* subvolumes on that drive. That is
>> basically a complete list of containers running on the machine. I do
>> not want to have that kind of information exposed to my containers.
>
> I have a very stripped down docker image that actually mounts a portion
> of my root filesystem read only.
> While it's running out of a btrfs filesystem, you can't run btrfs
> commands against it:
> 05233e5c91f0:/# btrfs fi show
> 05233e5c91f0:/# btrfs subvol list /
> ERROR: can't perform the search - Operation not permitted
> 05233e5c91f0:/# btrfs subvol list .
> ERROR: can't perform the search - Operation not permitted
>
> I didn't do anything special, it's just working that way.
Yep, you're not using --privileged in which case you can't list
things. But I'm not sure what the equivalent is off hand with
systemd-nspawn containers, I think those may always be privileged?
--
Chris Murphy
* Re: btrfs and containers
2016-03-09 21:21 ` Chris Murphy
@ 2016-03-09 21:45 ` Marc MERLIN
2016-03-09 23:28 ` Rich Freeman
0 siblings, 1 reply; 16+ messages in thread
From: Marc MERLIN @ 2016-03-09 21:45 UTC (permalink / raw)
To: Chris Murphy; +Cc: Tobias Hunger, Btrfs BTRFS
On Wed, Mar 09, 2016 at 02:21:26PM -0700, Chris Murphy wrote:
> > I have a very stripped down docker image that actually mounts a portion
> > of my root filesystem read only.
> > While it's running out of a btrfs filesystem, you can't run btrfs
> > commands against it:
> > 05233e5c91f0:/# btrfs fi show
> > 05233e5c91f0:/# btrfs subvol list /
> > ERROR: can't perform the search - Operation not permitted
> > 05233e5c91f0:/# btrfs subvol list .
> > ERROR: can't perform the search - Operation not permitted
> >
> > I didn't do anything special, it's just working that way.
>
> Yep, you're not using --privileged in which case you can't list
> things. But I'm not sure what the equivalent is off hand with
> systemd-nspawn containers, I think those may always be privileged?
Ok, cool. I just used docker out of the box, glad to know it errs on
the secure side by default.
(and I don't have systemd, so that may also help me there)
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: btrfs and containers
2016-03-09 21:45 ` Marc MERLIN
@ 2016-03-09 23:28 ` Rich Freeman
0 siblings, 0 replies; 16+ messages in thread
From: Rich Freeman @ 2016-03-09 23:28 UTC (permalink / raw)
To: Marc MERLIN; +Cc: Chris Murphy, Tobias Hunger, Btrfs BTRFS
On Wed, Mar 9, 2016 at 4:45 PM, Marc MERLIN <marc@merlins.org> wrote:
> On Wed, Mar 09, 2016 at 02:21:26PM -0700, Chris Murphy wrote:
>> > I have a very stripped down docker image that actually mounts a portion
>> > of my root filesystem read only.
>> > While it's running out of a btrfs filesystem, you can't run btrfs
>> > commands against it:
>> > 05233e5c91f0:/# btrfs fi show
>> > 05233e5c91f0:/# btrfs subvol list /
>> > ERROR: can't perform the search - Operation not permitted
>> > 05233e5c91f0:/# btrfs subvol list .
>> > ERROR: can't perform the search - Operation not permitted
>> >
>> > I didn't do anything special, it's just working that way.
>>
>> Yep, you're not using --privileged in which case you can't list
>> things. But I'm not sure what the equivalent is off hand with
>> systemd-nspawn containers, I think those may always be privileged?
>
> Ok, cool. I just used docker out of the box, glad to know it errs on
> the secure side by default.
> (and I don't have systemd, so that may also help me there)
>
I'm sure the default capability lists for systemd-nspawn and docker are
different. I know that you can tune nspawn to give the container
whatever capabilities you want it to have. As a general warning, though,
linux containers are still not quite 100% secure when root is running
inside. Obviously the fewer capabilities you give them the better, but
the level of isolation isn't quite at VM levels.
It is better than chroot levels, however.
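A sketch of that tuning, assuming a container tree at the hypothetical
path /var/lib/machines/web. systemd-nspawn takes --capability= to grant
and --drop-capability= to remove capabilities. Dropping CAP_SYS_ADMIN
should make the subvolume listing fail inside the container much as it
does in the Docker case above, but anything else in the container that
relies on CAP_SYS_ADMIN (mounting, for instance) will stop working too,
so treat this as untested. The helper names are made up, and DRY_RUN=1
prints the command instead of running it.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# nspawn_no_sys_admin DIR CMD... (hypothetical helper): run CMD in the
# container tree at DIR with CAP_SYS_ADMIN dropped.
nspawn_no_sys_admin() {
    dir=$1
    shift
    run systemd-nspawn --directory="$dir" \
        --drop-capability=CAP_SYS_ADMIN "$@"
}
```

For example, "nspawn_no_sys_admin /var/lib/machines/web btrfs subvolume
list /" is a quick way to check whether the listing is still permitted.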
* Re: btrfs and containers
2016-03-09 12:15 ` Austin S. Hemmelgarn
@ 2016-03-10 2:55 ` Duncan
2016-03-10 17:04 ` Austin S. Hemmelgarn
0 siblings, 1 reply; 16+ messages in thread
From: Duncan @ 2016-03-10 2:55 UTC (permalink / raw)
To: linux-btrfs
Austin S. Hemmelgarn posted on Wed, 09 Mar 2016 07:15:36 -0500 as
excerpted:
> On 2016-03-08 16:28, Chris Murphy wrote:
>> Yes, it's a bit peculiar I can create subvolumes and snapshot them, but
>> can't 'btrfs sub list/show'
>>
>> It's an open question why the user needs a subvolume, but I'm not
>> thinking of a human user necessarily but rather some service, maybe
>> it's httpd. Or maybe with the xdg-app stuff the Gnome folks are working
>> on it makes sense to encapsulate applications and their updates in
>> their own subvolume. *shrug* I'm open to the idea that the use case
>> needs to be more compelling and detailed in order to get the
>> implementation right.
>>
> It's probably worth tossing out there that I use them on a regular basis
> as a normal user (not root or some service) for:
> 1. Local copies of VCS repositories.
> 2. Build directories.
> 3. Staging areas for a variety of things.
> 4. Specifically isolating certain parts of my home directory from
> backups.
>
> 1-3 are mostly because of the fact that deleting a subvolume is insanely
> fast compared to recursive deletion of a directory, although 4 is
> somewhat significant for those as well.
For #2 and possibly #3, depending on what's being staged and why, tmpfs
works well, and deleting should be even faster (AFAIK, subvolume deletion
returns immediately but the work continues in the background, so if
you're running other IO-bound jobs they'll still be affected even tho the
subvolume deletion command has returned... if it's all in memory as is
tmpfs, that problem's eliminated too), tho of course you need enough
memory so that tmpfs doesn't trigger swap-thrashing.
But #1 and #4 of course don't work as well on tmpfs as you'll likely want
them around longer, and all four cases definitely make use of the
fact that nested subvolumes wall off snapshotting and thus btrfs send,
for backup purposes. And of course if you're on a limited-memory machine
and thus can't easily use tmpfs for building and other staging, and don't
need to care about the ongoing background IO, using subvolumes for #2 and
3 remains useful, as well.
> In general I can see them being useful for any number of things from a
> service perspective, although I feel that snapshots are likely more
> useful there (the ability to atomically save the state of a set of files
> is extremely useful for a lot of things).
I consider the current situation somewhat of a security (DoS) issue,
since users (or runaway scripts or malware) can create unlimited
subvolumes as an ordinary user, with that user then not being able to
delete them, requiring admin intervention to do so. Of course as long as
it's a single-human-user with an admin-rights alter-ego login, it's not
/that/ much of a security issue, but I could see it being one for human
users who do not have that admin-rights alter-ego login. So were I to be
running in such a situation, I'd probably use the mount option to let the
users delete their own subvolumes, unless of course that opens up other
security issues I'm not aware of.
IMO before btrfs can really be considered stable, this possible DoS needs
resolved by making the list/delete set the exact same as the create set,
either by giving users some way to deal with (only) their own subvolumes
just as they can their own directories, or by reserving subvolume
creation to superuser, because that's what's needed for listing and
deletion. Because if not, I fear someone's going to take advantage of it
in some way, perhaps, as with many DoS vulns, using it to deny critical
resources as a way to simplify some other more critical attack, and it'll
be in the headlines as an attack that worked and a zero-day that still
works.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: btrfs and containers
2016-03-10 2:55 ` Duncan
@ 2016-03-10 17:04 ` Austin S. Hemmelgarn
2016-03-10 19:35 ` Chris Murphy
0 siblings, 1 reply; 16+ messages in thread
From: Austin S. Hemmelgarn @ 2016-03-10 17:04 UTC (permalink / raw)
To: Duncan, linux-btrfs
On 2016-03-09 21:55, Duncan wrote:
> Austin S. Hemmelgarn posted on Wed, 09 Mar 2016 07:15:36 -0500 as
> excerpted:
>
>> On 2016-03-08 16:28, Chris Murphy wrote:
>
>>> Yes, it's a bit peculiar I can create subvolumes and snapshot them, but
>>> can't 'btrfs sub list/show'
>>>
>>> It's an open question why the user needs a subvolume, but I'm not
>>> thinking of a human user necessarily but rather some service, maybe
>>> it's httpd. Or maybe with the xdg-app stuff the Gnome folks are working
>>> on it makes sense to encapsulate applications and their updates in
>>> their own subvolume. *shrug* I'm open to the idea that the use case
>>> needs to be more compelling and detailed in order to get the
>>> implementation right.
>>>
>> It's probably worth tossing out there that I use them on a regular basis
>> as a normal user (not root or some service) for:
>> 1. Local copies of VCS repositories.
>> 2. Build directories.
>> 3. Staging areas for a variety of things.
>> 4. Specifically isolating certain parts of my home directory from
>> backups.
>>
>> 1-3 are mostly because of the fact that deleting a subvolume is insanely
>> fast compared to recursive deletion of a directory, although 4 is
>> somewhat significant for those as well.
>
> For #2 and possibly #3, depending on what's being staged and why, tmpfs
> works well, and deleting should be even faster (AFAIK, subvolume deletion
> returns immediately but the work continues in the background, so if
> you're running other IO-bound jobs they'll still be affected even tho the
> subvolume deletion command has returned... if it's all in memory as is
> tmpfs, that problem's eliminated too), tho of course you need enough
> memory so that tmpfs doesn't trigger swap-thrashing.
Yeah, most of the time I use subvolumes for item 2 or 3 it's either
dealing with stuff that I specifically want persistent across reboots
(for example, the build directory I keep in /usr/src for the kernel, or
staging directories for audio recordings), or things that are big enough
I really want to avoid the memory consumption from working on tmpfs (as
of right now, the only package I have installed on any of my systems
that fits this is LLVM/clang; I used to do this for some other software
like LibreOffice, webkit-gtk, and icedtea as well, though).
>
> But #1 and #4 of course don't work as well on tmpfs as you'll likely want
> them around longer, and all four cases definitely make use of the
> fact that nested subvolumes wall off snapshotting and thus btrfs send,
> for backup purposes. And of course if you're on a limited-memory machine
> and thus can't easily use tmpfs for building and other staging, and don't
> need to care about the ongoing background IO, using subvolumes for #2 and
> 3 remains useful, as well.
>
>> In general I can see them being useful for any number of things from a
>> service perspective, although I feel that snapshots are likely more
>> useful there (the ability to atomically save the state of a set of files
>> is extremely useful for a lot of things).
>
> I consider the current situation somewhat of a security (DoS) issue,
> since users (or runaway scripts or malware) can create unlimited
> subvolumes as an ordinary user, with that user then not being able to
> delete them, requiring admin intervention to do so. Of course as long as
> it's a single-human-user with an admin-rights alter-ego login, it's not
> /that/ much of a security issue, but I could see it being one for human
> users who do not have that admin-rights alter-ego login. So were I to be
> running in such a situation, I'd probably use the mount option to let the
> users delete their own subvolumes, unless of course that opens up other
> security issues I'm not aware of.
>
> IMO before btrfs can really be considered stable, this possible DoS needs
> resolved by making the list/delete set the exact same as the create set,
> either by giving users some way to deal with (only) their own subvolumes
> just as they can their own directories, or by reserving subvolume
> creation to superuser, because that's what's needed for listing and
> deletion. Because if not, I fear someone's going to take advantage of it
> in some way, perhaps, as with many DoS vulns, using it to deny critical
> resources as a way to simplify some other more critical attack, and it'll
> be in the headlines as an attack that worked and a zero-day that still
> works.
The part that makes this tricky is that the list ioctl can be considered
a potential information leak (as evidenced by the issue that started
this thread), so IMHO what really needs to happen is for the mount
option to be 'user_subvolume_ops', and control all three operations (or
better yet, do something with ACL's in the btrfs xattr namespace to
control it on a per-subvolume basis).
* Re: btrfs and containers
2016-03-10 17:04 ` Austin S. Hemmelgarn
@ 2016-03-10 19:35 ` Chris Murphy
2016-03-10 22:34 ` Liu Bo
2016-03-11 2:50 ` Duncan
0 siblings, 2 replies; 16+ messages in thread
From: Chris Murphy @ 2016-03-10 19:35 UTC (permalink / raw)
To: Btrfs BTRFS
On Thu, Mar 10, 2016 at 10:04 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
>
> The part that makes this tricky is that the list ioctl can be considered a
> potential information leak (as evidenced by the issue that started this
> thread), so IMHO what really needs to happen is for the mount option to be
> 'user_subvolume_ops', and control all three operations (or better yet, do
> something with ACL's in the btrfs xattr namespace to control it on a
> per-subvolume basis).
This may also interact with the selinux + Btrfs + Docker issue. The
problem is the desire to use -o context to mount a subvolume with a
specific context for use with a specific container. But right now the
kernel won't allow different contexts for a given fs superblock. The
workaround until recently was disabling Docker selinux support. The
recent workaround in Docker 1.10 is that it snapshots the docker image and
uses chcon -R to relabel it. It's actually pretty fast, but still
suboptimal. Being able to bind mount a subvolume with -o context would be
faster than relabeling; with many containers it's a lot of relabeling
without it.
It's a tricky problem. If you're the owner of a filesystem tree, but
something definitely not owned at all by you is buried in that tree
somewhere, to do a subvolume delete don't you have to now traverse the
entire thing to find out? Or does the owning user have sufficient
implied permission by owning the subvolume, that no matter what's in
it, is simply gone unless it's another subvolume?
--
Chris Murphy
* Re: btrfs and containers
2016-03-10 19:35 ` Chris Murphy
@ 2016-03-10 22:34 ` Liu Bo
2016-03-11 2:50 ` Duncan
1 sibling, 0 replies; 16+ messages in thread
From: Liu Bo @ 2016-03-10 22:34 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On Thu, Mar 10, 2016 at 12:35:31PM -0700, Chris Murphy wrote:
> On Thu, Mar 10, 2016 at 10:04 AM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>
> >
> > The part that makes this tricky is that the list ioctl can be considered a
> > potential information leak (as evidenced by the issue that started this
> > thread), so IMHO what really needs to happen is for the mount option to be
> > 'user_subvolume_ops', and control all three operations (or better yet, do
> > something with ACL's in the btrfs xattr namespace to control it on a
> > per-subvolume basis).
>
> This may also interact with the selinux + Btrfs + Docker issue. The
> problem is the desire to use -o context to mount a subvolume with a
> specific context for use with a specific container. But right now the
> kernel won't allow different contexts for a given fs superblock. The
> work around until recently is disabling Docker selinux support. The
> recent work around in Docker 1.10 is it snapshots the docker image, and
> uses chcon -R to relabel it. It's actually pretty fast, but still
> suboptimal. Being able to bind mount a subvolume with -o context is
> faster than relabeling, with many containers it's a lot of relabeling
> without it.
You're right, supporting mounting a subvolume with -o context="xxx" is the
first choice, and I've made some progress on it[1]. In fact it works well in
docker's scenario, but not for others where we can have an inode leak.
But even with that, we still have to deal with the problem of listing
subvolumes that shouldn't be seen.
[1]:
patch for btrfs:
https://github.com/liubogithub/btrfs-work/commit/00765203698d7e8a795d72488aefc9e19ab70b6e
patches for docker:
https://github.com/liubogithub/docker.git btrfsselinux
>
> It's a tricky problem. If you're the owner of a filesystem tree, but
> something definitely not owned at all by you is buried in that tree
> somewhere, to do a subvolume delete don't you have to now traverse the
> entire thing to find out? Or does the owning user have sufficient
> implied permission by owning the subvolume, that no matter what's in
> it, is simply gone unless it's another subvolume?
It can be ambiguous; I can only come up with ugly hacks.
Thanks,
-liubo
* Re: btrfs and containers
2016-03-10 19:35 ` Chris Murphy
2016-03-10 22:34 ` Liu Bo
@ 2016-03-11 2:50 ` Duncan
1 sibling, 0 replies; 16+ messages in thread
From: Duncan @ 2016-03-11 2:50 UTC (permalink / raw)
To: linux-btrfs
Chris Murphy posted on Thu, 10 Mar 2016 12:35:31 -0700 as excerpted:
> It's a tricky problem. If you're the owner of a filesystem tree, but
> something definitely not owned at all by you is buried in that tree
> somewhere, to do a subvolume delete don't you have to now traverse the
> entire thing to find out? Or does the owning user have sufficient
> implied permission by owning the subvolume, that no matter what's in it,
> is simply gone unless it's another subvolume?
Well, to the extent that subvolumes aren't special, they're supposed to
behave like subdirs. So it seems simple enough to me, just use subdir
permissions semantics and don't worry about it.
Which means you can delete files you don't own located directly in a
directory (so subvol) you have write permissions to, even if you couldn't
write to the files themselves, but if those files are in turn nested in a
subdir you can't write to, then you can't delete the files (even if you
actually own the files and can write to them, meaning you could change
them, but not delete them), and thus can't delete the dir, which means
you can't delete its parent dir... or subvol.
At least, that's the way it /should/ work.
Unless of course subvolumes are documented to be special in that regard,
since subvolumes are allowed to differ in behavior from subdirs if that's
a specific feature of being a subvolume, in which case the documentation
can simply document the way it works, which can be arbitrarily defined as
convenient for the implementation if desired, and be done with it.
Of course, if subvolumes were /not/ declared to be special in that
regard, and thus were supposed to fit the subdir model, actually
implementing that would involve crawling the subdir tree checking the
ownership and permissions of all nested subdirs and subvols, as many
layers deep as it goes, and that would need to be done in the foreground,
before the deletion could return, which would tend to slow down the
subvolume delete implementation toward that of subdir delete. So if
speedy subvolume delete is to be retained, then it seems subvols need to
be defined to have special subvol behavior in this regard, that does NOT
follow subdir behavior, with that special behavior being defined as: if
you own the subvolume, you can delete it (and its children), even if you
lack permission to delete files in the subdirs beneath it and thus
couldn't delete it were it a regular subdir.
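To make the subdir-style check above concrete, here is a minimal Python
sketch of the tree crawl it would require. It is a simplified model, not
btrfs code: Node, has_wx, can_empty and can_delete are hypothetical names,
only the owner and "other" permission bits are considered (no groups, ACLs,
sticky bit or root override), and the read permission needed to list a
directory is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a file, subdir or subvolume."""
    name: str
    owner: int                      # numeric uid of the owner
    mode: int                       # permission bits, e.g. 0o755
    children: list = field(default_factory=list)
    is_dir: bool = True


def has_wx(node, uid):
    """True if uid has both write and search (execute) permission on node."""
    bits = (node.mode >> 6) if uid == node.owner else node.mode
    return bits & 0o3 == 0o3


def can_empty(directory, uid):
    """True if uid could remove everything below directory, recursively."""
    if not directory.children:
        return True                 # nothing to remove inside
    if not has_wx(directory, uid):
        return False                # cannot unlink entries from this dir
    # Files only need w+x on their parent; subdirs must be emptied first.
    return all((not c.is_dir) or can_empty(c, uid) for c in directory.children)


def can_delete(parent, target, uid):
    """True if uid could delete target (and all it contains) from parent."""
    if not has_wx(parent, uid):
        return False
    return (not target.is_dir) or can_empty(target, uid)
```

The cost is the point: can_delete has to visit every nested directory
before it can answer, which is exactly the foreground crawl that would slow
subvolume delete toward the speed of a recursive subdir delete. The
"special" alternative would reduce to a single ownership check on the
subvolume itself.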
But that brings up the opposite question, or otherwise put, the same
question in the other direction, as well. Currently, users can create
subvols but not (ordinarily) delete them. Can they create those subvols
in directories they don't have write permissions in, and thus couldn't
create subdirs or files in? If so, that seems to be another security
issue waiting to be exploited. If not, then even if they have ownership
of the subvol itself, they shouldn't be able to delete it, if it's in a
subdir they don't have write permissions in and thus couldn't create it
in.
Every time this sort of subvolumes permissions issue comes up, I get a
(metaphorical) headache trying to sort out the permissions and thus
security implications, and end up glad I'm simply not dealing with them
here. Though I can't do anything about the security implications if normal
users can create them, even if my policy is not to do so. Which does
have me somewhat worried, because that sort of problem is notorious for
its unforeseen security implications, and right now, because any user
(presumably with write permissions on any subdir on a btrfs) can create
subvolumes, that means anyone running btrfs is exposed to those
convoluted and likely unforeseen security issues.
Which is definitely another reason not to consider btrfs fully
stabilized, until that sort of thing gets sorted. Personally, I'd say
just require superuser privs (and/or appropriate filecaps and/or SEL
security labels) to create them as well, and avoid the whole problem.
Yes, it'll be limiting, but it's a limit that will avoid the entire
Pandora's box of permissions and security implications.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: btrfs and containers
@ 2016-03-11 3:55 Tomasz Chmielewski
0 siblings, 0 replies; 16+ messages in thread
From: Tomasz Chmielewski @ 2016-03-11 3:55 UTC (permalink / raw)
To: linux-btrfs; +Cc: tobias.hunger
> I have been running systemd-nspawn containers on top of a btrfs
> filesystem for a while now.
>
> This works great: Snapshots are a huge help to manage containers!
>
> But today I ran btrfs subvol list . *inside* a container. To my
> surprise I got a list of *all* subvolumes on that drive. That is
> basically a complete list of containers running on the machine. I do
> not want to have that kind of information exposed to my containers.
You seem to be running a privileged container, i.e. the container's root
is the same UID as the host's root. This is typically undesired and means
that your containers have full access to data on the host and in other
containers.
For the record, with a privileged container you can not only list the
subvolumes, but also read raw disk data (e.g. dd if=/dev/sda) or even
destroy that data (dd if=/dev/zero of=/dev/sda).
So, think twice if the container setup you have is what you want!
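One way to check from inside a container whether its root is the host's
root is to look at /proc/self/uid_map. The following is a minimal Python
sketch, assuming a Linux system and a single level of user namespaces (with
nested namespaces the map only describes the parent namespace, not
necessarily the host). The function name root_maps_to_host_root is
hypothetical.

```python
def root_maps_to_host_root(uid_map_text):
    """Return True if uid 0 inside the namespace maps to uid 0 outside.

    uid_map_text is the content of /proc/<pid>/uid_map, where each line
    holds three integers: first uid inside, first uid outside, range length.
    """
    for line in uid_map_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue                       # skip blank or malformed lines
        inside, outside, length = (int(f) for f in fields)
        if inside <= 0 < inside + length:  # this range covers uid 0
            return outside - inside == 0   # uid 0 lands on outside uid 0
    return False                           # uid 0 is not mapped at all


if __name__ == "__main__":
    with open("/proc/self/uid_map") as f:
        print(root_maps_to_host_root(f.read()))
```

A process without a separate user namespace sees the identity map
"0 0 4294967295", so the function returns True there. An unprivileged
container typically shows something like "0 100000 65536" (the exact
numbers depend on the setup), for which it returns False.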
LXD makes it particularly easy to run unprivileged containers:
https://linuxcontainers.org/ (it starts containers as unprivileged by
default, and has lots of goodies in general).
Tomasz Chmielewski
http://wpkg.org
Thread overview: 16+ messages
2016-03-07 22:55 btrfs and containers Tobias Hunger
2016-03-07 23:45 ` Chris Murphy
2016-03-08 19:58 ` Liu Bo
2016-03-08 21:28 ` Chris Murphy
2016-03-09 12:15 ` Austin S. Hemmelgarn
2016-03-10 2:55 ` Duncan
2016-03-10 17:04 ` Austin S. Hemmelgarn
2016-03-10 19:35 ` Chris Murphy
2016-03-10 22:34 ` Liu Bo
2016-03-11 2:50 ` Duncan
2016-03-08 12:12 ` Austin S. Hemmelgarn
2016-03-09 21:10 ` Marc MERLIN
2016-03-09 21:21 ` Chris Murphy
2016-03-09 21:45 ` Marc MERLIN
2016-03-09 23:28 ` Rich Freeman
-- strict thread matches above, loose matches on Subject: below --
2016-03-11 3:55 Tomasz Chmielewski