linux-btrfs.vger.kernel.org archive mirror
* btrfs and containers
@ 2016-03-07 22:55 Tobias Hunger
  2016-03-07 23:45 ` Chris Murphy
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Tobias Hunger @ 2016-03-07 22:55 UTC
  To: linux-btrfs

Hi,

I have been running systemd-nspawn containers on top of a btrfs
filesystem for a while now.

This works great: Snapshots are a huge help to manage containers!

But today I ran btrfs subvol list . *inside* a container. To my
surprise I got a list of *all* subvolumes on that drive. That is
basically a complete list of containers running on the machine. I do
not want to have that kind of information exposed to my containers.

Is there a way to stop btrfs from listing subvolumes "above" the
current location? So that "btrfs subvol list /" in a container will
only show subvolumes that are set up in the container?

Best Regards,
Tobias

* Re: btrfs and containers
  2016-03-07 22:55 btrfs and containers Tobias Hunger
@ 2016-03-07 23:45 ` Chris Murphy
  2016-03-08 19:58   ` Liu Bo
  2016-03-08 12:12 ` Austin S. Hemmelgarn
  2016-03-09 21:10 ` Marc MERLIN
  2 siblings, 1 reply; 16+ messages in thread
From: Chris Murphy @ 2016-03-07 23:45 UTC
  To: Tobias Hunger; +Cc: Btrfs BTRFS

On Mon, Mar 7, 2016 at 3:55 PM, Tobias Hunger <tobias.hunger@gmail.com> wrote:
> Hi,
>
> I have been running systemd-nspawn containers on top of a btrfs
> filesystem for a while now.
>
> This works great: Snapshots are a huge help to manage containers!
>
> But today I ran btrfs subvol list . *inside* a container. To my
> surprise I got a list of *all* subvolumes on that drive. That is
> basically a complete list of containers running on the machine. I do
> not want to have that kind of information exposed to my containers.
>
> Is there a way to stop btrfs from listing subvolumes "above" the
> current location? So that "btrfs subvol list /" in a container will
> only show subvolumes that are set up in the container?

I'm not sure whether this is something that goes in Btrfs proper,
since this is presumably a privileged container? The same thing
happens with Docker containers. One way around this is an unprivileged
container, as non-root can't list subvolumes. I think some work is
needed to make it possible for users to list subvolumes they own.
Right now a user can create a subvolume but then not list or get
information on it. By default they can't delete it either unless a
special mount option is used. So I think there's work that's needed
one way or another, and maybe in more than one part.

-- 
Chris Murphy

* Re: btrfs and containers
  2016-03-07 22:55 btrfs and containers Tobias Hunger
  2016-03-07 23:45 ` Chris Murphy
@ 2016-03-08 12:12 ` Austin S. Hemmelgarn
  2016-03-09 21:10 ` Marc MERLIN
  2 siblings, 0 replies; 16+ messages in thread
From: Austin S. Hemmelgarn @ 2016-03-08 12:12 UTC
  To: Tobias Hunger, linux-btrfs

On 2016-03-07 17:55, Tobias Hunger wrote:
> Hi,
>
> I have been running systemd-nspawn containers on top of a btrfs
> filesystem for a while now.
>
> This works great: Snapshots are a huge help to manage containers!
>
> But today I ran btrfs subvol list . *inside* a container. To my
> surprise I got a list of *all* subvolumes on that drive. That is
> basically a complete list of containers running on the machine. I do
> not want to have that kind of information exposed to my containers.
>
> Is there a way to stop btrfs from listing subvolumes "above" the
> current location? So that "btrfs subvol list /" in a container will
> only show subvolumes that are set up in the container?
>
There is not currently a way to do this.  My personal recommendation
until there is would be to use LVM or something similar and have each
container on its own FS (this has other advantages too, like being able
to use seed devices to quickly spin up containers in a known state).

Ideally though, we should be checking the current root directory when in
a mount namespace, and not listing subvolumes outside that tree.
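A minimal user-space sketch of that filtering, in Python, assuming subvolume paths are given relative to the top-level subvolume (as in the path column of 'btrfs subvolume list') and that the container's root is known in the same form. The helper name visible_subvolumes is hypothetical; a real fix would have to live in the kernel or btrfs-progs, since a privileged process in the container could otherwise call the ioctl directly.

```python
import posixpath


def visible_subvolumes(subvol_paths, container_root):
    """Return the subvolumes at or below container_root, re-rooted at '/'.

    Hypothetical helper: both arguments are assumed to be paths relative
    to the filesystem's top-level subvolume.
    """
    root = posixpath.normpath("/" + container_root.lstrip("/"))
    prefix = root.rstrip("/") + "/"  # trailing slash: 'a' must not match 'ab'
    visible = []
    for path in subvol_paths:
        full = posixpath.normpath("/" + path.lstrip("/"))
        if full == root:
            visible.append("/")  # the container's own root subvolume
        elif full.startswith(prefix):
            visible.append("/" + full[len(prefix):])  # nested inside the container
        # anything else lives outside the container's tree and stays hidden
    return visible
```

With subvolumes machines/a, machines/a/var/lib/x, machines/b and machines/ab and a container rooted at machines/a, only / and /var/lib/x remain visible.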


* Re: btrfs and containers
  2016-03-07 23:45 ` Chris Murphy
@ 2016-03-08 19:58   ` Liu Bo
  2016-03-08 21:28     ` Chris Murphy
  0 siblings, 1 reply; 16+ messages in thread
From: Liu Bo @ 2016-03-08 19:58 UTC
  To: Chris Murphy; +Cc: Tobias Hunger, Btrfs BTRFS

On Mon, Mar 07, 2016 at 04:45:09PM -0700, Chris Murphy wrote:
> On Mon, Mar 7, 2016 at 3:55 PM, Tobias Hunger <tobias.hunger@gmail.com> wrote:
> > Hi,
> >
> > I have been running systemd-nspawn containers on top of a btrfs
> > filesystem for a while now.
> >
> > This works great: Snapshots are a huge help to manage containers!
> >
> > But today I ran btrfs subvol list . *inside* a container. To my
> > surprise I got a list of *all* subvolumes on that drive. That is
> > basically a complete list of containers running on the machine. I do
> > not want to have that kind of information exposed to my containers.
> >
> > Is there a way to stop btrfs from listing subvolumes "above" the
> > current location? So that "btrfs subvol list /" in a container will
> > only show subvolumes that are set up in the container?

That's a good question.

It looks like "btrfs subvolume list -o" matches the needs here.

> 
> I'm not sure whether this is something that goes in Btrfs proper,
> since this is presumably a privileged container? The same thing
> happens with Docker containers. One way around this is an unprivileged
> container, as non-root can't list subvolumes. I think some work is
> needed to make it possible for users to list subvolumes they own.
> Right now a user can create a subvolume but then not list or get
> information on it. By default they can't delete it either unless a
> special mount option is used. So I think there's work that's needed
> one way or another, and maybe in more than one part.

Unfortunately, every mode of btrfs subvolume list is built on top of the TREE_SEARCH ioctl,
which requires CAP_SYS_ADMIN.

So what we need here might be to teach 'btrfs sub list' to recognize
a container's CAP_SYS_XXX (if that is possible).
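Whether a given container even has CAP_SYS_ADMIN can be checked from inside it. A small Python sketch, assuming the Linux /proc/<pid>/status format, that reports whether CAP_SYS_ADMIN (bit 21 in linux/capability.h) is in the effective set; the helper name is hypothetical. This mask is relative to the process's own user namespace, while the kernel's capable() check is made against the initial user namespace, so a set bit inside a user-namespaced container does not guarantee the search ioctl will succeed.

```python
CAP_SYS_ADMIN = 21  # bit number from linux/capability.h


def has_capability(status_text, cap=CAP_SYS_ADMIN):
    """Return True if the CapEff mask in /proc/<pid>/status text has bit `cap` set."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)  # the mask is printed in hexadecimal
            return bool((mask >> cap) & 1)
    raise ValueError("no CapEff line in status text")
```

Typical use would be has_capability(open("/proc/self/status").read()) from a shell inside the container.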

Thanks,

-liubo

> 
> -- 
> Chris Murphy

* Re: btrfs and containers
  2016-03-08 19:58   ` Liu Bo
@ 2016-03-08 21:28     ` Chris Murphy
  2016-03-09 12:15       ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Murphy @ 2016-03-08 21:28 UTC
  To: bo.li.liu; +Cc: Chris Murphy, Tobias Hunger, Btrfs BTRFS

On Tue, Mar 8, 2016 at 12:58 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Mon, Mar 07, 2016 at 04:45:09PM -0700, Chris Murphy wrote:
>> On Mon, Mar 7, 2016 at 3:55 PM, Tobias Hunger <tobias.hunger@gmail.com> wrote:
>> > Hi,
>> >
>> > I have been running systemd-nspawn containers on top of a btrfs
>> > filesystem for a while now.
>> >
>> > This works great: Snapshots are a huge help to manage containers!
>> >
>> > But today I ran btrfs subvol list . *inside* a container. To my
>> > surprise I got a list of *all* subvolumes on that drive. That is
>> > basically a complete list of containers running on the machine. I do
>> > not want to have that kind of information exposed to my containers.
>> >
>> > Is there a way to stop btrfs from listing subvolumes "above" the
>> > current location? So that "btrfs subvol list /" in a container will
>> > only show subvolumes that are set up in the container?
>
> That's a good question.
>
> It looks like "btrfs subvolume list -o" matches the needs here.
>
>>
>> I'm not sure whether this is something that goes in Btrfs proper,
>> since this is presumably a privileged container? The same thing
>> happens with Docker containers. One way around this is an unprivileged
>> container, as non-root can't list subvolumes. I think some work is
>> needed to make it possible for users to list subvolumes they own.
>> Right now a user can create a subvolume but then not list or get
>> information on it. By default they can't delete it either unless a
>> special mount option is used. So I think there's work that's needed
>> one way or another, and maybe in more than one part.
>
> Unfortunately, every mode of btrfs subvolume list is built on top of the TREE_SEARCH ioctl,
> which requires CAP_SYS_ADMIN.
>
> So what we need here might be to teach 'btrfs sub list' to recognize
> a container's CAP_SYS_XXX (if that is possible).


Yes, it's a bit peculiar that I can create subvolumes and snapshot them,
but can't 'btrfs sub list/show' them.

It's an open question why the user needs a subvolume, but I'm not
thinking of a human user necessarily but rather some service, maybe
it's httpd. Or maybe with the xdg-app stuff the Gnome folks are
working on it makes sense to encapsulate applications and their
updates in their own subvolume. *shrug*  I'm open to the idea that the
use case needs to be more compelling and detailed in order to get the
implementation right.


-- 
Chris Murphy

* Re: btrfs and containers
  2016-03-08 21:28     ` Chris Murphy
@ 2016-03-09 12:15       ` Austin S. Hemmelgarn
  2016-03-10  2:55         ` Duncan
  0 siblings, 1 reply; 16+ messages in thread
From: Austin S. Hemmelgarn @ 2016-03-09 12:15 UTC
  To: Chris Murphy, bo.li.liu; +Cc: Tobias Hunger, Btrfs BTRFS

On 2016-03-08 16:28, Chris Murphy wrote:
> On Tue, Mar 8, 2016 at 12:58 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
>> On Mon, Mar 07, 2016 at 04:45:09PM -0700, Chris Murphy wrote:
>>> On Mon, Mar 7, 2016 at 3:55 PM, Tobias Hunger <tobias.hunger@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I have been running systemd-nspawn containers on top of a btrfs
>>>> filesystem for a while now.
>>>>
>>>> This works great: Snapshots are a huge help to manage containers!
>>>>
>>>> But today I ran btrfs subvol list . *inside* a container. To my
>>>> surprise I got a list of *all* subvolumes on that drive. That is
>>>> basically a complete list of containers running on the machine. I do
>>>> not want to have that kind of information exposed to my containers.
>>>>
>>>> Is there a way to stop btrfs from listing subvolumes "above" the
>>>> current location? So that "btrfs subvol list /" in a container will
>>>> only show subvolumes that are set up in the container?
>>
>> That's a good question.
>>
>> It looks like "btrfs subvolume list -o" matches the needs here.
>>
>>>
>>> I'm not sure whether this is something that goes in Btrfs proper,
>>> since this is presumably a privileged container? The same thing
>>> happens with Docker containers. One way around this is an unprivileged
>>> container, as non-root can't list subvolumes. I think some work is
>>> needed to make it possible for users to list subvolumes they own.
>>> Right now a user can create a subvolume but then not list or get
>>> information on it. By default they can't delete it either unless a
>>> special mount option is used. So I think there's work that's needed
>>> one way or another, and maybe in more than one part.
>>
>> Unfortunately, every mode of btrfs subvolume list is built on top of the TREE_SEARCH ioctl,
>> which requires CAP_SYS_ADMIN.
>>
>> So what we need here might be to teach 'btrfs sub list' to recognize
>> a container's CAP_SYS_XXX (if that is possible).
>
>
> Yes, it's a bit peculiar that I can create subvolumes and snapshot them,
> but can't 'btrfs sub list/show' them.
>
> It's an open question why the user needs a subvolume, but I'm not
> thinking of a human user necessarily but rather some service, maybe
> it's httpd. Or maybe with the xdg-app stuff the Gnome folks are
> working on it makes sense to encapsulate applications and their
> updates in their own subvolume. *shrug*  I'm open to the idea that the
> use case needs to be more compelling and detailed in order to get the
> implementation right.
>
It's probably worth tossing out there that I use them on a regular basis 
as a normal user (not root or some service) for:
1. Local copies of VCS repositories.
2. Build directories.
3. Staging areas for a variety of things.
4. Specifically isolating certain parts of my home directory from backups.

1-3 are mostly because of the fact that deleting a subvolume is insanely 
fast compared to recursive deletion of a directory, although 4 is 
somewhat significant for those as well.
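The speed difference comes from 'btrfs subvolume delete' dropping the whole subvolume in one operation (with the actual cleanup continuing in the background), whereas a recursive delete must unlink every entry. A hedged Python sketch that picks between the two; it relies on the btrfs convention that a subvolume's top directory has inode number 256 and takes on_btrfs as a caller-supplied assumption. Both helper names are hypothetical, and an unprivileged caller will generally also need the user_subvol_rm_allowed mount option for the btrfs path to succeed.

```python
import os
import shutil
import subprocess

BTRFS_FIRST_FREE_OBJECTID = 256  # inode number of every btrfs subvolume root


def delete_command(path, st_ino, on_btrfs):
    """Return the argv that deletes `path` as a subvolume, or None for a plain directory."""
    if on_btrfs and st_ino == BTRFS_FIRST_FREE_OBJECTID:
        return ["btrfs", "subvolume", "delete", path]
    return None


def remove_tree(path, on_btrfs):
    """Delete `path`, using a subvolume delete when it is a subvolume root."""
    cmd = delete_command(path, os.lstat(path).st_ino, on_btrfs)
    if cmd is None:
        shutil.rmtree(path)  # visits and unlinks every entry
    else:
        subprocess.run(cmd, check=True)  # returns before the background cleanup finishes
```

The inode-256 test is only meaningful on btrfs, which is why on_btrfs is required rather than guessed.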

In general I can see them being useful for any number of things from a 
service perspective, although I feel that snapshots are likely more 
useful there (the ability to atomically save the state of a set of files 
is extremely useful for a lot of things).


* Re: btrfs and containers
  2016-03-07 22:55 btrfs and containers Tobias Hunger
  2016-03-07 23:45 ` Chris Murphy
  2016-03-08 12:12 ` Austin S. Hemmelgarn
@ 2016-03-09 21:10 ` Marc MERLIN
  2016-03-09 21:21   ` Chris Murphy
  2 siblings, 1 reply; 16+ messages in thread
From: Marc MERLIN @ 2016-03-09 21:10 UTC
  To: Tobias Hunger; +Cc: linux-btrfs

On Mon, Mar 07, 2016 at 11:55:47PM +0100, Tobias Hunger wrote:
> Hi,
> 
> I have been running systemd-nspawn containers on top of a btrfs
> filesystem for a while now.
> 
> This works great: Snapshots are a huge help to manage containers!
> 
> But today I ran btrfs subvol list . *inside* a container. To my
> surprise I got a list of *all* subvolumes on that drive. That is
> basically a complete list of containers running on the machine. I do
> not want to have that kind of information exposed to my containers.

I have a very stripped down docker image that actually mounts portions of
my root filesystem read-only.
While it's running out of a btrfs filesystem, you can't run btrfs
commands against it:
05233e5c91f0:/# btrfs fi show
05233e5c91f0:/# btrfs subvol list /
ERROR: can't perform the search - Operation not permitted
05233e5c91f0:/# btrfs subvol list .
ERROR: can't perform the search - Operation not permitted

I didn't do anything special, it's just working that way.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

* Re: btrfs and containers
  2016-03-09 21:10 ` Marc MERLIN
@ 2016-03-09 21:21   ` Chris Murphy
  2016-03-09 21:45     ` Marc MERLIN
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Murphy @ 2016-03-09 21:21 UTC
  To: Marc MERLIN; +Cc: Tobias Hunger, Btrfs BTRFS

On Wed, Mar 9, 2016 at 2:10 PM, Marc MERLIN <marc@merlins.org> wrote:
> On Mon, Mar 07, 2016 at 11:55:47PM +0100, Tobias Hunger wrote:
>> Hi,
>>
>> I have been running systemd-nspawn containers on top of a btrfs
>> filesystem for a while now.
>>
>> This works great: Snapshots are a huge help to manage containers!
>>
>> But today I ran btrfs subvol list . *inside* a container. To my
>> surprise I got a list of *all* subvolumes on that drive. That is
>> basically a complete list of containers running on the machine. I do
>> not want to have that kind of information exposed to my containers.
>
> I have a very stripped down docker image that actually mounts portions of
> my root filesystem read-only.
> While it's running out of a btrfs filesystem, you can't run btrfs
> commands against it:
> 05233e5c91f0:/# btrfs fi show
> 05233e5c91f0:/# btrfs subvol list /
> ERROR: can't perform the search - Operation not permitted
> 05233e5c91f0:/# btrfs subvol list .
> ERROR: can't perform the search - Operation not permitted
>
> I didn't do anything special, it's just working that way.

Yep, you're not using --privileged, in which case you can't list
things. But I'm not sure what the equivalent is offhand with
systemd-nspawn containers; I think those may always be privileged?


-- 
Chris Murphy

* Re: btrfs and containers
  2016-03-09 21:21   ` Chris Murphy
@ 2016-03-09 21:45     ` Marc MERLIN
  2016-03-09 23:28       ` Rich Freeman
  0 siblings, 1 reply; 16+ messages in thread
From: Marc MERLIN @ 2016-03-09 21:45 UTC
  To: Chris Murphy; +Cc: Tobias Hunger, Btrfs BTRFS

On Wed, Mar 09, 2016 at 02:21:26PM -0700, Chris Murphy wrote:
> > I have a very stripped down docker image that actually mounts portions of
> > my root filesystem read-only.
> > While it's running out of a btrfs filesystem, you can't run btrfs
> > commands against it:
> > 05233e5c91f0:/# btrfs fi show
> > 05233e5c91f0:/# btrfs subvol list /
> > ERROR: can't perform the search - Operation not permitted
> > 05233e5c91f0:/# btrfs subvol list .
> > ERROR: can't perform the search - Operation not permitted
> >
> > I didn't do anything special, it's just working that way.
> 
> Yep, you're not using --privileged, in which case you can't list
> things. But I'm not sure what the equivalent is offhand with
> systemd-nspawn containers; I think those may always be privileged?

Ok, cool. I just used docker out of the box, glad to know it errs on
the secure side by default.
(and I don't have systemd, so that may also help me there)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

* Re: btrfs and containers
  2016-03-09 21:45     ` Marc MERLIN
@ 2016-03-09 23:28       ` Rich Freeman
  0 siblings, 0 replies; 16+ messages in thread
From: Rich Freeman @ 2016-03-09 23:28 UTC
  To: Marc MERLIN; +Cc: Chris Murphy, Tobias Hunger, Btrfs BTRFS

On Wed, Mar 9, 2016 at 4:45 PM, Marc MERLIN <marc@merlins.org> wrote:
> On Wed, Mar 09, 2016 at 02:21:26PM -0700, Chris Murphy wrote:
>> > I have a very stripped down docker image that actually mounts portions of
>> > my root filesystem read-only.
>> > While it's running out of a btrfs filesystem, you can't run btrfs
>> > commands against it:
>> > 05233e5c91f0:/# btrfs fi show
>> > 05233e5c91f0:/# btrfs subvol list /
>> > ERROR: can't perform the search - Operation not permitted
>> > 05233e5c91f0:/# btrfs subvol list .
>> > ERROR: can't perform the search - Operation not permitted
>> >
>> > I didn't do anything special, it's just working that way.
>>
>> Yep, you're not using --privileged, in which case you can't list
>> things. But I'm not sure what the equivalent is offhand with
>> systemd-nspawn containers; I think those may always be privileged?
>
> Ok, cool. I just used docker out of the box, glad to know it errs on
> the secure side by default.
> (and I don't have systemd, so that may also help me there)
>

I'm sure the default capability list for systemd-nspawn and docker is
different.  I know that you can tune nspawn to give the container
whatever capabilities you want it to.  As a general warning, though,
Linux containers are still not quite 100% secure when root is running
inside.  Obviously the fewer capabilities you give them the better, but
the level of isolation isn't quite at VM levels.  It is better than
chroot levels, however.

* Re: btrfs and containers
  2016-03-09 12:15       ` Austin S. Hemmelgarn
@ 2016-03-10  2:55         ` Duncan
  2016-03-10 17:04           ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 16+ messages in thread
From: Duncan @ 2016-03-10  2:55 UTC
  To: linux-btrfs

Austin S. Hemmelgarn posted on Wed, 09 Mar 2016 07:15:36 -0500 as
excerpted:

> On 2016-03-08 16:28, Chris Murphy wrote:

>> Yes, it's a bit peculiar that I can create subvolumes and snapshot them, but
>> can't 'btrfs sub list/show' them.
>>
>> It's an open question why the user needs a subvolume, but I'm not
>> thinking of a human user necessarily but rather some service, maybe
>> it's httpd. Or maybe with the xdg-app stuff the Gnome folks are working
>> on it makes sense to encapsulate applications and their updates in
>> their own subvolume. *shrug*  I'm open to the idea that the use case
>> needs to be more compelling and detailed in order to get the
>> implementation right.
>>
> It's probably worth tossing out there that I use them on a regular basis
> as a normal user (not root or some service) for:
> 1. Local copies of VCS repositories.
> 2. Build directories.
> 3. Staging areas for a variety of things.
> 4. Specifically isolating certain parts of my home directory from
> backups.
> 
> 1-3 are mostly because of the fact that deleting a subvolume is insanely
> fast compared to recursive deletion of a directory, although 4 is
> somewhat significant for those as well.

For #2 and possibly #3, depending on what's being staged and why, tmpfs 
works well, and deleting should be even faster (AFAIK, subvolume deletion 
returns immediately but the work continues in the background, so if 
you're running other IO-bound jobs they'll still be affected even tho the 
subvolume deletion command has returned... if it's all in memory as is 
tmpfs, that problem's eliminated too), tho of course you need enough 
memory so that tmpfs doesn't trigger swap-thrashing.

But #1 and #4 of course don't work as well on tmpfs as you'll likely want 
them around longer, and all four cases definitely make use of the 
fact that nested subvolumes wall off snapshotting and thus btrfs send, 
for backup purposes.  And of course if you're on a limited-memory machine 
and thus can't easily use tmpfs for building and other staging, and don't 
need to care about the ongoing background IO, using subvolumes for #2 and 
3 remains useful, as well.

> In general I can see them being useful for any number of things from a
> service perspective, although I feel that snapshots are likely more
> useful there (the ability to atomically save the state of a set of files
> is extremely useful for a lot of things).

I consider the current situation somewhat of a security (DoS) issue, 
since users (or runaway scripts or malware) can create unlimited 
subvolumes as an ordinary user, with that user then not being able to 
delete them, requiring admin intervention to do so.  Of course as long as 
it's a single-human-user with an admin-rights alter-ego login, it's not 
/that/ much of a security issue, but I could see it being one for human 
users who do not have that admin-rights alter-ego login.  So were I to be 
running in such a situation, I'd probably use the mount option to let the 
users delete their own subvolumes, unless of course that opens up other 
security issues I'm not aware of.
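The mount option in question is, as far as I know, user_subvol_rm_allowed. A minimal Python sketch for checking whether it is active on a given btrfs mount, assuming the /proc/self/mounts line format (device, mount point, type, options, ...); the helper name is hypothetical, and octal escapes such as \040 in mount points are not decoded here.

```python
def has_mount_option(mounts_text, mountpoint, option="user_subvol_rm_allowed"):
    """Return True if the btrfs mount at `mountpoint` lists `option`.

    mounts_text is assumed to be in /proc/self/mounts format; if the mount
    point appears more than once, the last (topmost) entry wins.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint and fields[2] == "btrfs":
            found = fields[3].split(",")  # comma-separated option list
    return found is not None and option in found
```

For example, has_mount_option(open("/proc/self/mounts").read(), "/home") tells whether ordinary users may delete their own subvolumes under /home.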

IMO before btrfs can really be considered stable, this possible DoS needs to be
resolved by making the list/delete set the exact same as the create set,
either by giving users some way to deal with (only) their own subvolumes 
just as they can their own directories, or by reserving subvolume 
creation to superuser, because that's what's needed for listing and 
deletion.  Because if not, I fear someone's going to take advantage of it 
in some way, perhaps, as with many DoS vulns, using it to deny critical 
resources as a way to simplify some other more critical attack, and it'll 
be in the headlines as an attack that worked and a zero-day that still 
works.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: btrfs and containers
  2016-03-10  2:55         ` Duncan
@ 2016-03-10 17:04           ` Austin S. Hemmelgarn
  2016-03-10 19:35             ` Chris Murphy
  0 siblings, 1 reply; 16+ messages in thread
From: Austin S. Hemmelgarn @ 2016-03-10 17:04 UTC
  To: Duncan, linux-btrfs

On 2016-03-09 21:55, Duncan wrote:
> Austin S. Hemmelgarn posted on Wed, 09 Mar 2016 07:15:36 -0500 as
> excerpted:
>
>> On 2016-03-08 16:28, Chris Murphy wrote:
>
>>> Yes, it's a bit peculiar that I can create subvolumes and snapshot them, but
>>> can't 'btrfs sub list/show' them.
>>>
>>> It's an open question why the user needs a subvolume, but I'm not
>>> thinking of a human user necessarily but rather some service, maybe
>>> it's httpd. Or maybe with the xdg-app stuff the Gnome folks are working
>>> on it makes sense to encapsulate applications and their updates in
>>> their own subvolume. *shrug*  I'm open to the idea that the use case
>>> needs to be more compelling and detailed in order to get the
>>> implementation right.
>>>
>> It's probably worth tossing out there that I use them on a regular basis
>> as a normal user (not root or some service) for:
>> 1. Local copies of VCS repositories.
>> 2. Build directories.
>> 3. Staging areas for a variety of things.
>> 4. Specifically isolating certain parts of my home directory from
>> backups.
>>
>> 1-3 are mostly because of the fact that deleting a subvolume is insanely
>> fast compared to recursive deletion of a directory, although 4 is
>> somewhat significant for those as well.
>
> For #2 and possibly #3, depending on what's being staged and why, tmpfs
> works well, and deleting should be even faster (AFAIK, subvolume deletion
> returns immediately but the work continues in the background, so if
> you're running other IO-bound jobs they'll still be affected even tho the
> subvolume deletion command has returned... if it's all in memory as is
> tmpfs, that problem's eliminated too), tho of course you need enough
> memory so that tmpfs doesn't trigger swap-thrashing.
Yeah, most of the time I use subvolumes for item 2 or 3, it's either
dealing with stuff that I specifically want persistent across reboots
(for example, the build directory I keep in /usr/src for the kernel, or
staging directories for audio recordings), or things that are big enough
that I really want to avoid the memory consumption from working on tmpfs (as
of right now, the only package I have installed on any of my systems
that fits this is LLVM/clang; I used to do this for some other software
like LibreOffice, webkit-gtk, and icedtea as well, though).
>
> But #1 and #4 of course don't work as well on tmpfs as you'll likely want
> them around longer, and all four cases definitely make use of the
> fact that nested subvolumes wall off snapshotting and thus btrfs send,
> for backup purposes.  And of course if you're on a limited-memory machine
> and thus can't easily use tmpfs for building and other staging, and don't
> need to care about the ongoing background IO, using subvolumes for #2 and
> 3 remains useful, as well.
>
>> In general I can see them being useful for any number of things from a
>> service perspective, although I feel that snapshots are likely more
>> useful there (the ability to atomically save the state of a set of files
>> is extremely useful for a lot of things).
>
> I consider the current situation somewhat of a security (DoS) issue,
> since users (or runaway scripts or malware) can create unlimited
> subvolumes as an ordinary user, with that user then not being able to
> delete them, requiring admin intervention to do so.  Of course as long as
> it's a single-human-user with an admin-rights alter-ego login, it's not
> /that/ much of a security issue, but I could see it being one for human
> users who do not have that admin-rights alter-ego login.  So were I to be
> running in such a situation, I'd probably use the mount option to let the
> users delete their own subvolumes, unless of course that opens up other
> security issues I'm not aware of.
>
> IMO before btrfs can really be considered stable, this possible DoS needs to be
> resolved by making the list/delete set the exact same as the create set,
> either by giving users some way to deal with (only) their own subvolumes
> just as they can their own directories, or by reserving subvolume
> creation to superuser, because that's what's needed for listing and
> deletion.  Because if not, I fear someone's going to take advantage of it
> in some way, perhaps, as with many DoS vulns, using it to deny critical
> resources as a way to simplify some other more critical attack, and it'll
> be in the headlines as an attack that worked and a zero-day that still
> works.
The part that makes this tricky is that the list ioctl can be considered 
a potential information leak (as evidenced by the issue that started 
this thread), so IMHO what really needs to happen is for the mount 
option to be 'user_subvolume_ops', and control all three operations (or 
better yet, do something with ACLs in the btrfs xattr namespace to 
control it on a per-subvolume basis).
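To make the proposal concrete, here is a purely hypothetical Python model of the 'user_subvolume_ops' option (no such option exists in btrfs). It assumes a caller with CAP_SYS_ADMIN may always act; with the option enabled an ordinary user may create, list, or delete only subvolumes they own, and with it disabled all three are reserved to the administrator, so the list/delete set always matches the create set. For create, owner_uid stands in for the owner of the parent directory, a simplification of the real write-permission check.

```python
from dataclasses import dataclass

OPS = ("create", "list", "delete")


@dataclass(frozen=True)
class Caller:
    uid: int
    cap_sys_admin: bool


def subvol_op_allowed(op, caller, owner_uid, user_subvolume_ops):
    """Hypothetical policy: may `caller` perform `op` on a subvolume owned by owner_uid?"""
    if op not in OPS:
        raise ValueError(f"unknown operation: {op}")
    if caller.cap_sys_admin:
        return True  # the administrator keeps full control
    if not user_subvolume_ops:
        return False  # option off: create, list and delete all need the admin
    return caller.uid == owner_uid  # option on: users manage only their own


def listable(subvols, caller, user_subvolume_ops):
    """Filter (name, owner_uid) pairs down to those `caller` may see."""
    return [name for name, owner in subvols
            if subvol_op_allowed("list", caller, owner, user_subvolume_ops)]
```

Because the list rule is the same ownership check as delete, a user could never see another user's (or another container's) subvolumes through it.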


* Re: btrfs and containers
  2016-03-10 17:04           ` Austin S. Hemmelgarn
@ 2016-03-10 19:35             ` Chris Murphy
  2016-03-10 22:34               ` Liu Bo
  2016-03-11  2:50               ` Duncan
  0 siblings, 2 replies; 16+ messages in thread
From: Chris Murphy @ 2016-03-10 19:35 UTC
  To: Btrfs BTRFS

On Thu, Mar 10, 2016 at 10:04 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

>
> The part that makes this tricky is that the list ioctl can be considered a
> potential information leak (as evidenced by the issue that started this
> thread), so IMHO what really needs to happen is for the mount option to be
> 'user_subvolume_ops', and control all three operations (or better yet, do
> something with ACLs in the btrfs xattr namespace to control it on a
> per-subvolume basis).

This may also interact with the SELinux + Btrfs + Docker issue. The
problem is the desire to use -o context to mount a subvolume with a
specific context for use with a specific container. But right now the
kernel won't allow different contexts for a given fs superblock. Until
recently, the workaround was disabling Docker SELinux support. The
newer workaround in Docker 1.10 is that it snapshots the Docker image and
uses chcon -R to relabel it. It's actually pretty fast, but still
suboptimal. Being able to bind mount a subvolume with -o context is
faster than relabeling; with many containers, that's a lot of relabeling
without it.
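For reference, the mount described here would look roughly like the command built below: one subvolume mounted with a fixed SELinux label via the subvol= and context= mount options. This is a Python sketch with a hypothetical helper name and placeholder device, path and label; the context value is double-quoted because MCS labels such as s0:c1,c2 contain commas, which would otherwise split the option list. Given the superblock restriction above, only the first such mount of a filesystem is expected to succeed on current kernels.

```python
def context_mount_argv(device, subvol, target, selinux_context):
    """Build a mount(8) argv that mounts one btrfs subvolume with a fixed SELinux label."""
    # Double quotes keep commas inside an MCS label from splitting the option list.
    options = f'subvol={subvol},context="{selinux_context}"'
    return ["mount", "-t", "btrfs", "-o", options, device, target]
```

Passing the result to subprocess.run avoids a second layer of shell quoting around the double-quoted label.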

It's a tricky problem. If you're the owner of a filesystem tree, but
something definitely not owned at all by you is buried in that tree
somewhere, to do a subvolume delete don't you have to now traverse the
entire thing to find out? Or does owning the subvolume imply sufficient
permission that whatever is in it is simply gone, unless it's another
subvolume?
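The first option can be sketched to show its cost: deciding whether a user owns everything under a subvolume means visiting every inode, which is exactly the per-file walk a subvolume delete otherwise avoids. A minimal Python sketch with a hypothetical helper name; it assumes the btrfs behavior that each subvolume reports its own st_dev, so a change in st_dev is treated as a nested subvolume (or mount) boundary and is not descended into.

```python
import os


def foreign_entries(root, uid):
    """Return paths under `root` (inclusive) whose owner is not `uid`.

    Nested subvolumes and other mounts (a different st_dev) are not
    descended into, matching the "unless it's another subvolume" case.
    """
    root_st = os.lstat(root)
    foreign = [root] if root_st.st_uid != uid else []
    for dirpath, dirnames, filenames in os.walk(root):
        same_dev = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_dev != root_st.st_dev:
                continue  # nested subvolume or mount: do not descend
            if st.st_uid != uid:
                foreign.append(path)
            same_dev.append(name)
        dirnames[:] = same_dev  # prune os.walk's traversal in place
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                foreign.append(path)
    return foreign
```

An empty result would mean the delete is safe under the first interpretation; the walk itself is the price.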




-- 
Chris Murphy

* Re: btrfs and containers
  2016-03-10 19:35             ` Chris Murphy
@ 2016-03-10 22:34               ` Liu Bo
  2016-03-11  2:50               ` Duncan
  1 sibling, 0 replies; 16+ messages in thread
From: Liu Bo @ 2016-03-10 22:34 UTC
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Thu, Mar 10, 2016 at 12:35:31PM -0700, Chris Murphy wrote:
> On Thu, Mar 10, 2016 at 10:04 AM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
> 
> >
> > The part that makes this tricky is that the list ioctl can be considered a
> > potential information leak (as evidenced by the issue that started this
> > thread), so IMHO what really needs to happen is for the mount option to be
> > 'user_subvolume_ops', and control all three operations (or better yet, do
> > something with ACLs in the btrfs xattr namespace to control it on a
> > per-subvolume basis).
> 
> This may also interact with the selinux + Btrfs + Docker issue. The
> problem is the desire to use -o context to mount a subvolume with a
> specific context for use with a specific container. But right now the
> kernel won't allow different contexts for a given fs superblock. Until
> recently, the workaround was disabling Docker SELinux support. The
> recent workaround in Docker 1.10 is that it snapshots the Docker image
> and uses chcon -R to relabel it. It's actually pretty fast, but still
> suboptimal. Being able to bind mount a subvolume with -o context is
> faster than relabeling; with many containers, it's a lot of relabeling
> without it.

You're right, supporting mounting a subvolume with -o context="xxx" is the
first choice, and I've made some progress on it[1]. In fact it works well in
Docker's scenario, but not in others, where we can have an inode leak.

But even with that, we still have to deal with the problem of listing
subvolumes that shouldn't be seen.
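
To make the desired behaviour concrete, here is a minimal userspace
sketch of what a container-scoped listing would look like: keep only the
subvolumes strictly below the container's root and re-root their paths.
It assumes paths are relative to the filesystem's top-level subvolume;
the function name is hypothetical. Note that filtering in userspace like
this is purely cosmetic: the list ioctl still hands every subvolume to
any caller, which is exactly the kernel-side leak discussed above.

```python
from typing import List


def visible_subvolumes(all_paths: List[str], container_root: str) -> List[str]:
    """Hypothetical filter: keep subvolumes strictly below container_root,
    re-rooted so they read as paths inside the container.

    Assumes paths are relative to the top-level subvolume. This does NOT
    stop the kernel from returning the full list to the caller.
    """
    root = container_root.strip("/")
    visible = []
    for path in all_paths:
        path = path.strip("/")
        if not root:
            # Container root is the top level: everything is beneath it.
            visible.append(path)
        elif path.startswith(root + "/"):
            # Compare against "root/" so "machines/web2" does not match
            # a container rooted at "machines/web".
            visible.append(path[len(root) + 1:])
    return visible
```

For example, with paths ["machines/web", "machines/web/var/lib/machines",
"machines/web2", "machines/db", "home"] and a container rooted at
"machines/web", only "var/lib/machines" would be shown.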

[1]:
patch for btrfs:
https://github.com/liubogithub/btrfs-work/commit/00765203698d7e8a795d72488aefc9e19ab70b6e
patches for docker:
https://github.com/liubogithub/docker.git btrfsselinux

> 
> It's a tricky problem. If you're the owner of a filesystem tree, but
> something definitely not owned at all by you is buried in that tree
> somewhere, to do a subvolume delete don't you have to now traverse the
> entire thing to find out? Or does the owning user have sufficient
> implied permission by owning the subvolume, that no matter what's in
> it, is simply gone unless it's another subvolume?

It can be ambiguous; I can only come up with ugly hacks.

Thanks,

-liubo


* Re: btrfs and containers
  2016-03-10 19:35             ` Chris Murphy
  2016-03-10 22:34               ` Liu Bo
@ 2016-03-11  2:50               ` Duncan
  1 sibling, 0 replies; 16+ messages in thread
From: Duncan @ 2016-03-11  2:50 UTC
  To: linux-btrfs

Chris Murphy posted on Thu, 10 Mar 2016 12:35:31 -0700 as excerpted:

> It's a tricky problem. If you're the owner of a filesystem tree, but
> something definitely not owned at all by you is buried in that tree
> somewhere, to do a subvolume delete don't you have to now traverse the
> entire thing to find out? Or does the owning user have sufficient
> implied permission by owning the subvolume, that no matter what's in it,
> is simply gone unless it's another subvolume?

Well, to the extent that subvolumes aren't special, they're supposed to 
behave like subdirs.  So it seems simple enough to me, just use subdir 
permissions semantics and don't worry about it.

Which means you can delete files you don't own located directly in a 
directory (so subvol) you have write permissions to, even if you couldn't 
write to the files themselves, but if those files are in turn nested in a 
subdir you can't write to, then you can't delete the files (even if you 
actually own the files and can write to them, meaning you could change 
them, but not delete them), and thus can't delete the dir, which means 
you can't delete its parent dir... or subvol.

At least, that's the way it /should/ work.
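
As a rough illustration of those subdir semantics, here is a userspace
sketch of the recursive check a "subvolume behaves like a subdir" rule
would imply: removing an entry needs write+search on its parent, and a
directory can only be removed once it has been emptied, which needs
read+write+search on it and on every directory below it. File ownership
does not matter. This is a simplification and an assumption, not how
btrfs or the VFS implements anything: it ignores ACLs, capabilities
(root gets no bypass here) and the sticky bit, and it should be run as
the user being checked (or as root) so the directory scans succeed. The
function names are hypothetical.

```python
import os
import stat
from typing import AbstractSet

R, W, X = 0o4, 0o2, 0o1


def has_perm(st: os.stat_result, uid: int, gids: AbstractSet[int], want: int) -> bool:
    """Classic owner/group/other mode-bit check for the given uid/gids.
    Ignores ACLs, capabilities and the sticky bit to keep the sketch small."""
    if st.st_uid == uid:
        bits = (st.st_mode >> 6) & 0o7
    elif st.st_gid in gids:
        bits = (st.st_mode >> 3) & 0o7
    else:
        bits = st.st_mode & 0o7
    return (bits & want) == want


def can_empty_dir(path: str, uid: int, gids: AbstractSet[int]) -> bool:
    """Emptying a directory needs r (list), w (remove entries) and x (search),
    and every subdirectory must in turn be emptiable."""
    if not has_perm(os.lstat(path), uid, gids, R | W | X):
        return False
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False) and not can_empty_dir(entry.path, uid, gids):
                return False
    return True


def can_delete_tree(path: str, uid: int, gids: AbstractSet[int]) -> bool:
    """Subdir semantics: removing `path` needs w+x on its parent, and if it
    is a directory it must first be emptied, all the way down."""
    parent = os.path.dirname(os.path.abspath(path))
    if not has_perm(os.lstat(parent), uid, gids, W | X):
        return False
    if stat.S_ISDIR(os.lstat(path).st_mode):
        return can_empty_dir(path, uid, gids)
    return True
```

A call would look like can_delete_tree(path, os.getuid(),
set(os.getgroups()) | {os.getgid()}). Note that the cost grows with the
number of directories in the tree, which is the point made next.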

Unless of course subvolumes are documented to be special in that regard, 
since subvolumes are allowed to differ in behavior from subdirs if that's 
a specific feature of being a subvolume, in which case the documentation 
can simply document the way it works, which can be arbitrarily defined as 
convenient for the implementation if desired, and be done with it.

Of course, if subvolumes were /not/ declared to be special in that 
regard, and thus were supposed to fit the subdir model, actually 
implementing that would involve crawling the subdir tree checking the 
ownership and permissions of all nested subdirs and subvols, as many 
layers deep as it goes.  That would need to be done in the foreground, 
before the deletion could return, which would tend to slow subvolume 
delete down toward the speed of subdir delete.  So if speedy subvolume 
delete is to be retained, then it seems subvols need to be defined to 
have special subvol behavior in this regard, one that does NOT follow 
subdir behavior: if you own the subvolume, you can delete it (and its 
children), regardless of whether you'd have permissions to delete files 
in the subdirs beneath it and thus couldn't delete it were it a regular 
subdir.


But that brings up the opposite question, or otherwise put, the same 
question in the other direction, as well.  Currently, users can create 
subvols but not (ordinarily) delete them.  Can they create those subvols 
in directories they don't have write permissions in, and thus couldn't 
create subdirs or files in?  If so, that seems to be another security 
issue waiting to be exploited.  If not, then even if they have ownership 
of the subvol itself, they shouldn't be able to delete it, if it's in a 
subdir they don't have write permissions in and thus couldn't create it 
in.


Every time this sort of subvolumes permissions issue comes up, I get a 
(metaphorical) headache trying to sort out the permissions and thus 
security implications, and end up glad I'm simply not dealing with them 
here.  Tho I can't do anything about the security implications if normal 
users can create them, even if my policy is not to do so.  Which does 
have me somewhat worried, because that sort of problem is notorious for 
its unforeseen security implications, and right now, because any user 
(presumably with write permissions on any subdir on a btrfs) can create 
subvolumes, that means anyone running btrfs is exposed to those 
convoluted and likely unforeseen security issues.

Which is definitely another reason not to consider btrfs fully 
stabilized, until that sort of thing gets sorted.  Personally, I'd say 
just require superuser privs (and/or appropriate filecaps and/or SELinux 
security labels) to create them as well, and avoid the whole problem.  
Yes, it'll be limiting, but it's a limit that will avoid the entire 
Pandora's box of permissions and security implications.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: btrfs and containers
@ 2016-03-11  3:55 Tomasz Chmielewski
  0 siblings, 0 replies; 16+ messages in thread
From: Tomasz Chmielewski @ 2016-03-11  3:55 UTC
  To: linux-btrfs; +Cc: tobias.hunger

> I have been running systemd-nspawn containers on top of a btrfs
> filesystem for a while now.
> 
> This works great: Snapshots are a huge help to manage containers!
> 
> But today I ran btrfs subvol list . *inside* a container. To my
> surprise I got a list of *all* subvolumes on that drive. That is
> basically a complete list of containers running on the machine. I do
> not want to have that kind of information exposed to my containers.

You seem to be running a privileged container, i.e. container's root is 
the same UID as host root. This is typically undesired and means that 
your containers have full access to data on host and on other 
containers.
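
One way to check this from inside a container is to look at
/proc/self/uid_map, whose lines have three fields: ID-inside-ns,
ID-outside-ns and length. When no user namespace is in use (a privileged
container) it typically reads "0 0 4294967295"; an unprivileged container
maps UID 0 to some unprivileged host range instead. Below is a minimal
sketch, assuming a single level of user-namespace nesting (the outside ID
is relative to the parent namespace, which is the host only in that
case). The function name is hypothetical.

```python
def root_maps_to_host_root(uid_map_text: str) -> bool:
    """Parse the contents of /proc/<pid>/uid_map and report whether UID 0
    inside the user namespace corresponds to UID 0 outside it.
    Each line has three fields: ID-inside-ns, ID-outside-ns, length."""
    for line in uid_map_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, length = (int(f) for f in fields)
        if inside <= 0 < inside + length:
            # UID 0 falls in this range; translate it to the outer ID.
            return outside - inside == 0
    return False  # UID 0 is not mapped at all in this namespace
```

Usage would be along the lines of reading open("/proc/self/uid_map") and
passing its contents in; True means container root is host root.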

For the record, with a privileged container you can not only list the 
subvolumes, but also read raw disk data (e.g. dd if=/dev/sda) or even 
destroy that data (dd if=/dev/zero of=/dev/sda).

So, think twice if the container setup you have is what you want!

LXD makes it particularly easy to run unprivileged containers: 
https://linuxcontainers.org/ (it starts containers as unprivileged by 
default, and has lots of goodies in general).


Tomasz Chmielewski
http://wpkg.org





Thread overview: 16+ messages
2016-03-07 22:55 btrfs and containers Tobias Hunger
2016-03-07 23:45 ` Chris Murphy
2016-03-08 19:58   ` Liu Bo
2016-03-08 21:28     ` Chris Murphy
2016-03-09 12:15       ` Austin S. Hemmelgarn
2016-03-10  2:55         ` Duncan
2016-03-10 17:04           ` Austin S. Hemmelgarn
2016-03-10 19:35             ` Chris Murphy
2016-03-10 22:34               ` Liu Bo
2016-03-11  2:50               ` Duncan
2016-03-08 12:12 ` Austin S. Hemmelgarn
2016-03-09 21:10 ` Marc MERLIN
2016-03-09 21:21   ` Chris Murphy
2016-03-09 21:45     ` Marc MERLIN
2016-03-09 23:28       ` Rich Freeman
  -- strict thread matches above, loose matches on Subject: below --
2016-03-11  3:55 Tomasz Chmielewski
