A Plumber’s Wish List for Linux

* A Plumber’s Wish List for Linux
@ 2011-10-06 23:17 Kay Sievers
  2011-10-06 23:46 ` Andi Kleen
                   ` (7 more replies)
  0 siblings, 8 replies; 52+ messages in thread
From: Kay Sievers @ 2011-10-06 23:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: lennart, harald, david, greg

We’d like to share our current wish list of plumbing layer features we
are hoping to see implemented in the near future in the Linux kernel and
associated tools. Some items we can implement on our own, others are not
our area of expertise, and we will need help getting them implemented.

Acknowledging that this wish list of ours only gets longer and not
shorter, even though we have implemented a number of other features on
our own in previous years, we are posting this list here, in the
hope of finding some help.

If you happen to be interested in working on something from this list or
able to help out, we’d be delighted. Please ping us in case you need
clarifications or more information on specific items.

Thanks,
Kay, Lennart, Harald, in the name of all the other plumbers

And here’s the wish list, in no particular order:

* (ioctl based?) interface to query and modify the label of a mounted
FAT volume:
A FAT label is implemented as a hidden directory entry in the file
system, which needs to be renamed when changing the file system label;
this is impossible to do from userspace without unmounting. Hence we’d
like to see a kernel interface that is available on the mounted file
system mount point itself. Of course, bonus points if this new interface
can be implemented for other file systems as well, and also covers fs
UUIDs in addition to labels.

* CPU modaliases in /sys/devices/system/cpu/cpuX/modalias:
useful to allow module auto-loading of e.g. cpufreq drivers and KVM
modules. Andi Kleen has a patch to create the alias file itself. CPU
‘struct sysdev’ needs to be converted to ‘struct device’ and a ‘struct
bus_type cpu’ needs to be introduced to allow proper CPU coldplug event
replay at bootup. This is one of the last remaining places where
automatic hardware-triggered module auto-loading is not available. And
we’d like to see that fixed, so that the numerous ugly userspace
work-arounds that achieve the same thing can go away.

* expose CAP_LAST_CAP somehow in the running kernel at runtime:
Userspace needs to know the highest valid capability of the running
kernel, which right now cannot reliably be retrieved from header files
only. The fact that this value cannot be detected properly right now
creates various problems for libraries compiled on newer header files
which are run on older kernels. They assume capabilities are available
which actually aren’t. Specifically, libcap-ng claims that all running
processes retain the higher capabilities in this case due to the
“inverted” semantics of CapBnd in /proc/$PID/status.

* export ‘struct device_type fb/fbcon’ of ‘struct class graphics’
Userspace wants to easily distinguish ‘fb’ and ‘fbcon’ from each other
without the need to match on the device name.

* allow changing argv[] of a process without mucking with environ[]:
Something like setproctitle() or a prctl() would be ideal. Of course it
is questionable whether services like sendmail make use of this, but on
the other hand, for services which fork but do not immediately exec()
another binary, being able to rename these child processes in ps is
important.

* module-init-tools: provide a proper libmodprobe.so library:
Early boot tools, installers and driver install disks want to access
information about available modules to optimize bootup handling.

* fork throttling mechanism as basic cgroup functionality that is
available in all hierarchies independent of the controllers used:
This is important to implement race-free killing of all members of a
cgroup, so that cgroup member processes cannot fork faster than a cgroup
supervisor process could kill them. This needs to be recursive, so that
not only a cgroup but all its subgroups are covered as well.

* proper cgroup-is-empty notification interface:
The current call_usermodehelper() interface is an inefficient and
ugly hack. Tools would prefer anything more lightweight, like a netlink,
poll() or fanotify interface.

* allow user xattrs to be set on files in the cgroupfs (and maybe
procfs?)

* simple, reliable and future-proof way to detect whether a specific pid
is running in a CLONE_NEWPID container, i.e. not in the root PID
namespace. Currently, a few ugly hacks are available to detect
this (for example, a process wanting to know whether it is running in a
PID namespace could just look for a PID 2 being around and named
kthreadd, which is a kernel thread only visible in the root namespace);
however, all these solutions encode information and expectations that
should not be encoded in a namespace test like this. This
functionality is needed in particular since the removal of the ns
cgroup controller, which provided the namespace membership information to
user code.

* allow making use of the “cpu” cgroup controller by default without
breaking RT. Right now, creating a cgroup in the “cpu” hierarchy that
can take advantage of RT is impossible in the generic case, since it
requires an RT budget to be configured, and that budget comes from a
limited resource pool. What we want is the ability to create cgroups in
“cpu” whose processes get a non-RT weight applied, but for RT take
advantage of the parent’s RT budget. We want the separation of RT and
non-RT budget assignment in the “cpu” hierarchy, because right now you
lose RT functionality in it unless you assign an RT budget. This issue
severely limits the usefulness of the “cpu” hierarchy on general purpose
systems right now.

* Add a timerslack cgroup controller, to allow increasing the timer
slack of user session cgroups when the machine is idle.

* An auxiliary meta data message for AF_UNIX called SCM_CGROUPS (or
something like that), i.e. a way to attach sender cgroup membership to
messages sent via AF_UNIX. This is useful in case services such as
syslog shall be shared among various containers (or service cgroups),
and the syslog implementation needs to be able to distinguish the
sending cgroup in order to separate the logs on disk. Of course,
SCM_CREDENTIALS can be used to look up the PID of the sender, followed by
a check in /proc/$PID/cgroup, but that is inherently racy, and the race
is a very real one in practice.

* SCM_COMM, with a similar use case as SCM_CGROUPS. This auxiliary
control message should carry the process name as available
in /proc/$PID/comm.

* Re: A Plumber’s Wish List for Linux
  2011-10-06 23:17 Kay Sievers
@ 2011-10-06 23:46 ` Andi Kleen
  2011-10-07  0:13   ` Lennart Poettering
  2011-10-07  7:49 ` Matt Helsley
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 52+ messages in thread
From: Andi Kleen @ 2011-10-06 23:46 UTC (permalink / raw)
  To: Kay Sievers; +Cc: linux-kernel, lennart, harald, david, greg

Kay Sievers <kay.sievers@vrfy.org> writes:
>
> * allow changing argv[] of a process without mucking with environ[]:
> Something like setproctitle() or a prctl() would be ideal. Of course
> it

prctl(PR_SET_NAME, ...)

The only problem is that some programs still use argv[] and get the old
name, but at least it works in "top"

> * An auxiliary meta data message for AF_UNIX called SCM_CGROUPS (or
> something like that), i.e. a way to attach sender cgroup membership to
> messages sent via AF_UNIX.

The problem is: this requires a reference count and these reference
counts can be very expensive. We had the same problem with pid
namespaces ruining AF_UNIX performance in some cases.

It can probably be done, but one would need to be very careful
about scalability issues.


> * SCM_COMM, with a similar use case as SCM_CGROUPS. This auxiliary
> control message should carry the process name as available
> in /proc/$PID/comm.

That sounds super racy. No guarantee at all this is unique and useful
for anything and everyone can change it.

The other ideas mostly sound reasonable to me, but I haven't thought
a lot about their details and implications.

-Andi


-- 
ak@linux.intel.com -- Speaking for myself only

* Re: A Plumber’s Wish List for Linux
  2011-10-06 23:46 ` Andi Kleen
@ 2011-10-07  0:13   ` Lennart Poettering
  2011-10-07  1:57     ` Andi Kleen
  2011-10-19 23:16     ` H. Peter Anvin
  0 siblings, 2 replies; 52+ messages in thread
From: Lennart Poettering @ 2011-10-07  0:13 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Kay Sievers, linux-kernel, harald, david, greg

On Thu, 06.10.11 16:46, Andi Kleen (andi@firstfloor.org) wrote:

> 
> Kay Sievers <kay.sievers@vrfy.org> writes:
> >
> > * allow changing argv[] of a process without mucking with environ[]:
> > Something like setproctitle() or a prctl() would be ideal. Of course
> > it
> 
> prctl(PR_SET_NAME, ...)
> 
> The only problem is that some programs still use argv[] and get the old
> name, but at least it works in "top"

Well, I am aware of PR_SET_NAME, but that modifies comm, not argv[]. And
while "top" indeed shows the former, "ps" shows the latter. We are looking
for a nice way to modify argv[] without having to reuse space
from environ[] like most current Linux implementations of
setproctitle() do.

A while back there were patches for PR_SET_PROCTITLE_AREA floating
around. We'd like to see something like that merged one day.

> > * SCM_COMM, with a similar use case as SCM_CGROUPS. This auxiliary
> > control message should carry the process name as available
> > in /proc/$PID/comm.
> 
> That sounds super racy. No guarantee at all this is unique and useful
> for anything and everyone can change it.

Well, it's interesting in the syslog case, and it's OK if people can
change it. What matters is that this information is available simply for
the informational value. Right now, if one combines SCM_CREDENTIALS and
/proc/$PID/comm you often end up with no information about the sender's
name at all, since at the time you try to read comm the PID might
actually not exist anymore at all. We are simply trying to close this
particular race between receiving SCM_CREDENTIALS and reading
/proc/$PID/comm here, we are not looking for a way to make process names
trusted.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

* Re: A Plumber’s Wish List for Linux
  2011-10-07  0:13   ` Lennart Poettering
@ 2011-10-07  1:57     ` Andi Kleen
  2011-10-07 15:58       ` Lennart Poettering
  2011-10-19 23:16     ` H. Peter Anvin
  1 sibling, 1 reply; 52+ messages in thread
From: Andi Kleen @ 2011-10-07  1:57 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Andi Kleen, Kay Sievers, linux-kernel, harald, david, greg

> Well, I am aware of PR_SET_NAME, but that modifies comm, not argv[]. And
> while "top" indeed shows the former, "ps" shows the latter. We are looking
> for a nice way to modify argv[] without having to reuse space
> from environ[] like most current Linux implementations of
> setproctitle() do.

It's not clear to me how the kernel could change argv[] any better than you 
could in user space.

> Well, it's interesting in the syslog case, and it's OK if people can
> change it. What matters is that this information is available simply for
> the informational value. Right now, if one combines SCM_CREDENTIALS and
> /proc/$PID/comm you often end up with no information about the sender's
> name at all, since at the time you try to read comm the PID might
> actually not exist anymore at all. We are simply trying to close this
> particular race between receiving SCM_CREDENTIALS and reading
> /proc/$PID/comm here, we are not looking for a way to make process names
> trusted.

The issue with all of these proposals is that the sender currently doesn't
know if the receiver needs it. Thus it always has to put it in and you
slow down the fast paths.

e.g. consider

sender sends packet
                                     receiver enables funky option
                                     receiver reads

If it was done lazily you would lose.

Also there are usually various complications with namespaces.

-Andi

* Re: A Plumber’s Wish List for Linux
  2011-10-06 23:17 Kay Sievers
  2011-10-06 23:46 ` Andi Kleen
@ 2011-10-07  7:49 ` Matt Helsley
  2011-10-07 16:01   ` Lennart Poettering
  2011-10-07 10:12 ` Alan Cox
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 52+ messages in thread
From: Matt Helsley @ 2011-10-07  7:49 UTC (permalink / raw)
  To: Kay Sievers
  Cc: linux-kernel, lennart, harald, david, greg,
	Biederman Eric Biederman

On Fri, Oct 07, 2011 at 01:17:02AM +0200, Kay Sievers wrote:

<snip>

> * simple, reliable and future-proof way to detect whether a specific pid
> is running in a CLONE_NEWPID container, i.e. not in the root PID
> namespace. Currently, a few ugly hacks are available to detect

Is that precisely what's needed or would it be sufficient to know
that the pid is running in a child pid namespace of the current pid
namespace? If so, I think this could eventually be done by comparing
the inode numbers assigned to /proc/<pid>/ns/pid to those of
/proc/1/ns/pid.

> * Add a timerslack cgroup controller, to allow increasing the timer
> slack of user session cgroups when the machine is idle.

There were patches for a timerslack cgroup controller but for some
reason (I don't recall why) they stalled. It might be worth digging
through the containers mailing list archives.

Cheers,
	-Matt Helsley

* Re: A Plumber’s Wish List for Linux
  2011-10-06 23:17 Kay Sievers
  2011-10-06 23:46 ` Andi Kleen
  2011-10-07  7:49 ` Matt Helsley
@ 2011-10-07 10:12 ` Alan Cox
  2011-10-07 10:28   ` Kay Sievers
  2011-10-07 12:35 ` Vivek Goyal
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 52+ messages in thread
From: Alan Cox @ 2011-10-07 10:12 UTC (permalink / raw)
  To: Kay Sievers; +Cc: linux-kernel, lennart, harald, david, greg

> * (ioctl based?) interface to query and modify the label of a mounted
> FAT volume:

Seems sensible - or it could go in sysfs ?

> A FAT label is implemented as a hidden directory entry in the file
> system, which needs to be renamed when changing the file system label,

That would be ugly - it works for FAT as you can create an imaginary name
which is not possible on the fs, but that isn't true for, say, ext4. Sysfs
sounds the logical way; it means adding chunks of code to various file
systems.

> * expose CAP_LAST_CAP somehow in the running kernel at runtime:
> Userspace needs to know the highest valid capability of the running
> kernel, which right now cannot reliably be retrieved from header files
> only. The fact that this value cannot be detected properly right now
> creates various problems for libraries compiled on newer header files
> which are run on older kernels. They assume capabilities are available
> which actually aren’t. Specifically, libcap-ng claims that all running
> processes retain the higher capabilities in this case due to the
> “inverted” semantics of CapBnd in /proc/$PID/status.

You can probably deduce this by poking around but to me it seems like a
very sensible idea.

> * allow changing argv[] of a process without mucking with environ[]:
> Something like setproctitle() or a prctl() would be ideal. Of course it
> is questionable whether services like sendmail make use of this, but on
> the other hand, for services which fork but do not immediately exec()
> another binary, being able to rename these child processes in ps is
> important.

Yes, it's a really valuable tool for r00tkits, worms and general purpose
deception.


* Re: A Plumber’s Wish List for Linux
  2011-10-07 10:12 ` Alan Cox
@ 2011-10-07 10:28   ` Kay Sievers
  2011-10-07 10:38     ` Alan Cox
  0 siblings, 1 reply; 52+ messages in thread
From: Kay Sievers @ 2011-10-07 10:28 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, lennart, harald, david, greg

On Fri, Oct 7, 2011 at 12:12, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> * (ioctl based?) interface to query and modify the label of a mounted
>> FAT volume:
>
> Seems sensible - or it could go in sysfs ?

That would mean exporting superblocks in /sys, which isn't namespaced,
and which might create issues by making information globally available
that probably shouldn't be?

>> A FAT label is implemented as a hidden directory entry in the file
>> system, which needs to be renamed when changing the file system label,
>
> That would be ugly - it works for FAT as you can create an imaginary name
> which is not possible on the fs, but that isn't true for, say, ext4. Sysfs
> sounds the logical way; it means adding chunks of code to various file
> systems.

What do you mean would be ugly?

>> * expose CAP_LAST_CAP somehow in the running kernel at runtime:
>> Userspace needs to know the highest valid capability of the running
>> kernel, which right now cannot reliably be retrieved from header files
>> only. The fact that this value cannot be detected properly right now
>> creates various problems for libraries compiled on newer header files
>> which are run on older kernels. They assume capabilities are available
>> which actually aren’t. Specifically, libcap-ng claims that all running
>> processes retain the higher capabilities in this case due to the
>> “inverted” semantics of CapBnd in /proc/$PID/status.
>
> You can probably deduce this by poking around but to me it seems like a
> very sensible idea.
>
>> * allow changing argv[] of a process without mucking with environ[]:
>> Something like setproctitle() or a prctl() would be ideal. Of course it
>> is questionable whether services like sendmail make use of this, but on
>> the other hand, for services which fork but do not immediately exec()
>> another binary, being able to rename these child processes in ps is
>> important.
>
> Yes, it's a really valuable tool for r00tkits, worms and general purpose
> deception.

They can do that already today.  The code to do that just looks really
ugly. So the r00tkits could have nicer looking code. :)

Thanks,
Kay

* Re: A Plumber’s Wish List for Linux
  2011-10-07 10:28   ` Kay Sievers
@ 2011-10-07 10:38     ` Alan Cox
  2011-10-07 12:46       ` Kay Sievers
  2011-10-07 16:07       ` Valdis.Kletnieks
  0 siblings, 2 replies; 52+ messages in thread
From: Alan Cox @ 2011-10-07 10:38 UTC (permalink / raw)
  To: Kay Sievers; +Cc: linux-kernel, lennart, harald, david, greg

On Fri, 7 Oct 2011 12:28:46 +0200
Kay Sievers <kay.sievers@vrfy.org> wrote:

> On Fri, Oct 7, 2011 at 12:12, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> >> * (ioctl based?) interface to query and modify the label of a mounted
> >> FAT volume:
> >
> > Seems sensible - or it could go in sysfs ?
> 
> That would mean exporting superblocks in /sys, which isn't namespaced,
> and which might create issues by making information globally available
> that probably shouldn't be?

Possibly, otherwise you really need an ioctl on the root inode of the fs
- which is doable, NCPfs makes heavy use of that.
> 
> >> A FAT label is implemented as a hidden directory entry in the file
> >> system, which needs to be renamed when changing the file system label,
> >
> > That would be ugly - it works for FAT as you can create an imaginary name
> > which is not possible on the fs, but that isn't true for, say, ext4. Sysfs
> > sounds the logical way; it means adding chunks of code to various file
> > systems.
> 
> What do you mean would be ugly?

I have an ext4fs. It supports every possible file name allowed by POSIX
and SuS. What name are you going to use for your 'hidden directory' that
won't clash with a real file ?


* Re: A Plumber’s Wish List for Linux
  2011-10-06 23:17 Kay Sievers
                   ` (2 preceding siblings ...)
  2011-10-07 10:12 ` Alan Cox
@ 2011-10-07 12:35 ` Vivek Goyal
  2011-10-07 18:59 ` Greg KH
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Vivek Goyal @ 2011-10-07 12:35 UTC (permalink / raw)
  To: Kay Sievers; +Cc: linux-kernel, lennart, harald, david, greg

On Fri, Oct 07, 2011 at 01:17:02AM +0200, Kay Sievers wrote:

[..]
> * fork throttling mechanism as basic cgroup functionality that is
> available in all hierarchies independent of the controllers used:
> This is important to implement race-free killing of all members of a
> cgroup, so that cgroup member processes cannot fork faster than a cgroup
> supervisor process could kill them. This needs to be recursive, so that
> not only a cgroup but all its subgroups are covered as well.

The above should make sense for the "freezer" controller too. That would
allow reliable dynamic migration of tasks in a cgroup: first freeze them,
then change the cgroup, and then unfreeze them.

Thanks
Vivek

* Re: A Plumber’s Wish List for Linux
  2011-10-07 10:38     ` Alan Cox
@ 2011-10-07 12:46       ` Kay Sievers
  2011-10-07 13:39         ` Theodore Tso
                           ` (2 more replies)
  2011-10-07 16:07       ` Valdis.Kletnieks
  1 sibling, 3 replies; 52+ messages in thread
From: Kay Sievers @ 2011-10-07 12:46 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, lennart, harald, david, greg

[sorry, need to resend. I tried to reply from my cell phone but it bounced]

On Fri, Oct 7, 2011 at 12:38, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> On Fri, 7 Oct 2011 12:28:46 +0200 Kay Sievers <kay.sievers@vrfy.org> wrote:
>
>> What do you mean would be ugly?
>
> I have an ext4fs. It supports every possible file name allowed by POSIX
> and SuS. What name are you going to use for your 'hidden directory' that
> won't clash with a real file ?

Ah, no. The labels on FAT (and similarly on NTFS) are 'magic entries' in the
root dir list, not real files in the root dir.

We need kernel support for changing a mounted fs, because, unlike
ext4, the blocks containing the strings are inside the fs, which the
kernel might change any time.

Kay

* Re: A Plumber’s Wish List for Linux
  2011-10-07 12:46       ` Kay Sievers
@ 2011-10-07 13:39         ` Theodore Tso
  2011-10-07 15:21         ` Hugo Mills
  2011-10-08  9:53         ` A Plumber’s " Bastien ROUCARIES
  2 siblings, 0 replies; 52+ messages in thread
From: Theodore Tso @ 2011-10-07 13:39 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Alan Cox, linux-kernel, lennart, harald, david, greg

On Oct 7, 2011, at 8:46 AM, Kay Sievers wrote:
> On Fri, Oct 7, 2011 at 12:38, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> On Fri, 7 Oct 2011 12:28:46 +0200 Kay Sievers <kay.sievers@vrfy.org> wrote:
>>> What do you mean would be ugly?
>> 
>> I have an ext4fs. It supports every possible file name allowed by POSIX
>> and SuS. What name are you going to use for your 'hidden directory' that
>> won't clash with a real file ?
> 
> Ah, no. The labels on FAT (and similarly on NTFS) are 'magic entries' in the
> root dir list, not real files in the root dir.
> 
> We need kernel support for changing a mounted fs, because, unlike
> ext4, the blocks containing the strings are inside the fs, which the
> kernel might change any time.

I'd suggest a syscall, not an ioctl, and if a file system has some limitation on what is a valid name (even ext4 has length limitations which might be different from other file systems), we just simply return an error if it's not a valid label name.

As it turns out I went to great lengths in both the kernel and userspace implementations of e2label/tune2fs to make sure it would be safe to directly edit the superblock while the file system is mounted, but that depends on implementation details of the buffer cache in the kernel.  Better to have a formally supported interface which is file system independent.

-- Ted

* Re: A Plumber’s Wish List for Linux
       [not found] <CAE2SPAZci=u__d58phePCftVr_e+i+N2YU-JYjGDG_b3TmYTSQ@mail.gmail.com>
@ 2011-10-07 13:40 ` Alan Cox
  2011-10-07 14:57   ` Alexander E. Patrakov
  0 siblings, 1 reply; 52+ messages in thread
From: Alan Cox @ 2011-10-07 13:40 UTC (permalink / raw)
  To: Bastien ROUCARIES; +Cc: Kay Sievers, david, greg, lennart, linux-kernel, harald

On Fri, 7 Oct 2011 15:09:16 +0200
Bastien ROUCARIES <roucaries.bastien@gmail.com> wrote:

> For fat a special xattr for root inode ?

If it's as Kay says a specific magic part of the directory and we need
this just as a fixup for FAT and NTFS then probably an ioctl on it will
do the job nicely. Sometimes stretching existing APIs in semi-sane ways
actually gets to produce worse special cases (like tar restoring the
volume label by accident depending upon its settings)

Alan

* Re: A Plumber’s Wish List for Linux
  2011-10-07 13:40 ` A Plumber’s Wish List for Linux Alan Cox
@ 2011-10-07 14:57   ` Alexander E. Patrakov
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander E. Patrakov @ 2011-10-07 14:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Bastien ROUCARIES, Kay Sievers, david, greg, lennart, harald

07.10.2011 19:40, Alan Cox wrote:
> On Fri, 7 Oct 2011 15:09:16 +0200
> Bastien ROUCARIES<roucaries.bastien@gmail.com>  wrote:
>
>> For fat a special xattr for root inode ?
>
> If it's as Kay says a specific magic part of the directory and we need
> this just as a fixup for FAT and NTFS then probably an ioctl on it will
> do the job nicely. Sometimes stretching existing APIs in semi-sane ways
> actually gets to produce worse special cases (like tar restoring the
> volume label by accident depending upon its settings)

I'd say that we also need to consider exFAT, which is available only
through FUSE, and the fact that the FUSE-based NTFS driver has more
features than the kernel driver. And, frankly speaking, I don't think that
FAT belongs in the kernel at all. So any proposed solution has to be
extensible enough to also cover FUSE.

-- 
Alexander E. Patrakov


* Re: A Plumber’s Wish List for Linux
  2011-10-07 12:46       ` Kay Sievers
  2011-10-07 13:39         ` Theodore Tso
@ 2011-10-07 15:21         ` Hugo Mills
  2011-10-10 11:18             ` David Sterba
  2011-10-08  9:53         ` A Plumber’s " Bastien ROUCARIES
  2 siblings, 1 reply; 52+ messages in thread
From: Hugo Mills @ 2011-10-07 15:21 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Alan Cox, linux-kernel, lennart, harald, david, greg, Chris Mason,
	Btrfs mailing list

On Fri, Oct 07, 2011 at 02:46:23PM +0200, Kay Sievers wrote:
> On Fri, Oct 7, 2011 at 12:38, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > On Fri, 7 Oct 2011 12:28:46 +0200 Kay Sievers <kay.sievers@vrfy.org> wrote:
> >
> >> What do you mean would be ugly?
> >
> > I have an ext4fs. It supports every possible file name allowed by POSIX
> > and SuS. What name are you going to use for your 'hidden directory' that
> > won't clash with a real file ?
> 
> Ah, no. The labels on FAT (and similarly on NTFS) are 'magic entries' in the
> root dir list, not real files in the root dir.
> 
> We need kernel support for changing a mounted fs, because, unlike
> ext4, the blocks containing the strings are inside the fs, which the
> kernel might change any time.

   It's worth noting that there are similar issues with btrfs around
changing label. A common API for it would make sense. The only btrfs
patches I've seen to change label after mkfs-time work either as:

 * unmounted only, single underlying device only, pure userspace
   implementation
 * mounted only, multiple underlying devices, kernel support needed

   The kernel-side patches never got integrated, so we're still unable
to change the label on the majority of btrfs filesystems.

   Changing the UUID for the filesystem is even harder, as I think
it's written to every metadata block. I'm not sure we can do that
sanely on a mounted filesystem.

   Hugo (just a spear-carrier from the btrfs chorus).

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
    --- Anyone using a computer to generate random numbers is, of ---    
                       course,  in a state of sin.                       

* Re: A Plumber’s Wish List for Linux
  2011-10-07  1:57     ` Andi Kleen
@ 2011-10-07 15:58       ` Lennart Poettering
  0 siblings, 0 replies; 52+ messages in thread
From: Lennart Poettering @ 2011-10-07 15:58 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Kay Sievers, linux-kernel, harald, david, greg

On Fri, 07.10.11 03:57, Andi Kleen (andi@firstfloor.org) wrote:

> 
> > Well, I am aware of PR_SET_NAME, but that modifies comm, not argv[]. And
> > while "top" indeed shows the former, "ps" shows the latter. We are looking
> > for a nice way to modify argv[] without having to reuse space
> > from environ[] like most current Linux implementations of
> > setproctitle() do.
> 
> It's not clear to me how the kernel could change argv[] any better than you 
> could in user space.

Well, it can resize the argv[] buffer, which we can't do from
userspace right now. See the PR_SET_PROCTITLE_AREA patches.

> > Well, it's interesting in the syslog case, and it's OK if people can
> > change it. What matters is that this information is available simply for
> > the informational value. Right now, if one combines SCM_CREDENTIALS and
> > /proc/$PID/comm you often end up with no information about the senders
> > name at all, since at the time you try to read comm the PID might
> > actually not exist anymore at all. We are simply trying to close this
> > particular race between receiving SCM_CREDENTIALS and reading
> > /proc/$PID/comm here, we are not looking for a way to make process names
> > trusted.
> 
> The issue with all of these proposals is that the sender currently doesn't
> know if the receiver needs it. Thus it always has to put it in and you
> slow down the fast paths.
> 
> e.g. consider
> 
> sender sends packet
>                                      receiver enables funky option
>                                      receiver reads
> 
> If it was done lazily you would lose.

Would you? I think it's OK if messages queued before the sockopt is
enabled do not carry the SCM_COMM/SCM_CGROUPS data, even if they are
dequeued after the sockopt. At least I wouldn't expect them to
necessarily have the data, and this is probably just a matter of
documentation, i.e. say in the man page explicitly that the control data
will only be attached to newly queued messages. Given that
SCM_COMM/SCM_CGROUPS is a completely new API anyway this should not
create any compatibility problems.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

* Re: A Plumber’s Wish List for Linux
  2011-10-07  7:49 ` Matt Helsley
@ 2011-10-07 16:01   ` Lennart Poettering
  2011-10-08  4:24     ` Eric W. Biederman
  0 siblings, 1 reply; 52+ messages in thread
From: Lennart Poettering @ 2011-10-07 16:01 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Kay Sievers, linux-kernel, harald, david, greg,
	Biederman Eric Biederman

On Fri, 07.10.11 00:49, Matt Helsley (matthltc@us.ibm.com) wrote:

> 
> On Fri, Oct 07, 2011 at 01:17:02AM +0200, Kay Sievers wrote:
> 
> <snip>
> 
> > * simple, reliable and future-proof way to detect whether a specific pid
> > is running in a CLONE_NEWPID container, i.e. not in the root PID
> > namespace. Currently, there are available a few ugly hacks to detect
> 
> Is that precisely what's needed or would it be sufficient to know
> that the pid is running in a child pid namespace of the current pid
> namespace? If so, I think this could eventually be done by comparing
> the inode numbers assigned to /proc/<pid>/ns/pid to those of
> /proc/1/ns/pid.

I think the most interesting test would be to figure out for a process
if itself is running in a PID namespace. And for that comparing inodes
wouldn't work since the namespace process would never get access to the
inode of the outside init.
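A minimal sketch of Matt's inode comparison, assuming a kernel that exposes /proc/<pid>/ns/pid. That entry was still only proposed at the time of this thread. The helper names are hypothetical. The sketch also shows the limitation Lennart raises. Seen from inside a container, /proc/1 is the container's own init, so comparing against it reports "same namespace" there as well.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Compare the PID-namespace identity of two processes by stat()ing their
 * /proc/<pid>/ns/pid entries: same device and inode means same namespace.
 * Returns 1 (same), 0 (different) or -1 (entry missing or not accessible). */
static int same_pid_ns(pid_t a, pid_t b)
{
    char pa[64], pb[64];
    struct stat sa, sb;

    snprintf(pa, sizeof(pa), "/proc/%d/ns/pid", (int)a);
    snprintf(pb, sizeof(pb), "/proc/%d/ns/pid", (int)b);
    if (stat(pa, &sa) < 0 || stat(pb, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* The check as suggested. Inside a container /proc/1 is the container's
 * own init, so this returns 1 there too and cannot tell a process that it
 * is itself in a container. */
static int same_pid_ns_as_init(void)
{
    return same_pid_ns(getpid(), 1);
}
```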

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-07 10:38     ` Alan Cox
  2011-10-07 12:46       ` Kay Sievers
@ 2011-10-07 16:07       ` Valdis.Kletnieks
  1 sibling, 0 replies; 52+ messages in thread
From: Valdis.Kletnieks @ 2011-10-07 16:07 UTC (permalink / raw)
  To: Alan Cox; +Cc: Kay Sievers, linux-kernel, lennart, harald, david, greg

[-- Attachment #1: Type: text/plain, Size: 445 bytes --]

On Fri, 07 Oct 2011 11:38:20 BST, Alan Cox said:

> > What do you mean would be ugly?
> 
> I have an ext4fs. It supports every possible file name allowed by POSIX
> and SuS. What name are you going to use for your 'hidden directory' that
> won't clash with a real file ?

ext4 could always use an attribute bit for that.  Not that *that* solution is all that
much prettier, since you can't use it for filesystems that don't have attribute bits.

[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-06 23:17 Kay Sievers
                   ` (3 preceding siblings ...)
  2011-10-07 12:35 ` Vivek Goyal
@ 2011-10-07 18:59 ` Greg KH
  2011-10-09 12:20   ` Kay Sievers
  2011-10-09  8:45 ` Rusty Russell
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 52+ messages in thread
From: Greg KH @ 2011-10-07 18:59 UTC (permalink / raw)
  To: Kay Sievers; +Cc: linux-kernel, lennart, harald, david

On Fri, Oct 07, 2011 at 01:17:02AM +0200, Kay Sievers wrote:
> * CPU modaliases in /sys/devices/system/cpu/cpuX/modalias:
> useful to allow module auto-loading of e.g. cpufreq drivers and KVM
> modules. Andy Kleen has a patch to create the alias file itself. CPU
> ‘struct sysdev’ needs to be converted to ‘struct device’ and a ‘struct
> bus_type cpu’ needs to be introduced to allow proper CPU coldplug event
> replay at bootup. This is one of the last remaining places where
> automatic hardware-triggered module auto-loading is not available. And
> we’d like to see that fix to make numerous ugly userspace work-arounds
> to achieve the same go away.

I need to get off my ass and fix this properly, now that Rafael has done
all of the hard work for sysdev already.  Thanks for reminding me.

> * export ‘struct device_type fb/fbcon’ of ‘struct class graphics’
> Userspace wants to easily distinguish ‘fb’ and ‘fbcon’ from each other
> without the need to match on the device name.

Can't we just export a "type" file for the device for these devices?
Is it really just that simple?

> * module-init-tools: provide a proper libmodprobe.so from
> module-init-tools:
> Early boot tools, installers, driver install disks want to access
> information about available modules to optimize bootup handling.

What information do they want to know?

> * allow user xattrs to be set on files in the cgroupfs (and maybe
> procfs?)

This shouldn't be that difficult, right?

Thanks for the list, much appreciated.

greg k-h

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-07 16:01   ` Lennart Poettering
@ 2011-10-08  4:24     ` Eric W. Biederman
  2011-10-10 16:31       ` Lennart Poettering
  0 siblings, 1 reply; 52+ messages in thread
From: Eric W. Biederman @ 2011-10-08  4:24 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Matt Helsley, Kay Sievers, linux-kernel, harald, david, greg

Lennart Poettering <mzxreary@0pointer.de> writes:

> On Fri, 07.10.11 00:49, Matt Helsley (matthltc@us.ibm.com) wrote:
>
>> 
>> On Fri, Oct 07, 2011 at 01:17:02AM +0200, Kay Sievers wrote:
>> 
>> <snip>
>> 
>> > * simple, reliable and future-proof way to detect whether a specific pid
>> > is running in a CLONE_NEWPID container, i.e. not in the root PID
>> > namespace. Currently, there are available a few ugly hacks to detect
>> 
>> Is that precisely what's needed or would it be sufficient to know
>> that the pid is running in a child pid namespace of the current pid
>> namespace? If so, I think this could eventually be done by comparing
>> the inode numbers assigned to /proc/<pid>/ns/pid to those of
>> /proc/1/ns/pid.
>
> I think the most interesting test would be to figure out for a process
> if itself is running in a PID namespace. And for that comparing inodes
> wouldn't work since the namespace process would never get access to the
> inode of the outside init.

Strictly correct answer.  All processes are running in a pid namespace.
I think we can implement that in a libc header.

static inline bool in_pid_namespace(void)
{
        return true;
}

Why does it matter if you are running in something other than the
initial pid namespace?  I expect what you are really after is something
else entirely, and you are asking the wrong question.

Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-07 12:46       ` Kay Sievers
  2011-10-07 13:39         ` Theodore Tso
  2011-10-07 15:21         ` Hugo Mills
@ 2011-10-08  9:53         ` Bastien ROUCARIES
  2011-10-09  3:15           ` Alex Elsayed
  2 siblings, 1 reply; 52+ messages in thread
From: Bastien ROUCARIES @ 2011-10-08  9:53 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Alan Cox, linux-kernel, lennart, harald, david, greg

On Fri, Oct 7, 2011 at 2:46 PM, Kay Sievers <kay.sievers@vrfy.org> wrote:
> [sorry, need to resend. I tried to reply with the cell phone but it bounced]
>
> On Fri, Oct 7, 2011 at 12:38, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> On Fri, 7 Oct 2011 12:28:46 +0200 Kay Sievers <kay.sievers@vrfy.org> wrote:
>>
>>> What do you mean would be ugly?
>>
>> I have an ext4fs. It supports every possible file name allowed by POSIX
>> and SuS. What name are you going to use for your 'hidden directory' that
>> won't clash with a real file ?
>
> Ah, no. The label on FAT (similar on NTFS) are 'magic entries' in the
> root dir list, not a real file in the root dir.

Why not using a special xattr namespace ?

Bastien
> We need kernel support for changing a mounted fs, because, unlike
> ext4, the blocks containing the strings are inside the fs, which the
> kernel might change any time.
>
> Kay

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-08  9:53         ` A Plumber’s " Bastien ROUCARIES
@ 2011-10-09  3:15           ` Alex Elsayed
  0 siblings, 0 replies; 52+ messages in thread
From: Alex Elsayed @ 2011-10-09  3:15 UTC (permalink / raw)
  To: linux-kernel

Bastien ROUCARIES <roucaries.bastien <at> gmail.com> writes:

> 
> On Fri, Oct 7, 2011 at 2:46 PM, Kay Sievers <kay.sievers <at> vrfy.org> wrote:
> > On Fri, Oct 7, 2011 at 12:38, Alan Cox <alan <at> lxorguk.ukuu.org.uk> wrote:
> >> On Fri, 7 Oct 2011 12:28:46 +0200 Kay Sievers <kay.sievers <at> vrfy.org> wrote:
> >>> What do you mean would be ugly?
> >>
> >> I have an ext4fs. It supports every possible file name allowed by POSIX
> >> and SuS. What name are you going to use for your 'hidden directory' that
> >> won't clash with a real file ?
> >
> > Ah, no. The label on FAT (similar on NTFS) are 'magic entries' in the
> > root dir list, not a real file in the root dir.
> 
> Why not using a special xattr namespace ?
> 
> Bastien

All of you are completely misconstruing what was said. He was NOT suggesting
magic entries as an interface to change the label. He was noting that the FAT
filesystem IMPLEMENTS its labels as a magic entry, which cannot be safely
altered from userspace on a mounted FS, necessitating help from the kernel.
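To make this concrete: a FAT directory is an array of 32-byte entries, and the volume label is one of those entries, marked by the volume-ID attribute bit (0x08) in byte 11. Below is a minimal sketch that finds the label in a root-directory buffer already read into memory. The helper name and buffer handling are illustrative assumptions, and real code would also have to follow the FAT32 root directory's cluster chain. Rewriting this entry from userspace means writing to blocks that the mounted vfat driver may rewrite itself at any time, which is why kernel help is needed.

```c
#include <stddef.h>
#include <string.h>

#define FAT_DIRENT_SIZE    32
#define FAT_ATTR_VOLUME_ID 0x08
#define FAT_ATTR_LFN       0x0F /* long-name entries also carry the 0x08 bit */

/* Scan an in-memory FAT root directory and copy the 11-byte, space-padded
 * volume label into out (NUL-terminated). Returns the entry index, or -1
 * if no label entry exists before the end-of-directory marker. */
static long fat_find_label(const unsigned char *dir, size_t n_entries, char out[12])
{
    for (size_t i = 0; i < n_entries; i++) {
        const unsigned char *e = dir + i * FAT_DIRENT_SIZE;

        if (e[0] == 0x00)
            break;    /* end of directory */
        if (e[0] == 0xE5)
            continue; /* deleted entry */

        unsigned char attr = e[11];
        if ((attr & FAT_ATTR_LFN) == FAT_ATTR_LFN)
            continue; /* fragment of a long file name */
        if (attr & FAT_ATTR_VOLUME_ID) {
            memcpy(out, e, 11);
            out[11] = '\0';
            return (long)i;
        }
    }
    return -1;
}
```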


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-06 23:17 Kay Sievers
                   ` (4 preceding siblings ...)
  2011-10-07 18:59 ` Greg KH
@ 2011-10-09  8:45 ` Rusty Russell
  2011-10-11 23:16 ` Andrew Morton
  2011-10-19 21:12 ` Paul Menage
  7 siblings, 0 replies; 52+ messages in thread
From: Rusty Russell @ 2011-10-09  8:45 UTC (permalink / raw)
  To: Kay Sievers, linux-kernel; +Cc: lennart, harald, david, greg, Jon Masters

On Fri, 07 Oct 2011 01:17:02 +0200, Kay Sievers <kay.sievers@vrfy.org> wrote:
> * module-init-tools: provide a proper libmodprobe.so from
> module-init-tools:
> Early boot tools, installers, driver install disks want to access
> information about available modules to optimize bootup handling.

That's a bit too vague for my limited experience and/or lack of
imagination: what exactly do they want?  And why?

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-07 18:59 ` Greg KH
@ 2011-10-09 12:20   ` Kay Sievers
  0 siblings, 0 replies; 52+ messages in thread
From: Kay Sievers @ 2011-10-09 12:20 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, lennart, harald, david

On Fri, Oct 7, 2011 at 20:59, Greg KH <greg@kroah.com> wrote:
> On Fri, Oct 07, 2011 at 01:17:02AM +0200, Kay Sievers wrote:
>> * CPU modaliases in /sys/devices/system/cpu/cpuX/modalias:

> I need to get off my ass and fix this properly, now that Rafael has done
> all of the hard work for sysdev already.  Thanks for reminding me.
>
>> * export ‘struct device_type fb/fbcon’ of ‘struct class graphics’
>> Userspace wants to easily distinguish ‘fb’ and ‘fbcon’ from each other
>> without the need to match on the device name.
>
> Can't we just export a "type" file for the device for these devices?
> Is it really just that simple?

Yeah, it's just adding a 'struct device_type' with 'name = "fb"' or
'name = "fbcon"', and DEVTYPE= will appear as a property of the device. So
much for getting off my ass. :)
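With a device type registered, userspace can match on DEVTYPE instead of on the device name. Below is a minimal sketch of reading the property from the text of a sysfs uevent file. The helper uevent_get() is hypothetical. The sample values in the test (including DEVTYPE=fbcon) are illustrative only, since fb devices had no such property when this was written.

```c
#include <stdio.h>
#include <string.h>

/* Find KEY=VALUE in the contents of a sysfs uevent file (one pair per
 * line) and copy VALUE into out. Returns 0 on success, -1 if absent. */
static int uevent_get(const char *uevent, const char *key, char *out, size_t len)
{
    size_t klen = strlen(key);
    const char *line = uevent;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t llen = eol ? (size_t)(eol - line) : strlen(line);

        /* Require an exact key followed by '=', so "DEV" won't match "DEVTYPE". */
        if (llen > klen && strncmp(line, key, klen) == 0 && line[klen] == '=') {
            snprintf(out, len, "%.*s", (int)(llen - klen - 1), line + klen + 1);
            return 0;
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return -1;
}
```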

>> * module-init-tools: provide a proper libmodprobe.so from
>> module-init-tools:
>> Early boot tools, installers, driver install disks want to access
>> information about available modules to optimize bootup handling.
>
> What information do they want to know?

Resolve the alias database that 'depmod' has created from inside any
process. Udev wants to avoid calling ~60 modprobes per bootup for a
bunch of device types like USB hubs, which will never have a driver to be
loaded (optimization). Also the installer and module-update tools
sometimes want to query the list of things to load before running all
the magic asynchronously (less hacks).

In general, the command-line-tool style of building more complex system
software does not really fit the way we need to do things today. We
need proper libraries in the background that can be used by whatever
needs the information, and the tools we already have should just be
users of their own libraries. We need a strict separation of policy and
mechanism. Other users should be able to use the 'mechanism' of a tool
without executing any 'policy'.

>> * allow user xattrs to be set on files in the cgroupfs (and maybe
>> procfs?)
>
> This shouldn't be that difficult, right?

It shouldn't. We just need to be careful here what to export, when to
use it, and not to create problems and information leaks for
namespaces, which might re-use some of the mount points.

Kay

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-07 15:21         ` Hugo Mills
@ 2011-10-10 11:18             ` David Sterba
  0 siblings, 0 replies; 52+ messages in thread
From: David Sterba @ 2011-10-10 11:18 UTC (permalink / raw)
  To: Hugo Mills, Kay Sievers, Alan Cox, linux-kernel, lennart, harald,
	david, greg, Chris Mason, Btrfs mailing list

On Fri, Oct 07, 2011 at 04:21:37PM +0100, Hugo Mills wrote:
> On Fri, Oct 07, 2011 at 02:46:23PM +0200, Kay Sievers wrote:
> > On Fri, Oct 7, 2011 at 12:38, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > > On Fri, 7 Oct 2011 12:28:46 +0200 Kay Sievers <kay.sievers@vrfy.org> wrote:
> > >
> > >> What do you mean would be ugly?
> > >
> > > I have an ext4fs. It supports every possible file name allowed by POSIX
> > > and SuS. What name are you going to use for your 'hidden directory' that
> > > won't clash with a real file ?
> > 
> > Ah, no. The label on FAT (similar on NTFS) are 'magic entries' in the
> > root dir list, not a real file in the root dir.
> > 
> > We need kernel support for changing a mounted fs, because, unlike
> > ext4, the blocks containing the strings are inside the fs, which the
> > kernel might change any time.
> 
>    It's worth noting that there are similar issues with btrfs around
> changing label. A common API for it would make sense. The only btrfs
> patches I've seen to change label after mkfs-time work either as:
> 
>  * unmounted only, single underlying device only, pure userspace
>    implementation
>  * mounted only, multiple underlying devices, kernel support needed
> 
>    The kernel-side patches never got integrated, so we're still unable
> to change the label on the majority of btrfs filesystems.
> 
>    Changing the UUID for the filesystem is even harder, as I think
> it's written to every metadata block. I'm not sure we can do that
> sanely on a mounted filesystem.

http://marc.info/?l=linux-btrfs&m=131161949201880&w=2

"Resetting the UUID on btrfs isn't a quick-and-easy thing - you have to
walk the entire tree and change every object. We've got a bad-hack in
meego that uses btrfs-debug-tree and changes the UUID while it runs
the entire tree, but it's ugly as hell."

That's on an unmounted fs. Doing it on a mounted one seems more
complicated wrt the intermediate state when there are some blocks
with the old and some blocks with the new UUID. The operation will take
long and I don't know if it's better to do it in batches (and
follow usual rules for committing a transaction every now and then), or
in one go (requires: no failures, no scrub run, no devices
added/removed). Counting all potential problems and practical
unusability of the FS during UUID change, the off-line approach seems a
better way to go.


david

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-10 11:18             ` David Sterba
  (?)
@ 2011-10-10 13:09             ` Theodore Tso
  2011-10-13  0:28               ` Dave Chinner
  -1 siblings, 1 reply; 52+ messages in thread
From: Theodore Tso @ 2011-10-10 13:09 UTC (permalink / raw)
  To: dave
  Cc: Theodore Tso, Hugo Mills, Kay Sievers, Alan Cox, linux-kernel,
	lennart, harald, david, greg, Chris Mason, Btrfs mailing list

On Oct 10, 2011, at 7:18 AM, David Sterba wrote:

> "Resetting the UUID on btrfs isn't a quick-and-easy thing - you have to
> walk the entire tree and change every object. We've got a bad-hack in
> meego that uses btrfs-debug-tree and changes the UUID while it runs
> the entire tree, but it's ugly as hell."

Changing the UUID is going to be harder for ext4 as well, once we integrate metadata checksums. So while it makes sense to have on-line ways of updating labels for mounted file systems, it probably makes much less sense to support it for UUIDs.

I suspect what it means in practice is that it will be useful for file systems to provide fs image copying tools that also generate a new UUID while you're at it, for use by IT administrators and embedded systems manufacturers.
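A copy tool like this needs a fresh random UUID. Below is a minimal sketch, assuming an RFC 4122 version-4 UUID is acceptable for the file system in question. new_uuid_v4() is a hypothetical helper that reads /dev/urandom and sets the version and variant bits. Writing the result into every metadata block is the file-system-specific part and is not shown.

```c
#include <stdio.h>

/* Generate a random RFC 4122 version-4 UUID as 36 characters plus NUL.
 * Returns 0 on success, -1 if /dev/urandom cannot be read. */
static int new_uuid_v4(char out[37])
{
    unsigned char u[16];
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(u, 1, sizeof(u), f);
    fclose(f);
    if (got != sizeof(u))
        return -1;

    u[6] = (unsigned char)((u[6] & 0x0F) | 0x40); /* version 4 (random) */
    u[8] = (unsigned char)((u[8] & 0x3F) | 0x80); /* RFC 4122 variant */

    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
             u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
    return 0;
}
```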

-- Ted

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-08  4:24     ` Eric W. Biederman
@ 2011-10-10 16:31       ` Lennart Poettering
  0 siblings, 0 replies; 52+ messages in thread
From: Lennart Poettering @ 2011-10-10 16:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Matt Helsley, Kay Sievers, linux-kernel, harald, david, greg

On Fri, 07.10.11 21:24, Eric W. Biederman (ebiederm@xmission.com) wrote:

> 
> Lennart Poettering <mzxreary@0pointer.de> writes:
> 
> > On Fri, 07.10.11 00:49, Matt Helsley (matthltc@us.ibm.com) wrote:
> >
> >> 
> >> On Fri, Oct 07, 2011 at 01:17:02AM +0200, Kay Sievers wrote:
> >> 
> >> <snip>
> >> 
> >> > * simple, reliable and future-proof way to detect whether a specific pid
> >> > is running in a CLONE_NEWPID container, i.e. not in the root PID
> >> > namespace. Currently, there are available a few ugly hacks to detect
> >> 
> >> Is that precisely what's needed or would it be sufficient to know
> >> that the pid is running in a child pid namespace of the current pid
> >> namespace? If so, I think this could eventually be done by comparing
> >> the inode numbers assigned to /proc/<pid>/ns/pid to those of
> >> /proc/1/ns/pid.
> >
> > I think the most interesting test would be to figure out for a process
> > if itself is running in a PID namespace. And for that comparing inodes
> > wouldn't work since the namespace process would never get access to the
> > inode of the outside init.
> 
> Strictly correct answer.  All processes are running in a pid namespace.
> I think we can implement that in a libc header.
> 
> static inline bool in_pid_namespace(void)
> {
>         return true;
> }
> 
> Why does it matter if you are running in something other than the
> initial pid namespace?  I expect what you are really after is something
> else entirely, and you are asking the wrong question.

Well, all other virtualization solutions are easily detectable via CPUID
leaf 0x1, bit 31, and via DMI and some other ways. However, for Linux
containers there is no nice way to detect them.
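For comparison, here is the VM check referred to above. CPUID leaf 1 reports a "hypervisor present" flag in bit 31 of ECX. Below is a minimal sketch using the __get_cpuid() helper from GCC/Clang's <cpuid.h>. It is x86 only, and other architectures get -1. The DMI checks are not shown. Inside a container on bare metal this returns 0, which is exactly the gap being described.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns 1 if CPUID leaf 1 sets the hypervisor bit (ECX bit 31), 0 if it
 * does not, and -1 on non-x86 builds or if leaf 1 is unavailable. There is
 * no equivalent bit that reveals a container. */
static int hypervisor_bit_set(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return (int)((ecx >> 31) & 1u);
#else
    return -1;
#endif
}
```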

VMs are pretty good at providing a comprehensive emulation of real
machines, and distributions running in them usually do not need
information whether they are running in a VM or not. This is very
different though for containers: Quite a few kernel subsystems are
currently not virtualized, for example SELinux, VTs, most of sysfs, most
of /proc/sys, audit, udev or file systems (by which I mean that for a
container you probably don't want to fsck the root fs, and so on), and
containers tend to be much more lightweight than real systems.

To make a standard distribution run nicely in a Linux container you
usually have to make quite a number of modifications to it and disable
certain things from the boot process. Ideally however, one could simply
boot the same image on a real machine and in a container and would just
do the right thing, fully stateless. And for that you need to be able to
detect containers, and currently you can't.

Of course, in 10 years or so containers might be much more complete than
they are right now, and virtualize all subsystems I listed above and
maybe a ton more, but that's 10y from now, and for now, to make things
work as cleanly as possible, it would be immensely helpful if containers
could be detected in a nice way.

Of course, in many cases there are nicer ways to shortcut the init jobs
on a container. For example, instead of bypassing root fsck in a
container it makes a lot more sense to simply say: bypass root fsck if
the root fs is already writable. And there's more like that. But at the
end of the day you always want to be able to bind certain things to the
fact that you are running in a container, if you want things to "just
work". And I believe that must be the goal. 
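The fsck heuristic above reduces to a simple check. Below is a minimal sketch using statvfs(3) and the ST_RDONLY flag. is_read_only() is a hypothetical helper name, not code taken from systemd.

```c
#include <sys/statvfs.h>

/* Returns 1 if the file system containing path is mounted read-only,
 * 0 if it is writable, and -1 on error. */
static int is_read_only(const char *path)
{
    struct statvfs st;
    if (statvfs(path, &st) < 0)
        return -1;
    return (st.f_flag & ST_RDONLY) ? 1 : 0;
}
```

An init system following this rule would run the root fsck only when is_read_only("/") returns 1.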

I am pretty sure that having a way to detect execution in a container is
a minimum requirement to get general purpose distribution makers to
officially support and care for execution in container environments. As
you are a container guy I am sure that would be very much in your
interest.

And note that I am only interested in detecting CLONE_NEWPID, not the
other namespaces. CLONE_NEWPID is the core namespace technology that
turns a container into a container, so that's all that's needed.

And yes, CLONE_NEWPID can be useful for other purposes than just
containers as well. However, that doesn't really matter for my use case
as mentioned above: because if you run an init system in a CLONE_NEWPID
namespace, then that's what I call a container, and the init system
should have every right to detect that.

The root PID namespace is different from all other namespaces btw,
if only because the kernel threads are part of it, but not of the
other namespaces.

Finally, note that it previously has been very easy to detect execution
in a container, simply by checking the "ns" cgroup hierarchy (i.e. look
whether the path in /proc/self/cgroup for "ns" wasn't "/" and you knew
you were in a container). systemd made use of that and since very early
on we supported container boots. The removal of "ns" broke systemd in
that regard. Now, I don't want "ns" back, and I am not going to make a
big hubbub out of the fact that you guys broke userspace that way. But
what I would like to see made available again is a sane way to detect
execution in a container environment, i.e. a way for a process to detect
whether it is running in the root CLONE_NEWPID namespace.
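Below is a sketch of the old check described in the last paragraph. It parses text in the /proc/self/cgroup format, one "hierarchy-ID:controller-list:path" entry per line. ns_cgroup_says_container() is a hypothetical helper, not systemd's actual code. On kernels without the "ns" subsystem it returns -1, which is precisely the breakage described.

```c
#include <string.h>

/* Look for the hierarchy carrying the "ns" controller in /proc/self/cgroup
 * text. Returns 1 if its path is not "/" (the historical container hint),
 * 0 if it is "/", and -1 if there is no "ns" hierarchy at all. */
static int ns_cgroup_says_container(const char *contents)
{
    const char *line = contents;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        const char *end = eol ? eol : line + strlen(line);
        const char *c1 = memchr(line, ':', (size_t)(end - line));
        const char *c2 = c1 ? memchr(c1 + 1, ':', (size_t)(end - c1 - 1)) : NULL;

        if (c2 != NULL) {
            /* Walk the comma-separated controller list between c1 and c2. */
            const char *tok = c1 + 1;
            while (tok < c2) {
                const char *comma = memchr(tok, ',', (size_t)(c2 - tok));
                const char *tend = comma ? comma : c2;
                if (tend - tok == 2 && strncmp(tok, "ns", 2) == 0) {
                    size_t plen = (size_t)(end - (c2 + 1));
                    return !(plen == 1 && c2[1] == '/');
                }
                tok = tend + 1;
            }
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return -1;
}
```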

Thanks,

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-10 11:18             ` David Sterba
  (?)
  (?)
@ 2011-10-11 13:14             ` Serge E. Hallyn
  2011-10-11 15:49               ` Andrew G. Morgan
  -1 siblings, 1 reply; 52+ messages in thread
From: Serge E. Hallyn @ 2011-10-11 13:14 UTC (permalink / raw)
  To: Kay Sievers, Alan Cox, linux-kernel, lennart, harald, david, greg,
	Andrew Morgan, KaiGai Kohei

Unfortunately I'd deleted the early part of this thread before noticing
the mention on lwn+lkml.org, but fwiw detection of the last supported
capability has been brought up before (with patchsets floated (By KaiGai
I'm pretty sure) which exported the list of capabilities through /sys or
/security), and I agree it's something we need.

-serge

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-11 13:14             ` Serge E. Hallyn
@ 2011-10-11 15:49               ` Andrew G. Morgan
  2011-10-12  2:31                 ` Serge E. Hallyn
  2011-10-12 20:51                 ` Lennart Poettering
  0 siblings, 2 replies; 52+ messages in thread
From: Andrew G. Morgan @ 2011-10-11 15:49 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Kay Sievers, Alan Cox, linux-kernel, lennart, harald, david, greg,
	KaiGai Kohei

The benefit of KaiGai's patch was that it exported the actual names
of the capabilities rather than having them stored only in libcap.

It is possible to use CAP_IS_SUPPORTED(cap) (in libcap-2.21) to figure
out the maximum capability supported by the running kernel.

  https://sites.google.com/site/fullycapable/release-notes-for-libcap
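Without libcap, the same question can be put to the kernel directly. Below is a minimal sketch, assuming Linux's prctl(2) PR_CAPBSET_READ, which fails with EINVAL for capability numbers the running kernel does not know. last_supported_cap() is a hypothetical helper. It is not necessarily how CAP_IS_SUPPORTED is implemented.

```c
#include <sys/prctl.h>

/* Probe the capability bounding set upward from 0; the first number the
 * kernel rejects marks the end. Returns the highest supported capability
 * number, or -1 if even capability 0 is rejected. */
static int last_supported_cap(void)
{
    int cap = 0;
    while (cap < 64 && prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL) >= 0)
        cap++;
    return cap - 1;
}
```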

Cheers

Andrew

On Tue, Oct 11, 2011 at 6:14 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
>
> Unfortunately I'd deleted the early part of this thread before noticing
> the mention on lwn+lkml.org, but fwiw detection of the last supported
> capability has been brought up before (with patchsets floated (By KaiGai
> I'm pretty sure) which exported the list of capabilities through /sys or
> /security), and I agree it's something we need.
>
> -serge
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-06 23:17 Kay Sievers
                   ` (5 preceding siblings ...)
  2011-10-09  8:45 ` Rusty Russell
@ 2011-10-11 23:16 ` Andrew Morton
  2011-10-12  0:53   ` Frederic Weisbecker
  2011-10-12  0:59   ` Frederic Weisbecker
  2011-10-19 21:12 ` Paul Menage
  7 siblings, 2 replies; 52+ messages in thread
From: Andrew Morton @ 2011-10-11 23:16 UTC (permalink / raw)
  To: Kay Sievers
  Cc: linux-kernel, lennart, harald, david, greg, Kirill A. Shutemov,
	Frederic Weisbecker


Useful email, thanks.

On Fri, 07 Oct 2011 01:17:02 +0200
Kay Sievers <kay.sievers@vrfy.org> wrote:

> We___d like to share our current wish list of plumbing layer features we

gargh.  gmail?

>
> ...
>
> * fork throttling mechanism as basic cgroup functionality that is
> available in all hierarchies independent of the controllers used:
> This is important to implement race-free killing of all members of a
> cgroup, so that cgroup member processes cannot fork faster then a cgroup
> supervisor process could kill them. This needs to be recursive, so that
> not only a cgroup but all its subgroups are covered as well.

Frederic Weisbecker's "cgroups: add a task counter subsystem" should
address this.  Does it meet these requirements?  Have you tested it?

>
> ...
>
> * Add a timerslack cgroup controller, to allow increasing the timer
> slack of user session cgroups when the machine is idle.

Kirill Shutemov has just posted "cgroups: introduce timer slack
controller".  Again, is that sufficient?  Have you reviewed and tested
it?
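For context, timer slack already exists as a per-thread setting via prctl(2). As we understand it, a controller like the one proposed would manage this value for whole groups of tasks. Below is a minimal sketch of the per-thread knob only, and the helper names are hypothetical. Passing 0 resets the thread to its default slack.

```c
#include <sys/prctl.h>

/* Set the calling thread's timer slack in nanoseconds (0 restores the
 * thread's default). Returns 0 on success, -1 on failure. */
static int set_timer_slack_ns(unsigned long ns)
{
    return prctl(PR_SET_TIMERSLACK, ns, 0UL, 0UL, 0UL) == 0 ? 0 : -1;
}

/* Read back the calling thread's current timer slack in nanoseconds. */
static long get_timer_slack_ns(void)
{
    return (long)prctl(PR_GET_TIMERSLACK, 0UL, 0UL, 0UL, 0UL);
}
```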

>
> ...
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-11 23:16 ` Andrew Morton
@ 2011-10-12  0:53   ` Frederic Weisbecker
  2011-10-12  0:59   ` Frederic Weisbecker
  1 sibling, 0 replies; 52+ messages in thread
From: Frederic Weisbecker @ 2011-10-12  0:53 UTC (permalink / raw)
  To: Andrew Morton, Kay Sievers, TejunHeotj
  Cc: linux-kernel, lennart, harald, david, greg, Kirill A. Shutemov

On Tue, Oct 11, 2011 at 04:16:00PM -0700, Andrew Morton wrote:
> On Fri, 07 Oct 2011 01:17:02 +0200
> Kay Sievers <kay.sievers@vrfy.org> wrote:
> > * fork throttling mechanism as basic cgroup functionality that is
> > available in all hierarchies independent of the controllers used:
> > This is important to implement race-free killing of all members of a
> > cgroup, so that cgroup member processes cannot fork faster then a cgroup
> > supervisor process could kill them. This needs to be recursive, so that
> > not only a cgroup but all its subgroups are covered as well.
> 
> Frederic Weisbecker's "cgroups: add a task counter subsystem" should
> address this.  Does it meet these requirements?  Have you tested it?

It should work for this yeah. We in fact explored and documented that
second usecase of the task counter subsystem for Kay's needs.

Now cgroup subsystems can only be bound to one hierarchy at a time.
So it couldn't be used by Lxc and some other user at the same time,
and that defeats Kay's goals. But there is an old patch from Paul
Menage that allows some specific subsystems (those that don't deal
with global resources) to be mounted on many hierarchies. The task
counter would fit in and hence be usable by Lxc and other users
simultaneously.

There is another solution that is to be considered. One could use
the cgroup freezer to freeze all the tasks in a cgroup and then kill
them all before thawing the whole. If the process of freezing doesn't
have races against fork then it should work as well. I only worry
about the window in copy_process() between the test on signal_pending(),
that cancels the fork if a signal is pending on the parent, and the
time the new task is eventually added to the cgroup with
cgroup_post_fork(). If the freezer misses the child while it is in that
window, then it's not going to be killed with the rest and it may even
launch some fork() soon to annoy you further. I don't know if that's
handled by the freezer. If it doesn't and that can't be fixed then that
won't work for you.
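The freezer alternative can be sketched against the cgroup v1 freezer files (freezer.state and tasks) as follows. The group directory passed in is an assumption, for example a subdirectory of wherever the freezer hierarchy is mounted, and error handling is minimal. The helper names are hypothetical. The sketch does not address the copy_process() window described above, which remains the open question.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Write "FROZEN" or "THAWED" to <dir>/freezer.state. */
static int freezer_write(const char *dir, const char *state)
{
    char path[512];
    snprintf(path, sizeof(path), "%s/freezer.state", dir);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int put_ok = fputs(state, f) >= 0;
    return (fclose(f) == 0 && put_ok) ? 0 : -1;
}

/* Read <dir>/freezer.state, stripping the trailing newline. */
static int freezer_read(const char *dir, char *out, size_t len)
{
    char path[512];
    snprintf(path, sizeof(path), "%s/freezer.state", dir);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(out, (int)len, f);
    fclose(f);
    if (ok == NULL)
        return -1;
    out[strcspn(out, "\n")] = '\0';
    return 0;
}

/* Freeze the group, SIGKILL every PID in <dir>/tasks, then thaw so the
 * pending kills take effect. Returns the number of tasks signalled, or -1. */
static int freeze_kill_thaw(const char *dir)
{
    char state[32], path[512], line[64];
    const struct timespec delay = { 0, 1000000L }; /* 1 ms */
    int killed = 0;

    if (freezer_write(dir, "FROZEN") < 0)
        return -1;
    /* The freezer can report FREEZING for a while before reaching FROZEN. */
    for (int tries = 0;; tries++) {
        if (tries > 1000 || freezer_read(dir, state, sizeof(state)) < 0) {
            freezer_write(dir, "THAWED");
            return -1;
        }
        if (strcmp(state, "FROZEN") == 0)
            break;
        nanosleep(&delay, NULL);
    }

    snprintf(path, sizeof(path), "%s/tasks", dir);
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        freezer_write(dir, "THAWED");
        return -1;
    }
    while (fgets(line, sizeof(line), f) != NULL) {
        long pid = strtol(line, NULL, 10);
        if (pid > 0 && kill((pid_t)pid, SIGKILL) == 0)
            killed++;
    }
    fclose(f);

    return freezer_write(dir, "THAWED") < 0 ? -1 : killed;
}
```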

If the freezer is a possible solution then I don't know which one
is best for you. Perhaps freezing the tasks in the cgroup can make
it faster, or slower, than rejecting any fork and killing directly.
Perhaps it would be helpful to get more details about the practical
case you have.

Anyway, if you think the task counter subsystem approach suits you
better, I can rework Paul's patches that allow multi-bindable
subsystems so that it becomes usable by several users simultaneously.

Thanks.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-11 23:16 ` Andrew Morton
  2011-10-12  0:53   ` Frederic Weisbecker
@ 2011-10-12  0:59   ` Frederic Weisbecker
       [not found]     ` <20111012174014.GE6281@google.com>
  1 sibling, 1 reply; 52+ messages in thread
From: Frederic Weisbecker @ 2011-10-12  0:59 UTC (permalink / raw)
  To: Andrew Morton, Kay Sievers, Tejun Heo
  Cc: linux-kernel, lennart, harald, david, greg, Kirill A. Shutemov

(Resending because I screwed Tejun's email address...)

On Tue, Oct 11, 2011 at 04:16:00PM -0700, Andrew Morton wrote:
> On Fri, 07 Oct 2011 01:17:02 +0200
> Kay Sievers <kay.sievers@vrfy.org> wrote:
> > * fork throttling mechanism as basic cgroup functionality that is
> > available in all hierarchies independent of the controllers used:
> > This is important to implement race-free killing of all members of a
> cgroup, so that cgroup member processes cannot fork faster than a cgroup
> > supervisor process could kill them. This needs to be recursive, so that
> > not only a cgroup but all its subgroups are covered as well.
>
> Frederic Weisbecker's "cgroups: add a task counter subsystem" should
> address this.  Does it meet these requirements?  Have you tested it?

Yeah, it should work for this. In fact, we explored and documented that
second use case of the task counter subsystem for Kay's needs.

Currently, a cgroup subsystem can only be bound to one hierarchy at a
time, so it couldn't be used by Lxc and some other user at the same
time, which defeats Kay's goals. But there is an old patch from Paul
Menage that allows some specific subsystems (those that don't deal
with global resources) to be mounted on many hierarchies. The task
counter would fit in and hence be usable by Lxc and other users
simultaneously.

There is another solution that is to be considered. One could use
the cgroup freezer to freeze all the tasks in a cgroup and then kill
them all before thawing the whole. If the process of freezing doesn't
have races against fork then it should work as well. I only worry
about the window in copy_process() between the test on signal_pending(),
that cancels the fork if a signal is pending on the parent, and the
time the new task is eventually added to the cgroup with
cgroup_post_fork(). If the freezer misses the child while it is in that
window, then it's not going to be killed with the rest and it may even
launch some fork() soon to annoy you further. I don't know if that's
handled by the freezer. If it isn't, and that can't be fixed, then this
won't work for you.
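A minimal sketch of the freeze, kill, thaw sequence described above, assuming the cgroup-v1 freezer interface: writing FROZEN or THAWED to the cgroup's freezer.state file and reading member PIDs from its tasks file. The function names are hypothetical, and the paths are passed in because mount points vary. The sketch does nothing about the copy_process() window: a child the freezer misses would not be listed and would survive. The thaw comes last because, as Tejun's reply later in the thread suggests, a frozen task could not yet act on a kill; as we understand it, the queued signals take effect once the tasks run again.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Write a short value to a cgroupfs control file; 0 on success. */
static int write_ctl(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(value, f) >= 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Poll freezer.state until it reads FROZEN (gives up after about 1 s). */
static int wait_frozen(const char *state_path)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    char buf[32];
    for (int i = 0; i < 100; i++) {
        FILE *f = fopen(state_path, "r");
        if (!f)
            return -1;
        char *line = fgets(buf, sizeof buf, f);
        fclose(f);
        if (line && strncmp(buf, "FROZEN", 6) == 0)
            return 0;
        nanosleep(&tick, NULL);
    }
    return -1;
}

/* Freeze the cgroup, send sig to every PID in its tasks file, then thaw.
 * Returns the number of PIDs signalled, or -1 on error. */
int freeze_kill_thaw(const char *state_path, const char *tasks_path, int sig)
{
    if (write_ctl(state_path, "FROZEN") != 0)
        return -1;
    if (wait_frozen(state_path) != 0) {
        write_ctl(state_path, "THAWED");   /* don't leave it half-frozen */
        return -1;
    }

    int n = 0;
    FILE *tasks = fopen(tasks_path, "r");
    if (tasks) {
        long pid;
        while (fscanf(tasks, "%ld", &pid) == 1) {
            (void)kill((pid_t)pid, sig);   /* ESRCH for exited tasks ignored */
            n++;
        }
        fclose(tasks);
    }

    /* Queued signals are acted on once the tasks run again. */
    if (write_ctl(state_path, "THAWED") != 0)
        return -1;
    return n;
}
```

In real use sig would be SIGKILL and the two paths would point at one cgroup's freezer.state and tasks files.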

If the freezer is a possible solution then I don't know which one
is best for you. Perhaps freezing the tasks in the cgroup can make
it faster, or slower, than rejecting any fork and killing directly.
Perhaps it would be helpful to get more details about the practical
case you have.

Anyway, if you think the task counter subsystem approach suits you
better, I can rework Paul's patches that allow multi-bindable
subsystems so that it becomes usable by several users simultaneously.

Thanks.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-11 15:49               ` Andrew G. Morgan
@ 2011-10-12  2:31                 ` Serge E. Hallyn
  2011-10-12 20:51                 ` Lennart Poettering
  1 sibling, 0 replies; 52+ messages in thread
From: Serge E. Hallyn @ 2011-10-12  2:31 UTC (permalink / raw)
  To: Andrew G. Morgan
  Cc: Kay Sievers, Alan Cox, linux-kernel, lennart, harald, david, greg,
	KaiGai Kohei

Quoting Andrew G. Morgan (morgan@kernel.org):
> The benefit of Kai Gai's patch was that it exported the actual names
> of the capabilities rather than have them only stored in libcap.
> 
> It is possible to use CAP_IS_SUPPORTED(cap) (in libcap-2.21) to figure
> out the maximum capability supported by the running kernel.
> 
>   https://sites.google.com/site/fullycapable/release-notes-for-libcap

I keep forgetting about that :)

thanks, Andrew.

-serge

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
       [not found]     ` <20111012174014.GE6281@google.com>
@ 2011-10-12 18:16       ` Cyrill Gorcunov
  2011-10-14 15:38         ` Frederic Weisbecker
  0 siblings, 1 reply; 52+ messages in thread
From: Cyrill Gorcunov @ 2011-10-12 18:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Frederic Weisbecker, Andrew Morton, Kay Sievers, linux-kernel,
	lennart, harald, david, greg, Kirill A. Shutemov, Oleg Nesterov,
	Paul Menage, Rafael J. Wysocki, Pavel Emelyanov

On Wed, Oct 12, 2011 at 10:40:14AM -0700, Tejun Heo wrote:
...
> 
> In general, I think making freezer work nicely with the rest of the
> system is a good idea and have been working towards that direction.
> Allowing a frozen task to be killed is not only handy for use cases
> like above but also makes solving freezer involved deadlocks much less
> likely and easier to solve.  Another that I have in mind is allowing
> ptrace from unfrozen task to a frozen task.  This can be helpful in
> general debugging (currently attaching to multi-threaded, violently
> cloning process is quite cumbersome) and userland checkpointing.

Yeah, being able to ptrace a frozen cgroup would be great for us.
For now we stick with a signal-based start/stop cycle, but the final
target is of course cgroups and the freezer. (btw, while we were poking
at the freezer code I noticed that there is no shortcut to move all
tasks in a cgroup into the root cgroup, so I guess something like
"echo -1 > tasks" might be a good addition, to move all tasks from a
particular cgroup to the root in a single action).

> 
> I was working toward these and had some of the patches in Rafael's
> tree but then korg went down and we lost track of the tree and I had a
> pretty long vacation.  I can't say for sure but am aiming to achieve
> the goals during the next devel cycle.
>

This is a wishlist after all, so the target is set and only time is
needed to implement all this ;)

	Cyrill

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-11 15:49               ` Andrew G. Morgan
  2011-10-12  2:31                 ` Serge E. Hallyn
@ 2011-10-12 20:51                 ` Lennart Poettering
  1 sibling, 0 replies; 52+ messages in thread
From: Lennart Poettering @ 2011-10-12 20:51 UTC (permalink / raw)
  To: Andrew G. Morgan
  Cc: Serge E. Hallyn, Kay Sievers, Alan Cox, linux-kernel, harald,
	david, greg, KaiGai Kohei

On Tue, 11.10.11 08:49, Andrew G. Morgan (morgan@kernel.org) wrote:

> 
> The benefit of Kai Gai's patch was that it exported the actual names
> of the capabilities rather than have them only stored in libcap.
> 
> It is possible to use CAP_IS_SUPPORTED(cap) (in libcap-2.21) to figure
> out the maximum capability supported by the running kernel.
> 
>   https://sites.google.com/site/fullycapable/release-notes-for-libcap

Oh, hmm, interesting. I have now changed my code to make use of this,
but I can't say it's pretty, because I basically have to search linearly
for the highest capability supported if that's what I want to know.

So, I guess this solves the problem for now, but I'd still like to see a
proper API for this.
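The linear search Lennart describes can be sketched as follows. As we understand it, libcap's CAP_IS_SUPPORTED() boils down to a probe of the capability bounding set, so this sketch calls prctl(PR_CAPBSET_READ) directly (available since Linux 2.6.25) and needs no libcap. The function name and the upper bound of 64 are our own assumptions.

```c
#include <sys/prctl.h>

/* Return the highest capability number the running kernel supports by
 * probing the bounding set one number at a time: PR_CAPBSET_READ fails
 * (EINVAL) for capability numbers the kernel does not know. The bound of
 * 64 is only a safety limit for this sketch. Returns -1 if none found. */
int highest_supported_cap(void)
{
    int cap = 0;
    while (cap < 64 &&
           prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL) >= 0)
        cap++;
    return cap - 1;
}
```

Since capabilities are numbered contiguously from 0, a binary search would also work, but with so few candidates the linear loop costs very little; the complaint in the thread is about the lack of a direct API, not the cost.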

Anyway, thanks for the pointer,

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-10 13:09             ` Theodore Tso
@ 2011-10-13  0:28               ` Dave Chinner
  2011-10-14 15:47                 ` Ted Ts'o
  0 siblings, 1 reply; 52+ messages in thread
From: Dave Chinner @ 2011-10-13  0:28 UTC (permalink / raw)
  To: Theodore Tso
  Cc: dave, Hugo Mills, Kay Sievers, Alan Cox, linux-kernel, lennart,
	harald, david, greg, Chris Mason, Btrfs mailing list

On Mon, Oct 10, 2011 at 09:09:37AM -0400, Theodore Tso wrote:
> 
> On Oct 10, 2011, at 7:18 AM, David Sterba wrote:
> 
> > "Resetting the UUID on btrfs isn't a quick-and-easy thing - you
> > have to walk the entire tree and change every object. We've got
> > a bad-hack in meego that uses btrfs-debug-tree and changes the
> > UUID while it runs the entire tree, but it's ugly as hell."
> 
> Changing the UUID is going to be harder for ext4 as well, once we
> integrate metadata checksums. 

And for XFS, we're modifying the on-disk format to encode the UUID
into every single piece of metadata in the filesystem. Hence
changing it entails a similar problem to btrfs - an entire
filesystem metadata RMW cycle.

> So while it makes sense to have
> on-line ways of updating labels for mounted file systems it
> probably makes muchness sense to support it for UUIDs.
                     ^^^^ less
Agreed.

> I suspect what it means in practice is that it will be useful for
> file systems to provide fs image copying tools that also generate
> a new UUID while you're at it, for use by IT administrators and
> embedded systems manufacturers.

Yup. xfs_admin already provides an interface for offline
modification of the UUID for XFS filesystems. I.e. clone the
filesystem using xfs_copy, then run xfs_admin -U generate <clone> to
generate a new uuid in the cloned copy before you mount the
clone....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-12 18:16       ` Cyrill Gorcunov
@ 2011-10-14 15:38         ` Frederic Weisbecker
  2011-10-14 16:01           ` Cyrill Gorcunov
  2011-10-19 21:19           ` Paul Menage
  0 siblings, 2 replies; 52+ messages in thread
From: Frederic Weisbecker @ 2011-10-14 15:38 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Tejun Heo, Andrew Morton, Kay Sievers, linux-kernel, lennart,
	harald, david, greg, Kirill A. Shutemov, Oleg Nesterov,
	Paul Menage, Rafael J. Wysocki, Pavel Emelyanov

On Wed, Oct 12, 2011 at 10:16:41PM +0400, Cyrill Gorcunov wrote:
> On Wed, Oct 12, 2011 at 10:40:14AM -0700, Tejun Heo wrote:
> ...
> > 
> > In general, I think making freezer work nicely with the rest of the
> > system is a good idea and have been working towards that direction.
> > Allowing a frozen task to be killed is not only handy for use cases
> > like above but also makes solving freezer involved deadlocks much less
> > likely and easier to solve.  Another that I have in mind is allowing
> > ptrace from unfrozen task to a frozen task.  This can be helpful in
> > general debugging (currently attaching to multi-threaded, violently
> > cloning process is quite cumbersome) and userland checkpointing.
> 
> Yeah, being able to ptrace a frozen cgroup would be great for us.
> For now we stick with a signal-based start/stop cycle, but the final
> target is of course cgroups and the freezer. (btw, while we were poking
> at the freezer code I noticed that there is no shortcut to move all
> tasks in a cgroup into the root cgroup, so I guess something like
> "echo -1 > tasks" might be a good addition, to move all tasks from a
> particular cgroup to the root in a single action).

Well, wouldn't it be better to pull that complexity to userspace?
After all, moving tasks from a cgroup to another is not a performance
critical operation so that probably doesn't need to be all handled by
the kernel.

If one worries about concurrent clone/fork while moving tasks, then
freezing the cgroup and moving its tasks away from userspace could
be enough?
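The userspace route amounts to a short loop. A minimal sketch with a hypothetical function name: the caller opens the source cgroup's tasks file for reading and the destination tasks file (for example the root of the same hierarchy) for writing. cgroupfs expects one PID per write(). Tasks forked while the loop runs can be missed, so a careful caller would rerun it until the source file comes back empty.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Move every PID listed in src (an open cgroup "tasks" file) into the
 * cgroup whose "tasks" file is open for writing as dst_fd. Tasks that
 * exit in the meantime (ESRCH) are skipped. Returns PIDs moved, or -1. */
int move_all_tasks(FILE *src, int dst_fd)
{
    char line[32];
    int moved = 0;
    while (fgets(line, sizeof line, src)) {
        size_t len = strlen(line);
        if (line[0] == '\n')
            continue;                      /* skip blank lines */
        ssize_t w = write(dst_fd, line, len);
        if (w < 0) {
            if (errno == ESRCH)
                continue;                  /* task already gone */
            return -1;
        }
        if ((size_t)w != len)
            return -1;
        moved++;
    }
    return moved;
}
```

For example, src could be fopen("<hierarchy>/<group>/tasks", "r") and dst_fd open("<hierarchy>/tasks", O_WRONLY), where the placeholders stand for whatever mount point and group are in use.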

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-13  0:28               ` Dave Chinner
@ 2011-10-14 15:47                 ` Ted Ts'o
  0 siblings, 0 replies; 52+ messages in thread
From: Ted Ts'o @ 2011-10-14 15:47 UTC (permalink / raw)
  To: Dave Chinner
  Cc: dave, Hugo Mills, Kay Sievers, Alan Cox, linux-kernel, lennart,
	harald, david, greg, Chris Mason, Btrfs mailing list

On Thu, Oct 13, 2011 at 11:28:39AM +1100, Dave Chinner wrote:
> Yup. xfs_admin already provides an interface for offline
> modification of the UUID for XFS filesystems. I.e. clone the
> filesystem using xfs_copy, then run xfs_admin -U generate <clone> to
> generate a new uuid in the cloned copy before you mount the
> clone....

This is probably another thing which perhaps Ric Wheeler's proposed
"generic LVM / file system management front end" should abstract away,
since every single file system has a different way of setting the UUID
in an off-line way.  It's a relatively specialized feature, so I
wouldn't call it high priority to implement first.

	      	      	       	  - Ted

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-14 15:38         ` Frederic Weisbecker
@ 2011-10-14 16:01           ` Cyrill Gorcunov
  2011-10-14 16:08             ` Cyrill Gorcunov
  2011-10-19 21:19           ` Paul Menage
  1 sibling, 1 reply; 52+ messages in thread
From: Cyrill Gorcunov @ 2011-10-14 16:01 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Tejun Heo, Andrew Morton, Kay Sievers, linux-kernel, lennart,
	harald, david, greg, Kirill A. Shutemov, Oleg Nesterov,
	Paul Menage, Rafael J. Wysocki, Pavel Emelyanov

On Fri, Oct 14, 2011 at 05:38:47PM +0200, Frederic Weisbecker wrote:
...
> 
> Well, wouldn't it be better to pull that complexity to userspace?
> After all, moving tasks from a cgroup to another is not a performance
> critical operation so that probably doesn't need to be all handled by
> the kernel.
> 
> If one worries about concurrent clone/fork while moving tasks, then
> freezing the cgroup and moving its tasks away from userspace could
> be enough?

Well, it's not much of a problem to do it task-by-task; still, I think
it would just be a convenient shortcut :)

	Cyrill

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-14 16:01           ` Cyrill Gorcunov
@ 2011-10-14 16:08             ` Cyrill Gorcunov
  2011-10-14 16:19               ` Frederic Weisbecker
  0 siblings, 1 reply; 52+ messages in thread
From: Cyrill Gorcunov @ 2011-10-14 16:08 UTC (permalink / raw)
  To: Frederic Weisbecker, Tejun Heo, Andrew Morton, Kay Sievers,
	linux-kernel, lennart, harald, david, greg, Kirill A. Shutemov,
	Oleg Nesterov, Paul Menage, Rafael J. Wysocki, Pavel Emelyanov

On Fri, Oct 14, 2011 at 08:01:10PM +0400, Cyrill Gorcunov wrote:
...
> > Well, wouldn't it be better to pull that complexity to userspace?
> > After all, moving tasks from a cgroup to another is not a performance
> > critical operation so that probably doesn't need to be all handled by
> > the kernel.
> > 
> > If one worries about concurrent clone/fork while moving tasks, then
> > freezing the cgroup and moving its tasks away from userspace could
> > be enough?
> 
> Well, it's not much of a problem to do it task-by-task; still, I think
> it would just be a convenient shortcut :)
> 

Frederic, don't get me wrong, but when I tried cgroups and the freezer for
the first time (by hand rather than with a script), it made me want to
scream that once I'd moved a number of tasks into some freezer cgroup, I
then had to move them all back again. Of course one could write a script
or whatever, but I thought we had some echo -1 shortcut. Anyway,
I can live with it ;)

	Cyrill

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-14 16:08             ` Cyrill Gorcunov
@ 2011-10-14 16:19               ` Frederic Weisbecker
  0 siblings, 0 replies; 52+ messages in thread
From: Frederic Weisbecker @ 2011-10-14 16:19 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Tejun Heo, Andrew Morton, Kay Sievers, linux-kernel, lennart,
	harald, david, greg, Kirill A. Shutemov, Oleg Nesterov,
	Paul Menage, Rafael J. Wysocki, Pavel Emelyanov

On Fri, Oct 14, 2011 at 08:08:09PM +0400, Cyrill Gorcunov wrote:
> On Fri, Oct 14, 2011 at 08:01:10PM +0400, Cyrill Gorcunov wrote:
> ...
> > > Well, wouldn't it be better to pull that complexity to userspace?
> > > After all, moving tasks from a cgroup to another is not a performance
> > > critical operation so that probably doesn't need to be all handled by
> > > the kernel.
> > > 
> > > If one worries about concurrent clone/fork while moving tasks, then
> > > freezing the cgroup and moving its tasks away from userspace could
> > > be enough?
> > 
> > Well, it's not much of a problem to do it task-by-task; still, I think
> > it would just be a convenient shortcut :)
> > 
> 
> Frederic, don't get me wrong, but when I tried cgroups and the freezer for
> the first time (by hand rather than with a script), it made me want to
> scream that once I'd moved a number of tasks into some freezer cgroup, I
> then had to move them all back again. Of course one could write a script
> or whatever, but I thought we had some echo -1 shortcut. Anyway,
> I can live with it ;)

Using a script would be a much better shortcut.
The script may be a few dozen lines. Push that into the kernel and it may
be much more.

Have a close overall look at kernel/cgroup.c and ask yourself if you would
like to add 100 more lines to it, just to avoid doing it in userspace ;)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-06 23:17 Kay Sievers
                   ` (6 preceding siblings ...)
  2011-10-11 23:16 ` Andrew Morton
@ 2011-10-19 21:12 ` Paul Menage
  2011-10-19 23:03   ` Lennart Poettering
  7 siblings, 1 reply; 52+ messages in thread
From: Paul Menage @ 2011-10-19 21:12 UTC (permalink / raw)
  To: Kay Sievers; +Cc: linux-kernel, lennart, harald, david, greg

On Thu, Oct 6, 2011 at 4:17 PM, Kay Sievers <kay.sievers@vrfy.org> wrote:
>
> * fork throttling mechanism as basic cgroup functionality that is
> available in all hierarchies independent of the controllers used:
> This is important to implement race-free killing of all members of a
> cgroup, so that cgroup member processes cannot fork faster than a cgroup
> supervisor process could kill them. This needs to be recursive, so that
> not only a cgroup but all its subgroups are covered as well.

If that's your end goal, then an alternative to the freezer support
that others have mentioned would be a 'cgroup.signal' file which, when
written to, would send that signal to all members of the cgroup at
once. Perhaps simpler than having to get in the way of the fork path
more and manage a rate-limit.
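From userspace, the proposed interface would reduce race-free killing to a single write. A sketch under stated assumptions: cgroup.signal is the file Paul proposes (it did not exist), the decimal signal-number format is our guess since the proposal does not specify one, and cgroup_signal() is a hypothetical helper.

```c
#include <signal.h>
#include <stdio.h>

/* Hypothetical: ask the kernel to deliver sig to every member of a cgroup
 * by writing the signal number to its proposed cgroup.signal file.
 * Returns 0 on success, -1 on error. */
int cgroup_signal(const char *signal_file, int sig)
{
    FILE *f = fopen(signal_file, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%d", sig) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

A supervisor would call cgroup_signal("<group>/cgroup.signal", SIGKILL) and leave the iteration over members, and any race with fork(), to the kernel.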

>
> * allow user xattrs to be set on files in the cgroupfs (and maybe
> procfs?)

What would the use case be for this?

Paul

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-14 15:38         ` Frederic Weisbecker
  2011-10-14 16:01           ` Cyrill Gorcunov
@ 2011-10-19 21:19           ` Paul Menage
  1 sibling, 0 replies; 52+ messages in thread
From: Paul Menage @ 2011-10-19 21:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Cyrill Gorcunov, Tejun Heo, Andrew Morton, Kay Sievers,
	linux-kernel, lennart, harald, david, greg, Kirill A. Shutemov,
	Oleg Nesterov, Rafael J. Wysocki, Pavel Emelyanov

On Fri, Oct 14, 2011 at 8:38 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> Well, wouldn't it be better to pull that complexity to userspace?
> After all, moving tasks from a cgroup to another is not a performance
> critical operation so that probably doesn't need to be all handled by
> the kernel.

I'd always assumed that too, but apparently on very many (possibly the
majority of?) Linux systems, it actually is performance-critical.

Specifically, Android bounces tasks in and out of a "foreground
low-latency" cpu cgroup at a fairly high rate, and has found the
performance hit from the locking to be a problem on multi-core phones.
Hence Colin Cross' patches to avoid calls to synchronize_rcu() in the
attach path.

Paul

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-19 21:12 ` Paul Menage
@ 2011-10-19 23:03   ` Lennart Poettering
  2011-10-19 23:09     ` Paul Menage
  0 siblings, 1 reply; 52+ messages in thread
From: Lennart Poettering @ 2011-10-19 23:03 UTC (permalink / raw)
  To: Paul Menage; +Cc: Kay Sievers, linux-kernel, harald, david, greg

On Wed, 19.10.11 14:12, Paul Menage (paul@paulmenage.org) wrote:

> On Thu, Oct 6, 2011 at 4:17 PM, Kay Sievers <kay.sievers@vrfy.org> wrote:
> >
> > * fork throttling mechanism as basic cgroup functionality that is
> > available in all hierarchies independent of the controllers used:
> > This is important to implement race-free killing of all members of a
> > cgroup, so that cgroup member processes cannot fork faster than a cgroup
> > supervisor process could kill them. This needs to be recursive, so that
> > not only a cgroup but all its subgroups are covered as well.
> 
> If that's your end goal, then an alternative to the freezer support
> that others have mentioned would be a 'cgroup.signal' file which, when
> written to, would send that signal to all members of the cgroup at
> once. Perhaps simpler than having to get in the way of the fork path
> more and manage a rate-limit.

For our systemd usecase a cgroup.signal file would not be useful. This
is because we actually kill all members of the service's cgroup plus the
main process of the service, which is usually also in the service's
cgroup but sometimes isn't (for example: when the user logs in, the
whole /sbin/login process ends up in the user's session cgroup, and is
removed from the original service cgroup). Since we want to avoid
killing the main service process twice in the case where it isn't in the
service cgroup we'd hence prefer to have some fork throttling logic in
place, so that we can kill members flexibly in accordance with these
rules.
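Without kernel help, the kill-with-exceptions logic above can only be approximated by looping over the cgroup's tasks file. A minimal sketch, in which kill_cgroup_tasks() and the caller-supplied done[] array are our own assumptions: PIDs already in done[] (for instance a main process killed separately) are skipped, and the file is reopened until a pass turns up no new PIDs. A process that forks fast enough can keep this loop busy indefinitely, which is exactly the race the fork-throttling request is meant to close.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

static bool already_done(const pid_t *done, size_t n, pid_t pid)
{
    for (size_t i = 0; i < n; i++)
        if (done[i] == pid)
            return true;
    return false;
}

/* Send sig to every PID in a cgroup "tasks" file that is not yet in done[].
 * The file is reopened and re-read until a pass finds nothing new, to pick
 * up children forked meanwhile. Returns the number of PIDs signalled, or -1
 * if the file cannot be opened or done[] (capacity cap) is full. */
int kill_cgroup_tasks(const char *tasks_path, int sig,
                      pid_t *done, size_t *n_done, size_t cap)
{
    int total = 0;
    bool found_new;
    do {
        FILE *f = fopen(tasks_path, "r");
        if (!f)
            return -1;
        found_new = false;
        long pid;
        while (fscanf(f, "%ld", &pid) == 1) {
            if (already_done(done, *n_done, (pid_t)pid))
                continue;
            if (*n_done == cap) {
                fclose(f);
                return -1;
            }
            done[(*n_done)++] = (pid_t)pid;
            (void)kill((pid_t)pid, sig);   /* ESRCH ignored: task gone */
            found_new = true;
            total++;
        }
        fclose(f);
    } while (found_new);
    return total;
}
```

Typical use would seed done[0] with the main PID that was already killed, set *n_done to 1, and pass SIGKILL.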

> > * allow user xattrs to be set on files in the cgroupfs (and maybe
> > procfs?)
> 
> What would the use case be for this?

Attaching meta information to services, in an easily discoverable
way. For example, in systemd we create one cgroup for each service, and
could then store data like the main pid of the specific service as an
xattr on the cgroup itself. That way we'd have almost all service state
in the cgroupfs, which would make it possible to terminate systemd and
later restart it without losing any state information. But there's more:
for example, some very peculiar services cannot be terminated on
shutdown (e.g. fakeraid DM stuff) and it would be really nice if the
services in question could just mark that on their cgroup, by setting an
xattr. On the more desktopy side of things there are other
possibilities: for example, there are plans to define what an application
is along the lines of a cgroup (i.e. an app being a collection of
processes). With xattrs one could then attach an icon or a human-readable
program name to the cgroup.

The key idea is that this would allow attaching runtime meta information
to cgroups and everything they model (services, apps, vms), that doesn't
need any complex userspace infrastructure, has good access control
(i.e. because the file system enforces that anyway, and there's the
"trusted." xattr namespace), notifications (inotify), and can easily be
shared among applications. 
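If cgroupfs accepted xattrs, storing the main PID would take nothing more than the ordinary setxattr(2)/getxattr(2) calls on the cgroup directory. A minimal sketch: the attribute name user.main_pid and both helpers are hypothetical, and whether cgroupfs accepts such attributes at all is exactly what the wish list asks for. On a filesystem without user xattr support the calls fail with ENOTSUP.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical attribute name, chosen for this sketch only. */
#define MAIN_PID_XATTR "user.main_pid"

/* Store pid as a decimal string on path. Returns 0, or -1 with errno set. */
int set_main_pid_xattr(const char *path, pid_t pid)
{
    char val[24];
    int len = snprintf(val, sizeof val, "%ld", (long)pid);
    return setxattr(path, MAIN_PID_XATTR, val, (size_t)len, 0);
}

/* Read the PID back; returns -1 (errno set) if the attribute is missing. */
pid_t get_main_pid_xattr(const char *path)
{
    char val[24];
    ssize_t len = getxattr(path, MAIN_PID_XATTR, val, sizeof val - 1);
    if (len < 0)
        return -1;
    val[len] = '\0';
    return (pid_t)strtol(val, NULL, 10);
}
```

A restarted supervisor could then recover each service's main PID by calling get_main_pid_xattr() on the service's cgroup directory.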

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-19 23:03   ` Lennart Poettering
@ 2011-10-19 23:09     ` Paul Menage
  2011-10-19 23:31       ` Lennart Poettering
  0 siblings, 1 reply; 52+ messages in thread
From: Paul Menage @ 2011-10-19 23:09 UTC (permalink / raw)
  To: Lennart Poettering; +Cc: Kay Sievers, linux-kernel, harald, david, greg

On Wed, Oct 19, 2011 at 4:03 PM, Lennart Poettering
<mzxreary@0pointer.de> wrote:
>
> For our systemd usecase a cgroup.signal file would not be useful. This
> is because we actually kill all members of the service's cgroup plus the
> main process of the service, which is usually also in the service's
> cgroup but sometimes isn't (for example: when the user logs in, the
> whole /sbin/login process ends up in the user's session cgroup, and is
> removed from the original service cgroup). Since we want to avoid
> killing the main service process twice in the case where it isn't in the
> service cgroup we'd hence prefer to have some fork throttling logic in
> place, so that we can kill members flexibly in accordance with these
> rules.

By fork-throttling, do you just mean "0 or unlimited", or would you
actually want some kind of rate-limited throttling? If the former,
then I agree with Frederic that his task counter should solve that
problem.
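To make "rate-limited throttling" concrete, here is a minimal token-bucket sketch: each fork costs a token, and tokens trickle back at a fixed rate up to a burst limit. Nothing like this exists in the kernel; the struct, fork_bucket_allow() and the explicit nanosecond clock argument are hypothetical, the last so the logic can be exercised deterministically. A bucket created empty with a rate of 0 gives the "0" end of the spectrum.

```c
#include <stdbool.h>
#include <stdint.h>

struct fork_bucket {
    uint64_t tokens;        /* forks allowed right now */
    uint64_t burst;         /* maximum number of stored tokens */
    uint64_t rate_per_sec;  /* tokens added per second */
    uint64_t last_ns;       /* time of the last refill */
};

/* Returns true if a fork at time now_ns may proceed. A real implementation
 * would fail fork() with an error instead of returning false. */
bool fork_bucket_allow(struct fork_bucket *b, uint64_t now_ns)
{
    uint64_t elapsed_ms = (now_ns - b->last_ns) / 1000000ULL;
    uint64_t refill = elapsed_ms * b->rate_per_sec / 1000ULL;

    if (refill > 0) {
        uint64_t t = b->tokens + refill;
        b->tokens = t > b->burst ? b->burst : t;
        b->last_ns = now_ns;   /* sketch: fractional progress is discarded */
    }
    if (b->tokens == 0)
        return false;
    b->tokens--;
    return true;
}
```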

Paul

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-07  0:13   ` Lennart Poettering
  2011-10-07  1:57     ` Andi Kleen
@ 2011-10-19 23:16     ` H. Peter Anvin
  1 sibling, 0 replies; 52+ messages in thread
From: H. Peter Anvin @ 2011-10-19 23:16 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Andi Kleen, Kay Sievers, linux-kernel, harald, david, greg

On 10/06/2011 05:13 PM, Lennart Poettering wrote:
> 
> Well, I am aware of PR_SET_NAME, but that modifies comm, not argv[]. And
> while "top" indeed shows the former, "ps" shows the latter. We are looking
> for a nice way to modify argv[] without having to reuse space
> from environ[] like most current Linux implementations of
> setproctitle() do.
> 
> A while back there were patches for PR_SET_PROCTITLE_AREA floating
> around. We'd like to see something like that merged one day.
> 

A saner thing would be if the initial argv[] area couldn't be modified
at all, an explicit system call were required to change the title
displayed by ps or top, and ps or top could be forced to show the argv
as initially passed to the process.
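For reference, the two mechanisms being contrasted can be sketched as follows. prctl(PR_SET_NAME) changes the task's comm, which holds at most 15 bytes plus the terminating NUL (longer names are truncated). The traditional setproctitle() approach instead rewrites the argv[] strings in place; this sketch reuses only the original argv area, without borrowing environ[] space, so long titles are truncated. It assumes the argv strings are contiguous in memory, as they are when the kernel builds a new process image; both helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>

/* Set the task's comm name; /proc/PID/cmdline is not affected. */
int set_comm(const char *name)
{
    return prctl(PR_SET_NAME, (unsigned long)name, 0UL, 0UL, 0UL);
}

/* Overwrite the contiguous argv[] area with title, truncating it to fit
 * and zeroing the remaining bytes. */
void set_argv_title(int argc, char **argv, const char *title)
{
    char *start = argv[0];
    char *end = argv[argc - 1] + strlen(argv[argc - 1]);  /* final NUL */
    size_t room = (size_t)(end - start);                  /* usable bytes */
    size_t len = strlen(title);

    if (len > room)
        len = room;
    memset(start, '\0', room + 1);
    memcpy(start, title, len);
}
```

The truncation in set_argv_title() is the limitation that pushes existing setproctitle() implementations into environ[] space, and that PR_SET_PROCTITLE_AREA-style proposals or an explicit system call would avoid.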

	-hpa


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-19 23:09     ` Paul Menage
@ 2011-10-19 23:31       ` Lennart Poettering
  2011-10-22 10:21         ` Frederic Weisbecker
  0 siblings, 1 reply; 52+ messages in thread
From: Lennart Poettering @ 2011-10-19 23:31 UTC (permalink / raw)
  To: Paul Menage; +Cc: Kay Sievers, linux-kernel, harald, david, greg

On Wed, 19.10.11 16:09, Paul Menage (paul@paulmenage.org) wrote:

> On Wed, Oct 19, 2011 at 4:03 PM, Lennart Poettering
> <mzxreary@0pointer.de> wrote:
> >
> > For our systemd usecase a cgroup.signal file would not be useful. This
> > is because we actually kill all members of the service's cgroup plus the
> > main process of the service, which is usually also in the service's
> > cgroup but sometimes isn't (for example: when the user logs in, the
> > whole /sbin/login process ends up in the user's session cgroup, and is
> > removed from the original service cgroup). Since we want to avoid
> > killing the main service process twice in the case where it isn't in the
> > service cgroup we'd hence prefer to have some fork throttling logic in
> > place, so that we can kill members flexibly in accordance with these
> > rules.
> 
> By fork-throttling, do you just mean "0 or unlimited", or would you
> actually want some kind of rate-limited throttling? If the former,
> then I agree with Frederic that his task counter should solve that
> problem.

Given that shutting down some services might involve forking off a few
things (think: a shell script handling shutdown which forks off a couple
of shell utilities) we'd want something that is between "from now on no
forking at all" and "unlimited forking". This could be done in many
different ways: we'd be happy if we could do time-based rate limiting,
but we'd also be fine with defining a certain budget of additional forks
a cgroup can do (i.e. "from now on you can do 50 more forks, then you'll
get EPERM").
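The budget variant ("N more forks, then EPERM"), combined with the original requirement that the limit be recursive, could look roughly like this. It is a hypothetical single-threaded model, not kernel code: struct cg, charge_fork() and the convention that -1 means unlimited are our own, and a real version would need locking around the two passes.

```c
#include <errno.h>
#include <stddef.h>

struct cg {
    struct cg *parent;
    long budget;   /* remaining forks, or -1 for unlimited */
};

/* Charge one fork to cg and all of its ancestors, so a budget set on a
 * parent also covers its subgroups. Returns 0 or -EPERM. */
int charge_fork(struct cg *cg)
{
    /* First pass: refuse if any level is exhausted, so nothing is
     * charged partially. */
    for (struct cg *c = cg; c; c = c->parent)
        if (c->budget == 0)
            return -EPERM;
    /* Second pass: consume one unit at every limited level. */
    for (struct cg *c = cg; c; c = c->parent)
        if (c->budget > 0)
            c->budget--;
    return 0;
}
```

Setting a budget of 50 on a service's cgroup when shutdown begins would then let its stop script fork a few utilities while still guaranteeing that the kill loop terminates.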

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-19 23:31       ` Lennart Poettering
@ 2011-10-22 10:21         ` Frederic Weisbecker
  2011-10-22 15:28           ` Lennart Poettering
  0 siblings, 1 reply; 52+ messages in thread
From: Frederic Weisbecker @ 2011-10-22 10:21 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Paul Menage, Kay Sievers, linux-kernel, harald, david, greg

On Thu, Oct 20, 2011 at 01:31:11AM +0200, Lennart Poettering wrote:
> On Wed, 19.10.11 16:09, Paul Menage (paul@paulmenage.org) wrote:
> 
> > On Wed, Oct 19, 2011 at 4:03 PM, Lennart Poettering
> > <mzxreary@0pointer.de> wrote:
> > >
> > > For our systemd usecase a cgroup.signal file would not be useful. This
> > > is because we actually kill all members of the service's cgroup plus the
> > > main process of the service, which is usually also in the service's
> > > cgroup but sometimes isn't (for example: when the user logs in, the
> > > whole /sbin/login process ends up in the user's session cgroup, and is
> > > removed from the original service cgroup). Since we want to avoid
> > > killing the main service process twice in the case where it isn't in the
> > > service cgroup we'd hence prefer to have some fork throttling logic in
> > > place, so that we can kill members flexibly in accordance with these
> > > rules.
> > 
> > By fork-throttling, do you just mean "0 or unlimited", or would you
> > actually want some kind of rate-limited throttling? If the former,
> > then I agree with Frederic that his task counter should solve that
> > problem.
> 
> Given that shutting down some services might involve forking off a few
> things (think: a shell script handling shutdown which forks off a couple
> of shell utilities) we'd want something that is between "from now on no
> forking at all" and "unlimited forking". This could be done in many
> different ways: we'd be happy if we could do time-based rate limiting,
> but we'd also be fine with defining a certain budget of additional forks
> a cgroup can do (i.e. "from now on you can do 50 more forks, then you'll
> get EPERM").

Thinking more about it, you shouldn't use the task counter subsystem for
Systemd. This is a subsystem that may bring some significant overhead
(i.e. it walks through the entire hierarchy on every fork and exit). That
doesn't sound like something suitable for an init process.

If you really need to stop any forks in a cgroup, then a cgroup core feature
handling that very single purpose would be better and more efficient.

That said I'm not really sure why you're using cgroups in Systemd.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-22 10:21         ` Frederic Weisbecker
@ 2011-10-22 15:28           ` Lennart Poettering
  2011-10-25  5:40             ` Li Zefan
  0 siblings, 1 reply; 52+ messages in thread
From: Lennart Poettering @ 2011-10-22 15:28 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul Menage, Kay Sievers, linux-kernel, harald, david, greg

On Sat, 22.10.11 12:21, Frederic Weisbecker (fweisbec@gmail.com) wrote:

> If you really need to stop any forks in a cgroup, then a cgroup core feature
> handling that very single purpose would be better and more efficient.

We'd be happy with that and this is what we originally suggested actually.

> That said I'm not really sure why you're using cgroups in Systemd.

We want to reliably label processes in a hierarchical way, so that this
is inherited by all child processes, cannot be overridden by unprivileged
code (subject to some classic Unix access control handling) and get
notifications when such a label stops referring to any process. We use
that for sticking the service name on a process, so that all CGI
processes of Apache are automatically assigned the same service as
apache itself. And we want a notification when all of apache's processes
die. And we also want to be able to kill Apache completely by killing
all its processes.

cgroups provides us with all of that, though the last two items only in
a suboptimal way: notification of cgroups running empty is ugly, since
it is done by spawning a usermode helper (we'd prefer a netlink msg or
so), and the process killing is a bit racy.
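The usermode-helper notification in question is the cgroup-v1 release_agent mechanism: notify_on_release is set to 1 on the cgroup being watched, and the agent's path is written to release_agent at the root of the hierarchy. When the last task leaves the cgroup and it has no child cgroups left, the kernel runs the agent with the cgroup's path, relative to the hierarchy's mount point, as its argument. A minimal setup sketch; the helper name is hypothetical and the paths are passed in because mount points vary.

```c
#include <stdio.h>
#include <string.h>

/* Write a value to a cgroupfs control file; 0 on success. */
static int write_ctl(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    size_t len = strlen(value);
    int ok = fwrite(value, 1, len, f) == len;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* notify_path:  <cgroup>/notify_on_release
 * agent_file:   <hierarchy root>/release_agent
 * agent_binary: absolute path of the program the kernel should run */
int enable_release_agent(const char *notify_path, const char *agent_file,
                         const char *agent_binary)
{
    if (write_ctl(agent_file, agent_binary) != 0)
        return -1;
    return write_ctl(notify_path, "1");
}
```

Every empty-cgroup event thus costs a process spawn, which is the overhead behind the preference for a netlink message or similar.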

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: A Plumber’s Wish List for Linux
  2011-10-22 15:28           ` Lennart Poettering
@ 2011-10-25  5:40             ` Li Zefan
  2011-10-30 17:18               ` Lennart Poettering
  0 siblings, 1 reply; 52+ messages in thread
From: Li Zefan @ 2011-10-25  5:40 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Frederic Weisbecker, Paul Menage, Kay Sievers, linux-kernel,
	harald, david, greg

Lennart Poettering wrote:
> On Sat, 22.10.11 12:21, Frederic Weisbecker (fweisbec@gmail.com) wrote:
> 
>> If you really need to stop any forks in a cgroup, then a cgroup core feature
>> handling that very single purpose would be better and more efficient.
> 
> We'd be happy with that and this is what we originally suggested actually.
> 
>> That said I'm not really sure why you're using cgroups in Systemd.
> 
> We want to reliably label processes in a hierarchical way, so that this
> is inherited by all child processes, cannot be overridden by unprivileged
> code (subject to some classic Unix access control handling) and get
> notifications when such a label stops referring to any process. We use
> that for sticking the service name on a process, so that all CGI
> processes of Apache are automatically assigned the same service as
> Apache itself. And we want a notification when all of Apache's processes
> die. And we also want to be able to kill Apache completely by killing
> all its processes.
> 
> cgroups provides us with all of that, though the last two items only in
> a suboptimal way: notification of cgroups running empty is ugly, since
> it is done by spawning a usermode helper (we'd prefer a netlink msg or
> so), and the process killing is a bit racy.
> 

How about using eventfd? You can create an eventfd for the specific "tasks"
file, and when the cgroup gets empty (no task in it), you'll get a notification.

It should be easy to implement, since cgroup already supports an eventfd-based
API.
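For reference, the existing eventfd-based cgroup API works by writing "<eventfd> <control-file fd> [args]" to the cgroup's cgroup.event_control file; the memory controller uses it for memory.usage_in_bytes thresholds and memory.oom_control. A minimal C sketch of that registration follows. The helper names are hypothetical, and passing the "tasks" file, as suggested here, assumes the proposed kernel extension rather than anything the existing API accepts.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Sketch (hypothetical helper): build the line written to
 * cgroup.event_control, "<eventfd> <control-file fd>" plus optional args.
 * Returns its length, or -1 if it does not fit into `buf`. */
int format_event_control(char *buf, size_t len, int efd, int cfd,
                         const char *args)
{
    int n = (args && *args)
        ? snprintf(buf, len, "%d %d %s", efd, cfd, args)
        : snprintf(buf, len, "%d %d", efd, cfd);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper: register for events on `control` (for example
 * "memory.oom_control") in cgroup directory `dir`. Using "tasks" here
 * assumes the proposed kernel extension; control files that do not
 * implement events are expected to be rejected. Returns the eventfd to
 * wait on, or -1. */
int register_cgroup_event(const char *dir, const char *control,
                          const char *args)
{
    char path[4096], line[128];
    int efd, cfd = -1, ecfd = -1, n, ret = -1;

    efd = eventfd(0, 0);
    if (efd < 0)
        return -1;

    snprintf(path, sizeof path, "%s/%s", dir, control);
    if ((cfd = open(path, O_RDONLY)) < 0)
        goto out;
    snprintf(path, sizeof path, "%s/cgroup.event_control", dir);
    if ((ecfd = open(path, O_WRONLY)) < 0)
        goto out;

    n = format_event_control(line, sizeof line, efd, cfd, args);
    if (n >= 0 && write(ecfd, line, (size_t)n) == (ssize_t)n)
        ret = efd;                  /* success: caller owns efd */
out:
    if (ecfd >= 0)
        close(ecfd);
    if (cfd >= 0)
        close(cfd);                 /* assumes the kernel keeps its own reference */
    if (ret < 0)
        close(efd);
    return ret;
}

/* Block until the eventfd fires; stores the counter value in *count. */
int wait_cgroup_event(int efd, uint64_t *count)
{
    return read(efd, count, sizeof *count) == (ssize_t)sizeof *count ? 0 : -1;
}
```

With the existing API, the eventfd counter is bumped when the registered condition fires (a memory threshold crossing, for instance), and read() blocks until then.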

* Re: A Plumber’s Wish List for Linux
  2011-10-25  5:40             ` Li Zefan
@ 2011-10-30 17:18               ` Lennart Poettering
  2011-11-01  1:27                 ` Li Zefan
  0 siblings, 1 reply; 52+ messages in thread
From: Lennart Poettering @ 2011-10-30 17:18 UTC
  To: Li Zefan
  Cc: Frederic Weisbecker, Paul Menage, Kay Sievers, linux-kernel,
	harald, david, greg

On Tue, 25.10.11 13:40, Li Zefan (lizf@cn.fujitsu.com) wrote:

> > cgroups provides us with all of that, though the last two items only in
> > a suboptimal way: notification of cgroups running empty is ugly, since
> > it is done by spawning a usermode helper (we'd prefer a netlink msg or
> > so), and the process killing is a bit racy.
> 
> How about using eventfd? You can create an eventfd for the specific "tasks"
> file, and when the cgroup gets empty (no task in it), you'll get a notification.
> 
> It should be easy to implement, since cgroup already supports an eventfd-based
> API.

I am quite convinced that using eventfd() like this is ugly. The
current eventfd() logic is not recursive anyway, hence wouldn't help us
much.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

* Re: A Plumber’s Wish List for Linux
  2011-10-30 17:18               ` Lennart Poettering
@ 2011-11-01  1:27                 ` Li Zefan
  0 siblings, 0 replies; 52+ messages in thread
From: Li Zefan @ 2011-11-01  1:27 UTC
  To: Lennart Poettering
  Cc: Frederic Weisbecker, Paul Menage, Kay Sievers, linux-kernel,
	harald, david, greg

Lennart Poettering wrote:
> On Tue, 25.10.11 13:40, Li Zefan (lizf@cn.fujitsu.com) wrote:
> 
>>> cgroups provides us with all of that, though the last two items only in
>>> a suboptimal way: notification of cgroups running empty is ugly, since
>>> it is done by spawning a usermode helper (we'd prefer a netlink msg or
>>> so), and the process killing is a bit racy.
>>
>> How about using eventfd? You can create an eventfd for the specific "tasks"
>> file, and when the cgroup gets empty (no task in it), you'll get a notification.
>>
>> It should be easy to implement, since cgroup already supports an eventfd-based
>> API.
> 
> I am quite convinced that using eventfd() like this is ugly. The
> current eventfd() logic is not recursive anyway, hence wouldn't help us
> much.
> 

I remember that in an earlier email you said you want to be able to kill all
tasks in a cgroup and its children, and you used the word "recursive". But what
do you mean by "recursive" for empty-cgroup notification? Do you expect the
listener to receive a message if a cgroup or any of its children becomes empty?
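To make the distinction in this question concrete: a cgroup's "tasks" file lists only the tasks attached directly to that cgroup, not those in its child cgroups, so a "recursive" emptiness test has to look at the whole subtree. A polling sketch in C, assuming a directory layout with a "tasks" file in every cgroup directory; the function names are hypothetical, and a real notification mechanism would be event-driven rather than polling.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: 1 if the "tasks" file in `dir` lists no task,
 * 0 if it lists at least one, -1 if it cannot be opened. */
static int tasks_file_empty(const char *dir)
{
    char path[4096];
    long pid;
    snprintf(path, sizeof path, "%s/tasks", dir);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int empty = fscanf(f, "%ld", &pid) != 1;
    fclose(f);
    return empty;
}

/* Hypothetical "recursive" check: 1 if `dir` and every cgroup below it
 * have no tasks, 0 if any task was found, -1 on error. The answer may be
 * stale by the time it is returned, since tasks can be attached or forked
 * concurrently. */
int subtree_is_empty(const char *dir)
{
    int r = tasks_file_empty(dir);
    if (r != 1)
        return r;

    DIR *d = opendir(dir);
    if (!d)
        return -1;
    struct dirent *de;
    int result = 1;
    while (result == 1 && (de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        char child[4096];
        struct stat st;
        snprintf(child, sizeof child, "%s/%s", dir, de->d_name);
        if (stat(child, &st) == 0 && S_ISDIR(st.st_mode))
            result = subtree_is_empty(child);   /* descend into child cgroup */
    }
    closedir(d);
    return result;
}
```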

Thread overview: 52+ messages
     [not found] <CAE2SPAZci=u__d58phePCftVr_e+i+N2YU-JYjGDG_b3TmYTSQ@mail.gmail.com>
2011-10-07 13:40 ` A Plumber’s Wish List for Linux Alan Cox
2011-10-07 14:57   ` Alexander E. Patrakov
2011-10-06 23:17 Kay Sievers
2011-10-06 23:46 ` Andi Kleen
2011-10-07  0:13   ` Lennart Poettering
2011-10-07  1:57     ` Andi Kleen
2011-10-07 15:58       ` Lennart Poettering
2011-10-19 23:16     ` H. Peter Anvin
2011-10-07  7:49 ` Matt Helsley
2011-10-07 16:01   ` Lennart Poettering
2011-10-08  4:24     ` Eric W. Biederman
2011-10-10 16:31       ` Lennart Poettering
2011-10-07 10:12 ` Alan Cox
2011-10-07 10:28   ` Kay Sievers
2011-10-07 10:38     ` Alan Cox
2011-10-07 12:46       ` Kay Sievers
2011-10-07 13:39         ` Theodore Tso
2011-10-07 15:21         ` Hugo Mills
2011-10-10 11:18           ` A Plumber’s " David Sterba
2011-10-10 11:18             ` David Sterba
2011-10-10 13:09             ` Theodore Tso
2011-10-13  0:28               ` Dave Chinner
2011-10-14 15:47                 ` Ted Ts'o
2011-10-11 13:14             ` Serge E. Hallyn
2011-10-11 15:49               ` Andrew G. Morgan
2011-10-12  2:31                 ` Serge E. Hallyn
2011-10-12 20:51                 ` Lennart Poettering
2011-10-08  9:53         ` A Plumber’s " Bastien ROUCARIES
2011-10-09  3:15           ` Alex Elsayed
2011-10-07 16:07       ` Valdis.Kletnieks
2011-10-07 12:35 ` Vivek Goyal
2011-10-07 18:59 ` Greg KH
2011-10-09 12:20   ` Kay Sievers
2011-10-09  8:45 ` Rusty Russell
2011-10-11 23:16 ` Andrew Morton
2011-10-12  0:53   ` Frederic Weisbecker
2011-10-12  0:59   ` Frederic Weisbecker
     [not found]     ` <20111012174014.GE6281@google.com>
2011-10-12 18:16       ` Cyrill Gorcunov
2011-10-14 15:38         ` Frederic Weisbecker
2011-10-14 16:01           ` Cyrill Gorcunov
2011-10-14 16:08             ` Cyrill Gorcunov
2011-10-14 16:19               ` Frederic Weisbecker
2011-10-19 21:19           ` Paul Menage
2011-10-19 21:12 ` Paul Menage
2011-10-19 23:03   ` Lennart Poettering
2011-10-19 23:09     ` Paul Menage
2011-10-19 23:31       ` Lennart Poettering
2011-10-22 10:21         ` Frederic Weisbecker
2011-10-22 15:28           ` Lennart Poettering
2011-10-25  5:40             ` Li Zefan
2011-10-30 17:18               ` Lennart Poettering
2011-11-01  1:27                 ` Li Zefan