Linux userland API discussions

Linux userland API discussions
 help / color / mirror / Atom feed

* Re: [PATCH v3] gpio: lib-sysfs: Add 'wakeup' attribute
From: Sören Brinkmann @ 2014-10-31 16:16 UTC (permalink / raw)
  To: Linus Walleij
  Cc: linux-api, Alexandre Courbot, Jonathan Corbet,
	linux-kernel@vger.kernel.org, linux-gpio@vger.kernel.org,
	linux-doc@vger.kernel.org
In-Reply-To: <CACRpkdaduj3MaEN=hmL54ANJiP0B5MdH9qSqWRukX4H7pFt_Hw@mail.gmail.com>

On Fri, 2014-10-31 at 08:03AM +0100, Linus Walleij wrote:
> On Mon, Oct 27, 2014 at 7:30 PM, Soren Brinkmann
> <soren.brinkmann@xilinx.com> wrote:
> 
> > Add an attribute 'wakeup' to the GPIO sysfs interface which allows
> > marking/unmarking a GPIO as wake IRQ.
> > The file 'wakeup' is created in each exported GPIOs directory, if an IRQ
> > is associated with that GPIO and the irqchip implements set_wake().
> > Writing 'enabled' to that file will enable wake for that GPIO, while
> > writing 'disabled' will disable wake.
> > Reading that file will return either 'disabled' or 'enabled' depening on
> > the currently set flag for the GPIO's IRQ.
> >
> > Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
> > Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
> > ---
> > v3:
> >  - add documentation
> > v2:
> >  - fix error path to unlock mutex before return
> (...)
> 
> Looking better!
> 
> > +       "wakeup" ... reads as either "enabled" or "disabled". Write these
> > +               strings to set/clear the 'wakeup' flag of the IRQ associated
> > +               with this GPIO. If the IRQ has the 'wakeup' flag set, it can
> > +               wake the system from sleep states.
> > +
> > +               This file exists only if the pin can generate interrupts and
> > +               the driver implements the required infrastructure.
> 
> Should this not be 0/1 rather than the string "enabled"/"disabled"?
> 
> I think that is the common pattern in sysfs?
> 
> Not sure, but want an indication from the ABI people.

So, as I told Alexandre, 'wakeup' including the values 'enabled' &
'disabled' is how devices that support wake expose this functionality. I
think this is more in line with what already exists. As reference, have
a look at https://www.kernel.org/doc/Documentation/power/devices.txt. It
has a section '/sys/devices/.../power/wakeup files'.

	Thanks,
	Sören

^ permalink raw reply

* Re: [PATCH 00/12] Add kdbus implementation
From: Karol Lewandowski @ 2014-10-31 14:21 UTC (permalink / raw)
  To: Paul Moore, Greg Kroah-Hartman
  Cc: Jiri Kosina, Linux API, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	John Stultz, Arnd Bergmann, Tejun Heo, Ryan Lortie,
	Simon McVittie, daniel-cYrQPVfZoowdnm+yROfE0A, David Herrmann,
	casey.schaufler-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	marcel-kz+m5ild9QBg9hUCZPvPmw, tixxdz-Umm1ozX2/EEdnm+yROfE0A,
	javier.martinez-ZGY8ohtN/8pPYcu2f3hruQ,
	alban.crequy-ZGY8ohtN/8pPYcu2f3hruQ,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Karol Lewandowski
In-Reply-To: <5113482.YUK8i6Rueb@sifl>

On 2014-10-31 00:39, Paul Moore wrote:
> On Thursday, October 30, 2014 08:55:56 PM Karol Lewandowski wrote:
>> On 2014-10-30 15:47, Greg Kroah-Hartman wrote:
>>> Other than that, I don't know exactly what your patches do, or why they
>>> are needed, care to go into details?
>>
>> Patches in question were supposed to add few hooks for kdbus-specific
>> operations that doesn't seem to have compatible semantics with hooks
>> currently available in LSM.
>>
>> kdbus' bus introduces quite a few new concepts that we wanted to be able
>> to limit based on MAC label/context, eg.
>>
>>  - check flags at HELO stage (say disallow fd passing),
>>
>>  - restrict ability to acquire name to certain subjects (for system bus),
>>
>>  - disallow creation of new buses,
>>
>>  - limit scope of broadcasts,
>>
>>  - etc.
>>
>> Please take a look at hook list - I think most of names are
>> self-explanatory:
>>
>>  
>> https://github.com/lmctl/linux/blob/a9fe4c33b6e5ab25a243e0590df406aabb6add1
>> 2/include/linux/security.h#L1874
>>
>> kdbus modifications were pretty light - with most visible change being
>> addition of opaque security pointer to kdbus_bus and similar structs.
> 
> [NOTE: we really should add the LSM list to this discussion and future 
> patchset postings.]
> 
> Also, to be completely honest, I don't think we ever really arrived at any 
> final conclusion about those LSM/kdbus hooks either.  At least I don't think I 
> ever really satisfied myself that what we had was the "right" solution.

Agreed, "hooks" are far from being complete.  I think that patches
were and still are - a starting point for discussion, not "a solution"
itself.

Timing wasn't good either - since our last discussion (Apr/May 2014)
kdbus policy engine has been completely rewritten and few core concepts
changed too.

> We both got busy and kinda drifted away from this effort.  Karol, did you do 
> any further work on the hooks?

I didn't.  I was waiting for the peace of change in kdbus to slow
down a bit and, honestly, wasn't expecting submission in few next
months...

I'll do my best to post RFC patchset today or tomorrow.

Thanks
-- 
Karol Lewandowski, Samsung R&D Institute Poland

^ permalink raw reply

* Re: [PATCH 00/12] Add kdbus implementation
From: Karol Lewandowski @ 2014-10-31 11:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Kosina, Linux API, linux-kernel, John Stultz, Arnd Bergmann,
	Tejun Heo, Ryan Lortie, Simon McVittie, daniel, David Herrmann,
	Paul Moore, casey.schaufler@intel.com, marcel, tixxdz,
	javier.martinez, alban.crequy, linux-security-module
In-Reply-To: <20141030202434.GA17410@kroah.com>

On 2014-10-30 21:24, Greg Kroah-Hartman wrote:
> On Thu, Oct 30, 2014 at 08:55:56PM +0100, Karol Lewandowski wrote:
>> On 2014-10-30 15:47, Greg Kroah-Hartman wrote:
>>> On Thu, Oct 30, 2014 at 11:44:39AM +0100, Karol Lewandowski wrote:
>>>> [ Sorry for breaking thread and resend - gmane rejected my original message
>>>>   due to too long list of recipients... ]
>>>>
>>>> On 2014-10-30 00:40, Greg Kroah-Hartman wrote:
>>>>
>>>>> There is a 1815 line documentation file in this series, so we aren't
>>>>> trying to not provide this type of information here at all.  But yes,
>>>>> more background, about why this can't be done in userspace (zero copy,
>>>>> less context switches, proper credential passing, timestamping, availble
>>>>> at early-boot, LSM hooks for security models to tie into
>>>>
>>>> While you're at it... I did some work on proof-of-concept LSM patches for
>>>> kdbus some time ago, see [1][2].  Currently, these are completely of date.
>>>>
>>>>  [1] https://github.com/lmctl/linux/commits/kdbus-lsm-v4.for-systemd-v212
>>>>  [2] https://github.com/lmctl/kdbus/commit/aa0885489d19be92fa41c6f0a71df28763228a40
>>>>
>>>> May I ask if you guys have your own plan for LSM or maybe it would be
>>>> worth to resurrect [1]?
>>>
>>> The core calls are already mediated by LSM today, right?  We don't want
>>> anyone to be parsing the data stream through an LSM, that idea got
>>> rejected a long time ago as something that is really not a good idea.
>>
>> Parsing data is out of question, of course, but this is not what we were
>> proposing.
> 
> Glad to hear it :)
> 
>>> Other than that, I don't know exactly what your patches do, or why they
>>> are needed, care to go into details?
>>
>> Patches in question were supposed to add few hooks for kdbus-specific
>> operations that doesn't seem to have compatible semantics with hooks
>> currently available in LSM.
>>
>> kdbus' bus introduces quite a few new concepts that we wanted to be able
>> to limit based on MAC label/context, eg.
>>
>>  - check flags at HELO stage (say disallow fd passing),
>>
>>  - restrict ability to acquire name to certain subjects (for system bus),
>>
>>  - disallow creation of new buses,
>>
>>  - limit scope of broadcasts,
>>
>>  - etc.
> 
> Nice list.
> 
>> Please take a look at hook list - I think most of names are self-explanatory:
>>
>>   https://github.com/lmctl/linux/blob/a9fe4c33b6e5ab25a243e0590df406aabb6add12/include/linux/security.h#L1874
>>
>> kdbus modifications were pretty light - with most visible change being
>> addition of opaque security pointer to kdbus_bus and similar structs.
> 
> That looks very reasonable, care to make it up into a patch I can add to
> the end of this series so it's easy to review and possibly submit as
> part of it?

I'll do my best to prepare something suitable for review, but I'm
not sure it can/should be part of next patch set.

As Paul wrote - discussion about hooks hasn't really ended up with
satisfactory conclusion but just faded away.  kdbus own policy engine
has been  rewritten since I last touched it so I'm not sure what part
are still applicable.

(Unfortunately, I'll be traveling from monday and likely to be offline
for a week or two...)

Thanks
-- 
Karol Lewandowski, Samsung R&D Institute Poland

^ permalink raw reply

* Re: [PATCH 00/12] Add kdbus implementation
From: Karol Lewandowski @ 2014-10-31 10:58 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Greg Kroah-Hartman, Jiri Kosina, Linux API,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, John Stultz, Arnd Bergmann,
	Tejun Heo, Ryan Lortie, Simon McVittie,
	daniel-cYrQPVfZoowdnm+yROfE0A, David Herrmann, Paul Moore,
	casey.schaufler-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	marcel-kz+m5ild9QBg9hUCZPvPmw, tixxdz-Umm1ozX2/EEdnm+yROfE0A,
	javier.martinez-ZGY8ohtN/8pPYcu2f3hruQ,
	alban.crequy-ZGY8ohtN/8pPYcu2f3hruQ
In-Reply-To: <20141030231310.0b65b762-mUKnrFFms3BCCTY1wZZT65JpZx93mCW/@public.gmane.org>

On 2014-10-31 00:13, One Thousand Gnomes wrote:
>>> The core calls are already mediated by LSM today, right?  We don't want
>>> anyone to be parsing the data stream through an LSM, that idea got
>>> rejected a long time ago as something that is really not a good idea.
>>
>> Parsing data is out of question, of course, but this is not what we were
>> proposing.
> 
> Why is it out of the question. If it's a socket you can just a BPF filter
> on it, so why can't kdbus support similar basic functionality ?

Fair point.  I think that none of us simply considered this till now.

Thanks
-- 
Karol Lewandowski, Samsung R&D Institute Poland

^ permalink raw reply

* Re: kdbus: add header file
From: Daniel Mack @ 2014-10-31 10:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Tom Gundersen, Greg Kroah-Hartman, Linux API, LKML, John Stultz,
	Tejun Heo, Marcel Holtmann, Ryan Lortie, Bastien Nocera,
	David Herrmann, Djalal Harouni, Simon McVittie, alban.crequy,
	javier.martinez
In-Reply-To: <3334170.rJW8YMf1Nv@wuerfel>

On 10/30/2014 01:03 PM, Arnd Bergmann wrote:
> On Thursday 30 October 2014 12:52:58 Daniel Mack wrote:

>> Hmm, this is the header exported to userspace, so having enums in would
>> make our lives easier, right?
> 
> My point was that you never use the enum by type and the only place in
> user space where it's referenced would be something like
> 
> 	ret = ioctl(fd, KDBUS_CMD_BUS_MAKE, &make);
> 
> In the debugger, you will see the source line here. If you trace into the
> glibc ioctl function, you no longer know the type because that just
> has an 'int'.

Alright - I changed that to #defines now.


Thanks,
Daniel

^ permalink raw reply

* Re: kdbus: add code for buses, domains and endpoints
From: Daniel Mack @ 2014-10-31  9:55 UTC (permalink / raw)
  To: Al Viro, Greg Kroah-Hartman
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	john.stultz-QSEj5FYQhm4dnm+yROfE0A, arnd-r2nGTMty4D4,
	tj-DgEjT+Ai2ygdnm+yROfE0A, marcel-kz+m5ild9QBg9hUCZPvPmw,
	desrt-0xnayjDhYQY, hadess-0MeiytkfxGOsTnJN9+BGXg,
	dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w, tixxdz-Umm1ozX2/EEdnm+yROfE0A,
	simon.mcvittie-ZGY8ohtN/8pPYcu2f3hruQ,
	alban.crequy-ZGY8ohtN/8pPYcu2f3hruQ,
	javier.martinez-ZGY8ohtN/8pPYcu2f3hruQ, teg-B22kvLQNl6c
In-Reply-To: <20141031013922.GG7996-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>

Hi,

On 10/31/2014 02:39 AM, Al Viro wrote:
> On Wed, Oct 29, 2014 at 03:00:52PM -0700, Greg Kroah-Hartman wrote:
>> See Documentation/kdbus.txt for more details.
> 
> ... which has nothing whatsoever on object lifetime rules.

True. That document only describes the external API exposed by the
driver towards userspace.

> Could you
> folks please document that somewhere?  What pins what, what state
> transitions are possible, etc.

Hmm, I'll see whether I can write something up.

> BTW, the calling conventions for your foo_new() are annoying - instead of
> "return -E... or 0, storing the reference to new object in var parameter
> passed as the last argument", could you please just return ERR_PTR(-E...)
> on error, a pointer to new object on success and to hell with those
> struct foo **foo in the argument lists?

No problem at all. We'll change that around.


Thanks for your feedback, much appreciated!

Daniel

^ permalink raw reply

* Re: [PATCH net-next v4 0/4] netns: allow to identify peer netns
From: Nicolas Dichtel @ 2014-10-31  9:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	luto-kltTT9wpgjJwATOyAt5JVQ, cwang-xCSkyg8dI+0RB7SZvlqPiA
In-Reply-To: <871tpph03k.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>

Le 30/10/2014 19:41, Eric W. Biederman a écrit :
> Nicolas Dichtel <nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org> writes:
>
>> The goal of this serie is to be able to multicast netlink messages with an
>> attribute that identify a peer netns.
>> This is needed by the userland to interpret some informations contained in
>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>> of x-netns netdevice (see also
>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>
>> Ids of peer netns are set by userland via a new genl messages. These ids are
>> stored per netns and are local (ie only valid in the netns where they are set).
>> To avoid allocating an int for each peer netns, I use idr_for_each() to retrieve
>> the id of a peer netns. Note that it will be possible to add a table (struct net
>> -> id) later to optimize this lookup if needed.
>>
>> Patch 1/4 introduces the netlink API mechanism to set and get these ids.
>> Patch 2/4 and 3/4 implements an example of how to use these ids in rtnetlink
>> messages. And patch 4/4 shows that the netlink messages can be symetric between
>> a GET and a SET.
>>
>> iproute2 patches are available, I can send them on demand.
>
> A quick reply.  I think this patchset is in the right general direction.
> There are some oddball details that seem odd/awkward to me such as using
> genetlink instead of rtnetlink to get and set the ids, and not having
> ids if they are not set (that feels like a maintenance/usability challenge).
No problem to use rtnetlink, in fact, I hesitated.

For the second point, I'm not sure to follow you: how to have an id, which will
not break migration, without asking the user to set it?
Note that if the user does not provide an id, you still have a magic value to
say "it's a peer netns but we don't know which one".

>
> I would like to give your patches a deep review, but I won't be able to
> do that for a couple of weeks.  I am deep in the process of moving,
> and will be mostly offline until about the Nov 11th.

No problem, I will wait.
I would be great to get a final version for the 3.19 ;-)


Thank you,
Nicolas

^ permalink raw reply

* Re: [PATCH net-next v4 1/4] netns: add genl cmd to add and get peer netns ids
From: Nicolas Dichtel @ 2014-10-31  9:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	luto-kltTT9wpgjJwATOyAt5JVQ, cwang-xCSkyg8dI+0RB7SZvlqPiA
In-Reply-To: <874mulh0cs.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>

Le 30/10/2014 19:35, Eric W. Biederman a écrit :
> Nicolas Dichtel <nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org> writes:
>
>> With this patch, a user can define an id for a peer netns by providing a FD or a
>> PID. These ids are local to netns (ie valid only into one netns).
>
> Scratches head.  Do you actually find value in using the pid instead of
> a file descriptor?
I copied the mechanism from rtnl_link_get_net():
First check if the user provides a PID, if not, check for a FD.

>
> Doing things by pid was an early attempt to make things work, and has
> been a bit clutsy.  If you don't find value in it I would recommend just
> supporting getting/setting the network namespace by file descriptor.
Hmm, if I understand well, it's what is done in the patch:

[snip]
>> +static int netns_nl_cmd_newid(struct sk_buff *skb, struct genl_info *info)
>> +{
[snip]
>> +	if (info->attrs[NETNSA_PID])
>> +		peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
>> +	else if (info->attrs[NETNSA_FD])
>> +		peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
>> +	else
>> +		return -EINVAL;
>> +	if (IS_ERR(peer))
>> +		return PTR_ERR(peer);

Am I right?


Regards,
Nicolas

^ permalink raw reply

* Re: [PATCH v3] gpio: lib-sysfs: Add 'wakeup' attribute
From: Linus Walleij @ 2014-10-31  7:03 UTC (permalink / raw)
  To: Soren Brinkmann, linux-api
  Cc: Alexandre Courbot, Jonathan Corbet, linux-kernel@vger.kernel.org,
	linux-gpio@vger.kernel.org, linux-doc@vger.kernel.org
In-Reply-To: <1414434602-14263-1-git-send-email-soren.brinkmann@xilinx.com>

On Mon, Oct 27, 2014 at 7:30 PM, Soren Brinkmann
<soren.brinkmann@xilinx.com> wrote:

> Add an attribute 'wakeup' to the GPIO sysfs interface which allows
> marking/unmarking a GPIO as wake IRQ.
> The file 'wakeup' is created in each exported GPIOs directory, if an IRQ
> is associated with that GPIO and the irqchip implements set_wake().
> Writing 'enabled' to that file will enable wake for that GPIO, while
> writing 'disabled' will disable wake.
> Reading that file will return either 'disabled' or 'enabled' depening on
> the currently set flag for the GPIO's IRQ.
>
> Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
> Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> v3:
>  - add documentation
> v2:
>  - fix error path to unlock mutex before return
(...)

Looking better!

> +       "wakeup" ... reads as either "enabled" or "disabled". Write these
> +               strings to set/clear the 'wakeup' flag of the IRQ associated
> +               with this GPIO. If the IRQ has the 'wakeup' flag set, it can
> +               wake the system from sleep states.
> +
> +               This file exists only if the pin can generate interrupts and
> +               the driver implements the required infrastructure.

Should this not be 0/1 rather than the string "enabled"/"disabled"?

I think that is the common pattern in sysfs?

Not sure, but want an indication from the ABI people.

Yours,
Linus Walleij

^ permalink raw reply

* Re: [PATCH] kernel, add panic_on_warn
From: Hedi Berriche @ 2014-10-31  1:58 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andi Kleen, Jonathan Corbet,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Rusty Russell,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, jbaron-JqFfY2XvxFXQT0dZR+AlfA,
	Fabian Frederick, isimatu.yasuaki-+CUm20s59erQFUHtdCDX3A,
	H. Peter Anvin, Masami Hiramatsu, Andrew Morton,
	linux-api-u79uwXL29TY76Z2rM5mHXA, vgoyal-H+wXaHxf7aLQT0dZR+AlfA
In-Reply-To: <1414688627-5298-1-git-send-email-prarit-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Thu, Oct 30, 2014 at 17:06 Prarit Bhargava wrote:
| There have been several times where I have had to rebuild a kernel to
| cause a panic when hitting a WARN() in the code in order to get a crash
| dump from a system.  Sometimes this is easy to do, other times (such as
| in the case of a remote admin) it is not trivial to send new images to the
| user.
| 
| A much easier method would be a switch to change the WARN() over to a
| panic.  This makes debugging easier in that I can now test the actual
| image the WARN() was seen on and I do not have to engage in remote
| debugging.

Do we want to leave it to usersspace[1] to ensure panic_on_warn is out
of the way in when the kdump kernel boots? or would a self-contained
approach be more preferable i.e. test whether we're a kdump kernel
before bothering with panic_on_warn?

Cheers,
Hedi.

[1] kexec-tools in the case of the boot param by filtering it out of the
    kdump kernel cmdline. In the case of sysctl.conf, it would depend on
    whether there are distros out there that  include it in the kdump
    initrd.

-- 
Hedi Berriche
Linux Kernel Engineer
http://www.sgi.com

^ permalink raw reply

* Re: kdbus: add code for buses, domains and endpoints
From: Al Viro @ 2014-10-31  1:39 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	john.stultz-QSEj5FYQhm4dnm+yROfE0A, arnd-r2nGTMty4D4,
	tj-DgEjT+Ai2ygdnm+yROfE0A, marcel-kz+m5ild9QBg9hUCZPvPmw,
	desrt-0xnayjDhYQY, hadess-0MeiytkfxGOsTnJN9+BGXg,
	dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w, tixxdz-Umm1ozX2/EEdnm+yROfE0A,
	simon.mcvittie-ZGY8ohtN/8pPYcu2f3hruQ,
	daniel-cYrQPVfZoowdnm+yROfE0A,
	alban.crequy-ZGY8ohtN/8pPYcu2f3hruQ,
	javier.martinez-ZGY8ohtN/8pPYcu2f3hruQ, teg-B22kvLQNl6c
In-Reply-To: <1414620056-6675-9-git-send-email-gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>

On Wed, Oct 29, 2014 at 03:00:52PM -0700, Greg Kroah-Hartman wrote:
> See Documentation/kdbus.txt for more details.

... which has nothing whatsoever on object lifetime rules.  Could you
folks please document that somewhere?  What pins what, what state
transitions are possible, etc.

BTW, the calling conventions for your foo_new() are annoying - instead of
"return -E... or 0, storing the reference to new object in var parameter
passed as the last argument", could you please just return ERR_PTR(-E...)
on error, a pointer to new object on success and to hell with those
struct foo **foo in the argument lists?

^ permalink raw reply

* Re: kdbus: add driver skeleton, ioctl entry points and utility functions
From: Thomas Gleixner @ 2014-10-31  0:42 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Greg Kroah-Hartman, linux-api-u79uwXL29TY76Z2rM5mHXA, LKML,
	John Stultz, Arnd Bergmann, Tejun Heo,
	marcel-kz+m5ild9QBg9hUCZPvPmw, desrt-0xnayjDhYQY,
	hadess-0MeiytkfxGOsTnJN9+BGXg, dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w,
	tixxdz-Umm1ozX2/EEdnm+yROfE0A,
	simon.mcvittie-ZGY8ohtN/8pPYcu2f3hruQ,
	daniel-cYrQPVfZoowdnm+yROfE0A,
	alban.crequy-ZGY8ohtN/8pPYcu2f3hruQ,
	javier.martinez-ZGY8ohtN/8pPYcu2f3hruQ, teg-B22kvLQNl6c,
	Peter Zijlstra
In-Reply-To: <alpine.LRH.2.00.1410310114290.11562-1ReQVI26iDCaZKY3DrU6dA@public.gmane.org>

On Fri, 31 Oct 2014, Jiri Kosina wrote:
> On Fri, 31 Oct 2014, Thomas Gleixner wrote:
> > > +static long kdbus_handle_ioctl(struct file *file, unsigned int cmd,
> > > +			       unsigned long arg)
> > > +{
> > > +	struct kdbus_handle *handle = file->private_data;
> > > +	void __user *argp = (void __user *)arg;
> > > +	enum kdbus_handle_type type = handle->type;
> > > +
> > > +	/* make sure all handle fields are set if handle->type is */
> > > +	smp_rmb();
> > 
> > Sure. You really need this kind of serialization because your design
> > choice of allowing opaque handles in the first place.
> > 
> > I'm really interested why you need this rmb() at all. Just because you
> > have several threads in user space which might race with the type
> > assignment when they call the ioctl?
> > 
> > We have a strict requirement to document memory barriers. The
> > following comment definitely does not fulfil this requirement as it
> > just documents that someone observed a race of unknown provenance and
> > got it 'fixed' with a 'smp_rmb()'
> > 
> > > +     /* make sure all handle fields are set if handle->type is */
> > 
> > That's really hillarious, The user space side knows excatly upfront
> > which type of 'handle' it wants to open. Making it an opaque handle in
> > the first place and let the kernel deal with the actual type
> > assignment is beyond silly. Especially if that involves undocumented
> > memory barriers.
> 
> I have been staring at exactly this for rather a long time today. 
> 
> Apparently this barrier pairs with smp_wmb() in kdbus_handle_transform() 
> and tries to make sure that whenever handle->type is seen as updated, 
> handle->ptr is as well.

Right. But it does not make any sense at all. 

The underlying problem is the design of the whole character device
interface as an opaque type. Just look at the absurd workarounds in
the userspace implementation of this,

Now we copy it to kernel space 1:1 and find other absurd workarounds
for it instead of designing it new.

Welcome to the world of bug compability...

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH V2] kernel, add bug_on_warn
From: Rusty Russell @ 2014-10-31  0:25 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Jonathan Corbet,
	Andrew Morton, H. Peter Anvin, Andi Kleen, Masami Hiramatsu,
	Fabian Frederick, vgoyal-H+wXaHxf7aLQT0dZR+AlfA,
	isimatu.yasuaki-+CUm20s59erQFUHtdCDX3A,
	linux-doc-u79uwXL29TY76Z2rM5mHXA,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <54478367.7030505-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Prarit Bhargava <prarit-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> On 10/22/2014 12:27 AM, Rusty Russell wrote:
>> Prarit Bhargava <prarit-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>>> There have been several times where I have had to rebuild a kernel to
>>> cause a panic when hitting a WARN() in the code in order to get a crash
>>> dump from a system.  Sometimes this is easy to do, other times (such as
>>> in the case of a remote admin) it is not trivial to send new images to the
>>> user.
>> 
>> What about during early boot?
>
> Hi Rusty,
>
> I really don't have a use case for this in early boot.  The kernel boots, the
> initramfs, and then we run whatever init (systemd in my case).  A systemd script
> configures kexec for kdump and that point kdump is "armed".  Doing a bug_on_warn
> before this will simply result in a panicked system.  I don't get any "new"
> information FWIW as I get a stack trace, etc., in both the WARN() and BUG() cases.
>
>> 
>> I'd recommend you use core_param().  Less code, and can be set on
>> commandline.
>
> Is that a general request, or is it dependent on the answer above?  Of course I
> have no problem doing it either way.

Oops, I read your initial patch too lightly: I see you added both
a sysctl (which I saw) and an early_param (which I didn't).

I still think it should be a core_param, like so (untested!):

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 6c0b9f27e465..cf16cbe9f544 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -471,6 +471,13 @@ format. Crash is available on Dave Anderson's site at the following URL:
 
    http://people.redhat.com/~anderson/
 
+Trigger Kdump on WARN()
+=======================
+
+The kernel parameter, bug_on_warn, calls BUG() in all WARN() paths.  This
+will cause a kdump to occur at the BUG() call.  In cases where a user
+wants to specify this during runtime, /sys/module/kernel/bug_on_warn can be
+set to 1 to achieve the same behaviour.
 
 Contact
 =======
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 74339c57b914..aa1d3198e987 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -553,6 +553,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	bttv.pll=	See Documentation/video4linux/bttv/Insmod-options
 	bttv.tuner=
 
+	bug_on_warn	BUG() instead of WARN().  Useful to cause kdump
+			on a WARN().
+
 	bulk_remove=off	[PPC]  This parameter disables the use of the pSeries
 			firmware feature for flushing multiple hpte entries
 			at a time.
diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 630dd2372238..4d0c763862b0 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -75,10 +75,18 @@ extern void warn_slowpath_null(const char *file, const int line);
 #define __WARN_printf_taint(taint, arg...)				\
 	warn_slowpath_fmt_taint(__FILE__, __LINE__, taint, arg)
 #else
-#define __WARN()		__WARN_TAINT(TAINT_WARN)
+#define check_bug_on_warn()						\
+	do {								\
+		if (bug_on_warn)					\
+			BUG();						\
+	} while (0)
+
+#define __WARN()							\
+	do { __WARN_TAINT(TAINT_WARN); check_bug_on_warn(); } while (0)
+
 #define __WARN_printf(arg...)	do { printk(arg); __WARN(); } while (0)
 #define __WARN_printf_taint(taint, arg...)				\
-	do { printk(arg); __WARN_TAINT(taint); } while (0)
+	do { printk(arg); __WARN_TAINT(taint); check_bug_on_warn(); } while (0)
 #endif
 
 #ifndef WARN_ON
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3d770f5564b8..d583df09ee82 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -423,6 +423,7 @@ extern int panic_on_oops;
 extern int panic_on_unrecovered_nmi;
 extern int panic_on_io_nmi;
 extern int sysctl_panic_on_stackoverflow;
+extern bool bug_on_warn;
 /*
  * Only to be used by arch init code. If the user over-wrote the default
  * CONFIG_PANIC_TIMEOUT, honor it.
diff --git a/kernel/panic.c b/kernel/panic.c
index d09dc5c32c67..3d345357fcc8 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -33,6 +33,7 @@ static int pause_on_oops;
 static int pause_on_oops_flag;
 static DEFINE_SPINLOCK(pause_on_oops_lock);
 static bool crash_kexec_post_notifiers;
+bool bug_on_warn;
 
 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
@@ -420,13 +421,24 @@ static void warn_slowpath_common(const char *file, int line, void *caller,
 {
 	disable_trace_on_warning();
 
-	pr_warn("------------[ cut here ]------------\n");
+	if (!bug_on_warn)
+		pr_warn("------------[ cut here ]------------\n");
 	pr_warn("WARNING: CPU: %d PID: %d at %s:%d %pS()\n",
 		raw_smp_processor_id(), current->pid, file, line, caller);
 
 	if (args)
 		vprintk(args->fmt, args->args);
 
+	if (bug_on_warn) {
+		pr_warn("bug_on_warn set, calling BUG()...\n");
+		/*
+		 * A flood of WARN()s may occur.  Prevent further WARN()s
+		 * from panicking the system.
+		 */
+		bug_on_warn = false;
+		BUG();
+	}
+
 	print_modules();
 	dump_stack();
 	print_oops_end_marker();
@@ -484,6 +496,7 @@ EXPORT_SYMBOL(__stack_chk_fail);
 
 core_param(panic, panic_timeout, int, 0644);
 core_param(pause_on_oops, pause_on_oops, int, 0644);
+core_param(bug_on_warn, bug_on_warn, bool, 0644);
 
 static int __init setup_crash_kexec_post_notifiers(char *s)
 {

^ permalink raw reply related

* Re: kdbus: add driver skeleton, ioctl entry points and utility functions
From: Jiri Kosina @ 2014-10-31  0:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Greg Kroah-Hartman, linux-api-u79uwXL29TY76Z2rM5mHXA, LKML,
	John Stultz, Arnd Bergmann, Tejun Heo,
	marcel-kz+m5ild9QBg9hUCZPvPmw, desrt-0xnayjDhYQY,
	hadess-0MeiytkfxGOsTnJN9+BGXg, dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w,
	tixxdz-Umm1ozX2/EEdnm+yROfE0A,
	simon.mcvittie-ZGY8ohtN/8pPYcu2f3hruQ,
	daniel-cYrQPVfZoowdnm+yROfE0A,
	alban.crequy-ZGY8ohtN/8pPYcu2f3hruQ,
	javier.martinez-ZGY8ohtN/8pPYcu2f3hruQ, teg-B22kvLQNl6c,
	Peter Zijlstra
In-Reply-To: <alpine.DEB.2.11.1410302224230.5308@nanos>

On Fri, 31 Oct 2014, Thomas Gleixner wrote:

> > +static long kdbus_handle_ioctl(struct file *file, unsigned int cmd,
> > +			       unsigned long arg)
> > +{
> > +	struct kdbus_handle *handle = file->private_data;
> > +	void __user *argp = (void __user *)arg;
> > +	enum kdbus_handle_type type = handle->type;
> > +
> > +	/* make sure all handle fields are set if handle->type is */
> > +	smp_rmb();
> 
> Sure. You really need this kind of serialization because your design
> choice of allowing opaque handles in the first place.
> 
> I'm really interested why you need this rmb() at all. Just because you
> have several threads in user space which might race with the type
> assignment when they call the ioctl?
> 
> We have a strict requirement to document memory barriers. The
> following comment definitely does not fulfil this requirement as it
> just documents that someone observed a race of unknown provenance and
> got it 'fixed' with a 'smp_rmb()'
> 
> > +     /* make sure all handle fields are set if handle->type is */
> 
> That's really hillarious, The user space side knows excatly upfront
> which type of 'handle' it wants to open. Making it an opaque handle in
> the first place and let the kernel deal with the actual type
> assignment is beyond silly. Especially if that involves undocumented
> memory barriers.

I have been staring at exactly this for rather a long time today. 

Apparently this barrier pairs with smp_wmb() in kdbus_handle_transform() 
and tries to make sure that whenever handle->type is seen as updated, 
handle->ptr is as well.

But it's still difficult for me to understand all the memory ordering 
rules and consequences of this strict ordering (my current understanding 
is that the barrier is not needed, but I will have to think about it a 
little bit more), so a nice and explanatory comment precisely describing 
the race this is protecting against would be very welcome.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply

* Re: kdbus: add driver skeleton, ioctl entry points and utility functions
From: Thomas Gleixner @ 2014-10-30 23:45 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, LKML, John Stultz,
	Arnd Bergmann, Tejun Heo, marcel-kz+m5ild9QBg9hUCZPvPmw,
	desrt-0xnayjDhYQY, hadess-0MeiytkfxGOsTnJN9+BGXg,
	dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w, tixxdz-Umm1ozX2/EEdnm+yROfE0A,
	simon.mcvittie-ZGY8ohtN/8pPYcu2f3hruQ,
	daniel-cYrQPVfZoowdnm+yROfE0A,
	alban.crequy-ZGY8ohtN/8pPYcu2f3hruQ,
	javier.martinez-ZGY8ohtN/8pPYcu2f3hruQ, teg-B22kvLQNl6c,
	Peter Zijlstra
In-Reply-To: <1414620056-6675-4-git-send-email-gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>

On Wed, 29 Oct 2014, Greg Kroah-Hartman wrote:
> +/* kdbus major */
> +static unsigned int kdbus_major;
> +
> +/* map of minors to objects */
> +static DEFINE_IDR(kdbus_minor_idr);
> +
> +/* kdbus minor lock */
> +static DEFINE_SPINLOCK(kdbus_minor_lock);
> +
> +int kdbus_minor_init(void)
> +{
> +	int ret;
> +
> +	ret = __register_chrdev(0, 0, 0xfffff, KBUILD_MODNAME,

0xfffff? Random number pulled out of thin air or is there some
sensible explanation for this choice?

> +				&kdbus_handle_ops);
> +	if (ret < 0)
> +		return ret;
> +
> +	kdbus_major = ret;

So minor_init actually assigns the major number ...

> +	return 0;
> +}
> +
> +void kdbus_minor_exit(void)
> +{
> +	__unregister_chrdev(kdbus_major, 0, 0xfffff, KBUILD_MODNAME);

So we have the magic 0xfffff constant at two places to make sure that
it is updated in sync.

> +	idr_destroy(&kdbus_minor_idr);
> +}
> +
> +static void *kdbus_minor_pack(enum kdbus_minor_type type, void *ptr)
> +{
> +	unsigned long p = (unsigned long)ptr;
> +
> +	BUILD_BUG_ON(KDBUS_MINOR_CNT > 4);

We certainly want a build bug in some random function with another
magic number for a completely undocumented enum, which does neither
explain what its enum constants mean nor does have a comment that it
is limited to 0-3 for a very good reason.

And of course any not overloaded pointer defaults to
KDBUS_MINOR_CONTROL which is always valid...

> +
> +	if (WARN_ON(p & 0x3UL || type >= KDBUS_MINOR_CNT))

0x03UL is a very descriptive constant ....

> +		return NULL;
> +
> +	return (void *)(p | (unsigned long)type);
> +}
> +
> +static enum kdbus_minor_type kdbus_minor_unpack(void **ptr)
> +{
> +	unsigned long p = (unsigned long)*ptr;
> +
> +	*ptr = (void *)(p & ~0x3UL);
> +	return p & 0x3UL;

I'm really excited about the intuitive naming conventions
here. minor_init() initializes kdbus_major and this pack/unpack stuff
converts a pointer to carry a type and vice versa. Of course that
stuff lacks any comment in order to accelerate reviews, right?

I really had to look more than twice to figure out that this function
serves two purposes;

 - Remove the type overload from the pointer

 - Return the type retrieved from the pointer.

Aside of that: What on earth has minor to do with this?

For heavens sake. Minor is referring to a minor number in the context
of character devices, right?

And a number does not map at all to a randomly overloaded pointer
AFAICT.

> +/**
> + * kdbus_minor_set() - set an existing minor type of a kdbus device node

Groan. The name choices are just ass backwards to be honest. And the
explanation of the function is even worse:

   "set an existing minor type of a kdbus device node"

What the heck is an 'existing minor type' ?

Ah, you mean if the type does not exist, i.e. it is >= KDBUS_MINOR_CNT)
then the idr entry will be replaced by a NULL pointer silently.

Aside of that if ptr has one of the lower two bits set, and the call
tries to change its type then we get a WARN_ON in dmesg and happily
assign a NULL pointer to the idr entry for that minor number.

Makes a lot of sense. We'll see the fallout later...

> + * @devt:	The device node to remove

So @devt is removed? The documentation of idr_replace() tells a
different story.

> + * @type:	New type to set
> + * @ptr:	Associated pointer when node was initially registered

Why is this a void pointer? Are we dealing with arbitrary node types
here?

> + */
> +void kdbus_minor_set(dev_t devt, enum kdbus_minor_type type, void *ptr)
> +{
> +	unsigned int minor = MINOR(devt);
> +
> +	ptr = kdbus_minor_pack(type, ptr);
> +
> +	spin_lock(&kdbus_minor_lock);

Why is this a spinlock and not a mutex?

> +	ptr = idr_replace(&kdbus_minor_idr, ptr, minor);

What's the value of this pointless assignment? Pacify gcc?

> +static int kdbus_handle_release(struct inode *inode, struct file *file)
> +{
> +	struct kdbus_handle *handle = file->private_data;
> +
> +	switch (handle->type) {
> +	case KDBUS_HANDLE_CONTROL_DOMAIN_OWNER:
> +		kdbus_domain_disconnect(handle->domain_owner);
> +		kdbus_domain_unref(handle->domain_owner);
> +		break;
> +
> +	case KDBUS_HANDLE_CONTROL_BUS_OWNER:
> +		kdbus_bus_disconnect(handle->bus_owner);
> +		kdbus_bus_unref(handle->bus_owner);
> +		break;
> +
> +	case KDBUS_HANDLE_ENDPOINT_OWNER:
> +		kdbus_ep_disconnect(handle->ep_owner);
> +		kdbus_ep_unref(handle->ep_owner);
> +		break;
> +
> +	case KDBUS_HANDLE_ENDPOINT_CONNECTED:
> +		kdbus_conn_disconnect(handle->conn, false);
> +		kdbus_conn_unref(handle->conn);
> +		break;
> +
> +	default:
> +		break;

Silent acceptance of type being unknown?

> +static int kdbus_copy_from_user(void *dest,
> +				void __user *user_ptr,
> +				size_t size)
> +{
> +	if (!KDBUS_IS_ALIGNED8((uintptr_t)user_ptr))
> +		return -EFAULT;

Completely undocumented requirement and there is no reason WHY we need
KDBUS_IS_ALIGNED8 to figure that out ....

> +static int kdbus_handle_transform(struct kdbus_handle *handle,
> +				  enum kdbus_handle_type old_type,
> +				  enum kdbus_handle_type new_type,
> +				  void *ctx_ptr)
> +{
> +	int ret = -EBADFD;
> +
> +	/*
> +	 * This transforms a handle from one state into another. Only a single
> +	 * transformation is allowed per handle, and it must be one of:
> +	 *   CONTROL -> CONTROL_DOMAIN_OWNER
> +	 *           -> CONTROL_BUS_OWNER
> +	 *        EP -> EP_CONNECTED
> +	 *           -> EP_OWNER

And that's magically enforced by what? new_type is not sanity checked
here at all and there is no requirement for the call site to do so.

> +/* kdbus control device commands */
> +static long kdbus_handle_ioctl_control(struct file *file, unsigned int cmd,
> +				       void __user *buf)
> +{
> +	struct kdbus_handle *handle = file->private_data;
> +	struct kdbus_bus *bus = NULL;
> +	struct kdbus_cmd_make *make;
> +	struct kdbus_domain *domain = NULL;
> +	umode_t mode = 0600;
> +	void *p = NULL;
> +	int ret;
> +
> +	switch (cmd) {
> +	case KDBUS_CMD_BUS_MAKE: {
> +		kgid_t gid = KGIDT_INIT(0);
> +		struct kdbus_bloom_parameter bloom;
> +		char *name;
> +
> +		ret = kdbus_memdup_user(buf, &p, sizeof(*make),
> +					KDBUS_MAKE_MAX_SIZE);
> +		if (ret < 0)
> +			break;
> +
> +		make = p;

Another great coding convention stolen from some ramdonly misdesigned
user space app?

     ret = kdbus_memdup_user(buf, &p, sizeof(*make)....

What's wrong with 

       struct kdbus_cmd_make *make = NULL;

       ret = kdbus_memdup_user(buf, &make, sizeof(*make) ... ????

Surely void pointers are a great guarantee to make things better,
right?

> +	case KDBUS_CMD_DOMAIN_MAKE: {
> +		const char *name;
> +
> +		if (!capable(CAP_IPC_OWNER)) {

Why is this not in the open() call? Because you have an opaque device
at the time of open()?

Offloading security relevant decisions to an ioctl is an interesting
design choice in theory. In fact it is just wrong.

> +			ret = -EPERM;
> +			break;
> +		}
> +
> +		ret = kdbus_memdup_user(buf, &p, sizeof(*make),
> +					KDBUS_MAKE_MAX_SIZE);
> +		if (ret < 0)
> +			break;
> +
> +		make = p;

See above.

> +/* kdbus endpoint make commands */
> +static long kdbus_handle_ioctl_ep(struct file *file, unsigned int cmd,
> +				  void __user *buf)
> +{
> +	struct kdbus_handle *handle = file->private_data;
> +	void *p = NULL;
> +	long ret = 0;
> +
> +	switch (cmd) {
> +	case KDBUS_CMD_ENDPOINT_MAKE: {
> +		struct kdbus_cmd_make *make;
> +		umode_t mode = 0;
> +		kgid_t gid = KGIDT_INIT(0);
> +		const char *name;
> +		struct kdbus_ep *ep;
> +
> +		/* creating custom endpoints is a privileged operation */
> +		if (!kdbus_bus_uid_is_privileged(handle->ep->bus)) {

See above.

> +			ret = -EPERM;
> +			break;
> +		}
> +
> +		ret = kdbus_memdup_user(buf, &p, sizeof(*make),
> +					KDBUS_MAKE_MAX_SIZE);
> +		if (ret < 0)
> +			break;
> +
> +		make = p;

See above.

> +	case KDBUS_CMD_HELLO: {
> +		struct kdbus_cmd_hello *hello;
> +		struct kdbus_conn *conn = NULL;
> +
> +		ret = kdbus_memdup_user(buf, &p, sizeof(*hello),
> +					KDBUS_HELLO_MAX_SIZE);
> +		if (ret < 0)
> +			break;
> +
> +		hello = p;

Ditto.

> +/* kdbus endpoint commands for connected peers */
> +static long kdbus_handle_ioctl_ep_connected(struct file *file, unsigned int cmd,
> +					    void __user *buf)
> +{
> +	struct kdbus_handle *handle = file->private_data;
> +	struct kdbus_conn *conn = handle->conn;
> +	void *p = NULL;
> +	long ret = 0;
> +
> +	/*
> +	 * BYEBYE is special; we must not acquire a connection when
> +	 * calling into kdbus_conn_disconnect() or we will deadlock,
> +	 * because kdbus_conn_disconnect() will wait for all acquired
> +	 * references to be dropped.
> +	 */
> +	if (cmd == KDBUS_CMD_BYEBYE) {
> +		if (!kdbus_conn_is_connected(conn))
> +			return -EOPNOTSUPP;

If the connection is down already then a BEYBEY is just moot. I don't
see a good reason WHY the return code is -EOPNOTSUPP.

Is this just to provide bug compability with the existing user space
code? If so, pick some proper return code which reflects the state and
deal with it in user space. If not, then you should think hard why you
did not find anything which is more appropriate in the wide choice of
error codes.

> +		return kdbus_conn_disconnect(conn, true);
> +	}
> +
> +	ret = kdbus_conn_acquire(conn);
> +	if (ret < 0)
> +		return ret;
> +
> +	switch (cmd) {
> +	case KDBUS_CMD_NAME_ACQUIRE: {
> +		/* acquire a well-known name */
> +		struct kdbus_cmd_name *cmd_name;
> +
> +		if (!kdbus_conn_is_connected(conn)) {
> +			ret = -EOPNOTSUPP;

See above.

> +			break;
> +		}

Aside of that what makes sure that the connection is not going away
after you checked it above?

Magic serialization or what? I can't see any of it.

If it's just an optimization then it wants to have a proper comment
and not just a random chosen return code which matches the expectation
of some equally undocumented user space app.

> +static long kdbus_handle_ioctl(struct file *file, unsigned int cmd,
> +			       unsigned long arg)
> +{
> +	struct kdbus_handle *handle = file->private_data;
> +	void __user *argp = (void __user *)arg;
> +	enum kdbus_handle_type type = handle->type;
> +
> +	/* make sure all handle fields are set if handle->type is */
> +	smp_rmb();

Sure. You really need this kind of serialization because your design
choice of allowing opaque handles in the first place.

I'm really interested why you need this rmb() at all. Just because you
have several threads in user space which might race with the type
assignment when they call the ioctl?

We have a strict requirement to document memory barriers. The
following comment definitely does not fulfil this requirement as it
just documents that someone observed a race of unknown provenance and
got it 'fixed' with a 'smp_rmb()'

> +     /* make sure all handle fields are set if handle->type is */

That's really hillarious, The user space side knows excatly upfront
which type of 'handle' it wants to open. Making it an opaque handle in
the first place and let the kernel deal with the actual type
assignment is beyond silly. Especially if that involves undocumented
memory barriers.

> +	switch (type) {
> +	case KDBUS_HANDLE_CONTROL:
> +		return kdbus_handle_ioctl_control(file, cmd, argp);
> +
> +	case KDBUS_HANDLE_EP:
> +		return kdbus_handle_ioctl_ep(file, cmd, argp);
> +
> +	case KDBUS_HANDLE_ENDPOINT_CONNECTED:
> +		return kdbus_handle_ioctl_ep_connected(file, cmd, argp);
> +
> +	case KDBUS_HANDLE_ENDPOINT_OWNER:
> +		return kdbus_handle_ioctl_ep_owner(file, cmd, argp);
> +
> +	default:
> +		return -EBADFD;
> +	}
> +}
> +
> +static unsigned int kdbus_handle_poll(struct file *file,
> +				      struct poll_table_struct *wait)
> +{
> +	struct kdbus_handle *handle = file->private_data;
> +	struct kdbus_conn *conn;
> +	unsigned int mask = POLLOUT | POLLWRNORM;
> +
> +	/* Only a connected endpoint can read/write data */
> +	if (handle->type != KDBUS_HANDLE_ENDPOINT_CONNECTED)
> +		return POLLERR | POLLHUP;
> +
> +	/* make sure handle->conn is set if handle->type is */
> +	smp_rmb();

Surely we need to plaster that all over the place just because we
avoid to open dedicated devices in the first place. And do not tell me
that the open call does not know what type it is going to be.

Copying badly designed userspace code to the kernel without rethinking
the design and 'fixing' the short comings by copying the same 'fixup'
over and over is definitely a guarantee for interesting CVEs in the
future.

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH 00/12] Add kdbus implementation
From: Paul Moore @ 2014-10-30 23:39 UTC (permalink / raw)
  To: Karol Lewandowski
  Cc: Greg Kroah-Hartman, Jiri Kosina, Linux API, linux-kernel,
	John Stultz, Arnd Bergmann, Tejun Heo, Ryan Lortie,
	Simon McVittie, daniel, David Herrmann, casey.schaufler@intel.com,
	marcel, tixxdz, javier.martinez, alban.crequy,
	linux-security-module
In-Reply-To: <545297CC.6020306@samsung.com>

On Thursday, October 30, 2014 08:55:56 PM Karol Lewandowski wrote:
> On 2014-10-30 15:47, Greg Kroah-Hartman wrote:
> > Other than that, I don't know exactly what your patches do, or why they
> > are needed, care to go into details?
> 
> Patches in question were supposed to add few hooks for kdbus-specific
> operations that doesn't seem to have compatible semantics with hooks
> currently available in LSM.
> 
> kdbus' bus introduces quite a few new concepts that we wanted to be able
> to limit based on MAC label/context, eg.
> 
>  - check flags at HELO stage (say disallow fd passing),
> 
>  - restrict ability to acquire name to certain subjects (for system bus),
> 
>  - disallow creation of new buses,
> 
>  - limit scope of broadcasts,
> 
>  - etc.
> 
> Please take a look at hook list - I think most of names are
> self-explanatory:
> 
>  
> https://github.com/lmctl/linux/blob/a9fe4c33b6e5ab25a243e0590df406aabb6add1
> 2/include/linux/security.h#L1874
> 
> kdbus modifications were pretty light - with most visible change being
> addition of opaque security pointer to kdbus_bus and similar structs.

[NOTE: we really should add the LSM list to this discussion and future 
patchset postings.]

Also, to be completely honest, I don't think we ever really arrived at any 
final conclusion about those LSM/kdbus hooks either.  At least I don't think I 
ever really satisfied myself that what we had was the "right" solution.

We both got busy and kinda drifted away from this effort.  Karol, did you do 
any further work on the hooks?

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply

* How Not To Use kref (was Re: kdbus: add code for buses, domains and endpoints)
From: Al Viro @ 2014-10-30 23:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	john.stultz-QSEj5FYQhm4dnm+yROfE0A, arnd-r2nGTMty4D4,
	tj-DgEjT+Ai2ygdnm+yROfE0A, marcel-kz+m5ild9QBg9hUCZPvPmw,
	desrt-0xnayjDhYQY, hadess-0MeiytkfxGOsTnJN9+BGXg,
	dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w, tixxdz-Umm1ozX2/EEdnm+yROfE0A,
	simon.mcvittie-ZGY8ohtN/8pPYcu2f3hruQ,
	daniel-cYrQPVfZoowdnm+yROfE0A,
	alban.crequy-ZGY8ohtN/8pPYcu2f3hruQ,
	javier.martinez-ZGY8ohtN/8pPYcu2f3hruQ, teg-B22kvLQNl6c,
	Linus Torvalds
In-Reply-To: <1414620056-6675-9-git-send-email-gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>

On Wed, Oct 29, 2014 at 03:00:52PM -0700, Greg Kroah-Hartman wrote:

> +static void __kdbus_domain_user_free(struct kref *kref)
> +{
> +	struct kdbus_domain_user *user =
> +		container_of(kref, struct kdbus_domain_user, kref);
> +
> +	BUG_ON(atomic_read(&user->buses) > 0);
> +	BUG_ON(atomic_read(&user->connections) > 0);
> +
> +	mutex_lock(&user->domain->lock);
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +	idr_remove(&user->domain->user_idr, user->idr);
> +	hash_del(&user->hentry);
	^^^^^^^^^^^^^^^^^^^^^^^^
> +	mutex_unlock(&user->domain->lock);
> +
> +	kdbus_domain_unref(user->domain);
> +	kfree(user);
> +}

> +struct kdbus_domain_user *kdbus_domain_user_unref(struct kdbus_domain_user *u)
> +{
> +	if (u)
> +		kref_put(&u->kref, __kdbus_domain_user_free);
> +	return NULL;
> +}

If you remove an object from some search structures, taking the lock in
destructor is Too Fucking Late(tm).  Somebody might have already found
that puppy and decided to pick it (all under that lock) just as we'd
got to that point in destructor and blocked there.  Oops...

Normally I'd say "just use kref_put_mutex()", but this case is even worse.
Look:

refcount is 1
A: kref_put_mutex() 
	see that it's potential 1->0 crossing, need to take mutex
	mutex_lock()
B: kref_get()
refcount is 2
A:	got the sodding mutex
	atomic_dec_and_test
refcount is 1 now
	OK, it's not 1->0, after all, just drop the mutex and bugger off
B: kref_put_mutex()
	see that it's potential 1->0 crossing, need to take mutex
	mutex_lock() blocks
A:	mutex_unlock() lets B go
B:	... got it
	atomic_dec_and_test
refcount is 0
	call the destructor now, which ends with
	kdbus_domain_unref(user->domain);
	... which just happens to be the last reference to ->domain
	... and frees it, along with ->domain->mutex

But what's to guarantee that A will be past the last point where mutex_unlock()
is looking at the mutex?  Sure, it's hard to hit, but AFAICS it's not
impossible, especially if the following happens (assuming mutex-dec.h-style
mutices):

B: mutex_lock()
	atomic_dec_return -> -1
	__mutex_lock_slowpath()
A: mutex_unlock()
	atomic_inc_return -> 0
	get preempted
B: note that A has already incremented it to 0 and bugger off - we'd got it

and there we go, with A getting the timeslice back and deciding to call
__mutex_unlock_slowpath() when B has already freed the damn thing.

Basically, kref_put_mutex() is only usable when destructor callback cannot end
up freeing the mutex.

kref_get_unless_zero() might be a usable approach, but IMO the whole thing is
simply outside of kref applicability.  Using it for something that needs to
deal with removal from search structures from the destructor callback is
already stretching the things; this one is far worse.  kref isn't a universal
tool for expressing lifetime cycles.  It works for really simple cases and
might eliminate some amount of boilerplate code.  It's been greatly oversold
and overused, though...

^ permalink raw reply

* Re: [PATCH 00/12] Add kdbus implementation
From: One Thousand Gnomes @ 2014-10-30 23:13 UTC (permalink / raw)
  To: Karol Lewandowski
  Cc: Greg Kroah-Hartman, Jiri Kosina, Linux API,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, John Stultz, Arnd Bergmann,
	Tejun Heo, Ryan Lortie, Simon McVittie,
	daniel-cYrQPVfZoowdnm+yROfE0A, David Herrmann, Paul Moore,
	casey.schaufler-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	marcel-kz+m5ild9QBg9hUCZPvPmw, tixxdz-Umm1ozX2/EEdnm+yROfE0A,
	javier.martinez-ZGY8ohtN/8pPYcu2f3hruQ,
	alban.crequy-ZGY8ohtN/8pPYcu2f3hruQ
In-Reply-To: <545297CC.6020306-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>

> > The core calls are already mediated by LSM today, right?  We don't want
> > anyone to be parsing the data stream through an LSM, that idea got
> > rejected a long time ago as something that is really not a good idea.
> 
> Parsing data is out of question, of course, but this is not what we were
> proposing.

Why is it out of the question. If it's a socket you can just a BPF filter
on it, so why can't kdbus support similar basic functionality ?

Alan

^ permalink raw reply

* Re: kdbus: add code for buses, domains and endpoints
From: Andy Lutomirski @ 2014-10-30 22:00 UTC (permalink / raw)
  To: Alex Elsayed
  Cc: Linux API, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <m2ublh$5h7$2@ger.gmane.org>

On Thu, Oct 30, 2014 at 2:47 PM, Alex Elsayed <eternaleye-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Andy Lutomirski wrote:
>
>> On Thu, Oct 30, 2014 at 11:08 AM, Djalal Harouni
>> <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org> wrote:
>>> Hi Andy,
>>>
>>> 2) To get the creds of the sender of the message during send time. This
>>> is specially relevent to authorize specific D-Bus method calls, by
>>> checking the creds of the caller, not the one who created the kdbus
>>> connection.
>>
>> Please humor me here: can you describe, concretely, a case where
>> authorization of the principal issuing a method call is more correct
>> than authorization of the principal who connected to the object being
>> acted on?
>>
>> I suspect that such examples are actually quite difficult to find.
>>
>> --Andy
>
> The simple answer is that this is a misaimed question - you don't connect to
> the object being acted on.
>
> You connect to the _same bus_ as other clients have connected to. You then
> act on objects they have made available on the bus.
>
> You might have connected to a restricted endpoint, which provides a narrowed
> view of the bus, but that's neither the same thing nor mandatory.

OK, but this doesn't answer the question.  It is not an example of a
case where checking credentials at the time of connection to the bus
is actually worse from a security standpoint than checking for
credentials at the time of the send.

--Andy

^ permalink raw reply

* Re: kdbus: add code for buses, domains and endpoints
From: Alex Elsayed @ 2014-10-30 21:47 UTC (permalink / raw)
  To: linux-api-u79uwXL29TY76Z2rM5mHXA; +Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CALCETrW27mdY1qG4nnxC3JvsDG+1C8kMv93WpHwO0xs0CiayVw@mail.gmail.com>

Andy Lutomirski wrote:

> On Thu, Oct 30, 2014 at 11:08 AM, Djalal Harouni
> <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org> wrote:
>> Hi Andy,
>>
>> On Thu, Oct 30, 2014 at 07:58:04AM -0700, Andy Lutomirski wrote:
>>> On Thu, Oct 30, 2014 at 7:48 AM, Djalal Harouni
>>> <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org> wrote:
>>> > On Thu, Oct 30, 2014 at 05:15:04AM -0700, Eric W. Biederman wrote:
>>> >> Djalal Harouni <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org>
>>> >> writes: What others are doing makes it very hard to safely use allow
>>> >> those ioctls in a tightly sandboxed application, as it is
>>> >> unpredictable what the sandboxed ioctl can do with the file
>>> >> descriptor.
>>> >>
>>> >> Further an application that calls setresuid at different times during
>>> >> it's application will behave differently.  Which makes ioctls that do
>>> >> not have consistent behavior after open time inappropriate for use in
>>> >> userspace libraries.
>>> > We are consistent in our checks, you say that the application will
>>> > behave differently when it calls setresuid() sure! If it changes its
>>> > creds then regain of course it will behave differently! and the checks
>>> > are here to make sure that setresuid() and alike work correctly when
>>> > the application changes its creds and calls-in.
>>> >
>>>
>>> Except that it isn't consistent.
>>>
>>> If I open a postgresql socket that wants me to be root and then I drop
>>> privileges, I can keep talking to postresql.  This is a good thing,
>>> because it means that I can keep talking to postgresql but I lose my
>>> privilege to do other things.
>> Yes, that's nice :-)
>>
>> But here you are not following about those capable() checks in ioctl(),
>> here you are referring to the send (talking) logic! which is another
>> thing. But hey we do not break that use case, we support it.
> 
> I don't understand.  If postgres starts checking the credentials of
> the sender of a query (behind the sender's back, because the current
> kdbus code does it implicitly), then this *doesn't work*.  Postgres
> will see that the sender of the query has the wrong credentials, and
> it will reject.
> 
>>
>>
>>> The new kdbus model breaks this.  If I start as root and drop
>>> privileges to UID_PRIVSEP, then my attempts to communicate over
>>> already-open connections shouldn't consider UID_PRIVSEP.  In the, they
>>> shouldn't tell the other endpoints that UID_PRIVSEP exists at all
>>> unless I've explicitly asked the kernel for this behavior.
>> Yes, but kdbus tries to follow D-Bus which is primarily an RPC system,
>> not just a stream of bytes.
>>
>> So we really want to be able to perform real time checks and authorise
>> method calls on the bus, and not just connections. I mean yes we do our
>> kdbus talk access checks on send (Talk) requests using creds of the
>> connection at creation time, but in the other hand we also need and have
>> to deal with D-Bus method requests which is the primary usecase here.
> 
> I'm sympathetic to this use case (RPC authorization).  I do think that
> you can achieve it by making a new connection at the time at which
> authorization is needed, since kdbus is supposed to be lightweight,
> but that could be an annoying requirement.
> 
> *However*, if an RPC client is making an RPC call that needs
> authorization, it should know that it needs authorization, and it
> should know what authorization it needs, and it should send that
> authorization explicitly.
> 
> If you need lots of data for logging, then have the process sending
> the log message send that data to the logging daemon.  If the logging
> daemon gets less data than it wants, then it can indicate that in the
> logs or return an error.
> 
> [snip]
> 
>> 2) To get the creds of the sender of the message during send time. This
>> is specially relevent to authorize specific D-Bus method calls, by
>> checking the creds of the caller, not the one who created the kdbus
>> connection.
> 
> Please humor me here: can you describe, concretely, a case where
> authorization of the principal issuing a method call is more correct
> than authorization of the principal who connected to the object being
> acted on?
> 
> I suspect that such examples are actually quite difficult to find.
> 
> --Andy

The simple answer is that this is a misaimed question - you don't connect to 
the object being acted on.

You connect to the _same bus_ as other clients have connected to. You then 
act on objects they have made available on the bus.

You might have connected to a restricted endpoint, which provides a narrowed 
view of the bus, but that's neither the same thing nor mandatory.

^ permalink raw reply

* Re: kdbus: add code to gather metadata
From: Andy Lutomirski @ 2014-10-30 21:01 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Greg Kroah-Hartman, Linux API,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, John Stultz,
	Arnd Bergmann, Tejun Heo, Marcel Holtmann, Ryan Lortie,
	Bastien Nocera, David Herrmann, Djalal Harouni, Simon McVittie,
	alban.crequy, Javier Martinez Canillas, Tom Gundersen
In-Reply-To: <54525F32.3040502-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>

On Thu, Oct 30, 2014 at 8:54 AM, Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org> wrote:
> On 10/30/2014 03:07 PM, Andy Lutomirski wrote:
>> On Thu, Oct 30, 2014 at 1:45 AM, Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org> wrote:
>>> 2. When a new bus is created through KDBUS_CMD_BUS_MAKE, so peers can
>>> later query the credentials of the owner of the bus they're connected to.
>>
>> Ditto.  Although, why on earth should a bus have credentials?  This
>> sounds like a misdesign.  It seems to me that this type of policy
>> belongs all the way in userspace.  If you want a bus, you ask the
>> owner of the entire domain to make you a bus.  Or you make it yourself
>> and hand off references in some authenticated way.
>
> Yes, that's the way it works. However, the idea is that a bus stays
> alive as long as the file descriptor that was used to the create that
> bus remains open, and it is immediately shut down when the fd is closed.
> We merely allow user that are connected to a bus to query the
> credentials of the creator of the bus they're connected to.

Why do you allow this?  What purpose does it serve?  Is that idea that
each application will own one bus?  If so, what goes wrong if you only
capture the specific credentials that the creator of a given bus asks
to have captured?

[snip]

>
>> If I create an endpoint and delegate some processing to a less
>> privileged child, other things on the bus MUST NOT be able to detect
>> that delegation in any sensible design.  Otherwise everything will
>> appear to work in testing because other processes never checked the
>> problematic credential, but then it will randomly fail because someone
>> decided to do something daft and validate my unprivileged child's
>> argv[0], which is, of course, not what they expected.
>
> Not sure whether I got your point, but if a privileged service that
> takes action on behalf of unprivileged clients, it may well depend on
> certain credential information to be present along with the message, and
> refuse to take action otherwise.
>
> For example, if a privileged service can reboot the system, it checks
> whether the asking peer has CAP_SYS_BOOT set. If it checks for uid==0
> instead, and it works in your tests because you happen to test as root,
> that just a bug in the service, right? But I might have missed your point.

The issue is the following: if have the privilege needed to talk to
journald, I may want to enhance security by opening a connection to
journald (and capture that privilege) and then drop privilege.  I
should still be able to talk to journald.

Alternatively, if the privilege needed to reboot is CAP_SYS_BOOT, then
clients should send that capability bit.  Capturing extra information
to try to give daemons the flexibility to change their authorization
conditions later on just moves the problem if you need to change
policy down the line.

>
>> I suspect that, if you make credential sending opt-in, you will
>> quickly discover that the current model for which credentials matter
>> makes no sense.
>
> While for the example above, opting-in for creds items on the sender
> side might actually work (the asking service would be refused in his
> request to reboot the machine). However, for any sort of logging or
> system services, for example, allowing the sender to select which creds
> it wants to reveal is supporting a hide-and-seek game, and that's
> something that won't work.

What's the actual problem for logging?  I very much understand why a
logging service never wants to log an incorrect credential (and legacy
syslog has serious problems here because it doesn't even try to
capture credential), but what's wrong with having a log that shows the
uid for legit log messages and that reliably says "declined to state"
for messages that decline to state.

(Also, I presume that cmdline is for logging.  Keep in mind that the
cmdline is yanked from user memory and can be freely spoofed.)

A major benefit of opt-in credential passing is that it makes it very
difficult to convince another process to exercise its credential on
your behalf by accident.

>
> Thanks for sharing your thoughts on this - I appreciate this discussion
> stays on such technical grounds :)
>

My pleasure.  I have no desire to see this process devolve into random
flames (and I'm glad it hasn't).  That being said, I'll still continue
to object to credential and namespace issues that I think are
problematic. :)

--Andy

^ permalink raw reply

* Re: kdbus: add code for buses, domains and endpoints
From: Andy Lutomirski @ 2014-10-30 20:37 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Eric W. Biederman, Greg Kroah-Hartman, Linux API,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, John Stultz,
	Arnd Bergmann, Tejun Heo, Marcel Holtmann, Ryan Lortie,
	Bastien Nocera, David Herrmann, Simon McVittie, Daniel Mack,
	alban.crequy, Javier Martinez Canillas, Tom Gundersen
In-Reply-To: <20141030180813.GA11850@dztty>

On Thu, Oct 30, 2014 at 11:08 AM, Djalal Harouni <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org> wrote:
> Hi Andy,
>
> On Thu, Oct 30, 2014 at 07:58:04AM -0700, Andy Lutomirski wrote:
>> On Thu, Oct 30, 2014 at 7:48 AM, Djalal Harouni <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org> wrote:
>> > On Thu, Oct 30, 2014 at 05:15:04AM -0700, Eric W. Biederman wrote:
>> >> Djalal Harouni <tixxdz-Umm1ozX2/EEdnm+yROfE0A@public.gmane.org> writes:
>> >> What others are doing makes it very hard to safely use allow those
>> >> ioctls in a tightly sandboxed application, as it is unpredictable
>> >> what the sandboxed ioctl can do with the file descriptor.
>> >>
>> >> Further an application that calls setresuid at different times during
>> >> it's application will behave differently.  Which makes ioctls that do
>> >> not have consistent behavior after open time inappropriate for use in
>> >> userspace libraries.
>> > We are consistent in our checks, you say that the application will
>> > behave differently when it calls setresuid() sure! If it changes its
>> > creds then regain of course it will behave differently! and the checks
>> > are here to make sure that setresuid() and alike work correctly when the
>> > application changes its creds and calls-in.
>> >
>>
>> Except that it isn't consistent.
>>
>> If I open a postgresql socket that wants me to be root and then I drop
>> privileges, I can keep talking to postresql.  This is a good thing,
>> because it means that I can keep talking to postgresql but I lose my
>> privilege to do other things.
> Yes, that's nice :-)
>
> But here you are not following about those capable() checks in ioctl(),
> here you are referring to the send (talking) logic! which is another
> thing. But hey we do not break that use case, we support it.

I don't understand.  If postgres starts checking the credentials of
the sender of a query (behind the sender's back, because the current
kdbus code does it implicitly), then this *doesn't work*.  Postgres
will see that the sender of the query has the wrong credentials, and
it will reject.

>
>
>> The new kdbus model breaks this.  If I start as root and drop
>> privileges to UID_PRIVSEP, then my attempts to communicate over
>> already-open connections shouldn't consider UID_PRIVSEP.  In the, they
>> shouldn't tell the other endpoints that UID_PRIVSEP exists at all
>> unless I've explicitly asked the kernel for this behavior.
> Yes, but kdbus tries to follow D-Bus which is primarily an RPC system,
> not just a stream of bytes.
>
> So we really want to be able to perform real time checks and authorise
> method calls on the bus, and not just connections. I mean yes we do our
> kdbus talk access checks on send (Talk) requests using creds of the
> connection at creation time, but in the other hand we also need and have
> to deal with D-Bus method requests which is the primary usecase here.

I'm sympathetic to this use case (RPC authorization).  I do think that
you can achieve it by making a new connection at the time at which
authorization is needed, since kdbus is supposed to be lightweight,
but that could be an annoying requirement.

*However*, if an RPC client is making an RPC call that needs
authorization, it should know that it needs authorization, and it
should know what authorization it needs, and it should send that
authorization explicitly.

If you need lots of data for logging, then have the process sending
the log message send that data to the logging daemon.  If the logging
daemon gets less data than it wants, then it can indicate that in the
logs or return an error.

[snip]

> 2) To get the creds of the sender of the message during send time. This
> is specially relevent to authorize specific D-Bus method calls, by
> checking the creds of the caller, not the one who created the kdbus
> connection.

Please humor me here: can you describe, concretely, a case where
authorization of the principal issuing a method call is more correct
than authorization of the principal who connected to the object being
acted on?

I suspect that such examples are actually quite difficult to find.

--Andy

^ permalink raw reply

* Re: [PATCH 00/12] Add kdbus implementation
From: Alex Elsayed @ 2014-10-30 20:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-api
In-Reply-To: <CALCETrXv_CevG7166++ZHBok0ae7mNhF83VjrRVQ2AKF22Vf_g@mail.gmail.com>

Andy Lutomirski wrote:

<snip>
> There should be a number measured in, say, nanoseconds in here
> somewhere.  The actual extent of the speedup is unmeasurable here.
> Also, it's worth reading at least one of Linus' many rants about
> zero-copy.  It's not an automatic win.

It's well-understood that it's not an automatic win; significant testing on 
multiple architectures indicated that 512K is a surprisingly universal 
crossover point. The userspace code, therefore, switches from copying 
(normal kdbus parameters) to zero-copy (memfds) right around there.

^ permalink raw reply

* Re: [PATCH 00/12] Add kdbus implementation
From: Greg Kroah-Hartman @ 2014-10-30 20:24 UTC (permalink / raw)
  To: Karol Lewandowski
  Cc: Jiri Kosina, Linux API, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	John Stultz, Arnd Bergmann, Tejun Heo, Ryan Lortie,
	Simon McVittie, daniel-cYrQPVfZoowdnm+yROfE0A, David Herrmann,
	Paul Moore,
	casey.schaufler-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	marcel-kz+m5ild9QBg9hUCZPvPmw, tixxdz-Umm1ozX2/EEdnm+yROfE0A,
	javier.martinez-ZGY8ohtN/8pPYcu2f3hruQ,
	alban.crequy-ZGY8ohtN/8pPYcu2f3hruQ
In-Reply-To: <545297CC.6020306-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>

On Thu, Oct 30, 2014 at 08:55:56PM +0100, Karol Lewandowski wrote:
> On 2014-10-30 15:47, Greg Kroah-Hartman wrote:
> > On Thu, Oct 30, 2014 at 11:44:39AM +0100, Karol Lewandowski wrote:
> >> [ Sorry for breaking thread and resend - gmane rejected my original message
> >>   due to too long list of recipients... ]
> >>
> >> On 2014-10-30 00:40, Greg Kroah-Hartman wrote:
> >>
> >>> There is a 1815 line documentation file in this series, so we aren't
> >>> trying to not provide this type of information here at all.  But yes,
> >>> more background, about why this can't be done in userspace (zero copy,
> >>> less context switches, proper credential passing, timestamping, availble
> >>> at early-boot, LSM hooks for security models to tie into
> >>
> >> While you're at it... I did some work on proof-of-concept LSM patches for
> >> kdbus some time ago, see [1][2].  Currently, these are completely of date.
> >>
> >>  [1] https://github.com/lmctl/linux/commits/kdbus-lsm-v4.for-systemd-v212
> >>  [2] https://github.com/lmctl/kdbus/commit/aa0885489d19be92fa41c6f0a71df28763228a40
> >>
> >> May I ask if you guys have your own plan for LSM or maybe it would be
> >> worth to resurrect [1]?
> > 
> > The core calls are already mediated by LSM today, right?  We don't want
> > anyone to be parsing the data stream through an LSM, that idea got
> > rejected a long time ago as something that is really not a good idea.
> 
> Parsing data is out of question, of course, but this is not what we were
> proposing.

Glad to hear it :)

> > Other than that, I don't know exactly what your patches do, or why they
> > are needed, care to go into details?
> 
> Patches in question were supposed to add few hooks for kdbus-specific
> operations that doesn't seem to have compatible semantics with hooks
> currently available in LSM.
> 
> kdbus' bus introduces quite a few new concepts that we wanted to be able
> to limit based on MAC label/context, eg.
> 
>  - check flags at HELO stage (say disallow fd passing),
> 
>  - restrict ability to acquire name to certain subjects (for system bus),
> 
>  - disallow creation of new buses,
> 
>  - limit scope of broadcasts,
> 
>  - etc.

Nice list.

> Please take a look at hook list - I think most of names are self-explanatory:
> 
>   https://github.com/lmctl/linux/blob/a9fe4c33b6e5ab25a243e0590df406aabb6add12/include/linux/security.h#L1874
> 
> kdbus modifications were pretty light - with most visible change being
> addition of opaque security pointer to kdbus_bus and similar structs.

That looks very reasonable, care to make it up into a patch I can add to
the end of this series so it's easy to review and possibly submit as
part of it?

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH 00/12] Add kdbus implementation
From: Karol Lewandowski @ 2014-10-30 19:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Kosina, Linux API, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	John Stultz, Arnd Bergmann, Tejun Heo, Ryan Lortie,
	Simon McVittie, daniel-cYrQPVfZoowdnm+yROfE0A, David Herrmann,
	Paul Moore,
	casey.schaufler-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	marcel-kz+m5ild9QBg9hUCZPvPmw, tixxdz-Umm1ozX2/EEdnm+yROfE0A,
	javier.martinez-ZGY8ohtN/8pPYcu2f3hruQ,
	alban.crequy-ZGY8ohtN/8pPYcu2f3hruQ
In-Reply-To: <20141030144709.GA19721-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>

On 2014-10-30 15:47, Greg Kroah-Hartman wrote:
> On Thu, Oct 30, 2014 at 11:44:39AM +0100, Karol Lewandowski wrote:
>> [ Sorry for breaking thread and resend - gmane rejected my original message
>>   due to too long list of recipients... ]
>>
>> On 2014-10-30 00:40, Greg Kroah-Hartman wrote:
>>
>>> There is a 1815 line documentation file in this series, so we aren't
>>> trying to not provide this type of information here at all.  But yes,
>>> more background, about why this can't be done in userspace (zero copy,
>>> less context switches, proper credential passing, timestamping, availble
>>> at early-boot, LSM hooks for security models to tie into
>>
>> While you're at it... I did some work on proof-of-concept LSM patches for
>> kdbus some time ago, see [1][2].  Currently, these are completely of date.
>>
>>  [1] https://github.com/lmctl/linux/commits/kdbus-lsm-v4.for-systemd-v212
>>  [2] https://github.com/lmctl/kdbus/commit/aa0885489d19be92fa41c6f0a71df28763228a40
>>
>> May I ask if you guys have your own plan for LSM or maybe it would be
>> worth to resurrect [1]?
> 
> The core calls are already mediated by LSM today, right?  We don't want
> anyone to be parsing the data stream through an LSM, that idea got
> rejected a long time ago as something that is really not a good idea.

Parsing data is out of question, of course, but this is not what we were
proposing.

> Other than that, I don't know exactly what your patches do, or why they
> are needed, care to go into details?

Patches in question were supposed to add few hooks for kdbus-specific
operations that doesn't seem to have compatible semantics with hooks
currently available in LSM.

kdbus' bus introduces quite a few new concepts that we wanted to be able
to limit based on MAC label/context, eg.

 - check flags at HELO stage (say disallow fd passing),

 - restrict ability to acquire name to certain subjects (for system bus),

 - disallow creation of new buses,

 - limit scope of broadcasts,

 - etc.

Please take a look at hook list - I think most of names are self-explanatory:

  https://github.com/lmctl/linux/blob/a9fe4c33b6e5ab25a243e0590df406aabb6add12/include/linux/security.h#L1874

kdbus modifications were pretty light - with most visible change being
addition of opaque security pointer to kdbus_bus and similar structs.

Thanks!
-- 
Karol Lewandowski, Samsung R&D Institute Poland

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox