Linux userland API discussions

Linux userland API discussions
 help / color / mirror / Atom feed

* Re: [PATCH v11 1/2] crypto: AF_ALG: add AEAD support
From: Stephan Mueller @ 2015-01-26  4:26 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Borkmann, 'Quentin Gouchet', 'LKML',
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Neil Horman
In-Reply-To: <20150125235550.GB18212-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>

Am Montag, 26. Januar 2015, 10:55:50 schrieb Herbert Xu:

Hi Herbert,

> On Wed, Jan 21, 2015 at 02:19:17AM +0100, Stephan Mueller wrote:
> > +		/* use the existing memory in an allocated page */
> > +		if (ctx->merge) {
> > +			sg = sgl->sg + sgl->cur - 1;
> > +			len = min_t(unsigned long, len,
> > +				    PAGE_SIZE - sg->offset - sg->length);
> > +			err = memcpy_from_msg(page_address(sg_page(sg)) +
> > +					      sg->offset + sg->length,
> > +					      msg, len);
> > +			if (err)
> > +				goto unlock;
> > +
> > +			sg->length += len;
> > +			ctx->merge = (sg->offset + sg->length) &
> > +				     (PAGE_SIZE - 1);
> > +
> > +			ctx->used += len;
> > +			copied += len;
> > +			size -= len;
> 
> Need to add a continue here to recheck size != 0.

Why would that be needed?

When size is still != 0 (i.e. the existing buffer is completely filled, we 
have still some remaining data), we fall through to the while loop that 
generates a new buffer.

If we add a continue here, we start the next iteration in the outer while loop 
which again checks for the merging of data in an existing buffer. As this 
merging will never happen as we filled that buffer completely in the previous 
loop, we always will fall through to the inner while loop. Thus, not having 
the check for size != 0 is functional identical to having it (besides, it is 
more efficient to not having it). 

Note, this case is triggered in my tests, where I use sendmsg with first a 
small buffer, followed by a large buffer. And I still can send 65536 bytes to 
the kernel.

-- 
Ciao
Stephan

^ permalink raw reply

* Re: [PATCH] crypto: algif - change algif_skcipher to be asynchronous
From: Herbert Xu @ 2015-01-26  2:57 UTC (permalink / raw)
  To: Tadeusz Struk
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	qat-linux-ral2JQCrhuEAvxtiuMwx3w,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <54C2CDF8.3000406-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

On Fri, Jan 23, 2015 at 02:40:56PM -0800, Tadeusz Struk wrote:
>
> Ok, It looks to me that we *do* have all we need to implement zero copy
> and AIO with algif_skcipher. The only thing we need to do is to add
> support for it in skcipher_recvmsg(). I think no change is required in
> neither skcipher_sendmsg(), skcipher_sendpage(), nor the if_alg interface.
> Then to start using the interface asynchronously an application will
> need to call aio_read() or lio_listio() instead of read(), but if
> someone will use read(), then it will still work in the same
> (synchronous) way as it is today.
> How does this sound to you, Herbert?

It sounds good but let's see the code first :)
-- 
Email: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v11 1/2] crypto: AF_ALG: add AEAD support
From: Herbert Xu @ 2015-01-26  0:06 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, 'Quentin Gouchet', 'LKML',
	linux-crypto, linux-api, Neil Horman
In-Reply-To: <6238416.vGUni1CI4i@tachyon.chronox.de>

On Wed, Jan 21, 2015 at 02:19:17AM +0100, Stephan Mueller wrote:
>
> +	/*
> +	 * Require exactly one IOV block as the AEAD operation is a one shot
> +	 * due to the authentication tag.
> +	 */
> +	if (msg->msg_iter.nr_segs != 1)
> +		return -ENOMSG;

Why does limit exist? The fact that you have to do it in one go does
not limit the number of receive IOVs, at least not to one.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v11 1/2] crypto: AF_ALG: add AEAD support
From: Herbert Xu @ 2015-01-25 23:59 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, 'Quentin Gouchet', 'LKML',
	linux-crypto, linux-api, Neil Horman
In-Reply-To: <6238416.vGUni1CI4i@tachyon.chronox.de>

On Wed, Jan 21, 2015 at 02:19:17AM +0100, Stephan Mueller wrote:
>
> +		if (!aead_writable(sk)) {
> +			/* user space sent too much data */
> +			aead_put_sgl(sk);
> +			err = -EMSGSIZE;
> +			goto unlock;
> +		}
> +
> +		/* allocate a new page */
> +		len = min_t(unsigned long, size, aead_sndbuf(sk));
> +		while (len) {
> +			int plen = 0;
> +
> +			if (sgl->cur >= ALG_MAX_PAGES) {
> +				err = -E2BIG;

Should reset the socket just like EMSGSIZE above.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v11 1/2] crypto: AF_ALG: add AEAD support
From: Herbert Xu @ 2015-01-25 23:55 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, 'Quentin Gouchet', 'LKML',
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Neil Horman
In-Reply-To: <6238416.vGUni1CI4i-PJstQz4BMNNP20K/wil9xYQuADTiUCJX@public.gmane.org>

On Wed, Jan 21, 2015 at 02:19:17AM +0100, Stephan Mueller wrote:
>
> +		/* use the existing memory in an allocated page */
> +		if (ctx->merge) {
> +			sg = sgl->sg + sgl->cur - 1;
> +			len = min_t(unsigned long, len,
> +				    PAGE_SIZE - sg->offset - sg->length);
> +			err = memcpy_from_msg(page_address(sg_page(sg)) +
> +					      sg->offset + sg->length,
> +					      msg, len);
> +			if (err)
> +				goto unlock;
> +
> +			sg->length += len;
> +			ctx->merge = (sg->offset + sg->length) &
> +				     (PAGE_SIZE - 1);
> +
> +			ctx->used += len;
> +			copied += len;
> +			size -= len;

Need to add a continue here to recheck size != 0.

Cheers,
-- 
Email: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v11 1/2] crypto: AF_ALG: add AEAD support
From: Herbert Xu @ 2015-01-25 23:54 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, 'Quentin Gouchet', 'LKML',
	linux-crypto, linux-api, Neil Horman
In-Reply-To: <6238416.vGUni1CI4i@tachyon.chronox.de>

On Wed, Jan 21, 2015 at 02:19:17AM +0100, Stephan Mueller wrote:
>
> +static void aead_data_wakeup(struct sock *sk)
> +{
> +	struct alg_sock *ask = alg_sk(sk);
> +	struct aead_ctx *ctx = ask->private;
> +	struct socket_wq *wq;
> +
> +	if (ctx->more)
> +		return;

You should also check for ctx->used here.  This can be called after
a zero-sized write.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 01/13] kdbus: add documentation
From: Ahmed S. Darwish @ 2015-01-25  3:30 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: arnd-r2nGTMty4D4, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	gnomes-qBU/x9rampVanCEyBjwyrvXRex20P6io, teg-B22kvLQNl6c,
	jkosina-AlSwsSmVLrQ, luto-kltTT9wpgjJwATOyAt5JVQ,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	daniel-cYrQPVfZooxQFI55V6+gNQ, dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w,
	tixxdz-Umm1ozX2/EEdnm+yROfE0A, Michael Kerrisk (man-pages),
	Linus Torvalds, Daniel Mack
In-Reply-To: <20150123131946.GA26302-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>

On Fri, Jan 23, 2015 at 09:19:46PM +0800, Greg Kroah-Hartman wrote:
> On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote:
> > On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote:
> > > From: Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
> > > 
> > > kdbus is a system for low-latency, low-overhead, easy to use
> > > interprocess communication (IPC).
> > > 
> > > The interface to all functions in this driver is implemented via ioctls
> > > on files exposed through a filesystem called 'kdbusfs'. The default
> > > mount point of kdbusfs is /sys/fs/kdbus.
> > 
> > Pardon my ignorance, but we've always been told that adding
> > new ioctl()s to the kernel is a very big no-no.  But given
> > the seniority of the folks stewarding this kdbus effort,
> > there must be a good rationale ;-)
> > 
> > So, can the rationale behind introducing new ioctl()s be
> > further explained? It would be even better if it's included
> > in the documentation patch itself.
> 
> The main reason to use an ioctl is that you want to atomically set
> and/or get something "complex" through the user/kernel boundary.  For
> simple device attributes, sysfs works great, for configuring devices,
> configfs works great, but for data streams / structures / etc. an ioctl
> is the correct thing to use.
> 
> Examples of new ioctls being added to the kernel are all over the
> place, look at all of the special-purpose ioctls the filesystems keep
> creating (they aren't adding new syscalls), look at the monstrosity that
> is the DRM layer, look at other complex things like openvswitch, or
> "simpler" device-specific interfaces like the MEI one, or even more
> complex ones like the MMC interface.  These are all valid uses of ioctls
> as they are device/filesystem specific ways to interact with the kernel.
> 
> The thing is, almost no one pays attention to these new ioctls as they
> are domain-specific interfaces, with open userspace programs talking to
> them, and they work well.  ioctl is a powerful and useful interface, and
> if we were to suddenly require no new ioctls, and require everything to
> be a syscall, we would do nothing except make apis more complex (hint,
> you now have to do extra validation on your file descriptor passed to
> you to determine if it really is what you can properly operate your
> ioctl on), and cause no real benefit at all.
> 
> Yes, people abuse ioctls at times, but really, they are needed.
> 
> And remember, I was one of the people who long ago thought we should not
> be adding more ioctls, but I have seen the folly of my ways, and chalk
> it up to youthful ignorance :)
> 

Exactly, and that's why I wondered about the sudden change
of heart ;-)

Thanks for taking the time to write this.

Regards,
Darwish

^ permalink raw reply

* Re: futex(2) man page update help request
From: Thomas Gleixner @ 2015-01-24 16:25 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Darren Hart, Michael Kerrisk (man-pages), Carlos O'Donell,
	Ingo Molnar, Jakub Jelinek,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, lkml,
	Arnd Bergmann, Steven Rostedt, Peter Zijlstra, Linux API,
	Darren Hart, Anton Blanchard, Eric Dumazet, bill o gallmeister,
	Jan Kiszka, Daniel Wagner, Rich Felker
In-Reply-To: <1422104287.29655.13.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>

On Sat, 24 Jan 2015, Torvald Riegel wrote:
> On Sat, 2015-01-24 at 11:05 +0100, Thomas Gleixner wrote:
> > On Fri, 23 Jan 2015, Torvald Riegel wrote:
> > 
> > > On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
> > > > On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
> > > > <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > > > 
> > > > >Color me stupid, but I can't see this in futex_requeue(). Where is that
> > > > >check that is "independent of the requeue type (normal/pi)"?
> > > > >
> > > > >When I look through futex_requeue(), all the likely looking sources
> > > > >of EINVAL are governed by a check on the 'requeue_pi' argument.
> > > > 
> > > > 
> > > > Right, in the non-PI case, I believe there are valid use cases: move to
> > > > the back of the FIFO, for example (OK, maybe the only example?).
> > > 
> > > But we never guarantee a futex is a FIFO, or do we?  If we don't, then
> > > such a requeue could be implemented as a no-op by the kernel, which
> > > would sort of invalidate the use case.
> > > 
> > > (And I guess we don't want to guarantee FIFO behavior for futexes.)
> > 
> > The (current) behaviour is:
> > 
> >     real-time threads:   FIFO per priority level
> >     sched-other threads: FIFO independent of nice level
> > 
> > The wakeup is priority ordered. Highest priority level first.
> 
> OK.
> 
> But, just to be clear, do I correctly understand that you do not want to
> guarantee FIFO behavior in the specified futex semantics?  I think there
> are cases where being able to *rely* on FIFO (now and on all future
> kernels) would be helpful for users (e.g., on POSIX/C++11 condvars and I
> assume in other ordered-wakeup cases too) -- but at the same time, this
> would constrain future futex implementations.

It would be a constraint, but I don't think it would be a horrible
one. Though I have my doubts, that we can actually guarantee it under
all circumstances.

One thing comes to my mind right away: spurious wakeups. There is no
way that we can guarantee FIFO ordering in the context of those. And
we cannot prevent them either.

Thanks,

	tglx


--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: futex(2) man page update help request
From: Torvald Riegel @ 2015-01-24 13:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michael Kerrisk (man-pages), Carlos O'Donell, Darren Hart,
	Ingo Molnar, Jakub Jelinek,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, lkml,
	Arnd Bergmann, Steven Rostedt, Peter Zijlstra, Linux API,
	Darren Hart, Anton Blanchard, Eric Dumazet, bill o gallmeister,
	Jan Kiszka, Daniel Wagner, Rich Felker
In-Reply-To: <alpine.DEB.2.11.1501241116160.5526@nanos>

On Sat, 2015-01-24 at 12:35 +0100, Thomas Gleixner wrote:
> So we should never see -EINTR in the case of a spurious wakeup here.
> 
> But, here is the not so good news:
> 
>  I did some archaeology. The restart handling of futex_wait() got
>  introduced in kernel 2.6.22, so anything older than that will have
>  the spurious -EINTR issues.
> 
> futex_wait_pi() always had the restart handling and glibc folks back
> then (2006) requested that it should never return -EINTR, so it
> unconditionally restarts the syscall whether a signal had been
> delivered or not.
> 
> So kernels >= 2.6.22 should never return -EINTR spuriously. If that
> happens it's a bug and needs to be fixed.

Thanks for looking into this.

Michael, can you include the above in the documentation please?  This is
useful for userspace code like glibc that expects a minimum kernel
version.  Thanks!

--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: futex(2) man page update help request
From: Torvald Riegel @ 2015-01-24 12:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Darren Hart, Michael Kerrisk (man-pages), Carlos O'Donell,
	Ingo Molnar, Jakub Jelinek, linux-man@vger.kernel.org, lkml,
	Arnd Bergmann, Steven Rostedt, Peter Zijlstra, Linux API,
	Darren Hart, Anton Blanchard, Eric Dumazet, bill o gallmeister,
	Jan Kiszka, Daniel Wagner, Rich Felker
In-Reply-To: <alpine.DEB.2.11.1501241102450.5526@nanos>

On Sat, 2015-01-24 at 11:05 +0100, Thomas Gleixner wrote:
> On Fri, 23 Jan 2015, Torvald Riegel wrote:
> 
> > On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
> > > On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
> > > <mtk.manpages@gmail.com> wrote:
> > > 
> > > >Color me stupid, but I can't see this in futex_requeue(). Where is that
> > > >check that is "independent of the requeue type (normal/pi)"?
> > > >
> > > >When I look through futex_requeue(), all the likely looking sources
> > > >of EINVAL are governed by a check on the 'requeue_pi' argument.
> > > 
> > > 
> > > Right, in the non-PI case, I believe there are valid use cases: move to
> > > the back of the FIFO, for example (OK, maybe the only example?).
> > 
> > But we never guarantee a futex is a FIFO, or do we?  If we don't, then
> > such a requeue could be implemented as a no-op by the kernel, which
> > would sort of invalidate the use case.
> > 
> > (And I guess we don't want to guarantee FIFO behavior for futexes.)
> 
> The (current) behaviour is:
> 
>     real-time threads:   FIFO per priority level
>     sched-other threads: FIFO independent of nice level
> 
> The wakeup is priority ordered. Highest priority level first.

OK.

But, just to be clear, do I correctly understand that you do not want to
guarantee FIFO behavior in the specified futex semantics?  I think there
are cases where being able to *rely* on FIFO (now and on all future
kernels) would be helpful for users (e.g., on POSIX/C++11 condvars and I
assume in other ordered-wakeup cases too) -- but at the same time, this
would constrain future futex implementations.

^ permalink raw reply

* Re: futex(2) man page update help request
From: Thomas Gleixner @ 2015-01-24 11:35 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Michael Kerrisk (man-pages), Carlos O'Donell, Darren Hart,
	Ingo Molnar, Jakub Jelinek,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, lkml,
	Davidlohr Bueso, Arnd Bergmann, Steven Rostedt, Peter Zijlstra,
	Linux API, Darren Hart, Anton Blanchard, Petr Baudis,
	Eric Dumazet, bill o gallmeister, Jan Kiszka, Daniel Wagner,
	Rich Felker
In-Reply-To: <1422037788.29655.0.camel-I2ZjUw8blINjztcc/or7kQ@public.gmane.org>

On Fri, 23 Jan 2015, Torvald Riegel wrote:
> Second, the current documentation for EINTR is that it can happen due to
> receiving a signal *or* due to a spurious wake-up.  This is difficult to

I don't think so. I went through all callchains again with a fine comb.

futex_wait()
retry:
	ret = futex_wait_setup();
	if (ret) {
		 /*
		  * Possible return codes related to uaddr:
		  * -EINVAL:    Not u32 aligned uaddr
		  * -EFAULT:    No mapping, no RW
		  * -ENOMEM:    Paging ran out of memory
		  * -EHWPOISON: Memory hardware error
		  *
		  * Others:
		  * -EWOULDBLOCK: value at uaddr has changed
		  */
		return ret;
	}

	futex_wait_queue_me();

	if (woken by futex_wake/requeue)
	   	return 0;

	if (timeout)
		return -ETIMEOUT;

	/*
	 * Spurious wakeup, i.e. no signal pending
	 */
	if (!signal_pending())
		goto retry;

	/* Handled in the low level syscall exit code */
	if (!timed_wait)
		return -ERESTARTSYS;
	else
		return -ERESTARTBLOCK;

Now in the low level syscall exit we try to deliver the signal

	if (!signal_delivered())
	      restart_syscall();

	if (sigaction->flags & SA_RESTART)
	      restart_syscall();

	ret_to_userspace -EINTR;

So we should never see -EINTR in the case of a spurious wakeup here.

But, here is the not so good news:

 I did some archaeology. The restart handling of futex_wait() got
 introduced in kernel 2.6.22, so anything older than that will have
 the spurious -EINTR issues.

futex_wait_pi() always had the restart handling and glibc folks back
then (2006) requested that it should never return -EINTR, so it
unconditionally restarts the syscall whether a signal had been
delivered or not.

So kernels >= 2.6.22 should never return -EINTR spuriously. If that
happens it's a bug and needs to be fixed.

> Third, I think it would be useful to -- somewhere -- explain which
> behavior the futex operations would have conceptually when expressed by
> C11 code.  We currently say that they wake up, sleep, etc, and which
> values they return.  But we never say how to properly synchronize with
> them on the userspace side.  The C11 memory model is probably the best
> model to use on the userspace side, so that's why I'm arguing for this.
> Basically, I think we need to (1) tell people that they should use
> memory_order_relaxed accesses to the futex variable (ie, the memory
> location associated with the whole futex construct on the kernel side --
> or do we have another name for this?), and (2) give some conceptual
> guarantees for the kernel-side synchronization so that one use this to
> derive how to use them correctly in userspace.
> 
> The man pages might not be the right place for this, and maybe we just
> need a revision of "Futexes are tricky".  If you have other suggestions
> for where to document this, or on the content, let me know.  (I'm also
> willing to spend time on this :) ).

The current futex code in the kernel has gained documentation about
the required memory ordering recently. That should be a good starting
point.

Thanks,

	tglx

^ permalink raw reply

* Re: futex(2) man page update help request
From: Thomas Gleixner @ 2015-01-24 10:05 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: Darren Hart, Michael Kerrisk (man-pages), Carlos O'Donell,
	Ingo Molnar, Jakub Jelinek, linux-man@vger.kernel.org, lkml,
	Davidlohr Bueso, Arnd Bergmann, Steven Rostedt, Peter Zijlstra,
	Linux API, Darren Hart, Anton Blanchard, Petr Baudis,
	Eric Dumazet, bill o gallmeister, Jan Kiszka, Daniel Wagner,
	Rich Felker
In-Reply-To: <1422037145.27573.0.camel@triegel.csb>

On Fri, 23 Jan 2015, Torvald Riegel wrote:

> On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
> > On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
> > <mtk.manpages@gmail.com> wrote:
> > 
> > >Color me stupid, but I can't see this in futex_requeue(). Where is that
> > >check that is "independent of the requeue type (normal/pi)"?
> > >
> > >When I look through futex_requeue(), all the likely looking sources
> > >of EINVAL are governed by a check on the 'requeue_pi' argument.
> > 
> > 
> > Right, in the non-PI case, I believe there are valid use cases: move to
> > the back of the FIFO, for example (OK, maybe the only example?).
> 
> But we never guarantee a futex is a FIFO, or do we?  If we don't, then
> such a requeue could be implemented as a no-op by the kernel, which
> would sort of invalidate the use case.
> 
> (And I guess we don't want to guarantee FIFO behavior for futexes.)

The (current) behaviour is:

    real-time threads:   FIFO per priority level
    sched-other threads: FIFO independent of nice level

The wakeup is priority ordered. Highest priority level first.

Thanks,

	tglx

^ permalink raw reply

* Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
From: Andy Lutomirski @ 2015-01-23 23:59 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Konstantin Khlebnikov, Li Xi, Linux FS Devel,
	linux-ext4@vger.kernel.org, Linux API, Theodore Ts'o,
	Andreas Dilger, Jan Kara, Al Viro, Christoph Hellwig, dmonakhov,
	Eric W. Biederman
In-Reply-To: <20150123233026.GP16552@dastard>

On Fri, Jan 23, 2015 at 3:30 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Jan 23, 2015 at 02:58:09PM +0300, Konstantin Khlebnikov wrote:
>> On 23.01.2015 04:53, Dave Chinner wrote:
>> >On Thu, Jan 22, 2015 at 06:28:51PM +0300, Konstantin Khlebnikov wrote:
>> >>>+  kprojid = make_kprojid(&init_user_ns, (projid_t)projid);
>> >>
>> >>Maybe current_user_ns()?
>> >>This code should be user-namespace aware from the beginning.
>> >
>> >No, the code is correct. Project quotas have nothing to do with
>> >UIDs and so should never have been included in the uid/gid
>> >namespace mapping infrastructure in the first place.
>>
>> Right, but user-namespace provides id mapping for project-id too.
>> This infrastructure adds support for nested project quotas with
>> virtualized ids in sub-containers. I couldn't say that this is
>> must have feature but implementation is trivial because whole
>> infrastructure is already here.
>
> This is an extremely common misunderstanding of project IDs. Project
> IDs are completely separate to the UID/GID namespace.  Project
> quotas were originally designed specifically for
> accounting/enforcing quotas in situations where uid/gid
> accounting/enforcing is not possible. This design intent goes back
> 25 years - it predates XFS...
>
> IOWs, mapping prids via user namespaces defeats the purpose
> for which prids were originally intended for.
>
>> >Point in case: directory subtree quotas can be used as a resource
>> >controller for limiting space usage within separate containers that
>> >share the same underlying (large) filesystem via mount namespaces.
>>
>> That's exactly my use-case: 'sub-volumes' for containers with
>> quota for space usage/inodes count.
>
> That doesn't require mapped project IDs. Hard container space limits
> can only be controlled by the init namespace, and because inodes can
> hold only one project ID the current ns cannot be allowed to change
> the project ID on the inode because that allows them to escape the
> resource limits set on the project ID associated with the sub-mount
> set up by the init namespace...
>
> i.e.
>
> /mnt                    prid = 0, default for entire fs.
> /mnt/container1/        prid = 1, inherit, 10GB space limit
> /mnt/container2/        prid = 2, inherit, 50GB space limit
> .....
> /mnt/containerN/        prid = N, inherit, 20GB space limit
>
> And you clone the mount namespace for each container so the root is
> at the appropriate /mnt/containerX/.  Now the containers have a
> fixed amount of space they can use in the parent filesystem they
> know nothing about, and it is enforced by directory subquotas
> controlled by the init namespace.  This "fixed amount of space" is
> reflected in the container namespace when "df" is run as it will
> report the project quota space limits. Adding or removing space to a
> container is as simple as changing the project quota limits from the
> init namespace. i.e. an admin operation controlled by the host, not
> the container....
>
> Allowing the container to modify the prid and/or the inherit bit of
> inodes in it's namespace then means the user can define their own
> space usage limits, even turn them off. It's not a resource
> container at that point because the user can define their own
> limits.  Hence, only if the current_ns cannot change project quotas
> will we have a hard fence on space usage that the container *cannot
> exceed*.

I think I must be missing something simple here.  In a hypothetical
world where the code used nsown_capable, if an admin wants to stick a
container in /mnt/container1 with associated prid 1 and a userns,
shouldn't it just map only prid 1 into the user ns?  Then a user in
that userns can't try to change the prid of a file to 2 because the
number "2" is unmapped for that user and translation will fail.

--Andy

^ permalink raw reply

* Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
From: Dave Chinner @ 2015-01-23 23:30 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Li Xi, linux-fsdevel, linux-ext4, linux-api, tytso, adilger, jack,
	viro, hch, dmonakhov, Eric W. Biederman
In-Reply-To: <54C23751.7000009@yandex-team.ru>

On Fri, Jan 23, 2015 at 02:58:09PM +0300, Konstantin Khlebnikov wrote:
> On 23.01.2015 04:53, Dave Chinner wrote:
> >On Thu, Jan 22, 2015 at 06:28:51PM +0300, Konstantin Khlebnikov wrote:
> >>>+	kprojid = make_kprojid(&init_user_ns, (projid_t)projid);
> >>
> >>Maybe current_user_ns()?
> >>This code should be user-namespace aware from the beginning.
> >
> >No, the code is correct. Project quotas have nothing to do with
> >UIDs and so should never have been included in the uid/gid
> >namespace mapping infrastructure in the first place.
> 
> Right, but user-namespace provides id mapping for project-id too.
> This infrastructure adds support for nested project quotas with
> virtualized ids in sub-containers. I couldn't say that this is
> must have feature but implementation is trivial because whole
> infrastructure is already here.

This is an extremely common misunderstanding of project IDs. Project
IDs are completely separate to the UID/GID namespace.  Project
quotas were originally designed specifically for
accounting/enforcing quotas in situations where uid/gid
accounting/enforcing is not possible. This design intent goes back
25 years - it predates XFS...

IOWs, mapping prids via user namespaces defeats the purpose
for which prids were originally intended for.

> >Point in case: directory subtree quotas can be used as a resource
> >controller for limiting space usage within separate containers that
> >share the same underlying (large) filesystem via mount namespaces.
> 
> That's exactly my use-case: 'sub-volumes' for containers with
> quota for space usage/inodes count.

That doesn't require mapped project IDs. Hard container space limits
can only be controlled by the init namespace, and because inodes can
hold only one project ID the current ns cannot be allowed to change
the project ID on the inode because that allows them to escape the
resource limits set on the project ID associated with the sub-mount
set up by the init namespace...

i.e.

/mnt			prid = 0, default for entire fs.
/mnt/container1/	prid = 1, inherit, 10GB space limit
/mnt/container2/	prid = 2, inherit, 50GB space limit
.....
/mnt/containerN/	prid = N, inherit, 20GB space limit

And you clone the mount namespace for each container so the root is
at the appropriate /mnt/containerX/.  Now the containers have a
fixed amount of space they can use in the parent filesystem they
know nothing about, and it is enforced by directory subquotas
controlled by the init namespace.  This "fixed amount of space" is
reflected in the container namespace when "df" is run as it will
report the project quota space limits. Adding or removing space to a
container is as simple as changing the project quota limits from the
init namespace. i.e. an admin operation controlled by the host, not
the container....

Allowing the container to modify the prid and/or the inherit bit of
inodes in it's namespace then means the user can define their own
space usage limits, even turn them off. It's not a resource
container at that point because the user can define their own
limits.  Hence, only if the current_ns cannot change project quotas
will we have a hard fence on space usage that the container *cannot
exceed*.

Yes, I know there are other use cases for project quotas *within* a
container as controlled by the user (same as existing project quota
usages), but we don't have the capability of storing multiple
project IDs on each inode, nor accounting/enforcement across
multiple project IDs on an inode. Nor, really, do we want to (on
disk format changes required) and hence we can have one or the
other but not both.

Further, in a containerised system, providing the admin with a
trivial and easy to manage mechanism to provide hard limits on
shared filesystem space usage of each container is far more
important than catering to the occasional user who might have a need
for project quotas inside a container.

These are the points I brought up when I initially reviewed the user
namespace patches - the userns developer ignored my concerns and the
code was merged without acknowledging them, let alone addressing
them.

As we (the XFS guys) have no way of knowing when such a distinction
should be made, and with the user ns developers being completely
unresponsive on the subject, we made the decision ourselves.  Our
only concern was to be consistent, safe and predictable and that
means we choose to only allow project quotas to be used as an
external container resource hardwall limit and hence *never* allow
access to project quotas inside container namespaces.

That's the long and the short of it. project IDs are independent of
user IDs and they cannot be sanely used both inside and outside user
namespaces at the same time. Hence they should never have been
included in the user namespace mappings in the first place.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply

* Re: [PATCH] crypto: algif - change algif_skcipher to be asynchronous
From: Tadeusz Struk @ 2015-01-23 22:40 UTC (permalink / raw)
  To: Herbert Xu
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	qat-linux-ral2JQCrhuEAvxtiuMwx3w,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150116020052.GB5851-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>

On 01/15/2015 06:00 PM, Herbert Xu wrote:
>> But then would you like to extend AIO interface to take the IV and
>> > something that would indicate the encrypt/decrypt operation on
>> > aio_write()? Also as far as I can see AIO doesn't support splice()
> Any metadata such as the IV can still go through the existing
> sendmsg interface, just as you would do a sendmsg before a sendfile
> to set things up.
> 
>> > operation for zero copy, which is the main thing here.
> The AIO interface itself can accomodate zero-copy.  It's just that
> we currently don't have any support for it in the network socket
> API.
> 
Hi,
Ok, It looks to me that we *do* have all we need to implement zero copy
and AIO with algif_skcipher. The only thing we need to do is to add
support for it in skcipher_recvmsg(). I think no change is required in
neither skcipher_sendmsg(), skcipher_sendpage(), nor the if_alg interface.
Then to start using the interface asynchronously an application will
need to call aio_read() or lio_listio() instead of read(), but if
someone will use read(), then it will still work in the same
(synchronous) way as it is today.
How does this sound to you, Herbert?
I'll send a v2 shortly.
Thanks,
Tadeusz

^ permalink raw reply

* Re: futex(2) man page update help request
From: Torvald Riegel @ 2015-01-23 18:29 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Thomas Gleixner, Carlos O'Donell, Darren Hart, Ingo Molnar,
	Jakub Jelinek, linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	lkml, Davidlohr Bueso, Arnd Bergmann, Steven Rostedt,
	Peter Zijlstra, Linux API, Darren Hart, Anton Blanchard,
	Petr Baudis, Eric Dumazet, bill o gallmeister, Jan Kiszka,
	Daniel Wagner, Rich Felker
In-Reply-To: <54B7D87C.3090901-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Thu, 2015-01-15 at 16:10 +0100, Michael Kerrisk (man-pages) wrote:
> [Adding a few people to CC that have expressed interest in the 
> progress of the updates of this page, or who may be able to
> provide review feedback. Eventually, you'll all get CCed on
> the new draft of the page.]
> 
> Hello Thomas,
> 
> On 05/15/2014 04:14 PM, Thomas Gleixner wrote:
> > On Thu, 15 May 2014, Michael Kerrisk (man-pages) wrote:
> >> And that universe would love to have your documentation of 
> >> FUTEX_WAKE_BITSET and FUTEX_WAIT_BITSET ;-),
> > 
> > I give you almost the full treatment, but I leave REQUEUE_PI to
> > Darren and FUTEX_WAKE_OP to Jakub. :)
> 
> Thank you for the great effort you put into compiling the
> text below, and apologies for my long delay in following up.
> 
> I've integrated almost all of your suggestions into the 
> manual page. I will shortly send out a new draft of the
> page that contains various FIXMEs for points that remain 
> unclear.

Michael, thanks for working on the draft!  I'll review the draft closely
once you've sent it (or have I missed it?).

There are a few things that I'd like to see covered.

First, we should discuss that users, until they control all code in the
respective process, need to expect futexes to be affected by spurious
futex_wake calls; see https://lkml.org/lkml/2014/11/27/472 for
background and Linus' choice (AFAIU) to just document this.
So, for example, a return code of 0 for FUTEX_WAIT can mean either being
woken up by a FUTEX_WAKE intended for this futex, or a stale one
intended for another futex used by, for example, glibc internally.
(Note that as explained in this thread, this isn't just a glibc
artifact, but a result of the general futex design mixed with
destruction requirements for mutexes and other constructs in C++11 and
POSIX.)
It might also be necessary to further consider this when documenting the
errors, because it does affect how to handle them. See this for a glibc
perspective:
https://sourceware.org/ml/libc-alpha/2014-09/msg00381.html

Second, the current documentation for EINTR is that it can happen due to
receiving a signal *or* due to a spurious wake-up.  This is difficult to
handle when implementing POSIX semaphores, because they require that
EINTR is returned from SEM_WAIT if and only if the interruption was due
to a signal.  Thus, if FUTEX_WAIT returns EINTR, the semaphore
implementation can't return EINTR from sem_wait; see this for more
comments, including some discussion why use cases relying on the POSIX
requirement around EINTR are borderline timing-dependent:
https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/sem_waitcommon.c;h=96848d7ac5b6f8f1f3099b422deacc09323c796a;hb=HEAD#l282
Others have commented that aio_suspend has a similar issue; if EINTR
wouldn't in fact be returned spuriously, the POSIX-implementation-side
would get easier.

Third, I think it would be useful to -- somewhere -- explain which
behavior the futex operations would have conceptually when expressed by
C11 code.  We currently say that they wake up, sleep, etc, and which
values they return.  But we never say how to properly synchronize with
them on the userspace side.  The C11 memory model is probably the best
model to use on the userspace side, so that's why I'm arguing for this.
Basically, I think we need to (1) tell people that they should use
memory_order_relaxed accesses to the futex variable (ie, the memory
location associated with the whole futex construct on the kernel side --
or do we have another name for this?), and (2) give some conceptual
guarantees for the kernel-side synchronization so that one use this to
derive how to use them correctly in userspace.

The man pages might not be the right place for this, and maybe we just
need a revision of "Futexes are tricky".  If you have other suggestions
for where to document this, or on the content, let me know.  (I'm also
willing to spend time on this :) ).

Torvald

--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: futex(2) man page update help request
From: Torvald Riegel @ 2015-01-23 18:19 UTC (permalink / raw)
  To: Darren Hart
  Cc: Michael Kerrisk (man-pages), Thomas Gleixner, Carlos O'Donell,
	Ingo Molnar, Jakub Jelinek,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, lkml,
	Davidlohr Bueso, Arnd Bergmann, Steven Rostedt, Peter Zijlstra,
	Linux API, Darren Hart, Anton Blanchard, Petr Baudis,
	Eric Dumazet, bill o gallmeister, Jan Kiszka, Daniel Wagner,
	Rich Felker
In-Reply-To: <D0DEECA2.B7EAD%dvhart-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>

On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
> On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
> <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> 
> >Color me stupid, but I can't see this in futex_requeue(). Where is that
> >check that is "independent of the requeue type (normal/pi)"?
> >
> >When I look through futex_requeue(), all the likely looking sources
> >of EINVAL are governed by a check on the 'requeue_pi' argument.
> 
> 
> Right, in the non-PI case, I believe there are valid use cases: move to
> the back of the FIFO, for example (OK, maybe the only example?).

But we never guarantee a futex is a FIFO, or do we?  If we don't, then
such a requeue could be implemented as a no-op by the kernel, which
would sort of invalidate the use case.

(And I guess we don't want to guarantee FIFO behavior for futexes.)


--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v4 3/3] perf: Sample additional clock value
From: David Ahern @ 2015-01-23 18:05 UTC (permalink / raw)
  To: Pawel Moll, Peter Zijlstra, John Stultz
  Cc: Richard Cochran, Steven Rostedt, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Christopher Covington,
	Namhyung Kim, Thomas Gleixner, Tomeu Vizoso,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1422032767.14076.151.camel-5wv7dgnIgG8@public.gmane.org>

On 1/23/15 10:06 AM, Pawel Moll wrote:
> As far as I understand (John?) POSIX timers can be used on any clockid?
> So it would be possible to obtain a dynamic clock id, for example for my
> exotic trace hardware (by any means necessary, like opening a char
> device) and create a timer firing every 1 ms (in the trace time domain).
> Than this event would be somehow associated with a perf session (for
> example, by passing the timerid via perf's ioctl) and then, every when
> timer fires, a perf record (something like PERF_RECORD_TIMER?)
> containing the timer/clock's value*and*  the normal perf timestamp,
> would be injected into the circular buffer.

Like this: https://lkml.org/lkml/2011/2/27/158 ? note the date -- 4 
years ago. This is has been dragging on for a long time.

A few problems with that approach:
1. I would like to see a sample generated immediately to get the 
perf_clock -> timeofday correlation immediately rather than have to wait 
N (milli)seconds and have perf scan forward through an M-(giga)byte file 
looking for the one sample that gives the correlation.

I tried to address that problem with an ioctl to force a sample into the 
stream:
    https://lkml.org/lkml/2011/2/27/159
it did not go over very well.

2. there is a risk that the realtime samples dominate a stream.

Another issue that has been raised is updates to xtime by ntp / user. I 
have suggested tracepoints to catch those:
     https://lkml.org/lkml/2011/6/7/636
I don't believe there were ever any comments on the tracepoints.

David

^ permalink raw reply

* Re: [PATCH v4 3/3] perf: Sample additional clock value
From: Pawel Moll @ 2015-01-23 17:06 UTC (permalink / raw)
  To: Peter Zijlstra, John Stultz
  Cc: Richard Cochran, Steven Rostedt, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Christopher Covington,
	Namhyung Kim, David Ahern, Thomas Gleixner, Tomeu Vizoso,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20150105134514.GS30905-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>

On Mon, 2015-01-05 at 13:45 +0000, Peter Zijlstra wrote:
> Also, one would expect something like:
> 
> 	default: {
> 		struct k_clock *kc = clockid_to_kclock(event->attr.clock);
> 		struct timespec ts;
> 		if (kc) {
> 			kc->clock_get(event->attr.clock, &ts);
> 			data->clock = ktime_to_ns(timespec_to_ktime(ts));
> 		} else {
> 			data->clock = 0;
> 		}
> 	}
> 
> Albeit preferably slightly less horrible -- of course, one would first
> need to deal with the NMI issue.

I was thinking about it... Maybe the solution is approaching the problem
in a completely different way.

As far as I understand (John?) POSIX timers can be used on any clockid?
So it would be possible to obtain a dynamic clock id, for example for my
exotic trace hardware (by any means necessary, like opening a char
device) and create a timer firing every 1 ms (in the trace time domain).
Than this event would be somehow associated with a perf session (for
example, by passing the timerid via perf's ioctl) and then, every when
timer fires, a perf record (something like PERF_RECORD_TIMER?)
containing the timer/clock's value *and* the normal perf timestamp,
would be injected into the circular buffer.

No issue with NMI, no issue with passing clockid through
perf_event_attr...

Does it make any sense to anyone else but me? ;-)

Pawel

^ permalink raw reply

* Re: [PATCH 01/13] kdbus: add documentation
From: Greg Kroah-Hartman @ 2015-01-23 16:08 UTC (permalink / raw)
  To: Austin S Hemmelgarn
  Cc: David Herrmann, Michael Kerrisk (man-pages), Daniel Mack,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Jiri Kosina, Andy Lutomirski, Linux API,
	linux-kernel, Djalal Harouni, Johannes Stezenbach,
	Theodore T'so
In-Reply-To: <54C10DDC.9000503@gmail.com>

On Thu, Jan 22, 2015 at 09:49:00AM -0500, Austin S Hemmelgarn wrote:
> While I agree that there should be a way for userspace to get the list of
> supported operations, userspace apps will only actually care about that
> once, when they begin talking to kdbus, because (ignoring the live kernel
> patching that people have been working on recently) the list of supported
> operations isn't going to change while the system is running.  While a u64
> copy has relatively low overhead, it does have overhead, and that is very
> significant when you consider part of the reason some people want kdbus is
> for the performance gain.  Especially for those automotive applications that
> have been mentioned which fire off thousands of messages during start-up,
> every little bit of performance is significant.

A single u64 in a structure is not going to be measurable at all,
processors just copy memory too fast these days for 4 extra bytes to be
noticable.  So let's make this as easy as possible for userspace, making
it simpler logic there, which is much more important than saving
theoretical time in the kernel.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH 01/13] kdbus: add documentation
From: Greg Kroah-Hartman @ 2015-01-23 15:54 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Daniel Mack, arnd-r2nGTMty4D4, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	gnomes-qBU/x9rampVanCEyBjwyrvXRex20P6io, teg-B22kvLQNl6c,
	jkosina-AlSwsSmVLrQ, luto-kltTT9wpgjJwATOyAt5JVQ,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w, tixxdz-Umm1ozX2/EEdnm+yROfE0A,
	Johannes Stezenbach, Theodore T'so
In-Reply-To: <54C0CE8A.5080805-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Thu, Jan 22, 2015 at 11:18:50AM +0100, Michael Kerrisk (man-pages) wrote:
> >> And that process seems to be frequent and ongoing even now. (And 
> >> it's to your great credit that the API/ABI breaks are clearly and honestly 
> >> marked in the kdbus.h changelog.) All of this lightens the burden of API
> >> design for kernel developers, but I'm concerned that the long-term pain
> >> for user-space developers who use an API which (in my estimation) may
> >> come to be widely used will be enormous.
> > 
> > Yes, we've jointly reviewed the API details again until just recently to
> > unify structs and enums etc, and added fields to make the ioctls structs
> > more versatile for possible future additions. By that, we effectively
> > broke the ABI, but we did that because we know we can't do such things
> > again in the future.
> > 
> > But again - I don't see how this would be different when using syscalls
> > rather than ioctls to transport information between the driver and
> > userspace. Could you elaborate?
> 
> My suspicion is that not nearly enough thinking has yet been done about
> the design of the API. That's based on these observations:
> 
> * Documentation that is, considering the size of the API, *way* too thin.
> * Some parts of the API not documented at all (various kdbus_item blobs)
> * ABI changes happening even quite recently
> * API oddities such as the 'kernel_flags' fields. Why do I need to
>   be told what flags the kernel supports on *every* operation?
> 
> The above is just after a day of looking hard at kdbus.txt. I strongly
> suspect I'd find a lot of other issues if I spent more time on kdbus.

"not enough thinking"?

We started working on kdbus 2 years ago this FOSDEM (in a few weeks.)
Before that we have been thinking about this for many years, learning
from the previous attempts to get this type of feature merged into the
kernel, talking with users about what they need for this, and soliciting
kernel developer's opinions on what type of API would be best for this
type of feature.

Since then we have done nothing but constantly revise the API.  My first
mock ups were way too simple, and in discussing things with people much
more knowledgeable about D-Bus, they pointed out the problems, and we
iterated.  And iterated.  And iterated some more.  We have worked with
just about every userspace libdbus developer group, including QtDbus
developers as well as glib developers.  Now not all of them agreed with
some of our decisions in the implementation, which is fair enough, you
can't please everyone, but they _all_ agree that what we have now is the
proper way to implement this type of functionality and have reviewed the
features as being correct and compatible with their needs and users.

Those discussions have happened in emails, presentations, meetings, and
hackfests pretty much continuously for the past 2 years all around the
world.

We have stress-tested the api with both unit tests (which are included
here in the patch set) as well as a real-world implementation (sd-bus in
the systemd source repo.)  That real-world implementation successfully
has been booting many of our daily machines for many months now.

Yes, the documentation can always be better, but please don't confuse
the lack of understanding how D-Bus works and its model with the lack of
understanding this kdbus implementation, the two are not comparable.
For some good primers on what D-Bus is, and the terminology it expects
see:
	http://dbus.freedesktop.org/doc/dbus-tutorial.html
and also:
	http://dbus.freedesktop.org/doc/dbus-faq.html#other-ipc

We are not going to put a basic "here is what D-Bus is and how to use
it" into the kernel tree, that is totally outside the scope here.

I suggest reading the tutorial above, and then going back and reading
the kdbus documentation provided.  If you think we are lacking stuff on
the kdbus side, we will be glad to flush out any needed areas.

Also, Daniel has said he will work on a basic userspace "example"
library to show how to use this api, which might make the api a bit
easier to understand.

However, I personally don't think this "example code" is necessary at
all.  We don't ask for this type of "simple examples" from other new
kernel apis we create and add to the kernel all the time.  We require
there to be a user of the api, but not one that is so well documented
that others can write a from-scratch raw userspace replacement.
Specific examples of this are my previously mentioned ioctl users
(btrfs, mei, mic, openvlan, etc.), and the grand-daddy of all horrid
apis, DRM.

Users aren't going to be writing their own "raw kdbus" libraries.  Or if
they are, they are going to start with one of the existing
implementations we have (the test examples and sd-bus, and I think there
is a native Go implementation somewhere as well.)  Users are going to be
using those libraries to write their code, and to be honest, the user
api for sd-bus is a delight to use compared to the "old style" libdbus
interface as we have the benefit of 10 years of experience working with
D-Bus apis in the wild now to learn from past mistakes.

Back to the API.  We have taken review comments on the previous postings
of the code and reworked the API, moving it from a character device to
be a filesystem, which ended up making things a lot easier in the end, a
good example of a review process that is working.  Those changes are
a sign that our development review process works.  People pointed out
problems in our character api that we hadn't thought about from the
kernel implementation side.  And so we changed them and the code is
better and more robust because of it, a success story for our review
process.

Personally, I was the one that started down the character device node
path, so blame that design decision on me, not the other developers
here.  And I was wrong with that, but moving from character to a
filesystem wasn't a huge change, the structures and interactions all
remained almost identical, as the logic behind the API is, in my
opinion, correct for the problem it is addressing.

The 37 different developers who have contributed to this code base are
quite talented and skilled and experienced in user and kernel apis,
having implemented many apis of their own that users rely on every day.

Yes, we all make design mistakes, and you might not agree with some of
them, that's fine.  But it is flat out rude to say that we have not been
thinking about this, when I would guess that this is one of the largest
(in time and contributions) kernel development feature to be worked on
in the past few years.

And yes, I'm being very defensive, as I take this very seriously, so
please, don't so lightly dismiss us as not knowing what we are doing, as
frankly, we do.

Thanks for making it this far, I'll go back to technical discussions of
the API now, as that's what we should be doing, not casting aspirations
as to the aptitude of the people involved.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v7 2/2] tty/serial: Add Spreadtrum sc9836-uart driver support
From: Lyra Zhang @ 2015-01-23 13:39 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Chunyan Zhang,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, Mark Rutland,
	Arnd Bergmann,
	gnomes-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
	Pawel Moll,
	ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg@public.gmane.org,
	Kumar Gala, Grant Likely, jslaby-AlSwsSmVLrQ@public.gmane.org,
	Heiko Stübner, jason-NLaQJdtUoK4Be96aLqz0jA@public.gmane.org,
	florian.vaussard-p8DiymsW2f8@public.gmane.org,
	andrew-g2DYL2Zd6BY@public.gmane.org, Hayato Suzuki,
	antonynpavlov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, Shawn Guo,
	Orson Zhai, geng.ren-lxIno14LUO0EEoCn2XhGlw@public.gmane.org
In-Reply-To: <54C24D39.1080607-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org>

On Fri, Jan 23, 2015 at 9:31 PM, Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org> wrote:
> Hi Chunyan,
>
> Just the minor fix to zeroing the stack local in sprd_set_termios()
> and using dev_get_drvdata() in sprd_suspend()/sprd_resume().
>

ok, I see.

> Regards,
> Peter Hurley
>
> On 01/23/2015 08:01 AM, Chunyan Zhang wrote:
>> Add a full sc9836-uart driver for SC9836 SoC which is based on the
>> spreadtrum sharkl64 platform.
>> This driver also support earlycon.
>
> [...]
>
>> +static void sprd_set_termios(struct uart_port *port,
>> +                                 struct ktermios *termios,
>> +                                 struct ktermios *old)
>> +{
>> +     unsigned int baud, quot;
>> +     unsigned int lcr, fc;
>
>         unsigned int lcr = 0, fc;
>
>> +     unsigned long flags;
>> +
>> +     /* ask the core to calculate the divisor for us */
>> +     baud = uart_get_baud_rate(port, termios, old, 0, SPRD_BAUD_IO_LIMIT);
>> +
>> +     quot = (unsigned int)((port->uartclk + baud / 2) / baud);
>> +
>> +     /* set data length */
>> +     switch (termios->c_cflag & CSIZE) {
>> +     case CS5:
>> +             lcr |= SPRD_LCR_DATA_LEN5;
>> +             break;
>> +     case CS6:
>> +             lcr |= SPRD_LCR_DATA_LEN6;
>> +             break;
>> +     case CS7:
>> +             lcr |= SPRD_LCR_DATA_LEN7;
>> +             break;
>> +     case CS8:
>> +     default:
>> +             lcr |= SPRD_LCR_DATA_LEN8;
>> +             break;
>> +     }
>> +
>> +     /* calculate stop bits */
>> +     lcr &= ~(SPRD_LCR_STOP_1BIT | SPRD_LCR_STOP_2BIT);
>> +     if (termios->c_cflag & CSTOPB)
>> +             lcr |= SPRD_LCR_STOP_2BIT;
>> +     else
>> +             lcr |= SPRD_LCR_STOP_1BIT;
>> +
>> +     /* calculate parity */
>> +     lcr &= ~SPRD_LCR_PARITY;
>> +     termios->c_cflag &= ~CMSPAR;    /* no support mark/space */
>> +     if (termios->c_cflag & PARENB) {
>> +             lcr |= SPRD_LCR_PARITY_EN;
>> +             if (termios->c_cflag & PARODD)
>> +                     lcr |= SPRD_LCR_ODD_PAR;
>> +             else
>> +                     lcr |= SPRD_LCR_EVEN_PAR;
>> +     }
>
> [...]
>
>> +static int sprd_probe(struct platform_device *pdev)
>> +{
>> +     struct resource *res;
>> +     struct uart_port *up;
>> +     struct clk *clk;
>> +     int irq;
>> +     int index;
>> +     int ret;
>> +
>> +     for (index = 0; index < ARRAY_SIZE(sprd_port); index++)
>> +             if (sprd_port[index] == NULL)
>> +                     break;
>> +
>> +     if (index == ARRAY_SIZE(sprd_port))
>> +             return -EBUSY;
>> +
>> +     index = sprd_probe_dt_alias(index, &pdev->dev);
>> +
>> +     sprd_port[index] = devm_kzalloc(&pdev->dev,
>> +             sizeof(*sprd_port[index]), GFP_KERNEL);
>> +     if (!sprd_port[index])
>> +             return -ENOMEM;
>> +
>> +     pdev->id = index;
>
> see my recent reply regarding this and sprd_suspend/sprd_resume.
>
>> +
>> +     up = &sprd_port[index]->port;
>> +     up->dev = &pdev->dev;
>> +     up->line = index;
>> +     up->type = PORT_SPRD;
>> +     up->iotype = SERIAL_IO_PORT;
>> +     up->uartclk = SPRD_DEF_RATE;
>> +     up->fifosize = SPRD_FIFO_SIZE;
>> +     up->ops = &serial_sprd_ops;
>> +     up->flags = UPF_BOOT_AUTOCONF;
>> +
>> +     clk = devm_clk_get(&pdev->dev, NULL);
>> +     if (!IS_ERR(clk))
>> +             up->uartclk = clk_get_rate(clk);
>> +
>> +     res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> +     if (!res) {
>> +             dev_err(&pdev->dev, "not provide mem resource\n");
>> +             return -ENODEV;
>> +     }
>> +     up->mapbase = res->start;
>> +     up->membase = devm_ioremap_resource(&pdev->dev, res);
>> +     if (IS_ERR(up->membase))
>> +             return PTR_ERR(up->membase);
>> +
>> +     irq = platform_get_irq(pdev, 0);
>> +     if (irq < 0) {
>> +             dev_err(&pdev->dev, "not provide irq resource\n");
>> +             return -ENODEV;
>> +     }
>> +     up->irq = irq;
>> +
>> +     if (!sprd_ports_num) {
>> +             ret = uart_register_driver(&sprd_uart_driver);
>> +             if (ret < 0) {
>> +                     pr_err("Failed to register SPRD-UART driver\n");
>> +                     return ret;
>> +             }
>> +     }
>> +     sprd_ports_num++;
>> +
>> +     ret = uart_add_one_port(&sprd_uart_driver, up);
>> +     if (ret) {
>> +             sprd_port[index] = NULL;
>> +             sprd_remove(pdev);
>> +     }
>> +
>> +     platform_set_drvdata(pdev, up);
>> +
>> +     return ret;
>> +}
>> +
>> +static int sprd_suspend(struct device *dev)
>> +{
>> +     int id = to_platform_device(dev)->id;
>> +     struct uart_port *port = &sprd_port[id]->port;
>> +
>> +     uart_suspend_port(&sprd_uart_driver, port);
>
> same comment: see my recent reply using dev_get_drvdata().
>

ok, i see. thanks!

Chunyan

>> +
>> +     return 0;
>> +}
>> +
>> +static int sprd_resume(struct device *dev)
>> +{
>> +     int id = to_platform_device(dev)->id;
>> +     struct uart_port *port = &sprd_port[id]->port;
>> +
>> +     uart_resume_port(&sprd_uart_driver, port);
>> +
>> +     return 0;
>> +}
>> +
>
>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v6 2/2] tty/serial: Add Spreadtrum sc9836-uart driver support
From: Lyra Zhang @ 2015-01-23 13:32 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Chunyan Zhang,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, Mark Rutland,
	Arnd Bergmann,
	gnomes-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
	Shawn Guo, Pawel Moll,
	ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg@public.gmane.org,
	Kumar Gala, jslaby-AlSwsSmVLrQ@public.gmane.org, Grant Likely,
	Heiko Stübner, jason-NLaQJdtUoK4Be96aLqz0jA@public.gmane.org,
	florian.vaussard-p8DiymsW2f8@public.gmane.org,
	andrew-g2DYL2Zd6BY@public.gmane.org, Hayato Suzuki,
	antonynpavlov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, Orson Zhai,
	geng.ren-lxIno14LUO0EEoCn2XhGlw@public.gmane.org
In-Reply-To: <54C248D7.6040901-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org>

On Fri, Jan 23, 2015 at 9:12 PM, Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org> wrote:
> On 01/23/2015 02:23 AM, Lyra Zhang wrote:
>> Hi, Peter
>>
>> Many thanks to you for reviewing so carefully and giving us so many
>> suggestions and so clear explanations.
>
> :)
>
>> I'll address all of your comments and send an updated patch soon.
>>
>> On Fri, Jan 23, 2015 at 11:57 AM, Peter Hurley <peter-WaGBZJeGNqdsbIuE7sb01tBPR1lH4CV8@public.gmane.org> wrote:
>
> [...]
>
>>>> +static void sprd_set_termios(struct uart_port *port,
>>>> +                                 struct ktermios *termios,
>>>> +                                 struct ktermios *old)
>>>> +{
>>>> +     unsigned int baud, quot;
>>>> +     unsigned int lcr, fc;
>>>> +     unsigned long flags;
>>>> +
>>>> +     /* ask the core to calculate the divisor for us */
>>>> +     baud = uart_get_baud_rate(port, termios, old, 1200, 3000000);
>>>                                                       ^^^^   ^^^^^^
>>>                                            mabye derive these from uartclk?
>>
>> I'm afraid I can't understand very clearly, Could you explain more
>> details please?
>
> Is the fixed clock divider == 8 and the uartclk == 26000000 ?
> If so,
>         baud = uartclk / 8 = 3250000
>
> I see now this is clamping baud inside the maximum, so this is fine.
> Please disregard my comment.
>
> [...]
>
>
>>>> +static int sprd_probe(struct platform_device *pdev)
>>>> +{
>>>> +     struct resource *res;
>>>> +     struct uart_port *up;
>>>> +     struct clk *clk;
>>>> +     int irq;
>>>> +     int index;
>>>> +     int ret;
>>>> +
>>>> +     for (index = 0; index < ARRAY_SIZE(sprd_port); index++)
>>>> +             if (sprd_port[index] == NULL)
>>>> +                     break;
>>>> +
>>>> +     if (index == ARRAY_SIZE(sprd_port))
>>>> +             return -EBUSY;
>>>> +
>>>> +     index = sprd_probe_dt_alias(index, &pdev->dev);
>>>> +
>>>> +     sprd_port[index] = devm_kzalloc(&pdev->dev,
>>>> +             sizeof(*sprd_port[index]), GFP_KERNEL);
>>>> +     if (!sprd_port[index])
>>>> +             return -ENOMEM;
>>>> +
>>>> +     up = &sprd_port[index]->port;
>>>> +     up->dev = &pdev->dev;
>>>> +     up->line = index;
>>>> +     up->type = PORT_SPRD;
>>>> +     up->iotype = SERIAL_IO_PORT;
>>>> +     up->uartclk = SPRD_DEF_RATE;
>>>> +     up->fifosize = SPRD_FIFO_SIZE;
>>>> +     up->ops = &serial_sprd_ops;
>>>> +     up->flags = ASYNC_BOOT_AUTOCONF;
>>>                     ^^^^^^^^^
>>>                     UPF_BOOT_AUTOCONF
>>>
>>> sparse will catch errors like this. See Documentation/sparse.txt
>>
>> you mean we should use UPF_BOOT_AUTOCONF, right?
>
> Yes. Only UPF_* flag definitions should be used with the uart_port.flags
> field.
>
> My comment regarding the sparse tool and documentation is because the
> flags field and UPF_* definitions use a type mechanism to generate
> warnings using the sparse tool if regular integer values are used
> with the flags field.
>
> The type mechanism was specifically introduced to catch using ASYNC_*
> definitions with the uart_port.flags field.
>
> [...]
>
>>>> +static int sprd_suspend(struct device *dev)
>>>> +{
>>>> +     int id = to_platform_device(dev)->id;
>>>> +     struct uart_port *port = &sprd_port[id]->port;
>>>
>>> I'm a little confused regarding the port indexing;
>>> is platform_device->id == line ?  Where did that happen?
>>>
>>
>> Oh, I'll change to assign platform_device->id with port->line in probe()
>
> I apologize; I should have made my comment clearer.
>
> The ->id should not be assigned.
>
> Replace
>
>         int id = to_platform_device(dev)->id;
>         struct uart_port *port = &sprd_port[id]->port;
>
>         uart_suspend_port(&sprd_uart_driver, port);
>
> with
>
>         struct sprd_uart_port *sup = dev_get_drvdata(dev);
>
>         uart_suspend_port(&sprd_uart_driver, &sup->port);
>
>
> I know it's not obvious but platform_get/set_drvdata() is really
> dev_get/set_drvdata() using the embedded struct device dev.
>

I've just sent the v7 before I saw your this email, but I'll address
this point in the next version.

Thanks a lot,
Chunyan

>>
>>>
>>>> +
>>>> +     uart_suspend_port(&sprd_uart_driver, port);
>>>> +
>>>> +     return 0;
>>>> +}
>>>> +
>>>> +static int sprd_resume(struct device *dev)
>>>> +{
>>>> +     int id = to_platform_device(dev)->id;
>>>> +     struct uart_port *port = &sprd_port[id]->port;
>>>> +
>>>> +     uart_resume_port(&sprd_uart_driver, port);
>
> same here
>
>>>> +     return 0;
>

^ permalink raw reply

* Re: [PATCH v7 2/2] tty/serial: Add Spreadtrum sc9836-uart driver support
From: Peter Hurley @ 2015-01-23 13:31 UTC (permalink / raw)
  To: Chunyan Zhang, gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, mark.rutland-5wv7dgnIgG8,
	arnd-r2nGTMty4D4, gnomes-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	pawel.moll-5wv7dgnIgG8, ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg,
	galak-sgV2jX0FEOL9JmXXK+q4OQ, grant.likely-QSEj5FYQhm4dnm+yROfE0A
  Cc: jslaby-AlSwsSmVLrQ, heiko-4mtYJXux2i+zQB+pC5nmwQ,
	jason-NLaQJdtUoK4Be96aLqz0jA, florian.vaussard-p8DiymsW2f8,
	andrew-g2DYL2Zd6BY, hytszk-Re5JQEeQqe8AvxtiuMwx3w,
	antonynpavlov-Re5JQEeQqe8AvxtiuMwx3w,
	shawn.guo-QSEj5FYQhm4dnm+yROfE0A,
	orsonzhai-Re5JQEeQqe8AvxtiuMwx3w, geng.ren-lxIno14LUO0EEoCn2XhGlw,
	zhizhou.zhang-lxIno14LUO0EEoCn2XhGlw,
	lanqing.liu-lxIno14LUO0EEoCn2XhGlw,
	zhang.lyra-Re5JQEeQqe8AvxtiuMwx3w,
	wei.qiao-lxIno14LUO0EEoCn2XhGlw,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-serial-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <1422018066-24997-3-git-send-email-chunyan.zhang-lxIno14LUO0EEoCn2XhGlw@public.gmane.org>

Hi Chunyan,

Just the minor fix to zeroing the stack local in sprd_set_termios()
and using dev_get_drvdata() in sprd_suspend()/sprd_resume().

Regards,
Peter Hurley

On 01/23/2015 08:01 AM, Chunyan Zhang wrote:
> Add a full sc9836-uart driver for SC9836 SoC which is based on the
> spreadtrum sharkl64 platform.
> This driver also support earlycon.

[...]

> +static void sprd_set_termios(struct uart_port *port,
> +				    struct ktermios *termios,
> +				    struct ktermios *old)
> +{
> +	unsigned int baud, quot;
> +	unsigned int lcr, fc;

	unsigned int lcr = 0, fc;

> +	unsigned long flags;
> +
> +	/* ask the core to calculate the divisor for us */
> +	baud = uart_get_baud_rate(port, termios, old, 0, SPRD_BAUD_IO_LIMIT);
> +
> +	quot = (unsigned int)((port->uartclk + baud / 2) / baud);
> +
> +	/* set data length */
> +	switch (termios->c_cflag & CSIZE) {
> +	case CS5:
> +		lcr |= SPRD_LCR_DATA_LEN5;
> +		break;
> +	case CS6:
> +		lcr |= SPRD_LCR_DATA_LEN6;
> +		break;
> +	case CS7:
> +		lcr |= SPRD_LCR_DATA_LEN7;
> +		break;
> +	case CS8:
> +	default:
> +		lcr |= SPRD_LCR_DATA_LEN8;
> +		break;
> +	}
> +
> +	/* calculate stop bits */
> +	lcr &= ~(SPRD_LCR_STOP_1BIT | SPRD_LCR_STOP_2BIT);
> +	if (termios->c_cflag & CSTOPB)
> +		lcr |= SPRD_LCR_STOP_2BIT;
> +	else
> +		lcr |= SPRD_LCR_STOP_1BIT;
> +
> +	/* calculate parity */
> +	lcr &= ~SPRD_LCR_PARITY;
> +	termios->c_cflag &= ~CMSPAR;	/* no support mark/space */
> +	if (termios->c_cflag & PARENB) {
> +		lcr |= SPRD_LCR_PARITY_EN;
> +		if (termios->c_cflag & PARODD)
> +			lcr |= SPRD_LCR_ODD_PAR;
> +		else
> +			lcr |= SPRD_LCR_EVEN_PAR;
> +	}

[...]

> +static int sprd_probe(struct platform_device *pdev)
> +{
> +	struct resource *res;
> +	struct uart_port *up;
> +	struct clk *clk;
> +	int irq;
> +	int index;
> +	int ret;
> +
> +	for (index = 0; index < ARRAY_SIZE(sprd_port); index++)
> +		if (sprd_port[index] == NULL)
> +			break;
> +
> +	if (index == ARRAY_SIZE(sprd_port))
> +		return -EBUSY;
> +
> +	index = sprd_probe_dt_alias(index, &pdev->dev);
> +
> +	sprd_port[index] = devm_kzalloc(&pdev->dev,
> +		sizeof(*sprd_port[index]), GFP_KERNEL);
> +	if (!sprd_port[index])
> +		return -ENOMEM;
> +
> +	pdev->id = index;

see my recent reply regarding this and sprd_suspend/sprd_resume.

> +
> +	up = &sprd_port[index]->port;
> +	up->dev = &pdev->dev;
> +	up->line = index;
> +	up->type = PORT_SPRD;
> +	up->iotype = SERIAL_IO_PORT;
> +	up->uartclk = SPRD_DEF_RATE;
> +	up->fifosize = SPRD_FIFO_SIZE;
> +	up->ops = &serial_sprd_ops;
> +	up->flags = UPF_BOOT_AUTOCONF;
> +
> +	clk = devm_clk_get(&pdev->dev, NULL);
> +	if (!IS_ERR(clk))
> +		up->uartclk = clk_get_rate(clk);
> +
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +	if (!res) {
> +		dev_err(&pdev->dev, "not provide mem resource\n");
> +		return -ENODEV;
> +	}
> +	up->mapbase = res->start;
> +	up->membase = devm_ioremap_resource(&pdev->dev, res);
> +	if (IS_ERR(up->membase))
> +		return PTR_ERR(up->membase);
> +
> +	irq = platform_get_irq(pdev, 0);
> +	if (irq < 0) {
> +		dev_err(&pdev->dev, "not provide irq resource\n");
> +		return -ENODEV;
> +	}
> +	up->irq = irq;
> +
> +	if (!sprd_ports_num) {
> +		ret = uart_register_driver(&sprd_uart_driver);
> +		if (ret < 0) {
> +			pr_err("Failed to register SPRD-UART driver\n");
> +			return ret;
> +		}
> +	}
> +	sprd_ports_num++;
> +
> +	ret = uart_add_one_port(&sprd_uart_driver, up);
> +	if (ret) {
> +		sprd_port[index] = NULL;
> +		sprd_remove(pdev);
> +	}
> +
> +	platform_set_drvdata(pdev, up);
> +
> +	return ret;
> +}
> +
> +static int sprd_suspend(struct device *dev)
> +{
> +	int id = to_platform_device(dev)->id;
> +	struct uart_port *port = &sprd_port[id]->port;
> +
> +	uart_suspend_port(&sprd_uart_driver, port);

same comment: see my recent reply using dev_get_drvdata().

> +
> +	return 0;
> +}
> +
> +static int sprd_resume(struct device *dev)
> +{
> +	int id = to_platform_device(dev)->id;
> +	struct uart_port *port = &sprd_port[id]->port;
> +
> +	uart_resume_port(&sprd_uart_driver, port);
> +
> +	return 0;
> +}
> +


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 01/13] kdbus: add documentation
From: Greg Kroah-Hartman @ 2015-01-23 13:29 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: arnd-r2nGTMty4D4, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	gnomes-qBU/x9rampVanCEyBjwyrvXRex20P6io, teg-B22kvLQNl6c,
	jkosina-AlSwsSmVLrQ, luto-kltTT9wpgjJwATOyAt5JVQ,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	daniel-cYrQPVfZooxQFI55V6+gNQ, dh.herrmann-Re5JQEeQqe8AvxtiuMwx3w,
	tixxdz-Umm1ozX2/EEdnm+yROfE0A, Michael Kerrisk (man-pages),
	Linus Torvalds, Daniel Mack
In-Reply-To: <20150123131946.GA26302-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>

On Fri, Jan 23, 2015 at 09:19:46PM +0800, Greg Kroah-Hartman wrote:
> On Fri, Jan 23, 2015 at 08:28:20AM +0200, Ahmed S. Darwish wrote:
> > On Fri, Jan 16, 2015 at 11:16:05AM -0800, Greg Kroah-Hartman wrote:
> > > From: Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
> > > 
> > > kdbus is a system for low-latency, low-overhead, easy to use
> > > interprocess communication (IPC).
> > > 
> > > The interface to all functions in this driver is implemented via ioctls
> > > on files exposed through a filesystem called 'kdbusfs'. The default
> > > mount point of kdbusfs is /sys/fs/kdbus.
> > 
> > Pardon my ignorance, but we've always been told that adding
> > new ioctl()s to the kernel is a very big no-no.  But given
> > the seniority of the folks stewarding this kdbus effort,
> > there must be a good rationale ;-)
> > 
> > So, can the rationale behind introducing new ioctl()s be
> > further explained? It would be even better if it's included
> > in the documentation patch itself.
> 
> The main reason to use an ioctl is that you want to atomically set
> and/or get something "complex" through the user/kernel boundary.  For
> simple device attributes, sysfs works great, for configuring devices,
> configfs works great, but for data streams / structures / etc. an ioctl
> is the correct thing to use.
> 
> Examples of new ioctls being added to the kernel are all over the
> place, look at all of the special-purpose ioctls the filesystems keep
> creating (they aren't adding new syscalls), look at the monstrosity that
> is the DRM layer, look at other complex things like openvswitch, or
> "simpler" device-specific interfaces like the MEI one, or even more
> complex ones like the MMC interface.

Oops, I meant, MIC, not MMC, sorry about that.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox