Linux userland API discussions
* Re: [PATCH v5 3/8] crypto: AF_ALG: add AEAD support
From: Stephan Mueller @ 2014-12-24  8:54 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Borkmann, 'Quentin Gouchet', 'LKML',
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20141223202401.GA2474-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>

On Wednesday, 24 December 2014, 07:24:01, Herbert Xu wrote:

Hi Herbert,

> On Tue, Dec 23, 2014 at 03:52:27PM +0100, Stephan Mueller wrote:
> > On Tuesday, 23 December 2014, 22:56:26, Herbert Xu wrote:
> > > In fact AEAD is rather awkward because you need to do everything
> > > in one go.  Perhaps we could adapt our kernel interface to allow
> > > partial AEAD operations?
> > 
> > I am not sure what you are referring to. The invocation does not need to
> > be in one go. You can have arbitrary number of sendmsg calls. But all
> > input data needs to be supplied before you call recvmsg.
> 
> What I mean is that unlike skcipher we cannot proceed until we
> have the complete input.  So you cannot begin recvmsg until all
> input has been sent.

That is right, but isn't that the nature of AEAD ciphers in general? Even if 
you are in the kernel, you need to have all scatter lists together for one 
invocation of the AEAD cipher.

In case of a threaded application, recvmsg does not start until all data
is in, which is signalled by a sendmsg call without MSG_MORE -- see
aead_readable.

All we can do is allow the user to use multiple system calls to collect all 
data before the AEAD operation takes place.

Or do you see another way to invoke the AEAD operation?

The only item that I see that could be improved is the output side:
currently the code allows exactly one iovec to point to the output buffer.
I would like to allow multiple iovec buffers that are filled with the
output of one invocation of the AEAD operation. However, to avoid a
kernel-internal scratch buffer, I would need to somehow link the
kernel-internal scatterlists with the iovec buffers. That only works by
walking the iovec list first, calling af_alg_make_sg for every iovec
entry to create the kernel-internal scatterlist representation. That is
followed by the AEAD operation on the scatterlist.

If we agree on walking the iovec list first, then the question arises how many 
iovec list entries we allow at max. Is 16 entries a sensible value?
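
To make the question concrete, here is a minimal user-space sketch of
walking an output iovec list with a fixed cap. It is only a model: it
copies from a flat buffer, which is exactly the scratch buffer the kernel
code wants to avoid, whereas the kernel version would map each entry into
a scatterlist instead. The names sketch_scatter_output and
SKETCH_MAX_OUT_IOV are made up for illustration, and the cap of 16 is
simply the value asked about above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical cap on output iovec entries (the value under discussion). */
#define SKETCH_MAX_OUT_IOV 16

/*
 * Distribute `len` bytes of cipher output over the caller's iovec array,
 * filling the entries in order. Returns the number of bytes written, or
 * -1 if the list exceeds the cap or is too small for the output.
 */
static ssize_t sketch_scatter_output(const struct iovec *iov, size_t iovcnt,
				     const unsigned char *out, size_t len)
{
	size_t total = 0, done = 0, i;

	assert(iov != NULL && out != NULL);

	if (iovcnt > SKETCH_MAX_OUT_IOV)
		return -1;

	/* First pass: walk the whole list to learn the available space. */
	for (i = 0; i < iovcnt; i++)
		total += iov[i].iov_len;
	if (total < len)
		return -1;

	/* Second pass: fill the entries one after another. */
	for (i = 0; i < iovcnt && done < len; i++) {
		size_t n = iov[i].iov_len;

		if (n > len - done)
			n = len - done;
		memcpy(iov[i].iov_base, out + done, n);
		done += n;
	}
	return (ssize_t)done;
}
```

Checking the entry count before touching any entry keeps the cost of a
rejected request constant, which is one argument for a small fixed limit.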

-- 
Ciao
Stephan


* Re: [PATCH RESEND v4] sched/fair: Add advisory flag for borrowing a timeslice
From: Rik van Riel @ 2014-12-23 22:33 UTC (permalink / raw)
  To: Khalid Aziz, Ingo Molnar
  Cc: Thomas Gleixner, Peter Zijlstra, corbet-T1hC0tSOHrs,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	rientjes-hpIqsD4AKlfQT0dZR+AlfA, ak-VuQAYsv1563Yd54FQh9/CA,
	mgorman-l3A5Bk7waGM, raistlin-k2GhghHVRtY,
	kirill.shutemov-VuQAYsv1563Yd54FQh9/CA,
	atomlin-H+wXaHxf7aLQT0dZR+AlfA, avagin-GEFAQzZX7r8dnm+yROfE0A,
	gorcunov-GEFAQzZX7r8dnm+yROfE0A,
	serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw, athorlton-sJ/iWh9BUns,
	oleg-H+wXaHxf7aLQT0dZR+AlfA, vdavydov-bzQdu9zFT3WakBO8gow8eQ,
	daeseok.youn-Re5JQEeQqe8AvxtiuMwx3w,
	keescook-F7+t8E8rja9g9hUCZPvPmw,
	yangds.fnst-BthXqXjhjHXQFUHtdCDX3A, sbauer-F61uvSdQLzf2fBVCVOL8/A,
	vishnu.ps-Sze3O3UU22JBDgjK7y7TUQ, axboe-b10kYP2dOMg,
	paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <5499D4D7.90109-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/23/2014 03:47 PM, Khalid Aziz wrote:

>> You are right. Uncontended futex is very fast since it never goes
>> into kernel. Queuing problem happens when the lock holder has
>> been pre-empted. Adaptive spinning does the smart thing of
>> spin-waiting only if the lock holder is still running on another
>> core. If lock holder is not scheduled on any core, even adaptive
>> spinning has to go into the kernel to be put on wait queue. What
>> would avoid queuing problem and reduce the cost of contention is
>> a combination of adaptive spinning, and a way to keep the lock
>> holder running on one of the cores just a little longer so it can
>> release the lock. Without creating special case and a new API in
>> kernel, one way I can think of accomplishing the second part is
>> to boost the priority of lock holder when contention happens and 
>> priority ceiling is meant to do exactly that. Priority ceiling 
>> implementation in glibc boosts the priority by calling into
>> scheduler which does incur the cost of a system call. Priority
>> boost is a reliable solution that does not change scheduling
>> semantics. The solution allowing lock holder to use one extra
>> timeslice is not a definitive solution but tpcc workload shows it
>> does work and it works without requiring changes to database
>> locking code.
> 
>> Theoretically a new locking library that uses both these
>> techniques will help solve the problem but being a new locking
>> library, there is a big unknown of what new problems, performance
>> and otherwise, it will bring and database has to recode to this
>> new library. Nevertheless this is the path I am exploring now.
>> The challenge being how to do this without requiring changes to
>> database code or the kernel. The hooks available to me into
>> current database code are schedctl_init(), schedctl_start() and 
>> schedctl_stop() which are no-op on Linux at this time.

That sounds like a feature.  Keep the uncontended operations fast
by not doing anything, and only slow down when there is contention.

Presumably the database people will optimize their code to avoid
contention, so any complexity can happen in the slow path, instead
of by adding things to the fast path...

- -- 
All rights reversed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJUme3NAAoJEM553pKExN6DD8gH/3am5Izrobk/AiN8sijg3YXA
a9orVuoWNE+BLt49PwWrYpjsR2AgN4G3BbUrb4GVhaFBL5/v/frUhk0As3w3uM21
QjxMtaFvqZviLWCFgtIna7zSxHom+v/eRiAjLtCoX+GtHs+t25Jyf1GowmZnkoNd
UtDPHPXmyA2CqZC0E9d53Uzb9XaP/T4G3J8U2aPSvwoj4Nw85H2S/QMptNQEJDjY
0Qpx/fv2Ze/gJ7GujU3gloX6cH5DDU+p9/pFZ7iDEB6jbbb384Zuacq6R6CeJMVB
EAxKW1tpFtPvaRC51x8TFNJY5FxSISbXKbehxKjXQ8rlkcM/k1euzo2KCKOp68w=
=cTlU
-----END PGP SIGNATURE-----


* Re: [PATCH RESEND v4] sched/fair: Add advisory flag for borrowing a timeslice
From: Khalid Aziz @ 2014-12-23 20:47 UTC (permalink / raw)
  To: Rik van Riel, Ingo Molnar
  Cc: Thomas Gleixner, Peter Zijlstra, corbet, mingo, hpa, akpm,
	rientjes, ak, mgorman, raistlin, kirill.shutemov, atomlin, avagin,
	gorcunov, serge.hallyn, athorlton, oleg, vdavydov, daeseok.youn,
	keescook, yangds.fnst, sbauer, vishnu.ps, axboe, paulmck,
	linux-kernel, linux-doc, linux-api
In-Reply-To: <5499B8A2.4080008@redhat.com>

On 12/23/2014 11:46 AM, Rik van Riel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 12/23/2014 10:13 AM, Khalid Aziz wrote:
>> On 12/23/2014 03:52 AM, Ingo Molnar wrote:
>>>
>>>
>>> to implement what Thomas suggested in the discussion: a proper
>>> futex like spin mechanism? That looks like a totally acceptable
>>> solution to me, without the disadvantages of your proposed
>>> solution.
>>
>> Hi Ingo,
>>
>> Thank you for taking the time to respond. It is indeed possible to
>> implement a futex like spin mechanism. Futex like mechanism will
>> be clean and elegant. That is where I had started when I was given
>> this problem to solve. Trouble I run into is the primary
>> application I am looking at to help with this solution is Database
>> which implements its own locking mechanism without using POSIX
>> semaphore or futex. Since the locking is entirely in userspace,
>> kernel has no clue when the userspace has acquired one of these
>> locks. So I can see only two ways to solve this - find a solution
>> in userspace entirely, or have userspace tell the kernel when it
>> acquires one of these locks. I will spend more time on finding a
>> way to solve it in userspace and see if I can find a way to
>> leverage futex mechanism without causing significant change to
>> database code. There may be a way to use priority inheritance to
>> avoid contention. Database performance people tell me that their
>> testing has shown the cost of making any system calls in this code
>> easily offsets any gains from optimizing for contention avoidance,
>> so that is one big challenge. Database rewriting their locking code
>> is extremely unlikely scenario. Am I missing a third option here?
>
> An uncontended futex is taken without ever going into kernel
> space. Adaptive spinning allows short duration futexes to be
> taken without going into kernel space.

You are right. Uncontended futex is very fast since it never goes into 
kernel. Queuing problem happens when the lock holder has been 
pre-empted. Adaptive spinning does the smart thing of spin-waiting only 
if the lock holder is still running on another core. If lock holder is 
not scheduled on any core, even adaptive spinning has to go into the 
kernel to be put on wait queue. What would avoid queuing problem and 
reduce the cost of contention is a combination of adaptive spinning, and 
a way to keep the lock holder running on one of the cores just a little 
longer so it can release the lock. Without creating special case and a 
new API in kernel, one way I can think of accomplishing the second part 
is to boost the priority of lock holder when contention happens and 
priority ceiling is meant to do exactly that. Priority ceiling 
implementation in glibc boosts the priority by calling into scheduler 
which does incur the cost of a system call. Priority boost is a reliable 
solution that does not change scheduling semantics. The solution 
allowing lock holder to use one extra timeslice is not a definitive 
solution but tpcc workload shows it does work and it works without 
requiring changes to database locking code.

Theoretically a new locking library that uses both these techniques will 
help solve the problem but being a new locking library, there is a big 
unknown of what new problems, performance and otherwise, it will bring 
and database has to recode to this new library. Nevertheless this is the 
path I am exploring now. The challenge being how to do this without 
requiring changes to database code or the kernel. The hooks available to 
me into current database code are schedctl_init(), schedctl_start() and 
schedctl_stop() which are no-op on Linux at this time. Database folks 
can replace these no-ops with real code in their library to solve the 
queuing problem. schedctl_start() and schedctl_stop() are called only 
when one of the highly contended locks is acquired or released. 
schedctl_start() is called after the lock has been acquired which means 
I can not rely upon it to solve contention issue. schedctl_stop() is 
called after the lock has been released.

Thanks,
Khalid

>
> Only long held locks cause a thread to go into kernel space,
> where it goes to sleep, freeing up the cpu, and increasing
> the chance that the lock holder will run.
>
> - --
> All rights reversed
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
>
> iQEcBAEBAgAGBQJUmbihAAoJEM553pKExN6DDlQH/1vvy9YYuP2dCAZSU3fz855e
> pj4796Qja929I2dStsbLl6Qhcg2ELtwtPkLoAePQ/4j2l7DCYgSNLXlC+RzQ32ay
> rbMIfwiriEVGp2hsvYTOCpnur19IHf7v726ivaDXVOM/nrRaHsB8wwspLQQyfSIE
> b7M7jxvT4S2pEELOGB6JQfEZZhbf5wBv9HBk+fkCBMaO4WZrnYczyD0/omiADm65
> xSm/8pCMK22u8Tzn9EpKpIVdIFrl9AlZ1uiRBV2Br1oqwaBTvJVknW4bvIk0DWZU
> ErwR/073UYKpl+xce3nbnixH8FeRP7/mq73Xd8e+iCgn6Dtzr1tANsu27EigMZ0=
> =WHb3
> -----END PGP SIGNATURE-----
>



* Re: [PATCH v5 3/8] crypto: AF_ALG: add AEAD support
From: Herbert Xu @ 2014-12-23 20:24 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, 'Quentin Gouchet', 'LKML',
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4537021.IXSvIIgcH4-PJstQz4BMNNP20K/wil9xYQuADTiUCJX@public.gmane.org>

On Tue, Dec 23, 2014 at 03:52:27PM +0100, Stephan Mueller wrote:
> On Tuesday, 23 December 2014, 22:56:26, Herbert Xu wrote:
>
> > In fact AEAD is rather awkward because you need to do everything
> > in one go.  Perhaps we could adapt our kernel interface to allow
> > partial AEAD operations?
> 
> 
> I am not sure what you are referring to. The invocation does not need to be in 
> one go. You can have arbitrary number of sendmsg calls. But all input data 
> needs to be supplied before you call recvmsg.

What I mean is that unlike skcipher we cannot proceed until we
have the complete input.  So you cannot begin recvmsg until all
input has been sent.

Cheers,
-- 
Email: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH RESEND v4] sched/fair: Add advisory flag for borrowing a timeslice
From: Rik van Riel @ 2014-12-23 18:46 UTC (permalink / raw)
  To: Khalid Aziz, Ingo Molnar
  Cc: Thomas Gleixner, Peter Zijlstra, corbet, mingo, hpa, akpm,
	rientjes, ak, mgorman, raistlin, kirill.shutemov, atomlin, avagin,
	gorcunov, serge.hallyn, athorlton, oleg, vdavydov, daeseok.youn,
	keescook, yangds.fnst, sbauer, vishnu.ps, axboe, paulmck,
	linux-kernel, linux-doc, linux-api
In-Reply-To: <5499867C.1010201@oracle.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/23/2014 10:13 AM, Khalid Aziz wrote:
> On 12/23/2014 03:52 AM, Ingo Molnar wrote:
>> 
>> 
>> to implement what Thomas suggested in the discussion: a proper 
>> futex like spin mechanism? That looks like a totally acceptable 
>> solution to me, without the disadvantages of your proposed 
>> solution.
> 
> Hi Ingo,
> 
> Thank you for taking the time to respond. It is indeed possible to 
> implement a futex like spin mechanism. Futex like mechanism will
> be clean and elegant. That is where I had started when I was given
> this problem to solve. Trouble I run into is the primary
> application I am looking at to help with this solution is Database
> which implements its own locking mechanism without using POSIX
> semaphore or futex. Since the locking is entirely in userspace,
> kernel has no clue when the userspace has acquired one of these
> locks. So I can see only two ways to solve this - find a solution
> in userspace entirely, or have userspace tell the kernel when it
> acquires one of these locks. I will spend more time on finding a
> way to solve it in userspace and see if I can find a way to 
> leverage futex mechanism without causing significant change to
> database code. There may be a way to use priority inheritance to
> avoid contention. Database performance people tell me that their
> testing has shown the cost of making any system calls in this code
> easily offsets any gains from optimizing for contention avoidance,
> so that is one big challenge. Database rewriting their locking code
> is extremely unlikely scenario. Am I missing a third option here?

An uncontended futex is taken without ever going into kernel
space. Adaptive spinning allows short duration futexes to be
taken without going into kernel space.

Only long held locks cause a thread to go into kernel space,
where it goes to sleep, freeing up the cpu, and increasing
the chance that the lock holder will run.
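
As a rough illustration of the three stages described above (fast path,
spinning, sleeping in the kernel), here is a minimal futex-based lock in
the style of the well-known three-state design. It is a sketch under
assumptions, not glibc's implementation: the sketch_* names, the spin
limit and the kernel_entries counter are invented for illustration, and
the bounded spin is only a stand-in for adaptive spinning, since plain
user space cannot see whether the lock holder is currently on a CPU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#define SKETCH_SPIN_LIMIT 100	/* arbitrary bound, an assumption */

struct sketch_lock {
	atomic_int state;		/* 0 free, 1 held, 2 held with waiters */
	atomic_long kernel_entries;	/* futex syscalls made, for illustration */
};

static long sketch_futex(struct sketch_lock *l, int op, int val)
{
	atomic_fetch_add(&l->kernel_entries, 1);
	return syscall(SYS_futex, (int *)&l->state, op, val, NULL, NULL, 0);
}

static void sketch_lock_acquire(struct sketch_lock *l)
{
	int c = 0, i;

	/* Fast path: an uncontended lock costs one atomic operation. */
	if (atomic_compare_exchange_strong(&l->state, &c, 1))
		return;

	/* Bounded spin, hoping the holder releases the lock shortly. */
	for (i = 0; i < SKETCH_SPIN_LIMIT; i++) {
		c = 0;
		if (atomic_compare_exchange_strong(&l->state, &c, 1))
			return;
	}

	/* Slow path: mark the lock contended and sleep in the kernel. */
	if (c != 2)
		c = atomic_exchange(&l->state, 2);
	while (c != 0) {
		sketch_futex(l, FUTEX_WAIT_PRIVATE, 2);
		c = atomic_exchange(&l->state, 2);
	}
}

static void sketch_lock_release(struct sketch_lock *l)
{
	assert(atomic_load(&l->state) != 0);

	/* Enter the kernel only if a waiter marked the lock contended. */
	if (atomic_fetch_sub(&l->state, 1) != 1) {
		atomic_store(&l->state, 0);
		sketch_futex(l, FUTEX_WAKE_PRIVATE, 1);
	}
}
```

With a single thread, acquire and release leave kernel_entries at zero,
which is the uncontended case. Nothing in this sketch helps when the
holder has been preempted: the waiters spin out and then queue up.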

- -- 
All rights reversed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJUmbihAAoJEM553pKExN6DDlQH/1vvy9YYuP2dCAZSU3fz855e
pj4796Qja929I2dStsbLl6Qhcg2ELtwtPkLoAePQ/4j2l7DCYgSNLXlC+RzQ32ay
rbMIfwiriEVGp2hsvYTOCpnur19IHf7v726ivaDXVOM/nrRaHsB8wwspLQQyfSIE
b7M7jxvT4S2pEELOGB6JQfEZZhbf5wBv9HBk+fkCBMaO4WZrnYczyD0/omiADm65
xSm/8pCMK22u8Tzn9EpKpIVdIFrl9AlZ1uiRBV2Br1oqwaBTvJVknW4bvIk0DWZU
ErwR/073UYKpl+xce3nbnixH8FeRP7/mq73Xd8e+iCgn6Dtzr1tANsu27EigMZ0=
=WHb3
-----END PGP SIGNATURE-----


* Re: [PATCH RESEND v4] sched/fair: Add advisory flag for borrowing a timeslice
From: Khalid Aziz @ 2014-12-23 15:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Peter Zijlstra, corbet, mingo, hpa, riel, akpm,
	rientjes, ak, mgorman, raistlin, kirill.shutemov, atomlin, avagin,
	gorcunov, serge.hallyn, athorlton, oleg, vdavydov, daeseok.youn,
	keescook, yangds.fnst, sbauer, vishnu.ps, axboe, paulmck,
	linux-kernel, linux-doc, linux-api
In-Reply-To: <20141223105251.GB22203@gmail.com>

On 12/23/2014 03:52 AM, Ingo Molnar wrote:
>
>
> to implement what Thomas suggested in the discussion: a proper
> futex like spin mechanism? That looks like a totally acceptable
> solution to me, without the disadvantages of your proposed
> solution.

Hi Ingo,

Thank you for taking the time to respond. It is indeed possible to 
implement a futex like spin mechanism. Futex like mechanism will be 
clean and elegant. That is where I had started when I was given this 
problem to solve. Trouble I run into is the primary application I am 
looking at to help with this solution is Database which implements its 
own locking mechanism without using POSIX semaphore or futex. Since the 
locking is entirely in userspace, kernel has no clue when the userspace 
has acquired one of these locks. So I can see only two ways to solve 
this - find a solution in userspace entirely, or have userspace tell the 
kernel when it acquires one of these locks. I will spend more time on 
finding a way to solve it in userspace and see if I can find a way to 
leverage futex mechanism without causing significant change to database 
code. There may be a way to use priority inheritance to avoid 
contention. Database performance people tell me that their testing has 
shown the cost of making any system calls in this code easily offsets 
any gains from optimizing for contention avoidance, so that is one big 
challenge. Database rewriting their locking code is extremely unlikely 
scenario. Am I missing a third option here?

Thanks,
Khalid


* Re: [PATCH v5 3/8] crypto: AF_ALG: add AEAD support
From: Stephan Mueller @ 2014-12-23 14:52 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Borkmann, 'Quentin Gouchet', 'LKML',
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20141223115626.GA31450-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>

On Tuesday, 23 December 2014, 22:56:26, Herbert Xu wrote:

Hi Herbert,

> On Tue, Dec 23, 2014 at 09:14:43AM +0100, Stephan Mueller wrote:
> > - the check aead_readable() immediately before this check implements the
> > blocking if we do not have sufficient data *and* more data is to be
> > expected
> Good point.
> 
> In fact AEAD is rather awkward because you need to do everything
> in one go.  Perhaps we could adapt our kernel interface to allow
> partial AEAD operations?


I am not sure what you are referring to. The invocation does not need to be in 
one go. You can have arbitrary number of sendmsg calls. But all input data 
needs to be supplied before you call recvmsg.

Please see my test code that implements the following call sequence using the 
libkcapi wrapper API calls where I dissect the data to be sent to the kernel 
for testing purposes:

if (cavs_test->enc) {
        /* send assoc with init call */
        ret = kcapi_aead_stream_init_enc(&handle, &iov, 1);
        if (0 > ret) {
                printf("Initialization of cipher buffer failed\n");
                goto out;
        }
        /* send plaintext with last call */
        iov.iov_base = cavs_test->pt;
        iov.iov_len = cavs_test->ptlen;
        ret = kcapi_aead_stream_update_last(&handle, &iov, 1);
        if (0 > ret) {
                printf("Sending last update buffer failed\n");
                goto out;
        }
        /* one recvmsg: run the cipher and fetch the output */
        ret = kcapi_aead_stream_op(&handle, &outiov, 1);
} else {
        /* send assoc with init call */
        ret = kcapi_aead_stream_init_dec(&handle, &iov, 1);
        if (0 > ret) {
                printf("Initialization of cipher buffer failed\n");
                goto out;
        }
        /* send ciphertext with intermediary call */
        iov.iov_base = cavs_test->ct;
        iov.iov_len = cavs_test->ctlen;
        ret = kcapi_aead_stream_update(&handle, &iov, 1);
        if (0 > ret) {
                printf("Sending update buffer failed\n");
                goto out;
        }
        /* send tag with last send call */
        iov.iov_base = cavs_test->tag;
        iov.iov_len = cavs_test->taglen;
        ret = kcapi_aead_stream_update_last(&handle, &iov, 1);
        if (0 > ret) {
                printf("Sending last update buffer failed\n");
                goto out;
        }
        /* one recvmsg: run the cipher and fetch the output */
        ret = kcapi_aead_stream_op(&handle, &outiov, 1);
}

Every call to kcapi_aead_stream_init_dec / kcapi_aead_stream_update / 
kcapi_aead_stream_update_last invokes one sendmsg syscall.

In essence, kcapi_aead_stream_update can be invoked with every byte you want 
to add to the message stream. This "stream" API of libkcapi is logically 
equivalent to the init/update/final of message digests.
> 
> I want to be very careful before we pin down our user-space
> interface since that's something that we cannot easily change
> while the kernel interface can be modified at any time.

I am fully with you and try to patiently present solutions.
> 
> Thanks,


-- 
Ciao
Stephan


* Re: [PATCH v5 3/8] crypto: AF_ALG: add AEAD support
From: Herbert Xu @ 2014-12-23 11:56 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, 'Quentin Gouchet', 'LKML',
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <101382546.xjTjAHLGAb-PJstQz4BMNNP20K/wil9xYQuADTiUCJX@public.gmane.org>

On Tue, Dec 23, 2014 at 09:14:43AM +0100, Stephan Mueller wrote:
>
> - the check aead_readable() immediately before this check implements the 
> blocking if we do not have sufficient data *and* more data is to be expected

Good point.

In fact AEAD is rather awkward because you need to do everything
in one go.  Perhaps we could adapt our kernel interface to allow
partial AEAD operations?

I want to be very careful before we pin down our user-space
interface since that's something that we cannot easily change
while the kernel interface can be modified at any time.

Thanks,
-- 
Email: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH RESEND v4] sched/fair: Add advisory flag for borrowing a timeslice
From: Ingo Molnar @ 2014-12-23 10:52 UTC (permalink / raw)
  To: Khalid Aziz
  Cc: Thomas Gleixner, Peter Zijlstra, corbet-T1hC0tSOHrs,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	riel-H+wXaHxf7aLQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	rientjes-hpIqsD4AKlfQT0dZR+AlfA, ak-VuQAYsv1563Yd54FQh9/CA,
	mgorman-l3A5Bk7waGM, raistlin-k2GhghHVRtY,
	kirill.shutemov-VuQAYsv1563Yd54FQh9/CA,
	atomlin-H+wXaHxf7aLQT0dZR+AlfA, avagin-GEFAQzZX7r8dnm+yROfE0A,
	gorcunov-GEFAQzZX7r8dnm+yROfE0A,
	serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw, athorlton-sJ/iWh9BUns,
	oleg-H+wXaHxf7aLQT0dZR+AlfA, vdavydov-bzQdu9zFT3WakBO8gow8eQ,
	daeseok.youn-Re5JQEeQqe8AvxtiuMwx3w,
	keescook-F7+t8E8rja9g9hUCZPvPmw,
	yangds.fnst-BthXqXjhjHXQFUHtdCDX3A, sbauer-F61uvSdQLzf2fBVCVOL8/A,
	vishnu.ps-Sze3O3UU22JBDgjK7y7TUQ, axboe-b10kYP2dOMg,
	paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <5498498B.90703-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>


* Khalid Aziz <khalid.aziz-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:

> On 12/19/2014 04:57 PM, Thomas Gleixner wrote:
> >On Fri, 19 Dec 2014, Khalid Aziz wrote:
> >>The queuing problem caused by a task taking a contended lock just before its
> >>current timeslice is up which userspace app wouldn't know about, is a real
> >>problem nevertheless.
> >
> >We know that already.
> >
> >>My patch attempts to avoid the contention in the first
> >>place. futex with adaptive spinning is a post-contention solution that tries
> >>to minimize the cost of contention but does nothing to avoid the contention.
> >
> >I never said that adaptive spinning can solve that problem.
> >
> >If you would have carefully read what I wrote, you might have noticed,
> >that I said:
> >
> >      a proper futex like spin mechanism
> >
> >Can you spot the subtle difference between that phrase and 'futex with
> >adaptive spinning'?
> >
> >>Solving this problem using futex can help only if the userspace lock uses
> >>futex.
> >
> >A really fundamentally new and earth shattering insight.
> >
> >If you would spend your time to actually digest what maintainers are
> >telling you, we might make progress on that matter.
> >
> >But you prefer to spend your time by repeating yourself and providing
> >completely useless information.
> >
> >What you are missing completely here is that neither me nor other
> >maintainers involved care about how you spend your time. But we very
> >much care about the time WE waste with your behaviour.
> 
> I am sorry that you feel the need to continue to resort to 
> personal attacks [...]

Thomas did not attack your person AFAICS - he criticised your 
arguments with increasing volume, because he did not see you 
respond to his arguments in substance.

> even after I made it clear in my last response that I was not 
> going to pursue this patch. There is no possibility of a 
> productive discussion of a solution at this point. [...]

I think there is very much a possibility of a productive 
discussion:

> [...] I hope someone else can find a solution you find 
> acceptable.

to implement what Thomas suggested in the discussion: a proper 
futex like spin mechanism? That looks like a totally acceptable 
solution to me, without the disadvantages of your proposed 
solution.

Thanks,

	Ingo


* Re: [PATCH v5 5/8] crypto: AF_ALG: add user space interface for RNG
From: Stephan Mueller @ 2014-12-23  8:27 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Borkmann, 'Quentin Gouchet', 'LKML',
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20141222112730.GB19532-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>

On Monday, 22 December 2014, 22:27:30, Herbert Xu wrote:

Hi Herbert,

> On Sun, Dec 07, 2014 at 11:23:48PM +0100, Stephan Mueller wrote:
> > Allow user space to seed / reset the RNG via a setsockopt.
> > 
> > This patch reuses alg_setkey to copy data into the kernel. The
> > alg_setkey is now used for two mechanisms: setkey and seeding.
> > The function is extended by providing the function pointer
> > to the function handling the copied data.
> > 
> > As the alg_setkey is now usable for more than just setkey, it is renamed
> > to alg_setop.
> 
> Just call it setkey, there is no harm in treating the seed as a key
> is there?
> 
> In fact we should have done this from the very start.
> crypto_rng_reset should be renamed crypto_rng_setkey.

Ok, that means I will drop this patch entirely and wire up the reseeding 
function in algif_rng.c with setkey.
> 
> Cheers,


-- 
Ciao
Stephan


* Re: [PATCH v5 3/8] crypto: AF_ALG: add AEAD support
From: Stephan Mueller @ 2014-12-23  8:14 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Borkmann, 'Quentin Gouchet', 'LKML',
	linux-crypto, linux-api
In-Reply-To: <20141222112341.GA19532@gondor.apana.org.au>

On Monday, 22 December 2014, 22:23:41, Herbert Xu wrote:

Hi Herbert,

> On Sun, Dec 07, 2014 at 11:22:30PM +0100, Stephan Mueller wrote:
> > +static inline bool aead_sufficient_data(struct aead_ctx *ctx)
> > +{
> > +	unsigned as = crypto_aead_authsize(crypto_aead_reqtfm(&ctx->aead_req));
> > +
> > +	return (ctx->used >= (ctx->aead_assoclen + ctx->enc ? : as ));
> 
> Is this supposed to be
> 
> 	return (ctx->used >= (ctx->aead_assoclen + (ctx->enc ?: as)));

Thanks, will be fixed in the next iteration
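
For readers following along, the difference between the two expressions
can be shown with a small standalone sketch. Binary + binds tighter than
the conditional operator, so in the posted line the whole sum becomes the
condition. The struct and function names below are stand-ins rather than
the patch's code, and the GNU "a ?: b" shorthand is spelled out as
"a ? a : b", which gives the same value when a has no side effects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the ctx fields used in the quoted patch (illustrative). */
struct sketch_ctx {
	unsigned used;		/* bytes received so far */
	unsigned assoclen;	/* length of the associated data */
	unsigned enc;		/* 1 = encrypt, 0 = decrypt */
};

/* As posted: parsed as (assoclen + enc) ?: as */
static bool sufficient_as_posted(const struct sketch_ctx *c, unsigned as)
{
	unsigned sum;

	assert(c != NULL);
	sum = c->assoclen + c->enc;
	return c->used >= (sum ? sum : as);
}

/* As suggested in the review: assoclen + (enc ?: as) */
static bool sufficient_as_suggested(const struct sketch_ctx *c, unsigned as)
{
	assert(c != NULL);
	return c->used >= c->assoclen + (c->enc ? c->enc : as);
}
```

For a decryption with 16 bytes of associated data, a 16 byte tag and 20
bytes received, the posted form reports sufficient data (20 >= 16) while
the suggested form does not (20 < 32).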
> 
> > +static int aead_recvmsg(struct kiocb *unused, struct socket *sock,
> > +			    struct msghdr *msg, size_t ignored, int flags)
> > +{
> 
> ...
> 
> > +	err = -ENOMEM;
> > +	if (!aead_sufficient_data(ctx))
> > +		goto unlock;
> 
> You should just block if there is insufficient input.
> 
I do not concur here due to the following:

- the check aead_readable() immediately before this check implements the 
blocking if we do not have sufficient data *and* more data is to be expected

- this very check for aead_sufficient_data() comes into play if the caller 
does not have more data (i.e. ctx->more is zero). In this case, more data is 
not to be expected and we cannot wait as this would be a deadlock in user 
space.
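
The two checks can be modelled in user space as a three-way decision. This
is a simplified sketch of the behaviour described in this thread, not the
patch's code: the sketch_* names are invented, waiting is reduced to "the
last send announced more data", and the required length uses the corrected
expression from the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_action {
	SKETCH_WAIT,	/* block: the sender announced more data */
	SKETCH_FAIL,	/* return an error: no more data will come */
	SKETCH_RUN	/* all input present: run the AEAD operation */
};

struct sketch_aead_ctx {
	unsigned used;		/* bytes collected from all sends */
	unsigned assoclen;	/* length of the associated data */
	unsigned authsize;	/* tag length */
	unsigned enc;		/* 1 = encrypt, 0 = decrypt */
	bool more;		/* last send was flagged "more to come" */
};

/* Model of one sendmsg call: collect data and remember the flag. */
static void sketch_send(struct sketch_aead_ctx *c, unsigned len, bool more)
{
	assert(c != NULL);
	c->used += len;
	c->more = more;
}

/* Model of the checks made at the start of recvmsg. */
static enum sketch_action sketch_recv_decision(const struct sketch_aead_ctx *c)
{
	unsigned need;

	assert(c != NULL);
	need = c->assoclen + (c->enc ? c->enc : c->authsize);

	if (c->more)
		return SKETCH_WAIT;
	if (c->used < need)
		return SKETCH_FAIL;	/* waiting here could never end */
	return SKETCH_RUN;
}
```

The SKETCH_FAIL case is the one under discussion: once the sender has
cleared the flag, blocking would wait for data that nobody will send.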

-- 
Ciao
Stephan


* Re: [PATCH selftest fails!] m68k: Wire up execveat
From: David Drysdale @ 2014-12-22 22:54 UTC (permalink / raw)
  To: David Drysdale, schwab, geert
  Cc: shuahkh, akpm, linux-api, linux-m68k, linux-kernel
In-Reply-To: <CAHse=S-KzC5y8eC_ZFrbUJo0Ub5Qv8WeeXHdY1-F=HckAX=L=A@mail.gmail.com>

[Re-send from a different email address because I apparently can't send 
plaintext from Gmail on my phone.]

On 21 Dec 2014 09:37, "Andreas Schwab" <schwab@linux-m68k.org> wrote:
>
> Geert Uytterhoeven <geert@linux-m68k.org> writes:
>
> > Check success of execveat(5,
> > 'xxxxxxxxxxxxxxxxxxxx...yyyyyyyyyyyyyyyyyyyy', 0)... [FAIL] (child 792
> > exited with 126 not 127)
>
> POSIX says
> (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_08_02):
>
>      If a command is not found, the exit status shall be 127. If the
>      command name is found, but it is not an executable utility, the exit
>      status shall be 126.
>
> Andreas.
>

That sounds like a bit of a grey area -- is ENAMETOOLONG nearer to 
ENOENT or EACCES?   Maybe it's best to make the test allow either (given 
that it's not a test of shell behaviour).

I can update the test to do that, but it probably won't be until the new 
year I'm afraid.
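
A relaxed check along those lines might look like the sketch below. The
helper names are hypothetical and not taken from execveat.c. The mapping
follows the POSIX text quoted above for ENOENT; treating every other exec
failure as 126 is an assumption about the invoking shell's behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Exit status a shell-like parent might report for a failed exec:
 * 127 when the command is not found, 126 for any other failure.
 */
static int sketch_exec_failure_status(int err)
{
	assert(err > 0);	/* errno values are positive */
	return err == ENOENT ? 127 : 126;
}

/* Relaxed expectation: accept either of two exit statuses. */
static bool sketch_status_acceptable(int status, int expected, int alternate)
{
	return status == expected || status == alternate;
}
```

Under that mapping ENAMETOOLONG lands on 126, so a test that expects 127
for an overlong path passes only if it also accepts 126.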

David


* Re: [PATCH net] in6: fix conflict with glibc
From: David Miller @ 2014-12-22 21:13 UTC (permalink / raw)
  To: stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ
  Cc: hannes-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r,
	florent.fourcot-Pj3lBMu8rt9bbU8NOSLlsg,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20141220121549.7c1b8aad@urahara>

From: Stephen Hemminger <stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>
Date: Sat, 20 Dec 2014 12:15:49 -0800

> Resolve conflicts between glibc definition of IPV6 socket options
> and those defined in Linux headers. Looks like earlier efforts to
> solve this did not cover all the definitions.
> 
> It resolves warnings during iproute2 build. 
> Please consider for stable as well.
> 
> Signed-off-by: Stephen Hemminger <stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>
> 
> ---
> Patch against -net tree

Applied and queued up for -stable, thanks Stephen.


* Re: [PATCH] selftests/exec: Use %zu to format size_t
From: Shuah Khan @ 2014-12-22 18:14 UTC (permalink / raw)
  To: Geert Uytterhoeven, David Drysdale, Andrew Morton
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1419159496-11558-1-git-send-email-geert-Td1EMuHUCqxL1ZNQvxDV9g@public.gmane.org>

On 12/21/2014 03:58 AM, Geert Uytterhoeven wrote:
> On 32-bit:
> 
> execveat.c: In function 'check_execveat_pathmax':
> execveat.c:183: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t'
> execveat.c:187: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'size_t'
> 
> Signed-off-by: Geert Uytterhoeven <geert-Td1EMuHUCqxL1ZNQvxDV9g@public.gmane.org>
> ---
>  tools/testing/selftests/exec/execveat.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/exec/execveat.c b/tools/testing/selftests/exec/execveat.c
> index 33a5c06d95caa038..d273624c93a64254 100644
> --- a/tools/testing/selftests/exec/execveat.c
> +++ b/tools/testing/selftests/exec/execveat.c
> @@ -179,11 +179,11 @@ static int check_execveat_pathmax(int dot_dfd, const char *src, int is_script)
>  	 */
>  	fd = open(longpath, O_RDONLY);
>  	if (fd > 0) {
> -		printf("Invoke copy of '%s' via filename of length %lu:\n",
> +		printf("Invoke copy of '%s' via filename of length %zu:\n",
>  			src, strlen(longpath));
>  		fail += check_execveat(fd, "", AT_EMPTY_PATH);
>  	} else {
> -		printf("Failed to open length %lu filename, errno=%d (%s)\n",
> +		printf("Failed to open length %zu filename, errno=%d (%s)\n",
>  			strlen(longpath), errno, strerror(errno));
>  		fail++;
>  	}
> 


Applied to kernel/git/shuah/linux-kselftest.git fixes
for 3.19-rc2

thanks,
-- Shuah

-- 
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America (Silicon Valley)
shuahkh-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org | (970) 217-8978

* Re: [PATCH RESEND v4] sched/fair: Add advisory flag for borrowing a timeslice
From: Khalid Aziz @ 2014-12-22 16:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, corbet, mingo, hpa, riel, akpm, rientjes, ak,
	mgorman, raistlin, kirill.shutemov, atomlin, avagin, gorcunov,
	serge.hallyn, athorlton, oleg, vdavydov, daeseok.youn, keescook,
	yangds.fnst, sbauer, vishnu.ps, axboe, paulmck, linux-kernel,
	linux-doc, linux-api
In-Reply-To: <alpine.DEB.2.11.1412200041220.17382@nanos>

On 12/19/2014 04:57 PM, Thomas Gleixner wrote:
> On Fri, 19 Dec 2014, Khalid Aziz wrote:
>> The queuing problem caused by a task taking a contended lock just before its
>> current timeslice is up which userspace app wouldn't know about, is a real
>> problem nevertheless.
>
> We know that already.
>
>> My patch attempts to avoid the contention in the first
>> place. futex with adaptive spinning is a post-contention solution that tries
>> to minimize the cost of contention but does nothing to avoid the contention.
>
> I never said that adaptive spinning can solve that problem.
>
> If you would have carefully read what I wrote, you might have noticed,
> that I said:
>
>       a proper futex like spin mechanism
>
> Can you spot the subtle difference between that phrase and 'futex with
> adaptive spinning'?
>
>> Solving this problem using futex can help only if the userspace lock uses
>> futex.
>
> A really fundamentally new and earth shattering insight.
>
> If you would spend your time to actually digest what maintainers are
> telling you, we might make progress on that matter.
>
> But you prefer to spend your time by repeating yourself and providing
> completely useless information.
>
> What you are missing completely here is that neither me nor other
> maintainers involved care about how you spend your time. But we very
> much care about the time WE waste with your behaviour.

I am sorry that you feel the need to continue to resort to personal 
attacks even after I made it clear in my last response that I was not 
going to pursue this patch. There is no possibility of a productive 
discussion of a solution at this point. I hope someone else can find a 
solution you find acceptable.

Thanks,
Khalid


* Re: [PATCH v5 2/8] crypto: AF_ALG: add setsockopt for auth tag size
From: Herbert Xu @ 2014-12-22 12:05 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, 'Quentin Gouchet', 'LKML',
	linux-crypto, linux-api
In-Reply-To: <5195949.2IIqD2tWoo@tachyon.chronox.de>

On Sun, Dec 07, 2014 at 11:21:42PM +0100, Stephan Mueller wrote:
> Use setsockopt on the tfm FD to provide the authentication tag size for
> an AEAD cipher. This is achieved by adding a callback function which is
> intended to be used by the AEAD AF_ALG implementation.
> 
> The optlen argument of the setsockopt specifies the authentication tag
> size to be used with the AEAD tfm.
> 
> Signed-off-by: Stephan Mueller <smueller@chronox.de>

Patch applied.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [PATCH v5 5/8] crypto: AF_ALG: add user space interface for RNG
From: Herbert Xu @ 2014-12-22 11:27 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, 'Quentin Gouchet', 'LKML',
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <3380968.kTQNpvjKFa-PJstQz4BMNNP20K/wil9xYQuADTiUCJX@public.gmane.org>

On Sun, Dec 07, 2014 at 11:23:48PM +0100, Stephan Mueller wrote:
> Allow user space to seed / reset the RNG via a setsockopt.
> 
> This patch reuses alg_setkey to copy data into the kernel. The
> alg_setkey is now used for two mechanisms: setkey and seeding.
> The function is extended by providing the function pointer
> to the function handling the copied data.
> 
> As the alg_setkey is now usable for more than just setkey, it is renamed
> to alg_setop.

Just call it setkey, there is no harm in treating the seed as a key
is there?

In fact we should have done this from the very start.
crypto_rng_reset should be renamed crypto_rng_setkey.

Cheers,
-- 
Email: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [PATCH v5 3/8] crypto: AF_ALG: add AEAD support
From: Herbert Xu @ 2014-12-22 11:23 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, 'Quentin Gouchet', 'LKML',
	linux-crypto, linux-api
In-Reply-To: <3151022.sz5v21Vqeg@tachyon.chronox.de>

On Sun, Dec 07, 2014 at 11:22:30PM +0100, Stephan Mueller wrote:
>
> +static inline bool aead_sufficient_data(struct aead_ctx *ctx)
> +{
> +	unsigned as = crypto_aead_authsize(crypto_aead_reqtfm(&ctx->aead_req));
> +
> +	return (ctx->used >= (ctx->aead_assoclen + ctx->enc ? : as ));

Is this supposed to be

	return (ctx->used >= (ctx->aead_assoclen + (ctx->enc ?: as)));
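
The difference between the two spellings comes down to operator
precedence: binary + binds tighter than the conditional operator, so the
posted expression groups as (assoclen + enc) ?: as. The sketch below uses
plain function parameters as stand-ins for the ctx fields and the authsize
(the function names are hypothetical), and relies on the GNU C
omitted-operand conditional, as the patch does.

```c
#include <stdbool.h>

/*
 * As posted: parsed as (assoclen + enc) ?: as, so `as` is only
 * consulted when assoclen + enc evaluates to zero.
 */
bool sufficient_as_posted(unsigned int used, unsigned int assoclen,
			  bool enc, unsigned int as)
{
	return used >= (assoclen + enc ? : as);
}

/*
 * As suggested in the review: assoclen + (enc ?: as), so `as` is
 * added to assoclen whenever enc is false.
 */
bool sufficient_as_suggested(unsigned int used, unsigned int assoclen,
			     bool enc, unsigned int as)
{
	return used >= (assoclen + (enc ?: as));
}
```

With used = 16, assoclen = 16, enc = false and as = 16, the posted form
accepts the input (16 >= 16) while the parenthesised form rejects it
(16 < 32).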

> +static int aead_recvmsg(struct kiocb *unused, struct socket *sock,
> +			    struct msghdr *msg, size_t ignored, int flags)
> +{

...

> +	err = -ENOMEM;
> +	if (!aead_sufficient_data(ctx))
> +		goto unlock;

You should just block if there is insufficient input.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [PATCH selftest fails!] m68k: Wire up execveat
From: Andreas Schwab @ 2014-12-21 14:37 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: David Drysdale, Shuah Khan, Andrew Morton, linux-api, Linux/m68k,
	Linux Kernel Development
In-Reply-To: <alpine.DEB.2.10.1412211159410.7624@ayla.of.borg>

Geert Uytterhoeven <geert@linux-m68k.org> writes:

> Check success of execveat(5, 'xxxxxxxxxxxxxxxxxxxx...yyyyyyyyyyyyyyyyyyyy', 0)... [FAIL] (child 792 exited with 126 not 127)

POSIX says (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_08_02):

    If a command is not found, the exit status shall be 127. If the
    command name is found, but it is not an executable utility, the exit
    status shall be 126.

Andreas.
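
As a rough illustration of how an errno from a failed exec can end up as
one of these two statuses, the sketch below uses a simplified mapping in
which only ENOENT counts as "not found". The function name is hypothetical
and real shells differ in which errno values they treat as "not found", so
this is an assumption, not a description of any particular /bin/sh.

```c
#include <errno.h>

/*
 * Hypothetical, simplified mapping from the errno of a failed exec to
 * the exit statuses quoted from POSIX above:
 *   127 - command not found
 *   126 - command found but could not be executed
 */
int exit_status_for_exec_errno(int err)
{
	if (err == ENOENT)
		return 127;
	return 126;	/* EACCES, ENAMETOOLONG, ENOEXEC, ... */
}
```

Under this mapping ENAMETOOLONG yields 126, which matches the status seen
in the failing test run, whereas a shell that files ENAMETOOLONG under
"not found" would yield 127.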

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: [PATCH net] in6: fix conflict with glibc
From: Hannes Frederic Sowa @ 2014-12-21 13:56 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Florent Fourcot, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20141220121549.7c1b8aad@urahara>

On Sat, 2014-12-20 at 12:15 -0800, Stephen Hemminger wrote:
> Resolve conflicts between glibc definition of IPV6 socket options
> and those defined in Linux headers. Looks like earlier efforts to
> solve this did not cover all the definitions.
> 
> It resolves warnings during iproute2 build. 
> Please consider for stable as well.
> 
> Signed-off-by: Stephen Hemminger <stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>

Acked-by: Hannes Frederic Sowa <hannes-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>

* [PATCH selftest fails!] m68k: Wire up execveat
From: Geert Uytterhoeven @ 2014-12-21 11:02 UTC (permalink / raw)
  To: David Drysdale
  Cc: Shuah Khan, Andrew Morton, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Linux/m68k, Linux Kernel Development

Check success of execveat(3, '../execveat', 0)... [OK]
Check success of execveat(5, 'execveat', 0)... [OK]
Check success of execveat(6, 'execveat', 0)... [OK]
Check success of execveat(-100, '/root/selftest-exec/exec/execveat', 0)... [OK]
Check success of execveat(99, '/root/selftest-exec/exec/execveat', 0)... [OK]
Check success of execveat(8, '', 4096)... [OK]
Check success of execveat(17, '', 4096)... [OK]
Check success of execveat(9, '', 4096)... [OK]
Check success of execveat(14, '', 4096)... [OK]
Check success of execveat(14, '', 4096)... [OK]
Check success of execveat(15, '', 4096)... [OK]
Check failure of execveat(8, '', 0) with ENOENT... [OK]
Check failure of execveat(8, '(null)', 4096) with EFAULT... [OK]
Check success of execveat(5, 'execveat.symlink', 0)... [OK]
Check success of execveat(6, 'execveat.symlink', 0)... [OK]
Check success of execveat(-100, '/root/selftest-exec/...xec/execveat.symlink', 0)... [OK]
Check success of execveat(10, '', 4096)... [OK]
Check success of execveat(10, '', 4352)... [OK]
Check failure of execveat(5, 'execveat.symlink', 256) with ELOOP... [OK]
Check failure of execveat(6, 'execveat.symlink', 256) with ELOOP... [OK]
Check failure of execveat(-100, '/root/selftest-exec/exec/execveat.symlink', 256) with ELOOP... [OK]
Check success of execveat(3, '../script', 0)... [OK]
Check success of execveat(5, 'script', 0)... [OK]
Check success of execveat(6, 'script', 0)... [OK]
Check success of execveat(-100, '/root/selftest-exec/exec/script', 0)... [OK]
Check success of execveat(13, '', 4096)... [OK]
Check success of execveat(13, '', 4352)... [OK]
Check failure of execveat(18, '', 4096) with ENOENT... [OK]
Check failure of execveat(7, 'script', 0) with ENOENT... [OK]
Check success of execveat(16, '', 4096)... [OK]
Check success of execveat(16, '', 4096)... [OK]
Check success of execveat(4, '../script', 0)... [OK]
Check success of execveat(4, 'script', 0)... [OK]
Check success of execveat(4, '../script', 0)... [OK]
Check failure of execveat(4, 'script', 0) with ENOENT... [OK]
Check failure of execveat(5, 'execveat', 65535) with EINVAL... [OK]
Check failure of execveat(5, 'no-such-file', 0) with ENOENT... [OK]
Check failure of execveat(6, 'no-such-file', 0) with ENOENT... [OK]
Check failure of execveat(-100, 'no-such-file', 0) with ENOENT... [OK]
Check failure of execveat(5, '', 4096) with EACCES... [OK]
Check failure of execveat(5, 'Makefile', 0) with EACCES... [OK]
Check failure of execveat(11, '', 4096) with EACCES... [OK]
Check failure of execveat(12, '', 4096) with EACCES... [OK]
Check failure of execveat(99, '', 4096) with EBADF... [OK]
Check failure of execveat(99, 'execveat', 0) with EBADF... [OK]
Check failure of execveat(8, 'execveat', 0) with ENOTDIR... [OK]
Invoke copy of 'execveat' via filename of length 4093:
Check success of execveat(19, '', 4096)... [OK]
Check success of execveat(5, 'xxxxxxxxxxxxxxxxxxxx...yyyyyyyyyyyyyyyyyyyy', 0)... [OK]
Invoke copy of 'script' via filename of length 4093:
Check success of execveat(20, '', 4096)... [OK]
/bin/sh: /dev/fd/5/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 xxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 xxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 xxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 xxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 xxxxxxxxxxxxxxxxxxxxxxxxxx/yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy: File name too long
Check success of execveat(5, 'xxxxxxxxxxxxxxxxxxxx...yyyyyyyyyyyyyyyyyyyy', 0)... [FAIL] (child 792 exited with 126 not 127)
1 tests failed
make: *** [run_tests] Error 1

Signed-off-by: Geert Uytterhoeven <geert-Td1EMuHUCqxL1ZNQvxDV9g@public.gmane.org>
---
The last test fails because of an unexpected exit code?

 arch/m68k/include/asm/unistd.h      | 2 +-
 arch/m68k/include/uapi/asm/unistd.h | 1 +
 arch/m68k/kernel/syscalltable.S     | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/m68k/include/asm/unistd.h b/arch/m68k/include/asm/unistd.h
index 75e75d7b1702fb64..244e0dbe45dbeda3 100644
--- a/arch/m68k/include/asm/unistd.h
+++ b/arch/m68k/include/asm/unistd.h
@@ -4,7 +4,7 @@
 #include <uapi/asm/unistd.h>
 
 
-#define NR_syscalls		355
+#define NR_syscalls		356
 
 #define __ARCH_WANT_OLD_READDIR
 #define __ARCH_WANT_OLD_STAT
diff --git a/arch/m68k/include/uapi/asm/unistd.h b/arch/m68k/include/uapi/asm/unistd.h
index 2c1bec9a14b67da4..61fb6cb9d2ae3c66 100644
--- a/arch/m68k/include/uapi/asm/unistd.h
+++ b/arch/m68k/include/uapi/asm/unistd.h
@@ -360,5 +360,6 @@
 #define __NR_getrandom		352
 #define __NR_memfd_create	353
 #define __NR_bpf		354
+#define __NR_execveat		355
 
 #endif /* _UAPI_ASM_M68K_UNISTD_H_ */
diff --git a/arch/m68k/kernel/syscalltable.S b/arch/m68k/kernel/syscalltable.S
index 2ca219e184cd16e6..a0ec4303f2c8e57a 100644
--- a/arch/m68k/kernel/syscalltable.S
+++ b/arch/m68k/kernel/syscalltable.S
@@ -375,4 +375,5 @@ ENTRY(sys_call_table)
 	.long sys_getrandom
 	.long sys_memfd_create
 	.long sys_bpf
+	.long sys_execveat		/* 355 */
 
-- 
1.9.1

* [PATCH] selftests/exec: Use %zu to format size_t
From: Geert Uytterhoeven @ 2014-12-21 10:58 UTC (permalink / raw)
  To: David Drysdale, Shuah Khan, Andrew Morton
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Geert Uytterhoeven

On 32-bit:

execveat.c: In function 'check_execveat_pathmax':
execveat.c:183: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t'
execveat.c:187: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'size_t'

Signed-off-by: Geert Uytterhoeven <geert-Td1EMuHUCqxL1ZNQvxDV9g@public.gmane.org>
---
 tools/testing/selftests/exec/execveat.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/exec/execveat.c b/tools/testing/selftests/exec/execveat.c
index 33a5c06d95caa038..d273624c93a64254 100644
--- a/tools/testing/selftests/exec/execveat.c
+++ b/tools/testing/selftests/exec/execveat.c
@@ -179,11 +179,11 @@ static int check_execveat_pathmax(int dot_dfd, const char *src, int is_script)
 	 */
 	fd = open(longpath, O_RDONLY);
 	if (fd > 0) {
-		printf("Invoke copy of '%s' via filename of length %lu:\n",
+		printf("Invoke copy of '%s' via filename of length %zu:\n",
 			src, strlen(longpath));
 		fail += check_execveat(fd, "", AT_EMPTY_PATH);
 	} else {
-		printf("Failed to open length %lu filename, errno=%d (%s)\n",
+		printf("Failed to open length %zu filename, errno=%d (%s)\n",
 			strlen(longpath), errno, strerror(errno));
 		fail++;
 	}
-- 
1.9.1
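
For reference, a minimal stand-alone sketch of the same fix outside the
selftest: %zu is the C99 conversion for size_t, so it stays correct
whether size_t is unsigned int or unsigned long on the target. The helper
name format_path_length() is made up for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write a message containing the length of `path` into `buf`.
 * strlen() returns size_t, and %zu is the matching conversion on both
 * 32-bit and 64-bit targets. Returns the value of snprintf().
 */
int format_path_length(char *buf, size_t buflen, const char *path)
{
	return snprintf(buf, buflen, "filename of length %zu", strlen(path));
}
```

Using %lu here would only be correct on targets where size_t happens to
be unsigned long, which is what the 32-bit warnings above point out.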

* [PATCH net] in6: fix conflict with glibc
From: Stephen Hemminger @ 2014-12-20 20:15 UTC (permalink / raw)
  To: David Miller, Hannes Frederic Sowa, Florent Fourcot
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA

Resolve conflicts between glibc definition of IPV6 socket options
and those defined in Linux headers. Looks like earlier efforts to
solve this did not cover all the definitions.

It resolves warnings during iproute2 build. 
Please consider for stable as well.

Signed-off-by: Stephen Hemminger <stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>

---
Patch against -net tree

--- a/include/uapi/linux/in6.h	2014-08-28 08:23:29.981240723 -0700
+++ b/include/uapi/linux/in6.h	2014-12-20 12:00:31.973120787 -0800
@@ -149,7 +149,7 @@ struct in6_flowlabel_req {
 /*
  *	IPV6 socket options
  */
-
+#if __UAPI_DEF_IPV6_OPTIONS
 #define IPV6_ADDRFORM		1
 #define IPV6_2292PKTINFO	2
 #define IPV6_2292HOPOPTS	3
@@ -196,6 +196,7 @@ struct in6_flowlabel_req {
 
 #define IPV6_IPSEC_POLICY	34
 #define IPV6_XFRM_POLICY	35
+#endif
 
 /*
  * Multicast:
--- a/include/uapi/linux/libc-compat.h	2014-04-08 22:12:24.316054847 -0700
+++ b/include/uapi/linux/libc-compat.h	2014-12-20 12:10:29.960213758 -0800
@@ -69,6 +69,7 @@
 #define __UAPI_DEF_SOCKADDR_IN6		0
 #define __UAPI_DEF_IPV6_MREQ		0
 #define __UAPI_DEF_IPPROTO_V6		0
+#define __UAPI_DEF_IPV6_OPTIONS		0
 
 #else
 
@@ -82,6 +83,7 @@
 #define __UAPI_DEF_SOCKADDR_IN6		1
 #define __UAPI_DEF_IPV6_MREQ		1
 #define __UAPI_DEF_IPPROTO_V6		1
+#define __UAPI_DEF_IPV6_OPTIONS		1
 
 #endif /* _NETINET_IN_H */
 
@@ -103,6 +105,7 @@
 #define __UAPI_DEF_SOCKADDR_IN6		1
 #define __UAPI_DEF_IPV6_MREQ		1
 #define __UAPI_DEF_IPPROTO_V6		1
+#define __UAPI_DEF_IPV6_OPTIONS		1
 
 /* Definitions for xattr.h */
 #define __UAPI_DEF_XATTR		1
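
The guard pattern in this patch can be illustrated in a single file.
Everything prefixed DEMO_ below is hypothetical and only mimics the roles
of _NETINET_IN_H, __UAPI_DEF_IPV6_OPTIONS and an IPV6_* option; it is a
sketch of the idea, not the contents of libc-compat.h.

```c
/* Hypothetical stand-in for "the libc header was included first". */
#define DEMO_LIBC_HEADER_SEEN 1

/*
 * Coordination step: decide once whether the kernel-style header is
 * allowed to provide the option definitions.
 */
#if defined(DEMO_LIBC_HEADER_SEEN)
#define DEMO_UAPI_DEF_OPTIONS	0
/* What the libc header would already have provided. */
#define DEMO_OPT_ADDRFORM	1
#else
#define DEMO_UAPI_DEF_OPTIONS	1
#endif

/* Kernel-style header: define the options only when told to. */
#if DEMO_UAPI_DEF_OPTIONS
#define DEMO_OPT_ADDRFORM	1
#endif

/* Small accessors so the outcome can be inspected at run time. */
int demo_opt_addrform(void)
{
	return DEMO_OPT_ADDRFORM;
}

int demo_kernel_header_defined_options(void)
{
	return DEMO_UAPI_DEF_OPTIONS;
}
```

Either way exactly one side defines DEMO_OPT_ADDRFORM, so user code sees
the option without the two headers competing over it.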

* Re: [PATCH RESEND v4] sched/fair: Add advisory flag for borrowing a timeslice
From: Thomas Gleixner @ 2014-12-19 23:57 UTC (permalink / raw)
  To: Khalid Aziz
  Cc: Peter Zijlstra, corbet, mingo, hpa, riel, akpm, rientjes, ak,
	mgorman, raistlin, kirill.shutemov, atomlin, avagin, gorcunov,
	serge.hallyn, athorlton, oleg, vdavydov, daeseok.youn, keescook,
	yangds.fnst, sbauer, vishnu.ps, axboe, paulmck, linux-kernel,
	linux-doc, linux-api
In-Reply-To: <54949BF0.8030403@oracle.com>

On Fri, 19 Dec 2014, Khalid Aziz wrote:
> The queuing problem caused by a task taking a contended lock just before its
> current timeslice is up which userspace app wouldn't know about, is a real
> problem nevertheless.

We know that already.

> My patch attempts to avoid the contention in the first
> place. futex with adaptive spinning is a post-contention solution that tries
> to minimize the cost of contention but does nothing to avoid the contention.

I never said that adaptive spinning can solve that problem.

If you would have carefully read what I wrote, you might have noticed,
that I said:

     a proper futex like spin mechanism

Can you spot the subtle difference between that phrase and 'futex with
adaptive spinning'?

> Solving this problem using futex can help only if the userspace lock uses
> futex.

A really fundamentally new and earth shattering insight.

If you would spend your time to actually digest what maintainers are
telling you, we might make progress on that matter.

But you prefer to spend your time by repeating yourself and providing
completely useless information.

What you are missing completely here is that neither me nor other
maintainers involved care about how you spend your time. But we very
much care about the time WE waste with your behaviour.

Thanks,

	tglx

* Re: [PATCH RESEND v4] sched/fair: Add advisory flag for borrowing a timeslice
From: Khalid Aziz @ 2014-12-19 21:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, corbet, mingo, hpa, riel, akpm, rientjes, ak,
	mgorman, raistlin, kirill.shutemov, atomlin, avagin, gorcunov,
	serge.hallyn, athorlton, oleg, vdavydov, daeseok.youn, keescook,
	yangds.fnst, sbauer, vishnu.ps, axboe, paulmck, linux-kernel,
	linux-doc, linux-api
In-Reply-To: <alpine.DEB.2.11.1412190045590.17382@nanos>

On 12/18/2014 05:27 PM, Thomas Gleixner wrote:
> On Thu, 18 Dec 2014, Khalid Aziz wrote:
>> On 12/18/2014 04:02 PM, Thomas Gleixner wrote:
>>> If we can solve it with a properly designed and well thought out
>>> functionality in the kernel based on a futex like mechanism, why can't
>>> java and databases switch over to that and simply use it?
>>>
>>> You need to modify user space anyway, so it does not matter whether
>>> you modify it in a sane or in a hacky way.
>>
>> Actually userspace does not need to be modified. The code to use this
>> functionality is already present in database code since this same
>> functionality exists on other OSs (the API is a little different but those
>> details can be handled with a simple header file in userspace). Userspace code
>> has already been tested and debugged thoroughly on the OSs that support this
>> functionality and that has significant impact on testing effort. So for
>> userspace it is simply a matter of turning that code on on Linux as well and
>> recompiling. This would be a multi-platform solution for database/java as
>> opposed to a Linux specific solution.
>
> Bullshit. If you turn that option on, it's a modification from the QA
> point of view and you need to run a full validation no matter
> what. Anything else is just QA by crystal ball.
>
> Of course you carefully avoided (again) to answer the real question:
>
>> But it's simpler to hack crap into the scheduler than coming up with a
>> proper solution to the problem, right?
>
> I can answer it for you: Yes, it is simpler.
>
> But as you might have figured out it's not really popular and therefore
> not simpler to be accepted by the people who actually care about sane
> designs. I can whip you up special purpose hacks for that which will
> give you way more guarantees with way less lines of horrible code, but
> that does not mean that such hacks are an acceptable solution. You can
> carry those hacks in your private tree and ship it to your customers,
> but do not expect that any sane maintainer will care about it.
>
> Now the very same maintainers asked you several times to answer the
> question why this can't be done with proper futex like spin
> mechanisms, which would solve a bunch of related problems as well.
>
>   You never even tried to answer that question simply because you never
>   tried to think about it for real. Your only answer is that you want A
>   because A is already used on other OSs and therefore solution B is not
>   an option.
>
>   But if solution B would gain 4% performance, then according to your
>   previous argumentation it would become suddenly very interesting,
>   right?
>
> So unless you even show any sign of thinking about different
> approaches and technically arguing why they cannot deliver the same
> value you won't get anywhere with this and I can tell you why.
>
> You create a new user space ABI
>
>   That forces the kernel to support it forever, which in consequence
>   imposes restrictions on the kernel scheduler forever.
>
>   We have enough restrictions by misdesigned ABIs (e.g. sched_yield())
>   already, so we really do not need more of that.
>
> You ignore any request to prove why a properly designed spin futex
> interface would not be a sensible solution for the problem.
>
>   Of course you are free to ignore that (as you are free to ignore
>   important review comments), but you don't have to be surprised when
>   the responsible maintainers ignore any further attempt from you to
>   get this merged.
>
> Aside of that, you still fail to provide a proper test case which is
> publicly usable for the people involved in this to reproduce your 3%
> gain and analyze the problem at hand properly. The provided:
>
>        enable_hack();
>        while (/*some condition */) {
>        	    /* bla */
> 	    /* blub */
> 	    /* blurb */
> 	    /* yay! */
>        }
>        disable_hack();
>
> is beyond useless.
>
> Thanks,
>
> 	tglx
>

Fair enough. Implications of a new userspace ABI can be significant and 
I can accept not introducing a new one in the kernel.

The queuing problem caused by a task taking a contended lock just before 
its current timeslice is up which userspace app wouldn't know about, is 
a real problem nevertheless. My patch attempts to avoid the contention 
in the first place. futex with adaptive spinning is a post-contention 
solution that tries to minimize the cost of contention but does nothing 
to avoid the contention. Solving this problem using futex can help only 
if the userspace lock uses futex.

I have looked at solving this problem in userspace using priority 
inheritance semaphore but ran into many problems. I will go back and 
take another look at it.

I appreciate your feedback.

Thanks,
Khalid
