All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>
Subject: [RFC] regset ->get() API
Date: Mon, 17 Feb 2020 18:33:40 +0000	[thread overview]
Message-ID: <20200217183340.GI23230@ZenIV.linux.org.uk> (raw)

	Looking at the regset guts, I really wonder if the interface
is right.  What happens goes along the lines of

ptrace(PTRACE_GETREGS, pid, whatever, buf)
  arch_ptrace(task, PTRACE_GETREGS, whatever, buf)
    copy_regset_to_user(task, task_user_regset_view(current),
		REGSET_GENERAL, 0, sizeof(struct user_regs_struct), buf);
     check access_ok(buf, sizeof(...)
     genregs_get(task, regset, 0, sizeof(...), NULL, buf);
and we hit
static int genregs_get(struct task_struct *target,
                       const struct user_regset *regset,
                       unsigned int pos, unsigned int count,
                       void *kbuf, void __user *ubuf)
{
        if (kbuf) {
                unsigned long *k = kbuf;
                while (count >= sizeof(*k)) {
                        *k++ = getreg(target, pos);
                        count -= sizeof(*k);
                        pos += sizeof(*k);
                }
        } else {
                unsigned long __user *u = ubuf;
                while (count >= sizeof(*u)) {
                        if (__put_user(getreg(target, pos), u++))
                                return -EFAULT;
                        count -= sizeof(*u);
                        pos += sizeof(*u);
                }
        }

        return 0;
}
 
IOW, we call getreg() in a loop, with separate __put_user() for each
value.  For one thing, it's going to be painful on any recent x86 -
the cost of stac/clac on each of those is not going to be cheap.
And we obviously can't extend stac/clac area over the entire loop
there - getreg() is not trivial.

For another, the calling conventions are too generic - the callers
of ->get() always pass zero for pos, for example.  And this "pass
kbuf and ubuf separately" wart does not make it any prettier.

Other instances (e.g. powerpc gpr_get()) do not use __put_user() -
they make a series of user_regset_copyout()/user_regset_copyout_zero()
calls (4 in this case):
static inline int user_regset_copyout(unsigned int *pos, unsigned int *count,
                                      void **kbuf,
                                      void __user **ubuf, const void *data,
                                      const int start_pos, const int end_pos)
{
        if (*count == 0)
                return 0;
        BUG_ON(*pos < start_pos);
        if (end_pos < 0 || *pos < end_pos) {
                unsigned int copy = (end_pos < 0 ? *count
                                     : min(*count, end_pos - *pos));
                data += *pos - start_pos;
                if (*kbuf) {
                        memcpy(*kbuf, data, copy);
                        *kbuf += copy;
                } else if (__copy_to_user(*ubuf, data, copy))
                        return -EFAULT;
                else
                        *ubuf += copy;
                *pos += copy;
                *count -= copy;
        }
        return 0;
}

The caller obviously needs to check if the damn thing has failed, and the
calling conventions are still over-generic (grep and you'll see).  The
actual calls of __copy_to_user()/__clear_user() have destinations back-to-back.
And while in case of ppc their number is not too large, for e.g.
arch/arc/kernel/ptrace.c:genregs_get() we get 39 calls of those things -
basically, back to the __put_user-inna-loop-and-thats-cuttin-me-own-throat
situation we have on x86, only with more overhead per register.

And too convoluted calling conventions come with the usual price - they
are too easy to get wrong.  Example:
arch/x86/math-emu/fpu_entry.c:fpregs_soft_get() is
{
        struct swregs_state *s387 = &target->thread.fpu.state.soft;
        const void *space = s387->st_space;
        int ret;
        int offset = (S387->ftop & 7) * 10, other = 80 - offset;

        RE_ENTRANT_CHECK_OFF;

#ifdef PECULIAR_486
        S387->cwd &= ~0xe080;
        /* An 80486 sets nearly all of the reserved bits to 1. */
        S387->cwd |= 0xffff0040;
        S387->swd = sstatus_word() | 0xffff0000;
        S387->twd |= 0xffff0000;
        S387->fcs &= ~0xf8000000;
        S387->fos |= 0xffff0000;
#endif /* PECULIAR_486 */

        ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, s387, 0,
                                  offsetof(struct swregs_state, st_space));

OK, here we copy everything in s387 up to the beginning of s387->st_space.
Fair enough.  Now pos is offsetof(struct swregs_state, st_space)), and...
        /* Copy all registers in stack order. */
        if (!ret)
                ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
                                          space + offset, 0, other);
        if (!ret)
                ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
                                          space, 0, offset);
We _intend_ to copy stack entries, top..7, then 0..top-1, out of
s387->st_space.  What we actually do is different - pos is updated
by each call, so the last two arguments in these calls are wrong.
They should've been offsetof(struct swregs_state, st_space)) and
offsetof(struct swregs_state, st_space)) + other in the first
one and offsetof(struct swregs_state, st_space)) + other and
offsetof(struct swregs_state, st_space)) + 80 in the second one.

It's not hard to fix, and nobody really uses !CONFIG_FPU setups
on x86 anyway; the point is that this is really easy to get wrong.

What I really wonder is whether it's actually worth bothering - the
life would be much simpler if we *always* passed a kernel buffer to
->get() instances and did all copyout at once.  Sure, it results in
double copies, but... would that really cost more than all the complexity
we have there?

What are the typical amounts of data copied in such ptrace() calls
and how hot they normally are?

             reply	other threads:[~2020-02-17 18:33 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-17 18:33 Al Viro [this message]
2020-02-19 20:01 ` [RFC] regset ->get() API Linus Torvalds
2020-02-20 22:47   ` Al Viro
2020-02-20 22:56     ` Linus Torvalds
2020-02-20 23:29       ` Al Viro
2020-02-20 23:31         ` Linus Torvalds
2020-02-21  3:30           ` Al Viro
2020-02-21 18:59             ` Al Viro
2020-02-21 19:22               ` David Miller
2020-02-22  0:41                 ` Al Viro
2020-04-13  4:32                   ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200217183340.GI23230@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=arnd@arndb.de \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.