From: Oren Laadan <orenl@librato.com>
To: Dan Smith <danms@us.ibm.com>
Cc: containers@lists.osdl.org, Alexey Dobriyan <adobriyan@gmail.com>,
netdev@vger.kernel.org
Subject: Re: [PATCH 5/5] c/r: Add AF_UNIX support (v6)
Date: Tue, 28 Jul 2009 17:18:17 -0400 [thread overview]
Message-ID: <4A6F6B19.9010508@librato.com> (raw)
In-Reply-To: <87ljm8czsf.fsf@caffeine.danplanet.com>
Dan Smith wrote:
> OL> obj_sock_users() is only required for objects that are to be
> OL> tested for leaks in full container checkpoint. I don't think it's
> OL> needed, because the relation sock <-> file is 1-to-1. (If it
> OL> were, then you would also need to collect sockets).
>
> Okay, that also answers the question posed by John in the other
> thread.
>
> OL> (Actually, I will remove checkpoint_bad and restore_bad and modify
> OL> checkpoint_obj() and restore_obj() to fail if the respective
> OL> method is NULL).
>
> Okay, should I expect that to show up in v17-dev soon?
Yes.
>
> OL> Nit: I already got confused a few times because of the similar
> OL> names. Perhaps change CKPT_HDR_SOCKET_BUFFERS to
> OL> CKPT_HDR_SOCKET_QUEUE (and adjust the header name accordingly).
>
> Agreed. I remember writing that and remarking to myself how
> ridiculous of a name it was, but never went back to change it.
>
> OL> Unless/until we decide otherwise, let's keep all ckpt_hdr_xxx
> OL> definitions in a single place -- checkpoint_hdr.h
>
> That's how I initially had it, but Serge asked for it to be moved into
> the network headers. Serge?
>
> OL> Unless you intend to use the struct name 'ckpt_socket' elsewhere,
> OL> can we get rid of it (leaving only 'struct {') ?
>
> Yep.
>
> OL> Can you use fill_name() from checkpoint/file.c ?
>
> Yeah, looks like it.
>
> OL> This direct call to ->getname skips security checks through
> OL> getsockname(). You may want to refactor sys_getsockname() similar
> OL> to sys_bind().
>
> Okay.
>
>>> + if ((h->sock.userlocks & SOCK_SNDBUF_LOCK) &&
>>> + ((h->sock.sndbuf < SOCK_MIN_SNDBUF) ||
>>> + (h->sock.sndbuf > sysctl_wmem_max)))
>>> + return -EINVAL;
>
> OL> At least for afunix, if the user did not explicitly set the
> OL> sndbuf, then userlocks won't have SOCK_SNDBUF_LOCK set.
>
> OL> Also, if userlocks in the image doesn't have SOCK_SNDBUF_LOCK,
> OL> then the sndbuf value isn't checked ?
>
> I think I was thinking that I only needed to verify the buffer value
> if the user claimed to have set it (as if it would be ignored
> otherwise), but that doesn't seem right. So, I think the proper
> thing to do here is always check it (i.e., remove the first check of
> the lock).
>
> OL> What about verifying sock.flags itself ?
> OL> In doing that, some options may assume/require some state --
> OL> - SOCK_USE_WRITE_QUEUE only used in ipv4/ipv6/sctp
> OL> - SOCK_DESTROY only used in some protocols
>
> OL> Perhaps sanitize sock.flags per protocol ?
>
> Hmm, okay.
>
> OL> Many of these direct copy into the socket and sock effectively
> OL> bypass security checks that take place in {get,set}sockopt(),
> OL> either explicitly (e.g. sk_sndtimeo) or implicitly (e.g.
> OL> SOCK_LINGER in sock.flags, reflecting SO_LINGER option).
>
> OL> This applies both to checkpoint (potentially bypassing permission
> OL> of the checkpointer to view this data) and restart (bypassing
> OL> permissions of user to set these values).
>
> OL> The alternative is to use socksetopt/sockgetopt for those values
> OL> that should be subject to security checks.
>
> Yeah, I suppose so. I've resisted that thus far because it will make
> the sync operation so much harder to read, but I suppose it's
> unavoidable.
>
> OL> There should also be per-protocol consistency checks. E.g. afunix
> OL> cannot be in socket.state == SS_{DIS,}CONNECTING
>
> I suppose so, but I don't see anything in af_unix.c that seems to care :)
>
> OL> Better yet: -ENOSYS ?
>
> Okay.
>
>>> + ret = kernel_sendmsg(sock->sk_socket, &msg, &kvec, 1, len);
>
> OL> Is it possible to avoid the extra copy using splice() instead ?
>
> It's possible, but it will require some refactoring to get access to
> all the right pointers. I'd propose we push that optimization out
> until after we've got these patches integrated into the c/r tree.
Hmmm.. what about splice_direct_to_actor() ?
>
> OL> SOCK_DEAD in sk->flags may also pose a problem. (Do we at all
> OL> need to checkpoint and/or restore SOCK_DEAD ?!)
>
> Is there any reasonable way we could arrive at a SOCK_DEAD socket via
> a valid descriptor?
That's a good point. I think you are right, and there isn't.
Still need to keep it in mind for inet when including those lingering
sockets that don't belong to anyone.
This also means that a peer (of a dgram socket) that was closed will
not be checkpointed, so restoring the rcvbuf of the remaining dgram
socket wouldn't work.
>
> OL> Hmm... this test is quite hidden here - maybe a fat comment with a
> OL> reference to why it's here and what it is doing ?
>
> Sure.
>
>>> + else {
>>> + ckpt_debug("Buffer total %u exceeds limit %u\n",
>>> + h->total_bytes, *bufsize);
>>> + ret = -EINVAL;
>>> + goto out;
>>> + }
>>> + }
>>> +
>>> + for (i = 0; i < h->skb_count; i++) {
>>> + ret = sock_read_buffer(ctx, sock);
>>> + ckpt_debug("read buffer %i: %i\n", i, ret);
>>> + if (ret < 0)
>>> + break;
>>> +
>>> + total += ret;
>>> + if (total > h->total_bytes) {
>
> OL> What if 'total' overflows (for CAP_NET_ADMIN) ? perhaps:
> OL> if (total > h->total_bytes || total < ret) {
>
> Actually, I've changed this locally to require fewer variables and
> special cases. Let me know when I re-post if you don't think it's a
> better way to handle this.
>
> OL> Does the following bypass security checks for sys_connect() ?
>
> I don't think so. We're basically replicating sys_socketpair() here,
> which does not do a security check, presumably because all you're
> doing is hooking two sockets together that both belong to you. That's
> not to say that we're as safe as that limited operation, but I don't
> think it's totally clear. Perhaps someone more confident will
> comment.
Yes, please ... Serge ?
To me it sounds plausible. If we adopt it, then a comment in the
code is worthwhile.
>
>>> + /* Prime the socket's buffer limit with the maximum */
>>> + peer->sk_userlocks |= SOCK_SNDBUF_LOCK;
>>> + peer->sk_sndbuf = sysctl_wmem_max;
>
> OL> I suppose this meant to be temporary ?
>
> Right, it will be overridden when we synchronize the socket
> properties. I'll strengthen the comment.
Hmm.. then what happens when you have a circular dependency ?
For example, three dgram sockets, A, B and C where: A->B, B->C
and C->A ('->' means connected).
I suspect that sock_unix_restore_connect() will fail, because
neither:
+ if (!IS_ERR(this) && !IS_ERR(peer)) {
nor
+ } else if ((PTR_ERR(this) == -EINVAL) && (PTR_ERR(peer) == -EINVAL)) {
will hold true, therefore:
+ } else {
+ ckpt_debug("Order Error\n");
+ ret = PTR_ERR(this);
+ goto out;
+ }
Either that, or the temporary change cited above will become
permanent, because A won't be restored again.
>
> OL> It is indeed a good idea to make it generic for all sockets, but I
> OL> suspect the way sock_read_buffers() works will only be suitable
> OL> for afunix, and perhaps for in-container inet4/6 (by
> OL> "in-container" I mean that both sides of the connection are in the
> OL> checkpoint image).
>
> Yeah, I meant to put a comment in the commit log about this. As I
> mentioned before, I was hoping to put most of the effort into the UNIX
> patch and move forward with that as we iterate on the INET patch.
> Thus, this was part of the buffer restore re-swizzle that clearly
> won't work for INET. In the meantime I've been working on the INET
> patch and splitting it up while retaining the necessary behavioral
> changes made here. In my next posting these should look better.
>
> OL> Hmm... does the code below work well with dgram sockets who
> OL> received data from the peer, and then the peer died (before
> OL> checkpoint) ?
>
> I'm not sure that I've replicated that exact scenario in my tests.
> However, when we're restoring socket A with outstanding incoming
> buffers, we restore them via B. If B is not in a reasonable state, I
> put it back into connected state to appease the sendmsg function and
> restore it when complete. See the "unshutdown" comment.
The problem isn't that B is not in a reasonable state, but
rather that B is not checkpointed in the first place (because it
is not reachable via any fd), so it isn't restored either.
Oren.
>
>>> +static struct file *sock_alloc_attach_fd(struct socket *socket)
>
> OL> Nit: I think this name is misleading - sock_attach_fd() itself is
> OL> already and should have been sock_attach_file() IMHO - so perhaps
> OL> s/fd/file here ?
>
> The name was chosen to reflect the fact that we're allocating a socket
> and then calling sock_attach_fd() on it. If you think it's less
> misleading to be inconsistent with the other call, I'll change it, but
> I wouldn't really agree... :)
> Thanks!
>
next prev parent reply other threads:[~2009-07-28 21:18 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-07-22 20:41 Add Checkpoint/Restart support for UNIX sockets Dan Smith
[not found] ` <1248295301-30930-1-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-07-22 20:41 ` [PATCH 1/5] Add _ckpt_read_hdr_type() helper Dan Smith
[not found] ` <1248295301-30930-2-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-07-23 3:59 ` Oren Laadan
[not found] ` <4A67E027.1050602-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-07-23 16:43 ` Dan Smith
2009-07-22 20:41 ` [PATCH 2/5] Add an errno validation function Dan Smith
[not found] ` <1248295301-30930-3-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-07-23 3:17 ` Oren Laadan
[not found] ` <4A67D654.2000206-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-07-23 15:40 ` Dan Smith
2009-07-22 20:41 ` [PATCH 3/5] Add a ckpt_read_string() function to allow reading of a variable-length (but length-capped) string from the checkpoint stream Dan Smith
[not found] ` <1248295301-30930-4-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-07-23 4:19 ` Oren Laadan
[not found] ` <4A67E4EC.3070509-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-07-23 16:58 ` Dan Smith
[not found] ` <87y6qfnxoq.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2009-07-24 1:32 ` Oren Laadan
2009-07-22 20:41 ` [PATCH 4/5] Add a common sock_bind() helper to unify the security hook Dan Smith
2009-07-22 20:41 ` [PATCH 5/5] c/r: Add AF_UNIX support (v6) Dan Smith
[not found] ` <1248295301-30930-6-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-07-28 16:54 ` Oren Laadan
2009-07-28 20:34 ` Dan Smith
2009-07-28 21:18 ` Oren Laadan [this message]
2009-07-29 13:36 ` Serge E. Hallyn
2009-07-29 14:49 ` Dan Smith
2009-07-29 14:59 ` Serge E. Hallyn
2009-07-29 15:03 ` Dan Smith
2009-07-29 15:34 ` Oren Laadan
2009-07-29 18:37 ` Dan Smith
2009-07-30 15:49 ` Dan Smith
2009-07-30 22:10 ` John Dykstra
2009-07-30 22:12 ` John Dykstra
2009-07-30 22:14 ` Dan Smith
2009-08-04 8:27 ` Oren Laadan
2009-08-04 15:16 ` Dan Smith
2009-08-04 17:05 ` Oren Laadan
2009-08-04 17:13 ` Dan Smith
2009-07-29 1:56 ` Serge E. Hallyn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A6F6B19.9010508@librato.com \
--to=orenl@librato.com \
--cc=adobriyan@gmail.com \
--cc=containers@lists.osdl.org \
--cc=danms@us.ibm.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.