From: ebiederm@xmission.com (Eric W. Biederman)
To: Stephen Hemminger <shemminger@vyatta.com>
Cc: netdev@vger.kernel.org
Subject: Re: network namespace and kernel bind issue
Date: Mon, 01 Oct 2012 17:35:42 -0700
Message-ID: <871uhhd82p.fsf@xmission.com>
In-Reply-To: <20121001163226.3873ca58@nehalam.linuxnetplumber.net> (Stephen Hemminger's message of "Mon, 1 Oct 2012 16:32:26 -0700")
Stephen Hemminger <shemminger@vyatta.com> writes:
> On Mon, 01 Oct 2012 16:11:07 -0700
> ebiederm@xmission.com (Eric W. Biederman) wrote:
>
>> Then my guess is that you have an ordering problem. Attempting
>> to initialize a vxlan before ipv4 is initialized or some such.
>
> Isn't there a guarantee that init operations are called in the order
> they registered?
Yes. With the caveat that all things registered with
register_pernet_subsys are called before register_pernet_device.
So if you are registering as a subsystem the loopback device won't
have been registered yet.
So if there is some requirement that I'm not seeing that the loopback
device needs to be registered or possibly even registered and brought up
before we can bind to a port you could easily be hitting that.
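The ordering rule above can be pictured with a toy user-space model. This
is only a sketch and not kernel code: pernet_list, pernet_register and
pernet_init_order are hypothetical names invented for illustration, and
the only behavior modeled is "all subsystems first, then all devices,
each group in registration order".

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: these types and functions are hypothetical, not kernel API. */
enum pernet_kind { PERNET_SUBSYS, PERNET_DEVICE };

struct pernet_entry {
	const char *name;
	enum pernet_kind kind;
};

#define PERNET_MAX 16

struct pernet_list {
	struct pernet_entry entries[PERNET_MAX];
	size_t count;
};

/* Record an entry in registration order. Returns 0, or -1 if the list is full. */
int pernet_register(struct pernet_list *l, const char *name,
		    enum pernet_kind kind)
{
	if (l->count >= PERNET_MAX)
		return -1;
	l->entries[l->count].name = name;
	l->entries[l->count].kind = kind;
	l->count++;
	return 0;
}

/*
 * Fill 'order' with the names in the order their init hooks would run
 * under the rule described above: every subsystem in registration order,
 * then every device in registration order. Returns the number written.
 */
size_t pernet_init_order(const struct pernet_list *l, const char **order,
			 size_t max)
{
	size_t n = 0;
	int pass;

	for (pass = 0; pass < 2; pass++) {
		enum pernet_kind want = pass == 0 ? PERNET_SUBSYS : PERNET_DEVICE;
		size_t i;

		for (i = 0; i < l->count && n < max; i++)
			if (l->entries[i].kind == want)
				order[n++] = l->entries[i].name;
	}
	return n;
}
```

In this model a device registered early (say "loopback") still runs after
a subsystem registered later, which is the situation a subsystem init
hook would be in if it depended on the loopback device.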
[11587.371211] vxlan: bind for UDP socket 0.0.0.0:8472 (-98)
From this one clue it does look like the trace is:
inet_bind
udp4_get_port
udp_lib_get_port
And it does look like the only possible failure when a port number
is passed in is for the port to be genuinely in use.
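For reference, -98 is -EADDRINUSE on x86 and most other Linux
architectures. The same failure can be reproduced from user space with
the sketch below, which is an illustration only and not the vxlan code:
udp_bind_any and udp_local_port are hypothetical helpers, and the bind
goes to 0.0.0.0 as in the log line.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a UDP socket and bind it to 0.0.0.0:port
 * (port in host byte order, 0 lets the kernel pick one).
 * Returns the fd on success or a negative errno value on failure.
 */
int udp_bind_any(unsigned short port)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

	if (fd < 0)
		return -errno;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		int err = errno;

		close(fd);
		return -err;
	}
	return fd;
}

/* Hypothetical helper: local port of a bound socket, or 0 on failure. */
unsigned short udp_local_port(int fd)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);

	if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
		return 0;
	return ntohs(addr.sin_port);
}
```

Calling udp_bind_any(0), reading the chosen port back with
udp_local_port, and then calling udp_bind_any again with that port in
the same network namespace fails with -EADDRINUSE, because neither
socket sets SO_REUSEADDR or SO_REUSEPORT.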
Ok. So I tracked down your patch so I could find the relevant code.
+static __net_init int vxlan_init_net(struct net *net)
+{
....
+ /* Create UDP socket for encapsulation receive. */
+ rc = sock_create_kern(AF_INET, SOCK_DGRAM, IPPROTO_UDP, &vn->sock);
+ if (rc < 0) {
+ pr_debug("UDP socket create failed\n");
+ return rc;
+ }
And this is where we have the issue.
sock_create_kern only creates sockets in the initial network namespace.
There is inet_ctl_sock_create which comes closer to what you want
but I expect you want your socket to be hashed.
Still we need to do something here to avoid having a socket in the
network namespace that has a reference count on the network namespace
and keeps the network namespace from exiting.
We very clearly don't have a good interface for handling this at
the moment. I am drawing a blank at the moment on exactly what
such an interface should look like.
What we have is certainly error prone for use inside the kernel.
I have a suspicion the nfs server code that uses __sock_create
has the potential to forever pin a network namespace.
int sock_create_netns(struct net *net, int family, int type, int protocol,
		      struct socket **res)
{
	int err;

	/* Create the socket in init_net, then move it without pinning net. */
	err = __sock_create(&init_net, family, type, protocol, res, 1);
	if (err == 0)
		sk_change_net((*res)->sk, net);
	return err;
}
Although I am beginning to suspect we should do the silly refcount
avoidance for all in kernel sockets, and just pass the kern parameter
all of the way down to sk_alloc, so it can get the refcounting right
the first time.
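The refcount concern can be pictured with a toy user-space model. This
is a sketch under loud assumptions: toy_net, toy_sock and the toy_*
functions are hypothetical and only mimic the idea of a socket that
either holds or does not hold a counted reference on its namespace.
They are not the kernel's sk_alloc or sk_change_net.

```c
#include <stddef.h>

/* Toy model only: hypothetical stand-ins for struct net and struct sock. */
struct toy_net {
	const char *name;
	int count;		/* counted references held by sockets */
};

struct toy_sock {
	struct toy_net *net;
	int holds_ref;		/* 1 if this socket pins its namespace */
};

/* Ordinary creation: the socket takes a counted reference on its namespace. */
void toy_sock_create(struct toy_sock *sk, struct toy_net *net)
{
	sk->net = net;
	sk->holds_ref = 1;
	net->count++;
}

/*
 * The idea behind moving a kernel socket: drop the reference on the old
 * namespace and point at the new one without taking a counted reference.
 */
void toy_sock_change_net(struct toy_sock *sk, struct toy_net *net)
{
	if (sk->holds_ref)
		sk->net->count--;
	sk->net = net;
	sk->holds_ref = 0;
}

/* Release: only give back a reference if the socket actually holds one. */
void toy_sock_release(struct toy_sock *sk)
{
	if (sk->holds_ref)
		sk->net->count--;
	sk->net = NULL;
	sk->holds_ref = 0;
}

/* In this model a namespace can exit only when no socket pins it. */
int toy_net_can_exit(const struct toy_net *net)
{
	return net->count == 0;
}
```

In the model, a socket created directly in a child namespace keeps
toy_net_can_exit false until it is released, while a socket created
elsewhere and moved with toy_sock_change_net never blocks the exit.
Passing a "kern" flag down to the allocator would amount to choosing
holds_ref at creation time instead of fixing it up afterwards.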
However, for the bug fix for the merge window (since it appears Dave
merged this code), I suggest you just add the sk_change_net and change
the socket release to sk_release_kern in release_net. At least that is
localized, and doesn't require us to clean up the API for in kernel
sockets in a rush.
Eric
+ vxlan_addr.sin_port = htons(vxlan_port);
+
+ rc = kernel_bind(vn->sock, (struct sockaddr *) &vxlan_addr,
+ sizeof(vxlan_addr));
+ if (rc < 0) {
+ pr_debug("bind for UDP socket %pI4:%u (%d)\n",
+ &vxlan_addr.sin_addr, ntohs(vxlan_addr.sin_port), rc);
+ sock_release(vn->sock);
+ vn->sock = NULL;
+ return rc;
+ }
Eric