* Re: [PATCH 0/7] Network namespace manipulation with file descriptors
From: Alex Bligh @ 2011-05-08 12:31 UTC (permalink / raw)
To: Eric W. Biederman
Cc: linux-arch, netdev, linux-kernel, Linux Containers, linux-fsdevel,
Alex Bligh
In-Reply-To: <m1fwoqoapn.fsf@fess.ebiederm.org>
Eric,
--On 7 May 2011 07:18:44 -0700 "Eric W. Biederman" <ebiederm@xmission.com>
wrote:
> You are essentially describing my setns system call.
Great - thanks.
>> As a secondary issue, ever without your patch, it would be really
>> useful to be able to read from userspace the current network namespace.
>> (i.e. the pid concerned, or 1 if not unshared). I would like to
>> simply modify a routing daemon's init script so it doesn't start
>> if in the host, e.g. at the top:
>> [ `cat /proc/.../networknamespace` eq 1 ] && exit 0
>
> You can read the processes network namespace by opening
> /proc/<pid>/ns/net. Unfortunately comparing the network
> namespaces for identity is another matter. You will probably
> be better off simply forcing the routing daemon to start
> in the desired network namespace in it's initscript.
It's solely a minor convenience issue. The network namespace is
unshared by the filing system namespace isn't. So there's an
/etc/init.d/bird, which I would like to remain there so I
can call it from the network namespace concerned (which
doesn't exist at boot time). But I'd also like it not to run
at boot time. So it would be useful to me if the script could
check whether it is running in the default namespace and
refuse to launch if so.
I note the /proc/ file you mention is not present in the main tree at
the moment.
> For purposes of clarity please have a look at my work in
> progress patch for iproute2. This demonstrates how I expect
> userspace to work in a multi-network namespace world.
Will do
--
Alex Bligh
^ permalink raw reply
* Re: Scalability of interface creation and deletion
From: Alex Bligh @ 2011-05-08 12:18 UTC (permalink / raw)
To: Alex Bligh, Eric Dumazet; +Cc: netdev, Paul E. McKenney, Alex Bligh
In-Reply-To: <F57561A93EFF5E88729A8D53@nimrod.local>
--On 8 May 2011 10:35:02 +0100 Alex Bligh <alex@alex.org.uk> wrote:
> I suspect this may just mean an rcu reader holds the rcu_read_lock
> for a jiffies related time. Though I'm having difficulty seeing
> what that might be on a system where the net is in essence idle.
Having read the RCU docs, this can't be right, because blocking
is not legal when in the rcu_read_lock critical section.
The system concerned is an 8 cpu system but I get comparable
results on a 2 cpu system.
I am guessing that when the synchronize_sched() happens, all cores
but the cpu on which that is executing are idle (at least on
the vast majority of calls) as the machine itself is idle.
As I understand, RCU synchronization (in the absence of lots
of callbacks etc.) is meant to wait until it knows all RCU
read critical sections which are running on entry have
been left. It exploits the fact that RCU read critical sections
cannot block by waiting for a context switch on each cpu, OR
for that cpu to be in the idle state or running user code (also
incompatible with a read critical section).
The fact that increasing HZ masks the problem seems to imply that
sychronize_sched() is waiting when it shouldn't be, as it suggests
it's waiting for a context switch. But surely it shouldn't be
waiting for context switch if all other cpu cores are idle?
It knows that it (the caller) doesn't hold an rcu_read_lock,
and presumably can see the other cpus are in the idle state,
in which case surely it should return immediately? Distribution
of latency in synchronize_sched() looks like this:
20-49 us 110 instances (27.500%)
50-99 us 45 instances (11.250%)
5000-9999 us 5 instances (1.250%)
10000-19999 us 33 instances (8.250%)
20000-49999 us 4 instances (1.000%)
50000-99999 us 191 instances (47.750%)
100000-199999 us 12 instances (3.000%)
--
Alex Bligh
^ permalink raw reply
* Re: Scalability of interface creation and deletion
From: Alex Bligh @ 2011-05-08 10:09 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Alex Bligh
In-Reply-To: <1304793553.3207.24.camel@edumazet-laptop>
--On 7 May 2011 20:39:13 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le samedi 07 mai 2011 à 20:32 +0200, Eric Dumazet a écrit :
>
> Also you could patch synchronize_sched() itself instead of
> synchronize_net()
OK, I did this, plus instrumented the call to rcu_barrier()
you mentioned:
Looking at the synchronize_net() and rcu_barrier() calls:
Total 8.43935 Usage 399 Average 0.02115 elsewhere
Total 10.65050 Usage 200 Average 0.05325 rcu_barrier
Total 9.28948 Usage 200 Average 0.04645 synchronize_net
it's spending about 1/3 of its time in that rcu_barrier, 1/3
in synchronize_sched() and 1/3 elsewere.
Turning now to the synchronize_sched() (per your patch), I see
Total 16.36852 Usage 400 Average 0.04092 synchronize_sched()
Note "Usage 400". That's because precisely half the calls to
synchronize_sched() occur outside of synchronize_net(), and
half occur within synchronize_net() (per logs)
A typical interface being removed looks like this:
May 8 09:47:31 nattytest kernel: [ 177.030197] synchronize_sched() in
66921 us
May 8 09:47:31 nattytest kernel: [ 177.030957] begin synchronize_net()
May 8 09:47:31 nattytest kernel: [ 177.120085] synchronize_sched() in
89080 us
May 8 09:47:31 nattytest kernel: [ 177.120819] end synchronize_net()
May 8 09:47:31 nattytest kernel: [ 177.121698] begin rcu_barrier()
May 8 09:47:31 nattytest kernel: [ 177.190152] end rcu_barrier()
So for every interface being destroyed (I'm doing 200 as veths
are pairs), we do 2 synchronize_sched() calls and 1 rcu_barrier.
Each of these takes roughly 42ms with CONFIG_HZ set to 100,
leading to 125ms per interface destroy, and 250ms per veth
pair destroy.
It may be a naive question but why would we need to do
2 synchronize_sched() and 1 rcu_barrier() to remove an
interface?
--
Alex Bligh
^ permalink raw reply
* Re: Scalability of interface creation and deletion
From: Alex Bligh @ 2011-05-08 9:35 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Paul E. McKenney, Alex Bligh
In-Reply-To: <1304838742.3207.45.camel@edumazet-laptop>
Eric,
--On 8 May 2011 09:12:22 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> By the way, if I change HZ from 1000 to 100 I now have ten times slower
> result :
I repeated that test here. With HZ set to 1000 I got a total time of
4.022 seconds to remove 100 interfaces, of which:
Total 3.03808 Usage 199 Average 0.01527 elsewhere
Total 0.93992 Usage 200 Average 0.00470 synchronizing
as opposed to a total of 27.917 seconds with HZ set to 100, of which
Total 18.98515 Usage 199 Average 0.09540 elsewhere
Total 8.77581 Usage 200 Average 0.04388 synchronizing
Not quite a factor of 10 improvement, but nearly.
I have CONFIG_RCU_FAST_NO_HZ=y
I suspect this may just mean an rcu reader holds the rcu_read_lock
for a jiffies related time. Though I'm having difficulty seeing
what that might be on a system where the net is in essence idle.
--
Alex Bligh
^ permalink raw reply
* [PATCH v2] net: add mac_pton() for parsing MAC address
From: Alexey Dobriyan @ 2011-05-08 9:00 UTC (permalink / raw)
To: davem; +Cc: netdev, shemminger
mac_pton() parses MAC address in form XX:XX:XX:XX:XX:XX and only in that form.
mac_pton() doesn't dirty result until it's sure string representation is valid.
mac_pton() doesn't care about characters _after_ last octet,
it's up to caller to deal with it.
mac_pton() diverges from 0/-E return value convention.
Target usage:
if (!mac_pton(str, whatever->mac))
return -EINVAL;
/* ->mac being u8 [ETH_ALEN] is filled at this point. */
/* optionally check str[3 * ETH_ALEN - 1] for termination */
Use mac_pton() in pktgen and netconsole for start.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---
drivers/net/netconsole.c | 20 +++--------------
include/linux/if_ether.h | 1
net/core/netpoll.c | 26 -----------------------
net/core/pktgen.c | 53 +++++++----------------------------------------
net/core/utils.c | 24 +++++++++++++++++++++
5 files changed, 38 insertions(+), 86 deletions(-)
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -429,8 +429,6 @@ static ssize_t store_remote_mac(struct netconsole_target *nt,
size_t count)
{
u8 remote_mac[ETH_ALEN];
- char *p = (char *) buf;
- int i;
if (nt->enabled) {
printk(KERN_ERR "netconsole: target (%s) is enabled, "
@@ -439,23 +437,13 @@ static ssize_t store_remote_mac(struct netconsole_target *nt,
return -EINVAL;
}
- for (i = 0; i < ETH_ALEN - 1; i++) {
- remote_mac[i] = simple_strtoul(p, &p, 16);
- if (*p != ':')
- goto invalid;
- p++;
- }
- remote_mac[ETH_ALEN - 1] = simple_strtoul(p, &p, 16);
- if (*p && (*p != '\n'))
- goto invalid;
-
+ if (!mac_pton(buf, remote_mac))
+ return -EINVAL;
+ if (buf[3 * ETH_ALEN - 1] && buf[3 * ETH_ALEN - 1] != '\n')
+ return -EINVAL;
memcpy(nt->np.remote_mac, remote_mac, ETH_ALEN);
return strnlen(buf, count);
-
-invalid:
- printk(KERN_ERR "netconsole: invalid input\n");
- return -EINVAL;
}
/*
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -136,6 +136,7 @@ int eth_header_parse(const struct sk_buff *skb, unsigned char *haddr);
extern struct ctl_table ether_table[];
#endif
+int mac_pton(const char *s, u8 *mac);
extern ssize_t sysfs_format_mac(char *buf, const unsigned char *addr, int len);
#endif
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -698,32 +698,8 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
if (*cur != 0) {
/* MAC address */
- if ((delim = strchr(cur, ':')) == NULL)
+ if (!mac_pton(cur, np->remote_mac))
goto parse_failed;
- *delim = 0;
- np->remote_mac[0] = simple_strtol(cur, NULL, 16);
- cur = delim + 1;
- if ((delim = strchr(cur, ':')) == NULL)
- goto parse_failed;
- *delim = 0;
- np->remote_mac[1] = simple_strtol(cur, NULL, 16);
- cur = delim + 1;
- if ((delim = strchr(cur, ':')) == NULL)
- goto parse_failed;
- *delim = 0;
- np->remote_mac[2] = simple_strtol(cur, NULL, 16);
- cur = delim + 1;
- if ((delim = strchr(cur, ':')) == NULL)
- goto parse_failed;
- *delim = 0;
- np->remote_mac[3] = simple_strtol(cur, NULL, 16);
- cur = delim + 1;
- if ((delim = strchr(cur, ':')) == NULL)
- goto parse_failed;
- *delim = 0;
- np->remote_mac[4] = simple_strtol(cur, NULL, 16);
- cur = delim + 1;
- np->remote_mac[5] = simple_strtol(cur, NULL, 16);
}
netpoll_print_options(np);
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -1420,11 +1420,6 @@ static ssize_t pktgen_if_write(struct file *file,
return count;
}
if (!strcmp(name, "dst_mac")) {
- char *v = valstr;
- unsigned char old_dmac[ETH_ALEN];
- unsigned char *m = pkt_dev->dst_mac;
- memcpy(old_dmac, pkt_dev->dst_mac, ETH_ALEN);
-
len = strn_len(&user_buffer[i], sizeof(valstr) - 1);
if (len < 0)
return len;
@@ -1432,35 +1427,16 @@ static ssize_t pktgen_if_write(struct file *file,
memset(valstr, 0, sizeof(valstr));
if (copy_from_user(valstr, &user_buffer[i], len))
return -EFAULT;
- i += len;
-
- for (*m = 0; *v && m < pkt_dev->dst_mac + 6; v++) {
- int value;
-
- value = hex_to_bin(*v);
- if (value >= 0)
- *m = *m * 16 + value;
-
- if (*v == ':') {
- m++;
- *m = 0;
- }
- }
+ if (!mac_pton(valstr, pkt_dev->dst_mac))
+ return -EINVAL;
/* Set up Dest MAC */
- if (compare_ether_addr(old_dmac, pkt_dev->dst_mac))
- memcpy(&(pkt_dev->hh[0]), pkt_dev->dst_mac, ETH_ALEN);
+ memcpy(&pkt_dev->hh[0], pkt_dev->dst_mac, ETH_ALEN);
- sprintf(pg_result, "OK: dstmac");
+ sprintf(pg_result, "OK: dstmac %pM", pkt_dev->dst_mac);
return count;
}
if (!strcmp(name, "src_mac")) {
- char *v = valstr;
- unsigned char old_smac[ETH_ALEN];
- unsigned char *m = pkt_dev->src_mac;
-
- memcpy(old_smac, pkt_dev->src_mac, ETH_ALEN);
-
len = strn_len(&user_buffer[i], sizeof(valstr) - 1);
if (len < 0)
return len;
@@ -1468,26 +1444,13 @@ static ssize_t pktgen_if_write(struct file *file,
memset(valstr, 0, sizeof(valstr));
if (copy_from_user(valstr, &user_buffer[i], len))
return -EFAULT;
- i += len;
-
- for (*m = 0; *v && m < pkt_dev->src_mac + 6; v++) {
- int value;
-
- value = hex_to_bin(*v);
- if (value >= 0)
- *m = *m * 16 + value;
-
- if (*v == ':') {
- m++;
- *m = 0;
- }
- }
+ if (!mac_pton(valstr, pkt_dev->src_mac))
+ return -EINVAL;
/* Set up Src MAC */
- if (compare_ether_addr(old_smac, pkt_dev->src_mac))
- memcpy(&(pkt_dev->hh[6]), pkt_dev->src_mac, ETH_ALEN);
+ memcpy(&pkt_dev->hh[6], pkt_dev->src_mac, ETH_ALEN);
- sprintf(pg_result, "OK: srcmac");
+ sprintf(pg_result, "OK: srcmac %pM", pkt_dev->src_mac);
return count;
}
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -296,3 +296,27 @@ void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
csum_unfold(*sum)));
}
EXPORT_SYMBOL(inet_proto_csum_replace4);
+
+int mac_pton(const char *s, u8 *mac)
+{
+ int i;
+
+ /* XX:XX:XX:XX:XX:XX */
+ if (strlen(s) < 3 * ETH_ALEN - 1)
+ return 0;
+
+ /* Don't dirty result unless string is valid MAC. */
+ for (i = 0; i < ETH_ALEN; i++) {
+ if (!strchr("0123456789abcdefABCDEF", s[i * 3]))
+ return 0;
+ if (!strchr("0123456789abcdefABCDEF", s[i * 3 + 1]))
+ return 0;
+ if (i != ETH_ALEN - 1 && s[i * 3 + 2] != ':')
+ return 0;
+ }
+ for (i = 0; i < ETH_ALEN; i++) {
+ mac[i] = (hex_to_bin(s[i * 3]) << 4) | hex_to_bin(s[i * 3 + 1]);
+ }
+ return 1;
+}
+EXPORT_SYMBOL(mac_pton);
^ permalink raw reply
* Re: Scalability of interface creation and deletion
From: Alex Bligh @ 2011-05-08 8:08 UTC (permalink / raw)
To: Ben Greear, Eric Dumazet; +Cc: netdev, Alex Bligh
In-Reply-To: <4DC611C3.7070607@candelatech.com>
--On 7 May 2011 20:45:07 -0700 Ben Greear <greearb@candelatech.com> wrote:
> Well, I'd hope to get a netlink message about the device being deleted,
> and
> after that, be able to create another one with the same name, etc.
>
> Whether the memory is actually freed in the kernel or not wouldn't matter
> to me...
Provided the former para is always done, I can't actually think of a case
where the caller would /ever/ care about the latter (save perhaps
a final shutdown of the whole net subsystem).
--
Alex Bligh
^ permalink raw reply
* Re: Scalability of interface creation and deletion
From: Alex Bligh @ 2011-05-08 8:06 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Paul E. McKenney, Alex Bligh
In-Reply-To: <1304838742.3207.45.camel@edumazet-laptop>
Eric,
> By the way, if I change HZ from 1000 to 100 I now have ten times slower
> result :
>
># ip link add link eth0 eth0.103 type vlan id 103
># time ip link del eth0.103
>
> real 0m0.430s
> user 0m0.000s
> sys 0m0.000s
>
> So all this is related to your HZ value, even in a CONFIG_NO_HZ=y
> kernel.
That's very mysterious.
> Alex, I guess you have HZ=250 ?
I have HZ=100. I am basically using the Ubuntu default with
localmodconfig to make compile times sensible.
amb@nattytest:~$ cd kernel/linux-2.6/
amb@nattytest:~/kernel/linux-2.6$ fgrep HZ .config
CONFIG_RCU_FAST_NO_HZ=y
CONFIG_NO_HZ=y
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
# CONFIG_MACHZ_WDT is not set
--
Alex Bligh
^ permalink raw reply
* Re: Scalability of interface creation and deletion
From: Eric Dumazet @ 2011-05-08 7:12 UTC (permalink / raw)
To: Alex Bligh; +Cc: netdev, Paul E. McKenney
In-Reply-To: <1304793749.3207.26.camel@edumazet-laptop>
Le samedi 07 mai 2011 à 20:42 +0200, Eric Dumazet a écrit :
> Here is my trace here for one device deletion on one 8 core machine
>
> [ 800.447012] synchronize_rcu() in 15787 us
> [ 800.455013] synchronize_rcu() in 7682 us
> [ 800.464019] rcu_barrier() in 8487 us
>
> Not that bad.
>
> $ grep RCU .config
> # RCU Subsystem
> CONFIG_TREE_RCU=y
> # CONFIG_PREEMPT_RCU is not set
> CONFIG_RCU_TRACE=y
> CONFIG_RCU_FANOUT=32
> # CONFIG_RCU_FANOUT_EXACT is not set
> # CONFIG_RCU_FAST_NO_HZ is not set
> CONFIG_TREE_RCU_TRACE=y
>
By the way, if I change HZ from 1000 to 100 I now have ten times slower
result :
# ip link add link eth0 eth0.103 type vlan id 103
# time ip link del eth0.103
real 0m0.430s
user 0m0.000s
sys 0m0.000s
So all this is related to your HZ value, even in a CONFIG_NO_HZ=y
kernel. Alex, I guess you have HZ=250 ?
# uname -a
Linux svivoipvnx021 2.6.39-rc6-00214-g5511a34-dirty #574 SMP Sun May 8
08:44:14 CEST 2011 x86_64 x86_64 x86_64 GNU/Linux
# cat /proc/cmdline
I enabled CONFIG_RCU_FAST_NO_HZ and got worse results (but not
alsways... its very variable)
# time ip link del eth0.103
real 0m0.544s
user 0m0.000s
sys 0m0.000s
# time ip link del eth0.103
real 0m0.414s
user 0m0.000s
sys 0m0.000s
^ permalink raw reply
* [PATCH] netconsole: switch to kstrto*() functions
From: Alexey Dobriyan @ 2011-05-08 6:33 UTC (permalink / raw)
To: davem; +Cc: netdev
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---
drivers/net/netconsole.c | 62 ++++++++++-------------------------------------
1 file changed, 14 insertions(+), 48 deletions(-)
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -242,34 +242,6 @@ static struct netconsole_target *to_target(struct config_item *item)
}
/*
- * Wrapper over simple_strtol (base 10) with sanity and range checking.
- * We return (signed) long only because we may want to return errors.
- * Do not use this to convert numbers that are allowed to be negative.
- */
-static long strtol10_check_range(const char *cp, long min, long max)
-{
- long ret;
- char *p = (char *) cp;
-
- WARN_ON(min < 0);
- WARN_ON(max < min);
-
- ret = simple_strtol(p, &p, 10);
-
- if (*p && (*p != '\n')) {
- printk(KERN_ERR "netconsole: invalid input\n");
- return -EINVAL;
- }
- if ((ret < min) || (ret > max)) {
- printk(KERN_ERR "netconsole: input %ld must be between "
- "%ld and %ld\n", ret, min, max);
- return -EINVAL;
- }
-
- return ret;
-}
-
-/*
* Attribute operations for netconsole_target.
*/
@@ -327,12 +299,14 @@ static ssize_t store_enabled(struct netconsole_target *nt,
const char *buf,
size_t count)
{
+ int enabled;
int err;
- long enabled;
- enabled = strtol10_check_range(buf, 0, 1);
- if (enabled < 0)
- return enabled;
+ err = kstrtoint(buf, 10, &enabled);
+ if (err < 0)
+ return err;
+ if (enabled < 0 || enabled > 1)
+ return -EINVAL;
if (enabled) { /* 1 */
@@ -384,8 +358,7 @@ static ssize_t store_local_port(struct netconsole_target *nt,
const char *buf,
size_t count)
{
- long local_port;
-#define __U16_MAX ((__u16) ~0U)
+ int rv;
if (nt->enabled) {
printk(KERN_ERR "netconsole: target (%s) is enabled, "
@@ -394,12 +367,9 @@ static ssize_t store_local_port(struct netconsole_target *nt,
return -EINVAL;
}
- local_port = strtol10_check_range(buf, 0, __U16_MAX);
- if (local_port < 0)
- return local_port;
-
- nt->np.local_port = local_port;
-
+ rv = kstrtou16(buf, 10, &nt->np.local_port);
+ if (rv < 0)
+ return rv;
return strnlen(buf, count);
}
@@ -407,8 +377,7 @@ static ssize_t store_remote_port(struct netconsole_target *nt,
const char *buf,
size_t count)
{
- long remote_port;
-#define __U16_MAX ((__u16) ~0U)
+ int rv;
if (nt->enabled) {
printk(KERN_ERR "netconsole: target (%s) is enabled, "
@@ -417,12 +386,9 @@ static ssize_t store_remote_port(struct netconsole_target *nt,
return -EINVAL;
}
- remote_port = strtol10_check_range(buf, 0, __U16_MAX);
- if (remote_port < 0)
- return remote_port;
-
- nt->np.remote_port = remote_port;
-
+ rv = kstrtou16(buf, 10, &nt->np.remote_port);
+ if (rv < 0)
+ return rv;
return strnlen(buf, count);
}
^ permalink raw reply
* Re: [net-next-2.6 0/5][pull request] Intel Wired LAN Driver Update
From: David Miller @ 2011-05-08 5:59 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips
In-Reply-To: <1304763923-6839-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Sat, 7 May 2011 03:25:18 -0700
> The following series contains updates to e100, e1000, igb and ixgbe.
>
> Sorry for the delay on the e100/e1000/igb convert to set_phys_id patches,
> it was due to me falling ill and not completing the patches in a timely
> manner.
>
> The following are changes since commit 706527280ec38fcdcd0466f10b607105fd23801b:
> ipv4: Initialize cork->opt using NULL not 0
> and are available in the git repository at:
> master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next-2.6 master
Pulled, thanks Jeff!
^ permalink raw reply
* Re: [PATCH 7/7] ns: Wire up the setns system call
From: James Bottomley @ 2011-05-08 4:02 UTC (permalink / raw)
To: Eric W. Biederman
Cc: linux-arch, linux-kernel, netdev, linux-fsdevel, jamal,
Daniel Lezcano, Linux Containers, Renato Westphal
In-Reply-To: <m162pm53ye.fsf@fess.ebiederm.org>
On Sat, 2011-05-07 at 19:19 -0700, Eric W. Biederman wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>
> > On Fri, 2011-05-06 at 19:25 -0700, Eric W. Biederman wrote:
> >> v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
> >> v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
> >> v4: Moved wiring up of the system call to another patch
> >> v5: ported to v2.6.39-rc6
> >>
> >> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> >
> > We already have several syscalls queued up for parisc:
> >
> > http://git.kernel.org/?p=linux/kernel/git/jejb/parisc-2.6.git;a=shortlog;h=refs/heads/misc
> >
> > So if you could make this patch over them (or over linux-next), that
> > would help the merge process.
>
> I will take a look. I was rather pleasantly surprised that no other
> system call conflicts had shown up before now.
>
> This is unfortunately one of those areas where it is almost impossible
> to avoid conflicts.
>
> Do you know if there is any chance that the parisc tree might get
> rebased or anything horrible like that?
It shouldn't unless one of the wire ups is wrong and I have to replace
it, but I think the possibility of that will be minute.
> If not I think I will just pull the hunk of the parisc tree with the syscalls
> e38f5b745075828ac51b12c8c95c85a7be4a3ec7...2e7bad5f34b5beed47542490c760ed26574e38ba
> into my tree so I don't have to worry about merge order.
Yes, that should work fine.
James
^ permalink raw reply
* Re: [PATCH 2/7] ns: Introduce the setns syscall
From: Matt Helsley @ 2011-05-08 3:51 UTC (permalink / raw)
To: Eric W. Biederman
Cc: linux-arch, netdev, linux-kernel, Linux Containers, linux-fsdevel
In-Reply-To: <1304735101-1824-2-git-send-email-ebiederm@xmission.com>
On Fri, May 06, 2011 at 07:24:56PM -0700, Eric W. Biederman wrote:
> With the networking stack today there is demand to handle
> multiple network stacks at a time. Not in the context
> of containers but in the context of people doing interesting
> things with routing.
>
> There is also demand in the context of containers to have
> an efficient way to execute some code in the container itself.
> If nothing else it is very useful ad a debugging technique.
>
> Both problems can be solved by starting some form of login
> daemon in the namespaces people want access to, or you
> can play games by ptracing a process and getting the
> traced process to do things you want it to do. However
> it turns out that a login daemon or a ptrace puppet
> controller are more code, they are more prone to
> failure, and generally they are less efficient than
> simply changing the namespace of a process to a
> specified one.
>
> Pieces of this puzzle can also be solved by instead of
> coming up with a general purpose system call coming up
> with targed system calls perhaps socketat that solve
> a subset of the larger problem. Overall that appears
> to be more work for less reward.
>
> int setns(int fd, int nstype);
>
> The fd argument is a file descriptor referring to a proc
> file of the namespace you want to switch the process to.
>
> In the setns system call the nstype is 0 or specifies
> an clone flag of the namespace you intend to change
> to prevent changing a namespace unintentionally.
>
> v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
> v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
> v4: Moved wiring up of the system call to another patch
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
> kernel/nsproxy.c | 37 +++++++++++++++++++++++++++++++++++++
> 1 files changed, 37 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index a05d191..96059d8 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -22,6 +22,9 @@
> #include <linux/pid_namespace.h>
> #include <net/net_namespace.h>
> #include <linux/ipc_namespace.h>
> +#include <linux/proc_fs.h>
> +#include <linux/file.h>
> +#include <linux/syscalls.h>
>
> static struct kmem_cache *nsproxy_cachep;
>
> @@ -233,6 +236,40 @@ void exit_task_namespaces(struct task_struct *p)
> switch_task_namespaces(p, NULL);
> }
>
> +SYSCALL_DEFINE2(setns, int, fd, int, nstype)
> +{
> + const struct proc_ns_operations *ops;
> + struct task_struct *tsk = current;
> + struct nsproxy *new_nsproxy;
> + struct proc_inode *ei;
> + struct file *file;
> + int err;
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EPERM;
> +
> + file = proc_ns_fget(fd);
> + if (IS_ERR(file))
> + return PTR_ERR(file);
> +
> + err = -EINVAL;
> + ei = PROC_I(file->f_dentry->d_inode);
> + ops = ei->ns_ops;
> + if (nstype && (ops->type != nstype))
> + goto out;
> +
> + new_nsproxy = create_new_namespaces(0, tsk, tsk->fs);
Doesn't this need some error checking like:
if (IS_ERR(new_nsproxy)) {
err = PTR_ERR(new_nsproxy);
goto out;
}
> + err = ops->install(new_nsproxy, ei->ns);
> + if (err) {
> + free_nsproxy(new_nsproxy);
> + goto out;
> + }
> + switch_task_namespaces(tsk, new_nsproxy);
> +out:
> + fput(file);
> + return err;
> +}
> +
> static int __init nsproxy_cache_init(void)
> {
> nsproxy_cachep = KMEM_CACHE(nsproxy, SLAB_PANIC);
> --
> 1.6.5.2.143.g8cc62
>
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
^ permalink raw reply
* Re: Scalability of interface creation and deletion
From: Ben Greear @ 2011-05-08 3:45 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Alex Bligh, netdev
In-Reply-To: <1304787065.3207.17.camel@edumazet-laptop>
On 05/07/2011 09:51 AM, Eric Dumazet wrote:
> Le samedi 07 mai 2011 à 09:44 -0700, Ben Greear a écrit :
>> On 05/07/2011 09:37 AM, Eric Dumazet wrote:
>>> Le samedi 07 mai 2011 à 09:23 -0700, Ben Greear a écrit :
>>>
>>>> I wonder if it would be worth having a 'delete me soon'
>>>> method to delete interfaces that would not block on the
>>>> RCU code.
>>>>
>>>> The controlling programs could use netlink messages to
>>>> know exactly when an interface was truly gone.
>>>>
>>>> That should allow some batching in the sync-net logic
>>>> too, if user-space code deletes 1000 interfaces very
>>>> quickly, for instance...
>>>>
>>>
>>> I suggested in the past to have an extension of batch capabilities, so
>>> that one kthread could have 3 separate lists of devices being destroyed
>>> in //,
>>>
>>> This daemon would basically loop on one call to synchronize_rcu(), and
>>> transfert list3 to deletion, list2 to list3, list1 to list2, loop,
>>> eventually releasing RTNL while blocked in synchronize_rcu()
>>>
>>> This would need to allow as you suggest an asynchronous deletion method,
>>> or use a callback to wake the process blocked on device delete.
>>
>> I'd want to at least have the option to not block the calling
>> process...otherwise, it would be a lot more difficult to
>> quickly delete 1000 interfaces. You'd need 1000 threads, or
>> sockets, or something to parallelize it otherwise, eh?
>
> Yes, if you can afford not receive a final notification of device being
> fully freed, it should be possible.
Well, I'd hope to get a netlink message about the device being deleted, and
after that, be able to create another one with the same name, etc.
Whether the memory is actually freed in the kernel or not wouldn't matter
to me...
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply
* Re: [PATCH 7/7] ns: Wire up the setns system call
From: Eric W. Biederman @ 2011-05-08 2:19 UTC (permalink / raw)
To: James Bottomley
Cc: linux-arch, linux-kernel, netdev, linux-fsdevel, jamal,
Daniel Lezcano, Linux Containers, Renato Westphal
In-Reply-To: <1304798775.6123.18.camel@mulgrave.site>
James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> On Fri, 2011-05-06 at 19:25 -0700, Eric W. Biederman wrote:
>> v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
>> v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
>> v4: Moved wiring up of the system call to another patch
>> v5: ported to v2.6.39-rc6
>>
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>
> We already have several syscalls queued up for parisc:
>
> http://git.kernel.org/?p=linux/kernel/git/jejb/parisc-2.6.git;a=shortlog;h=refs/heads/misc
>
> So if you could make this patch over them (or over linux-next), that
> would help the merge process.
I will take a look. I was rather pleasantly surprised that no other
system call conflicts had shown up before now.
This is unfortunately one of those areas where it is almost impossible
to avoid conflicts.
Do you know if there is any chance that the parisc tree might get
rebased or anything horrible like that?
If not I think I will just pull the hunk of the parisc tree with the syscalls
e38f5b745075828ac51b12c8c95c85a7be4a3ec7...2e7bad5f34b5beed47542490c760ed26574e38ba
into my tree so I don't have to worry about merge order.
Eric
^ permalink raw reply
* Re: [PATCH 6/7] net: Allow setting the network namespace by fd
From: Daniel Lezcano @ 2011-05-07 22:46 UTC (permalink / raw)
To: Eric W. Biederman
Cc: linux-arch, linux-kernel, netdev, linux-fsdevel, jamal,
Linux Containers, Renato Westphal
In-Reply-To: <1304735101-1824-6-git-send-email-ebiederm@xmission.com>
On 05/07/2011 04:25 AM, Eric W. Biederman wrote:
> Take advantage of the new abstraction and allow network devices
> to be placed in any network namespace that we have a fd to talk
> about.
>
> Signed-off-by: Eric W. Biederman<ebiederm@xmission.com>
> ---
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
^ permalink raw reply
* Re: [PATCH 5/7] ns proc: Add support for the ipc namespace
From: Daniel Lezcano @ 2011-05-07 22:44 UTC (permalink / raw)
To: Eric W. Biederman
Cc: linux-arch, linux-kernel, netdev, linux-fsdevel, jamal,
Linux Containers, Renato Westphal
In-Reply-To: <1304735101-1824-5-git-send-email-ebiederm@xmission.com>
On 05/07/2011 04:24 AM, Eric W. Biederman wrote:
> Signed-off-by: Eric W. Biederman<ebiederm@xmission.com>
> ---
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
^ permalink raw reply
* Re: [PATCH 4/7] ns proc: Add support for the uts namespace
From: Daniel Lezcano @ 2011-05-07 22:42 UTC (permalink / raw)
To: Eric W. Biederman
Cc: linux-arch, linux-kernel, netdev, linux-fsdevel, jamal,
Linux Containers, Renato Westphal
In-Reply-To: <1304735101-1824-4-git-send-email-ebiederm@xmission.com>
On 05/07/2011 04:24 AM, Eric W. Biederman wrote:
> Signed-off-by: Eric W. Biederman<ebiederm@xmission.com>
> ---
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
^ permalink raw reply
* Re: [PATCH 3/7] ns proc: Add support for the network namespace.
From: Daniel Lezcano @ 2011-05-07 22:41 UTC (permalink / raw)
To: Eric W. Biederman
Cc: linux-arch, linux-kernel, netdev, linux-fsdevel, jamal,
Linux Containers, Renato Westphal
In-Reply-To: <1304735101-1824-3-git-send-email-ebiederm@xmission.com>
On 05/07/2011 04:24 AM, Eric W. Biederman wrote:
> Implementing file descriptors for the network namespace
> is simple and straight forward.
>
> Signed-off-by: Eric W. Biederman<ebiederm@xmission.com>
> ---
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
^ permalink raw reply
* Re: [PATCH 2/7] ns: Introduce the setns syscall
From: Daniel Lezcano @ 2011-05-07 22:39 UTC (permalink / raw)
To: Eric W. Biederman
Cc: linux-arch, linux-kernel, netdev, linux-fsdevel, jamal,
Linux Containers, Renato Westphal
In-Reply-To: <1304735101-1824-2-git-send-email-ebiederm@xmission.com>
On 05/07/2011 04:24 AM, Eric W. Biederman wrote:
> With the networking stack today there is demand to handle
> multiple network stacks at a time. Not in the context
> of containers but in the context of people doing interesting
> things with routing.
>
> There is also demand in the context of containers to have
> an efficient way to execute some code in the container itself.
> If nothing else it is very useful ad a debugging technique.
>
> Both problems can be solved by starting some form of login
> daemon in the namespaces people want access to, or you
> can play games by ptracing a process and getting the
> traced process to do things you want it to do. However
> it turns out that a login daemon or a ptrace puppet
> controller are more code, they are more prone to
> failure, and generally they are less efficient than
> simply changing the namespace of a process to a
> specified one.
>
> Pieces of this puzzle can also be solved by instead of
> coming up with a general purpose system call coming up
> with targed system calls perhaps socketat that solve
> a subset of the larger problem. Overall that appears
> to be more work for less reward.
>
> int setns(int fd, int nstype);
>
> The fd argument is a file descriptor referring to a proc
> file of the namespace you want to switch the process to.
>
> In the setns system call the nstype is 0 or specifies
> an clone flag of the namespace you intend to change
> to prevent changing a namespace unintentionally.
>
> v2: Most of the architecture support added by Daniel Lezcano<dlezcano@fr.ibm.com>
> v3: ported to v2.6.36-rc4 by: Eric W. Biederman<ebiederm@xmission.com>
> v4: Moved wiring up of the system call to another patch
>
> Signed-off-by: Eric W. Biederman<ebiederm@xmission.com>
> ---
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
^ permalink raw reply
* Re: [PATCH 1/7] ns: proc files for namespace naming policy.
From: Daniel Lezcano @ 2011-05-07 22:37 UTC (permalink / raw)
To: Eric W. Biederman
Cc: linux-arch, linux-kernel, netdev, linux-fsdevel, jamal,
Linux Containers, Renato Westphal
In-Reply-To: <1304735101-1824-1-git-send-email-ebiederm@xmission.com>
On 05/07/2011 04:24 AM, Eric W. Biederman wrote:
> Create files under /proc/<pid>/ns/ to allow controlling the
> namespaces of a process.
>
> This addresses three specific problems that can make namespaces hard to
> work with.
> - Namespaces require a dedicated process to pin them in memory.
> - It is not possible to use a namespace unless you are the child
> of the original creator.
> - Namespaces don't have names that userspace can use to talk about
> them.
>
> The namespace files under /proc/<pid>/ns/ can be opened and the
> file descriptor can be used to talk about a specific namespace, and
> to keep the specified namespace alive.
>
> A namespace can be kept alive by either holding the file descriptor
> open or bind mounting the file someplace else. aka:
> mount --bind /proc/self/ns/net /some/filesystem/path
> mount --bind /proc/self/fd/<N> /some/filesystem/path
>
> This allows namespaces to be named with userspace policy.
>
> It requires additional support to make use of these filedescriptors
> and that will be comming in the following patches.
>
> Signed-off-by: Eric W. Biederman<ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
^ permalink raw reply
* Re: [PATCH] net/bonding: adjust codingstyle for bond_3ad files
From: Rafael Aquini @ 2011-05-07 21:37 UTC (permalink / raw)
To: Jay Vosburgh
Cc: Joe Perches, kernel-janitors, David Miller, Andy Gospodarek,
shemminger, netdev, Nicolas Kaiser
In-Reply-To: <23896.1304803541@death>
Howdy Jay,
On Sat, May 07, 2011 at 02:25:41PM -0700, Jay Vosburgh wrote:
> My preference would be to only remove the "silence compiler"
> comments if the possibility of silent misbehavior is also eliminated.
> For __ad_timer_to_ticks, that would mean either a default: case in the
> current arrangement, or something like what Joe suggests above.
>
> If this is beyond the scope of what you, Rafeal, want to do,
> that's fine, but in that case leave the "silence" notes in place.
Yeah, re-factor that sort of code was beyond my intentions when I first
submit the patch. However, it is certainly feasible to be done, and as I wrote
before, I'm more than willing to help where is needed to help.
I'll review the places where "silence" notes are placed, and try to figure out a
proper way to get rid of them.
Thank you, folks, for keep hitting me with valuable feedback.
Cheers!
--
Rafael Aquini <aquini@linux.com>
^ permalink raw reply
* Re: [PATCH] net/bonding: adjust codingstyle for bond_3ad files
From: Jay Vosburgh @ 2011-05-07 21:25 UTC (permalink / raw)
To: Joe Perches
Cc: aquini, kernel-janitors, David Miller, Andy Gospodarek,
shemminger, netdev, Nicolas Kaiser
In-Reply-To: <1304791360.1738.6.camel@Joe-Laptop>
Joe Perches <joe@perches.com> wrote:
>On Sat, 2011-05-07 at 14:31 -0300, Rafael Aquini wrote:
>> To exemplify my point, I'll taking that very __ad_timer_to_ticks() as an example:
>> static u16 __ad_timer_to_ticks(u16 timer_type, u16 par)
>> {
>> u16 retval = 0;
>>
>> switch (timer_type) {
>> case AD_CURRENT_WHILE_TIMER: /* for rx machine usage */
>> if (par)
>> retval = (AD_SHORT_TIMEOUT_TIME*ad_ticks_per_sec);
>> else
>> retval = (AD_LONG_TIMEOUT_TIME*ad_ticks_per_sec);
>> break;
>> case AD_ACTOR_CHURN_TIMER: /* for local churn machine */
>> retval = (AD_CHURN_DETECTION_TIME*ad_ticks_per_sec);
>> break;
>> case AD_PERIODIC_TIMER: /* for periodic machine */
>> retval = (par*ad_ticks_per_sec);
>> break;
>> case AD_PARTNER_CHURN_TIMER: /* for remote churn machine */
>> retval = (AD_CHURN_DETECTION_TIME*ad_ticks_per_sec);
>> break;
>> case AD_WAIT_WHILE_TIMER: /* for selection machine */
>> retval = (AD_AGGREGATE_WAIT_TIME*ad_ticks_per_sec);
>> break;
>> }
>> return retval;
>> }
>>
>> If, for some unknown reason timer_type receives an 'alien' value, and
>> we were
>> using retval uninitialized, this function, as it is, would return an
>> unpredictable value to its caller. Unless we have the switch block
>> re-factored, we cannot leave retval uninitialized. So, it's not just a
>> matter of leaving the variable uninitialized, or initialize it just to
>> get rid of a compiler warning. That's why those comments are not
>> helpful anyway.
The comments are helpful, because they mark places where the
code could use some improvement to better handle "impossible"
conditions. The only reason we need the initializer is because the
function doesn't handle all possible inputs. This is probably not the
comment's intended purpose, but they're useful for this nevertheless.
>I'd write this not using a retval variable at all as:
>
> switch (timer_type) {
> case AD_CURRENT_WHILE_TIMER: /* for rx machine usage */
> if (par)
> return AD_SHORT_TIMEOUT_TIME * ad_ticks_per_sec;
> return AD_LONG_TIMEOUT_TIME * ad_ticks_per_sec;
> case AD_ACTOR_CHURN_TIMER: /* for local churn machine */
> return AD_CHURN_DETECTION_TIME * ad_ticks_per_sec;
> ...
> }
> WARN(1, "Invalid timer type: %u\n", timer_type)
> return 0;
>}
Most (perhaps all) of the "silence compiler" comments in this
code exist in places that, like the above, will silently misbehave if
unexpected input arrives. Granted, for the __ad_timer_to_ticks
function, there aren't a lot of call sites, but the other similar cases
exist within the state machine handler code.
My preference would be to only remove the "silence compiler"
comments if the possibility of silent misbehavior is also eliminated.
For __ad_timer_to_ticks, that would mean either a default: case in the
current arrangement, or something like what Joe suggests above.
If this is beyond the scope of what you, Rafeal, want to do,
that's fine, but in that case leave the "silence" notes in place.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
^ permalink raw reply
* Re: [PATCH] net/bonding: adjust codingstyle for bond_3ad files
From: Joe Perches @ 2011-05-07 20:24 UTC (permalink / raw)
To: matt mooney
Cc: David Miller, Jay Vosburgh, Andy Gospodarek, shemminger, netdev,
Nicolas Kaiser, kernel-janitors, aquini
In-Reply-To: <BANLkTimKs39m_htMFEygR+3Bfu7Butc-zQ@mail.gmail.com>
On Sat, 2011-05-07 at 12:35 -0700, matt mooney wrote:
> On Sat, May 7, 2011 at 11:02 AM, Joe Perches <joe@perches.com> wrote:
> > I'd write this not using a retval variable at all as:
[]
> But isn't the preferred style to have a single exit point?
I don't think so. It's not pascal.
^ permalink raw reply
* Re: [PATCH] [RESEND] iwl4965: drop a lone pr_err()
From: julie Sullivan @ 2011-05-07 20:14 UTC (permalink / raw)
To: Paul Bolle; +Cc: John W. Linville, linux-wireless, netdev, linux-kernel
In-Reply-To: <1304771519.2227.6.camel@x61.thuisdomein>
> iwl4965_rate_control_register() prints a message at KERN_ERR level.
> - pr_err("Registering 4965 rate control operations\n");
Yes, I noticed this message on boot too and found it a tad confusing...
('Rate control operations'?? 4,965 of them?)
:-) At least I know what it is now. Thanks, Paul.
Julie
^ permalink raw reply
* Re: [PATCH 7/7] ns: Wire up the setns system call
From: James Bottomley @ 2011-05-07 20:06 UTC (permalink / raw)
To: Eric W. Biederman
Cc: linux-arch, linux-kernel, netdev, linux-fsdevel, jamal,
Daniel Lezcano, Linux Containers, Renato Westphal
In-Reply-To: <1304735101-1824-7-git-send-email-ebiederm@xmission.com>
On Fri, 2011-05-06 at 19:25 -0700, Eric W. Biederman wrote:
> v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
> v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
> v4: Moved wiring up of the system call to another patch
> v5: ported to v2.6.39-rc6
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
We already have several syscalls queued up for parisc:
http://git.kernel.org/?p=linux/kernel/git/jejb/parisc-2.6.git;a=shortlog;h=refs/heads/misc
So if you could make this patch over them (or over linux-next), that
would help the merge process.
Thanks,
James
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox