* Scalability of interface creation and deletion
@ 2011-05-07 11:08 Alex Bligh
  2011-05-07 12:22 ` Eric Dumazet
  0 siblings, 1 reply; 53+ messages in thread
From: Alex Bligh @ 2011-05-07 11:08 UTC (permalink / raw)
  To: netdev; +Cc: Alex Bligh

I am trying to track down why interface creation slows down badly with
large numbers of interfaces (~1,000 interfaces) and why deletion is so
slow. Use case: restarting routers needs to be fast; some failover methods
require interface up/down; some routers need lots of interfaces.

I have written a small shell script to create and delete a number of
interfaces supplied on the command line (script appended below). It is
important to run this with udev, udev-bridge etc. disabled. In my
environment (Ubuntu Lucid, 2.6.32-28-generic) I did this by:

 * service upstart-udev-bridge stop
 * service udev stop
 * unshare -n bash

If you don't do this, you are simply timing your distro's interface
scripts.

Note that the "-n" parameter gives the number of veth pairs to create; as
these are pairs, twice as many interfaces are actually created.

So, the results, which are pretty repeatable, are as follows:

                       100 pairs   500 pairs
  Interface creation        14ms       110ms
  Interface deletion       160ms       148ms

Now I don't think interface deletion has in fact got faster: the overhead
of loading the script is simply spread over more processes. But there are
two obvious conclusions:

1. Interface creation slows down hugely with more interfaces

2. Interface deletion is normally much slower than interface creation

strace -T -ttt on the "ip" command used to do this does not show the delay
where I thought it would be - cataloguing the existing interfaces.
Instead, it's the final send() to the netlink socket, which performs the
relevant action, that appears to be slow, for both addition and deletion.
Adding the last interface takes 200ms in that syscall, while the first is
quick (symptomatic of a slowdown); for deletion the last send syscall is
quick.

Poking about in net/core/dev.c, I see that interface names are hashed
using a hash with a maximum of 256 entries. However, these seem to be hash
buckets supporting multiple entries, so I can't imagine a chain of 4
entries is problematic.

I am having difficulty seeing what might be the issue in interface
creation. Any ideas?

In interface deletion, my attention is drawn to netdev_wait_allrefs,
which does this:

	refcnt = netdev_refcnt_read(dev);

	while (refcnt != 0) {
		...
		msleep(250);

		refcnt = netdev_refcnt_read(dev);
		....
	}

I am guessing that this is going to do the msleep 50% of the time,
explaining 125ms of the observed time. How would people react to
exponential backoff instead (untested)?

	int backoff = 10;
	refcnt = netdev_refcnt_read(dev);

	while (refcnt != 0) {
		...
		msleep(backoff);
		if ((backoff *= 2) > 250)
			backoff = 250;

		refcnt = netdev_refcnt_read(dev);
		....
	}

-- 
Alex Bligh

#!/bin/bash
# Usage:
#  ifaceseq [options]
#
# Options:
#  -n NUM  : use NUM interfaces
#  -t TYPE : use TYPE of interfaces (supported: veth, vlan)

numifs=10
itype=veth

while getopts n:t: flag; do
  case ${flag} in
    n) numifs=${OPTARG} ;;
    t) itype=${OPTARG} ;;
  esac
done
shift $((OPTIND-1))

createifs ()
{
  echo `date` creating $numifs interfaces
  case ${itype} in
    vlan)
      for i in `seq 1 $numifs` ; do
        ip link add link eth0 name vlan${i} type vlan id ${i}
      done
      ;;
    *)
      for i in `seq 1 $numifs` ; do
        ip link add testa${i} type veth peer name testb${i}
      done
  esac
  echo `date` done
}

deleteifs ()
{
  echo `date` deleting $numifs interfaces
  case ${itype} in
    vlan)
      for i in `seq 1 $numifs` ; do
        ip link delete dev vlan${i}
      done
      ;;
    *)
      for i in `seq 1 $numifs` ; do
        ip link delete testa${i}
      done
  esac
  echo `date` done
}

time createifs; time deleteifs;

^ permalink raw reply	[flat|nested] 53+ messages in thread
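
The aggregate figures above hide the shape of the slowdown. One way to see
it is to time each individual "ip link add" call rather than the whole
run. The small C program below is an illustrative sketch, not part of the
script above: the file name and variable names are invented here, it
assumes iproute2 is installed, it shells out to "ip" (so a constant
fork/exec cost is included in every sample), and it should be run as root,
ideally inside the same "unshare -n" environment.

/* percall.c - illustrative sketch: time each "ip link add" individually.
 * Assumes iproute2 is installed; run as root, ideally under "unshare -n".
 * Build: gcc -O2 -o percall percall.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

int main(int argc, char **argv)
{
	int pairs = (argc > 1) ? atoi(argv[1]) : 100;
	char cmd[256];

	for (int i = 0; i < pairs; i++) {
		double t0 = now_ms();

		snprintf(cmd, sizeof(cmd),
			 "ip link add testa%d type veth peer name testb%d",
			 i, i);
		if (system(cmd) != 0) {
			fprintf(stderr, "ip link add failed at pair %d\n", i);
			return 1;
		}
		/* Print pair index and per-call latency; if creation cost
		 * grows with the number of existing interfaces, the last
		 * samples will be much larger than the first. */
		printf("%d %.1f ms\n", i, now_ms() - t0);
	}
	return 0;
}

If the per-call latency climbs with the number of existing interfaces, the
cost is per-interface state (directory scans, hashing, and so on); a flat
series would instead point at fixed per-invocation overhead.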
* Re: Scalability of interface creation and deletion
  2011-05-07 11:08 Scalability of interface creation and deletion Alex Bligh
@ 2011-05-07 12:22 ` Eric Dumazet
  2011-05-07 15:26   ` Alex Bligh
  0 siblings, 1 reply; 53+ messages in thread
From: Eric Dumazet @ 2011-05-07 12:22 UTC (permalink / raw)
  To: Alex Bligh; +Cc: netdev

On Saturday, 7 May 2011 at 12:08 +0100, Alex Bligh wrote:
> I am trying to track down why interface creation slows down badly with
> large numbers of interfaces (~1,000 interfaces) and why deletion is so
> slow. Use case: restarting routers needs to be fast; some failover
> methods require interface up/down; some routers need lots of interfaces.
>
> I have written a small shell script to create and delete a number of
> interfaces supplied on the command line (script appended below). It is
> important to run this with udev, udev-bridge etc. disabled. In my
> environment (Ubuntu Lucid, 2.6.32-28-generic) I did this by:
>
>  * service upstart-udev-bridge stop
>  * service udev stop
>  * unshare -n bash
>
> If you don't do this, you are simply timing your distro's interface
> scripts.
>
> Note that the "-n" parameter gives the number of veth pairs to create;
> as these are pairs, twice as many interfaces are actually created.
>
> So, the results, which are pretty repeatable, are as follows:
>
>                        100 pairs   500 pairs
>   Interface creation        14ms       110ms
>   Interface deletion       160ms       148ms
>
> Now I don't think interface deletion has in fact got faster: the
> overhead of loading the script is simply spread over more processes.
> But there are two obvious conclusions:
>
> 1. Interface creation slows down hugely with more interfaces

sysfs is the problem, a very well known one (see sysfs_refresh_inode()).

Try:

$ time ls /sys/class/net >/dev/null

real	0m0.002s
user	0m0.000s
sys	0m0.001s

$ modprobe dummy numdummies=1000
$ time ls /sys/class/net >/dev/null

real	0m0.041s
user	0m0.003s
sys	0m0.002s

> 2. Interface deletion is normally much slower than interface creation
>
> strace -T -ttt on the "ip" command used to do this does not show the
> delay where I thought it would be - cataloguing the existing interfaces.
> Instead, it's the final send() to the netlink socket, which performs the
> relevant action, that appears to be slow, for both addition and deletion.
> Adding the last interface takes 200ms in that syscall, while the first is
> quick (symptomatic of a slowdown); for deletion the last send syscall is
> quick.
>
> Poking about in net/core/dev.c, I see that interface names are hashed
> using a hash with a maximum of 256 entries. However, these seem to be
> hash buckets supporting multiple entries, so I can't imagine a chain of
> 4 entries is problematic.

It's not.

>
> I am having difficulty seeing what might be the issue in interface
> creation. Any ideas?
>

Actually a lot: just run "git log net/core/dev.c" and you'll see many
commits to make this faster.

> In interface deletion, my attention is drawn to netdev_wait_allrefs,
> which does this:
>
> 	refcnt = netdev_refcnt_read(dev);

Here refcnt is 0, or there is a bug somewhere. (It happens; we fix bugs
once in a while.)

> 	while (refcnt != 0) {
> 		...
> 		msleep(250);
>
> 		refcnt = netdev_refcnt_read(dev);
> 		....
> 	}
>
> I am guessing that this is going to do the msleep 50% of the time,
> explaining 125ms of the observed time. How would people react to
> exponential backoff instead (untested)?
>
> 	int backoff = 10;
> 	refcnt = netdev_refcnt_read(dev);
>
> 	while (refcnt != 0) {
> 		...
> 		msleep(backoff);
> 		if ((backoff *= 2) > 250)
> 			backoff = 250;
>
> 		refcnt = netdev_refcnt_read(dev);
> 		....
> 	}
>

Welcome to the club. This has been discussed on netdev for many years,
and a lot of work has been done to make it better.

Interface deletion needs several RCU synchronization calls, and they are
very expensive. This is the price to pay for a lockless network stack in
the fast paths.

^ permalink raw reply	[flat|nested] 53+ messages in thread
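
For readers who have not met RCU before, the cost Eric describes is
wall-clock waiting rather than CPU work: the deleter unpublishes an
object, waits a grace period so that every reader which might still hold a
pointer has finished, and only then frees the object. The userspace sketch
below illustrates that pattern with liburcu (userspace RCU); it is purely
illustrative, is not kernel code, its timings are not the kernel's, and it
assumes liburcu and pthreads are available.

/* rcu_demo.c - illustrative sketch of the publish/wait/free pattern that
 * device unregistration relies on. Userspace RCU (liburcu) stands in for
 * the kernel's RCU; only the shape of the problem carries over.
 * Build (typical): gcc -O2 -o rcu_demo rcu_demo.c -lurcu -lpthread
 */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <time.h>
#include <urcu.h>

struct cfg { int value; };

static struct cfg *current_cfg;
static volatile int stop;

static void *reader(void *arg)
{
	rcu_register_thread();
	while (!stop) {
		rcu_read_lock();
		struct cfg *c = rcu_dereference(current_cfg);
		if (c)
			(void)c->value;	/* reader uses the object */
		usleep(100);		/* this userspace flavor lets a
					 * reader block; it just stretches
					 * the grace period */
		rcu_read_unlock();
	}
	rcu_unregister_thread();
	return NULL;
}

int main(void)
{
	pthread_t tid;
	struct cfg *old, *replacement = malloc(sizeof(*replacement));
	struct timespec t0, t1;

	replacement->value = 2;
	current_cfg = malloc(sizeof(*current_cfg));
	current_cfg->value = 1;

	rcu_register_thread();
	pthread_create(&tid, NULL, reader, NULL);

	/* "Unregister": unpublish the old object ... */
	old = current_cfg;
	rcu_assign_pointer(current_cfg, replacement);

	/* ... then wait for every reader that may still hold a pointer
	 * obtained before the switch. This wait is where the wall-clock
	 * time goes, even though no CPU is burned. */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	synchronize_rcu();
	clock_gettime(CLOCK_MONOTONIC, &t1);

	free(old);			/* only now is this safe */
	printf("grace period took %ld us\n",
	       (t1.tv_sec - t0.tv_sec) * 1000000L +
	       (t1.tv_nsec - t0.tv_nsec) / 1000L);

	stop = 1;
	pthread_join(tid, NULL);
	rcu_unregister_thread();
	free(replacement);
	return 0;
}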
* Re: Scalability of interface creation and deletion
  2011-05-07 12:22 ` Eric Dumazet
@ 2011-05-07 15:26   ` Alex Bligh
  2011-05-07 15:54     ` Eric Dumazet
  2011-05-07 16:26     ` Eric Dumazet
  0 siblings, 2 replies; 53+ messages in thread
From: Alex Bligh @ 2011-05-07 15:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Alex Bligh

Eric,

>> 1. Interface creation slows down hugely with more interfaces
>
> sysfs is the problem, a very well known one (see sysfs_refresh_inode()).

Thanks.

>> 2. Interface deletion is normally much slower than interface creation
>>
>> strace -T -ttt on the "ip" command used to do this does not show the
>> delay where I thought it would be - cataloguing the existing interfaces.
>> Instead, it's the final send() to the netlink socket, which performs the
>> relevant action, that appears to be slow, for both addition and
>> deletion. Adding the last interface takes 200ms in that syscall, while
>> the first is quick (symptomatic of a slowdown); for deletion the last
>> send syscall is quick.
>
>> I am having difficulty seeing what might be the issue in interface
>> creation. Any ideas?
>>
>
> Actually a lot: just run "git log net/core/dev.c" and you'll see many
> commits to make this faster.

OK. I am up to 2.6.38.2 and see no improvement by then. I will try
something bleeding edge in a bit.

>> I am guessing that this is going to do the msleep 50% of the time,
>> explaining 125ms of the observed time. How would people react to
>> exponential backoff instead (untested)?
>>
>> 	int backoff = 10;
>> 	refcnt = netdev_refcnt_read(dev);
>>
>> 	while (refcnt != 0) {
>> 		...
>> 		msleep(backoff);
>> 		if ((backoff *= 2) > 250)
>> 			backoff = 250;
>>
>> 		refcnt = netdev_refcnt_read(dev);
>> 		....
>> 	}
>>
>
> Welcome to the club. This has been discussed on netdev for many years,
> and a lot of work has been done to make it better.

Well, I patched it (patch attached for what it's worth) and it made no
difference in this case. I would suggest however that it might be the
right thing to do anyway.

> Interface deletion needs several RCU synchronization calls, and they are
> very expensive. This is the price to pay for a lockless network stack in
> the fast paths.

On the current 8 core box I am testing, I see 280ms per interface delete
**even with only 10 interfaces**. I see 260ms with one interface. I know
doing lots of RCU sync stuff can be slow, but 260ms to remove one veth
pair sounds like more than RCU sync going on. It sounds like a sleep
(though I may not have found the right one). I see no CPU load.

Equally, with one interface (remember I'm doing this in unshare -n, so
there is only a loopback interface there), this bit surely can't be sysfs.

-- 
Alex Bligh

Signed-off-by: Alex Bligh <alex@alex.org.uk>

diff --git a/net/core/dev.c b/net/core/dev.c
index 6561021..f55c95c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5429,6 +5429,7 @@ static void netdev_wait_allrefs(struct net_device *dev)
 {
 	unsigned long rebroadcast_time, warning_time;
 	int refcnt;
+	int backoff = 5;
 
 	linkwatch_forget_dev(dev);
 
@@ -5460,7 +5461,9 @@ static void netdev_wait_allrefs(struct net_device *dev)
 			rebroadcast_time = jiffies;
 		}
 
-		msleep(250);
+		msleep(backoff);
+		if ((backoff *= 2) > 250)
+			backoff = 250;
 
 		refcnt = netdev_refcnt_read(dev);
 
^ permalink raw reply related	[flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion
  2011-05-07 15:26   ` Alex Bligh
@ 2011-05-07 15:54     ` Eric Dumazet
  2011-05-07 16:23       ` Ben Greear
  2011-05-09 21:46       ` Octavian Purdila
  1 sibling, 2 replies; 53+ messages in thread
From: Eric Dumazet @ 2011-05-07 15:54 UTC (permalink / raw)
  To: Alex Bligh; +Cc: netdev

On Saturday, 7 May 2011 at 16:26 +0100, Alex Bligh wrote:
> Well, I patched it (patch attached for what it's worth) and it made no
> difference in this case. I would suggest however that it might be the
> right thing to do anyway.
>

As I said, this code should not be entered in normal situations. You are
not the first to suggest a change, but it won't help you at all.

> On the current 8 core box I am testing, I see 280ms per interface delete
> **even with only 10 interfaces**. I see 260ms with one interface. I know
> doing lots of RCU sync stuff can be slow, but 260ms to remove one veth
> pair sounds like more than RCU sync going on. It sounds like a sleep
> (though I may not have found the right one). I see no CPU load.
>
> Equally, with one interface (remember I'm doing this in unshare -n, so
> there is only a loopback interface there), this bit surely can't be
> sysfs.
>

synchronize_rcu() calls are not consuming cpu, they just _wait_ for the
RCU grace period to elapse.

I suggest you read the Documentation/RCU files if you really want to :)

If you want to check how expensive it is, it's quite easy: add a trace to
synchronize_net():

diff --git a/net/core/dev.c b/net/core/dev.c
index 856b6ee..70f3c46 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5915,8 +5915,10 @@ EXPORT_SYMBOL(free_netdev);
  */
 void synchronize_net(void)
 {
+	pr_err("begin synchronize_net()\n");
 	might_sleep();
 	synchronize_rcu();
+	pr_err("end synchronize_net()\n");
 }
 EXPORT_SYMBOL(synchronize_net);
 
^ permalink raw reply related	[flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-07 15:54 ` Eric Dumazet @ 2011-05-07 16:23 ` Ben Greear 2011-05-07 16:37 ` Eric Dumazet 2011-05-09 21:46 ` Octavian Purdila 1 sibling, 1 reply; 53+ messages in thread From: Ben Greear @ 2011-05-07 16:23 UTC (permalink / raw) To: Eric Dumazet; +Cc: Alex Bligh, netdev On 05/07/2011 08:54 AM, Eric Dumazet wrote: > Le samedi 07 mai 2011 à 16:26 +0100, Alex Bligh a écrit : >> Well, I patched it (patch attached for what it's worth) and it made >> no difference in this case. I would suggest however that it might >> be the right think to do anyway. >> > > As I said, this code should not be entered in normal situations. > > You are not the first to suggest a change, but it wont help you at all. > > > > >> On the current 8 core box I am testing, I see 280ms per interface >> delete **even with only 10 interfaces**. I see 260ms with one >> interface. I know doing lots of rcu sync stuff can be slow, but >> 260ms to remove one veth pair sounds like more than rcu sync going >> on. It sounds like a sleep (though I may not have found the >> right one). I see no CPU load. >> >> Equally, with one interface (remember I'm doing this in unshare -n >> so there is only a loopback interface there), this bit surely >> can't be sysfs. >> > > synchronize_rcu() calls are not consuming cpu, they just _wait_ > rcu grace period. > > I suggest you read Documentation/RCU files if you really want to :) > > If you want to check how expensive it is, its quite easy: > add a trace in synchronize_net() > > diff --git a/net/core/dev.c b/net/core/dev.c > index 856b6ee..70f3c46 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -5915,8 +5915,10 @@ EXPORT_SYMBOL(free_netdev); > */ > void synchronize_net(void) > { > + pr_err("begin synchronize_net()\n"); > might_sleep(); > synchronize_rcu(); > + pr_err("end synchronize_net()\n"); > } > EXPORT_SYMBOL(synchronize_net); I wonder if it would be worth having a 'delete me soon' method to delete interfaces that would not block on the RCU code. The controlling programs could use netlink messages to know exactly when an interface was truly gone. That should allow some batching in the sync-net logic too, if user-space code deletes 1000 interfaces very quickly, for instance... Thanks, Ben > > > > > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion
  2011-05-07 16:23       ` Ben Greear
@ 2011-05-07 16:37         ` Eric Dumazet
  2011-05-07 16:44           ` Ben Greear
  0 siblings, 1 reply; 53+ messages in thread
From: Eric Dumazet @ 2011-05-07 16:37 UTC (permalink / raw)
  To: Ben Greear; +Cc: Alex Bligh, netdev

On Saturday, 7 May 2011 at 09:23 -0700, Ben Greear wrote:
> I wonder if it would be worth having a 'delete me soon'
> method to delete interfaces that would not block on the
> RCU code.
>
> The controlling programs could use netlink messages to
> know exactly when an interface was truly gone.
>
> That should allow some batching in the sync-net logic
> too, if user-space code deletes 1000 interfaces very
> quickly, for instance...
>

I suggested in the past an extension of the batch capabilities, so that
one kthread could handle 3 separate lists of devices being destroyed in
parallel.

This daemon would basically loop on one call to synchronize_rcu(), then
transfer list3 to deletion, list2 to list3, and list1 to list2, and loop
again, eventually releasing RTNL while blocked in synchronize_rcu().

This would need, as you suggest, an asynchronous deletion method, or a
callback to wake the process blocked on the device delete.

Right now, we hold RTNL for the whole 3-step process, so we cannot use
any parallelism.

^ permalink raw reply	[flat|nested] 53+ messages in thread
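
Eric's rotation idea is perhaps easier to follow in code than in prose.
The sketch below is a deliberately naive userspace model: synchronize_rcu()
is replaced by a fixed sleep, "devices" are reduced to list nodes, and none
of these names exist in the kernel. What it illustrates is that each pass
of the loop pays for exactly one grace period no matter how many devices
are queued, so a batch of pending deletions shares roughly three grace
periods instead of paying one or more each.

/* gc_rotate.c - userspace model of the three-list rotation Eric sketches.
 * The grace period is faked with a sleep; illustrative only.
 * Build: gcc -O2 -o gc_rotate gc_rotate.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct dev {
	int id;
	struct dev *next;
};

/* Stand-in for a real grace period: each pass pays this once,
 * no matter how many devices are queued. */
static void fake_synchronize_rcu(void)
{
	usleep(50 * 1000);		/* pretend a grace period is ~50 ms */
}

static struct dev *list1, *list2, *list3;	/* youngest ... oldest */

static void queue_unregister(int id)
{
	struct dev *d = malloc(sizeof(*d));

	d->id = id;
	d->next = list1;
	list1 = d;			/* caller returns immediately */
}

static void gc_pass(void)
{
	struct dev *d, *next;

	fake_synchronize_rcu();		/* one wait covers everything queued */

	for (d = list3; d; d = next) {	/* oldest list is now safe to free */
		next = d->next;
		printf("freed dev %d\n", d->id);
		free(d);
	}
	list3 = list2;			/* rotate: everything ages one step */
	list2 = list1;
	list1 = NULL;
}

int main(void)
{
	for (int i = 0; i < 1000; i++)
		queue_unregister(i);	/* e.g. a batch of 1000 deletions */

	/* Three passes retire the whole batch: ~3 grace periods in total,
	 * instead of one or more grace periods per device. */
	for (int pass = 0; pass < 3; pass++)
		gc_pass();
	return 0;
}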
* Re: Scalability of interface creation and deletion 2011-05-07 16:37 ` Eric Dumazet @ 2011-05-07 16:44 ` Ben Greear 2011-05-07 16:51 ` Eric Dumazet 0 siblings, 1 reply; 53+ messages in thread From: Ben Greear @ 2011-05-07 16:44 UTC (permalink / raw) To: Eric Dumazet; +Cc: Alex Bligh, netdev On 05/07/2011 09:37 AM, Eric Dumazet wrote: > Le samedi 07 mai 2011 à 09:23 -0700, Ben Greear a écrit : > >> I wonder if it would be worth having a 'delete me soon' >> method to delete interfaces that would not block on the >> RCU code. >> >> The controlling programs could use netlink messages to >> know exactly when an interface was truly gone. >> >> That should allow some batching in the sync-net logic >> too, if user-space code deletes 1000 interfaces very >> quickly, for instance... >> > > I suggested in the past to have an extension of batch capabilities, so > that one kthread could have 3 separate lists of devices being destroyed > in //, > > This daemon would basically loop on one call to synchronize_rcu(), and > transfert list3 to deletion, list2 to list3, list1 to list2, loop, > eventually releasing RTNL while blocked in synchronize_rcu() > > This would need to allow as you suggest an asynchronous deletion method, > or use a callback to wake the process blocked on device delete. I'd want to at least have the option to not block the calling process...otherwise, it would be a lot more difficult to quickly delete 1000 interfaces. You'd need 1000 threads, or sockets, or something to parallelize it otherwise, eh? Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-07 16:44 ` Ben Greear @ 2011-05-07 16:51 ` Eric Dumazet 2011-05-08 3:45 ` Ben Greear 0 siblings, 1 reply; 53+ messages in thread From: Eric Dumazet @ 2011-05-07 16:51 UTC (permalink / raw) To: Ben Greear; +Cc: Alex Bligh, netdev Le samedi 07 mai 2011 à 09:44 -0700, Ben Greear a écrit : > On 05/07/2011 09:37 AM, Eric Dumazet wrote: > > Le samedi 07 mai 2011 à 09:23 -0700, Ben Greear a écrit : > > > >> I wonder if it would be worth having a 'delete me soon' > >> method to delete interfaces that would not block on the > >> RCU code. > >> > >> The controlling programs could use netlink messages to > >> know exactly when an interface was truly gone. > >> > >> That should allow some batching in the sync-net logic > >> too, if user-space code deletes 1000 interfaces very > >> quickly, for instance... > >> > > > > I suggested in the past to have an extension of batch capabilities, so > > that one kthread could have 3 separate lists of devices being destroyed > > in //, > > > > This daemon would basically loop on one call to synchronize_rcu(), and > > transfert list3 to deletion, list2 to list3, list1 to list2, loop, > > eventually releasing RTNL while blocked in synchronize_rcu() > > > > This would need to allow as you suggest an asynchronous deletion method, > > or use a callback to wake the process blocked on device delete. > > I'd want to at least have the option to not block the calling > process...otherwise, it would be a lot more difficult to > quickly delete 1000 interfaces. You'd need 1000 threads, or > sockets, or something to parallelize it otherwise, eh? Yes, if you can afford not receive a final notification of device being fully freed, it should be possible. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-07 16:51 ` Eric Dumazet @ 2011-05-08 3:45 ` Ben Greear 2011-05-08 8:08 ` Alex Bligh 0 siblings, 1 reply; 53+ messages in thread From: Ben Greear @ 2011-05-08 3:45 UTC (permalink / raw) To: Eric Dumazet; +Cc: Alex Bligh, netdev On 05/07/2011 09:51 AM, Eric Dumazet wrote: > Le samedi 07 mai 2011 à 09:44 -0700, Ben Greear a écrit : >> On 05/07/2011 09:37 AM, Eric Dumazet wrote: >>> Le samedi 07 mai 2011 à 09:23 -0700, Ben Greear a écrit : >>> >>>> I wonder if it would be worth having a 'delete me soon' >>>> method to delete interfaces that would not block on the >>>> RCU code. >>>> >>>> The controlling programs could use netlink messages to >>>> know exactly when an interface was truly gone. >>>> >>>> That should allow some batching in the sync-net logic >>>> too, if user-space code deletes 1000 interfaces very >>>> quickly, for instance... >>>> >>> >>> I suggested in the past to have an extension of batch capabilities, so >>> that one kthread could have 3 separate lists of devices being destroyed >>> in //, >>> >>> This daemon would basically loop on one call to synchronize_rcu(), and >>> transfert list3 to deletion, list2 to list3, list1 to list2, loop, >>> eventually releasing RTNL while blocked in synchronize_rcu() >>> >>> This would need to allow as you suggest an asynchronous deletion method, >>> or use a callback to wake the process blocked on device delete. >> >> I'd want to at least have the option to not block the calling >> process...otherwise, it would be a lot more difficult to >> quickly delete 1000 interfaces. You'd need 1000 threads, or >> sockets, or something to parallelize it otherwise, eh? > > Yes, if you can afford not receive a final notification of device being > fully freed, it should be possible. Well, I'd hope to get a netlink message about the device being deleted, and after that, be able to create another one with the same name, etc. Whether the memory is actually freed in the kernel or not wouldn't matter to me... Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-08 3:45 ` Ben Greear @ 2011-05-08 8:08 ` Alex Bligh 0 siblings, 0 replies; 53+ messages in thread From: Alex Bligh @ 2011-05-08 8:08 UTC (permalink / raw) To: Ben Greear, Eric Dumazet; +Cc: netdev, Alex Bligh --On 7 May 2011 20:45:07 -0700 Ben Greear <greearb@candelatech.com> wrote: > Well, I'd hope to get a netlink message about the device being deleted, > and > after that, be able to create another one with the same name, etc. > > Whether the memory is actually freed in the kernel or not wouldn't matter > to me... Provided the former para is always done, I can't actually think of a case where the caller would /ever/ care about the latter (save perhaps a final shutdown of the whole net subsystem). -- Alex Bligh ^ permalink raw reply [flat|nested] 53+ messages in thread
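
On the notification side, the mechanism Ben mentions already exists:
rtnetlink broadcasts an RTM_DELLINK message on the RTMGRP_LINK group when
a device is unregistered, so a controlling program can learn that an
interface has gone away (and its name can be reused) without polling. The
listener below is a minimal sketch using the standard NETLINK_ROUTE socket
API; the program itself is hypothetical, it only prints the ifindex, and
parsing the IFLA_IFNAME attribute is left out for brevity.

/* linkmon.c - minimal rtnetlink listener for link removal notifications.
 * Standard NETLINK_ROUTE API; the program itself is only a sketch.
 * Build: gcc -O2 -o linkmon linkmon.c
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

int main(void)
{
	struct sockaddr_nl addr = {
		.nl_family = AF_NETLINK,
		.nl_groups = RTMGRP_LINK,	/* link add/del/change events */
	};
	char buf[8192];
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("netlink");
		return 1;
	}

	for (;;) {
		ssize_t len = recv(fd, buf, sizeof(buf), 0);
		struct nlmsghdr *nh;

		if (len <= 0)
			break;

		for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
		     nh = NLMSG_NEXT(nh, len)) {
			struct ifinfomsg *ifi = NLMSG_DATA(nh);

			if (nh->nlmsg_type != RTM_DELLINK)
				continue;
			/* The kernel sends RTM_DELLINK once the device is
			 * unregistered; at that point the controlling
			 * program knows the interface is gone. */
			printf("ifindex %d deleted\n", ifi->ifi_index);
		}
	}
	close(fd);
	return 0;
}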
* Re: Scalability of interface creation and deletion
  2011-05-07 15:54     ` Eric Dumazet
  2011-05-07 16:23       ` Ben Greear
@ 2011-05-09 21:46       ` Octavian Purdila
  1 sibling, 0 replies; 53+ messages in thread
From: Octavian Purdila @ 2011-05-09 21:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Alex Bligh, netdev

On Sat, May 7, 2011 at 6:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> synchronize_rcu() calls are not consuming cpu, they just _wait_ for the
> RCU grace period to elapse.
>
> I suggest you read the Documentation/RCU files if you really want to :)
>
> If you want to check how expensive it is, it's quite easy: add a trace
> to synchronize_net():
>
<snip>

I proposed adding a "wait" software counter to perf [1] a while ago, which
would allow people to identify sync_rcu hotspots:

http://marc.info/?l=linux-kernel&m=129188584110162

I don't know how much visibility it got, so given this context I thought
I'd bring it up again :)

^ permalink raw reply	[flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion
  2011-05-07 15:26   ` Alex Bligh
  2011-05-07 15:54     ` Eric Dumazet
@ 2011-05-07 16:26     ` Eric Dumazet
  2011-05-07 18:24       ` Alex Bligh
  2011-05-07 18:38       ` Alex Bligh
  1 sibling, 2 replies; 53+ messages in thread
From: Eric Dumazet @ 2011-05-07 16:26 UTC (permalink / raw)
  To: Alex Bligh; +Cc: netdev

On Saturday, 7 May 2011 at 16:26 +0100, Alex Bligh wrote:
> On the current 8 core box I am testing, I see 280ms per interface delete
> **even with only 10 interfaces**. I see 260ms with one interface. I know
> doing lots of RCU sync stuff can be slow, but 260ms to remove one veth
> pair sounds like more than RCU sync going on. It sounds like a sleep
> (though I may not have found the right one). I see no CPU load.
>

Here, on a 2.6.38 kernel (Ubuntu 11.04 provided, on my 2 core laptop):

# time rmmod dummy

real	0m0.111s
user	0m0.000s
sys	0m0.000s

This removed my two dummy0/dummy1 devices.

On another machine with a very recent kernel:

$ modprobe dummy numdummies=1
$ ifconfig dummy0 192.168.46.46 up
$ time rmmod dummy

real	0m0.032s
user	0m0.000s
sys	0m0.001s

$ uname -a
Linux svivoipvnx001 2.6.39-rc6-00097-g6ac1576-dirty #550 SMP Sat May 7 00:12:26 CEST 2011 i686 i686 i386 GNU/Linux

So 260ms is a bit too much; maybe you hit yet another bug.

^ permalink raw reply	[flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion
  2011-05-07 16:26     ` Eric Dumazet
@ 2011-05-07 18:24       ` Alex Bligh
  2011-05-07 18:32         ` Eric Dumazet
  0 siblings, 1 reply; 53+ messages in thread
From: Alex Bligh @ 2011-05-07 18:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Alex Bligh

Eric,

--On 7 May 2011 18:26:29 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Here, on a 2.6.38 kernel (Ubuntu 11.04 provided, on my 2 core laptop):
>
> # time rmmod dummy
> real	0m0.111s
...
> On another machine with a very recent kernel:
> $ modprobe dummy numdummies=1
> $ ifconfig dummy0 192.168.46.46 up
> $ time rmmod dummy
>
> real	0m0.032s

I know it's different machines, but that's a pretty significant
difference. So I compiled from 2.6.39-rc6 head (i.e. a kernel less than an
hour old), with only your suggested change in, so that (a) I could
eliminate old kernels, and (b) I could instrument it.

> synchronize_rcu() calls are not consuming cpu, they just _wait_ for the
> RCU grace period to elapse.
>
> I suggest you read the Documentation/RCU files if you really want to :)

I understand the basic point: it needs to wait for all readers to drop
their references. It's sort of hard to understand why, on a machine with
an idle network, there would be reader(s) holding references for 250ms.
And indeed the analysis below shows that isn't the case (it's more like
44 ms).

> If you want to check how expensive it is, it's quite easy: add a trace
> to synchronize_net():

At least for veth devices, I see the same on 2.6.39-rc6 - if anything it's
worse:

# ./ifseq -n 100
Sat May 7 17:50:53 UTC 2011 creating 100 interfaces
Sat May 7 17:50:54 UTC 2011 done

real	0m1.549s
user	0m0.060s
sys	0m0.990s
Sat May 7 17:50:54 UTC 2011 deleting 100 interfaces
Sat May 7 17:51:22 UTC 2011 done

real	0m27.917s
user	0m0.420s
sys	0m0.060s

Performing that operation produced exactly 200 calls to synchronize_net().
The timestamps indicate that's 2 per veth pair deletion, and zero per veth
pair creation.

Analysing the resultant logs shows only 31% of the problem is time spent
within synchronize_net() (perl script below).

$ ./analyse.pl < syncnet | tail -2
Total 18.98515 Usage 199 Average 0.09540 elsewhere
Total 8.77581 Usage 200 Average 0.04388 synchronizing

So *something* is spending more than twice as much time as
synchronize_net().

I've attached the log below as well.
-- Alex Bligh $ cat analyse.pl #!/usr/bin/perl use strict; use warnings; my $lastuptime; my $uptime; my $diff; my $area; my %time; my %usage; while (<>) { chomp; if (m/\[\s*([0-9.]+)\].*synchronize_net/) { $uptime = $1; if (defined($lastuptime)) { $area = (m/end/)?"synchronizing":"elsewhere"; $diff = $uptime - $lastuptime; printf "%5.5f $area\n", $diff; $time{$area}+=$diff; $usage{$area}++; } $lastuptime = $uptime; } } print "\n"; my $k; foreach $k (sort keys %time) { printf "Total %5.5f Usage %d Average %5.5f %s\n", $time{$k}, $usage{$k}, $time{$k}/$usage{$k}, $k; } May 7 17:50:55 nattytest kernel: [ 127.490142] begin synchronize_net() May 7 17:50:55 nattytest kernel: [ 127.560084] end synchronize_net() May 7 17:50:55 nattytest kernel: [ 127.610350] begin synchronize_net() May 7 17:50:55 nattytest kernel: [ 127.610932] end synchronize_net() May 7 17:50:55 nattytest kernel: [ 127.740078] begin synchronize_net() May 7 17:50:55 nattytest kernel: [ 127.820071] end synchronize_net() May 7 17:50:55 nattytest kernel: [ 127.870300] begin synchronize_net() May 7 17:50:55 nattytest kernel: [ 127.871050] end synchronize_net() May 7 17:50:55 nattytest kernel: [ 128.000079] begin synchronize_net() May 7 17:50:55 nattytest kernel: [ 128.070070] end synchronize_net() May 7 17:50:55 nattytest kernel: [ 128.140085] begin synchronize_net() May 7 17:50:55 nattytest kernel: [ 128.140960] end synchronize_net() May 7 17:50:55 nattytest kernel: [ 128.260082] begin synchronize_net() May 7 17:50:55 nattytest kernel: [ 128.380072] end synchronize_net() May 7 17:50:55 nattytest kernel: [ 128.430296] begin synchronize_net() May 7 17:50:55 nattytest kernel: [ 128.431135] end synchronize_net() May 7 17:50:56 nattytest kernel: [ 128.550087] begin synchronize_net() May 7 17:50:56 nattytest kernel: [ 128.640057] end synchronize_net() May 7 17:50:56 nattytest kernel: [ 128.710191] begin synchronize_net() May 7 17:50:56 nattytest kernel: [ 128.730085] end synchronize_net() May 7 17:50:56 nattytest kernel: [ 128.880074] begin synchronize_net() May 7 17:50:56 nattytest kernel: [ 128.990123] end synchronize_net() May 7 17:50:56 nattytest kernel: [ 129.060087] begin synchronize_net() May 7 17:50:56 nattytest kernel: [ 129.070128] end synchronize_net() May 7 17:50:56 nattytest kernel: [ 129.220079] begin synchronize_net() May 7 17:50:56 nattytest kernel: [ 129.310070] end synchronize_net() May 7 17:50:56 nattytest kernel: [ 129.370280] begin synchronize_net() May 7 17:50:56 nattytest kernel: [ 129.390099] end synchronize_net() May 7 17:50:57 nattytest kernel: [ 129.540174] begin synchronize_net() May 7 17:50:57 nattytest kernel: [ 129.620063] end synchronize_net() May 7 17:50:57 nattytest kernel: [ 129.690196] begin synchronize_net() May 7 17:50:57 nattytest kernel: [ 129.710098] end synchronize_net() May 7 17:50:57 nattytest kernel: [ 129.850084] begin synchronize_net() May 7 17:50:57 nattytest kernel: [ 129.930070] end synchronize_net() May 7 17:50:57 nattytest kernel: [ 129.980314] begin synchronize_net() May 7 17:50:57 nattytest kernel: [ 129.990225] end synchronize_net() May 7 17:50:57 nattytest kernel: [ 130.110086] begin synchronize_net() May 7 17:50:57 nattytest kernel: [ 130.200078] end synchronize_net() May 7 17:50:57 nattytest kernel: [ 130.270187] begin synchronize_net() May 7 17:50:57 nattytest kernel: [ 130.280159] end synchronize_net() May 7 17:50:57 nattytest kernel: [ 130.420133] begin synchronize_net() May 7 17:50:58 nattytest kernel: [ 130.500075] end synchronize_net() May 7 17:50:58 nattytest kernel: [ 
130.550344] begin synchronize_net() May 7 17:50:58 nattytest kernel: [ 130.550998] end synchronize_net() May 7 17:50:58 nattytest kernel: [ 130.680075] begin synchronize_net() May 7 17:50:58 nattytest kernel: [ 130.750071] end synchronize_net() May 7 17:50:58 nattytest kernel: [ 130.800333] begin synchronize_net() May 7 17:50:58 nattytest kernel: [ 130.801105] end synchronize_net() May 7 17:50:58 nattytest kernel: [ 130.930071] begin synchronize_net() May 7 17:50:58 nattytest kernel: [ 131.010064] end synchronize_net() May 7 17:50:58 nattytest kernel: [ 131.080171] begin synchronize_net() May 7 17:50:58 nattytest kernel: [ 131.090129] end synchronize_net() May 7 17:50:58 nattytest kernel: [ 131.240078] begin synchronize_net() May 7 17:50:58 nattytest kernel: [ 131.320068] end synchronize_net() May 7 17:50:58 nattytest kernel: [ 131.390216] begin synchronize_net() May 7 17:50:58 nattytest kernel: [ 131.403088] end synchronize_net() May 7 17:50:59 nattytest kernel: [ 131.540081] begin synchronize_net() May 7 17:50:59 nattytest kernel: [ 131.610072] end synchronize_net() May 7 17:50:59 nattytest kernel: [ 131.660314] begin synchronize_net() May 7 17:50:59 nattytest kernel: [ 131.661094] end synchronize_net() May 7 17:50:59 nattytest kernel: [ 131.790076] begin synchronize_net() May 7 17:50:59 nattytest kernel: [ 131.860082] end synchronize_net() May 7 17:50:59 nattytest kernel: [ 131.910294] begin synchronize_net() May 7 17:50:59 nattytest kernel: [ 131.911061] end synchronize_net() May 7 17:50:59 nattytest kernel: [ 132.030075] begin synchronize_net() May 7 17:50:59 nattytest kernel: [ 132.110079] end synchronize_net() May 7 17:50:59 nattytest kernel: [ 132.160319] begin synchronize_net() May 7 17:50:59 nattytest kernel: [ 132.161101] end synchronize_net() May 7 17:50:59 nattytest kernel: [ 132.280075] begin synchronize_net() May 7 17:50:59 nattytest kernel: [ 132.400066] end synchronize_net() May 7 17:51:00 nattytest kernel: [ 132.450321] begin synchronize_net() May 7 17:51:00 nattytest kernel: [ 132.451134] end synchronize_net() May 7 17:51:00 nattytest kernel: [ 132.570078] begin synchronize_net() May 7 17:51:00 nattytest kernel: [ 132.650080] end synchronize_net() May 7 17:51:00 nattytest kernel: [ 132.700288] begin synchronize_net() May 7 17:51:00 nattytest kernel: [ 132.701073] end synchronize_net() May 7 17:51:00 nattytest kernel: [ 132.830066] begin synchronize_net() May 7 17:51:00 nattytest kernel: [ 132.910069] end synchronize_net() May 7 17:51:00 nattytest kernel: [ 132.990128] begin synchronize_net() May 7 17:51:00 nattytest kernel: [ 132.990893] end synchronize_net() May 7 17:51:00 nattytest kernel: [ 133.120076] begin synchronize_net() May 7 17:51:00 nattytest kernel: [ 133.200071] end synchronize_net() May 7 17:51:00 nattytest kernel: [ 133.250313] begin synchronize_net() May 7 17:51:00 nattytest kernel: [ 133.251088] end synchronize_net() May 7 17:51:00 nattytest kernel: [ 133.370082] begin synchronize_net() May 7 17:51:01 nattytest kernel: [ 133.450147] end synchronize_net() May 7 17:51:01 nattytest kernel: [ 133.500354] begin synchronize_net() May 7 17:51:01 nattytest kernel: [ 133.505508] end synchronize_net() May 7 17:51:01 nattytest kernel: [ 133.630085] begin synchronize_net() May 7 17:51:01 nattytest kernel: [ 133.710090] end synchronize_net() May 7 17:51:01 nattytest kernel: [ 133.760399] begin synchronize_net() May 7 17:51:01 nattytest kernel: [ 133.770396] end synchronize_net() May 7 17:51:01 nattytest kernel: [ 133.920090] begin synchronize_net() May 7 17:51:01 
nattytest kernel: [ 134.010083] end synchronize_net() May 7 17:51:01 nattytest kernel: [ 134.060292] begin synchronize_net() May 7 17:51:01 nattytest kernel: [ 134.080082] end synchronize_net() May 7 17:51:01 nattytest kernel: [ 134.220082] begin synchronize_net() May 7 17:51:01 nattytest kernel: [ 134.310064] end synchronize_net() May 7 17:51:01 nattytest kernel: [ 134.380176] begin synchronize_net() May 7 17:51:01 nattytest kernel: [ 134.390127] end synchronize_net() May 7 17:51:02 nattytest kernel: [ 134.550205] begin synchronize_net() May 7 17:51:02 nattytest kernel: [ 134.630133] end synchronize_net() May 7 17:51:02 nattytest kernel: [ 134.830065] begin synchronize_net() May 7 17:51:02 nattytest kernel: [ 134.880091] end synchronize_net() May 7 17:51:02 nattytest kernel: [ 135.040194] begin synchronize_net() May 7 17:51:02 nattytest kernel: [ 135.120072] end synchronize_net() May 7 17:51:02 nattytest kernel: [ 135.190202] begin synchronize_net() May 7 17:51:02 nattytest kernel: [ 135.210084] end synchronize_net() May 7 17:51:02 nattytest kernel: [ 135.370081] begin synchronize_net() May 7 17:51:03 nattytest kernel: [ 135.440076] end synchronize_net() May 7 17:51:03 nattytest kernel: [ 135.490331] begin synchronize_net() May 7 17:51:03 nattytest kernel: [ 135.491147] end synchronize_net() May 7 17:51:03 nattytest kernel: [ 135.620079] begin synchronize_net() May 7 17:51:03 nattytest kernel: [ 135.700064] end synchronize_net() May 7 17:51:03 nattytest kernel: [ 135.757709] begin synchronize_net() May 7 17:51:03 nattytest kernel: [ 135.770095] end synchronize_net() May 7 17:51:03 nattytest kernel: [ 135.920101] begin synchronize_net() May 7 17:51:03 nattytest kernel: [ 135.990068] end synchronize_net() May 7 17:51:03 nattytest kernel: [ 136.050068] begin synchronize_net() May 7 17:51:03 nattytest kernel: [ 136.050825] end synchronize_net() May 7 17:51:03 nattytest kernel: [ 136.160081] begin synchronize_net() May 7 17:51:03 nattytest kernel: [ 136.240067] end synchronize_net() May 7 17:51:03 nattytest kernel: [ 136.290431] begin synchronize_net() May 7 17:51:03 nattytest kernel: [ 136.291020] end synchronize_net() May 7 17:51:03 nattytest kernel: [ 136.420094] begin synchronize_net() May 7 17:51:04 nattytest kernel: [ 136.500062] end synchronize_net() May 7 17:51:04 nattytest kernel: [ 136.550388] begin synchronize_net() May 7 17:51:04 nattytest kernel: [ 136.550988] end synchronize_net() May 7 17:51:04 nattytest kernel: [ 136.670101] begin synchronize_net() May 7 17:51:04 nattytest kernel: [ 136.750063] end synchronize_net() May 7 17:51:04 nattytest kernel: [ 136.800297] begin synchronize_net() May 7 17:51:04 nattytest kernel: [ 136.801081] end synchronize_net() May 7 17:51:04 nattytest kernel: [ 136.920070] begin synchronize_net() May 7 17:51:04 nattytest kernel: [ 137.000069] end synchronize_net() May 7 17:51:04 nattytest kernel: [ 137.050252] begin synchronize_net() May 7 17:51:04 nattytest kernel: [ 137.051042] end synchronize_net() May 7 17:51:04 nattytest kernel: [ 137.180076] begin synchronize_net() May 7 17:51:04 nattytest kernel: [ 137.260065] end synchronize_net() May 7 17:51:04 nattytest kernel: [ 137.320191] begin synchronize_net() May 7 17:51:04 nattytest kernel: [ 137.340087] end synchronize_net() May 7 17:51:05 nattytest kernel: [ 137.490082] begin synchronize_net() May 7 17:51:05 nattytest kernel: [ 137.570071] end synchronize_net() May 7 17:51:05 nattytest kernel: [ 137.620314] begin synchronize_net() May 7 17:51:05 nattytest kernel: [ 137.621084] end synchronize_net() 
May 7 17:51:05 nattytest kernel: [ 137.740083] begin synchronize_net() May 7 17:51:05 nattytest kernel: [ 137.830071] end synchronize_net() May 7 17:51:05 nattytest kernel: [ 137.890264] begin synchronize_net() May 7 17:51:05 nattytest kernel: [ 137.910087] end synchronize_net() May 7 17:51:05 nattytest kernel: [ 138.060074] begin synchronize_net() May 7 17:51:05 nattytest kernel: [ 138.140070] end synchronize_net() May 7 17:51:05 nattytest kernel: [ 138.210094] begin synchronize_net() May 7 17:51:05 nattytest kernel: [ 138.210940] end synchronize_net() May 7 17:51:05 nattytest kernel: [ 138.340089] begin synchronize_net() May 7 17:51:05 nattytest kernel: [ 138.410088] end synchronize_net() May 7 17:51:06 nattytest kernel: [ 138.470306] begin synchronize_net() May 7 17:51:06 nattytest kernel: [ 138.471080] end synchronize_net() May 7 17:51:06 nattytest kernel: [ 138.590082] begin synchronize_net() May 7 17:51:06 nattytest kernel: [ 138.670131] end synchronize_net() May 7 17:51:06 nattytest kernel: [ 138.720268] begin synchronize_net() May 7 17:51:06 nattytest kernel: [ 138.721034] end synchronize_net() May 7 17:51:06 nattytest kernel: [ 138.850077] begin synchronize_net() May 7 17:51:06 nattytest kernel: [ 138.920071] end synchronize_net() May 7 17:51:06 nattytest kernel: [ 138.970305] begin synchronize_net() May 7 17:51:06 nattytest kernel: [ 138.971074] end synchronize_net() May 7 17:51:06 nattytest kernel: [ 139.090082] begin synchronize_net() May 7 17:51:06 nattytest kernel: [ 139.170068] end synchronize_net() May 7 17:51:06 nattytest kernel: [ 139.230297] begin synchronize_net() May 7 17:51:06 nattytest kernel: [ 139.231115] end synchronize_net() May 7 17:51:06 nattytest kernel: [ 139.350083] begin synchronize_net() May 7 17:51:07 nattytest kernel: [ 139.440057] end synchronize_net() May 7 17:51:07 nattytest kernel: [ 139.510174] begin synchronize_net() May 7 17:51:07 nattytest kernel: [ 139.520124] end synchronize_net() May 7 17:51:07 nattytest kernel: [ 139.680077] begin synchronize_net() May 7 17:51:07 nattytest kernel: [ 139.750087] end synchronize_net() May 7 17:51:07 nattytest kernel: [ 139.840093] begin synchronize_net() May 7 17:51:07 nattytest kernel: [ 139.840867] end synchronize_net() May 7 17:51:07 nattytest kernel: [ 139.970079] begin synchronize_net() May 7 17:51:07 nattytest kernel: [ 140.080094] end synchronize_net() May 7 17:51:07 nattytest kernel: [ 140.130284] begin synchronize_net() May 7 17:51:07 nattytest kernel: [ 140.131056] end synchronize_net() May 7 17:51:07 nattytest kernel: [ 140.260076] begin synchronize_net() May 7 17:51:07 nattytest kernel: [ 140.330076] end synchronize_net() May 7 17:51:07 nattytest kernel: [ 140.380332] begin synchronize_net() May 7 17:51:07 nattytest kernel: [ 140.381364] end synchronize_net() May 7 17:51:08 nattytest kernel: [ 140.510081] begin synchronize_net() May 7 17:51:08 nattytest kernel: [ 140.590074] end synchronize_net() May 7 17:51:08 nattytest kernel: [ 140.650289] begin synchronize_net() May 7 17:51:08 nattytest kernel: [ 140.670086] end synchronize_net() May 7 17:51:08 nattytest kernel: [ 140.800064] begin synchronize_net() May 7 17:51:08 nattytest kernel: [ 140.880073] end synchronize_net() May 7 17:51:08 nattytest kernel: [ 140.930267] begin synchronize_net() May 7 17:51:08 nattytest kernel: [ 140.931048] end synchronize_net() May 7 17:51:08 nattytest kernel: [ 141.050072] begin synchronize_net() May 7 17:51:08 nattytest kernel: [ 141.140067] end synchronize_net() May 7 17:51:08 nattytest kernel: [ 141.190328] begin 
synchronize_net() May 7 17:51:08 nattytest kernel: [ 141.200119] end synchronize_net() May 7 17:51:08 nattytest kernel: [ 141.360077] begin synchronize_net() May 7 17:51:08 nattytest kernel: [ 141.430074] end synchronize_net() May 7 17:51:09 nattytest kernel: [ 141.480312] begin synchronize_net() May 7 17:51:09 nattytest kernel: [ 141.481110] end synchronize_net() May 7 17:51:09 nattytest kernel: [ 141.600080] begin synchronize_net() May 7 17:51:09 nattytest kernel: [ 141.680081] end synchronize_net() May 7 17:51:09 nattytest kernel: [ 141.730335] begin synchronize_net() May 7 17:51:09 nattytest kernel: [ 141.731136] end synchronize_net() May 7 17:51:09 nattytest kernel: [ 141.860079] begin synchronize_net() May 7 17:51:09 nattytest kernel: [ 141.930070] end synchronize_net() May 7 17:51:09 nattytest kernel: [ 141.980317] begin synchronize_net() May 7 17:51:09 nattytest kernel: [ 141.981114] end synchronize_net() May 7 17:51:09 nattytest kernel: [ 142.110097] begin synchronize_net() May 7 17:51:09 nattytest kernel: [ 142.180074] end synchronize_net() May 7 17:51:09 nattytest kernel: [ 142.230335] begin synchronize_net() May 7 17:51:09 nattytest kernel: [ 142.231113] end synchronize_net() May 7 17:51:09 nattytest kernel: [ 142.350080] begin synchronize_net() May 7 17:51:09 nattytest kernel: [ 142.430087] end synchronize_net() May 7 17:51:10 nattytest kernel: [ 142.480310] begin synchronize_net() May 7 17:51:10 nattytest kernel: [ 142.481084] end synchronize_net() May 7 17:51:10 nattytest kernel: [ 142.600090] begin synchronize_net() May 7 17:51:10 nattytest kernel: [ 142.720091] end synchronize_net() May 7 17:51:10 nattytest kernel: [ 142.770310] begin synchronize_net() May 7 17:51:10 nattytest kernel: [ 142.771130] end synchronize_net() May 7 17:51:10 nattytest kernel: [ 142.900080] begin synchronize_net() May 7 17:51:10 nattytest kernel: [ 142.980070] end synchronize_net() May 7 17:51:10 nattytest kernel: [ 143.060109] begin synchronize_net() May 7 17:51:10 nattytest kernel: [ 143.080079] end synchronize_net() May 7 17:51:10 nattytest kernel: [ 143.230095] begin synchronize_net() May 7 17:51:10 nattytest kernel: [ 143.300044] end synchronize_net() May 7 17:51:10 nattytest kernel: [ 143.350319] begin synchronize_net() May 7 17:51:10 nattytest kernel: [ 143.351119] end synchronize_net() May 7 17:51:11 nattytest kernel: [ 143.470076] begin synchronize_net() May 7 17:51:11 nattytest kernel: [ 143.560083] end synchronize_net() May 7 17:51:11 nattytest kernel: [ 143.620209] begin synchronize_net() May 7 17:51:11 nattytest kernel: [ 143.640090] end synchronize_net() May 7 17:51:11 nattytest kernel: [ 143.780078] begin synchronize_net() May 7 17:51:11 nattytest kernel: [ 143.860076] end synchronize_net() May 7 17:51:11 nattytest kernel: [ 143.910284] begin synchronize_net() May 7 17:51:11 nattytest kernel: [ 143.911142] end synchronize_net() May 7 17:51:11 nattytest kernel: [ 144.030075] begin synchronize_net() May 7 17:51:11 nattytest kernel: [ 144.110085] end synchronize_net() May 7 17:51:11 nattytest kernel: [ 144.160313] begin synchronize_net() May 7 17:51:11 nattytest kernel: [ 144.161104] end synchronize_net() May 7 17:51:11 nattytest kernel: [ 144.280080] begin synchronize_net() May 7 17:51:11 nattytest kernel: [ 144.360074] end synchronize_net() May 7 17:51:11 nattytest kernel: [ 144.410294] begin synchronize_net() May 7 17:51:11 nattytest kernel: [ 144.411096] end synchronize_net() May 7 17:51:12 nattytest kernel: [ 144.530092] begin synchronize_net() May 7 17:51:12 nattytest kernel: [ 
144.620072] end synchronize_net() May 7 17:51:12 nattytest kernel: [ 144.680355] begin synchronize_net() May 7 17:51:12 nattytest kernel: [ 144.700081] end synchronize_net() May 7 17:51:12 nattytest kernel: [ 144.860073] begin synchronize_net() May 7 17:51:12 nattytest kernel: [ 144.930075] end synchronize_net() May 7 17:51:12 nattytest kernel: [ 144.980325] begin synchronize_net() May 7 17:51:12 nattytest kernel: [ 144.981155] end synchronize_net() May 7 17:51:12 nattytest kernel: [ 145.110079] begin synchronize_net() May 7 17:51:12 nattytest kernel: [ 145.180084] end synchronize_net() May 7 17:51:12 nattytest kernel: [ 145.230324] begin synchronize_net() May 7 17:51:12 nattytest kernel: [ 145.231098] end synchronize_net() May 7 17:51:12 nattytest kernel: [ 145.350077] begin synchronize_net() May 7 17:51:12 nattytest kernel: [ 145.430081] end synchronize_net() May 7 17:51:13 nattytest kernel: [ 145.480300] begin synchronize_net() May 7 17:51:13 nattytest kernel: [ 145.481050] end synchronize_net() May 7 17:51:13 nattytest kernel: [ 145.610079] begin synchronize_net() May 7 17:51:13 nattytest kernel: [ 145.690071] end synchronize_net() May 7 17:51:13 nattytest kernel: [ 145.750370] begin synchronize_net() May 7 17:51:13 nattytest kernel: [ 145.760378] end synchronize_net() May 7 17:51:13 nattytest kernel: [ 145.900074] begin synchronize_net() May 7 17:51:13 nattytest kernel: [ 145.970077] end synchronize_net() May 7 17:51:13 nattytest kernel: [ 146.020298] begin synchronize_net() May 7 17:51:13 nattytest kernel: [ 146.021548] end synchronize_net() May 7 17:51:13 nattytest kernel: [ 146.150081] begin synchronize_net() May 7 17:51:13 nattytest kernel: [ 146.230073] end synchronize_net() May 7 17:51:13 nattytest kernel: [ 146.299959] begin synchronize_net() May 7 17:51:13 nattytest kernel: [ 146.310076] end synchronize_net() May 7 17:51:14 nattytest kernel: [ 146.440080] begin synchronize_net() May 7 17:51:14 nattytest kernel: [ 146.520065] end synchronize_net() May 7 17:51:14 nattytest kernel: [ 146.580285] begin synchronize_net() May 7 17:51:14 nattytest kernel: [ 146.590266] end synchronize_net() May 7 17:51:14 nattytest kernel: [ 146.750063] begin synchronize_net() May 7 17:51:14 nattytest kernel: [ 146.830084] end synchronize_net() May 7 17:51:14 nattytest kernel: [ 146.890233] begin synchronize_net() May 7 17:51:14 nattytest kernel: [ 146.910088] end synchronize_net() May 7 17:51:14 nattytest kernel: [ 147.060081] begin synchronize_net() May 7 17:51:14 nattytest kernel: [ 147.140061] end synchronize_net() May 7 17:51:14 nattytest kernel: [ 147.200277] begin synchronize_net() May 7 17:51:14 nattytest kernel: [ 147.220089] end synchronize_net() May 7 17:51:14 nattytest kernel: [ 147.360081] begin synchronize_net() May 7 17:51:15 nattytest kernel: [ 147.450084] end synchronize_net() May 7 17:51:15 nattytest kernel: [ 147.510283] begin synchronize_net() May 7 17:51:15 nattytest kernel: [ 147.530135] end synchronize_net() May 7 17:51:15 nattytest kernel: [ 147.680075] begin synchronize_net() May 7 17:51:15 nattytest kernel: [ 147.760066] end synchronize_net() May 7 17:51:15 nattytest kernel: [ 147.830172] begin synchronize_net() May 7 17:51:15 nattytest kernel: [ 147.870065] end synchronize_net() May 7 17:51:15 nattytest kernel: [ 148.000075] begin synchronize_net() May 7 17:51:15 nattytest kernel: [ 148.070065] end synchronize_net() May 7 17:51:15 nattytest kernel: [ 148.120327] begin synchronize_net() May 7 17:51:15 nattytest kernel: [ 148.121099] end synchronize_net() May 7 17:51:15 
nattytest kernel: [ 148.240073] begin synchronize_net() May 7 17:51:15 nattytest kernel: [ 148.320061] end synchronize_net() May 7 17:51:15 nattytest kernel: [ 148.370253] begin synchronize_net() May 7 17:51:15 nattytest kernel: [ 148.371001] end synchronize_net() May 7 17:51:16 nattytest kernel: [ 148.500082] begin synchronize_net() May 7 17:51:16 nattytest kernel: [ 148.580073] end synchronize_net() May 7 17:51:16 nattytest kernel: [ 148.650192] begin synchronize_net() May 7 17:51:16 nattytest kernel: [ 148.670095] end synchronize_net() May 7 17:51:16 nattytest kernel: [ 148.820077] begin synchronize_net() May 7 17:51:16 nattytest kernel: [ 148.910058] end synchronize_net() May 7 17:51:16 nattytest kernel: [ 148.980223] begin synchronize_net() May 7 17:51:16 nattytest kernel: [ 148.990280] end synchronize_net() May 7 17:51:16 nattytest kernel: [ 149.130076] begin synchronize_net() May 7 17:51:16 nattytest kernel: [ 149.220104] end synchronize_net() May 7 17:51:16 nattytest kernel: [ 149.270373] begin synchronize_net() May 7 17:51:16 nattytest kernel: [ 149.300076] end synchronize_net() May 7 17:51:17 nattytest kernel: [ 149.450078] begin synchronize_net() May 7 17:51:17 nattytest kernel: [ 149.530063] end synchronize_net() May 7 17:51:17 nattytest kernel: [ 149.580337] begin synchronize_net() May 7 17:51:17 nattytest kernel: [ 149.581173] end synchronize_net() May 7 17:51:17 nattytest kernel: [ 149.700072] begin synchronize_net() May 7 17:51:17 nattytest kernel: [ 149.780071] end synchronize_net() May 7 17:51:17 nattytest kernel: [ 149.830307] begin synchronize_net() May 7 17:51:17 nattytest kernel: [ 149.831075] end synchronize_net() May 7 17:51:17 nattytest kernel: [ 149.960083] begin synchronize_net() May 7 17:51:17 nattytest kernel: [ 150.030067] end synchronize_net() May 7 17:51:17 nattytest kernel: [ 150.080323] begin synchronize_net() May 7 17:51:17 nattytest kernel: [ 150.081113] end synchronize_net() May 7 17:51:17 nattytest kernel: [ 150.200076] begin synchronize_net() May 7 17:51:17 nattytest kernel: [ 150.280065] end synchronize_net() May 7 17:51:17 nattytest kernel: [ 150.330341] begin synchronize_net() May 7 17:51:17 nattytest kernel: [ 150.331164] end synchronize_net() May 7 17:51:18 nattytest kernel: [ 150.460081] begin synchronize_net() May 7 17:51:18 nattytest kernel: [ 150.540064] end synchronize_net() May 7 17:51:18 nattytest kernel: [ 150.600267] begin synchronize_net() May 7 17:51:18 nattytest kernel: [ 150.620084] end synchronize_net() May 7 17:51:18 nattytest kernel: [ 150.780067] begin synchronize_net() May 7 17:51:18 nattytest kernel: [ 150.850065] end synchronize_net() May 7 17:51:18 nattytest kernel: [ 150.900315] begin synchronize_net() May 7 17:51:18 nattytest kernel: [ 150.901093] end synchronize_net() May 7 17:51:18 nattytest kernel: [ 151.020083] begin synchronize_net() May 7 17:51:18 nattytest kernel: [ 151.100069] end synchronize_net() May 7 17:51:18 nattytest kernel: [ 151.150282] begin synchronize_net() May 7 17:51:18 nattytest kernel: [ 151.151067] end synchronize_net() May 7 17:51:18 nattytest kernel: [ 151.280079] begin synchronize_net() May 7 17:51:18 nattytest kernel: [ 151.360070] end synchronize_net() May 7 17:51:18 nattytest kernel: [ 151.410316] begin synchronize_net() May 7 17:51:18 nattytest kernel: [ 151.411133] end synchronize_net() May 7 17:51:19 nattytest kernel: [ 151.540178] begin synchronize_net() May 7 17:51:19 nattytest kernel: [ 151.610071] end synchronize_net() May 7 17:51:19 nattytest kernel: [ 151.660313] begin 
synchronize_net() May 7 17:51:19 nattytest kernel: [ 151.661125] end synchronize_net() May 7 17:51:19 nattytest kernel: [ 151.780072] begin synchronize_net() May 7 17:51:19 nattytest kernel: [ 151.860074] end synchronize_net() May 7 17:51:19 nattytest kernel: [ 151.910295] begin synchronize_net() May 7 17:51:19 nattytest kernel: [ 151.911124] end synchronize_net() May 7 17:51:19 nattytest kernel: [ 152.040109] begin synchronize_net() May 7 17:51:19 nattytest kernel: [ 152.110066] end synchronize_net() May 7 17:51:19 nattytest kernel: [ 152.160295] begin synchronize_net() May 7 17:51:19 nattytest kernel: [ 152.161049] end synchronize_net() May 7 17:51:19 nattytest kernel: [ 152.280073] begin synchronize_net() May 7 17:51:19 nattytest kernel: [ 152.360085] end synchronize_net() May 7 17:51:19 nattytest kernel: [ 152.410306] begin synchronize_net() May 7 17:51:19 nattytest kernel: [ 152.411060] end synchronize_net() May 7 17:51:20 nattytest kernel: [ 152.530075] begin synchronize_net() May 7 17:51:20 nattytest kernel: [ 152.610064] end synchronize_net() May 7 17:51:20 nattytest kernel: [ 152.660270] begin synchronize_net() May 7 17:51:20 nattytest kernel: [ 152.661284] end synchronize_net() May 7 17:51:20 nattytest kernel: [ 152.830095] begin synchronize_net() May 7 17:51:20 nattytest kernel: [ 152.910065] end synchronize_net() May 7 17:51:20 nattytest kernel: [ 152.960314] begin synchronize_net() May 7 17:51:20 nattytest kernel: [ 152.961131] end synchronize_net() May 7 17:51:20 nattytest kernel: [ 153.090076] begin synchronize_net() May 7 17:51:20 nattytest kernel: [ 153.160083] end synchronize_net() May 7 17:51:20 nattytest kernel: [ 153.210293] begin synchronize_net() May 7 17:51:20 nattytest kernel: [ 153.211113] end synchronize_net() May 7 17:51:20 nattytest kernel: [ 153.340081] begin synchronize_net() May 7 17:51:20 nattytest kernel: [ 153.420067] end synchronize_net() May 7 17:51:21 nattytest kernel: [ 153.470317] begin synchronize_net() May 7 17:51:21 nattytest kernel: [ 153.471164] end synchronize_net() May 7 17:51:21 nattytest kernel: [ 153.590082] begin synchronize_net() May 7 17:51:21 nattytest kernel: [ 153.680063] end synchronize_net() May 7 17:51:21 nattytest kernel: [ 153.740238] begin synchronize_net() May 7 17:51:21 nattytest kernel: [ 153.750127] end synchronize_net() May 7 17:51:21 nattytest kernel: [ 153.900077] begin synchronize_net() May 7 17:51:21 nattytest kernel: [ 153.980077] end synchronize_net() May 7 17:51:21 nattytest kernel: [ 154.080091] begin synchronize_net() May 7 17:51:21 nattytest kernel: [ 154.080872] end synchronize_net() May 7 17:51:21 nattytest kernel: [ 154.210077] begin synchronize_net() May 7 17:51:21 nattytest kernel: [ 154.290061] end synchronize_net() May 7 17:51:21 nattytest kernel: [ 154.340327] begin synchronize_net() May 7 17:51:21 nattytest kernel: [ 154.360089] end synchronize_net() May 7 17:51:22 nattytest kernel: [ 154.510085] begin synchronize_net() May 7 17:51:22 nattytest kernel: [ 154.580075] end synchronize_net() May 7 17:51:22 nattytest kernel: [ 154.630300] begin synchronize_net() May 7 17:51:22 nattytest kernel: [ 154.631065] end synchronize_net() May 7 17:51:22 nattytest kernel: [ 154.770074] begin synchronize_net() May 7 17:51:22 nattytest kernel: [ 154.880081] end synchronize_net() May 7 17:51:22 nattytest kernel: [ 154.940349] begin synchronize_net() May 7 17:51:22 nattytest kernel: [ 154.960085] end synchronize_net() May 7 17:51:22 nattytest kernel: [ 155.130080] begin synchronize_net() May 7 17:51:22 nattytest kernel: [ 
155.200079] end synchronize_net() May 7 17:51:22 nattytest kernel: [ 155.250334] begin synchronize_net() May 7 17:51:22 nattytest kernel: [ 155.251105] end synchronize_net() ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-07 18:24 ` Alex Bligh @ 2011-05-07 18:32 ` Eric Dumazet 2011-05-07 18:39 ` Eric Dumazet ` (2 more replies) 0 siblings, 3 replies; 53+ messages in thread From: Eric Dumazet @ 2011-05-07 18:32 UTC (permalink / raw) To: Alex Bligh; +Cc: netdev Le samedi 07 mai 2011 à 19:24 +0100, Alex Bligh a écrit : > Eric, > > --On 7 May 2011 18:26:29 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > Here, on 2.6.38 kernel (Ubuntu 11.04 provided, on my 2 core laptop) > ># time rmmod dummy > > real 0m0.111s > ... > > On another machine with a very recent kernel : > > $ modprobe dummy numdummies=1 > > $ ifconfig dummy0 192.168.46.46 up > > $ time rmmod dummy > > > > real 0m0.032s > > I know it's different machines, but that's a pretty significant > difference. So I compiled from 2.6.39-rc6 head (i.e. a kernel > less than an hour old), with only your suggested change in, > so that (a) I could eliminate old kernels, and (b) I could > instrument it. > > > synchronize_rcu() calls are not consuming cpu, they just _wait_ > > rcu grace period. > > > > I suggest you read Documentation/RCU files if you really want to :) > > I understand the basic point: it needs to wait for all readers > to drop their references. It's sort of hard to understand why > on a machine with an idle network there would be reader(s) holding > references for 250ms. And indeed the analysis below shows that > isn't the case (it's more like 44 ms). > > > If you want to check how expensive it is, its quite easy: > > add a trace in synchronize_net() > > At least for veth devices, I see the same on 2.6.39-rc6 - if anything > it's worse: > > # ./ifseq -n 100 > Sat May 7 17:50:53 UTC 2011 creating 100 interfaces > Sat May 7 17:50:54 UTC 2011 done > > real 0m1.549s > user 0m0.060s > sys 0m0.990s > Sat May 7 17:50:54 UTC 2011 deleting 100 interfaces > Sat May 7 17:51:22 UTC 2011 done > > real 0m27.917s > user 0m0.420s > sys 0m0.060s > > Performing that operation produced exactly 200 calls to synchronize net. > The timestamps indicate that's 2 per veth pair deletion, and zero > per veth pair creation. > > Analysing the resultant logs shows only 31% of the problem is > time spent within synchronize_net() (perl script below). > > $ ./analyse.pl < syncnet | tail -2 > Total 18.98515 Usage 199 Average 0.09540 elsewhere > Total 8.77581 Usage 200 Average 0.04388 synchronizing > > So *something* is spending more than twice as much time as > synchronize_net(). > > I've attached the log below as well. 
> > -- > Alex Bligh > > > $ cat analyse.pl > #!/usr/bin/perl > > use strict; > use warnings; > > my $lastuptime; > my $uptime; > my $diff; > my $area; > my %time; > my %usage; > > while (<>) > { > chomp; > if (m/\[\s*([0-9.]+)\].*synchronize_net/) > { > $uptime = $1; > if (defined($lastuptime)) > { > $area = (m/end/)?"synchronizing":"elsewhere"; > $diff = $uptime - $lastuptime; > printf "%5.5f $area\n", $diff; > $time{$area}+=$diff; > $usage{$area}++; > } > $lastuptime = $uptime; > } > } > > print "\n"; > > my $k; > foreach $k (sort keys %time) > { > printf "Total %5.5f Usage %d Average %5.5f %s\n", $time{$k}, > $usage{$k}, $time{$k}/$usage{$k}, $k; > } > > > > May 7 17:50:55 nattytest kernel: [ 127.490142] begin synchronize_net() > May 7 17:50:55 nattytest kernel: [ 127.560084] end synchronize_net() > May 7 17:50:55 nattytest kernel: [ 127.610350] begin synchronize_net() > May 7 17:50:55 nattytest kernel: [ 127.610932] end synchronize_net() > May 7 17:50:55 nattytest kernel: [ 127.740078] begin synchronize_net() > May 7 17:50:55 nattytest kernel: [ 127.820071] end synchronize_net() Well, there is also one rcu_barrier() call that is expensive. (It was changed from one synchronize_rcu() to one rcu_barrier() lately in commit ef885afb , in 2.6.36 kernel) net/core/dev.c line 5167 http://git2.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ef885afbf8a37689afc1d9d545e2f3e7a8276c17 netdev_wait_allrefs() waits that all references to a device vanishes. It currently uses a _very_ pessimistic 250 ms delay between each probe. Some users reported that no more than 4 devices can be dismantled per second, this is a pretty serious problem for some setups. Most of the time, a refcount is about to be released by an RCU callback, that is still in flight because rollback_registered_many() uses a synchronize_rcu() call instead of rcu_barrier(). Problem is visible if number of online cpus is one, because synchronize_rcu() is then a no op. time to remove 50 ipip tunnels on a UP machine : before patch : real 11.910s after patch : real 1.250s Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: Octavian Purdila <opurdila@ixiacom.com> Reported-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> ^ permalink raw reply [flat|nested] 53+ messages in thread
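For readers who want to reproduce this kind of measurement: the begin/end markers and timings quoted in this thread come from a few printk()s placed around the calls in question. The sketch below is a hypothetical debug wrapper for the rcu_barrier() call in net/core/dev.c that Eric points at above (the call introduced by commit ef885afb); the helper name and message format are illustrative assumptions, not the instrumentation Alex or Eric actually used, but it emits "begin rcu_barrier()" / "end rcu_barrier()" lines of the kind analysed later in the thread. It is written as a local helper inside net/core/dev.c, so it relies on the headers that file already includes.

/* Hypothetical debug helper (inside net/core/dev.c): time one
 * rcu_barrier() call and emit begin/end markers that a log
 * post-processing script can pick up.  It would be called at the
 * point where dev.c currently calls rcu_barrier() directly.
 */
static void timed_rcu_barrier(void)
{
	ktime_t t0 = ktime_get();

	pr_err("begin rcu_barrier()\n");
	rcu_barrier();		/* wait for all in-flight call_rcu() callbacks */
	pr_err("end rcu_barrier() in %lld us\n",
	       ktime_us_delta(ktime_get(), t0));
}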
* Re: Scalability of interface creation and deletion 2011-05-07 18:32 ` Eric Dumazet @ 2011-05-07 18:39 ` Eric Dumazet 2011-05-08 10:09 ` Alex Bligh 2011-05-07 18:42 ` Eric Dumazet 2011-05-07 18:51 ` Alex Bligh 2 siblings, 1 reply; 53+ messages in thread From: Eric Dumazet @ 2011-05-07 18:39 UTC (permalink / raw) To: Alex Bligh; +Cc: netdev Le samedi 07 mai 2011 à 20:32 +0200, Eric Dumazet a écrit :

Also you could patch synchronize_sched() itself instead of synchronize_net()

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index dd4aea8..4af6e10 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1518,6 +1518,7 @@ EXPORT_SYMBOL_GPL(call_rcu_bh);
 void synchronize_sched(void)
 {
 	struct rcu_synchronize rcu;
+	ktime_t time_start = ktime_get();
 
 	if (rcu_blocking_is_gp())
 		return;
@@ -1529,6 +1530,7 @@ void synchronize_sched(void)
 	/* Wait for it. */
 	wait_for_completion(&rcu.completion);
 	destroy_rcu_head_on_stack(&rcu.head);
+	pr_err("synchronize_rcu() in %lld us\n", ktime_us_delta(ktime_get(), time_start));
 }
 EXPORT_SYMBOL_GPL(synchronize_sched);

^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-07 18:39 ` Eric Dumazet @ 2011-05-08 10:09 ` Alex Bligh 0 siblings, 0 replies; 53+ messages in thread From: Alex Bligh @ 2011-05-08 10:09 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Alex Bligh --On 7 May 2011 20:39:13 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le samedi 07 mai 2011 à 20:32 +0200, Eric Dumazet a écrit : > > Also you could patch synchronize_sched() itself instead of > synchronize_net()

OK, I did this, plus instrumented the call to rcu_barrier() you mentioned.

Looking at the synchronize_net() and rcu_barrier() calls:

Total  8.43935 Usage 399 Average 0.02115 elsewhere
Total 10.65050 Usage 200 Average 0.05325 rcu_barrier
Total  9.28948 Usage 200 Average 0.04645 synchronize_net

it's spending about 1/3 of its time in that rcu_barrier(), 1/3 in synchronize_sched() and 1/3 elsewhere. Turning now to the synchronize_sched() calls (per your patch), I see

Total 16.36852 Usage 400 Average 0.04092 synchronize_sched()

Note "Usage 400". That's because precisely half the calls to synchronize_sched() occur outside of synchronize_net(), and half occur within synchronize_net() (per the logs). A typical interface being removed looks like this:

May 8 09:47:31 nattytest kernel: [ 177.030197] synchronize_sched() in 66921 us
May 8 09:47:31 nattytest kernel: [ 177.030957] begin synchronize_net()
May 8 09:47:31 nattytest kernel: [ 177.120085] synchronize_sched() in 89080 us
May 8 09:47:31 nattytest kernel: [ 177.120819] end synchronize_net()
May 8 09:47:31 nattytest kernel: [ 177.121698] begin rcu_barrier()
May 8 09:47:31 nattytest kernel: [ 177.190152] end rcu_barrier()

So for every interface being destroyed (I'm doing 200, as veths are pairs), we do 2 synchronize_sched() calls and 1 rcu_barrier() call. Each of these takes roughly 42ms with CONFIG_HZ set to 100, leading to 125ms per interface destroy, and 250ms per veth pair destroy.

It may be a naive question, but why would we need to do 2 synchronize_sched() calls and 1 rcu_barrier() to remove an interface?

-- Alex Bligh

^ permalink raw reply [flat|nested] 53+ messages in thread
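As a sanity check, the per-call figures above do account for essentially all of the 27.9s measured earlier for deleting 100 veth pairs. The snippet below is ordinary user-space C doing back-of-envelope arithmetic with the rough numbers from this message (two synchronize_sched() calls plus one rcu_barrier() at ~42ms each per device); it is only a cost model, not kernel code.

#include <stdio.h>

int main(void)
{
	/* Rough per-call costs reported above at CONFIG_HZ=100, in seconds. */
	double sync_sched = 0.042;	/* one synchronize_sched() */
	double barrier    = 0.042;	/* one rcu_barrier() */
	double per_dev    = 2 * sync_sched + barrier;
	int ndev          = 200;	/* 100 veth pairs = 200 net devices */

	printf("per device %.0f ms, per veth pair %.0f ms, %d devices %.1f s\n",
	       per_dev * 1e3, 2 * per_dev * 1e3, ndev, ndev * per_dev);
	return 0;
}

That predicts roughly 126ms per device, ~250ms per pair and ~25s for the whole run, within a couple of seconds of the 27.9s observed, so the RCU grace periods account for almost all of the deletion time.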
* Re: Scalability of interface creation and deletion 2011-05-07 18:32 ` Eric Dumazet 2011-05-07 18:39 ` Eric Dumazet @ 2011-05-07 18:42 ` Eric Dumazet 2011-05-07 18:50 ` Alex Bligh 2011-05-08 7:12 ` Eric Dumazet 2011-05-07 18:51 ` Alex Bligh 2 siblings, 2 replies; 53+ messages in thread From: Eric Dumazet @ 2011-05-07 18:42 UTC (permalink / raw) To: Alex Bligh; +Cc: netdev Here is my trace here for one device deletion on one 8 core machine [ 800.447012] synchronize_rcu() in 15787 us [ 800.455013] synchronize_rcu() in 7682 us [ 800.464019] rcu_barrier() in 8487 us Not that bad. $ grep RCU .config # RCU Subsystem CONFIG_TREE_RCU=y # CONFIG_PREEMPT_RCU is not set CONFIG_RCU_TRACE=y CONFIG_RCU_FANOUT=32 # CONFIG_RCU_FANOUT_EXACT is not set # CONFIG_RCU_FAST_NO_HZ is not set CONFIG_TREE_RCU_TRACE=y ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-07 18:42 ` Eric Dumazet @ 2011-05-07 18:50 ` Alex Bligh 2011-05-08 7:12 ` Eric Dumazet 1 sibling, 0 replies; 53+ messages in thread From: Alex Bligh @ 2011-05-07 18:50 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Alex Bligh --On 7 May 2011 20:42:29 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > Here is my trace here for one device deletion on one 8 core machine > > [ 800.447012] synchronize_rcu() in 15787 us > [ 800.455013] synchronize_rcu() in 7682 us > [ 800.464019] rcu_barrier() in 8487 us Would you mind trying it with my script to do veth devices? kill udev and do unshare -n first. I've done this on 2 different lots of hardware now, with 3 kernels 18 months apart. $ grep RCU .config # RCU Subsystem CONFIG_TREE_RCU=y # CONFIG_PREEMPT_RCU is not set # CONFIG_RCU_TRACE is not set CONFIG_RCU_FANOUT=64 # CONFIG_RCU_FANOUT_EXACT is not set CONFIG_RCU_FAST_NO_HZ=y # CONFIG_TREE_RCU_TRACE is not set # CONFIG_SPARSE_RCU_POINTER is not set # CONFIG_RCU_TORTURE_TEST is not set # CONFIG_RCU_CPU_STALL_DETECTOR is not set -- Alex Bligh ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-07 18:42 ` Eric Dumazet 2011-05-07 18:50 ` Alex Bligh @ 2011-05-08 7:12 ` Eric Dumazet 2011-05-08 8:06 ` Alex Bligh ` (2 more replies) 1 sibling, 3 replies; 53+ messages in thread From: Eric Dumazet @ 2011-05-08 7:12 UTC (permalink / raw) To: Alex Bligh; +Cc: netdev, Paul E. McKenney Le samedi 07 mai 2011 à 20:42 +0200, Eric Dumazet a écrit :
> Here is my trace here for one device deletion on one 8 core machine
>
> [ 800.447012] synchronize_rcu() in 15787 us
> [ 800.455013] synchronize_rcu() in 7682 us
> [ 800.464019] rcu_barrier() in 8487 us
>
> Not that bad.
>
> $ grep RCU .config
> # RCU Subsystem
> CONFIG_TREE_RCU=y
> # CONFIG_PREEMPT_RCU is not set
> CONFIG_RCU_TRACE=y
> CONFIG_RCU_FANOUT=32
> # CONFIG_RCU_FANOUT_EXACT is not set
> # CONFIG_RCU_FAST_NO_HZ is not set
> CONFIG_TREE_RCU_TRACE=y
>

By the way, if I change HZ from 1000 to 100 I now have a ten times slower result:

# ip link add link eth0 eth0.103 type vlan id 103
# time ip link del eth0.103

real 0m0.430s
user 0m0.000s
sys 0m0.000s

So all this is related to your HZ value, even in a CONFIG_NO_HZ=y kernel. Alex, I guess you have HZ=250?

# uname -a
Linux svivoipvnx021 2.6.39-rc6-00214-g5511a34-dirty #574 SMP Sun May 8 08:44:14 CEST 2011 x86_64 x86_64 x86_64 GNU/Linux
# cat /proc/cmdline

I enabled CONFIG_RCU_FAST_NO_HZ and got worse results (but not always... it's very variable):

# time ip link del eth0.103

real 0m0.544s
user 0m0.000s
sys 0m0.000s

# time ip link del eth0.103

real 0m0.414s
user 0m0.000s
sys 0m0.000s

^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-08 7:12 ` Eric Dumazet @ 2011-05-08 8:06 ` Alex Bligh 2011-05-08 9:35 ` Alex Bligh 2011-05-08 12:32 ` Paul E. McKenney 2 siblings, 0 replies; 53+ messages in thread From: Alex Bligh @ 2011-05-08 8:06 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Paul E. McKenney, Alex Bligh Eric, > By the way, if I change HZ from 1000 to 100 I now have ten times slower > result : > ># ip link add link eth0 eth0.103 type vlan id 103 ># time ip link del eth0.103 > > real 0m0.430s > user 0m0.000s > sys 0m0.000s > > So all this is related to your HZ value, even in a CONFIG_NO_HZ=y > kernel. That's very mysterious. > Alex, I guess you have HZ=250 ? I have HZ=100. I am basically using the Ubuntu default with localmodconfig to make compile times sensible. amb@nattytest:~$ cd kernel/linux-2.6/ amb@nattytest:~/kernel/linux-2.6$ fgrep HZ .config CONFIG_RCU_FAST_NO_HZ=y CONFIG_NO_HZ=y CONFIG_HZ_100=y # CONFIG_HZ_250 is not set # CONFIG_HZ_300 is not set # CONFIG_HZ_1000 is not set CONFIG_HZ=100 # CONFIG_MACHZ_WDT is not set -- Alex Bligh ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-08 7:12 ` Eric Dumazet 2011-05-08 8:06 ` Alex Bligh @ 2011-05-08 9:35 ` Alex Bligh 2011-05-08 12:18 ` Alex Bligh 2011-05-08 12:44 ` Paul E. McKenney 2011-05-08 12:32 ` Paul E. McKenney 2 siblings, 2 replies; 53+ messages in thread From: Alex Bligh @ 2011-05-08 9:35 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Paul E. McKenney, Alex Bligh Eric, --On 8 May 2011 09:12:22 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > By the way, if I change HZ from 1000 to 100 I now have ten times slower > result : I repeated that test here. With HZ set to 1000 I got a total time of 4.022 seconds to remove 100 interfaces, of which: Total 3.03808 Usage 199 Average 0.01527 elsewhere Total 0.93992 Usage 200 Average 0.00470 synchronizing as opposed to a total of 27.917 seconds with HZ set to 100, of which Total 18.98515 Usage 199 Average 0.09540 elsewhere Total 8.77581 Usage 200 Average 0.04388 synchronizing Not quite a factor of 10 improvement, but nearly. I have CONFIG_RCU_FAST_NO_HZ=y I suspect this may just mean an rcu reader holds the rcu_read_lock for a jiffies related time. Though I'm having difficulty seeing what that might be on a system where the net is in essence idle. -- Alex Bligh ^ permalink raw reply [flat|nested] 53+ messages in thread
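One way to read these numbers (an assumed interpretation, not something stated explicitly in the thread): convert the average time per synchronize_net() call, the "synchronizing" figure in the analyse.pl output above, into scheduler ticks at each HZ. At ~4.7ms (HZ=1000) and ~44ms (HZ=100) both work out to roughly 4-5 ticks, which is what you would expect if a grace period completes after a small, roughly HZ-independent number of ticks, so that its wall-clock length scales with 1/HZ. A small user-space calculation, nothing kernel-specific:

#include <stdio.h>

int main(void)
{
	/* Average "synchronizing" time per deleted interface, from the
	 * measurements above (microseconds), and the HZ each was taken at. */
	const double avg_us[] = { 4700.0, 43880.0 };
	const int hz[]        = { 1000,   100 };

	for (int i = 0; i < 2; i++)
		printf("HZ=%-4d  %8.1f us  =  %.1f ticks\n",
		       hz[i], avg_us[i], avg_us[i] / (1e6 / hz[i]));
	return 0;
}

Both configurations spend a similar number of ticks per grace period; only the length of a tick changes.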
* Re: Scalability of interface creation and deletion 2011-05-08 9:35 ` Alex Bligh @ 2011-05-08 12:18 ` Alex Bligh 2011-05-08 12:50 ` Paul E. McKenney 2011-05-08 12:44 ` Paul E. McKenney 1 sibling, 1 reply; 53+ messages in thread From: Alex Bligh @ 2011-05-08 12:18 UTC (permalink / raw) To: Alex Bligh, Eric Dumazet; +Cc: netdev, Paul E. McKenney, Alex Bligh --On 8 May 2011 10:35:02 +0100 Alex Bligh <alex@alex.org.uk> wrote:
> I suspect this may just mean an rcu reader holds the rcu_read_lock
> for a jiffies related time. Though I'm having difficulty seeing
> what that might be on a system where the net is in essence idle.

Having read the RCU docs, this can't be right, because blocking is not legal inside an rcu_read_lock critical section.

The system concerned is an 8 cpu system, but I get comparable results on a 2 cpu system. I am guessing that when the synchronize_sched() happens, all cores but the cpu on which it is executing are idle (at least on the vast majority of calls), as the machine itself is idle. As I understand it, RCU synchronization (in the absence of lots of callbacks etc.) is meant to wait until it knows that all RCU read critical sections which were running on entry have been left. It exploits the fact that RCU read critical sections cannot block by waiting for a context switch on each cpu, OR for that cpu to be in the idle state or running user code (also incompatible with a read critical section).

The fact that increasing HZ masks the problem seems to imply that synchronize_sched() is waiting when it shouldn't be, as it suggests it's waiting for a context switch. But surely it shouldn't be waiting for a context switch if all other cpu cores are idle? It knows that it (the caller) doesn't hold an rcu_read_lock, and presumably can see the other cpus are in the idle state, in which case surely it should return immediately? Distribution of latency in synchronize_sched() looks like this:

20-49 us          110 instances (27.500%)
50-99 us           45 instances (11.250%)
5000-9999 us        5 instances (1.250%)
10000-19999 us     33 instances (8.250%)
20000-49999 us      4 instances (1.000%)
50000-99999 us    191 instances (47.750%)
100000-199999 us   12 instances (3.000%)

-- Alex Bligh

^ permalink raw reply [flat|nested] 53+ messages in thread
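The bucketed distribution above was presumably produced by post-processing the "synchronize_sched() in N us" lines from the instrumented kernel; the script that did the bucketing is not included in the thread. The filter below is a hypothetical stand-alone reconstruction that produces a similar (not bit-identical) histogram from such a log on stdin; the bucket boundaries are an assumption.

/* histo.c - bucket "synchronize_sched() in N us" log lines into a
 * rough latency histogram.  Hypothetical helper, not the script that
 * produced the table above.
 * Build:  cc -O2 -o histo histo.c
 * Use:    grep 'synchronize_sched() in' /var/log/syslog | ./histo
 */
#include <stdio.h>
#include <string.h>

#define NBUCKETS 11

int main(void)
{
	static const long lo[NBUCKETS] = { 0, 20, 50, 100, 1000, 5000, 10000,
					   20000, 50000, 100000, 200000 };
	long count[NBUCKETS] = { 0 };
	char line[512];
	long us;

	while (fgets(line, sizeof(line), stdin)) {
		char *p = strstr(line, "synchronize_sched() in ");
		if (!p || sscanf(p, "synchronize_sched() in %ld", &us) != 1)
			continue;
		for (int i = NBUCKETS - 1; i >= 0; i--) {
			if (us >= lo[i]) {
				count[i]++;
				break;
			}
		}
	}
	for (int i = 0; i < NBUCKETS; i++) {
		if (!count[i])
			continue;
		if (i < NBUCKETS - 1)
			printf("%6ld-%-6ld us %6ld instances\n",
			       lo[i], lo[i + 1] - 1, count[i]);
		else
			printf("%6ld+%7s us %6ld instances\n", lo[i], "", count[i]);
	}
	return 0;
}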
* Re: Scalability of interface creation and deletion 2011-05-08 12:18 ` Alex Bligh @ 2011-05-08 12:50 ` Paul E. McKenney 2011-05-08 13:13 ` Alex Bligh 0 siblings, 1 reply; 53+ messages in thread From: Paul E. McKenney @ 2011-05-08 12:50 UTC (permalink / raw) To: Alex Bligh; +Cc: Eric Dumazet, netdev On Sun, May 08, 2011 at 01:18:55PM +0100, Alex Bligh wrote: > > > --On 8 May 2011 10:35:02 +0100 Alex Bligh <alex@alex.org.uk> wrote: > > >I suspect this may just mean an rcu reader holds the rcu_read_lock > >for a jiffies related time. Though I'm having difficulty seeing > >what that might be on a system where the net is in essence idle. > > Having read the RCU docs, this can't be right, because blocking > is not legal when in the rcu_read_lock critical section. > > The system concerned is an 8 cpu system but I get comparable > results on a 2 cpu system. > > I am guessing that when the synchronize_sched() happens, all cores > but the cpu on which that is executing are idle (at least on > the vast majority of calls) as the machine itself is idle. > As I understand, RCU synchronization (in the absence of lots > of callbacks etc.) is meant to wait until it knows all RCU > read critical sections which are running on entry have > been left. It exploits the fact that RCU read critical sections > cannot block by waiting for a context switch on each cpu, OR > for that cpu to be in the idle state or running user code (also > incompatible with a read critical section). > > The fact that increasing HZ masks the problem seems to imply that > sychronize_sched() is waiting when it shouldn't be, as it suggests > it's waiting for a context switch. But surely it shouldn't be > waiting for context switch if all other cpu cores are idle? > It knows that it (the caller) doesn't hold an rcu_read_lock, > and presumably can see the other cpus are in the idle state, > in which case surely it should return immediately? Distribution > of latency in synchronize_sched() looks like this: > > 20-49 us 110 instances (27.500%) > 50-99 us 45 instances (11.250%) Really? I am having a hard time believing this above two. Is this really 2000-4999 us and 5000-9999 us? That would be much more believable, and expected on a busy system with lots of context switching. Or on a system with CONFIG_NO_HZ=n. > 5000-9999 us 5 instances (1.250%) This makes sense for a mostly-idle system with frequent short bursts of work. > 10000-19999 us 33 instances (8.250%) This makes sense for a CONFIG_NO_HZ system that is idle, where there is some amount of background work that is also using RCU grace periods. > 20000-49999 us 4 instances (1.000%) > 50000-99999 us 191 instances (47.750%) > 100000-199999 us 12 instances (3.000%) These last involve additional delays. Possibilities include long-running irq handlers, SMIs, or NMIs. Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-08 12:50 ` Paul E. McKenney @ 2011-05-08 13:13 ` Alex Bligh 2011-05-08 13:44 ` Paul E. McKenney 0 siblings, 1 reply; 53+ messages in thread From: Alex Bligh @ 2011-05-08 13:13 UTC (permalink / raw) To: paulmck; +Cc: Eric Dumazet, netdev, Alex Bligh

>> The fact that increasing HZ masks the problem seems to imply that
>> sychronize_sched() is waiting when it shouldn't be, as it suggests
>> it's waiting for a context switch. But surely it shouldn't be
>> waiting for context switch if all other cpu cores are idle?
>> It knows that it (the caller) doesn't hold an rcu_read_lock,
>> and presumably can see the other cpus are in the idle state,
>> in which case surely it should return immediately? Distribution
>> of latency in synchronize_sched() looks like this:
>>
>> 20-49 us 110 instances (27.500%)
>> 50-99 us 45 instances (11.250%)
>
> Really? I am having a hard time believing this above two. Is this really
> 2000-4999 us and 5000-9999 us? That would be much more believable,
> and expected on a busy system with lots of context switching. Or on a
> system with CONFIG_NO_HZ=n.

Yes, really 20-49us and 50-99us, not ms. Raw data attached :-)

I'm guessing there are circumstances where there is an early exit.

$ fgrep HZ .config
CONFIG_RCU_FAST_NO_HZ=y
CONFIG_NO_HZ=y
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
# CONFIG_MACHZ_WDT is not set

>
>> 5000-9999 us 5 instances (1.250%)
>
> This makes sense for a mostly-idle system with frequent short bursts
> of work.
>
>> 10000-19999 us 33 instances (8.250%)
>
> This makes sense for a CONFIG_NO_HZ system that is idle, where there
> is some amount of background work that is also using RCU grace periods.
>
>> 20000-49999 us 4 instances (1.000%)
>> 50000-99999 us 191 instances (47.750%)
>> 100000-199999 us 12 instances (3.000%)
>
> These last involve additional delays. Possibilities include long-running
> irq handlers, SMIs, or NMIs.

There is nothing much going on on these systems (idle, no other users, just normal system daemons).

Note this is with CONFIG_HZ=100 (which is the Ubuntu default). I wonder if that means you need to scale your expectations by a factor of 10 (i.e. you'd expect 50000-99999 for "a mostly-idle system with frequent short bursts of work").

Interestingly, there are quite a few of these on boot up. If this is a CONFIG_HZ thing, then setting it to 1000 would probably take half a second off bootup time.
-- Alex Bligh May 8 09:47:31 nattytest kernel: [ 177.030197] synchronize_sched() in 66921 us May 8 09:47:31 nattytest kernel: [ 177.120085] synchronize_sched() in 89080 us May 8 09:47:31 nattytest kernel: [ 177.190910] synchronize_sched() in 46 us May 8 09:47:31 nattytest kernel: [ 177.210087] synchronize_sched() in 18686 us May 8 09:47:32 nattytest kernel: [ 177.400080] synchronize_sched() in 110609 us May 8 09:47:32 nattytest kernel: [ 177.480071] synchronize_sched() in 78550 us May 8 09:47:32 nattytest kernel: [ 177.550119] synchronize_sched() in 19087 us May 8 09:47:32 nattytest kernel: [ 177.551761] synchronize_sched() in 79 us May 8 09:47:32 nattytest kernel: [ 177.670085] synchronize_sched() in 60616 us May 8 09:47:32 nattytest kernel: [ 177.760075] synchronize_sched() in 88465 us May 8 09:47:32 nattytest kernel: [ 177.820999] synchronize_sched() in 45 us May 8 09:47:32 nattytest kernel: [ 177.840086] synchronize_sched() in 17579 us May 8 09:47:32 nattytest kernel: [ 177.970084] synchronize_sched() in 70925 us May 8 09:47:32 nattytest kernel: [ 178.050092] synchronize_sched() in 78477 us May 8 09:47:32 nattytest kernel: [ 178.101046] synchronize_sched() in 39 us May 8 09:47:32 nattytest kernel: [ 178.102574] synchronize_sched() in 48 us May 8 09:47:33 nattytest kernel: [ 178.230082] synchronize_sched() in 70966 us May 8 09:47:33 nattytest kernel: [ 178.310071] synchronize_sched() in 78503 us May 8 09:47:33 nattytest kernel: [ 178.371056] synchronize_sched() in 43 us May 8 09:47:33 nattytest kernel: [ 178.390094] synchronize_sched() in 17553 us May 8 09:47:33 nattytest kernel: [ 178.540084] synchronize_sched() in 70533 us May 8 09:47:33 nattytest kernel: [ 178.620074] synchronize_sched() in 79172 us May 8 09:47:33 nattytest kernel: [ 178.671124] synchronize_sched() in 57 us May 8 09:47:33 nattytest kernel: [ 178.672645] synchronize_sched() in 49 us May 8 09:47:33 nattytest kernel: [ 178.830106] synchronize_sched() in 100955 us May 8 09:47:33 nattytest kernel: [ 178.910084] synchronize_sched() in 78408 us May 8 09:47:33 nattytest kernel: [ 178.961090] synchronize_sched() in 43 us May 8 09:47:33 nattytest kernel: [ 178.962642] synchronize_sched() in 50 us May 8 09:47:33 nattytest kernel: [ 179.090088] synchronize_sched() in 71354 us May 8 09:47:33 nattytest kernel: [ 179.220071] synchronize_sched() in 128463 us May 8 09:47:34 nattytest kernel: [ 179.300867] synchronize_sched() in 52 us May 8 09:47:34 nattytest kernel: [ 179.302079] synchronize_sched() in 48 us May 8 09:47:34 nattytest kernel: [ 179.420085] synchronize_sched() in 61562 us May 8 09:47:34 nattytest kernel: [ 179.500093] synchronize_sched() in 78506 us May 8 09:47:34 nattytest kernel: [ 179.551048] synchronize_sched() in 42 us May 8 09:47:34 nattytest kernel: [ 179.552637] synchronize_sched() in 51 us May 8 09:47:34 nattytest kernel: [ 179.670088] synchronize_sched() in 61469 us May 8 09:47:34 nattytest kernel: [ 179.760078] synchronize_sched() in 88430 us May 8 09:47:34 nattytest kernel: [ 179.830918] synchronize_sched() in 44 us May 8 09:47:34 nattytest kernel: [ 179.850173] synchronize_sched() in 17783 us May 8 09:47:34 nattytest kernel: [ 180.080085] synchronize_sched() in 150585 us May 8 09:47:34 nattytest kernel: [ 180.160082] synchronize_sched() in 78381 us May 8 09:47:34 nattytest kernel: [ 180.211027] synchronize_sched() in 39 us May 8 09:47:34 nattytest kernel: [ 180.212186] synchronize_sched() in 52 us May 8 09:47:35 nattytest kernel: [ 180.320091] synchronize_sched() in 50520 us May 8 09:47:35 nattytest kernel: 
[ 180.400074] synchronize_sched() in 78525 us May 8 09:47:35 nattytest kernel: [ 180.451028] synchronize_sched() in 50 us May 8 09:47:35 nattytest kernel: [ 180.452100] synchronize_sched() in 30 us May 8 09:47:35 nattytest kernel: [ 180.570086] synchronize_sched() in 61211 us May 8 09:47:35 nattytest kernel: [ 180.650073] synchronize_sched() in 79145 us May 8 09:47:35 nattytest kernel: [ 180.701376] synchronize_sched() in 38 us May 8 09:47:35 nattytest kernel: [ 180.703490] synchronize_sched() in 57 us May 8 09:47:35 nattytest kernel: [ 180.820087] synchronize_sched() in 61483 us May 8 09:47:35 nattytest kernel: [ 180.910081] synchronize_sched() in 88485 us May 8 09:47:35 nattytest kernel: [ 180.965970] synchronize_sched() in 49 us May 8 09:47:35 nattytest kernel: [ 180.990091] synchronize_sched() in 22425 us May 8 09:47:35 nattytest kernel: [ 181.150084] synchronize_sched() in 71317 us May 8 09:47:36 nattytest kernel: [ 181.230074] synchronize_sched() in 78470 us May 8 09:47:36 nattytest kernel: [ 181.300099] synchronize_sched() in 18988 us May 8 09:47:36 nattytest kernel: [ 181.301637] synchronize_sched() in 46 us May 8 09:47:36 nattytest kernel: [ 181.420086] synchronize_sched() in 61240 us May 8 09:47:36 nattytest kernel: [ 181.500072] synchronize_sched() in 77682 us May 8 09:47:36 nattytest kernel: [ 181.551059] synchronize_sched() in 45 us May 8 09:47:36 nattytest kernel: [ 181.552546] synchronize_sched() in 29 us May 8 09:47:36 nattytest kernel: [ 181.670084] synchronize_sched() in 61612 us May 8 09:47:36 nattytest kernel: [ 181.750076] synchronize_sched() in 78513 us May 8 09:47:36 nattytest kernel: [ 181.801195] synchronize_sched() in 52 us May 8 09:47:36 nattytest kernel: [ 181.802805] synchronize_sched() in 69 us May 8 09:47:36 nattytest kernel: [ 181.920088] synchronize_sched() in 61998 us May 8 09:47:36 nattytest kernel: [ 182.000091] synchronize_sched() in 78539 us May 8 09:47:36 nattytest kernel: [ 182.051053] synchronize_sched() in 42 us May 8 09:47:36 nattytest kernel: [ 182.052595] synchronize_sched() in 29 us May 8 09:47:36 nattytest kernel: [ 182.170103] synchronize_sched() in 61607 us May 8 09:47:37 nattytest kernel: [ 182.260072] synchronize_sched() in 88460 us May 8 09:47:37 nattytest kernel: [ 182.331028] synchronize_sched() in 44 us May 8 09:47:37 nattytest kernel: [ 182.350096] synchronize_sched() in 17563 us May 8 09:47:37 nattytest kernel: [ 182.500085] synchronize_sched() in 71793 us May 8 09:47:37 nattytest kernel: [ 182.620085] synchronize_sched() in 118343 us May 8 09:47:37 nattytest kernel: [ 182.671024] synchronize_sched() in 39 us May 8 09:47:37 nattytest kernel: [ 182.672511] synchronize_sched() in 29 us May 8 09:47:37 nattytest kernel: [ 182.830096] synchronize_sched() in 101470 us May 8 09:47:37 nattytest kernel: [ 182.910081] synchronize_sched() in 78413 us May 8 09:47:37 nattytest kernel: [ 182.971056] synchronize_sched() in 43 us May 8 09:47:37 nattytest kernel: [ 182.972568] synchronize_sched() in 44 us May 8 09:47:37 nattytest kernel: [ 183.100084] synchronize_sched() in 60854 us May 8 09:47:37 nattytest kernel: [ 183.170097] synchronize_sched() in 68450 us May 8 09:47:37 nattytest kernel: [ 183.221081] synchronize_sched() in 40 us May 8 09:47:37 nattytest kernel: [ 183.222580] synchronize_sched() in 29 us May 8 09:47:38 nattytest kernel: [ 183.340082] synchronize_sched() in 61422 us May 8 09:47:38 nattytest kernel: [ 183.420078] synchronize_sched() in 79154 us May 8 09:47:38 nattytest kernel: [ 183.471003] synchronize_sched() in 41 us May 8 
09:47:38 nattytest kernel: [ 183.472469] synchronize_sched() in 29 us May 8 09:47:38 nattytest kernel: [ 183.590095] synchronize_sched() in 61591 us May 8 09:47:38 nattytest kernel: [ 183.670069] synchronize_sched() in 78533 us May 8 09:47:38 nattytest kernel: [ 183.721106] synchronize_sched() in 43 us May 8 09:47:38 nattytest kernel: [ 183.722663] synchronize_sched() in 49 us May 8 09:47:38 nattytest kernel: [ 183.850094] synchronize_sched() in 71060 us May 8 09:47:38 nattytest kernel: [ 183.930080] synchronize_sched() in 78522 us May 8 09:47:38 nattytest kernel: [ 183.981040] synchronize_sched() in 44 us May 8 09:47:38 nattytest kernel: [ 183.990375] synchronize_sched() in 7913 us May 8 09:47:38 nattytest kernel: [ 184.150080] synchronize_sched() in 81568 us May 8 09:47:38 nattytest kernel: [ 184.220088] synchronize_sched() in 68481 us May 8 09:47:39 nattytest kernel: [ 184.281161] synchronize_sched() in 65 us May 8 09:47:39 nattytest kernel: [ 184.282733] synchronize_sched() in 60 us May 8 09:47:39 nattytest kernel: [ 184.400079] synchronize_sched() in 61791 us May 8 09:47:39 nattytest kernel: [ 184.480104] synchronize_sched() in 79181 us May 8 09:47:39 nattytest kernel: [ 184.550103] synchronize_sched() in 19063 us May 8 09:47:39 nattytest kernel: [ 184.551621] synchronize_sched() in 30 us May 8 09:47:39 nattytest kernel: [ 184.680083] synchronize_sched() in 71800 us May 8 09:47:39 nattytest kernel: [ 184.750084] synchronize_sched() in 69162 us May 8 09:47:39 nattytest kernel: [ 184.801153] synchronize_sched() in 51 us May 8 09:47:39 nattytest kernel: [ 184.802802] synchronize_sched() in 53 us May 8 09:47:39 nattytest kernel: [ 184.920091] synchronize_sched() in 61217 us May 8 09:47:39 nattytest kernel: [ 185.000074] synchronize_sched() in 79141 us May 8 09:47:39 nattytest kernel: [ 185.050985] synchronize_sched() in 40 us May 8 09:47:39 nattytest kernel: [ 185.052727] synchronize_sched() in 47 us May 8 09:47:39 nattytest kernel: [ 185.170087] synchronize_sched() in 62802 us May 8 09:47:40 nattytest kernel: [ 185.250074] synchronize_sched() in 78418 us May 8 09:47:40 nattytest kernel: [ 185.311022] synchronize_sched() in 40 us May 8 09:47:40 nattytest kernel: [ 185.312542] synchronize_sched() in 63 us May 8 09:47:40 nattytest kernel: [ 185.430111] synchronize_sched() in 61894 us May 8 09:47:40 nattytest kernel: [ 185.510073] synchronize_sched() in 78410 us May 8 09:47:40 nattytest kernel: [ 185.561055] synchronize_sched() in 58 us May 8 09:47:40 nattytest kernel: [ 185.562589] synchronize_sched() in 45 us May 8 09:47:40 nattytest kernel: [ 185.680091] synchronize_sched() in 62023 us May 8 09:47:40 nattytest kernel: [ 185.760076] synchronize_sched() in 78438 us May 8 09:47:40 nattytest kernel: [ 185.811063] synchronize_sched() in 45 us May 8 09:47:40 nattytest kernel: [ 185.812580] synchronize_sched() in 52 us May 8 09:47:40 nattytest kernel: [ 185.930089] synchronize_sched() in 61864 us May 8 09:47:40 nattytest kernel: [ 186.010072] synchronize_sched() in 78465 us May 8 09:47:40 nattytest kernel: [ 186.061141] synchronize_sched() in 44 us May 8 09:47:40 nattytest kernel: [ 186.062631] synchronize_sched() in 29 us May 8 09:47:40 nattytest kernel: [ 186.180078] synchronize_sched() in 61806 us May 8 09:47:41 nattytest kernel: [ 186.260074] synchronize_sched() in 78415 us May 8 09:47:41 nattytest kernel: [ 186.311073] synchronize_sched() in 43 us May 8 09:47:41 nattytest kernel: [ 186.312573] synchronize_sched() in 29 us May 8 09:47:41 nattytest kernel: [ 186.440083] synchronize_sched() in 
72358 us May 8 09:47:41 nattytest kernel: [ 186.520071] synchronize_sched() in 78395 us May 8 09:47:41 nattytest kernel: [ 186.573068] synchronize_sched() in 66 us May 8 09:47:41 nattytest kernel: [ 186.590105] synchronize_sched() in 15551 us May 8 09:47:41 nattytest kernel: [ 186.740092] synchronize_sched() in 71884 us May 8 09:47:41 nattytest kernel: [ 186.830069] synchronize_sched() in 89123 us May 8 09:47:41 nattytest kernel: [ 186.890967] synchronize_sched() in 41 us May 8 09:47:41 nattytest kernel: [ 186.910091] synchronize_sched() in 18639 us May 8 09:47:41 nattytest kernel: [ 187.070078] synchronize_sched() in 82847 us May 8 09:47:41 nattytest kernel: [ 187.140090] synchronize_sched() in 68499 us May 8 09:47:41 nattytest kernel: [ 187.191148] synchronize_sched() in 57 us May 8 09:47:41 nattytest kernel: [ 187.192743] synchronize_sched() in 69 us May 8 09:47:42 nattytest kernel: [ 187.310087] synchronize_sched() in 63069 us May 8 09:47:42 nattytest kernel: [ 187.400076] synchronize_sched() in 88432 us May 8 09:47:42 nattytest kernel: [ 187.461521] synchronize_sched() in 57 us May 8 09:47:42 nattytest kernel: [ 187.480099] synchronize_sched() in 16493 us May 8 09:47:42 nattytest kernel: [ 187.640081] synchronize_sched() in 82269 us May 8 09:47:42 nattytest kernel: [ 187.710079] synchronize_sched() in 68389 us May 8 09:47:42 nattytest kernel: [ 187.761071] synchronize_sched() in 38 us May 8 09:47:42 nattytest kernel: [ 187.762623] synchronize_sched() in 67 us May 8 09:47:42 nattytest kernel: [ 187.880077] synchronize_sched() in 61847 us May 8 09:47:42 nattytest kernel: [ 187.960096] synchronize_sched() in 78450 us May 8 09:47:42 nattytest kernel: [ 188.011147] synchronize_sched() in 62 us May 8 09:47:42 nattytest kernel: [ 188.012617] synchronize_sched() in 29 us May 8 09:47:42 nattytest kernel: [ 188.130083] synchronize_sched() in 62431 us May 8 09:47:42 nattytest kernel: [ 188.210090] synchronize_sched() in 78447 us May 8 09:47:43 nattytest kernel: [ 188.261015] synchronize_sched() in 38 us May 8 09:47:43 nattytest kernel: [ 188.262491] synchronize_sched() in 29 us May 8 09:47:43 nattytest kernel: [ 188.380097] synchronize_sched() in 61923 us May 8 09:47:43 nattytest kernel: [ 188.470131] synchronize_sched() in 88521 us May 8 09:47:43 nattytest kernel: [ 188.531024] synchronize_sched() in 46 us May 8 09:47:43 nattytest kernel: [ 188.550087] synchronize_sched() in 17676 us May 8 09:47:43 nattytest kernel: [ 188.690085] synchronize_sched() in 71911 us May 8 09:47:43 nattytest kernel: [ 188.820071] synchronize_sched() in 128473 us May 8 09:47:43 nattytest kernel: [ 188.881032] synchronize_sched() in 42 us May 8 09:47:43 nattytest kernel: [ 188.900085] synchronize_sched() in 17588 us May 8 09:47:43 nattytest kernel: [ 189.080084] synchronize_sched() in 111945 us May 8 09:47:43 nattytest kernel: [ 189.170068] synchronize_sched() in 88449 us May 8 09:47:43 nattytest kernel: [ 189.221066] synchronize_sched() in 41 us May 8 09:47:44 nattytest kernel: [ 189.230257] synchronize_sched() in 7463 us May 8 09:47:44 nattytest kernel: [ 189.380093] synchronize_sched() in 71896 us May 8 09:47:44 nattytest kernel: [ 189.470093] synchronize_sched() in 88481 us May 8 09:47:44 nattytest kernel: [ 189.550883] synchronize_sched() in 43 us May 8 09:47:44 nattytest kernel: [ 189.552391] synchronize_sched() in 29 us May 8 09:47:44 nattytest kernel: [ 189.670085] synchronize_sched() in 61850 us May 8 09:47:44 nattytest kernel: [ 189.760080] synchronize_sched() in 88549 us May 8 09:47:44 nattytest kernel: [ 
189.821014] synchronize_sched() in 44 us May 8 09:47:44 nattytest kernel: [ 189.850080] synchronize_sched() in 27628 us May 8 09:47:44 nattytest kernel: [ 189.990087] synchronize_sched() in 72251 us May 8 09:47:44 nattytest kernel: [ 190.080074] synchronize_sched() in 88486 us May 8 09:47:44 nattytest kernel: [ 190.150968] synchronize_sched() in 71 us May 8 09:47:44 nattytest kernel: [ 190.170092] synchronize_sched() in 17672 us May 8 09:47:45 nattytest kernel: [ 190.320088] synchronize_sched() in 71937 us May 8 09:47:45 nattytest kernel: [ 190.410068] synchronize_sched() in 89155 us May 8 09:47:45 nattytest kernel: [ 190.490882] synchronize_sched() in 40 us May 8 09:47:45 nattytest kernel: [ 190.492405] synchronize_sched() in 29 us May 8 09:47:45 nattytest kernel: [ 190.620103] synchronize_sched() in 62808 us May 8 09:47:45 nattytest kernel: [ 190.700108] synchronize_sched() in 78486 us May 8 09:47:45 nattytest kernel: [ 190.820119] synchronize_sched() in 69075 us May 8 09:47:45 nattytest kernel: [ 190.870096] synchronize_sched() in 48447 us May 8 09:47:45 nattytest kernel: [ 191.000082] synchronize_sched() in 62795 us May 8 09:47:45 nattytest kernel: [ 191.080082] synchronize_sched() in 78489 us May 8 09:47:45 nattytest kernel: [ 191.141021] synchronize_sched() in 61 us May 8 09:47:45 nattytest kernel: [ 191.160086] synchronize_sched() in 17596 us May 8 09:47:46 nattytest kernel: [ 191.310097] synchronize_sched() in 82089 us May 8 09:47:46 nattytest kernel: [ 191.390076] synchronize_sched() in 78427 us May 8 09:47:46 nattytest kernel: [ 191.451045] synchronize_sched() in 49 us May 8 09:47:46 nattytest kernel: [ 191.470088] synchronize_sched() in 17643 us May 8 09:47:46 nattytest kernel: [ 191.620095] synchronize_sched() in 72871 us May 8 09:47:46 nattytest kernel: [ 191.700072] synchronize_sched() in 79137 us May 8 09:47:46 nattytest kernel: [ 191.751087] synchronize_sched() in 40 us May 8 09:47:46 nattytest kernel: [ 191.752600] synchronize_sched() in 48 us May 8 09:47:46 nattytest kernel: [ 191.880092] synchronize_sched() in 72297 us May 8 09:47:46 nattytest kernel: [ 191.950076] synchronize_sched() in 69142 us May 8 09:47:46 nattytest kernel: [ 192.005349] synchronize_sched() in 54 us May 8 09:47:46 nattytest kernel: [ 192.006834] synchronize_sched() in 30 us May 8 09:47:46 nattytest kernel: [ 192.130091] synchronize_sched() in 72839 us May 8 09:47:46 nattytest kernel: [ 192.210069] synchronize_sched() in 78535 us May 8 09:47:47 nattytest kernel: [ 192.270973] synchronize_sched() in 43 us May 8 09:47:47 nattytest kernel: [ 192.300094] synchronize_sched() in 27592 us May 8 09:47:47 nattytest kernel: [ 192.450079] synchronize_sched() in 83072 us May 8 09:47:47 nattytest kernel: [ 192.520073] synchronize_sched() in 68499 us May 8 09:47:47 nattytest kernel: [ 192.571106] synchronize_sched() in 44 us May 8 09:47:47 nattytest kernel: [ 192.572585] synchronize_sched() in 29 us May 8 09:47:47 nattytest kernel: [ 192.700080] synchronize_sched() in 62670 us May 8 09:47:47 nattytest kernel: [ 192.820084] synchronize_sched() in 118509 us May 8 09:47:47 nattytest kernel: [ 192.871107] synchronize_sched() in 69 us May 8 09:47:47 nattytest kernel: [ 192.872611] synchronize_sched() in 29 us May 8 09:47:47 nattytest kernel: [ 192.990110] synchronize_sched() in 62742 us May 8 09:47:47 nattytest kernel: [ 193.080070] synchronize_sched() in 89147 us May 8 09:47:47 nattytest kernel: [ 193.150962] synchronize_sched() in 65 us May 8 09:47:47 nattytest kernel: [ 193.160285] synchronize_sched() in 8879 us May 
8 09:47:48 nattytest kernel: [ 193.300097] synchronize_sched() in 72753 us May 8 09:47:48 nattytest kernel: [ 193.380075] synchronize_sched() in 78472 us May 8 09:47:48 nattytest kernel: [ 193.431029] synchronize_sched() in 41 us May 8 09:47:48 nattytest kernel: [ 193.432518] synchronize_sched() in 30 us May 8 09:47:48 nattytest kernel: [ 193.560081] synchronize_sched() in 72448 us May 8 09:47:48 nattytest kernel: [ 193.670073] synchronize_sched() in 108452 us May 8 09:47:48 nattytest kernel: [ 193.721098] synchronize_sched() in 49 us May 8 09:47:48 nattytest kernel: [ 193.722821] synchronize_sched() in 30 us May 8 09:47:48 nattytest kernel: [ 193.840075] synchronize_sched() in 62317 us May 8 09:47:48 nattytest kernel: [ 193.920070] synchronize_sched() in 78472 us May 8 09:47:48 nattytest kernel: [ 193.971073] synchronize_sched() in 43 us May 8 09:47:48 nattytest kernel: [ 193.972550] synchronize_sched() in 29 us May 8 09:47:48 nattytest kernel: [ 194.100088] synchronize_sched() in 72391 us May 8 09:47:48 nattytest kernel: [ 194.170082] synchronize_sched() in 69202 us May 8 09:47:48 nattytest kernel: [ 194.221053] synchronize_sched() in 40 us May 8 09:47:48 nattytest kernel: [ 194.222543] synchronize_sched() in 29 us May 8 09:47:49 nattytest kernel: [ 194.350082] synchronize_sched() in 72874 us May 8 09:47:49 nattytest kernel: [ 194.430068] synchronize_sched() in 78536 us May 8 09:47:49 nattytest kernel: [ 194.481167] synchronize_sched() in 70 us May 8 09:47:49 nattytest kernel: [ 194.500091] synchronize_sched() in 17486 us May 8 09:47:49 nattytest kernel: [ 194.640088] synchronize_sched() in 73012 us May 8 09:47:49 nattytest kernel: [ 194.730069] synchronize_sched() in 88519 us May 8 09:47:49 nattytest kernel: [ 194.781117] synchronize_sched() in 46 us May 8 09:47:49 nattytest kernel: [ 194.792950] synchronize_sched() in 10384 us May 8 09:47:49 nattytest kernel: [ 194.910126] synchronize_sched() in 63131 us May 8 09:47:49 nattytest kernel: [ 195.000088] synchronize_sched() in 88453 us May 8 09:47:49 nattytest kernel: [ 195.160080] synchronize_sched() in 79582 us May 8 09:47:50 nattytest kernel: [ 195.240073] synchronize_sched() in 79132 us May 8 09:47:50 nattytest kernel: [ 195.360124] synchronize_sched() in 62368 us May 8 09:47:50 nattytest kernel: [ 195.440096] synchronize_sched() in 79148 us May 8 09:47:50 nattytest kernel: [ 195.491063] synchronize_sched() in 53 us May 8 09:47:50 nattytest kernel: [ 195.492182] synchronize_sched() in 29 us May 8 09:47:50 nattytest kernel: [ 195.610085] synchronize_sched() in 62444 us May 8 09:47:50 nattytest kernel: [ 195.700078] synchronize_sched() in 89170 us May 8 09:47:50 nattytest kernel: [ 195.761011] synchronize_sched() in 57 us May 8 09:47:50 nattytest kernel: [ 195.780137] synchronize_sched() in 18587 us May 8 09:47:50 nattytest kernel: [ 195.920080] synchronize_sched() in 73194 us May 8 09:47:50 nattytest kernel: [ 196.000073] synchronize_sched() in 78456 us May 8 09:47:50 nattytest kernel: [ 196.051067] synchronize_sched() in 42 us May 8 09:47:50 nattytest kernel: [ 196.052613] synchronize_sched() in 44 us May 8 09:47:50 nattytest kernel: [ 196.170080] synchronize_sched() in 63040 us May 8 09:47:51 nattytest kernel: [ 196.250075] synchronize_sched() in 78420 us May 8 09:47:51 nattytest kernel: [ 196.301051] synchronize_sched() in 46 us May 8 09:47:51 nattytest kernel: [ 196.302527] synchronize_sched() in 29 us May 8 09:47:51 nattytest kernel: [ 196.420095] synchronize_sched() in 62701 us May 8 09:47:51 nattytest kernel: [ 196.500064] 
synchronize_sched() in 78448 us May 8 09:47:51 nattytest kernel: [ 196.551104] synchronize_sched() in 48 us May 8 09:47:51 nattytest kernel: [ 196.552631] synchronize_sched() in 49 us May 8 09:47:51 nattytest kernel: [ 196.670084] synchronize_sched() in 63111 us May 8 09:47:51 nattytest kernel: [ 196.750095] synchronize_sched() in 78464 us May 8 09:47:51 nattytest kernel: [ 196.801082] synchronize_sched() in 60 us May 8 09:47:51 nattytest kernel: [ 196.802616] synchronize_sched() in 43 us May 8 09:47:51 nattytest kernel: [ 196.930124] synchronize_sched() in 73010 us May 8 09:47:51 nattytest kernel: [ 197.010084] synchronize_sched() in 78359 us May 8 09:47:51 nattytest kernel: [ 197.071137] synchronize_sched() in 79 us May 8 09:47:51 nattytest kernel: [ 197.080320] synchronize_sched() in 7773 us May 8 09:47:52 nattytest kernel: [ 197.230086] synchronize_sched() in 83245 us May 8 09:47:52 nattytest kernel: [ 197.310075] synchronize_sched() in 78509 us May 8 09:47:52 nattytest kernel: [ 197.361050] synchronize_sched() in 44 us May 8 09:47:52 nattytest kernel: [ 197.380084] synchronize_sched() in 17591 us May 8 09:47:52 nattytest kernel: [ 197.530084] synchronize_sched() in 82999 us May 8 09:47:52 nattytest kernel: [ 197.610080] synchronize_sched() in 79162 us May 8 09:47:52 nattytest kernel: [ 197.680937] synchronize_sched() in 65 us May 8 09:47:52 nattytest kernel: [ 197.700096] synchronize_sched() in 18620 us May 8 09:47:52 nattytest kernel: [ 197.850141] synchronize_sched() in 73601 us May 8 09:47:52 nattytest kernel: [ 197.930091] synchronize_sched() in 78393 us May 8 09:47:52 nattytest kernel: [ 197.981050] synchronize_sched() in 39 us May 8 09:47:52 nattytest kernel: [ 197.982788] synchronize_sched() in 29 us May 8 09:47:52 nattytest kernel: [ 198.110084] synchronize_sched() in 72893 us May 8 09:47:52 nattytest kernel: [ 198.180072] synchronize_sched() in 69151 us May 8 09:47:53 nattytest kernel: [ 198.231098] synchronize_sched() in 50 us May 8 09:47:53 nattytest kernel: [ 198.232679] synchronize_sched() in 46 us May 8 09:47:53 nattytest kernel: [ 198.370084] synchronize_sched() in 72935 us May 8 09:47:53 nattytest kernel: [ 198.450068] synchronize_sched() in 78461 us May 8 09:47:53 nattytest kernel: [ 198.511077] synchronize_sched() in 77 us May 8 09:47:53 nattytest kernel: [ 198.530094] synchronize_sched() in 17512 us May 8 09:47:53 nattytest kernel: [ 198.670095] synchronize_sched() in 83122 us May 8 09:47:53 nattytest kernel: [ 198.740079] synchronize_sched() in 68453 us May 8 09:47:53 nattytest kernel: [ 198.801121] synchronize_sched() in 48 us May 8 09:47:53 nattytest kernel: [ 198.802600] synchronize_sched() in 29 us May 8 09:47:53 nattytest kernel: [ 198.920100] synchronize_sched() in 63392 us May 8 09:47:53 nattytest kernel: [ 199.000088] synchronize_sched() in 78482 us May 8 09:47:53 nattytest kernel: [ 199.070099] synchronize_sched() in 19013 us May 8 09:47:53 nattytest kernel: [ 199.071626] synchronize_sched() in 48 us May 8 09:47:53 nattytest kernel: [ 199.190083] synchronize_sched() in 63245 us May 8 09:47:54 nattytest kernel: [ 199.270075] synchronize_sched() in 78491 us May 8 09:47:54 nattytest kernel: [ 199.321088] synchronize_sched() in 42 us May 8 09:47:54 nattytest kernel: [ 199.322589] synchronize_sched() in 29 us May 8 09:47:54 nattytest kernel: [ 199.440101] synchronize_sched() in 63063 us May 8 09:47:54 nattytest kernel: [ 199.520073] synchronize_sched() in 78463 us May 8 09:47:54 nattytest kernel: [ 199.571056] synchronize_sched() in 49 us May 8 09:47:54 
nattytest kernel: [ 199.572488] synchronize_sched() in 29 us May 8 09:47:54 nattytest kernel: [ 199.700083] synchronize_sched() in 73220 us May 8 09:47:54 nattytest kernel: [ 199.770096] synchronize_sched() in 68508 us May 8 09:47:54 nattytest kernel: [ 199.821090] synchronize_sched() in 50 us May 8 09:47:54 nattytest kernel: [ 199.822540] synchronize_sched() in 29 us May 8 09:47:54 nattytest kernel: [ 199.940084] synchronize_sched() in 63681 us May 8 09:47:54 nattytest kernel: [ 200.020067] synchronize_sched() in 78451 us May 8 09:47:54 nattytest kernel: [ 200.071095] synchronize_sched() in 41 us May 8 09:47:54 nattytest kernel: [ 200.072566] synchronize_sched() in 29 us May 8 09:47:54 nattytest kernel: [ 200.190089] synchronize_sched() in 63584 us May 8 09:47:55 nattytest kernel: [ 200.280076] synchronize_sched() in 89123 us May 8 09:47:55 nattytest kernel: [ 200.341011] synchronize_sched() in 61 us May 8 09:47:55 nattytest kernel: [ 200.360094] synchronize_sched() in 18527 us May 8 09:47:55 nattytest kernel: [ 200.520085] synchronize_sched() in 94129 us May 8 09:47:55 nattytest kernel: [ 200.670072] synchronize_sched() in 148512 us May 8 09:47:55 nattytest kernel: [ 200.730983] synchronize_sched() in 41 us May 8 09:47:55 nattytest kernel: [ 200.750355] synchronize_sched() in 17917 us May 8 09:47:55 nattytest kernel: [ 200.890082] synchronize_sched() in 83406 us May 8 09:47:55 nattytest kernel: [ 200.970072] synchronize_sched() in 78439 us May 8 09:47:55 nattytest kernel: [ 201.031034] synchronize_sched() in 45 us May 8 09:47:55 nattytest kernel: [ 201.050090] synchronize_sched() in 17555 us May 8 09:47:55 nattytest kernel: [ 201.200090] synchronize_sched() in 83318 us May 8 09:47:56 nattytest kernel: [ 201.270075] synchronize_sched() in 68468 us May 8 09:47:56 nattytest kernel: [ 201.330983] synchronize_sched() in 41 us May 8 09:47:56 nattytest kernel: [ 201.332487] synchronize_sched() in 62 us May 8 09:47:56 nattytest kernel: [ 201.450088] synchronize_sched() in 53772 us May 8 09:47:56 nattytest kernel: [ 201.530075] synchronize_sched() in 78437 us May 8 09:47:56 nattytest kernel: [ 201.590986] synchronize_sched() in 46 us May 8 09:47:56 nattytest kernel: [ 201.592134] synchronize_sched() in 29 us May 8 09:47:56 nattytest kernel: [ 201.710085] synchronize_sched() in 63551 us May 8 09:47:56 nattytest kernel: [ 201.830146] synchronize_sched() in 118553 us May 8 09:47:56 nattytest kernel: [ 201.891143] synchronize_sched() in 50 us May 8 09:47:56 nattytest kernel: [ 201.892712] synchronize_sched() in 55 us May 8 09:47:56 nattytest kernel: [ 202.010092] synchronize_sched() in 63446 us May 8 09:47:56 nattytest kernel: [ 202.090126] synchronize_sched() in 78477 us May 8 09:47:56 nattytest kernel: [ 202.141120] synchronize_sched() in 45 us May 8 09:47:56 nattytest kernel: [ 202.142867] synchronize_sched() in 30 us May 8 09:47:57 nattytest kernel: [ 202.260092] synchronize_sched() in 63271 us May 8 09:47:57 nattytest kernel: [ 202.340071] synchronize_sched() in 78474 us May 8 09:47:57 nattytest kernel: [ 202.391094] synchronize_sched() in 44 us May 8 09:47:57 nattytest kernel: [ 202.392581] synchronize_sched() in 29 us May 8 09:47:57 nattytest kernel: [ 202.510110] synchronize_sched() in 63718 us May 8 09:47:57 nattytest kernel: [ 202.600092] synchronize_sched() in 88434 us May 8 09:47:57 nattytest kernel: [ 202.661075] synchronize_sched() in 46 us May 8 09:47:57 nattytest kernel: [ 202.680091] synchronize_sched() in 17516 us May 8 09:47:57 nattytest kernel: [ 202.820083] synchronize_sched() in 
83733 us May 8 09:47:57 nattytest kernel: [ 202.900067] synchronize_sched() in 78478 us May 8 09:47:57 nattytest kernel: [ 202.951062] synchronize_sched() in 43 us May 8 09:47:57 nattytest kernel: [ 202.960445] synchronize_sched() in 7921 us May 8 09:47:57 nattytest kernel: [ 203.100080] synchronize_sched() in 83496 us May 8 09:47:57 nattytest kernel: [ 203.170084] synchronize_sched() in 68474 us May 8 09:47:57 nattytest kernel: [ 203.221083] synchronize_sched() in 41 us May 8 09:47:57 nattytest kernel: [ 203.222692] synchronize_sched() in 68 us May 8 09:47:58 nattytest kernel: [ 203.350083] synchronize_sched() in 74062 us May 8 09:47:58 nattytest kernel: [ 203.430100] synchronize_sched() in 78468 us May 8 09:47:58 nattytest kernel: [ 203.491028] synchronize_sched() in 52 us May 8 09:47:58 nattytest kernel: [ 203.510086] synchronize_sched() in 17607 us May 8 09:47:58 nattytest kernel: [ 203.670080] synchronize_sched() in 83798 us May 8 09:47:58 nattytest kernel: [ 203.750080] synchronize_sched() in 78382 us May 8 09:47:58 nattytest kernel: [ 203.821032] synchronize_sched() in 48 us May 8 09:47:58 nattytest kernel: [ 203.840093] synchronize_sched() in 17608 us May 8 09:47:58 nattytest kernel: [ 203.990084] synchronize_sched() in 73664 us May 8 09:47:58 nattytest kernel: [ 204.080078] synchronize_sched() in 88391 us May 8 09:47:58 nattytest kernel: [ 204.141021] synchronize_sched() in 44 us May 8 09:47:58 nattytest kernel: [ 204.160083] synchronize_sched() in 17671 us May 8 09:47:59 nattytest kernel: [ 204.310086] synchronize_sched() in 83288 us May 8 09:47:59 nattytest kernel: [ 204.380074] synchronize_sched() in 69168 us May 8 09:47:59 nattytest kernel: [ 204.431059] synchronize_sched() in 67 us May 8 09:47:59 nattytest kernel: [ 204.432205] synchronize_sched() in 30 us May 8 09:47:59 nattytest kernel: [ 204.550147] synchronize_sched() in 54218 us May 8 09:47:59 nattytest kernel: [ 204.640079] synchronize_sched() in 88401 us May 8 09:47:59 nattytest kernel: [ 204.701062] synchronize_sched() in 48 us May 8 09:47:59 nattytest kernel: [ 204.720093] synchronize_sched() in 17508 us May 8 09:47:59 nattytest kernel: [ 204.850108] synchronize_sched() in 73968 us May 8 09:47:59 nattytest kernel: [ 204.940074] synchronize_sched() in 88489 us May 8 09:47:59 nattytest kernel: [ 205.011002] synchronize_sched() in 44 us May 8 09:47:59 nattytest kernel: [ 205.030104] synchronize_sched() in 17658 us May 8 09:47:59 nattytest kernel: [ 205.190091] synchronize_sched() in 83651 us May 8 09:48:00 nattytest kernel: [ 205.270094] synchronize_sched() in 78501 us May 8 09:48:00 nattytest kernel: [ 205.341000] synchronize_sched() in 65 us May 8 09:48:00 nattytest kernel: [ 205.360081] synchronize_sched() in 17640 us ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-08 13:13 ` Alex Bligh @ 2011-05-08 13:44 ` Paul E. McKenney 2011-05-08 14:27 ` Alex Bligh 0 siblings, 1 reply; 53+ messages in thread From: Paul E. McKenney @ 2011-05-08 13:44 UTC (permalink / raw) To: Alex Bligh; +Cc: Eric Dumazet, netdev On Sun, May 08, 2011 at 02:13:30PM +0100, Alex Bligh wrote: > > >>The fact that increasing HZ masks the problem seems to imply that > >>sychronize_sched() is waiting when it shouldn't be, as it suggests > >>it's waiting for a context switch. But surely it shouldn't be > >>waiting for context switch if all other cpu cores are idle? > >>It knows that it (the caller) doesn't hold an rcu_read_lock, > >>and presumably can see the other cpus are in the idle state, > >>in which case surely it should return immediately? Distribution > >>of latency in synchronize_sched() looks like this: > >> > >>20-49 us 110 instances (27.500%) > >>50-99 us 45 instances (11.250%) > > > >Really? I am having a hard time believing this above two. Is this really > >2000-4999 us and 5000-9999 us? That would be much more believable, > >and expected on a busy system with lots of context switching. Or on a > >system with CONFIG_NO_HZ=n. > > Yes, really 20-49us and 50-99us, not ms. Raw data attached :-) > > I'm guessing there are circumstances where there is an early exit. Well, if you were onlining and offlining CPUs, then if there was only one CPU online, this could happen. And there really is only one CPU online during boot, so if your measurements included early boot time, this could easily explain these very short timings. > $ fgrep HZ .config > CONFIG_RCU_FAST_NO_HZ=y > CONFIG_NO_HZ=y > CONFIG_HZ_100=y > # CONFIG_HZ_250 is not set > # CONFIG_HZ_300 is not set > # CONFIG_HZ_1000 is not set > CONFIG_HZ=100 > # CONFIG_MACHZ_WDT is not set > > > > >>5000-9999 us 5 instances (1.250%) > > > >This makes sense for a mostly-idle system with frequent short bursts > >of work. > > > >>10000-19999 us 33 instances (8.250%) > > > >This makes sense for a CONFIG_NO_HZ system that is idle, where there > >is some amount of background work that is also using RCU grace periods. > > > >>20000-49999 us 4 instances (1.000%) > >>50000-99999 us 191 instances (47.750%) > >>100000-199999 us 12 instances (3.000%) > > > >These last involve additional delays. Possibilities include long-running > >irq handlers, SMIs, or NMIs. > > There is nothing much going on these systems (idle, no other users, > just normal system daemons). And normal system daemons might cause this, right? > Note this is with CONFIG_HZ=100 (which is the Ubuntu default). I wonder > if that means you need to scale your expactations by a factor if 10 (i.e. > you'd expect 50000-99999 for "a mostly-idle system with frequent > short bursts of work." Indeed, I was thinking in terms of HZ=1000. Thanx, Paul > Interestingly there are quite a few of these on boot up. If this > is a CONFIG_HZ thing, then setting it to 1000 would probably take > half a second off bootup time. 
> > > -- > Alex Bligh > > > May 8 09:47:31 nattytest kernel: [ 177.030197] synchronize_sched() > in 66921 us > May 8 09:47:31 nattytest kernel: [ 177.120085] synchronize_sched() > in 89080 us > May 8 09:47:31 nattytest kernel: [ 177.190910] synchronize_sched() > in 46 us > May 8 09:47:31 nattytest kernel: [ 177.210087] synchronize_sched() > in 18686 us > May 8 09:47:32 nattytest kernel: [ 177.400080] synchronize_sched() > in 110609 us > May 8 09:47:32 nattytest kernel: [ 177.480071] synchronize_sched() > in 78550 us > May 8 09:47:32 nattytest kernel: [ 177.550119] synchronize_sched() > in 19087 us > May 8 09:47:32 nattytest kernel: [ 177.551761] synchronize_sched() > in 79 us > May 8 09:47:32 nattytest kernel: [ 177.670085] synchronize_sched() > in 60616 us > May 8 09:47:32 nattytest kernel: [ 177.760075] synchronize_sched() > in 88465 us > May 8 09:47:32 nattytest kernel: [ 177.820999] synchronize_sched() > in 45 us > May 8 09:47:32 nattytest kernel: [ 177.840086] synchronize_sched() > in 17579 us > May 8 09:47:32 nattytest kernel: [ 177.970084] synchronize_sched() > in 70925 us > May 8 09:47:32 nattytest kernel: [ 178.050092] synchronize_sched() > in 78477 us > May 8 09:47:32 nattytest kernel: [ 178.101046] synchronize_sched() > in 39 us > May 8 09:47:32 nattytest kernel: [ 178.102574] synchronize_sched() > in 48 us > May 8 09:47:33 nattytest kernel: [ 178.230082] synchronize_sched() > in 70966 us > May 8 09:47:33 nattytest kernel: [ 178.310071] synchronize_sched() > in 78503 us > May 8 09:47:33 nattytest kernel: [ 178.371056] synchronize_sched() > in 43 us > May 8 09:47:33 nattytest kernel: [ 178.390094] synchronize_sched() > in 17553 us > May 8 09:47:33 nattytest kernel: [ 178.540084] synchronize_sched() > in 70533 us > May 8 09:47:33 nattytest kernel: [ 178.620074] synchronize_sched() > in 79172 us > May 8 09:47:33 nattytest kernel: [ 178.671124] synchronize_sched() > in 57 us > May 8 09:47:33 nattytest kernel: [ 178.672645] synchronize_sched() > in 49 us > May 8 09:47:33 nattytest kernel: [ 178.830106] synchronize_sched() > in 100955 us > May 8 09:47:33 nattytest kernel: [ 178.910084] synchronize_sched() > in 78408 us > May 8 09:47:33 nattytest kernel: [ 178.961090] synchronize_sched() > in 43 us > May 8 09:47:33 nattytest kernel: [ 178.962642] synchronize_sched() > in 50 us > May 8 09:47:33 nattytest kernel: [ 179.090088] synchronize_sched() > in 71354 us > May 8 09:47:33 nattytest kernel: [ 179.220071] synchronize_sched() > in 128463 us > May 8 09:47:34 nattytest kernel: [ 179.300867] synchronize_sched() > in 52 us > May 8 09:47:34 nattytest kernel: [ 179.302079] synchronize_sched() > in 48 us > May 8 09:47:34 nattytest kernel: [ 179.420085] synchronize_sched() > in 61562 us > May 8 09:47:34 nattytest kernel: [ 179.500093] synchronize_sched() > in 78506 us > May 8 09:47:34 nattytest kernel: [ 179.551048] synchronize_sched() > in 42 us > May 8 09:47:34 nattytest kernel: [ 179.552637] synchronize_sched() > in 51 us > May 8 09:47:34 nattytest kernel: [ 179.670088] synchronize_sched() > in 61469 us > May 8 09:47:34 nattytest kernel: [ 179.760078] synchronize_sched() > in 88430 us > May 8 09:47:34 nattytest kernel: [ 179.830918] synchronize_sched() > in 44 us > May 8 09:47:34 nattytest kernel: [ 179.850173] synchronize_sched() > in 17783 us > May 8 09:47:34 nattytest kernel: [ 180.080085] synchronize_sched() > in 150585 us > May 8 09:47:34 nattytest kernel: [ 180.160082] synchronize_sched() > in 78381 us > May 8 09:47:34 nattytest kernel: [ 180.211027] synchronize_sched() > in 39 us > 
May 8 09:47:34 nattytest kernel: [ 180.212186] synchronize_sched() > in 52 us > May 8 09:47:35 nattytest kernel: [ 180.320091] synchronize_sched() > in 50520 us > May 8 09:47:35 nattytest kernel: [ 180.400074] synchronize_sched() > in 78525 us > May 8 09:47:35 nattytest kernel: [ 180.451028] synchronize_sched() > in 50 us > May 8 09:47:35 nattytest kernel: [ 180.452100] synchronize_sched() > in 30 us > May 8 09:47:35 nattytest kernel: [ 180.570086] synchronize_sched() > in 61211 us > May 8 09:47:35 nattytest kernel: [ 180.650073] synchronize_sched() > in 79145 us > May 8 09:47:35 nattytest kernel: [ 180.701376] synchronize_sched() > in 38 us > May 8 09:47:35 nattytest kernel: [ 180.703490] synchronize_sched() > in 57 us > May 8 09:47:35 nattytest kernel: [ 180.820087] synchronize_sched() > in 61483 us > May 8 09:47:35 nattytest kernel: [ 180.910081] synchronize_sched() > in 88485 us > May 8 09:47:35 nattytest kernel: [ 180.965970] synchronize_sched() > in 49 us > May 8 09:47:35 nattytest kernel: [ 180.990091] synchronize_sched() > in 22425 us > May 8 09:47:35 nattytest kernel: [ 181.150084] synchronize_sched() > in 71317 us > May 8 09:47:36 nattytest kernel: [ 181.230074] synchronize_sched() > in 78470 us > May 8 09:47:36 nattytest kernel: [ 181.300099] synchronize_sched() > in 18988 us > May 8 09:47:36 nattytest kernel: [ 181.301637] synchronize_sched() > in 46 us > May 8 09:47:36 nattytest kernel: [ 181.420086] synchronize_sched() > in 61240 us > May 8 09:47:36 nattytest kernel: [ 181.500072] synchronize_sched() > in 77682 us > May 8 09:47:36 nattytest kernel: [ 181.551059] synchronize_sched() > in 45 us > May 8 09:47:36 nattytest kernel: [ 181.552546] synchronize_sched() > in 29 us > May 8 09:47:36 nattytest kernel: [ 181.670084] synchronize_sched() > in 61612 us > May 8 09:47:36 nattytest kernel: [ 181.750076] synchronize_sched() > in 78513 us > May 8 09:47:36 nattytest kernel: [ 181.801195] synchronize_sched() > in 52 us > May 8 09:47:36 nattytest kernel: [ 181.802805] synchronize_sched() > in 69 us > May 8 09:47:36 nattytest kernel: [ 181.920088] synchronize_sched() > in 61998 us > May 8 09:47:36 nattytest kernel: [ 182.000091] synchronize_sched() > in 78539 us > May 8 09:47:36 nattytest kernel: [ 182.051053] synchronize_sched() > in 42 us > May 8 09:47:36 nattytest kernel: [ 182.052595] synchronize_sched() > in 29 us > May 8 09:47:36 nattytest kernel: [ 182.170103] synchronize_sched() > in 61607 us > May 8 09:47:37 nattytest kernel: [ 182.260072] synchronize_sched() > in 88460 us > May 8 09:47:37 nattytest kernel: [ 182.331028] synchronize_sched() > in 44 us > May 8 09:47:37 nattytest kernel: [ 182.350096] synchronize_sched() > in 17563 us > May 8 09:47:37 nattytest kernel: [ 182.500085] synchronize_sched() > in 71793 us > May 8 09:47:37 nattytest kernel: [ 182.620085] synchronize_sched() > in 118343 us > May 8 09:47:37 nattytest kernel: [ 182.671024] synchronize_sched() > in 39 us > May 8 09:47:37 nattytest kernel: [ 182.672511] synchronize_sched() > in 29 us > May 8 09:47:37 nattytest kernel: [ 182.830096] synchronize_sched() > in 101470 us > May 8 09:47:37 nattytest kernel: [ 182.910081] synchronize_sched() > in 78413 us > May 8 09:47:37 nattytest kernel: [ 182.971056] synchronize_sched() > in 43 us > May 8 09:47:37 nattytest kernel: [ 182.972568] synchronize_sched() > in 44 us > May 8 09:47:37 nattytest kernel: [ 183.100084] synchronize_sched() > in 60854 us > May 8 09:47:37 nattytest kernel: [ 183.170097] synchronize_sched() > in 68450 us > May 8 09:47:37 nattytest kernel: [ 
183.221081] synchronize_sched() > in 40 us > May 8 09:47:37 nattytest kernel: [ 183.222580] synchronize_sched() > in 29 us > May 8 09:47:38 nattytest kernel: [ 183.340082] synchronize_sched() > in 61422 us > May 8 09:47:38 nattytest kernel: [ 183.420078] synchronize_sched() > in 79154 us > May 8 09:47:38 nattytest kernel: [ 183.471003] synchronize_sched() > in 41 us > May 8 09:47:38 nattytest kernel: [ 183.472469] synchronize_sched() > in 29 us > May 8 09:47:38 nattytest kernel: [ 183.590095] synchronize_sched() > in 61591 us > May 8 09:47:38 nattytest kernel: [ 183.670069] synchronize_sched() > in 78533 us > May 8 09:47:38 nattytest kernel: [ 183.721106] synchronize_sched() > in 43 us > May 8 09:47:38 nattytest kernel: [ 183.722663] synchronize_sched() > in 49 us > May 8 09:47:38 nattytest kernel: [ 183.850094] synchronize_sched() > in 71060 us > May 8 09:47:38 nattytest kernel: [ 183.930080] synchronize_sched() > in 78522 us > May 8 09:47:38 nattytest kernel: [ 183.981040] synchronize_sched() > in 44 us > May 8 09:47:38 nattytest kernel: [ 183.990375] synchronize_sched() > in 7913 us > May 8 09:47:38 nattytest kernel: [ 184.150080] synchronize_sched() > in 81568 us > May 8 09:47:38 nattytest kernel: [ 184.220088] synchronize_sched() > in 68481 us > May 8 09:47:39 nattytest kernel: [ 184.281161] synchronize_sched() > in 65 us > May 8 09:47:39 nattytest kernel: [ 184.282733] synchronize_sched() > in 60 us > May 8 09:47:39 nattytest kernel: [ 184.400079] synchronize_sched() > in 61791 us > May 8 09:47:39 nattytest kernel: [ 184.480104] synchronize_sched() > in 79181 us > May 8 09:47:39 nattytest kernel: [ 184.550103] synchronize_sched() > in 19063 us > May 8 09:47:39 nattytest kernel: [ 184.551621] synchronize_sched() > in 30 us > May 8 09:47:39 nattytest kernel: [ 184.680083] synchronize_sched() > in 71800 us > May 8 09:47:39 nattytest kernel: [ 184.750084] synchronize_sched() > in 69162 us > May 8 09:47:39 nattytest kernel: [ 184.801153] synchronize_sched() > in 51 us > May 8 09:47:39 nattytest kernel: [ 184.802802] synchronize_sched() > in 53 us > May 8 09:47:39 nattytest kernel: [ 184.920091] synchronize_sched() > in 61217 us > May 8 09:47:39 nattytest kernel: [ 185.000074] synchronize_sched() > in 79141 us > May 8 09:47:39 nattytest kernel: [ 185.050985] synchronize_sched() > in 40 us > May 8 09:47:39 nattytest kernel: [ 185.052727] synchronize_sched() > in 47 us > May 8 09:47:39 nattytest kernel: [ 185.170087] synchronize_sched() > in 62802 us > May 8 09:47:40 nattytest kernel: [ 185.250074] synchronize_sched() > in 78418 us > May 8 09:47:40 nattytest kernel: [ 185.311022] synchronize_sched() > in 40 us > May 8 09:47:40 nattytest kernel: [ 185.312542] synchronize_sched() > in 63 us > May 8 09:47:40 nattytest kernel: [ 185.430111] synchronize_sched() > in 61894 us > May 8 09:47:40 nattytest kernel: [ 185.510073] synchronize_sched() > in 78410 us > May 8 09:47:40 nattytest kernel: [ 185.561055] synchronize_sched() > in 58 us > May 8 09:47:40 nattytest kernel: [ 185.562589] synchronize_sched() > in 45 us > May 8 09:47:40 nattytest kernel: [ 185.680091] synchronize_sched() > in 62023 us > May 8 09:47:40 nattytest kernel: [ 185.760076] synchronize_sched() > in 78438 us > May 8 09:47:40 nattytest kernel: [ 185.811063] synchronize_sched() > in 45 us > May 8 09:47:40 nattytest kernel: [ 185.812580] synchronize_sched() > in 52 us > May 8 09:47:40 nattytest kernel: [ 185.930089] synchronize_sched() > in 61864 us > May 8 09:47:40 nattytest kernel: [ 186.010072] synchronize_sched() > in 78465 us 
> May 8 09:47:40 nattytest kernel: [ 186.061141] synchronize_sched() > in 44 us > May 8 09:47:40 nattytest kernel: [ 186.062631] synchronize_sched() > in 29 us > May 8 09:47:40 nattytest kernel: [ 186.180078] synchronize_sched() > in 61806 us > May 8 09:47:41 nattytest kernel: [ 186.260074] synchronize_sched() > in 78415 us > May 8 09:47:41 nattytest kernel: [ 186.311073] synchronize_sched() > in 43 us > May 8 09:47:41 nattytest kernel: [ 186.312573] synchronize_sched() > in 29 us > May 8 09:47:41 nattytest kernel: [ 186.440083] synchronize_sched() > in 72358 us > May 8 09:47:41 nattytest kernel: [ 186.520071] synchronize_sched() > in 78395 us > May 8 09:47:41 nattytest kernel: [ 186.573068] synchronize_sched() > in 66 us > May 8 09:47:41 nattytest kernel: [ 186.590105] synchronize_sched() > in 15551 us > May 8 09:47:41 nattytest kernel: [ 186.740092] synchronize_sched() > in 71884 us > May 8 09:47:41 nattytest kernel: [ 186.830069] synchronize_sched() > in 89123 us > May 8 09:47:41 nattytest kernel: [ 186.890967] synchronize_sched() > in 41 us > May 8 09:47:41 nattytest kernel: [ 186.910091] synchronize_sched() > in 18639 us > May 8 09:47:41 nattytest kernel: [ 187.070078] synchronize_sched() > in 82847 us > May 8 09:47:41 nattytest kernel: [ 187.140090] synchronize_sched() > in 68499 us > May 8 09:47:41 nattytest kernel: [ 187.191148] synchronize_sched() > in 57 us > May 8 09:47:41 nattytest kernel: [ 187.192743] synchronize_sched() > in 69 us > May 8 09:47:42 nattytest kernel: [ 187.310087] synchronize_sched() > in 63069 us > May 8 09:47:42 nattytest kernel: [ 187.400076] synchronize_sched() > in 88432 us > May 8 09:47:42 nattytest kernel: [ 187.461521] synchronize_sched() > in 57 us > May 8 09:47:42 nattytest kernel: [ 187.480099] synchronize_sched() > in 16493 us > May 8 09:47:42 nattytest kernel: [ 187.640081] synchronize_sched() > in 82269 us > May 8 09:47:42 nattytest kernel: [ 187.710079] synchronize_sched() > in 68389 us > May 8 09:47:42 nattytest kernel: [ 187.761071] synchronize_sched() > in 38 us > May 8 09:47:42 nattytest kernel: [ 187.762623] synchronize_sched() > in 67 us > May 8 09:47:42 nattytest kernel: [ 187.880077] synchronize_sched() > in 61847 us > May 8 09:47:42 nattytest kernel: [ 187.960096] synchronize_sched() > in 78450 us > May 8 09:47:42 nattytest kernel: [ 188.011147] synchronize_sched() > in 62 us > May 8 09:47:42 nattytest kernel: [ 188.012617] synchronize_sched() > in 29 us > May 8 09:47:42 nattytest kernel: [ 188.130083] synchronize_sched() > in 62431 us > May 8 09:47:42 nattytest kernel: [ 188.210090] synchronize_sched() > in 78447 us > May 8 09:47:43 nattytest kernel: [ 188.261015] synchronize_sched() > in 38 us > May 8 09:47:43 nattytest kernel: [ 188.262491] synchronize_sched() > in 29 us > May 8 09:47:43 nattytest kernel: [ 188.380097] synchronize_sched() > in 61923 us > May 8 09:47:43 nattytest kernel: [ 188.470131] synchronize_sched() > in 88521 us > May 8 09:47:43 nattytest kernel: [ 188.531024] synchronize_sched() > in 46 us > May 8 09:47:43 nattytest kernel: [ 188.550087] synchronize_sched() > in 17676 us > May 8 09:47:43 nattytest kernel: [ 188.690085] synchronize_sched() > in 71911 us > May 8 09:47:43 nattytest kernel: [ 188.820071] synchronize_sched() > in 128473 us > May 8 09:47:43 nattytest kernel: [ 188.881032] synchronize_sched() > in 42 us > May 8 09:47:43 nattytest kernel: [ 188.900085] synchronize_sched() > in 17588 us > May 8 09:47:43 nattytest kernel: [ 189.080084] synchronize_sched() > in 111945 us > May 8 09:47:43 nattytest kernel: 
[ 189.170068] synchronize_sched() > in 88449 us > May 8 09:47:43 nattytest kernel: [ 189.221066] synchronize_sched() > in 41 us > May 8 09:47:44 nattytest kernel: [ 189.230257] synchronize_sched() > in 7463 us > May 8 09:47:44 nattytest kernel: [ 189.380093] synchronize_sched() > in 71896 us > May 8 09:47:44 nattytest kernel: [ 189.470093] synchronize_sched() > in 88481 us > May 8 09:47:44 nattytest kernel: [ 189.550883] synchronize_sched() > in 43 us > May 8 09:47:44 nattytest kernel: [ 189.552391] synchronize_sched() > in 29 us > May 8 09:47:44 nattytest kernel: [ 189.670085] synchronize_sched() > in 61850 us > May 8 09:47:44 nattytest kernel: [ 189.760080] synchronize_sched() > in 88549 us > May 8 09:47:44 nattytest kernel: [ 189.821014] synchronize_sched() > in 44 us > May 8 09:47:44 nattytest kernel: [ 189.850080] synchronize_sched() > in 27628 us > May 8 09:47:44 nattytest kernel: [ 189.990087] synchronize_sched() > in 72251 us > May 8 09:47:44 nattytest kernel: [ 190.080074] synchronize_sched() > in 88486 us > May 8 09:47:44 nattytest kernel: [ 190.150968] synchronize_sched() > in 71 us > May 8 09:47:44 nattytest kernel: [ 190.170092] synchronize_sched() > in 17672 us > May 8 09:47:45 nattytest kernel: [ 190.320088] synchronize_sched() > in 71937 us > May 8 09:47:45 nattytest kernel: [ 190.410068] synchronize_sched() > in 89155 us > May 8 09:47:45 nattytest kernel: [ 190.490882] synchronize_sched() > in 40 us > May 8 09:47:45 nattytest kernel: [ 190.492405] synchronize_sched() > in 29 us > May 8 09:47:45 nattytest kernel: [ 190.620103] synchronize_sched() > in 62808 us > May 8 09:47:45 nattytest kernel: [ 190.700108] synchronize_sched() > in 78486 us > May 8 09:47:45 nattytest kernel: [ 190.820119] synchronize_sched() > in 69075 us > May 8 09:47:45 nattytest kernel: [ 190.870096] synchronize_sched() > in 48447 us > May 8 09:47:45 nattytest kernel: [ 191.000082] synchronize_sched() > in 62795 us > May 8 09:47:45 nattytest kernel: [ 191.080082] synchronize_sched() > in 78489 us > May 8 09:47:45 nattytest kernel: [ 191.141021] synchronize_sched() > in 61 us > May 8 09:47:45 nattytest kernel: [ 191.160086] synchronize_sched() > in 17596 us > May 8 09:47:46 nattytest kernel: [ 191.310097] synchronize_sched() > in 82089 us > May 8 09:47:46 nattytest kernel: [ 191.390076] synchronize_sched() > in 78427 us > May 8 09:47:46 nattytest kernel: [ 191.451045] synchronize_sched() > in 49 us > May 8 09:47:46 nattytest kernel: [ 191.470088] synchronize_sched() > in 17643 us > May 8 09:47:46 nattytest kernel: [ 191.620095] synchronize_sched() > in 72871 us > May 8 09:47:46 nattytest kernel: [ 191.700072] synchronize_sched() > in 79137 us > May 8 09:47:46 nattytest kernel: [ 191.751087] synchronize_sched() > in 40 us > May 8 09:47:46 nattytest kernel: [ 191.752600] synchronize_sched() > in 48 us > May 8 09:47:46 nattytest kernel: [ 191.880092] synchronize_sched() > in 72297 us > May 8 09:47:46 nattytest kernel: [ 191.950076] synchronize_sched() > in 69142 us > May 8 09:47:46 nattytest kernel: [ 192.005349] synchronize_sched() > in 54 us > May 8 09:47:46 nattytest kernel: [ 192.006834] synchronize_sched() > in 30 us > May 8 09:47:46 nattytest kernel: [ 192.130091] synchronize_sched() > in 72839 us > May 8 09:47:46 nattytest kernel: [ 192.210069] synchronize_sched() > in 78535 us > May 8 09:47:47 nattytest kernel: [ 192.270973] synchronize_sched() > in 43 us > May 8 09:47:47 nattytest kernel: [ 192.300094] synchronize_sched() > in 27592 us > May 8 09:47:47 nattytest kernel: [ 192.450079] 
synchronize_sched() > in 83072 us > May 8 09:47:47 nattytest kernel: [ 192.520073] synchronize_sched() > in 68499 us > May 8 09:47:47 nattytest kernel: [ 192.571106] synchronize_sched() > in 44 us > May 8 09:47:47 nattytest kernel: [ 192.572585] synchronize_sched() > in 29 us > May 8 09:47:47 nattytest kernel: [ 192.700080] synchronize_sched() > in 62670 us > May 8 09:47:47 nattytest kernel: [ 192.820084] synchronize_sched() > in 118509 us > May 8 09:47:47 nattytest kernel: [ 192.871107] synchronize_sched() > in 69 us > May 8 09:47:47 nattytest kernel: [ 192.872611] synchronize_sched() > in 29 us > May 8 09:47:47 nattytest kernel: [ 192.990110] synchronize_sched() > in 62742 us > May 8 09:47:47 nattytest kernel: [ 193.080070] synchronize_sched() > in 89147 us > May 8 09:47:47 nattytest kernel: [ 193.150962] synchronize_sched() > in 65 us > May 8 09:47:47 nattytest kernel: [ 193.160285] synchronize_sched() > in 8879 us > May 8 09:47:48 nattytest kernel: [ 193.300097] synchronize_sched() > in 72753 us > May 8 09:47:48 nattytest kernel: [ 193.380075] synchronize_sched() > in 78472 us > May 8 09:47:48 nattytest kernel: [ 193.431029] synchronize_sched() > in 41 us > May 8 09:47:48 nattytest kernel: [ 193.432518] synchronize_sched() > in 30 us > May 8 09:47:48 nattytest kernel: [ 193.560081] synchronize_sched() > in 72448 us > May 8 09:47:48 nattytest kernel: [ 193.670073] synchronize_sched() > in 108452 us > May 8 09:47:48 nattytest kernel: [ 193.721098] synchronize_sched() > in 49 us > May 8 09:47:48 nattytest kernel: [ 193.722821] synchronize_sched() > in 30 us > May 8 09:47:48 nattytest kernel: [ 193.840075] synchronize_sched() > in 62317 us > May 8 09:47:48 nattytest kernel: [ 193.920070] synchronize_sched() > in 78472 us > May 8 09:47:48 nattytest kernel: [ 193.971073] synchronize_sched() > in 43 us > May 8 09:47:48 nattytest kernel: [ 193.972550] synchronize_sched() > in 29 us > May 8 09:47:48 nattytest kernel: [ 194.100088] synchronize_sched() > in 72391 us > May 8 09:47:48 nattytest kernel: [ 194.170082] synchronize_sched() > in 69202 us > May 8 09:47:48 nattytest kernel: [ 194.221053] synchronize_sched() > in 40 us > May 8 09:47:48 nattytest kernel: [ 194.222543] synchronize_sched() > in 29 us > May 8 09:47:49 nattytest kernel: [ 194.350082] synchronize_sched() > in 72874 us > May 8 09:47:49 nattytest kernel: [ 194.430068] synchronize_sched() > in 78536 us > May 8 09:47:49 nattytest kernel: [ 194.481167] synchronize_sched() > in 70 us > May 8 09:47:49 nattytest kernel: [ 194.500091] synchronize_sched() > in 17486 us > May 8 09:47:49 nattytest kernel: [ 194.640088] synchronize_sched() > in 73012 us > May 8 09:47:49 nattytest kernel: [ 194.730069] synchronize_sched() > in 88519 us > May 8 09:47:49 nattytest kernel: [ 194.781117] synchronize_sched() > in 46 us > May 8 09:47:49 nattytest kernel: [ 194.792950] synchronize_sched() > in 10384 us > May 8 09:47:49 nattytest kernel: [ 194.910126] synchronize_sched() > in 63131 us > May 8 09:47:49 nattytest kernel: [ 195.000088] synchronize_sched() > in 88453 us > May 8 09:47:49 nattytest kernel: [ 195.160080] synchronize_sched() > in 79582 us > May 8 09:47:50 nattytest kernel: [ 195.240073] synchronize_sched() > in 79132 us > May 8 09:47:50 nattytest kernel: [ 195.360124] synchronize_sched() > in 62368 us > May 8 09:47:50 nattytest kernel: [ 195.440096] synchronize_sched() > in 79148 us > May 8 09:47:50 nattytest kernel: [ 195.491063] synchronize_sched() > in 53 us > May 8 09:47:50 nattytest kernel: [ 195.492182] synchronize_sched() > in 29 us > 
May 8 09:47:50 nattytest kernel: [ 195.610085] synchronize_sched() > in 62444 us > May 8 09:47:50 nattytest kernel: [ 195.700078] synchronize_sched() > in 89170 us > May 8 09:47:50 nattytest kernel: [ 195.761011] synchronize_sched() > in 57 us > May 8 09:47:50 nattytest kernel: [ 195.780137] synchronize_sched() > in 18587 us > May 8 09:47:50 nattytest kernel: [ 195.920080] synchronize_sched() > in 73194 us > May 8 09:47:50 nattytest kernel: [ 196.000073] synchronize_sched() > in 78456 us > May 8 09:47:50 nattytest kernel: [ 196.051067] synchronize_sched() > in 42 us > May 8 09:47:50 nattytest kernel: [ 196.052613] synchronize_sched() > in 44 us > May 8 09:47:50 nattytest kernel: [ 196.170080] synchronize_sched() > in 63040 us > May 8 09:47:51 nattytest kernel: [ 196.250075] synchronize_sched() > in 78420 us > May 8 09:47:51 nattytest kernel: [ 196.301051] synchronize_sched() > in 46 us > May 8 09:47:51 nattytest kernel: [ 196.302527] synchronize_sched() > in 29 us > May 8 09:47:51 nattytest kernel: [ 196.420095] synchronize_sched() > in 62701 us > May 8 09:47:51 nattytest kernel: [ 196.500064] synchronize_sched() > in 78448 us > May 8 09:47:51 nattytest kernel: [ 196.551104] synchronize_sched() > in 48 us > May 8 09:47:51 nattytest kernel: [ 196.552631] synchronize_sched() > in 49 us > May 8 09:47:51 nattytest kernel: [ 196.670084] synchronize_sched() > in 63111 us > May 8 09:47:51 nattytest kernel: [ 196.750095] synchronize_sched() > in 78464 us > May 8 09:47:51 nattytest kernel: [ 196.801082] synchronize_sched() > in 60 us > May 8 09:47:51 nattytest kernel: [ 196.802616] synchronize_sched() > in 43 us > May 8 09:47:51 nattytest kernel: [ 196.930124] synchronize_sched() > in 73010 us > May 8 09:47:51 nattytest kernel: [ 197.010084] synchronize_sched() > in 78359 us > May 8 09:47:51 nattytest kernel: [ 197.071137] synchronize_sched() > in 79 us > May 8 09:47:51 nattytest kernel: [ 197.080320] synchronize_sched() > in 7773 us > May 8 09:47:52 nattytest kernel: [ 197.230086] synchronize_sched() > in 83245 us > May 8 09:47:52 nattytest kernel: [ 197.310075] synchronize_sched() > in 78509 us > May 8 09:47:52 nattytest kernel: [ 197.361050] synchronize_sched() > in 44 us > May 8 09:47:52 nattytest kernel: [ 197.380084] synchronize_sched() > in 17591 us > May 8 09:47:52 nattytest kernel: [ 197.530084] synchronize_sched() > in 82999 us > May 8 09:47:52 nattytest kernel: [ 197.610080] synchronize_sched() > in 79162 us > May 8 09:47:52 nattytest kernel: [ 197.680937] synchronize_sched() > in 65 us > May 8 09:47:52 nattytest kernel: [ 197.700096] synchronize_sched() > in 18620 us > May 8 09:47:52 nattytest kernel: [ 197.850141] synchronize_sched() > in 73601 us > May 8 09:47:52 nattytest kernel: [ 197.930091] synchronize_sched() > in 78393 us > May 8 09:47:52 nattytest kernel: [ 197.981050] synchronize_sched() > in 39 us > May 8 09:47:52 nattytest kernel: [ 197.982788] synchronize_sched() > in 29 us > May 8 09:47:52 nattytest kernel: [ 198.110084] synchronize_sched() > in 72893 us > May 8 09:47:52 nattytest kernel: [ 198.180072] synchronize_sched() > in 69151 us > May 8 09:47:53 nattytest kernel: [ 198.231098] synchronize_sched() > in 50 us > May 8 09:47:53 nattytest kernel: [ 198.232679] synchronize_sched() > in 46 us > May 8 09:47:53 nattytest kernel: [ 198.370084] synchronize_sched() > in 72935 us > May 8 09:47:53 nattytest kernel: [ 198.450068] synchronize_sched() > in 78461 us > May 8 09:47:53 nattytest kernel: [ 198.511077] synchronize_sched() > in 77 us > May 8 09:47:53 nattytest kernel: [ 
198.530094] synchronize_sched() > in 17512 us > May 8 09:47:53 nattytest kernel: [ 198.670095] synchronize_sched() > in 83122 us > May 8 09:47:53 nattytest kernel: [ 198.740079] synchronize_sched() > in 68453 us > May 8 09:47:53 nattytest kernel: [ 198.801121] synchronize_sched() > in 48 us > May 8 09:47:53 nattytest kernel: [ 198.802600] synchronize_sched() > in 29 us > May 8 09:47:53 nattytest kernel: [ 198.920100] synchronize_sched() > in 63392 us > May 8 09:47:53 nattytest kernel: [ 199.000088] synchronize_sched() > in 78482 us > May 8 09:47:53 nattytest kernel: [ 199.070099] synchronize_sched() > in 19013 us > May 8 09:47:53 nattytest kernel: [ 199.071626] synchronize_sched() > in 48 us > May 8 09:47:53 nattytest kernel: [ 199.190083] synchronize_sched() > in 63245 us > May 8 09:47:54 nattytest kernel: [ 199.270075] synchronize_sched() > in 78491 us > May 8 09:47:54 nattytest kernel: [ 199.321088] synchronize_sched() > in 42 us > May 8 09:47:54 nattytest kernel: [ 199.322589] synchronize_sched() > in 29 us > May 8 09:47:54 nattytest kernel: [ 199.440101] synchronize_sched() > in 63063 us > May 8 09:47:54 nattytest kernel: [ 199.520073] synchronize_sched() > in 78463 us > May 8 09:47:54 nattytest kernel: [ 199.571056] synchronize_sched() > in 49 us > May 8 09:47:54 nattytest kernel: [ 199.572488] synchronize_sched() > in 29 us > May 8 09:47:54 nattytest kernel: [ 199.700083] synchronize_sched() > in 73220 us > May 8 09:47:54 nattytest kernel: [ 199.770096] synchronize_sched() > in 68508 us > May 8 09:47:54 nattytest kernel: [ 199.821090] synchronize_sched() > in 50 us > May 8 09:47:54 nattytest kernel: [ 199.822540] synchronize_sched() > in 29 us > May 8 09:47:54 nattytest kernel: [ 199.940084] synchronize_sched() > in 63681 us > May 8 09:47:54 nattytest kernel: [ 200.020067] synchronize_sched() > in 78451 us > May 8 09:47:54 nattytest kernel: [ 200.071095] synchronize_sched() > in 41 us > May 8 09:47:54 nattytest kernel: [ 200.072566] synchronize_sched() > in 29 us > May 8 09:47:54 nattytest kernel: [ 200.190089] synchronize_sched() > in 63584 us > May 8 09:47:55 nattytest kernel: [ 200.280076] synchronize_sched() > in 89123 us > May 8 09:47:55 nattytest kernel: [ 200.341011] synchronize_sched() > in 61 us > May 8 09:47:55 nattytest kernel: [ 200.360094] synchronize_sched() > in 18527 us > May 8 09:47:55 nattytest kernel: [ 200.520085] synchronize_sched() > in 94129 us > May 8 09:47:55 nattytest kernel: [ 200.670072] synchronize_sched() > in 148512 us > May 8 09:47:55 nattytest kernel: [ 200.730983] synchronize_sched() > in 41 us > May 8 09:47:55 nattytest kernel: [ 200.750355] synchronize_sched() > in 17917 us > May 8 09:47:55 nattytest kernel: [ 200.890082] synchronize_sched() > in 83406 us > May 8 09:47:55 nattytest kernel: [ 200.970072] synchronize_sched() > in 78439 us > May 8 09:47:55 nattytest kernel: [ 201.031034] synchronize_sched() > in 45 us > May 8 09:47:55 nattytest kernel: [ 201.050090] synchronize_sched() > in 17555 us > May 8 09:47:55 nattytest kernel: [ 201.200090] synchronize_sched() > in 83318 us > May 8 09:47:56 nattytest kernel: [ 201.270075] synchronize_sched() > in 68468 us > May 8 09:47:56 nattytest kernel: [ 201.330983] synchronize_sched() > in 41 us > May 8 09:47:56 nattytest kernel: [ 201.332487] synchronize_sched() > in 62 us > May 8 09:47:56 nattytest kernel: [ 201.450088] synchronize_sched() > in 53772 us > May 8 09:47:56 nattytest kernel: [ 201.530075] synchronize_sched() > in 78437 us > May 8 09:47:56 nattytest kernel: [ 201.590986] synchronize_sched() 
> in 46 us > May 8 09:47:56 nattytest kernel: [ 201.592134] synchronize_sched() > in 29 us > May 8 09:47:56 nattytest kernel: [ 201.710085] synchronize_sched() > in 63551 us > May 8 09:47:56 nattytest kernel: [ 201.830146] synchronize_sched() > in 118553 us > May 8 09:47:56 nattytest kernel: [ 201.891143] synchronize_sched() > in 50 us > May 8 09:47:56 nattytest kernel: [ 201.892712] synchronize_sched() > in 55 us > May 8 09:47:56 nattytest kernel: [ 202.010092] synchronize_sched() > in 63446 us > May 8 09:47:56 nattytest kernel: [ 202.090126] synchronize_sched() > in 78477 us > May 8 09:47:56 nattytest kernel: [ 202.141120] synchronize_sched() > in 45 us > May 8 09:47:56 nattytest kernel: [ 202.142867] synchronize_sched() > in 30 us > May 8 09:47:57 nattytest kernel: [ 202.260092] synchronize_sched() > in 63271 us > May 8 09:47:57 nattytest kernel: [ 202.340071] synchronize_sched() > in 78474 us > May 8 09:47:57 nattytest kernel: [ 202.391094] synchronize_sched() > in 44 us > May 8 09:47:57 nattytest kernel: [ 202.392581] synchronize_sched() > in 29 us > May 8 09:47:57 nattytest kernel: [ 202.510110] synchronize_sched() > in 63718 us > May 8 09:47:57 nattytest kernel: [ 202.600092] synchronize_sched() > in 88434 us > May 8 09:47:57 nattytest kernel: [ 202.661075] synchronize_sched() > in 46 us > May 8 09:47:57 nattytest kernel: [ 202.680091] synchronize_sched() > in 17516 us > May 8 09:47:57 nattytest kernel: [ 202.820083] synchronize_sched() > in 83733 us > May 8 09:47:57 nattytest kernel: [ 202.900067] synchronize_sched() > in 78478 us > May 8 09:47:57 nattytest kernel: [ 202.951062] synchronize_sched() > in 43 us > May 8 09:47:57 nattytest kernel: [ 202.960445] synchronize_sched() > in 7921 us > May 8 09:47:57 nattytest kernel: [ 203.100080] synchronize_sched() > in 83496 us > May 8 09:47:57 nattytest kernel: [ 203.170084] synchronize_sched() > in 68474 us > May 8 09:47:57 nattytest kernel: [ 203.221083] synchronize_sched() > in 41 us > May 8 09:47:57 nattytest kernel: [ 203.222692] synchronize_sched() > in 68 us > May 8 09:47:58 nattytest kernel: [ 203.350083] synchronize_sched() > in 74062 us > May 8 09:47:58 nattytest kernel: [ 203.430100] synchronize_sched() > in 78468 us > May 8 09:47:58 nattytest kernel: [ 203.491028] synchronize_sched() > in 52 us > May 8 09:47:58 nattytest kernel: [ 203.510086] synchronize_sched() > in 17607 us > May 8 09:47:58 nattytest kernel: [ 203.670080] synchronize_sched() > in 83798 us > May 8 09:47:58 nattytest kernel: [ 203.750080] synchronize_sched() > in 78382 us > May 8 09:47:58 nattytest kernel: [ 203.821032] synchronize_sched() > in 48 us > May 8 09:47:58 nattytest kernel: [ 203.840093] synchronize_sched() > in 17608 us > May 8 09:47:58 nattytest kernel: [ 203.990084] synchronize_sched() > in 73664 us > May 8 09:47:58 nattytest kernel: [ 204.080078] synchronize_sched() > in 88391 us > May 8 09:47:58 nattytest kernel: [ 204.141021] synchronize_sched() > in 44 us > May 8 09:47:58 nattytest kernel: [ 204.160083] synchronize_sched() > in 17671 us > May 8 09:47:59 nattytest kernel: [ 204.310086] synchronize_sched() > in 83288 us > May 8 09:47:59 nattytest kernel: [ 204.380074] synchronize_sched() > in 69168 us > May 8 09:47:59 nattytest kernel: [ 204.431059] synchronize_sched() > in 67 us > May 8 09:47:59 nattytest kernel: [ 204.432205] synchronize_sched() > in 30 us > May 8 09:47:59 nattytest kernel: [ 204.550147] synchronize_sched() > in 54218 us > May 8 09:47:59 nattytest kernel: [ 204.640079] synchronize_sched() > in 88401 us > May 8 09:47:59 
nattytest kernel: [ 204.701062] synchronize_sched() > in 48 us > May 8 09:47:59 nattytest kernel: [ 204.720093] synchronize_sched() > in 17508 us > May 8 09:47:59 nattytest kernel: [ 204.850108] synchronize_sched() > in 73968 us > May 8 09:47:59 nattytest kernel: [ 204.940074] synchronize_sched() > in 88489 us > May 8 09:47:59 nattytest kernel: [ 205.011002] synchronize_sched() > in 44 us > May 8 09:47:59 nattytest kernel: [ 205.030104] synchronize_sched() > in 17658 us > May 8 09:47:59 nattytest kernel: [ 205.190091] synchronize_sched() > in 83651 us > May 8 09:48:00 nattytest kernel: [ 205.270094] synchronize_sched() > in 78501 us > May 8 09:48:00 nattytest kernel: [ 205.341000] synchronize_sched() > in 65 us > May 8 09:48:00 nattytest kernel: [ 205.360081] synchronize_sched() > in 17640 us > > > > ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-08 13:44 ` Paul E. McKenney @ 2011-05-08 14:27 ` Alex Bligh 2011-05-08 14:47 ` Paul E. McKenney 0 siblings, 1 reply; 53+ messages in thread From: Alex Bligh @ 2011-05-08 14:27 UTC (permalink / raw) To: paulmck; +Cc: Eric Dumazet, netdev, Alex Bligh Paul, >> Yes, really 20-49us and 50-99us, not ms. Raw data attached :-) >> >> I'm guessing there are circumstances where there is an early exit. > > Well, if you were onlining and offlining CPUs, then if there was only > one CPU online, this could happen. No, I wasn't doing that. > And there really is only one CPU > online during boot, so if your measurements included early boot time, > this could easily explain these very short timings. No, I waited a few minutes after boot for the system to stabilize, and all CPUs were definitely online. The patch to the kernel I am running is below. >> There is nothing much going on these systems (idle, no other users, >> just normal system daemons). > > And normal system daemons might cause this, right? Yes. Everything is normal, except I did service udev stop unshare -n bash which together stop the system running interface scripts when interfaces are created (as upstart and upstart-udev-bridge are now integrated, you can't kill upstart, so you have to rely on unshare -n to stop the events being propagated). That's just to avoid measuring the time it takes to execute the scripts. -- Alex Bligh diff --git a/kernel/rcutree.c b/kernel/rcutree.c index dd4aea8..e401018 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -1518,6 +1518,7 @@ EXPORT_SYMBOL_GPL(call_rcu_bh); void synchronize_sched(void) { struct rcu_synchronize rcu; + ktime_t time_start = ktime_get(); if (rcu_blocking_is_gp()) return; @@ -1529,6 +1530,7 @@ void synchronize_sched(void) /* Wait for it. */ wait_for_completion(&rcu.completion); destroy_rcu_head_on_stack(&rcu.head); + pr_err("synchronize_sched() in %lld us\n", ktime_us_delta(ktime_get(), time_start)); } EXPORT_SYMBOL_GPL(synchronize_sched); diff --git a/net/core/dev.c b/net/core/dev.c index 856b6ee..013f627 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5164,7 +5164,9 @@ static void rollback_registered_many(struct list_head *head) dev = list_first_entry(head, struct net_device, unreg_list); call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev); + pr_err("begin rcu_barrier()\n"); rcu_barrier(); + pr_err("end rcu_barrier()\n"); list_for_each_entry(dev, head, unreg_list) dev_put(dev); @@ -5915,8 +5917,10 @@ EXPORT_SYMBOL(free_netdev); */ void synchronize_net(void) { + pr_err("begin synchronize_net()\n"); might_sleep(); synchronize_rcu(); + pr_err("end synchronize_net()\n"); } EXPORT_SYMBOL(synchronize_net); ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-08 14:27 ` Alex Bligh @ 2011-05-08 14:47 ` Paul E. McKenney 2011-05-08 15:17 ` Alex Bligh 0 siblings, 1 reply; 53+ messages in thread From: Paul E. McKenney @ 2011-05-08 14:47 UTC (permalink / raw) To: Alex Bligh; +Cc: Eric Dumazet, netdev On Sun, May 08, 2011 at 03:27:07PM +0100, Alex Bligh wrote: > Paul, > > >>Yes, really 20-49us and 50-99us, not ms. Raw data attached :-) > >> > >>I'm guessing there are circumstances where there is an early exit. > > > >Well, if you were onlining and offlining CPUs, then if there was only > >one CPU online, this could happen. > > No, I wasn't doing that. OK. > > And there really is only one CPU > >online during boot, so if your measurements included early boot time, > >this could easily explain these very short timings. > > No, I waited a few minutes after boot for the system to stabilize, and > all CPUs were definitely online. > > The patch to the kernel I am running is below. OK, interesting... My guess is that you need to be using ktime_get_ts(). Isn't ktime_get() subject to various sorts of adjustment? > >>There is nothing much going on these systems (idle, no other users, > >>just normal system daemons). > > > >And normal system daemons might cause this, right? > > Yes. Everything is normal, except I did > service udev stop > unshare -n bash > which together stop the system running interface scripts when > interfaces are created (as upstart and upstart-udev-bridge are > now integrated, you can't kill upstart, so you have to rely on > unshare -n to stop the events being propagated). That's just > to avoid measuring the time it takes to execute the scripts. OK, so you really could be seeing grace periods started by these system daemons. Thanx, Paul > -- > Alex Bligh > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c > index dd4aea8..e401018 100644 > --- a/kernel/rcutree.c > +++ b/kernel/rcutree.c > @@ -1518,6 +1518,7 @@ EXPORT_SYMBOL_GPL(call_rcu_bh); > void synchronize_sched(void) > { > struct rcu_synchronize rcu; > + ktime_t time_start = ktime_get(); > > if (rcu_blocking_is_gp()) > return; > @@ -1529,6 +1530,7 @@ void synchronize_sched(void) > /* Wait for it. */ > wait_for_completion(&rcu.completion); > destroy_rcu_head_on_stack(&rcu.head); > + pr_err("synchronize_sched() in %lld us\n", > ktime_us_delta(ktime_get(), time_start)); > } > EXPORT_SYMBOL_GPL(synchronize_sched); > > diff --git a/net/core/dev.c b/net/core/dev.c > index 856b6ee..013f627 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -5164,7 +5164,9 @@ static void rollback_registered_many(struct > list_head *head) > dev = list_first_entry(head, struct net_device, unreg_list); > call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev); > > + pr_err("begin rcu_barrier()\n"); > rcu_barrier(); > + pr_err("end rcu_barrier()\n"); > > list_for_each_entry(dev, head, unreg_list) > dev_put(dev); > @@ -5915,8 +5917,10 @@ EXPORT_SYMBOL(free_netdev); > */ > void synchronize_net(void) > { > + pr_err("begin synchronize_net()\n"); > might_sleep(); > synchronize_rcu(); > + pr_err("end synchronize_net()\n"); > } > EXPORT_SYMBOL(synchronize_net); > > ^ permalink raw reply [flat|nested] 53+ messages in thread
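For reference, a timespec-based variant of the instrumentation along the lines Paul suggests could look like the sketch below (illustrative only, not a patch from the thread; it reuses the pr_err() format from Alex's patch quoted above and would need <linux/math64.h> for div_s64()):

	struct timespec ts_start, ts_end;
	s64 delta_us;

	ktime_get_ts(&ts_start);
	/* ... rcu_blocking_is_gp() early exit and wait_for_completion() as before ... */
	ktime_get_ts(&ts_end);

	delta_us = div_s64(timespec_to_ns(&ts_end) - timespec_to_ns(&ts_start),
			   NSEC_PER_USEC);
	pr_err("synchronize_sched() in %lld us\n", delta_us);

As Alex points out in the next message, ktime_get() and ktime_get_ts() are both derived from xtime plus wall_to_monotonic, so this is unlikely to change the measured numbers.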
* Re: Scalability of interface creation and deletion 2011-05-08 14:47 ` Paul E. McKenney @ 2011-05-08 15:17 ` Alex Bligh 2011-05-08 15:48 ` Paul E. McKenney 0 siblings, 1 reply; 53+ messages in thread From: Alex Bligh @ 2011-05-08 15:17 UTC (permalink / raw) To: paulmck; +Cc: Eric Dumazet, netdev, Alex Bligh Paul, >> No, I waited a few minutes after boot for the system to stabilize, and >> all CPUs were definitely online. >> >> The patch to the kernel I am running is below. > > OK, interesting... > > My guess is that you need to be using ktime_get_ts(). Isn't ktime_get() > subject to various sorts of adjustment? It's Eric's code, not mine, but: kernel/time/timekeeping.c suggests they do the same thing (adjust xtime by wall_to_monotonic), just one returns a struct timespec and the other returns a ktime_t. >> >> There is nothing much going on these systems (idle, no other users, >> >> just normal system daemons). >> > >> > And normal system daemons might cause this, right? >> >> Yes. Everything is normal, except I did >> service udev stop >> unshare -n bash >> which together stop the system running interface scripts when >> interfaces are created (as upstart and upstart-udev-bridge are >> now integrated, you can't kill upstart, so you have to rely on >> unshare -n to stop the events being propagated). That's just >> to avoid measuring the time it takes to execute the scripts. > > OK, so you really could be seeing grace periods started by these system > daemons. In 50% of 200 calls? That seems pretty unlikely. I think it's more likely to be the 6 jiffies per call to ensure cpus are idle, plus the 3 calls per interface destroy. If 6 jiffies per call to ensure cpus are idle is a fact of life, then the question goes back to why interface removal is waiting for rcu readers to be released synchronously, as opposed to doing the update bits synchronously, then doing the reclaim element (freeing the memory) afterwards using call_rcu. -- Alex Bligh ^ permalink raw reply [flat|nested] 53+ messages in thread
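The update-then-deferred-reclaim split Alex describes is the standard call_rcu() idiom. A minimal sketch, assuming a hypothetical my_ifinfo object rather than the real net_device teardown (which has many more references and notifiers to deal with):

	#include <linux/rculist.h>
	#include <linux/slab.h>

	struct my_ifinfo {
		struct list_head list;
		struct rcu_head rcu;
		/* ... per-interface data ... */
	};

	static void my_ifinfo_free_rcu(struct rcu_head *head)
	{
		kfree(container_of(head, struct my_ifinfo, rcu));
	}

	/* caller holds the lock protecting the list */
	static void my_ifinfo_unlink(struct my_ifinfo *info)
	{
		/* update step, done synchronously: unpublish the object so
		 * new readers can no longer find it
		 */
		list_del_rcu(&info->list);

		/* reclaim step, deferred: free the memory after a grace
		 * period without blocking the caller
		 */
		call_rcu(&info->rcu, my_ifinfo_free_rcu);
	}

The caller never blocks for a grace period; the memory is simply handed to RCU for later freeing. As Paul notes in the next message, this only helps if nothing else in the dismantle path still has to wait for a grace period.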
* Re: Scalability of interface creation and deletion 2011-05-08 15:17 ` Alex Bligh @ 2011-05-08 15:48 ` Paul E. McKenney 2011-05-08 21:00 ` Eric Dumazet 0 siblings, 1 reply; 53+ messages in thread From: Paul E. McKenney @ 2011-05-08 15:48 UTC (permalink / raw) To: Alex Bligh; +Cc: Eric Dumazet, netdev On Sun, May 08, 2011 at 04:17:42PM +0100, Alex Bligh wrote: > Paul, > > >>No, I waited a few minutes after boot for the system to stabilize, and > >>all CPUs were definitely online. > >> > >>The patch to the kernel I am running is below. > > > >OK, interesting... > > > >My guess is that you need to be using ktime_get_ts(). Isn't ktime_get() > >subject to various sorts of adjustment? > > It's Eric's code, not mine, but: > > kernel/time/timekeeping.c suggests they do the same thing > (adjust xtime by wall_to_monotonic), just one returns a > struct timespec and the other returns a ktime_t. > > >>>> There is nothing much going on these systems (idle, no other users, > >>>> just normal system daemons). > >>> > >>> And normal system daemons might cause this, right? > >> > >>Yes. Everything is normal, except I did > >>service udev stop > >>unshare -n bash > >>which together stop the system running interface scripts when > >>interfaces are created (as upstart and upstart-udev-bridge are > >>now integrated, you can't kill upstart, so you have to rely on > >>unshare -n to stop the events being propagated). That's just > >>to avoid measuring the time it takes to execute the scripts. > > > >OK, so you really could be seeing grace periods started by these system > >daemons. > > In 50% of 200 calls? That seems pretty unlikely. I think it's more > likely to be the 6 jiffies per call to ensure cpus are idle, > plus the 3 calls per interface destroy. > > If 6 jiffies per call to ensure cpus are idle is a fact of life, > then the question goes back to why interface removal is waiting > for rcu readers to be released synchronously, as opposed to > doing the update bits synchronously, then doing the reclaim > element (freeing the memory) afterwards using call_rcu. This would speed things up considerably, assuming that there is no other reason to block for an RCU grace period. Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-08 15:48 ` Paul E. McKenney @ 2011-05-08 21:00 ` Eric Dumazet 2011-05-09 4:44 ` [PATCH] veth: use batched device unregister Eric Dumazet ` (3 more replies) 0 siblings, 4 replies; 53+ messages in thread From: Eric Dumazet @ 2011-05-08 21:00 UTC (permalink / raw) To: paulmck; +Cc: Alex Bligh, netdev, Jesse Gross Le dimanche 08 mai 2011 à 08:48 -0700, Paul E. McKenney a écrit : > On Sun, May 08, 2011 at 04:17:42PM +0100, Alex Bligh wrote: > > > > If 6 jiffies per call to ensure cpus are idle is a fact of life, > > then the question goes back to why interface removal is waiting > > for rcu readers to be released synchronously, as opposed to > > doing the update bits synchronously, then doing the reclaim > > element (freeing the memory) afterwards using call_rcu. > > This would speed things up considerably, assuming that there is no > other reason to block for an RCU grace period. >

That's not so simple... Things are modular, and it is better to be safe than to crash on a very rare event (device dismantles are not something we expect to do very often; only special workloads might need to perform hundreds of them per minute...).

For example, in the VLAN dismantle phase (ip link del eth0.103) we have 3 calls to synchronize_rcu() and one call to rcu_barrier() [ the 'extra' synchronize_rcu() call comes from unregister_vlan_dev() ]. Maybe with the new VLAN model we could now remove this synchronize_net() call from the vlan code. Jesse, what do you think? Once vlan_group_set_device(grp, vlan_id, NULL) has been called, why should we respect an rcu grace period at all, given the dev is queued to unregister_netdevice_queue() [ which has its own couple of synchronize_net() / rcu_barrier() calls ]?

The real scalability problem of device dismantles comes from the fact that all these waits are done under the RTNL mutex. This is the real killer, because you cannot use your eight cpus even if you are willing to.

We can probably speed things up, but we should consider the following user actions :

ip link add link eth0 vlan103 type vlan id 103
ip link del vlan103
ip link add link eth1 vlan103 type vlan id 103

The "link del" command should return to the user only once the minimum work has been done, to make sure the following "link add" won't fail mysteriously.

^ permalink raw reply [flat|nested] 53+ messages in thread
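The batching that the patches below build on amortizes exactly these waits: devices queued onto one list share a single pass through the synchronize_net() / rcu_barrier() machinery instead of paying it once per device. A rough sketch of a caller using the existing batched API (illustrative only; the helper name and the omission of error handling are assumptions):

	#include <linux/netdevice.h>
	#include <linux/rtnetlink.h>

	/* hypothetical helper: dismantle several devices in one batch,
	 * caller holds RTNL
	 */
	static void kill_devices_batched(struct net_device **devs, int n)
	{
		LIST_HEAD(kill_list);
		int i;

		/* queue every device; no grace period is waited for here */
		for (i = 0; i < n; i++)
			unregister_netdevice_queue(devs[i], &kill_list);

		/* the grace-period waits are then paid once for the whole
		 * list rather than once per device
		 */
		unregister_netdevice_many(&kill_list);
	}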
* [PATCH] veth: use batched device unregister 2011-05-08 21:00 ` Eric Dumazet @ 2011-05-09 4:44 ` Eric Dumazet 2011-05-09 6:56 ` Michał Mirosław 2011-05-09 7:45 ` [PATCH v2 net-next-2.6] veth: use batched device unregister Eric Dumazet 2011-05-09 5:37 ` Scalability of interface creation and deletion Alex Bligh ` (2 subsequent siblings) 3 siblings, 2 replies; 53+ messages in thread From: Eric Dumazet @ 2011-05-09 4:44 UTC (permalink / raw) To: David Miller Cc: Alex Bligh, netdev, Jesse Gross, Paul E. McKenney, Ben Greear veth devices dont use the batched device unregisters yet. Since veth are a pair of devices, it makes sense to use a batch of two unregisters, this roughly divide dismantle time by two. Reported-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesse Gross <jesse@nicira.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ben Greear <greearb@candelatech.com> --- drivers/net/veth.c | 12 ++++++++++-- 1 files changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 3b99f64..77c4679 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -451,8 +451,16 @@ static void veth_dellink(struct net_device *dev, struct list_head *head) priv = netdev_priv(dev); peer = priv->peer; - unregister_netdevice_queue(dev, head); - unregister_netdevice_queue(peer, head); + if (head == NULL) { + LIST_HEAD(list); + /* make a batch of two devices to speedup unregister */ + unregister_netdevice_queue(dev, &list); + unregister_netdevice_queue(peer, &list); + unregister_netdevice_many(&list); + } else { + unregister_netdevice_queue(dev, head); + unregister_netdevice_queue(peer, head); + } } static const struct nla_policy veth_policy[VETH_INFO_MAX + 1]; ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH] veth: use batched device unregister 2011-05-09 4:44 ` [PATCH] veth: use batched device unregister Eric Dumazet @ 2011-05-09 6:56 ` Michał Mirosław 2011-05-09 8:20 ` Eric Dumazet 2011-05-09 7:45 ` [PATCH v2 net-next-2.6] veth: use batched device unregister Eric Dumazet 1 sibling, 1 reply; 53+ messages in thread From: Michał Mirosław @ 2011-05-09 6:56 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, Alex Bligh, netdev, Jesse Gross, Paul E. McKenney, Ben Greear 2011/5/9 Eric Dumazet <eric.dumazet@gmail.com>: > veth devices dont use the batched device unregisters yet. > > Since veth are a pair of devices, it makes sense to use a batch of two > unregisters, this roughly divide dismantle time by two. [...] > --- a/drivers/net/veth.c > +++ b/drivers/net/veth.c > @@ -451,8 +451,16 @@ static void veth_dellink(struct net_device *dev, struct list_head *head) > priv = netdev_priv(dev); > peer = priv->peer; > > - unregister_netdevice_queue(dev, head); > - unregister_netdevice_queue(peer, head); > + if (head == NULL) { > + LIST_HEAD(list); > + /* make a batch of two devices to speedup unregister */ > + unregister_netdevice_queue(dev, &list); > + unregister_netdevice_queue(peer, &list); > + unregister_netdevice_many(&list); > + } else { > + unregister_netdevice_queue(dev, head); > + unregister_netdevice_queue(peer, head); > + } You could change dellink callers to always pass head != NULL. As a side effect, unregister_netdevice_queue() would do just what its name suggests. Best Regards, Michał Mirosław ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] veth: use batched device unregister 2011-05-09 6:56 ` Michał Mirosław @ 2011-05-09 8:20 ` Eric Dumazet 2011-05-09 9:17 ` [PATCH net-next-2.6] net: use batched device unregister in veth and macvlan Eric Dumazet 0 siblings, 1 reply; 53+ messages in thread From: Eric Dumazet @ 2011-05-09 8:20 UTC (permalink / raw) To: Michał Mirosław Cc: David Miller, Alex Bligh, netdev, Jesse Gross, Paul E. McKenney, Ben Greear Le lundi 09 mai 2011 à 08:56 +0200, Michał Mirosław a écrit : > You could change dellink callers to always pass head != NULL. As a > side effect, unregister_netdevice_queue() would do just what its name > suggests.

Good idea. At first glance, macvlan and rtnetlink.c would need a change. This would also let macvlan_device_event(event=NETDEV_UNREGISTER) use batching. And yes, unregister_netdevice_queue(dev, head) would then only do a list_move_tail(&dev->unreg_list, head).

Will submit a patch soon, thanks!

^ permalink raw reply [flat|nested] 53+ messages in thread
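If every dellink() caller guarantees a non-NULL head, as suggested, the veth handler from the earlier patch could in turn drop its head == NULL branch. A sketch of that simplified form (illustrative; it is not part of the patch that follows, which only changes the callers):

	static void veth_dellink(struct net_device *dev, struct list_head *head)
	{
		struct veth_priv *priv = netdev_priv(dev);
		struct net_device *peer = priv->peer;

		/* callers always pass a list, so both ends of the pair
		 * simply join the caller's batch
		 */
		unregister_netdevice_queue(dev, head);
		unregister_netdevice_queue(peer, head);
	}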
* [PATCH net-next-2.6] net: use batched device unregister in veth and macvlan 2011-05-09 8:20 ` Eric Dumazet @ 2011-05-09 9:17 ` Eric Dumazet 2011-05-09 18:42 ` David Miller 0 siblings, 1 reply; 53+ messages in thread From: Eric Dumazet @ 2011-05-09 9:17 UTC (permalink / raw) To: Michał Mirosław, David Miller Cc: Alex Bligh, netdev, Jesse Gross, Paul E. McKenney, Ben Greear veth devices dont use the batched device unregisters yet. Since veth are a pair of devices, it makes sense to use a batch of two unregisters, this roughly divides dismantle time by two. Fix this by changing dellink() callers to always provide a non NULL head. (Idea from Michał Mirosław) This patch also handles macvlan case : We now dismantle all macvlans on top of a lower dev at once. Reported-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Michał Mirosław <mirqus@gmail.com> Cc: Jesse Gross <jesse@nicira.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ben Greear <greearb@candelatech.com> --- v3: Michał Mirosław dellink idea drivers/net/macvlan.c | 5 ++++- net/core/rtnetlink.c | 5 ++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 3ad5425..d7c0bc62 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -785,6 +785,7 @@ static int macvlan_device_event(struct notifier_block *unused, struct net_device *dev = ptr; struct macvlan_dev *vlan, *next; struct macvlan_port *port; + LIST_HEAD(list_kill); if (!macvlan_port_exists(dev)) return NOTIFY_DONE; @@ -810,7 +811,9 @@ static int macvlan_device_event(struct notifier_block *unused, break; list_for_each_entry_safe(vlan, next, &port->vlans, list) - vlan->dev->rtnl_link_ops->dellink(vlan->dev, NULL); + vlan->dev->rtnl_link_ops->dellink(vlan->dev, &list_kill); + unregister_netdevice_many(&list_kill); + list_del(&list_kill); break; case NETDEV_PRE_TYPE_CHANGE: /* Forbid underlaying device to change its type. */ diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 5a160f4..d2ba259 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1501,6 +1501,7 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) char ifname[IFNAMSIZ]; struct nlattr *tb[IFLA_MAX+1]; int err; + LIST_HEAD(list_kill); err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy); if (err < 0) @@ -1524,7 +1525,9 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) if (!ops) return -EOPNOTSUPP; - ops->dellink(dev, NULL); + ops->dellink(dev, &list_kill); + unregister_netdevice_many(&list_kill); + list_del(&list_kill); return 0; } ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH net-next-2.6] net: use batched device unregister in veth and macvlan 2011-05-09 9:17 ` [PATCH net-next-2.6] net: use batched device unregister in veth and macvlan Eric Dumazet @ 2011-05-09 18:42 ` David Miller 2011-05-09 19:05 ` Eric Dumazet 0 siblings, 1 reply; 53+ messages in thread From: David Miller @ 2011-05-09 18:42 UTC (permalink / raw) To: eric.dumazet; +Cc: mirqus, alex, netdev, jesse, paulmck, greearb From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 09 May 2011 11:17:57 +0200 > veth devices dont use the batched device unregisters yet. > > Since veth are a pair of devices, it makes sense to use a batch of two > unregisters, this roughly divides dismantle time by two. > > Fix this by changing dellink() callers to always provide a non NULL > head. (Idea from Michał Mirosław) > > This patch also handles macvlan case : We now dismantle all macvlans on > top of a lower dev at once. > > Reported-by: Alex Bligh <alex@alex.org.uk> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Applied. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH net-next-2.6] net: use batched device unregister in veth and macvlan 2011-05-09 18:42 ` David Miller @ 2011-05-09 19:05 ` Eric Dumazet 2011-05-09 20:17 ` Eric Dumazet 0 siblings, 1 reply; 53+ messages in thread From: Eric Dumazet @ 2011-05-09 19:05 UTC (permalink / raw) To: David Miller Cc: mirqus, alex, netdev, jesse, paulmck, greearb, Patrick McHardy Le lundi 09 mai 2011 à 11:42 -0700, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Mon, 09 May 2011 11:17:57 +0200 > > > veth devices dont use the batched device unregisters yet. > > > > Since veth are a pair of devices, it makes sense to use a batch of two > > unregisters, this roughly divides dismantle time by two. > > > > Fix this by changing dellink() callers to always provide a non NULL > > head. (Idea from Michał Mirosław) > > > > This patch also handles macvlan case : We now dismantle all macvlans on > > top of a lower dev at once. > > > > Reported-by: Alex Bligh <alex@alex.org.uk> > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > Applied. Thanks ! I believe there is one problem with this patch and unregister_vlan_dev(), I'll have to find a solution fast ;) ip link add link eth2 eth2.103 type vlan id 103 gvrp on ip link add link eth2 eth2.104 type vlan id 104 gvrp on ip link set eth2.103 up ip link set eth2.104 up ip link del eth2.103 ip link del eth2.104 <<<BUG>>> [ 372.573591] BUG: unable to handle kernel NULL pointer dereference at (null) [ 372.573738] IP: [<ffffffffa014ecde>] garp_request_leave+0x2e/0x88 [garp] [ 372.573835] PGD 7a7d0067 PUD 7c9b1067 PMD 0 [ 372.573995] Oops: 0000 [#1] SMP [ 372.574119] last sysfs file: /sys/devices/virtual/net/eth2.104/ifindex [ 372.574180] CPU 3 [ 372.574221] Modules linked in: 8021q garp stp llc nfsd lockd sunrpc tg3 libphy sg [last unloaded: x_tables] [ 372.574765] [ 372.574817] Pid: 5656, comm: ip Tainted: G W 2.6.39-rc2-01916-g0e21eae-dirty #696 HP ProLiant BL460c G6 [ 372.574967] RIP: 0010:[<ffffffffa014ecde>] [<ffffffffa014ecde>] garp_request_leave+0x2e/0x88 [garp] [ 372.575083] RSP: 0018:ffff8801168697c8 EFLAGS: 00010282 [ 372.577084] RAX: 0000000000000000 RBX: ffff880116869816 RCX: 0000000000000002 [ 372.577146] RDX: 0000000000000000 RSI: ffffffffa01594c0 RDI: ffff880117bc0000 [ 372.577208] RBP: ffff8801168697f8 R08: 0000000000000001 R09: ffff88007a190800 [ 372.577269] R10: ffff88007a17da00 R11: 0000000000000000 R12: ffff880117bc0000 [ 372.577331] R13: ffff8801168699d8 R14: 0000000000000001 R15: 0000000000000002 [ 372.577393] FS: 0000000000000000(0000) GS:ffff88007fc40000(0063) knlGS:00000000f779f6c0 [ 372.577494] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b [ 372.577553] CR2: 0000000000000000 CR3: 000000007af08000 CR4: 00000000000006e0 [ 372.577615] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 372.577677] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 372.577739] Process ip (pid: 5656, threadinfo ffff880116868000, task ffff88011a388000) [ 372.577816] Stack: [ 372.577868] ffff8801168697e8 ffff88007a74c800 ffff880117bc0000 ffff8801168699d8 [ 372.578083] ffff880116869868 0000000000000000 ffff880116869818 ffffffffa0158226 [ 372.578297] 0000000316869818 6800880116869938 ffff880116869838 ffffffffa0157467 [ 372.578511] Call Trace: [ 372.578579] [<ffffffffa0158226>] vlan_gvrp_request_leave+0x46/0x50 [8021q] [ 372.578642] [<ffffffffa0157467>] vlan_dev_stop+0xb7/0xc0 [8021q] [ 372.578703] [<ffffffff81398b87>] __dev_close_many+0x87/0xe0 [ 372.578763] [<ffffffff81398c67>] dev_close_many+0x87/0x110 
[ 372.578823] [<ffffffff81398d90>] rollback_registered_many+0xa0/0x240 [ 372.578884] [<ffffffff81398f49>] unregister_netdevice_many+0x19/0x60 [ 372.578946] [<ffffffff813a7e62>] rtnl_dellink+0xc2/0xf0 [ 372.579005] [<ffffffff813a5ae7>] rtnetlink_rcv_msg+0x247/0x250 [ 372.579066] [<ffffffff813a58a0>] ? rtnetlink_net_init+0x40/0x40 [ 372.579126] [<ffffffff813cb529>] netlink_rcv_skb+0x99/0xc0 [ 372.579185] [<ffffffff813a7690>] rtnetlink_rcv+0x20/0x30 [ 372.579244] [<ffffffff813cb296>] netlink_unicast+0x296/0x2a0 [ 372.579304] [<ffffffff8139052f>] ? memcpy_fromiovec+0x5f/0x80 [ 372.579364] [<ffffffff813cc1c7>] netlink_sendmsg+0x227/0x370

unregister_vlan_dev() does :

	vlan_group_set_device(grp, vlan_id, NULL);
	unregister_netdevice_queue(dev, head);

	/* If the group is now empty, kill off the group. */
	if (grp->nr_vlans == 0) {
		vlan_gvrp_uninit_applicant(real_dev);

Now 'head' is no longer NULL, so we no longer release the dev immediately in unregister_netdevice_queue() but queue it instead. So vlan_gvrp_uninit_applicant() now frees the garp structure _before_ vlan_gvrp_request_leave() is called from vlan_dev_stop(), and we dereference a NULL pointer in garp_request_leave().

I suspect we should move the 'group freeing' out from unregister_vlan_dev() to vlan_dev_stop()? Patrick, David, any idea before I cook a patch?

BTW, the bug must be present in net-2.6 too, if we unload the vlan module (since in this case we also had a non-NULL head).

Thanks

^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH net-next-2.6] net: use batched device unregister in veth and macvlan 2011-05-09 19:05 ` Eric Dumazet @ 2011-05-09 20:17 ` Eric Dumazet 2011-05-10 6:40 ` [PATCH net-2.6] vlan: fix GVRP at dismantle time Eric Dumazet 0 siblings, 1 reply; 53+ messages in thread From: Eric Dumazet @ 2011-05-09 20:17 UTC (permalink / raw) To: David Miller Cc: mirqus, alex, netdev, jesse, paulmck, greearb, Patrick McHardy Le lundi 09 mai 2011 à 21:05 +0200, Eric Dumazet a écrit : > BTW, bug must be present in net-2.6, if we unload vlan module (since in this > case we also had a non NULL head ) Yes, I confirm we have the bug in linux-2.6 Here eth2 provided by tg3 module ip link add link eth2 eth2.103 type vlan id 103 gvrp on loose_binding on ip link add link eth2 eth2.104 type vlan id 104 gvrp on loose_binding on ip link set eth2.103 up ip link set eth2.104 up sync sleep 5 sync rmmod tg3 <<<BUG>>> [ 1848.669338] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1848.669388] IP: [<ffffffffa0030c9e>] garp_request_leave+0x3e/0xc0 [garp] [ 1848.669421] PGD 11d251067 PUD 11b9e0067 PMD 0 [ 1848.669454] Oops: 0000 [#1] SMP [ 1848.669482] last sysfs file: /sys/devices/virtual/net/eth2.104/ifindex [ 1848.669512] CPU 0 [ 1848.669518] Modules linked in: tg3(-) 8021q garp nfsd lockd auth_rpcgss sunrpc libphy sg [last unloaded: x_tables] [ 1848.669613] [ 1848.669633] Pid: 11494, comm: rmmod Tainted: G W 2.6.39-rc6-00261-gfd71257-dirty #580 HP ProLiant BL460c G6 [ 1848.669688] RIP: 0010:[<ffffffffa0030c9e>] [<ffffffffa0030c9e>] garp_request_leave+0x3e/0xc0 [garp] [ 1848.669740] RSP: 0018:ffff88007a19bae8 EFLAGS: 00010286 [ 1848.669766] RAX: 0000000000000000 RBX: ffff88011b5e2000 RCX: 0000000000000002 [ 1848.669796] RDX: 0000000000000000 RSI: 0000000000000175 RDI: ffffffffa0030d5b [ 1848.669826] RBP: ffff88007a19bb18 R08: 0000000000000001 R09: ffff88011bd64a00 [ 1848.669860] R10: ffff88011d34ec00 R11: 0000000000000000 R12: 0000000000000002 [ 1848.669890] R13: ffff88007a19bc48 R14: ffff88007a19bb88 R15: 0000000000000001 [ 1848.669920] FS: 0000000000000000(0000) GS:ffff88011fc00000(0063) knlGS:00000000f77d76c0 [ 1848.669966] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b [ 1848.669993] CR2: 0000000000000000 CR3: 000000011a675000 CR4: 00000000000006f0 [ 1848.670023] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1848.670063] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1848.670115] Process rmmod (pid: 11494, threadinfo ffff88007a19a000, task ffff8800798595c0) [ 1848.672098] Stack: [ 1848.672118] ffff88007a19bb36 ffff88011c84b800 ffff88011b5e2000 ffff88007a19bc48 [ 1848.672172] ffff88007a19bb88 0000000000000006 ffff88007a19bb38 ffffffffa003a5f6 [ 1848.672235] ffff88007a19bb38 670088007a19bba8 ffff88007a19bb58 ffffffffa00397e7 [ 1848.672293] Call Trace: [ 1848.672317] [<ffffffffa003a5f6>] vlan_gvrp_request_leave+0x46/0x50 [8021q] [ 1848.672348] [<ffffffffa00397e7>] vlan_dev_stop+0xb7/0xc0 [8021q] [ 1848.672377] [<ffffffff8137e427>] __dev_close_many+0x87/0xe0 [ 1848.672405] [<ffffffff8137e507>] dev_close_many+0x87/0x110 [ 1848.672433] [<ffffffff8137e630>] rollback_registered_many+0xa0/0x240 [ 1848.672462] [<ffffffff8137e7e9>] unregister_netdevice_many+0x19/0x60 [ 1848.672492] [<ffffffffa00389eb>] vlan_device_event+0x53b/0x550 [8021q] [ 1848.672522] [<ffffffff8143f448>] ? 
ip6mr_device_event+0xa8/0xd0 [ 1848.672551] [<ffffffff81479d03>] notifier_call_chain+0x53/0x80 [ 1848.672583] [<ffffffff81062539>] __raw_notifier_call_chain+0x9/0x10 [ 1848.672612] [<ffffffff81062551>] raw_notifier_call_chain+0x11/0x20 [ 1848.672641] [<ffffffff8137df82>] call_netdevice_notifiers+0x32/0x60 [ 1848.672671] [<ffffffff8137e69f>] rollback_registered_many+0x10f/0x240 [ 1848.672700] [<ffffffff8137e85f>] rollback_registered+0x2f/0x40 [ 1848.672729] [<ffffffff8137e8c8>] unregister_netdevice_queue+0x58/0x90 [ 1848.672759] [<ffffffff8137e9eb>] unregister_netdev+0x1b/0x30 [ 1848.672788] [<ffffffffa005d73f>] tg3_remove_one+0x6f/0x10b [tg3] ^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH net-2.6] vlan: fix GVRP at dismantle time 2011-05-09 20:17 ` Eric Dumazet @ 2011-05-10 6:40 ` Eric Dumazet 2011-05-10 19:23 ` David Miller 0 siblings, 1 reply; 53+ messages in thread From: Eric Dumazet @ 2011-05-10 6:40 UTC (permalink / raw) To: David Miller; +Cc: mirqus, alex, netdev, jesse, greearb, Patrick McHardy Le lundi 09 mai 2011 à 22:17 +0200, Eric Dumazet a écrit : > Le lundi 09 mai 2011 à 21:05 +0200, Eric Dumazet a écrit : > > > BTW, bug must be present in net-2.6, if we unload vlan module (since in this > > case we also had a non NULL head ) > > Yes, I confirm we have the bug in linux-2.6 > Here is a patch to address this problem. Thanks ! [PATCH net-2.6] vlan: fix GVRP at dismantle time ip link add link eth2 eth2.103 type vlan id 103 gvrp on loose_binding on ip link set eth2.103 up rmmod tg3 # driver providing eth2 BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffffa0030c9e>] garp_request_leave+0x3e/0xc0 [garp] PGD 11d251067 PUD 11b9e0067 PMD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/virtual/net/eth2.104/ifindex CPU 0 Modules linked in: tg3(-) 8021q garp nfsd lockd auth_rpcgss sunrpc libphy sg [last unloaded: x_tables] Pid: 11494, comm: rmmod Tainted: G W 2.6.39-rc6-00261-gfd71257-dirty #580 HP ProLiant BL460c G6 RIP: 0010:[<ffffffffa0030c9e>] [<ffffffffa0030c9e>] garp_request_leave+0x3e/0xc0 [garp] RSP: 0018:ffff88007a19bae8 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff88011b5e2000 RCX: 0000000000000002 RDX: 0000000000000000 RSI: 0000000000000175 RDI: ffffffffa0030d5b RBP: ffff88007a19bb18 R08: 0000000000000001 R09: ffff88011bd64a00 R10: ffff88011d34ec00 R11: 0000000000000000 R12: 0000000000000002 R13: ffff88007a19bc48 R14: ffff88007a19bb88 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff88011fc00000(0063) knlGS:00000000f77d76c0 CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b CR2: 0000000000000000 CR3: 000000011a675000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process rmmod (pid: 11494, threadinfo ffff88007a19a000, task ffff8800798595c0) Stack: ffff88007a19bb36 ffff88011c84b800 ffff88011b5e2000 ffff88007a19bc48 ffff88007a19bb88 0000000000000006 ffff88007a19bb38 ffffffffa003a5f6 ffff88007a19bb38 670088007a19bba8 ffff88007a19bb58 ffffffffa00397e7 Call Trace: [<ffffffffa003a5f6>] vlan_gvrp_request_leave+0x46/0x50 [8021q] [<ffffffffa00397e7>] vlan_dev_stop+0xb7/0xc0 [8021q] [<ffffffff8137e427>] __dev_close_many+0x87/0xe0 [<ffffffff8137e507>] dev_close_many+0x87/0x110 [<ffffffff8137e630>] rollback_registered_many+0xa0/0x240 [<ffffffff8137e7e9>] unregister_netdevice_many+0x19/0x60 [<ffffffffa00389eb>] vlan_device_event+0x53b/0x550 [8021q] [<ffffffff8143f448>] ? ip6mr_device_event+0xa8/0xd0 [<ffffffff81479d03>] notifier_call_chain+0x53/0x80 [<ffffffff81062539>] __raw_notifier_call_chain+0x9/0x10 [<ffffffff81062551>] raw_notifier_call_chain+0x11/0x20 [<ffffffff8137df82>] call_netdevice_notifiers+0x32/0x60 [<ffffffff8137e69f>] rollback_registered_many+0x10f/0x240 [<ffffffff8137e85f>] rollback_registered+0x2f/0x40 [<ffffffff8137e8c8>] unregister_netdevice_queue+0x58/0x90 [<ffffffff8137e9eb>] unregister_netdev+0x1b/0x30 [<ffffffffa005d73f>] tg3_remove_one+0x6f/0x10b [tg3] We should call vlan_gvrp_request_leave() from unregister_vlan_dev(), not from vlan_dev_stop(), because vlan_gvrp_uninit_applicant() is called right after unregister_netdevice_queue(). 
In batch mode, unregister_netdevice_queue() doesn’t immediately call vlan_dev_stop(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ben Greear <greearb@candelatech.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Jesse Gross <jesse@nicira.com> Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl> --- net/8021q/vlan.c | 3 +++ net/8021q/vlan_dev.c | 3 --- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 7850412..0eb1a88 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -124,6 +124,9 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head) grp->nr_vlans--; + if (vlan->flags & VLAN_FLAG_GVRP) + vlan_gvrp_request_leave(dev); + vlan_group_set_device(grp, vlan_id, NULL); if (!grp->killall) synchronize_net(); diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index e34ea9e..b2ff6c8 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -487,9 +487,6 @@ static int vlan_dev_stop(struct net_device *dev) struct vlan_dev_info *vlan = vlan_dev_info(dev); struct net_device *real_dev = vlan->real_dev; - if (vlan->flags & VLAN_FLAG_GVRP) - vlan_gvrp_request_leave(dev); - dev_mc_unsync(real_dev, dev); dev_uc_unsync(real_dev, dev); if (dev->flags & IFF_ALLMULTI) ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH net-2.6] vlan: fix GVRP at dismantle time 2011-05-10 6:40 ` [PATCH net-2.6] vlan: fix GVRP at dismantle time Eric Dumazet @ 2011-05-10 19:23 ` David Miller 0 siblings, 0 replies; 53+ messages in thread From: David Miller @ 2011-05-10 19:23 UTC (permalink / raw) To: eric.dumazet; +Cc: mirqus, alex, netdev, jesse, greearb, kaber From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 10 May 2011 08:40:36 +0200 > [PATCH net-2.6] vlan: fix GVRP at dismantle time Applied and queued up for -stable, thanks! ^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v2 net-next-2.6] veth: use batched device unregister 2011-05-09 4:44 ` [PATCH] veth: use batched device unregister Eric Dumazet 2011-05-09 6:56 ` Michał Mirosław @ 2011-05-09 7:45 ` Eric Dumazet 2011-05-09 9:22 ` Eric Dumazet 1 sibling, 1 reply; 53+ messages in thread From: Eric Dumazet @ 2011-05-09 7:45 UTC (permalink / raw) To: David Miller Cc: Alex Bligh, netdev, Jesse Gross, Paul E. McKenney, Ben Greear veth devices dont use the batched device unregisters yet. Since veth are a pair of devices, it makes sense to use a batch of two unregisters, this roughly divide dismantle time by two. Reported-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesse Gross <jesse@nicira.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ben Greear <greearb@candelatech.com> --- v2: added a list_del(&list) for safety (see commit ceaaec98) drivers/net/veth.c | 13 +++++++++++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 3b0151a..b41d6a9 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -416,8 +416,17 @@ static void veth_dellink(struct net_device *dev, struct list_head *head) priv = netdev_priv(dev); peer = priv->peer; - unregister_netdevice_queue(dev, head); - unregister_netdevice_queue(peer, head); + if (head == NULL) { + LIST_HEAD(list); + /* make a batch of two devices to speedup unregister */ + unregister_netdevice_queue(dev, &list); + unregister_netdevice_queue(peer, &list); + unregister_netdevice_many(&list); + list_del(&list); + } else { + unregister_netdevice_queue(dev, head); + unregister_netdevice_queue(peer, head); + } } static const struct nla_policy veth_policy[VETH_INFO_MAX + 1]; ^ permalink raw reply related [flat|nested] 53+ messages in thread
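[The same idea generalizes beyond veth: any path that has to tear down several related devices can queue them all and pay the synchronize_net()/rcu_barrier() cost once. The hedged sketch below shows that pattern; pending_devs and its dismantle_list member are hypothetical names rather than existing kernel fields, and the caller is assumed to hold RTNL, as dellink paths do.]

        LIST_HEAD(kill_list);
        struct net_device *dev, *tmp;

        /* Queue every device we own onto one on-stack list... */
        list_for_each_entry_safe(dev, tmp, &pending_devs, dismantle_list)
                unregister_netdevice_queue(dev, &kill_list);

        /* ...then dismantle them in a single batch. */
        unregister_netdevice_many(&kill_list);
        list_del(&kill_list);  /* keep the on-stack list head sane, cf. commit ceaaec98 */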
* Re: [PATCH v2 net-next-2.6] veth: use batched device unregister 2011-05-09 7:45 ` [PATCH v2 net-next-2.6] veth: use batched device unregister Eric Dumazet @ 2011-05-09 9:22 ` Eric Dumazet 0 siblings, 0 replies; 53+ messages in thread From: Eric Dumazet @ 2011-05-09 9:22 UTC (permalink / raw) To: David Miller Cc: Alex Bligh, netdev, Jesse Gross, Paul E. McKenney, Ben Greear On Monday 09 May 2011 at 09:45 +0200, Eric Dumazet wrote: > veth devices dont use the batched device unregisters yet. > > Since veth are a pair of devices, it makes sense to use a batch of two > unregisters, this roughly divide dismantle time by two. > > Reported-by: Alex Bligh <alex@alex.org.uk> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > Cc: Jesse Gross <jesse@nicira.com> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Cc: Ben Greear <greearb@candelatech.com> > --- > v2: added a list_del(&list) for safety (see commit ceaaec98) Just to make things clear, please don't apply this patch, since I posted another version including Michał's idea. Thanks ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-08 21:00 ` Eric Dumazet 2011-05-09 4:44 ` [PATCH] veth: use batched device unregister Eric Dumazet @ 2011-05-09 5:37 ` Alex Bligh 2011-05-09 6:37 ` Eric Dumazet 2011-05-09 7:11 ` Paul E. McKenney 2011-05-09 17:30 ` Jesse Gross 3 siblings, 1 reply; 53+ messages in thread From: Alex Bligh @ 2011-05-09 5:37 UTC (permalink / raw) To: Eric Dumazet, paulmck; +Cc: netdev, Jesse Gross, Alex Bligh --On 8 May 2011 23:00:47 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > We can probably speed things, but we should consider the following user > actions : How about > ip link add link eth0 vlan103 type vlan id 103 > ip link del vlan103 Removes and unlinks structures, including making name available, sending out netlink messages, but doesn't free things > ip link add link eth1 vlan103 type vlan id 103 creates new interface [some time later] original zombie i/f freed > The "link del" command should return to user only if the minimum things > had been done, to make sure the following "link add" wont fail > mysteriously. Are you worried about failure through name collision (already dealt with), vlan tag collision (ditto) or what? -- Alex Bligh ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-09 5:37 ` Scalability of interface creation and deletion Alex Bligh @ 2011-05-09 6:37 ` Eric Dumazet 0 siblings, 0 replies; 53+ messages in thread From: Eric Dumazet @ 2011-05-09 6:37 UTC (permalink / raw) To: Alex Bligh; +Cc: paulmck, netdev, Jesse Gross, Ben Greear On Monday 09 May 2011 at 06:37 +0100, Alex Bligh wrote: > > --On 8 May 2011 23:00:47 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > We can probably speed things, but we should consider the following user > > actions : > > How about > > > ip link add link eth0 vlan103 type vlan id 103 > > ip link del vlan103 > > Removes and unlinks structures, including making name available, sending > out netlink messages, but doesn't free things Most of the cleanup work has to be done with RTNL held, and this might be because of transaction atomicity requirements. In your test you dismantle idle devices. Now think a bit about what happens when you have traffic both in and out, sockets with destinations still pointing to the device, and in-flight ARP requests, all of this using RCU of course. When you dismantle one device (or several in case of a module unload), this can have implications for other devices (see the veth case for an obvious example: it automatically removes the peer device), but also for routes, neighbours, cached routes, various protocol cleanups, and so on. Few people even on netdev understand the whole picture. Given that 99.99% of machines set up netdevices at boot time only, and hardly ever consider dismantles, we netdev guys were pragmatic and safe. Two or three synchronize_rcu() calls were considered a non-issue. It seems there is interest in improving things now. One way is to allow more batching and delegation, and I am working on that right now, using a kthread, so that we don't block the requester for the whole device dismantle. This kthread might use a call_rcu()-driven state machine, but that is an implementation detail, since only the kthread would be affected. I am pretty busy at work these days, so don't expect patches for some time :) ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-08 21:00 ` Eric Dumazet 2011-05-09 4:44 ` [PATCH] veth: use batched device unregister Eric Dumazet 2011-05-09 5:37 ` Scalability of interface creation and deletion Alex Bligh @ 2011-05-09 7:11 ` Paul E. McKenney 2011-05-09 17:30 ` Jesse Gross 3 siblings, 0 replies; 53+ messages in thread From: Paul E. McKenney @ 2011-05-09 7:11 UTC (permalink / raw) To: Eric Dumazet; +Cc: Alex Bligh, netdev, Jesse Gross On Sun, May 08, 2011 at 11:00:47PM +0200, Eric Dumazet wrote: > Le dimanche 08 mai 2011 à 08:48 -0700, Paul E. McKenney a écrit : > > On Sun, May 08, 2011 at 04:17:42PM +0100, Alex Bligh wrote: > > > > > > If 6 jiffies per call to ensure cpus are idle is a fact of life, > > > then the question goes back to why interface removal is waiting > > > for rcu readers to be released synchronously, as opposed to > > > doing the update bits synchronously, then doing the reclaim > > > element (freeing the memory) afterwards using call_rcu. > > > > This would speed things up considerably, assuming that there is no > > other reason to block for an RCU grace period. > > Thats not so simple... Things are modular and better be safe than crash, > on a very rare event (device dismantles are not the thing we expect to > do very often. Only special needs might need to perform hundred of them > per minute...) I was afraid of that, but had to ask... > For example, in the VLAN dismantle phase (ip link del eth0.103) > we have 3 calls to synchronize_rcu() and one call to rcu_barrier() > > [ the 'extra' synchronize_rcu() call comes from unregister_vlan_dev() ] > > Maybe with new VLAN model, we could now remove this synchronize_net() > call from vlan code. Jesse what do you think ? > Once vlan_group_set_device(grp, vlan_id, NULL) had been called, why > should we respect one rcu grace period at all, given dev is queued to > unregister_netdevice_queue() [ which has its own couples of > synchronize_net() / rcu_barrier() ] > > > The real scalability problem of device dismantles comes from the fact > that all these waits are done under RTNL mutex. This is the real killer > because you cannot use your eight cpus, even if you are willing to. > > We can probably speed things, but we should consider the following user > actions : > > ip link add link eth0 vlan103 type vlan id 103 > ip link del vlan103 > ip link add link eth1 vlan103 type vlan id 103 > > The "link del" command should return to user only if the minimum things > had been done, to make sure the following "link add" wont fail > mysteriously. Hmmm... One approach would be to use synchronize_rcu_expedited(), though that is a bit of a big hammer. Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
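[For readers not familiar with the expedited primitives Paul mentions, the trade-off looks roughly like the following. This is an illustration only, not a proposal for where the dismantle path should use it; the fast_dismantle_requested flag is a hypothetical policy knob, while synchronize_rcu_expedited() and synchronize_rcu() are the real RCU APIs.]

        /* A normal grace period waits for every CPU to pass through a
         * quiescent state on its own schedule (several jiffies); the
         * expedited variant actively prods the other CPUs, finishing
         * much sooner at the cost of disturbing the whole machine. */
        if (fast_dismantle_requested)          /* hypothetical policy knob */
                synchronize_rcu_expedited();
        else
                synchronize_rcu();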
* Re: Scalability of interface creation and deletion 2011-05-08 21:00 ` Eric Dumazet ` (2 preceding siblings ...) 2011-05-09 7:11 ` Paul E. McKenney @ 2011-05-09 17:30 ` Jesse Gross 3 siblings, 0 replies; 53+ messages in thread From: Jesse Gross @ 2011-05-09 17:30 UTC (permalink / raw) To: Eric Dumazet; +Cc: paulmck, Alex Bligh, netdev On Sun, May 8, 2011 at 2:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > For example, in the VLAN dismantle phase (ip link del eth0.103) > we have 3 calls to synchronize_rcu() and one call to rcu_barrier() > > [ the 'extra' synchronize_rcu() call comes from unregister_vlan_dev() ] > > Maybe with new VLAN model, we could now remove this synchronize_net() > call from vlan code. Jesse what do you think ? > Once vlan_group_set_device(grp, vlan_id, NULL) had been called, why > should we respect one rcu grace period at all, given dev is queued to > unregister_netdevice_queue() [ which has its own couples of > synchronize_net() / rcu_barrier() ] Yes, I agree that the extra call to synchronize_net() provides no value, though I think that's actually been true for a while. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-08 9:35 ` Alex Bligh 2011-05-08 12:18 ` Alex Bligh @ 2011-05-08 12:44 ` Paul E. McKenney 2011-05-08 13:06 ` Alex Bligh 1 sibling, 1 reply; 53+ messages in thread From: Paul E. McKenney @ 2011-05-08 12:44 UTC (permalink / raw) To: Alex Bligh; +Cc: Eric Dumazet, netdev On Sun, May 08, 2011 at 10:35:02AM +0100, Alex Bligh wrote: > Eric, > > --On 8 May 2011 09:12:22 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > > >By the way, if I change HZ from 1000 to 100 I now have ten times slower > >result : > > I repeated that test here. With HZ set to 1000 I got a total time of > 4.022 seconds to remove 100 interfaces, of which: > > Total 3.03808 Usage 199 Average 0.01527 elsewhere > Total 0.93992 Usage 200 Average 0.00470 synchronizing > > as opposed to a total of 27.917 seconds with HZ set to 100, of which > > Total 18.98515 Usage 199 Average 0.09540 elsewhere > Total 8.77581 Usage 200 Average 0.04388 synchronizing > > Not quite a factor of 10 improvement, but nearly. > > I have CONFIG_RCU_FAST_NO_HZ=y > > I suspect this may just mean an rcu reader holds the rcu_read_lock > for a jiffies related time. Though I'm having difficulty seeing > what that might be on a system where the net is in essence idle. OK, let's break it out... 4.022 seconds for 100 interfaces means about 40 milliseconds per interface. My guess is that you have CONFIG_NO_HZ=y, which means that RCU needs to figure out that various CPUs are in dyntick-idle state, which is a minimum of 6 jiffies. It could be longer if a given CPU happens to be in IRQ when RCU checks, so call it 9 jiffies. If you are doing the interfaces synchronously, you will likely have to wait for a prior grace period (due to background activity). So I can easily imagine 18 milliseconds for HZ=1000. 40 milliseconds sounds a bit high, but perhaps not impossible. Thanx, Paul ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-08 12:44 ` Paul E. McKenney @ 2011-05-08 13:06 ` Alex Bligh 2011-05-08 13:14 ` Alex Bligh 0 siblings, 1 reply; 53+ messages in thread From: Alex Bligh @ 2011-05-08 13:06 UTC (permalink / raw) To: paulmck; +Cc: Eric Dumazet, netdev, Alex Bligh --On 8 May 2011 05:44:28 -0700 "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > OK, let's break it out... > > 4.022 seconds for 100 interfaces means about 40 milliseconds per > interface. It's 200 interfaces as they are veth pairs, so 20ms. > My guess is that you have CONFIG_NO_HZ=y, which means that RCU needs to > figure out that various CPUs are in dyntick-idle state, which is a minimum > of 6 jiffies. That sounds like the main cause. > It could be longer if a given CPU happens to be in IRQ > when RCU checks, so call it 9 jiffies. That sounds unlikely to happen much of the time. Looking at /proc/interrupts not much is going on. > If you are doing the interfaces > synchronously, you will likely have to wait for a prior grace period (due > to background activity). Not sure I understand that (there's nothing else going on), but... > So I can easily imagine 18 milliseconds for > HZ=1000. 40 milliseconds sounds a bit high, but perhaps not impossible. Add to that the fact that there are 2 x synchronize_sched() and one rcu_barrier() per interface removal. If I ignore your IRQ idea, that's 6 x 3 = 18ms per interface removal at CONFIG_HZ=10, 180ms at CONFIG_HZ=100. -- Alex Bligh ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-08 13:06 ` Alex Bligh @ 2011-05-08 13:14 ` Alex Bligh 0 siblings, 0 replies; 53+ messages in thread From: Alex Bligh @ 2011-05-08 13:14 UTC (permalink / raw) To: Alex Bligh, paulmck; +Cc: Eric Dumazet, netdev, Alex Bligh --On 8 May 2011 14:06:07 +0100 Alex Bligh <alex@alex.org.uk> wrote: > Add to that the fact that there are 2 x synchronize_sched() and one > rcu_barrier() per interface removal. If I ignore your IRQ idea, that's > 6 x 3 = 18ms per interface removal at CONFIG_HZ=10, 180ms at > CONFIG_HZ=100. Aargh I mean: 6 x 3 = 18ms per interface removal at CONFIG_HZ=1000, 180ms at CONFIG_HZ=100. -- Alex Bligh ^ permalink raw reply [flat|nested] 53+ messages in thread
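[A quick back-of-envelope helper makes the HZ dependence above explicit. The figures of roughly 6 jiffies per grace period and three blocking RCU operations per dismantle (2 x synchronize_sched() plus 1 x rcu_barrier()) are taken from the discussion in this thread; the program is plain userspace arithmetic, nothing more.]

#include <stdio.h>

int main(void)
{
        const int hz_values[] = { 1000, 250, 100 };
        const double jiffies_per_gp = 6.0;  /* dyntick-idle detection, per Paul */
        const int blocking_ops = 3;         /* 2 x synchronize_sched() + 1 x rcu_barrier() */

        for (int i = 0; i < 3; i++) {
                double jiffy_ms = 1000.0 / hz_values[i];
                printf("HZ=%-4d  grace period ~%4.0f ms, per-device dismantle ~%4.0f ms\n",
                       hz_values[i],
                       jiffies_per_gp * jiffy_ms,
                       blocking_ops * jiffies_per_gp * jiffy_ms);
        }
        return 0;
}

[For HZ=1000 this gives roughly 18 ms per device and for HZ=100 roughly 180 ms, matching the figures worked out above.]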
* Re: Scalability of interface creation and deletion 2011-05-08 7:12 ` Eric Dumazet 2011-05-08 8:06 ` Alex Bligh 2011-05-08 9:35 ` Alex Bligh @ 2011-05-08 12:32 ` Paul E. McKenney 2 siblings, 0 replies; 53+ messages in thread From: Paul E. McKenney @ 2011-05-08 12:32 UTC (permalink / raw) To: Eric Dumazet; +Cc: Alex Bligh, netdev On Sun, May 08, 2011 at 09:12:22AM +0200, Eric Dumazet wrote: > Le samedi 07 mai 2011 à 20:42 +0200, Eric Dumazet a écrit : > > Here is my trace here for one device deletion on one 8 core machine > > > > [ 800.447012] synchronize_rcu() in 15787 us > > [ 800.455013] synchronize_rcu() in 7682 us > > [ 800.464019] rcu_barrier() in 8487 us > > > > Not that bad. There is always synchronize_rcu_expedited() if you need lower latency and can tolerate a bit higher CPU overhead. Thanx, Paul > > $ grep RCU .config > > # RCU Subsystem > > CONFIG_TREE_RCU=y > > # CONFIG_PREEMPT_RCU is not set > > CONFIG_RCU_TRACE=y > > CONFIG_RCU_FANOUT=32 > > # CONFIG_RCU_FANOUT_EXACT is not set > > # CONFIG_RCU_FAST_NO_HZ is not set > > CONFIG_TREE_RCU_TRACE=y > > > > By the way, if I change HZ from 1000 to 100 I now have ten times slower > result : > > # ip link add link eth0 eth0.103 type vlan id 103 > # time ip link del eth0.103 > > real 0m0.430s > user 0m0.000s > sys 0m0.000s > > So all this is related to your HZ value, even in a CONFIG_NO_HZ=y > kernel. Alex, I guess you have HZ=250 ? > > # uname -a > Linux svivoipvnx021 2.6.39-rc6-00214-g5511a34-dirty #574 SMP Sun May 8 > 08:44:14 CEST 2011 x86_64 x86_64 x86_64 GNU/Linux > # cat /proc/cmdline > > I enabled CONFIG_RCU_FAST_NO_HZ and got worse results (but not > alsways... its very variable) > > # time ip link del eth0.103 > > real 0m0.544s > user 0m0.000s > sys 0m0.000s > > > # time ip link del eth0.103 > > real 0m0.414s > user 0m0.000s > sys 0m0.000s > > ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-07 18:32 ` Eric Dumazet 2011-05-07 18:39 ` Eric Dumazet 2011-05-07 18:42 ` Eric Dumazet @ 2011-05-07 18:51 ` Alex Bligh 2011-05-07 19:24 ` Eric Dumazet 2 siblings, 1 reply; 53+ messages in thread From: Alex Bligh @ 2011-05-07 18:51 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Alex Bligh --On 7 May 2011 20:32:54 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > Well, there is also one rcu_barrier() call that is expensive. > (It was changed from one synchronize_rcu() to one rcu_barrier() lately > in commit ef885afb , in 2.6.36 kernel) I think you are saying it may be waiting in rcu_barrier(). I'll instrument that later plus synchronize_sched(). > http://git2.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commi > tdiff;h=ef885afbf8a37689afc1d9d545e2f3e7a8276c17 OK, so in head, which I am using, rollback_registered_many which previously had 2 calls to synchronize_net(), now has one, followed by a call to rcu_barrier() at the bottom. > netdev_wait_allrefs() waits that all references to a device vanishes. > > It currently uses a _very_ pessimistic 250 ms delay between each probe. > Some users reported that no more than 4 devices can be dismantled per > second, this is a pretty serious problem for some setups. Right, that's what I patched before (see patch attached to message from earlier today) to do an exponential backoff (see previous entry), i.e. do a 5ms sleep, then a 10ms, then a 20ms, but never more than 250ms. It made no difference. > time to remove 50 ipip tunnels on a UP machine : > > before patch : real 11.910s > after patch : real 1.250s Sadly I don't see that improvement! -- Alex Bligh ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-07 18:51 ` Alex Bligh @ 2011-05-07 19:24 ` Eric Dumazet 0 siblings, 0 replies; 53+ messages in thread From: Eric Dumazet @ 2011-05-07 19:24 UTC (permalink / raw) To: Alex Bligh; +Cc: netdev On Saturday 07 May 2011 at 19:51 +0100, Alex Bligh wrote: > > --On 7 May 2011 20:32:54 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > Well, there is also one rcu_barrier() call that is expensive. > > (It was changed from one synchronize_rcu() to one rcu_barrier() lately > > in commit ef885afb , in 2.6.36 kernel) > > I think you are saying it may be waiting in rcu_barrier(). I'll > instrument that later plus synchronize_sched(). > > > http://git2.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commi > > tdiff;h=ef885afbf8a37689afc1d9d545e2f3e7a8276c17 > > OK, so in head, which I am using, rollback_registered_many which > previously had 2 calls to synchronize_net(), now has one, followed > by a call to rcu_barrier() at the bottom. > each device dismantle needs 2 synchronize_rcu() and one rcu_barrier() > Right, that's what I patched before (see patch attached to > message from earlier today) to do an exponential backoff (see > previous entry), i.e. do a 5ms sleep, then a 10ms, then a 20ms, but > never more than 250ms. It made no difference. > Oh well. How many times are you going to tell us about this? We suggested waiting no more than 1 ms, or even shouting asap. If, after the synchronize_rcu() and rcu_barrier() calls, there are still references to the device, then there is a BUG somewhere. Since these bugs are usually not fatal, we just wait a bit. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-07 16:26 ` Eric Dumazet 2011-05-07 18:24 ` Alex Bligh @ 2011-05-07 18:38 ` Alex Bligh 2011-05-07 18:44 ` Eric Dumazet 1 sibling, 1 reply; 53+ messages in thread From: Alex Bligh @ 2011-05-07 18:38 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Alex Bligh --On 7 May 2011 18:26:29 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: ># time rmmod dummy > > real 0m0.111s > user 0m0.000s > sys 0m0.000s > > > This removed my two dummy0/dummy1 devices. rmmod dummy even with numdummies=100 does only one synchronize_net() and is quick (0.8ms). -- Alex Bligh ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Scalability of interface creation and deletion 2011-05-07 18:38 ` Alex Bligh @ 2011-05-07 18:44 ` Eric Dumazet 0 siblings, 0 replies; 53+ messages in thread From: Eric Dumazet @ 2011-05-07 18:44 UTC (permalink / raw) To: Alex Bligh; +Cc: netdev On Saturday 07 May 2011 at 19:38 +0100, Alex Bligh wrote: > > --On 7 May 2011 18:26:29 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > > ># time rmmod dummy > > > > real 0m0.111s > > user 0m0.000s > > sys 0m0.000s > > > > > > This removed my two dummy0/dummy1 devices. > > rmmod dummy even with numdummies=100 does only one synchronize_net() and > is quick (0.8ms). > Yes, thanks to the batching we added some time ago to speed up module unload. And because you didn't set up IP addresses on them ;) for i in `seq 0 99` do ifconfig dummy$i 192.168.$i.1 up done ^ permalink raw reply [flat|nested] 53+ messages in thread
end of thread, other threads:[~2011-05-10 19:23 UTC | newest] Thread overview: 53+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-05-07 11:08 Scalability of interface creation and deletion Alex Bligh 2011-05-07 12:22 ` Eric Dumazet 2011-05-07 15:26 ` Alex Bligh 2011-05-07 15:54 ` Eric Dumazet 2011-05-07 16:23 ` Ben Greear 2011-05-07 16:37 ` Eric Dumazet 2011-05-07 16:44 ` Ben Greear 2011-05-07 16:51 ` Eric Dumazet 2011-05-08 3:45 ` Ben Greear 2011-05-08 8:08 ` Alex Bligh 2011-05-09 21:46 ` Octavian Purdila 2011-05-07 16:26 ` Eric Dumazet 2011-05-07 18:24 ` Alex Bligh 2011-05-07 18:32 ` Eric Dumazet 2011-05-07 18:39 ` Eric Dumazet 2011-05-08 10:09 ` Alex Bligh 2011-05-07 18:42 ` Eric Dumazet 2011-05-07 18:50 ` Alex Bligh 2011-05-08 7:12 ` Eric Dumazet 2011-05-08 8:06 ` Alex Bligh 2011-05-08 9:35 ` Alex Bligh 2011-05-08 12:18 ` Alex Bligh 2011-05-08 12:50 ` Paul E. McKenney 2011-05-08 13:13 ` Alex Bligh 2011-05-08 13:44 ` Paul E. McKenney 2011-05-08 14:27 ` Alex Bligh 2011-05-08 14:47 ` Paul E. McKenney 2011-05-08 15:17 ` Alex Bligh 2011-05-08 15:48 ` Paul E. McKenney 2011-05-08 21:00 ` Eric Dumazet 2011-05-09 4:44 ` [PATCH] veth: use batched device unregister Eric Dumazet 2011-05-09 6:56 ` Michał Mirosław 2011-05-09 8:20 ` Eric Dumazet 2011-05-09 9:17 ` [PATCH net-next-2.6] net: use batched device unregister in veth and macvlan Eric Dumazet 2011-05-09 18:42 ` David Miller 2011-05-09 19:05 ` Eric Dumazet 2011-05-09 20:17 ` Eric Dumazet 2011-05-10 6:40 ` [PATCH net-2.6] vlan: fix GVRP at dismantle time Eric Dumazet 2011-05-10 19:23 ` David Miller 2011-05-09 7:45 ` [PATCH v2 net-next-2.6] veth: use batched device unregister Eric Dumazet 2011-05-09 9:22 ` Eric Dumazet 2011-05-09 5:37 ` Scalability of interface creation and deletion Alex Bligh 2011-05-09 6:37 ` Eric Dumazet 2011-05-09 7:11 ` Paul E. McKenney 2011-05-09 17:30 ` Jesse Gross 2011-05-08 12:44 ` Paul E. McKenney 2011-05-08 13:06 ` Alex Bligh 2011-05-08 13:14 ` Alex Bligh 2011-05-08 12:32 ` Paul E. McKenney 2011-05-07 18:51 ` Alex Bligh 2011-05-07 19:24 ` Eric Dumazet 2011-05-07 18:38 ` Alex Bligh 2011-05-07 18:44 ` Eric Dumazet