* [PATCH 0/2] netem: trace enhancement @ 2007-11-20 22:11 Ariane Keller 2007-11-27 13:57 ` Ariane Keller 0 siblings, 1 reply; 26+ messages in thread From: Ariane Keller @ 2007-11-20 22:11 UTC (permalink / raw) To: shemminger; +Cc: netdev Hi Stephen Approximately a year ago we discussed an enhancement to netem, which we called "trace control for netem". We obtain the value for the packet delay, drop, duplication and corruption from a so called "trace file". The trace file may be obtained by monitoring network traffic and thus enables us to emulate "real world" network behavior. Traces can ether be generated individually (we supply a set of tools to do this) or can be downloaded from our homepage: http://tcn.hypert.net . Since our last submission on 2006-12-15 we did some code clean up and have created two new patches one against kernel 2.6.23.8 and one against iproute2-2.6.23. To refer to our discussion from last year please have a look at messages with subject "LARTC: trace control for netem". We are looking forward for any comments, suggestions and instructions to bring the trace enhancement to the kernel and to iproute2. Thanks, Ariane ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-11-20 22:11 [PATCH 0/2] netem: trace enhancement Ariane Keller @ 2007-11-27 13:57 ` Ariane Keller 2007-11-29 21:45 ` Stephen Hemminger 0 siblings, 1 reply; 26+ messages in thread From: Ariane Keller @ 2007-11-27 13:57 UTC (permalink / raw) To: netdev; +Cc: shemminger, herbert, Rainer Baumann I just wanted to ask whether there is a general interest in this patch. If yes: great, how to proceed? otherwise: please let me know why. Thanks! Ariane Keller wrote: > Hi Stephen > > Approximately a year ago we discussed an enhancement to netem, > which we called "trace control for netem". > > We obtain the value for the packet delay, drop, duplication and > corruption from a so called "trace file". The trace file may be obtained > by monitoring network traffic and thus enables us to emulate "real > world" network behavior. > > Traces can ether be generated individually (we supply a set of tools to > do this) or can be downloaded from our homepage: http://tcn.hypert.net . > > Since our last submission on 2006-12-15 we did some code clean up and > have created two new patches one against kernel 2.6.23.8 and one against > iproute2-2.6.23. > To refer to our discussion from last year please have a look at messages > with subject "LARTC: trace control for netem". > > We are looking forward for any comments, suggestions and instructions to > bring the trace enhancement to the kernel and to iproute2. > > Thanks, > Ariane > > > - > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-11-27 13:57 ` Ariane Keller @ 2007-11-29 21:45 ` Stephen Hemminger 2007-11-29 22:03 ` Patrick McHardy 0 siblings, 1 reply; 26+ messages in thread From: Stephen Hemminger @ 2007-11-29 21:45 UTC (permalink / raw) To: Ariane Keller; +Cc: netdev, herbert, Rainer Baumann On Tue, 27 Nov 2007 14:57:26 +0100 Ariane Keller <ariane.keller@tik.ee.ethz.ch> wrote: > I just wanted to ask whether there is a general interest in this patch. > If yes: great, how to proceed? > otherwise: please let me know why. > > Thanks! > > > > > Ariane Keller wrote: > > Hi Stephen > > > > Approximately a year ago we discussed an enhancement to netem, > > which we called "trace control for netem". > > > > We obtain the value for the packet delay, drop, duplication and > > corruption from a so called "trace file". The trace file may be obtained > > by monitoring network traffic and thus enables us to emulate "real > > world" network behavior. > > > > Traces can ether be generated individually (we supply a set of tools to > > do this) or can be downloaded from our homepage: http://tcn.hypert.net . > > > > Since our last submission on 2006-12-15 we did some code clean up and > > have created two new patches one against kernel 2.6.23.8 and one against > > iproute2-2.6.23. > > To refer to our discussion from last year please have a look at messages > > with subject "LARTC: trace control for netem". > > > > We are looking forward for any comments, suggestions and instructions to > > bring the trace enhancement to the kernel and to iproute2. > > > > Thanks, > > Ariane Still interested in this. I got part way through integrating it but had concerns about the API from the application to netem for getting the data. It seemed like there ought to be a better way to do it that could handle large data sets better, but never really got a good solution worked out (that is why I never said anything). The 2.6.23.8 patch seems to be unavailable right now. -- Stephen Hemminger <shemminger@linux-foundation.org> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-11-29 21:45 ` Stephen Hemminger @ 2007-11-29 22:03 ` Patrick McHardy 2007-11-30 16:25 ` Ariane Keller 0 siblings, 1 reply; 26+ messages in thread From: Patrick McHardy @ 2007-11-29 22:03 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Ariane Keller, netdev, herbert, Rainer Baumann Stephen Hemminger wrote: > Still interested in this. I got part way through integrating it but had > concerns about the API from the application to netem for getting the data. > It seemed like there ought to be a better way to do it that could handle large > data sets better, but never really got a good solution worked out (that is why > I never said anything). Would spreading them over multiple netlink messages work? A different, slightly ugly possibility would be to simply use copy_from_user, netlink is synchronous now (still better than using configfs IMO). ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-11-29 22:03 ` Patrick McHardy @ 2007-11-30 16:25 ` Ariane Keller 2007-12-03 7:45 ` Patrick McHardy 0 siblings, 1 reply; 26+ messages in thread From: Ariane Keller @ 2007-11-30 16:25 UTC (permalink / raw) To: Patrick McHardy Cc: Stephen Hemminger, Ariane Keller, netdev, herbert, Rainer Baumann Thanks for your comments! I'd like to better understand your dislike of the current implementation of the data transfer from user space to kernel space. Is it the fact that we use configfs? I think, we had already a discussion about this (and we changed from procfs to configfs). Or don't you like that we need a user space daemon which is responsible for feeding the data to the kernel module? I think we do not have another option, since the trace file may be of arbitrary length. Or anything else? Patrick McHardy wrote: > Stephen Hemminger wrote: >> Still interested in this. I got part way through integrating it but had >> concerns about the API from the application to netem for getting the >> data. >> It seemed like there ought to be a better way to do it that could >> handle large >> data sets better, but never really got a good solution worked out >> (that is why >> I never said anything). > > Would spreading them over multiple netlink messages work? A different, > slightly ugly possibility would be to simply use copy_from_user, netlink > is synchronous now (still better than using configfs IMO). > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-11-30 16:25 ` Ariane Keller @ 2007-12-03 7:45 ` Patrick McHardy 2007-12-03 9:12 ` Ariane Keller 0 siblings, 1 reply; 26+ messages in thread From: Patrick McHardy @ 2007-12-03 7:45 UTC (permalink / raw) To: Ariane Keller; +Cc: Stephen Hemminger, netdev, herbert, Rainer Baumann Ariane Keller wrote: > Thanks for your comments! > > I'd like to better understand your dislike of the current implementation > of the data transfer from user space to kernel space. > Is it the fact that we use configfs? > I think, we had already a discussion about this (and we changed from > procfs to configfs). > Or don't you like that we need a user space daemon which is responsible > for feeding the data to the kernel module? > I think we do not have another option, since the trace file may be of > arbitrary length. > Or anything else? I dislike using anything besides rtnetlink for qdisc configuration. The only way to transfer arbitary amounts of data over netlink would be to spread the data over multiple messages. But then again, you're using kmalloc and only seem to allocate 4k, so how large are these traces in practice? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-03 7:45 ` Patrick McHardy @ 2007-12-03 9:12 ` Ariane Keller 2007-12-03 17:35 ` Patrick McHardy 0 siblings, 1 reply; 26+ messages in thread From: Ariane Keller @ 2007-12-03 9:12 UTC (permalink / raw) To: Patrick McHardy Cc: Ariane Keller, Stephen Hemminger, netdev, herbert, Rainer Baumann Patrick McHardy wrote: > Ariane Keller wrote: >> Thanks for your comments! >> >> I'd like to better understand your dislike of the current >> implementation of the data transfer from user space to kernel space. >> Is it the fact that we use configfs? >> I think, we had already a discussion about this (and we changed from >> procfs to configfs). >> Or don't you like that we need a user space daemon which is >> responsible for feeding the data to the kernel module? >> I think we do not have another option, since the trace file may be of >> arbitrary length. >> Or anything else? > > > I dislike using anything besides rtnetlink for qdisc configuration. > The only way to transfer arbitary amounts of data over netlink would > be to spread the data over multiple messages. But then again, you're > using kmalloc and only seem to allocate 4k, so how large are these > traces in practice? For each packet to be processed there is 32bit of data, which encodes delay and drop, duplicate etc. The size of the actual trace file can therefore reach any length, depending on for how many packets the information is encoded (up to several GB). Therefore we send the trace file in chunks of 4000bytes to the kernel. In order to have always a "packet-delay-value ready", we maintain two "delay queues" in the kernel (each of 4k). In a first step, both queues are filled, and the values are read from the first queue, if this queue is finished, we read values from the second queue and fill the first queue with new values from the trace file etc. Therefore we have a user space process running, which reads the values from the trace file, and sends them to the kernel. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-03 9:12 ` Ariane Keller @ 2007-12-03 17:35 ` Patrick McHardy 2007-12-03 18:29 ` Ben Greear 0 siblings, 1 reply; 26+ messages in thread From: Patrick McHardy @ 2007-12-03 17:35 UTC (permalink / raw) To: Ariane Keller; +Cc: Stephen Hemminger, netdev, herbert, Rainer Baumann Ariane Keller wrote: > Patrick McHardy wrote: >> >> I dislike using anything besides rtnetlink for qdisc configuration. >> The only way to transfer arbitary amounts of data over netlink would >> be to spread the data over multiple messages. But then again, you're >> using kmalloc and only seem to allocate 4k, so how large are these >> traces in practice? > > For each packet to be processed there is 32bit of data, which encodes > delay and drop, duplicate etc. The size of the actual trace file can > therefore reach any length, depending on for how many packets the > information is encoded (up to several GB). > Therefore we send the trace file in chunks of 4000bytes to the kernel. > In order to have always a "packet-delay-value ready", we maintain two > "delay queues" in the kernel (each of 4k). In a first step, both queues > are filled, and the values are read from the first queue, if this queue > is finished, we read values from the second queue and fill the first > queue with new values from the trace file etc. Therefore we have a user > space process running, which reads the values from the trace file, and > sends them to the kernel. That sounds like it would also be possible using rtnetlink. You could send out a notification whenever you switch the active buffer and have userspace listen to these and replace the inactive one. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-03 17:35 ` Patrick McHardy @ 2007-12-03 18:29 ` Ben Greear 2007-12-04 14:45 ` Ariane Keller 0 siblings, 1 reply; 26+ messages in thread From: Ben Greear @ 2007-12-03 18:29 UTC (permalink / raw) To: Patrick McHardy Cc: Ariane Keller, Stephen Hemminger, netdev, herbert, Rainer Baumann Patrick McHardy wrote: > > That sounds like it would also be possible using rtnetlink. You could > send out a notification whenever you switch the active buffer and have > userspace listen to these and replace the inactive one. Also, I think you will need a larger cache than 4-8k if you are running higher speeds (100,000 pps, etc), as you probably can't rely on user-space responding reliably every 10ms (or even less time for faster speeds.) Thanks, Ben > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-03 18:29 ` Ben Greear @ 2007-12-04 14:45 ` Ariane Keller 2007-12-04 14:58 ` Patrick McHardy 2007-12-04 17:40 ` Ben Greear 0 siblings, 2 replies; 26+ messages in thread From: Ariane Keller @ 2007-12-04 14:45 UTC (permalink / raw) To: Ben Greear, Patrick McHardy Cc: Ariane Keller, Stephen Hemminger, netdev, herbert, Rainer Baumann >> That sounds like it would also be possible using rtnetlink. You could >> send out a notification whenever you switch the active buffer and have >> userspace listen to these and replace the inactive one. I guess using rtnetlink is possible. However I'm not sure about how to implement it: The first thought was to use RTM_NEWQDISC to send the data to the netem_change() function (similar to tc_qdisc_modify() ). But with this we would need the tcm_handle, tcm_parent arguments etc. which are not known in q_netem.c Therefore we would have to change the parse_qopt() function prototype in order to pass the whole "req" and not only the nlmsghdr. The second possibility would be to add a new message type, e.g RTM_NETEMDATA. This type would be registered in the netem kernel module with a callback function netem_recv_data(). If this function receives some data it searches for the correct flow, and saves the data in the corresponding buffer. However, I'm not convinced of any of these options. Do you have an alternative suggestion? > Also, I think you will need a larger cache than 4-8k if you are > running higher speeds (100,000 pps, etc), > as you probably can't rely on user-space responding reliably every > 10ms (or even less time for faster > speeds.) Increasing the cache size to say 32k for each buffer would be no problem. Is this enough? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-04 14:45 ` Ariane Keller @ 2007-12-04 14:58 ` Patrick McHardy 2007-12-05 12:57 ` Ariane Keller 2007-12-04 17:40 ` Ben Greear 1 sibling, 1 reply; 26+ messages in thread From: Patrick McHardy @ 2007-12-04 14:58 UTC (permalink / raw) To: Ariane Keller Cc: Ben Greear, Stephen Hemminger, netdev, herbert, Rainer Baumann Ariane Keller wrote: > >>> That sounds like it would also be possible using rtnetlink. You could >>> send out a notification whenever you switch the active buffer and have >>> userspace listen to these and replace the inactive one. > > I guess using rtnetlink is possible. However I'm not sure about how to > implement it: > The first thought was to use RTM_NEWQDISC to send the data to the > netem_change() function (similar to tc_qdisc_modify() ). That sounds reasonable. > But with this > we would need the tcm_handle, tcm_parent arguments etc. which are not > known in q_netem.c > Therefore we would have to change the parse_qopt() function prototype in > order to pass the whole "req" and not only the nlmsghdr. I assume you mean netem_init, parse_qopt is userspace. But I don't see how that is related, emptying the buffer happens during packet processing, right? I guess I would simply change the qdisc_notify function to not require a struct nlmsghdr * (simply pass nlmsg_seq directly) and use that to send notifications. The netem dump function would add the buffer state. BTW, the parent class id is available in sch->parent, the handle in sch->handle, but qdisc_notify should take care of everything you need. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-04 14:58 ` Patrick McHardy @ 2007-12-05 12:57 ` Ariane Keller 2007-12-05 13:05 ` Patrick McHardy 0 siblings, 1 reply; 26+ messages in thread From: Ariane Keller @ 2007-12-05 12:57 UTC (permalink / raw) To: Patrick McHardy Cc: Ariane Keller, Ben Greear, Stephen Hemminger, netdev, herbert, Rainer Baumann Thanks for your comments! Patrick McHardy wrote: > Ariane Keller wrote: >> >>>> That sounds like it would also be possible using rtnetlink. You could >>>> send out a notification whenever you switch the active buffer and have >>>> userspace listen to these and replace the inactive one. >> >> I guess using rtnetlink is possible. However I'm not sure about how to >> implement it: >> The first thought was to use RTM_NEWQDISC to send the data to the >> netem_change() function (similar to tc_qdisc_modify() ). > > That sounds reasonable. > >> But with this we would need the tcm_handle, tcm_parent arguments etc. >> which are not known in q_netem.c >> Therefore we would have to change the parse_qopt() function prototype >> in order to pass the whole "req" and not only the nlmsghdr. > > I assume you mean netem_init, parse_qopt is userspace. But I don't > see how that is related, emptying the buffer happens during packet > processing, right? Actually I meant parse_qopt from user space. If we would change that function prototype we would have the whole message header available in netem_parse_opt() and could pass this to the process which is responsible for sending the data to the kernel. This process would use this header every time it has to send new values to the netem_change() function in the kernel module. I thought about this because I was not aware of the qdisc_notify function. Anyway I've got some troubles with calling qdisc_notify. 1. I have to do a EXPORT_SYMBOL(qdisc_notify) (currently it is declared static in sch_api.c) 2. I'd like to call it from netem_enqueue(), which leads to a "sleeping function called from invalid context", since we are still in interrupt context. Therefore I think I have to put it in a workqueue. I hope, this is ok. > > I guess I would simply change the qdisc_notify function to not > require a struct nlmsghdr * (simply pass nlmsg_seq directly) and > use that to send notifications. The netem dump function would > add the buffer state. BTW, the parent class id is available in > sch->parent, the handle in sch->handle, but qdisc_notify should > take care of everything you need. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-05 12:57 ` Ariane Keller @ 2007-12-05 13:05 ` Patrick McHardy 2007-12-10 14:32 ` Ariane Keller 0 siblings, 1 reply; 26+ messages in thread From: Patrick McHardy @ 2007-12-05 13:05 UTC (permalink / raw) To: Ariane Keller Cc: Ben Greear, Stephen Hemminger, netdev, herbert, Rainer Baumann Ariane Keller wrote: > Thanks for your comments! > > Patrick McHardy wrote: > >>> But with this we would need the tcm_handle, tcm_parent arguments etc. >>> which are not known in q_netem.c >>> Therefore we would have to change the parse_qopt() function prototype >>> in order to pass the whole "req" and not only the nlmsghdr. >> >> I assume you mean netem_init, parse_qopt is userspace. But I don't >> see how that is related, emptying the buffer happens during packet >> processing, right? > > Actually I meant parse_qopt from user space. > If we would change that function prototype we would have the whole > message header available in netem_parse_opt() and could pass this to the > process which is responsible for sending the data to the kernel. This > process would use this header every time it has to send new values to > the netem_change() function in the kernel module. You don't actually want to parse tc output in your program? Just open a netlink socket and do the necessary processing yourself, libnl makes this really easy. > I thought about this because I was not aware of the qdisc_notify function. > Anyway I've got some troubles with calling qdisc_notify. > 1. I have to do a EXPORT_SYMBOL(qdisc_notify) (currently it is declared > static in sch_api.c) This is fine. > 2. I'd like to call it from netem_enqueue(), which leads to a "sleeping > function called from invalid context", since we are still in interrupt > context. Therefore I think I have to put it in a workqueue. Just change it to use gfp_any(). ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-05 13:05 ` Patrick McHardy @ 2007-12-10 14:32 ` Ariane Keller 2007-12-12 23:13 ` Stephen Hemminger 0 siblings, 1 reply; 26+ messages in thread From: Ariane Keller @ 2007-12-10 14:32 UTC (permalink / raw) To: Patrick McHardy Cc: Ariane Keller, Ben Greear, Stephen Hemminger, netdev, herbert, Rainer Baumann I finally managed to rewrite the netem trace extension to use rtnetlink communication for the data transfer for user space to kernel space. The kernel patch is available here: http://www.tcn.hypert.net/tcn_kernel_2_6_23_rtnetlink and the iproute patch is here: http://www.tcn.hypert.net/tcn_iproute2_2_6_23_rtnetlink Whenever new data is needed the kernel module sends a notification to the user space process. Thereupon the user space process sends a data package to the kernel module. I had to write a new qdisc_notify function (qdisc_notify_pid) since the other was acquiring a lock, which we already hold in this situation. I hope everything works as expected and I'm looking forward for your comments. Thanks! Ariane ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-10 14:32 ` Ariane Keller @ 2007-12-12 23:13 ` Stephen Hemminger 0 siblings, 0 replies; 26+ messages in thread From: Stephen Hemminger @ 2007-12-12 23:13 UTC (permalink / raw) To: Ariane Keller Cc: Patrick McHardy, Ariane Keller, Ben Greear, netdev, herbert, Rainer Baumann On Mon, 10 Dec 2007 15:32:14 +0100 Ariane Keller <ariane.keller@tik.ee.ethz.ch> wrote: > I finally managed to rewrite the netem trace extension to use rtnetlink > communication for the data transfer for user space to kernel space. > > The kernel patch is available here: > http://www.tcn.hypert.net/tcn_kernel_2_6_23_rtnetlink > > and the iproute patch is here: > http://www.tcn.hypert.net/tcn_iproute2_2_6_23_rtnetlink > > Whenever new data is needed the kernel module sends a notification to > the user space process. Thereupon the user space process sends a data > package to the kernel module. I wonder if it wouldn't be possible to enhance/extend netlink to use sendfile/splice to get the data. It is rather more work than needed for just this, but it would be useful for large configuration. > I had to write a new qdisc_notify function (qdisc_notify_pid) since the > other was acquiring a lock, which we already hold in this situation. > I hope everything works as expected and I'm looking forward for your > comments. > > Thanks! > Ariane -- Stephen Hemminger <shemminger@linux-foundation.org> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-04 14:45 ` Ariane Keller 2007-12-04 14:58 ` Patrick McHardy @ 2007-12-04 17:40 ` Ben Greear 2007-12-04 17:54 ` Ariane Keller 1 sibling, 1 reply; 26+ messages in thread From: Ben Greear @ 2007-12-04 17:40 UTC (permalink / raw) To: Ariane Keller Cc: Patrick McHardy, Stephen Hemminger, netdev, herbert, Rainer Baumann Ariane Keller wrote: > > Increasing the cache size to say 32k for each buffer would be no problem. > Is this enough? Maybe just a variable length list of 4k buffers chained together? Its usually easier to get 4k chunks of memory than 32k chunks, especially under high network load, and if you go ahead an make it arbitrary length, then each user can determine how many they want to have queued... Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-04 17:40 ` Ben Greear @ 2007-12-04 17:54 ` Ariane Keller 2007-12-04 18:07 ` Ben Greear 0 siblings, 1 reply; 26+ messages in thread From: Ariane Keller @ 2007-12-04 17:54 UTC (permalink / raw) To: Ben Greear Cc: Ariane Keller, Patrick McHardy, Stephen Hemminger, netdev, herbert, Rainer Baumann I thought about that as well, but in my opinion this does not help much. It's the same as before: in average every 10ms a new buffer needs to be filled. Ben Greear wrote: > Ariane Keller wrote: >> >> Increasing the cache size to say 32k for each buffer would be no problem. >> Is this enough? > Maybe just a variable length list of 4k buffers chained together? Its > usually easier > to get 4k chunks of memory than 32k chunks, especially under high > network load, > and if you go ahead an make it arbitrary length, then each user can > determine how many > they want to have queued... > > Thanks, > Ben > > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-04 17:54 ` Ariane Keller @ 2007-12-04 18:07 ` Ben Greear 2007-12-04 21:41 ` Ariane Keller 0 siblings, 1 reply; 26+ messages in thread From: Ben Greear @ 2007-12-04 18:07 UTC (permalink / raw) To: Ariane Keller Cc: Patrick McHardy, Stephen Hemminger, netdev, herbert, Rainer Baumann Ariane Keller wrote: > I thought about that as well, but in my opinion this does not help much. > It's the same as before: in average every 10ms a new buffer needs to > be filled. But, you can fill 50 or 100 at a time, so if user-space is delayed for a few ms, the kernel still has plenty of buffers to work with until user-space gets another chance. I'm not worried about average thoughput of user-space to kernel, just random short-term starvation. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-04 18:07 ` Ben Greear @ 2007-12-04 21:41 ` Ariane Keller 2007-12-04 22:21 ` Ben Greear 0 siblings, 1 reply; 26+ messages in thread From: Ariane Keller @ 2007-12-04 21:41 UTC (permalink / raw) To: Ben Greear Cc: Ariane Keller, Patrick McHardy, Stephen Hemminger, netdev, herbert, Rainer Baumann Ben Greear wrote: > Ariane Keller wrote: >> I thought about that as well, but in my opinion this does not help much. >> It's the same as before: in average every 10ms a new buffer needs to >> be filled. > But, you can fill 50 or 100 at a time, so if user-space is delayed for a > few ms, the > kernel still has plenty of buffers to work with until user-space gets > another chance. > I'm not worried about average thoughput of user-space to kernel, just > random > short-term starvation. Yes, for short-term starvation it helps certainly. But I'm still not convinced that it is really necessary to add more buffers, because I'm not sure whether the bottleneck is really the loading of data from user space to kernel space. Some basic tests have shown that the kernel starts loosing packets at approximately the same packet rate regardless whether we use netem, or netem with the trace extension. But if you have contrary experience I'm happy to add a parameter which defines the number of buffers. Thanks! ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-04 21:41 ` Ariane Keller @ 2007-12-04 22:21 ` Ben Greear 2007-12-05 6:12 ` Ariane Keller ` (3 more replies) 0 siblings, 4 replies; 26+ messages in thread From: Ben Greear @ 2007-12-04 22:21 UTC (permalink / raw) To: Ariane Keller Cc: Patrick McHardy, Stephen Hemminger, netdev, herbert, Rainer Baumann Ariane Keller wrote: > Yes, for short-term starvation it helps certainly. > But I'm still not convinced that it is really necessary to add more > buffers, because I'm not sure whether the bottleneck is really the > loading of data from user space to kernel space. > Some basic tests have shown that the kernel starts loosing packets at > approximately the same packet rate regardless whether we use netem, or > netem with the trace extension. > But if you have contrary experience I'm happy to add a parameter which > defines the number of buffers. I have no numbers, so if you think it works, then that is fine with me. If you actually run out of the trace buffers, do you just continue to run with the last settings? If so, that would keep up throughput even if you are out of trace buffers... What rates do you see, btw? (pps, bps). Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-04 22:21 ` Ben Greear @ 2007-12-05 6:12 ` Ariane Keller 2007-12-23 19:54 ` Ariane Keller ` (2 subsequent siblings) 3 siblings, 0 replies; 26+ messages in thread From: Ariane Keller @ 2007-12-05 6:12 UTC (permalink / raw) To: Ben Greear Cc: Ariane Keller, Patrick McHardy, Stephen Hemminger, netdev, herbert, Rainer Baumann > If you actually run out of the trace buffers, do you just continue to > run with the last settings? If so, that would keep up throughput > even if you are out of trace buffers... Upon configuring the qdisc you can specify a default value, which is taken when the buffers are empty. It is either drop the packet or just forward it with no delay. > What rates do you see, btw? (pps, bps). My machine was an AMD Athlon 2083MHz, with a default installation of Debian with Kernel 2.6.16 and HZ set to 1000. Up to 80'000 pps (with "small" udp packets) everything (without netem, with netem and with netem trace) worked fine (tested with up to 10ms delay). For 90'000 pps the kernel dropped some packets even with no netem running, some more with netem and allmost all with netem trace. As soon as I have changed the mechanism for the data transfer to rtnetlink I'll do some new tests, trying to reach a higher packet rate. Then I'll see whether it is necessary to add more buffers, or whether the system collapses before. Thanks again! Ariane ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement 2007-12-04 22:21 ` Ben Greear 2007-12-05 6:12 ` Ariane Keller @ 2007-12-23 19:54 ` Ariane Keller 2007-12-23 19:54 ` [PATCH 0/2] netem: trace enhancement: kernel Ariane Keller 2007-12-23 19:54 ` [PATCH 0/2] netem: trace enhancement: iproute Ariane Keller 3 siblings, 0 replies; 26+ messages in thread From: Ariane Keller @ 2007-12-23 19:54 UTC (permalink / raw) To: Ben Greear Cc: Ariane Keller, Patrick McHardy, Stephen Hemminger, netdev, Rainer Baumann I have added the possibility to configure the number of buffers used to store the trace data for packet delays. The complete command to start netem with a trace file is: tc qdisc add dev eth1 root netem trace path/to/trace/file.bin buf 3 loops 1 0 with buf: the number of buffers to be used loops: how many times to loop through the tracefile the last argument is optional and specifies whether the default is to drop packets or 0-delay them. The patches are available at: http://www.tcn.hypert.net/tcn_kernel_2_6_23_confbuf http://www.tcn.hypert.net/tcn_iproute2_2_6_23_confbuf I'm looking forward for your comments! Thanks! Ariane Ben Greear wrote: > Ariane Keller wrote: > >> Yes, for short-term starvation it helps certainly. >> But I'm still not convinced that it is really necessary to add more >> buffers, because I'm not sure whether the bottleneck is really the >> loading of data from user space to kernel space. >> Some basic tests have shown that the kernel starts loosing packets at >> approximately the same packet rate regardless whether we use netem, or >> netem with the trace extension. >> But if you have contrary experience I'm happy to add a parameter which >> defines the number of buffers. > > I have no numbers, so if you think it works, then that is fine with me. > > If you actually run out of the trace buffers, do you just continue to > run with the last settings? If so, that would keep up throughput > even if you are out of trace buffers... > > What rates do you see, btw? (pps, bps). > > Thanks, > Ben > -- Ariane Keller Communication Systems Research Group, ETH Zurich Web: http://www.csg.ethz.ch/people/arkeller Office: ETZ G 60.1, Gloriastrasse 35, 8092 Zurich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement: kernel 2007-12-04 22:21 ` Ben Greear 2007-12-05 6:12 ` Ariane Keller 2007-12-23 19:54 ` Ariane Keller @ 2007-12-23 19:54 ` Ariane Keller 2007-12-28 16:08 ` Patrick McHardy 2007-12-23 19:54 ` [PATCH 0/2] netem: trace enhancement: iproute Ariane Keller 3 siblings, 1 reply; 26+ messages in thread From: Ariane Keller @ 2007-12-23 19:54 UTC (permalink / raw) To: Ben Greear Cc: Ariane Keller, Patrick McHardy, Stephen Hemminger, netdev, Rainer Baumann This patch applies to kernel 2.6.23. It enhances the network emulator netem with the possibility to read all delay/drop/duplicate etc values from a trace file. This trace file contains for each packet to be processed one value. The values are read from the file in a user space process called flowseed. These values are sent to the netem module with the help of rtnetlink sockets. In the netem module the values are "cached" in buffers. The number of buffers is configurable upon start of netem. If a buffer is empty the netem module sends a rtnetlink notification to the flowseed process. Upon receiving such a notification this process sends the next 1000 values to the netem module. signed-off-by: Ariane Keller <ariane.keller@tik.ee.ethz.ch> --- diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/include/linux/pkt_sched.h linux-2.6.23.8_mod/include/linux/pkt_sched.h --- linux-2.6.23.8/include/linux/pkt_sched.h 2007-11-16 19:14:27.000000000 +0100 +++ linux-2.6.23.8_mod/include/linux/pkt_sched.h 2007-12-21 19:42:49.000000000 +0100 @@ -439,6 +439,9 @@ enum TCA_NETEM_DELAY_DIST, TCA_NETEM_REORDER, TCA_NETEM_CORRUPT, + TCA_NETEM_TRACE, + TCA_NETEM_TRACE_DATA, + TCA_NETEM_STATS, __TCA_NETEM_MAX, }; @@ -454,6 +457,26 @@ struct tc_netem_qopt __u32 jitter; /* random jitter in latency (us) */ }; +struct tc_netem_stats +{ + int packetcount; + int packetok; + int normaldelay; + int drops; + int dupl; + int corrupt; + int novaliddata; + int reloadbuffer; +}; + +struct tc_netem_trace +{ + __u32 fid; /*flowid */ + __u32 def; /* default action 0 = no delay, 1 = drop*/ + __u32 ticks; /* number of ticks corresponding to 1ms */ + __u32 nr_bufs; /* number of buffers to save trace data*/ +}; + struct tc_netem_corr { __u32 delay_corr; /* delay correlation */ diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/include/net/flowseed.h linux-2.6.23.8_mod/include/net/flowseed.h --- linux-2.6.23.8/include/net/flowseed.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.23.8_mod/include/net/flowseed.h 2007-12-21 19:43:24.000000000 +0100 @@ -0,0 +1,34 @@ +/* flowseed.h header file for the netem trace enhancement + */ + +#ifndef _FLOWSEED_H +#define _FLOWSEED_H +#include <net/sch_generic.h> + +/* must be divisible by 4 (=#pkts)*/ +#define DATA_PACKAGE 4000 +#define DATA_PACKAGE_ID 4008 + +/* struct per flow - kernel */ +struct tcn_control +{ + struct list_head full_buffer_list; + struct list_head empty_buffer_list; + struct buflist * buffer_in_use; + int *offsetpos; /* pointer to actual pos in the buffer in use */ + int flowid; +}; + +struct tcn_statistic +{ + int packetcount; + int packetok; + int normaldelay; + int drops; + int dupl; + int corrupt; + int novaliddata; + int reloadbuffer; +}; + +#endif diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/include/net/pkt_sched.h linux-2.6.23.8_mod/include/net/pkt_sched.h --- linux-2.6.23.8/include/net/pkt_sched.h 2007-11-16 19:14:27.000000000 +0100 +++ linux-2.6.23.8_mod/include/net/pkt_sched.h 2007-12-21 19:42:49.000000000 +0100 @@ -72,6 +72,9 @@ extern void qdisc_watchdog_cancel(struct extern struct Qdisc_ops pfifo_qdisc_ops; extern struct Qdisc_ops bfifo_qdisc_ops; +extern int qdisc_notify_pid(int pid, struct nlmsghdr *n, u32 clid, + struct Qdisc *old, struct Qdisc *new); + extern int register_qdisc(struct Qdisc_ops *qops); extern int unregister_qdisc(struct Qdisc_ops *qops); extern struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle); diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/net/core/rtnetlink.c linux-2.6.23.8_mod/net/core/rtnetlink.c --- linux-2.6.23.8/net/core/rtnetlink.c 2007-11-16 19:14:27.000000000 +0100 +++ linux-2.6.23.8_mod/net/core/rtnetlink.c 2007-12-21 19:42:49.000000000 +0100 @@ -460,7 +460,7 @@ int rtnetlink_send(struct sk_buff *skb, NETLINK_CB(skb).dst_group = group; if (echo) atomic_inc(&skb->users); - netlink_broadcast(rtnl, skb, pid, group, GFP_KERNEL); + netlink_broadcast(rtnl, skb, pid, group, gfp_any()); if (echo) err = netlink_unicast(rtnl, skb, pid, MSG_DONTWAIT); return err; diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/net/sched/sch_api.c linux-2.6.23.8_mod/net/sched/sch_api.c --- linux-2.6.23.8/net/sched/sch_api.c 2007-11-16 19:14:27.000000000 +0100 +++ linux-2.6.23.8_mod/net/sched/sch_api.c 2007-12-21 19:42:49.000000000 +0100 @@ -28,6 +28,7 @@ #include <linux/list.h> #include <linux/hrtimer.h> +#include <net/sock.h> #include <net/netlink.h> #include <net/pkt_sched.h> @@ -841,6 +842,62 @@ rtattr_failure: nlmsg_trim(skb, b); return -1; } +static int tc_fill(struct sk_buff *skb, struct Qdisc *q, u32 clid, + u32 pid, u32 seq, u16 flags, int event) +{ + struct tcmsg *tcm; + struct nlmsghdr *nlh; + unsigned char *b = skb_tail_pointer(skb); + + nlh = NLMSG_NEW(skb, pid, seq, event, sizeof(*tcm), flags); + tcm = NLMSG_DATA(nlh); + tcm->tcm_family = AF_UNSPEC; + tcm->tcm__pad1 = 0; + tcm->tcm__pad2 = 0; + tcm->tcm_ifindex = q->dev->ifindex; + tcm->tcm_parent = clid; + tcm->tcm_handle = q->handle; + tcm->tcm_info = atomic_read(&q->refcnt); + RTA_PUT(skb, TCA_KIND, IFNAMSIZ, q->ops->id); + if (q->ops->dump && q->ops->dump(q, skb) < 0) + goto rtattr_failure; + + nlh->nlmsg_len = skb_tail_pointer(skb) - b; + + return skb->len; + +nlmsg_failure: +rtattr_failure: + nlmsg_trim(skb, b); + return -1; +} + +int qdisc_notify_pid(int pid, struct nlmsghdr *n, + u32 clid, struct Qdisc *old, struct Qdisc *new) +{ + struct sk_buff *skb; + skb = alloc_skb(NLMSG_GOODSIZE, gfp_any()); + if (!skb) + return -ENOBUFS; + + if (old && old->handle) { + if (tc_fill(skb, old, clid, pid, n->nlmsg_seq, + 0, RTM_DELQDISC) < 0) + goto err_out; + } + if (new) { + if (tc_fill(skb, new, clid, pid, n->nlmsg_seq, + old ? NLM_F_REPLACE : 0, RTM_NEWQDISC) < 0) + goto err_out; + } + if (skb->len) + return rtnetlink_send(skb, pid, RTNLGRP_TC, n->nlmsg_flags); + +err_out: + kfree_skb(skb); + return -EINVAL; +} +EXPORT_SYMBOL(qdisc_notify_pid); static int qdisc_notify(struct sk_buff *oskb, struct nlmsghdr *n, u32 clid, struct Qdisc *old, struct Qdisc *new) @@ -848,7 +905,7 @@ static int qdisc_notify(struct sk_buff * struct sk_buff *skb; u32 pid = oskb ? NETLINK_CB(oskb).pid : 0; - skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); + skb = alloc_skb(NLMSG_GOODSIZE, gfp_any()); if (!skb) return -ENOBUFS; diff -uprN -X linux-2.6.23.8/Documentation/dontdiff linux-2.6.23.8/net/sched/sch_netem.c linux-2.6.23.8_mod/net/sched/sch_netem.c --- linux-2.6.23.8/net/sched/sch_netem.c 2007-11-16 19:14:27.000000000 +0100 +++ linux-2.6.23.8_mod/net/sched/sch_netem.c 2007-12-21 19:42:49.000000000 +0100 @@ -11,6 +11,8 @@ * * Authors: Stephen Hemminger <shemminger@osdl.org> * Catalin(ux aka Dino) BOIE <catab at umbrella dot ro> + * netem trace: Ariane Keller <arkeller at ee.ethz.ch> ETH Zurich + * Rainer Baumann <baumann at hypert.net> ETH Zurich */ #include <linux/module.h> @@ -19,11 +21,13 @@ #include <linux/errno.h> #include <linux/skbuff.h> #include <linux/rtnetlink.h> - +#include <linux/list.h> #include <net/netlink.h> #include <net/pkt_sched.h> -#define VERSION "1.2" +#include "net/flowseed.h" + +#define VERSION "1.3" /* Network Emulation Queuing algorithm. ==================================== @@ -49,6 +53,11 @@ The simulator is limited by the Linux timer resolution and will create packet bursts on the HZ boundary (1ms). + + The trace option allows us to read the values for packet delay, + duplication, loss and corruption from a tracefile. This permits + the modulation of statistical properties such as long-range + dependences. See http://tcn.hypert.net. */ struct netem_sched_data { @@ -65,7 +74,11 @@ struct netem_sched_data { u32 duplicate; u32 reorder; u32 corrupt; - + u32 trace; + u32 ticks; + u32 def; + u32 flowid; + u32 bufnr; struct crndstate { u32 last; u32 rho; @@ -75,13 +88,29 @@ struct netem_sched_data { u32 size; s16 table[0]; } *delay_dist; + + struct tcn_statistic *statistic; + struct tcn_control *flowbuffer; +}; + +struct buflist { + struct list_head list; + char *buf; }; + /* Time stamp put into socket buffer control block */ struct netem_skb_cb { psched_time_t time_to_send; }; + +#define MASK_BITS 29 +#define MASK_DELAY ((1<<MASK_BITS)-1) +#define MASK_HEAD ~MASK_DELAY + +enum tcn_action { FLOW_NORMAL, FLOW_DROP, FLOW_DUP, FLOW_MANGLE }; + /* init_crandom - initialize correlated random number generator * Use entropy source for initial seed. */ @@ -141,6 +170,72 @@ static psched_tdiff_t tabledist(psched_t return x / NETEM_DIST_SCALE + (sigma / NETEM_DIST_SCALE) * t + mu; } +/* don't call this function directly. It is called after + * a packet has been taken out of a buffer and it was the last. + */ +static int reload_flowbuffer(struct netem_sched_data *q, struct Qdisc *sch) +{ + struct tcn_control *flow = q->flowbuffer; + struct nlmsghdr n; + struct buflist *element = list_entry(flow->full_buffer_list.next, + struct buflist, list); + /* the current buffer is empty */ + list_add_tail(&flow->buffer_in_use->list, &flow->empty_buffer_list); + + if (list_empty(&q->flowbuffer->full_buffer_list)) { + printk(KERN_ERR "netem: reload_flowbuffer, no full buffer\n"); + return -EFAULT; + } + + list_del_init(&element->list); + flow->buffer_in_use = element; + flow->offsetpos = (int *)element->buf; + memset(&n, 0, sizeof(struct nlmsghdr)); + n.nlmsg_seq = 1; + n.nlmsg_flags = NLM_F_REQUEST; + if (qdisc_notify_pid(q->flowid, &n, sch->parent, NULL, sch) < 0) + printk(KERN_ERR "netem: unable to request for more data\n"); + + return 0; +} + +/* return pktdelay with delay and drop/dupl/corrupt option */ +static int get_next_delay(struct netem_sched_data *q, enum tcn_action *head, + struct sk_buff *skb, struct Qdisc *sch) +{ + struct tcn_control *flow = q->flowbuffer; + u32 variout; + /*choose whether to drop or 0 delay packets on default*/ + *head = q->def; + + if (!flow) { + printk(KERN_ERR "netem: read from an uninitialized flow.\n"); + q->statistic->novaliddata++; + return 0; + } + if (!flow->buffer_in_use) { + printk(KERN_ERR "netem: read from uninitialized flow\n"); + return 0; + } + if (!flow->buffer_in_use->buf || !flow->offsetpos) { + printk(KERN_ERR "netem: buffer empty or offsetpos null\n"); + return 0; + } + + q->statistic->packetcount++; + /* check if we have to reload a buffer */ + if ((void *)flow->offsetpos - (void *)flow->buffer_in_use->buf == DATA_PACKAGE) + reload_flowbuffer(q, sch); + + variout = *flow->offsetpos++; + *head = (variout & MASK_HEAD) >> MASK_BITS; + + (&q->statistic->normaldelay)[*head] += 1; + q->statistic->packetok++; + + return ((variout & MASK_DELAY) * q->ticks) / 1000; +} + /* * Insert one skb into qdisc. * Note: parent depends on return value to account for queue length. @@ -153,17 +248,23 @@ static int netem_enqueue(struct sk_buff /* We don't fill cb now as skb_unshare() may invalidate it */ struct netem_skb_cb *cb; struct sk_buff *skb2; + enum tcn_action action = FLOW_NORMAL; + psched_tdiff_t delay = -1; int ret; int count = 1; pr_debug("netem_enqueue skb=%p\n", skb); + if (q->trace) + delay = get_next_delay(q, &action, sch->q.next, sch); /* Random duplication */ - if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor)) + if (q->trace ? action == FLOW_DUP : + (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))) ++count; /* Random packet drop 0 => none, ~0 => all */ - if (q->loss && q->loss >= get_crandom(&q->loss_cor)) + if (q->trace ? action == FLOW_DROP : + (q->loss && q->loss >= get_crandom(&q->loss_cor))) --count; if (count == 0) { @@ -194,7 +295,8 @@ static int netem_enqueue(struct sk_buff * If packet is going to be hardware checksummed, then * do it now in software before we mangle it. */ - if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) { + if (q->trace ? action == FLOW_MANGLE : + (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor))) { if (!(skb = skb_unshare(skb, GFP_ATOMIC)) || (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))) { @@ -210,10 +312,10 @@ static int netem_enqueue(struct sk_buff || q->counter < q->gap /* inside last reordering gap */ || q->reorder < get_crandom(&q->reorder_cor)) { psched_time_t now; - psched_tdiff_t delay; - delay = tabledist(q->latency, q->jitter, - &q->delay_cor, q->delay_dist); + if (!q->trace) + delay = tabledist(q->latency, q->jitter, + &q->delay_cor, q->delay_dist); now = psched_get_time(); cb->time_to_send = now + delay; @@ -332,6 +434,61 @@ static int set_fifo_limit(struct Qdisc * return ret; } +static void reset_stats(struct netem_sched_data *q) +{ + if (q->statistic) + memset(q->statistic, 0, sizeof(*(q->statistic))); + return; +} + +static void free_flowbuffer(struct netem_sched_data *q) +{ + struct buflist *cursor; + struct buflist *next; + list_for_each_entry_safe(cursor, next, + &q->flowbuffer->full_buffer_list, list) { + kfree(cursor->buf); + list_del(&cursor->list); + kfree(cursor); + } + + list_for_each_entry_safe(cursor, next, + &q->flowbuffer->empty_buffer_list, list) { + kfree(cursor->buf); + list_del(&cursor->list); + kfree(cursor); + } + + kfree(q->flowbuffer->buffer_in_use->buf); + kfree(q->flowbuffer->buffer_in_use); + + kfree(q->statistic); + kfree(q->flowbuffer); + q->statistic = NULL; + q->flowbuffer = NULL; + +} + +static int init_flowbuffer(unsigned int fid, struct netem_sched_data *q) +{ + q->statistic = kzalloc(sizeof(*(q->statistic)), GFP_KERNEL); + q->flowbuffer = kmalloc(sizeof(*(q->flowbuffer)), GFP_KERNEL); + + INIT_LIST_HEAD(&q->flowbuffer->full_buffer_list); + INIT_LIST_HEAD(&q->flowbuffer->empty_buffer_list); + + while (q->bufnr > 0) { + int size = sizeof(struct buflist); + struct buflist *element = kmalloc(size, GFP_KERNEL); + element->buf = kmalloc(DATA_PACKAGE, GFP_KERNEL); + list_add(&element->list, &q->flowbuffer->empty_buffer_list); + q->bufnr--; + } + q->flowbuffer->buffer_in_use = NULL; + q->flowbuffer->offsetpos = NULL; + return 0; +} + /* * Distribution data is a variable size payload containing * signed 16 bit values. @@ -403,6 +560,87 @@ static int get_corrupt(struct Qdisc *sch return 0; } +static int get_trace(struct Qdisc *sch, const struct rtattr *attr) +{ + struct netem_sched_data *q = qdisc_priv(sch); + const struct tc_netem_trace *traceopt = RTA_DATA(attr); + struct nlmsghdr n; + if (RTA_PAYLOAD(attr) != sizeof(*traceopt)) + return -EINVAL; + + if (traceopt->fid) { + q->ticks = traceopt->ticks; + q->bufnr = traceopt->nr_bufs; + q->trace = 1; + init_flowbuffer(traceopt->fid, q); + } else { + printk(KERN_ERR "netem: invalid flow id\n"); + q->trace = 0; + } + q->def = traceopt->def; + q->flowid = traceopt->fid; + + memset(&n, 0, sizeof(struct nlmsghdr)); + + n.nlmsg_seq = 1; + n.nlmsg_flags = NLM_F_REQUEST; + + if (qdisc_notify_pid(traceopt->fid, &n, sch->parent, NULL, sch) < 0) { + printk(KERN_ERR "netem: could not send notification"); + return -EINVAL; + } + return 0; +} + +static int get_trace_data(struct Qdisc *sch, const struct rtattr *attr) +{ + struct netem_sched_data *q = qdisc_priv(sch); + const char *msg = RTA_DATA(attr); + int fid, validData; + struct buflist *element; + struct tcn_control *flow; + if (RTA_PAYLOAD(attr) != DATA_PACKAGE_ID) { + printk("get_trace_data: invalid size\n"); + return -EINVAL; + } + memcpy(&fid, msg + DATA_PACKAGE, sizeof(int)); + memcpy(&validData, msg + DATA_PACKAGE + sizeof(int), sizeof(int)); + + /* check whether this process is allowed to send data */ + if (fid != q->flowid) + return -EPERM; + + /* no empty buffer */ + if (list_empty(&q->flowbuffer->empty_buffer_list)) + return -ENOBUFS; + + element = list_entry(q->flowbuffer->empty_buffer_list.next, + struct buflist, list); + if (element->buf == NULL) + return -ENOBUFS; + + list_del_init(&element->list); + memcpy(element->buf, msg, DATA_PACKAGE); + flow = q->flowbuffer; + if (flow->buffer_in_use == NULL) { + flow->buffer_in_use = element; + flow->offsetpos = (int *)element->buf; + } else + list_add_tail(&element->list, &q->flowbuffer->full_buffer_list); + + if (!list_empty(&q->flowbuffer->empty_buffer_list)) { + struct nlmsghdr n; + memset(&n, 0, sizeof(struct nlmsghdr)); + n.nlmsg_flags = NLM_F_REQUEST; + n.nlmsg_seq = 1; + if (qdisc_notify_pid(fid, &n, sch->parent, NULL, sch) < 0) + printk(KERN_NOTICE "could not send data " + "request for flow %i\n", fid); + } + q->statistic->reloadbuffer++; + return 0; +} + /* Parse netlink message to set options */ static int netem_change(struct Qdisc *sch, struct rtattr *opt) { @@ -414,11 +652,6 @@ static int netem_change(struct Qdisc *sc return -EINVAL; qopt = RTA_DATA(opt); - ret = set_fifo_limit(q->qdisc, qopt->limit); - if (ret) { - pr_debug("netem: can't set fifo limit\n"); - return ret; - } q->latency = qopt->latency; q->jitter = qopt->jitter; @@ -444,6 +677,29 @@ static int netem_change(struct Qdisc *sc RTA_PAYLOAD(opt) - sizeof(*qopt))) return -EINVAL; + /* its a user tc add or tc change command. + * We free the flowbuffer*/ + if (!tb[TCA_NETEM_TRACE_DATA-1] && q->trace) { + struct nlmsghdr n; + q->trace = 0; + memset(&n, 0, sizeof(struct nlmsghdr)); + n.nlmsg_flags = NLM_F_REQUEST; + n.nlmsg_seq = 1; + if (qdisc_notify_pid(q->flowid, &n, sch->parent, sch, NULL) < 0) + printk(KERN_NOTICE "netem: cannot send notification\n"); + + reset_stats(q); + free_flowbuffer(q); + + /* we set the fifo limit: this is done here + * since TRACE_DATA memset qopt to 0 */ + ret = set_fifo_limit(q->qdisc, qopt->limit); + if (ret) { + pr_debug("netem: can't set fifo limit\n"); + return ret; + } + } + if (tb[TCA_NETEM_CORR-1]) { ret = get_correlation(sch, tb[TCA_NETEM_CORR-1]); if (ret) @@ -467,7 +723,40 @@ static int netem_change(struct Qdisc *sc if (ret) return ret; } + if (tb[TCA_NETEM_TRACE-1]) { + ret = get_trace(sch, tb[TCA_NETEM_TRACE-1]); + if (ret) + return ret; + } + if (tb[TCA_NETEM_TRACE_DATA-1]) { + ret = get_trace_data(sch, tb[TCA_NETEM_TRACE_DATA-1]); + if (ret) + return ret; + } + } + /* it was a user tc add or tc change request, + * we delete the current flowbuffer*/ + else { + if (q->trace) { + struct nlmsghdr n; + q->trace = 0; + memset(&n, 0, sizeof(struct nlmsghdr)); + n.nlmsg_flags = NLM_F_REQUEST; + n.nlmsg_seq = 1; + if (qdisc_notify_pid(q->flowid, &n, sch->parent, sch, NULL) < 0) + printk(KERN_NOTICE "netem: could not send notification\n"); + reset_stats(q); + free_flowbuffer(q); + } + /* we set the fifo limit */ + ret = set_fifo_limit(q->qdisc, qopt->limit); + if (ret) { + pr_debug("netem: can't set fifo limit\n"); + return ret; + } + } + return 0; } @@ -567,6 +856,7 @@ static int netem_init(struct Qdisc *sch, qdisc_watchdog_init(&q->watchdog, sch); + q->trace = 0; q->qdisc = qdisc_create_dflt(sch->dev, &tfifo_qdisc_ops, TC_H_MAKE(sch->handle, 1)); if (!q->qdisc) { @@ -585,6 +875,16 @@ static int netem_init(struct Qdisc *sch, static void netem_destroy(struct Qdisc *sch) { struct netem_sched_data *q = qdisc_priv(sch); + if (q->trace) { + struct nlmsghdr n; + q->trace = 0; + memset(&n, 0, sizeof(struct nlmsghdr)); + n.nlmsg_flags = NLM_F_REQUEST; + n.nlmsg_seq = 1; + if (qdisc_notify_pid(q->flowid, &n, sch->parent, sch, NULL) < 0) + printk(KERN_NOTICE "netem: could not send notification\n"); + free_flowbuffer(q); + } qdisc_watchdog_cancel(&q->watchdog); qdisc_destroy(q->qdisc); @@ -600,6 +900,7 @@ static int netem_dump(struct Qdisc *sch, struct tc_netem_corr cor; struct tc_netem_reorder reorder; struct tc_netem_corrupt corrupt; + struct tc_netem_trace traceopt; qopt.latency = q->latency; qopt.jitter = q->jitter; @@ -622,6 +923,23 @@ static int netem_dump(struct Qdisc *sch, corrupt.correlation = q->corrupt_cor.rho; RTA_PUT(skb, TCA_NETEM_CORRUPT, sizeof(corrupt), &corrupt); + traceopt.fid = q->trace; + traceopt.def = q->def; + traceopt.ticks = q->ticks; + RTA_PUT(skb, TCA_NETEM_TRACE, sizeof(traceopt), &traceopt); + + if (q->trace) { + struct tc_netem_stats tstats; + tstats.packetcount = q->statistic->packetcount; + tstats.packetok = q->statistic->packetok; + tstats.normaldelay = q->statistic->normaldelay; + tstats.drops = q->statistic->drops; + tstats.dupl = q->statistic->dupl; + tstats.corrupt = q->statistic->corrupt; + tstats.novaliddata = q->statistic->novaliddata; + tstats.reloadbuffer = q->statistic->reloadbuffer; + RTA_PUT(skb, TCA_NETEM_STATS, sizeof(tstats), &tstats); + } rta->rta_len = skb_tail_pointer(skb) - b; return skb->len; Ben Greear wrote: > Ariane Keller wrote: > >> Yes, for short-term starvation it helps certainly. >> But I'm still not convinced that it is really necessary to add more >> buffers, because I'm not sure whether the bottleneck is really the >> loading of data from user space to kernel space. >> Some basic tests have shown that the kernel starts loosing packets at >> approximately the same packet rate regardless whether we use netem, or >> netem with the trace extension. >> But if you have contrary experience I'm happy to add a parameter which >> defines the number of buffers. > > I have no numbers, so if you think it works, then that is fine with me. > > If you actually run out of the trace buffers, do you just continue to > run with the last settings? If so, that would keep up throughput > even if you are out of trace buffers... > > What rates do you see, btw? (pps, bps). > > Thanks, > Ben > -- Ariane Keller Communication Systems Research Group, ETH Zurich Web: http://www.csg.ethz.ch/people/arkeller Office: ETZ G 60.1, Gloriastrasse 35, 8092 Zurich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement: kernel 2007-12-23 19:54 ` [PATCH 0/2] netem: trace enhancement: kernel Ariane Keller @ 2007-12-28 16:08 ` Patrick McHardy 2007-12-28 21:02 ` Ariane Keller 0 siblings, 1 reply; 26+ messages in thread From: Patrick McHardy @ 2007-12-28 16:08 UTC (permalink / raw) To: Ariane Keller; +Cc: Ben Greear, Stephen Hemminger, netdev, Rainer Baumann Ariane Keller wrote: > +struct tc_netem_stats > +{ > + int packetcount; > + int packetok; > + int normaldelay; > + int drops; > + int dupl; > + int corrupt; > + int novaliddata; > + int reloadbuffer; These should be unsigned int or __u32. > > diff -uprN -X linux-2.6.23.8/Documentation/dontdiff > linux-2.6.23.8/include/net/flowseed.h > linux-2.6.23.8_mod/include/net/flowseed.h > --- linux-2.6.23.8/include/net/flowseed.h 1970-01-01 > 01:00:00.000000000 +0100 > +++ linux-2.6.23.8_mod/include/net/flowseed.h 2007-12-21 > 19:43:24.000000000 +0100 > @@ -0,0 +1,34 @@ > +/* flowseed.h header file for the netem trace enhancement > + */ > + > +#ifndef _FLOWSEED_H > +#define _FLOWSEED_H > +#include <net/sch_generic.h> > + > +/* must be divisible by 4 (=#pkts)*/ > +#define DATA_PACKAGE 4000 Its not obvious that this refers to a size, please rename to something more approriate. And why is it hardcoded to 4000? Shouldn't it be related to NLMSG_GOODSIZE? > +#define DATA_PACKAGE_ID 4008 Its even less obvious that this is the netlink attribute size. Its obfuscation anyway, just open-code RTA_SPACE(new name of DATA_PACKAGE). > + > +/* struct per flow - kernel */ > +struct tcn_control > +{ > + struct list_head full_buffer_list; > + struct list_head empty_buffer_list; > + struct buflist * buffer_in_use; > + int *offsetpos; /* pointer to actual pos in the buffer in > use */ > + int flowid; > +}; > + > +struct tcn_statistic > +{ > + int packetcount; > + int packetok; > + int normaldelay; > + int drops; > + int dupl; > + int corrupt; > + int novaliddata; > + int reloadbuffer; Also unsigned please. > > diff -uprN -X linux-2.6.23.8/Documentation/dontdiff > linux-2.6.23.8/net/sched/sch_api.c linux-2.6.23.8_mod/net/sched/sch_api.c > --- linux-2.6.23.8/net/sched/sch_api.c 2007-11-16 > 19:14:27.000000000 +0100 > +++ linux-2.6.23.8_mod/net/sched/sch_api.c 2007-12-21 > 19:42:49.000000000 +0100 > @@ -28,6 +28,7 @@ > #include <linux/list.h> > #include <linux/hrtimer.h> > > +#include <net/sock.h> > #include <net/netlink.h> > #include <net/pkt_sched.h> > > @@ -841,6 +842,62 @@ rtattr_failure: > nlmsg_trim(skb, b); > return -1; > } > +static int tc_fill(struct sk_buff *skb, struct Qdisc *q, u32 clid, > + u32 pid, u32 seq, u16 flags, int event) > +{ > + struct tcmsg *tcm; > + struct nlmsghdr *nlh; > + unsigned char *b = skb_tail_pointer(skb); > + > + nlh = NLMSG_NEW(skb, pid, seq, event, sizeof(*tcm), flags); > + tcm = NLMSG_DATA(nlh); > + tcm->tcm_family = AF_UNSPEC; > + tcm->tcm__pad1 = 0; > + tcm->tcm__pad2 = 0; > + tcm->tcm_ifindex = q->dev->ifindex; > + tcm->tcm_parent = clid; > + tcm->tcm_handle = q->handle; > + tcm->tcm_info = atomic_read(&q->refcnt); > + RTA_PUT(skb, TCA_KIND, IFNAMSIZ, q->ops->id); > + if (q->ops->dump && q->ops->dump(q, skb) < 0) > + goto rtattr_failure; > + > + nlh->nlmsg_len = skb_tail_pointer(skb) - b; > + > + return skb->len; Why is this function not used by tc_fill_qdisc? > + > +nlmsg_failure: > +rtattr_failure: > + nlmsg_trim(skb, b); > + return -1; > +} > + > +int qdisc_notify_pid(int pid, struct nlmsghdr *n, > + u32 clid, struct Qdisc *old, struct Qdisc *new) > +{ > + struct sk_buff *skb; > + skb = alloc_skb(NLMSG_GOODSIZE, gfp_any()); > + if (!skb) > + return -ENOBUFS; > + > + if (old && old->handle) { > + if (tc_fill(skb, old, clid, pid, n->nlmsg_seq, > + 0, RTM_DELQDISC) < 0) > + goto err_out; > + } > + if (new) { > + if (tc_fill(skb, new, clid, pid, n->nlmsg_seq, > + old ? NLM_F_REPLACE : 0, RTM_NEWQDISC) < 0) > + goto err_out; > + } > + if (skb->len) > + return rtnetlink_send(skb, pid, RTNLGRP_TC, n->nlmsg_flags); And why do you need a new notification function? qdisc_notify seems perfectly fine for this. > + > +err_out: > + kfree_skb(skb); > + return -EINVAL; > +} > +EXPORT_SYMBOL(qdisc_notify_pid); > > static int qdisc_notify(struct sk_buff *oskb, struct nlmsghdr *n, > u32 clid, struct Qdisc *old, struct Qdisc *new) > @@ -848,7 +905,7 @@ static int qdisc_notify(struct sk_buff * > struct sk_buff *skb; > u32 pid = oskb ? NETLINK_CB(oskb).pid : 0; > > - skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); > + skb = alloc_skb(NLMSG_GOODSIZE, gfp_any()); You don't even use qdisc_notify anywhere in your patch, why this change? > if (!skb) > return -ENOBUFS; > > diff -uprN -X linux-2.6.23.8/Documentation/dontdiff > linux-2.6.23.8/net/sched/sch_netem.c > linux-2.6.23.8_mod/net/sched/sch_netem.c > --- linux-2.6.23.8/net/sched/sch_netem.c 2007-11-16 > 19:14:27.000000000 +0100 > +++ linux-2.6.23.8_mod/net/sched/sch_netem.c 2007-12-21 > 19:42:49.000000000 +0100 > +/* don't call this function directly. It is called after > + * a packet has been taken out of a buffer and it was the last. > + */ > +static int reload_flowbuffer(struct netem_sched_data *q, struct Qdisc > *sch) > +{ > + struct tcn_control *flow = q->flowbuffer; > + struct nlmsghdr n; > + struct buflist *element = list_entry(flow->full_buffer_list.next, > + struct buflist, list); > + /* the current buffer is empty */ > + list_add_tail(&flow->buffer_in_use->list, &flow->empty_buffer_list); > + > + if (list_empty(&q->flowbuffer->full_buffer_list)) { > + printk(KERN_ERR "netem: reload_flowbuffer, no full buffer\n"); > + return -EFAULT; > + } > + > + list_del_init(&element->list); > + flow->buffer_in_use = element; > + flow->offsetpos = (int *)element->buf; > + memset(&n, 0, sizeof(struct nlmsghdr)); > + n.nlmsg_seq = 1; > + n.nlmsg_flags = NLM_F_REQUEST; This netlink header faking is horrible, please just change qdisc_notify to deal with absent netlink headers appropriately. The sequence number used for kernel notifications not related to userspace requests is 0. > + if (qdisc_notify_pid(q->flowid, &n, sch->parent, NULL, sch) < 0) > + printk(KERN_ERR "netem: unable to request for more data\n"); netlink_set_err() causing userspace to request all current information seems like better error handling. The remaining netem part also looks like it could use a lot of improvement, you shouldn't need manual notifications on destruction, change, etc., all this is already handled by sch_api. There should be a single new notification in netem_enqueue(), calling qdisc_notify(), which dumps the current state to userspace. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement: kernel 2007-12-28 16:08 ` Patrick McHardy @ 2007-12-28 21:02 ` Ariane Keller 0 siblings, 0 replies; 26+ messages in thread From: Ariane Keller @ 2007-12-28 21:02 UTC (permalink / raw) To: Patrick McHardy Cc: Ariane Keller, Ben Greear, Stephen Hemminger, netdev, Rainer Baumann Thanks for your comments! Patrick McHardy wrote: > Ariane Keller wrote: >> +/* must be divisible by 4 (=#pkts)*/ >> +#define DATA_PACKAGE 4000 > > Its not obvious that this refers to a size, please rename > to something more approriate. And why is it hardcoded > to 4000? Shouldn't it be related to NLMSG_GOODSIZE? Ok, I can rename it to TRACE_DATA_PACKET_SIZE > >> +#define DATA_PACKAGE_ID 4008 > > Its even less obvious that this is the netlink attribute > size. Its obfuscation anyway, just open-code > RTA_SPACE(new name of DATA_PACKAGE). DATA_PACKAGE_ID corresponds to DATA_PACKAGE + 2 * sizeof(int). The two ints are a small header in front of each packet. I agree the name is really bad and I have to think about the whole thing with this header. >> + >> +int qdisc_notify_pid(int pid, struct nlmsghdr *n, >> + u32 clid, struct Qdisc *old, struct Qdisc *new) >> +{ >> + struct sk_buff *skb; >> + skb = alloc_skb(NLMSG_GOODSIZE, gfp_any()); >> + if (!skb) >> + return -ENOBUFS; >> + >> + if (old && old->handle) { >> + if (tc_fill(skb, old, clid, pid, n->nlmsg_seq, >> + 0, RTM_DELQDISC) < 0) >> + goto err_out; >> + } >> + if (new) { >> + if (tc_fill(skb, new, clid, pid, n->nlmsg_seq, >> + old ? NLM_F_REPLACE : 0, RTM_NEWQDISC) < 0) >> + goto err_out; >> + } >> + if (skb->len) >> + return rtnetlink_send(skb, pid, RTNLGRP_TC, n->nlmsg_flags); > > And why do you need a new notification function? qdisc_notify > seems perfectly fine for this. qdisc_notify results in acquiring a lock (q->stats_lock) which we already hold in this situation (qdisc_notify->tc_fill_qdisc->gnet_stats_start_copy_compat). Writing a new notification function may be wrong, but I do not know a better way. >> diff -uprN -X linux-2.6.23.8/Documentation/dontdiff >> linux-2.6.23.8/net/sched/sch_netem.c >> linux-2.6.23.8_mod/net/sched/sch_netem.c >> --- linux-2.6.23.8/net/sched/sch_netem.c 2007-11-16 >> 19:14:27.000000000 +0100 >> +++ linux-2.6.23.8_mod/net/sched/sch_netem.c 2007-12-21 >> 19:42:49.000000000 +0100 > >> +/* don't call this function directly. It is called after >> + * a packet has been taken out of a buffer and it was the last. >> + */ >> +static int reload_flowbuffer(struct netem_sched_data *q, struct Qdisc >> *sch) >> +{ >> + struct tcn_control *flow = q->flowbuffer; >> + struct nlmsghdr n; >> + struct buflist *element = list_entry(flow->full_buffer_list.next, >> + struct buflist, list); >> + /* the current buffer is empty */ >> + list_add_tail(&flow->buffer_in_use->list, &flow->empty_buffer_list); >> + >> + if (list_empty(&q->flowbuffer->full_buffer_list)) { >> + printk(KERN_ERR "netem: reload_flowbuffer, no full buffer\n"); >> + return -EFAULT; >> + } >> + >> + list_del_init(&element->list); >> + flow->buffer_in_use = element; >> + flow->offsetpos = (int *)element->buf; >> + memset(&n, 0, sizeof(struct nlmsghdr)); >> + n.nlmsg_seq = 1; >> + n.nlmsg_flags = NLM_F_REQUEST; > > This netlink header faking is horrible, please just change qdisc_notify > to deal with absent netlink headers appropriately. The sequence number > used for kernel notifications not related to userspace requests is 0. > >> + if (qdisc_notify_pid(q->flowid, &n, sch->parent, NULL, sch) < 0) >> + printk(KERN_ERR "netem: unable to request for more data\n"); > > netlink_set_err() causing userspace to request all current information > seems like better error handling. The remaining netem part also looks > like it could use a lot of improvement, you shouldn't need manual > notifications on destruction, change, etc., all this is already > handled by sch_api. There should be a single new notification in > netem_enqueue(), calling qdisc_notify(), which dumps the current > state to userspace. I can summarize the notifications which request for more data. But I do not (yet) know how I get rid of those which deal with the notification of the deletion of a qdisc. "tc qdisc add/change ... trace ..." start a new process (flowseed) which waits for kernel requests to send trace data packets to the netem module. If "tc qdisc change/del ..." is called the previously generated flowseed process needs to be terminated. I did this by sending a notification to the corresponding flowseed process. Upon receiving this notification the flowseed process terminates itself. Is there already an event generated by sch_api on which the flowseed process could listen in order to be notified when a given qdisc is deleted? Thanks a lot! Ariane > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/2] netem: trace enhancement: iproute 2007-12-04 22:21 ` Ben Greear ` (2 preceding siblings ...) 2007-12-23 19:54 ` [PATCH 0/2] netem: trace enhancement: kernel Ariane Keller @ 2007-12-23 19:54 ` Ariane Keller 3 siblings, 0 replies; 26+ messages in thread From: Ariane Keller @ 2007-12-23 19:54 UTC (permalink / raw) To: Ben Greear Cc: Ariane Keller, Patrick McHardy, Stephen Hemminger, netdev, Rainer Baumann The iproute patch is to big to send on the mailing list, since the distribution data have changed the directory. For ease of discussion I add the important changes in this mail. signed-of-by: Ariane Keller <ariane.keller@tik.ee.ethz.ch --- diff -uprN iproute2-2.6.23/netem/trace/flowseed.c iproute2-2.6.23_buf/netem/trace/flowseed.c --- iproute2-2.6.23/netem/trace/flowseed.c 1970-01-01 01:00:00.000000000 +0100 +++ iproute2-2.6.23_buf/netem/trace/flowseed.c 2007-12-12 08:43:01.000000000 +0100 @@ -0,0 +1,209 @@ +/* flowseed.c flowseedprocess to deliver values for packet delay, + * duplication, loss and curruption form userspace to netem + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Authors: Ariane Keller <arkeller@ee.ethz.ch> ETH Zurich + * Rainer Baumann <baumann@hypert.net> ETH Zurich + */ + +#include <ctype.h> +#include <stdio.h> +#include <fcntl.h> +#include <stdlib.h> +#include <string.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <unistd.h> +#include <sys/ipc.h> +#include <sys/sem.h> +#include <signal.h> + +#include "utils.h" +#include "linux/pkt_sched.h" + +#define DATA_PACKAGE 4000 +#define DATA_PACKAGE_ID DATA_PACKAGE + sizeof(unsigned int) + sizeof(int) +#define TCA_BUF_MAX (64*1024) +/* maximal amount of parallel flows */ +struct rtnl_handle rth; +unsigned int loop; +int infinity = 0; +int fdflowseed; +char *sendpkg; +int fid; +int initialized = 0; +int semid; +int moreData = 1, r = 0, rold = 0; +FILE * file; + + +int printfct(const struct sockaddr_nl *who, + struct nlmsghdr *n, + void *arg) +{ + struct { + struct nlmsghdr n; + struct tcmsg t; + char buf[TCA_BUF_MAX]; + } req; + struct tcmsg *t = NLMSG_DATA(n); + struct rtattr *tail = NULL; + struct tc_netem_qopt opt; + memset(&opt, 0, sizeof(opt)); + + if(n->nlmsg_type == RTM_DELQDISC) { + goto outerr; + } + else if(n->nlmsg_type == RTM_NEWQDISC){ + initialized = 1; + + memset(&req, 0, sizeof(req)); + req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)); + req.n.nlmsg_flags = NLM_F_REQUEST; + req.n.nlmsg_type = RTM_NEWQDISC; + req.t.tcm_family = AF_UNSPEC; + req.t.tcm_handle = t->tcm_handle; + req.t.tcm_parent = t->tcm_parent; + req.t.tcm_ifindex = t->tcm_ifindex; + + tail = NLMSG_TAIL(&req.n); +again: + if (loop <= 0 && !infinity){ + goto out; + } + if ((r = read(fdflowseed, sendpkg + rold, DATA_PACKAGE - rold)) >= 0) { + if (r + rold < DATA_PACKAGE) { + /* Tail of input file reached, + set rest at start from next iteration */ + rold = r; + fprintf(file, "flowseed: at end of file.\n"); + + if (lseek(fdflowseed, 0L, SEEK_SET) < 0){ + perror("lseek reset"); + goto out; + } + goto again; + } + r = 0; + rold = 0; + memcpy(sendpkg + DATA_PACKAGE, &fid, sizeof(int)); + memcpy(sendpkg + DATA_PACKAGE + sizeof(int), &moreData, sizeof(int)); + + /* opt has to be added for each netem request */ + if (addattr_l(&req.n, TCA_BUF_MAX, TCA_OPTIONS, &opt, sizeof(opt)) < 0){ + perror("add options"); + return -1; + } + + if(addattr_l(&req.n, TCA_BUF_MAX, TCA_NETEM_TRACE_DATA, sendpkg, DATA_PACKAGE_ID) < 0){ + perror("add data\n"); + return -1; + } + + tail->rta_len = (void *)NLMSG_TAIL(&req.n) - (void *)tail; + + if(rtnl_send(&rth, (char*)&req, req.n.nlmsg_len) < 0){ + perror("send data"); + return -1; + } + return 0; + } + } +/* no more data, what to do? we send a notification to the kernel module */ +out: + fprintf(stderr, "flowseed: Tail of input file reached. Exit.\n"); + fprintf(file, "flowseed: Tail of input file reached. Exit.\n"); + moreData = 0; + memcpy(sendpkg + DATA_PACKAGE, &fid, sizeof(int)); + memcpy(sendpkg + DATA_PACKAGE + sizeof(int), &moreData, sizeof(int)); + if (addattr_l(&req.n, TCA_BUF_MAX, TCA_OPTIONS, &opt, sizeof(opt)) < 0){ + perror("add options"); + goto outerr; + } + if(addattr_l(&req.n, TCA_BUF_MAX, TCA_NETEM_TRACE_DATA, sendpkg, DATA_PACKAGE_ID) < 0){ + perror("add data\n"); + goto outerr; + } + + tail->rta_len = (void *)NLMSG_TAIL(&req.n) - (void *)tail; + + if(rtnl_send(&rth, (char*)&req, req.n.nlmsg_len) < 0){ + perror("rtnl_send"); + } +outerr: + fprintf(file, "flowseed: outerr Exit.\n"); + fclose(file); + close(fdflowseed); + free(sendpkg); + rtnl_close(&rth); + exit(0); +} + +void sigact(int signal){ + if(initialized){ + return; + } + else{ + fprintf(stderr, "flowseed: not yet initialized. Exit\n"); + exit(0); + } +} +int main(int argc, char *argv[]) +{ + struct sembuf buf; + file = fopen("flowseedout.txt", "a+"); + fprintf(file, "flowseed: initial msg.\n"); + + if (argc < 3) { + printf("usage: <tracefilename> <loop>"); + return -1; + } + loop = strtoul(argv[2], NULL, 10); + if (loop == 0) + infinity = 1; + + if ((fdflowseed = open(argv[1], O_RDONLY, 0)) < 0) { + perror("cannot open tracefile"); + return -1; + } + + fid = getpid(); + sendpkg = malloc(DATA_PACKAGE_ID); + + if (rtnl_open(&rth, 0) < 0) { + perror("Cannot open rtnetlink"); + return -1; + } + ll_init_map(&rth); + /* we are ready to receive notifications */ + if((semid = semget(0x12345678, 1, IPC_CREAT | 0666))<0){ + perror("semget"); + return -1; + } + buf.sem_num = 0; + buf.sem_op = +1; + buf.sem_flg = SEM_UNDO; + if(semop(semid, &buf, 1) < 0){ + perror("semop"); + return -1; + } + /* if the user typed an invalid command we cannot detect this + * therefore we set a timer, if the timer expires before we receive + * any message from the kernel module, we assume there was an + * error and quit. + */ + signal(SIGALRM, sigact); + alarm(3); + + /* listen to notifications from kernel */ + if (rtnl_listen(&rth, printfct, NULL) < 0) { + perror("listen"); + rtnl_close(&rth); + exit(2); + } + return 0; +} diff -uprN iproute2-2.6.23/tc/q_netem.c iproute2-2.6.23_buf/tc/q_netem.c --- iproute2-2.6.23/tc/q_netem.c 2007-10-16 23:27:42.000000000 +0200 +++ iproute2-2.6.23_buf/tc/q_netem.c 2007-12-21 19:08:19.000000000 +0100 @@ -6,7 +6,12 @@ * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. * + * README files: iproute2/netem/distribution + * iproute2/netem/trace + * * Authors: Stephen Hemminger <shemminger@osdl.org> + * netem trace: Ariane Keller <arkeller@ee.ethz.ch> ETH Zurich + * Rainer Baumann <baumann@hypert.net> ETH Zurich * */ @@ -20,6 +25,9 @@ #include <arpa/inet.h> #include <string.h> #include <errno.h> +#include <sys/types.h> +#include <sys/ipc.h> +#include <sys/sem.h> #include "utils.h" #include "tc_util.h" @@ -34,7 +42,8 @@ static void explain(void) " [ drop PERCENT [CORRELATION]] \n" \ " [ corrupt PERCENT [CORRELATION]] \n" \ " [ duplicate PERCENT [CORRELATION]]\n" \ -" [ reorder PRECENT [CORRELATION] [ gap DISTANCE ]]\n"); +" [ reorder PRECENT [CORRELATION] [ gap DISTANCE ]]\n" \ +" [ trace PATH buf NR_BUFS loops NR_LOOPS [DEFAULT]\n"); } static void explain1(const char *arg) @@ -42,6 +51,7 @@ static void explain1(const char *arg) fprintf(stderr, "Illegal \"%s\"\n", arg); } +#define FLOWPATH "/usr/local/bin/flowseed" #define usage() return(-1) /* @@ -129,6 +139,7 @@ static int netem_parse_opt(struct qdisc_ struct tc_netem_corr cor; struct tc_netem_reorder reorder; struct tc_netem_corrupt corrupt; + struct tc_netem_trace traceopt; __s16 *dist_data = NULL; int present[__TCA_NETEM_MAX]; @@ -137,8 +148,12 @@ static int netem_parse_opt(struct qdisc_ memset(&cor, 0, sizeof(cor)); memset(&reorder, 0, sizeof(reorder)); memset(&corrupt, 0, sizeof(corrupt)); + memset(&traceopt, 0, sizeof(traceopt)); memset(present, 0, sizeof(present)); - + if (argc == 0) { + explain(); + return -1; + } while (argc > 0) { if (matches(*argv, "limit") == 0) { NEXT_ARG(); @@ -164,7 +179,7 @@ static int netem_parse_opt(struct qdisc_ if (NEXT_IS_NUMBER()) { NEXT_ARG(); ++present[TCA_NETEM_CORR]; - if (get_percent(&cor.delay_corr, *argv)) { + if (get_percent(&cor.delay_corr, *argv)) { explain1("latency"); return -1; } @@ -243,6 +258,75 @@ static int netem_parse_opt(struct qdisc_ } else if (strcmp(*argv, "help") == 0) { explain(); return -1; + } else if (strcmp(*argv, "trace") == 0) { + int fd; + int execvl; + char *filename; + int pid; + + /*get ticks correct since tracefile is in us, + *and ticks may not be equal to us + */ + get_ticks(&traceopt.ticks, "1000us"); + NEXT_ARG(); + filename = *argv; + if ((fd = open(filename, O_RDONLY, 0)) < 0) { + fprintf(stderr, "Cannot open trace file %s! \n", filename); + return -1; + } + close(fd); + NEXT_ARG(); + if(strcmp(*argv, "buf") == 0) { + NEXT_ARG(); + traceopt.nr_bufs = atoi(*argv); + } + else{ + explain(); + return -1; + } + NEXT_ARG(); + if (strcmp(*argv, "loops") == 0 && NEXT_IS_NUMBER()) { + NEXT_ARG(); + /*child will load tracefile to kernel */ + switch (pid = fork()) { + case -1:{ + fprintf(stderr, + "Cannot fork\n"); + return -1; + } + case 0:{ + execvl = execl(FLOWPATH, "flowseed", filename, *argv, NULL); + if (execvl < 0) { + fprintf(stderr, + "starting child failed\n"); + return -1; + } + } + default:{ + /* parent has to wait until child has done rtnl_open. + * otherwise the kernel module cannot send a notification + * to the child + */ + int semid = semget(0x12345678, 1, IPC_CREAT | 0666); + struct sembuf buf; + buf.sem_num = 0; + buf.sem_op = -1; + buf.sem_flg = SEM_UNDO; + semop(semid, &buf, 1); + semctl(semid, 0, IPC_RMID); + } + } + } + else { + explain(); + return -1; + } + traceopt.def = 0; + if (NEXT_IS_NUMBER()) { + NEXT_ARG(); + traceopt.def = atoi(*argv); + } + traceopt.fid = pid; } else { fprintf(stderr, "What is \"%s\"?\n", *argv); explain(); @@ -291,7 +375,13 @@ static int netem_parse_opt(struct qdisc_ dist_data, dist_size*sizeof(dist_data[0])) < 0) return -1; } - tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail; + if (traceopt.fid) { + if (addattr_l(n, TCA_BUF_MAX, TCA_NETEM_TRACE, &traceopt, + sizeof(traceopt)) < 0) + return -1; + } + + tail->rta_len = (void *)NLMSG_TAIL(n) - (void *)tail; return 0; } @@ -300,6 +390,8 @@ static int netem_print_opt(struct qdisc_ const struct tc_netem_corr *cor = NULL; const struct tc_netem_reorder *reorder = NULL; const struct tc_netem_corrupt *corrupt = NULL; + const struct tc_netem_trace *traceopt = NULL; + const struct tc_netem_stats *tracestats = NULL; struct tc_netem_qopt qopt; int len = RTA_PAYLOAD(opt) - sizeof(qopt); SPRINT_BUF(b1); @@ -333,9 +425,31 @@ static int netem_print_opt(struct qdisc_ return -1; corrupt = RTA_DATA(tb[TCA_NETEM_CORRUPT]); } + if (tb[TCA_NETEM_TRACE]) { + if (RTA_PAYLOAD(tb[TCA_NETEM_TRACE]) < sizeof(*traceopt)) + return -1; + traceopt = RTA_DATA(tb[TCA_NETEM_TRACE]); + } + if (tb[TCA_NETEM_STATS]) { + if (RTA_PAYLOAD(tb[TCA_NETEM_STATS]) < sizeof(*tracestats)) + return -1; + tracestats = RTA_DATA(tb[TCA_NETEM_STATS]); + } } fprintf(f, "limit %d", qopt.limit); + if (traceopt && traceopt->fid) { + fprintf(f, " trace\n"); + + fprintf(f, "packetcount= %d\n", tracestats->packetcount); + fprintf(f, "packetok= %d\n", tracestats->packetok); + fprintf(f, "normaldelay= %d\n", tracestats->normaldelay); + fprintf(f, "drops= %d\n", tracestats->drops); + fprintf(f, "dupl= %d\n", tracestats->dupl); + fprintf(f, "corrupt= %d\n", tracestats->corrupt); + fprintf(f, "novaliddata= %d\n", tracestats->novaliddata); + fprintf(f, "bufferreload= %d\n", tracestats->reloadbuffer); + } if (qopt.latency) { fprintf(f, " delay %s", sprint_ticks(qopt.latency, b1)); ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2007-12-28 21:02 UTC | newest] Thread overview: 26+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-11-20 22:11 [PATCH 0/2] netem: trace enhancement Ariane Keller 2007-11-27 13:57 ` Ariane Keller 2007-11-29 21:45 ` Stephen Hemminger 2007-11-29 22:03 ` Patrick McHardy 2007-11-30 16:25 ` Ariane Keller 2007-12-03 7:45 ` Patrick McHardy 2007-12-03 9:12 ` Ariane Keller 2007-12-03 17:35 ` Patrick McHardy 2007-12-03 18:29 ` Ben Greear 2007-12-04 14:45 ` Ariane Keller 2007-12-04 14:58 ` Patrick McHardy 2007-12-05 12:57 ` Ariane Keller 2007-12-05 13:05 ` Patrick McHardy 2007-12-10 14:32 ` Ariane Keller 2007-12-12 23:13 ` Stephen Hemminger 2007-12-04 17:40 ` Ben Greear 2007-12-04 17:54 ` Ariane Keller 2007-12-04 18:07 ` Ben Greear 2007-12-04 21:41 ` Ariane Keller 2007-12-04 22:21 ` Ben Greear 2007-12-05 6:12 ` Ariane Keller 2007-12-23 19:54 ` Ariane Keller 2007-12-23 19:54 ` [PATCH 0/2] netem: trace enhancement: kernel Ariane Keller 2007-12-28 16:08 ` Patrick McHardy 2007-12-28 21:02 ` Ariane Keller 2007-12-23 19:54 ` [PATCH 0/2] netem: trace enhancement: iproute Ariane Keller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).