* Lightweight packet timestamping
From: Federico Parola @ 2020-06-04 13:30 UTC
To: xdp-newbies@vger.kernel.org

Hello everybody,

I'm implementing a token bucket algorithm to apply a rate limit to traffic, and I need the timestamp of each packet to update the bucket. To get this information I'm using the bpf_ktime_get_ns() helper, but I've discovered that it has a non-negligible impact on performance. I've seen there is work in progress to make hardware timestamps available to XDP programs, but I don't know whether this feature is already available. Is there a faster way to retrieve this information?

Thanks for your attention.

Federico Parola
* Re: Lightweight packet timestamping
From: David Ahern @ 2020-06-05 23:34 UTC
To: Federico Parola, xdp-newbies@vger.kernel.org

On 6/4/20 7:30 AM, Federico Parola wrote:
> Hello everybody,
> I'm implementing a token bucket algorithm to apply rate limit to traffic and I need the timestamp of packets to update the bucket. To get this information I'm using the bpf_ktime_get_ns() helper but I've discovered it has a non negligible impact on performance. I've seen there is work in progress to make hardware timestamps available to XDP programs, but I don't know if this feature is already available. Is there a faster way to retrieve this information?
> Thanks for your attention.

bpf_ktime_get_ns() should be fairly light. What kind of performance loss are you seeing with it?

XDP does not support access to hardware timestamps at the moment.
* Re: Lightweight packet timestamping
From: Federico Parola @ 2020-06-10 14:48 UTC
To: xdp-newbies

On 06/06/20 01:34, David Ahern wrote:
> bpf_ktime_get_ns should be fairly light. What kind of performance loss
> are you seeing with it?

I've run some tests on a program that forwards packets between two interfaces and applies a rate limit. Using bpf_ktime_get_ns() I can process up to 3.84 Mpps; if I replace the helper with a lookup in a map that contains the current timestamp, updated from user space, I go up to 4.48 Mpps.

> XDP does not support access to h/w timestamps at the moment.

I see. I think I'll keep the map solution for now, since I don't need nanosecond precision.

Thank you.

Federico
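For readers unfamiliar with the token bucket mentioned above, here is a minimal sketch of the refill-and-admit step in plain user-space C. It is not Federico's code: the struct, the field names and tb_admit() are hypothetical, it assumes one token per packet, and it assumes elapsed * rate_pps fits in 64 bits. The now_ns argument can come from any monotonic source, whether bpf_ktime_get_ns() or a coarser timestamp read from a map.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical token bucket state; all names are illustrative. */
struct token_bucket {
    uint64_t tokens;    /* tokens currently available (one token = one packet) */
    uint64_t capacity;  /* burst size: maximum number of tokens held */
    uint64_t rate_pps;  /* refill rate in tokens per second */
    uint64_t last_ns;   /* time already accounted for by refills, in ns */
};

/*
 * Refill the bucket from the elapsed time, then try to take one token.
 * Returns 1 if the packet may pass, 0 if it should be dropped.
 * Assumption: elapsed * rate_pps does not overflow 64 bits.
 */
int tb_admit(struct token_bucket *tb, uint64_t now_ns)
{
    assert(tb->rate_pps > 0);

    if (now_ns > tb->last_ns) {
        uint64_t elapsed = now_ns - tb->last_ns;
        uint64_t refill = elapsed * tb->rate_pps / NSEC_PER_SEC;

        if (refill > 0) {
            tb->tokens += refill;
            if (tb->tokens >= tb->capacity) {
                /* Bucket is full: no credit accumulates beyond capacity. */
                tb->tokens = tb->capacity;
                tb->last_ns = now_ns;
            } else {
                /* Advance only by the time the added tokens account for,
                 * so the fractional remainder is kept for the next call. */
                tb->last_ns += refill * NSEC_PER_SEC / tb->rate_pps;
            }
        }
    }

    if (tb->tokens == 0)
        return 0;
    tb->tokens--;
    return 1;
}
```

With a timestamp that only advances every millisecond, tokens arrive in steps of roughly rate_pps / 1000 per tick, so the capacity has to be at least that large or the configured rate cannot be reached.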
* Re: Lightweight packet timestamping
From: Toke Høiland-Jørgensen @ 2020-06-10 21:09 UTC
To: Federico Parola, xdp-newbies

Federico Parola <fede.parola@hotmail.it> writes:

> I've run some tests on a program forwarding packets between two
> interfaces and applying rate limit: using the bpf_ktime_get_ns() I can
> process up to 3.84 Mpps, if I replace the helper with a lookup on a map
> containing the current timestamp updated in user space I go up to 4.48
> Mpps.

Can you share more details on the platform you're running this on? I.e., CPU and chipset details, network driver, etc.

-Toke
* Re: Lightweight packet timestamping
From: Jesper Dangaard Brouer @ 2020-06-16 16:00 UTC
To: Toke Høiland-Jørgensen
Cc: brouer, Federico Parola, xdp-newbies, Kanthi P

On Wed, 10 Jun 2020 23:09:34 +0200 Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Federico Parola <fede.parola@hotmail.it> writes:
>
> > I've run some tests on a program forwarding packets between two
> > interfaces and applying rate limit: using the bpf_ktime_get_ns() I can
> > process up to 3.84 Mpps, if I replace the helper with a lookup on a map
> > containing the current timestamp updated in user space I go up to 4.48
> > Mpps.

((1/3.84*1000)-(1/4.48*1000) = 37.20 ns overhead)

I was about to suggest something close to this: call bpf_ktime_get_ns() only once per NAPI poll cycle and store the timestamp in a map, provided you don't need very high per-packet precision. You can even use a per-CPU map to store the info (to avoid cross-CPU cache traffic), because softirq keeps RX processing pinned to a CPU.

It sounds like you update the timestamp from userspace, is that true? (Quote: "current timestamp updated in user space")

I would suggest leveraging the softirq tracepoints (use SEC("raw_tracepoint/") for low overhead), e.g. irq:softirq_entry (see where the kernel calls trace_softirq_entry), to update the map once per NAPI/net_rx_action. I have a bpftrace-based tool [1] that measures network softirq latency, i.e. the time from "softirq_raise" until the softirq actually runs at "softirq_entry". You can leverage ideas from that script, such as 'vec == 3' being NET_RX_SOFTIRQ, to limit this to networking.

[1] https://github.com/xdp-project/xdp-project/blob/master/areas/latency/softirq_net_latency.bt

> Can you share more details on the platform you're running this on?
> I.e., CPU and chipset details, network driver, etc.

Yes, please. I plan to work on an XDP feature for extracting hardware offload info from the driver's descriptor, such as timestamps, VLAN, RSS hash, checksum, etc. If you tell me which NIC driver you are using, I can make sure to include it in the supported drivers.

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
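The 37.20 ns figure in the message above follows from converting each packet rate into a per-packet time budget and subtracting. A small sketch of that arithmetic in C, with hypothetical function names:

```c
#include <assert.h>

/* Per-packet time budget in nanoseconds for a rate given in Mpps. */
double ns_per_packet(double mpps)
{
    assert(mpps > 0.0);
    /* 1e9 ns per second divided by (mpps * 1e6) packets per second. */
    return 1000.0 / mpps;
}

/* Extra nanoseconds per packet spent by the slower variant. */
double overhead_ns(double slow_mpps, double fast_mpps)
{
    return ns_per_packet(slow_mpps) - ns_per_packet(fast_mpps);
}
```

For the rates reported in the thread, ns_per_packet(3.84) is about 260.4 ns and ns_per_packet(4.48) is about 223.2 ns, so overhead_ns(3.84, 4.48) is about 37.2 ns per packet. Note that this is the difference between the two whole programs, so it is the cost of the helper minus the cost of the map lookup that replaced it.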
* Re: Lightweight packet timestamping
From: David Ahern @ 2020-06-16 16:07 UTC
To: Jesper Dangaard Brouer, Toke Høiland-Jørgensen
Cc: Federico Parola, xdp-newbies, Kanthi P

On 6/16/20 10:00 AM, Jesper Dangaard Brouer wrote:
> ((1/3.84*1000)-(1/4.48*1000) = 37.20 ns overhead)

I did the same math yesterday and ran some tests as well. I am really surprised the timestamp overhead is that high.

> I would suggest that you can leverage the softirq tracepoints (use
> SEC("raw_tracepoint/") for low overhead). E.g. irq:softirq_entry
> (see when kernel calls trace_softirq_entry) to update the map once per
> NAPI/net_rx_action.

I have code that measures the overhead of net_rx_action:
https://github.com/dsahern/bpf-progs/blob/master/ksrc/net_rx_action.c

This use case would only need the enter probe.
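The idea in the two messages above is to read the clock once when a poll cycle starts and let every packet in that cycle reuse the value. The following is a user-space model of that pattern only, not BPF code: rx_cpu_state, poll_cycle_enter() and the other names are hypothetical, and clock_gettime(CLOCK_MONOTONIC) stands in for bpf_ktime_get_ns(). In a real setup the enter hook would be a BPF program attached at softirq_entry or at the entry of net_rx_action, writing into a per-CPU map that the XDP program reads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-CPU state: one cached timestamp per RX poll cycle. */
struct rx_cpu_state {
    uint64_t cached_ns;    /* written once per cycle, read for every packet */
    uint64_t clock_reads;  /* how many times the clock was actually read */
};

/* Read the monotonic clock in nanoseconds. */
uint64_t monotonic_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Stand-in for the "enter" probe: runs once when a poll cycle starts. */
void poll_cycle_enter(struct rx_cpu_state *st)
{
    st->cached_ns = monotonic_ns();
    st->clock_reads++;
}

/* Stand-in for the per-packet path: no clock read, just a load. */
uint64_t packet_timestamp(const struct rx_cpu_state *st)
{
    return st->cached_ns;
}

/* Process one cycle of `budget` packets and return the timestamp they share. */
uint64_t run_poll_cycle(struct rx_cpu_state *st, unsigned int budget)
{
    uint64_t last = 0;

    poll_cycle_enter(st);
    for (unsigned int i = 0; i < budget; i++)
        last = packet_timestamp(st);
    return last;
}
```

The trade-off is precision for cost: every packet in a cycle carries the time at which the cycle began, so the error is bounded by the length of one poll cycle, while the clock is read once per cycle instead of once per packet.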
* Re: Lightweight packet timestamping
From: Federico Parola @ 2020-06-17 9:47 UTC
To: xdp-newbies

On 16/06/20 18:07, David Ahern wrote:
> On 6/16/20 10:00 AM, Jesper Dangaard Brouer wrote:
>> ((1/3.84*1000)-(1/4.48*1000) = 37.20 ns overhead)
>
> I had the same math yesterday and did some tests as well. I am really
> surprised the timestamp is that high.

Do your tests show a similar overhead?

>> I would suggest that you can leverage the softirq tracepoints (use
>> SEC("raw_tracepoint/") for low overhead). E.g. irq:softirq_entry
>> (see when kernel calls trace_softirq_entry) to update the map once per
>> NAPI/net_rx_action.
>
> I have code that measures the overhead of net_rx_action:
> https://github.com/dsahern/bpf-progs/blob/master/ksrc/net_rx_action.c
>
> this use case would just need the enter probe.

Thanks for your suggestion. Currently I have a user-space thread that updates a PERCPU_ARRAY map with the current timestamp every millisecond, and the precision seems to be good enough. I'll check your solution as well.

>>> Can you share more details on the platform you're running this on?
>>> I.e., CPU and chipset details, network driver, etc.
>>
>> Yes, please. I plan to work on XDP-feature of extracting hardware
>> offload-info from the drivers descriptor, like timestamps, vlan,
>> rss-hash, checksum, etc. If you tell me what NIC driver you are using,
>> I could make sure to include that in the supported drivers.

I ran the test on an Intel Xeon Gold 5120 @ 2.60GHz, on a single core, using a dual-port 40 GbE Intel XL710 NIC (i40e driver), forwarding 64-byte frames between the ports.

Thanks for your help.

Federico
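The user-space updater described in the message above can be sketched roughly as follows. This is an illustration under stated assumptions, not Federico's implementation: publish_fn, record_publish() and run_timestamp_updater() are hypothetical names, and the actual map write is hidden behind a callback so the example stays free of libbpf. In a real program the callback would typically pass the buffer to libbpf's bpf_map_update_elem(), which for per-CPU maps expects one value slot per possible CPU. CLOCK_MONOTONIC is used because, to my knowledge, that is the clock bpf_ktime_get_ns() reports.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical hook: writes one timestamp per possible CPU into the map. */
typedef int (*publish_fn)(const uint64_t *values, unsigned int ncpus, void *ctx);

/* Example stand-in for the hook that only records what it was given. */
struct recorder {
    uint64_t last;       /* timestamp from the most recent publish */
    unsigned int calls;  /* number of publishes so far */
};

int record_publish(const uint64_t *values, unsigned int ncpus, void *ctx)
{
    struct recorder *rec = ctx;

    rec->last = values[ncpus - 1];
    rec->calls++;
    return 0;
}

/*
 * Publish the current monotonic time `rounds` times, sleeping period_ns
 * between updates. Returns 0 on success and -1 on any failure.
 */
int run_timestamp_updater(unsigned int ncpus, unsigned int rounds,
                          long period_ns, publish_fn publish, void *ctx)
{
    assert(ncpus > 0 && publish != NULL);
    assert(period_ns >= 0 && period_ns < 1000000000L);

    uint64_t *values = calloc(ncpus, sizeof(*values));
    if (values == NULL)
        return -1;

    struct timespec period = { .tv_sec = 0, .tv_nsec = period_ns };
    int rc = 0;

    for (unsigned int r = 0; r < rounds; r++) {
        struct timespec ts;

        if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
            rc = -1;
            break;
        }
        uint64_t now = (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;

        /* Every CPU slot receives the same coarse timestamp. */
        for (unsigned int i = 0; i < ncpus; i++)
            values[i] = now;

        if (publish(values, ncpus, ctx) != 0) {
            rc = -1;
            break;
        }
        nanosleep(&period, NULL);
    }

    free(values);
    return rc;
}
```

A call such as run_timestamp_updater(ncpus, rounds, 1000000L, publish, ctx) matches the one-millisecond period mentioned in the message. The sleep does not compensate for the time spent publishing, so the real update interval is slightly longer than the nominal period.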