From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Tom Herbert <tom@herbertland.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
David Miller <davem@davemloft.net>,
Eric Dumazet <eric.dumazet@gmail.com>,
Or Gerlitz <gerlitz.or@gmail.com>,
Eric Dumazet <edumazet@google.com>,
Linux Kernel Network Developers <netdev@vger.kernel.org>,
Alexander Duyck <alexander.duyck@gmail.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Daniel Borkmann <borkmann@iogearbox.net>,
Marek Majkowski <marek@cloudflare.com>,
Hannes Frederic Sowa <hannes@stressinduktion.org>,
Florian Westphal <fw@strlen.de>, Paolo Abeni <pabeni@redhat.com>,
John Fastabend <john.r.fastabend@intel.com>,
Amir Vadai <amirva@gmail.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Vladislav Yasevich <vyasevich@gmail.com>,
brouer@redhat.com
Subject: Re: Bypass at packet-page level (Was: Optimizing instruction-cache, more packets at each stage)
Date: Thu, 28 Jan 2016 10:25:30 +0100
Message-ID: <20160128102530.5c07a8e7@redhat.com>
In-Reply-To: <CALx6S350ra6fmZrcabasndWBq4A3L-xKLS__jy1gi3xmhMc=sA@mail.gmail.com>
On Wed, 27 Jan 2016 18:50:27 -0800
Tom Herbert <tom@herbertland.com> wrote:
> On Wed, Jan 27, 2016 at 12:47 PM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> > On Mon, 25 Jan 2016 23:10:16 +0100
> > Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >
> >> On Mon, 25 Jan 2016 09:50:16 -0800 John Fastabend <john.fastabend@gmail.com> wrote:
> >>
> >> > On 16-01-25 09:09 AM, Tom Herbert wrote:
> >> > > On Mon, Jan 25, 2016 at 5:15 AM, Jesper Dangaard Brouer
> >> > > <brouer@redhat.com> wrote:
> >> > >>
> >> [...]
> >> > >>
> >> > >> There are two ideas, getting mixed up here. (1) bundling from the
> >> > >> RX-ring, (2) allowing to pick up the "packet-page" directly.
> >> > >>
> >> > >> Bundling (1) is something that seems natural, and which helps us
> >> > >> amortize the cost between layers (and utilizes the icache better).
> >> > >> Let's keep that in another thread.
> >> > >>
> >> > >> This (2) direct forwarding of "packet-pages" is a fairly extreme idea,
> >> > >> BUT it has the potential of being a new integration point for
> >> > >> "selective" bypass solutions, and of bringing RAW/af_packet (RX) up to
> >> > >> speed with those bypass solutions.
> >> >
> >> [...]
> >> >
> >> > Jesper, at least for your (2) case, what are we missing with the
> >> > bifurcated/queue-splitting work? Are you really after systems
> >> > without SR-IOV support, or are you trying to get this on the order
> >> > of queues instead of VFs?
> >>
> >> I'm not saying something is missing for bifurcated/queue splitting work.
> >> I'm not trying to work around SR-IOV.
> >>
> >> This is an extreme idea, which I got while looking at the lowest RX layer.
> >>
> >>
> >> Before working any further on this idea/path, I need/want to evaluate
> >> if it makes sense from a performance point of view. I need to evaluate
> >> if "pulling" out these "packet-pages" is fast enough to compete with
> >> DPDK/netmap. Else it makes no sense to work on this path.
> >>
> >> As a first step to evaluate this lowest RX layer, I'm simply hacking
> >> the drivers (ixgbe and mlx5) to drop/discard packets within-the-driver.
> >> For now, simply replacing napi_gro_receive() with dev_kfree_skb(), and
> >> measuring the "RX-drop" performance.
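A minimal sketch of that driver hack (illustrative only, not the actual
patch; the exact variable names depend on the driver):

```c
/* Inside the driver's RX completion loop (e.g. mlx5e_poll_rx_cq()):
 * free the skb immediately instead of handing it to the stack, so
 * only the driver + skb alloc/free RX cost gets measured. */
skb->protocol = eth_type_trans(skb, netdev);
/* was: napi_gro_receive(napi, skb); */
dev_kfree_skb(skb);
```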
> >>
> >> Next step was to avoid the skb alloc+free calls, but doing so is more
> >> complicated than I first anticipated, as the SKB is tied in fairly
> >> heavily. Thus, right now I'm instead hooking in my bulk alloc+free
> >> API, as that will remove/mitigate most of the overhead of the
> >> kmem_cache/slab-allocators.
> >
> > I've tried to deduce what kind of speeds we can achieve at this lowest
> > RX layer, by dropping packets directly in the mlx5/100G driver.
> > Just replacing napi_gro_receive() with dev_kfree_skb() was fairly
> > depressing, showing only 6.2 Mpps (6253970 pps => 159.9 ns) (single core).
> >
> > Looking at the perf report showed major cache-miss in eth_type_trans(29%/47ns).
> >
> > And the driver is hitting the SLUB slowpath quite badly (because it
> > preallocs SKBs and binds them to the RX ring; usually this test case
> > would hit the SLUB "recycle" fastpath):
> >
> > Group-report: kmem_cache/SLUB allocator functions ::
> > 5.00 % ~= 8.0 ns <= __slab_free
> > 4.91 % ~= 7.9 ns <= cmpxchg_double_slab.isra.65
> > 4.22 % ~= 6.7 ns <= kmem_cache_alloc
> > 1.68 % ~= 2.7 ns <= kmem_cache_free
> > 1.10 % ~= 1.8 ns <= ___slab_alloc
> > 0.93 % ~= 1.5 ns <= __cmpxchg_double_slab.isra.54
> > 0.65 % ~= 1.0 ns <= __slab_alloc.isra.74
> > 0.26 % ~= 0.4 ns <= put_cpu_partial
> > Sum: 18.75 % => calc: 30.0 ns (sum: 30.0 ns) => Total: 159.9 ns
> >
> > To get around the cache-miss in eth_type_trans(), I created an
> > "icache-loop" in mlx5e_poll_rx_cq() that pulls all RX-ring packets "out"
> > before calling eth_type_trans(), reducing its cost to 2.45%.
> >
> > To mitigate the SLUB slowpath, I used my slab + SKB-napi bulk API, and
> > also tuned SLUB (with slub_nomerge slub_min_objects=128) to get bigger
> > slab-pages, and thus bigger bulk opportunities.
> >
> > This helped a lot: I can now drop 12 Mpps (12,088,767 pps => 82.7 ns).
> >
> > Group-report: kmem_cache/SLUB allocator functions ::
> > 4.99 % ~= 4.1 ns <= kmem_cache_alloc_bulk
> > 2.87 % ~= 2.4 ns <= kmem_cache_free_bulk
> > 0.24 % ~= 0.2 ns <= ___slab_alloc
> > 0.23 % ~= 0.2 ns <= __slab_free
> > 0.21 % ~= 0.2 ns <= __cmpxchg_double_slab.isra.54
> > 0.17 % ~= 0.1 ns <= cmpxchg_double_slab.isra.65
> > 0.07 % ~= 0.1 ns <= put_cpu_partial
> > 0.04 % ~= 0.0 ns <= unfreeze_partials.isra.71
> > 0.03 % ~= 0.0 ns <= get_partial_node.isra.72
> > Sum: 8.85 % => calc: 7.3 ns (sum: 7.3 ns) => Total: 82.7 ns
> >
> > The full perf report output below my signature is from the optimized case.
> >
> > SKB-related cost is 22.9 ns. However, 51.7% (11.84 ns) of that cost
> > originates from the memset of the SKB.
> >
> > Group-report: related to pattern "skb" ::
> > 17.92 % ~= 14.8 ns <= __napi_alloc_skb <== 80% memset(0) / rep stos
> > 3.29 % ~= 2.7 ns <= skb_release_data
> > 2.20 % ~= 1.8 ns <= napi_consume_skb
> > 1.86 % ~= 1.5 ns <= skb_release_head_state
> > 1.20 % ~= 1.0 ns <= skb_put
> > 1.14 % ~= 0.9 ns <= skb_release_all
> > 0.02 % ~= 0.0 ns <= __kfree_skb_flush
> > Sum: 27.63 % => calc: 22.9 ns (sum: 22.9 ns) => Total: 82.7 ns
> >
> > Doing a crude extrapolation: subtracting the SLUB (7.3 ns) and SKB
> > (22.9 ns) related costs from 82.7 ns leaves 52.5 ns, which extrapolates
> > to a maximum of 19 Mpps for pulling packet-pages off the RX ring.
> >
> > I don't know if 19 Mpps (52.5 ns "overhead") is fast enough to compete
> > with just mapping a RX HW queue/ring to netmap, or via SR-IOV to DPDK(?)
> >
> > But it was interesting to see how the lowest RX layer performs...
>
> Cool stuff!
Thanks :-)
> Looking at the typical driver receive path, I wonder if we should
> break netif_receive_skb (napi_gro_receive) into two parts: one utility
> function to create a list of received skbs and prefetch the data,
> called as the ring is processed; the other to give the list to the
> stack (e.g. netif_receive_skbs) and defer eth_type_trans as long as
> possible. Is something like this what you are contemplating?
Yes, that is exactly what I'm contemplating :-) That is idea "(1)".
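For concreteness, a rough sketch of such a two-stage RX loop
(hypothetical code: driver_build_skb_from_ring() and
netif_receive_skbs() don't exist, they just name the two halves
discussed above):

```c
struct sk_buff_head bundle;
struct sk_buff *skb;

__skb_queue_head_init(&bundle);

/* Stage 1: drain the RX ring into a list, prefetching packet data
 * and deferring eth_type_trans() as long as possible. */
while (budget-- && (skb = driver_build_skb_from_ring(rq)) != NULL) {
	prefetch(skb->data);
	__skb_queue_tail(&bundle, skb);
}

/* Stage 2: by now the prefetches have likely completed, so the
 * header read in eth_type_trans() avoids the cold-cache miss. */
skb_queue_walk(&bundle, skb)
	skb->protocol = eth_type_trans(skb, netdev);

netif_receive_skbs(&bundle);	/* hypothetical bulk hand-off */
```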
A natural extension to this work, which I expect Tom will love, is to
also use the idea for RPS. Once we have an SKB list in the stack/GRO
layer, we could build a local sk_buff_head list for each remote CPU, by
calling get_rps_cpu(), and then enqueue_list_to_backlog via a
skb_queue_splice_tail(&cpu_list, &cpu->sd->input_pkt_queue) call.

This would amortize the cost of transferring packets to a remote CPU,
which AFAIK Eric points out costs approx ~133 ns.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
> > Perf-report script:
> > * https://github.com/netoptimizer/network-testing/blob/master/bin/perf_report_pps_stats.pl
> >
> > Report: ALL functions ::
> > 19.71 % ~= 16.3 ns <= mlx5e_poll_rx_cq
> > 17.92 % ~= 14.8 ns <= __napi_alloc_skb
> > 9.54 % ~= 7.9 ns <= __free_page_frag
> > 7.16 % ~= 5.9 ns <= mlx5e_get_cqe
> > 6.37 % ~= 5.3 ns <= mlx5e_post_rx_wqes
> > 4.99 % ~= 4.1 ns <= kmem_cache_alloc_bulk
> > 3.70 % ~= 3.1 ns <= __alloc_page_frag
> > 3.29 % ~= 2.7 ns <= skb_release_data
> > 2.87 % ~= 2.4 ns <= kmem_cache_free_bulk
> > 2.45 % ~= 2.0 ns <= eth_type_trans
> > 2.43 % ~= 2.0 ns <= get_page_from_freelist
> > 2.36 % ~= 2.0 ns <= swiotlb_map_page
> > 2.20 % ~= 1.8 ns <= napi_consume_skb
> > 1.86 % ~= 1.5 ns <= skb_release_head_state
> > 1.25 % ~= 1.0 ns <= free_pages_prepare
> > 1.20 % ~= 1.0 ns <= skb_put
> > 1.14 % ~= 0.9 ns <= skb_release_all
> > 0.77 % ~= 0.6 ns <= __free_pages_ok
> > 0.59 % ~= 0.5 ns <= get_pfnblock_flags_mask
> > 0.59 % ~= 0.5 ns <= swiotlb_dma_mapping_error
> > 0.59 % ~= 0.5 ns <= unmap_single
> > 0.58 % ~= 0.5 ns <= _raw_spin_lock_irqsave
> > 0.57 % ~= 0.5 ns <= free_one_page
> > 0.56 % ~= 0.5 ns <= swiotlb_unmap_page
> > 0.52 % ~= 0.4 ns <= _raw_spin_lock
> > 0.46 % ~= 0.4 ns <= __mod_zone_page_state
> > 0.36 % ~= 0.3 ns <= __rmqueue
> > 0.36 % ~= 0.3 ns <= net_rx_action
> > 0.34 % ~= 0.3 ns <= __alloc_pages_nodemask
> > 0.31 % ~= 0.3 ns <= __zone_watermark_ok
> > 0.27 % ~= 0.2 ns <= mlx5e_napi_poll
> > 0.24 % ~= 0.2 ns <= ___slab_alloc
> > 0.23 % ~= 0.2 ns <= __slab_free
> > 0.22 % ~= 0.2 ns <= __list_del_entry
> > 0.21 % ~= 0.2 ns <= __cmpxchg_double_slab.isra.54
> > 0.21 % ~= 0.2 ns <= next_zones_zonelist
> > 0.20 % ~= 0.2 ns <= __list_add
> > 0.17 % ~= 0.1 ns <= __do_softirq
> > 0.17 % ~= 0.1 ns <= cmpxchg_double_slab.isra.65
> > 0.16 % ~= 0.1 ns <= __inc_zone_state
> > 0.12 % ~= 0.1 ns <= _raw_spin_unlock
> > 0.12 % ~= 0.1 ns <= zone_statistics
> > (Percent limit(0.1%) stop at "mlx5e_poll_tx_cq")
> > Sum: 99.45 % => calc: 82.3 ns (sum: 82.3 ns) => Total: 82.7 ns
> >
> > Group-report: related to pattern "eth_type_trans|mlx5|ixgbe|__iowrite64_copy" ::
> > (Driver related)
> > 19.71 % ~= 16.3 ns <= mlx5e_poll_rx_cq
> > 7.16 % ~= 5.9 ns <= mlx5e_get_cqe
> > 6.37 % ~= 5.3 ns <= mlx5e_post_rx_wqes
> > 2.45 % ~= 2.0 ns <= eth_type_trans
> > 0.27 % ~= 0.2 ns <= mlx5e_napi_poll
> > 0.09 % ~= 0.1 ns <= mlx5e_poll_tx_cq
> > Sum: 36.05 % => calc: 29.8 ns (sum: 29.8 ns) => Total: 82.7 ns
> >
> > Group-report: DMA functions ::
> > 2.36 % ~= 2.0 ns <= swiotlb_map_page
> > 0.59 % ~= 0.5 ns <= unmap_single
> > 0.59 % ~= 0.5 ns <= swiotlb_dma_mapping_error
> > 0.56 % ~= 0.5 ns <= swiotlb_unmap_page
> > Sum: 4.10 % => calc: 3.4 ns (sum: 3.4 ns) => Total: 82.7 ns
> >
> > Group-report: page_frag_cache functions ::
> > 9.54 % ~= 7.9 ns <= __free_page_frag
> > 3.70 % ~= 3.1 ns <= __alloc_page_frag
> > 2.43 % ~= 2.0 ns <= get_page_from_freelist
> > 1.25 % ~= 1.0 ns <= free_pages_prepare
> > 0.77 % ~= 0.6 ns <= __free_pages_ok
> > 0.59 % ~= 0.5 ns <= get_pfnblock_flags_mask
> > 0.57 % ~= 0.5 ns <= free_one_page
> > 0.46 % ~= 0.4 ns <= __mod_zone_page_state
> > 0.36 % ~= 0.3 ns <= __rmqueue
> > 0.34 % ~= 0.3 ns <= __alloc_pages_nodemask
> > 0.31 % ~= 0.3 ns <= __zone_watermark_ok
> > 0.21 % ~= 0.2 ns <= next_zones_zonelist
> > 0.16 % ~= 0.1 ns <= __inc_zone_state
> > 0.12 % ~= 0.1 ns <= zone_statistics
> > 0.02 % ~= 0.0 ns <= mod_zone_page_state
> > Sum: 20.83 % => calc: 17.2 ns (sum: 17.2 ns) => Total: 82.7 ns
> >
> > Group-report: kmem_cache/SLUB allocator functions ::
> > 4.99 % ~= 4.1 ns <= kmem_cache_alloc_bulk
> > 2.87 % ~= 2.4 ns <= kmem_cache_free_bulk
> > 0.24 % ~= 0.2 ns <= ___slab_alloc
> > 0.23 % ~= 0.2 ns <= __slab_free
> > 0.21 % ~= 0.2 ns <= __cmpxchg_double_slab.isra.54
> > 0.17 % ~= 0.1 ns <= cmpxchg_double_slab.isra.65
> > 0.07 % ~= 0.1 ns <= put_cpu_partial
> > 0.04 % ~= 0.0 ns <= unfreeze_partials.isra.71
> > 0.03 % ~= 0.0 ns <= get_partial_node.isra.72
> > Sum: 8.85 % => calc: 7.3 ns (sum: 7.3 ns) => Total: 82.7 ns
> >
> > Group-report: related to pattern "skb" ::
> > 17.92 % ~= 14.8 ns <= __napi_alloc_skb <== 80% memset(0) / rep stos
> > 3.29 % ~= 2.7 ns <= skb_release_data
> > 2.20 % ~= 1.8 ns <= napi_consume_skb
> > 1.86 % ~= 1.5 ns <= skb_release_head_state
> > 1.20 % ~= 1.0 ns <= skb_put
> > 1.14 % ~= 0.9 ns <= skb_release_all
> > 0.02 % ~= 0.0 ns <= __kfree_skb_flush
> > Sum: 27.63 % => calc: 22.9 ns (sum: 22.9 ns) => Total: 82.7 ns
> >
> > Group-report: Core network-stack functions ::
> > 0.36 % ~= 0.3 ns <= net_rx_action
> > 0.17 % ~= 0.1 ns <= __do_softirq
> > 0.02 % ~= 0.0 ns <= __raise_softirq_irqoff
> > 0.01 % ~= 0.0 ns <= run_ksoftirqd
> > 0.00 % ~= 0.0 ns <= run_timer_softirq
> > 0.00 % ~= 0.0 ns <= ksoftirqd_should_run
> > 0.00 % ~= 0.0 ns <= raise_softirq
> > Sum: 0.56 % => calc: 0.5 ns (sum: 0.5 ns) => Total: 82.7 ns
> >
> > Group-report: GRO network-stack functions ::
> > Sum: 0.00 % => calc: 0.0 ns (sum: 0.0 ns) => Total: 82.7 ns
> >
> > Group-report: related to pattern "spin_.*lock|mutex" ::
> > 0.58 % ~= 0.5 ns <= _raw_spin_lock_irqsave
> > 0.52 % ~= 0.4 ns <= _raw_spin_lock
> > 0.12 % ~= 0.1 ns <= _raw_spin_unlock
> > 0.01 % ~= 0.0 ns <= _raw_spin_unlock_irqrestore
> > 0.00 % ~= 0.0 ns <= __mutex_lock_slowpath
> > 0.00 % ~= 0.0 ns <= _raw_spin_lock_irq
> > Sum: 1.23 % => calc: 1.0 ns (sum: 1.0 ns) => Total: 82.7 ns
> >
> > Negative Report: functions NOT included in group reports::
> > 0.22 % ~= 0.2 ns <= __list_del_entry
> > 0.20 % ~= 0.2 ns <= __list_add
> > 0.07 % ~= 0.1 ns <= list_del
> > 0.05 % ~= 0.0 ns <= native_sched_clock
> > 0.04 % ~= 0.0 ns <= irqtime_account_irq
> > 0.02 % ~= 0.0 ns <= rcu_bh_qs
> > 0.01 % ~= 0.0 ns <= task_tick_fair
> > 0.01 % ~= 0.0 ns <= net_rps_action_and_irq_enable.isra.112
> > 0.01 % ~= 0.0 ns <= perf_event_task_tick
> > 0.01 % ~= 0.0 ns <= apic_timer_interrupt
> > 0.01 % ~= 0.0 ns <= lapic_next_deadline
> > 0.01 % ~= 0.0 ns <= rcu_check_callbacks
> > 0.01 % ~= 0.0 ns <= smpboot_thread_fn
> > 0.01 % ~= 0.0 ns <= irqtime_account_process_tick.isra.3
> > 0.00 % ~= 0.0 ns <= intel_bts_enable_local
> > 0.00 % ~= 0.0 ns <= kthread_should_park
> > 0.00 % ~= 0.0 ns <= native_apic_mem_write
> > 0.00 % ~= 0.0 ns <= hrtimer_forward
> > 0.00 % ~= 0.0 ns <= get_work_pool
> > 0.00 % ~= 0.0 ns <= cpu_startup_entry
> > 0.00 % ~= 0.0 ns <= acct_account_cputime
> > 0.00 % ~= 0.0 ns <= set_next_entity
> > 0.00 % ~= 0.0 ns <= worker_thread
> > 0.00 % ~= 0.0 ns <= dbs_timer_handler
> > 0.00 % ~= 0.0 ns <= delay_tsc
> > 0.00 % ~= 0.0 ns <= idle_cpu
> > 0.00 % ~= 0.0 ns <= timerqueue_add
> > 0.00 % ~= 0.0 ns <= hrtimer_interrupt
> > 0.00 % ~= 0.0 ns <= dbs_work_handler
> > 0.00 % ~= 0.0 ns <= dequeue_entity
> > 0.00 % ~= 0.0 ns <= update_cfs_shares
> > 0.00 % ~= 0.0 ns <= update_fast_timekeeper
> > 0.00 % ~= 0.0 ns <= smp_trace_apic_timer_interrupt
> > 0.00 % ~= 0.0 ns <= __update_cpu_load
> > 0.00 % ~= 0.0 ns <= cpu_needs_another_gp
> > 0.00 % ~= 0.0 ns <= ret_from_intr
> > 0.00 % ~= 0.0 ns <= __intel_pmu_enable_all
> > 0.00 % ~= 0.0 ns <= trigger_load_balance
> > 0.00 % ~= 0.0 ns <= __schedule
> > 0.00 % ~= 0.0 ns <= nsecs_to_jiffies64
> > 0.00 % ~= 0.0 ns <= account_entity_dequeue
> > 0.00 % ~= 0.0 ns <= worker_enter_idle
> > 0.00 % ~= 0.0 ns <= __hrtimer_get_next_event
> > 0.00 % ~= 0.0 ns <= rcu_irq_exit
> > 0.00 % ~= 0.0 ns <= rb_erase
> > 0.00 % ~= 0.0 ns <= __intel_pmu_disable_all
> > 0.00 % ~= 0.0 ns <= tick_sched_do_timer
> > 0.00 % ~= 0.0 ns <= cpuacct_account_field
> > 0.00 % ~= 0.0 ns <= update_wall_time
> > 0.00 % ~= 0.0 ns <= notifier_call_chain
> > 0.00 % ~= 0.0 ns <= timekeeping_update
> > 0.00 % ~= 0.0 ns <= ktime_get_update_offsets_now
> > 0.00 % ~= 0.0 ns <= rb_next
> > 0.00 % ~= 0.0 ns <= rcu_all_qs
> > 0.00 % ~= 0.0 ns <= x86_pmu_disable
> > 0.00 % ~= 0.0 ns <= _cond_resched
> > 0.00 % ~= 0.0 ns <= __rcu_read_lock
> > 0.00 % ~= 0.0 ns <= __local_bh_enable
> > 0.00 % ~= 0.0 ns <= update_cpu_load_active
> > 0.00 % ~= 0.0 ns <= x86_pmu_enable
> > 0.00 % ~= 0.0 ns <= insert_work
> > 0.00 % ~= 0.0 ns <= ktime_get
> > 0.00 % ~= 0.0 ns <= __usecs_to_jiffies
> > 0.00 % ~= 0.0 ns <= __acct_update_integrals
> > 0.00 % ~= 0.0 ns <= scheduler_tick
> > 0.00 % ~= 0.0 ns <= update_vsyscall
> > 0.00 % ~= 0.0 ns <= memcpy_erms
> > 0.00 % ~= 0.0 ns <= get_cpu_idle_time_us
> > 0.00 % ~= 0.0 ns <= sched_clock_cpu
> > 0.00 % ~= 0.0 ns <= tick_do_update_jiffies64
> > 0.00 % ~= 0.0 ns <= hrtimer_active
> > 0.00 % ~= 0.0 ns <= profile_tick
> > 0.00 % ~= 0.0 ns <= __hrtimer_run_queues
> > 0.00 % ~= 0.0 ns <= kthread_should_stop
> > 0.00 % ~= 0.0 ns <= run_posix_cpu_timers
> > 0.00 % ~= 0.0 ns <= read_tsc
> > 0.00 % ~= 0.0 ns <= __remove_hrtimer
> > 0.00 % ~= 0.0 ns <= calc_global_load_tick
> > 0.00 % ~= 0.0 ns <= hrtimer_run_queues
> > 0.00 % ~= 0.0 ns <= irq_work_tick
> > 0.00 % ~= 0.0 ns <= cpuacct_charge
> > 0.00 % ~= 0.0 ns <= clockevents_program_event
> > 0.00 % ~= 0.0 ns <= update_blocked_averages
> > Sum: 0.68 % => calc: 0.6 ns (sum: 0.6 ns) => Total: 82.7 ns
> >
> >