From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
Date: Mon, 26 Apr 2010 16:55:07 +0200
Message-ID: <1272293707.19143.51.camel@edumazet-laptop>
References: <1272010378-2955-1-git-send-email-xiaosuo@gmail.com>
	<1272014825.7895.7851.camel@edumazet-laptop>
	<1272060153.8918.8.camel@bigi>
	<1272118252.8918.13.camel@bigi>
	<1272290584.19143.43.camel@edumazet-laptop>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Changli Gao, "David S. Miller", Tom Herbert, Stephen Hemminger,
	netdev@vger.kernel.org, Andi Kleen
To: hadi@cyberus.ca
Return-path:
Received: from mail-bw0-f219.google.com ([209.85.218.219]:58969 "EHLO
	mail-bw0-f219.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751891Ab0DZOzN (ORCPT ); Mon, 26 Apr 2010 10:55:13 -0400
Received: by bwz19 with SMTP id 19so222523bwz.21 for ;
	Mon, 26 Apr 2010 07:55:11 -0700 (PDT)
In-Reply-To: <1272290584.19143.43.camel@edumazet-laptop>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Monday, 26 April 2010 at 16:03 +0200, Eric Dumazet wrote:
> On Saturday, 24 April 2010 at 10:10 -0400, jamal wrote:
> > On Fri, 2010-04-23 at 18:02 -0400, jamal wrote:
> >
> > > Ive done a setup with the last patch from Changli + net-next - I will
> > > post test results tomorrow AM.
> >
> > ok, annotated results attached.
> >
> > cheers,
> > jamal
>
> Jamal, I have a Nehalem setup now, and I can see
> _raw_spin_lock_irqsave() abuse is not coming from network tree, but from
> clockevents_notify()
>

Another interesting finding:

- if all packets are received on a single queue, max speed seems to be
1,200,000 packets per second on my machine :-(

And on the profile of the receiving cpu (RPS enabled, packets sent to 15
other cpus), we can see default_send_IPI_mask_sequence_phys() is the slow
thing...

Andi, what do you think of this one ?
Don't we have a function to send an IPI to an individual cpu instead ?

void default_send_IPI_mask_sequence_phys(const struct cpumask *mask, int vector)
{
	unsigned long query_cpu;
	unsigned long flags;

	/*
	 * Hack. The clustered APIC addressing mode doesn't allow us to send
	 * to an arbitrary mask, so I do a unicast to each CPU instead.
	 * - mbligh
	 */
	local_irq_save(flags);
	for_each_cpu(query_cpu, mask) {
		__default_send_IPI_dest_field(per_cpu(x86_cpu_to_apicid,
				query_cpu), vector, APIC_DEST_PHYSICAL);
	}
	local_irq_restore(flags);
}

------------------------------------------------------------------------------
   PerfTop:    1000 irqs/sec  kernel:100.0% [1000Hz cycles],  (all, cpu: 7)
------------------------------------------------------------------------------

             samples  pcnt function                            DSO
             _______ _____ ___________________________________ _______

              668.00 17.7% default_send_IPI_mask_sequence_phys vmlinux
              363.00  9.6% bnx2x_rx_int                        vmlinux
              354.00  9.4% eth_type_trans                      vmlinux
              332.00  8.8% kmem_cache_alloc_node               vmlinux
              285.00  7.6% __kmalloc_node_track_caller         vmlinux
              278.00  7.4% _raw_spin_lock                      vmlinux
              166.00  4.4% __slab_alloc                        vmlinux
              147.00  3.9% __memset                            vmlinux
              136.00  3.6% list_del                            vmlinux
              132.00  3.5% get_partial_node                    vmlinux
              131.00  3.5% get_rps_cpu                         vmlinux
              102.00  2.7% enqueue_to_backlog                  vmlinux
               95.00  2.5% unmap_single                        vmlinux
               94.00  2.5% __alloc_skb                         vmlinux
               74.00  2.0% vlan_gro_common                     vmlinux
               52.00  1.4% __phys_addr                         vmlinux
               48.00  1.3% dev_gro_receive                     vmlinux
               39.00  1.0% swiotlb_dma_mapping_error           vmlinux
               36.00  1.0% swiotlb_map_page                    vmlinux
               34.00  0.9% skb_put                             vmlinux
               27.00  0.7% is_swiotlb_buffer                   vmlinux
               23.00  0.6% deactivate_slab                     vmlinux
               20.00  0.5% vlan_gro_receive                    vmlinux
               17.00  0.5% __skb_bond_should_drop              vmlinux
               14.00  0.4% netif_receive_skb                   vmlinux
               14.00  0.4% __netdev_alloc_skb                  vmlinux
               12.00  0.3% skb_gro_reset_offset                vmlinux
               12.00  0.3% get_slab                            vmlinux
               11.00  0.3% napi_skb_finish                     vmlinux
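
To make the single-cpu IPI question concrete, here is a rough userspace model of the loop in default_send_IPI_mask_sequence_phys(). It is only a sketch, not kernel code: every model_* name and MODEL_NR_CPUS are hypothetical, the APIC write is replaced by a counter, and the 16-cpu size is an assumption taken from "1 receiving cpu + 15 other cpus". It shows that the mask variant does one unicast per set bit, so a one-bit mask and a direct single-cpu send end up issuing the same single write.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 1 receiving cpu + 15 RPS target cpus, as in the test above. */
#define MODEL_NR_CPUS 16

/* Hypothetical counter standing in for the real APIC ICR write. */
static unsigned int model_icr_writes;

/* Hypothetical stand-in for __default_send_IPI_dest_field(). */
void model_send_ipi_dest(unsigned int cpu, int vector)
{
	assert(cpu < MODEL_NR_CPUS);
	(void)vector;		/* the vector does not matter for counting */
	model_icr_writes++;
}

/*
 * Models the mask walk: one unicast per cpu set in the mask.
 * Returns how many sends this call issued.
 */
unsigned int model_send_ipi_mask(uint32_t mask, int vector)
{
	unsigned int before = model_icr_writes;
	unsigned int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (mask & (UINT32_C(1) << cpu))
			model_send_ipi_dest(cpu, vector);
	}
	return model_icr_writes - before;
}

/*
 * Models a direct send to one cpu: no mask to build or scan.
 * Returns how many sends this call issued (always 1).
 */
unsigned int model_send_ipi_single(unsigned int cpu, int vector)
{
	unsigned int before = model_icr_writes;

	model_send_ipi_dest(cpu, vector);
	return model_icr_writes - before;
}
```

Calling model_send_ipi_mask(UINT32_C(1) << 3, 0) and model_send_ipi_single(3, 0) both issue exactly one send (the vector value 0 is a placeholder). The model only counts sends; it says nothing about how much time the real APIC write or the mask walk costs, which is what the profile above would have to answer.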