From: Eric Dumazet <dada1@cosmosbay.com>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Kenny Chang <kchang@athenacr.com>, netdev@vger.kernel.org
Subject: Re: Multicast packet loss
Date: Fri, 27 Feb 2009 22:36:21 +0100
Message-ID: <49A85CD5.40404@cosmosbay.com>
In-Reply-To: <49A84916.9090106@cosmosbay.com>

Eric Dumazet wrote:
> Christoph Lameter wrote:
>> On Fri, 27 Feb 2009, Eric Dumazet wrote:
>>
>>> Christoph Lameter wrote:
>>>> On Fri, 30 Jan 2009, Eric Dumazet wrote:
>>>>> 2.6.29-rc contains UDP receive improvements (lockless)
>>>>> Problem is multicast handling was not yet updated, but could be :)
>>>> When will that happen?
>>> When proven necessary :)
>>>
>>> Kenny's problem is about a scheduling storm. The extra spin_lock() in UDP
>>> multicast receive is not a problem.
>> My tests here show that 2.6.29-rc5 still loses ~5usec vs. 2.6.22 via
>> UDP. This would fix a regression.
>>
> 
> Could you elaborate?
> 
> I just retried Kenny's test here. As one cpu is looping in ksoftirqd, only this cpu
> touches the spin_lock, so spin_lock()/spin_unlock() is essentially free.
> 
> oprofile shows that the udp stack is lightweight in this case. The problem is about waking up
> so many threads...
> 
> CPU: Core 2, speed 3000.16 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
> samples  cum. samples  %        cum. %     symbol name
> 356857   356857        15.1789  15.1789    schedule
> 274028   630885        11.6557  26.8346    mwait_idle
> 189218   820103         8.0484  34.8829    __skb_recv_datagram
> 116903   937006         4.9725  39.8554    skb_release_data
> 103152   1040158        4.3876  44.2430    lock_sock_nested
> 89600    1129758        3.8111  48.0541    udp_recvmsg
> 74171    1203929        3.1549  51.2089    copy_to_user
> 72299    1276228        3.0752  54.2842    set_next_entity
> 60392    1336620        2.5688  56.8529    sched_clock_cpu
> 54026    1390646        2.2980  59.1509    __slab_free
> 50212    1440858        2.1358  61.2867    prepare_to_wait_exclusive
> 38689    1479547        1.6456  62.9323    cpu_idle
> 38142    1517689        1.6224  64.5547    __switch_to
> 36701    1554390        1.5611  66.1157    hrtick_start_fair
> 36673    1591063        1.5599  67.6756    dst_release
> 36268    1627331        1.5427  69.2183    sys_recvfrom
> 35052    1662383        1.4909  70.7092    kmem_cache_free
> 32680    1695063        1.3900  72.0992    pick_next_task_fair
> 31209    1726272        1.3275  73.4267    try_to_wake_up
> 30382    1756654        1.2923  74.7190    dequeue_task_fair
> 29048    1785702        1.2356  75.9545    __copy_skb_header
> 28801    1814503        1.2250  77.1796    sock_def_readable
> 28655    1843158        1.2188  78.3984    enqueue_task_fair
> 27232    1870390        1.1583  79.5567    update_curr
> 21688    1892078        0.9225  80.4792    copy_from_user
> 18832    1910910        0.8010  81.2802    sysenter_past_esp
> 17732    1928642        0.7542  82.0345    finish_task_switch
> 17583    1946225        0.7479  82.7824    resched_task
> 17367    1963592        0.7387  83.5211    native_sched_clock
> 15691    1979283        0.6674  84.1885    task_rq_lock
> 15352    1994635        0.6530  84.8415    sock_queue_rcv_skb
> 15022    2009657        0.6390  85.4804    udp_queue_rcv_skb
> 13999    2023656        0.5954  86.0759    __update_sched_clock
> 12284    2035940        0.5225  86.5984    skb_copy_datagram_iovec
> 11869    2047809        0.5048  87.1032    release_sock
> 10986    2058795        0.4673  87.5705    __wake_up_sync
> 10488    2069283        0.4461  88.0166    sock_recvmsg
> 9686     2078969        0.4120  88.4286    skb_queue_tail
> 9425     2088394        0.4009  88.8295    sys_socketcall
> 
> 
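
To put a number on "waking up so many threads", the scheduler-side rows of
the profile above can simply be added up. The sketch below does that in C.
The percentages are copied from the report; which symbols count as
scheduler, sched-clock or wakeup work is an assumption of this sketch, not
something oprofile reports, and the names profile_row, sched_rows,
sum_percent and sched_share are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the oprofile report: symbol name and its share of
 * CPU_CLK_UNHALTED samples, in percent. */
struct profile_row {
	const char *symbol;
	double percent;
};

/* Rows copied from the report. Grouping them as "scheduler / wakeup"
 * work is an assumption made for this sketch. */
const struct profile_row sched_rows[] = {
	{ "schedule",                  15.1789 },
	{ "set_next_entity",            3.0752 },
	{ "sched_clock_cpu",            2.5688 },
	{ "prepare_to_wait_exclusive",  2.1358 },
	{ "__switch_to",                1.6224 },
	{ "hrtick_start_fair",          1.5611 },
	{ "pick_next_task_fair",        1.3900 },
	{ "try_to_wake_up",             1.3275 },
	{ "dequeue_task_fair",          1.2923 },
	{ "enqueue_task_fair",          1.2188 },
	{ "update_curr",                1.1583 },
	{ "finish_task_switch",         0.7542 },
	{ "resched_task",               0.7479 },
	{ "native_sched_clock",         0.7387 },
	{ "task_rq_lock",               0.6674 },
	{ "__update_sched_clock",       0.5954 },
	{ "__wake_up_sync",             0.4673 },
};

/* Add up the percentages of n rows. */
double sum_percent(const struct profile_row *rows, size_t n)
{
	double total = 0.0;

	assert(rows != NULL);
	for (size_t i = 0; i < n; i++)
		total += rows[i].percent;
	return total;
}

/* Share of samples attributed to the rows listed in sched_rows. */
double sched_share(void)
{
	return sum_percent(sched_rows,
			   sizeof(sched_rows) / sizeof(sched_rows[0]));
}
```

With this grouping, sched_share() comes out at a bit over a third of all
samples, before even counting the idle loop (mwait_idle, cpu_idle).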

My guess is that commit 95766fff6b9a78d11fc2d3812dd035381690b55d
(UDP: Add memory accounting)
Hideo Aoki [Mon, 31 Dec 2007 08:29:24 +0000 (00:29 -0800)]

and commit 3ab224be6d69de912ee21302745ea45a99274dbc
[NET] CORE: Introducing new memory accounting interface.
Date:   Mon Dec 31 00:11:19 2007 -0800

are responsible for the slowdown, because they add some
lock_sock()/release_sock() pairs.

For example, in function udp_recvmsg():

out_free:
+       lock_sock(sk);
        skb_free_datagram(sk, skb);
+       release_sock(sk);
 out:

I wonder why we can call __sk_mem_reclaim() when dequeuing *one* UDP
frame from the queue, while many others can still be in sk_receive_queue.
This defeats memory accounting, no?
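
To make the question concrete, here is a minimal user-space sketch of
forward-alloc style accounting. It is a toy model, not the kernel code:
the 4096-byte quantum, the 2000-byte frame size used in the example and
every toy_* name are assumptions. Memory is reserved from a global counter
in whole quanta, each queued frame is charged against the per-socket
reserve, and freeing a frame hands back every whole quantum found in the
reserve.

```c
#include <assert.h>

#define TOY_QUANTUM 4096	/* assumed accounting unit: one 4 KiB page */

/* Toy stand-in for the per-socket state. */
struct toy_sock {
	int forward_alloc;	/* bytes reserved but not charged to a frame */
	int queued_frames;	/* frames still in the receive queue */
};

/* Toy stand-in for the protocol-wide page counter (shared, contended). */
long toy_pages_allocated;

/* Charge one received frame; reserve whole quanta if the reserve is short. */
void toy_charge(struct toy_sock *sk, int size)
{
	assert(size > 0);
	if (sk->forward_alloc < size) {
		int pages = (size + TOY_QUANTUM - 1) / TOY_QUANTUM;

		toy_pages_allocated += pages;	/* touches the global counter */
		sk->forward_alloc += pages * TOY_QUANTUM;
	}
	sk->forward_alloc -= size;
	sk->queued_frames++;
}

/* Give back every whole quantum in the reserve; returns pages released. */
int toy_reclaim(struct toy_sock *sk)
{
	int pages = 0;

	if (sk->forward_alloc >= TOY_QUANTUM) {
		pages = sk->forward_alloc / TOY_QUANTUM;
		toy_pages_allocated -= pages;	/* global counter again */
		sk->forward_alloc %= TOY_QUANTUM;
	}
	return pages;
}

/* Free one dequeued frame: uncharge it, then try to reclaim. */
int toy_free_frame(struct toy_sock *sk, int size)
{
	assert(sk->queued_frames > 0);
	sk->queued_frames--;
	sk->forward_alloc += size;
	return toy_reclaim(sk);
}
```

In this model, charging three 2000-byte frames and then freeing one
releases a page even though two frames are still queued, and the reserve
drops to 96 bytes, so the next incoming frame has to go back to the global
counter.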

We should avoid lock_sock() if possible, or we risk delaying
softirq RX in udp_queue_rcv_skb().
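
One way such a delay can arise, sketched as a toy user-space model: assume
that while a process holds the socket through lock_sock(), the softirq
path cannot queue a frame directly and parks it on a backlog that is only
drained at release_sock() time. That behaviour is an assumption of this
sketch, and all toy_* names are hypothetical; nothing here is the kernel
implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy socket: counters instead of real skb queues. */
struct toy_udp_sock {
	bool owned_by_user;	/* set between toy_lock_sock/toy_release_sock */
	int receive_queue;	/* frames a reader can dequeue right now */
	int backlog;		/* frames parked while the socket was owned */
};

/* Process context takes ownership of the socket. */
void toy_lock_sock(struct toy_udp_sock *sk)
{
	assert(!sk->owned_by_user);
	sk->owned_by_user = true;
}

/* Softirq RX path: queue directly, or defer if the socket is owned. */
void toy_softirq_rcv(struct toy_udp_sock *sk)
{
	if (sk->owned_by_user)
		sk->backlog++;		/* delivery is delayed */
	else
		sk->receive_queue++;	/* delivered immediately */
}

/* Process context drops ownership and drains the backlog.
 * Returns how many frames had been delayed. */
int toy_release_sock(struct toy_udp_sock *sk)
{
	int delayed = sk->backlog;

	assert(sk->owned_by_user);
	sk->receive_queue += delayed;
	sk->backlog = 0;
	sk->owned_by_user = false;
	return delayed;
}
```

In this model every frame that arrives between toy_lock_sock() and
toy_release_sock() stays invisible to readers until the release, which is
the kind of RX delay an extra lock pair in the receive path invites.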


