From: Kenny Chang <kchang@athenacr.com>
To: netdev@vger.kernel.org
Subject: Re: Multicast packet loss
Date: Tue, 03 Feb 2009 12:34:54 -0500
Message-ID: <4988803E.2020009@athenacr.com>
In-Reply-To: <4987663D.6080802@cosmosbay.com>

[-- Attachment #1: Type: text/plain, Size: 4951 bytes --]

Eric Dumazet wrote:
> Wes Chow a écrit :
>   
>> Eric Dumazet wrote:
>>     
>>> Wes Chow a écrit :
>>>       
>>>> (I'm Kenny's colleague, and I've been doing the kernel builds)
>>>>
>>>> First I'd like to note that there were a lot of bnx2 NAPI changes
>>>> between 2.6.21 and 2.6.22. As a reminder, 2.6.21 shows tiny amounts
>>>> of packet loss, whereas loss in 2.6.22 is significant.
>>>>
>>>> Second, some CPU affinity info: if I do like Eric and pin all of the
>>>> apps onto a single CPU, I see no packet loss. Also, I do *not* see
>>>> ksoftirqd show up on top at all!
>>>>
>>>> If I pin half the processes on one CPU and the other half on another
>>>> CPU, one ksoftirqd process shows up in top and completely pegs one
>>>> CPU. My packet loss in that case is significant (25%).
>>>>
>>>> Now, the strange case: if I pin 3 processes to one CPU and 1 process
>>>> to another, I get about 25% packet loss and ksoftirqd pins one CPU.
>>>> However, one of the apps takes significantly less CPU than the others,
>>>> and all apps lose the *exact same number of packets*. In all other
>>>> situations where we see packet loss, the actual number lost per
>>>> application instance appears random.
>>>>         
>>> You see the same number of packets lost because they are lost at the NIC level
>>>       
>> Understood.
>>
>> I have a new observation: if I pin processes to just CPUs 0 and 1, I see
>> no packet loss. Pinning to 0 and 2, I do see packet loss. Pinning 2 and
>> 3, no packet loss. 4 & 5 - no packet loss, 6 & 7 - no packet loss. Any
>> other combination appears to produce loss (though I have not tried all
>> 28 combinations, this seems to be the case).
>>
>> At first I thought maybe it had to do with processes pinned to the same
>> CPU, but different cores. The machine is a dual quad core, which means
>> that CPUs 0-3 should be a physical CPU, correct? Pinning to 0/2 and 0/3
>> produce packet loss.
>>     
>
> a quad core is really a 2 x 2 core
>
> L2 cache is split into two blocks, one block used by CPU0/1, the other by CPU2/3
>
> You are at the limit of the machine with such a workload, so as soon as your
> CPUs have to transfer 64-byte lines between those two L2 blocks, you lose.
>
>
>   
>> I've also noticed that it does not matter which of the working pairs I
>> pin to. For example, pinning 5 processes in any combination on either
>> 0/1 produce no packet loss, pinning all 5 to just CPU 0 also produces no
>> packet loss.
>>
>> The failures are also sudden. In all of the working cases mentioned
>> above, I don't see ksoftirqd on top at all. But when I run 6 processes
>> on a single CPU, ksoftirqd shoots up to 100% and I lose a huge number of
>> packets.
>>
>>     
>>> Normally, softirq runs on the same CPU (the one handling the hard irq)
>>>       
>> What determines which CPU the hard irq occurs on?
>>
>>     
>
> Check /proc/irq/{irqnumber}/smp_affinity
>
> If you want IRQ16 only served by CPU0 :
>
> echo 1 >/proc/irq/16/smp_affinity
>
>   
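For reference, the value written to smp_affinity in the quoted example is a hexadecimal bitmap with one bit per CPU (bit 0 is CPU0, so "1" means CPU0 only). Below is a minimal sketch of building that mask for an arbitrary set of CPUs and writing it out. The helper names irq_affinity_mask and set_irq_affinity are made up for illustration, the sketch only handles CPUs 0-31, and writing the file needs root.

```c
#include <stdio.h>

/*
 * Build the hex bitmap string for /proc/irq/<n>/smp_affinity from a list
 * of CPU numbers. Hypothetical helper; limited to CPUs 0..31 in this sketch.
 * Returns 0 on success, -1 on invalid input or a too-small buffer.
 */
int irq_affinity_mask(const int *cpus, int ncpus, char *out, size_t outlen)
{
    unsigned int mask = 0;
    int i, n;

    for (i = 0; i < ncpus; ++i) {
        if (cpus[i] < 0 || cpus[i] >= 32)
            return -1;
        mask |= 1u << cpus[i];      /* one bit per CPU */
    }
    n = snprintf(out, outlen, "%x", mask);
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}

/*
 * Write the mask to the smp_affinity file of the given IRQ.
 * Hypothetical helper; requires root. Returns 0 on success, -1 on error.
 */
int set_irq_affinity(int irq, const int *cpus, int ncpus)
{
    char path[64], mask[16];
    FILE *f;
    int ok;

    if (irq_affinity_mask(cpus, ncpus, mask, sizeof(mask)) != 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%s\n", mask) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

With this, CPUs {0, 1} give the mask "3" and CPUs {2, 3} give "c", which matches the L2-sharing pairs described above.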
Hi everyone,

First, thanks for all the effort so far, I think we've learned so much 
more about the problem in the last couple of days than we had previously 
in a month.

Just to summarize where we are:

* pinning processes to specific cores/CPUs alleviates the problem
* the issue exists from 2.6.22 up to 2.6.29-rc3
* the issue does not appear to be isolated to 64-bit; 32-bit kernels have
problems too.
* I'm attaching an updated test program with the PR_SET_TIMERSLACK call
added.
* on troubled machines, we are seeing a high number of context switches
and interrupts.
* we've ordered an Intel card to try in our machine to see if we can
circumvent the issue with a different driver.
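
For anyone repeating the pinning experiments from inside the test program rather than with an external tool such as taskset, here is a minimal sketch using sched_setaffinity(2). The helper names (pin_to_cpu, first_allowed_cpu, allowed_cpu_count) are made up for illustration and are not part of the attached mcasttest.c.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for sched_setaffinity and CPU_* macros */
#endif
#include <sched.h>

/* Pin the calling process to a single CPU. Returns 0 on success, -1 on error. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = calling process */
}

/* Return the lowest-numbered CPU the process may currently run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    int cpu;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Return how many CPUs the process is currently allowed to run on, or -1. */
int allowed_cpu_count(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

Calling pin_to_cpu(0) at the top of the client branch would correspond to the "all apps on a single CPU" case discussed earlier in the thread.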

Kernel Version         Has Problem?     Notes
--------------         ------------     -----
2.6.15.x                N
2.6.16.x                -
2.6.17.x                -               Doesn't build on Hardy
2.6.18.x                -               Doesn't boot (kernel panic)
2.6.19.7                N               ksoftirqd is up there, but not pegging a CPU.
                                        Takes roughly same amount of CPU as the other
                                        processes, all of which are from 20-40%
2.6.20.21               N
2.6.21.7                N               sort of lopsided load, but no load from
                                        ksoftirqd -- strange
2.6.22.19               Y               First broken kernel
2.6.23.x                -
2.6.24-19               Y               (from Hardy)
2.6.25.x                -
2.6.26.x                -
2.6.27.x                Y               (from Intrepid)
2.6.28.1                Y
2.6.29-rc               Y


Correct me if I'm wrong, but from what we've seen, it looks like this is
pointing to some inefficiency in the softirq handling.  The question is
whether it's something in the driver or the kernel.  If we can isolate
that, maybe we can take some action to have it fixed.
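
One way to watch the softirq side per CPU is /proc/net/softnet_stat. Below is a minimal sketch of reading it. It assumes the first three hexadecimal columns of each row are packets processed, packets dropped from the backlog queue, and time_squeeze (net_rx_action ran out of budget or time), and that rows appear in CPU order. The struct and function names are made up for illustration.

```c
#include <stdio.h>

/* First three columns of one /proc/net/softnet_stat row (assumed layout). */
struct softnet_counters {
    unsigned int processed;     /* packets handled by this CPU */
    unsigned int dropped;       /* packets dropped: backlog queue full */
    unsigned int time_squeeze;  /* times the softirq ran out of budget */
};

/* Parse one row of hex fields. Returns 0 on success, -1 on a malformed row. */
int parse_softnet_line(const char *line, struct softnet_counters *c)
{
    int n = sscanf(line, "%x %x %x",
                   &c->processed, &c->dropped, &c->time_squeeze);
    return n == 3 ? 0 : -1;
}

/* Print the counters for every row. Returns the row count, or -1 on error. */
int dump_softnet_stat(void)
{
    FILE *f = fopen("/proc/net/softnet_stat", "r");
    char line[512];
    int row = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        struct softnet_counters c;

        if (parse_softnet_line(line, &c) == 0)
            printf("row %d: processed=%u dropped=%u time_squeeze=%u\n",
                   row, c.processed, c.dropped, c.time_squeeze);
        ++row;
    }
    fclose(f);
    return row;
}
```

Comparing the output before and after a test run on a good kernel and a bad one might help show whether the receive softirq is being starved, though it will not by itself tell the driver apart from the core stack.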

Kenny

[-- Attachment #2: mcasttest.c --]
[-- Type: text/x-csrc, Size: 3392 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <arpa/inet.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <unistd.h>
#include <sys/prctl.h>  /* prctl(), used by the server for PR_SET_TIMERSLACK */

#ifndef PR_SET_TIMERSLACK
#define PR_SET_TIMERSLACK 29
#endif

const char *g_mcastaddr = "239.100.0.99";
int g_port = 10100;

void error(const char *s)
{
    fprintf(stderr, "%s\n", s);
    exit(1);
}

void check(int v)
{
    /* Capture errno right away, before another call can overwrite it. */
    int myerr = errno;
    if(!v)
    {
        fprintf(stderr, "bad return code: %s\n", strerror(myerr));
        exit(1);
    }
}

int main(int argc, char **argv)
{
    if(argc != 2)
        error("usage: mcasttest (server|client)");
    if(strcmp(argv[1], "client") == 0)
    {
        /*
         * Client program: subscribes to a multicast group, receives messages
         * and prints a count of messages received once it's done.
         */
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        check(s >= 0); /* socket() returns -1 on failure; 0 is a valid fd */
        int val = 1;
        check(setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val)) == 0);

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(g_port);
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        check(bind(s, (struct sockaddr *) &addr, sizeof(addr)) == 0);

        struct ip_mreqn mreq;
        memset(&mreq, 0, sizeof(mreq));
        check(inet_pton(AF_INET, g_mcastaddr, &mreq.imr_multiaddr));
        mreq.imr_address.s_addr = htonl(INADDR_ANY);
        mreq.imr_ifindex = 0;
        check(setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) == 0);

        int bufSz;
        socklen_t len = sizeof(bufSz);
        check(getsockopt(s, SOL_SOCKET, SO_RCVBUF, (char*)(&bufSz), &len) == 0);
        printf("bufsz: %d\n", bufSz);

        int npackets = 0;
        char buf[1000];
        memset(buf, 0, sizeof(buf));
        while(1)
        {
            struct sockaddr_in from;
            socklen_t fromlen = sizeof(from);
            check(recvfrom(s, buf, 1000, 0, (struct sockaddr*)&from, &fromlen) == 100);
            ++npackets;
            if(buf[0] == 1) // exit message
                break;
        }
        printf("received %d packets\n", npackets);
    }
    else if(strcmp(argv[1], "server") == 0)
    {
        /*
         * Set the timer slack to 1000 ns (1 us) so that the short
         * usleep() calls below are not stretched by timer coalescing.
         */
        prctl(PR_SET_TIMERSLACK, 1000);

        /*
         * Server program: sends 50,000 packets per second to a multicast address,
         * for 10 seconds.
         */
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        int val = 1;
        int i = 1;
        check(s >= 0); /* socket() returns -1 on failure; 0 is a valid fd */

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(g_port);
        check(inet_pton(AF_INET, g_mcastaddr, &addr.sin_addr.s_addr));
        check(connect(s, (struct sockaddr *) &addr, sizeof(addr)) == 0);

        int npackets = 500000;
        char buf[100];
        memset(buf, 0, sizeof(buf));
        for(i = 0; i < npackets; ++i) // send exactly npackets messages
        {
            check(send(s, buf, sizeof(buf), 0) > 0);
            usleep(20); // 50,000 messages per second
        }

        buf[0] = 1;
        for(i = 1; i < 5; ++i)
        {
            check(send(s, buf, sizeof(buf), 0) > 0);
            sleep(1);
        }
    }
    else
        error("unknown mode");
    return 0;
}
