netdev.vger.kernel.org archive mirror
* Fw: [Bug 88111] New: Race condition in net_tx_action?
@ 2014-11-13  3:19 Stephen Hemminger
From: Stephen Hemminger @ 2014-11-13  3:19 UTC (permalink / raw)
  To: netdev



Begin forwarded message:

Date: Wed, 12 Nov 2014 10:52:10 -0800
From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
To: "stephen@networkplumber.org" <stephen@networkplumber.org>
Subject: [Bug 88111] New: Race condition in net_tx_action?


https://bugzilla.kernel.org/show_bug.cgi?id=88111

            Bug ID: 88111
           Summary: Race condition in net_tx_action?
           Product: Networking
           Version: 2.5
    Kernel Version: all
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
          Assignee: shemminger@linux-foundation.org
          Reporter: angelo.rizzi@3dautomazione.it
        Regression: No

Hi all,

I have a question about a strange situation I've run into on my Linux-based
embedded system:

Using 2 network devices (transmitting asynchronously), I found a kind of "leak"
in sk_buff alloc/free. After some days of continuous transmission, my test
program becomes unable to write to the transmitting socket (poll() with
POLLOUT requested always returns 0).

After a lot of testing, I've traced this behaviour to the
net_tx_action() function (net/core/dev.c).

Let me explain what I've found:

The following code is used to get the current list of sk_buffs to free:

static void net_tx_action(struct softirq_action *h)
{
        struct softnet_data *sd = &__get_cpu_var(softnet_data);

        if (sd->completion_queue) {
                 struct sk_buff *clist;

                 local_irq_disable();
                 clist = sd->completion_queue;
                 sd->completion_queue = NULL;
                 local_irq_enable();

When transmitting asynchronously on all the available network devices, I've
noticed the following behaviour:
a) The instruction "if (sd->completion_queue) {" saves the pointer value in a
CPU register (the register contents are used for the comparison).
b) Interrupts are disabled (using "local_irq_disable").
c) When "clist" is assigned, the register is used instead of re-reading the
"completion_queue" variable.

So, when a low-level tx interrupt arrives after "completion_queue" has been
latched but before "local_irq_disable", the value stored in "clist" reflects
the state before that interrupt. "completion_queue" is then set to NULL, so
the sk_buff queued by the interrupt is never freed, resulting in an sk_buff
leak.
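
The sequence described above can be reproduced deterministically in user
space. The following is only a minimal sketch, not kernel code: struct buf,
struct queue, fake_tx_irq() and simulate_drain() are hypothetical names, the
"interrupt" is an ordinary function call placed at the critical point, and
the "register" is a local variable. It assumes the completion queue is a
singly linked list with new entries pushed at the head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sk_buff and the per-CPU softnet_data. */
struct buf {
    struct buf *next;
};

struct queue {
    struct buf *completion_queue;
};

/* Simulated tx-completion interrupt: pushes one buffer at the queue head. */
void fake_tx_irq(struct queue *q, struct buf *b)
{
    b->next = q->completion_queue;
    q->completion_queue = b;
}

/* Counts the buffers reachable from the given list head. */
size_t list_len(const struct buf *b)
{
    size_t n = 0;

    for (; b != NULL; b = b->next)
        n++;
    return n;
}

/*
 * Queues two buffers, latches the head (as the "if" test does), lets a
 * third buffer arrive before interrupts would be disabled, then detaches
 * the list. Returns how many of the three buffers end up on clist.
 */
size_t simulate_drain(int reread_after_irq)
{
    struct buf a = { NULL }, b = { NULL }, late = { NULL };
    struct queue q = { NULL };
    struct buf *latched, *clist;

    fake_tx_irq(&q, &a);
    fake_tx_irq(&q, &b);

    latched = q.completion_queue;   /* value held "in a register" */
    assert(latched != NULL);

    fake_tx_irq(&q, &late);         /* lands before local_irq_disable() */

    /* Interrupts would be disabled from here on. */
    clist = reread_after_irq ? q.completion_queue : latched;
    q.completion_queue = NULL;      /* anything not on clist is now lost */

    return list_len(clist);
}
```

In this sketch simulate_drain(0) returns 2 (the late buffer is no longer
reachable from anywhere), while simulate_drain(1) returns 3.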

I've changed the declaration of "sd" as follows:

        volatile struct softnet_data *sd = &__get_cpu_var(softnet_data);

and everything is now ok.
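
Declaring "sd" as a pointer to volatile forces every access through it to go
to memory. A narrower variant is to force only the loads of
"completion_queue", or to rely on a compiler barrier between the test and the
assignment. The sketch below illustrates both ideas in user space under
stated assumptions: struct pkt, struct softnet_sketch, LOAD_ONCE(),
compiler_barrier(), fake_irq_disable() and fake_irq_enable() are hypothetical
names, the syntax is GCC/Clang specific, and the stubs do not mask any
interrupts. Whether the real local_irq_disable() already acts as a compiler
barrier on a given architecture is something to check in that architecture's
code, not something this sketch shows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's definitions. */
struct pkt {
    struct pkt *next;
};

struct softnet_sketch {
    struct pkt *completion_queue;
};

/* Compiler-only barrier: memory values may not be cached across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Forces one real load of x through a volatile-qualified lvalue. */
#define LOAD_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Stubs: a kernel version would also mask and unmask interrupts. */
#define fake_irq_disable() compiler_barrier()
#define fake_irq_enable()  compiler_barrier()

/* Detaches and returns the whole completion list, or NULL if it is empty. */
struct pkt *detach_completion_list(struct softnet_sketch *sd)
{
    struct pkt *clist = NULL;

    assert(sd != NULL);
    if (LOAD_ONCE(sd->completion_queue)) {
        fake_irq_disable();
        /* Fresh load: the value tested above is not reused here. */
        clist = LOAD_ONCE(sd->completion_queue);
        sd->completion_queue = NULL;
        fake_irq_enable();
    }
    return clist;
}
```

Either mechanism alone (the barrier inside the stub, or the volatile load)
is enough to make the compiler re-read "completion_queue" in this sketch;
both are shown only to make the two options visible side by side.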

Is that correct?

Thanks,
Angelo

-- 
You are receiving this mail because:
You are the assignee for the bug.
