All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wengang <wen.gang.wang@oracle.com>
To: Jay Vosburgh <jay.vosburgh@canonical.com>,
	Cong Wang <cwang@twopensource.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>, netdev <netdev@vger.kernel.org>
Subject: Re: [PATCH] bonding: clear header_ops when last slave detached (v2)
Date: Mon, 24 Nov 2014 11:05:39 +0800	[thread overview]
Message-ID: <5472A083.9020801@oracle.com> (raw)
In-Reply-To: <29850.1416596051@famine>

于 2014年11月22日 02:54, Jay Vosburgh 写道:
> Cong Wang <cwang@twopensource.com> wrote:
>
>> On Thu, Nov 20, 2014 at 2:53 PM, Jay Vosburgh
>> <jay.vosburgh@canonical.com> wrote:
>>> Cong Wang <cwang@twopensource.com> wrote:
>>>
>>>> Also, no one seems to care about my previous question:
>>>> why only bonding has the problem?
>>>          Bonding has the problem because it stashes a pointer to a data
>>> structure (the header_ops) from another module, and when that module is
>>> unloaded the dangling pointer may be dereferenced if it's not either
>>> cleared or made to never go away.
>> I knew, please re-read my question, I was asking why ONLY bonding
>> has the problem, i.e. why not neigh or whatever else calling
>> header_ops->foo()? :)
>>
>> As I said, I may miss some try_get_module() somewhere of course.
>> Needs more digging.
> 	My explanation is why only bonding has the problem; it's keeping
> a pointer (in bond_dev->header_ops) that is copied from the slave
> device's ->header_ops, and clearing that stashed pointer is (a) not
> correctly synchronized with the removal of the slave device, and (b)
> trying to simply clear the pointer has a check then use race in
> dev_hard_header.
>
> 	8021q, for example, uses a "passthru" header_ops to call the
> underlying device's header_ops, but 8021q is only for ethernet, and the
> eth_header_ops are static in vmlinux, so it won't see these problems.
>
> 	Actually, now that I think about it, when the last ipoib slave
> is released, the bonding master device is theoretically supposed to be
> removed to avoid the sort of problem we're discussing here.
>
> 	That apparently isn't happening, unless Wengang is running
> pktgen and simultaneously removing the ipoib module (racing the transmit
> against the removal), or maybe something else is going on (perhaps
> pktgen holds a reference to the bonding master, preventing its removal).
>
> 	Also, curiously, looking at pkgten, pktgen_setup_dev appears to
> only accept devices of type ARPHRD_ETHER, but bonding with an ipoib
> slave would be ARPHRD_INFINIBAND.  I'm therefore not sure how Wengang
> configured pktgen over an ipoib bond.
>
> 	Wengang, what kernel are you using, and is your kernel modified
> to change pktgen_setup_dev?
>
> 	-J

It's a 2.6.39 kernel.
code is like this:

static int pktgen_setup_dev(struct pktgen_dev *pkt_dev, const char *ifname)
{
         struct net_device *odev;
         int err;

         /* Clean old setups */
         if (pkt_dev->odev) {
                 dev_put(pkt_dev->odev);
                 pkt_dev->odev = NULL;
         }

         odev = pktgen_dev_get_by_name(pkt_dev, ifname);
         if (!odev) {
                 pr_err("no such netdevice: \"%s\"\n", ifname);
                 return -ENODEV;
         }

         if (odev->type != ARPHRD_ETHER) {
                 pr_err("not an ethernet device: \"%s\"\n", ifname);
                 err = -EINVAL;
         } else if (!netif_running(odev)) {
                 pr_err("device is down: \"%s\"\n", ifname);
                 err = -ENETDOWN;
         } else {
                 pkt_dev->odev = odev;
                 return 0;
         }

         dev_put(odev);
         return err;
}

No change done to it.

This problem is a side product when I was working with another area. I 
am so far not very clear about the setup(no env to check now either).

thanks,
wengang
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com

  reply	other threads:[~2014-11-24  3:04 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-19  5:18 [PATCH] bonding: clear header_ops when last slave detached (v2) Wengang Wang
2014-11-19  5:22 ` Wengang
2014-11-19  5:39 ` Eric Dumazet
2014-11-19  7:00   ` Wengang
2014-11-19 22:26     ` Cong Wang
2014-11-19 22:56       ` Cong Wang
2014-11-20  6:41       ` Eric Dumazet
2014-11-20 17:34         ` Cong Wang
2014-11-20 20:41           ` Eric Dumazet
2014-11-20 21:57             ` Cong Wang
2014-11-20 22:03               ` Eric Dumazet
2014-11-20 22:13                 ` Cong Wang
2014-11-20 22:53                   ` Jay Vosburgh
2014-11-21 18:17                     ` Cong Wang
2014-11-21 18:54                       ` Jay Vosburgh
2014-11-24  3:05                         ` Wengang [this message]
2014-11-21 18:55                       ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5472A083.9020801@oracle.com \
    --to=wen.gang.wang@oracle.com \
    --cc=cwang@twopensource.com \
    --cc=eric.dumazet@gmail.com \
    --cc=jay.vosburgh@canonical.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.