netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wengang <wen.gang.wang@oracle.com>
To: Jay Vosburgh <jay.vosburgh@canonical.com>,
	Cong Wang <cwang@twopensource.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>, netdev <netdev@vger.kernel.org>
Subject: Re: [PATCH] bonding: clear header_ops when last slave detached (v2)
Date: Mon, 24 Nov 2014 11:05:39 +0800	[thread overview]
Message-ID: <5472A083.9020801@oracle.com> (raw)
In-Reply-To: <29850.1416596051@famine>

于 2014年11月22日 02:54, Jay Vosburgh 写道:
> Cong Wang <cwang@twopensource.com> wrote:
>
>> On Thu, Nov 20, 2014 at 2:53 PM, Jay Vosburgh
>> <jay.vosburgh@canonical.com> wrote:
>>> Cong Wang <cwang@twopensource.com> wrote:
>>>
>>>> Also, no one seems to care about my previous question:
>>>> why only bonding has the problem?
>>>          Bonding has the problem because it stashes a pointer to a data
>>> structure (the header_ops) from another module, and when that module is
>>> unloaded the dangling pointer may be dereferenced if it's not either
>>> cleared or made to never go away.
>> I knew, please re-read my question, I was asking why ONLY bonding
>> has the problem, i.e. why not neigh or whatever else calling
>> header_ops->foo()? :)
>>
>> As I said, I may miss some try_get_module() somewhere of course.
>> Needs more digging.
> 	My explanation is why only bonding has the problem; it's keeping
> a pointer (in bond_dev->header_ops) that is copied from the slave
> device's ->header_ops, and clearing that stashed pointer is (a) not
> correctly synchronized with the removal of the slave device, and (b)
> trying to simply clear the pointer has a check then use race in
> dev_hard_header.
>
> 	8021q, for example, uses a "passthru" header_ops to call the
> underlying device's header_ops, but 8021q is only for ethernet, and the
> eth_header_ops are static in vmlinux, so it won't see these problems.
>
> 	Actually, now that I think about it, when the last ipoib slave
> is released, the bonding master device is theoretically supposed to be
> removed to avoid the sort of problem we're discussing here.
>
> 	That apparently isn't happening, unless Wengang is running
> pktgen and simultaneously removing the ipoib module (racing the transmit
> against the removal), or maybe something else is going on (perhaps
> pktgen holds a reference to the bonding master, preventing its removal).
>
> 	Also, curiously, looking at pkgten, pktgen_setup_dev appears to
> only accept devices of type ARPHRD_ETHER, but bonding with an ipoib
> slave would be ARPHRD_INFINIBAND.  I'm therefore not sure how Wengang
> configured pktgen over an ipoib bond.
>
> 	Wengang, what kernel are you using, and is your kernel modified
> to change pktgen_setup_dev?
>
> 	-J

It's a 2.6.39 kernel.
code is like this:

static int pktgen_setup_dev(struct pktgen_dev *pkt_dev, const char *ifname)
{
         struct net_device *odev;
         int err;

         /* Clean old setups */
         if (pkt_dev->odev) {
                 dev_put(pkt_dev->odev);
                 pkt_dev->odev = NULL;
         }

         odev = pktgen_dev_get_by_name(pkt_dev, ifname);
         if (!odev) {
                 pr_err("no such netdevice: \"%s\"\n", ifname);
                 return -ENODEV;
         }

         if (odev->type != ARPHRD_ETHER) {
                 pr_err("not an ethernet device: \"%s\"\n", ifname);
                 err = -EINVAL;
         } else if (!netif_running(odev)) {
                 pr_err("device is down: \"%s\"\n", ifname);
                 err = -ENETDOWN;
         } else {
                 pkt_dev->odev = odev;
                 return 0;
         }

         dev_put(odev);
         return err;
}

No change done to it.

This problem is a side product when I was working with another area. I 
am so far not very clear about the setup(no env to check now either).

thanks,
wengang
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com

  reply	other threads:[~2014-11-24  3:04 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-19  5:18 [PATCH] bonding: clear header_ops when last slave detached (v2) Wengang Wang
2014-11-19  5:22 ` Wengang
2014-11-19  5:39 ` Eric Dumazet
2014-11-19  7:00   ` Wengang
2014-11-19 22:26     ` Cong Wang
2014-11-19 22:56       ` Cong Wang
2014-11-20  6:41       ` Eric Dumazet
2014-11-20 17:34         ` Cong Wang
2014-11-20 20:41           ` Eric Dumazet
2014-11-20 21:57             ` Cong Wang
2014-11-20 22:03               ` Eric Dumazet
2014-11-20 22:13                 ` Cong Wang
2014-11-20 22:53                   ` Jay Vosburgh
2014-11-21 18:17                     ` Cong Wang
2014-11-21 18:54                       ` Jay Vosburgh
2014-11-24  3:05                         ` Wengang [this message]
2014-11-21 18:55                       ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5472A083.9020801@oracle.com \
    --to=wen.gang.wang@oracle.com \
    --cc=cwang@twopensource.com \
    --cc=eric.dumazet@gmail.com \
    --cc=jay.vosburgh@canonical.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).