Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode (v3)
From: Jay Vosburgh @ 2011-05-25 18:59 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, Andy Gospodarek, nicolas.2p.debian, David S. Miller
In-Reply-To: <1306347181-3757-1-git-send-email-nhorman@tuxdriver.com>

Neil Horman <nhorman@tuxdriver.com> wrote:

>This soft lockup was recently reported:
>
>[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
>[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
>bonding: bond5: doing slave updates when interface is down.
>bonding bond5: master_dev is not up in bond_enslave
>[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
>bonding: bond5: doing slave updates when interface is down.
>
>BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
>CPU 12:
>Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
>be2d
>Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
>RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
>.text.lock.spinlock+0x26/00
>RSP: 0018:ffff810113167da8  EFLAGS: 00000286
>RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
>RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
>RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
>R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
>R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
>FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0
>
>Call Trace:
> [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
> [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
> [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
> [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
> [<ffffffff8006457b>] __down_write_nested+0x12/0x92
> [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
> [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
> [<ffffffff80016b87>] vfs_write+0xce/0x174
> [<ffffffff80017450>] sys_write+0x45/0x6e
> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>
>It occurs because we are able to change the slave configuarion of a bond while
>the bond interface is down.  The bonding driver initializes some data structures
>only after its ndo_open routine is called.  Among them is the initalization of
>the alb tx and rx hash locks.  So if we add or remove a slave without first
>opening the bond master device, we run the risk of trying to lock/unlock a
>spinlock that has garbage for data in it, which results in our above softlock.
>
>Note that sometimes this works, because in many cases an unlocked spinlock has
>the raw_lock parameter initialized to zero (meaning that the kzalloc of the
>net_device private data is equivalent to calling spin_lock_init), but thats not
>true in all cases, and we aren't guaranteed that condition, so we need to pass
>the relevant spinlocks through the spin_lock_init function.
>
>Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
>the ndo_init path, so they are ready for use by the bond_store_slaves path.
>
>Change notes:
>v2) Based on conversation with Jay and Nicolas it seems that the ability to
>enslave devices while the bond master is down should be safe to do.  As such
>this is an outlier bug, and so instead we'll just initalize the errant spinlocks
>in the init path rather than the open path, solving the problem.  We'll also
>remove the warnings about the bond being down during enslave operations, since
>it should be safe
>
>v3) Fix spelling error
>
>Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>


>Reported-by: jtluka@redhat.com
>CC: Jay Vosburgh <fubar@us.ibm.com>
>CC: Andy Gospodarek <andy@greyhouse.net>
>CC: nicolas.2p.debian@gmail.com
>CC: "David S. Miller" <davem@davemloft.net>
>---
> drivers/net/bonding/bond_alb.c   |    4 ----
> drivers/net/bonding/bond_main.c  |   16 ++++++++++------
> drivers/net/bonding/bond_sysfs.c |    6 ------
> 3 files changed, 10 insertions(+), 16 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 8f2d2e7..2df9276 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -163,8 +163,6 @@ static int tlb_initialize(struct bonding *bond)
> 	struct tlb_client_info *new_hashtbl;
> 	int i;
>
>-	spin_lock_init(&(bond_info->tx_hashtbl_lock));
>-
> 	new_hashtbl = kzalloc(size, GFP_KERNEL);
> 	if (!new_hashtbl) {
> 		pr_err("%s: Error: Failed to allocate TLB hash table\n",
>@@ -747,8 +745,6 @@ static int rlb_initialize(struct bonding *bond)
> 	int size = RLB_HASH_TABLE_SIZE * sizeof(struct rlb_client_info);
> 	int i;
>
>-	spin_lock_init(&(bond_info->rx_hashtbl_lock));
>-
> 	new_hashtbl = kmalloc(size, GFP_KERNEL);
> 	if (!new_hashtbl) {
> 		pr_err("%s: Error: Failed to allocate RLB hash table\n",
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 088fd84..c0045d7 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1542,12 +1542,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> 			   bond_dev->name, slave_dev->name);
> 	}
>
>-	/* bond must be initialized by bond_open() before enslaving */
>-	if (!(bond_dev->flags & IFF_UP)) {
>-		pr_warning("%s: master_dev is not up in bond_enslave\n",
>-			   bond_dev->name);
>-	}
>-
> 	/* already enslaved */
> 	if (slave_dev->flags & IFF_SLAVE) {
> 		pr_debug("Error, Device was already enslaved\n");
>@@ -4832,9 +4826,19 @@ static int bond_init(struct net_device *bond_dev)
> {
> 	struct bonding *bond = netdev_priv(bond_dev);
> 	struct bond_net *bn = net_generic(dev_net(bond_dev), bond_net_id);
>+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>
> 	pr_debug("Begin bond_init for %s\n", bond_dev->name);
>
>+	/*
>+	 * Initialize locks that may be required during
>+	 * en/deslave operations.  All of the bond_open work
>+	 * (of which this is part) should really be moved to
>+	 * a phase prior to dev_open
>+	 */
>+	spin_lock_init(&(bond_info->tx_hashtbl_lock));
>+	spin_lock_init(&(bond_info->rx_hashtbl_lock));
>+
> 	bond->wq = create_singlethread_workqueue(bond_dev->name);
> 	if (!bond->wq)
> 		return -ENOMEM;
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 4059bfc..bb1319f 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -227,12 +227,6 @@ static ssize_t bonding_store_slaves(struct device *d,
> 	struct net_device *dev;
> 	struct bonding *bond = to_bond(d);
>
>-	/* Quick sanity check -- is the bond interface up? */
>-	if (!(bond->dev->flags & IFF_UP)) {
>-		pr_warning("%s: doing slave updates when interface is down.\n",
>-			   bond->dev->name);
>-	}
>-
> 	if (!rtnl_trylock())
> 		return restart_syscall();
>
>-- 
>1.7.5.2
>

^ permalink raw reply

* Re: [PATCH 12/34] isdn/diva: Drop __TIME__ usage
From: Michal Marek @ 2011-05-25 20:38 UTC (permalink / raw)
  To: Armin Schindler; +Cc: linux-kbuild, linux-kernel, netdev
In-Reply-To: <alpine.DEB.2.00.1104051709040.7262@justus.melware.de>

Dne 5.4.2011 17:10, Armin Schindler napsal(a):
> On Tue, 5 Apr 2011, Michal Marek wrote:
>> The kernel already prints its build timestamp during boot, no need to
>> repeat it in random drivers and produce different object files each
>> time.
> 
> The module can be build separately from the kernel, therefore it can have
> an own build timestamp.

So the module timestamp and kernel timestamp vary by a couple of
minutes. But is it really a problem? I don't think so. So is there an
objection against applying this patch?

thanks,
Michal

^ permalink raw reply

* Re: [PATCH 18/34] wan/pc300: Drop __TIME__ usage
From: Michal Marek @ 2011-05-25 20:43 UTC (permalink / raw)
  To: linux-kbuild; +Cc: linux-kernel, David S. Miller, netdev
In-Reply-To: <1302015561-21047-19-git-send-email-mmarek@suse.cz>

Dne 5.4.2011 16:59, Michal Marek napsal(a):
> The kernel already prints its build timestamp during boot, no need to
> repeat it in random drivers and produce different object files each
> time.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Michal Marek <mmarek@suse.cz>
> ---
>  drivers/net/wan/pc300_drv.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)

I don't see the patch in today's linux-next. Does anyone object against
me applying it to kbuild-2.6.git?

Thanks,
Michal


> diff --git a/drivers/net/wan/pc300_drv.c b/drivers/net/wan/pc300_drv.c
> index f875cfa..80ae503 100644
> --- a/drivers/net/wan/pc300_drv.c
> +++ b/drivers/net/wan/pc300_drv.c
> @@ -3242,8 +3242,7 @@ static inline void show_version(void)
>  	rcsdate++;
>  	tmp = strrchr(rcsdate, ' ');
>  	*tmp = '\0';
> -	printk(KERN_INFO "Cyclades-PC300 driver %s %s (built %s %s)\n", 
> -		rcsvers, rcsdate, __DATE__, __TIME__);
> +	printk(KERN_INFO "Cyclades-PC300 driver %s %s\n", rcsvers, rcsdate);
>  }				/* show_version */
>  
>  static const struct net_device_ops cpc_netdev_ops = {

^ permalink raw reply

* Re: [PATCH 18/34] wan/pc300: Drop __TIME__ usage
From: David Miller @ 2011-05-25 20:44 UTC (permalink / raw)
  To: mmarek; +Cc: linux-kbuild, linux-kernel, netdev
In-Reply-To: <4DDD69DD.7030601@suse.cz>

From: Michal Marek <mmarek@suse.cz>
Date: Wed, 25 May 2011 22:43:09 +0200

> Dne 5.4.2011 16:59, Michal Marek napsal(a):
>> The kernel already prints its build timestamp during boot, no need to
>> repeat it in random drivers and produce different object files each
>> time.
>> 
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: netdev@vger.kernel.org
>> Signed-off-by: Michal Marek <mmarek@suse.cz>
>> ---
>>  drivers/net/wan/pc300_drv.c |    3 +--
>>  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> I don't see the patch in today's linux-next. Does anyone object against
> me applying it to kbuild-2.6.git?

Just do it :-)

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [PATCH 28/34] atm: Drop __TIME__ usage
From: Michal Marek @ 2011-05-25 20:49 UTC (permalink / raw)
  To: linux-kbuild; +Cc: linux-kernel, David S. Miller, netdev
In-Reply-To: <1302015561-21047-29-git-send-email-mmarek@suse.cz>

Dne 5.4.2011 16:59, Michal Marek napsal(a):
> The kernel already prints its build timestamp during boot, no need to
> repeat it in random drivers and produce different object files each
> time.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Michal Marek <mmarek@suse.cz>
> ---
>  net/atm/lec.c |    2 +-
>  net/atm/mpc.c |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Hi,

I don't see this patch in today's linux-next. Any objection against me
applying it to the kbuild-2.6.git repository?

Thanks,
Michal


> diff --git a/net/atm/lec.c b/net/atm/lec.c
> index 38754fd..cb14ddf 100644
> --- a/net/atm/lec.c
> +++ b/net/atm/lec.c
> @@ -1173,7 +1173,7 @@ static int __init lane_module_init(void)
>  #endif
>  
>  	register_atm_ioctl(&lane_ioctl_ops);
> -	pr_info("lec.c: " __DATE__ " " __TIME__ " initialized\n");
> +	pr_info("lec.c: initialized\n");
>  	return 0;
>  }
>  
> diff --git a/net/atm/mpc.c b/net/atm/mpc.c
> index 644cdf0..3ccca42 100644
> --- a/net/atm/mpc.c
> +++ b/net/atm/mpc.c
> @@ -1482,7 +1482,7 @@ static __init int atm_mpoa_init(void)
>  	if (mpc_proc_init() != 0)
>  		pr_info("failed to initialize /proc/mpoa\n");
>  
> -	pr_info("mpc.c: " __DATE__ " " __TIME__ " initialized\n");
> +	pr_info("mpc.c: initialized\n");
>  
>  	return 0;
>  }


^ permalink raw reply

* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: C Anthony Risinger @ 2011-05-25 21:05 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux Containers, netdev, linux-kernel
In-Reply-To: <m1wrhh3z62.fsf@fess.ebiederm.org>

On Mon, May 23, 2011 at 4:05 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> This tree adds the files /proc/<pid>/ns/net, /proc/<pid>/ns/ipc,
> /proc/<pid>/ns/uts that can be opened to refer to the namespaces of a
> process at the time those files are opened, and can be bind mounted to
> keep the specified namespace alive without a process.
>
> This tree adds the setns system call that can be used to change the
> specified namespace of a process to the namespace specified by a system
> call.

i just have a quick question regarding these, apologies if wrong place
to respond -- i trimmed to lists only.

if i understand correctly, mount namespaces (for example), allow one
to build such constructs as "private /tmp" and similar that even
`root` cannot access ... and there are many reasons `root` does not
deserve to completely know/interact with user processes (FUSE makes a
good example ... just because i [user] have SSH access to a machine,
why should `root`?)

would these /proc additions break such guarantees?  IOW, would it now
become possible for `root` to inject stuff into my private namespaces,
and/or has these guarantees never existed and i am mistaken?  is there
any kind of ACL mechanism that endows the origin process (or similar)
with the ability to dictate who can hold and/or interact with these
references?

Thanks for your time,

-- 

C Anthony

^ permalink raw reply

* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Serge E. Hallyn @ 2011-05-25 21:38 UTC (permalink / raw)
  To: C Anthony Risinger
  Cc: Eric W. Biederman, Linux Containers, netdev, linux-kernel
In-Reply-To: <BANLkTikW4vJbC8kcLSKuemUBbu36SO6hwg@mail.gmail.com>

Quoting C Anthony Risinger (anthony@xtfx.me):
> On Mon, May 23, 2011 at 4:05 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
> >
> > This tree adds the files /proc/<pid>/ns/net, /proc/<pid>/ns/ipc,
> > /proc/<pid>/ns/uts that can be opened to refer to the namespaces of a
> > process at the time those files are opened, and can be bind mounted to
> > keep the specified namespace alive without a process.
> >
> > This tree adds the setns system call that can be used to change the
> > specified namespace of a process to the namespace specified by a system
> > call.
> 
> i just have a quick question regarding these, apologies if wrong place
> to respond -- i trimmed to lists only.
> 
> if i understand correctly, mount namespaces (for example), allow one
> to build such constructs as "private /tmp" and similar that even
> `root` cannot access ... and there are many reasons `root` does not
> deserve to completely know/interact with user processes (FUSE makes a
> good example ... just because i [user] have SSH access to a machine,
> why should `root`?)
> 
> would these /proc additions break such guarantees?  IOW, would it now
> become possible for `root` to inject stuff into my private namespaces,
> and/or has these guarantees never existed and i am mistaken?  is there
> any kind of ACL mechanism that endows the origin process (or similar)
> with the ability to dictate who can hold and/or interact with these
> references?

If for instance you have a file open in your private /tmp, then root
in another mounts ns can open the file through /proc/$$/fd/N anyway.
If it's a directory, he can now traverse the whole fs.

-serge

^ permalink raw reply

* Re: [PATCH 28/34] atm: Drop __TIME__ usage
From: David Miller @ 2011-05-25 21:39 UTC (permalink / raw)
  To: mmarek; +Cc: linux-kbuild, linux-kernel, netdev
In-Reply-To: <4DDD6B59.9000108@suse.cz>

From: Michal Marek <mmarek@suse.cz>
Date: Wed, 25 May 2011 22:49:29 +0200

> Dne 5.4.2011 16:59, Michal Marek napsal(a):
>> The kernel already prints its build timestamp during boot, no need to
>> repeat it in random drivers and produce different object files each
>> time.
>> 
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: netdev@vger.kernel.org
>> Signed-off-by: Michal Marek <mmarek@suse.cz>
>> ---
>>  net/atm/lec.c |    2 +-
>>  net/atm/mpc.c |    2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> Hi,
> 
> I don't see this patch in today's linux-next. Any objection against me
> applying it to the kbuild-2.6.git repository?

Please apply it:

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: C Anthony Risinger @ 2011-05-25 21:55 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Eric W. Biederman, Linux Containers, netdev, linux-kernel
In-Reply-To: <20110525213806.GA4590@mail.hallyn.com>

On Wed, May 25, 2011 at 4:38 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> Quoting C Anthony Risinger (anthony@xtfx.me):
>> On Mon, May 23, 2011 at 4:05 PM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>> >
>> > This tree adds the files /proc/<pid>/ns/net, /proc/<pid>/ns/ipc,
>> > /proc/<pid>/ns/uts that can be opened to refer to the namespaces of a
>> > process at the time those files are opened, and can be bind mounted to
>> > keep the specified namespace alive without a process.
>> >
>> > This tree adds the setns system call that can be used to change the
>> > specified namespace of a process to the namespace specified by a system
>> > call.
>>
>> i just have a quick question regarding these, apologies if wrong place
>> to respond -- i trimmed to lists only.
>>
>> if i understand correctly, mount namespaces (for example), allow one
>> to build such constructs as "private /tmp" and similar that even
>> `root` cannot access ... and there are many reasons `root` does not
>> deserve to completely know/interact with user processes (FUSE makes a
>> good example ... just because i [user] have SSH access to a machine,
>> why should `root`?)
>>
>> would these /proc additions break such guarantees?  IOW, would it now
>> become possible for `root` to inject stuff into my private namespaces,
>> and/or has these guarantees never existed and i am mistaken?  is there
>> any kind of ACL mechanism that endows the origin process (or similar)
>> with the ability to dictate who can hold and/or interact with these
>> references?
>
> If for instance you have a file open in your private /tmp, then root
> in another mounts ns can open the file through /proc/$$/fd/N anyway.
> If it's a directory, he can now traverse the whole fs.

aaah right :-( ... there's always another way isn't there ... curse
you Linux for being so flexible! (just kidding baby i love you)

this seems like a more fundamental issue then?  or should i not expect
to be able to achieve separation like this?  i ask in the context of
OS virt via cgroups + namespaces, eg. LXC et al, because i'm about to
perform a massive overhaul to our crusty sub-2.6.18 infrastructure and
i've used/followed these technologies for couple years now ... and
it's starting to feel like "the right time".

C Anthony

^ permalink raw reply

* [RFC] af-packet:  Save reference to bound network device.
From: greearb @ 2011-05-25 21:56 UTC (permalink / raw)
  To: netdev; +Cc: Ben Greear

From: Ben Greear <greearb@candelatech.com>

This saves a network device lookup on each packet transmitted,
for sockets that are bound to a network device.

Signed-off-by: Ben Greear <greearb@candelatech.com>
---
:100644 100644 4005b24... ba16759... M	net/packet/af_packet.c
 net/packet/af_packet.c |   48 +++++++++++++++++++++++++++++++++++-------------
 1 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 4005b24..ba16759 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -201,8 +201,9 @@ struct packet_sock {
 				auxdata:1,
 				origdev:1,
 				has_vnet_hdr:1;
-	int			ifindex;	/* bound device		*/
+	int			ifindex;	/* bound device id	*/
 	__be16			num;
+	struct net_device	*bound_dev; /* bound device */
 	struct packet_mclist	*mclist;
 	atomic_t		mapped;
 	enum tpacket_versions	tp_version;
@@ -987,8 +988,9 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 {
 	struct sk_buff *skb;
-	struct net_device *dev;
+	struct net_device *dev = NULL;
 	__be16 proto;
+	bool need_rls_dev = false;
 	int ifindex, err, reserve = 0;
 	void *ph;
 	struct sockaddr_ll *saddr = (struct sockaddr_ll *)msg->msg_name;
@@ -1002,6 +1004,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 	err = -EBUSY;
 	if (saddr == NULL) {
 		ifindex	= po->ifindex;
+		dev = po->bound_dev;
 		proto	= po->num;
 		addr	= NULL;
 	} else {
@@ -1017,7 +1020,10 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 		addr	= saddr->sll_addr;
 	}
 
-	dev = dev_get_by_index(sock_net(&po->sk), ifindex);
+	if (!dev) {
+		dev = dev_get_by_index(sock_net(&po->sk), ifindex);
+		need_rls_dev = true;
+	}
 	err = -ENXIO;
 	if (unlikely(dev == NULL))
 		goto out;
@@ -1103,7 +1109,8 @@ out_status:
 	__packet_set_status(po, ph, status);
 	kfree_skb(skb);
 out_put:
-	dev_put(dev);
+	if (need_rls_dev)
+		dev_put(dev);
 out:
 	mutex_unlock(&po->pg_vec_lock);
 	return err;
@@ -1139,8 +1146,9 @@ static int packet_snd(struct socket *sock,
 	struct sock *sk = sock->sk;
 	struct sockaddr_ll *saddr = (struct sockaddr_ll *)msg->msg_name;
 	struct sk_buff *skb;
-	struct net_device *dev;
+	struct net_device *dev = NULL;
 	__be16 proto;
+	bool need_rls_dev = false;
 	unsigned char *addr;
 	int ifindex, err, reserve = 0;
 	struct virtio_net_hdr vnet_hdr = { 0 };
@@ -1161,6 +1169,7 @@ static int packet_snd(struct socket *sock,
 
 	if (saddr == NULL) {
 		ifindex	= po->ifindex;
+		dev = po->bound_dev;
 		proto	= po->num;
 		addr	= NULL;
 	} else {
@@ -1174,8 +1183,11 @@ static int packet_snd(struct socket *sock,
 		addr	= saddr->sll_addr;
 	}
 
+	if (!dev) {
+		dev = dev_get_by_index(sock_net(sk), ifindex);
+		need_rls_dev = true;
+	}
 
-	dev = dev_get_by_index(sock_net(sk), ifindex);
 	err = -ENXIO;
 	if (dev == NULL)
 		goto out_unlock;
@@ -1315,14 +1327,15 @@ static int packet_snd(struct socket *sock,
 	if (err > 0 && (err = net_xmit_errno(err)) != 0)
 		goto out_unlock;
 
-	dev_put(dev);
+	if (need_rls_dev)
+		dev_put(dev);
 
 	return len;
 
 out_free:
 	kfree_skb(skb);
 out_unlock:
-	if (dev)
+	if (dev && need_rls_dev)
 		dev_put(dev);
 out:
 	return err;
@@ -1372,6 +1385,12 @@ static int packet_release(struct socket *sock)
 		__dev_remove_pack(&po->prot_hook);
 		__sock_put(sk);
 	}
+
+	if (po->bound_dev) {
+		dev_put(po->bound_dev);
+		po->bound_dev = NULL;
+	}
+	
 	spin_unlock(&po->bind_lock);
 
 	packet_flush_mclist(sk);
@@ -1428,6 +1447,9 @@ static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 protoc
 	po->prot_hook.dev = dev;
 
 	po->ifindex = dev ? dev->ifindex : 0;
+	if (po->bound_dev)
+		dev_put(po->bound_dev);
+	po->bound_dev = dev;
 
 	if (protocol == 0)
 		goto out_unlock;
@@ -1469,10 +1491,8 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr,
 	strlcpy(name, uaddr->sa_data, sizeof(name));
 
 	dev = dev_get_by_name(sock_net(sk), name);
-	if (dev) {
+	if (dev)
 		err = packet_do_bind(sk, dev, pkt_sk(sk)->num);
-		dev_put(dev);
-	}
 	return err;
 }
 
@@ -1500,8 +1520,6 @@ static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len
 			goto out;
 	}
 	err = packet_do_bind(sk, dev, sll->sll_protocol ? : pkt_sk(sk)->num);
-	if (dev)
-		dev_put(dev);
 
 out:
 	return err;
@@ -2266,6 +2284,10 @@ static int packet_notifier(struct notifier_block *this, unsigned long msg, void
 				}
 				if (msg == NETDEV_UNREGISTER) {
 					po->ifindex = -1;
+					if (po->bound_dev) {
+						dev_put(po->bound_dev);
+						po->bound_dev = NULL;
+					}
 					po->prot_hook.dev = NULL;
 				}
 				spin_unlock(&po->bind_lock);
-- 
1.7.3.4


^ permalink raw reply related

* Re: [PATCH] sctp: fix memory leak of the ASCONF queue when free asoc
From: David Miller @ 2011-05-25 21:57 UTC (permalink / raw)
  To: yjwei; +Cc: vladislav.yasevich, netdev, linux-sctp
In-Reply-To: <4DDCB432.7000903@cn.fujitsu.com>

From: Wei Yongjun <yjwei@cn.fujitsu.com>
Date: Wed, 25 May 2011 15:48:02 +0800

> If an ASCONF chunk is outstanding, then the following ASCONF
> chunk will be queued for later transmission. But when we free
> the asoc, we forget to free the ASCONF queue at the same time,
> this will cause memory leak.
> 
> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>

Applied.

^ permalink raw reply

* Re: [PATCH] xen: netfront: hold RTNL when updating features.
From: David Miller @ 2011-05-25 21:57 UTC (permalink / raw)
  To: ian.campbell; +Cc: xen-devel, netdev
In-Reply-To: <1306310162-5627-1-git-send-email-ian.campbell@citrix.com>

From: Ian Campbell <ian.campbell@citrix.com>
Date: Wed, 25 May 2011 08:56:02 +0100

> Konrad reports:
> [    0.930811] RTNL: assertion failed at /home/konrad/ssd/linux/net/core/dev.c (5258)
> [    0.930821] Pid: 22, comm: xenwatch Not tainted 2.6.39-05193-gd762f43 #1
> [    0.930825] Call Trace:
> [    0.930834]  [<ffffffff8143bd0e>] __netdev_update_features+0xae/0xe0
> [    0.930840]  [<ffffffff8143dd41>] netdev_update_features+0x11/0x30
> [    0.930847]  [<ffffffffa0037105>] netback_changed+0x4e5/0x800 [xen_netfront]
> [    0.930854]  [<ffffffff8132a838>] xenbus_otherend_changed+0xa8/0xb0
> [    0.930860]  [<ffffffff8157ca99>] ? _raw_spin_unlock_irqrestore+0x19/0x20
> [    0.930866]  [<ffffffff8132adfe>] backend_changed+0xe/0x10
> [    0.930871]  [<ffffffff8132875a>] xenwatch_thread+0xba/0x180
> [    0.930876]  [<ffffffff810a8ba0>] ? wake_up_bit+0x40/0x40
> [    0.930881]  [<ffffffff813286a0>] ? split+0xf0/0xf0
> [    0.930886]  [<ffffffff810a8646>] kthread+0x96/0xa0
> [    0.930891]  [<ffffffff815855a4>] kernel_thread_helper+0x4/0x10
> [    0.930896]  [<ffffffff815846b3>] ? int_ret_from_sys_call+0x7/0x1b
> [    0.930901]  [<ffffffff8157cf61>] ? retint_restore_args+0x5/0x6
> [    0.930906]  [<ffffffff815855a0>] ? gs_change+0x13/0x13
> 
> This update happens in xenbus watch callback context and hence does not already
> hold the rtnl. Take the lock as necessary.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/2] net: make dev_disable_lro use physical device if passed a vlan dev (v2)
From: David Miller @ 2011-05-25 21:56 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, bhutchings
In-Reply-To: <1306261869-7276-3-git-send-email-nhorman@tuxdriver.com>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Tue, 24 May 2011 14:31:09 -0400

> If the device passed into dev_disable_lro is a vlan, then repoint the dev
> poniter so that we actually modify the underlying physical device.
> 
> Signed-of-by: Neil Horman <nhorman@tuxdriver.com>

Applied.

^ permalink raw reply

* Re: [PATCH] via-velocity: don't annotate MAC registers as packed
From: David Miller @ 2011-05-25 21:57 UTC (permalink / raw)
  To: uli; +Cc: netdev
In-Reply-To: <1306321642-2861-1-git-send-email-uli@suse.de>

From: Ulrich Hecht <uli@suse.de>
Date: Wed, 25 May 2011 13:07:22 +0200

> On ARM, memory accesses through packed pointers behave in unexpected
> ways in GCC releases 4.3 and higher; see https://lkml.org/lkml/2011/2/2/163
> for discussion.
> 
> In this particular case, 32-bit I/O registers are accessed bytewise,
> causing incorrect setting of the DMA address registers which in turn
> leads to an error interrupt storm that brings the system to a halt.
> 
> Since the mac_regs structure does not need any packing anyway, this patch
> simply removes the attribute to fix the issue.
> 
> Signed-off-by: Ulrich Hecht <uli@suse.de>

Applied.

^ permalink raw reply

* Re: [PATCH]: isdn: netjet - blacklist Digium TDM400P
From: David Miller @ 2011-05-25 21:57 UTC (permalink / raw)
  To: prarit; +Cc: netdev, isdn
In-Reply-To: <20110525121223.26439.57412.sendpatchset@prarit.bos.redhat.com>

From: Prarit Bhargava <prarit@redhat.com>
Date: Wed, 25 May 2011 08:12:23 -0400

> [2nd try ... 1st attempt didn't make it to netdev mailing list]
> 
> A quick google search reveals that people with this card are blacklisting it
> in the initramfs and in the module blacklist based on a statement that it
> is unsupported. Since the older Digium is also unsupported I'm pretty
> confident that this newer card is also not supported.
> 
> lspci -xxx -vv shows
> 
> 04:07.0 Communication controller: Tiger Jet Network Inc. Tiger3XX Modem/ISDN interface
>         Subsystem: Device b100:0003
> P.
> 
> ----8<----
> The Asterisk Voice Card, DIGIUM TDM400P is unsupported by the netjet driver.
> Blacklist it like the Digium X100P/X101P card.
> 
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2] net: move is_vlan_dev into public header file (v2)
From: David Miller @ 2011-05-25 21:56 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, bhutchings
In-Reply-To: <1306261869-7276-2-git-send-email-nhorman@tuxdriver.com>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Tue, 24 May 2011 14:31:08 -0400

> Migrate is_vlan_dev() to if_vlan.h so that core networkig can use it
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-2.6] bnx2x: protect sequence increment with mutex
From: David Miller @ 2011-05-25 21:58 UTC (permalink / raw)
  To: dmitry; +Cc: netdev, eilong
In-Reply-To: <1306335351.15327.0.camel@lb-tlvb-dmitry>

From: "Dmitry Kravkov" <dmitry@broadcom.com>
Date: Wed, 25 May 2011 17:55:51 +0300

> 
> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2] bonding: documentation and code cleanup for resend_igmp
From: David Miller @ 2011-05-25 21:58 UTC (permalink / raw)
  To: fbl; +Cc: netdev, fubar, andy, rick.jones2
In-Reply-To: <1306348738-23946-1-git-send-email-fbl@redhat.com>

From: Flavio Leitner <fbl@redhat.com>
Date: Wed, 25 May 2011 15:38:58 -0300

> Improves the documentation about how IGMP resend parameter
> works, fix two missing checks and coding style issues.
> 
> Signed-off-by: Flavio Leitner <fbl@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode (v2)
From: David Miller @ 2011-05-25 21:58 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, fubar, andy, nicolas.2p.debian
In-Reply-To: <1306343805-3223-1-git-send-email-nhorman@tuxdriver.com>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Wed, 25 May 2011 13:16:45 -0400

> This soft lockup was recently reported:
 ...
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> Reported-by: jtluka@redhat.com

Applied.

^ permalink raw reply

* Re: [PATCH] net: hold rtnl again in dump callbacks
From: David Miller @ 2011-05-25 21:58 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, kaber, shemminger
In-Reply-To: <1306344844.11400.11.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 25 May 2011 19:34:04 +0200

> Commit e67f88dd12f6 (dont hold rtnl mutex during netlink dump callbacks)
> missed fact that rtnl_fill_ifinfo() must be called with rtnl held.
> 
> Because of possible deadlocks between two mutexes (cb_mutex and rtnl),
> its not easy to solve this problem, so revert this part of the patch.
> 
> It also forgot one rcu_read_unlock() in FIB dump_rules()
> 
> Add one ASSERT_RTNL() in rtnl_fill_ifinfo() to remind us the rule.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Patrick McHardy <kaber@trash.net>
> CC: Stephen Hemminger <shemminger@vyatta.com>
> ---
> I tried to solve this problem differently but failed. We need more
> preliminary steps.

Applied.

^ permalink raw reply

* Re: [PATCH] sch_sfq: fix peek() implementation
From: David Miller @ 2011-05-25 21:57 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jarkao2, kaber, netdev, hawk
In-Reply-To: <1306334411.2820.26.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 25 May 2011 16:40:11 +0200

> Since commit eeaeb068f139 (sch_sfq: allow big packets and be fair),
> sfq_peek() can return a different skb that would be normally dequeued by
> sfq_dequeue() [ if current slot->allot is negative ]
> 
> Use generic qdisc_peek_dequeued() instead of custom implementation, to
> get consistent result.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [RFC] af-packet: Save reference to bound network device.
From: David Miller @ 2011-05-25 22:01 UTC (permalink / raw)
  To: greearb; +Cc: netdev
In-Reply-To: <1306360602-9672-1-git-send-email-greearb@candelatech.com>

From: greearb@candelatech.com
Date: Wed, 25 May 2011 14:56:42 -0700

> From: Ben Greear <greearb@candelatech.com>
> 
> This saves a network device lookup on each packet transmitted,
> for sockets that are bound to a network device.
> 
> Signed-off-by: Ben Greear <greearb@candelatech.com>

You can't hold onto devices like this unless you also add a netdev
event notifier that will release it.  Otherwise we'll hang on net
driver module unload until the packet socket is closed.

I don't think you really want to walk all pf-packet sockets on netdev
events just to do this.

dev_get_by_index(,_rcu}() is insanely cheap, I doubt it's showing up
on your profiles at all.

^ permalink raw reply

* Re: [RFC] af-packet: Save reference to bound network device.
From: Ben Greear @ 2011-05-25 22:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110525.180113.1194226831134092545.davem@davemloft.net>

On 05/25/2011 03:01 PM, David Miller wrote:
> From: greearb@candelatech.com
> Date: Wed, 25 May 2011 14:56:42 -0700
>
>> From: Ben Greear<greearb@candelatech.com>
>>
>> This saves a network device lookup on each packet transmitted,
>> for sockets that are bound to a network device.
>>
>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>
> You can't hold onto devices like this unless you also add a netdev
> event notifier that will release it.  Otherwise we'll hang on net
> driver module unload until the packet socket is closed.
>
> I don't think you really want to walk all pf-packet sockets on netdev
> events just to do this.

Doesn't this piece of code take care of that?
I tested with rmmod..but of course I could have missed something.

@@ -2266,6 +2284,10 @@ static int packet_notifier(struct notifier_block *this, unsigned long msg, void
  				}
  				if (msg == NETDEV_UNREGISTER) {
  					po->ifindex = -1;
+					if (po->bound_dev) {
+						dev_put(po->bound_dev);
+						po->bound_dev = NULL;
+					}
  					po->prot_hook.dev = NULL;
  				}
  				spin_unlock(&po->bind_lock);


>
> dev_get_by_index(,_rcu}() is insanely cheap, I doubt it's showing up
> on your profiles at all.

I admit it was a small change...maybe 5Mbps (from 165 to 170Mbps in
this particular test), but it did seem to improve things a bit.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Michał Mirosław @ 2011-05-25 22:11 UTC (permalink / raw)
  To: C Anthony Risinger
  Cc: Serge E. Hallyn, Eric W. Biederman, Linux Containers, netdev,
	linux-kernel
In-Reply-To: <BANLkTinbw6pZjhMscfXFMArd=XU=VC=+eQ@mail.gmail.com>

2011/5/25 C Anthony Risinger <anthony@xtfx.me>:
> On Wed, May 25, 2011 at 4:38 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
>> Quoting C Anthony Risinger (anthony@xtfx.me):
[...]
>>> if i understand correctly, mount namespaces (for example), allow one
>>> to build such constructs as "private /tmp" and similar that even
>>> `root` cannot access ... and there are many reasons `root` does not
>>> deserve to completely know/interact with user processes (FUSE makes a
>>> good example ... just because i [user] have SSH access to a machine,
>>> why should `root`?)
>> If for instance you have a file open in your private /tmp, then root
>> in another mounts ns can open the file through /proc/$$/fd/N anyway.
>> If it's a directory, he can now traverse the whole fs.
> aaah right :-( ... there's always another way isn't there ... curse
> you Linux for being so flexible! (just kidding baby i love you)
>
> this seems like a more fundamental issue then?  or should i not expect
> to be able to achieve separation like this?  i ask in the context of
> OS virt via cgroups + namespaces, eg. LXC et al, because i'm about to
> perform a massive overhaul to our crusty sub-2.6.18 infrastructure and
> i've used/followed these technologies for couple years now ... and
> it's starting to feel like "the right time".

You either trust the admin or don't use the machine. There is no third way.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [RFC] af-packet: Save reference to bound network device.
From: David Miller @ 2011-05-25 22:14 UTC (permalink / raw)
  To: greearb; +Cc: netdev
In-Reply-To: <4DDD7D16.6030907@candelatech.com>

From: Ben Greear <greearb@candelatech.com>
Date: Wed, 25 May 2011 15:05:10 -0700

> Doesn't this piece of code take care of that?
> I tested with rmmod..but of course I could have missed something.
> 
> @@ -2266,6 +2284,10 @@ static int packet_notifier(struct
> notifier_block *this, unsigned long msg, void
>  				}
>  				if (msg == NETDEV_UNREGISTER) {
>  					po->ifindex = -1;
> +					if (po->bound_dev) {
> + dev_put(po->bound_dev);
> +						po->bound_dev = NULL;
> +					}
>  					po->prot_hook.dev = NULL;
>  				}
>  				spin_unlock(&po->bind_lock);
> 

Indeed, it should, thanks for pointing that out.

Wait a second, why do you need to store the device a second
time, can't you get at po->prot_hook.dev in all the necessary
spots?

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox