* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode (v3)
From: Jay Vosburgh @ 2011-05-25 18:59 UTC (permalink / raw)
To: Neil Horman; +Cc: netdev, Andy Gospodarek, nicolas.2p.debian, David S. Miller
In-Reply-To: <1306347181-3757-1-git-send-email-nhorman@tuxdriver.com>
Neil Horman <nhorman@tuxdriver.com> wrote:
>This soft lockup was recently reported:
>
>[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
>[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
>bonding: bond5: doing slave updates when interface is down.
>bonding bond5: master_dev is not up in bond_enslave
>[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
>bonding: bond5: doing slave updates when interface is down.
>
>BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
>CPU 12:
>Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
>be2d
>Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
>RIP: 0010:[<ffffffff80064bf0>] [<ffffffff80064bf0>]
>.text.lock.spinlock+0x26/00
>RSP: 0018:ffff810113167da8 EFLAGS: 00000286
>RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
>RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
>RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
>R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
>R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
>FS: 00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
>CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0
>
>Call Trace:
> [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
> [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
> [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
> [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
> [<ffffffff8006457b>] __down_write_nested+0x12/0x92
> [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
> [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
> [<ffffffff80016b87>] vfs_write+0xce/0x174
> [<ffffffff80017450>] sys_write+0x45/0x6e
> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>
>It occurs because we are able to change the slave configuarion of a bond while
>the bond interface is down. The bonding driver initializes some data structures
>only after its ndo_open routine is called. Among them is the initalization of
>the alb tx and rx hash locks. So if we add or remove a slave without first
>opening the bond master device, we run the risk of trying to lock/unlock a
>spinlock that has garbage for data in it, which results in our above softlock.
>
>Note that sometimes this works, because in many cases an unlocked spinlock has
>the raw_lock parameter initialized to zero (meaning that the kzalloc of the
>net_device private data is equivalent to calling spin_lock_init), but thats not
>true in all cases, and we aren't guaranteed that condition, so we need to pass
>the relevant spinlocks through the spin_lock_init function.
>
>Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
>the ndo_init path, so they are ready for use by the bond_store_slaves path.
>
>Change notes:
>v2) Based on conversation with Jay and Nicolas it seems that the ability to
>enslave devices while the bond master is down should be safe to do. As such
>this is an outlier bug, and so instead we'll just initalize the errant spinlocks
>in the init path rather than the open path, solving the problem. We'll also
>remove the warnings about the bond being down during enslave operations, since
>it should be safe
>
>v3) Fix spelling error
>
>Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
>Reported-by: jtluka@redhat.com
>CC: Jay Vosburgh <fubar@us.ibm.com>
>CC: Andy Gospodarek <andy@greyhouse.net>
>CC: nicolas.2p.debian@gmail.com
>CC: "David S. Miller" <davem@davemloft.net>
>---
> drivers/net/bonding/bond_alb.c | 4 ----
> drivers/net/bonding/bond_main.c | 16 ++++++++++------
> drivers/net/bonding/bond_sysfs.c | 6 ------
> 3 files changed, 10 insertions(+), 16 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 8f2d2e7..2df9276 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -163,8 +163,6 @@ static int tlb_initialize(struct bonding *bond)
> struct tlb_client_info *new_hashtbl;
> int i;
>
>- spin_lock_init(&(bond_info->tx_hashtbl_lock));
>-
> new_hashtbl = kzalloc(size, GFP_KERNEL);
> if (!new_hashtbl) {
> pr_err("%s: Error: Failed to allocate TLB hash table\n",
>@@ -747,8 +745,6 @@ static int rlb_initialize(struct bonding *bond)
> int size = RLB_HASH_TABLE_SIZE * sizeof(struct rlb_client_info);
> int i;
>
>- spin_lock_init(&(bond_info->rx_hashtbl_lock));
>-
> new_hashtbl = kmalloc(size, GFP_KERNEL);
> if (!new_hashtbl) {
> pr_err("%s: Error: Failed to allocate RLB hash table\n",
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 088fd84..c0045d7 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1542,12 +1542,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> bond_dev->name, slave_dev->name);
> }
>
>- /* bond must be initialized by bond_open() before enslaving */
>- if (!(bond_dev->flags & IFF_UP)) {
>- pr_warning("%s: master_dev is not up in bond_enslave\n",
>- bond_dev->name);
>- }
>-
> /* already enslaved */
> if (slave_dev->flags & IFF_SLAVE) {
> pr_debug("Error, Device was already enslaved\n");
>@@ -4832,9 +4826,19 @@ static int bond_init(struct net_device *bond_dev)
> {
> struct bonding *bond = netdev_priv(bond_dev);
> struct bond_net *bn = net_generic(dev_net(bond_dev), bond_net_id);
>+ struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>
> pr_debug("Begin bond_init for %s\n", bond_dev->name);
>
>+ /*
>+ * Initialize locks that may be required during
>+ * en/deslave operations. All of the bond_open work
>+ * (of which this is part) should really be moved to
>+ * a phase prior to dev_open
>+ */
>+ spin_lock_init(&(bond_info->tx_hashtbl_lock));
>+ spin_lock_init(&(bond_info->rx_hashtbl_lock));
>+
> bond->wq = create_singlethread_workqueue(bond_dev->name);
> if (!bond->wq)
> return -ENOMEM;
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 4059bfc..bb1319f 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -227,12 +227,6 @@ static ssize_t bonding_store_slaves(struct device *d,
> struct net_device *dev;
> struct bonding *bond = to_bond(d);
>
>- /* Quick sanity check -- is the bond interface up? */
>- if (!(bond->dev->flags & IFF_UP)) {
>- pr_warning("%s: doing slave updates when interface is down.\n",
>- bond->dev->name);
>- }
>-
> if (!rtnl_trylock())
> return restart_syscall();
>
>--
>1.7.5.2
>
^ permalink raw reply
* Re: [PATCH 12/34] isdn/diva: Drop __TIME__ usage
From: Michal Marek @ 2011-05-25 20:38 UTC (permalink / raw)
To: Armin Schindler; +Cc: linux-kbuild, linux-kernel, netdev
In-Reply-To: <alpine.DEB.2.00.1104051709040.7262@justus.melware.de>
Dne 5.4.2011 17:10, Armin Schindler napsal(a):
> On Tue, 5 Apr 2011, Michal Marek wrote:
>> The kernel already prints its build timestamp during boot, no need to
>> repeat it in random drivers and produce different object files each
>> time.
>
> The module can be build separately from the kernel, therefore it can have
> an own build timestamp.
So the module timestamp and kernel timestamp vary by a couple of
minutes. But is it really a problem? I don't think so. So is there an
objection against applying this patch?
thanks,
Michal
^ permalink raw reply
* Re: [PATCH 18/34] wan/pc300: Drop __TIME__ usage
From: Michal Marek @ 2011-05-25 20:43 UTC (permalink / raw)
To: linux-kbuild; +Cc: linux-kernel, David S. Miller, netdev
In-Reply-To: <1302015561-21047-19-git-send-email-mmarek@suse.cz>
Dne 5.4.2011 16:59, Michal Marek napsal(a):
> The kernel already prints its build timestamp during boot, no need to
> repeat it in random drivers and produce different object files each
> time.
>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Michal Marek <mmarek@suse.cz>
> ---
> drivers/net/wan/pc300_drv.c | 3 +--
> 1 files changed, 1 insertions(+), 2 deletions(-)
I don't see the patch in today's linux-next. Does anyone object against
me applying it to kbuild-2.6.git?
Thanks,
Michal
> diff --git a/drivers/net/wan/pc300_drv.c b/drivers/net/wan/pc300_drv.c
> index f875cfa..80ae503 100644
> --- a/drivers/net/wan/pc300_drv.c
> +++ b/drivers/net/wan/pc300_drv.c
> @@ -3242,8 +3242,7 @@ static inline void show_version(void)
> rcsdate++;
> tmp = strrchr(rcsdate, ' ');
> *tmp = '\0';
> - printk(KERN_INFO "Cyclades-PC300 driver %s %s (built %s %s)\n",
> - rcsvers, rcsdate, __DATE__, __TIME__);
> + printk(KERN_INFO "Cyclades-PC300 driver %s %s\n", rcsvers, rcsdate);
> } /* show_version */
>
> static const struct net_device_ops cpc_netdev_ops = {
^ permalink raw reply
* Re: [PATCH 18/34] wan/pc300: Drop __TIME__ usage
From: David Miller @ 2011-05-25 20:44 UTC (permalink / raw)
To: mmarek; +Cc: linux-kbuild, linux-kernel, netdev
In-Reply-To: <4DDD69DD.7030601@suse.cz>
From: Michal Marek <mmarek@suse.cz>
Date: Wed, 25 May 2011 22:43:09 +0200
> Dne 5.4.2011 16:59, Michal Marek napsal(a):
>> The kernel already prints its build timestamp during boot, no need to
>> repeat it in random drivers and produce different object files each
>> time.
>>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: netdev@vger.kernel.org
>> Signed-off-by: Michal Marek <mmarek@suse.cz>
>> ---
>> drivers/net/wan/pc300_drv.c | 3 +--
>> 1 files changed, 1 insertions(+), 2 deletions(-)
>
> I don't see the patch in today's linux-next. Does anyone object against
> me applying it to kbuild-2.6.git?
Just do it :-)
Acked-by: David S. Miller <davem@davemloft.net>
^ permalink raw reply
* Re: [PATCH 28/34] atm: Drop __TIME__ usage
From: Michal Marek @ 2011-05-25 20:49 UTC (permalink / raw)
To: linux-kbuild; +Cc: linux-kernel, David S. Miller, netdev
In-Reply-To: <1302015561-21047-29-git-send-email-mmarek@suse.cz>
Dne 5.4.2011 16:59, Michal Marek napsal(a):
> The kernel already prints its build timestamp during boot, no need to
> repeat it in random drivers and produce different object files each
> time.
>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Michal Marek <mmarek@suse.cz>
> ---
> net/atm/lec.c | 2 +-
> net/atm/mpc.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
Hi,
I don't see this patch in today's linux-next. Any objection against me
applying it to the kbuild-2.6.git repository?
Thanks,
Michal
> diff --git a/net/atm/lec.c b/net/atm/lec.c
> index 38754fd..cb14ddf 100644
> --- a/net/atm/lec.c
> +++ b/net/atm/lec.c
> @@ -1173,7 +1173,7 @@ static int __init lane_module_init(void)
> #endif
>
> register_atm_ioctl(&lane_ioctl_ops);
> - pr_info("lec.c: " __DATE__ " " __TIME__ " initialized\n");
> + pr_info("lec.c: initialized\n");
> return 0;
> }
>
> diff --git a/net/atm/mpc.c b/net/atm/mpc.c
> index 644cdf0..3ccca42 100644
> --- a/net/atm/mpc.c
> +++ b/net/atm/mpc.c
> @@ -1482,7 +1482,7 @@ static __init int atm_mpoa_init(void)
> if (mpc_proc_init() != 0)
> pr_info("failed to initialize /proc/mpoa\n");
>
> - pr_info("mpc.c: " __DATE__ " " __TIME__ " initialized\n");
> + pr_info("mpc.c: initialized\n");
>
> return 0;
> }
^ permalink raw reply
* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: C Anthony Risinger @ 2011-05-25 21:05 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux Containers, netdev, linux-kernel
In-Reply-To: <m1wrhh3z62.fsf@fess.ebiederm.org>
On Mon, May 23, 2011 at 4:05 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> This tree adds the files /proc/<pid>/ns/net, /proc/<pid>/ns/ipc,
> /proc/<pid>/ns/uts that can be opened to refer to the namespaces of a
> process at the time those files are opened, and can be bind mounted to
> keep the specified namespace alive without a process.
>
> This tree adds the setns system call that can be used to change the
> specified namespace of a process to the namespace specified by a system
> call.
i just have a quick question regarding these, apologies if wrong place
to respond -- i trimmed to lists only.
if i understand correctly, mount namespaces (for example), allow one
to build such constructs as "private /tmp" and similar that even
`root` cannot access ... and there are many reasons `root` does not
deserve to completely know/interact with user processes (FUSE makes a
good example ... just because i [user] have SSH access to a machine,
why should `root`?)
would these /proc additions break such guarantees? IOW, would it now
become possible for `root` to inject stuff into my private namespaces,
and/or has these guarantees never existed and i am mistaken? is there
any kind of ACL mechanism that endows the origin process (or similar)
with the ability to dictate who can hold and/or interact with these
references?
Thanks for your time,
--
C Anthony
^ permalink raw reply
* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Serge E. Hallyn @ 2011-05-25 21:38 UTC (permalink / raw)
To: C Anthony Risinger
Cc: Eric W. Biederman, Linux Containers, netdev, linux-kernel
In-Reply-To: <BANLkTikW4vJbC8kcLSKuemUBbu36SO6hwg@mail.gmail.com>
Quoting C Anthony Risinger (anthony@xtfx.me):
> On Mon, May 23, 2011 at 4:05 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
> >
> > This tree adds the files /proc/<pid>/ns/net, /proc/<pid>/ns/ipc,
> > /proc/<pid>/ns/uts that can be opened to refer to the namespaces of a
> > process at the time those files are opened, and can be bind mounted to
> > keep the specified namespace alive without a process.
> >
> > This tree adds the setns system call that can be used to change the
> > specified namespace of a process to the namespace specified by a system
> > call.
>
> i just have a quick question regarding these, apologies if wrong place
> to respond -- i trimmed to lists only.
>
> if i understand correctly, mount namespaces (for example), allow one
> to build such constructs as "private /tmp" and similar that even
> `root` cannot access ... and there are many reasons `root` does not
> deserve to completely know/interact with user processes (FUSE makes a
> good example ... just because i [user] have SSH access to a machine,
> why should `root`?)
>
> would these /proc additions break such guarantees? IOW, would it now
> become possible for `root` to inject stuff into my private namespaces,
> and/or has these guarantees never existed and i am mistaken? is there
> any kind of ACL mechanism that endows the origin process (or similar)
> with the ability to dictate who can hold and/or interact with these
> references?
If for instance you have a file open in your private /tmp, then root
in another mounts ns can open the file through /proc/$$/fd/N anyway.
If it's a directory, he can now traverse the whole fs.
-serge
^ permalink raw reply
* Re: [PATCH 28/34] atm: Drop __TIME__ usage
From: David Miller @ 2011-05-25 21:39 UTC (permalink / raw)
To: mmarek; +Cc: linux-kbuild, linux-kernel, netdev
In-Reply-To: <4DDD6B59.9000108@suse.cz>
From: Michal Marek <mmarek@suse.cz>
Date: Wed, 25 May 2011 22:49:29 +0200
> Dne 5.4.2011 16:59, Michal Marek napsal(a):
>> The kernel already prints its build timestamp during boot, no need to
>> repeat it in random drivers and produce different object files each
>> time.
>>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: netdev@vger.kernel.org
>> Signed-off-by: Michal Marek <mmarek@suse.cz>
>> ---
>> net/atm/lec.c | 2 +-
>> net/atm/mpc.c | 2 +-
>> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> Hi,
>
> I don't see this patch in today's linux-next. Any objection against me
> applying it to the kbuild-2.6.git repository?
Please apply it:
Acked-by: David S. Miller <davem@davemloft.net>
^ permalink raw reply
* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: C Anthony Risinger @ 2011-05-25 21:55 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: Eric W. Biederman, Linux Containers, netdev, linux-kernel
In-Reply-To: <20110525213806.GA4590@mail.hallyn.com>
On Wed, May 25, 2011 at 4:38 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> Quoting C Anthony Risinger (anthony@xtfx.me):
>> On Mon, May 23, 2011 at 4:05 PM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>> >
>> > This tree adds the files /proc/<pid>/ns/net, /proc/<pid>/ns/ipc,
>> > /proc/<pid>/ns/uts that can be opened to refer to the namespaces of a
>> > process at the time those files are opened, and can be bind mounted to
>> > keep the specified namespace alive without a process.
>> >
>> > This tree adds the setns system call that can be used to change the
>> > specified namespace of a process to the namespace specified by a system
>> > call.
>>
>> i just have a quick question regarding these, apologies if wrong place
>> to respond -- i trimmed to lists only.
>>
>> if i understand correctly, mount namespaces (for example), allow one
>> to build such constructs as "private /tmp" and similar that even
>> `root` cannot access ... and there are many reasons `root` does not
>> deserve to completely know/interact with user processes (FUSE makes a
>> good example ... just because i [user] have SSH access to a machine,
>> why should `root`?)
>>
>> would these /proc additions break such guarantees? IOW, would it now
>> become possible for `root` to inject stuff into my private namespaces,
>> and/or has these guarantees never existed and i am mistaken? is there
>> any kind of ACL mechanism that endows the origin process (or similar)
>> with the ability to dictate who can hold and/or interact with these
>> references?
>
> If for instance you have a file open in your private /tmp, then root
> in another mounts ns can open the file through /proc/$$/fd/N anyway.
> If it's a directory, he can now traverse the whole fs.
aaah right :-( ... there's always another way isn't there ... curse
you Linux for being so flexible! (just kidding baby i love you)
this seems like a more fundamental issue then? or should i not expect
to be able to achieve separation like this? i ask in the context of
OS virt via cgroups + namespaces, eg. LXC et al, because i'm about to
perform a massive overhaul to our crusty sub-2.6.18 infrastructure and
i've used/followed these technologies for couple years now ... and
it's starting to feel like "the right time".
C Anthony
^ permalink raw reply
* [RFC] af-packet: Save reference to bound network device.
From: greearb @ 2011-05-25 21:56 UTC (permalink / raw)
To: netdev; +Cc: Ben Greear
From: Ben Greear <greearb@candelatech.com>
This saves a network device lookup on each packet transmitted,
for sockets that are bound to a network device.
Signed-off-by: Ben Greear <greearb@candelatech.com>
---
:100644 100644 4005b24... ba16759... M net/packet/af_packet.c
net/packet/af_packet.c | 48 +++++++++++++++++++++++++++++++++++-------------
1 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 4005b24..ba16759 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -201,8 +201,9 @@ struct packet_sock {
auxdata:1,
origdev:1,
has_vnet_hdr:1;
- int ifindex; /* bound device */
+ int ifindex; /* bound device id */
__be16 num;
+ struct net_device *bound_dev; /* bound device */
struct packet_mclist *mclist;
atomic_t mapped;
enum tpacket_versions tp_version;
@@ -987,8 +988,9 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
{
struct sk_buff *skb;
- struct net_device *dev;
+ struct net_device *dev = NULL;
__be16 proto;
+ bool need_rls_dev = false;
int ifindex, err, reserve = 0;
void *ph;
struct sockaddr_ll *saddr = (struct sockaddr_ll *)msg->msg_name;
@@ -1002,6 +1004,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
err = -EBUSY;
if (saddr == NULL) {
ifindex = po->ifindex;
+ dev = po->bound_dev;
proto = po->num;
addr = NULL;
} else {
@@ -1017,7 +1020,10 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
addr = saddr->sll_addr;
}
- dev = dev_get_by_index(sock_net(&po->sk), ifindex);
+ if (!dev) {
+ dev = dev_get_by_index(sock_net(&po->sk), ifindex);
+ need_rls_dev = true;
+ }
err = -ENXIO;
if (unlikely(dev == NULL))
goto out;
@@ -1103,7 +1109,8 @@ out_status:
__packet_set_status(po, ph, status);
kfree_skb(skb);
out_put:
- dev_put(dev);
+ if (need_rls_dev)
+ dev_put(dev);
out:
mutex_unlock(&po->pg_vec_lock);
return err;
@@ -1139,8 +1146,9 @@ static int packet_snd(struct socket *sock,
struct sock *sk = sock->sk;
struct sockaddr_ll *saddr = (struct sockaddr_ll *)msg->msg_name;
struct sk_buff *skb;
- struct net_device *dev;
+ struct net_device *dev = NULL;
__be16 proto;
+ bool need_rls_dev = false;
unsigned char *addr;
int ifindex, err, reserve = 0;
struct virtio_net_hdr vnet_hdr = { 0 };
@@ -1161,6 +1169,7 @@ static int packet_snd(struct socket *sock,
if (saddr == NULL) {
ifindex = po->ifindex;
+ dev = po->bound_dev;
proto = po->num;
addr = NULL;
} else {
@@ -1174,8 +1183,11 @@ static int packet_snd(struct socket *sock,
addr = saddr->sll_addr;
}
+ if (!dev) {
+ dev = dev_get_by_index(sock_net(sk), ifindex);
+ need_rls_dev = true;
+ }
- dev = dev_get_by_index(sock_net(sk), ifindex);
err = -ENXIO;
if (dev == NULL)
goto out_unlock;
@@ -1315,14 +1327,15 @@ static int packet_snd(struct socket *sock,
if (err > 0 && (err = net_xmit_errno(err)) != 0)
goto out_unlock;
- dev_put(dev);
+ if (need_rls_dev)
+ dev_put(dev);
return len;
out_free:
kfree_skb(skb);
out_unlock:
- if (dev)
+ if (dev && need_rls_dev)
dev_put(dev);
out:
return err;
@@ -1372,6 +1385,12 @@ static int packet_release(struct socket *sock)
__dev_remove_pack(&po->prot_hook);
__sock_put(sk);
}
+
+ if (po->bound_dev) {
+ dev_put(po->bound_dev);
+ po->bound_dev = NULL;
+ }
+
spin_unlock(&po->bind_lock);
packet_flush_mclist(sk);
@@ -1428,6 +1447,9 @@ static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 protoc
po->prot_hook.dev = dev;
po->ifindex = dev ? dev->ifindex : 0;
+ if (po->bound_dev)
+ dev_put(po->bound_dev);
+ po->bound_dev = dev;
if (protocol == 0)
goto out_unlock;
@@ -1469,10 +1491,8 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr,
strlcpy(name, uaddr->sa_data, sizeof(name));
dev = dev_get_by_name(sock_net(sk), name);
- if (dev) {
+ if (dev)
err = packet_do_bind(sk, dev, pkt_sk(sk)->num);
- dev_put(dev);
- }
return err;
}
@@ -1500,8 +1520,6 @@ static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len
goto out;
}
err = packet_do_bind(sk, dev, sll->sll_protocol ? : pkt_sk(sk)->num);
- if (dev)
- dev_put(dev);
out:
return err;
@@ -2266,6 +2284,10 @@ static int packet_notifier(struct notifier_block *this, unsigned long msg, void
}
if (msg == NETDEV_UNREGISTER) {
po->ifindex = -1;
+ if (po->bound_dev) {
+ dev_put(po->bound_dev);
+ po->bound_dev = NULL;
+ }
po->prot_hook.dev = NULL;
}
spin_unlock(&po->bind_lock);
--
1.7.3.4
^ permalink raw reply related
* Re: [PATCH] sctp: fix memory leak of the ASCONF queue when free asoc
From: David Miller @ 2011-05-25 21:57 UTC (permalink / raw)
To: yjwei; +Cc: vladislav.yasevich, netdev, linux-sctp
In-Reply-To: <4DDCB432.7000903@cn.fujitsu.com>
From: Wei Yongjun <yjwei@cn.fujitsu.com>
Date: Wed, 25 May 2011 15:48:02 +0800
> If an ASCONF chunk is outstanding, then the following ASCONF
> chunk will be queued for later transmission. But when we free
> the asoc, we forget to free the ASCONF queue at the same time,
> this will cause memory leak.
>
> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Applied.
^ permalink raw reply
* Re: [PATCH] xen: netfront: hold RTNL when updating features.
From: David Miller @ 2011-05-25 21:57 UTC (permalink / raw)
To: ian.campbell; +Cc: xen-devel, netdev
In-Reply-To: <1306310162-5627-1-git-send-email-ian.campbell@citrix.com>
From: Ian Campbell <ian.campbell@citrix.com>
Date: Wed, 25 May 2011 08:56:02 +0100
> Konrad reports:
> [ 0.930811] RTNL: assertion failed at /home/konrad/ssd/linux/net/core/dev.c (5258)
> [ 0.930821] Pid: 22, comm: xenwatch Not tainted 2.6.39-05193-gd762f43 #1
> [ 0.930825] Call Trace:
> [ 0.930834] [<ffffffff8143bd0e>] __netdev_update_features+0xae/0xe0
> [ 0.930840] [<ffffffff8143dd41>] netdev_update_features+0x11/0x30
> [ 0.930847] [<ffffffffa0037105>] netback_changed+0x4e5/0x800 [xen_netfront]
> [ 0.930854] [<ffffffff8132a838>] xenbus_otherend_changed+0xa8/0xb0
> [ 0.930860] [<ffffffff8157ca99>] ? _raw_spin_unlock_irqrestore+0x19/0x20
> [ 0.930866] [<ffffffff8132adfe>] backend_changed+0xe/0x10
> [ 0.930871] [<ffffffff8132875a>] xenwatch_thread+0xba/0x180
> [ 0.930876] [<ffffffff810a8ba0>] ? wake_up_bit+0x40/0x40
> [ 0.930881] [<ffffffff813286a0>] ? split+0xf0/0xf0
> [ 0.930886] [<ffffffff810a8646>] kthread+0x96/0xa0
> [ 0.930891] [<ffffffff815855a4>] kernel_thread_helper+0x4/0x10
> [ 0.930896] [<ffffffff815846b3>] ? int_ret_from_sys_call+0x7/0x1b
> [ 0.930901] [<ffffffff8157cf61>] ? retint_restore_args+0x5/0x6
> [ 0.930906] [<ffffffff815855a0>] ? gs_change+0x13/0x13
>
> This update happens in xenbus watch callback context and hence does not already
> hold the rtnl. Take the lock as necessary.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Applied.
^ permalink raw reply
* Re: [PATCH 2/2] net: make dev_disable_lro use physical device if passed a vlan dev (v2)
From: David Miller @ 2011-05-25 21:56 UTC (permalink / raw)
To: nhorman; +Cc: netdev, bhutchings
In-Reply-To: <1306261869-7276-3-git-send-email-nhorman@tuxdriver.com>
From: Neil Horman <nhorman@tuxdriver.com>
Date: Tue, 24 May 2011 14:31:09 -0400
> If the device passed into dev_disable_lro is a vlan, then repoint the dev
> poniter so that we actually modify the underlying physical device.
>
> Signed-of-by: Neil Horman <nhorman@tuxdriver.com>
Applied.
^ permalink raw reply
* Re: [PATCH] via-velocity: don't annotate MAC registers as packed
From: David Miller @ 2011-05-25 21:57 UTC (permalink / raw)
To: uli; +Cc: netdev
In-Reply-To: <1306321642-2861-1-git-send-email-uli@suse.de>
From: Ulrich Hecht <uli@suse.de>
Date: Wed, 25 May 2011 13:07:22 +0200
> On ARM, memory accesses through packed pointers behave in unexpected
> ways in GCC releases 4.3 and higher; see https://lkml.org/lkml/2011/2/2/163
> for discussion.
>
> In this particular case, 32-bit I/O registers are accessed bytewise,
> causing incorrect setting of the DMA address registers which in turn
> leads to an error interrupt storm that brings the system to a halt.
>
> Since the mac_regs structure does not need any packing anyway, this patch
> simply removes the attribute to fix the issue.
>
> Signed-off-by: Ulrich Hecht <uli@suse.de>
Applied.
^ permalink raw reply
* Re: [PATCH]: isdn: netjet - blacklist Digium TDM400P
From: David Miller @ 2011-05-25 21:57 UTC (permalink / raw)
To: prarit; +Cc: netdev, isdn
In-Reply-To: <20110525121223.26439.57412.sendpatchset@prarit.bos.redhat.com>
From: Prarit Bhargava <prarit@redhat.com>
Date: Wed, 25 May 2011 08:12:23 -0400
> [2nd try ... 1st attempt didn't make it to netdev mailing list]
>
> A quick google search reveals that people with this card are blacklisting it
> in the initramfs and in the module blacklist based on a statement that it
> is unsupported. Since the older Digium is also unsupported I'm pretty
> confident that this newer card is also not supported.
>
> lspci -xxx -vv shows
>
> 04:07.0 Communication controller: Tiger Jet Network Inc. Tiger3XX Modem/ISDN interface
> Subsystem: Device b100:0003
> P.
>
> ----8<----
> The Asterisk Voice Card, DIGIUM TDM400P is unsupported by the netjet driver.
> Blacklist it like the Digium X100P/X101P card.
>
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Applied.
^ permalink raw reply
* Re: [PATCH 1/2] net: move is_vlan_dev into public header file (v2)
From: David Miller @ 2011-05-25 21:56 UTC (permalink / raw)
To: nhorman; +Cc: netdev, bhutchings
In-Reply-To: <1306261869-7276-2-git-send-email-nhorman@tuxdriver.com>
From: Neil Horman <nhorman@tuxdriver.com>
Date: Tue, 24 May 2011 14:31:08 -0400
> Migrate is_vlan_dev() to if_vlan.h so that core networkig can use it
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-2.6] bnx2x: protect sequence increment with mutex
From: David Miller @ 2011-05-25 21:58 UTC (permalink / raw)
To: dmitry; +Cc: netdev, eilong
In-Reply-To: <1306335351.15327.0.camel@lb-tlvb-dmitry>
From: "Dmitry Kravkov" <dmitry@broadcom.com>
Date: Wed, 25 May 2011 17:55:51 +0300
>
> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Applied.
^ permalink raw reply
* Re: [PATCH v2] bonding: documentation and code cleanup for resend_igmp
From: David Miller @ 2011-05-25 21:58 UTC (permalink / raw)
To: fbl; +Cc: netdev, fubar, andy, rick.jones2
In-Reply-To: <1306348738-23946-1-git-send-email-fbl@redhat.com>
From: Flavio Leitner <fbl@redhat.com>
Date: Wed, 25 May 2011 15:38:58 -0300
> Improves the documentation about how IGMP resend parameter
> works, fix two missing checks and coding style issues.
>
> Signed-off-by: Flavio Leitner <fbl@redhat.com>
Applied.
^ permalink raw reply
* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode (v2)
From: David Miller @ 2011-05-25 21:58 UTC (permalink / raw)
To: nhorman; +Cc: netdev, fubar, andy, nicolas.2p.debian
In-Reply-To: <1306343805-3223-1-git-send-email-nhorman@tuxdriver.com>
From: Neil Horman <nhorman@tuxdriver.com>
Date: Wed, 25 May 2011 13:16:45 -0400
> This soft lockup was recently reported:
...
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> Reported-by: jtluka@redhat.com
Applied.
^ permalink raw reply
* Re: [PATCH] net: hold rtnl again in dump callbacks
From: David Miller @ 2011-05-25 21:58 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, kaber, shemminger
In-Reply-To: <1306344844.11400.11.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 25 May 2011 19:34:04 +0200
> Commit e67f88dd12f6 (dont hold rtnl mutex during netlink dump callbacks)
> missed fact that rtnl_fill_ifinfo() must be called with rtnl held.
>
> Because of possible deadlocks between two mutexes (cb_mutex and rtnl),
> its not easy to solve this problem, so revert this part of the patch.
>
> It also forgot one rcu_read_unlock() in FIB dump_rules()
>
> Add one ASSERT_RTNL() in rtnl_fill_ifinfo() to remind us the rule.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Patrick McHardy <kaber@trash.net>
> CC: Stephen Hemminger <shemminger@vyatta.com>
> ---
> I tried to solve this problem differently but failed. We need more
> preliminary steps.
Applied.
^ permalink raw reply
* Re: [PATCH] sch_sfq: fix peek() implementation
From: David Miller @ 2011-05-25 21:57 UTC (permalink / raw)
To: eric.dumazet; +Cc: jarkao2, kaber, netdev, hawk
In-Reply-To: <1306334411.2820.26.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 25 May 2011 16:40:11 +0200
> Since commit eeaeb068f139 (sch_sfq: allow big packets and be fair),
> sfq_peek() can return a different skb that would be normally dequeued by
> sfq_dequeue() [ if current slot->allot is negative ]
>
> Use generic qdisc_peek_dequeued() instead of custom implementation, to
> get consistent result.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied and queued up for -stable.
^ permalink raw reply
* Re: [RFC] af-packet: Save reference to bound network device.
From: David Miller @ 2011-05-25 22:01 UTC (permalink / raw)
To: greearb; +Cc: netdev
In-Reply-To: <1306360602-9672-1-git-send-email-greearb@candelatech.com>
From: greearb@candelatech.com
Date: Wed, 25 May 2011 14:56:42 -0700
> From: Ben Greear <greearb@candelatech.com>
>
> This saves a network device lookup on each packet transmitted,
> for sockets that are bound to a network device.
>
> Signed-off-by: Ben Greear <greearb@candelatech.com>
You can't hold onto devices like this unless you also add a netdev
event notifier that will release it. Otherwise we'll hang on net
driver module unload until the packet socket is closed.
I don't think you really want to walk all pf-packet sockets on netdev
events just to do this.
dev_get_by_index(,_rcu}() is insanely cheap, I doubt it's showing up
on your profiles at all.
^ permalink raw reply
* Re: [RFC] af-packet: Save reference to bound network device.
From: Ben Greear @ 2011-05-25 22:05 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20110525.180113.1194226831134092545.davem@davemloft.net>
On 05/25/2011 03:01 PM, David Miller wrote:
> From: greearb@candelatech.com
> Date: Wed, 25 May 2011 14:56:42 -0700
>
>> From: Ben Greear<greearb@candelatech.com>
>>
>> This saves a network device lookup on each packet transmitted,
>> for sockets that are bound to a network device.
>>
>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>
> You can't hold onto devices like this unless you also add a netdev
> event notifier that will release it. Otherwise we'll hang on net
> driver module unload until the packet socket is closed.
>
> I don't think you really want to walk all pf-packet sockets on netdev
> events just to do this.
Doesn't this piece of code take care of that?
I tested with rmmod..but of course I could have missed something.
@@ -2266,6 +2284,10 @@ static int packet_notifier(struct notifier_block *this, unsigned long msg, void
}
if (msg == NETDEV_UNREGISTER) {
po->ifindex = -1;
+ if (po->bound_dev) {
+ dev_put(po->bound_dev);
+ po->bound_dev = NULL;
+ }
po->prot_hook.dev = NULL;
}
spin_unlock(&po->bind_lock);
>
> dev_get_by_index(,_rcu}() is insanely cheap, I doubt it's showing up
> on your profiles at all.
I admit it was a small change...maybe 5Mbps (from 165 to 170Mbps in
this particular test), but it did seem to improve things a bit.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply
* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Michał Mirosław @ 2011-05-25 22:11 UTC (permalink / raw)
To: C Anthony Risinger
Cc: Serge E. Hallyn, Eric W. Biederman, Linux Containers, netdev,
linux-kernel
In-Reply-To: <BANLkTinbw6pZjhMscfXFMArd=XU=VC=+eQ@mail.gmail.com>
2011/5/25 C Anthony Risinger <anthony@xtfx.me>:
> On Wed, May 25, 2011 at 4:38 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
>> Quoting C Anthony Risinger (anthony@xtfx.me):
[...]
>>> if i understand correctly, mount namespaces (for example), allow one
>>> to build such constructs as "private /tmp" and similar that even
>>> `root` cannot access ... and there are many reasons `root` does not
>>> deserve to completely know/interact with user processes (FUSE makes a
>>> good example ... just because i [user] have SSH access to a machine,
>>> why should `root`?)
>> If for instance you have a file open in your private /tmp, then root
>> in another mounts ns can open the file through /proc/$$/fd/N anyway.
>> If it's a directory, he can now traverse the whole fs.
> aaah right :-( ... there's always another way isn't there ... curse
> you Linux for being so flexible! (just kidding baby i love you)
>
> this seems like a more fundamental issue then? or should i not expect
> to be able to achieve separation like this? i ask in the context of
> OS virt via cgroups + namespaces, eg. LXC et al, because i'm about to
> perform a massive overhaul to our crusty sub-2.6.18 infrastructure and
> i've used/followed these technologies for couple years now ... and
> it's starting to feel like "the right time".
You either trust the admin or don't use the machine. There is no third way.
Best Regards,
Michał Mirosław
^ permalink raw reply
* Re: [RFC] af-packet: Save reference to bound network device.
From: David Miller @ 2011-05-25 22:14 UTC (permalink / raw)
To: greearb; +Cc: netdev
In-Reply-To: <4DDD7D16.6030907@candelatech.com>
From: Ben Greear <greearb@candelatech.com>
Date: Wed, 25 May 2011 15:05:10 -0700
> Doesn't this piece of code take care of that?
> I tested with rmmod..but of course I could have missed something.
>
> @@ -2266,6 +2284,10 @@ static int packet_notifier(struct
> notifier_block *this, unsigned long msg, void
> }
> if (msg == NETDEV_UNREGISTER) {
> po->ifindex = -1;
> + if (po->bound_dev) {
> + dev_put(po->bound_dev);
> + po->bound_dev = NULL;
> + }
> po->prot_hook.dev = NULL;
> }
> spin_unlock(&po->bind_lock);
>
Indeed, it should, thanks for pointing that out.
Wait a second, why do you need to store the device a second
time, can't you get at po->prot_hook.dev in all the necessary
spots?
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox