* [PATCH net-next] packet: Protect packet sk list with mutex
From: Pavel Emelyanov @ 2012-08-20 14:50 UTC
To: David Miller, Linux Netdev List
In patch eea68e2f (packet: Report socket mclist info via diag module) I
introduced a "scheduling while atomic" problem in the packet diag module -- the
socket list is traversed under rcu_read_lock(), while the sk mclist access
performed under it requires the rtnl lock (i.e. a mutex) to be taken. The same
problem was then re-introduced by further packet diag patches (the fanout mutex
and the pgvec mutex for rings) :(
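
Schematically, the broken call chain looks like this (a condensed sketch of
the diag dump path, not the literal code):

	rcu_read_lock();			/* atomic context, must not sleep */
	sk_for_each_rcu(sk, node, &net->packet.sklist) {
		sk_diag_fill(sk, ...);		/* -> pdiag_put_mclist() */
		/* pdiag_put_mclist() takes rtnl_lock(), a mutex, and may
		 * sleep -> "BUG: scheduling while atomic" */
	}
	rcu_read_unlock();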
Apart from being terribly sorry for the above, I propose to change the
packet sk list protection from a spinlock to a mutex. This lock currently
protects only the sklist modifications (which already happen in sleeping
context) and nothing more.

Am I wrong again, and is fine-grained atomic locking instead required for
everything that is reported by packet diag?
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
diff --git a/include/net/netns/packet.h b/include/net/netns/packet.h
index cb4e894..4780b08 100644
--- a/include/net/netns/packet.h
+++ b/include/net/netns/packet.h
@@ -8,7 +8,7 @@
#include <linux/spinlock.h>
struct netns_packet {
- spinlock_t sklist_lock;
+ struct mutex sklist_lock;
struct hlist_head sklist;
};
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 226b2cd..5048672 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2308,10 +2308,10 @@ static int packet_release(struct socket *sock)
net = sock_net(sk);
po = pkt_sk(sk);
- spin_lock_bh(&net->packet.sklist_lock);
+ mutex_lock(&net->packet.sklist_lock);
sk_del_node_init_rcu(sk);
sock_prot_inuse_add(net, sk->sk_prot, -1);
- spin_unlock_bh(&net->packet.sklist_lock);
+ mutex_unlock(&net->packet.sklist_lock);
spin_lock(&po->bind_lock);
unregister_prot_hook(sk, false);
@@ -2510,10 +2510,10 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
register_prot_hook(sk);
}
- spin_lock_bh(&net->packet.sklist_lock);
+ mutex_lock(&net->packet.sklist_lock);
sk_add_node_rcu(sk, &net->packet.sklist);
sock_prot_inuse_add(net, &packet_proto, 1);
- spin_unlock_bh(&net->packet.sklist_lock);
+ mutex_unlock(&net->packet.sklist_lock);
return 0;
out:
@@ -3766,7 +3766,7 @@ static const struct file_operations packet_seq_fops = {
static int __net_init packet_net_init(struct net *net)
{
- spin_lock_init(&net->packet.sklist_lock);
+ mutex_init(&net->packet.sklist_lock);
INIT_HLIST_HEAD(&net->packet.sklist);
if (!proc_net_fops_create(net, "packet", 0, &packet_seq_fops))
diff --git a/net/packet/diag.c b/net/packet/diag.c
index bc33fbe..39bce0d 100644
--- a/net/packet/diag.c
+++ b/net/packet/diag.c
@@ -177,8 +177,8 @@ static int packet_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
net = sock_net(skb->sk);
req = nlmsg_data(cb->nlh);
- rcu_read_lock();
- sk_for_each_rcu(sk, node, &net->packet.sklist) {
+ mutex_lock(&net->packet.sklist_lock);
+ sk_for_each(sk, node, &net->packet.sklist) {
if (!net_eq(sock_net(sk), net))
continue;
if (num < s_num)
@@ -192,7 +192,7 @@ next:
num++;
}
done:
- rcu_read_unlock();
+ mutex_unlock(&net->packet.sklist_lock);
cb->args[0] = num;
return skb->len;
* Re: [PATCH net-next] packet: Protect packet sk list with mutex
From: Eric Dumazet @ 2012-08-20 15:11 UTC
To: Pavel Emelyanov; +Cc: David Miller, Linux Netdev List
On Mon, 2012-08-20 at 18:50 +0400, Pavel Emelyanov wrote:
> In patch eea68e2f (packet: Report socket mclist info via diag module) I
> introduced a "scheduling while atomic" problem in the packet diag module -- the
> socket list is traversed under rcu_read_lock(), while the sk mclist access
> performed under it requires the rtnl lock (i.e. a mutex) to be taken. The same
> problem was then re-introduced by further packet diag patches (the fanout mutex
> and the pgvec mutex for rings) :(
>
> Apart from being terribly sorry for the above, I propose to change the
> packet sk list protection from a spinlock to a mutex. This lock currently
> protects only the sklist modifications (which already happen in sleeping
> context) and nothing more.
>
> Am I wrong again, and is fine-grained atomic locking instead required for
> everything that is reported by packet diag?
>
> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
>
> ---
>
> diff --git a/include/net/netns/packet.h b/include/net/netns/packet.h
> index cb4e894..4780b08 100644
> --- a/include/net/netns/packet.h
> +++ b/include/net/netns/packet.h
> @@ -8,7 +8,7 @@
> #include <linux/spinlock.h>
>
> struct netns_packet {
> - spinlock_t sklist_lock;
> + struct mutex sklist_lock;
> struct hlist_head sklist;
> };
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 226b2cd..5048672 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2308,10 +2308,10 @@ static int packet_release(struct socket *sock)
> net = sock_net(sk);
> po = pkt_sk(sk);
>
> - spin_lock_bh(&net->packet.sklist_lock);
> + mutex_lock(&net->packet.sklist_lock);
> sk_del_node_init_rcu(sk);
> sock_prot_inuse_add(net, sk->sk_prot, -1);
Last time I checked, sock_prot_inuse_add() needed BH protection.
(This could be relaxed somehow on x86 thanks to this_cpu_add() ... but
that's another point.)
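
For reference, sock_prot_inuse_add() in net-next at the time is essentially a
raw per-cpu increment (quoted from memory, so treat it as a sketch):

	/* net/core/sock.c, approximately */
	void sock_prot_inuse_add(struct net *net, struct proto *prot, int val)
	{
		__this_cpu_add(net->core.inuse->val[prot->inuse_idx], val);
	}

__this_cpu_add() assumes the caller has already disabled whatever could
interleave with the read-modify-write.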
Could you please report the full stack trace ?
* Re: [PATCH net-next] packet: Protect packet sk list with mutex
From: Pavel Emelyanov @ 2012-08-20 18:10 UTC
To: Eric Dumazet; +Cc: David Miller, Linux Netdev List
On 08/20/2012 07:11 PM, Eric Dumazet wrote:
> On Mon, 2012-08-20 at 18:50 +0400, Pavel Emelyanov wrote:
>> In patch eea68e2f (packet: Report socket mclist info via diag module) I
>> introduced a "scheduling while atomic" problem in the packet diag module -- the
>> socket list is traversed under rcu_read_lock(), while the sk mclist access
>> performed under it requires the rtnl lock (i.e. a mutex) to be taken. The same
>> problem was then re-introduced by further packet diag patches (the fanout mutex
>> and the pgvec mutex for rings) :(
>>
>> Apart from being terribly sorry for the above, I propose to change the
>> packet sk list protection from a spinlock to a mutex. This lock currently
>> protects only the sklist modifications (which already happen in sleeping
>> context) and nothing more.
>>
>> Am I wrong again, and is fine-grained atomic locking instead required for
>> everything that is reported by packet diag?
>>
>> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
>>
>> ---
>>
>> diff --git a/include/net/netns/packet.h b/include/net/netns/packet.h
>> index cb4e894..4780b08 100644
>> --- a/include/net/netns/packet.h
>> +++ b/include/net/netns/packet.h
>> @@ -8,7 +8,7 @@
>> #include <linux/spinlock.h>
>>
>> struct netns_packet {
>> - spinlock_t sklist_lock;
>> + struct mutex sklist_lock;
>> struct hlist_head sklist;
>> };
>>
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index 226b2cd..5048672 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -2308,10 +2308,10 @@ static int packet_release(struct socket *sock)
>> net = sock_net(sk);
>> po = pkt_sk(sk);
>>
>> - spin_lock_bh(&net->packet.sklist_lock);
>> + mutex_lock(&net->packet.sklist_lock);
>> sk_del_node_init_rcu(sk);
>> sock_prot_inuse_add(net, sk->sk_prot, -1);
>
> Last time I checked, sock_prot_inuse_add() needed BH protection.
AFAIU this is now the case for e.g. TCP sockets, since they can be created in
rx code. For packet sockets I've found no places where these stats can be
modified other than those two in af_packet.c.
> (This could be relaxed somehow on x86 thanks to this_cpu_add() ... but
> that's another point.)
>
> Could you please report the full stack trace ?
Sure:
[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581] #0: (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613] #1: (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644] #2: (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693] #3: (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af
[152363.820724] Modules linked in:
[152363.820737] Pid: 12517, comm: crtools Tainted: G W 3.6.0-rc1-46175-g07cbd24-dirty #236
[152363.820750] Call Trace:
[152363.820761] [<ffffffff810959c2>] __schedule_bug+0x6a/0x78
[152363.820772] [<ffffffff81c26cb9>] __schedule+0xb1/0x5b1
[152363.820784] [<ffffffff81098af0>] __cond_resched+0x2a/0x35
[152363.820795] [<ffffffff81c27234>] _cond_resched+0x2c/0x37
[152363.820806] [<ffffffff81c26210>] mutex_lock_nested+0x2a/0x45
[152363.820818] [<ffffffff81a2b81c>] rtnl_lock+0x17/0x19
[152363.820829] [<ffffffff81b69de0>] pdiag_put_mclist+0x4e/0xeb
[152363.820840] [<ffffffff81b6a001>] sk_diag_fill.clone.0+0x14c/0x194
[152363.820851] [<ffffffff81c28545>] ? _raw_read_unlock_bh+0x43/0x47
[152363.820863] [<ffffffff81b6a147>] packet_diag_dump+0xfe/0x1af
[152363.820874] [<ffffffff81b6a049>] ? sk_diag_fill.clone.0+0x194/0x194
[152363.820885] [<ffffffff81a67d4b>] netlink_dump+0x6d/0x1ab
[152363.820896] [<ffffffff81a6847f>] netlink_dump_start+0xed/0x114
[152363.820907] [<ffffffff81b69d89>] packet_diag_handler_dump+0x61/0x6a
[152363.820918] [<ffffffff81b6a049>] ? sk_diag_fill.clone.0+0x194/0x194
[152363.820930] [<ffffffff81a2de8b>] sock_diag_rcv_msg+0xf6/0x11a
[152363.820941] [<ffffffff81a2dd95>] ? sock_diag_register_inet_compat+0x36/0x36
[152363.820953] [<ffffffff81a690a0>] netlink_rcv_skb+0x43/0x94
[152363.820979] [<ffffffff81a2dcc4>] sock_diag_rcv+0x2e/0x3e
[152363.820990] [<ffffffff81a68e4f>] netlink_unicast+0xe9/0x16f
[152363.821002] [<ffffffff81a69601>] netlink_sendmsg+0x1e6/0x204
[152363.821129] [<ffffffff81a0ba29>] ? rcu_read_unlock+0x56/0x67
[152363.821143] [<ffffffff81a06877>] __sock_sendmsg_nosec+0x58/0x61
[152363.821155] [<ffffffff81a072b6>] sock_sendmsg+0x6e/0x87
[152363.821168] [<ffffffff8113a172>] ? might_fault+0xa5/0xac
[152363.821180] [<ffffffff81a138ae>] ? copy_from_user+0x2a/0x2c
[152363.821192] [<ffffffff81a13c9e>] ? verify_iovec+0x54/0xaa
[152363.821204] [<ffffffff81a08092>] __sys_sendmsg+0x211/0x2a7
[152363.821216] [<ffffffff810b7515>] ? lock_release_holdtime+0x2c/0xf5
[152363.821228] [<ffffffff810bba71>] ? __lock_release+0x113/0x12c
[152363.821243] [<ffffffff81c28937>] ? _raw_spin_unlock_irq+0x30/0x4a
[152363.821257] [<ffffffff810bafa6>] ? trace_hardirqs_on_caller+0x123/0x15a
[152363.821272] [<ffffffff810bafea>] ? trace_hardirqs_on+0xd/0xf
[152363.821286] [<ffffffff8116dc31>] ? fcheck_files+0xac/0xea
[152363.821297] [<ffffffff8116ddaa>] ? fget_light+0x3a/0xb9
[152363.821309] [<ffffffff81c27183>] ? __schedule+0x57b/0x5b1
[152363.821320] [<ffffffff81a082ed>] sys_sendmsg+0x42/0x60
[152363.821332] [<ffffffff81c295e9>] system_call_fastpath+0x16/0x1b
>
>
Thanks,
Pavel
* Re: [PATCH net-next] packet: Protect packet sk list with mutex
From: Eric Dumazet @ 2012-08-20 19:05 UTC
To: Pavel Emelyanov; +Cc: David Miller, Linux Netdev List
On Mon, 2012-08-20 at 22:10 +0400, Pavel Emelyanov wrote:
> AFAIU now this is the case for e.g. TCP sockets since they can be created in
> rx code. For packet I've found no places where these stats can be modified other
> than those two in af_packet.c
sock_prot_inuse_add() uses __this_cpu_add() and, on non-x86, needs to be
safe against preemption (or irqs, if applicable).
Changing spin_lock_bh() to mutex_lock() breaks this.
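
On such architectures __this_cpu_add() falls back to a plain
read-modify-write, roughly:

	/* include/linux/percpu.h generic fallback, roughly */
	#define __this_cpu_generic_to_op(pcp, val, op)	\
	do {						\
		*__this_cpu_ptr(&(pcp)) op val;		\
	} while (0)

spin_lock_bh() disables BHs and preemption, so the load/add/store cannot be
interleaved on the local CPU. Under mutex_lock() the task can be preempted
(or migrated) between the load and the store, and a concurrent update of the
same per-cpu counter is lost.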