* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-12 22:12 netfilter QUEUE target and packet socket interactions buggy or not Nuutti Kotivuori
@ 2005-09-12 22:11 ` David S. Miller
  2005-09-12 22:34   ` Nuutti Kotivuori
  2005-09-14 11:20 ` Nuutti Kotivuori
  1 sibling, 1 reply; 21+ messages in thread
From: David S. Miller @ 2005-09-12 22:11 UTC (permalink / raw)
  To: naked; +Cc: linux-kernel

From: Nuutti Kotivuori <naked@iki.fi>
Date: Tue, 13 Sep 2005 01:12:26 +0300

> ,----
> | Unable to handle kernel NULL pointer dereference at virtual address 00000018
> | ...
> |         __kfree_skb+0xf4/0xf7
> |  [<c02c3188>] packet_rcv+0x2ca/0x2d4
> |  [<f888f792>] bcm5700_start_xmit+0x477/0x4a5 [bcm5700]

Please use the tg3 driver that actually comes with
the kernel :-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* netfilter QUEUE target and packet socket interactions buggy or not
@ 2005-09-12 22:12 Nuutti Kotivuori
  2005-09-12 22:11 ` David S. Miller
  2005-09-14 11:20 ` Nuutti Kotivuori
  0 siblings, 2 replies; 21+ messages in thread
From: Nuutti Kotivuori @ 2005-09-12 22:12 UTC (permalink / raw)
  To: linux-kernel

I am in the process of debugging a kernel panic manifested on a Red
Hat Enterprise Linux 4 under rather difficult conditions. While
investigating this, I came upon a few bits of code that I'd like some
clarification on. However, I will start by describing the problem.

I am getting a consistent kernel panic under specific high load, which
involves heavy use of the netfilter QUEUE target and packet filter. I
will paraphrase the important parts of the backtrace here:

,----
| Unable to handle kernel NULL pointer dereference at virtual address 00000018
| ...
|         __kfree_skb+0xf4/0xf7
|  [<c02c3188>] packet_rcv+0x2ca/0x2d4
|  [<f888f792>] bcm5700_start_xmit+0x477/0x4a5 [bcm5700]
|  [<c01a3a02>] selinux_ipv4_postroute_last+0xf/0x13
| ...
|  [<c028cf66>] dst_output+0xf/0x1a
|  [<c027cfdb>] nf_reinject+0x14d/0x1a9
|  [<f894101e>] ipq_issue_verdict+0x1e/0x2b [ip_queue]
| ...
|  [<c028592c>] netlink_sendmsg+0x254/0x263
|  [<c011dcf5>] __wake_up+0x29/0x3c
|  [<c026b92d>] sock_sendmsg+0xdb/0xf7
| ...
|  [<c0133b04>] unqueue_me+0x73/0x79
|  [<c011dcf5>] __wake_up+0x29/0x3c
|  [<c026d465>] sys_socketcall+0x1c1/0x1dd
|  [<c0125351>] sys_gettimeofday+0x53/0xac
|  [<c02c7377>] syscall_call+0x7/0xb
`----

So what I gather is happening here is that we are in syscall context,
where nf_reinject sends the queued packet onwards according to the
verdict received from userspace, and the packet ends up being captured
by a packet socket. And for some reason, the packet ends up being
kfree_skb'd twice.

Two things caught my attention. First of all, there was a relatively
recent fix to ip_queue which had to do with the calling context. I
will copy the rationale here:

,----[ Harald Welte <laforge at netfilter.org> ]
| [NETFILTER]: Fix deadlock with ip_queue and tcp local input path.
| 
| When we have ip_queue being used from LOCAL_IN, then we end up with a
| situation where the verdicts coming back from userspace traverse the TCP
| input path from syscall context.  While this seems to work most of the
| time, there's an ugly deadlock:
| 
| syscall context is interrupted by the timer interrupt.  When the timer
| interrupt leaves, the timer softirq get's scheduled and calls
| tcp_delack_timer() and alike.  They themselves do bh_lock_sock(sk),
| which is already held from somewhere else -> boom.
|
| I've now tested the suggested solution by Patrick McHardy and Herbert
| Xu to simply use local_bh_{en,dis}able().
`----

Second, I went looking at the packet socket code and found this
comment:

,----
| This function makes lazy skb cloning in hope that most of packets
| are discarded by BPF.
| 
| Note tricky part: we DO mangle shared skb! skb->data, skb->len
| and skb->cb are mangled. It works because (and until) packets
| falling here are owned by current CPU. Output packets are cloned
| by dev_queue_xmit_nit(), input packets are processed by net_bh
| sequencially, so that if we return skb to original state on exit,
| we will not harm anyone.
`----

But are those assumptions valid in the obscure case of us being in the
syscall context, receiving a queued packet from userspace? In any
case, by looking at the disassembly and at the stacktrace, it seems
that the incoming skb is not shared and gets dropped by one of the
goto clauses. The crashing call is the kfree_skb at the very end of
the af_packet.c:packet_rcv function.

I am putting this mail here as a heads up if someone manages to
instantly spot what's wrong with this setup. I will continue debugging
the real cause, and eliminating all the possible variables, seeing
whether this is an SMP problem, checking if it can be manifested with
a vanilla kernel and such.

A detailed dump of the crash can be found at:

  https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=118541

Warm fuzzies,
-- Naked


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-12 22:11 ` David S. Miller
@ 2005-09-12 22:34   ` Nuutti Kotivuori
  2005-09-13 10:54     ` Nuutti Kotivuori
  0 siblings, 1 reply; 21+ messages in thread
From: Nuutti Kotivuori @ 2005-09-12 22:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel

David S. Miller wrote:
> From: Nuutti Kotivuori <naked@iki.fi>
> Date: Tue, 13 Sep 2005 01:12:26 +0300
>
>> ,----
>> | Unable to handle kernel NULL pointer dereference at virtual address 00000018
>> | ...
>> |         __kfree_skb+0xf4/0xf7
>> |  [<c02c3188>] packet_rcv+0x2ca/0x2d4
>> |  [<f888f792>] bcm5700_start_xmit+0x477/0x4a5 [bcm5700]
>
> Please use the tg3 driver that actually comes with
> the kernel :-)

The problem also appears with the tg3 driver. In some other crashes,
the traceback looks like:

,----
|         __kfree_skb+0xf4/0xf7
|  [<c02c3188>] packet_rcv+0x2ca/0x2d4
|  [<c0273ca8>] dev_queue_xmit_nit+0xc1/0xd3
|  [<c01a3a02>] selinux_ipv4_postroute_last+0xf/0x13
`----

I am doubtful the network card driver would be at fault here, but
that'll be confirmed once I manage to narrow this down more.

-- Naked


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-12 22:34   ` Nuutti Kotivuori
@ 2005-09-13 10:54     ` Nuutti Kotivuori
  2005-09-13 16:33       ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Nuutti Kotivuori @ 2005-09-13 10:54 UTC (permalink / raw)
  To: linux-kernel

Nuutti Kotivuori wrote:
> David S. Miller wrote:
>> Please use the tg3 driver that actually comes with
>> the kernel :-)

[...]

> I am doubtful the network card driver would be at fault here, but
> that'll be confirmed once I manage to narrow this down more.

Appended here is a backtrace with the tg3 driver. Also, it seems that
the bug cannot be reproduced with uniprocessor, only SMP.

Unable to handle kernel NULL pointer dereference at virtual address 00000018
 printing eip:
c01a387f
*pde = 36ee0001
Oops: 0000 [#1]
SMP
Modules linked in: arpt_mangle arptable_filter arp_tables iptable_filter ip_tables ip_queue parport_pc lp parport netconsole netdump autofs4 i2c_dev i2c_core sunrpc dm_mod button battery ac
EIP is at selinux_ip_postroute_last+0x6a/0x1de
eax: 00000000   ebx: 00000000   ecx: f7a91bb0   edx: 00000003
esi: efb8f080   edi: c0455780   ebp: 00000004   esp: f7a91b8c
ds: 007b   es: 007b   ss: 0068
Process dispatcher (pid: 2748, threadinfo=f7a91000 task=f31ed830)
Stack: 00000000 e9011280 00000000 e9bfdb80 00000002 f88a965a 01ade49e 00000000
       00000206 000000f5 f88a983c c026f163 f6878680 f7538b68 c02c3188 000000c7
       00000042 00000206
 [<f88a965a>] tg3_start_xmit+0x27e/0x476 [tg3]
 [<f88a983c>] tg3_start_xmit+0x460/0x476 [tg3]
        __kfree_skb+0xf4/0xf7
 [<c02c3188>] packet_rcv+0x2ca/0x2d4
 [<c011d270>] find_busiest_group+0xf1/0x2e0
 [<c01a3a02>] selinux_ipv4_postroute_last+0xf/0x13
 [<c028d11f>] ip_finish_output2+0x0/0x16d
 [<c027cb23>] nf_iterate+0x40/0x81
 [<c028d11f>] ip_finish_output2+0x0/0x16d
 [<c027ce21>] nf_hook_slow+0x47/0xb4
 [<c028d11f>] ip_finish_output2+0x0/0x16d
 [<c028d116>] ip_finish_output+0x1a5/0x1ae
 [<c028d11f>] ip_finish_output2+0x0/0x16d
 [<c028cf66>] dst_output+0xf/0x1a
 [<c027cfdb>] nf_reinject+0x14d/0x1a9
 [<f891401e>] ipq_issue_verdict+0x1e/0x2b [ip_queue]
 [<f8914676>] ipq_set_verdict+0x53/0x5a [ip_queue]
 [<f891472c>] ipq_receive_peer+0x3d/0x46 [ip_queue]
 [<f891487d>] ipq_rcv_sk+0xfc/0x175 [ip_queue]
 [<c0285b11>] netlink_data_ready+0x14/0x44
 [<c028525b>] netlink_sendskb+0x52/0x6c
 [<c028592c>] netlink_sendmsg+0x254/0x263
 [<c011dcf5>] __wake_up+0x29/0x3c
 [<c026b92d>] sock_sendmsg+0xdb/0xf7
 [<c0285ae9>] netlink_recvmsg+0x1ae/0x1c2
 [<c026ba64>] sock_recvmsg+0xef/0x10c
 [<c011f6ee>] autoremove_wake_function+0x0/0x2d
 [<c02709ba>] verify_iovec+0x76/0xc2
 [<c026d07c>] sys_sendmsg+0x1ee/0x23b
 [<c026b4fe>] move_addr_to_user+0x67/0x7f
 [<c02c7d56>] reschedule_interrupt+0x1a/0x20
 [<c01116de>] sched_clock+0x46/0x73
 [<c011caf1>] finish_task_switch+0x30/0x66
 [<c02c5604>] schedule+0x844/0x87a
 [<c026d465>] sys_socketcall+0x1c1/0x1dd
 [<c0125351>] sys_gettimeofday+0x53/0xac
 [<c02c7377>] syscall_call+0x7/0xb
 [<c02c007b>] unix_release_sock+0x15a/0x201
Code: 89 d3 83 c3 2c 0f 84 8c 01 00 00 8b 44 24 7c 31 c9 8d 54 24 24 e8 df 29 00 00 85 c0 0f 85

-- Naked


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-13 10:54     ` Nuutti Kotivuori
@ 2005-09-13 16:33       ` Patrick McHardy
  2005-09-13 18:22           ` Nuutti Kotivuori
  2005-09-16 13:38         ` Nuutti Kotivuori
  0 siblings, 2 replies; 21+ messages in thread
From: Patrick McHardy @ 2005-09-13 16:33 UTC (permalink / raw)
  To: Nuutti Kotivuori; +Cc: linux-kernel, Netfilter Development Mailinglist

Nuutti Kotivuori wrote:
> 
> Appended here is a backtrace with the tg3 driver. Also, it seems that
> the bug cannot be reproduced with uniprocessor, only SMP.
> 
> Unable to handle kernel NULL pointer dereference at virtual address 00000018

This means inode->i_security was NULL. AFAICT it is only set to NULL in
inode_free_security() when the inode is freed. This shouldn't happen
while the packet is queued since the skb should hold a reference to
the socket on the output path. So it could be some protocol forgetting
to increase the refcnt when taking a reference. What kind of packet
is this? And what kernel version are you running? Until recently
ip_conntrack did some fiddling with skb->sk which could lead to
a packet on the output path with skb->sk set but no reference taken.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-13 16:33       ` Patrick McHardy
@ 2005-09-13 18:22           ` Nuutti Kotivuori
  2005-09-16 13:38         ` Nuutti Kotivuori
  1 sibling, 0 replies; 21+ messages in thread
From: Nuutti Kotivuori @ 2005-09-13 18:22 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, linux-kernel

Patrick McHardy wrote:
> Nuutti Kotivuori wrote:
>>
>> Appended here is a backtrace with the tg3 driver. Also, it seems that
>> the bug cannot be reproduced with uniprocessor, only SMP.
>>
>> Unable to handle kernel NULL pointer dereference at virtual address 00000018
>
> This means inode->i_security was NULL. AFAICT it is only set to NULL in
> inode_free_security() when the inode is freed. This shouldn't happen
> while the packet is queued since the skb should hold a reference to
> the socket on the output path. So it could be some protocol forgetting
> to increase the refcnt when taking a reference.

Right.

> What kind of packet is this? And what kernel version are you
> running? Until recently ip_conntrack did some fiddling with skb->sk
> which could lead to a packet on the output path with skb->sk set but
> no reference taken.

This happens on Red Hat Enterprise Linux 4, with a 2.6.9 kernel (with
a gazillion of Red Hat patches in it, latest ones being from 2.6.11)
and the ip_queue patch that adds the bottom-half disabling. I will
know for sure tomorrow, but it seems that it doesn't appear on vanilla
2.6.13.1 or without SMP.

It is very hard to know which packet specifically triggers this. The
machine is under heavy load in general, a lot of packets are handled
via a QUEUE target, and some packets are captured via packet socket.

I will post more details tomorrow, but if you could point me towards
the changes in ip_conntrack that affected this, it would be very
helpful. I could check if they are in the Red Hat kernel and if not,
patch them manually and see if it makes a difference. The problem is
now reproducible in a couple of hours, so it shouldn't be too hard.

Thanks,
-- Naked

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-13 18:22           ` Nuutti Kotivuori
  (?)
@ 2005-09-14  2:52           ` Patrick McHardy
  2005-09-14  8:31             ` Nuutti Kotivuori
  -1 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2005-09-14  2:52 UTC (permalink / raw)
  To: Nuutti Kotivuori; +Cc: Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 1440 bytes --]

Nuutti Kotivuori wrote:
> Patrick McHardy wrote:
> 
>>What kind of packet is this? And what kernel version are you
>>running? Until recently ip_conntrack did some fiddling with skb->sk
>>which could lead to a packet on the output path with skb->sk set but
>>no reference taken.
> 
> This happens on Red Hat Enterprise Linux 4, with a 2.6.9 kernel (with
> a gazillion of Red Hat patches in it, latest ones being from 2.6.11)
> and the ip_queue patch that adds the bottom-half disabling. I will
> know for sure tomorrow, but it seems that it doesn't appear on vanilla
> 2.6.13.1 or without SMP.

Hmm .. I don't want to spend time fixing bugs already fixed, so it
would be good if you could confirm that the bug still exists in the
current vanilla kernel.

> It is very hard to know which packet specifically triggers this. The
> machine is under heavy load in general, a lot of packets are handled
> via a QUEUE target, and some packets are captured via packet socket.

It happens when reinjecting the packet, adding some debug code to
ipq_issue_verdict should work.

> I will post more details tomorrow, but if you could point me towards
> the changes in ip_conntrack that affected this, it would be very
> helpful. I could check if they are in the Red Hat kernel and if not,
> patch them manually and see if it makes a difference. The problem is
> now reproducible in a couple of hours, so it shouldn't be too hard.

I've attached the patch.

[-- Attachment #2: X --]
[-- Type: text/plain, Size: 1727 bytes --]

[NETFILTER]: Do not be clever about SKB ownership in ip_ct_gather_frags().

Just do an skb_orphan() and be done with it.
Based upon discussions with Herbert Xu on netdev.

Signed-off-by: David S. Miller <davem@davemloft.net>

---
commit 8be58932ca596972e4953ae980d8bc286857cae8
tree 44ee4e92a652bdbc3f3f368bc8f253ce9539a13a
parent d9fa0f392b20b2b8e3df379c44194492a2446c6e
author David S. Miller <davem@davemloft.net> Thu, 19 May 2005 12:36:33 -0700
committer David S. Miller <davem@davemloft.net> Thu, 19 May 2005 12:36:33 -0700

 net/ipv4/netfilter/ip_conntrack_core.c |   28 ++++++++--------------------
 1 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/net/ipv4/netfilter/ip_conntrack_core.c b/net/ipv4/netfilter/ip_conntrack_core.c
--- a/net/ipv4/netfilter/ip_conntrack_core.c
+++ b/net/ipv4/netfilter/ip_conntrack_core.c
@@ -940,37 +940,25 @@ void ip_ct_refresh_acct(struct ip_conntr
 struct sk_buff *
 ip_ct_gather_frags(struct sk_buff *skb, u_int32_t user)
 {
-	struct sock *sk = skb->sk;
 #ifdef CONFIG_NETFILTER_DEBUG
 	unsigned int olddebug = skb->nf_debug;
 #endif
 
-	if (sk) {
-		sock_hold(sk);
-		skb_orphan(skb);
-	}
+	skb_orphan(skb);
 
 	local_bh_disable(); 
 	skb = ip_defrag(skb, user);
 	local_bh_enable();
 
-	if (!skb) {
-		if (sk)
-			sock_put(sk);
-		return skb;
-	}
-
-	if (sk) {
-		skb_set_owner_w(skb, sk);
-		sock_put(sk);
-	}
-
-	ip_send_check(skb->nh.iph);
-	skb->nfcache |= NFC_ALTERED;
+	if (skb) {
+		ip_send_check(skb->nh.iph);
+		skb->nfcache |= NFC_ALTERED;
 #ifdef CONFIG_NETFILTER_DEBUG
-	/* Packet path as if nothing had happened. */
-	skb->nf_debug = olddebug;
+		/* Packet path as if nothing had happened. */
+		skb->nf_debug = olddebug;
 #endif
+	}
+
 	return skb;
 }
 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-14  2:52           ` Patrick McHardy
@ 2005-09-14  8:31             ` Nuutti Kotivuori
  2005-09-14 12:10               ` Nuutti Kotivuori
  0 siblings, 1 reply; 21+ messages in thread
From: Nuutti Kotivuori @ 2005-09-14  8:31 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist

Patrick McHardy wrote:
> Nuutti Kotivuori wrote:
>> Patrick McHardy wrote:
>>
>>> What kind of packet is this? And what kernel version are you
>>> running? Until recently ip_conntrack did some fiddling with skb->sk
>>> which could lead to a packet on the output path with skb->sk set but
>>> no reference taken.
>>
>> This happens on Red Hat Enterprise Linux 4, with a 2.6.9 kernel (with
>> a gazillion of Red Hat patches in it, latest ones being from 2.6.11)
>> and the ip_queue patch that adds the bottom-half disabling. I will
>> know for sure tomorrow, but it seems that it doesn't appear on vanilla
>> 2.6.13.1 or without SMP.
>
> Hmm .. I don't want to spend time fixing bugs already fixed, so it
> would be good if you could confirm that the bug still exists in the
> current vanilla kernel.

The bug does *not* appear on vanilla 2.6.13.1. The test isn't 100%
conclusive, as the kernel configs differ a bit and I had to disable
selinux policy at boot since there were some differences in the
generated policies.

>> It is very hard to know which packet specifically triggers this. The
>> machine is under heavy load in general, a lot of packets are handled
>> via a QUEUE target, and some packets are captured via packet socket.
>
> It happens when reinjecting the packet, adding some debug code to
> ipq_issue_verdict should work.

Just about every packet goes through the QUEUE target and gets
accepted. So the amount of packets being reinjected is really high.

I guess I could catch that specific error in the places it happens and
print a dump of the packet then.

>> I will post more details tomorrow, but if you could point me towards
>> the changes in ip_conntrack that affected this, it would be very
>> helpful. I could check if they are in the Red Hat kernel and if not,
>> patch them manually and see if it makes a difference. The problem is
>> now reproducible in a couple of hours, so it shouldn't be too hard.
>
> I've attached the patch.

Thank you. I should know if it makes a difference within a few hours.

-- Naked

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-12 22:12 netfilter QUEUE target and packet socket interactions buggy or not Nuutti Kotivuori
  2005-09-12 22:11 ` David S. Miller
@ 2005-09-14 11:20 ` Nuutti Kotivuori
  1 sibling, 0 replies; 21+ messages in thread
From: Nuutti Kotivuori @ 2005-09-14 11:20 UTC (permalink / raw)
  To: linux-kernel

Nuutti Kotivuori wrote:
> I am in the process of debugging a kernel panic manifested on a Red
> Hat Enterprise Linux 4 under rather difficult conditions. While
> investigating this, I came upon a few bits of code that I'd like some
> clarification on. However, I will start by describing the problem.

Just as a heads up, I have now confirmed that the problem does not
happen on vanilla 2.6.13.1. The config is a bit different, but mostly
the same - although I had to disable SELinux due to other reasons, so
that might still be the culprit. Or the bug may have been fixed, or
never existed, in the mainline kernel.

-- Naked



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-14  8:31             ` Nuutti Kotivuori
@ 2005-09-14 12:10               ` Nuutti Kotivuori
  0 siblings, 0 replies; 21+ messages in thread
From: Nuutti Kotivuori @ 2005-09-14 12:10 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist

Nuutti Kotivuori wrote:
> Patrick McHardy wrote:
>> I've attached the patch.
>
> Thank you. I should know if it makes a difference within a few hours.

Same crash with the patch. I will try something else. But in any case,
here is the backtrace:

Unable to handle kernel NULL pointer dereference at virtual address 00000018
 printing eip:
c01a387f
*pde = 366b6001
Oops: 0000 [#1]
SMP
Modules linked in: arpt_mangle arptable_filter arp_tables iptable_filter ip_tables ip_queue parport_pc lp parport netconsole netdump autofs4 i2c_dev i2c_core sunrpc dm_mod button battery
EIP is at selinux_ip_postroute_last+0x6a/0x1de
eax: 00000000   ebx: 00000000   ecx: f742fbb0   edx: 00000003
esi: f6e68e80   edi: c0455780   ebp: 00000004   esp: f742fb8c
ds: 007b   es: 007b   ss: 0068
Process dispatcher (pid: 2632, threadinfo=f742f000 task=f602a030)
Stack: 00000000 e8723280 00000000 e9180880 00000002 f88a965a 37f3c49e 00000000
       00000206 000000f3 f88a983c c026f163 e9672a80 f7fd8268 c02c3188 000000ce
        __kfree_skb+0xf4/0xf7
 [<c02c3188>] packet_rcv+0x2ca/0x2d4
 [<c0273ca8>] dev_queue_xmit_nit+0xc1/0xd3
 [<c01a3a02>] selinux_ipv4_postroute_last+0xf/0x13
 [<c028d11f>] ip_finish_output2+0x0/0x16d
 [<c027cb23>] nf_iterate+0x40/0x81
 [<c028d11f>] ip_finish_output2+0x0/0x16d
 [<c027ce21>] nf_hook_slow+0x47/0xb4
 [<c028d11f>] ip_finish_output2+0x0/0x16d
 [<c028d116>] ip_finish_output+0x1a5/0x1ae
 [<c028d11f>] ip_finish_output2+0x0/0x16d
 [<c028cf66>] dst_output+0xf/0x1a
 [<c027cfdb>] nf_reinject+0x14d/0x1a9
 [<f891401e>] ipq_issue_verdict+0x1e/0x2b [ip_queue]
 [<f8914676>] ipq_set_verdict+0x53/0x5a [ip_queue]
 [<f891472c>] ipq_receive_peer+0x3d/0x46 [ip_queue]
 [<f891487d>] ipq_rcv_sk+0xfc/0x175 [ip_queue]
 [<c0285b11>] netlink_data_ready+0x14/0x44
 [<c028525b>] netlink_sendskb+0x52/0x6c
 [<c028592c>] netlink_sendmsg+0x254/0x263
 [<c011dcf5>] __wake_up+0x29/0x3c
 [<c026b92d>] sock_sendmsg+0xdb/0xf7
 [<c0285ae9>] netlink_recvmsg+0x1ae/0x1c2
 [<c0111c12>] mark_offset_tsc+0x285/0x303
 [<c010741a>] handle_IRQ_event+0x25/0x4f
 [<c026ba64>] sock_recvmsg+0xef/0x10c
 [<c011f6ee>] autoremove_wake_function+0x0/0x2d
 [<c02709ba>] verify_iovec+0x76/0xc2
 [<c026d07c>] sys_sendmsg+0x1ee/0x23b
 [<c026b4fe>] move_addr_to_user+0x67/0x7f
 [<c01335b7>] get_futex_key+0x39/0x108
 [<c0133b04>] unqueue_me+0x73/0x79
 [<c014b9b5>] find_extend_vma+0x12/0x4f
 [<c01335b7>] get_futex_key+0x39/0x108
 [<c026d465>] sys_socketcall+0x1c1/0x1dd
 [<c0125351>] sys_gettimeofday+0x53/0xac
 [<c02c7377>] syscall_call+0x7/0xb
 [<c02c007b>] unix_release_sock+0x15a/0x201
Code: 89 d3 83 c3 2c 0f 84 8c 01 00 00 8b 44 24 7c 31 c9 8d 54 24 24 e8 df 29 00 00 85 c0 0f 85 75

-- Naked

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-13 18:22           ` Nuutti Kotivuori
  (?)
  (?)
@ 2005-09-14 12:20           ` Nuutti Kotivuori
  2005-09-15  8:50             ` Nuutti Kotivuori
  2005-09-17 17:59             ` Patrick McHardy
  -1 siblings, 2 replies; 21+ messages in thread
From: Nuutti Kotivuori @ 2005-09-14 12:20 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist

Nuutti Kotivuori wrote:
> Patrick McHardy wrote:
>> This means inode->i_security was NULL. AFAICT it is only set to NULL in
>> inode_free_security() when the inode is freed. This shouldn't happen
>> while the packet is queued since the skb should hold a reference to
>> the socket on the output path. So it could be some protocol forgetting
>> to increase the refcnt when taking a reference.
>
> Right.

Okay, I said "right" because I did not have time to read the code and
really understand what's happening ;-)

Now I did - and I hope I got it right. So what you are saying is that
the skb is just fine, but the skb references a socket (skb->sk) and
that socket has been freed, even though the skb still lives? And
because of that, SOCK_INODE(skb->sk->sk_socket)->i_security is NULL,
so accessing sclass from there causes the kernel panic?

If that is correct, then I guess I could add a test to see if
i_security member is NULL - and if so, print out a hexdump of the
packet. Or even, as a really hackish workaround, just return NF_ACCEPT
if that's the case (selinux isn't really needed here, just enabled by
default).

But also, if it is that particular spot that causes this exact crash -
the problem should go away if selinux is disabled, even though the bug
still exists (socket being freed before skb stops referencing it).

Just brainstorming here, so people can point out flaws in my thinking
if they happen to stumble on such. I will investigate further.

-- Naked

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-14 12:20           ` Nuutti Kotivuori
@ 2005-09-15  8:50             ` Nuutti Kotivuori
  2005-09-17 17:59             ` Patrick McHardy
  1 sibling, 0 replies; 21+ messages in thread
From: Nuutti Kotivuori @ 2005-09-15  8:50 UTC (permalink / raw)
  To: netfilter-devel

Nuutti Kotivuori wrote:
> But also, if it is that particular spot that causes this exact crash -
> the problem should go away if selinux is disabled, even though the bug
> still exists (socket being freed before skb stops referencing it).

The setup crashes even with selinux disabled. The crash is different,
and is preceded by kernel messages. Here it is:

...
ip_queue: full at 1024 entries, dropping packet(s).
printk: 158 messages suppressed.
ip_queue: full at 1024 entries, dropping packet(s).
printk: 153 messages suppressed.
ip_queue: full at 1024 entries, dropping packet(s).
printk: 152 messages suppressed.
<1>Unable to handle kernel NULL pointer dereference at virtual address 000000a4
 printing eip:
f8890da9
*pde = 2e4db001
Oops: 0000 [#1]
SMP
Modules linked in: arpt_mangle arptable_filter arp_tables iptable_filter ip_tables ip_queue parport_pc lp parport netconsole netdump autofs4 i2c_dev i2c_core sunrpc dm_mod button
 poll_napi+0x64/0x84
 [<c027eb58>] netpoll_poll+0x30/0x35
 [<c027ecfb>] netpoll_send_skb+0x8f/0x98
 [<f88b5156>] write_msg+0x156/0x16d [netconsole]
 [<f88b5000>] write_msg+0x0/0x16d [netconsole]
 [<c01217c7>] __call_console_drivers+0x36/0x40
 [<c01218df>] call_console_drivers+0xb6/0xd8
 [<c0121bd3>] release_console_sem+0x43/0xa9
 [<c0121b1d>] vprintk+0x136/0x14a
 [<c01219e4>] printk+0xe/0x11
 [<f88f1413>] ipq_enqueue_packet+0xd1/0x143 [ip_queue]
 [<c027cd21>] nf_queue+0x148/0x201
 [<c028a676>] ip_local_deliver_finish+0x0/0x188
 [<c027ce46>] nf_hook_slow+0x6c/0xb4
 [<c028a676>] ip_local_deliver_finish+0x0/0x188
 [<c028a66f>] ip_local_deliver+0x1d9/0x1e0
 [<c028a676>] ip_local_deliver_finish+0x0/0x188
 [<c028ab5c>] ip_rcv+0x35e/0x3ff
 [<c0274569>] netif_receive_skb+0x1f1/0x21f
 [<f8890e27>] tg3_rx+0x2a3/0x3a0 [tg3]
 [<f8890fbd>] tg3_poll+0x99/0x11a [tg3]
 [<c02746f5>] net_rx_action+0x61/0xd8
 [<c0125a5c>] __do_softirq+0x4c/0xb1
 [<c010806d>] do_softirq+0x4f/0x56
 =======================
 [<c0107983>] do_IRQ+0x125/0x130
 [<c02c7d34>] common_interrupt+0x18/0x20
 [<c0104018>] default_idle+0x0/0x2c
 [<c0104041>] default_idle+0x29/0x2c
 [<c010409d>] cpu_idle+0x26/0x3b
 [<c0384784>] start_kernel+0x194/0x198
Code: 01 6b 60 01 e8 3b 83 ac 00 00 00 89 83 a8 00 00 00 76 0e 89 ea 89 d8 b9 9a 0d 89 f8 e8

-- Naked

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-13 16:33       ` Patrick McHardy
  2005-09-13 18:22           ` Nuutti Kotivuori
@ 2005-09-16 13:38         ` Nuutti Kotivuori
  2005-09-17 17:57           ` Patrick McHardy
  2005-09-18  7:41           ` Eric Leblond
  1 sibling, 2 replies; 21+ messages in thread
From: Nuutti Kotivuori @ 2005-09-16 13:38 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist

Just to reiterate, I still haven't been able to reproduce this on the
mainline kernel, only with RHEL 4 kernels. So if you want to skip
this, that's fine.

Patrick McHardy wrote:
> Nuutti Kotivuori wrote:
>>
>> Appended here is a backtrace with the tg3 driver. Also, it seems that
>> the bug cannot be reproduced with uniprocessor, only SMP.
>>
>> Unable to handle kernel NULL pointer dereference at virtual address 00000018
>
> This means inode->i_security was NULL. AFAICT it is only set to NULL in
> inode_free_security() when the inode is freed. This shouldn't happen
> while the packet is queued since the skb should hold a reference to
> the socket on the output path. So it could be some protocol forgetting
> to increase the refcnt when taking a reference. What kind of packet
> is this? And what kernel version are you running? Until recently
> ip_conntrack did some fiddling with skb->sk which could lead to
> a packet on the output path with skb->sk set but no reference taken.

I finally managed to add enough debug dumps to find out what packet it
is. It is a TCP FIN,ACK packet, going outwards, originating from the
machine which crashes. It seems that the TCP FIN,ACK packet gets sent
outwards, is caught by the QUEUE target in netfilter, goes to
userspace, comes back, continues onwards, gets rejected by the filter
rule in the packet socket and then hits the selinux outbound handler.
At that point the socket has been freed, so it crashes. At least this
is my understanding at the moment. This all is very confusing, though.

-- Naked


* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-16 13:38         ` Nuutti Kotivuori
@ 2005-09-17 17:57           ` Patrick McHardy
  2005-09-18  7:27             ` David S. Miller
  2005-09-18  7:41           ` Eric Leblond
  1 sibling, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2005-09-17 17:57 UTC (permalink / raw)
  To: Nuutti Kotivuori; +Cc: Netfilter Development Mailinglist

Nuutti Kotivuori wrote:
> Just to reiterate, I still haven't been able to reproduce this on the
> mainline kernel, only with RHEL 4 kernels. So if you want to skip
> this, that's fine.

I didn't have a chance to look at your mails yet. I'll do so now, maybe
it rings a bell.


* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-14 12:20           ` Nuutti Kotivuori
  2005-09-15  8:50             ` Nuutti Kotivuori
@ 2005-09-17 17:59             ` Patrick McHardy
  1 sibling, 0 replies; 21+ messages in thread
From: Patrick McHardy @ 2005-09-17 17:59 UTC (permalink / raw)
  To: Nuutti Kotivuori; +Cc: Netfilter Development Mailinglist

Nuutti Kotivuori wrote:
> Nuutti Kotivuori wrote:
> 
>>Patrick McHardy wrote:
>>
>>>This means inode->i_security was NULL. AFAICT it is only set to NULL in
>>>inode_free_security() when the inode is freed. This shouldn't happen
>>>while the packet is queued since the skb should hold a reference to
>>>the socket on the output path. So it could be some protocol forgetting
>>>to increase the refcnt when taking a reference.
>>
>>Right.
> 
> Okay, I said "right" because I did not have time to read the code and
> really understand what's happening ;-)
> 
> Now I did - and I hope I got it right. So what you are saying is that
> the skb is just fine, but the skb references a socket (skb->sk) and
> that socket has been freed, even though the skb still lives? And
> because of that, SOCK_INODE(skb->sk->sk_socket)->i_security is NULL,
> so accessing sclass from there causes the kernel panic?

Yes.
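
The lifetime rule in question can be illustrated with a small
user-space model. This is only a sketch: the toy_* names are
hypothetical and are not the kernel's sock/sk_buff API, the counter is
not atomic, and it says nothing about where the RHEL 4 kernel actually
loses the reference. It only shows the invariant that a buffer pointing
at a socket must hold its own reference until the buffer is freed.

```c
#include <stdlib.h>

/* Toy stand-in for a socket: a reference count plus a flag that is
 * set when the object is actually released (hypothetical, for testing). */
struct toy_sock {
    int refcnt;
    int *freed_flag;
};

/* Toy stand-in for a packet buffer that points back at its socket. */
struct toy_skb {
    struct toy_sock *sk;
};

/* Allocate a socket with one reference owned by the caller. */
struct toy_sock *toy_sock_alloc(int *freed_flag)
{
    struct toy_sock *sk = malloc(sizeof(*sk));

    if (sk == NULL)
        return NULL;
    sk->refcnt = 1;
    sk->freed_flag = freed_flag;
    return sk;
}

/* Take an additional reference. */
void toy_sock_hold(struct toy_sock *sk)
{
    sk->refcnt++;
}

/* Drop a reference; release the socket when the last one goes away. */
void toy_sock_put(struct toy_sock *sk)
{
    if (--sk->refcnt == 0) {
        *sk->freed_flag = 1;
        free(sk);
    }
}

/* Attach a buffer to a socket. The buffer takes its own reference, so
 * the socket cannot disappear while the buffer is still in flight. */
struct toy_skb *toy_skb_alloc_owned(struct toy_sock *sk)
{
    struct toy_skb *skb = malloc(sizeof(*skb));

    if (skb == NULL)
        return NULL;
    toy_sock_hold(sk);
    skb->sk = sk;
    return skb;
}

/* Free the buffer and drop the reference it held on the socket. */
void toy_skb_free(struct toy_skb *skb)
{
    toy_sock_put(skb->sk);
    free(skb);
}
```

If toy_skb_alloc_owned() skipped the toy_sock_hold() call, the owner's
toy_sock_put() would free the socket while the buffer still pointed at
it, which is the kind of situation described above.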

> If that is correct, then I guess I could add a test to see if
> i_security member is NULL - and if so, print out a hexdump of the
> packet. Or even, as a really hackish workaround, just return NF_ACCEPT
> if that's the case (selinux isn't really needed here, just enabled by
> default).
> 
> But also, if it is that particular spot that causes this exact crash -
> the problem should go away if selinux is disabled, even though the bug
> still exists (socket being freed before skb stops referencing it).
> 
> Just brainstorming here, so people can point out flaws in my thinking
> if they happen to stumble on such. I will investigate further.

It will probably cause problems in other places as well.


* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-17 17:57           ` Patrick McHardy
@ 2005-09-18  7:27             ` David S. Miller
  2005-09-18 10:37               ` Nuutti Kotivuori
  0 siblings, 1 reply; 21+ messages in thread
From: David S. Miller @ 2005-09-18  7:27 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, naked

From: Patrick McHardy <kaber@trash.net>
Date: Sat, 17 Sep 2005 19:57:09 +0200

> Nuutti Kotivuori wrote:
> > Just to reiterate, I still haven't been able to reproduce this on the
> > mainline kernel, only with RHEL 4 kernels. So if you want to skip
> > this, that's fine.
> 
> I didn't have a chance to look at your mails yet. I'll do so now, maybe
> it rings a bell.

I think the key datapoint right now is that they have a network
tap on the device active.  There could be something awry in the
AF_PACKET code of dev_queue_xmit_nit(), or similar, that doesn't
do SKB refcounting correctly.

Nuutti, is it possible to test without the pcap packet tap active
on the interface?  That would help us enormously in narrowing down
where the problem might be.  If we can definitely put the blame
on the pcap being active, we can probably fix this very fast.


* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-16 13:38         ` Nuutti Kotivuori
  2005-09-17 17:57           ` Patrick McHardy
@ 2005-09-18  7:41           ` Eric Leblond
  1 sibling, 0 replies; 21+ messages in thread
From: Eric Leblond @ 2005-09-18  7:41 UTC (permalink / raw)
  To: Nuutti Kotivuori; +Cc: Netfilter Development Mailinglist, Patrick McHardy

On Friday, 16 September 2005 at 16:38 +0300, Nuutti Kotivuori wrote:
> Just to reiterate, I still haven't been able to reproduce this on the
> mainline kernel, only with RHEL 4 kernels. So if you want to skip
> this, that's fine.

This reminds me of the problem I had with RHEL4:

https://lists.netfilter.org/pipermail/netfilter-devel/2005-July/020505.html

I did not have to study the problem in detail but it may be helpful for
your current problem.

BR,

> 
> Patrick McHardy wrote:
> > Nuutti Kotivuori wrote:
> >>
> >> Appended here is a backtrace with the tg3 driver. Also, it seems that
> >> the bug cannot be reproduced with uniprocessor, only SMP.
> >>
> >> Unable to handle kernel NULL pointer dereference at virtual address 00000018
> >
> > This means inode->i_security was NULL. AFAICT it is only set to NULL in
> > inode_free_security() when the inode is freed. This shouldn't happen
> > while the packet is queued since the skb should hold a reference to
> > the socket on the output path. So it could be some protocol forgetting
> > to increase the refcnt when taking a reference. What kind of packet
> > is this? And what kernel version are you running? Until recently
> > ip_conntrack did some fiddling with skb->sk which could lead to
> > a packet on the output path with skb->sk set but no reference taken.
> 
> I finally managed to add enough debug dumps to find out what packet it
> is. It is a TCP FIN,ACK packet, going outwards, originating from the
> machine which crashes. It seems that the TCP FIN,ACK packet gets sent
> outwards, is caught by the QUEUE target in netfilter, goes to
> userspace, comes back, continues onwards, gets rejected by the filter
> rule in the packet socket and then hits the selinux outbound handler. At
> that point the socket has already been freed, so it crashes. At least
> this is my understanding at the moment. This all is very confusing, though.
> 
> -- Naked
> 


* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-18  7:27             ` David S. Miller
@ 2005-09-18 10:37               ` Nuutti Kotivuori
  2005-09-19 10:54                 ` Nuutti Kotivuori
  2005-09-19 13:34                 ` Nuutti Kotivuori
  0 siblings, 2 replies; 21+ messages in thread
From: Nuutti Kotivuori @ 2005-09-18 10:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: netfilter-devel, kaber

David S. Miller wrote:
> Nuutti, is it possible to test without the pcap packet tap active
> on the interface?  That would help us enormously in narrowing down
> where the problem might be.  If we can definitely put the blame
> on the pcap being active, we can probably fix this very fast.

I'm afraid not. We haven't been able to reproduce the problem in any
other case except the full setup, and that depends on having pcap
active. I'll find out on Monday if there is some trick that could be
used to run the whole thing without pcap.

I did a separate test earlier, which had a dummy program saying
NF_ACCEPT to every QUEUE packet and TCP streams going both ways with
pcap active - but at that time, I did not think to test TCP streams
closing and opening, so there were no TCP FINs transmitted.

Our next step in testing is to modify the application handling the
QUEUE packets to detect TCP FIN,ACK packets and intentionally sleep
for a bit before accepting them. If that makes it crash a lot sooner,
we can pin the problem on something bad happening while the packets
sit in the QUEUE.
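
For reference, the FIN,ACK detection can be sketched roughly as below.
This is not the code from our application, just a minimal illustration:
the function name is made up, and it assumes the payload handed to
userspace starts at the IPv4 header. It only classifies the packet; the
sleep and the verdict would be done by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* TCP flag bits as they appear in byte 13 of the TCP header. */
#define TCP_FLAG_FIN 0x01
#define TCP_FLAG_ACK 0x10

/*
 * Hypothetical helper: returns 1 if buf holds an IPv4 packet carrying
 * a TCP header with both FIN and ACK set, 0 otherwise.
 */
int is_tcp_fin_ack(const uint8_t *buf, size_t len)
{
    size_t ihl;
    unsigned int frag_off;
    uint8_t flags;

    /* Need at least a minimal IPv4 header, and version must be 4. */
    if (len < 20 || (buf[0] >> 4) != 4)
        return 0;

    /* Header length is given in 32-bit words. */
    ihl = (size_t)(buf[0] & 0x0f) * 4;
    if (ihl < 20)
        return 0;

    /* IP protocol 6 is TCP. */
    if (buf[9] != 6)
        return 0;

    /* Non-first fragments do not contain the TCP header. */
    frag_off = ((unsigned int)(buf[6] & 0x1f) << 8) | buf[7];
    if (frag_off != 0)
        return 0;

    /* Need a complete minimal TCP header after the IP header. */
    if (len < ihl + 20)
        return 0;

    flags = buf[ihl + 13];
    return (flags & (TCP_FLAG_FIN | TCP_FLAG_ACK)) ==
           (TCP_FLAG_FIN | TCP_FLAG_ACK);
}
```

The queue handler would call this on each payload and sleep before
issuing NF_ACCEPT when it returns 1.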

If that doesn't help, then I'm doubtful about managing to reproduce
this without the full setup. Since it requires SMP to crash and may
take 8 hours to crash, this sounds like a nasty timing issue and it
may not be too easy to get similar conditions.

-- Naked


* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-18 10:37               ` Nuutti Kotivuori
@ 2005-09-19 10:54                 ` Nuutti Kotivuori
  2005-09-19 13:34                 ` Nuutti Kotivuori
  1 sibling, 0 replies; 21+ messages in thread
From: Nuutti Kotivuori @ 2005-09-19 10:54 UTC (permalink / raw)
  To: David S. Miller; +Cc: netfilter-devel, kaber

Nuutti Kotivuori wrote:
> Our next step in testing is to modify the application handling the
> QUEUE packets to detect TCP FIN,ACK packets and intentionally sleep
> for a bit before accepting them - with that we can see if it crashes a
> lot sooner that way and pinpoint the problem at something bad
> happening while the packets sit in the QUEUE.

Well, this is now done - and it makes no difference. So simply delaying
the FIN,ACK packets in the QUEUE does not make it crash.

-- Naked


* Re: netfilter QUEUE target and packet socket interactions buggy or not
  2005-09-18 10:37               ` Nuutti Kotivuori
  2005-09-19 10:54                 ` Nuutti Kotivuori
@ 2005-09-19 13:34                 ` Nuutti Kotivuori
  1 sibling, 0 replies; 21+ messages in thread
From: Nuutti Kotivuori @ 2005-09-19 13:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: netfilter-devel, kaber

Nuutti Kotivuori wrote:
> David S. Miller wrote:
>> Nuutti, is it possible to test without the pcap packet tap active
>> on the interface?  That would help us enormously in narrowing down
>> where the problem might be.  If we can definitely put the blame
>> on the pcap being active, we can probably fix this very fast.
>
> I'm afraid not. We haven't been able to reproduce the problem in any
> other case except the full setup, and that depends on having pcap
> active. I'll find out on Monday if there is some trick that could be
> used to run the whole thing without pcap.

Disabling all the uses of packet sockets in the code was deemed an
overwhelming job, but through certain modifications we were able to
run the system such that it did not need the packet socket capturing
at all.

After that I made a patch that disabled all the calls to dev_add_pack
and dev_remove_pack from af_packet.c and verified that with this
setup, tcpdump indeed did not receive any packets.

However, this did not fix the problem. Since no backtrace was
available in this current setup, I could not verify that packet_rcv
was never called, but I am pretty certain that it wasn't.

So I think it is safe to rule out the hypothesis that the packet
socket might have something to do with it.

-- Naked


Thread overview: 21+ messages
2005-09-12 22:12 netfilter QUEUE target and packet socket interactions buggy or not Nuutti Kotivuori
2005-09-12 22:11 ` David S. Miller
2005-09-12 22:34   ` Nuutti Kotivuori
2005-09-13 10:54     ` Nuutti Kotivuori
2005-09-13 16:33       ` Patrick McHardy
2005-09-13 18:22         ` Nuutti Kotivuori
2005-09-13 18:22           ` Nuutti Kotivuori
2005-09-14  2:52           ` Patrick McHardy
2005-09-14  8:31             ` Nuutti Kotivuori
2005-09-14 12:10               ` Nuutti Kotivuori
2005-09-14 12:20           ` Nuutti Kotivuori
2005-09-15  8:50             ` Nuutti Kotivuori
2005-09-17 17:59             ` Patrick McHardy
2005-09-16 13:38         ` Nuutti Kotivuori
2005-09-17 17:57           ` Patrick McHardy
2005-09-18  7:27             ` David S. Miller
2005-09-18 10:37               ` Nuutti Kotivuori
2005-09-19 10:54                 ` Nuutti Kotivuori
2005-09-19 13:34                 ` Nuutti Kotivuori
2005-09-18  7:41           ` Eric Leblond
2005-09-14 11:20 ` Nuutti Kotivuori
