* NFQUEUE stucked packets
@ 2009-03-09 14:46 Kuzin Andrey
2009-03-09 16:30 ` Patrick McHardy
0 siblings, 1 reply; 2+ messages in thread
From: Kuzin Andrey @ 2009-03-09 14:46 UTC (permalink / raw)
To: netfilter-devel
Hello list!
I have similar situation as described in
http://lists.netfilter.org/pipermail/netfilter-devel/2006-January/023153.html
I have network gateway for 700 users with average 30-40 mbit channel
utilization. For traffic acounting using NFQUEUE for realtime
statistics and user MAC address security.
root@darkstar:/# uname -a
Linux darkstar 2.6.28.7-smp #1 SMP Mon Mar 9 14:07:23 MSK 2009
i686 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux
Yesterday all day queue overflow in about 10 min.
I write special script to discover this situation.
#!/bin/sh
QUEUE_NUM=10
while [ 1 ]; do
TOTAL=
DATE=`date +"%Y-%m-%d %H:%M:%S"`
CAT=`cat /proc/net/netfilter/nfnetlink_queue | grep $QUEUE_NUM`
QUEUE_TOTAL=`echo $CAT | cut -f 3 -d ' '`
echo $CAT $DATE =$QUEUE_TOTAL
sleep 10
done
And have this log:
=======================
10 26197 89 2 65535 0 40 157532 1 2009-03-08 19:18:00 =89
10 26197 80 2 65535 0 99 306347 1 2009-03-08 19:18:11 =80
10 26197 97 2 65535 0 198 453838 1 2009-03-08 19:18:21 =97
10 26197 120 2 65535 0 254 600438 1 2009-03-08 19:18:31 =120
10 26197 162 2 65535 0 254 740278 1 2009-03-08 19:18:41 =162
10 26197 165 2 65535 0 385 886832 1 2009-03-08 19:18:51 =165
10 26197 173 2 65535 0 477 1030473 1 2009-03-08 19:19:01 =173
10 26197 195 2 65535 0 477 1173828 1 2009-03-08 19:19:11 =195
10 26197 199 2 65535 0 613 1320007 1 2009-03-08 19:19:21 =199
......
10 26197 1314 2 65535 0 150480 8682536 1 2009-03-08 19:30:54 =1314
10 26197 1297 2 65535 0 155210 8762215 1 2009-03-08 19:31:04 =1297
10 26197 1395 2 65535 0 159517 8840470 1 2009-03-08 19:31:14 =1395
10 26197 1431 2 65535 0 164050 8917428 1 2009-03-08 19:31:24 =1431
10 26197 1426 2 65535 0 168789 8997451 1 2009-03-08 19:31:34 =1426
10 26197 1440 2 65535 0 173915 9077845 1 2009-03-08 19:31:44 =1440
10 26197 1469 2 65535 0 179015 9152595 1 2009-03-08 19:31:54 =1469
10 26197 1499 2 65535 0 183489 9226571 1 2009-03-08 19:32:04 =1499
10 26197 1564 2 65535 0 188427 9296313 1 2009-03-08 19:32:14 =1564
After about 10-15 min we have very big delays for each packet.
Restart of accounting program flush the queue and starting again:
10 12083 32 2 65535 0 290 129955 1 2009-03-08 19:32:37 =32
10 12083 123 2 65535 0 404 263677 1 2009-03-08 19:32:47 =123
10 12083 192 2 65535 0 577 395122 1 2009-03-08 19:32:57 =192
10 12083 140 2 65535 0 739 524411 1 2009-03-08 19:33:08 =140
10 12083 193 2 65535 0 1209 654960 1 2009-03-08 19:33:18 =193
10 12083 247 2 65535 0 1465 805822 1 2009-03-08 19:33:28 =247
10 12083 283 2 65535 0 1610 945017 1 2009-03-08 19:33:38 =283
10 12083 402 2 65535 0 1775 1073752 1 2009-03-08 19:33:48 =402
.....
10 12083 1358 2 65535 0 49719 4501398 1 2009-03-08 19:39:19 =1358
10 12083 1383 2 65535 0 54400 4578319 1 2009-03-08 19:39:29 =1383
10 12083 1484 2 65535 0 57611 4651952 1 2009-03-08 19:39:39 =1484
10 12083 1433 2 65535 0 61306 4727467 1 2009-03-08 19:39:49 =1433
10 12083 1478 2 65535 0 64507 4800503 1 2009-03-08 19:39:59 =1478
10 12083 1468 2 65535 0 68569 4873713 1 2009-03-08 19:40:09 =1468
10 12083 1516 2 65535 0 72345 4947701 1 2009-03-08 19:40:19 =1516
At night situation was not such terrible..
10 25572 3 2 65535 0 0 81409 1 2009-03-09 04:56:14 =3
10 25572 18 2 65535 0 0 166434 1 2009-03-09 04:56:24 =18
10 25572 36 2 65535 0 73 250703 1 2009-03-09 04:56:34 =36
10 25572 45 2 65535 0 78 346981 1 2009-03-09 04:56:44 =45
10 25572 17 2 65535 0 89 429374 1 2009-03-09 04:56:54 =17
10 25572 12 2 65535 0 89 513713 1 2009-03-09 04:57:04 =12
10 25572 15 2 65535 0 89 597835 1 2009-03-09 04:57:14 =15
.....
10 25572 1252 2 65535 0 3767514 188608304 1 2009-03-09 11:57:04 =1252
10 25572 1246 2 65535 0 3773969 188693799 1 2009-03-09 11:57:14 =1246
10 25572 1237 2 65535 0 3779614 188777734 1 2009-03-09 11:57:24 =1237
10 25572 1261 2 65535 0 3785212 188860586 1 2009-03-09 11:57:34 =1261
10 25572 1242 2 65535 0 3790644 188941483 1 2009-03-09 11:57:44 =1242
10 25572 1266 2 65535 0 3796270 189021111 1 2009-03-09 11:57:54 =1266
10 25572 1283 2 65535 0 3801408 189105046 1 2009-03-09 11:58:04 =1283
10 25572 1278 2 65535 0 3806845 189183468 1 2009-03-09 11:58:14 =1278
10 25572 1242 2 65535 0 3812415 189265729 1 2009-03-09 11:58:24 =1242
10 25572 1242 2 65535 0 3818292 189343670 1 2009-03-09 11:58:34 =1242
As you see after 189 millions of packet we have 1242 stucked packets
in queue.
Today i try to modify kernel to get more information about this:
diff -rcB nfnetlink_queue.c-orig nfnetlink_queue.c
======= cut here ==========
*** nfnetlink_queue.c-orig Sat Feb 21 01:41:27 2009
--- nfnetlink_queue.c Mon Mar 9 17:04:27 2009
***************
*** 169,178 ****
--- 169,181 ----
queue->queue_total++;
}
+ static unsigned int max_dequeue_position = 0;
+
static struct nf_queue_entry *
find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
{
struct nf_queue_entry *entry = NULL, *i;
+ int dequeue_position = 0;
spin_lock_bh(&queue->lock);
***************
*** 180,188 ****
if (i->id == id) {
entry = i;
break;
! }
}
if (entry) {
list_del(&entry->list);
queue->queue_total--;
--- 183,195 ----
if (i->id == id) {
entry = i;
break;
! } else dequeue_position++;
}
+ if (max_dequeue_position < dequeue_position) {
+ max_dequeue_position = dequeue_position;
+ };
+
if (entry) {
list_del(&entry->list);
queue->queue_total--;
***************
*** 860,871 ****
{
const struct nfqnl_instance *inst = v;
! return seq_printf(s, "%5d %6d %5d %1d %5d %5d %5d %8d %2d\n",
inst->queue_num,
! inst->peer_pid, inst->queue_total,
inst->copy_mode, inst->copy_range,
inst->queue_dropped, inst->queue_user_dropped,
! inst->id_sequence, 1);
}
static const struct seq_operations nfqnl_seq_ops = {
--- 867,878 ----
{
const struct nfqnl_instance *inst = v;
! return seq_printf(s, "%5d %6d %5d %5d %1d %5d %5d %5d %8d %2d %5d\n",
inst->queue_num,
! inst->peer_pid, inst->queue_total, inst->queue_maxlen,
inst->copy_mode, inst->copy_range,
inst->queue_dropped, inst->queue_user_dropped,
! inst->id_sequence, 1, max_dequeue_position);
}
static const struct seq_operations nfqnl_seq_ops = {
======= end of cut ==========
And after that i have information in /proc about queue_maxlen and
stucked packets in queue.
Log with this patch:
10 3052 55 20000 2 65535 0 3194 1563090 1 0 2009-03-09 14:22:32 =55
10 3052 0 20000 2 65535 0 3750 1680510 1 0 2009-03-09 14:22:42 =0
10 3052 0 20000 2 65535 0 4666 1806095 1 0 2009-03-09 14:22:52 =0
10 3052 36 20000 2 65535 0 5124 1934143 1 0 2009-03-09 14:23:02 =36
10 3052 0 20000 2 65535 0 5124 2067343 1 0 2009-03-09 14:23:12 =0
10 3052 0 20000 2 65535 0 5662 2185490 1 0 2009-03-09 14:23:22 =0
10 3052 3 20000 2 65535 0 5953 2314004 1 0 2009-03-09 14:23:32 =3
......
10 3052 85 20000 2 65535 0 41306 57183821 1 0 2009-03-09 15:38:46 =85
10 3052 3 20000 2 65535 0 41378 57287667 1 0 2009-03-09 15:38:56 =3
10 3052 23 20000 2 65535 0 41378 57383407 1 1 2009-03-09 15:39:06 =23
10 3052 1 20000 2 65535 0 41387 57488941 1 1 2009-03-09 15:39:16 =1
......
10 3052 36 20000 2 65535 0 45185 66286805 1 1 2009-03-09 15:52:39 =36
10 3052 1 20000 2 65535 0 45643 66403946 1 1 2009-03-09 15:52:49 =1
10 3052 20 20000 2 65535 0 46222 66522173 1 2 2009-03-09 15:52:59 =20
10 3052 41 20000 2 65535 0 46556 66642593 1 2 2009-03-09 15:53:09 =41
......
10 3052 73 20000 2 65535 0 169361 111410441 1 2 2009-03-09 16:48:20 =73
10 3052 61 20000 2 65535 0 169800 111517779 1 2 2009-03-09 16:48:30 =61
10 3052 60 20000 2 65535 0 170111 111626724 1 3 2009-03-09 16:48:40 =60
10 3052 62 20000 2 65535 0 170430 111738973 1 3 2009-03-09 16:48:50 =62
......
10 3052 117 20000 2 65535 0 224668 120418980 1 3 2009-03-09 16:59:42 =117
10 3052 3 20000 2 65535 0 224941 120567191 1 3 2009-03-09 16:59:52 =3
10 3052 101 20000 2 65535 0 225575 120724075 1 4 2009-03-09 17:00:02 =101
10 3052 70 20000 2 65535 0 225693 120886263 1 4 2009-03-09 17:00:12 =70
......
10 3052 6 20000 2 65535 0 365069 145741324 1 6 2009-03-09 17:28:48 =6
10 3052 6 20000 2 65535 0 365287 145925306 1 6 2009-03-09 17:28:58 =6
10 3052 99 20000 2 65535 0 365504 146113439 1 6 2009-03-09 17:29:08 =99
10 3052 95 20000 2 65535 0 365667 146296955 1 6 2009-03-09 17:29:18 =95
10 3052 6 20000 2 65535 0 365823 146489625 1 6 2009-03-09 17:29:28 =6
10 3052 97 20000 2 65535 0 365916 146676339 1 6 2009-03-09 17:29:39 =97
......
Today 3 hours of work with very high load (about 40 mbit and in peaks up
to 60 mbit) we have 6 packets stuck in queue...
How to resolve this problem ?
This is hardware bug, SMP bug, users corrupted packets bug, NFQUEUE
bug or anything else ? How i can view this stucked packets in queue to
try understand what is going wrong ?
(userspace don't have any errors, because this code work normally
from 2004 ip_queue ages, and we have this after rewrite application
for NFQUEUE)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Kuzin Andrey - kuzinandrey@yandex.ru
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
^ permalink raw reply [flat|nested] 2+ messages in thread* Re: NFQUEUE stucked packets
2009-03-09 14:46 NFQUEUE stucked packets Kuzin Andrey
@ 2009-03-09 16:30 ` Patrick McHardy
0 siblings, 0 replies; 2+ messages in thread
From: Patrick McHardy @ 2009-03-09 16:30 UTC (permalink / raw)
To: Kuzin Andrey; +Cc: netfilter-devel
Kuzin Andrey wrote:
> How to resolve this problem ?
> This is hardware bug, SMP bug, users corrupted packets bug, NFQUEUE
> bug or anything else ? How i can view this stucked packets in queue to
> try understand what is going wrong ?
> (userspace don't have any errors, because this code work normally
> from 2004 ip_queue ages, and we have this after rewrite application
> for NFQUEUE)
The first thing I'd do is make sure that userspace actually delivers
the verdicts its supposed to.
Just keep track of the last packet ID somewhere in the kernel, the
verdict always has to be for the next higher number. If not, the bug
is in userspace.
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2009-03-09 16:30 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-03-09 14:46 NFQUEUE stucked packets Kuzin Andrey
2009-03-09 16:30 ` Patrick McHardy
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.