libnetfilter_queue exiting on big tcp sessions

netfilter.vger.kernel.org archive mirror

* libnetfilter_queue exiting on big tcp sessions
@ 2010-11-02 15:46 Rajkumar S
       [not found] ` <AANLkTin_ZFeXkzJ6zELpX3pP3782YfLjHcPzHrjDt1Ae@mail.gmail.com>
  2010-11-03 18:35 ` Pablo Neira Ayuso
  0 siblings, 2 replies; 10+ messages in thread
From: Rajkumar S @ 2010-11-02 15:46 UTC (permalink / raw)
  To: netfilter

Hi all,

I am using the latest git checkout of libnetfilter_queue and
libnfnetlink on Debian etch with kernel 2.6.26-2-686. The iptables
rules used while testing are:

-A INPUT -s 192.168.3.22/32 -m state --state NEW,ESTABLISHED -j NFQUEUE --queue-num 0
-A OUTPUT -d 192.168.3.22/32 -m state --state NEW,ESTABLISHED -j NFQUEUE --queue-num 0

I am using utils/nfqnl_test.c as my test program and using wget to
fetch a file from 192.168.3.22 for testing. The program runs okay when
fetching smaller files, but if the number of packets goes above about
200, nfqnl_test exits with the following output:

hw_protocol=0x0800 hook=1 id=389 hw_src_addr=00:14:2a:c9:e1:5d indev=2 payload_len=1500
entering callback
hw_protocol=0x0800 hook=1 id=390 hw_src_addr=00:14:2a:c9:e1:5d indev=2 payload_len=1500
entering callback
closing library handle

The number of packets needed to trigger this condition varies from
about 200 to about 1000 and changes with each run.

dmesg does not show any error; the last lines of dmesg are:
[76465.470246] ip_tables: (C) 2000-2006 Netfilter Core Team
[92735.818567] Netfilter messages via NETLINK v0.30.
[92793.863824] nf_conntrack version 0.5.0 (6144 buckets, 24576 max)

Before testing with the compiled git version I tried Ubuntu (lucid)
with the nfqueue-bindings for Python and got the same error.

I am not sure what is going wrong here. I can help with any debug
steps needed to find the exact error. Any help in locating and fixing
this issue is much appreciated.

with regards,

raj


[parent not found: <AANLkTin_ZFeXkzJ6zELpX3pP3782YfLjHcPzHrjDt1Ae@mail.gmail.com>]

[parent not found: <AANLkTikV4_MD0JZzbvKhSXjL-abMDY7Af_3FTbbTzP33@mail.gmail.com>]

* Re: libnetfilter_queue exiting on big tcp sessions
       [not found]   ` <AANLkTikV4_MD0JZzbvKhSXjL-abMDY7Af_3FTbbTzP33@mail.gmail.com>
@ 2010-11-02 17:51     ` Mistick Levi
  2010-11-03  1:53       ` Justin Yaple
  0 siblings, 1 reply; 10+ messages in thread
From: Mistick Levi @ 2010-11-02 17:51 UTC (permalink / raw)
  To: Rajkumar S; +Cc: netfilter-devel, netfilter

Hi,

This error shows up a lot on this mailing list. (I'd love to hear a
response to my thoughts, in the last paragraph, on how to cut down on
these recurring mails.)

The cause of this error is that you are not handling packets fast
enough: your callback takes time to finish, and so it delays the next
call to recv().
The buffer space that is filling up is the socket buffer, that is, the
buffer behind the fd you call recv() on.
You can tune the socket buffer size, but that alone won't help, because
over time the buffer will fill up anyway if packets keep arriving
faster than you process them.
So you must handle your packets as quickly as possible, maybe in a
different thread (if you have multiple CPUs; otherwise it is kind of a
waste).
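
To make the above concrete: the errno behind this is ENOBUFS ("No
buffer space available"), which recv() on the netlink socket reports
when the socket buffer overflowed and messages had to be dropped.
Rather than ignoring every recv() error, a receive loop can treat
ENOBUFS as "packets were lost, keep going" and still stop on anything
else. A minimal sketch, assuming nothing beyond libc; the enum and the
helper name classify_recv() are made up for illustration and are not
part of libnetfilter_queue:

```c
#include <errno.h>
#include <sys/types.h>

/* What the receive loop should do after one recv() call. */
enum recv_action {
    RECV_HANDLE,   /* rv bytes arrived: pass them to nfq_handle_packet() */
    RECV_CONTINUE, /* transient error: log it and call recv() again */
    RECV_STOP      /* any other error: leave the loop */
};

/* rv is the return value of recv(); err is errno sampled right after it. */
static enum recv_action classify_recv(ssize_t rv, int err)
{
    if (rv >= 0)
        return RECV_HANDLE;
    if (err == ENOBUFS)  /* socket buffer overflowed, packets were dropped */
        return RECV_CONTINUE;
    if (err == EINTR)    /* interrupted by a signal before data arrived */
        return RECV_CONTINUE;
    return RECV_STOP;
}
```

In the nfqnl_test loop this would mean calling recv(fd, buf,
sizeof(buf), 0), saving errno, and calling nfq_handle_packet(h, buf, rv)
only in the RECV_HANDLE case. Note that by the time ENOBUFS is reported
the dropped packets are already gone; continuing only keeps the program
alive.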

I hope this mail will serve as an answer for everyone searching for
this error on the web. I know that when I looked for it, I found very
little information.

Maybe this information should be added to the docs, or maybe we could
create a wiki for netfilter that would help newcomers and solve most of
these problems before they reach the mailing list, leaving the mailing
list for new issues as they arise.


Kind Regards
Yechiel Levi

On Tue, Nov 2, 2010 at 7:30 PM, Rajkumar S <rajkumars@gmail.com> wrote:
> Hi,
>
> Thanks for the reply, you were spot on. I removed && rv >= 0 and now
> it's working fine.
>
> btw, what could have caused buffer space unavailable error?
>
> Thanks and regards,
>
> raj
>
> On Tue, Nov 2, 2010 at 10:30 PM, Mistick Levi <gmistick@gmail.com> wrote:
>> Hi,
>>
>> Well, if you didn't change the nfqnl_test program at all, what I think
>> happened is that you got a "buffer space unavailable" error,
>>
>> meaning that in your loop
>> "while ((rv = recv(fd, buf, sizeof(buf), 0)) && rv >= 0)"
>> you get rv < 0, and then the program exits cleanly.
>> You could ignore this recv error and just keep processing packets.
>>
>> Try removing the "&& rv >= 0" and let us know if it helped.
>>
>> Kind Regards,
>> Yechiel Levi
>>
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: libnetfilter_queue exiting on big tcp sessions
  2010-11-02 17:51     ` Mistick Levi
@ 2010-11-03  1:53       ` Justin Yaple
  2010-11-03  5:06         ` Mistick Levi
  0 siblings, 1 reply; 10+ messages in thread
From: Justin Yaple @ 2010-11-03  1:53 UTC (permalink / raw)
  To: Mistick Levi; +Cc: Rajkumar S, netfilter-devel, netfilter

Yechiel,

Thank you so much. I have been trying to figure out this exact issue
for the past few days and was getting nowhere. Your suggestion to
ignore the error and keep going worked fine, but I am not sure it is
the best solution.

On Tue, Nov 2, 2010 at 10:51 AM, Mistick Levi <gmistick@gmail.com> wrote:
> Hi,
>
> This error shows up a lot on this mailing list. (I'd love to hear a
> response to my thoughts, in the last paragraph, on how to cut down on
> these recurring mails.)
>
> The cause of this error is that you are not handling packets fast
> enough: your callback takes time to finish, and so it delays the next
> call to recv().
> The buffer space that is filling up is the socket buffer, that is, the
> buffer behind the fd you call recv() on.
> You can tune the socket buffer size, but that alone won't help, because
> over time the buffer will fill up anyway if packets keep arriving
> faster than you process them.
> So you must handle your packets as quickly as possible, maybe in a
> different thread (if you have multiple CPUs; otherwise it is kind of a
> waste).

My application already uses multiple threads: one to recv() packets
from the queue, and one or several others to do more advanced
processing (i.e. compression/optimization of TCP segments). Whenever
more than one TCP session was being processed I would get rv = -1.

My project lets packets that don't need advanced processing bypass it
in the queue handler function, so packets can be returned to the queue
out of order when more than one session is involved. These could be
SYN packets for new sessions, which don't carry any data and so bypass
the internal processing queues altogether.

Is the problem here that calling nfq_set_verdict() on packets out of
queue order causes the rv = -1? I handle packets in a manner that
ensures each TCP session is processed in sequence, but not necessarily
in the order the packets were received from the queue.

Perhaps there is an nfq_set_verdict() variant that would inform the
queue that the process is holding this packet, so it can go ahead and
move on to the next packet in the queue?

Again thanks,
Justin.


* Re: libnetfilter_queue exiting on big tcp sessions
  2010-11-03  1:53       ` Justin Yaple
@ 2010-11-03  5:06         ` Mistick Levi
  2010-11-03 18:42           ` Justin Yaple
  0 siblings, 1 reply; 10+ messages in thread
From: Mistick Levi @ 2010-11-03  5:06 UTC (permalink / raw)
  To: Justin Yaple; +Cc: Rajkumar S, netfilter-devel, netfilter

Justin,

Does your recv() error (-1) report "buffer space unavailable"? If so,
the problem is not the TCP sequence or the multiple TCP sessions; it is
again the fact that your processing takes a lot of time.
I don't recommend using more threads than you have CPUs, and of course
try using a thread pool.
(If someone has better knowledge of userland schedulers, please chime
in here.)

Even if you are dispatching each packet to a different thread or work
queue (as in a thread pool), the throughput you are receiving may
simply be well above your processing capacity.
(Also, if anyone has succeeded in doing deep packet processing with
libnetfilter_queue at high throughput, your input would be appreciated
here.)
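
One way to hand packets from the recv() thread to a worker without
blocking the receive path is a small bounded ring. The sketch below is
only an illustration under stated assumptions: it is single-producer,
single-consumer (one ring per worker), it carries only packet ids, and
the names pkt_ring, ring_push() and ring_pop() are hypothetical. When
ring_push() reports a full ring, the worker is not keeping up and the
caller has to decide what to do with the packet instead of letting the
socket buffer back up.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 256 /* must be a power of two */

struct pkt_ring {
    uint32_t ids[RING_SLOTS]; /* packet ids waiting for a worker */
    atomic_size_t head;       /* next slot to write (producer only) */
    atomic_size_t tail;       /* next slot to read (consumer only) */
};

/* Called by the recv() thread. Returns false when the ring is full. */
static bool ring_push(struct pkt_ring *r, uint32_t id)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)
        return false;
    r->ids[head % RING_SLOTS] = id;
    /* Publish the slot only after the id has been written. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Called by the worker thread. Returns false when the ring is empty. */
static bool ring_pop(struct pkt_ring *r, uint32_t *id)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;
    *id = r->ids[tail % RING_SLOTS];
    /* Release the slot back to the producer. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

A real application would also have to carry the payload (or a copy of
it) alongside the id, since the buffer passed to the callback is reused
by the next recv().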

If you want, you can increase the buffer size, as I said in my previous
mail, but it will not solve the issue, since over time, or with more
TCP sessions, even that buffer will fill up. Increasing the buffer can
be done via "nfnl_rcvbufsiz(nfq_nfnlh(my_nfq_handle),
NFQ_NF_BUFSIZE);".

NOTE: if you write to disk while processing, either don't, or use a
buffered flushing method to avoid writing to disk so often (for
example, try using syslog).

Kind regards
Yechiel Levi.


* Re: libnetfilter_queue exiting on big tcp sessions
  2010-11-03  5:06         ` Mistick Levi
@ 2010-11-03 18:42           ` Justin Yaple
  2010-11-03 18:55             ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Justin Yaple @ 2010-11-03 18:42 UTC (permalink / raw)
  To: Mistick Levi; +Cc: Rajkumar S, netfilter-devel, netfilter

Yechiel,

Again, thank you; this seems like a better fix than ignoring the error.
I had set the queue length to 1024, so it makes sense that the buffer
size would also need to be adjusted to accommodate the new queue
length. I have not figured out how to get the error message along with
the error number, so I don't know for sure what it was. It's probably
safe to guess that, because I was using the default buffer space, it
was "buffer space unavailable".
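
On getting the message that goes with the error number: when recv()
returns -1 the number is in errno (from <errno.h>), and strerror() or
perror() turns it into text. A small sketch; format_recv_error() is a
hypothetical helper, and errno should be sampled immediately after the
failing call, before any other library call can overwrite it:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes "recv failed: errno=<n> (<message>)" into out.
 * Returns what snprintf() returns: the length the full message needs.
 */
static int format_recv_error(int err, char *out, size_t outlen)
{
    return snprintf(out, outlen, "recv failed: errno=%d (%s)",
                    err, strerror(err));
}
```

Typical use would be: if recv() returns a negative value, copy errno
into a local int, then pass that to format_recv_error() and print the
result to stderr. Calling perror("recv failed") right after the failing
recv() is the shorter equivalent.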

#define BUFSIZE 4096 // Size of buffer used to store IP packets.
#define NFQLENGTH 1024 // Length of the netfilter queue.

nfnl_rcvbufsiz(nfq_nfnlh(h), NFQLENGTH * BUFSIZE);

So yes, the kernel is sending packets to the queue faster than my
application was processing them, but I expected that. The buffer does
need to be adjusted so the queue can hold those packets until they can
be processed. A single session did not run into this issue because the
default buffer size is large enough to hold the outstanding packets of
a single session, but once multiple sessions were involved the queue
would fill up quickly.
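
For reference, the sizing above works out to 1024 * 4096 = 4194304
bytes (4 MiB). A sketch of that arithmetic with an overflow guard;
rcvbuf_bytes() is a hypothetical helper, and its result is what would
be passed as the size argument of nfnl_rcvbufsiz():

```c
#include <limits.h>

/*
 * Receive buffer size needed to hold queue_len packets of up to
 * per_packet bytes each. Saturates at UINT_MAX instead of wrapping.
 */
static unsigned int rcvbuf_bytes(unsigned int queue_len,
                                 unsigned int per_packet)
{
    if (per_packet != 0 && queue_len > UINT_MAX / per_packet)
        return UINT_MAX;
    return queue_len * per_packet;
}
```

This ignores the per-message netlink and socket-buffer overhead, so
treat the result as a lower bound. Also note that a plain SO_RCVBUF
request is capped by the kernel at net.core.rmem_max, so it is worth
checking the size actually in effect.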

After that change I was able to run 32 parallel iperf connections. The
total throughput was ~20 Mb/s. Without sending traffic through my
application the throughput was ~138 Mb/s, but given that all my test
systems are running as VMs on the same host with only 4 GB of RAM, it
does not surprise me to see a big performance hit there. I should see
better results running on dedicated hardware.

Thanks,
Justin.


* Re: libnetfilter_queue exiting on big tcp sessions
  2010-11-03 18:42           ` Justin Yaple
@ 2010-11-03 18:55             ` Eric Dumazet
  2010-11-03 19:51               ` Justin Yaple
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2010-11-03 18:55 UTC (permalink / raw)
  To: Justin Yaple; +Cc: Mistick Levi, Rajkumar S, netfilter-devel, netfilter

On Wednesday, 3 November 2010 at 11:42 -0700, Justin Yaple wrote:
> After that change I was able to run 32 parallel iperf connections. The
> total throughput was ~20 Mb/s. Without sending traffic through my
> application the throughput was ~138 Mb/s, but given that all my test
> systems are running as VMs on the same host with only 4 GB of RAM, it
> does not surprise me to see a big performance hit there. I should see
> better results running on dedicated hardware.

If this is running on a multiprocessor machine, you could use several NF
queues (one per CPU).

You could possibly also use RPS if your network card is not multiqueue, to
spread TCP flows across different CPUs and different queues.
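
A rough sketch of the idea of spreading flows over several queues follows.
It is only an illustration: pick_queue() is a hypothetical helper, not part
of libnetfilter_queue or the kernel, and the hash is an arbitrary choice.
The one property it tries to show is that both directions of a TCP
connection land on the same queue, so one worker sees the whole session.

```c
#include <stdint.h>

/* Hypothetical helper: map a TCP flow to one of nqueues queues.
 * XOR-ing the two endpoints makes the result independent of direction,
 * so requests and replies of one connection share a queue.
 * nqueues must be greater than zero. */
static unsigned int pick_queue(uint32_t saddr, uint32_t daddr,
                               uint16_t sport, uint16_t dport,
                               unsigned int nqueues)
{
	uint32_t h = (saddr ^ daddr) ^ (uint32_t)(sport ^ dport);

	h *= 2654435761u;	/* multiplicative mixing, arbitrary constant */
	h ^= h >> 16;
	return h % nqueues;
}
```

In practice the distribution is normally done before packets reach user
space (newer iptables releases have an NFQUEUE --queue-balance option for
this), with one process or thread bound to each queue number.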




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: libnetfilter_queue exiting on big tcp sessions
  2010-11-03 18:55             ` Eric Dumazet
@ 2010-11-03 19:51               ` Justin Yaple
  0 siblings, 0 replies; 10+ messages in thread
From: Justin Yaple @ 2010-11-03 19:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Mistick Levi, Rajkumar S, netfilter-devel, netfilter

On Wed, Nov 3, 2010 at 11:55 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> If this is running on a multiprocessor machine, you could use several NF
> queues (one per CPU).
I don't know if it would make much of a difference for my application.
Either way the kernel is going to be capable of receiving more traffic
than the application can handle, but my application is not meant to
process traffic from high-bandwidth connections.  It's designed to
increase performance of low-speed WAN connections.  If the virtual
machines are able to handle ~20 Mb/s then that's multiple DS1s right
there.

> You could possibly also use RPS if your network card is not multiqueue, to
> spread TCP flows across different CPUs and different queues.
I am doing this currently inside the application.  I have read some
papers on using multiple queues for sniffing high-speed circuits, but
my application is doing compression, disk I/O and eventually
application-layer-specific processing on the payload of every TCP
segment for each session.  I don't expect all of that to happen at 1 Gbps.

I would be quite happy if I can get a single system able to handle a
DS3 worth of optimized TCP traffic and extremely happy if I can get an
OC3 worth.  I know without a doubt that my current bottleneck is the
host system.  Running the iperf tests causes near 100% CPU usage on
the host system, and the VMs are at roughly 60% on each of their virtual
CPUs.  Now that the issue with the queue is resolved I will go back to
testing on physical hardware to see what results I get.

The final goal is something similar to the commercially available WAN
accelerators, and none of them offer single-system solutions for
greater than DS3 connections that I am aware of.  I need to determine
the number of TCP sessions that a single system can handle with a given
amount of memory for the buffer.  This very issue might be why the commercial
product vendors ask questions about number of users and active TCP
sessions in their planning guides.
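
The sizing question above can be put as simple arithmetic. The sketch
below is only an estimate under stated assumptions: BUFSIZE is the
per-packet figure used earlier in the thread, rcvbuf_bytes() and
max_sessions() are hypothetical helpers, and the number of packets a
session may have outstanding is a guess that would need measuring.

```c
#include <stddef.h>

#define BUFSIZE 4096	/* per-packet buffer size used earlier in the thread */

/* Hypothetical helper: receive-buffer bytes needed to hold queue_len
 * packets, i.e. the value passed to nfnl_rcvbufsiz() in the thread. */
static size_t rcvbuf_bytes(size_t queue_len)
{
	return queue_len * BUFSIZE;
}

/* Hypothetical helper: how many sessions fit in budget_bytes if each
 * session may have pkts_per_session packets waiting in the queue.
 * pkts_per_session must be greater than zero. */
static size_t max_sessions(size_t budget_bytes, size_t pkts_per_session)
{
	return budget_bytes / (pkts_per_session * BUFSIZE);
}
```

With the queue length of 1024 from the thread, rcvbuf_bytes(1024) is
4194304 bytes (4 MiB). If each session were allowed 32 outstanding packets
(an assumed figure), max_sessions(4194304, 32) gives 32 sessions. The
kernel may also cap the socket buffer it actually grants, so the requested
size is an upper bound rather than a guarantee.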

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: libnetfilter_queue exiting on big tcp sessions
  2010-11-02 15:46 libnetfilter_queue exiting on big tcp sessions Rajkumar S
       [not found] ` <AANLkTin_ZFeXkzJ6zELpX3pP3782YfLjHcPzHrjDt1Ae@mail.gmail.com>
@ 2010-11-03 18:35 ` Pablo Neira Ayuso
  2010-11-05 11:09   ` Alessandro Vesely
  1 sibling, 1 reply; 10+ messages in thread
From: Pablo Neira Ayuso @ 2010-11-03 18:35 UTC (permalink / raw)
  To: Rajkumar S; +Cc: netfilter

On 02/11/10 16:46, Rajkumar S wrote:
> Hi all,
> 
> I am using latest git checkout of libnetfilter_queue and libnfnetlink
> on debian etch with kernel 2.6.26-2-686. The iptables rules used while
> testing are:
> 
> -A INPUT -s 192.168.3.22/32 -m state --state NEW,ESTABLISHED -j
> NFQUEUE --queue-num 0
> -A OUTPUT -d 192.168.3.22/32 -m state --state NEW,ESTABLISHED -j
> NFQUEUE --queue-num 0
> 
> I am using utils/nfqnl_test.c as my test program and using wget to get
> a file from 192.168.3.22 for testing. The program runs okay when
> getting smaller files but if number of packets go above say 200
> nfqnl_test exits with following message:
> 
> hw_protocol=0x0800 hook=1 id=389 hw_src_addr=00:14:2a:c9:e1:5d indev=2
> payload_len=1500
> entering callback
> hw_protocol=0x0800 hook=1 id=390 hw_src_addr=00:14:2a:c9:e1:5d indev=2
> payload_len=1500
> entering callback
> closing library handle
> 
> The number of packets to trigger this condition varies from say 200 to
> about 1000 and changes with each run.
> 
> dmesg does not show any error, the last lines of dmesg are:
> [76465.470246] ip_tables: (C) 2000-2006 Netfilter Core Team
> [92735.818567] Netfilter messages via NETLINK v0.30.
> [92793.863824] nf_conntrack version 0.5.0 (6144 buckets, 24576 max)
> 
> Before testing with compiled git version I was trying with ubuntu
> (lucid) and nfqueue-bindings for python and got the same error.
> 
> I am not sure what goes wrong here, I can help with any debug steps to
> find out the exact error if required. Any help to locate and fix this
> issue is much appreciated.

Please, see:

http://git.netfilter.org/cgi-bin/gitweb.cgi?p=libnetfilter_queue.git;a=commitdiff;h=37791b0eb98c00098a6410f6dedfdce92fc88f3e;hp=c4692e02d4fc804f7aa31f407d7d2f31861753bc

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: libnetfilter_queue exiting on big tcp sessions
  2010-11-03 18:35 ` Pablo Neira Ayuso
@ 2010-11-05 11:09   ` Alessandro Vesely
  2010-11-07 20:44     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 10+ messages in thread
From: Alessandro Vesely @ 2010-11-05 11:09 UTC (permalink / raw)
  To: netfilter

[-- Attachment #1: Type: text/plain, Size: 689 bytes --]

On 03/Nov/10 19:35, Pablo Neira Ayuso wrote:
> On 02/11/10 16:46, Rajkumar S wrote:
>>  I am using utils/nfqnl_test.c as my test program
>
> Please, see:
>
> http://git.netfilter.org/cgi-bin/gitweb.cgi?p=libnetfilter_queue.git;a=commitdiff;h=37791b0eb98c00098a6410f6dedfdce92fc88f3e;hp=c4692e02d4fc804f7aa31f407d7d2f31861753bc

Thanks for the improved docs!

For older kernels, would it also help to set something like
-A INPUT -m limit --limit 10/second -j NFQUEUE --queue-num 0?

Would you please also amend nfqnl_test.c?  From this thread I gather
that packets that overflowed the queue are still received/transmitted,
but I am unable to do better than the attached (untested) patch.

[-- Attachment #2: nfqnl_test.patch.txt --]
[-- Type: text/plain, Size: 935 bytes --]

--- nfqnl_test.original.c	2009-02-17 20:27:28.000000000 +0100
+++ nfqnl_test.c	2010-11-05 11:24:26.000000000 +0100
@@ -8,6 +8,8 @@
 
 #include <libnetfilter_queue/libnetfilter_queue.h>
 
+#include <errno.h>
+
 /* returns packet id */
 static u_int32_t print_pkt (struct nfq_data *tb)
 {
@@ -115,9 +117,21 @@
 
 	fd = nfq_fd(h);
 
-	while ((rv = recv(fd, buf, sizeof(buf), 0)) && rv >= 0) {
-		printf("pkt received\n");
-		nfq_handle_packet(h, buf, rv);
+	for (;;) {
+		if ((rv = recv(fd, buf, sizeof(buf), 0)) >= 0) {
+			printf("pkt received\n");
+			nfq_handle_packet(h, buf, rv);
+			continue;
+		}
+		/* if the computer is slower than the network the buffer
+		* may fill up. Depending on the application, this error
+		* may be ignored */		
+		if (errno == ENOBUFS) {
+			printf("pkt lost!!\n");
+			continue;
+		}
+		printf("recv failed: errno=%d (%s)\n",
+			errno, strerror(errno));
 	}
 
 	printf("unbinding from queue 0\n");
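
As written, the loop in the patch above keeps iterating after a recv()
failure other than ENOBUFS, because nothing leaves the for (;;) loop. A
variant that stops on such errors might look like the sketch below. It is
an untested illustration, not the code that was committed: classify_recv()
and recv_loop() are hypothetical names, and the handle callback stands in
for the call to nfq_handle_packet(h, buf, rv).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

enum recv_action { RECV_HANDLE, RECV_LOST, RECV_FATAL };

/* Hypothetical helper: decide what to do with one recv() result.
 * As in the patch, any non-negative return is treated as a packet. */
static enum recv_action classify_recv(ssize_t rv, int err)
{
	if (rv >= 0)
		return RECV_HANDLE;
	if (err == ENOBUFS)	/* socket buffer overflowed, packets dropped */
		return RECV_LOST;
	return RECV_FATAL;
}

/* Receive until a fatal error occurs, then return -1. */
static int recv_loop(int fd, void (*handle)(char *buf, int len))
{
	char buf[4096];

	for (;;) {
		ssize_t rv = recv(fd, buf, sizeof(buf), 0);

		switch (classify_recv(rv, errno)) {
		case RECV_HANDLE:
			handle(buf, (int)rv);
			break;
		case RECV_LOST:
			fprintf(stderr, "pkt lost!!\n");
			break;	/* keep reading; the queue is still usable */
		case RECV_FATAL:
			fprintf(stderr, "recv failed: errno=%d (%s)\n",
				errno, strerror(errno));
			return -1;
		}
	}
}
```

Note that strerror() needs <string.h>, which the patch does not add.
Whether ignoring ENOBUFS is acceptable depends on the application, as the
comment in the patch says.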

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: libnetfilter_queue exiting on big tcp sessions
  2010-11-05 11:09   ` Alessandro Vesely
@ 2010-11-07 20:44     ` Pablo Neira Ayuso
  0 siblings, 0 replies; 10+ messages in thread
From: Pablo Neira Ayuso @ 2010-11-07 20:44 UTC (permalink / raw)
  To: Alessandro Vesely; +Cc: netfilter, Mistick Levi

On 05/11/10 12:09, Alessandro Vesely wrote:
> On 03/Nov/10 19:35, Pablo Neira Ayuso wrote:
>> On 02/11/10 16:46, Rajkumar S wrote:
>>>  I am using utils/nfqnl_test.c as my test program
>>
>> Please, see:
>>
>> http://git.netfilter.org/cgi-bin/gitweb.cgi?p=libnetfilter_queue.git;a=commitdiff;h=37791b0eb98c00098a6410f6dedfdce92fc88f3e;hp=c4692e02d4fc804f7aa31f407d7d2f31861753bc
> 
> Thanks for the improved docs!
> 
> For older kernels, would it also help to set something like
> -A INPUT -m limit --limit 10/second -j NFQUEUE --queue-num 0?

I don't want to add that in the docs, sorry. It looks more like a crafty
workaround.

> Would you please also amend nfqnl_test.c?  From this thread I gather that
> packets that overflowed the queue are still received/transmitted, but
> I am unable to do better than the attached (untested) patch.

I have pushed the following patch; it's based on yours (I have, however,
preserved your credit).

http://git.netfilter.org/cgi-bin/gitweb.cgi?p=libnetfilter_queue.git;a=commit;h=a10a4d9291181a142ff85b0db8f2907cd05b978f

Mistick Levi sent a similar patch around the same time, btw.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-11-07 20:44 UTC | newest]

Thread overview: 10+ messages
2010-11-02 15:46 libnetfilter_queue exiting on big tcp sessions Rajkumar S
     [not found] ` <AANLkTin_ZFeXkzJ6zELpX3pP3782YfLjHcPzHrjDt1Ae@mail.gmail.com>
     [not found]   ` <AANLkTikV4_MD0JZzbvKhSXjL-abMDY7Af_3FTbbTzP33@mail.gmail.com>
2010-11-02 17:51     ` Mistick Levi
2010-11-03  1:53       ` Justin Yaple
2010-11-03  5:06         ` Mistick Levi
2010-11-03 18:42           ` Justin Yaple
2010-11-03 18:55             ` Eric Dumazet
2010-11-03 19:51               ` Justin Yaple
2010-11-03 18:35 ` Pablo Neira Ayuso
2010-11-05 11:09   ` Alessandro Vesely
2010-11-07 20:44     ` Pablo Neira Ayuso
