* EILSEQ with libnetfilter_conntrack on multi-threaded app
@ 2012-02-08 13:27 abirvalg
2012-02-09 15:54 ` Pablo Neira Ayuso
From: abirvalg @ 2012-02-08 13:27 UTC (permalink / raw)
To: netfilter-devel
My multi-threaded app makes heavy use of libnetfilter_conntrack.
After running properly for a number of hours, at a point that I cannot reproduce on demand, a call to a conntrack function does not return for a good 10 seconds, CPU usage of my process spikes to 80%, and running conntrack -L from a terminal freezes. When the conntrack function finally returns with EILSEQ, CPU usage drops and conntrack -L unfreezes and dumps its output.
The code in question does:
nfct_query(setmark_handle_out, NFCT_Q_GET, ct_out_udp)
where setmark_handle_out was previously linked to this function
int setmark_out(enum nf_conntrack_msg_type type, struct nf_conntrack *mct, void *data)
{
    nfct_set_attr_u32(mct, ATTR_MARK, nfmark_to_set_out);
    nfct_query(setmark_handle_out, NFCT_Q_UPDATE, mct); ***
    return NFCT_CB_CONTINUE;
}
nfmark_to_set_out is a global variable
***Could this line be the offending one? As I understand it, when issuing NFCT_Q_UPDATE, the nfct_handle argument is just a formality: any handle can be given, so I'm simply reusing an existing one.
I really want to get to the bottom of this issue. Please let me know what other actions I can perform to produce some valuable debuginfo.
I am currently keeping the process suspended in gdb, because the issue takes many hours to reproduce.
Here's the link to the offending line 3514 in my project's webgit:
http://leopardflower.git.sourceforge.net/git/gitweb.cgi?p=leopardflower/leopardflower;a=blob;f=lpfw.c;h=c7af69c1def30d1a18e1bf839acbb60064ee3ba2;hb=709b1e87cf17e6e6e9d8a908ad8a6b77359f1d69#l3514
Thanks.
P.S.
Please CC me when responding to this
P.P.S.
I already posted a similar issue on this mailing list
http://marc.info/?t=131827063700008&r=1&w=2
Back then Pablo responded with:
/quote
Regarding the EILSEQ error:
The second parameter of nfct_open must be 0. However, if you use the
same socket for sending commands and receiving events, then you have
to disable sequence tracking, there is a function in libnfnetlink to
do that.
/unquote
My code does call nfct_open with 0.
Is the note above that I marked with *** a case of using the same socket for sending commands and receiving events?
* Re: EILSEQ with libnetfilter_conntrack on multi-threaded app
2012-02-08 13:27 EILSEQ with libnetfilter_conntrack on multi-threaded app abirvalg
@ 2012-02-09 15:54 ` Pablo Neira Ayuso
From: Pablo Neira Ayuso @ 2012-02-09 15:54 UTC (permalink / raw)
To: abirvalg; +Cc: netfilter-devel
On Wed, Feb 08, 2012 at 01:27:36PM +0000, abirvalg@lavabit.com wrote:
> My multi-threaded app makes heavy use of libnetfilter_conntrack.
> After running properly for a number of hours, at a point that I cannot reproduce on demand, a call to a conntrack function does not return for a good 10 seconds, CPU usage of my process spikes to 80%, and running conntrack -L from a terminal freezes. When the conntrack function finally returns with EILSEQ, CPU usage drops and conntrack -L unfreezes and dumps its output.
>
> The code in question does:
>
> nfct_query(setmark_handle_out, NFCT_Q_GET, ct_out_udp)
>
> where setmark_handle_out was previously linked to this function
>
> int setmark_out(enum nf_conntrack_msg_type type, struct nf_conntrack *mct, void *data)
> {
>     nfct_set_attr_u32(mct, ATTR_MARK, nfmark_to_set_out);
>     nfct_query(setmark_handle_out, NFCT_Q_UPDATE, mct); ***
>     return NFCT_CB_CONTINUE;
> }
>
> nfmark_to_set_out is a global variable
>
> ***Could this line be the offending one? As I understand it, when issuing NFCT_Q_UPDATE, the nfct_handle argument is just a formality: any handle can be given, so I'm simply reusing an existing one.
>
> I really want to get to the bottom of this issue. Please let me know what other actions I can perform to produce some valuable debuginfo.
> I am currently keeping the process suspended in gdb, because the issue takes many hours to reproduce.
>
> Here's the link to the offending line 3514 in my project's webgit:
> http://leopardflower.git.sourceforge.net/git/gitweb.cgi?p=leopardflower/leopardflower;a=blob;f=lpfw.c;h=c7af69c1def30d1a18e1bf839acbb60064ee3ba2;hb=709b1e87cf17e6e6e9d8a908ad8a6b77359f1d69#l3514
>
> Thanks.
>
> P.S.
> Please CC me when responding to this
>
> P.P.S.
> I already posted a similar issue on this mailing list
> http://marc.info/?t=131827063700008&r=1&w=2
>
> Back then Pablo responded with:
> /quote
> Regarding the EILSEQ error:
>
> The second parameter of nfct_open must be 0. However, if you use the
> same socket for sending commands and receiving events, then you have
> to disable sequence tracking, there is a function in libnfnetlink to
> do that.
> /unquote
>
> My code does call nfct_open with 0.
OK.
> Is the note above that I marked with *** a case of using the same socket for sending commands and receiving events?
Let me make this more generic:
If you use the same netlink socket to send and to receive data from
multiple threads/processes, then you have to disable sequence tracking.
This seems to be your case. Basically, a race condition may occur
following these steps:
1) you send a get command from process/thread h1 with seqnum S1.
2) you send an update command from process/thread h2 with seqnum S2.
3) you get the reply for the get command; the libnfnetlink sequence check
expects S2 but gets S1, so it hits EILSEQ.
libnfnetlink sequence tracking is not thread safe. This is fixed by
libmnl. I'm still porting libnetfilter_* friends to libmnl, but this
will take time. So your solution is to disable sequence tracking in
libnfnetlink.
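
The three steps above can be replayed in a small toy model. This is only a
sketch of the idea, not libnfnetlink's actual code: struct toy_handle and the
toy_* and replay_race names are hypothetical, and the two "threads" are
replayed in a fixed order instead of using real threads. The track_seq flag
stands in for disabling sequence tracking; in libnfnetlink I believe the
function in question is nfnl_unset_sequence_tracking(), applied to the
nfnl_handle obtained from nfct_nfnlh(), but please check that against your
installed headers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one netlink handle shared by two threads.
 * Hypothetical names; this is not libnfnetlink's real structure. */
struct toy_handle {
    uint32_t next_seq;      /* sequence number stamped on the next request */
    uint32_t expected_seq;  /* sequence number the receive path will accept */
    bool     track_seq;     /* false models "sequence tracking disabled" */
};

static void toy_init(struct toy_handle *h, bool track_seq)
{
    h->next_seq = 1;
    h->expected_seq = 0;
    h->track_seq = track_seq;
}

/* Send a request: stamp it and record which reply is now expected.
 * A later send on the same handle overwrites expected_seq. */
static uint32_t toy_send(struct toy_handle *h)
{
    uint32_t seq = h->next_seq++;
    h->expected_seq = seq;
    return seq;
}

/* Receive a reply carrying reply_seq.
 * Returns 0 on success, or -1 with errno set to EILSEQ on a mismatch. */
static int toy_recv(const struct toy_handle *h, uint32_t reply_seq)
{
    if (h->track_seq && reply_seq != h->expected_seq) {
        errno = EILSEQ;
        return -1;
    }
    return 0;
}

/* Replay steps 1) to 3) from the message above in a fixed order. */
static int replay_race(bool track_seq)
{
    struct toy_handle h;
    toy_init(&h, track_seq);

    uint32_t s1 = toy_send(&h);  /* 1) thread h1 sends the get, seqnum S1 */
    uint32_t s2 = toy_send(&h);  /* 2) thread h2 sends the update, seqnum S2 */
    (void)s2;                    /* the handle now expects S2 */

    return toy_recv(&h, s1);     /* 3) the reply to the get arrives with S1 */
}
```

With tracking enabled, replay_race(true) fails with EILSEQ because the handle
expects S2 when the reply for S1 arrives; replay_race(false) accepts the reply.
The trade-off of turning the check off is that the library no longer verifies
that a reply matches the request that was sent on that handle.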