* Passing data from user-space connections to Netfilter and back
@ 2011-09-06 12:32 Chris Wilson
  2011-09-08  5:32 ` Amos Jeffries
  0 siblings, 1 reply; 2+ messages in thread
From: Chris Wilson @ 2011-09-06 12:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Paolo Lucente, Squid Developers

Dear Netfilter and Squid developers,

I'm working on software to help network administrators with slow Internet 
connections (e.g. in Africa) to monitor, understand and optimise their 
network's Internet usage.

Currently we are running pmacct (reading from pcap/ULOG and generating 
flow records) and Squid (for caching and for recording the requested URLs, 
since web traffic is pretty opaque without them).

We'd like to be able to record the traffic flows coming out of Squid 
towards the Internet, and associate the requested URL with a flow record. 
Unfortunately this is quite difficult because of the limited information 
available to match flow records to Squid logs:

* flow accounting (pcap/ulog) sees: source IP+port (squid host+random 
high port), destination IP+port (web host, port 80 or 443), packet time;

* recorded in database by flow accounting: source IP+port (squid 
host+random high port), destination IP+port (web host, port 80 or 443), 
flow timestamp (rounded to the nearest minute, multiple flow records for 
a long-lived connection);

* squid sees and logs: source IP+port (squid host+random high port), 
destination IP+port (web host, port 80 or 443), connection start time, 
URL.

We could achieve something by matching on source and destination IP and 
port, but this is not very reliable. In the case of a frequently accessed 
website (e.g. google, facebook) only the source port changes between 
connections, and this could be recycled quite quickly, leading to 
ambiguous or false accounting. This is even more true of the reverse proxy 
case (Squid in front of your web server).

I think it would make sense for:

* Squid to generate a (near) unique ID for the connection (or use the TCP 
ISN? 32 bit ISN + 16 bit source port = 48 bit random ID);

* Squid to pass that information to Netfilter (e.g. with an ioctl() on the 
socket);

* Netfilter to associate that ID with the connection (e.g. copy it into 
the CONNMARK);

* Netfilter to log it to user space along with the connection's packets 
via ULOG;

* pmacct to store it in the flow record in the database.
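As an illustration of the "32 bit ISN + 16 bit source port" identifier proposed above, here is a minimal sketch in Python. The function names `make_conn_id` and `split_conn_id` are hypothetical and are not part of Squid, Netfilter or pmacct. Note that the Netfilter connection mark is a 32-bit value, so a 48-bit identifier like this one would not fit into CONNMARK unmodified.

```python
def make_conn_id(isn: int, src_port: int) -> int:
    """Pack a 32-bit TCP ISN and a 16-bit source port into one 48-bit integer.

    Hypothetical helper: the ISN occupies the upper 32 bits and the
    source port the lower 16 bits.
    """
    if not 0 <= isn < 2**32:
        raise ValueError("ISN must fit in 32 bits")
    if not 0 <= src_port < 2**16:
        raise ValueError("source port must fit in 16 bits")
    return (isn << 16) | src_port


def split_conn_id(conn_id: int) -> tuple:
    """Recover (isn, src_port) from an ID built by make_conn_id()."""
    if not 0 <= conn_id < 2**48:
        raise ValueError("connection ID must fit in 48 bits")
    return conn_id >> 16, conn_id & 0xFFFF
```

Whether the ISN is actually available to the application is exactly the open question in this message; the sketch only shows the packing arithmetic.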

Does this sound like a sensible plan? Is there any existing interface for 
a user-space application like Squid to associate opaque information with a 
connection that it makes, and for that information to make it back to user 
space via ULOG or similar? If not, where would you add it, and would the 
Netfilter and Squid teams in principle accept patches to make it possible?

Can it be done without touching any part of the kernel except Netfilter? 
Is there already a way for a user-space application to query the ISN of 
its own TCP connections from the kernel, or communicate with Netfilter 
about them?
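One existing interface that may be relevant here is the Linux `SO_MARK` socket option, which lets a process with CAP_NET_ADMIN set the 32-bit firewall mark on packets sent through a socket. The following is only a sketch of how an application could use it, not a description of anything Squid does; `open_marked_socket` and `pack_mark` are hypothetical names, and the fallback constant 36 is the `SO_MARK` value on most (not all) Linux architectures.

```python
import socket
import struct

# Assumption: use the constant from the socket module when it is exposed,
# otherwise fall back to 36, the SO_MARK value on most Linux architectures.
SO_MARK = getattr(socket, "SO_MARK", 36)


def pack_mark(mark: int) -> bytes:
    """Encode a 32-bit mark as a native unsigned int for setsockopt()."""
    if not 0 <= mark <= 0xFFFFFFFF:
        raise ValueError("mark must fit in 32 bits")
    return struct.pack("I", mark)


def open_marked_socket(mark: int) -> socket.socket:
    """Create a TCP socket whose outgoing packets carry the given mark.

    Raises OSError (typically PermissionError) when the process lacks
    CAP_NET_ADMIN or the platform does not support SO_MARK.
    """
    value = pack_mark(mark)  # validate before allocating the socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_MARK, value)
    except OSError:
        sock.close()
        raise
    return sock
```

A mangle-table rule using the CONNMARK target with `--save-mark` could then copy the packet mark onto the connection, assuming such a rule suits the local ruleset. The mark is only 32 bits wide, so it could carry a short per-connection ID but not the full 48-bit value suggested earlier.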

Thanks in advance,

Chris Wilson.
-- 
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.



* Re: Passing data from user-space connections to Netfilter and back
  2011-09-06 12:32 Passing data from user-space connections to Netfilter and back Chris Wilson
@ 2011-09-08  5:32 ` Amos Jeffries
  0 siblings, 0 replies; 2+ messages in thread
From: Amos Jeffries @ 2011-09-08  5:32 UTC (permalink / raw)
  To: Chris Wilson; +Cc: netfilter-devel, Paolo Lucente, Squid Developers

On 07/09/11 00:32, Chris Wilson wrote:
> Dear Netfilter and Squid developers,
>
> I'm working on software to help network administrators with slow
> Internet connections (e.g. in Africa) to monitor, understand and
> optimise their network's Internet usage.
>
> Currently we are running pmacct (reading from pcap/ULOG and generating
> flow records) and Squid (for caching and for recording the requested
> URLs, since web traffic is pretty opaque without them).
>
> We'd like to be able to record the traffic flows coming out of Squid
> towards the Internet, and associate the requested URL with a flow
> record. Unfortunately this is quite difficult because of the limited
> information available to match flow records to Squid logs:
>
> * flow accounting (pcap/ulog) sees: source IP+port (squid host+random
> high port), destination IP+port (web host, port 80 or 443), packet time;
>
> * recorded in database by flow accounting: source IP+port (squid
> host+random high port), destination IP+port (web host, port 80 or 443),
> flow timestamp (rounded to the nearest minute, multiple flow records for
> a long-lived connection);
>
> * squid sees and logs: source IP+port (squid host+random high port),
> destination IP+port (web host, port 80 or 443), connection start time, URL.
>
> We could achieve something by matching on source and destination IP and
> port, but this is not very reliable. In the case of a frequently
> accessed website (e.g. google, facebook) only the source port changes
> between connections, and this could be recycled quite quickly, leading
> to ambiguous or false accounting. This is even more true of the reverse
> proxy case (Squid in front of your web server).

Why do you call that false accounting? It is all flows from Squid to 
google, facebook, etc. Does it really matter if it was 1 flow or 10 when 
there is Mbps going through to the same place?

The problems as described appear to be side effects directly related to 
the 1 minute resolution in your database. Within any given minute 
upwards of 1,000 unique HTTP transactions may have occurred over one TCP 
flow.

>
> I think it would make sense for:
>
> * Squid to generate a (near) unique ID for the connection (or use the
> TCP ISN? 32 bit ISN + 16 bit source port = 48 bit random ID);
>
> * Squid to pass that information to Netfilter (e.g. with an ioctl() on
> the socket);
>
> * Netfilter to associate that ID with the connection (e.g. copy it into
> the CONNMARK);
>
> * Netfilter to log it to user space along with the connection's packets
> via ULOG;
>
> * pmacct to store it in the flow record in the database.
>
> Does this sound like a sensible plan? Is there any existing interface
> for a user-space application like Squid to associate opaque information
> with a connection that it makes, and for that information to make it
> back to user space via ULOG or similar? If not, where would you add it,
> and would the Netfilter and Squid teams in principle accept patches to
> make it possible?
>
> Can it be done without touching any part of the kernel except Netfilter?
> Is there already a way for a user-space application to query the ISN of
> its own TCP connections from the kernel, or communicate with Netfilter
> about them?
>
> Thanks in advance,
>
> Chris Wilson.

Regarding Squid,

  The logs contain time + duration, not just time, with millisecond 
resolution, and each entry is reported at the termination of its 
request, so port rotation should not be relevant. For accuracy, 
accounting needs only to associate the URL with the IP:port tuples of 
all matching flow records across that specified duration.
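
The matching described above can be sketched as an interval-overlap test in Python. This is a minimal illustration under assumptions, not pmacct or Squid code: the record types `SquidEntry` and `FlowRecord`, their field names, and `urls_for_flow` are all hypothetical, and the flow record is assumed to cover a known time bucket (for example one minute).

```python
from dataclasses import dataclass
from typing import List, Tuple

# (source IP, source port, destination IP, destination port)
ConnTuple = Tuple[str, int, str, int]


@dataclass(frozen=True)
class SquidEntry:
    conn: ConnTuple
    end_time: float    # seconds; logged when the request terminates
    elapsed_ms: float  # request duration in milliseconds
    url: str


@dataclass(frozen=True)
class FlowRecord:
    conn: ConnTuple
    start: float  # start of the accounting bucket, in seconds
    end: float    # end of the accounting bucket (exclusive)


def urls_for_flow(flow: FlowRecord, entries: List[SquidEntry]) -> List[str]:
    """Return URLs of requests on the same tuple that overlap the flow bucket."""
    urls = []
    for entry in entries:
        if entry.conn != flow.conn:
            continue
        # The log line is written at termination, so derive the start time.
        req_start = entry.end_time - entry.elapsed_ms / 1000.0
        if req_start < flow.end and entry.end_time >= flow.start:
            urls.append(entry.url)
    return urls
```

A request that began in one bucket and ended in the next matches both flow records, which is the intended behaviour for long-lived connections split across several records.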

  You may also want to have a look at the capabilities which we are 
currently stabilizing in Squid 3.2.0.9 and later. They allow more 
reliable IP:port details to be logged for each connection, along with 
flexible modules that can easily be added for real-time logging to any 
API you want.
  IPFIX logging has been requested, but not added yet. If you are 
interested in sponsoring or doing that module's implementation, please 
contact me off-list.


Amos Jeffries
Squid HTTP Proxy Project
