netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Willy Tarreau <w@1wt.eu>
To: Florian Westphal <fw@strlen.de>
Cc: Lukas Tribus <lists@ltri.eu>,
	Mohandass Roobesh <Roobesh_Mohandass@mcafee.com>,
	netdev@vger.kernel.org
Subject: Re: : getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, sa, &salen) is in fact sometimes returning the source IP instead the destination IP
Date: Sat, 12 Jan 2019 18:26:36 +0100	[thread overview]
Message-ID: <20190112172636.GA26639@1wt.eu> (raw)
In-Reply-To: <20190112160400.dblitzk2ftlfzryd@breakpoint.cc>

Hi Florian!

Sorry for the lag, I was busy with other issues.

On Sat, Jan 12, 2019 at 05:04:00PM +0100, Florian Westphal wrote:
> Lukas Tribus <lists@ltri.eu> wrote:
> > The application (haproxy) needs to know the original destination IP
> > address, however it does not know whether -j REDIRECT was used or not.
> > Because of this the application always queries SO_ORIGINAL_DST, and
> > this includes configurations without -j REDIRECT.
> > 
> > Are you saying the behavior of SO_ORIGINAL_DST is undefined when not
> > used with -j REDIRECT and that this issue does not happen when -j
> > REDIRECT is actually used?
> 
> No, thats not what I said.  Because OP provided a link that mentions
> TPROXY, I concluded OP was using TPROXY, so I pointed out that the
> error source can be completely avoided by not using SO_ORIGINAL_DST.
> 
> As I said, SO_ORIGINAL_DST returns the dst address of
> the original direction *as seen by conntrack*.
> 
> In case REDIRECT or DNAT was used, the address returned is the on-wire
> one, before DNAT rewrite took place.
>
> Therefore, SO_ORIGINAL_DST is only needed when REDIRECT or DNAT was
> used. If no DNAT rewrite takes place, sockaddr returned by accept or
> getsockname can be used directly and SO_ORIGINAL_DST isn't needed.
> The returned address should be identical to the one given by accept().

That's also the way we're using it :

     http://git.haproxy.org/?p=haproxy.git;a=blob;f=src/proto_tcp.c;h=52a4691d2ba93aec93a3e8a0edd1e90d93de968d;hb=HEAD#l600

More precisely if getsockopt() fails we only use getsockname(). It
happens that this code is very old (2002) and used to work with kernel
2.4's TPROXY which did rely on conntrack, hence the ifdef including
TPROXY. But it's irrelevant here, with a modern TPROXY the getsockopt()
here will usually fail, and the result will come from getsockname()
only.

When seeing the (old) haproxy code there, I've been thinking about
another possibility. Right now haproxy does getsockname() then *tries*
getsockopt(). In the unlikely case getsockopt() would modify part of
the address already returned by getsockname(), it could leave an
incorrect address there. But such an issue would require partial uses
of copy_to_user() which doesn't make much sense, and getorigdst()
doesn't do this either. So this one was ruled out.

I've been wondering if it was possible that from time to time, this
getsockopt() accidently succeeds and returns something wrong, maybe
due to a race with another thread, or maybe because it would return
an uninitialized field which happened to see the source address
previously. The very low amount of occurrence makes me think it could
possibly happen only upon certain error conditions. But looking at
getorigdst(), I hardly imagine how this would happen.

> If SO_ORIGINAL_DST returns the reply, then conntrack picked up
> a reply packet as the first packet of the connection, so it believes
> originator is the responder and vice versa.
> 
> One case where this can happen is when nf_conntrack_tcp_loose
> (mid-stream pickup) is enabled.

Very interesting case indeed, I hadn't thought about it! I think we
don't have enough info from the original reporter's setup but it
would definitely make sense and explain why it's the other end which
is retrieved!

I'm seeing one possibility to explain this : Let's say the OP's setup
has a short conntrack timeout and a longer haproxy timeout. If the
address is only retrieved for logging, it will be retrieved at the
end of the connection. Let's assume haproxy receives a request from
a client, a conntrack entry is created and haproxy forwards the request
to a very slow server. Before the server responds, the conntrack entry
expires, then the server responds and haproxy forwards to the client,
re-creating the entry and hitting this case before the address is
picked up for logging.

Roobesh, do you use the destination address only for logging or
anywhere else in the request path ? And could you check if you have
nf_conntrack_tcp_loose set as Florian suggests ? I really think he
figured it right.

If that's the problem, I think we can address it with documentation :
first, warn about the use case, second, explain how to store the
address into a variable once the connection is accepted since it will
not have expired and will still be valid at this moment.

> This is not a haproxy bug.
> 
> Only thing that haproxy could is to provide a knob to make it only
> use addresses returned by accept, rather than relying on
> SO_ORIGINAL_DST for those that use TPROXY to do MITM interception.

Since it's the destination we get it using getsockname(), not accept(),
but yeah maybe we'd benefit from having a tunable to disable the use of
getsockopt(SO_ORIGINAL_DST). Actually I'd prefer to avoid doing this
until the issue it confirmed because I don't feel comfortable with having
an option saying "disable the use of getsockopt(SO_ORIGINAL_DST) which
may fail once per million due to some obscure reasons". However if it's
confirmed that it indeed is the scenario above that happens, we could
possibly think about adding a setting to work around if the doc approach
is not enough.

Many thanks for your insights, these are exactly the reason I asked
Roobesh to bring the issue here :-)

Cheers,
Willy

WARNING: multiple messages have this Message-ID (diff)
From: Willy Tarreau <w@1wt.eu>
To: Florian Westphal <fw@strlen.de>
Cc: Lukas Tribus <lists@ltri.eu>,
	Mohandass Roobesh <Roobesh_Mohandass@mcafee.com>,
	netdev@vger.kernel.org
Subject: Re: [NETDEV]: getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, sa, &salen) is in fact sometimes returning the source IP instead the destination IP
Date: Sat, 12 Jan 2019 18:26:36 +0100	[thread overview]
Message-ID: <20190112172636.GA26639@1wt.eu> (raw)
Message-ID: <20190112172636.JbmEw-sVbXboYQTRGxCodGkttRj1wX0mZtUV8nlo-XI@z> (raw)
In-Reply-To: <20190112160400.dblitzk2ftlfzryd@breakpoint.cc>

Hi Florian!

Sorry for the lag, I was busy with other issues.

On Sat, Jan 12, 2019 at 05:04:00PM +0100, Florian Westphal wrote:
> Lukas Tribus <lists@ltri.eu> wrote:
> > The application (haproxy) needs to know the original destination IP
> > address, however it does not know whether -j REDIRECT was used or not.
> > Because of this the application always queries SO_ORIGINAL_DST, and
> > this includes configurations without -j REDIRECT.
> > 
> > Are you saying the behavior of SO_ORIGINAL_DST is undefined when not
> > used with -j REDIRECT and that this issue does not happen when -j
> > REDIRECT is actually used?
> 
> No, thats not what I said.  Because OP provided a link that mentions
> TPROXY, I concluded OP was using TPROXY, so I pointed out that the
> error source can be completely avoided by not using SO_ORIGINAL_DST.
> 
> As I said, SO_ORIGINAL_DST returns the dst address of
> the original direction *as seen by conntrack*.
> 
> In case REDIRECT or DNAT was used, the address returned is the on-wire
> one, before DNAT rewrite took place.
>
> Therefore, SO_ORIGINAL_DST is only needed when REDIRECT or DNAT was
> used. If no DNAT rewrite takes place, sockaddr returned by accept or
> getsockname can be used directly and SO_ORIGINAL_DST isn't needed.
> The returned address should be identical to the one given by accept().

That's also the way we're using it :

     http://git.haproxy.org/?p=haproxy.git;a=blob;f=src/proto_tcp.c;h=52a4691d2ba93aec93a3e8a0edd1e90d93de968d;hb=HEAD#l600

More precisely if getsockopt() fails we only use getsockname(). It
happens that this code is very old (2002) and used to work with kernel
2.4's TPROXY which did rely on conntrack, hence the ifdef including
TPROXY. But it's irrelevant here, with a modern TPROXY the getsockopt()
here will usually fail, and the result will come from getsockname()
only.

When seeing the (old) haproxy code there, I've been thinking about
another possibility. Right now haproxy does getsockname() then *tries*
getsockopt(). In the unlikely case getsockopt() would modify part of
the address already returned by getsockname(), it could leave an
incorrect address there. But such an issue would require partial uses
of copy_to_user() which doesn't make much sense, and getorigdst()
doesn't do this either. So this one was ruled out.

I've been wondering if it was possible that from time to time, this
getsockopt() accidently succeeds and returns something wrong, maybe
due to a race with another thread, or maybe because it would return
an uninitialized field which happened to see the source address
previously. The very low amount of occurrence makes me think it could
possibly happen only upon certain error conditions. But looking at
getorigdst(), I hardly imagine how this would happen.

> If SO_ORIGINAL_DST returns the reply, then conntrack picked up
> a reply packet as the first packet of the connection, so it believes
> originator is the responder and vice versa.
> 
> One case where this can happen is when nf_conntrack_tcp_loose
> (mid-stream pickup) is enabled.

Very interesting case indeed, I hadn't thought about it! I think we
don't have enough info from the original reporter's setup but it
would definitely make sense and explain why it's the other end which
is retrieved!

I'm seeing one possibility to explain this : Let's say the OP's setup
has a short conntrack timeout and a longer haproxy timeout. If the
address is only retrieved for logging, it will be retrieved at the
end of the connection. Let's assume haproxy receives a request from
a client, a conntrack entry is created and haproxy forwards the request
to a very slow server. Before the server responds, the conntrack entry
expires, then the server responds and haproxy forwards to the client,
re-creating the entry and hitting this case before the address is
picked up for logging.

Roobesh, do you use the destination address only for logging or
anywhere else in the request path ? And could you check if you have
nf_conntrack_tcp_loose set as Florian suggests ? I really think he
figured it right.

If that's the problem, I think we can address it with documentation :
first, warn about the use case, second, explain how to store the
address into a variable once the connection is accepted since it will
not have expired and will still be valid at this moment.

> This is not a haproxy bug.
> 
> Only thing that haproxy could is to provide a knob to make it only
> use addresses returned by accept, rather than relying on
> SO_ORIGINAL_DST for those that use TPROXY to do MITM interception.

Since it's the destination we get it using getsockname(), not accept(),
but yeah maybe we'd benefit from having a tunable to disable the use of
getsockopt(SO_ORIGINAL_DST). Actually I'd prefer to avoid doing this
until the issue it confirmed because I don't feel comfortable with having
an option saying "disable the use of getsockopt(SO_ORIGINAL_DST) which
may fail once per million due to some obscure reasons". However if it's
confirmed that it indeed is the scenario above that happens, we could
possibly think about adding a setting to work around if the doc approach
is not enough.

Many thanks for your insights, these are exactly the reason I asked
Roobesh to bring the issue here :-)

Cheers,
Willy

  parent reply	other threads:[~2019-01-12 17:26 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <MWHPR16MB1502549F7BAAE102EF1D5DF4EDB50@MWHPR16MB1502.namprd16.prod.outlook.com>
     [not found] ` <MWHPR16MB150233EF69CFD4562BFA65ABEDB60@MWHPR16MB1502.namprd16.prod.outlook.com>
     [not found]   ` <MWHPR16MB1502DC55542EBA8121631B16EDB20@MWHPR16MB1502.namprd16.prod.outlook.com>
2018-12-31  6:20     ` : getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, sa, &salen) is in fact sometimes returning the source IP instead the destination IP Mohandass, Roobesh
2019-01-07 11:17       ` Florian Westphal
2019-01-07 14:57         ` Mohandass, Roobesh
2019-01-12 14:37         ` Lukas Tribus
2019-01-12 14:37           ` [NETDEV]: " Lukas Tribus
2019-01-12 16:04           ` : " Florian Westphal
2019-01-12 16:04             ` [NETDEV]: " Florian Westphal
2019-01-12 17:26             ` Willy Tarreau [this message]
2019-01-12 17:26               ` Willy Tarreau
2019-01-12 18:01               ` : " Lukas Tribus
2019-01-12 18:01                 ` [NETDEV]: " Lukas Tribus
2019-01-12 18:33                 ` : " Willy Tarreau
2019-01-12 18:33                   ` [NETDEV]: " Willy Tarreau
2019-01-17  5:23                   ` Mohandass, Roobesh
2019-01-23  8:07                     ` Mohandass, Roobesh
2019-01-23  8:54                       ` Willy Tarreau
     [not found]     ` <MWHPR16MB1502A336CCB4EF3F0FCB69ECEDB20@MWHPR16MB1502.namprd16.prod.outlook.com>
2019-01-07  7:58       ` : " Mohandass, Roobesh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190112172636.GA26639@1wt.eu \
    --to=w@1wt.eu \
    --cc=Roobesh_Mohandass@mcafee.com \
    --cc=fw@strlen.de \
    --cc=lists@ltri.eu \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).