* Re: [PATCH] net-netlink: Add a new attribute to expose TCLASS values via netlink
2011-11-08 0:23 [PATCH] net-netlink: Add a new attribute to expose TCLASS values via netlink Maciej Żenczykowski
@ 2011-11-08 0:25 ` Maciej Żenczykowski
2011-11-08 0:27 ` Stephen Hemminger
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Maciej Żenczykowski @ 2011-11-08 0:25 UTC (permalink / raw)
To: Maciej Żenczykowski
Cc: netdev, Murali Raja, Stephen Hemminger, Eric Dumazet,
David S. Miller
FYI, This obviously requires a follow up patch which will add TOS and
TCLASS info to appropriate dual-stack sockets (for example listening
tcp v6 non-ipv6only).
On Mon, Nov 7, 2011 at 4:23 PM, Maciej Żenczykowski
<zenczykowski@gmail.com> wrote:
> From: Maciej Żenczykowski <maze@google.com>
>
> commit 3ceca749668a52bd795585e0f71c6f0b04814f7b added a TOS attribute.
>
> Unfortunately TOS and TCLASS are both present in a dual-stack v6 socket,
> furthermore they can have different values. As such one cannot in a
> sane way expose both through a single attribute.
>
> Signed-off-by: Maciej Żenczyowski <maze@google.com>
> CC: Murali Raja <muralira@google.com>
> CC: Stephen Hemminger <shemminger@vyatta.com>
> CC: Eric Dumazet <eric.dumazet@gmail.com>
> CC: David S. Miller <davem@davemloft.net>
> ---
> include/linux/inet_diag.h | 3 ++-
> net/ipv4/inet_diag.c | 4 ++--
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
> index 80b480c..abf5028 100644
> --- a/include/linux/inet_diag.h
> +++ b/include/linux/inet_diag.h
> @@ -98,9 +98,10 @@ enum {
> INET_DIAG_VEGASINFO,
> INET_DIAG_CONG,
> INET_DIAG_TOS,
> + INET_DIAG_TCLASS,
> };
>
> -#define INET_DIAG_MAX INET_DIAG_TOS
> +#define INET_DIAG_MAX INET_DIAG_TCLASS
>
>
> /* INET_DIAG_MEM */
> diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
> index f5e2bda..68e8ac5 100644
> --- a/net/ipv4/inet_diag.c
> +++ b/net/ipv4/inet_diag.c
> @@ -133,8 +133,8 @@ static int inet_csk_diag_fill(struct sock *sk,
> &np->rcv_saddr);
> ipv6_addr_copy((struct in6_addr *)r->id.idiag_dst,
> &np->daddr);
> - if (ext & (1 << (INET_DIAG_TOS - 1)))
> - RTA_PUT_U8(skb, INET_DIAG_TOS, np->tclass);
> + if (ext & (1 << (INET_DIAG_TCLASS - 1)))
> + RTA_PUT_U8(skb, INET_DIAG_TCLASS, np->tclass);
> }
> #endif
>
> --
> 1.7.3.1
>
>
--
Maciej A. Żenczykowski
Kernel Networking Developer @ Google
1600 Amphitheatre Parkway, Mountain View, CA 94043
tel: +1 (650) 253-0062
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] net-netlink: Add a new attribute to expose TCLASS values via netlink
2011-11-08 0:23 [PATCH] net-netlink: Add a new attribute to expose TCLASS values via netlink Maciej Żenczykowski
2011-11-08 0:25 ` Maciej Żenczykowski
@ 2011-11-08 0:27 ` Stephen Hemminger
2011-11-08 0:29 ` Maciej Żenczykowski
2011-11-08 1:46 ` [PATCH] [RFC] net-netlink: fix tos/tclass for dual-stack ipv6 sockets Maciej Żenczykowski
2011-11-09 20:34 ` [PATCH] net-netlink: Add a new attribute to expose TCLASS values via netlink David Miller
3 siblings, 1 reply; 15+ messages in thread
From: Stephen Hemminger @ 2011-11-08 0:27 UTC (permalink / raw)
To: Maciej Żenczykowski
Cc: Maciej Żenczykowski, netdev, Murali Raja, Eric Dumazet,
David S. Miller
On Mon, 7 Nov 2011 16:23:11 -0800
Maciej Żenczykowski <zenczykowski@gmail.com> wrote:
> commit 3ceca749668a52bd795585e0f71c6f0b04814f7b added a TOS attribute.
>
> Unfortunately TOS and TCLASS are both present in a dual-stack v6 socket,
> furthermore they can have different values. As such one cannot in a
> sane way expose both through a single attribute.
Do we really want to continue to expose that as a supported
API. I would argue it was a mistake in the original implementation.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] net-netlink: Add a new attribute to expose TCLASS values via netlink
2011-11-08 0:27 ` Stephen Hemminger
@ 2011-11-08 0:29 ` Maciej Żenczykowski
0 siblings, 0 replies; 15+ messages in thread
From: Maciej Żenczykowski @ 2011-11-08 0:29 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, Murali Raja, Eric Dumazet, David S. Miller
> Do we really want to continue to expose that as a supported
> API. I would argue it was a mistake in the original implementation.
Yes, that's an interesting question...
However, can this be changed at this point in time?
Theoretically network fabric hardware could interpret v4 and v6 dscp
code points differently
(don't think anyone sane would do really want to do this though...)
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] [RFC] net-netlink: fix tos/tclass for dual-stack ipv6 sockets
2011-11-08 0:23 [PATCH] net-netlink: Add a new attribute to expose TCLASS values via netlink Maciej Żenczykowski
2011-11-08 0:25 ` Maciej Żenczykowski
2011-11-08 0:27 ` Stephen Hemminger
@ 2011-11-08 1:46 ` Maciej Żenczykowski
2011-11-09 20:35 ` David Miller
2011-11-09 20:34 ` [PATCH] net-netlink: Add a new attribute to expose TCLASS values via netlink David Miller
3 siblings, 1 reply; 15+ messages in thread
From: Maciej Żenczykowski @ 2011-11-08 1:46 UTC (permalink / raw)
To: Maciej Żenczykowski; +Cc: netdev, Maciej Żenczykowski
From: Maciej Żenczykowski <maze@google.com>
Something along the following lines would be needed.
Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
include/net/ipv6.h | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++
net/ipv4/inet_diag.c | 11 +++++----
2 files changed, 58 insertions(+), 5 deletions(-)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 3b5ac1f..50c7a3b 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -465,6 +465,58 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add
extern void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt);
+/* Return true for:
+ * - an IPv4 socket (listening or connected)
+ * - an IPv4 connection on a dual-stack IPv6 socket
+ * - an IPv6 dual-stack listening socket -> can later accept IPv4 connection
+ */
+static inline bool sk_might_be_ipv4(struct sock *sk) {
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+ const struct ipv6_pinfo *np;
+
+ if (!sk) return false;
+ if (sk->sk_family == AF_INET) return true;
+ if (sk->sk_family != AF_INET6) return false;
+ np = inet6_sk(sk);
+ if (np->ipv6only) return false;
+
+ if (ipv6_addr_v4mapped(&np->rcv_saddr)) return true;
+ if (!ipv6_addr_any(&np->rcv_saddr)) return false;
+
+ if (sk->sk_state == TCP_LISTEN) return true;
+
+ if (ipv6_addr_v4mapped(&np->saddr)) return true;
+ return false;
+#else
+ return sk && (sk->sk_family == AF_INET);
+#endif
+}
+
+/* Return true for:
+ * - a native IPv6 connection
+ * - a listening IPv6 socket
+ */
+static inline bool sk_might_be_ipv6(struct sock *sk) {
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+ const struct ipv6_pinfo *np;
+
+ if (!sk) return false;
+ if (sk->sk_family != AF_INET6) return false;
+ np = inet6_sk(sk);
+ if (np->ipv6only) return true;
+
+ if (ipv6_addr_v4mapped(&np->rcv_saddr)) return false;
+ if (!ipv6_addr_any(&np->rcv_saddr)) return true;
+
+ if (sk->sk_state == TCP_LISTEN) return true;
+
+ if (ipv6_addr_v4mapped(&np->saddr)) return false;
+ return true;
+#else
+ return false;
+#endif
+}
+
/*
* Prototypes exported by ipv6
*/
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 68e8ac5..39bc97c 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -108,9 +108,6 @@ static int inet_csk_diag_fill(struct sock *sk,
icsk->icsk_ca_ops->name);
}
- if ((ext & (1 << (INET_DIAG_TOS - 1))) && (sk->sk_family != AF_INET6))
- RTA_PUT_U8(skb, INET_DIAG_TOS, inet->tos);
-
r->idiag_family = sk->sk_family;
r->idiag_state = sk->sk_state;
r->idiag_timer = 0;
@@ -125,7 +122,13 @@ static int inet_csk_diag_fill(struct sock *sk,
r->id.idiag_src[0] = inet->inet_rcv_saddr;
r->id.idiag_dst[0] = inet->inet_daddr;
+ if ((ext & (1 << (INET_DIAG_TOS - 1))) && sk_might_be_ipv4(sk))
+ RTA_PUT_U8(skb, INET_DIAG_TOS, inet->tos);
+
#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+ if ((ext & (1 << (INET_DIAG_TCLASS - 1))) && sk_might_be_ipv6(sk))
+ RTA_PUT_U8(skb, INET_DIAG_TCLASS, inet6_sk(sk)->tclass);
+
if (r->idiag_family == AF_INET6) {
const struct ipv6_pinfo *np = inet6_sk(sk);
@@ -133,8 +136,6 @@ static int inet_csk_diag_fill(struct sock *sk,
&np->rcv_saddr);
ipv6_addr_copy((struct in6_addr *)r->id.idiag_dst,
&np->daddr);
- if (ext & (1 << (INET_DIAG_TCLASS - 1)))
- RTA_PUT_U8(skb, INET_DIAG_TCLASS, np->tclass);
}
#endif
--
1.7.3.1
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH] [RFC] net-netlink: fix tos/tclass for dual-stack ipv6 sockets
2011-11-08 1:46 ` [PATCH] [RFC] net-netlink: fix tos/tclass for dual-stack ipv6 sockets Maciej Żenczykowski
@ 2011-11-09 20:35 ` David Miller
2011-11-10 1:52 ` Maciej Żenczykowski
0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2011-11-09 20:35 UTC (permalink / raw)
To: zenczykowski; +Cc: maze, netdev
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Mon, 7 Nov 2011 17:46:40 -0800
> From: Maciej Żenczykowski <maze@google.com>
>
> Something along the following lines would be needed.
>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
This is terrible, see my other email, inet->tos doesn't matter even
for mapped ipv6 sockets.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] [RFC] net-netlink: fix tos/tclass for dual-stack ipv6 sockets
2011-11-09 20:35 ` David Miller
@ 2011-11-10 1:52 ` Maciej Żenczykowski
0 siblings, 0 replies; 15+ messages in thread
From: Maciej Żenczykowski @ 2011-11-10 1:52 UTC (permalink / raw)
To: David Miller; +Cc: netdev
2011/11/9 David Miller <davem@davemloft.net>:
> From: Maciej Żenczykowski <zenczykowski@gmail.com>
> Date: Mon, 7 Nov 2011 17:46:40 -0800
>
>> From: Maciej Żenczykowski <maze@google.com>
>>
>> Something along the following lines would be needed.
>>
>> Signed-off-by: Maciej Żenczykowski <maze@google.com>
>
> This is terrible, see my other email, inet->tos doesn't matter even
> for mapped ipv6 sockets.
Oh, yes, I definitely agree that this is terrible.
Can we just get rid of tclass and use inet->tos for ipv6 as well?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] net-netlink: Add a new attribute to expose TCLASS values via netlink
2011-11-08 0:23 [PATCH] net-netlink: Add a new attribute to expose TCLASS values via netlink Maciej Żenczykowski
` (2 preceding siblings ...)
2011-11-08 1:46 ` [PATCH] [RFC] net-netlink: fix tos/tclass for dual-stack ipv6 sockets Maciej Żenczykowski
@ 2011-11-09 20:34 ` David Miller
2011-11-10 1:50 ` Maciej Żenczykowski
3 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2011-11-09 20:34 UTC (permalink / raw)
To: zenczykowski; +Cc: maze, netdev, muralira, shemminger, eric.dumazet
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Mon, 7 Nov 2011 16:23:11 -0800
> From: Maciej Żenczykowski <maze@google.com>
>
> commit 3ceca749668a52bd795585e0f71c6f0b04814f7b added a TOS attribute.
>
> Unfortunately TOS and TCLASS are both present in a dual-stack v6 socket,
> furthermore they can have different values. As such one cannot in a
> sane way expose both through a single attribute.
>
> Signed-off-by: Maciej Żenczyowski <maze@google.com>
I can't see how an ipv6 mapped socket can even set the inet->tos value.
As far as I can see, only net/ipv4/ip_sockglue.c:ip_setsockopt() provides
the interface to change inet->tos.
And ipv6 sockets, of any type, are provided no such vector by which to
get at those interfaces.
So inet->tos is always left at it's default value for ipv6 mapped sockets,
and therefore I see no reason to report TCLASS vs. TOS separately.
In fact, what I would suggest is to do something about the lack of
ability to set inet->tos, and the best way to do that seems to be to
simply propagate the npinfo->tclass setting into inet->tos. Performaing
any munging if necessary.
I'm not applying this patch.
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] net-netlink: Add a new attribute to expose TCLASS values via netlink
2011-11-09 20:34 ` [PATCH] net-netlink: Add a new attribute to expose TCLASS values via netlink David Miller
@ 2011-11-10 1:50 ` Maciej Żenczykowski
2011-11-10 1:53 ` David Miller
0 siblings, 1 reply; 15+ messages in thread
From: Maciej Żenczykowski @ 2011-11-10 1:50 UTC (permalink / raw)
To: David Miller; +Cc: netdev, muralira, shemminger, eric.dumazet
> I can't see how an ipv6 mapped socket can even set the inet->tos value.
>
> As far as I can see, only net/ipv4/ip_sockglue.c:ip_setsockopt() provides
> the interface to change inet->tos.
Yes, but an ipv6 socket is permitted to setsockopt SOL_IP in addition
to SOL_IPV6.
As such, one can simply setsockopt(ipv6_socket, SOL_IP, IP_TOS, value).
This is just as well, because inet->tos is what determines the ipv4 tos value
of any ipv4 packets generated by an ipv6 socket, ie. when connect(ed/ing)
to an ipv4 mapped ipv6 address.
> And ipv6 sockets, of any type, are provided no such vector by which to
> get at those interfaces.
> So inet->tos is always left at it's default value for ipv6 mapped sockets,
> and therefore I see no reason to report TCLASS vs. TOS separately.
> In fact, what I would suggest is to do something about the lack of
> ability to set inet->tos, and the best way to do that seems to be to
> simply propagate the npinfo->tclass setting into inet->tos. Performaing
> any munging if necessary.
> I'm not applying this patch.
If people are in agreement that inet->tos vs ipv6->tclass being
separately settable is not desirable,
then I'm willing to go through and remove the tclass field (and send
out a patch to do that).
However, this is _QUITE DEFINITELY_ a user visible change in API and
application visible behaviour.
It is however an annoying corner case, so perhaps would be best fixed
by getting rid of it?
OTOH, I'm not sure if people aren't perhaps relying on the ability to
separately set ipv4 dscp vs ipv6 tclass due to some internal network
constraints...
Personally, I'd prefer the simplification of not having to deal with
IP_TOS vs IPV6_TCLASS discrepancies on ipv6 dual-stack sockets.
- Maciej
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] net-netlink: Add a new attribute to expose TCLASS values via netlink
2011-11-10 1:50 ` Maciej Żenczykowski
@ 2011-11-10 1:53 ` David Miller
[not found] ` <CAHo-OoybYpiusVFbDLEhS2ayPOVC1sXt7WZvpgW-Fk7ZqKj+SQ@mail.gmail.com>
0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2011-11-10 1:53 UTC (permalink / raw)
To: zenczykowski; +Cc: netdev, muralira, shemminger, eric.dumazet
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Wed, 9 Nov 2011 17:50:27 -0800
> Yes, but an ipv6 socket is permitted to setsockopt SOL_IP in addition
> to SOL_IPV6.
> As such, one can simply setsockopt(ipv6_socket, SOL_IP, IP_TOS, value).
That's my whole point, this operation isn't currently possible, the
socket ops for ipv4 setsockopt() aren't hooked up to ipv6 mapped
sockets in the kernel right now.
^ permalink raw reply [flat|nested] 15+ messages in thread