From mboxrd@z Thu Jan  1 00:00:00 1970
From: Liran Alon <liran.alon@oracle.com>
Subject: Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info
 only when crossing netns
Date: Thu, 15 Mar 2018 08:05:33 -0700 (PDT)
Message-ID: <eb240da0-9498-4367-8430-600a481f8159@default>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: <netdev@vger.kernel.org>, <shmulik.ladkani@gmail.com>,
        <mrv@mojatatu.com>, <davem@davemloft.net>,
        <linux-kernel@vger.kernel.org>, <yuval.shaia@oracle.com>,
        <idan.brown@oracle.com>
To: <daniel@iogearbox.net>
Return-path: <linux-kernel-owner@vger.kernel.org>
Content-Disposition: inline
Sender: linux-kernel-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org


----- daniel@iogearbox.net wrote:

> On 03/15/2018 03:35 PM, Roman Mashak wrote:
> > Liran Alon <liran.alon@oracle.com> writes:
> > [...]
> >>> Overall I think it might be nice to not need scrubbing skb in such
> >>> cases, although my concern would be that this has potential to break
> >>> existing setups when they would expect mark being zero on other veth
> >>> peer in any case since it's the behavior for a long time already. The
> >>> safer option would be to have some sort of explicit opt-in e.g. on
> >>> link creation to let the skb->mark pass through unscrubbed. This would
> >>> definitely be a useful option e.g. when mark is set in the netns
> >>> facing veth via clsact/egress on xmit and when the container is
> >>> unprivileged anyway.
> >>>
> >>> Thanks,
> >>> Daniel
> >>
> >> I see your point in regards to backwards comparability.
> >> However, not scrubbing skb when it cross netns via some kernel functions
> >> compared to others is basically a bug which could easily break with a
> >> little bit of more refactoring.
> >> Therefore, it seems a bit weird to me to from now on, we will force
> >> every user on link creation to consider that once there was a bug
> >> leading to this weird behavior on specific netdevs.
>
> Why bug specifically? It could well be that for some unpriv containers
> it would be fine to do e.g. in cases where orchestrator sets up clsact/
> egress on veth/ipvlan/etc in the container to set the mark and where app
> cannot mess with this while for others you need to act out of host facing
> veth; thus, explicit opt-in per dev could provide such more fine grained
> control.
>
> > One valid use case could be preserving a source namespace nsid in
> > skb->mark when a packet crosses netns.
>
> Right, was thinking about something similar.

I agree with all the above, but this behavior was supported neither before
nor after this commit: skb->mark is always zeroed when crossing netns.
This commit only changes the behavior for an skb crossing netdevs within the
*same* netns via dev_forward_skb().

Therefore, I believe we should first discuss here what we want the default
behavior to be and how it should be controlled for backwards compatibility.
Only afterwards should we discuss adding an extra feature for controlling skb
scrubbing per netdev, or something similar.