Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 2/2] Fix copy-paste bug: assign from src struct not dest
From: Arend van Spriel @ 2015-01-07 23:06 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Giel van Schijndel, Andy Shevchenko, linux-kernel, Kalle Valo,
	Eliad Peller, John W. Linville, Arik Nemtsov,
	open list:TI WILINK WIRELES..., NETWORKING DRIVERS
In-Reply-To: <1420668978.3407.28.camel@sipsolutions.net>

On 01/07/15 23:16, Johannes Berg wrote:
> On Wed, 2015-01-07 at 20:18 +0100, Giel van Schijndel wrote:
>
>> IMO the aligned block of code has the significant advantage of taking
>> advantage of humans' ability to spot things that break a pattern. Which
>> in this case becomes *very* visible when properly aligned, because
>> without the alignment there is no (visual) pattern (or at least not one
>> very suitable for my "visual processing system", I know the same applies
>> to at least some others).
>
> Yeah, well, but why even invoke that "visual processing system"?
>
> If you look, for example, at the __skb_clone function it just uses a
> macro:
>
> #define C(x) n->x = skb->x

This requires fixed names so I generally prefer to add them:

#define C(d, s, f)	(d)->f = (s)->f

> and then
>
> 	C(len);
> 	C(data_len);

	C(acx, conf, window_size);
	C(acx, conf, increase_time);

Regards,
Arend

>
> etc.
>
> johannes
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: [PATCH v4 14/20] selftests/size: add install target to enable test install
From: Tim Bird @ 2015-01-07 23:05 UTC (permalink / raw)
  To: Shuah Khan, mmarek@suse.cz, gregkh@linuxfoundation.org,
	akpm@linux-foundation.org, rostedt@goodmis.org, mingo@redhat.com,
	davem@davemloft.net, keescook@chromium.org,
	tranmanphong@gmail.com, mpe@ellerman.id.au, cov@codeaurora.org,
	dh.herrmann@gmail.com, hughd@google.com, bobby.prani@gmail.com,
	serge.hallyn@ubuntu.com, ebiederm@xmission.com,
	josh@joshtriplett.org, koct9i@gmail.com,
	"masami.hiramatsu.pt@hitachi.com" <masami.hiram
  Cc: linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <5861e16e1b533cae37471a8cf18597d3e9e6b6f6.1420571615.git.shuahkh@osg.samsung.com>

On 01/06/2015 11:43 AM, Shuah Khan wrote:
> Add a new make target to enable installing test. This target
> installs test in the kselftest install location and add to the
> kselftest script to run the test. Install target can be run
> only from top level kernel source directory.
> 
> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
> ---
>  tools/testing/selftests/size/Makefile | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/size/Makefile b/tools/testing/selftests/size/Makefile
> index 04dc25e..a1478fa 100644
> --- a/tools/testing/selftests/size/Makefile
> +++ b/tools/testing/selftests/size/Makefile
> @@ -1,12 +1,22 @@
>  CC = $(CROSS_COMPILE)gcc
>  
> +TEST_STR = ./get_size || echo 'get_size selftests: [FAIL]'
> +
>  all: get_size
>  
>  get_size: get_size.c
>  	$(CC) -static -ffreestanding -nostartfiles -s $< -o $@
>  
> +install:
> +ifdef INSTALL_KSFT_PATH
> +	install ./get_size $(INSTALL_KSFT_PATH)
> +	@echo "$(TEST_STR)" >> $(KSELFTEST)
> +else
> +	@echo "Run make kselftest_install in top level source directory"
> +endif
> +
>  run_tests: all
> -	./get_size
> +	@$(TEST_STR)
>  
>  clean:
>  	$(RM) get_size
> 

Acked-by: Tim Bird <tim.bird@sonymobile.com>

^ permalink raw reply

* Re: [PATCH 6/6] openvswitch: Support VXLAN Group Policy extension
From: Thomas Graf @ 2015-01-07 23:01 UTC (permalink / raw)
  To: Jesse Gross
  Cc: David Miller, Stephen Hemminger, Pravin Shelar, netdev,
	dev@openvswitch.org
In-Reply-To: <CAEP_g=__Ssxj3B_0sbp5D+Gh8YmQ7ub9giK4A_QYWVmYctjm-w@mail.gmail.com>

On 01/07/15 at 02:46pm, Jesse Gross wrote:
> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > The group policy metadata is handled in the same way as Geneve options
> > and transported as binary blob in a new Netlink attribute
> > OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS which is mutually exclusive to the
> > existing OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS.
> 
> Can you explain some more what the encoding would look like if
> additional options were defined (including ones that are potentially
> mutually exclusive)? The Geneve options are binary but that is coming
> directly from the protocol specification. However, this isn't an on
> the wire format so I'm not sure what it would look like or how it
> would be defined to avoid conflict and allow evolution.

The encoding will be based on struct ovs_vxlan_opts which is extended
as needed by appending new members to the end of the struct. Parsers
will look at the provided length to see which fields are provided.

The user space side looks as follows. I will add similar logic to the
kernel side as soon as we have a 2nd extension.

+/* Returns true if attribute is long enough to cover member of type. */
+#define NL_PROVIDES_MEMBER(attr, type, member, size) \
+       (nl_attr_get_size(attr) >= (offsetof(type, member) + size))
+

[...]

+        case OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS: {
+            struct ovs_vxlan_opts *vxlan_opts;
+
+            /* Length verification done per member */
+            vxlan_opts = (struct ovs_vxlan_opts *)nl_attr_get_unspec(a, 0);
+
+            if (NL_PROVIDES_MEMBER(a, struct ovs_vxlan_opts, gbp, sizeof(vxlan_opts->gbp))) {
+                tun->gbp_id = htons(vxlan_opts->gbp & 0xFFFF);
+                tun->gbp_flags = (vxlan_opts->gbp >> 16) & 0xFF;
+            }
+            break;
+        }

^ permalink raw reply

* [PATCH] net/at91_ether: prepare and unprepare clock
From: Alexandre Belloni @ 2015-01-07 22:59 UTC (permalink / raw)
  To: Nicolas Ferre, David S. Miller
  Cc: Boris Brezillon, linux-arm-kernel, linux-kernel, netdev,
	Alexandre Belloni

The clock is enabled without being prepared, this leads to:

WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:889 __clk_enable+0x24/0xa8()

and a non working ethernet interface.

Use clk_prepare_enable() and clk_disable_unprepare() to handle the clock.

Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
---
 drivers/net/ethernet/cadence/at91_ether.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/cadence/at91_ether.c b/drivers/net/ethernet/cadence/at91_ether.c
index 55eb7f2af2b4..7ef55f5fa664 100644
--- a/drivers/net/ethernet/cadence/at91_ether.c
+++ b/drivers/net/ethernet/cadence/at91_ether.c
@@ -340,7 +340,7 @@ static int __init at91ether_probe(struct platform_device *pdev)
 		res = PTR_ERR(lp->pclk);
 		goto err_free_dev;
 	}
-	clk_enable(lp->pclk);
+	clk_prepare_enable(lp->pclk);
 
 	lp->hclk = ERR_PTR(-ENOENT);
 	lp->tx_clk = ERR_PTR(-ENOENT);
@@ -406,7 +406,7 @@ static int __init at91ether_probe(struct platform_device *pdev)
 err_out_unregister_netdev:
 	unregister_netdev(dev);
 err_disable_clock:
-	clk_disable(lp->pclk);
+	clk_disable_unprepare(lp->pclk);
 err_free_dev:
 	free_netdev(dev);
 	return res;
@@ -424,7 +424,7 @@ static int at91ether_remove(struct platform_device *pdev)
 	kfree(lp->mii_bus->irq);
 	mdiobus_free(lp->mii_bus);
 	unregister_netdev(dev);
-	clk_disable(lp->pclk);
+	clk_disable_unprepare(lp->pclk);
 	free_netdev(dev);
 
 	return 0;
@@ -440,7 +440,7 @@ static int at91ether_suspend(struct platform_device *pdev, pm_message_t mesg)
 		netif_stop_queue(net_dev);
 		netif_device_detach(net_dev);
 
-		clk_disable(lp->pclk);
+		clk_disable_unprepare(lp->pclk);
 	}
 	return 0;
 }
@@ -451,7 +451,7 @@ static int at91ether_resume(struct platform_device *pdev)
 	struct macb *lp = netdev_priv(net_dev);
 
 	if (netif_running(net_dev)) {
-		clk_enable(lp->pclk);
+		clk_prepare_enable(lp->pclk);
 
 		netif_device_attach(net_dev);
 		netif_start_queue(net_dev);
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH 5/6] openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()
From: Thomas Graf @ 2015-01-07 22:55 UTC (permalink / raw)
  To: Jesse Gross
  Cc: David Miller, Stephen Hemminger, Pravin Shelar, netdev,
	dev@openvswitch.org
In-Reply-To: <CAEP_g=-X_S1qmMQt3VjOUT9+qUEnxeJxUSf-B90Yktns72zWaA@mail.gmail.com>

On 01/07/15 at 02:46pm, Jesse Gross wrote:
> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > A subsequent patch will introduce VXLAN options. Rename the existing
> > GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic
> > tunnel metadata options.
> >
> > Signed-off-by: Thomas Graf <tgraf@suug.ch>
> 
> This is generally a good idea (even outside of the larger context of
> this patch series) although a couple comments below:
> 
> > diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> > index d1eecf7..c60ae3f 100644
> > --- a/net/openvswitch/flow_netlink.c
> > +++ b/net/openvswitch/flow_netlink.c
> > @@ -387,20 +387,20 @@ static int parse_flow_nlattrs(const struct nlattr *attr,
> >         return __parse_flow_nlattrs(attr, a, attrsp, log, false);
> >  }
> >
> > -static int genev_tun_opt_from_nlattr(const struct nlattr *a,
> > -                                    struct sw_flow_match *match, bool is_mask,
> > -                                    bool log)
> > +static int tun_md_opt_from_nlattr(const struct nlattr *a,
> > +                                 struct sw_flow_match *match, bool is_mask,
> > +                                 bool log)
> 
> I think this is a somewhat overzealous conversion. This function is
> verifying OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS and as such has some
> elements of Geneve protocol knowledge baked in (such as the check that
> the options are a multiple of 4 bytes). It's also nice for the log
> messages to indicate which netlink key is causing the problem and not
> be too generic.

OK. I was planning on always padding vxlan_opts to 4 bytes as well but
I agree with you. It seems clearer to have distinct functions for each
and they can share code where it makes sense.

> > @@ -1148,10 +1145,10 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
> >                 goto nla_put_failure;
> >
> >         if ((swkey->tun_key.ipv4_dst || is_mask)) {
> > -               const struct geneve_opt *opts = NULL;
> > +               const void *opts = NULL;
> >
> >                 if (output->tun_key.tun_flags & TUNNEL_OPTIONS_PRESENT)
> > -                       opts = GENEVE_OPTS(output, swkey->tun_opts_len);
> > +                       opts = TUN_METADATA_OPTS(output, swkey->tun_opts_len);
> >
> >                 if (ipv4_tun_to_nlattr(skb, &output->tun_key, opts,
> >                                        swkey->tun_opts_len))
> 
> I think it's probably better to factor this block of code out into a
> new function as you have done in the next patch (and include it with
> this one). It makes it clearer that it is Geneve-specific.

OK, will do.

^ permalink raw reply

* Re: [PATCH 3/6] vxlan: Only bind to sockets with correct extensions enabled
From: Thomas Graf @ 2015-01-07 22:52 UTC (permalink / raw)
  To: Jesse Gross
  Cc: David Miller, Stephen Hemminger, Pravin Shelar, netdev,
	dev@openvswitch.org
In-Reply-To: <CAEP_g=8_BFYO67Qj+TQUprqomBrwPLSGJvA3+F=9E3bgKhcrEA@mail.gmail.com>

On 01/07/15 at 02:45pm, Jesse Gross wrote:
> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > A VXLAN net_device looking for an appropriate socket may only
> > consider a socket which has the exact set of extensions enabled.
> > If none can be found, a new socket must be created.
> 
> Maybe it's just the phrasing of the commit message but won't the new
> socket that needs to be created immediately fail? I think this is
> really just checking that you don't try to instantiate two different
> sets of extensions on the same UDP port - it's not like this is going
> to somehow create a new socket and they will be able to coexist.

Your interpretation is correct and the phrasing is poor. It prevents
a non-GBP socket from being used for a GBP socket. I will rework the
commit message.

^ permalink raw reply

* Re: [PATCH 6/6] openvswitch: Support VXLAN Group Policy extension
From: Jesse Gross @ 2015-01-07 22:46 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Stephen Hemminger, Pravin Shelar, netdev,
	dev@openvswitch.org
In-Reply-To: <9c1b0b0acde09019acb61b9b1a4eb4b18c62642a.1420594925.git.tgraf@suug.ch>

On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> The group policy metadata is handled in the same way as Geneve options
> and transported as binary blob in a new Netlink attribute
> OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS which is mutually exclusive to the
> existing OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS.

Can you explain some more what the encoding would look like if
additional options were defined (including ones that are potentially
mutually exclusive)? The Geneve options are binary but that is coming
directly from the protocol specification. However, this isn't an on
the wire format so I'm not sure what it would look like or how it
would be defined to avoid conflict and allow evolution.

^ permalink raw reply

* Re: [PATCH 5/6] openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()
From: Jesse Gross @ 2015-01-07 22:46 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Stephen Hemminger, Pravin Shelar, netdev,
	dev@openvswitch.org
In-Reply-To: <10cbd42aca443d862b8b62017b5a4356326e026f.1420594925.git.tgraf@suug.ch>

On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> A subsequent patch will introduce VXLAN options. Rename the existing
> GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic
> tunnel metadata options.
>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>

This is generally a good idea (even outside of the larger context of
this patch series) although a couple comments below:

> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index d1eecf7..c60ae3f 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -387,20 +387,20 @@ static int parse_flow_nlattrs(const struct nlattr *attr,
>         return __parse_flow_nlattrs(attr, a, attrsp, log, false);
>  }
>
> -static int genev_tun_opt_from_nlattr(const struct nlattr *a,
> -                                    struct sw_flow_match *match, bool is_mask,
> -                                    bool log)
> +static int tun_md_opt_from_nlattr(const struct nlattr *a,
> +                                 struct sw_flow_match *match, bool is_mask,
> +                                 bool log)

I think this is a somewhat overzealous conversion. This function is
verifying OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS and as such has some
elements of Geneve protocol knowledge baked in (such as the check that
the options are a multiple of 4 bytes). It's also nice for the log
messages to indicate which netlink key is causing the problem and not
be too generic.

> @@ -1148,10 +1145,10 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
>                 goto nla_put_failure;
>
>         if ((swkey->tun_key.ipv4_dst || is_mask)) {
> -               const struct geneve_opt *opts = NULL;
> +               const void *opts = NULL;
>
>                 if (output->tun_key.tun_flags & TUNNEL_OPTIONS_PRESENT)
> -                       opts = GENEVE_OPTS(output, swkey->tun_opts_len);
> +                       opts = TUN_METADATA_OPTS(output, swkey->tun_opts_len);
>
>                 if (ipv4_tun_to_nlattr(skb, &output->tun_key, opts,
>                                        swkey->tun_opts_len))

I think it's probably better to factor this block of code out into a
new function as you have done in the next patch (and include it with
this one). It makes it clearer that it is Geneve-specific.

^ permalink raw reply

* Re: [PATCH 3/6] vxlan: Only bind to sockets with correct extensions enabled
From: Jesse Gross @ 2015-01-07 22:45 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Stephen Hemminger, Pravin Shelar, netdev,
	dev@openvswitch.org
In-Reply-To: <46b4ceefbd8513250f64ba9c7297a8d2743f3108.1420594925.git.tgraf@suug.ch>

On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> A VXLAN net_device looking for an appropriate socket may only
> consider a socket which has the exact set of extensions enabled.
> If none can be found, a new socket must be created.

Maybe it's just the phrasing of the commit message but won't the new
socket that needs to be created immediately fail? I think this is
really just checking that you don't try to instantiate two different
sets of extensions on the same UDP port - it's not like this is going
to somehow create a new socket and they will be able to coexist.

^ permalink raw reply

* Re: [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented
From: Jesse Gross @ 2015-01-07 22:45 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Tom Herbert, David Miller, Stephen Hemminger, Pravin B Shelar,
	Linux Netdev List, dev@openvswitch.org
In-Reply-To: <20150107102740.GM21820@casper.infradead.org>

On Wed, Jan 7, 2015 at 2:27 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 01/06/15 at 07:46pm, Tom Herbert wrote:
>> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
>> > The VXLAN receive code is currently conservative in what it accepts and
>> > will reject any frame that uses any of the reserved VXLAN protocol fields.
>> > The VXLAN draft specifies that "reserved fields MUST be set to zero on
>> > transmit and ignored on receive.".
>> >
>> IMO it is an unfortunate decision in VXLAN to ignore set reserved
>> fields on receive. Had they not done this, then adding a protocol
>> field to the header would have been feasible and we wouldn't need yet
>> another encapsulation protocol (i.e. VXLAN-GPE). Rejecting frames with
>> reserved bits set is the better behavior, but I think the comment
>> about this needs to be clear about why this diverges from RFC7348.
>
> Agreed. Do you want to give it a shot? I know you have been involved
> on all aspects of this topic for a long time in NVO3 ;-)

There is little chance of this happening. Besides the IETF procedural
aspects, ignoring unknown reserved bits on receive is the long
standing standard way of extending protocols. It's how TCP was
extended to support ECN, for example.

>> > Retain the current conservative parsing behaviour by default but allows
>> > these fields to be used by VXLAN extensions which are explicitly enabled
>> > on the VXLAN socket respectively VXLAN net_device.
>> >
>> Conceptually, VXLAN has both mandatory flags and optional flags for
>> extensions. You may want to look at the VXLAN RCO patches that added
>> an extension and infrastructure for them.
>
> I've seen your proposed changes. I think they could be easily rebased
> on top of this and use the enablement infrastructure. In fact, I see
> this as the only feasible option to deal with VXLAN extensions: Leave
> it to the user to decide which extensions to run. That way we can
> support any combination of extensions, even conflicting ones. All we
> have to do is catch incompatible extension configurations on a socket
> and reject them at configuration time. Expecting that NVO3/IETF will
> find consensus on a list of compatible set of extensions with perfect
> forward and backwards compatibility is unrealistic in my eyes. We have
> Geneve and GUE to solve it properly in the future.

My concern is that having multiple (and potentially overlapping)
extensions is going to make the VXLAN code very messy and hard to
follow. I think there's already quite a big of complexity there from
the DOVE extensions (which are basically dead at this point to the
best of my knowledge). We already see this some with the discussion
over the header struct but that's just the beginning - can you imagine
the code that checks a single bit for several different
interpretations? Not to mention the fact that the location of the bits
in some of these proposals has already been shuffled several times. It
seems very painful...

^ permalink raw reply

* Re: iproute2 tc: Print qdisc, stats, class, filter info to string rather than FILE
From: Vadim Kochan @ 2015-01-07 22:12 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Vadim Kochan, netdev
In-Reply-To: <20150107221309.GG11324@breakpoint.cc>

On Wed, Jan 07, 2015 at 11:13:09PM +0100, Florian Westphal wrote:
> Vadim Kochan <vadim4j@gmail.com> wrote:
> > I'd like to make possible to generate plot info about class hierarchy
> > with qdisc, filters, stats info, but I see that all this staffs are
> > printed to 'FILE *fp' which makes this hard to implement and seems to me
> > that it should be changed to be printed to 'char *s' (memory safety
> > should be considered by asprintf or similar), so this is a big change
> > but all this is possible and I ready to do it, but before do it I'd like
> > to ask if this could be acceptable as idea.
> > 
> > BUT maybe someone can advise me easier solution ?
> 
> Would fmemopen/open_memstream(3) do what you want?
I briefly looked on it, and it might help, THANKS A LOT!

^ permalink raw reply

* Re: [PATCH 2/2] Fix copy-paste bug: assign from src struct not dest
From: Johannes Berg @ 2015-01-07 22:16 UTC (permalink / raw)
  To: Giel van Schijndel
  Cc: Andy Shevchenko, linux-kernel, Kalle Valo, Eliad Peller,
	John W. Linville, Arik Nemtsov, open list:TI WILINK WIRELES...,
	open list:NETWORKING DRIVERS
In-Reply-To: <20150107191836.GA18978@salidar.dom.custoft.eu>

On Wed, 2015-01-07 at 20:18 +0100, Giel van Schijndel wrote:

> IMO the aligned block of code has the significant advantage of taking
> advantage of humans' ability to spot things that break a pattern. Which
> in this case becomes *very* visible when properly aligned, because
> without the alignment there is no (visual) pattern (or at least not one
> very suitable for my "visual processing system", I know the same applies
> to at least some others).

Yeah, well, but why even invoke that "visual processing system"?

If you look, for example, at the __skb_clone function it just uses a
macro:

#define C(x) n->x = skb->x

and then

	C(len);
	C(data_len);

etc.

johannes

^ permalink raw reply

* Re: iproute2 tc: Print qdisc, stats, class, filter info to string rather than FILE
From: Florian Westphal @ 2015-01-07 22:13 UTC (permalink / raw)
  To: Vadim Kochan; +Cc: netdev
In-Reply-To: <20150107215331.GA19197@angus-think.lan>

Vadim Kochan <vadim4j@gmail.com> wrote:
> I'd like to make possible to generate plot info about class hierarchy
> with qdisc, filters, stats info, but I see that all this staffs are
> printed to 'FILE *fp' which makes this hard to implement and seems to me
> that it should be changed to be printed to 'char *s' (memory safety
> should be considered by asprintf or similar), so this is a big change
> but all this is possible and I ready to do it, but before do it I'd like
> to ask if this could be acceptable as idea.
> 
> BUT maybe someone can advise me easier solution ?

Would fmemopen/open_memstream(3) do what you want?

^ permalink raw reply

* iproute2 tc: Print qdisc, stats, class, filter info to string rather than FILE
From: Vadim Kochan @ 2015-01-07 21:53 UTC (permalink / raw)
  To: netdev; +Cc: vadim4j

Hi,

I'd like to make possible to generate plot info about class hierarchy
with qdisc, filters, stats info, but I see that all this staffs are
printed to 'FILE *fp' which makes this hard to implement and seems to me
that it should be changed to be printed to 'char *s' (memory safety
should be considered by asprintf or similar), so this is a big change
but all this is possible and I ready to do it, but before do it I'd like
to ask if this could be acceptable as idea.

BUT maybe someone can advise me easier solution ?

Regards,

^ permalink raw reply

* Re: REGRESSION in nfnetlink on 3.18.x (bisected)
From: Andre Tomt @ 2015-01-07 22:03 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, netdev
In-Reply-To: <5498A7DF.6040903@tomt.net>

On 23. des. 2014 00:23, Andre Tomt wrote:
> On 22. des. 2014 12:56, Pablo Neira Ayuso wrote:
>> Could you give a test to this patch? Thanks.
>>
>
> Initial testing looks good with this patch applied on top of 3.18.1
> I will give it a spin on some more systems tomorrow.

No news is good news :-)

~10 3.18.x systems in various roles have had this fix for two weeks with 
no issues.

^ permalink raw reply

* Re: [linux-nics] [PATCH] e1000: fix time comparison
From: Jeff Kirsher @ 2015-01-07 22:02 UTC (permalink / raw)
  To: Asaf Vertz
  Cc: linux.nics, e1000-devel, linux-kernel, matthew.vick, netdev,
	carolyn.wyborny
In-Reply-To: <20150107124145.GA15954@ubuntu>

[-- Attachment #1: Type: text/plain, Size: 432 bytes --]

On Wed, 2015-01-07 at 14:41 +0200, Asaf Vertz wrote:
> To be future-proof and for better readability the time comparisons are
> modified to use time_after_eq() instead of plain, error-prone math.
> 
> Signed-off-by: Asaf Vertz <asaf.vertz@tandemg.com>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_ethtool.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)

Thanks Asaf, I will add your patch to my queue.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
From: John Fastabend @ 2015-01-07 22:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, netdev@vger.kernel.org,
	David S. Miller, Andy Gospodarek
In-Reply-To: <CAADnVQLGLy5w8fcUKdVG6g6EpMJ=n=NMn=pq+mdYSqAOJdtk=w@mail.gmail.com>

On 01/07/2015 01:17 PM, Alexei Starovoitov wrote:
> On Tue, Jan 6, 2015 at 9:37 PM, John Fastabend <john.fastabend@gmail.com> wrote:
>>> - above plus put_header_graph() which will allow to
>>>     rearrange some fixed sized headers ?
>>
>> OK but I'm not sure where/if these devices exist. Maybe your
>> thinking of a software dataplane case? Would get_headers return
>> some headers/fields but not include them in the graph and then
>> expect the user to build a graph with them if the user needs
>> them. Are there restrictions on how the graph can be built
>> out? I guess I'm working with the assumption that the device
>> returns a complete parse graph for all combinations it supports.
>
> ahh. I thought that get_hdr_graph() will return one
> that is currently configured and put_hdr_graph()
> will try to configure new sequence of headers.
> I think returning all possible combinations is not practical,
> since number of such combinations can be very large for
> some hw.

Agree here I think it should return the currently configured
and active hdr graph. Just to be clear I had assumed that any driver
that supported put_header_graph would also support a put_headers
call. basically your case 3 below.

> Also it seems that 4/11 patch and rocker_header_nodes[]
> in particular describing one graph instead of
> all possible?

It returns the one and only graph rocker supports now or at least
the graph of supported headers as I read the rocker code. As
rocker becomes more flexible I would expect this to grow including
tunnels, stacked headers, tcp, etc.

>
>>> - above plus put_header() ?
>>>     I'm having a hard time envisioning how that would
>>>     look like.
>>
>> This case makes more sense to me. The user supplies the definition
>> of the headers and the graph showing how they are related and the
>> driver can program the parser to work correctly.
>
> yes, assuming that put_hdr_graph() programs one
> sequence of jumping through hdrs...
> but I think it's also fine if you do one put_hdrs_and_graph()
> function as you described.
>
>> To be honest though I would really be happy getting the 1st option
>> working.
>
> agree.
> as long as we don't screw up get*() semantics that
> prevent clean put*() logic :)
> To illustrate my point:
> if hw parser can parse 2 vlans and there is
> no way to configure it to do zero, one or three, it's perfectly
> fine for put_hdr_graph() to fail when it tries to configure
> something different.
> But if hw can be configured to do 1 vlan or 2 vlans, it
> would be overkill to pass both graphs in get().
> Just pass one that is currently active and let put() try things?

This is what I intended. I think it is good enough.

> I think get_hdrs() on its own is good enough indication
> to the user what hw is capable of and hdr_graph is
> just a jump table between them. If hw can parse vxlan
> without vxlan extensions it will be clearly seen in get_hdrs,
> so no point trying to do put_hdrs() with some new
> definition of vxlan unless parser is fully programmable.
> that's where I was going with my category 2 where
> only put_hdr_graph() exists... imo it will fit trident
> and alta models ?
> Personally I believe that we should design this API
> with as much as possible real hw in mind.
> rocker can support different models of hw...
>

Yep. Which is why at some point I would like to program up
a couple other "worlds" for rocker that have different pipelines.
This would allow experimenting with more then the current static
model rocker uses.


-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [PATCH net-next] Driver: Vmxnet3: Reinitialize vmxnet3 backend on wakeup from hibernate
From: Sergei Shtylyov @ 2015-01-07 21:57 UTC (permalink / raw)
  To: Shrikrishna Khare, sbhatewara, pv-drivers, netdev, linux-kernel
  Cc: Srividya Murali
In-Reply-To: <1420653393-27657-1-git-send-email-skhare@vmware.com>

Hello.

On 01/07/2015 08:56 PM, Shrikrishna Khare wrote:

> Failing to reinitialize on wakeup results in loss of network connectivity for
> vmxnet3 interface.

> Signed-off-by: Srividya Murali <smurali@vmware.com>
> Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
> Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
> ---
>   drivers/net/vmxnet3/vmxnet3_drv.c |   50 +++++++++++++++++++++---------------
>   drivers/net/vmxnet3/vmxnet3_int.h |    4 +-
>   2 files changed, 31 insertions(+), 23 deletions(-)

> diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
> index 7af1f5c..124cb9f 100644
> --- a/drivers/net/vmxnet3/vmxnet3_drv.c
> +++ b/drivers/net/vmxnet3/vmxnet3_drv.c
> @@ -3290,51 +3290,59 @@ skip_arp:
>   static int
>   vmxnet3_resume(struct device *device)
>   {
> -	int err, i = 0;
> +	int err;
>   	unsigned long flags;
>   	struct pci_dev *pdev = to_pci_dev(device);
>   	struct net_device *netdev = pci_get_drvdata(pdev);
>   	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
> -	struct Vmxnet3_PMConf *pmConf;
>
>   	if (!netif_running(netdev))
>   		return 0;
>
> -	/* Destroy wake-up filters. */
> -	pmConf = adapter->pm_conf;
> -	memset(pmConf, 0, sizeof(*pmConf));
> -
> -	adapter->shared->devRead.pmConfDesc.confVer = cpu_to_le32(1);
> -	adapter->shared->devRead.pmConfDesc.confLen = cpu_to_le32(sizeof(
> -								  *pmConf));
> -	adapter->shared->devRead.pmConfDesc.confPA =
> -		cpu_to_le64(adapter->pm_conf_pa);
> -
> -	netif_device_attach(netdev);
>   	pci_set_power_state(pdev, PCI_D0);
>   	pci_restore_state(pdev);
>   	err = pci_enable_device_mem(pdev);
>   	if (err != 0)
> -		return err;
> +		goto err;

    Why?

>
>   	pci_enable_wake(pdev, PCI_D0, 0);
>
> +	vmxnet3_alloc_intr_resources(adapter);
> +
> +	/* During hibernate and suspend, device has to be reinitialized as the
> +	 * device state need not be preserved.
> +	 */
> +
> +	/* Need not check adapter state as other reset tasks cannot run during
> +	 * device resume.
> +	 */
>   	spin_lock_irqsave(&adapter->cmd_lock, flags);
>   	VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
> -			       VMXNET3_CMD_UPDATE_PMCFG);
> +			       VMXNET3_CMD_QUIESCE_DEV);
>   	spin_unlock_irqrestore(&adapter->cmd_lock, flags);
> -	vmxnet3_alloc_intr_resources(adapter);
> -	vmxnet3_request_irqs(adapter);
> -	for (i = 0; i < adapter->num_rx_queues; i++)
> -		napi_enable(&adapter->rx_queue[i].napi);
> -	vmxnet3_enable_all_intrs(adapter);
> +	vmxnet3_tq_cleanup_all(adapter);
> +	vmxnet3_rq_cleanup_all(adapter);
>
> -	return 0;
> +	vmxnet3_reset_dev(adapter);
> +	err = vmxnet3_activate_dev(adapter);
> +	if (err) {
> +		netdev_err(adapter->netdev,

    Not just 'netdev'?

> +			   "%s: failed to re-activate on resume, error: %d",
> +		    netdev->name, err);

    Should have been aligned with the line right above. And isn't netdev->name 
printed by netdev_err() already?

> +		vmxnet3_force_close(adapter);
> +		goto err;

    Why not just *return*?

> +	}
> +	netif_device_attach(netdev);
> +
> +err:
> +	return err;
>   }
[...]

WBR, Sergei

^ permalink raw reply

* Re: [net v2 2/3] i40e: Fix Rx checksum error counter
From: Tom Herbert @ 2015-01-07 21:34 UTC (permalink / raw)
  To: Singhai, Anjali
  Cc: Kirsher, Jeffrey T, David Miller, Linux Netdev List,
	nhorman@redhat.com, sassmann@redhat.com, jogreene@redhat.com,
	Rose, Gregory V
In-Reply-To: <C40BE8378EF49C44B9184714DBC8EF2992D9668D@ORSMSX110.amr.corp.intel.com>

On Wed, Jan 7, 2015 at 12:14 PM, Singhai, Anjali
<anjali.singhai@intel.com> wrote:
> On Tue, 6 Jan 2015 21:43:57 -0800
> Tom Herbert <therbert@google.com> wrote:
>> > @@ -1337,15 +1335,19 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
>> >                                           skb->protocol == htons(ETH_P_8021AD))
>> >                                           ? VLAN_HLEN : 0;
>> >
>> > -               rx_udp_csum = udp_csum(skb);
>> > -               iph = ip_hdr(skb);
>> > -               csum = csum_tcpudp_magic(
>> > -                               iph->saddr, iph->daddr,
>> > -                               (skb->len - skb_transport_offset(skb)),
>> > -                               IPPROTO_UDP, rx_udp_csum);
>> > +               if ((ip_hdr(skb)->protocol == IPPROTO_UDP) &&
>> > +                   (udp_hdr(skb)->check != 0)) {
>> > +                       rx_udp_csum = udp_csum(skb);
>>
>> Doesn't this compute the whole checksum of the packet making the fact
>> that device verified inner checksum pretty much irrelevant? It would
>> probably be just as well to return CHECKSUM_NONE and let the stack
>> deal with it and remove all this complexity.
>
> This is only calculating outer UDP csum, inner csums are offloaded and so is the outer IP csum. Overall this is less work than asking the stack to do all of those by marking it as CHECKSUM_UNNECESSARY. We do have a patch in line to use csum_level but I believe even with that we would be asking the stack to do more work than necessary if we indicate that only inner checksums are offloaded. With our HW we are able to offload 3 out of 4 csums.

The stack will only compute the checksum over the packet zero or one
times. The fact that the driver is doing it instead of the stack makes
little difference, once the decision is made to checksum the packet
the performance gains of offloading are lost. If you defer to the
stack then this complex code is removed and there is the possibility
that the checksum doesn't even need to be calculated at all (like UDP
packet is bad or no listener for port).

What does "we are able to offload 3 out of 4 csums" mean?

Thanks,
Tom

^ permalink raw reply

* Re: TCP connection issues against Amazon S3
From: Yuchung Cheng @ 2015-01-07 21:33 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Eric Dumazet, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <A58F8257-97F8-4662-98FC-7AC57F35A3A4@bengler.no>

On Wed, Jan 7, 2015 at 12:37 PM, Erik Grinaker <erik@bengler.no> wrote:
> On 07 Jan 2015, at 15:58, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Wed, 2015-01-07 at 13:31 +0000, Erik Grinaker wrote:
>>> On 06 Jan 2015, at 22:00, Yuchung Cheng <ycheng@google.com> wrote:
>>>> On Tue, Jan 6, 2015 at 1:04 PM, Erik Grinaker <erik@bengler.no> wrote:
>>>>>
>>>>>> On 06 Jan 2015, at 20:26, Erik Grinaker <erik@bengler.no> wrote:
>>>>> This still doesn’t explain why it works with older kernels, but not newer ones. I’m thinking it’s
>>>> probably some minor change, which gets amplified by the lack of SACKs
>>>> on the loadbalancer. Anyway, I’ll bring it up with Amazon.
>>>> can you post traces with the older kernels?
>>>
>>> Here is a dump using 3.11.10 against a non-SACK-enabled loadbalancer:
>>>
>>> http://abstrakt.bengler.no/tcp-issues-s3-nosack-3.11.10.pcap.bz2
>>>
>>> The transfer shows lots of DUPACKs and retransmits, but this does not
>>> seem to have as bad an effect as it did with the failing transfer we
>>> saw on newer kernels:
>>>
>>> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
>>>
>>> One big difference, which Rick touched on earlier, is that the newer
>>> kernels keep sending TCP window updates as it’s going through the
>>> retransmits. The older kernel does not do this.
>>
>> The new kernel is the receiver : It does no retransmits.
>>
>> Increasing window in ACK packets should not prevent sender into
>> retransmitting missing packets.
>>
>> Sender is not a linux host and is very buggy IMO : If receiver
>> advertises a too big window, sender decides to not retransmit in some
>> cases.
>
> I agree. I have contacted Amazon about this, but am not too hopeful for a quick fix; they have been promising SACK-support on their loadbalancers since 2006, for example.
>
> That said, since this change breaks a service as popular as S3, it might be worth reconsidering.
With the newer kernel and bigger receive window, the sender skips (the
already slow NewReno) fast recovery and falls back to (exp backoff)
timeout recovery. Reducing rwin to accommodate the sender's bug seems
backward to me.


>
>> You can play with /proc/sys/net/ipv4/tcp_rmem and adopt very low values
>> to work around the sender bug.
>>
>> ( Or use SO_RCVBUF in receiver application)
>
> Thanks, setting SO_RCVBUF seems like a reasonable workaround.

^ permalink raw reply

* Re: TCP connection issues against Amazon S3
From: Eric Dumazet @ 2015-01-07 21:33 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <A58F8257-97F8-4662-98FC-7AC57F35A3A4@bengler.no>

On Wed, 2015-01-07 at 20:37 +0000, Erik Grinaker wrote:

> I agree. I have contacted Amazon about this, but am not too hopeful
> for a quick fix; they have been promising SACK-support on their
> loadbalancers since 2006, for example.
> 
> That said, since this change breaks a service as popular as S3, it
> might be worth reconsidering.
> 

Which change are you talking about ? Have you done a bisection to
clearly identify the patch exposing this sender bug ?

We are not going to stick TCP stack to 20th century and buggy peers or
middleboxes, sorry.

^ permalink raw reply

* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
From: Alexei Starovoitov @ 2015-01-07 21:17 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, netdev@vger.kernel.org,
	David S. Miller, Andy Gospodarek

On Tue, Jan 6, 2015 at 9:37 PM, John Fastabend <john.fastabend@gmail.com> wrote:
>> - above plus put_header_graph() which will allow to
>>    rearrange some fixed sized headers ?
>
> OK but I'm not sure where/if these devices exist. Maybe your
> thinking of a software dataplane case? Would get_headers return
> some headers/fields but not include them in the graph and then
> expect the user to build a graph with them if the user needs
> them. Are there restrictions on how the graph can be built
> out? I guess I'm working with the assumption that the device
> returns a complete parse graph for all combinations it supports.

ahh. I thought that get_hdr_graph() will return one
that is currently configured and put_hdr_graph()
will try to configure new sequence of headers.
I think returning all possible combinations is not practical,
since number of such combinations can be very large for
some hw.
Also it seems that 4/11 patch and rocker_header_nodes[]
in particular describing one graph instead of
all possible?

>> - above plus put_header() ?
>>    I'm having a hard time envisioning how that would
>>    look like.
>
> This case makes more sense to me. The user supplies the definition
> of the headers and the graph showing how they are related and the
> driver can program the parser to work correctly.

yes, assuming that put_hdr_graph() programs one
sequence of jumping through hdrs...
but I think it's also fine if you do one put_hdrs_and_graph()
function as you described.

> To be honest though I would really be happy getting the 1st option
> working.

agree.
as long as we don't screw up get*() semantics that
prevent clean put*() logic :)
To illustrate my point:
if hw parser can parse 2 vlans and there is
no way to configure it to do zero, one or three, it's perfectly
fine for put_hdr_graph() to fail when it tries to configure
something different.
But if hw can be configured to do 1 vlan or 2 vlans, it
would be overkill to pass both graphs in get().
Just pass one that is currently active and let put() try things?
I think get_hdrs() on its own is good enough indication
to the user what hw is capable of and hdr_graph is
just a jump table between them. If hw can parse vxlan
without vxlan extensions it will be clearly seen in get_hdrs,
so no point trying to do put_hdrs() with some new
definition of vxlan unless parser is fully programmable.
that's where I was going with my category 2 where
only put_hdr_graph() exists... imo it will fit trident
and alta models ?
Personally I believe that we should design this API
with as much as possible real hw in mind.
rocker can support different models of hw...

^ permalink raw reply

* Webmail Help-desk support
From: Marianne Gordon @ 2015-01-07 21:00 UTC (permalink / raw)
  To: info

Your email has exceeded the storage limit created. You will no longer
be able to send or receive messages.
To activate,  Chick here and fill in the necessary information
;http://onlineseverupdatehanceclear.jigsy.com/

Consideration must be activated in the day to regenerate new surfaces.
Help-desk support.

^ permalink raw reply

* Re: [PATCH] vhost/net: length miscalculation
From: Sergei Shtylyov @ 2015-01-07 20:58 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: Alex Williamson, Greg Kurz, kvm, virtualization, netdev
In-Reply-To: <1420620847-24477-1-git-send-email-mst@redhat.com>

Hello.

On 01/07/2015 11:55 AM, Michael S. Tsirkin wrote:

> commit 8b38694a2dc8b18374310df50174f1e4376d6824
>      vhost/net: virtio 1.0 byte swap
> had this chunk:
> -       heads[headcount - 1].len += datalen;
> +       heads[headcount - 1].len = cpu_to_vhost32(vq, len - datalen);

> This adds datalen with the wrong sign, causing guest panics.

> Fixes: 8b38694a2dc8b18374310df50174f1e4376d6824

    The format of this tag assumes 12-digit SHA1 hash and the commit 
description enclosed in parens and double quotes. See 
Documentation/SubmittingPatches.

> Reported-by: Alex Williamson <alex.williamson@redhat.com>
> Suggested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

WBR, Sergei

^ permalink raw reply

* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Jesse Gross @ 2015-01-07 20:52 UTC (permalink / raw)
  To: Fan Du
  Cc: Du, Fan, Thomas Graf, davem@davemloft.net, Michael S. Tsirkin,
	Jason Wang, netdev@vger.kernel.org, fw@strlen.de,
	dev@openvswitch.org, pshelar@nicira.com
In-Reply-To: <54ACCAFD.4070203@gmail.com>

On Tue, Jan 6, 2015 at 9:58 PM, Fan Du <fengyuleidian0615@gmail.com> wrote:
> 于 2015年01月07日 03:11, Jesse Gross 写道:
>>>>
>>>> One of the reasons for only doing path MTU discovery
>>>> >>for L3 is that it operates seamlessly as part of normal operation -
>>>> >>there is no need to forge addresses or potentially generate ICMP when
>>>> >>on an L2 network. However, this ignores the IP handling that is going
>>>> >>on (note that in OVS it is possible for L3 to be implemented as a set
>>>> >>of flows coming from a controller).
>>>> >>
>>>> >>It also should not be VXLAN specific or duplicate VXLAN encapsulation
>>>> >>code. As this is happening before encapsulation, the generated ICMP
>>>> >>does not need to be encapsulated either if it is created in the right
>>>> >>location.
>>>
>>> >
>>> >Yes, I agree. GRE share the same issue from the code flow.
>>> >Pushing back ICMP msg back without encapsulation without circulating
>>> > down
>>> >to physical device is possible. The "right location" as far as I know
>>> >could only be in ovs_vport_send. In addition this probably requires
>>> > wrapper
>>> >route looking up operation for GRE/VXLAN, after get the under layer
>>> > device
>>> >MTU
>>> >from the routing information, then calculate reduced MTU becomes
>>> > feasible.
>>
>> As I said, it needs to be integrated into L3 processing. In OVS this
>> would mean adding some primitives to the kernel and then exposing the
>> functionality upwards into userspace/controller.
>
>
> I'm bit of confused with "L3 processing" you mentioned here... SORRY
> Apparently I'm not seeing the whole picture as you pointed out. Could you
> please
> elaborate "L3 processing" a bit more? docs/codes/or other useful links.
> Appreciated.

L3 processing is anywhere that routing takes place - i.e. where you
would decrement the TTL and change the MAC addresses. Part of routing
is dealing with differing MTUs, so that needs to be integrated into
the same logic.

> My understanding is:
> controller sets the forwarding rules into kernel datapath, any flow not
> matching
> with the rules are threw to controller by upcall. Once the rule decision is
> made
> by controller, then, this flow packet is pushed down to datapath to be
> forwarded
> again according to the new rule.
>
> So I'm not sure whether pushing the over-MTU-sized packet or pushing the
> forged ICMP
> without encapsulation to controller is required by current ovs
> implementation. By doing
> so, such over-MTU-sized packet is treated as a event for the controller to
> be take
> care of.

If flows are implementing routing (again, they are doing things like
decrementing the TTL) then it is necessary for them to also handle
this situation using some potentially new primitives (like a size
check). Otherwise you end up with issues like the ones that I
mentioned above like needing to forge addresses because you don't know
what the correct ones are. If the flows aren't doing things to
implement routing, then you really have a flat L2 network and you
shouldn't be doing this type of behavior at all as I described in the
original plan.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox