Netdev List
* Re: [PATCH] vhost/net: length miscalculation
From: Alex Williamson @ 2015-01-07 15:09 UTC
  To: Michael S. Tsirkin; +Cc: linux-kernel, Greg Kurz, kvm, virtualization, netdev
In-Reply-To: <1420620847-24477-1-git-send-email-mst@redhat.com>

On Wed, 2015-01-07 at 10:55 +0200, Michael S. Tsirkin wrote:
> commit 8b38694a2dc8b18374310df50174f1e4376d6824
>     vhost/net: virtio 1.0 byte swap
> had this chunk:
> -       heads[headcount - 1].len += datalen;
> +       heads[headcount - 1].len = cpu_to_vhost32(vq, len - datalen);
> 
> This adds datalen with the wrong sign, causing guest panics.
> 
> Fixes: 8b38694a2dc8b18374310df50174f1e4376d6824
> Reported-by: Alex Williamson <alex.williamson@redhat.com>
> Suggested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> Alex, could you please confirm this fixes the crash for you?

Confirmed, this works.  Thanks,

Alex

>  drivers/vhost/net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 14419a8..d415d69 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -538,7 +538,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
>  		++headcount;
>  		seg += in;
>  	}
> -	heads[headcount - 1].len = cpu_to_vhost32(vq, len - datalen);
> +	heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
>  	*iovcount = seg;
>  	if (unlikely(log))
>  		*log_num = nlogs;
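
To see why the sign matters, here is a small user-space model of the loop
that precedes the patched line. It is only a sketch, not the kernel code:
last_head_len(), its parameters, and the buffer sizes are hypothetical, and
it assumes datalen counts the bytes still to be placed and ends at zero or
below once enough buffers have been collected.

```c
#include <assert.h>

/*
 * Hypothetical model of the buffer-gathering loop: consume buffers until
 * the packet (datalen bytes) fits, then return the length recorded for
 * the last buffer used. "buggy" selects the len - datalen variant.
 */
static int last_head_len(const int *buf_len, int nbufs, int datalen, int buggy)
{
	int len = 0;
	int i = 0;

	assert(nbufs > 0);

	while (datalen > 0 && i < nbufs) {
		len = buf_len[i];
		datalen -= len;	/* goes to zero or negative on the last buffer */
		++i;
	}

	/* datalen <= 0 here, so adding it trims the unused tail. */
	return buggy ? len - datalen : len + datalen;
}
```

With two 1500-byte buffers and a 2000-byte packet, the fixed expression
records 500 bytes for the last buffer, while the len - datalen variant
records 2500, which is more than the buffer holds.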





* Re: [PATCH iproute2 3/3] ip netns: Delete all netns
From: Brian Haley @ 2015-01-07 15:44 UTC
  To: Vadim Kochan, netdev
In-Reply-To: <1420628662-9930-4-git-send-email-vadim4j@gmail.com>

On 01/07/2015 06:04 AM, Vadim Kochan wrote:
> From: Vadim Kochan <vadim4j@gmail.com>
> 
> Allow deleting all network namespaces with:
> 
>     $ ip netns del all

So I can still create a namespace called 'all', but can't exec in it or delete
it independently with this change.  Perhaps you need to block that as well?
Unless there's some other patch I'm missing?

-Brian


* Re: [PATCH net-next] net: sched: use pinned timers
From: Eric Dumazet @ 2015-01-07 15:45 UTC
  To: Cosmin GIRADU; +Cc: netdev
In-Reply-To: <54AD0543.1090806@rcs-rds.ro>

On Wed, 2015-01-07 at 12:06 +0200, Cosmin GIRADU wrote:

> Hi Eric,
> 
>     I saw that this patch didn't make its way into the stable branches.
> So I have two questions:
>         - Would it be safe to apply to linux-3.12.x stable?
>         - If yes, would there be any [noticeable] effects on a [pretty
> complex] HTB setup? (I know, test and I'll see,
>            but if theory says I won't, then there would be no point to
> the test, would there?)
> 

It is safe to backport.

If you want to test an equivalent without a kernel patch, you can simply
do:

echo 0 >/proc/sys/kernel/timer_migration


* Re: TCP connection issues against Amazon S3
From: Eric Dumazet @ 2015-01-07 15:58 UTC
  To: Erik Grinaker; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <34CE0233-F881-4F40-B119-AA9D8F7D500F@bengler.no>

On Wed, 2015-01-07 at 13:31 +0000, Erik Grinaker wrote:
> On 06 Jan 2015, at 22:00, Yuchung Cheng <ycheng@google.com> wrote:
> > On Tue, Jan 6, 2015 at 1:04 PM, Erik Grinaker <erik@bengler.no> wrote:
> >> 
> >>> On 06 Jan 2015, at 20:26, Erik Grinaker <erik@bengler.no> wrote:
> >> This still doesn’t explain why it works with older kernels, but not newer ones. I’m thinking it’s
> >> probably some minor change, which gets amplified by the lack of SACKs
> >> on the loadbalancer. Anyway, I’ll bring it up with Amazon.
> > can you post traces with the older kernels?
> 
> Here is a dump using 3.11.10 against a non-SACK-enabled loadbalancer:
> 
> http://abstrakt.bengler.no/tcp-issues-s3-nosack-3.11.10.pcap.bz2
> 
> The transfer shows lots of DUPACKs and retransmits, but this does not
> seem to have as bad an effect as it did with the failing transfer we
> saw on newer kernels:
> 
> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
> 
> One big difference, which Rick touched on earlier, is that the newer
> kernels keep sending TCP window updates as it’s going through the
> retransmits. The older kernel does not do this.

The new kernel is the receiver: it does not retransmit.

Increasing the window in ACK packets should not prevent the sender from
retransmitting missing packets.

The sender is not a Linux host and is very buggy IMO: if the receiver
advertises too big a window, the sender decides not to retransmit in some
cases.

You can play with /proc/sys/net/ipv4/tcp_rmem and adopt very low values
to work around the sender bug.

( Or use SO_RCVBUF in receiver application)
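
For the SO_RCVBUF route, a minimal sketch of what the receiver application
could do is shown below. The helper names tcp_socket_with_rcvbuf() and
effective_rcvbuf() are hypothetical, and the right buffer size has to be
found by experiment. On Linux, setting SO_RCVBUF turns off receive buffer
autotuning for that socket, the kernel stores roughly double the requested
value, and the option should be set before connect() so that it influences
the window negotiated at connection setup.

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Create a TCP socket with a capped receive buffer.
 * Returns the descriptor, or -1 on failure.
 */
static int tcp_socket_with_rcvbuf(int rcvbuf_bytes)
{
	int fd;

	assert(rcvbuf_bytes > 0);

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* Must happen before connect() to affect the advertised window. */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
		       &rcvbuf_bytes, sizeof(rcvbuf_bytes)) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}

/* Read back the value the kernel actually applied, or -1 on failure. */
static int effective_rcvbuf(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
		return -1;

	return val;
}
```

The caller would then connect() the returned descriptor as usual. Reading
the value back with effective_rcvbuf() is a cheap way to confirm what the
kernel applied, since it may clamp or scale the request.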


* Re: [PATCH v4] can: Convert to runtime_pm
From: Sören Brinkmann @ 2015-01-07 15:58 UTC
  To: Marc Kleine-Budde
  Cc: Kedareswara rao Appana, wg, michal.simek, grant.likely, linux-can,
	netdev, linux-kernel, Kedareswara rao Appana
In-Reply-To: <54AD267C.4060004@pengutronix.de>

On Wed, 2015-01-07 at 01:28PM +0100, Marc Kleine-Budde wrote:
> On 12/23/2014 01:25 PM, Kedareswara rao Appana wrote:
> > Instead of enabling/disabling clocks at several locations in the driver,
> > use the runtime_pm framework. This consolidates the actions for
> > runtime PM in the appropriate callbacks and makes the driver more
> > readable and maintainable.
> > 
> > Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
> > Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
> > ---
> > Changes for v4:
> >  - Updated with the review comments.
> > Changes for v3:
> >   - Converted the driver to use runtime_pm.
> > Changes for v2:
> >   - Removed the struct platform_device* from suspend/resume
> >     as suggested by Lothar.
> > 
> >  drivers/net/can/xilinx_can.c |  123 +++++++++++++++++++++++++-----------------
> >  1 files changed, 74 insertions(+), 49 deletions(-)
> > 
> > diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
> > index 6c67643..c71f683 100644
> > --- a/drivers/net/can/xilinx_can.c
> > +++ b/drivers/net/can/xilinx_can.c
> > @@ -32,6 +32,7 @@
> >  #include <linux/can/dev.h>
> >  #include <linux/can/error.h>
> >  #include <linux/can/led.h>
> > +#include <linux/pm_runtime.h>
> >  
> >  #define DRIVER_NAME	"xilinx_can"
> >  
> > @@ -138,7 +139,7 @@ struct xcan_priv {
> >  	u32 (*read_reg)(const struct xcan_priv *priv, enum xcan_reg reg);
> >  	void (*write_reg)(const struct xcan_priv *priv, enum xcan_reg reg,
> >  			u32 val);
> > -	struct net_device *dev;
> > +	struct device *dev;
> >  	void __iomem *reg_base;
> >  	unsigned long irq_flags;
> >  	struct clk *bus_clk;
> > @@ -842,6 +843,13 @@ static int xcan_open(struct net_device *ndev)
> >  	struct xcan_priv *priv = netdev_priv(ndev);
> >  	int ret;
> >  
> > +	ret = pm_runtime_get_sync(priv->dev);
> > +	if (ret < 0) {
> > +		netdev_err(ndev, "%s: pm_runtime_get failed\r(%d)\n\r",
> > +				__func__, ret);
> > +		return ret;
> > +	}
> > +
> >  	ret = request_irq(ndev->irq, xcan_interrupt, priv->irq_flags,
> >  			ndev->name, ndev);
> >  	if (ret < 0) {
> > @@ -849,29 +857,17 @@ static int xcan_open(struct net_device *ndev)
> >  		goto err;
> >  	}
> >  
> > -	ret = clk_prepare_enable(priv->can_clk);
> > -	if (ret) {
> > -		netdev_err(ndev, "unable to enable device clock\n");
> > -		goto err_irq;
> > -	}
> > -
> > -	ret = clk_prepare_enable(priv->bus_clk);
> > -	if (ret) {
> > -		netdev_err(ndev, "unable to enable bus clock\n");
> > -		goto err_can_clk;
> > -	}
> > -
> >  	/* Set chip into reset mode */
> >  	ret = set_reset_mode(ndev);
> >  	if (ret < 0) {
> >  		netdev_err(ndev, "mode resetting failed!\n");
> > -		goto err_bus_clk;
> > +		goto err_irq;
> >  	}
> >  
> >  	/* Common open */
> >  	ret = open_candev(ndev);
> >  	if (ret)
> > -		goto err_bus_clk;
> > +		goto err_irq;
> >  
> >  	ret = xcan_chip_start(ndev);
> >  	if (ret < 0) {
> > @@ -887,13 +883,11 @@ static int xcan_open(struct net_device *ndev)
> >  
> >  err_candev:
> >  	close_candev(ndev);
> > -err_bus_clk:
> > -	clk_disable_unprepare(priv->bus_clk);
> > -err_can_clk:
> > -	clk_disable_unprepare(priv->can_clk);
> >  err_irq:
> >  	free_irq(ndev->irq, ndev);
> >  err:
> > +	pm_runtime_put(priv->dev);
> > +
> >  	return ret;
> >  }
> >  
> > @@ -910,12 +904,11 @@ static int xcan_close(struct net_device *ndev)
> >  	netif_stop_queue(ndev);
> >  	napi_disable(&priv->napi);
> >  	xcan_chip_stop(ndev);
> > -	clk_disable_unprepare(priv->bus_clk);
> > -	clk_disable_unprepare(priv->can_clk);
> >  	free_irq(ndev->irq, ndev);
> >  	close_candev(ndev);
> >  
> >  	can_led_event(ndev, CAN_LED_EVENT_STOP);
> > +	pm_runtime_put(priv->dev);
> >  
> >  	return 0;
> >  }
> > @@ -934,27 +927,20 @@ static int xcan_get_berr_counter(const struct net_device *ndev,
> >  	struct xcan_priv *priv = netdev_priv(ndev);
> >  	int ret;
> >  
> > -	ret = clk_prepare_enable(priv->can_clk);
> > -	if (ret)
> > -		goto err;
> > -
> > -	ret = clk_prepare_enable(priv->bus_clk);
> > -	if (ret)
> > -		goto err_clk;
> > +	ret = pm_runtime_get_sync(priv->dev);
> > +	if (ret < 0) {
> > +		netdev_err(ndev, "%s: pm_runtime_get failed\r(%d)\n\r",
> > +				__func__, ret);
> 
> Please remove the \r from the error messages.
> 
> > +		return ret;
> > +	}
> >  
> >  	bec->txerr = priv->read_reg(priv, XCAN_ECR_OFFSET) & XCAN_ECR_TEC_MASK;
> >  	bec->rxerr = ((priv->read_reg(priv, XCAN_ECR_OFFSET) &
> >  			XCAN_ECR_REC_MASK) >> XCAN_ESR_REC_SHIFT);
> >  
> > -	clk_disable_unprepare(priv->bus_clk);
> > -	clk_disable_unprepare(priv->can_clk);
> > +	pm_runtime_put(priv->dev);
> >  
> >  	return 0;
> > -
> > -err_clk:
> > -	clk_disable_unprepare(priv->can_clk);
> > -err:
> > -	return ret;
> >  }
> >  
> >  
> > @@ -967,15 +953,45 @@ static const struct net_device_ops xcan_netdev_ops = {
> >  
> >  /**
> >   * xcan_suspend - Suspend method for the driver
> > - * @dev:	Address of the platform_device structure
> > + * @dev:	Address of the device structure
> >   *
> >   * Put the driver into low power mode.
> > - * Return: 0 always
> > + * Return: 0 on success and failure value on error
> >   */
> >  static int __maybe_unused xcan_suspend(struct device *dev)
> >  {
> > -	struct platform_device *pdev = dev_get_drvdata(dev);
> > -	struct net_device *ndev = platform_get_drvdata(pdev);
> > +	if (!device_may_wakeup(dev))
> > +		return pm_runtime_force_suspend(dev);
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * xcan_resume - Resume from suspend
> > + * @dev:	Address of the device structure
> > + *
> > + * Resume operation after suspend.
> > + * Return: 0 on success and failure value on error
> > + */
> > +static int __maybe_unused xcan_resume(struct device *dev)
> > +{
> > +	if (!device_may_wakeup(dev))
> > +		return pm_runtime_force_resume(dev);
> > +
> > +	return 0;
> > +
> > +}
> > +
> > +/**
> > + * xcan_runtime_suspend - Runtime suspend method for the driver
> > + * @dev:	Address of the device structure
> > + *
> > + * Put the driver into low power mode.
> > + * Return: 0 always
> > + */
> > +static int __maybe_unused xcan_runtime_suspend(struct device *dev)
> > +{
> > +	struct net_device *ndev = dev_get_drvdata(dev);
> >  	struct xcan_priv *priv = netdev_priv(ndev);
> >  
> >  	if (netif_running(ndev)) {
> > @@ -993,16 +1009,15 @@ static int __maybe_unused xcan_suspend(struct device *dev)
> >  }
> >  
> >  /**
> > - * xcan_resume - Resume from suspend
> > - * @dev:	Address of the platformdevice structure
> > + * xcan_runtime_resume - Runtime resume from suspend
> > + * @dev:	Address of the device structure
> >   *
> >   * Resume operation after suspend.
> >   * Return: 0 on success and failure value on error
> >   */
> > -static int __maybe_unused xcan_resume(struct device *dev)
> > +static int __maybe_unused xcan_runtime_resume(struct device *dev)
> >  {
> > -	struct platform_device *pdev = dev_get_drvdata(dev);
> > -	struct net_device *ndev = platform_get_drvdata(pdev);
> > +	struct net_device *ndev = dev_get_drvdata(dev);
> >  	struct xcan_priv *priv = netdev_priv(ndev);
> >  	int ret;
> 
> Some more context:
> 
> > 	ret = clk_enable(priv->bus_clk);
> > 	if (ret) {
> > 		dev_err(dev, "Cannot enable clock.\n");
> > 		return ret;
> > 	}
> > 	ret = clk_enable(priv->can_clk);
> > 	if (ret) {
> > 		dev_err(dev, "Cannot enable clock.\n");
> > 		clk_disable_unprepare(priv->bus_clk);
> 
> This disable_unprepare looks wrong, should be a disable only.
> 
> > 		return ret;
> > 	}
> > 
> > 	priv->write_reg(priv, XCAN_MSR_OFFSET, 0);
> > 	priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_CEN_MASK);
> > 
> > 	if (netif_running(ndev)) {
> > 		priv->can.state = CAN_STATE_ERROR_ACTIVE;
> 
> What happens if the device was not in ACTIVE state prior to the
> runtime_suspend?
> 
> > 		netif_device_attach(ndev);
> > 		netif_start_queue(ndev);
> > 	}
> > 
> > 	return 0;
> > }
> 
> 
> >  
> > @@ -1020,9 +1035,9 @@ static int __maybe_unused xcan_resume(struct device *dev)
> >  
> >  	priv->write_reg(priv, XCAN_MSR_OFFSET, 0);
> >  	priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_CEN_MASK);
> > -	priv->can.state = CAN_STATE_ERROR_ACTIVE;
> >  
> >  	if (netif_running(ndev)) {
> > +		priv->can.state = CAN_STATE_ERROR_ACTIVE;
> >  		netif_device_attach(ndev);
> >  		netif_start_queue(ndev);
> >  	}
> > @@ -1030,7 +1045,10 @@ static int __maybe_unused xcan_resume(struct device *dev)
> >  	return 0;
> >  }
> >  
> > -static SIMPLE_DEV_PM_OPS(xcan_dev_pm_ops, xcan_suspend, xcan_resume);
> > +static const struct dev_pm_ops xcan_dev_pm_ops = {
> > +	SET_SYSTEM_SLEEP_PM_OPS(xcan_suspend, xcan_resume)
> > +	SET_PM_RUNTIME_PM_OPS(xcan_runtime_suspend, xcan_runtime_resume, NULL)
> > +};
> >  
> >  /**
> >   * xcan_probe - Platform registration call
> > @@ -1071,7 +1089,7 @@ static int xcan_probe(struct platform_device *pdev)
> >  		return -ENOMEM;
> >  
> >  	priv = netdev_priv(ndev);
> > -	priv->dev = ndev;
> > +	priv->dev = &pdev->dev;
> >  	priv->can.bittiming_const = &xcan_bittiming_const;
> >  	priv->can.do_set_mode = xcan_do_set_mode;
> >  	priv->can.do_get_berr_counter = xcan_get_berr_counter;
> > @@ -1137,15 +1155,22 @@ static int xcan_probe(struct platform_device *pdev)
> >  
> >  	netif_napi_add(ndev, &priv->napi, xcan_rx_poll, rx_max);
> >  
> > +	pm_runtime_set_active(&pdev->dev);
> > +	pm_runtime_irq_safe(&pdev->dev);
> > +	pm_runtime_enable(&pdev->dev);
> > +	pm_runtime_get_sync(&pdev->dev);
> Check error values?
> > +
> >  	ret = register_candev(ndev);
> >  	if (ret) {
> >  		dev_err(&pdev->dev, "fail to register failed (err=%d)\n", ret);
> > +		pm_runtime_put(priv->dev);
> 
> Please move the pm_runtime_put into the common error exit path.
> 
> >  		goto err_unprepare_disable_busclk;
> >  	}
> >  
> >  	devm_can_led_init(ndev);
> > -	clk_disable_unprepare(priv->bus_clk);
> > -	clk_disable_unprepare(priv->can_clk);
> > +
> > +	pm_runtime_put(&pdev->dev);
> > +
> >  	netdev_dbg(ndev, "reg_base=0x%p irq=%d clock=%d, tx fifo depth:%d\n",
> >  			priv->reg_base, ndev->irq, priv->can.clock.freq,
> >  			priv->tx_max);
> > 
> 
> I think you have to convert the _remove() function, too. Have a look at
> the gpio-zynq.c driver:
> 
> > static int zynq_gpio_remove(struct platform_device *pdev)
> > {
> > 	struct zynq_gpio *gpio = platform_get_drvdata(pdev);
> > 
> > 	pm_runtime_get_sync(&pdev->dev);
> 
> However I don't understand why the get_sync() is here. Maybe Sören can help?

IIRC, the concern was that the remove function may be called while the device is
runtime suspended. Hence the remove function needs to resume the device since the
remove function may access the HW.

	Sören
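
The reasoning above can be illustrated with a toy user-space model. This is
not the kernel's runtime PM API: struct toy_dev and the toy_* helpers are
hypothetical stand-ins that only mimic the reference-counting idea, and the
real pm_runtime_put() merely requests an idle check rather than suspending
immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a device under runtime PM; not struct device. */
struct toy_dev {
	int usage_count;	/* outstanding "get" references */
	bool powered;		/* true while clocks are running */
	int reg;		/* pretend hardware register */
};

/* Models pm_runtime_get_sync(): take a reference and resume if needed. */
static void toy_get_sync(struct toy_dev *d)
{
	d->usage_count++;
	d->powered = true;
}

/* Models pm_runtime_put(): drop a reference, suspend when unused. */
static void toy_put(struct toy_dev *d)
{
	assert(d->usage_count > 0);
	if (--d->usage_count == 0)
		d->powered = false;
}

/* Register access only succeeds while the device is powered. */
static bool toy_write_reg(struct toy_dev *d, int val)
{
	if (!d->powered)
		return false;
	d->reg = val;
	return true;
}

/* Remove path: resume first so the hardware access is valid. */
static bool toy_remove(struct toy_dev *d)
{
	bool ok;

	toy_get_sync(d);
	ok = toy_write_reg(d, 0);
	toy_put(d);

	return ok;
}
```

In this model, a register write on a suspended device fails, while
toy_remove() succeeds because it resumes the device before touching it.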


* Re: [PATCH 2/6] vxlan: Group Policy extension
From: Tom Herbert @ 2015-01-07 16:05 UTC
  To: Thomas Graf
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Linux Netdev List, dev@openvswitch.org
In-Reply-To: <2b43c19c47e3702b370054f8fd11233e44179edf.1420594925.git.tgraf@suug.ch>

On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> Implements support for the Group Policy VXLAN extension [0] to provide
> a lightweight and simple security label mechanism across network peers
> based on VXLAN. The security context and associated metadata is mapped
> to/from skb->mark. This allows further mapping to a SELinux context
> using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
> tc, etc.
>
> The group membership is defined by the lower 16 bits of skb->mark, the
> upper 16 bits are used for flags.
>
> SELinux allows managing labels to secure local resources. However,
> distributed applications require ACLs to be implemented across hosts. This
> is typically achieved by matching on L2-L4 fields to identify the
> original sending host and process on the receiver. On top of that,
> netlabel and specifically CIPSO [1] allow mapping security contexts to
> universal labels.  However, netlabel and CIPSO are relatively complex.
> This patch provides a lightweight alternative for overlay network
> environments with a trusted underlay. No additional control protocol
> is required.
>
Associating a sixteen bit field with security is worrisome, especially
considering that VXLAN provides no verification for any header fields
and doesn't even advocate use of outer UDP checksum so the field is
susceptible to an undetected single bit flip. The concept of a
"trusted underlay" is weak justification and hardly universal, so the
only way to actually secure this is through IPsec (this is mentioned
in the VXLAN-GBP draft). But if we have the security state of IPsec
then why would we need this field anyway?

Could this same functionality be achieved if we just match the VNI to
a mark in iptables?

Tom
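
For readers following the discussion, the skb->mark layout proposed in the
quoted patch can be sketched in plain C. The bit positions are taken from
the VXLAN_GBP_* macros in the patch; the GBP_* names and the helper
functions below are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Bit positions as defined by the VXLAN_GBP_* macros in the patch. */
#define GBP_DONT_LEARN		(UINT32_C(1) << (6 + 16))
#define GBP_POLICY_APPLIED	(UINT32_C(1) << (3 + 16))
#define GBP_ID_MASK		UINT32_C(0xFFFF)

/* Build a mark: group ID in the lower 16 bits, flags in the upper 16. */
static uint32_t gbp_make_mark(uint16_t policy_id, int dont_learn,
			      int policy_applied)
{
	uint32_t mark = policy_id;

	if (dont_learn)
		mark |= GBP_DONT_LEARN;
	if (policy_applied)
		mark |= GBP_POLICY_APPLIED;

	return mark;
}

/* Extract the group ID from a mark. */
static uint16_t gbp_policy_id(uint32_t mark)
{
	return (uint16_t)(mark & GBP_ID_MASK);
}
```

The iptables example in the commit message sets mark 0x200, which this
layout reads as group 0x200 with no flags. It also makes the bit-flip
concern concrete: flipping the lowest bit of that mark yields group 0x201,
and nothing in the mark itself reveals the change.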

>            Host 1:                       Host 2:
>
>       Group A        Group B        Group B     Group A
>       +-----+   +-------------+    +-------+   +-----+
>       | lxc |   | SELinux CTX |    | httpd |   | VM  |
>       +--+--+   +--+----------+    +---+---+   +--+--+
>           \---+---/                     \----+---/
>               |                              |
>           +---+---+                      +---+---+
>           | vxlan |                      | vxlan |
>           +---+---+                      +---+---+
>               +------------------------------+
>
> Backwards compatibility:
> A VXLAN-GBP socket can receive standard VXLAN frames and will assign
> the default group 0x0000 to such frames. A Linux VXLAN socket will
> drop VXLAN-GBP  frames. The extension is therefore disabled by default
> and needs to be specifically enabled:
>
>    ip link add [...] type vxlan [...] gbp
>
> In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
> must run on a separate port number.
>
> Examples:
>   iptables:
>   $ iptables -I OUTPUT -p icmp -j MARK --set-mark 0x200
>   $ iptables -I INPUT -i br0 -m mark --mark 0x200 -j ACCEPT
>
>   OVS (patches provided separately):
>   in_port=1, actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL
>
> [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
> [1] http://lwn.net/Articles/204905/
>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
>  drivers/net/vxlan.c           | 155 ++++++++++++++++++++++++++++++------------
>  include/net/vxlan.h           |  80 ++++++++++++++++++++--
>  include/uapi/linux/if_link.h  |   8 +++
>  net/openvswitch/vport-vxlan.c |   9 ++-
>  4 files changed, 197 insertions(+), 55 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 4d52aa9..30b7b59 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -132,6 +132,7 @@ struct vxlan_dev {
>         __u8              tos;          /* TOS override */
>         __u8              ttl;
>         u32               flags;        /* VXLAN_F_* in vxlan.h */
> +       u32               exts;         /* Enabled extensions */
>
>         struct work_struct sock_work;
>         struct work_struct igmp_join;
> @@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
>                         continue;
>
>                 vh2 = (struct vxlanhdr *)(p->data + off_vx);
> -               if (vh->vx_vni != vh2->vx_vni) {
> +               if (vh->vx_flags != vh2->vx_flags ||
> +                   vh->vx_vni != vh2->vx_vni) {
>                         NAPI_GRO_CB(p)->same_flow = 0;
>                         continue;
>                 }
> @@ -1095,6 +1097,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>  {
>         struct vxlan_sock *vs;
>         struct vxlanhdr *vxh;
> +       struct vxlan_metadata md = {0};
>
>         /* Need Vxlan and inner Ethernet header to be present */
>         if (!pskb_may_pull(skb, VXLAN_HLEN))
> @@ -1113,6 +1116,19 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>         if (vs->exts) {
>                 if (!vxh->vni_present)
>                         goto error_invalid_header;
> +
> +               if (vxh->gbp_present) {
> +                       if (!(vs->exts & VXLAN_EXT_GBP))
> +                               goto error_invalid_header;
> +
> +                       md.gbp = ntohs(vxh->gbp.policy_id);
> +
> +                       if (vxh->gbp.dont_learn)
> +                               md.gbp |= VXLAN_GBP_DONT_LEARN;
> +
> +                       if (vxh->gbp.policy_applied)
> +                               md.gbp |= VXLAN_GBP_POLICY_APPLIED;
> +               }
>         } else {
>                 if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
>                     (vxh->vx_vni & htonl(0xff)))
> @@ -1122,7 +1138,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>         if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
>                 goto drop;
>
> -       vs->rcv(vs, skb, vxh->vx_vni);
> +       md.vni = vxh->vx_vni;
> +       vs->rcv(vs, skb, &md);
>         return 0;
>
>  drop:
> @@ -1138,8 +1155,8 @@ error:
>         return 1;
>  }
>
> -static void vxlan_rcv(struct vxlan_sock *vs,
> -                     struct sk_buff *skb, __be32 vx_vni)
> +static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
> +                     struct vxlan_metadata *md)
>  {
>         struct iphdr *oip = NULL;
>         struct ipv6hdr *oip6 = NULL;
> @@ -1150,7 +1167,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
>         int err = 0;
>         union vxlan_addr *remote_ip;
>
> -       vni = ntohl(vx_vni) >> 8;
> +       vni = ntohl(md->vni) >> 8;
>         /* Is this VNI defined? */
>         vxlan = vxlan_vs_find_vni(vs, vni);
>         if (!vxlan)
> @@ -1184,6 +1201,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
>                 goto drop;
>
>         skb_reset_network_header(skb);
> +       skb->mark = md->gbp;
>
>         if (oip6)
>                 err = IP6_ECN_decapsulate(oip6, skb);
> @@ -1533,15 +1551,54 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
>         return false;
>  }
>
> +static int vxlan_build_hdr(struct sk_buff *skb, struct vxlan_sock *vs,
> +                          int min_headroom, struct vxlan_metadata *md)
> +{
> +       struct vxlanhdr *vxh;
> +       int err;
> +
> +       /* Need space for new headers (invalidates iph ptr) */
> +       err = skb_cow_head(skb, min_headroom);
> +       if (unlikely(err)) {
> +               kfree_skb(skb);
> +               return err;
> +       }
> +
> +       skb = vlan_hwaccel_push_inside(skb);
> +       if (WARN_ON(!skb))
> +               return -ENOMEM;
> +
> +       vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> +       vxh->vx_flags = htonl(VXLAN_FLAGS);
> +       vxh->vx_vni = md->vni;
> +
> +       if (vs->exts)  {
> +               if (vs->exts & VXLAN_EXT_GBP) {
> +                       vxh->gbp_present = 1;
> +
> +                       if (md->gbp & VXLAN_GBP_DONT_LEARN)
> +                               vxh->gbp.dont_learn = 1;
> +
> +                       if (md->gbp & VXLAN_GBP_POLICY_APPLIED)
> +                               vxh->gbp.policy_applied = 1;
> +
> +                       vxh->gbp.policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
> +               }
> +       }
> +
> +       skb_set_inner_protocol(skb, htons(ETH_P_TEB));
> +
> +       return 0;
> +}
> +
>  #if IS_ENABLED(CONFIG_IPV6)
>  static int vxlan6_xmit_skb(struct vxlan_sock *vs,
>                            struct dst_entry *dst, struct sk_buff *skb,
>                            struct net_device *dev, struct in6_addr *saddr,
>                            struct in6_addr *daddr, __u8 prio, __u8 ttl,
> -                          __be16 src_port, __be16 dst_port, __be32 vni,
> -                          bool xnet)
> +                          __be16 src_port, __be16 dst_port,
> +                          struct vxlan_metadata *md, bool xnet)
>  {
> -       struct vxlanhdr *vxh;
>         int min_headroom;
>         int err;
>         bool udp_sum = !udp_get_no_check6_tx(vs->sock->sk);
> @@ -1558,24 +1615,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
>                         + VXLAN_HLEN + sizeof(struct ipv6hdr)
>                         + (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
>
> -       /* Need space for new headers (invalidates iph ptr) */
> -       err = skb_cow_head(skb, min_headroom);
> -       if (unlikely(err)) {
> -               kfree_skb(skb);
> -               goto err;
> -       }
> -
> -       skb = vlan_hwaccel_push_inside(skb);
> -       if (WARN_ON(!skb)) {
> -               err = -ENOMEM;
> +       err = vxlan_build_hdr(skb, vs, min_headroom, md);
> +       if (err)
>                 goto err;
> -       }
> -
> -       vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> -       vxh->vx_flags = htonl(VXLAN_FLAGS);
> -       vxh->vx_vni = vni;
> -
> -       skb_set_inner_protocol(skb, htons(ETH_P_TEB));
>
>         udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
>                              ttl, src_port, dst_port);
> @@ -1589,9 +1631,9 @@ err:
>  int vxlan_xmit_skb(struct vxlan_sock *vs,
>                    struct rtable *rt, struct sk_buff *skb,
>                    __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
> -                  __be16 src_port, __be16 dst_port, __be32 vni, bool xnet)
> +                  __be16 src_port, __be16 dst_port,
> +                  struct vxlan_metadata *md, bool xnet)
>  {
> -       struct vxlanhdr *vxh;
>         int min_headroom;
>         int err;
>         bool udp_sum = !vs->sock->sk->sk_no_check_tx;
> @@ -1604,22 +1646,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
>                         + VXLAN_HLEN + sizeof(struct iphdr)
>                         + (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
>
> -       /* Need space for new headers (invalidates iph ptr) */
> -       err = skb_cow_head(skb, min_headroom);
> -       if (unlikely(err)) {
> -               kfree_skb(skb);
> +       err = vxlan_build_hdr(skb, vs, min_headroom, md);
> +       if (err)
>                 return err;
> -       }
> -
> -       skb = vlan_hwaccel_push_inside(skb);
> -       if (WARN_ON(!skb))
> -               return -ENOMEM;
> -
> -       vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> -       vxh->vx_flags = htonl(VXLAN_FLAGS);
> -       vxh->vx_vni = vni;
> -
> -       skb_set_inner_protocol(skb, htons(ETH_P_TEB));
>
>         return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
>                                    ttl, df, src_port, dst_port, xnet);
> @@ -1679,6 +1708,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>         const struct iphdr *old_iph;
>         struct flowi4 fl4;
>         union vxlan_addr *dst;
> +       struct vxlan_metadata md;
>         __be16 src_port = 0, dst_port;
>         u32 vni;
>         __be16 df = 0;
> @@ -1749,11 +1779,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>
>                 tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
>                 ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
> +               md.vni = htonl(vni << 8);
> +               md.gbp = skb->mark;
>
>                 err = vxlan_xmit_skb(vxlan->vn_sock, rt, skb,
>                                      fl4.saddr, dst->sin.sin_addr.s_addr,
> -                                    tos, ttl, df, src_port, dst_port,
> -                                    htonl(vni << 8),
> +                                    tos, ttl, df, src_port, dst_port, &md,
>                                      !net_eq(vxlan->net, dev_net(vxlan->dev)));
>                 if (err < 0) {
>                         /* skb is already freed. */
> @@ -1806,10 +1837,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>                 }
>
>                 ttl = ttl ? : ip6_dst_hoplimit(ndst);
> +               md.vni = htonl(vni << 8);
> +               md.gbp = skb->mark;
>
>                 err = vxlan6_xmit_skb(vxlan->vn_sock, ndst, skb,
>                                       dev, &fl6.saddr, &fl6.daddr, 0, ttl,
> -                                     src_port, dst_port, htonl(vni << 8),
> +                                     src_port, dst_port, &md,
>                                       !net_eq(vxlan->net, dev_net(vxlan->dev)));
>  #endif
>         }
> @@ -2210,6 +2243,11 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
>         [IFLA_VXLAN_UDP_CSUM]   = { .type = NLA_U8 },
>         [IFLA_VXLAN_UDP_ZERO_CSUM6_TX]  = { .type = NLA_U8 },
>         [IFLA_VXLAN_UDP_ZERO_CSUM6_RX]  = { .type = NLA_U8 },
> +       [IFLA_VXLAN_EXTENSION]  = { .type = NLA_NESTED },
> +};
> +
> +static const struct nla_policy vxlan_ext_policy[IFLA_VXLAN_EXT_MAX + 1] = {
> +       [IFLA_VXLAN_EXT_GBP]    = { .type = NLA_FLAG, },
>  };
>
>  static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
> @@ -2246,6 +2284,18 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
>                 }
>         }
>
> +       if (data[IFLA_VXLAN_EXTENSION]) {
> +               int err;
> +
> +               err = nla_validate_nested(data[IFLA_VXLAN_EXTENSION],
> +                                         IFLA_VXLAN_EXT_MAX, vxlan_ext_policy);
> +               if (err < 0) {
> +                       pr_debug("invalid VXLAN extension configuration: %d\n",
> +                                err);
> +                       return -EINVAL;
> +               }
> +       }
> +
>         return 0;
>  }
>
> @@ -2400,6 +2450,18 @@ static void vxlan_sock_work(struct work_struct *work)
>         dev_put(vxlan->dev);
>  }
>
> +static void configure_vxlan_exts(struct vxlan_dev *vxlan, struct nlattr *attr)
> +{
> +       struct nlattr *exts[IFLA_VXLAN_EXT_MAX+1];
> +
> +       /* Validated in vxlan_validate() */
> +       if (nla_parse_nested(exts, IFLA_VXLAN_EXT_MAX, attr, NULL) < 0)
> +               BUG();
> +
> +       if (exts[IFLA_VXLAN_EXT_GBP])
> +               vxlan->exts |= VXLAN_EXT_GBP;
> +}
> +
>  static int vxlan_newlink(struct net *net, struct net_device *dev,
>                          struct nlattr *tb[], struct nlattr *data[])
>  {
> @@ -2525,6 +2587,9 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
>             nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
>                 vxlan->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
>
> +       if (data[IFLA_VXLAN_EXTENSION])
> +               configure_vxlan_exts(vxlan, data[IFLA_VXLAN_EXTENSION]);
> +
>         if (vxlan_find_vni(net, vni, use_ipv6 ? AF_INET6 : AF_INET,
>                            vxlan->dst_port)) {
>                 pr_info("duplicate VNI %u\n", vni);
> diff --git a/include/net/vxlan.h b/include/net/vxlan.h
> index 3e98d31..66000d0 100644
> --- a/include/net/vxlan.h
> +++ b/include/net/vxlan.h
> @@ -11,13 +11,60 @@
>  #define VNI_HASH_BITS  10
>  #define VNI_HASH_SIZE  (1<<VNI_HASH_BITS)
>
> +/*
> + * VXLAN Group Based Policy Extension:
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |1|-|-|-|1|-|-|-|R|D|R|R|A|R|R|R|        Group Policy ID        |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |                VXLAN Network Identifier (VNI) |   Reserved    |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + *
> + * D = Don't Learn bit. When set, this bit indicates that the egress
> + *     VTEP MUST NOT learn the source address of the encapsulated frame.
> + *
> + * A = Indicates that the group policy has already been applied to
> + *     this packet. Policies MUST NOT be applied by devices when the
> + *     A bit is set.
> + *
> + * [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
> + */
> +struct vxlan_gbp {
> +#ifdef __LITTLE_ENDIAN_BITFIELD
> +       __u8    reserved_flags1:3,
> +               policy_applied:1,
> +               reserved_flags2:2,
> +               dont_learn:1,
> +               reserved_flags3:1;
> +#elif defined(__BIG_ENDIAN_BITFIELD)
> +       __u8    reserved_flags1:1,
> +               dont_learn:1,
> +               reserved_flags2:2,
> +               policy_applied:1,
> +               reserved_flags3:3;
> +#else
> +#error "Please fix <asm/byteorder.h>"
> +#endif
> +       __be16 policy_id;
> +} __packed;
> +
> +/* skb->mark mapping
> + *
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |R|R|R|R|R|R|R|R|R|D|R|R|A|R|R|R|        Group Policy ID        |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + */
> +#define VXLAN_GBP_DONT_LEARN           (BIT(6) << 16)
> +#define VXLAN_GBP_POLICY_APPLIED       (BIT(3) << 16)
> +#define VXLAN_GBP_ID_MASK              (0xFFFF)
> +
>  /* VXLAN protocol header:
>   * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> - * |R|R|R|R|I|R|R|R|               Reserved                        |
> + * |G|R|R|R|I|R|R|R|               Reserved                        |
>   * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>   * |                VXLAN Network Identifier (VNI) |   Reserved    |
>   * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>   *
> + * G = 1       Group Policy (VXLAN-GBP)
>   * I = 1       VXLAN Network Identifier (VNI) present
>   */
>  struct vxlanhdr {
> @@ -26,24 +73,42 @@ struct vxlanhdr {
>  #ifdef __LITTLE_ENDIAN_BITFIELD
>                         __u8    reserved_flags1:3,
>                                 vni_present:1,
> -                               reserved_flags2:4;
> +                               reserved_flags2:3,
> +                               gbp_present:1;
>  #elif defined(__BIG_ENDIAN_BITFIELD)
> -                       __u8    reserved_flags2:4,
> +                       __u8    gbp_present:1,
> +                               reserved_flags2:3,
>                                 vni_present:1,
>                                 reserved_flags1:3;
>  #else
>  #error "Please fix <asm/byteorder.h>"
>  #endif
> -                       __u8    vx_reserved1;
> -                       __be16  vx_reserved2;
> +                       union {
> +                               /* NOTE: Offset 0 will be 1 byte aligned, so
> +                                * all member structs must be marked packed.
> +                                */
> +                               struct vxlan_gbp gbp;
> +                               struct {
> +                                       __u8    vx_reserved1;
> +                                       __be16  vx_reserved2;
> +                               } __packed;
> +                       };
>                 };
>                 __be32 vx_flags;
>         };
>         __be32  vx_vni;
>  };
>
> +struct vxlan_metadata {
> +       __be32          vni;
> +       u32             gbp;
> +};
> +
>  struct vxlan_sock;
> -typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb, __be32 key);
> +typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb,
> +                          struct vxlan_metadata *md);
> +
> +#define VXLAN_EXT_GBP                  BIT(0)
>
>  /* per UDP socket information */
>  struct vxlan_sock {
> @@ -78,7 +143,8 @@ void vxlan_sock_release(struct vxlan_sock *vs);
>  int vxlan_xmit_skb(struct vxlan_sock *vs,
>                    struct rtable *rt, struct sk_buff *skb,
>                    __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
> -                  __be16 src_port, __be16 dst_port, __be32 vni, bool xnet);
> +                  __be16 src_port, __be16 dst_port, struct vxlan_metadata *md,
> +                  bool xnet);
>
>  static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
>                                                      netdev_features_t features)
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index f7d0d2d..9f07bf5 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -370,10 +370,18 @@ enum {
>         IFLA_VXLAN_UDP_CSUM,
>         IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
>         IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
> +       IFLA_VXLAN_EXTENSION,
>         __IFLA_VXLAN_MAX
>  };
>  #define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
>
> +enum {
> +       IFLA_VXLAN_EXT_UNSPEC,
> +       IFLA_VXLAN_EXT_GBP,
> +       __IFLA_VXLAN_EXT_MAX,
> +};
> +#define IFLA_VXLAN_EXT_MAX (__IFLA_VXLAN_EXT_MAX - 1)
> +
>  struct ifla_vxlan_port_range {
>         __be16  low;
>         __be16  high;
> diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
> index d7c46b3..dd68c97 100644
> --- a/net/openvswitch/vport-vxlan.c
> +++ b/net/openvswitch/vport-vxlan.c
> @@ -59,7 +59,8 @@ static inline struct vxlan_port *vxlan_vport(const struct vport *vport)
>  }
>
>  /* Called with rcu_read_lock and BH disabled. */
> -static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
> +static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
> +                     struct vxlan_metadata *md)
>  {
>         struct ovs_tunnel_info tun_info;
>         struct vport *vport = vs->data;
> @@ -68,7 +69,7 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
>
>         /* Save outer tunnel values */
>         iph = ip_hdr(skb);
> -       key = cpu_to_be64(ntohl(vx_vni) >> 8);
> +       key = cpu_to_be64(ntohl(md->vni) >> 8);
>         ovs_flow_tun_info_init(&tun_info, iph,
>                                udp_hdr(skb)->source, udp_hdr(skb)->dest,
>                                key, TUNNEL_KEY, NULL, 0);
> @@ -146,6 +147,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
>         struct vxlan_port *vxlan_port = vxlan_vport(vport);
>         __be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
>         struct ovs_key_ipv4_tunnel *tun_key;
> +       struct vxlan_metadata md;
>         struct rtable *rt;
>         struct flowi4 fl;
>         __be16 src_port;
> @@ -178,12 +180,13 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
>         skb->ignore_df = 1;
>
>         src_port = udp_flow_src_port(net, skb, 0, 0, true);
> +       md.vni = htonl(be64_to_cpu(tun_key->tun_id) << 8);
>
>         err = vxlan_xmit_skb(vxlan_port->vs, rt, skb,
>                              fl.saddr, tun_key->ipv4_dst,
>                              tun_key->ipv4_tos, tun_key->ipv4_ttl, df,
>                              src_port, dst_port,
> -                            htonl(be64_to_cpu(tun_key->tun_id) << 8),
> +                            &md,
>                              false);
>         if (err < 0)
>                 ip_rt_put(rt);
> --
> 1.9.3
>

^ permalink raw reply

* Re: [PATCH 2/6] vxlan: Group Policy extension
From: Thomas Graf @ 2015-01-07 16:21 UTC (permalink / raw)
  To: Tom Herbert
  Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, Linux Netdev List,
	Stephen Hemminger, David Miller
In-Reply-To: <CA+mtBx_Jj-tUM1nbHd2fHb0-=QpK3tcQgA=smWmg=cB-fupdGg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 01/07/15 at 08:05am, Tom Herbert wrote:
> Associating a sixteen bit field with security is worrisome, especially
> considering that VXLAN provides no verification for any header fields
> and doesn't even advocate use of outer UDP checksum so the field is
> susceptible to an undetected single bit flip. The concept of a
> "trusted underlay" is weak justification and hardly universal, so the
> only way to actually secure this is through IPsec (this is mentioned
> in the VXLAN-GBP draft).

As you state correctly, this work requires a trusted underlay which can
be achieved with IPsec, OpenVPN, SSH, ...

> But if we have the security state of IPsec then why would we need
> this field anyway?

It's a separation of concerns: the security label mechanism of the
overlay should not depend on a possible encryption layer in the
underlay, as not all of them provide a mechanism to label packets.
 
> Could this same functionality be achieved if we just match the VNI to
> a mark in IP tables?

If the VNI is not already used for another purpose, yes. The solution
as proposed can be integrated into existing VXLAN overlays separated by
VNI. It is also compatible with hardware VXLAN VTEPs which ignore the
reserved bits while continuing to maintain VNI separation.

^ permalink raw reply

* Re: [PATCH v4] can: Convert to runtime_pm
From: Marc Kleine-Budde @ 2015-01-07 16:30 UTC (permalink / raw)
  To: Sören Brinkmann
  Cc: Kedareswara rao Appana, wg, michal.simek, grant.likely, linux-can,
	netdev, linux-kernel, Kedareswara rao Appana
In-Reply-To: <a868331b44ce4b2a80b572399530f1f9@BN1BFFO11FD043.protection.gbl>

On 01/07/2015 04:58 PM, Sören Brinkmann wrote:
>> I think you have to convert the _remove() function, too. Have a look at
>> the gpio-zynq.c driver:
>>
>>> static int zynq_gpio_remove(struct platform_device *pdev)
>>> {
>>> 	struct zynq_gpio *gpio = platform_get_drvdata(pdev);
>>>
>>> 	pm_runtime_get_sync(&pdev->dev);
>>
>> However I don't understand why the get_sync() is here. Maybe Sören can help?
> 
> IIRC, the concern was that the remove function may be called while the device is
> runtime suspended. Hence the remove function needs to resume the device since the
> remove function may access the HW.

What about the corresponding runtime_put()? Would some counter be
unbalanced upon device removal?

I haven't tested it, but reloading a CAN driver should be easier than
reloading the gpio driver on an average embedded system. Kedareswara, please
test your driver with something like:
modprobe; ifconfig up; cansend; ifconfig down; rmmod in a loop.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



^ permalink raw reply

* Re: [PATCH v4] can: Convert to runtime_pm
From: Sören Brinkmann @ 2015-01-07 16:32 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Kedareswara rao Appana, wg, michal.simek, grant.likely, linux-can,
	netdev, linux-kernel, Kedareswara rao Appana
In-Reply-To: <54AD5F13.1020107@pengutronix.de>

On Wed, 2015-01-07 at 05:30PM +0100, Marc Kleine-Budde wrote:
> On 01/07/2015 04:58 PM, Sören Brinkmann wrote:
> >> I think you have to convert the _remove() function, too. Have a look at
> >> the gpio-zynq.c driver:
> >>
> >>> static int zynq_gpio_remove(struct platform_device *pdev)
> >>> {
> >>> 	struct zynq_gpio *gpio = platform_get_drvdata(pdev);
> >>>
> >>> 	pm_runtime_get_sync(&pdev->dev);
> >>
> >> However I don't understand why the get_sync() is here. Maybe Sören can help?
> > 
> > IIRC, the concern was that the remove function may be called while the device is
> > runtime suspended. Hence the remove function needs to resume the device since the
> > remove function may access the HW.
> 
> What about the corresponding runtime_put()? Would some counter be
> unbalanced upon device removal?

Aren't those counters destroyed with module unloading?

	Sören

^ permalink raw reply

* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
From: John Fastabend @ 2015-01-07 16:35 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, Linux Netdev List, David Miller,
	Andy Gospodarek
In-Reply-To: <CAJ3xEMiMsHW9Wys2GMhD9CRhU03DF3KZywGQkTWF1sMKn8x+fQ@mail.gmail.com>

On 01/07/2015 02:07 AM, Or Gerlitz wrote:
> On Mon, Jan 5, 2015 at 8:59 PM, John Fastabend <john.fastabend@gmail.com> wrote:
>>>> +struct sk_buff *net_flow_build_actions_msg(struct net_flow_action **a,
>>>> +                                          struct net_device *dev,
>>>> +                                          u32 portid, int seq, u8 cmd)
>>>> +{
>>>> +       struct genlmsghdr *hdr;
>>>> +       struct sk_buff *skb;
>>>> +       int err = -ENOBUFS;
>>>> +
>>>> +       skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>>>
>>>
>>> genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
>>
>>
>> fixed along with the other cases.
>
> small nit here, net_flow_build_actions_msg can be made static, it's
> called only from within this file
>
> few more nits... checkpatch --strict produces bunch of "CHECK: Please
> use a blank line after function/struct/union/enum declarations"
> comments, I guess worth fixing too.
>

Thanks. will fix in v2.

-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [PATCH v4] can: Convert to runtime_pm
From: Marc Kleine-Budde @ 2015-01-07 16:36 UTC (permalink / raw)
  To: Sören Brinkmann
  Cc: Kedareswara rao Appana, wg, michal.simek, grant.likely, linux-can,
	netdev, linux-kernel, Kedareswara rao Appana
In-Reply-To: <1d3216d75e9d494fb68ce9a3b24e7ca3@BL2FFO11FD017.protection.gbl>

On 01/07/2015 05:32 PM, Sören Brinkmann wrote:
> On Wed, 2015-01-07 at 05:30PM +0100, Marc Kleine-Budde wrote:
>> On 01/07/2015 04:58 PM, Sören Brinkmann wrote:
>>>> I think you have to convert the _remove() function, too. Have a look at
>>>> the gpio-zynq.c driver:
>>>>
>>>>> static int zynq_gpio_remove(struct platform_device *pdev)
>>>>> {
>>>>> 	struct zynq_gpio *gpio = platform_get_drvdata(pdev);
>>>>>
>>>>> 	pm_runtime_get_sync(&pdev->dev);
>>>>
>>>> However I don't understand why the get_sync() is here. Maybe Sören can help?
>>>
>>> IIRC, the concern was that the remove function may be called while the device is
>>> runtime suspended. Hence the remove function needs to resume the device since the
>>> remove function may access the HW.
>>
>> What about the corresponding runtime_put()? Would some counter be
>> unbalanced upon device removal?
> 
> Aren't those counters destroyed with module unloading?

I don't know.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



^ permalink raw reply

* [PATCH net-next 01/11] r8169:change rtl8168dp jumbo frame patch
From: Chunhao Lin @ 2015-01-07 16:40 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Chunhao Lin
In-Reply-To: <1420648826-12972-1-git-send-email-hau@realtek.com>

For RTL8168DP, the jumbo frame patch is the same as for RTL8168C. So use the
RTL8168C jumbo frame patch instead and remove the functions
"r8168dp_hw_jumbo_enable" and "r8168dp_hw_jumbo_disable".

Signed-off-by: Chunhao Lin <hau@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c | 28 ++++++----------------------
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 14a1c5c..2f97476 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4927,20 +4927,6 @@ static void r8168c_hw_jumbo_disable(struct rtl8169_private *tp)
 	rtl_tx_performance_tweak(tp->pci_dev, 0x5 << MAX_READ_REQUEST_SHIFT);
 }
 
-static void r8168dp_hw_jumbo_enable(struct rtl8169_private *tp)
-{
-	void __iomem *ioaddr = tp->mmio_addr;
-
-	RTL_W8(Config3, RTL_R8(Config3) | Jumbo_En0);
-}
-
-static void r8168dp_hw_jumbo_disable(struct rtl8169_private *tp)
-{
-	void __iomem *ioaddr = tp->mmio_addr;
-
-	RTL_W8(Config3, RTL_R8(Config3) & ~Jumbo_En0);
-}
-
 static void r8168e_hw_jumbo_enable(struct rtl8169_private *tp)
 {
 	void __iomem *ioaddr = tp->mmio_addr;
@@ -5014,16 +5000,13 @@ static void rtl_init_jumbo_ops(struct rtl8169_private *tp)
 	case RTL_GIGA_MAC_VER_24:
 	case RTL_GIGA_MAC_VER_25:
 	case RTL_GIGA_MAC_VER_26:
-		ops->disable	= r8168c_hw_jumbo_disable;
-		ops->enable	= r8168c_hw_jumbo_enable;
-		break;
 	case RTL_GIGA_MAC_VER_27:
 	case RTL_GIGA_MAC_VER_28:
-		ops->disable	= r8168dp_hw_jumbo_disable;
-		ops->enable	= r8168dp_hw_jumbo_enable;
+	case RTL_GIGA_MAC_VER_31:
+		ops->disable	= r8168c_hw_jumbo_disable;
+		ops->enable	= r8168c_hw_jumbo_enable;
 		break;
-	case RTL_GIGA_MAC_VER_31: /* Wild guess. Needs info from Realtek. */
-	case RTL_GIGA_MAC_VER_32:
+	case RTL_GIGA_MAC_VER_32: /* Wild guess. Needs info from Realtek. */
 	case RTL_GIGA_MAC_VER_33:
 	case RTL_GIGA_MAC_VER_34:
 		ops->disable	= r8168e_hw_jumbo_disable;
@@ -5758,7 +5741,8 @@ static void rtl_hw_start_8168d_4(struct rtl8169_private *tp)
 
 	rtl_csi_access_enable_1(tp);
 
-	rtl_tx_performance_tweak(pdev, 0x5 << MAX_READ_REQUEST_SHIFT);
+	if (tp->dev->mtu <= ETH_DATA_LEN)
+		rtl_tx_performance_tweak(pdev, 0x5 << MAX_READ_REQUEST_SHIFT);
 
 	RTL_W8(MaxTxPacketSize, TxPacketMax);
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 04/11] r8169:update rtl8168dp pcie ephy parameter
From: Chunhao Lin @ 2015-01-07 16:40 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Chunhao Lin
In-Reply-To: <1420648826-12972-1-git-send-email-hau@realtek.com>

Add the following ephy parameter, which saves more power in ASPM mode.
{ 0x10, 0x0004,	0x0000 }

Signed-off-by: Chunhao Lin <hau@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 0fc7e62..15a0f5b 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5733,7 +5733,8 @@ static void rtl_hw_start_8168d_4(struct rtl8169_private *tp)
 	static const struct ephy_info e_info_8168d_4[] = {
 		{ 0x0b, 0x0000,	0x0048 },
 		{ 0x19, 0x0020,	0x0050 },
-		{ 0x0c, 0x0100,	0x0020 }
+		{ 0x0c, 0x0100,	0x0020 },
+		{ 0x10, 0x0004,	0x0000 }
 	};
 
 	rtl_csi_access_enable_1(tp);
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 05/11] r8169:remove function rtl_hw_start_8168dp
From: Chunhao Lin @ 2015-01-07 16:40 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Chunhao Lin
In-Reply-To: <1420648826-12972-1-git-send-email-hau@realtek.com>

RTL_GIGA_MAC_VER_31 is RTL8168DP and it uses the same ephy parameters as
RTL_GIGA_MAC_VER_28. So use function rtl_hw_start_8168d_4 to set the
RTL_GIGA_MAC_VER_31 ephy parameters and remove the unnecessary function
rtl_hw_start_8168dp.

Signed-off-by: Chunhao Lin <hau@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c | 20 +-------------------
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 15a0f5b..48d1f78 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5711,21 +5711,6 @@ static void rtl_hw_start_8168d(struct rtl8169_private *tp)
 	RTL_W16(CPlusCmd, RTL_R16(CPlusCmd) & ~R8168_CPCMD_QUIRK_MASK);
 }
 
-static void rtl_hw_start_8168dp(struct rtl8169_private *tp)
-{
-	void __iomem *ioaddr = tp->mmio_addr;
-	struct pci_dev *pdev = tp->pci_dev;
-
-	rtl_csi_access_enable_1(tp);
-
-	if (tp->dev->mtu <= ETH_DATA_LEN)
-		rtl_tx_performance_tweak(pdev, 0x5 << MAX_READ_REQUEST_SHIFT);
-
-	RTL_W8(MaxTxPacketSize, TxPacketMax);
-
-	rtl_disable_clock_request(pdev);
-}
-
 static void rtl_hw_start_8168d_4(struct rtl8169_private *tp)
 {
 	void __iomem *ioaddr = tp->mmio_addr;
@@ -6272,11 +6257,8 @@ static void rtl_hw_start_8168(struct net_device *dev)
 		break;
 
 	case RTL_GIGA_MAC_VER_28:
-		rtl_hw_start_8168d_4(tp);
-		break;
-
 	case RTL_GIGA_MAC_VER_31:
-		rtl_hw_start_8168dp(tp);
+		rtl_hw_start_8168d_4(tp);
 		break;
 
 	case RTL_GIGA_MAC_VER_32:
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 08/11] r8169:update rtl8411 pcie ephy parameter
From: Chunhao Lin @ 2015-01-07 16:40 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Chunhao Lin
In-Reply-To: <1420648826-12972-1-git-send-email-hau@realtek.com>

rtl8411 may return to PCIe L0 from PCIe L0s low power mode too slowly.
The following ephy parameters address this issue.
{ 0x00, 0x0000,     0x0008 }
{ 0x0c, 0x3df0,     0x0200 }

Signed-off-by: Chunhao Lin <hau@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index bafa132..e92eece 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5869,17 +5869,19 @@ static void rtl_hw_start_8168f_1(struct rtl8169_private *tp)
 
 static void rtl_hw_start_8411(struct rtl8169_private *tp)
 {
-	static const struct ephy_info e_info_8168f_1[] = {
+	static const struct ephy_info e_info_8411[] = {
 		{ 0x06, 0x00c0,	0x0020 },
 		{ 0x0f, 0xffff,	0x5200 },
 		{ 0x1e, 0x0000,	0x4000 },
-		{ 0x19, 0x0000,	0x0224 }
+		{ 0x19, 0x0000,	0x0224 },
+		{ 0x00, 0x0000,	0x0008 },
+		{ 0x0c, 0x3df0,	0x0200 }
 	};
 
 	rtl_hw_start_8168f(tp);
 	rtl_pcie_state_l2l3_enable(tp, false);
 
-	rtl_ephy_init(tp, e_info_8168f_1, ARRAY_SIZE(e_info_8168f_1));
+	rtl_ephy_init(tp, e_info_8411, ARRAY_SIZE(e_info_8411));
 
 	rtl_w0w1_eri(tp, 0x0d4, ERIAR_MASK_0011, 0x0c00, 0x0000, ERIAR_EXGMAC);
 }
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 09/11] r8169:update rtl8105e pcie ephy parameter
From: Chunhao Lin @ 2015-01-07 16:40 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Chunhao Lin
In-Reply-To: <1420648826-12972-1-git-send-email-hau@realtek.com>

rtl8105e may return to PCIe L0 from PCIe L0s low power mode too slowly.
The following ephy parameter addresses this issue.
{ 0x05,	0, 0x2000 }

Signed-off-by: Chunhao Lin <hau@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index e92eece..ce98d2a 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6403,7 +6403,8 @@ static void rtl_hw_start_8105e_1(struct rtl8169_private *tp)
 		{ 0x03,	0, 0x0001 },
 		{ 0x19,	0, 0x0100 },
 		{ 0x19,	0, 0x0004 },
-		{ 0x0a,	0, 0x0020 }
+		{ 0x0a,	0, 0x0020 },
+		{ 0x05,	0, 0x2000 }
 	};
 
 	/* Force LAN exit from ASPM if Rx/Tx are not idle */
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 11/11] r8169:update rtl8168f rev.b pcie ephy parameter
From: Chunhao Lin @ 2015-01-07 16:40 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Chunhao Lin
In-Reply-To: <1420648826-12972-1-git-send-email-hau@realtek.com>

RTL8168F rev.B does not need to set the following two ephy parameters.
{ 0x06, 0x00c0,	0x0020 }
{ 0x08, 0x0001,	0x0002 }

Add function rtl_hw_start_8168f_2 to set RTL8168F rev.B ephy parameters,
instead of using function rtl_hw_start_8168f_1.

Signed-off-by: Chunhao Lin <hau@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index b8a097c..d62f8d8 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5867,6 +5867,26 @@ static void rtl_hw_start_8168f_1(struct rtl8169_private *tp)
 	RTL_W8(EEE_LED, RTL_R8(EEE_LED) & ~0x07);
 }
 
+static void rtl_hw_start_8168f_2(struct rtl8169_private *tp)
+{
+	void __iomem *ioaddr = tp->mmio_addr;
+	static const struct ephy_info e_info_8168f_2[] = {
+		{ 0x09, 0x0000,	0x0080 },
+		{ 0x19, 0x0000,	0x0224 },
+		{ 0x00, 0x0000,	0x0008 },
+		{ 0x0c, 0x3df0,	0x0200 }
+	};
+
+	rtl_hw_start_8168f(tp);
+
+	rtl_ephy_init(tp, e_info_8168f_2, ARRAY_SIZE(e_info_8168f_2));
+
+	rtl_w0w1_eri(tp, 0x0d4, ERIAR_MASK_0011, 0x0c00, 0xff00, ERIAR_EXGMAC);
+
+	/* Adjust EEE LED frequency */
+	RTL_W8(EEE_LED, RTL_R8(EEE_LED) & ~0x07);
+}
+
 static void rtl_hw_start_8411(struct rtl8169_private *tp)
 {
 	static const struct ephy_info e_info_8411[] = {
@@ -6276,9 +6296,11 @@ static void rtl_hw_start_8168(struct net_device *dev)
 		break;
 
 	case RTL_GIGA_MAC_VER_35:
-	case RTL_GIGA_MAC_VER_36:
 		rtl_hw_start_8168f_1(tp);
 		break;
+	case RTL_GIGA_MAC_VER_36:
+		rtl_hw_start_8168f_2(tp);
+		break;
 
 	case RTL_GIGA_MAC_VER_38:
 		rtl_hw_start_8411(tp);
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 00/11]r8169:update hardware parameter
From: Chunhao Lin @ 2015-01-07 16:40 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Chunhao Lin

This patch series includes ephy updates for the following adapters:
rtl8411
rtl8168f
rtl8168evl
rtl8168dp
rtl8105
rtl8402

It also updates the jumbo frame patch for the following adapters:
rtl8168dp
rtl8168e
rtl8168evl

Also remove the unnecessary function rtl_hw_start_8168dp, and add function
rtl_hw_start_8168f_2 to set the rtl8168f rev.b ephy parameters.

Chunhao Lin (11):
  r8169:change rtl8168dp jumbo frame patch
  r8169:update rtl8168e and rtl8168evl jumbo frame patch
  r8169:change the way of setting rtl8168dp ephy parameters.
  r8169:update rtl8168dp pcie ephy parameter
  r8169:remove function rtl_hw_start_8168dp
  r8169:update rtl8168evl pcie ephy parameter
  r8169:update rtl8168f pcie ephy parameter
  r8169:update rtl8411 pcie ephy parameter
  r8169:update rtl8105e pcie ephy parameter
  r8169:update rtl8402 pcie ephy parameter
  r8169:update rtl8168f rev.b pcie ephy parameter

 drivers/net/ethernet/realtek/r8169.c | 112 +++++++++++++++--------------------
 1 file changed, 49 insertions(+), 63 deletions(-)

-- 
1.9.1

^ permalink raw reply

* [PATCH net-next 02/11] r8169:update rtl8168e and rtl8168evl jumbo frame patch
From: Chunhao Lin @ 2015-01-07 16:40 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Chunhao Lin
In-Reply-To: <1420648826-12972-1-git-send-email-hau@realtek.com>

RTL8168E and RTL8168EVL do not need to change the PCIe max read request
size when jumbo frames are enabled.

Signed-off-by: Chunhao Lin <hau@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 2f97476..5bfd0b9 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4934,7 +4934,6 @@ static void r8168e_hw_jumbo_enable(struct rtl8169_private *tp)
 	RTL_W8(MaxTxPacketSize, 0x3f);
 	RTL_W8(Config3, RTL_R8(Config3) | Jumbo_En0);
 	RTL_W8(Config4, RTL_R8(Config4) | 0x01);
-	rtl_tx_performance_tweak(tp->pci_dev, 0x2 << MAX_READ_REQUEST_SHIFT);
 }
 
 static void r8168e_hw_jumbo_disable(struct rtl8169_private *tp)
@@ -4944,7 +4943,6 @@ static void r8168e_hw_jumbo_disable(struct rtl8169_private *tp)
 	RTL_W8(MaxTxPacketSize, 0x0c);
 	RTL_W8(Config3, RTL_R8(Config3) & ~Jumbo_En0);
 	RTL_W8(Config4, RTL_R8(Config4) & ~0x01);
-	rtl_tx_performance_tweak(tp->pci_dev, 0x5 << MAX_READ_REQUEST_SHIFT);
 }
 
 static void r8168b_0_hw_jumbo_enable(struct rtl8169_private *tp)
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 03/11] r8169:change the way of setting rtl8168dp ephy parameters.
From: Chunhao Lin @ 2015-01-07 16:40 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Chunhao Lin
In-Reply-To: <1420648826-12972-1-git-send-email-hau@realtek.com>

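The original loop always wrote the result to ephy register 0x03, regardless
of the entry's offset. Replace it with rtl_ephy_init() and update the
parameter table accordingly.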

Signed-off-by: Chunhao Lin <hau@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 5bfd0b9..0fc7e62 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5731,11 +5731,10 @@ static void rtl_hw_start_8168d_4(struct rtl8169_private *tp)
 	void __iomem *ioaddr = tp->mmio_addr;
 	struct pci_dev *pdev = tp->pci_dev;
 	static const struct ephy_info e_info_8168d_4[] = {
-		{ 0x0b, ~0,	0x48 },
-		{ 0x19, 0x20,	0x50 },
-		{ 0x0c, ~0,	0x20 }
+		{ 0x0b, 0x0000,	0x0048 },
+		{ 0x19, 0x0020,	0x0050 },
+		{ 0x0c, 0x0100,	0x0020 }
 	};
-	int i;
 
 	rtl_csi_access_enable_1(tp);
 
@@ -5744,13 +5743,7 @@ static void rtl_hw_start_8168d_4(struct rtl8169_private *tp)
 
 	RTL_W8(MaxTxPacketSize, TxPacketMax);
 
-	for (i = 0; i < ARRAY_SIZE(e_info_8168d_4); i++) {
-		const struct ephy_info *e = e_info_8168d_4 + i;
-		u16 w;
-
-		w = rtl_ephy_read(tp, e->offset);
-		rtl_ephy_write(tp, 0x03, (w & e->mask) | e->bits);
-	}
+	rtl_ephy_init(tp, e_info_8168d_4, ARRAY_SIZE(e_info_8168d_4));
 
 	rtl_enable_clock_request(pdev);
 }
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 06/11] r8169:update rtl8168evl pcie ephy parameter
From: Chunhao Lin @ 2015-01-07 16:40 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Chunhao Lin
In-Reply-To: <1420648826-12972-1-git-send-email-hau@realtek.com>

rtl8168evl may return to PCIe L0 from PCIe L0s low power mode too slowly.
The following ephy parameters address this issue.
{ 0x00, 0x0000,	0x0008 }
{ 0x0c, 0x3df0,	0x0200 }

Signed-off-by: Chunhao Lin <hau@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 48d1f78..1874583 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5778,7 +5778,9 @@ static void rtl_hw_start_8168e_2(struct rtl8169_private *tp)
 	struct pci_dev *pdev = tp->pci_dev;
 	static const struct ephy_info e_info_8168e_2[] = {
 		{ 0x09, 0x0000,	0x0080 },
-		{ 0x19, 0x0000,	0x0224 }
+		{ 0x19, 0x0000,	0x0224 },
+		{ 0x00, 0x0000,	0x0008 },
+		{ 0x0c, 0x3df0,	0x0200 }
 	};
 
 	rtl_csi_access_enable_1(tp);
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 07/11] r8169:update rtl8168f pcie ephy parameter
From: Chunhao Lin @ 2015-01-07 16:40 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Chunhao Lin
In-Reply-To: <1420648826-12972-1-git-send-email-hau@realtek.com>

rtl8168f may return to PCIe L0 from PCIe L0s low power mode too slowly.
The following ephy parameters address this issue.
{ 0x00, 0x0000,	0x0008 }
{ 0x0c, 0x3df0,	0x0200 }

Signed-off-by: Chunhao Lin <hau@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 1874583..bafa132 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5852,7 +5852,9 @@ static void rtl_hw_start_8168f_1(struct rtl8169_private *tp)
 		{ 0x06, 0x00c0,	0x0020 },
 		{ 0x08, 0x0001,	0x0002 },
 		{ 0x09, 0x0000,	0x0080 },
-		{ 0x19, 0x0000,	0x0224 }
+		{ 0x19, 0x0000,	0x0224 },
+		{ 0x00, 0x0000,	0x0008 },
+		{ 0x0c, 0x3df0,	0x0200 }
 	};
 
 	rtl_hw_start_8168f(tp);
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 10/11] r8169:update rtl8402 pcie ephy parameter
From: Chunhao Lin @ 2015-01-07 16:40 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Chunhao Lin
In-Reply-To: <1420648826-12972-1-git-send-email-hau@realtek.com>

Remove the following unnecessary ephy parameter:
{ 0x1e,	0, 0x4000 }

Signed-off-by: Chunhao Lin <hau@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index ce98d2a..b8a097c 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6431,8 +6431,7 @@ static void rtl_hw_start_8402(struct rtl8169_private *tp)
 {
 	void __iomem *ioaddr = tp->mmio_addr;
 	static const struct ephy_info e_info_8402[] = {
-		{ 0x19,	0xffff, 0xff64 },
-		{ 0x1e,	0, 0x4000 }
+		{ 0x19,	0xffff, 0xff64 }
 	};
 
 	rtl_csi_access_enable_2(tp);
-- 
1.9.1

^ permalink raw reply related

* [patch net-next] tc: add BPF based action
From: Jiri Pirko @ 2015-01-07 16:43 UTC (permalink / raw)
  To: netdev; +Cc: davem, jhs, stephen

This action makes it possible to execute custom BPF code on packets.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/net/tc_act/tc_bpf.h        |  25 +++++
 include/uapi/linux/tc_act/Kbuild   |   1 +
 include/uapi/linux/tc_act/tc_bpf.h |  31 ++++++
 net/sched/Kconfig                  |  11 +++
 net/sched/Makefile                 |   1 +
 net/sched/act_bpf.c                | 196 +++++++++++++++++++++++++++++++++++++
 6 files changed, 265 insertions(+)
 create mode 100644 include/net/tc_act/tc_bpf.h
 create mode 100644 include/uapi/linux/tc_act/tc_bpf.h
 create mode 100644 net/sched/act_bpf.c

diff --git a/include/net/tc_act/tc_bpf.h b/include/net/tc_act/tc_bpf.h
new file mode 100644
index 0000000..95e11da
--- /dev/null
+++ b/include/net/tc_act/tc_bpf.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2015 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __NET_TC_BPF_H
+#define __NET_TC_BPF_H
+
+#include <linux/filter.h>
+#include <net/act_api.h>
+
+struct tcf_bpf {
+	struct tcf_common	common;
+	struct bpf_prog		*filter;
+	struct sock_filter	*bpf_ops;
+	u16			bpf_len;
+};
+#define to_bpf(a) \
+	container_of(a->priv, struct tcf_bpf, common)
+
+#endif /* __NET_TC_BPF_H */
diff --git a/include/uapi/linux/tc_act/Kbuild b/include/uapi/linux/tc_act/Kbuild
index b057da2..19d5219 100644
--- a/include/uapi/linux/tc_act/Kbuild
+++ b/include/uapi/linux/tc_act/Kbuild
@@ -8,3 +8,4 @@ header-y += tc_nat.h
 header-y += tc_pedit.h
 header-y += tc_skbedit.h
 header-y += tc_vlan.h
+header-y += tc_bpf.h
diff --git a/include/uapi/linux/tc_act/tc_bpf.h b/include/uapi/linux/tc_act/tc_bpf.h
new file mode 100644
index 0000000..5288bd77
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_bpf.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (c) 2015 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_TC_BPF_H
+#define __LINUX_TC_BPF_H
+
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_BPF 13
+
+struct tc_act_bpf {
+	tc_gen;
+};
+
+enum {
+	TCA_ACT_BPF_UNSPEC,
+	TCA_ACT_BPF_TM,
+	TCA_ACT_BPF_PARMS,
+	TCA_ACT_BPF_OPS_LEN,
+	TCA_ACT_BPF_OPS,
+	__TCA_ACT_BPF_MAX,
+};
+#define TCA_ACT_BPF_MAX (__TCA_ACT_BPF_MAX - 1)
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index c54c9d9..cc311e9 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -698,6 +698,17 @@ config NET_ACT_VLAN
 	  To compile this code as a module, choose M here: the
 	  module will be called act_vlan.
 
+config NET_ACT_BPF
+        tristate "BPF based action"
+        depends on NET_CLS_ACT
+        ---help---
+	  Say Y here to execute BPF code on packets.
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_bpf.
+
 config NET_CLS_IND
 	bool "Incoming device classification"
 	depends on NET_CLS_U32 || NET_CLS_FW
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 679f24a..7ca2b4e 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_NET_ACT_SIMP)	+= act_simple.o
 obj-$(CONFIG_NET_ACT_SKBEDIT)	+= act_skbedit.o
 obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_VLAN)	+= act_vlan.o
+obj-$(CONFIG_NET_ACT_BPF)	+= act_bpf.o
 obj-$(CONFIG_NET_SCH_FIFO)	+= sch_fifo.o
 obj-$(CONFIG_NET_SCH_CBQ)	+= sch_cbq.o
 obj-$(CONFIG_NET_SCH_HTB)	+= sch_htb.o
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
new file mode 100644
index 0000000..43f5f9d
--- /dev/null
+++ b/net/sched/act_bpf.c
@@ -0,0 +1,196 @@
+/*
+ * Copyright (c) 2015 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/filter.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+
+#include <linux/tc_act/tc_bpf.h>
+#include <net/tc_act/tc_bpf.h>
+
+#define BPF_TAB_MASK     15
+
+static int tcf_bpf(struct sk_buff *skb, const struct tc_action *a,
+		   struct tcf_result *res)
+{
+	struct tcf_bpf *b = a->priv;
+	int action;
+	int filter_res;
+
+	spin_lock(&b->tcf_lock);
+	b->tcf_tm.lastuse = jiffies;
+	bstats_update(&b->tcf_bstats, skb);
+	action = b->tcf_action;
+
+	filter_res = BPF_PROG_RUN(b->filter, skb);
+	if (filter_res == -1)
+		goto drop;
+
+	goto unlock;
+
+drop:
+	action = TC_ACT_SHOT;
+	b->tcf_qstats.drops++;
+unlock:
+	spin_unlock(&b->tcf_lock);
+	return action;
+}
+
+static const struct nla_policy act_bpf_policy[TCA_ACT_BPF_MAX + 1] = {
+	[TCA_ACT_BPF_PARMS]	= { .len = sizeof(struct tc_act_bpf) },
+	[TCA_ACT_BPF_OPS_LEN]	= { .type = NLA_U16 },
+	[TCA_ACT_BPF_OPS]	= { .type = NLA_BINARY,
+				    .len = sizeof(struct sock_filter) * BPF_MAXINSNS },
+};
+
+static int tcf_bpf_init(struct net *net, struct nlattr *nla,
+			struct nlattr *est, struct tc_action *a,
+			int ovr, int bind)
+{
+	struct nlattr *tb[TCA_ACT_BPF_MAX + 1];
+	struct tc_act_bpf *parm;
+	struct tcf_bpf *b;
+	u16 bpf_size, bpf_len;
+	struct sock_filter *bpf_ops;
+	struct sock_fprog_kern tmp;
+	struct bpf_prog *fp;
+	int ret;
+
+	if (!nla)
+		return -EINVAL;
+
+	ret = nla_parse_nested(tb, TCA_ACT_BPF_MAX, nla, act_bpf_policy);
+	if (ret < 0)
+		return ret;
+
+	if (!tb[TCA_ACT_BPF_PARMS] ||
+	    !tb[TCA_ACT_BPF_OPS_LEN] || !tb[TCA_ACT_BPF_OPS])
+		return -EINVAL;
+	parm = nla_data(tb[TCA_ACT_BPF_PARMS]);
+
+	bpf_len = nla_get_u16(tb[TCA_ACT_BPF_OPS_LEN]);
+	if (bpf_len > BPF_MAXINSNS || bpf_len == 0)
+		return -EINVAL;
+
+	bpf_size = bpf_len * sizeof(*bpf_ops);
+	bpf_ops = kzalloc(bpf_size, GFP_KERNEL);
+	if (!bpf_ops)
+		return -ENOMEM;
+
+	memcpy(bpf_ops, nla_data(tb[TCA_ACT_BPF_OPS]), bpf_size);
+
+	tmp.len = bpf_len;
+	tmp.filter = bpf_ops;
+
+	ret = bpf_prog_create(&fp, &tmp);
+	if (ret)
+		goto free_bpf_ops;
+
+	if (!tcf_hash_check(parm->index, a, bind)) {
+		ret = tcf_hash_create(parm->index, est, a, sizeof(*b), bind);
+		if (ret)
+			goto free_bpf_ops;
+
+		ret = ACT_P_CREATED;
+	} else {
+		if (bind)
+			goto free_bpf_ops;
+		tcf_hash_release(a, bind);
+		if (!ovr) {
+			ret = -EEXIST;
+			goto free_bpf_ops;
+		}
+	}
+
+	b = to_bpf(a);
+	spin_lock_bh(&b->tcf_lock);
+	b->tcf_action = parm->action;
+	b->bpf_len = bpf_len;
+	b->bpf_ops = bpf_ops;
+	b->filter = fp;
+	spin_unlock_bh(&b->tcf_lock);
+
+	if (ret == ACT_P_CREATED)
+		tcf_hash_insert(a);
+	return ret;
+
+free_bpf_ops:
+	kfree(bpf_ops);
+	return ret;
+}
+
+static int tcf_bpf_dump(struct sk_buff *skb, struct tc_action *a,
+			int bind, int ref)
+{
+	unsigned char *tp = skb_tail_pointer(skb);
+	struct tcf_bpf *b = a->priv;
+	struct tc_act_bpf opt = {
+		.index    = b->tcf_index,
+		.refcnt   = b->tcf_refcnt - ref,
+		.bindcnt  = b->tcf_bindcnt - bind,
+		.action   = b->tcf_action,
+	};
+	struct tcf_t t;
+	struct nlattr *nla;
+
+	if (nla_put(skb, TCA_ACT_BPF_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	if (nla_put_u16(skb, TCA_ACT_BPF_OPS_LEN, b->bpf_len))
+		goto nla_put_failure;
+
+	nla = nla_reserve(skb, TCA_ACT_BPF_OPS, b->bpf_len *
+			  sizeof(struct sock_filter));
+	if (!nla)
+		goto nla_put_failure;
+
+	memcpy(nla_data(nla), b->bpf_ops, nla_len(nla));
+
+	t.install = jiffies_to_clock_t(jiffies - b->tcf_tm.install);
+	t.lastuse = jiffies_to_clock_t(jiffies - b->tcf_tm.lastuse);
+	t.expires = jiffies_to_clock_t(b->tcf_tm.expires);
+	if (nla_put(skb, TCA_ACT_BPF_TM, sizeof(t), &t))
+		goto nla_put_failure;
+	return skb->len;
+
+nla_put_failure:
+	nlmsg_trim(skb, tp);
+	return -1;
+}
+
+static struct tc_action_ops act_bpf_ops = {
+	.kind		=	"bpf",
+	.type		=	TCA_ACT_BPF,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_bpf,
+	.dump		=	tcf_bpf_dump,
+	.init		=	tcf_bpf_init,
+};
+
+static int __init bpf_init_module(void)
+{
+	return tcf_register_action(&act_bpf_ops, BPF_TAB_MASK);
+}
+
+static void __exit bpf_cleanup_module(void)
+{
+	tcf_unregister_action(&act_bpf_ops);
+}
+
+module_init(bpf_init_module);
+module_exit(bpf_cleanup_module);
+
+MODULE_AUTHOR("Jiri Pirko <jiri@resnulli.us>");
+MODULE_DESCRIPTION("TC BPF based action");
+MODULE_LICENSE("GPL v2");
-- 
1.9.3

^ permalink raw reply related
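
The per-packet decision made by tcf_bpf() in the patch above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the names act_bpf_sketch, act_bpf_verdict and the SKETCH_* constants are hypothetical and not part of the patch, the filter result is passed in rather than produced by BPF_PROG_RUN(), and locking and byte statistics are left out. The constant values mirror TC_ACT_OK and TC_ACT_SHOT from <linux/pkt_cls.h>.

```c
#include <stdint.h>

/* Hypothetical local copies of TC_ACT_OK (0) and TC_ACT_SHOT (2). */
enum {
	SKETCH_TC_ACT_OK   = 0,
	SKETCH_TC_ACT_SHOT = 2,
};

/* Hypothetical stand-in for the fields of struct tcf_bpf used here. */
struct act_bpf_sketch {
	int      action;	/* action configured from userspace */
	uint64_t drops;		/* counterpart of tcf_qstats.drops */
};

/*
 * Mirror of the branch in tcf_bpf(): a filter result of all ones
 * (which compares equal to -1 once stored in an int, as the patch
 * does) drops the packet and bumps the drop counter; any other
 * result returns the configured action unchanged.
 */
int act_bpf_verdict(struct act_bpf_sketch *b, uint32_t filter_res)
{
	if (filter_res == UINT32_MAX) {
		b->drops++;
		return SKETCH_TC_ACT_SHOT;
	}
	return b->action;
}
```

With this reading, a filter that returns 0 leaves the configured action in place, and only a result of -1 forces a drop.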

* RE: [PATCH net-next 07/11] r8169:update rtl8168f pcie ephy parameter
From: David Laight @ 2015-01-07 16:45 UTC (permalink / raw)
  To: 'Chunhao Lin', netdev@vger.kernel.org
  Cc: nic_swsd@realtek.com, linux-kernel@vger.kernel.org
In-Reply-To: <1420648826-12972-8-git-send-email-hau@realtek.com>

From: Chunhao Lin
> rtl8168f may be too slow returning to PCIe L0 from the PCIe L0s low-power state.
> Add the following ephy parameters to address this issue.
> { 0x00, 0x0000,	0x0008 }
> { 0x0c, 0x3df0,	0x0200 }
> 
> Signed-off-by: Chunhao Lin <hau@realtek.com>
> ---
>  drivers/net/ethernet/realtek/r8169.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 1874583..bafa132 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -5852,7 +5852,9 @@ static void rtl_hw_start_8168f_1(struct rtl8169_private *tp)
>  		{ 0x06, 0x00c0,	0x0020 },
>  		{ 0x08, 0x0001,	0x0002 },
>  		{ 0x09, 0x0000,	0x0080 },
> -		{ 0x19, 0x0000,	0x0224 }
> +		{ 0x19, 0x0000,	0x0224 },
> +		{ 0x00, 0x0000,	0x0008 },
> +		{ 0x0c, 0x3df0,	0x0200 }

I can't help feeling these lines all require short comments.

	David

^ permalink raw reply
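
For readers unfamiliar with the table format discussed above, each entry is an { offset, mask, bits } triple that the driver applies as a read-modify-write on a 16-bit EPHY register: clear the bits in mask, then set the bits in bits. The sketch below models that in userspace C. It is an assumption-laden illustration, not driver code: the register file ephy_regs and the helpers ephy_read(), ephy_write() and ephy_apply() are hypothetical stand-ins for the hardware accessors, and what the individual bits mean is not stated anywhere in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Same three fields, in the same order, as the table entries. */
struct ephy_info {
	unsigned int offset;	/* EPHY register index */
	uint16_t     mask;	/* bits cleared before the write */
	uint16_t     bits;	/* bits set by the write */
};

/* Hypothetical stand-in for the hardware: 32 16-bit registers. */
#define EPHY_REGS 32
uint16_t ephy_regs[EPHY_REGS];

uint16_t ephy_read(unsigned int offset)
{
	return ephy_regs[offset % EPHY_REGS];
}

void ephy_write(unsigned int offset, uint16_t value)
{
	ephy_regs[offset % EPHY_REGS] = value;
}

/* Apply every entry in order: new = (old & ~mask) | bits. */
void ephy_apply(const struct ephy_info *e, size_t len)
{
	while (len-- > 0) {
		uint16_t w = (uint16_t)((ephy_read(e->offset) & ~e->mask) |
					e->bits);

		ephy_write(e->offset, w);
		e++;
	}
}
```

Read this way, { 0x00, 0x0000, 0x0008 } sets bit 3 of register 0x00 without clearing anything, while { 0x0c, 0x3df0, 0x0200 } clears the bits covered by 0x3df0 in register 0x0c and then sets bit 9.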

