Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 3/6] xen-netfront: frags -> slots in xennet_get_responses
From: David Miller @ 2013-03-25 16:20 UTC (permalink / raw)
  To: wei.liu2
  Cc: xen-devel, netdev, ian.campbell, annie.li, konrad.wilk,
	david.vrabel
In-Reply-To: <1364209702-12437-4-git-send-email-wei.liu2@citrix.com>

From: Wei Liu <wei.liu2@citrix.com>
Date: Mon, 25 Mar 2013 11:08:19 +0000

> This function is in fact counting the ring slots required for responses.
> Separate the concepts of ring slots and skb frags make the code clearer, as
> now netfront and netback can have different MAX_SKB_FRAGS, slot and frag are
> not mapped 1:1 any more.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Applied.

^ permalink raw reply

* Re: [PATCH 4/6] xen-netback: remove skb in xen_netbk_alloc_page
From: David Miller @ 2013-03-25 16:20 UTC (permalink / raw)
  To: wei.liu2
  Cc: xen-devel, netdev, ian.campbell, annie.li, konrad.wilk,
	david.vrabel
In-Reply-To: <1364209702-12437-5-git-send-email-wei.liu2@citrix.com>

From: Wei Liu <wei.liu2@citrix.com>
Date: Mon, 25 Mar 2013 11:08:20 +0000

> This variable is never used.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Applied.

^ permalink raw reply

* Re: [PATCH 5/6] xen-netback: coalesce slots before copying
From: David Miller @ 2013-03-25 16:20 UTC (permalink / raw)
  To: wei.liu2
  Cc: xen-devel, netdev, ian.campbell, annie.li, konrad.wilk,
	david.vrabel
In-Reply-To: <1364209702-12437-6-git-send-email-wei.liu2@citrix.com>

From: Wei Liu <wei.liu2@citrix.com>
Date: Mon, 25 Mar 2013 11:08:21 +0000

> This patch tries to coalesce tx requests when constructing grant copy
> structures. It enables netback to deal with situation when frontend's
> MAX_SKB_FRAGS is larger than backend's MAX_SKB_FRAGS.
> 
> It defines max_skb_slots, which is a estimation of the maximum number of slots
> a guest can send, anything bigger than that is considered malicious. Now it is
> set to 20, which should be enough to accommodate Linux (16 to 19).
> 
> Also change variable name from "frags" to "slots" in netbk_count_requests.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Please resubmit after you've incorporated the feedback you've
received.

^ permalink raw reply

* Re: [PATCH 6/6] xen-netback: don't disconnect frontend when seeing oversize frame
From: David Miller @ 2013-03-25 16:21 UTC (permalink / raw)
  To: wei.liu2
  Cc: xen-devel, netdev, ian.campbell, annie.li, konrad.wilk,
	david.vrabel
In-Reply-To: <1364209702-12437-7-git-send-email-wei.liu2@citrix.com>

From: Wei Liu <wei.liu2@citrix.com>
Date: Mon, 25 Mar 2013 11:08:22 +0000

> Some buggy frontends may generate frames larger than 64 KiB. We should
> aggresively consume all slots and drop the packet instead of disconnecting the
> frontend.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Similarly, resubmit with the more descriptive commit message David
posted and also integrate any other feedback you've received.

^ permalink raw reply

* Re: [PATCH] libertas: drop maintainership
From: Daniel Drake @ 2013-03-25 16:21 UTC (permalink / raw)
  To: Dan Williams
  Cc: netdev@vger.kernel.org, linux-wireless@vger.kernel.org, Bing Zhao
In-Reply-To: <1363625321.1597.28.camel@dcbw.foobar.com>

On Mon, Mar 18, 2013 at 10:48 AM, Dan Williams <dcbw@redhat.com> wrote:
> Would be better maintained by somebody who actualy has time for it.
>
> Signed-off-by: Dan Williams <dcbw@redhat.com>
> ---
> Daniel?  Bing? :)
>
> I'm quite happy to ACK somebody else picking libertas up, and I'm also
> happy to investigate how to send my hardware cache to the new
> maintainer.  (usb8388 dev modules, some CF8385s, some sd8686s, etc).
> Also happy to forward along the libertas-dev moderator password.

Thanks for all your work on libertas over the years.

I imagine I will continue to send small numbers of libertas patches as
and when new issues are found on OLPC XO, but I wouldn't be able to
make any other form of commitment, certainly not with other hardware.
I am already overstretched with my roles. So I don't really see myself
as a maintainer candidate.

Daniel

^ permalink raw reply

* Re: [PATCH 0/3 net-next] dsa: bugfixing and typos after Device Tree bindings
From: David Miller @ 2013-03-25 16:23 UTC (permalink / raw)
  To: florian; +Cc: netdev
In-Reply-To: <1364223820-26051-1-git-send-email-florian@openwrt.org>

From: Florian Fainelli <florian@openwrt.org>
Date: Mon, 25 Mar 2013 16:03:37 +0100

> Please find here 3 minor bugfixes related to my recent device tree bindings
> additions to DSA. Thanks!

Series applied, thanks Florian.

^ permalink raw reply

* [PATCH net-next] ipv6: don't free static mem in the case of init_net
From: Hong Zhiguo @ 2013-03-25 16:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, stephen, Hong Zhiguo

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
---
 net/ipv6/addrconf.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index fa36a67..1f5f51f 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4891,10 +4891,14 @@ static int __net_init addrconf_init_net(struct net *net)
 err_reg_dflt:
 	__addrconf_sysctl_unregister(all);
 err_reg_all:
-	kfree(dflt);
+	if (dflt != &ipv6_devconf_dflt) {
+		kfree(dflt);
+	}
 #endif
 err_alloc_dflt:
-	kfree(all);
+	if (all != &ipv6_devconf) {
+		kfree(all);
+	}
 err_alloc_all:
 	return err;
 }
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH net 0/3] ipv4: Fix ip-header identification for gso packets.
From: David Miller @ 2013-03-25 16:29 UTC (permalink / raw)
  To: amwang; +Cc: pshelar, netdev
In-Reply-To: <1364183295.2532.9.camel@cr0>

From: Cong Wang <amwang@redhat.com>
Date: Mon, 25 Mar 2013 11:48:15 +0800

> On Sun, 2013-03-24 at 20:36 -0700, Pravin B Shelar wrote:
>> Following patch fixes ip-header identification according to comments from
>> David Miller and Cong Wang.
>> First two patches reverts fixes to upper layer gso handlers which is not
>> required after fixing ipv4 gso handler.
>> 
> 
> Looks good to me now:
> 
> Acked-by: Cong Wang <amwang@redhat.com>

Looks good to me too, all applied, thanks.

^ permalink raw reply

* Re: Atheros Communications Inc. AR8121/AR8113/AR8114 Gigabit or Fast Ethernet (rev b0) 1.0.0.7 md5/sha1 corrupted using NFS and samba (updated) Version 2
From: Hannes Frederic Sowa @ 2013-03-25 16:30 UTC (permalink / raw)
  To: Huang, Xiong, Sven Hartge, netdev@vger.kernel.org
In-Reply-To: <20130324172304.GB12968@order.stressinduktion.org>

On Sun, Mar 24, 2013 at 06:23:04PM +0100, Hannes Frederic Sowa wrote:
> > 2. try to set hw->dmaw_block = atl1e_dma_req_128 
> 
> Sorry, we can still reproduce corrupted frames with this change.

In atl1e_main.c

    940         max_pay_load  = ((dev_ctrl_data >> DEVICE_CTRL_MAX_PAYLOAD_SHIFT)) &
    941                         DEVICE_CTRL_MAX_PAYLOAD_MASK;
    942 
    943         hw->dmaw_block = min_t(u32, max_pay_load, hw->dmaw_block);

the value dmaw_block always gets reset to 0 on my machine. I forced it
to use atl1e_dma_req_1024 as it seems intended. But no change either.

Thanks,

  Hannes

^ permalink raw reply

* Re: [PATCH net-next] bridge: avoid br_ifinfo_notify when nothing changed
From: Sergei Shtylyov @ 2013-03-25 16:31 UTC (permalink / raw)
  To: David Miller; +Cc: honkiko, netdev, bridge, stephen, herbert, zhiguo.hong
In-Reply-To: <20130325.120808.1143527366649356424.davem@davemloft.net>

Hello.

On 25-03-2013 20:08, David Miller wrote:

>>     Sorry, I don't usuallly ACK patches in an area I'm not closely
>>     familiar with. Though I can add:

> If you feel confident enough to ask for corrections, you better be
> ready to ACK the result when your requests have been met.

   Come on, I only asked to correct an evident cut and paste typo in the 
description. That's too little a correction to ACK the whole patch. To ACK it, 
I should know well the code it's patching and be confident that the whole 
patch is correct. I have neither knowledge.

> You can't hold other people accountable yet take none of the
> same responsibility yourself.

    I don't hold anyone accountable, I just provide my comments which might as 
well be ignored by the submitter/maintainer.

WBR, Sergei

^ permalink raw reply

* Re: [Xen-devel] [PATCH 5/6] xen-netback: coalesce slots before copying
From: David Vrabel @ 2013-03-25 16:34 UTC (permalink / raw)
  To: Wei Liu
  Cc: Wei Liu, netdev@vger.kernel.org, annie.li@oracle.com,
	konrad.wilk@oracle.com, Ian Campbell, xen-devel@lists.xen.org
In-Reply-To: <CAOsiSVUNO0FQkfJWjv9bcoMgKNnHu4XYciz61hG3YNFzZNxK3w@mail.gmail.com>

On 25/03/13 15:47, Wei Liu wrote:
> On Mon, Mar 25, 2013 at 3:13 PM, David Vrabel <david.vrabel@citrix.com> wrote:
>> On 25/03/13 11:08, Wei Liu wrote:
>>> This patch tries to coalesce tx requests when constructing grant copy
>>> structures. It enables netback to deal with situation when frontend's
>>> MAX_SKB_FRAGS is larger than backend's MAX_SKB_FRAGS.
>>>
>>> It defines max_skb_slots, which is a estimation of the maximum number of slots
>>> a guest can send, anything bigger than that is considered malicious. Now it is
>>> set to 20, which should be enough to accommodate Linux (16 to 19).
>>
>> This maximum needs to be defined as part of the protocol and added to
>> the interface header.
>>
> 
> No, this is not part of the protocol and not a hard limit. It is
> configurable by system administrator.

There is no mechanism by which the front and back ends can negotiate
this value, so it does need to be a fixed value that is equal or greater
than the max from any front or back end that has ever existed.

The reason for this patch is that this wasn't properly specified and
changes outside of netback broke the protocol.

>>> +
>>> +                             if (unlikely(!first)) {
>>
>> This isn't unlikely is it?
>>
> 
> For big packet the chance is 1 in max_skb_slots, so 5% (1/20) in default case.

I don't understand your reasoning here.  The "if (!first)" branch is
taken once per page.  It will be 100% if each slot goes into its own
page and only 5% if the packet is less than PAGE_SIZE in length but
split into 20 slots.

>>> +                                     first = &pending_tx_info[pending_idx];
>>> +                                     start_idx = index;
>>> +                                     head_idx = pending_idx;
>>> +                             }
>>
>> Instead of setting first here why not move the code below here?
>>
> 
> Because first->req.size needs to be set to dst_offset, it has to be
> here anyway. So I put other fields here as well.

Hmmm, I'd probably accumulate first->req.size but ok. It was a minor
point anyway.

>>> +             first->req.offset = 0;
>>> +             first->req.size = dst_offset;
>>> +             first->head = start_idx;
>>> +             set_page_ext(page, netbk, head_idx);
>>> +             netbk->mmap_pages[head_idx] = page;
>>> +             frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx);
>>
>>> @@ -1548,13 +1649,34 @@ static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx,
>> [...]
>>> +             /* Setting any number other than
>>> +              * INVALID_PENDING_RING_IDX indicates this slot is
>>> +              * starting a new packet / ending a previous packet.
>>> +              */
>>> +             pending_tx_info->head = 0;
>>
>> This doesn't look needed.  It will be initialized again when reusing t
>> his pending_tx_info again, right?
>>
> 
> Yes, it is needed. Otherwise netback responses to invalid tx_info and
> cause netfront to crash before coming into the re-initialization path.

Maybe I'm missing something but this is after the make_tx_reponse()
call, and immediately after this pending_tx_info is returned to the
pending ring as free.

David

^ permalink raw reply

* Re: [PATCH 2/6] xen-netfront: reduce gso_max_size to account for ethernet header
From: Eric Dumazet @ 2013-03-25 16:54 UTC (permalink / raw)
  To: David Miller
  Cc: wei.liu2, xen-devel, netdev, ian.campbell, annie.li, konrad.wilk,
	david.vrabel
In-Reply-To: <20130325.121809.2023305728304341952.davem@davemloft.net>

On Mon, 2013-03-25 at 12:18 -0400, David Miller wrote:

> 
> This is effectively the default already, you don't need to change this
> value explicitly.
> 
> ->gso_max_size is set by default to 65536 and then TCP performs this
> calculation:
> 
>                 xmit_size_goal = ((sk->sk_gso_max_size - 1) -                          
>                                   inet_csk(sk)->icsk_af_ops->net_header_len -          
>                                   inet_csk(sk)->icsk_ext_hdr_len -                     
>                                   tp->tcp_header_len);                                 
> 
> thereby making it adhere to your limits just fine.

For locally generated TCP traffic this is the case.

However, GRO can build packets up to 65535 bytes, not including the
Ethernet header.

For such packets, it seems xen-netfront needs a segmentation.

And we might have other providers as well (UFO for example ?), but I
have not checked.

^ permalink raw reply

* Re: [PATCH 2/6] xen-netfront: reduce gso_max_size to account for ethernet header
From: Konrad Rzeszutek Wilk @ 2013-03-25 16:56 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, netdev, ian.campbell, annie.li, david.vrabel
In-Reply-To: <1364209702-12437-3-git-send-email-wei.liu2@citrix.com>

On Mon, Mar 25, 2013 at 11:08:18AM +0000, Wei Liu wrote:
> The maximum packet including ethernet header that can be handled by netfront /
> netback wire format is 65535. Reduce gso_max_size accordingly.
> 
> Drop skb and print warning when skb->len > 65535. This can 1) save the effort
> to send malformed packet to netback, 2) help spotting misconfiguration of
> netfront in the future.
> 

Should it also CC stable@vger.kernel.org?

> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netfront.c |   14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 5527663..3ae9dc1 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	unsigned int len = skb_headlen(skb);
>  	unsigned long flags;
>  
> +	/* Wire format of xen_netif_tx_request only supports skb->len
> +	 * < 64K, because size field in xen_netif_tx_request is
> +	 * uint16_t. If skb->len is too big, drop it and alert user about
> +	 * misconfiguration.
> +	 */
> +	if (unlikely(skb->len >= (uint16_t)(~0))) {
> +		net_alert_ratelimited(
> +			"xennet: skb->len = %u, too big for wire format\n",
> +			skb->len);
> +		goto drop;
> +	}
> +
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> @@ -1362,6 +1374,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
>  	SET_ETHTOOL_OPS(netdev, &xennet_ethtool_ops);
>  	SET_NETDEV_DEV(netdev, &dev->dev);
>  
> +	netif_set_gso_max_size(netdev, 65535 - ETH_HLEN);
> +
>  	np->netdev = netdev;
>  
>  	netif_carrier_off(netdev);
> -- 
> 1.7.10.4
> 

^ permalink raw reply

* Re: [Xen-devel] [PATCH 5/6] xen-netback: coalesce slots before copying
From: Konrad Rzeszutek Wilk @ 2013-03-25 16:57 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, netdev, annie.li, david.vrabel, ian.campbell
In-Reply-To: <1364209702-12437-6-git-send-email-wei.liu2@citrix.com>

On Mon, Mar 25, 2013 at 11:08:21AM +0000, Wei Liu wrote:
> This patch tries to coalesce tx requests when constructing grant copy
> structures. It enables netback to deal with situation when frontend's
> MAX_SKB_FRAGS is larger than backend's MAX_SKB_FRAGS.
> 
> It defines max_skb_slots, which is a estimation of the maximum number of slots
> a guest can send, anything bigger than that is considered malicious. Now it is
> set to 20, which should be enough to accommodate Linux (16 to 19).
> 
> Also change variable name from "frags" to "slots" in netbk_count_requests.
> 

This should probably also CC stable@vger.kernel.org

> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netback/netback.c |  220 ++++++++++++++++++++++++++++---------
>  1 file changed, 171 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 6e8e51a..a634dc5 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -47,11 +47,26 @@
>  #include <asm/xen/hypercall.h>
>  #include <asm/xen/page.h>
>  
> +/*
> + * This is an estimation of the maximum possible frags a SKB might
> + * have, anything larger than this is considered malicious. Typically
> + * Linux has 16 to 19.
> + */
> +#define MAX_SKB_SLOTS_DEFAULT 20
> +static unsigned int max_skb_slots = MAX_SKB_SLOTS_DEFAULT;
> +module_param(max_skb_slots, uint, 0444);
> +
> +typedef unsigned int pending_ring_idx_t;
> +#define INVALID_PENDING_RING_IDX (~0U)
> +
>  struct pending_tx_info {
> -	struct xen_netif_tx_request req;
> +	struct xen_netif_tx_request req; /* coalesced tx request  */
>  	struct xenvif *vif;
> +	pending_ring_idx_t head; /* head != INVALID_PENDING_RING_IDX
> +				  * if it is head of serveral merged
> +				  * tx reqs
> +				  */
>  };
> -typedef unsigned int pending_ring_idx_t;
>  
>  struct netbk_rx_meta {
>  	int id;
> @@ -102,7 +117,11 @@ struct xen_netbk {
>  	atomic_t netfront_count;
>  
>  	struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
> -	struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
> +	/* Coalescing tx requests before copying makes number of grant
> +	 * copy ops greater of equal to number of slots required. In
> +	 * worst case a tx request consumes 2 gnttab_copy.
> +	 */
> +	struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS];
>  
>  	u16 pending_ring[MAX_PENDING_REQS];
>  
> @@ -251,7 +270,7 @@ static int max_required_rx_slots(struct xenvif *vif)
>  	int max = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
>  
>  	if (vif->can_sg || vif->gso || vif->gso_prefix)
> -		max += MAX_SKB_FRAGS + 1; /* extra_info + frags */
> +		max += max_skb_slots + 1; /* extra_info + frags */
>  
>  	return max;
>  }
> @@ -657,7 +676,7 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk)
>  		__skb_queue_tail(&rxq, skb);
>  
>  		/* Filled the batch queue? */
> -		if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
> +		if (count + max_skb_slots >= XEN_NETIF_RX_RING_SIZE)
>  			break;
>  	}
>  
> @@ -908,34 +927,34 @@ static int netbk_count_requests(struct xenvif *vif,
>  				int work_to_do)
>  {
>  	RING_IDX cons = vif->tx.req_cons;
> -	int frags = 0;
> +	int slots = 0;
>  
>  	if (!(first->flags & XEN_NETTXF_more_data))
>  		return 0;
>  
>  	do {
> -		if (frags >= work_to_do) {
> -			netdev_err(vif->dev, "Need more frags\n");
> +		if (slots >= work_to_do) {
> +			netdev_err(vif->dev, "Need more slots\n");
>  			netbk_fatal_tx_err(vif);
>  			return -ENODATA;
>  		}
>  
> -		if (unlikely(frags >= MAX_SKB_FRAGS)) {
> -			netdev_err(vif->dev, "Too many frags\n");
> +		if (unlikely(slots >= max_skb_slots)) {
> +			netdev_err(vif->dev, "Too many slots\n");
>  			netbk_fatal_tx_err(vif);
>  			return -E2BIG;
>  		}
>  
> -		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + frags),
> +		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + slots),
>  		       sizeof(*txp));
>  		if (txp->size > first->size) {
> -			netdev_err(vif->dev, "Frag is bigger than frame.\n");
> +			netdev_err(vif->dev, "Packet is bigger than frame.\n");
>  			netbk_fatal_tx_err(vif);
>  			return -EIO;
>  		}
>  
>  		first->size -= txp->size;
> -		frags++;
> +		slots++;
>  
>  		if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
>  			netdev_err(vif->dev, "txp->offset: %x, size: %u\n",
> @@ -944,7 +963,7 @@ static int netbk_count_requests(struct xenvif *vif,
>  			return -EINVAL;
>  		}
>  	} while ((txp++)->flags & XEN_NETTXF_more_data);
> -	return frags;
> +	return slots;
>  }
>  
>  static struct page *xen_netbk_alloc_page(struct xen_netbk *netbk,
> @@ -968,48 +987,114 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
>  	struct skb_shared_info *shinfo = skb_shinfo(skb);
>  	skb_frag_t *frags = shinfo->frags;
>  	u16 pending_idx = *((u16 *)skb->data);
> -	int i, start;
> +	u16 head_idx = 0;
> +	int slot, start;
> +	struct page *page;
> +	pending_ring_idx_t index, start_idx = 0;
> +	uint16_t dst_offset;
> +	unsigned int nr_slots;
> +	struct pending_tx_info *first = NULL;
> +
> +	/* At this point shinfo->nr_frags is in fact the number of
> +	 * slots, which can be as large as max_skb_slots.
> +	 */
> +	nr_slots = shinfo->nr_frags;
>  
>  	/* Skip first skb fragment if it is on same page as header fragment. */
>  	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
>  
> -	for (i = start; i < shinfo->nr_frags; i++, txp++) {
> -		struct page *page;
> -		pending_ring_idx_t index;
> +	/* Coalesce tx requests, at this point the packet passed in
> +	 * should be <= 64K. Any packets larger than 64K have been
> +	 * handled in netbk_count_requests().
> +	 */
> +	for (shinfo->nr_frags = slot = start; slot < nr_slots;
> +	     shinfo->nr_frags++) {
>  		struct pending_tx_info *pending_tx_info =
>  			netbk->pending_tx_info;
>  
> -		index = pending_index(netbk->pending_cons++);
> -		pending_idx = netbk->pending_ring[index];
> -		page = xen_netbk_alloc_page(netbk, pending_idx);
> +		page = alloc_page(GFP_KERNEL|__GFP_COLD);
>  		if (!page)
>  			goto err;
>  
> -		gop->source.u.ref = txp->gref;
> -		gop->source.domid = vif->domid;
> -		gop->source.offset = txp->offset;
> -
> -		gop->dest.u.gmfn = virt_to_mfn(page_address(page));
> -		gop->dest.domid = DOMID_SELF;
> -		gop->dest.offset = txp->offset;
> -
> -		gop->len = txp->size;
> -		gop->flags = GNTCOPY_source_gref;
> +		dst_offset = 0;
> +		first = NULL;
> +		while (dst_offset < PAGE_SIZE && slot < nr_slots) {
> +			gop->flags = GNTCOPY_source_gref;
> +
> +			gop->source.u.ref = txp->gref;
> +			gop->source.domid = vif->domid;
> +			gop->source.offset = txp->offset;
> +
> +			gop->dest.domid = DOMID_SELF;
> +
> +			gop->dest.offset = dst_offset;
> +			gop->dest.u.gmfn = virt_to_mfn(page_address(page));
> +
> +			if (dst_offset + txp->size > PAGE_SIZE) {
> +				/* This page can only merge a portion
> +				 * of tx request. Do not increment any
> +				 * pointer / counter here. The txp
> +				 * will be dealt with in future
> +				 * rounds, eventually hitting the
> +				 * `else` branch.
> +				 */
> +				gop->len = PAGE_SIZE - dst_offset;
> +				txp->offset += gop->len;
> +				txp->size -= gop->len;
> +				dst_offset += gop->len; /* quit loop */
> +			} else {
> +				/* This tx request can be merged in the page */
> +				gop->len = txp->size;
> +				dst_offset += gop->len;
> +
> +				index = pending_index(netbk->pending_cons++);
> +
> +				pending_idx = netbk->pending_ring[index];
> +
> +				memcpy(&pending_tx_info[pending_idx].req, txp,
> +				       sizeof(*txp));
> +				xenvif_get(vif);
> +
> +				pending_tx_info[pending_idx].vif = vif;
> +
> +				/* Poison these fields, corresponding
> +				 * fields for head tx req will be set
> +				 * to correct values after the loop.
> +				 */
> +				netbk->mmap_pages[pending_idx] = (void *)(~0UL);
> +				pending_tx_info[pending_idx].head =
> +					INVALID_PENDING_RING_IDX;
> +
> +				if (unlikely(!first)) {
> +					first = &pending_tx_info[pending_idx];
> +					start_idx = index;
> +					head_idx = pending_idx;
> +				}
> +
> +				txp++;
> +				slot++;
> +			}
>  
> -		gop++;
> +			gop++;
> +		}
>  
> -		memcpy(&pending_tx_info[pending_idx].req, txp, sizeof(*txp));
> -		xenvif_get(vif);
> -		pending_tx_info[pending_idx].vif = vif;
> -		frag_set_pending_idx(&frags[i], pending_idx);
> +		first->req.offset = 0;
> +		first->req.size = dst_offset;
> +		first->head = start_idx;
> +		set_page_ext(page, netbk, head_idx);
> +		netbk->mmap_pages[head_idx] = page;
> +		frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx);
>  	}
>  
> +	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
> +
>  	return gop;
>  err:
>  	/* Unwind, freeing all pages and sending error responses. */
> -	while (i-- > start) {
> -		xen_netbk_idx_release(netbk, frag_get_pending_idx(&frags[i]),
> -				      XEN_NETIF_RSP_ERROR);
> +	while (shinfo->nr_frags-- > start) {
> +		xen_netbk_idx_release(netbk,
> +				frag_get_pending_idx(&frags[shinfo->nr_frags]),
> +				XEN_NETIF_RSP_ERROR);
>  	}
>  	/* The head too, if necessary. */
>  	if (start)
> @@ -1025,8 +1110,10 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
>  	struct gnttab_copy *gop = *gopp;
>  	u16 pending_idx = *((u16 *)skb->data);
>  	struct skb_shared_info *shinfo = skb_shinfo(skb);
> +	struct pending_tx_info *tx_info;
>  	int nr_frags = shinfo->nr_frags;
>  	int i, err, start;
> +	u16 peek; /* peek into next tx request */
>  
>  	/* Check status of header. */
>  	err = gop->status;
> @@ -1038,11 +1125,21 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
>  
>  	for (i = start; i < nr_frags; i++) {
>  		int j, newerr;
> +		pending_ring_idx_t head;
>  
>  		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
> +		tx_info = &netbk->pending_tx_info[pending_idx];
> +		head = tx_info->head;
>  
>  		/* Check error status: if okay then remember grant handle. */
> -		newerr = (++gop)->status;
> +		do {
> +			newerr = (++gop)->status;
> +			if (newerr)
> +				break;
> +			peek = netbk->pending_ring[pending_index(++head)];
> +		} while (netbk->pending_tx_info[peek].head
> +			 == INVALID_PENDING_RING_IDX);
> +
>  		if (likely(!newerr)) {
>  			/* Had a previous error? Invalidate this fragment. */
>  			if (unlikely(err))
> @@ -1267,11 +1364,11 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
>  	struct sk_buff *skb;
>  	int ret;
>  
> -	while (((nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) &&
> +	while (((nr_pending_reqs(netbk) + max_skb_slots) < MAX_PENDING_REQS) &&
>  		!list_empty(&netbk->net_schedule_list)) {
>  		struct xenvif *vif;
>  		struct xen_netif_tx_request txreq;
> -		struct xen_netif_tx_request txfrags[MAX_SKB_FRAGS];
> +		struct xen_netif_tx_request txfrags[max_skb_slots];
>  		struct page *page;
>  		struct xen_netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX-1];
>  		u16 pending_idx;
> @@ -1359,7 +1456,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
>  		pending_idx = netbk->pending_ring[index];
>  
>  		data_len = (txreq.size > PKT_PROT_LEN &&
> -			    ret < MAX_SKB_FRAGS) ?
> +			    ret < max_skb_slots) ?
>  			PKT_PROT_LEN : txreq.size;
>  
>  		skb = alloc_skb(data_len + NET_SKB_PAD + NET_IP_ALIGN,
> @@ -1409,6 +1506,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
>  		memcpy(&netbk->pending_tx_info[pending_idx].req,
>  		       &txreq, sizeof(txreq));
>  		netbk->pending_tx_info[pending_idx].vif = vif;
> +		netbk->pending_tx_info[pending_idx].head = index;
>  		*((u16 *)skb->data) = pending_idx;
>  
>  		__skb_put(skb, data_len);
> @@ -1539,7 +1637,10 @@ static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx,
>  {
>  	struct xenvif *vif;
>  	struct pending_tx_info *pending_tx_info;
> -	pending_ring_idx_t index;
> +	pending_ring_idx_t head;
> +	u16 peek; /* peek into next tx request */
> +
> +	BUG_ON(netbk->mmap_pages[pending_idx] == (void *)(~0UL));
>  
>  	/* Already complete? */
>  	if (netbk->mmap_pages[pending_idx] == NULL)
> @@ -1548,13 +1649,34 @@ static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx,
>  	pending_tx_info = &netbk->pending_tx_info[pending_idx];
>  
>  	vif = pending_tx_info->vif;
> +	head = pending_tx_info->head;
>  
> -	make_tx_response(vif, &pending_tx_info->req, status);
> +	BUG_ON(head == INVALID_PENDING_RING_IDX);
> +	BUG_ON(netbk->pending_ring[pending_index(head)] != pending_idx);
>  
> -	index = pending_index(netbk->pending_prod++);
> -	netbk->pending_ring[index] = pending_idx;
> +	do {
> +		pending_ring_idx_t index;
> +		pending_ring_idx_t idx = pending_index(head);
> +		u16 info_idx = netbk->pending_ring[idx];
>  
> -	xenvif_put(vif);
> +		pending_tx_info = &netbk->pending_tx_info[info_idx];
> +		make_tx_response(vif, &pending_tx_info->req, status);
> +
> +		/* Setting any number other than
> +		 * INVALID_PENDING_RING_IDX indicates this slot is
> +		 * starting a new packet / ending a previous packet.
> +		 */
> +		pending_tx_info->head = 0;
> +
> +		index = pending_index(netbk->pending_prod++);
> +		netbk->pending_ring[index] = netbk->pending_ring[info_idx];
> +
> +		xenvif_put(vif);
> +
> +		peek = netbk->pending_ring[pending_index(++head)];
> +
> +	} while (netbk->pending_tx_info[peek].head
> +		 == INVALID_PENDING_RING_IDX);
>  
>  	netbk->mmap_pages[pending_idx]->mapping = 0;
>  	put_page(netbk->mmap_pages[pending_idx]);
> @@ -1613,7 +1735,7 @@ static inline int rx_work_todo(struct xen_netbk *netbk)
>  static inline int tx_work_todo(struct xen_netbk *netbk)
>  {
>  
> -	if (((nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) &&
> +	if (((nr_pending_reqs(netbk) + max_skb_slots) < MAX_PENDING_REQS) &&
>  			!list_empty(&netbk->net_schedule_list))
>  		return 1;
>  
> -- 
> 1.7.10.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply

* Re: [PATCH 5/6] xen-netback: coalesce slots before copying
From: Wei Liu @ 2013-03-25 16:58 UTC (permalink / raw)
  To: David Vrabel
  Cc: Wei Liu, Ian Campbell, konrad.wilk@oracle.com,
	netdev@vger.kernel.org, xen-devel@lists.xen.org,
	annie.li@oracle.com
In-Reply-To: <51507C9B.3080109@citrix.com>

On Mon, Mar 25, 2013 at 4:34 PM, David Vrabel <david.vrabel@citrix.com> wrote:
> On 25/03/13 15:47, Wei Liu wrote:
>> On Mon, Mar 25, 2013 at 3:13 PM, David Vrabel <david.vrabel@citrix.com> wrote:
>>> On 25/03/13 11:08, Wei Liu wrote:
>>>> This patch tries to coalesce tx requests when constructing grant copy
>>>> structures. It enables netback to deal with situation when frontend's
>>>> MAX_SKB_FRAGS is larger than backend's MAX_SKB_FRAGS.
>>>>
>>>> It defines max_skb_slots, which is a estimation of the maximum number of slots
>>>> a guest can send, anything bigger than that is considered malicious. Now it is
>>>> set to 20, which should be enough to accommodate Linux (16 to 19).
>>>
>>> This maximum needs to be defined as part of the protocol and added to
>>> the interface header.
>>>
>>
>> No, this is not part of the protocol and not a hard limit. It is
>> configurable by system administrator.
>
> There is no mechanism by which the front and back ends can negotiate
> this value, so it does need to be a fixed value that is equal or greater
> than the max from any front or back end that has ever existed.
>

Are you suggesting move the default macro value to header file? It is
just an estimation, I have no knowledge of the accurate maximum value,
so I think make it part of the protocol a bad idea.

Do you have a handle on the maximum value?

> The reason for this patch is that this wasn't properly specified and
> changes outside of netback broke the protocol.
>
>>>> +
>>>> +                             if (unlikely(!first)) {
>>>
>>> This isn't unlikely is it?
>>>
>>
>> For big packet the chance is 1 in max_skb_slots, so 5% (1/20) in default case.
>
> I don't understand your reasoning here.  The "if (!first)" branch is
> taken once per page.  It will be 100% if each slot goes into its own
> page and only 5% if the packet is less than PAGE_SIZE in length but
> split into 20 slots.
>

My mistake. Should be a small packet split into multiple slots.

>>> [...]
>>>> +             /* Setting any number other than
>>>> +              * INVALID_PENDING_RING_IDX indicates this slot is
>>>> +              * starting a new packet / ending a previous packet.
>>>> +              */
>>>> +             pending_tx_info->head = 0;
>>>
>>> This doesn't look needed.  It will be initialized again when reusing t
>>> his pending_tx_info again, right?
>>>
>>
>> Yes, it is needed. Otherwise netback responses to invalid tx_info and
>> cause netfront to crash before coming into the re-initialization path.
>
> Maybe I'm missing something but this is after the make_tx_reponse()
> call, and immediately after this pending_tx_info is returned to the
> pending ring as free.
>

So it is a bit tricky here. Let me clarify this, the head field is
used to indicate the start of a new tx requests queue and the end of
previous queue.

Imagine a sequence of head fileds(I = INVALID_PENDING_RING_IDX below),
the number is the starting index of pending ring.

  .... 0 I I I 5 I I ...

consume all tx_info but not setting I to 0 (or any number other then
I) makes the sequence remains the same as before. The in subsequent
call to process next SKB, which has 3 extra slots, which makes the
sequence look like

  .... 8 I I I I I I ...

but in fact the correct sequence should be

  .... 8 I I I 0 I I ...

The wrong sequence makes netbk_idx_release responses to more slots
than required, which causes netfront to crash miserably.


Wei.


> David

^ permalink raw reply

* Re: [Xen-devel] [PATCH 6/6] xen-netback: don't disconnect frontend when seeing oversize frame
From: Konrad Rzeszutek Wilk @ 2013-03-25 16:58 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, netdev, annie.li, david.vrabel, ian.campbell
In-Reply-To: <1364209702-12437-7-git-send-email-wei.liu2@citrix.com>

On Mon, Mar 25, 2013 at 11:08:22AM +0000, Wei Liu wrote:
> Some buggy frontends may generate frames larger than 64 KiB. We should
> aggresively consume all slots and drop the packet instead of disconnecting the
> frontend.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

CC stable@vger.kernel.org ?

> ---
>  drivers/net/xen-netback/netback.c |   32 ++++++++++++++++++++++++++------
>  1 file changed, 26 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index a634dc5..1971623 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -924,10 +924,12 @@ static void netbk_fatal_tx_err(struct xenvif *vif)
>  static int netbk_count_requests(struct xenvif *vif,
>  				struct xen_netif_tx_request *first,
>  				struct xen_netif_tx_request *txp,
> -				int work_to_do)
> +				int work_to_do,
> +				RING_IDX first_idx)
>  {
>  	RING_IDX cons = vif->tx.req_cons;
>  	int slots = 0;
> +	bool drop = false;
>  
>  	if (!(first->flags & XEN_NETTXF_more_data))
>  		return 0;
> @@ -947,10 +949,21 @@ static int netbk_count_requests(struct xenvif *vif,
>  
>  		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + slots),
>  		       sizeof(*txp));
> -		if (txp->size > first->size) {
> -			netdev_err(vif->dev, "Packet is bigger than frame.\n");
> -			netbk_fatal_tx_err(vif);
> -			return -EIO;
> +
> +		/* If the guest submitted a frame >= 64 KiB then
> +		 * first->size overflowed and following slots will
> +		 * appear to be larger than the frame.
> +		 *
> +		 * This cannot be fatal error as there are buggy
> +		 * frontends that do this.
> +		 *
> +		 * Consume all slots and drop the packet.
> +		 */
> +		if (!drop && txp->size > first->size) {
> +			if (net_ratelimit())
> +				netdev_dbg(vif->dev,
> +					   "Packet is bigger than frame.\n");
> +			drop = true;
>  		}
>  
>  		first->size -= txp->size;
> @@ -963,6 +976,12 @@ static int netbk_count_requests(struct xenvif *vif,
>  			return -EINVAL;
>  		}
>  	} while ((txp++)->flags & XEN_NETTXF_more_data);
> +
> +	if (drop) {
> +		netbk_tx_err(vif, first, first_idx + slots);
> +		return -EIO;
> +	}
> +
>  	return slots;
>  }
>  
> @@ -1429,7 +1448,8 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
>  				continue;
>  		}
>  
> -		ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do);
> +		ret = netbk_count_requests(vif, &txreq, txfrags,
> +					   work_to_do, idx);
>  		if (unlikely(ret < 0))
>  			continue;
>  
> -- 
> 1.7.10.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply

* Re: [PATCH 2/6] xen-netfront: reduce gso_max_size to account for ethernet header
From: David Miller @ 2013-03-25 16:59 UTC (permalink / raw)
  To: eric.dumazet
  Cc: wei.liu2, xen-devel, netdev, ian.campbell, annie.li, konrad.wilk,
	david.vrabel
In-Reply-To: <1364230472.29473.21.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 25 Mar 2013 09:54:32 -0700

> On Mon, 2013-03-25 at 12:18 -0400, David Miller wrote:
> 
>> 
>> This is effectively the default already, you don't need to change this
>> value explicitly.
>> 
>> ->gso_max_size is set by default to 65536 and then TCP performs this
>> calculation:
>> 
>>                 xmit_size_goal = ((sk->sk_gso_max_size - 1) -                          
>>                                   inet_csk(sk)->icsk_af_ops->net_header_len -          
>>                                   inet_csk(sk)->icsk_ext_hdr_len -                     
>>                                   tp->tcp_header_len);                                 
>> 
>> thereby making it adhere to your limits just fine.
> 
> For locally generated TCP traffic this is the case.

Right, and also any other piece of code that is not interpreting the
gso_max_size value the same way (as "(x - 1) - sizeof_headers") would
need to be fixed.

> However, GRO can build packets up to 65535 bytes, not including the
> Ethernet header.

If this GRO packet ends up being transmitted, the gso limit should be
applied, otherwise we would be violating the device's advertised GSO
limit value.

Assume that this kind of check is performed (it must), then I don't
see how GRO can cause any problems for Xen.

^ permalink raw reply

* Re: [Xen-devel] [PATCH 5/6] xen-netback: coalesce slots before copying
From: Wei Liu @ 2013-03-25 17:06 UTC (permalink / raw)
  To: David Vrabel
  Cc: Wei Liu, netdev@vger.kernel.org, annie.li@oracle.com,
	konrad.wilk@oracle.com, Ian Campbell, xen-devel@lists.xen.org
In-Reply-To: <CAOsiSVV8gtKqFxyAr3XN9YF78WjtFqzvx5mnA1dKTrS_4qzXBw@mail.gmail.com>

On Mon, Mar 25, 2013 at 4:58 PM, Wei Liu <liuw@liuw.name> wrote:
>
> So it is a bit tricky here. Let me clarify this, the head field is
> used to indicate the start of a new tx requests queue and the end of
> previous queue.
>
> Imagine a sequence of head fileds(I = INVALID_PENDING_RING_IDX below),
> the number is the starting index of pending ring.
>
>   .... 0 I I I 5 I I ...
>
> consume all tx_info but not setting I to 0 (or any number other then
> I) makes the sequence remains the same as before. The in subsequent
> call to process next SKB, which has 3 extra slots, which makes the

Sorry, should be "4 extra slots"

sequence looks like

  .... 8 | | | | | | ...

but in fact the correct sequence should be

  .... 8 | | | | 0 0 ...


Wei.

^ permalink raw reply

* Re: [PATCH 1/1] ethernet driver: dm9000: davicom: Upgrade the driver to suit for all DM9000 series chips
From: David Miller @ 2013-03-25 17:09 UTC (permalink / raw)
  To: josright123
  Cc: wfp5p, matthew, gregkh, joseph_chang, jiri, netdev, linux-kernel
In-Reply-To: <1364198682-1589-1-git-send-email-josright123@gmail.com>

From: Joseph CHANG <josright123@gmail.com>
Date: Mon, 25 Mar 2013 16:04:42 +0800

>  #define CARDNAME	"dm9000"
> -#define DRV_VERSION	"1.31"
> +/* [KERN-ADD-1](upgrade) */
> +#define DRV_VERSION	"1.39"
 ...
> +			/* [KERN-ADD-2] */

These kinds of comments are absolutely inapproprate.

^ permalink raw reply

* Re: [Eulerkernel] [PATCH] af_unix: dont send SCM_CREDENTIAL when dest socket is NULL
From: David Miller @ 2013-03-25 17:12 UTC (permalink / raw)
  To: eric.dumazet; +Cc: dingtianhong, edumazet, netdev, lizefan, huxinwei
In-Reply-To: <1364220269.29473.13.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 25 Mar 2013 07:04:29 -0700

> On Mon, 2013-03-25 at 18:28 +0800, dingtianhong wrote:
>> SCM_SCREDENTIALS should apply to write() syscalls only either source or destination
>> socket asserted SOCK_PASSCRED. The original implememtation in maybe_add_creds is wrong,
>> and breaks several LSB testcases ( i.e. /tset/LSB.os/netowkr/recvfrom/T.recvfrom).
>> 
>> Origionally-authored-by: Karel Srot <ksrot@redhat.com>
>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> ---
>>    net/unix/af_unix.c | 4 ++--
>>    1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>> index 51be64f..99189fd 100644
>> --- a/net/unix/af_unix.c
>> +++ b/net/unix/af_unix.c
>> @@ -1413,8 +1413,8 @@ static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock,
>>           if (UNIXCB(skb).cred)
>>                   return;
>>           if (test_bit(SOCK_PASSCRED, &sock->flags) ||
>> -           !other->sk_socket ||
>> -           test_bit(SOCK_PASSCRED, &other->sk_socket->flags)) {
>> +           (other->sk_socket &&
>> +           test_bit(SOCK_PASSCRED, &other->sk_socket->flags))) {
>>                   UNIXCB(skb).pid  = get_pid(task_tgid(current));
>>                   UNIXCB(skb).cred = get_current_cred();
>>           }
> 
> I am not sure why adding credentials if other->sk_socket is NULL could
> break an application ?
> 
> This was the case before commit introducing this code.
> 
> Anyway patch looks fine
> 
> Acked-by: Eric Dumazet <edumazet@google.com>

This patch is massively whitespace damaged, and therefore completely
unusable.

Please fix this, and only resubmit this patch when you can successfully
email the patch to yourself and apply it successfully.

^ permalink raw reply

* Re: [PATCH] unix: fix a race condition in unix_release()
From: David Miller @ 2013-03-25 17:13 UTC (permalink / raw)
  To: pmoore; +Cc: netdev, jan.stancek
In-Reply-To: <20130325131833.10376.68379.stgit@localhost>

From: Paul Moore <pmoore@redhat.com>
Date: Mon, 25 Mar 2013 09:18:33 -0400

> As reported by Jan, and others over the past few years, there is a
> race condition caused by unix_release setting the sock->sk pointer
> to NULL before properly marking the socket as dead/orphaned.  This
> can cause a problem with the LSM hook security_unix_may_send() if
> there is another socket attempting to write to this partially
> released socket in between when sock->sk is set to NULL and it is
> marked as dead/orphaned.  This patch fixes this by only setting
> sock->sk to NULL after the socket has been marked as dead; I also
> take the opportunity to make unix_release_sock() a void function
> as it only ever returned 0/success.
> 
> Dave, I think this one should go on the -stable pile.
> 
> Special thanks to Jan for coming up with a reproducer for this
> problem.
> 
> Reported-by: Jan Stancek <jan.stancek@gmail.com>
> Signed-off-by: Paul Moore <pmoore@redhat.com>

Applied, and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net-next] ipv6: don't free static mem in the case of init_net
From: David Miller @ 2013-03-25 17:16 UTC (permalink / raw)
  To: honkiko; +Cc: netdev, stephen
In-Reply-To: <1364228725-16218-1-git-send-email-honkiko@gmail.com>

From: Hong Zhiguo <honkiko@gmail.com>
Date: Tue, 26 Mar 2013 00:25:25 +0800

> Signed-off-by: Hong Zhiguo <honkiko@gmail.com>

1) This should be submitted against 'net' since it's clearly
   a real bug.

2) This needs to be fixed differently.  We should always
   kmemdup() and create fresh copies of these objects, even
   for init_net.  So 'dflt' and 'all' will start as "NULL"
   and always we will try to kmemdup(&ipv6_defconf, ...) and
   kmemdup(&ipv6_devconf_dflt, ...);

^ permalink raw reply

* [PATCH net-next] netlink: add missing length check of rtnl msg
From: Hong Zhiguo @ 2013-03-25 17:22 UTC (permalink / raw)
  To: netdev; +Cc: davem, stephen, tgraf, Hong Zhiguo

When the legacy array rtm_min still exists, the length check within these 
functions is covered by rtm_min[RTM_NEWTFILTER], rtm_min[RTM_NEWQDISC] 
and rtm_min[RTM_NEWTCLASS].

But after Thomas Graf removed rtm_min several days ago, these checks are 
missing. 

Other doit functions should be OK.

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
---
 net/sched/cls_api.c |    4 ++++
 net/sched/sch_api.c |   22 ++++++++++++++++++----
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 9a04b98..01bfa9c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -141,6 +141,10 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
 
 	if ((n->nlmsg_type != RTM_GETTFILTER) && !capable(CAP_NET_ADMIN))
 		return -EPERM;
+
+	if (nlmsg_len(n) < sizeof(*t))
+		return -EINVAL;
+
 replay:
 	t = nlmsg_data(n);
 	protocol = TC_H_MIN(t->tcm_info);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 0bbce22..b30a988 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -977,7 +977,7 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n)
 	struct tcmsg *tcm = nlmsg_data(n);
 	struct nlattr *tca[TCA_MAX + 1];
 	struct net_device *dev;
-	u32 clid = tcm->tcm_parent;
+	u32 clid;
 	struct Qdisc *q = NULL;
 	struct Qdisc *p = NULL;
 	int err;
@@ -985,6 +985,9 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n)
 	if ((n->nlmsg_type != RTM_GETQDISC) && !capable(CAP_NET_ADMIN))
 		return -EPERM;
 
+	if (nlmsg_len(n) < sizeof(*tcm))
+		return -EINVAL;
+
 	dev = __dev_get_by_index(net, tcm->tcm_ifindex);
 	if (!dev)
 		return -ENODEV;
@@ -993,6 +996,7 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n)
 	if (err < 0)
 		return err;
 
+	clid = tcm->tcm_parent;
 	if (clid) {
 		if (clid != TC_H_ROOT) {
 			if (TC_H_MAJ(clid) != TC_H_MAJ(TC_H_INGRESS)) {
@@ -1051,6 +1055,9 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n)
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
 
+	if (nlmsg_len(n) < sizeof(*tcm))
+		return -EINVAL;
+
 replay:
 	/* Reinit, just in case something touches this. */
 	tcm = nlmsg_data(n);
@@ -1382,14 +1389,17 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n)
 	const struct Qdisc_class_ops *cops;
 	unsigned long cl = 0;
 	unsigned long new_cl;
-	u32 portid = tcm->tcm_parent;
-	u32 clid = tcm->tcm_handle;
-	u32 qid = TC_H_MAJ(clid);
+	u32 portid;
+	u32 clid;
+	u32 qid;
 	int err;
 
 	if ((n->nlmsg_type != RTM_GETTCLASS) && !capable(CAP_NET_ADMIN))
 		return -EPERM;
 
+	if (nlmsg_len(n) < sizeof(*tcm))
+		return -EINVAL;
+
 	dev = __dev_get_by_index(net, tcm->tcm_ifindex);
 	if (!dev)
 		return -ENODEV;
@@ -1413,6 +1423,10 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n)
 
 	/* Step 1. Determine qdisc handle X:0 */
 
+	portid = tcm->tcm_parent;
+	clid = tcm->tcm_handle;
+	qid = TC_H_MAJ(clid);
+
 	if (portid != TC_H_ROOT) {
 		u32 qid1 = TC_H_MAJ(portid);
 
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH 2/6] xen-netfront: reduce gso_max_size to account for ethernet header
From: Eric Dumazet @ 2013-03-25 17:24 UTC (permalink / raw)
  To: David Miller
  Cc: wei.liu2, xen-devel, netdev, ian.campbell, annie.li, konrad.wilk,
	david.vrabel
In-Reply-To: <20130325.125949.1046297562456662826.davem@davemloft.net>

On Mon, 2013-03-25 at 12:59 -0400, David Miller wrote:

> If this GRO packet ends up being transmitted, the gso limit should be
> applied, otherwise we would be violating the device's advertised GSO
> limit value.
> 
> Assume that this kind of check is performed (it must), then I don't
> see how GRO can cause any problems for Xen.

It seems nobody cared to perform this generic check.

netif_skb_features() only deals with max_segs :

if (skb_shinfo(skb)->gso_segs > skb->dev->gso_max_segs) 
	features &= ~NETIF_F_GSO_MASK; 


dev->gso_max_size is currently only used to populate sk->sk_gso_max_size

For regular 1500 MTU and at most 17 frags per skb, its hardly a problem,
but it could happen with jumbo frames, or using loopback and splice()

^ permalink raw reply

* [PATCH net] ipv6: fix bad free of addrconf_init_net
From: Hong Zhiguo @ 2013-03-25 17:52 UTC (permalink / raw)
  To: netdev; +Cc: davem, stephen, Hong Zhiguo

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
---
 net/ipv6/addrconf.c |   26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index fa36a67..e7d81cd 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4852,26 +4852,20 @@ static void addrconf_sysctl_unregister(struct inet6_dev *idev)
 
 static int __net_init addrconf_init_net(struct net *net)
 {
-	int err;
+	int err = -ENOMEM;
 	struct ipv6_devconf *all, *dflt;
 
-	err = -ENOMEM;
-	all = &ipv6_devconf;
-	dflt = &ipv6_devconf_dflt;
+	all = kmemdup(&ipv6_devconf, sizeof(ipv6_devconf), GFP_KERNEL);
+	if (all == NULL)
+		goto err_alloc_all;
 
-	if (!net_eq(net, &init_net)) {
-		all = kmemdup(all, sizeof(ipv6_devconf), GFP_KERNEL);
-		if (all == NULL)
-			goto err_alloc_all;
+	dflt = kmemdup(&ipv6_devconf_dflt, sizeof(ipv6_devconf_dflt), GFP_KERNEL);
+	if (dflt == NULL)
+		goto err_alloc_dflt;
 
-		dflt = kmemdup(dflt, sizeof(ipv6_devconf_dflt), GFP_KERNEL);
-		if (dflt == NULL)
-			goto err_alloc_dflt;
-	} else {
-		/* these will be inherited by all namespaces */
-		dflt->autoconf = ipv6_defaults.autoconf;
-		dflt->disable_ipv6 = ipv6_defaults.disable_ipv6;
-	}
+	/* these will be inherited by all namespaces */
+	dflt->autoconf = ipv6_defaults.autoconf;
+	dflt->disable_ipv6 = ipv6_defaults.disable_ipv6;
 
 	net->ipv6.devconf_all = all;
 	net->ipv6.devconf_dflt = dflt;
-- 
1.7.10.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox