Netdev List
 help / color / mirror / Atom feed
* [PATCH 1/6 V2] mlx4: Fix vlan table overflow
From: Yevgeny Petrilin @ 2011-10-18 11:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, yevgenyp


Prevent overflow when trying to register more Vlans then the Vlan table in
HW is configured to.
Need to take into acount that the first 2 entries are reserved.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/port.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/port.c b/drivers/net/ethernet/mellanox/mlx4/port.c
index 609e0ec..163a314 100644
--- a/drivers/net/ethernet/mellanox/mlx4/port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/port.c
@@ -65,7 +65,7 @@ void mlx4_init_vlan_table(struct mlx4_dev *dev, struct mlx4_vlan_table *table)
 		table->entries[i] = 0;
 		table->refs[i]	 = 0;
 	}
-	table->max   = 1 << dev->caps.log_num_vlans;
+	table->max   = (1 << dev->caps.log_num_vlans) - MLX4_VLAN_REGULAR;
 	table->total = 0;
 }
 
@@ -354,6 +354,13 @@ int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index)
 	int free = -1;
 
 	mutex_lock(&table->mutex);
+
+	if (table->total == table->max) {
+		/* No free vlan entries */
+		err = -ENOSPC;
+		goto out;
+	}
+
 	for (i = MLX4_VLAN_REGULAR; i < MLX4_MAX_VLAN_NUM; i++) {
 		if (free < 0 && (table->refs[i] == 0)) {
 			free = i;
@@ -375,12 +382,6 @@ int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index)
 		goto out;
 	}
 
-	if (table->total == table->max) {
-		/* No free vlan entries */
-		err = -ENOSPC;
-		goto out;
-	}
-
 	/* Register new MAC */
 	table->refs[free] = 1;
 	table->entries[free] = cpu_to_be32(vlan | MLX4_VLAN_VALID);
-- 
1.7.7

^ permalink raw reply related

* [PATCH 0/6 V2] mlx4_en: Data path optimizations and bug fix
From: Yevgeny Petrilin @ 2011-10-18 11:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, yevgenyp

Hello,
This is V2 of the patches sent yesterday.
The differences comparing to V1 are:
1. Remove patch 3/7 : "Incoming traffic alignment optimizations" following Eric's comments for a rework.
2. Using static seed value for Toeplitz function, rather then randomizing the value each time

 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_port.c   |    6 ++++++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c     |   23 ++++++++++++++++++++---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c     |    2 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c        |    1 +
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |    7 +++++--
 drivers/net/ethernet/mellanox/mlx4/port.c      |   15 ++++++++-------
 include/linux/mlx4/device.h                    |    1 +


Thanks,
Yevgeny

^ permalink raw reply

* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops
From: Herbert Xu @ 2011-10-18 13:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs
In-Reply-To: <1318942560.2657.69.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, Oct 18, 2011 at 02:56:00PM +0200, Eric Dumazet wrote:
>
> I am ok by this way, but we might hit another similar problem elsewhere.
> 
> (igmp.c ip6_output, ...)
> 
> We effectively want to remove LL_ALLOCATED_SPACE() usage and obfuscate
> code...

Here's another idea, provide a helper to do the skb allocation
and the skb_reserve in one go.  That way this ugliness would only
need to be done once.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: Possible via-velocity bug.
From: Eric Dumazet @ 2011-10-18 13:38 UTC (permalink / raw)
  To: n00bsys0p; +Cc: linux-kernel, netdev
In-Reply-To: <j7js28$u0m$1@dough.gmane.org>

Le mardi 18 octobre 2011 à 13:39 +0100, n00bsys0p a écrit :
> Hi everyone, I have a nice simple question.
> 
> I've found what I think is a bug in the 3.0+ kernel's via-velocity LAN 
> driver, which causes the leaking of data into the video RAM during PXE boot.
> 
> How do I go about reporting this to the relevant people?
> 
> Thanks in advance!

Here and netdev is fine.

^ permalink raw reply

* Re: [PATCH] Fix guest memory leak and panic
From: Krishna Kumar2 @ 2011-10-18 13:22 UTC (permalink / raw)
  To: Ian Campbell
  Cc: davem@davemloft.net, linux-kernel@vger.kernel.org, mst@redhat.com,
	netdev@vger.kernel.org, rusty@rustcorp.com.au,
	virtualization@lists.linux-foundation.org
In-Reply-To: <1318935055.3385.0.camel@zakaz.uk.xensource.com>

Ian Campbell <Ian.Campbell@citrix.com> wrote on 10/18/2011 04:20:55 PM:

> > > Sigh, only one out of the ten callers of (__)skb_frag_set_page
expects
> > > skb_frag_set_page to take a new reference. I think that's pretty
> > > comprehensive evidence that the current behaviour is unexpected and
> > > wrong.
> >
> > Looks good!
>
> Thanks.

Tested this patch for virtio_net - hang/panic is fixed.
MemFree also doesn't reduce at end of test compared to
the start.

Tested-by: krkumar2@in.ibm.com

- KK

> > Does it make sense to commit both of these patches? The
> > reason being - my patch becomes a cleanup of set_skb_frag()
> > in virtio_net driver.
>
> I think your patch remains a valid cleanup but I don't maintain that
> code.
>
> Ian.
>
>

^ permalink raw reply

* Re: [PATCH] Fix guest memory leak and panic
From: Eric Dumazet @ 2011-10-18 13:20 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Krishna Kumar, rusty@rustcorp.com.au, mst@redhat.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, davem@davemloft.net
In-Reply-To: <1318931232.16132.65.camel@zakaz.uk.xensource.com>

Le mardi 18 octobre 2011 à 10:47 +0100, Ian Campbell a écrit :
> On Tue, 2011-10-18 at 09:36 +0100, Ian Campbell wrote:
> > I think the best thing might be to remove the additional ref taking from
> > the setter function and audit the previous changes to ensure they
> > conform. I'll do that right away and post a fixup patch ASAP.
> 
> Sigh, only one out of the ten callers of (__)skb_frag_set_page expects
> skb_frag_set_page to take a new reference. I think that's pretty
> comprehensive evidence that the current behaviour is unexpected and
> wrong. 
> 
> Sorry about this.
> 
> Ian.
> 
> 8<---------------------------------------------------------
> 
> From 42c26b7ca640bd5cb6f9c3bc76db96c92ed8ff82 Mon Sep 17 00:00:00 2001
> From: Ian Campbell <ian.campbell@citrix.com>
> Date: Tue, 18 Oct 2011 09:59:37 +0100
> Subject: [PATCH] net: do not take an additional reference in skb_frag_set_page
> 
> I audited all of the callers in the tree and only one of them (pktgen) expects
> it to do so. Taking this reference is pretty obviously confusing and error
> prone.
> 
> In particular I looked at the following commits which switched callers of
> (__)skb_frag_set_page to the skb paged fragment api:
> 
> 6a930b9f163d7e6d9ef692e05616c4ede65038ec cxgb3: convert to SKB paged frag API.
> 5dc3e196ea21e833128d51eb5b788a070fea1f28 myri10ge: convert to SKB paged frag API.
> 0e0634d20dd670a89af19af2a686a6cce943ac14 vmxnet3: convert to SKB paged frag API.
> 86ee8130a46769f73f8f423f99dbf782a09f9233 virtionet: convert to SKB paged frag API.
> 4a22c4c919c201c2a7f4ee09e672435a3072d875 sfc: convert to SKB paged frag API.
> 18324d690d6a5028e3c174fc1921447aedead2b8 cassini: convert to SKB paged frag API.
> b061b39e3ae18ad75466258cf2116e18fa5bbd80 benet: convert to SKB paged frag API.
> b7b6a688d217936459ff5cf1087b2361db952509 bnx2: convert to SKB paged frag API.
> 804cf14ea5ceca46554d5801e2817bba8116b7e5 net: xfrm: convert to SKB frag APIs
> ea2ab69379a941c6f8884e290fdd28c93936a778 net: convert core to skb paged frag APIs
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
>  include/linux/skbuff.h |    1 -
>  net/core/pktgen.c      |    1 +
>  2 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 64f8695..78741da 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1765,7 +1765,6 @@ static inline void *skb_frag_address_safe(const skb_frag_t *frag)
>  static inline void __skb_frag_set_page(skb_frag_t *frag, struct page *page)
>  {
>  	frag->page = page;
> -	__skb_frag_ref(frag);
>  }
>  
>  /**
> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> index 796044a..c4effd4 100644
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -2603,6 +2603,7 @@ static void pktgen_finalize_skb(struct pktgen_dev *pkt_dev, struct sk_buff *skb,
>  					break;
>  			}
>  			skb_frag_set_page(skb, i, pkt_dev->page);
> +			skb_frag_ref(skb, i);
>  			skb_shinfo(skb)->frags[i].page_offset = 0;
>  			/*last fragment, fill rest of data*/
>  			if (i == (frags - 1))

I am ok with this patch, since it makes sense to let driver do the
page->_count change itself (it can use a batched add() in case a page is
splitted in many frags)

But I suggest using get_page(pkt_dev->page), this seems more obvious to
me [ This was how I wrote the thing ;) ]

^ permalink raw reply

* [PATCH] net: Fix driver name for mdio-gpio.c
From: Dirk Eibach @ 2011-10-18 13:04 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, grant.likely, Dirk Eibach

Since commit
"7488876... dt/net: Eliminate users of of_platform_{,un}register_driver"
there are two platform drivers named "mdio-gpio" registered.
I renamed the of variant to "mdio-ofgpio".

Signed-off-by: Dirk Eibach <eibach@gdsys.de>
---
 drivers/net/phy/mdio-gpio.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/phy/mdio-gpio.c b/drivers/net/phy/mdio-gpio.c
index f2d122f..3d66d9f 100644
--- a/drivers/net/phy/mdio-gpio.c
+++ b/drivers/net/phy/mdio-gpio.c
@@ -241,7 +241,7 @@ MODULE_DEVICE_TABLE(of, mdio_ofgpio_match);
 
 static struct platform_driver mdio_ofgpio_driver = {
 	.driver = {
-		.name = "mdio-gpio",
+		.name = "mdio-ofgpio",
 		.owner = THIS_MODULE,
 		.of_match_table = mdio_ofgpio_match,
 	},
-- 
1.5.6.5

^ permalink raw reply related

* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops
From: Eric Dumazet @ 2011-10-18 12:56 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs
In-Reply-To: <20111018114945.GA16359@gondor.apana.org.au>

Le mardi 18 octobre 2011 à 13:49 +0200, Herbert Xu a écrit :
> On Tue, Oct 18, 2011 at 01:37:58PM +0200, Eric Dumazet wrote:
> >
> > In the bug we try to fix, we have :
> > 
> > skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) 
> > 
> > ... < increase of dev->needed_headroom by another cpu/task >
> > 
> > skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev));
> 
> OK, in that case one fix would be to replace LL_ALLOCATED_SPACE
> with its two constiuents so that they may be stored in local
> variables for later use.
> 
> 	hlen = LL_HEADROOM(skb);
> 	tlen = LL_TAILROOM(skb);
> 	skb_alloc_send_skb(sk, ... + LL_ALIGN(hlen + tlen));
> 
> 	skb_reserve(skb, LL_ALIGN(hlen));
> 
> Cheers,

I am ok by this way, but we might hit another similar problem elsewhere.

(igmp.c ip6_output, ...)

We effectively want to remove LL_ALLOCATED_SPACE() usage and obfuscate
code...


[PATCH] raw: allow dev->needed_headroom dynamic change

It seems ip_gre is able to change dev->needed_headroom on the fly.

It triggers a BUG in raw_sendmsg()

skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) 

< another cpu change dev->needed_headromm (making it bigger)

...
skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev));

We end with LL_RESERVED_SPACE() being bigger than LL_ALLOCATED_SPACE()
-> we crash later because skb head is exhausted.

Bug introduced in commit 243aad83 in 2.6.34 (ip_gre: include route
header_len in max_headroom calculation)

Reported-by: Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Timo Teräs <timo.teras@iki.fi>
CC: Herbert Xu <herbert@gondor.apana.org.au>
---
 include/linux/netdevice.h |   10 +++++++---
 net/ipv4/raw.c            |    9 +++++++--
 net/ipv6/raw.c            |    9 +++++++--
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ddee79b..dba2399 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -276,12 +276,16 @@ struct hh_cache {
  * LL_ALLOCATED_SPACE also takes into account the tailroom the device
  * may need.
  */
+#define LL_ALIGN(__len) (((__len)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
+
 #define LL_RESERVED_SPACE(dev) \
-	((((dev)->hard_header_len+(dev)->needed_headroom)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
+	LL_ALIGN((dev)->hard_header_len + (dev)->needed_headroom))
+
 #define LL_RESERVED_SPACE_EXTRA(dev,extra) \
-	((((dev)->hard_header_len+(dev)->needed_headroom+(extra))&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
+	LL_ALIGN((dev)->hard_header_len + (dev)->needed_headroom + (extra))
+
 #define LL_ALLOCATED_SPACE(dev) \
-	((((dev)->hard_header_len+(dev)->needed_headroom+(dev)->needed_tailroom)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
+	LL_ALIGN((dev)->hard_header_len + (dev)->needed_headroom + (dev)->needed_tailroom)
 
 struct header_ops {
 	int	(*create) (struct sk_buff *skb, struct net_device *dev,
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 61714bd..4ed4eda 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -326,6 +326,9 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 	unsigned int iphlen;
 	int err;
 	struct rtable *rt = *rtp;
+	unsigned int hard_header_len = rt->dst.dev->hard_header_len;
+	unsigned int needed_headroom = rt->dst.dev->needed_headroom;
+	unsigned int needed_tailroom = rt->dst.dev->needed_tailroom;
 
 	if (length > rt->dst.dev->mtu) {
 		ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport,
@@ -336,11 +339,13 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 		goto out;
 
 	skb = sock_alloc_send_skb(sk,
-				  length + LL_ALLOCATED_SPACE(rt->dst.dev) + 15,
+				  length + LL_ALIGN(hard_header_len +
+						    needed_headroom +
+					            needed_tailroom) + 15,
 				  flags & MSG_DONTWAIT, &err);
 	if (skb == NULL)
 		goto error;
-	skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev));
+	skb_reserve(skb, LL_ALIGN(hard_header_len + needed_headroom));
 
 	skb->priority = sk->sk_priority;
 	skb->mark = sk->sk_mark;
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 343852e..eb0a797 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -610,6 +610,9 @@ static int rawv6_send_hdrinc(struct sock *sk, void *from, int length,
 	struct sk_buff *skb;
 	int err;
 	struct rt6_info *rt = (struct rt6_info *)*dstp;
+	unsigned int hard_header_len = rt->dst.dev->hard_header_len;
+	unsigned int needed_headroom = rt->dst.dev->needed_headroom;
+	unsigned int needed_tailroom = rt->dst.dev->needed_tailroom;
 
 	if (length > rt->dst.dev->mtu) {
 		ipv6_local_error(sk, EMSGSIZE, fl6, rt->dst.dev->mtu);
@@ -619,11 +622,13 @@ static int rawv6_send_hdrinc(struct sock *sk, void *from, int length,
 		goto out;
 
 	skb = sock_alloc_send_skb(sk,
-				  length + LL_ALLOCATED_SPACE(rt->dst.dev) + 15,
+				  length + LL_ALIGN(hard_header_len +
+						    needed_headroom +
+					            needed_tailroom) + 15,
 				  flags & MSG_DONTWAIT, &err);
 	if (skb == NULL)
 		goto error;
-	skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev));
+	skb_reserve(skb, LL_ALIGN(hard_header_len + needed_headroom));
 
 	skb->priority = sk->sk_priority;
 	skb->mark = sk->sk_mark;

^ permalink raw reply related

* Re: [PATCH v13 0/6] flexcan: Add support for powerpc flexcan (freescale p1010)
From: Robin Holt @ 2011-10-18 12:30 UTC (permalink / raw)
  To: Kumar Gala
  Cc: Grant Likely, netdev-u79uwXL29TY76Z2rM5mHXA, U Bhaskar-B22300,
	socketcan-core-0fE9KPoRgkgATYTw5x5z8w, Marc Kleine-Budde,
	PPC list, David S. Miller, Wolfgang Grandegger
In-Reply-To: <D79CB818-C14E-4C8D-9A8D-42B39ADE20B2-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>

On Tue, Oct 18, 2011 at 06:43:13AM -0500, Kumar Gala wrote:
> 
> >> Robin,
> >> 
> >> Do you remember why we went with just 'fsl,p1010-flexcan' as the device tree compatible?  Do we feel the flex can on P1010 isn't the same as on MPC5xxx? or the ARM SoCs?
> > 
> > The decision was due to the fact there is no true "generic" fsl.flexcan
> > chip free of any SOC implementation and therefore not something which
> > could be separately defined.  That decision was made by Grant Likely.
> > I will inline that email below.
> > 
> > Robin
> 
> 
> Thanks, I'll look into this internally at FSL.  I think its confusing as hell to have "fsl,p1010-flexcan" in an ARM .dts and don't think any reasonable ARM customer of FSL would know to put a PPC SOC name in their .dts.  I'll ask the HW guys what's going on so we can come up with a bit more generic name so we don't have to constantly change this.  Even if its just:

Grants argument was that there should then be a fsl,zeba-flexcan which
would define each arm based soc.  The match string could be there and
the devicetree binding would match on each equivalent.

Robin

> 
> fsl,ppc-flexcan & fsl,arm-flexcan.
> 
> > On Mon, Aug 15, 2011 at 09:13:50AM -0600, Grant Likely wrote:
> >> On Mon, Aug 15, 2011 at 9:03 AM, Robin Holt <holt-sJ/iWh9BUns@public.gmane.org> wrote:
> >>> Grant,
> >>> 
> >>> Earlier, you had asked for a more specific name for the compatible
> >>> property of the Freescale flexcan device.  I still have not gotten a
> >>> more specific answer.  Hopefully Marc can give you more details about
> >>> the flexcan implementations.
> >> 
> >> If there is no ip core version, then just stick with the
> >> fsl,<soc>-flexcan name and drop "fsl,flexcan".  Marketing may say
> >> flexcan is flexcan, but hardware engineers like to change things.
> >> Trying to be too generic in compatible values will just lead to
> >> problems in the future.
> > 
> > Thanks,
> > Robin
> 
> - k

^ permalink raw reply

* Re: [net-next] stmmac: limit max_mtu in case of 4KiB and use __netdev_alloc_skb (V2)
From: Eric Dumazet @ 2011-10-18 12:26 UTC (permalink / raw)
  To: Giuseppe CAVALLARO; +Cc: netdev, davem
In-Reply-To: <1318937995-14228-1-git-send-email-peppe.cavallaro@st.com>

Le mardi 18 octobre 2011 à 13:39 +0200, Giuseppe CAVALLARO a écrit :
> Problem using big mtu around 4096 bytes is you end allocating (4096
> +NET_SKB_PAD + NET_IP_ALIGN + sizeof(struct skb_shared_info) bytes ->
> 8192 bytes : order-1 pages
> 
> It's better to limit the mtu to SKB_MAX_HEAD(NET_SKB_PAD),
> to have no more than one page per skb.
> 
> Also the patch changes the netdev_alloc_skb_ip_align() done in
> init_dma_desc_rings() and uses a variant allowing GFP_KERNEL allocations
> allowing the driver to load even in case of memory pressure.
> 
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 

Seems good, thanks :)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

^ permalink raw reply

* Re: BUG: All network processes hang (brcmsmac/wpa_supplicant)
From: Eric Dumazet @ 2011-10-18 12:24 UTC (permalink / raw)
  To: Nico Schottelius; +Cc: LKML, netdev
In-Reply-To: <20111018111422.GA1979@schottelius.org>

Le mardi 18 octobre 2011 à 13:14 +0200, Nico Schottelius a écrit :
> Dear LKML,
> 
> there is a rather nasty bug in all recent kernels (probably older as
> well):
> 
> If wpa_supplicant looses the connection to an AP, all network calls
> hang, uninterruptle.
> 
> This is a snippet of running fetchmail after the problem started:
> 
> socket(PF_NETLINK, SOCK_RAW, 0)         = 4
> bind(4, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
> getsockname(4, {sa_family=AF_NETLINK, pid=9734, groups=00000000}, [12]) = 0
> sendto(4, "\24\0\0\0\26\0\1\3\225\\\235N\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12
> 
> [12:59] brief:linux-2.6% ps aux | grep fetchmail
> nico      9063  0.0  0.0  20652  1492 pts/3    D+   12:55   0:00 fetchmail
> nico      9564  0.0  0.0  20652  1496 pts/15   D+   13:01   0:00 fetchmail
> nico      9731  0.0  0.0   4288   724 pts/34   S+   13:01   0:00 strace -fF fetchmail
> nico      9734  0.0  0.0  20652  1492 pts/34   D+   13:01   0:00 fetchmail
> 
> This is particular nasty, because even calling "sudo -i" does a sendto() call
> and thus hangs, as well as trying to send this mail with sendmail makes mutt hang.
> 
> I've seen that on stock Archlinux kernel (linux 3.0.6-2),
> latest from Linus (v3.1-rc9-93-ga84a79e), merged trees from Keith and Jiri
> (3.1.0-rc6-gbee709a).
> 
> Listing of hanging processes:
> 
> [13:05] brief:linux-2.6% ps aux | grep D        
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> root      9056  0.2  0.0      0     0 ?        D    12:54   0:02 [kworker/u:0]
> nico      9063  0.0  0.0  20652  1492 pts/3    D+   12:55   0:00 fetchmail
> nico      9066  0.0  0.0  16548  1180 ?        D    12:56   0:00 sendmail -oem -oi -f ME@inf.ethz.ch -- RECIPIENT
> root      9067  0.0  0.0   6060   320 pts/14   DN   12:56   0:00 ip r
> root      9124  0.0  0.0   6060   320 pts/14   DN   12:57   0:00 ip r add 192.168.42.1/24 dev lo
> root      9390  0.0  0.0  11996   784 pts/12   D+   13:01   0:00 sudo -i
> nico      9564  0.0  0.0  20652  1496 pts/15   D+   13:01   0:00 fetchmail
> nico      9734  0.0  0.0  20652  1492 pts/34   D+   13:01   0:00 fetchmail
> root     12304  0.0  0.0  28624  2228 ?        Ds   09:16   0:00 wpa_supplicant -B -P /run/wpa_supplicant_wlan0.pid -i wlan0 -D nl80211,wext -c /run/network//wpa.wlan0/wpa.conf
> [13:07] brief:linux-2.6% 
> 
> The hardware is again the (in)famous MacBook Air 4,2, so I imagine this may be
> related to brcmsmac.
> 
> Attached are config, dmesg, lsmod.


Must be a mutex_unlock(some_mltex) missing somewhere.

Then later, a process holding RTNL is blocking on mutex_lock(some_mutex)

Try a "CONFIG_LOCKDEP=y" enabled build

^ permalink raw reply

* Re: [patch] pktgen: bug when calling ndelay in x86 architectures
From: Eric Dumazet @ 2011-10-18 11:56 UTC (permalink / raw)
  To: Daniel Turull
  Cc: David Miller, netdev, Robert Olsson, Voravit Tanyingyong,
	Jens Laas
In-Reply-To: <4E9D5E1B.3080704@gmail.com>

Le mardi 18 octobre 2011 à 13:08 +0200, Daniel Turull a écrit :
> The value selected to delay the transmission in pktgen with the ndelay function should be lower.
> In Linux/arch/x86/include/asm/delay.h and Linux/arch/sh/include/asm/delay.h
> the maximal expected value for a constant is 20000 ns.
> 
> Signed-off-by: Daniel Turull <daniel.turull@gmail.com>
> ---
> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> index 796044a..e17bd41 100644
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -2145,7 +2145,7 @@ static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
>  	}
>  
>  	start_time = ktime_now();
> -	if (remaining < 100000)
> +	if (remaining < 20000)
>  		ndelay(remaining);	/* really small just spin */
>  	else {
>  		/* see do_nanosleep */

But 'remaining' is not a constant.

If we want exactly 40.000 packets per second rate (25 us between
packets), your patch makes this not quite possible without
CONFIG_HIGH_RES_TIMERS and probable high jitter because of scheduler
effects.

pktgen is kind of special, we _want_ a cpu for our exclusive use.

^ permalink raw reply

* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops
From: Herbert Xu @ 2011-10-18 11:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs
In-Reply-To: <1318937878.2657.50.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, Oct 18, 2011 at 01:37:58PM +0200, Eric Dumazet wrote:
>
> In the bug we try to fix, we have :
> 
> skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) 
> 
> ... < increase of dev->needed_headroom by another cpu/task >
> 
> skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev));

OK, in that case one fix would be to replace LL_ALLOCATED_SPACE
with its two constiuents so that they may be stored in local
variables for later use.

	hlen = LL_HEADROOM(skb);
	tlen = LL_TAILROOM(skb);
	skb_alloc_send_skb(sk, ... + LL_ALIGN(hlen + tlen));

	skb_reserve(skb, LL_ALIGN(hlen));

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v13 0/6] flexcan: Add support for powerpc flexcan (freescale p1010)
From: Marc Kleine-Budde @ 2011-10-18 11:48 UTC (permalink / raw)
  To: Kumar Gala
  Cc: Grant Likely, netdev-u79uwXL29TY76Z2rM5mHXA, U Bhaskar-B22300,
	linux-can-u79uwXL29TY76Z2rM5mHXA,
	socketcan-core-0fE9KPoRgkgATYTw5x5z8w, PPC list, David S. Miller,
	Wolfgang Grandegger
In-Reply-To: <D79CB818-C14E-4C8D-9A8D-42B39ADE20B2-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 857 bytes --]

On 10/18/2011 01:43 PM, Kumar Gala wrote:
> Thanks, I'll look into this internally at FSL.  I think its confusing
> as hell to have "fsl,p1010-flexcan" in an ARM .dts and don't think
> any reasonable ARM customer of FSL would know to put a PPC SOC name
> in their .dts.  I'll ask the HW guys what's going on so we can come
> up with a bit more generic name so we don't have to constantly change
> this.  Even if its just:
> 
> fsl,ppc-flexcan & fsl,arm-flexcan.

If you talk the HW guys ask them for a name for the coldfire flexcan
core, too please ;)

cheers, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

[-- Attachment #2: Type: text/plain, Size: 188 bytes --]

_______________________________________________
Socketcan-core mailing list
Socketcan-core-0fE9KPoRgkgATYTw5x5z8w@public.gmane.org
https://lists.berlios.de/mailman/listinfo/socketcan-core

^ permalink raw reply

* Re: [PATCH v13 0/6] flexcan: Add support for powerpc flexcan (freescale p1010)
From: Kumar Gala @ 2011-10-18 11:43 UTC (permalink / raw)
  To: Robin Holt
  Cc: Grant Likely, netdev-u79uwXL29TY76Z2rM5mHXA, U Bhaskar-B22300,
	socketcan-core-0fE9KPoRgkgATYTw5x5z8w, Marc Kleine-Budde,
	PPC list, David S. Miller, Wolfgang Grandegger
In-Reply-To: <20111018094328.GC22814-sJ/iWh9BUns@public.gmane.org>


>> Robin,
>> 
>> Do you remember why we went with just 'fsl,p1010-flexcan' as the device tree compatible?  Do we feel the flex can on P1010 isn't the same as on MPC5xxx? or the ARM SoCs?
> 
> The decision was due to the fact there is no true "generic" fsl.flexcan
> chip free of any SOC implementation and therefore not something which
> could be separately defined.  That decision was made by Grant Likely.
> I will inline that email below.
> 
> Robin


Thanks, I'll look into this internally at FSL.  I think its confusing as hell to have "fsl,p1010-flexcan" in an ARM .dts and don't think any reasonable ARM customer of FSL would know to put a PPC SOC name in their .dts.  I'll ask the HW guys what's going on so we can come up with a bit more generic name so we don't have to constantly change this.  Even if its just:

fsl,ppc-flexcan & fsl,arm-flexcan.

> On Mon, Aug 15, 2011 at 09:13:50AM -0600, Grant Likely wrote:
>> On Mon, Aug 15, 2011 at 9:03 AM, Robin Holt <holt-sJ/iWh9BUns@public.gmane.org> wrote:
>>> Grant,
>>> 
>>> Earlier, you had asked for a more specific name for the compatible
>>> property of the Freescale flexcan device.  I still have not gotten a
>>> more specific answer.  Hopefully Marc can give you more details about
>>> the flexcan implementations.
>> 
>> If there is no ip core version, then just stick with the
>> fsl,<soc>-flexcan name and drop "fsl,flexcan".  Marketing may say
>> flexcan is flexcan, but hardware engineers like to change things.
>> Trying to be too generic in compatible values will just lead to
>> problems in the future.
> 
> Thanks,
> Robin

- k

^ permalink raw reply

* [net-next] stmmac: limit max_mtu in case of 4KiB and use __netdev_alloc_skb (V2)
From: Giuseppe CAVALLARO @ 2011-10-18 11:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, Giuseppe Cavallaro
In-Reply-To: <1318932085-14927-9-git-send-email-peppe.cavallaro@st.com>

Problem using big mtu around 4096 bytes is you end allocating (4096
+NET_SKB_PAD + NET_IP_ALIGN + sizeof(struct skb_shared_info) bytes ->
8192 bytes : order-1 pages

It's better to limit the mtu to SKB_MAX_HEAD(NET_SKB_PAD),
to have no more than one page per skb.

Also the patch changes the netdev_alloc_skb_ip_align() done in
init_dma_desc_rings() and uses a variant allowing GFP_KERNEL allocations
allowing the driver to load even in case of memory pressure.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 1848a16..6b8f455 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -474,11 +474,13 @@ static void init_dma_desc_rings(struct net_device *dev)
 	for (i = 0; i < rxsize; i++) {
 		struct dma_desc *p = priv->dma_rx + i;
 
-		skb = netdev_alloc_skb_ip_align(dev, bfsize);
+		skb = __netdev_alloc_skb(dev, bfsize + NET_IP_ALIGN,
+					 GFP_KERNEL);
 		if (unlikely(skb == NULL)) {
 			pr_err("%s: Rx init fails; skb is NULL\n", __func__);
 			break;
 		}
+		skb_reserve(skb, NET_IP_ALIGN);
 		priv->rx_skbuff[i] = skb;
 		priv->rx_skbuff_dma[i] = dma_map_single(priv->device, skb->data,
 						bfsize, DMA_FROM_DEVICE);
@@ -1401,7 +1403,7 @@ static int stmmac_change_mtu(struct net_device *dev, int new_mtu)
 	if (priv->plat->enh_desc)
 		max_mtu = JUMBO_LEN;
 	else
-		max_mtu = BUF_SIZE_4KiB;
+		max_mtu = SKB_MAX_HEAD(NET_SKB_PAD + NET_IP_ALIGN);
 
 	if ((new_mtu < 46) || (new_mtu > max_mtu)) {
 		pr_err("%s: invalid MTU, max MTU is: %d\n", dev->name, max_mtu);
-- 
1.7.4.4

^ permalink raw reply related

* Re: [net-next 8/8] stmmac: limit max_mtu in case of 4KiB and use __netdev_alloc_skb.
From: Giuseppe CAVALLARO @ 2011-10-18 11:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem
In-Reply-To: <1318933110.2657.39.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On 10/18/2011 12:18 PM, Eric Dumazet wrote:
> Le mardi 18 octobre 2011 à 12:01 +0200, Giuseppe CAVALLARO a écrit :
>> Problem using big mtu around 4096 bytes is you end allocating (4096
>> +NET_SKB_PAD + NET_IP_ALIGN + sizeof(struct skb_shared_info) bytes ->
>> 8192 bytes : order-1 pages
>>
>> It's better to limit the mtu to SKB_MAX_HEAD(NET_SKB_PAD),
>> to have no more than one page per skb.
>>
>> Also the patch changes the netdev_alloc_skb_ip_align() done in
>> init_dma_desc_rings() and uses a variant allowing GFP_KERNEL allocations
>> allowing the driver to load even in case of memory pressure.
>>
>> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
>> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
>> ---
>>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |   13 +++++++++----
>>  1 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index 1848a16..f5ca3be 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -474,11 +474,13 @@ static void init_dma_desc_rings(struct net_device *dev)
>>  	for (i = 0; i < rxsize; i++) {
>>  		struct dma_desc *p = priv->dma_rx + i;
>>  
>> -		skb = netdev_alloc_skb_ip_align(dev, bfsize);
>> +		skb = __netdev_alloc_skb(dev, bfsize + NET_IP_ALIGN,
>> +					 GFP_KERNEL);
>>  		if (unlikely(skb == NULL)) {
>>  			pr_err("%s: Rx init fails; skb is NULL\n", __func__);
>>  			break;
>>  		}
>> +		skb_reserve(skb, NET_IP_ALIGN);
>>  		priv->rx_skbuff[i] = skb;
>>  		priv->rx_skbuff_dma[i] = dma_map_single(priv->device, skb->data,
>>  						bfsize, DMA_FROM_DEVICE);
>> @@ -1176,12 +1178,15 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv)
>>  
>>  			skb = __skb_dequeue(&priv->rx_recycle);
>>  			if (skb == NULL)
>> -				skb = netdev_alloc_skb_ip_align(priv->dev,
>> -								bfsize);
>> +				skb = __netdev_alloc_skb(priv->dev, bfsize +
>> +							 NET_IP_ALIGN,
>> +							 GFP_KERNEL);
>>  
> 
> No, you cant do that in softirq context. We cant sleep here and must use
> GFP_ATOMIC
> Only the init_dma_desc_rings() part is OK, we run in process context and
> are allowed to sleep in memory allocations (GFP_KERNEL)
> 
> 
>>  			if (unlikely(skb == NULL))
>>  				break;
>>  
>> +			skb_reserve(skb, NET_IP_ALIGN);
>> +
>>  			priv->rx_skbuff[entry] = skb;
>>  			priv->rx_skbuff_dma[entry] =
>>  			    dma_map_single(priv->device, skb->data, bfsize,
>> @@ -1401,7 +1406,7 @@ static int stmmac_change_mtu(struct net_device *dev, int new_mtu)
>>  	if (priv->plat->enh_desc)
>>  		max_mtu = JUMBO_LEN;
>>  	else
>> -		max_mtu = BUF_SIZE_4KiB;
>> +		max_mtu = SKB_MAX_HEAD(NET_SKB_PAD);
>>  
> 
> minor nit (since NET_IP_ALIGN is 0 anyway on modern x86)
> max_mtu = SKB_MAX_HEAD(NET_SKB_PAD + NET_IP_ALIGN);
> 
> 
>>  	if ((new_mtu < 46) || (new_mtu > max_mtu)) {
>>  		pr_err("%s: invalid MTU, max MTU is: %d\n", dev->name, max_mtu);
> 
> 
> 

Ok! I'm reworking and sending it again.

Thx
Peppe

^ permalink raw reply

* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops
From: Eric Dumazet @ 2011-10-18 11:37 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs
In-Reply-To: <20111018104504.GA15913@gondor.apana.org.au>

Le mardi 18 octobre 2011 à 12:45 +0200, Herbert Xu a écrit :
> On Tue, Oct 18, 2011 at 12:23:43PM +0200, Eric Dumazet wrote:
> > 
> > You're right, if reallocations are OK in all paths.
> 
> If it wasn't OK then making needed_headroom constant won't work
> anyway.
> 
> > We'll need to change LL_RESERVED_SPACE() / LL_RESERVED_SPACE_EXTRA() /
> > LL_ALLOCATED_SPACE() macros and provide the [read once] values, instead
> > of a [read once] pointer to values.
> 
> I'm not sure what you mean here.  I don't see any need to change
> these macros.  All we need is to save the value in a local variable:
> 
> 	hh_len = LL_RESERVED_SPACE(dev);
> 
> 	skb = alloc_skb(hh_len + len);
> 	skb_reserve(skb, hh_len);
> 

Not really Herbert. Please read again my patch changelog.

In the bug we try to fix, we have :

skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) 

... < increase of dev->needed_headroom by another cpu/task >

skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev));

skb_put() -> crash because we reserved too much space

So we really want LL_ALLOCATED_SPACE() and LL_RESERVED_SPACE() use the
same needed_headroom, or else you can have LL_RESERVED_SPACE() >
LL_ALLOCATED_SPACE().

There are several way to fix this, but this kind of code assumed the
dev->needed... values were consistent for the whole block.

^ permalink raw reply

* [patch] pktgen: bug when calling ndelay in x86 architectures
From: Daniel Turull @ 2011-10-18 11:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Robert Olsson, Voravit Tanyingyong, Jens Laas

The value selected to delay the transmission in pktgen with the ndelay function should be lower.
In Linux/arch/x86/include/asm/delay.h and Linux/arch/sh/include/asm/delay.h
the maximal expected value for a constant is 20000 ns.

Signed-off-by: Daniel Turull <daniel.turull@gmail.com>
---
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 796044a..e17bd41 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2145,7 +2145,7 @@ static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
 	}
 
 	start_time = ktime_now();
-	if (remaining < 100000)
+	if (remaining < 20000)
 		ndelay(remaining);	/* really small just spin */
 	else {
 		/* see do_nanosleep */

^ permalink raw reply related

* Re: [PATCH] Fix guest memory leak and panic
From: Ian Campbell @ 2011-10-18 10:50 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: davem@davemloft.net, linux-kernel@vger.kernel.org, mst@redhat.com,
	netdev@vger.kernel.org, rusty@rustcorp.com.au,
	virtualization@lists.linux-foundation.org
In-Reply-To: <OFB632D60B.6534A38B-ON6525792D.003A1B21-6525792D.003B0DB8@in.ibm.com>

On Tue, 2011-10-18 at 11:46 +0100, Krishna Kumar2 wrote:
> Ian Campbell <Ian.Campbell@citrix.com> wrote on 10/18/2011 03:17:11 PM:
> 
> > > I think the best thing might be to remove the additional ref taking
> from
> > > the setter function and audit the previous changes to ensure they
> > > conform. I'll do that right away and post a fixup patch ASAP.
> >
> > Sigh, only one out of the ten callers of (__)skb_frag_set_page expects
> > skb_frag_set_page to take a new reference. I think that's pretty
> > comprehensive evidence that the current behaviour is unexpected and
> > wrong.
> 
> Looks good!

Thanks.

> Does it make sense to commit both of these patches? The
> reason being - my patch becomes a cleanup of set_skb_frag()
> in virtio_net driver.

I think your patch remains a valid cleanup but I don't maintain that
code.

Ian.

^ permalink raw reply

* Re: [PATCH] Fix guest memory leak and panic
From: Krishna Kumar2 @ 2011-10-18 10:46 UTC (permalink / raw)
  To: Ian Campbell
  Cc: davem@davemloft.net, linux-kernel@vger.kernel.org, mst@redhat.com,
	netdev@vger.kernel.org, rusty@rustcorp.com.au,
	virtualization@lists.linux-foundation.org
In-Reply-To: <1318931232.16132.65.camel@zakaz.uk.xensource.com>

Ian Campbell <Ian.Campbell@citrix.com> wrote on 10/18/2011 03:17:11 PM:

> > I think the best thing might be to remove the additional ref taking
from
> > the setter function and audit the previous changes to ensure they
> > conform. I'll do that right away and post a fixup patch ASAP.
>
> Sigh, only one out of the ten callers of (__)skb_frag_set_page expects
> skb_frag_set_page to take a new reference. I think that's pretty
> comprehensive evidence that the current behaviour is unexpected and
> wrong.

Looks good!

Does it make sense to commit both of these patches? The
reason being - my patch becomes a cleanup of set_skb_frag()
in virtio_net driver.

thanks,

- KK

^ permalink raw reply

* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops
From: Herbert Xu @ 2011-10-18 10:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs
In-Reply-To: <1318933423.2657.44.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, Oct 18, 2011 at 12:23:43PM +0200, Eric Dumazet wrote:
> 
> You're right, if reallocations are OK in all paths.

If it wasn't OK then making needed_headroom constant won't work
anyway.

> We'll need to change LL_RESERVED_SPACE() / LL_RESERVED_SPACE_EXTRA() /
> LL_ALLOCATED_SPACE() macros and provide the [read once] values, instead
> of a [read once] pointer to values.

I'm not sure what you mean here.  I don't see any need to change
these macros.  All we need is to save the value in a local variable:

	hh_len = LL_RESERVED_SPACE(dev);

	skb = alloc_skb(hh_len + len);
	skb_reserve(skb, hh_len);

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] Fix guest memory leak and panic
From: Krishna Kumar2 @ 2011-10-18 10:28 UTC (permalink / raw)
  To: Ian Campbell
  Cc: davem@davemloft.net, linux-kernel@vger.kernel.org, mst@redhat.com,
	netdev@vger.kernel.org, rusty@rustcorp.com.au,
	virtualization@lists.linux-foundation.org
In-Reply-To: <1318927016.16132.49.camel@zakaz.uk.xensource.com>

Ian Campbell <Ian.Campbell@citrix.com> wrote on 10/18/2011 02:06:56 PM:

Hi Ian,

> On Tue, 2011-10-18 at 09:05 +0100, Krishna Kumar wrote:
> > The extra reference to the fragment pages causes those pages to
> > not get freed in skb_release_data(). The following patch fixes
> > the bug. I have not checked if any other converted driver has
> > the same issue.
>
> Damn. You are completely correct and I appear to have made this same
> mistake several times. A quick look suggests that at least cxbg,
> myriage, vmxnet, cassini and bnx2 may potentially have a similar
> issue :-( (I stopped looking at that point, I'll obviously do a full
> audit).
>
> I considered quite carefully whether (__)skb_frag_set_page should take a
> reference or not and decided yes but I'm starting to reconsider whether
> I made the right choice. It seems that is just confusing and violates
> the principal of least surprise to have a function called "set" take a
> new reference. In reality all existing drivers expect that adding a page
> to an SKB frag will just take over the existing reference.
>
> I think the best thing might be to remove the additional ref taking from
> the setter function and audit the previous changes to ensure they
> conform. I'll do that right away and post a fixup patch ASAP.
>
> > Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
>
> This change is correct as things stand today, so:
>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
>
> but perhaps it would be better to hold off and let me fix all of these
> all at once.

Either way is fine with me. However, besides having a fix,
I would like to remove the manual initializations in
set_skb_frag(), and substitute with __skb_fill_page_desc,
like other drivers do.

thanks,

- KK

^ permalink raw reply

* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops
From: Eric Dumazet @ 2011-10-18 10:23 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs
In-Reply-To: <20111018100527.GA15383@gondor.apana.org.au>

Le mardi 18 octobre 2011 à 12:05 +0200, Herbert Xu a écrit :
> On Tue, Oct 18, 2011 at 12:01:33PM +0200, Eric Dumazet wrote:
> >
> > Adding an RCU protected structure to hold hard_header_len /
> > needed_headroom / needed_tailroom should be possible, but this adds yet
> > another pointer dereference...
> 
> I don't think we need RCU here since the problem is simply that
> we're using two different values for skb allocations and skb_reserve.
> 
> As long as we use one and the same value it should work.  The value
> will rarely be incorrect and when it is, automatic reallocation will
> occur.
> 

You're right, if reallocations are OK in all paths.

We'll need to change LL_RESERVED_SPACE() / LL_RESERVED_SPACE_EXTRA() /
LL_ALLOCATED_SPACE() macros and provide the [read once] values, instead
of a [read once] pointer to values.

Thats a bit complex change, but doable.

^ permalink raw reply

* Re: [net-next 8/8] stmmac: limit max_mtu in case of 4KiB and use __netdev_alloc_skb.
From: Eric Dumazet @ 2011-10-18 10:18 UTC (permalink / raw)
  To: Giuseppe CAVALLARO; +Cc: netdev, davem
In-Reply-To: <1318932085-14927-9-git-send-email-peppe.cavallaro@st.com>

Le mardi 18 octobre 2011 à 12:01 +0200, Giuseppe CAVALLARO a écrit :
> Problem using big mtu around 4096 bytes is you end allocating (4096
> +NET_SKB_PAD + NET_IP_ALIGN + sizeof(struct skb_shared_info) bytes ->
> 8192 bytes : order-1 pages
> 
> It's better to limit the mtu to SKB_MAX_HEAD(NET_SKB_PAD),
> to have no more than one page per skb.
> 
> Also the patch changes the netdev_alloc_skb_ip_align() done in
> init_dma_desc_rings() and uses a variant allowing GFP_KERNEL allocations
> allowing the driver to load even in case of memory pressure.
> 
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |   13 +++++++++----
>  1 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 1848a16..f5ca3be 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -474,11 +474,13 @@ static void init_dma_desc_rings(struct net_device *dev)
>  	for (i = 0; i < rxsize; i++) {
>  		struct dma_desc *p = priv->dma_rx + i;
>  
> -		skb = netdev_alloc_skb_ip_align(dev, bfsize);
> +		skb = __netdev_alloc_skb(dev, bfsize + NET_IP_ALIGN,
> +					 GFP_KERNEL);
>  		if (unlikely(skb == NULL)) {
>  			pr_err("%s: Rx init fails; skb is NULL\n", __func__);
>  			break;
>  		}
> +		skb_reserve(skb, NET_IP_ALIGN);
>  		priv->rx_skbuff[i] = skb;
>  		priv->rx_skbuff_dma[i] = dma_map_single(priv->device, skb->data,
>  						bfsize, DMA_FROM_DEVICE);
> @@ -1176,12 +1178,15 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv)
>  
>  			skb = __skb_dequeue(&priv->rx_recycle);
>  			if (skb == NULL)
> -				skb = netdev_alloc_skb_ip_align(priv->dev,
> -								bfsize);
> +				skb = __netdev_alloc_skb(priv->dev, bfsize +
> +							 NET_IP_ALIGN,
> +							 GFP_KERNEL);
>  

No, you cant do that in softirq context. We cant sleep here and must use
GFP_ATOMIC

Only the init_dma_desc_rings() part is OK, we run in process context and
are allowed to sleep in memory allocations (GFP_KERNEL)


>  			if (unlikely(skb == NULL))
>  				break;
>  
> +			skb_reserve(skb, NET_IP_ALIGN);
> +
>  			priv->rx_skbuff[entry] = skb;
>  			priv->rx_skbuff_dma[entry] =
>  			    dma_map_single(priv->device, skb->data, bfsize,
> @@ -1401,7 +1406,7 @@ static int stmmac_change_mtu(struct net_device *dev, int new_mtu)
>  	if (priv->plat->enh_desc)
>  		max_mtu = JUMBO_LEN;
>  	else
> -		max_mtu = BUF_SIZE_4KiB;
> +		max_mtu = SKB_MAX_HEAD(NET_SKB_PAD);
>  

minor nit (since NET_IP_ALIGN is 0 anyway on modern x86)
max_mtu = SKB_MAX_HEAD(NET_SKB_PAD + NET_IP_ALIGN);


>  	if ((new_mtu < 46) || (new_mtu > max_mtu)) {
>  		pr_err("%s: invalid MTU, max MTU is: %d\n", dev->name, max_mtu);

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox