Netdev List
 help / color / mirror / Atom feed
* [PATCH 3/4] ibmveth: Checksum offload is always disabled
From: Anton Blanchard @ 2011-09-08  0:41 UTC (permalink / raw)
  To: Santiago Leon, brking, rcj, mirq-linux; +Cc: netdev
In-Reply-To: <20110908004102.355674129@samba.org>

[-- Attachment #1: ibmveth_fix_csum.patch --]
[-- Type: text/plain, Size: 766 bytes --]

Commit b9367bf3ee6d (net: ibmveth: convert to hw_features) reversed
a check in ibmveth_set_csum_offload that results in checksum offload
never being enabled.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # 3.0+
---

Index: linux-build/drivers/net/ibmveth.c
===================================================================
--- linux-build.orig/drivers/net/ibmveth.c	2011-09-01 16:02:12.198726425 +1000
+++ linux-build/drivers/net/ibmveth.c	2011-09-01 16:05:37.282482851 +1000
@@ -812,7 +812,7 @@ static int ibmveth_set_csum_offload(stru
 		} else
 			adapter->fw_ipv6_csum_support = data;
 
-		if (ret != H_SUCCESS || ret6 != H_SUCCESS)
+		if (ret == H_SUCCESS || ret6 == H_SUCCESS)
 			adapter->rx_csum = data;
 		else
 			rc1 = -EIO;

^ permalink raw reply

* [PATCH 0/4] ibmveth fixes
From: Anton Blanchard @ 2011-09-08  0:41 UTC (permalink / raw)
  To: Santiago Leon, brking, rcj; +Cc: netdev

Here are a number of fixes found during ibmveth testing.

Anton

^ permalink raw reply

* [PATCH 2/4] ibmveth: Fix issue with DMA mapping failure
From: Anton Blanchard @ 2011-09-08  0:41 UTC (permalink / raw)
  To: Santiago Leon, brking, rcj; +Cc: netdev
In-Reply-To: <20110908004102.355674129@samba.org>

[-- Attachment #1: ibmveth_dma_mapping_error.patch --]
[-- Type: text/plain, Size: 1604 bytes --]

descs[].fields.address is 32bit which truncates any dma mapping
errors so dma_mapping_error() fails to catch it.

Use a dma_addr_t to do the comparison. With this patch I was able
to transfer many gigabytes of data with IOMMU fault injection set
at 10% probability.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # v2.6.37+
---

Index: linux-build/drivers/net/ibmveth.c
===================================================================
--- linux-build.orig/drivers/net/ibmveth.c	2011-09-01 15:01:18.066844039 +1000
+++ linux-build/drivers/net/ibmveth.c	2011-09-01 15:03:34.299079796 +1000
@@ -930,6 +930,7 @@ static netdev_tx_t ibmveth_start_xmit(st
 	union ibmveth_buf_desc descs[6];
 	int last, i;
 	int force_bounce = 0;
+	dma_addr_t dma_addr;
 
 	/*
 	 * veth handles a maximum of 6 segments including the header, so
@@ -994,17 +995,16 @@ retry_bounce:
 	}
 
 	/* Map the header */
-	descs[0].fields.address = dma_map_single(&adapter->vdev->dev, skb->data,
-						 skb_headlen(skb),
-						 DMA_TO_DEVICE);
-	if (dma_mapping_error(&adapter->vdev->dev, descs[0].fields.address))
+	dma_addr = dma_map_single(&adapter->vdev->dev, skb->data,
+				  skb_headlen(skb), DMA_TO_DEVICE);
+	if (dma_mapping_error(&adapter->vdev->dev, dma_addr))
 		goto map_failed;
 
 	descs[0].fields.flags_len = desc_flags | skb_headlen(skb);
+	descs[0].fields.address = dma_addr;
 
 	/* Map the frags */
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-		unsigned long dma_addr;
 		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
 
 		dma_addr = dma_map_page(&adapter->vdev->dev, frag->page,

^ permalink raw reply

* [PATCH 4/4] ibmveth: Fix checksum offload failure handling
From: Anton Blanchard @ 2011-09-08  0:41 UTC (permalink / raw)
  To: Santiago Leon, brking, rcj, mirq-linux; +Cc: netdev
In-Reply-To: <20110908004102.355674129@samba.org>

[-- Attachment #1: ibmveth_fix_csum2.patch --]
[-- Type: text/plain, Size: 2988 bytes --]

Fix a number of issues in ibmveth_set_csum_offload:

- set_attr6 and clr_attr6 may be used uninitialised

- We store the result of the IPV4 checksum change in ret but overwrite
  it in a couple of places before checking it again later. Add ret4
  to make it obvious what we are doing.

- We weren't clearing the NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM flags
  if the enable of that hypervisor feature failed.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

This patch could do with some review and I haven't marked it -stable
yet.

One thing that stands out is that we need to return an error back
through set_features if any feature bit gets changed.

Index: linux-build/drivers/net/ibmveth.c
===================================================================
--- linux-build.orig/drivers/net/ibmveth.c	2011-09-01 16:06:19.000000000 +1000
+++ linux-build/drivers/net/ibmveth.c	2011-09-02 09:05:57.585353722 +1000
@@ -757,7 +757,7 @@ static int ibmveth_set_csum_offload(stru
 	struct ibmveth_adapter *adapter = netdev_priv(dev);
 	unsigned long set_attr, clr_attr, ret_attr;
 	unsigned long set_attr6, clr_attr6;
-	long ret, ret6;
+	long ret, ret4, ret6;
 	int rc1 = 0, rc2 = 0;
 	int restart = 0;
 
@@ -770,6 +770,8 @@ static int ibmveth_set_csum_offload(stru
 
 	set_attr = 0;
 	clr_attr = 0;
+	set_attr6 = 0;
+	clr_attr6 = 0;
 
 	if (data) {
 		set_attr = IBMVETH_ILLAN_IPV4_TCP_CSUM;
@@ -784,16 +786,20 @@ static int ibmveth_set_csum_offload(stru
 	if (ret == H_SUCCESS && !(ret_attr & IBMVETH_ILLAN_ACTIVE_TRUNK) &&
 	    !(ret_attr & IBMVETH_ILLAN_TRUNK_PRI_MASK) &&
 	    (ret_attr & IBMVETH_ILLAN_PADDED_PKT_CSUM)) {
-		ret = h_illan_attributes(adapter->vdev->unit_address, clr_attr,
+		ret4 = h_illan_attributes(adapter->vdev->unit_address, clr_attr,
 					 set_attr, &ret_attr);
 
-		if (ret != H_SUCCESS) {
+		if (ret4 != H_SUCCESS) {
 			netdev_err(dev, "unable to change IPv4 checksum "
 					"offload settings. %d rc=%ld\n",
-					data, ret);
+					data, ret4);
+
+			h_illan_attributes(adapter->vdev->unit_address,
+					   set_attr, clr_attr, &ret_attr);
+
+			if (data == 1)
+				dev->features &= ~NETIF_F_IP_CSUM;
 
-			ret = h_illan_attributes(adapter->vdev->unit_address,
-						 set_attr, clr_attr, &ret_attr);
 		} else {
 			adapter->fw_ipv4_csum_support = data;
 		}
@@ -804,15 +810,18 @@ static int ibmveth_set_csum_offload(stru
 		if (ret6 != H_SUCCESS) {
 			netdev_err(dev, "unable to change IPv6 checksum "
 					"offload settings. %d rc=%ld\n",
-					data, ret);
+					data, ret6);
+
+			h_illan_attributes(adapter->vdev->unit_address,
+					   set_attr6, clr_attr6, &ret_attr);
+
+			if (data == 1)
+				dev->features &= ~NETIF_F_IPV6_CSUM;
 
-			ret = h_illan_attributes(adapter->vdev->unit_address,
-						 set_attr6, clr_attr6,
-						 &ret_attr);
 		} else
 			adapter->fw_ipv6_csum_support = data;
 
-		if (ret == H_SUCCESS || ret6 == H_SUCCESS)
+		if (ret4 == H_SUCCESS || ret6 == H_SUCCESS)
 			adapter->rx_csum = data;
 		else
 			rc1 = -EIO;

^ permalink raw reply

* [PATCH 1/4] ibmveth: Fix DMA unmap error
From: Anton Blanchard @ 2011-09-08  0:41 UTC (permalink / raw)
  To: Santiago Leon, brking, rcj; +Cc: netdev
In-Reply-To: <20110908004102.355674129@samba.org>

[-- Attachment #1: ibmveth_1.patch --]
[-- Type: text/plain, Size: 1208 bytes --]

From: Brian King <brking@linux.vnet.ibm.com>

Commit 6e8ab30ec677 (ibmveth: Add scatter-gather support) introduced a
DMA mapping API inconsistency resulting in dma_unmap_page getting
called on memory mapped via dma_map_single. This was seen when
CONFIG_DMA_API_DEBUG was enabled. Fix up this API usage inconsistency.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # v2.6.37+
---

Index: linux-build/drivers/net/ibmveth.c
===================================================================
--- linux-build.orig/drivers/net/ibmveth.c	2011-09-08 08:00:16.842856634 +1000
+++ linux-build/drivers/net/ibmveth.c	2011-09-08 09:45:43.163851274 +1000
@@ -1026,7 +1026,12 @@ retry_bounce:
 		netdev->stats.tx_bytes += skb->len;
 	}
 
-	for (i = 0; i < skb_shinfo(skb)->nr_frags + 1; i++)
+	dma_unmap_single(&adapter->vdev->dev,
+			 descs[0].fields.address,
+			 descs[0].fields.flags_len & IBMVETH_BUF_LEN_MASK,
+			 DMA_TO_DEVICE);
+
+	for (i = 1; i < skb_shinfo(skb)->nr_frags + 1; i++)
 		dma_unmap_page(&adapter->vdev->dev, descs[i].fields.address,
 			       descs[i].fields.flags_len & IBMVETH_BUF_LEN_MASK,
 			       DMA_TO_DEVICE);

^ permalink raw reply

* Re: [PATCH 1/2] iwlegacy: change IWL_WARN to IWL_DEBUG_HT in iwl4965_tx_agg_start
From: Greg Dietsche @ 2011-09-08  3:51 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: linville, linux-wireless, netdev, linux-kernel
In-Reply-To: <20110906150131.GA7322@redhat.com>

Hi Stanislaw,
On 09/06/2011 10:01 AM, Stanislaw Gruszka wrote:
> I put patches here:
> http://people.redhat.com/sgruszka/iwlegacy_cleanup.tar.bz2
>
> They are on top of wireless-testing tree.
>    
<snip>
> Series include your 2 patches. You can test this cleanup and
> apply your new changes on top. I'll not do any further cleanup
> for some time now, perhaps continue when I got public git tree.
>
>    
Thanks! I've re-worked my patches and you can find them here:
http://www.gregd.org/stuff/linux/iwlegacy_cleanup_greg.tar.bz2

I also decided to play with github a little bit: 
git://github.com/dietsche/linux.git and pushed two branches:
   1) wireless-next-iwlegacy-stanislaw - your patch set
   2) wireless-next-iwlegacy-stanislaw-greg - a branch that has my 
additional patches.
`git format-patch 
wireless-next-iwlegacy-stanislaw..wireless-next-iwlegacy-stanislaw-greg` 
will generate the patches that are in the link i posted above.

The first two patches in my series are the ones that I think folks 
should take a closer look at. The rest are pretty safe.

I'm running both patch sets right now on my laptop. So far it is running 
perfectly.

Greg

^ permalink raw reply

* Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
From: Tim Chen @ 2011-09-07 21:06 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: sedat.dilek@gmail.com, Eric Dumazet, Yan, Zheng,
	netdev@vger.kernel.org, davem@davemloft.net, sfr@canb.auug.org.au,
	jirislaby@gmail.com, Shi, Alex, Valdis Kletnieks
In-Reply-To: <4E680BF1.8000901@intel.com>

On Thu, 2011-09-08 at 08:27 +0800, Yan, Zheng wrote:

> >  	err = -EPIPE;
> >  out_err:
> > -	if (skb == NULL)
> > +	if (!steal_refs)
> >  		scm_destroy(siocb->scm);
> 
> I think we should call scm_release() here in the case of
> steal_refs == true. Otherwise siocb->scm->fp may leak.

Yan Zheng,

I've updated the patch.  If it looks good to you now, can you add your
Signed-off-by to the patch.  Pending Sedat's testing on this patch,
I think it is good to go.

Tim

---
Commit 0856a30409 (Scm: Remove unnecessary pid & credential references
in Unix socket's send and receive path) introduced a use-after-free bug.
The sent skbs from unix_stream_sendmsg could be consumed and destructed 
by the receive side, removing all references to the credentials, 
before the send side has finished sending out all 
packets. However, send side could continue to consturct new packets in the 
stream, using credentials that have lost its last reference and been
freed.  

In this fix, we don't steal the reference to credentials we have obtained 
in scm_send at beginning of unix_stream_sendmsg, till we've reached
the last packet.  This fixes the problem in commit 0856a30409.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>
Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
---
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 136298c..47780dc 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1383,10 +1383,11 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 }
 
 static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb,
-			   bool send_fds, bool ref)
+			   bool send_fds, bool steal_refs)
 {
 	int err = 0;
-	if (ref) {
+
+	if (!steal_refs) {
 		UNIXCB(skb).pid  = get_pid(scm->pid);
 		UNIXCB(skb).cred = get_cred(scm->cred);
 	} else {
@@ -1458,7 +1459,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	if (skb == NULL)
 		goto out;
 
-	err = unix_scm_to_skb(siocb->scm, skb, true, false);
+	err = unix_scm_to_skb(siocb->scm, skb, true, true);
 	if (err < 0)
 		goto out_free;
 	max_level = err + 1;
@@ -1581,6 +1582,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	int sent = 0;
 	struct scm_cookie tmp_scm;
 	bool fds_sent = false;
+	bool steal_refs = false;
 	int max_level;
 
 	if (NULL == siocb->scm)
@@ -1642,11 +1644,14 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		size = min_t(int, size, skb_tailroom(skb));
 
 
-		/* Only send the fds and no ref to pid in the first buffer */
-		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, fds_sent);
+		/* Only send the fds in first buffer
+		 * Last buffer can steal our references to pid/cred
+		 */
+		steal_refs = (sent + size >= len);
+		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, steal_refs);
 		if (err < 0) {
 			kfree_skb(skb);
-			goto out;
+			goto out_err;
 		}
 		max_level = err + 1;
 		fds_sent = true;
@@ -1654,7 +1659,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		err = memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size);
 		if (err) {
 			kfree_skb(skb);
-			goto out;
+			goto out_err;
 		}
 
 		unix_state_lock(other);
@@ -1671,7 +1676,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		sent += size;
 	}
 
-	if (skb)
+	if (steal_refs)
 		scm_release(siocb->scm);
 	else
 		scm_destroy(siocb->scm);
@@ -1687,9 +1692,10 @@ pipe_err:
 		send_sig(SIGPIPE, current, 0);
 	err = -EPIPE;
 out_err:
-	if (skb == NULL)
+	if (steal_refs)
+		scm_release(siocb->scm);
+	else
 		scm_destroy(siocb->scm);
-out:
 	siocb->scm = NULL;
 	return sent ? : err;
 }

^ permalink raw reply related

* Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
From: Tim Chen @ 2011-09-07 21:15 UTC (permalink / raw)
  To: davem@davemloft.net
  Cc: sedat.dilek@gmail.com, Eric Dumazet, Yan, Zheng,
	netdev@vger.kernel.org, davem@davemloft.net, sfr@canb.auug.org.au,
	jirislaby@gmail.com, Shi, Alex, Valdis Kletnieks, Yan, Zheng
In-Reply-To: <1315429583.2361.3.camel@schen9-mobl>

On Wed, 2011-09-07 at 14:06 -0700, Tim Chen wrote:
> On Thu, 2011-09-08 at 08:27 +0800, Yan, Zheng wrote:
> 
> > >  	err = -EPIPE;
> > >  out_err:
> > > -	if (skb == NULL)
> > > +	if (!steal_refs)
> > >  		scm_destroy(siocb->scm);
> > 
> > I think we should call scm_release() here in the case of
> > steal_refs == true. Otherwise siocb->scm->fp may leak.
> 
> Yan Zheng,
> 
> I've updated the patch.  If it looks good to you now, can you add your
> Signed-off-by to the patch.  Pending Sedat's testing on this patch,
> I think it is good to go.
> 
> Tim

Oops, I've forgotten to add Eric's previous Signed-off-by in my latest
patch log.  David, please add that when you pick up the patch.  
Once Yan Zheng added his sign off and Sedat tested the patch, I think it
will be good to go.

Thanks.

Tim 

^ permalink raw reply

* Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
From: Yan, Zheng @ 2011-09-08  4:18 UTC (permalink / raw)
  To: Tim Chen
  Cc: sedat.dilek@gmail.com, Eric Dumazet, Yan, Zheng,
	netdev@vger.kernel.org, davem@davemloft.net, sfr@canb.auug.org.au,
	jirislaby@gmail.com, Shi, Alex, Valdis Kletnieks
In-Reply-To: <1315429583.2361.3.camel@schen9-mobl>

On 09/08/2011 05:06 AM, Tim Chen wrote:
> On Thu, 2011-09-08 at 08:27 +0800, Yan, Zheng wrote:
> 
>>>  	err = -EPIPE;
>>>  out_err:
>>> -	if (skb == NULL)
>>> +	if (!steal_refs)
>>>  		scm_destroy(siocb->scm);
>>
>> I think we should call scm_release() here in the case of
>> steal_refs == true. Otherwise siocb->scm->fp may leak.
> 
> Yan Zheng,
> 
> I've updated the patch.  If it looks good to you now, can you add your
> Signed-off-by to the patch.  Pending Sedat's testing on this patch,
> I think it is good to go.
> 
> Tim
> 
> ---
> Commit 0856a30409 (Scm: Remove unnecessary pid & credential references
> in Unix socket's send and receive path) introduced a use-after-free bug.
> The sent skbs from unix_stream_sendmsg could be consumed and destructed 
> by the receive side, removing all references to the credentials, 
> before the send side has finished sending out all 
> packets. However, send side could continue to consturct new packets in the 
> stream, using credentials that have lost its last reference and been
> freed.  
> 
> In this fix, we don't steal the reference to credentials we have obtained 
> in scm_send at beginning of unix_stream_sendmsg, till we've reached
> the last packet.  This fixes the problem in commit 0856a30409.
> 
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> Reported-by: Jiri Slaby <jirislaby@gmail.com>
> Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>
> Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
> ---
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 136298c..47780dc 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1383,10 +1383,11 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
>  }
>  
>  static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb,
> -			   bool send_fds, bool ref)
> +			   bool send_fds, bool steal_refs)
>  {
>  	int err = 0;
> -	if (ref) {
> +
> +	if (!steal_refs) {
>  		UNIXCB(skb).pid  = get_pid(scm->pid);
>  		UNIXCB(skb).cred = get_cred(scm->cred);
>  	} else {
> @@ -1458,7 +1459,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  	if (skb == NULL)
>  		goto out;
>  
> -	err = unix_scm_to_skb(siocb->scm, skb, true, false);
> +	err = unix_scm_to_skb(siocb->scm, skb, true, true);
>  	if (err < 0)
>  		goto out_free;
>  	max_level = err + 1;
> @@ -1581,6 +1582,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  	int sent = 0;
>  	struct scm_cookie tmp_scm;
>  	bool fds_sent = false;
> +	bool steal_refs = false;
>  	int max_level;
>  
>  	if (NULL == siocb->scm)
> @@ -1642,11 +1644,14 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  		size = min_t(int, size, skb_tailroom(skb));
>  
>  
> -		/* Only send the fds and no ref to pid in the first buffer */
> -		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, fds_sent);
> +		/* Only send the fds in first buffer
> +		 * Last buffer can steal our references to pid/cred
> +		 */
> +		steal_refs = (sent + size >= len);
> +		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, steal_refs);
>  		if (err < 0) {
>  			kfree_skb(skb);
> -			goto out;
> +			goto out_err;
>  		}
>  		max_level = err + 1;
>  		fds_sent = true;
> @@ -1654,7 +1659,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  		err = memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size);
>  		if (err) {
>  			kfree_skb(skb);
> -			goto out;
> +			goto out_err;
>  		}
>  
>  		unix_state_lock(other);
> @@ -1671,7 +1676,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  		sent += size;
>  	}
>  
> -	if (skb)
> +	if (steal_refs)
>  		scm_release(siocb->scm);
>  	else
>  		scm_destroy(siocb->scm);
> @@ -1687,9 +1692,10 @@ pipe_err:
>  		send_sig(SIGPIPE, current, 0);
>  	err = -EPIPE;
>  out_err:
> -	if (skb == NULL)
> +	if (steal_refs)
> +		scm_release(siocb->scm);
> +	else
>  		scm_destroy(siocb->scm);
> -out:
>  	siocb->scm = NULL;
>  	return sent ? : err;
>  }
> 
> 
> 

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>

^ permalink raw reply

* Re: [PATCH] per-cgroup tcp buffer limitation
From: Glauber Costa @ 2011-09-08  4:44 UTC (permalink / raw)
  To: Greg Thelen
  Cc: linux-kernel, linux-mm, containers, netdev, xemul,
	David S. Miller, Hiroyouki Kamezawa, Eric W. Biederman,
	Suleiman Souhlal
In-Reply-To: <CAHH2K0aq4s1_H-yY0kA3LhM00CCNNbJZyvyBoDD6rHC+qo_gNg@mail.gmail.com>

On 09/07/2011 06:35 PM, Greg Thelen wrote:
> On Tue, Sep 6, 2011 at 3:37 PM, Glauber Costa<glommer@parallels.com>  wrote:
>> I think memcg's usage is really all you need here. In the end of the day, it
>> tells you how many pages your container has available. The whole
>> point of kmem cgroup is not any kind of reservation or accounting.
>
> The memcg does not reserve memory.  It provides upper bound limits on
> memory usage.  A careful admin can configure soft_limit_in_bytes as an
> approximation of a memory reservation.  But the soft limit is really
> more like a reclaim target when there is global memory pressure.
>
>> Once a container (or cgroup) reaches a number of objects *pinned* in memory
>> (therefore, non-reclaimable), you won't be able to grab anything from it.
>>
>>> So
>>> far my use cases involve a single memory limit which includes both
>>> kernel and user memory.  So I would need a user space agent to poll
>>> {memcg,kmem}.usage_in_bytes to apply pressure to memcg if kmem grows
>>> and visa versa.
>>
>> Maybe not.
>> If userspace memory works for you today (supposing it does), why change?
>
> Good question.  Current upstream memcg user space memory limit does
> not work for me today.  I should have made that more obvious (sorry).
> See below for details.
>
>> Right now you assign X bytes of user memory to a container, and the kernel
>> memory is shared among all of them. If this works for you, kmem_cgroup won't
>> change that. It just will impose limits over which
>> your kernel objects can't grow.
>>
>> So you don't *need* a userspace agent doing this calculation, because
>> fundamentally, nothing changed: I am not unbilling memory in memcg to bill
>> it back in kmem_cg. Of course, once it is in, you will be able to do it in
>> such a fine grained fashion if you decide to do so.
>>
>>> Do you foresee instantiation of multiple kmem cgroups, so that a
>>> process could be added into kmem/K1 or kmem/K2?  If so do you plan on
>>> supporting migration between cgroups and/or migration of kmem charge
>>> between K1 to K2?
>>
>> Yes, each container should have its own cgroup, so at least in the use
>> cases I am concerned, we will have a lot of them. But the usual lifecycle,
>> is create, execute and die. Mobility between them
>> is not something I am overly concerned right now.
>>
>>
>>>>> Do you foresee the kmem cgroup growing to include reclaimable slab,
>>>>> where freeing one type of memory allows for reclaim of the other?
>>>>
>>>> Yes, absolutely.
>
> Now I see that you're using kmem to limit the amount of unreclaimable
> kernel memory.
>
> We have a work-in-progress patch series that adds kernel memory accounting to
> memcg.  These patches allow an admin to specify a single memory limit
> for a cgroup which encompasses both user memory (as upstream memcg
> does) and also includes many kernel memory allocations (especially
> slab, page-tables).  When kernel memory grows it puts pressure on user
> memory; when user memory grows it puts pressure on reclaimable kernel
> memory using registered shrinkers.  We are in the process of cleaning
> up these memcg slab accounting patches.
 >
> In my uses cases there is a single memory limit that applies to both
> kernel and user memory.  If a separate kmem cgroup is introduced to
> manage kernel memory outside of memcg with a distinct limit, then I
> would need a user space daemon which balances memory between the kmem
> and memcg subsystems.  As kmem grows, this daemon would apply pressure
> to memcg, and as memcg grows pressure would be applied to kmem.  As
> you stated kernel memory is not necessarily reclaimable.  So such
> reclaim may fail.  My resistance to this approach is that with a
> single memory cgroup admins can do a better job packing a machine.  If
> balancing daemons are employed then more memory would need to be
> reserved and more user space cpu time would be needed to apply VM
> pressure between the types of memory.
Well, it is a way to see this. The other way to see this, is that you're
proposing to move to the kernel, something that really belongs in 
userspace. That's because:

With the information you provided me, I have no reason to believe that 
the kernel has more condition to do this work. Do the kernel have access 
to any information that userspace do not, and can't be exported? If not, 
userspace is traditionally where this sort of stuff has been done.

Using userspace CPU is no different from using kernel cpu in this 
particular case. It is all overhead, regardless where it comes from. 
Moreover, you end up setting up a policy, instead of a mechanism. What 
should be this proportion?  Do we reclaim everything with the same 
frequency? Should we be more tolerant with a specific container?

Also, If you want to allow any flexibility in this scheme, like: "Should 
this network container be able to stress the network more, pinning more 
memory, but not other subsystems?", you end up having to touch all 
individual files anyway - probably with a userspace daemon.

Also, as you noticed yourself, kernel memory is fundamentally different 
from userspace memory. You can't just set reclaim limits, since you have 
no guarantees it will work. User memory is not a scarce resource.
Kernel memory is.

>
> While there are people (like me) who want a combined memory usage
> limit there are also people (like you) who want separate user and
> kernel limiting.

Combined excludes separate. Separate does not exclude combined.

> I have toyed with the idea of having a per cgroup
> flag that determines if kernel and user memory should be combined
> charged against a single limit or if they should have separate limits.

And then every other kind of mechanism one may think of involves a 
kernel patch, instead of a potentially simple userspace change.

>   I have also wondered if there was a way to wire the usage of two
> subsystems together, then it would also meet meet my needs.  But I am
> not sure how to do that.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH v2 3/9] socket: initial cgroup code.
From: Glauber Costa @ 2011-09-08  4:54 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-kernel, xemul, netdev, linux-mm, Eric W. Biederman,
	containers, David S. Miller
In-Reply-To: <20110907221710.GA7845@shutemov.name>

On 09/07/2011 07:17 PM, Kirill A. Shutemov wrote:
> On Wed, Sep 07, 2011 at 01:23:13AM -0300, Glauber Costa wrote:
>> We aim to control the amount of kernel memory pinned at any
>> time by tcp sockets. To lay the foundations for this work,
>> this patch adds a pointer to the kmem_cgroup to the socket
>> structure.
>>
>> Signed-off-by: Glauber Costa<glommer@parallels.com>
>> CC: David S. Miller<davem@davemloft.net>
>> CC: Hiroyouki Kamezawa<kamezawa.hiroyu@jp.fujitsu.com>
>> CC: Eric W. Biederman<ebiederm@xmission.com>
>> ---
>>   include/linux/kmem_cgroup.h |   29 +++++++++++++++++++++++++++++
>>   include/net/sock.h          |    2 ++
>>   net/core/sock.c             |    5 ++---
>>   3 files changed, 33 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/kmem_cgroup.h b/include/linux/kmem_cgroup.h
>> index 0e4a74b..77076d8 100644
>> --- a/include/linux/kmem_cgroup.h
>> +++ b/include/linux/kmem_cgroup.h
>> @@ -49,5 +49,34 @@ static inline struct kmem_cgroup *kcg_from_task(struct task_struct *tsk)
>>   	return NULL;
>>   }
>>   #endif /* CONFIG_CGROUP_KMEM */
>> +
>> +#ifdef CONFIG_INET
>
> Will it break something if you define the helpers even if CONFIG_INET
> is not defined?
> It will be much cleaner. You can reuse ifdef CONFIG_CGROUP_KMEM in this
> case.

The helpers inside CONFIG_INET are needed for the network code, 
regardless of kmem cgroup is defined or not, not the other way around.

So I could remove CONFIG_INET, but I can't possibly move it inside
CONFIG_CGROUP_KMEM. So this buy us nothing.

>> +#include<net/sock.h>
>> +static inline void sock_update_kmem_cgrp(struct sock *sk)
>> +{
>> +#ifdef CONFIG_CGROUP_KMEM
>> +	sk->sk_cgrp = kcg_from_task(current);
>> +
>> +	/*
>> +	 * We don't need to protect against anything task-related, because
>> +	 * we are basically stuck with the sock pointer that won't change,
>> +	 * even if the task that originated the socket changes cgroups.
>> +	 *
>> +	 * What we do have to guarantee, is that the chain leading us to
>> +	 * the top level won't change under our noses. Incrementing the
>> +	 * reference count via cgroup_exclude_rmdir guarantees that.
>> +	 */
>> +	cgroup_exclude_rmdir(&sk->sk_cgrp->css);
>> +#endif
>> +}
>> +
>> +static inline void sock_release_kmem_cgrp(struct sock *sk)
>> +{
>> +#ifdef CONFIG_CGROUP_KMEM
>> +	cgroup_release_and_wakeup_rmdir(&sk->sk_cgrp->css);
>> +#endif
>> +}
>> +
>> +#endif /* CONFIG_INET */
>>   #endif /* _LINUX_KMEM_CGROUP_H */
>
>> @@ -2252,9 +2254,6 @@ void sk_common_release(struct sock *sk)
>>   }
>>   EXPORT_SYMBOL(sk_common_release);
>>
>> -static DEFINE_RWLOCK(proto_list_lock);
>> -static LIST_HEAD(proto_list);
>> -
>
> Wrong patch?
Yes, it is. Thanks for noticing.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [net-next-2.6 PATCH 0/3 RFC] macvlan: MAC Address filtering support for passthru mode
From: Roopa Prabhu @ 2011-09-08  5:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, dragos.tatulea, arnd, dwang2, benve, kaber, sri, davem,
	eric.dumazet, mchan, kvm
In-Reply-To: <20110907123435.GF9337@redhat.com>

On 9/7/11 5:34 AM, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Sep 06, 2011 at 03:35:40PM -0700, Roopa Prabhu wrote:
>> This patch is an attempt at providing address filtering support for macvtap
>> devices in PASSTHRU mode. Its still a work in progress.
>> Briefly tested for basic functionality. Wanted to get some feedback on the
>> direction before proceeding.
>> 
> 
> Good work, thanks.
> 

Thanks.

>> I have hopefully CC'ed all concerned people.
> 
> kvm crowd might also be interested.
> Try using ./scripts/get_maintainer.pl as well.
> 
Thanks for the tip. Expanded CC list a bit more.

>> PASSTHRU mode today sets the lowerdev in promiscous mode. In PASSTHRU mode
>> there is a 1-1 mapping between macvtap device and physical nic or VF. And all
>> filtering is done in lowerdev hw. The lowerdev does not need to be in
>> promiscous mode as long as the guest filters are passed down to the lowerdev.
>> This patch tries to remove the need for putting the lowerdev in promiscous
>> mode. 
>> I have also referred to the thread below where TUNSETTXFILTER was mentioned
>> in 
>> this context: 
>>  http://patchwork.ozlabs.org/patch/69297/
>> 
>> This patch basically passes the addresses got by TUNSETTXFILTER to macvlan
>> lowerdev.
>> 
>> I have looked at previous work and discussions on this for qemu-kvm
>> by Michael Tsirkin, Alex Williamson and Dragos Tatulea
>> http://patchwork.ozlabs.org/patch/78595/
>> http://patchwork.ozlabs.org/patch/47160/
>> https://patchwork.kernel.org/patch/474481/
>> 
>> Redhat bugzilla by Michael Tsirkin:
>> https://bugzilla.redhat.com/show_bug.cgi?id=655013
>> 
>> I used Michael's qemu-kvm patch for testing the changes with KVM
>> 
>> I would like to cover both MAC and vlan filtering in this work.
>> 
>> Open Questions/Issues:
>> - There is a need for vlan filtering to complete the patch. It will require
>>   a new tap ioctl cmd for vlans.
>>   Some ideas on this are:
>> 
>>   a) TUNSETVLANFILTER: This will entail we send the whole vlan bitmap filter
>> (similar to tun_filter for addresses). Passing the vlan id's to lower
>> device will mean going thru the whole list of vlans every time.
>> 
>>   OR
>> 
>>   b) TUNSETVLAN with vlan id and flag to set/unset
>> 
>>   Does option 'b' sound ok ?
>> 
>> - In this implementation we make the macvlan address list same as the address
>>   list that came in the filter with TUNSETTXFILTER. This will not cover cases
>>   where the macvlan device needs to have other addresses that are not
>>   necessarily in the filter. Is this a problem ?
> 
> What cases do you have in mind?
> 
This patch targets only macvlan PASSTHRU mode and for PASSTHRU mode I don't
see a problem with uc/mc address list being the same in all the stacked
netdevs in the path. I called that out above to make sure I was not missing
any case in PASSTHRU mode where this might be invalid. Otherwise I don't see
a problem in the simple PASSTHRU use case this patch supports.

>> - The patch currently only supports passing of IFF_PROMISC and IFF_MULTICAST
>> filter flags to lowerdev
>> 
>> This patch series implements the following
>> 01/3 - macvlan: Add support for unicast filtering in macvlan
>> 02/3 - macvlan: Add function to set addr filter on lower device in passthru
>> mode
>> 03/3 - macvtap: Add support for TUNSETTXFILTER
>> 
>> Please comment. Thanks.
>> 
>> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
>> Signed-off-by: Christian Benvenuti <benve@cisco.com>
>> Signed-off-by: David Wang <dwang2@cisco.com>
> 
> The security isn't lower than with promisc, so I don't see
> a problem with this as such.
> 
> There are more features we'll want down the road though,
> so let's see whether the interface will be able to
> satisfy them in a backwards compatible way before we
> set it in stone. Here's what I came up with:
> 
> How will the filtering table be partitioned within guests?

Since this patch supports macvlan PASSTHRU mode only, in which the lower
device has 1-1 mapping to the guest nic, it does not require any
partitioning of filtering table within guests. Unless I missed understanding
something. 

If the lower device were being shared by multiple guest network interfaces
(non PASSTHRU mode), only then we will need to maintain separate filter
tables for each guest network interface in macvlan and forward the pkt to
respective guest interface after a filter lookup. This could affect
performance too I think.

I chose to support PASSTHRU Mode only at first because its simpler and all
code additions are in control path only.

> 
> A way to limit what the guest can do would also be useful.
> How can this be done? selinux?

I vaguely remember a thread on the same context.. had a suggestion to
maintain pre-approved address lists and allow guest filter registration of
only those addresses for security. This seemed reasonable. Plus the ability
to support additional address registration from guest could be made
configurable (One of your ideas again from prior work).

I am not an selinux expert, but I am thinking we can use it to only allow or
disallow access or operations to the macvtap device. (?). I will check more
on this.

> 
> Any thoughts on spoofing filtering?

I can only think of checking addresses against an allowed address list.
Don't know of any other ways. Any hints ?

In any case I am assuming all the protection/security measures should be
taken at the layer calling the TUNSETTXFILTER ie..In macvtap virtualization
use case its libvirt or qemu-kvm. No ?

> 
> Would it be possible to make the filtering programmable
> using netlink, e.g. ethtool, ip, or some such?

Should be possible via ethtool or ip calling ioctl TUNSETTXFILTER. Are you
thinking of macvlan having a netlink interface to set filter and not ioctl
?. Sure. But I was thinking the point of implementing TUNSETTXFILTER was to
maintain compatibility with the generic tap interface that does the same
thing. 
And having both the netlink op and ioctl interface might not be clean ?.
Sorry if I misunderstood your question.

> That would make this useful for bridged setups besides
> macvtap/virtualization.
> 

Thanks for the comments. 

^ permalink raw reply

* Re: [PATCH v2 3/9] socket: initial cgroup code.
From: Kirill A. Shutemov @ 2011-09-08  5:35 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-kernel, xemul, netdev, linux-mm, Eric W. Biederman,
	containers, David S. Miller
In-Reply-To: <4E684A6B.6030205@parallels.com>

On Thu, Sep 08, 2011 at 01:54:03AM -0300, Glauber Costa wrote:
> On 09/07/2011 07:17 PM, Kirill A. Shutemov wrote:
> > On Wed, Sep 07, 2011 at 01:23:13AM -0300, Glauber Costa wrote:
> >> We aim to control the amount of kernel memory pinned at any
> >> time by tcp sockets. To lay the foundations for this work,
> >> this patch adds a pointer to the kmem_cgroup to the socket
> >> structure.
> >>
> >> Signed-off-by: Glauber Costa<glommer@parallels.com>
> >> CC: David S. Miller<davem@davemloft.net>
> >> CC: Hiroyouki Kamezawa<kamezawa.hiroyu@jp.fujitsu.com>
> >> CC: Eric W. Biederman<ebiederm@xmission.com>
> >> ---
> >>   include/linux/kmem_cgroup.h |   29 +++++++++++++++++++++++++++++
> >>   include/net/sock.h          |    2 ++
> >>   net/core/sock.c             |    5 ++---
> >>   3 files changed, 33 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/kmem_cgroup.h b/include/linux/kmem_cgroup.h
> >> index 0e4a74b..77076d8 100644
> >> --- a/include/linux/kmem_cgroup.h
> >> +++ b/include/linux/kmem_cgroup.h
> >> @@ -49,5 +49,34 @@ static inline struct kmem_cgroup *kcg_from_task(struct task_struct *tsk)
> >>   	return NULL;
> >>   }
> >>   #endif /* CONFIG_CGROUP_KMEM */
> >> +
> >> +#ifdef CONFIG_INET
> >
> > Will it break something if you define the helpers even if CONFIG_INET
> > is not defined?
> > It will be much cleaner. You can reuse ifdef CONFIG_CGROUP_KMEM in this
> > case.
> 
> The helpers inside CONFIG_INET are needed for the network code, 
> regardless of kmem cgroup is defined or not, not the other way around.
> 
> So I could remove CONFIG_INET, but I can't possibly move it inside
> CONFIG_CGROUP_KMEM. So this buy us nothing.

You can define empty under CONFIG_CGROUP_KMEM's #else, can't you?
Like with kcg_from_cgroup()/kcg_from_task().

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
From: Eric Dumazet @ 2011-09-08  5:59 UTC (permalink / raw)
  To: Tim Chen
  Cc: Yan, Zheng, sedat.dilek@gmail.com, Yan, Zheng,
	netdev@vger.kernel.org, davem@davemloft.net, sfr@canb.auug.org.au,
	jirislaby@gmail.com, Shi, Alex, Valdis Kletnieks
In-Reply-To: <1315429583.2361.3.camel@schen9-mobl>

Le mercredi 07 septembre 2011 à 14:06 -0700, Tim Chen a écrit :
> On Thu, 2011-09-08 at 08:27 +0800, Yan, Zheng wrote:
> 
> > >  	err = -EPIPE;
> > >  out_err:
> > > -	if (skb == NULL)
> > > +	if (!steal_refs)
> > >  		scm_destroy(siocb->scm);
> > 
> > I think we should call scm_release() here in the case of
> > steal_refs == true. Otherwise siocb->scm->fp may leak.
> 
> Yan Zheng,
> 
> I've updated the patch.  If it looks good to you now, can you add your
> Signed-off-by to the patch.  Pending Sedat's testing on this patch,
> I think it is good to go.
> 
> Tim
> 
> ---
> Commit 0856a30409 (Scm: Remove unnecessary pid & credential references
> in Unix socket's send and receive path) introduced a use-after-free bug.
> The sent skbs from unix_stream_sendmsg could be consumed and destructed 
> by the receive side, removing all references to the credentials, 
> before the send side has finished sending out all 
> packets. However, send side could continue to consturct new packets in the 
> stream, using credentials that have lost its last reference and been
> freed.  
> 
> In this fix, we don't steal the reference to credentials we have obtained 
> in scm_send at beginning of unix_stream_sendmsg, till we've reached
> the last packet.  This fixes the problem in commit 0856a30409.
> 
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> Reported-by: Jiri Slaby <jirislaby@gmail.com>
> Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>
> Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
> ---
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 136298c..47780dc 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1383,10 +1383,11 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
>  }
>  
>  static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb,
> -			   bool send_fds, bool ref)
> +			   bool send_fds, bool steal_refs)
>  {
>  	int err = 0;
> -	if (ref) {
> +
> +	if (!steal_refs) {
>  		UNIXCB(skb).pid  = get_pid(scm->pid);
>  		UNIXCB(skb).cred = get_cred(scm->cred);
>  	} else {
> @@ -1458,7 +1459,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  	if (skb == NULL)
>  		goto out;
>  
> -	err = unix_scm_to_skb(siocb->scm, skb, true, false);
> +	err = unix_scm_to_skb(siocb->scm, skb, true, true);
>  	if (err < 0)
>  		goto out_free;
>  	max_level = err + 1;
> @@ -1581,6 +1582,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  	int sent = 0;
>  	struct scm_cookie tmp_scm;
>  	bool fds_sent = false;
> +	bool steal_refs = false;
>  	int max_level;
>  
>  	if (NULL == siocb->scm)
> @@ -1642,11 +1644,14 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  		size = min_t(int, size, skb_tailroom(skb));
>  
> 
> -		/* Only send the fds and no ref to pid in the first buffer */
> -		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, fds_sent);
> +		/* Only send the fds in first buffer
> +		 * Last buffer can steal our references to pid/cred
> +		 */
> +		steal_refs = (sent + size >= len);
> +		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, steal_refs);
>  		if (err < 0) {
>  			kfree_skb(skb);
> -			goto out;
> +			goto out_err;
>  		}
>  		max_level = err + 1;
>  		fds_sent = true;
> @@ -1654,7 +1659,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  		err = memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size);
>  		if (err) {
>  			kfree_skb(skb);
> -			goto out;
> +			goto out_err;
>  		}
>  
>  		unix_state_lock(other);
> @@ -1671,7 +1676,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  		sent += size;
>  	}
>  
> -	if (skb)
> +	if (steal_refs)
>  		scm_release(siocb->scm);
>  	else
>  		scm_destroy(siocb->scm);
> @@ -1687,9 +1692,10 @@ pipe_err:
>  		send_sig(SIGPIPE, current, 0);
>  	err = -EPIPE;
>  out_err:
> -	if (skb == NULL)
> +	if (steal_refs)
> +		scm_release(siocb->scm);
> +	else
>  		scm_destroy(siocb->scm);
> -out:
>  	siocb->scm = NULL;
>  	return sent ? : err;
>  }
> 
> 
> 

I dont think this patch is good.

Sedat traces have nothing to do with af_unix.

Once unix_scm_to_skb() was called and successful, and steal_refs is true
our refs are attached to this skb. They will be released by
skb_free(skb). Same for fp : They either were sent in a previous skb or
this one.

This is why the "goto out;" was OK.

^ permalink raw reply

* Re: [RFC] interface for outgoing packet
From: Eric Dumazet @ 2011-09-08  6:09 UTC (permalink / raw)
  To: Viral Mehta; +Cc: netdev
In-Reply-To: <CANX6DanC8DJsAZj4dV3oZo3DnLV6AyxBDm82vHaJm3C51aRqEg@mail.gmail.com>

Le mercredi 07 septembre 2011 à 20:24 -0400, Viral Mehta a écrit :
> Hi All,
> 
> How reasonably I can assume that,
> 
> on whatever interface, I received the last packet
> for a particular Socket Connection, the same interface will be
> used to SEND the next packet for same socket connection?
> 

No

> I am collecting the logs on one of my test machines.
> And it shows, I can assume all the time that "interface" which received packet
> for some socket connection, the same will be used to send packet for
> that connection
> 
> But, I am not sure if I am missing some scenarios
> or setup (for e.g., bonding) where it can be wrong ?
> 
> It would be more help if some one can shed some light.

By default, a socket is not bound to one interface, so you cant assume
this.

Check SO_BINDTODEVICE socket option if you need it.

^ permalink raw reply

* Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
From: Eric Dumazet @ 2011-09-08  6:21 UTC (permalink / raw)
  To: Tim Chen
  Cc: davem@davemloft.net, sedat.dilek@gmail.com, Yan, Zheng,
	netdev@vger.kernel.org, sfr@canb.auug.org.au, jirislaby@gmail.com,
	Shi, Alex, Valdis Kletnieks, Yan, Zheng
In-Reply-To: <1315430115.2361.11.camel@schen9-mobl>

Le mercredi 07 septembre 2011 à 14:15 -0700, Tim Chen a écrit :

> Oops, I've forgotten to add Eric's previous Signed-off-by in my latest
> patch log.  David, please add that when you pick up the patch.  
> Once Yan Zheng added his sign off and Sedat tested the patch, I think it
> will be good to go.

Tim, as soon as you post another patch version, you must remove all
prior Signed-off-by, and only add yours.

So it was fine to do so.

By the way your last version introduce a new bug, so I would rather NACK
it ;)

^ permalink raw reply

* Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
From: Yan, Zheng @ 2011-09-08  6:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tim Chen, sedat.dilek@gmail.com, Yan, Zheng,
	netdev@vger.kernel.org, davem@davemloft.net, sfr@canb.auug.org.au,
	jirislaby@gmail.com, Shi, Alex, Valdis Kletnieks
In-Reply-To: <1315461572.2532.7.camel@edumazet-laptop>

On 09/08/2011 01:59 PM, Eric Dumazet wrote:
> Le mercredi 07 septembre 2011 à 14:06 -0700, Tim Chen a écrit :
>> On Thu, 2011-09-08 at 08:27 +0800, Yan, Zheng wrote:
>>
>>>>  	err = -EPIPE;
>>>>  out_err:
>>>> -	if (skb == NULL)
>>>> +	if (!steal_refs)
>>>>  		scm_destroy(siocb->scm);
>>>
>>> I think we should call scm_release() here in the case of
>>> steal_refs == true. Otherwise siocb->scm->fp may leak.
>>
>> Yan Zheng,
>>
>> I've updated the patch.  If it looks good to you now, can you add your
>> Signed-off-by to the patch.  Pending Sedat's testing on this patch,
>> I think it is good to go.
>>
>> Tim
>>
>> ---
>> Commit 0856a30409 (Scm: Remove unnecessary pid & credential references
>> in Unix socket's send and receive path) introduced a use-after-free bug.
>> The sent skbs from unix_stream_sendmsg could be consumed and destructed 
>> by the receive side, removing all references to the credentials, 
>> before the send side has finished sending out all 
>> packets. However, send side could continue to consturct new packets in the 
>> stream, using credentials that have lost its last reference and been
>> freed.  
>>
>> In this fix, we don't steal the reference to credentials we have obtained 
>> in scm_send at beginning of unix_stream_sendmsg, till we've reached
>> the last packet.  This fixes the problem in commit 0856a30409.
>>
>> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
>> Reported-by: Jiri Slaby <jirislaby@gmail.com>
>> Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>
>> Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
>> ---
>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>> index 136298c..47780dc 100644
>> --- a/net/unix/af_unix.c
>> +++ b/net/unix/af_unix.c
>> @@ -1383,10 +1383,11 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
>>  }
>>  
>>  static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb,
>> -			   bool send_fds, bool ref)
>> +			   bool send_fds, bool steal_refs)
>>  {
>>  	int err = 0;
>> -	if (ref) {
>> +
>> +	if (!steal_refs) {
>>  		UNIXCB(skb).pid  = get_pid(scm->pid);
>>  		UNIXCB(skb).cred = get_cred(scm->cred);
>>  	} else {
>> @@ -1458,7 +1459,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
>>  	if (skb == NULL)
>>  		goto out;
>>  
>> -	err = unix_scm_to_skb(siocb->scm, skb, true, false);
>> +	err = unix_scm_to_skb(siocb->scm, skb, true, true);
>>  	if (err < 0)
>>  		goto out_free;
>>  	max_level = err + 1;
>> @@ -1581,6 +1582,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>>  	int sent = 0;
>>  	struct scm_cookie tmp_scm;
>>  	bool fds_sent = false;
>> +	bool steal_refs = false;
>>  	int max_level;
>>  
>>  	if (NULL == siocb->scm)
>> @@ -1642,11 +1644,14 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>>  		size = min_t(int, size, skb_tailroom(skb));
>>  
>>
>> -		/* Only send the fds and no ref to pid in the first buffer */
>> -		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, fds_sent);
>> +		/* Only send the fds in first buffer
>> +		 * Last buffer can steal our references to pid/cred
>> +		 */
>> +		steal_refs = (sent + size >= len);
>> +		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, steal_refs);
>>  		if (err < 0) {
>>  			kfree_skb(skb);
>> -			goto out;
>> +			goto out_err;
>>  		}
>>  		max_level = err + 1;
>>  		fds_sent = true;
>> @@ -1654,7 +1659,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>>  		err = memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size);
>>  		if (err) {
>>  			kfree_skb(skb);
>> -			goto out;
>> +			goto out_err;
>>  		}
>>  
>>  		unix_state_lock(other);
>> @@ -1671,7 +1676,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>>  		sent += size;
>>  	}
>>  
>> -	if (skb)
>> +	if (steal_refs)
>>  		scm_release(siocb->scm);
>>  	else
>>  		scm_destroy(siocb->scm);
>> @@ -1687,9 +1692,10 @@ pipe_err:
>>  		send_sig(SIGPIPE, current, 0);
>>  	err = -EPIPE;
>>  out_err:
>> -	if (skb == NULL)
>> +	if (steal_refs)
>> +		scm_release(siocb->scm);
>> +	else
>>  		scm_destroy(siocb->scm);
>> -out:
>>  	siocb->scm = NULL;
>>  	return sent ? : err;
>>  }
>>
>>
>>
> 
> I dont think this patch is good.
> 
> Sedat traces have nothing to do with af_unix.
> 
> Once unix_scm_to_skb() was called and successful, and steal_refs is true
> our refs are attached to this skb. They will be released by
> skb_free(skb). Same for fp : They either were sent in a previous skb or
> this one.
> 
> This is why the "goto out;" was OK.
> 

I don't think so. unix_scm_to_skb() calls unix_attach_fds(), it
always duplicates scm->fp. 

Regards

^ permalink raw reply

* [PATCH] net: phy: Add config option to specify external switch port to be used if switch is used as PHY
From: Lambrecht Jürgen @ 2011-09-08  6:54 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: linux-embedded@vger.kernel.org

Hello,

In our embedded designs, this is a useful patch. Maybe it can be useful 
for somebody else too.
Or maybe there are already better solutions?
I know I could also write a driver for our switch, but that is too much 
effort just to select the active port.

Kind regards,
Jürgen

   In embedded design, instead of a PHY, sometimes a switch is used that
           behaves as a PHY through its MII port. For example to use a daisy
           chain network configuration instead of an expensive star config.
           In that case, many phy ports are available, but only 1 should 
be used
           to check link status, and not the first one available as is 
the case
           without this configuration (that is, set to its default value 0).
           So this options specifies the switch port number to be used 
to check
           link status, because if the link is down, no data is sent by the
           TCP/IP stack.

Signed-off-by: Jürgen Lambrecht <J.Lambrecht@televic.com>
---
  drivers/net/phy/Kconfig    |   14 ++++++++++++++
  drivers/net/phy/mdio_bus.c |    9 +++++++++
  2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index a702443..554561f 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -13,6 +13,20 @@ menuconfig PHYLIB

  if PHYLIB

+config SWITCH_PHY
+       int "External switch port to be used if switch is used as PHY"
+       default "0"
+       help
+         In embedded design, instead of a PHY, sometimes a switch is 
used that
+         behaves as a PHY through its MII port. For example to use a daisy
+         chain network configuration instead of an expensive star config.
+         In that case, many phy ports are available, but only 1 should 
be used
+         to check link status, and not the first one available as is 
the case
+         without this configuration (that is, set to its default value 0).
+         So this options specifies the switch port number to be used to 
check
+         link status, because if the link is down, no data is sent by the
+         TCP/IP stack.
+
  comment "MII PHY device drivers"

  config MARVELL_PHY
diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 6c58da2..016437a 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -112,7 +112,14 @@ int mdiobus_register(struct mii_bus *bus)
         if (bus->reset)
                 bus->reset(bus);

+       /* The config below is always availble with CONFIG_PHYLIB. If 0, the
+          behavior is as before without this patch (or P0 of the switch is
+          taken because it is the first one found). */
+#if CONFIG_SWITCH_PHY
+       i = CONFIG_SWITCH_PHY;
+#else
         for (i = 0; i < PHY_MAX_ADDR; i++) {
+#endif
                 if ((bus->phy_mask & (1 << i)) == 0) {
                         struct phy_device *phydev;

@@ -122,7 +129,9 @@ int mdiobus_register(struct mii_bus *bus)
                                 goto error;
                         }
                 }
+#if !CONFIG_SWITCH_PHY
         }
+#endif

         bus->state = MDIOBUS_REGISTERED;
         pr_info("%s: probed\n", bus->name);
--
1.7.1

^ permalink raw reply related

* Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
From: Sedat Dilek @ 2011-09-08  7:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tim Chen, Yan, Zheng, Yan, Zheng, netdev@vger.kernel.org,
	davem@davemloft.net, sfr@canb.auug.org.au, jirislaby@gmail.com,
	Shi, Alex, Valdis Kletnieks
In-Reply-To: <1315461572.2532.7.camel@edumazet-laptop>

[-- Attachment #1: Type: text/plain, Size: 5903 bytes --]

On Thu, Sep 8, 2011 at 7:59 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mercredi 07 septembre 2011 à 14:06 -0700, Tim Chen a écrit :
>> On Thu, 2011-09-08 at 08:27 +0800, Yan, Zheng wrote:
>>
>> > >   err = -EPIPE;
>> > >  out_err:
>> > > - if (skb == NULL)
>> > > + if (!steal_refs)
>> > >           scm_destroy(siocb->scm);
>> >
>> > I think we should call scm_release() here in the case of
>> > steal_refs == true. Otherwise siocb->scm->fp may leak.
>>
>> Yan Zheng,
>>
>> I've updated the patch.  If it looks good to you now, can you add your
>> Signed-off-by to the patch.  Pending Sedat's testing on this patch,
>> I think it is good to go.
>>
>> Tim
>>
>> ---
>> Commit 0856a30409 (Scm: Remove unnecessary pid & credential references
>> in Unix socket's send and receive path) introduced a use-after-free bug.
>> The sent skbs from unix_stream_sendmsg could be consumed and destructed
>> by the receive side, removing all references to the credentials,
>> before the send side has finished sending out all
>> packets. However, send side could continue to consturct new packets in the
>> stream, using credentials that have lost its last reference and been
>> freed.
>>
>> In this fix, we don't steal the reference to credentials we have obtained
>> in scm_send at beginning of unix_stream_sendmsg, till we've reached
>> the last packet.  This fixes the problem in commit 0856a30409.
>>
>> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
>> Reported-by: Jiri Slaby <jirislaby@gmail.com>
>> Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>
>> Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
>> ---
>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>> index 136298c..47780dc 100644
>> --- a/net/unix/af_unix.c
>> +++ b/net/unix/af_unix.c
>> @@ -1383,10 +1383,11 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
>>  }
>>
>>  static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb,
>> -                        bool send_fds, bool ref)
>> +                        bool send_fds, bool steal_refs)
>>  {
>>       int err = 0;
>> -     if (ref) {
>> +
>> +     if (!steal_refs) {
>>               UNIXCB(skb).pid  = get_pid(scm->pid);
>>               UNIXCB(skb).cred = get_cred(scm->cred);
>>       } else {
>> @@ -1458,7 +1459,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
>>       if (skb == NULL)
>>               goto out;
>>
>> -     err = unix_scm_to_skb(siocb->scm, skb, true, false);
>> +     err = unix_scm_to_skb(siocb->scm, skb, true, true);
>>       if (err < 0)
>>               goto out_free;
>>       max_level = err + 1;
>> @@ -1581,6 +1582,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>>       int sent = 0;
>>       struct scm_cookie tmp_scm;
>>       bool fds_sent = false;
>> +     bool steal_refs = false;
>>       int max_level;
>>
>>       if (NULL == siocb->scm)
>> @@ -1642,11 +1644,14 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>>               size = min_t(int, size, skb_tailroom(skb));
>>
>>
>> -             /* Only send the fds and no ref to pid in the first buffer */
>> -             err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, fds_sent);
>> +             /* Only send the fds in first buffer
>> +              * Last buffer can steal our references to pid/cred
>> +              */
>> +             steal_refs = (sent + size >= len);
>> +             err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, steal_refs);
>>               if (err < 0) {
>>                       kfree_skb(skb);
>> -                     goto out;
>> +                     goto out_err;
>>               }
>>               max_level = err + 1;
>>               fds_sent = true;
>> @@ -1654,7 +1659,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>>               err = memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size);
>>               if (err) {
>>                       kfree_skb(skb);
>> -                     goto out;
>> +                     goto out_err;
>>               }
>>
>>               unix_state_lock(other);
>> @@ -1671,7 +1676,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>>               sent += size;
>>       }
>>
>> -     if (skb)
>> +     if (steal_refs)
>>               scm_release(siocb->scm);
>>       else
>>               scm_destroy(siocb->scm);
>> @@ -1687,9 +1692,10 @@ pipe_err:
>>               send_sig(SIGPIPE, current, 0);
>>       err = -EPIPE;
>>  out_err:
>> -     if (skb == NULL)
>> +     if (steal_refs)
>> +             scm_release(siocb->scm);
>> +     else
>>               scm_destroy(siocb->scm);
>> -out:
>>       siocb->scm = NULL;
>>       return sent ? : err;
>>  }
>>
>>
>>
>
> I dont think this patch is good.
>
> Sedat traces have nothing to do with af_unix.
>
> Once unix_scm_to_skb() was called and successful, and steal_refs is true
> our refs are attached to this skb. They will be released by
> skb_free(skb). Same for fp : They either were sent in a previous skb or
> this one.
>
> This is why the "goto out;" was OK.
>

Good morning,

/me sees so many patches :-).
Yes, I guess the patch by Eric has nothing to do with seen
call-traces, adding "irqpoll" to Kernel command line seems to fix them
(my uptime: approx. 10:30, Eric's proposal patch in my last
patch-series is attached).

- Sedat -

[-- Attachment #2: unix-stream-Fix-use-after-free-crashes-by-edumazet.patch --]
[-- Type: text/x-diff, Size: 2709 bytes --]

Commit 0856a30409 (Scm: Remove unnecessary pid & credential references
in Unix socket's send and receive path) introduced a use-after-free bug.
The sent skbs from unix_stream_sendmsg could be consumed and destructed 
by the receive side, removing all references to the credentials, 
before the send side has finished sending out all 
packets. However, send side could continue to consturct new packets in the 
stream, using credentials that have lost its last reference and been
freed.  

In this fix, we don't steal the reference to credentials we have obtained 
in scm_send at beginning of unix_stream_sendmsg, till we've reached
the last packet.  This fixes the problem in commit 0856a30409.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>
Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
---

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 136298c..4a324a0 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1383,10 +1383,11 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 }
 
 static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb,
-			   bool send_fds, bool ref)
+			   bool send_fds, bool steal_refs)
 {
 	int err = 0;
-	if (ref) {
+
+	if (!steal_refs) {
 		UNIXCB(skb).pid  = get_pid(scm->pid);
 		UNIXCB(skb).cred = get_cred(scm->cred);
 	} else {
@@ -1458,7 +1459,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	if (skb == NULL)
 		goto out;
 
-	err = unix_scm_to_skb(siocb->scm, skb, true, false);
+	err = unix_scm_to_skb(siocb->scm, skb, true, true);
 	if (err < 0)
 		goto out_free;
 	max_level = err + 1;
@@ -1581,6 +1582,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	int sent = 0;
 	struct scm_cookie tmp_scm;
 	bool fds_sent = false;
+	bool steal_refs = false;
 	int max_level;
 
 	if (NULL == siocb->scm)
@@ -1642,8 +1644,11 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		size = min_t(int, size, skb_tailroom(skb));
 
 
-		/* Only send the fds and no ref to pid in the first buffer */
-		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, fds_sent);
+		/* Only send the fds in first buffer
+		 * Last buffer can steal our references to pid/cred
+		 */
+		steal_refs = (sent + size >= len);
+		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, steal_refs);
 		if (err < 0) {
 			kfree_skb(skb);
 			goto out;
@@ -1671,7 +1676,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		sent += size;
 	}
 
-	if (skb)
+	if (steal_refs)
 		scm_release(siocb->scm);
 	else
 		scm_destroy(siocb->scm);



^ permalink raw reply related

* Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
From: Eric Dumazet @ 2011-09-08  7:11 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: Tim Chen, sedat.dilek@gmail.com, Yan, Zheng,
	netdev@vger.kernel.org, davem@davemloft.net, sfr@canb.auug.org.au,
	jirislaby@gmail.com, Shi, Alex, Valdis Kletnieks
In-Reply-To: <4E685F19.6030407@intel.com>

Le jeudi 08 septembre 2011 à 14:22 +0800, Yan, Zheng a écrit :

> I don't think so. unix_scm_to_skb() calls unix_attach_fds(), it
> always duplicates scm->fp. 

What a mess. This code is a nightmare.

Part of the mess comes from scm_destroy() and scm_release() duplication.

We should have scm_destroy() only, as before, and NULLify scm->pid/cred
in unix_scm_to_skb() when we steal references.

It makes more sense and keeps things clearer.


 include/net/scm.h  |    9 ---------
 net/unix/af_unix.c |   27 +++++++++++++++------------
 2 files changed, 15 insertions(+), 21 deletions(-)


diff --git a/include/net/scm.h b/include/net/scm.h
index 68e1e48..2a5b42f 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -78,15 +78,6 @@ static __inline__ void scm_destroy(struct scm_cookie *scm)
 		__scm_destroy(scm);
 }
 
-static __inline__ void scm_release(struct scm_cookie *scm)
-{
-	/* keep ref on pid and cred */
-	scm->pid = NULL;
-	scm->cred = NULL;
-	if (scm->fp)
-		__scm_destroy(scm);
-}
-
 static __inline__ int scm_send(struct socket *sock, struct msghdr *msg,
 			       struct scm_cookie *scm)
 {
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e6d9d10..1fa270a 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1379,15 +1379,18 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 }
 
 static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb,
-			   bool send_fds, bool ref)
+			   bool send_fds, bool steal_refs)
 {
 	int err = 0;
-	if (ref) {
+
+	if (!steal_refs) {
 		UNIXCB(skb).pid  = get_pid(scm->pid);
 		UNIXCB(skb).cred = get_cred(scm->cred);
 	} else {
 		UNIXCB(skb).pid  = scm->pid;
 		UNIXCB(skb).cred = scm->cred;
+		scm->pid = NULL;
+		scm->cred = NULL;
 	}
 	UNIXCB(skb).fp = NULL;
 	if (scm->fp && send_fds)
@@ -1454,7 +1457,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	if (skb == NULL)
 		goto out;
 
-	err = unix_scm_to_skb(siocb->scm, skb, true, false);
+	err = unix_scm_to_skb(siocb->scm, skb, true, true);
 	if (err < 0)
 		goto out_free;
 	max_level = err + 1;
@@ -1550,7 +1553,7 @@ restart:
 	unix_state_unlock(other);
 	other->sk_data_ready(other, len);
 	sock_put(other);
-	scm_release(siocb->scm);
+	scm_destroy(siocb->scm);
 	return len;
 
 out_unlock:
@@ -1577,6 +1580,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	int sent = 0;
 	struct scm_cookie tmp_scm;
 	bool fds_sent = false;
+	bool steal_refs = false;
 	int max_level;
 
 	if (NULL == siocb->scm)
@@ -1638,8 +1642,11 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		size = min_t(int, size, skb_tailroom(skb));
 
 
-		/* Only send the fds and no ref to pid in the first buffer */
-		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, fds_sent);
+		/* Only send the fds in first buffer
+		 * Last buffer can steal our references to pid/cred
+		 */
+		steal_refs = (sent + size >= len);
+		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, steal_refs);
 		if (err < 0) {
 			kfree_skb(skb);
 			goto out;
@@ -1667,10 +1674,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		sent += size;
 	}
 
-	if (skb)
-		scm_release(siocb->scm);
-	else
-		scm_destroy(siocb->scm);
+	scm_destroy(siocb->scm);
 	siocb->scm = NULL;
 
 	return sent;
@@ -1683,8 +1687,7 @@ pipe_err:
 		send_sig(SIGPIPE, current, 0);
 	err = -EPIPE;
 out_err:
-	if (skb == NULL)
-		scm_destroy(siocb->scm);
+	scm_destroy(siocb->scm);
 out:
 	siocb->scm = NULL;
 	return sent ? : err;

^ permalink raw reply related

* Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
From: Yan, Zheng @ 2011-09-08  7:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tim Chen, sedat.dilek@gmail.com, Yan, Zheng,
	netdev@vger.kernel.org, davem@davemloft.net, sfr@canb.auug.org.au,
	jirislaby@gmail.com, Shi, Alex, Valdis Kletnieks
In-Reply-To: <1315465919.2532.19.camel@edumazet-laptop>

On 09/08/2011 03:11 PM, Eric Dumazet wrote:
> Le jeudi 08 septembre 2011 à 14:22 +0800, Yan, Zheng a écrit :
> 
>> I don't think so. unix_scm_to_skb() calls unix_attach_fds(), it
>> always duplicates scm->fp. 
> 
> What a mess. This code is a nightmare.
> 
> Part of the mess comes from scm_destroy() and scm_release() duplication.
> 
> We should have scm_destroy() only, as before, and NULLify scm->pid/cred
> in unix_scm_to_skb() when we steal references.
> 
> It makes more sense and keeps things clearer.
> 
> 
>  include/net/scm.h  |    9 ---------
>  net/unix/af_unix.c |   27 +++++++++++++++------------
>  2 files changed, 15 insertions(+), 21 deletions(-)
> 
> 
> diff --git a/include/net/scm.h b/include/net/scm.h
> index 68e1e48..2a5b42f 100644
> --- a/include/net/scm.h
> +++ b/include/net/scm.h
> @@ -78,15 +78,6 @@ static __inline__ void scm_destroy(struct scm_cookie *scm)
>  		__scm_destroy(scm);
>  }
>  
> -static __inline__ void scm_release(struct scm_cookie *scm)
> -{
> -	/* keep ref on pid and cred */
> -	scm->pid = NULL;
> -	scm->cred = NULL;
> -	if (scm->fp)
> -		__scm_destroy(scm);
> -}
> -
>  static __inline__ int scm_send(struct socket *sock, struct msghdr *msg,
>  			       struct scm_cookie *scm)
>  {
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index e6d9d10..1fa270a 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1379,15 +1379,18 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
>  }
>  
>  static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb,
> -			   bool send_fds, bool ref)
> +			   bool send_fds, bool steal_refs)
>  {
>  	int err = 0;
> -	if (ref) {
> +
> +	if (!steal_refs) {
>  		UNIXCB(skb).pid  = get_pid(scm->pid);
>  		UNIXCB(skb).cred = get_cred(scm->cred);
>  	} else {
>  		UNIXCB(skb).pid  = scm->pid;
>  		UNIXCB(skb).cred = scm->cred;
> +		scm->pid = NULL;
> +		scm->cred = NULL;
>  	}
>  	UNIXCB(skb).fp = NULL;
>  	if (scm->fp && send_fds)
> @@ -1454,7 +1457,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  	if (skb == NULL)
>  		goto out;
>  
> -	err = unix_scm_to_skb(siocb->scm, skb, true, false);
> +	err = unix_scm_to_skb(siocb->scm, skb, true, true);
>  	if (err < 0)
>  		goto out_free;
>  	max_level = err + 1;
> @@ -1550,7 +1553,7 @@ restart:
>  	unix_state_unlock(other);
>  	other->sk_data_ready(other, len);
>  	sock_put(other);
> -	scm_release(siocb->scm);
> +	scm_destroy(siocb->scm);
>  	return len;
>  
>  out_unlock:
> @@ -1577,6 +1580,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  	int sent = 0;
>  	struct scm_cookie tmp_scm;
>  	bool fds_sent = false;
> +	bool steal_refs = false;
>  	int max_level;
>  
>  	if (NULL == siocb->scm)
> @@ -1638,8 +1642,11 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  		size = min_t(int, size, skb_tailroom(skb));
>  
>  
> -		/* Only send the fds and no ref to pid in the first buffer */
> -		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, fds_sent);
> +		/* Only send the fds in first buffer
> +		 * Last buffer can steal our references to pid/cred
> +		 */
> +		steal_refs = (sent + size >= len);
> +		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, steal_refs);
>  		if (err < 0) {
>  			kfree_skb(skb);
>  			goto out;
> @@ -1667,10 +1674,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  		sent += size;
>  	}
>  
> -	if (skb)
> -		scm_release(siocb->scm);
> -	else
> -		scm_destroy(siocb->scm);
> +	scm_destroy(siocb->scm);
>  	siocb->scm = NULL;
>  
>  	return sent;
> @@ -1683,8 +1687,7 @@ pipe_err:
>  		send_sig(SIGPIPE, current, 0);
>  	err = -EPIPE;
>  out_err:
> -	if (skb == NULL)
> -		scm_destroy(siocb->scm);
> +	scm_destroy(siocb->scm);
>  out:
>  	siocb->scm = NULL;
>  	return sent ? : err;
> 
> 

This code looks great, except "goto out;" is still there. I think we should replace it with "goto out_err;" :)

Regards

^ permalink raw reply

* Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
From: Eric Dumazet @ 2011-09-08  7:33 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: Tim Chen, sedat.dilek@gmail.com, Yan, Zheng,
	netdev@vger.kernel.org, davem@davemloft.net, sfr@canb.auug.org.au,
	jirislaby@gmail.com, Shi, Alex, Valdis Kletnieks
In-Reply-To: <4E686D71.30603@intel.com>

Le jeudi 08 septembre 2011 à 15:23 +0800, Yan, Zheng a écrit :

> This code looks great, except "goto out;" is still there. I think we should replace it with "goto out_err;" :)
> 

Indeed, you're right, thanks

 include/net/scm.h  |    9 ---------
 net/unix/af_unix.c |   32 +++++++++++++++++---------------
 2 files changed, 17 insertions(+), 24 deletions(-)

diff --git a/include/net/scm.h b/include/net/scm.h
index 68e1e48..2a5b42f 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -78,15 +78,6 @@ static __inline__ void scm_destroy(struct scm_cookie *scm)
 		__scm_destroy(scm);
 }
 
-static __inline__ void scm_release(struct scm_cookie *scm)
-{
-	/* keep ref on pid and cred */
-	scm->pid = NULL;
-	scm->cred = NULL;
-	if (scm->fp)
-		__scm_destroy(scm);
-}
-
 static __inline__ int scm_send(struct socket *sock, struct msghdr *msg,
 			       struct scm_cookie *scm)
 {
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e6d9d10..c8a08ba 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1379,15 +1379,18 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 }
 
 static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb,
-			   bool send_fds, bool ref)
+			   bool send_fds, bool steal_refs)
 {
 	int err = 0;
-	if (ref) {
+
+	if (!steal_refs) {
 		UNIXCB(skb).pid  = get_pid(scm->pid);
 		UNIXCB(skb).cred = get_cred(scm->cred);
 	} else {
 		UNIXCB(skb).pid  = scm->pid;
 		UNIXCB(skb).cred = scm->cred;
+		scm->pid = NULL;
+		scm->cred = NULL;
 	}
 	UNIXCB(skb).fp = NULL;
 	if (scm->fp && send_fds)
@@ -1454,7 +1457,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	if (skb == NULL)
 		goto out;
 
-	err = unix_scm_to_skb(siocb->scm, skb, true, false);
+	err = unix_scm_to_skb(siocb->scm, skb, true, true);
 	if (err < 0)
 		goto out_free;
 	max_level = err + 1;
@@ -1550,7 +1553,7 @@ restart:
 	unix_state_unlock(other);
 	other->sk_data_ready(other, len);
 	sock_put(other);
-	scm_release(siocb->scm);
+	scm_destroy(siocb->scm);
 	return len;
 
 out_unlock:
@@ -1577,6 +1580,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	int sent = 0;
 	struct scm_cookie tmp_scm;
 	bool fds_sent = false;
+	bool steal_refs = false;
 	int max_level;
 
 	if (NULL == siocb->scm)
@@ -1638,11 +1642,14 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		size = min_t(int, size, skb_tailroom(skb));
 
 
-		/* Only send the fds and no ref to pid in the first buffer */
-		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, fds_sent);
+		/* Only send the fds in first buffer
+		 * Last buffer can steal our references to pid/cred
+		 */
+		steal_refs = (sent + size >= len);
+		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, steal_refs);
 		if (err < 0) {
 			kfree_skb(skb);
-			goto out;
+			goto out_err;
 		}
 		max_level = err + 1;
 		fds_sent = true;
@@ -1650,7 +1657,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		err = memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size);
 		if (err) {
 			kfree_skb(skb);
-			goto out;
+			goto out_err;
 		}
 
 		unix_state_lock(other);
@@ -1667,10 +1674,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		sent += size;
 	}
 
-	if (skb)
-		scm_release(siocb->scm);
-	else
-		scm_destroy(siocb->scm);
+	scm_destroy(siocb->scm);
 	siocb->scm = NULL;
 
 	return sent;
@@ -1683,9 +1687,7 @@ pipe_err:
 		send_sig(SIGPIPE, current, 0);
 	err = -EPIPE;
 out_err:
-	if (skb == NULL)
-		scm_destroy(siocb->scm);
-out:
+	scm_destroy(siocb->scm);
 	siocb->scm = NULL;
 	return sent ? : err;
 }

^ permalink raw reply related

* e1000e: NIC not working (afer resume?)
From: Frederik Himpe @ 2011-09-08  7:45 UTC (permalink / raw)
  To: netdev

I have a Dell Latitude E6400 which has a network card supported by the 
e1000e driver. Often (I think after a suspend/resume cycle), the network 
card does not work at all: the NIC is correctly seen by ifconfig, but 
running ethtool just returns: "No such device". dhclient -v gives the 
impression that it's correctly sending out DHCPDISCOVER packets on the 
NIC, but a tcpdump running on the same machine does not see any packets 
going out.

I'm using Debian's 3.0.0-3 kernel (corresponding with Linux 3.0.3).

Full lspci, .config and dmesg output at
http://artipc10.vub.ac.be/~frederik/e1000e/

Here is some relevant summary.

How can I find out what is going wrong?

# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:21:70:e1:bb:4c  
          UP BROADCAST PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:22 Memory:f6ae0000-f6b00000 


# ethtool eth0
Settings for eth0:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
Cannot get link status: No such device
No data available

# lspci -vvnn
00:19.0 Ethernet controller [0200]: Intel Corporation 82567LM Gigabit Network Connection [8086:10f5] (rev 03)
        Subsystem: Dell Device [1028:0233]
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 22
        Region 0: Memory at f6ae0000 (32-bit, non-prefetchable) [disabled] [size=128K]
        Region 1: Memory at f6adb000 (32-bit, non-prefetchable) [disabled] [size=4K]
        Region 2: I/O ports at efe0 [disabled] [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D3 NoSoftRst- PME-Enable+ DSel=0 DScale=1 PME+
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 00000000fee0300c  Data: 4182
        Capabilities: [e0] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: e1000e


# dmesg | grep -E "e1000e|eth0"
[    1.027437] e1000e: Intel(R) PRO/1000 Network Driver - 1.3.10-k2
[    1.027441] e1000e: Copyright(c) 1999 - 2011 Intel Corporation.
[    1.027480] e1000e 0000:00:19.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[    1.027491] e1000e 0000:00:19.0: setting latency timer to 64
[    1.027605] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
[    1.231440] e1000e 0000:00:19.0: eth0: (PCI Express:2.5GT/s:Width x1) 00:21:70:e1:bb:4c
[    1.231444] e1000e 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
[    1.231470] e1000e 0000:00:19.0: eth0: MAC: 7, PHY: 8, PBA No: 1004FF-0FF
[   22.896268] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
[   22.952097] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
[   22.954132] ADDRCONF(NETDEV_UP): eth0: link is not ready
[   24.504903] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
[   24.506413] e1000e 0000:00:19.0: eth0: 10/100 speed: disabling TSO
[   24.508402] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   34.788022] eth0: no IPv6 routers present
[   41.444922] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
[   41.446325] e1000e 0000:00:19.0: eth0: 10/100 speed: disabling TSO
[25136.918393] e1000e 0000:00:19.0: PME# enabled
[25142.488050] e1000e 0000:00:19.0: BAR 0: set to [mem 0xf6ae0000-0xf6afffff] (PCI address [0xf6ae0000-0xf6afffff])
[25142.488058] e1000e 0000:00:19.0: BAR 1: set to [mem 0xf6adb000-0xf6adbfff] (PCI address [0xf6adb000-0xf6adbfff])
[25142.488066] e1000e 0000:00:19.0: BAR 2: set to [io  0xefe0-0xefff] (PCI address [0xefe0-0xefff])
[25142.488085] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[25142.488110] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
[25142.488167] e1000e 0000:00:19.0: PME# disabled
[25142.510966] e1000e 0000:00:19.0: PCI INT A disabled
[25142.510971] e1000e 0000:00:19.0: PME# enabled
[25143.468668] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[25143.468689] e1000e 0000:00:19.0: restoring config space at offset 0x6 (was 0x1, writing 0xefe1)
[25143.468696] e1000e 0000:00:19.0: restoring config space at offset 0x5 (was 0x0, writing 0xf6adb000)
[25143.468703] e1000e 0000:00:19.0: restoring config space at offset 0x4 (was 0x0, writing 0xf6ae0000)
[25143.468713] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
[25143.471149] e1000e 0000:00:19.0: PME# disabled
[25143.471277] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
[25145.276441] e1000e 0000:00:19.0: PME# enabled
[25146.082051] e1000e 0000:00:19.0: BAR 0: set to [mem 0xf6ae0000-0xf6afffff] (PCI address [0xf6ae0000-0xf6afffff])
[25146.082076] e1000e 0000:00:19.0: BAR 1: set to [mem 0xf6adb000-0xf6adbfff] (PCI address [0xf6adb000-0xf6adbfff])
[25146.082090] e1000e 0000:00:19.0: BAR 2: set to [io  0xefe0-0xefff] (PCI address [0xefe0-0xefff])
[25146.082126] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[25146.082169] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
[25146.082280] e1000e 0000:00:19.0: PME# disabled
[25146.176605] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
[25146.232228] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
[25146.234003] ADDRCONF(NETDEV_UP): eth0: link is not ready
[25147.328875] e1000e 0000:00:19.0: PME# enabled
[26261.896126] e1000e 0000:00:19.0: BAR 0: set to [mem 0xf6ae0000-0xf6afffff] (PCI address [0xf6ae0000-0xf6afffff])
[26261.896140] e1000e 0000:00:19.0: BAR 1: set to [mem 0xf6adb000-0xf6adbfff] (PCI address [0xf6adb000-0xf6adbfff])
[26261.896149] e1000e 0000:00:19.0: BAR 2: set to [io  0xefe0-0xefff] (PCI address [0xefe0-0xefff])
[26261.896169] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[26261.896196] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
[26261.896241] e1000e 0000:00:19.0: PME# disabled
[26261.896316] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
[26261.972148] e1000e 0000:00:19.0: PME# enabled
[26268.192160] e1000e 0000:00:19.0: BAR 0: set to [mem 0xf6ae0000-0xf6afffff] (PCI address [0xf6ae0000-0xf6afffff])
[26268.192168] e1000e 0000:00:19.0: BAR 1: set to [mem 0xf6adb000-0xf6adbfff] (PCI address [0xf6adb000-0xf6adbfff])
[26268.192175] e1000e 0000:00:19.0: BAR 2: set to [io  0xefe0-0xefff] (PCI address [0xefe0-0xefff])
[26268.192194] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[26268.192219] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
[26268.192274] e1000e 0000:00:19.0: PME# disabled
[26268.213875] e1000e 0000:00:19.0: PME# enabled
[26269.172481] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[26269.172497] e1000e 0000:00:19.0: restoring config space at offset 0x6 (was 0x1, writing 0xefe1)
[26269.172502] e1000e 0000:00:19.0: restoring config space at offset 0x5 (was 0x0, writing 0xf6adb000)
[26269.172507] e1000e 0000:00:19.0: restoring config space at offset 0x4 (was 0x0, writing 0xf6ae0000)
[26269.172515] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
[26269.174262] e1000e 0000:00:19.0: PME# disabled
[26269.174339] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
[26272.384806] e1000e 0000:00:19.0: PME# enabled
[26274.688310] e1000e 0000:00:19.0: eth0: Error reading PHY register
[26275.482490] e1000e 0000:00:19.0: eth0: Error reading PHY register
[26276.277343] e1000e 0000:00:19.0: eth0: Error reading PHY register
[26277.075194] e1000e 0000:00:19.0: eth0: Error reading PHY register
[26277.870807] e1000e 0000:00:19.0: eth0: Error reading PHY register
[26278.667881] e1000e 0000:00:19.0: eth0: Error reading PHY register
[26279.463492] e1000e 0000:00:19.0: eth0: Error reading PHY register
[26280.258982] e1000e 0000:00:19.0: eth0: Error reading PHY register
[26281.056598] e1000e 0000:00:19.0: eth0: Error reading PHY register
[26281.855386] e1000e 0000:00:19.0: eth0: Error reading PHY register
[26282.649838] e1000e 0000:00:19.0: eth0: Error reading PHY register
[26283.445674] e1000e 0000:00:19.0: eth0: Error reading PHY register
[26289.848036] e1000e 0000:00:19.0: BAR 0: set to [mem 0xf6ae0000-0xf6afffff] (PCI address [0xf6ae0000-0xf6afffff])
[26289.848044] e1000e 0000:00:19.0: BAR 1: set to [mem 0xf6adb000-0xf6adbfff] (PCI address [0xf6adb000-0xf6adbfff])
[26289.848050] e1000e 0000:00:19.0: BAR 2: set to [io  0xefe0-0xefff] (PCI address [0xefe0-0xefff])
[26289.848067] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[26289.848090] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
[26289.848146] e1000e 0000:00:19.0: PME# disabled
[26289.940279] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
[26289.996092] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
[26289.996761] ADDRCONF(NETDEV_UP): eth0: link is not ready
[26291.088343] e1000e 0000:00:19.0: PME# enabled
[26900.272077] e1000e 0000:00:19.0: BAR 0: set to [mem 0xf6ae0000-0xf6afffff] (PCI address [0xf6ae0000-0xf6afffff])
[26900.272096] e1000e 0000:00:19.0: BAR 1: set to [mem 0xf6adb000-0xf6adbfff] (PCI address [0xf6adb000-0xf6adbfff])
[26900.272111] e1000e 0000:00:19.0: BAR 2: set to [io  0xefe0-0xefff] (PCI address [0xefe0-0xefff])
[26900.272142] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[26900.272180] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
[26900.272250] e1000e 0000:00:19.0: PME# disabled
[26900.272379] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
[26900.274426] e1000e 0000:00:19.0: eth0: MAC Wakeup cause - Link Status Change
[26905.560584] e1000e: Intel(R) PRO/1000 Network Driver - 1.3.10-k2
[26905.560591] e1000e: Copyright(c) 1999 - 2011 Intel Corporation.
[26905.560650] e1000e 0000:00:19.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[26905.560665] e1000e 0000:00:19.0: setting latency timer to 64
[26905.560836] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
[26905.744547] e1000e 0000:00:19.0: eth0: (PCI Express:2.5GT/s:Width x1) 00:21:70:e1:bb:4c
[26905.744552] e1000e 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
[26905.744581] e1000e 0000:00:19.0: eth0: MAC: 7, PHY: 8, PBA No: 1004FF-0FF
[26905.744594] e1000e 0000:00:19.0: PME# enabled
[26905.880141] e1000e 0000:00:19.0: BAR 0: set to [mem 0xf6ae0000-0xf6afffff] (PCI address [0xf6ae0000-0xf6afffff])
[26905.880149] e1000e 0000:00:19.0: BAR 1: set to [mem 0xf6adb000-0xf6adbfff] (PCI address [0xf6adb000-0xf6adbfff])
[26905.880155] e1000e 0000:00:19.0: BAR 2: set to [io  0xefe0-0xefff] (PCI address [0xefe0-0xefff])
[26905.880172] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[26905.880194] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
[26905.880247] e1000e 0000:00:19.0: PME# disabled
[26905.968320] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
[26906.024395] e1000e 0000:00:19.0: irq 43 for MSI/MSI-X
[26906.025907] ADDRCONF(NETDEV_UP): eth0: link is not ready
[26907.112415] e1000e 0000:00:19.0: PME# enabled
[27308.975856] device eth0 entered promiscuous mode
[27326.414382] device eth0 left promiscuous mode
[27336.815362] device eth0 entered promiscuous mode

* Right after start up: machine is in docking station, network is working 
fine.
* At 25136: resume machine, not in docking station, no network cable 
plugged in, so network connection was not tested.
* At 26261: resume machine, in docking station, network cable plugged in, 
but network connection is not working
* At 26900: rmmod e1000e && modprobe e1000e: network still not working

-- 
Frederik Himpe

^ permalink raw reply

* Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
From: Jiri Slaby @ 2011-09-08  7:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Yan, Zheng, Tim Chen, sedat.dilek@gmail.com, Yan, Zheng,
	netdev@vger.kernel.org, davem@davemloft.net, sfr@canb.auug.org.au,
	Shi, Alex, Valdis Kletnieks
In-Reply-To: <1315465919.2532.19.camel@edumazet-laptop>

On 09/08/2011 09:11 AM, Eric Dumazet wrote:
> Le jeudi 08 septembre 2011 à 14:22 +0800, Yan, Zheng a écrit :
> 
>> I don't think so. unix_scm_to_skb() calls unix_attach_fds(), it
>> always duplicates scm->fp. 
> 
> What a mess. This code is a nightmare.
> 
> Part of the mess comes from scm_destroy() and scm_release() duplication.
> 
> We should have scm_destroy() only, as before, and NULLify scm->pid/cred
> in unix_scm_to_skb() when we steal references.

This patch works for me. I haven't tried the out_err fixup from the
followup, but I assume I won't spot a difference as those are fail paths
anyway...

thanks,
-- 
js

^ permalink raw reply

* Re: [PATCH] net: phy: Add config option to specify external switch port to be used if switch is used as PHY
From: Florian Fainelli @ 2011-09-08  8:39 UTC (permalink / raw)
  To: Lambrecht Jürgen
  Cc: netdev@vger.kernel.org, linux-embedded@vger.kernel.org
In-Reply-To: <4E68668F.9060008@televic.com>

Hello Jurgen,

On Thursday 08 September 2011 08:54:07 Lambrecht Jürgen wrote:
> Hello,
> 
> In our embedded designs, this is a useful patch. Maybe it can be useful
> for somebody else too.
> Or maybe there are already better solutions?
> I know I could also write a driver for our switch, but that is too much
> effort just to select the active port.

This is not going to work well with all switches out there. You could use the 
fixed-PHY driver to make your ethernet driver see the link as always up between 
the MAC and switch CPU port.

A better solution would be to have proper switch drivers and user-space, which 
reminds me that we (OpenWrt) should at some point propose our switch drivers 
[1] for review.

[1]: 
https://dev.openwrt.org/browser/trunk/target/linux/generic/files/drivers/net/phy/

> 
> Kind regards,
> Jürgen
> 
>    In embedded design, instead of a PHY, sometimes a switch is used that
>            behaves as a PHY through its MII port. For example to use a
> daisy chain network configuration instead of an expensive star config. In
> that case, many phy ports are available, but only 1 should be used
>            to check link status, and not the first one available as is
> the case
>            without this configuration (that is, set to its default value
> 0). So this options specifies the switch port number to be used to check
>            link status, because if the link is down, no data is sent by the
>            TCP/IP stack.
> 
> Signed-off-by: Jürgen Lambrecht <J.Lambrecht@televic.com>
> ---
>   drivers/net/phy/Kconfig    |   14 ++++++++++++++
>   drivers/net/phy/mdio_bus.c |    9 +++++++++
>   2 files changed, 23 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index a702443..554561f 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -13,6 +13,20 @@ menuconfig PHYLIB
> 
>   if PHYLIB
> 
> +config SWITCH_PHY
> +       int "External switch port to be used if switch is used as PHY"
> +       default "0"
> +       help
> +         In embedded design, instead of a PHY, sometimes a switch is
> used that
> +         behaves as a PHY through its MII port. For example to use a daisy
> +         chain network configuration instead of an expensive star config.
> +         In that case, many phy ports are available, but only 1 should
> be used
> +         to check link status, and not the first one available as is
> the case
> +         without this configuration (that is, set to its default value 0).
> +         So this options specifies the switch port number to be used to
> check
> +         link status, because if the link is down, no data is sent by the
> +         TCP/IP stack.
> +
>   comment "MII PHY device drivers"
> 
>   config MARVELL_PHY
> diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
> index 6c58da2..016437a 100644
> --- a/drivers/net/phy/mdio_bus.c
> +++ b/drivers/net/phy/mdio_bus.c
> @@ -112,7 +112,14 @@ int mdiobus_register(struct mii_bus *bus)
>          if (bus->reset)
>                  bus->reset(bus);
> 
> +       /* The config below is always availble with CONFIG_PHYLIB. If 0,
> the +          behavior is as before without this patch (or P0 of the
> switch is +          taken because it is the first one found). */
> +#if CONFIG_SWITCH_PHY
> +       i = CONFIG_SWITCH_PHY;
> +#else
>          for (i = 0; i < PHY_MAX_ADDR; i++) {
> +#endif
>                  if ((bus->phy_mask & (1 << i)) == 0) {
>                          struct phy_device *phydev;
> 
> @@ -122,7 +129,9 @@ int mdiobus_register(struct mii_bus *bus)
>                                  goto error;
>                          }
>                  }
> +#if !CONFIG_SWITCH_PHY
>          }
> +#endif
> 
>          bus->state = MDIOBUS_REGISTERED;
>          pr_info("%s: probed\n", bus->name);
> --
> 1.7.1
> N�����r��y����b�X��ǧv�^�)޺{.n�+���z�^�)����w*\x1fjg���\x1e�����ݢj/���z�ޖ��2�ޙ����
> &�)ߡ�a��\x7f��\x1e�G���h�\x0f�j:+v���w��٥

-- 
Florian

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox