Netdev List
* Re: [PATCH 0/9] net: support multiple independant multicast routing instances
From: David Miller @ 2010-04-13 21:51 UTC
  To: kaber; +Cc: netdev
In-Reply-To: <1271171003-11901-1-git-send-email-kaber@trash.net>

From: Patrick McHardy <kaber@trash.net>
Date: Tue, 13 Apr 2010 17:03:14 +0200

> this is an updated patchset of my patches to support multiple independent
> multicast routing instances. Changes since the last posting are:
> 
> - rebase to the current net-next-2.6.git tree
> - fix up patch subjects to consistently refer to "ipv4: ipmr:"
> - fix up list_head conversion patch to add new elements at the head of
>   the list instead of at the tail
> 
> Please apply or pull from:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6.git master

I applied the patches instead of pulling just to check your email
patch submission format, and it was perfect! :-)

I'll do a git pull next time.

All applied to net-next-2.6, thanks!


* Re: forcedeth driver hangs under heavy load
From: Eric Dumazet @ 2010-04-13 21:46 UTC
  To: David Miller; +Cc: smulcahy, bhutchings, netdev, ben, aabdulla, 572201
In-Reply-To: <20100413.144340.138717714.davem@davemloft.net>

On Tuesday, 13 April 2010 at 14:43 -0700, David Miller wrote:
> Do you really come to the conclusion that TSO is broken with the above
> test results?
> 
> I would conclude that there is a TX checksumming issue, since merely
> turning TSO off does not fix the problem whereas turning TX
> checksumming off does.

Indeed, we clarified the point and it is a TX checksum issue.




* Re: forcedeth driver hangs under heavy load
From: David Miller @ 2010-04-13 21:43 UTC
  To: eric.dumazet; +Cc: smulcahy, bhutchings, netdev, ben, aabdulla, 572201
In-Reply-To: <1271169741.16881.437.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 13 Apr 2010 16:42:21 +0200

> On Tuesday, 13 April 2010 at 15:27 +0100, stephen mulcahy wrote:
>> Ok, I've tried both of the following with my reproducer
>> 
>> 1. ethtool -K eth0 tso off
>> 
>> RESULT: reproducer causes multiple hosts to become unresponsive on 
>> first run.
>> 
>> 2. ethtool -K eth0 tx off
>> 
>> RESULT: reproducer runs three times without any hosts becoming unresponsive.
>> 
>> -stephen
> 
> Thanks Stephen !
> 
> Now, any brave fools to check the 6410 lines of this driver? ;)
> 
> Question of the day: why is TSO broken in forcedeth?
> Is it generically broken, or is it broken only for specific NICs?

Do you really come to the conclusion that TSO is broken with the above
test results?

I would conclude that there is a TX checksumming issue, since merely
turning TSO off does not fix the problem whereas turning TX
checksumming off does.
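
For context on why "ethtool -K eth0 tx off" changes the outcome: with TX
checksum offload disabled, the stack computes checksums in software before
the frame reaches the driver, so a faulty hardware checksum path is never
exercised. Below is a minimal sketch of that software computation, the
RFC 1071 ones' complement sum, in plain C. The name sketch_inet_checksum is
made up for illustration; this is not forcedeth or kernel code, and a real
TCP/UDP checksum also covers a pseudo-header that is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: RFC 1071 Internet checksum over `len` bytes.
 * Bytes are summed as big-endian 16-bit words; the result is returned
 * as a host-order value.
 */
static uint16_t sketch_inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	/* Add up all complete 16-bit words. */
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];

	/* An odd trailing byte is padded with a zero low byte. */
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Running the same function over a header whose checksum field is already
filled in correctly yields 0, which is how a receiver validates it.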


* RE: [PATCH 2/3] cxgb4i: main driver files
From: Karen Xie @ 2010-04-13 21:41 UTC
  To: Mike Christie, open-iscsi
  Cc: Rakesh Ranjan, netdev, linux-scsi, linux-kernel, davem,
	James.Bottomley
In-Reply-To: <4BC4D711.5030802@cs.wisc.edu>

Hi, Mike,

Yes, will do that for the next submission.

Thanks,
Karen

-----Original Message-----
From: Mike Christie [mailto:michaelc@cs.wisc.edu] 
Sent: Tuesday, April 13, 2010 1:42 PM
To: open-iscsi@googlegroups.com
Cc: Rakesh Ranjan; netdev@vger.kernel.org; linux-scsi@vger.kernel.org;
linux-kernel@vger.kernel.org; Karen Xie; davem@davemloft.net;
James.Bottomley@hansenpartnership.com
Subject: Re: [PATCH 2/3] cxgb4i: main driver files

On 04/08/2010 07:14 AM, Rakesh Ranjan wrote:
> +static inline int cxgb4i_ddp_gl_map(struct pci_dev *pdev,
> +				struct cxgb4i_gather_list *gl)
> +{
> +	int i;
> +
> +	for (i = 0; i < gl->nelem; i++) {
> +		gl->phys_addr[i] = pci_map_page(pdev, gl->pages[i], 0,
> +						PAGE_SIZE,

Hey Rakesh,

I guess we are trying to move away from the pci mapping functions and move 
to the dma ones. On your next submission, could you fix those up too?


* Re: [PATCH Resubmission] drivers/net/usb: Add new driver ipheth
From: David Miller @ 2010-04-13 21:29 UTC
  To: agimenez
  Cc: linux-kernel, dgiagio, dborca, gregkh, jonas.sjoquist,
	steve.glendinning, torgny.johansson, dbrownell, omar.oberthur,
	linux-usb, netdev
In-Reply-To: <4BC4BFFD.9040802@sysvalve.es>

From: "L. Alberto Giménez" <agimenez@sysvalve.es>
Date: Tue, 13 Apr 2010 21:03:25 +0200

> Thanks for the info. I didn't know that I had to add an entry on the
> upper level Makefile. I guess that something like
> obj-$(CONFIG_USB_IPHETH) += usb/ should be enough? (I got it from the
> other USB net drivers).

Yes.


* [PATCH net-next-2.6] drivers: net: use skb_headlen()
From: Eric Dumazet @ 2010-04-13 20:48 UTC
  To: David Miller; +Cc: netdev

replaces (skb->len - skb->data_len) occurrences by skb_headlen(skb)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/atm/eni.c                  |    2 +-
 drivers/atm/he.c                   |    4 ++--
 drivers/net/3c59x.c                |    4 ++--
 drivers/net/atl1e/atl1e_main.c     |    2 +-
 drivers/net/atlx/atl1.c            |    4 ++--
 drivers/net/benet/be_main.c        |    4 ++--
 drivers/net/chelsio/sge.c          |    8 ++++----
 drivers/net/e1000/e1000_main.c     |    4 ++--
 drivers/net/e1000e/netdev.c        |    4 ++--
 drivers/net/ehea/ehea_main.c       |   10 +++++-----
 drivers/net/forcedeth.c            |    4 ++--
 drivers/net/ixgbevf/ixgbevf_main.c |    2 +-
 drivers/net/ksz884x.c              |    2 +-
 drivers/net/myri10ge/myri10ge.c    |    2 +-
 drivers/net/s2io.c                 |    4 ++--
 drivers/net/tehuti.c               |    2 +-
 drivers/net/tsi108_eth.c           |    4 ++--
 17 files changed, 33 insertions(+), 33 deletions(-)
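
For readers unfamiliar with the helper: skb_headlen() returns the number of
bytes in the linear part of the skb, i.e. skb->len minus the paged bytes
counted in skb->data_len, which is why the substitution is purely
mechanical. A minimal userspace sketch of that identity follows; struct
sketch_skb and sketch_headlen() are stand-ins invented for illustration,
not the kernel definitions.

```c
/*
 * Illustrative stand-in for the two sk_buff fields involved
 * (hypothetical type, not the kernel's struct sk_buff).
 */
struct sketch_skb {
	unsigned int len;      /* total bytes: linear part + paged fragments */
	unsigned int data_len; /* bytes held in paged fragments only */
};

/* Same arithmetic the patch replaces: length of the linear (head) area. */
static inline unsigned int sketch_headlen(const struct sketch_skb *skb)
{
	return skb->len - skb->data_len;
}
```

Because len is never smaller than data_len, the be_main.c test
"sent_skb->len > sent_skb->data_len" is equivalent to checking that
skb_headlen(sent_skb) is non-zero.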

diff --git a/drivers/atm/eni.c b/drivers/atm/eni.c
index 719ec5a..90a5a7c 100644
--- a/drivers/atm/eni.c
+++ b/drivers/atm/eni.c
@@ -1131,7 +1131,7 @@ DPRINTK("doing direct send\n"); /* @@@ well, this doesn't work anyway */
 			if (i == -1)
 				put_dma(tx->index,eni_dev->dma,&j,(unsigned long)
 				    skb->data,
-				    skb->len - skb->data_len);
+				    skb_headlen(skb));
 			else
 				put_dma(tx->index,eni_dev->dma,&j,(unsigned long)
 				    skb_shinfo(skb)->frags[i].page + skb_shinfo(skb)->frags[i].page_offset,
diff --git a/drivers/atm/he.c b/drivers/atm/he.c
index c213e0d..56c2e99 100644
--- a/drivers/atm/he.c
+++ b/drivers/atm/he.c
@@ -2664,8 +2664,8 @@ he_send(struct atm_vcc *vcc, struct sk_buff *skb)
 
 #ifdef USE_SCATTERGATHER
 	tpd->iovec[slot].addr = pci_map_single(he_dev->pci_dev, skb->data,
-				skb->len - skb->data_len, PCI_DMA_TODEVICE);
-	tpd->iovec[slot].len = skb->len - skb->data_len;
+				skb_headlen(skb), PCI_DMA_TODEVICE);
+	tpd->iovec[slot].len = skb_headlen(skb);
 	++slot;
 
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
index 5f92fdb..9752530 100644
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -2129,8 +2129,8 @@ boomerang_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		int i;
 
 		vp->tx_ring[entry].frag[0].addr = cpu_to_le32(pci_map_single(VORTEX_PCI(vp), skb->data,
-										skb->len-skb->data_len, PCI_DMA_TODEVICE));
-		vp->tx_ring[entry].frag[0].length = cpu_to_le32(skb->len-skb->data_len);
+										skb_headlen(skb), PCI_DMA_TODEVICE));
+		vp->tx_ring[entry].frag[0].length = cpu_to_le32(skb_headlen(skb));
 
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
diff --git a/drivers/net/atl1e/atl1e_main.c b/drivers/net/atl1e/atl1e_main.c
index b6605d4..d45356f 100644
--- a/drivers/net/atl1e/atl1e_main.c
+++ b/drivers/net/atl1e/atl1e_main.c
@@ -1679,7 +1679,7 @@ static void atl1e_tx_map(struct atl1e_adapter *adapter,
 {
 	struct atl1e_tpd_desc *use_tpd = NULL;
 	struct atl1e_tx_buffer *tx_buffer = NULL;
-	u16 buf_len = skb->len - skb->data_len;
+	u16 buf_len = skb_headlen(skb);
 	u16 map_len = 0;
 	u16 mapped_len = 0;
 	u16 hdr_len = 0;
diff --git a/drivers/net/atlx/atl1.c b/drivers/net/atlx/atl1.c
index 0ebd820..33448a0 100644
--- a/drivers/net/atlx/atl1.c
+++ b/drivers/net/atlx/atl1.c
@@ -2347,7 +2347,7 @@ static netdev_tx_t atl1_xmit_frame(struct sk_buff *skb,
 {
 	struct atl1_adapter *adapter = netdev_priv(netdev);
 	struct atl1_tpd_ring *tpd_ring = &adapter->tpd_ring;
-	int len = skb->len;
+	int len;
 	int tso;
 	int count = 1;
 	int ret_val;
@@ -2359,7 +2359,7 @@ static netdev_tx_t atl1_xmit_frame(struct sk_buff *skb,
 	unsigned int f;
 	unsigned int proto_hdr_len;
 
-	len -= skb->data_len;
+	len = skb_headlen(skb);
 
 	if (unlikely(skb->len <= 0)) {
 		dev_kfree_skb_any(skb);
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 18e0a80..fa10f13 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -432,7 +432,7 @@ static int make_tx_wrbs(struct be_adapter *adapter,
 	map_head = txq->head;
 
 	if (skb->len > skb->data_len) {
-		int len = skb->len - skb->data_len;
+		int len = skb_headlen(skb);
 		busaddr = pci_map_single(pdev, skb->data, len,
 					 PCI_DMA_TODEVICE);
 		if (pci_dma_mapping_error(pdev, busaddr))
@@ -1098,7 +1098,7 @@ static void be_tx_compl_process(struct be_adapter *adapter, u16 last_index)
 		cur_index = txq->tail;
 		wrb = queue_tail_node(txq);
 		unmap_tx_frag(adapter->pdev, wrb, (unmap_skb_hdr &&
-					sent_skb->len > sent_skb->data_len));
+					skb_headlen(sent_skb)));
 		unmap_skb_hdr = false;
 
 		num_wrbs++;
diff --git a/drivers/net/chelsio/sge.c b/drivers/net/chelsio/sge.c
index a8ffc1e..f01cfdb 100644
--- a/drivers/net/chelsio/sge.c
+++ b/drivers/net/chelsio/sge.c
@@ -1123,7 +1123,7 @@ static inline unsigned int compute_large_page_tx_descs(struct sk_buff *skb)
 
 	if (PAGE_SIZE > SGE_TX_DESC_MAX_PLEN) {
 		unsigned int nfrags = skb_shinfo(skb)->nr_frags;
-		unsigned int i, len = skb->len - skb->data_len;
+		unsigned int i, len = skb_headlen(skb);
 		while (len > SGE_TX_DESC_MAX_PLEN) {
 			count++;
 			len -= SGE_TX_DESC_MAX_PLEN;
@@ -1219,10 +1219,10 @@ static inline void write_tx_descs(struct adapter *adapter, struct sk_buff *skb,
 	ce = &q->centries[pidx];
 
 	mapping = pci_map_single(adapter->pdev, skb->data,
-				skb->len - skb->data_len, PCI_DMA_TODEVICE);
+				 skb_headlen(skb), PCI_DMA_TODEVICE);
 
 	desc_mapping = mapping;
-	desc_len = skb->len - skb->data_len;
+	desc_len = skb_headlen(skb);
 
 	flags = F_CMD_DATAVALID | F_CMD_SOP |
 	    V_CMD_EOP(nfrags == 0 && desc_len <= SGE_TX_DESC_MAX_PLEN) |
@@ -1258,7 +1258,7 @@ static inline void write_tx_descs(struct adapter *adapter, struct sk_buff *skb,
 
 	ce->skb = NULL;
 	dma_unmap_addr_set(ce, dma_addr, mapping);
-	dma_unmap_len_set(ce, dma_len, skb->len - skb->data_len);
+	dma_unmap_len_set(ce, dma_len, skb_headlen(skb));
 
 	for (i = 0; nfrags--; i++) {
 		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 47da5fc..974a02d 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2929,7 +2929,7 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 	unsigned int first, max_per_txd = E1000_MAX_DATA_PER_TXD;
 	unsigned int max_txd_pwr = E1000_MAX_TXD_PWR;
 	unsigned int tx_flags = 0;
-	unsigned int len = skb->len - skb->data_len;
+	unsigned int len = skb_headlen(skb);
 	unsigned int nr_frags;
 	unsigned int mss;
 	int count = 0;
@@ -2980,7 +2980,7 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 					dev_kfree_skb_any(skb);
 					return NETDEV_TX_OK;
 				}
-				len = skb->len - skb->data_len;
+				len = skb_headlen(skb);
 				break;
 			default:
 				/* do nothing */
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 38390b5..214db04 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -4130,7 +4130,7 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 	unsigned int max_per_txd = E1000_MAX_PER_TXD;
 	unsigned int max_txd_pwr = E1000_MAX_TXD_PWR;
 	unsigned int tx_flags = 0;
-	unsigned int len = skb->len - skb->data_len;
+	unsigned int len = skb_headlen(skb);
 	unsigned int nr_frags;
 	unsigned int mss;
 	int count = 0;
@@ -4180,7 +4180,7 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 				dev_kfree_skb_any(skb);
 				return NETDEV_TX_OK;
 			}
-			len = skb->len - skb->data_len;
+			len = skb_headlen(skb);
 		}
 	}
 
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index e2d25fb..3f445ef 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -1618,7 +1618,7 @@ static void write_swqe2_TSO(struct sk_buff *skb,
 {
 	struct ehea_vsgentry *sg1entry = &swqe->u.immdata_desc.sg_entry;
 	u8 *imm_data = &swqe->u.immdata_desc.immediate_data[0];
-	int skb_data_size = skb->len - skb->data_len;
+	int skb_data_size = skb_headlen(skb);
 	int headersize;
 
 	/* Packet is TCP with TSO enabled */
@@ -1629,7 +1629,7 @@ static void write_swqe2_TSO(struct sk_buff *skb,
 	 */
 	headersize = ETH_HLEN + ip_hdrlen(skb) + tcp_hdrlen(skb);
 
-	skb_data_size = skb->len - skb->data_len;
+	skb_data_size = skb_headlen(skb);
 
 	if (skb_data_size >= headersize) {
 		/* copy immediate data */
@@ -1651,7 +1651,7 @@ static void write_swqe2_TSO(struct sk_buff *skb,
 static void write_swqe2_nonTSO(struct sk_buff *skb,
 			       struct ehea_swqe *swqe, u32 lkey)
 {
-	int skb_data_size = skb->len - skb->data_len;
+	int skb_data_size = skb_headlen(skb);
 	u8 *imm_data = &swqe->u.immdata_desc.immediate_data[0];
 	struct ehea_vsgentry *sg1entry = &swqe->u.immdata_desc.sg_entry;
 
@@ -2108,8 +2108,8 @@ static void ehea_xmit3(struct sk_buff *skb, struct net_device *dev,
 	} else {
 		/* first copy data from the skb->data buffer ... */
 		skb_copy_from_linear_data(skb, imm_data,
-					  skb->len - skb->data_len);
-		imm_data += skb->len - skb->data_len;
+					  skb_headlen(skb));
+		imm_data += skb_headlen(skb);
 
 		/* ... then copy data from the fragments */
 		for (i = 0; i < nfrags; i++) {
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 3267b23..6c18834 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -2148,7 +2148,7 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	unsigned int i;
 	u32 offset = 0;
 	u32 bcnt;
-	u32 size = skb->len-skb->data_len;
+	u32 size = skb_headlen(skb);
 	u32 entries = (size >> NV_TX2_TSO_MAX_SHIFT) + ((size & (NV_TX2_TSO_MAX_SIZE-1)) ? 1 : 0);
 	u32 empty_slots;
 	struct ring_desc* put_tx;
@@ -2269,7 +2269,7 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff *skb,
 	unsigned int i;
 	u32 offset = 0;
 	u32 bcnt;
-	u32 size = skb->len-skb->data_len;
+	u32 size = skb_headlen(skb);
 	u32 entries = (size >> NV_TX2_TSO_MAX_SHIFT) + ((size & (NV_TX2_TSO_MAX_SIZE-1)) ? 1 : 0);
 	u32 empty_slots;
 	struct ring_desc_ex* put_tx;
diff --git a/drivers/net/ixgbevf/ixgbevf_main.c b/drivers/net/ixgbevf/ixgbevf_main.c
index 960e985..f484161 100644
--- a/drivers/net/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ixgbevf/ixgbevf_main.c
@@ -604,7 +604,7 @@ static bool ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		 * packets not getting split correctly
 		 */
 		if (staterr & IXGBE_RXD_STAT_LB) {
-			u32 header_fixup_len = skb->len - skb->data_len;
+			u32 header_fixup_len = skb_headlen(skb);
 			if (header_fixup_len < 14)
 				skb_push(skb, header_fixup_len);
 		}
diff --git a/drivers/net/ksz884x.c b/drivers/net/ksz884x.c
index 4a231bd..cc0bc8a 100644
--- a/drivers/net/ksz884x.c
+++ b/drivers/net/ksz884x.c
@@ -4684,7 +4684,7 @@ static void send_packet(struct sk_buff *skb, struct net_device *dev)
 		int frag;
 		skb_frag_t *this_frag;
 
-		dma_buf->len = skb->len - skb->data_len;
+		dma_buf->len = skb_headlen(skb);
 
 		dma_buf->dma = pci_map_single(
 			hw_priv->pdev, skb->data, dma_buf->len,
diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index 958dc28..e0b47cc 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -2757,7 +2757,7 @@ again:
 	}
 
 	/* map the skb for DMA */
-	len = skb->len - skb->data_len;
+	len = skb_headlen(skb);
 	idx = tx->req & tx->mask;
 	tx->info[idx].skb = skb;
 	bus = pci_map_single(mgp->pdev, skb->data, len, PCI_DMA_TODEVICE);
diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index bab0061..f155928 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -2400,7 +2400,7 @@ static struct sk_buff *s2io_txdl_getskb(struct fifo_info *fifo_data,
 		return NULL;
 	}
 	pci_unmap_single(nic->pdev, (dma_addr_t)txds->Buffer_Pointer,
-			 skb->len - skb->data_len, PCI_DMA_TODEVICE);
+			 skb_headlen(skb), PCI_DMA_TODEVICE);
 	frg_cnt = skb_shinfo(skb)->nr_frags;
 	if (frg_cnt) {
 		txds++;
@@ -4202,7 +4202,7 @@ static netdev_tx_t s2io_xmit(struct sk_buff *skb, struct net_device *dev)
 		txdp->Control_2 |= TXD_VLAN_TAG(vlan_tag);
 	}
 
-	frg_len = skb->len - skb->data_len;
+	frg_len = skb_headlen(skb);
 	if (offload_type == SKB_GSO_UDP) {
 		int ufo_size;
 
diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
index a38aede..93affdc 100644
--- a/drivers/net/tehuti.c
+++ b/drivers/net/tehuti.c
@@ -1508,7 +1508,7 @@ bdx_tx_map_skb(struct bdx_priv *priv, struct sk_buff *skb,
 	int nr_frags = skb_shinfo(skb)->nr_frags;
 	int i;
 
-	db->wptr->len = skb->len - skb->data_len;
+	db->wptr->len = skb_headlen(skb);
 	db->wptr->addr.dma = pci_map_single(priv->pdev, skb->data,
 					    db->wptr->len, PCI_DMA_TODEVICE);
 	pbl->len = CPU_CHIP_SWAP32(db->wptr->len);
diff --git a/drivers/net/tsi108_eth.c b/drivers/net/tsi108_eth.c
index 1292d23..a03730b 100644
--- a/drivers/net/tsi108_eth.c
+++ b/drivers/net/tsi108_eth.c
@@ -704,8 +704,8 @@ static int tsi108_send_packet(struct sk_buff * skb, struct net_device *dev)
 
 		if (i == 0) {
 			data->txring[tx].buf0 = dma_map_single(NULL, skb->data,
-					skb->len - skb->data_len, DMA_TO_DEVICE);
-			data->txring[tx].len = skb->len - skb->data_len;
+					skb_headlen(skb), DMA_TO_DEVICE);
+			data->txring[tx].len = skb_headlen(skb);
 			misc |= TSI108_TX_SOF;
 		} else {
 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i - 1];




* Re: [PATCH] tun: orphan an skb on tx
From: Michael S. Tsirkin @ 2010-04-13 20:43 UTC
  To: Eric Dumazet
  Cc: Jan Kiszka, David S. Miller, Herbert Xu, Paul Moore,
	David Woodhouse, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel
In-Reply-To: <1271191086.16881.570.camel@edumazet-laptop>

On Tue, Apr 13, 2010 at 10:38:06PM +0200, Eric Dumazet wrote:
> On Tuesday, 13 April 2010 at 23:25 +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
> > > On Tuesday, 13 April 2010 at 20:39 +0300, Michael S. Tsirkin wrote:
> > > 
> > > > > When a socket with inflight tx packets is closed, we don't block the
> > > > > close; we only delay freeing the socket until all packets have been
> > > > > delivered and freed.
> > > > > 
> > > > 
> > > > Which is wrong, since this is under userspace control, so you get
> > > > unkillable processes.
> > > > 
> > > 
> > > We do not get unkillable processes, at least with sockets I was thinking
> > > about (TCP/UDP ones).
> > > 
> > > Maybe tun sockets can behave the same ?
> > 
> > Looks like that's what my patch does: ip_rcv seems to call
> > skb_orphan too.
> 
> Well, I was speaking of tx side, you speak of receiving side.

Point is, both ip_rcv and my patch call skb_orphan.

> An external flood (coming from another domain) is another problem.
> 
> A sender might flood the 'network' inside our domain. How can we
> reasonably limit the sender ?
> 
> Maybe the answer is 'We can not', but it should be stated somewhere, so
> that someone can address this point later.
> 

And whatever's done should ideally work for tap to IP
and IP to IP sockets as well, not just tap to tap.

-- 
MST


* Re: [PATCH 2/3] cxgb4i: main driver files
From: Mike Christie @ 2010-04-13 20:41 UTC
  To: open-iscsi
  Cc: Rakesh Ranjan, netdev, linux-scsi, linux-kernel, kxie, davem,
	James.Bottomley
In-Reply-To: <1270728855-20951-3-git-send-email-rakesh@chelsio.com>

On 04/08/2010 07:14 AM, Rakesh Ranjan wrote:
> +static inline int cxgb4i_ddp_gl_map(struct pci_dev *pdev,
> +				struct cxgb4i_gather_list *gl)
> +{
> +	int i;
> +
> +	for (i = 0; i < gl->nelem; i++) {
> +		gl->phys_addr[i] = pci_map_page(pdev, gl->pages[i], 0,
> +						PAGE_SIZE,

Hey Rakesh,

I guess we are trying to move away from the pci mapping functions and move 
to the dma ones. On your next submission, could you fix those up too?
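
A rough sketch of what the requested conversion could look like, assuming
the generic DMA API from <linux/dma-mapping.h>. The helper name, its
parameters, and the DMA_FROM_DEVICE direction are assumptions made for
illustration (the quoted hunk is cut off before the direction argument),
and this is a kernel-context fragment that only builds inside a kernel
tree.

```c
#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/pci.h>

/*
 * Hypothetical helper: map `nelem` whole pages with the generic DMA API
 * instead of pci_map_page(). The direction is an assumption.
 */
static int sketch_gl_map(struct pci_dev *pdev, struct page **pages,
			 dma_addr_t *addrs, int nelem)
{
	int i;

	for (i = 0; i < nelem; i++) {
		/* dma_map_page() takes the struct device, not the pci_dev. */
		addrs[i] = dma_map_page(&pdev->dev, pages[i], 0, PAGE_SIZE,
					DMA_FROM_DEVICE);
		if (dma_mapping_error(&pdev->dev, addrs[i]))
			goto unwind;
	}
	return 0;

unwind:
	/* Undo the mappings that succeeded before the failure. */
	while (--i >= 0)
		dma_unmap_page(&pdev->dev, addrs[i], PAGE_SIZE,
			       DMA_FROM_DEVICE);
	return -ENOMEM;
}
```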


* Re: [PATCH] tun: orphan an skb on tx
From: Eric Dumazet @ 2010-04-13 20:38 UTC
  To: Michael S. Tsirkin
  Cc: Jan Kiszka, David S. Miller, Herbert Xu, Paul Moore,
	David Woodhouse, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel
In-Reply-To: <20100413202548.GA3582@redhat.com>

On Tuesday, 13 April 2010 at 23:25 +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
> > On Tuesday, 13 April 2010 at 20:39 +0300, Michael S. Tsirkin wrote:
> > 
> > > > When a socket with inflight tx packets is closed, we don't block the
> > > > close; we only delay freeing the socket until all packets have been
> > > > delivered and freed.
> > > > 
> > > 
> > > Which is wrong, since this is under userspace control, so you get
> > > unkillable processes.
> > > 
> > 
> > We do not get unkillable processes, at least with sockets I was thinking
> > about (TCP/UDP ones).
> > 
> > Maybe tun sockets can behave the same ?
> 
> Looks like that's what my patch does: ip_rcv seems to call
> skb_orphan too.

Well, I was speaking of tx side, you speak of receiving side.
An external flood (coming from another domain) is another problem.

A sender might flood the 'network' inside our domain. How can we
reasonably limit the sender ?

Maybe the answer is 'We can not', but it should be stated somewhere, so
that someone can address this point later.
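
To make the tx-side concern concrete: a sender is normally throttled
because each queued skb is charged against its socket's send buffer until
the skb is freed, and orphaning the skb drops that charge early. The toy
userspace model below illustrates this under heavy simplification; all
sketch_* names are hypothetical and it does not reproduce the kernel's
actual accounting code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy socket: bytes currently charged and the send-buffer limit. */
struct sketch_sock {
	size_t wmem_alloc;
	size_t sndbuf;
};

/* Toy packet: remembers which socket (if any) it is charged to. */
struct sketch_skb {
	struct sketch_sock *sk;
	size_t truesize;
};

/* Charge a packet to the socket; refuse if the limit would be exceeded. */
static bool sketch_charge(struct sketch_sock *sk, struct sketch_skb *skb,
			  size_t size)
{
	if (sk->wmem_alloc + size > sk->sndbuf)
		return false; /* sender is throttled */
	sk->wmem_alloc += size;
	skb->sk = sk;
	skb->truesize = size;
	return true;
}

/* Drop the charge while the packet is still in flight. */
static void sketch_orphan(struct sketch_skb *skb)
{
	if (skb->sk) {
		skb->sk->wmem_alloc -= skb->truesize;
		skb->sk = NULL;
	}
}
```

In this model, once a packet is orphaned the socket can queue more data
even though the earlier packet has not been consumed yet, which is the
"how do we limit the sender" question raised above.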





* Re: [PATCH] tun: orphan an skb on tx
From: Michael S. Tsirkin @ 2010-04-13 20:25 UTC
  To: Eric Dumazet
  Cc: Jan Kiszka, David S. Miller, Herbert Xu, Paul Moore,
	David Woodhouse, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel
In-Reply-To: <1271183463.16881.545.camel@edumazet-laptop>

On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
> On Tuesday, 13 April 2010 at 20:39 +0300, Michael S. Tsirkin wrote:
> 
> > > When a socket with inflight tx packets is closed, we don't block the
> > > close; we only delay freeing the socket until all packets have been
> > > delivered and freed.
> > > 
> > 
> > Which is wrong, since this is under userspace control, so you get
> > unkillable processes.
> > 
> 
> We do not get unkillable processes, at least with sockets I was thinking
> about (TCP/UDP ones).
> 
> Maybe tun sockets can behave the same ?

Looks like that's what my patch does: ip_rcv seems to call
skb_orphan too.

> Herbert Acked your patch, so I guess it's OK, but I think it can be
> dangerous.
> Anyway, my feeling is that we try to add various mechanisms to keep a
> hostile user from flooding another one.
> 
> For example, UDP got memory accounting quite recently, and we added
> socket backlog limits very recently. It was considered not needed a few
> years ago.
> 





* usb-sound circular locking again?
From: Richard Zidlicky @ 2010-04-13 20:30 UTC
  To: Takashi Iwai; +Cc: Andrew Morton, linux-kernel, netdev
In-Reply-To: <s5hocm9som6.wl%tiwai@suse.de>

Hi,

is this the same old issue? Any way to fix it? Seeing it triggered in a sync
syscall does not make me comfortable.

Apr 13 02:01:36 localhost kernel: [ 8569.449882] PM: Syncing filesystems ... 
Apr 13 02:01:36 localhost kernel: [ 8569.449998] =======================================================
Apr 13 02:01:36 localhost kernel: [ 8569.450049] [ INFO: possible circular locking dependency detected ]
Apr 13 02:01:36 localhost kernel: [ 8569.450078] 2.6.33.2v2 #4
Apr 13 02:01:36 localhost kernel: [ 8569.450101] -------------------------------------------------------
Apr 13 02:01:36 localhost kernel: [ 8569.450130] pm-hibernate/17348 is trying to acquire lock:
Apr 13 02:01:36 localhost kernel: [ 8569.450158]  (mutex){+.+...}, at: [<c04e6670>] sync_filesystems+0x14/0xd6
Apr 13 02:01:36 localhost kernel: [ 8569.450252] 
Apr 13 02:01:36 localhost kernel: [ 8569.450253] but task is already holding lock:
Apr 13 02:01:36 localhost kernel: [ 8569.450266]  (pm_mutex){+.+.+.}, at: [<c0466658>] hibernate+0x13/0x18d
Apr 13 02:01:36 localhost kernel: [ 8569.450266] 
Apr 13 02:01:36 localhost kernel: [ 8569.450266] which lock already depends on the new lock.
Apr 13 02:01:36 localhost kernel: [ 8569.450266] 
Apr 13 02:01:36 localhost kernel: [ 8569.450266] 
Apr 13 02:01:36 localhost kernel: [ 8569.450266] the existing dependency chain (in reverse order) is:
Apr 13 02:01:36 localhost kernel: [ 8569.450266] 
Apr 13 02:01:36 localhost kernel: [ 8569.450266] -> #6 (pm_mutex){+.+.+.}:
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c045b60a>] __lock_acquire+0xa2d/0xbb7
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c0736d84>] __mutex_lock_common+0x35/0x2f3
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c07370e0>] mutex_lock_nested+0x30/0x38
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c0466658>] hibernate+0x13/0x18d
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c046551c>] state_store+0x56/0xa8
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c05acb19>] kobj_attr_store+0x1a/0x22
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c050f306>] sysfs_write_file+0xb9/0xe4
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c04cc821>] vfs_write+0x84/0xdf
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c04cc915>] sys_write+0x3b/0x60
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c0738295>] syscall_call+0x7/0xb
Apr 13 02:01:36 localhost kernel: [ 8569.450266] 
Apr 13 02:01:36 localhost kernel: [ 8569.450266] -> #5 (s_active){++++.+}:
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c045b60a>] __lock_acquire+0xa2d/0xbb7
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c05102f8>] sysfs_addrm_finish+0x89/0xde
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c050eaf7>] sysfs_hash_and_remove+0x3d/0x4f
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c0511100>] sysfs_remove_group+0x74/0xa3
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c062e16c>] dpm_sysfs_remove+0x10/0x12
Apr 13 09:39:32 localhost kernel: [ 8569.450266]        [<c062933f>] device_del+0x33/0x154
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c0629488>] device_unregister+0x28/0x4b
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c067b7c5>] usb_remove_ep_devs+0x15/0x1f
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c0675c92>] remove_intf_ep_devs+0x21/0x32
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c0676d53>] usb_set_interface+0x18c/0x22c
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<f8302c46>] snd_usb_capture_close+0x26/0x3f [snd_usb_audio]
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<f80fbb08>] snd_pcm_release_substream+0x3d/0x66 [snd_pcm]
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<f80fbb8d>] snd_pcm_release+0x5c/0x9e [snd_pcm]
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c04cd12a>] __fput+0xf0/0x187
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c04cd1da>] fput+0x19/0x1b
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c04b2e9f>] remove_vma+0x3e/0x5d
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c04b3b2a>] do_munmap+0x23c/0x259
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c04b3b77>] sys_munmap+0x30/0x3f
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c0738295>] syscall_call+0x7/0xb
Apr 13 09:39:34 localhost kernel: [ 8569.450266] 
Apr 13 09:39:34 localhost kernel: [ 8569.450266] -> #4 (&pcm->open_mutex){+.+.+.}:
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c045b60a>] __lock_acquire+0xa2d/0xbb7
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c0736d84>] __mutex_lock_common+0x35/0x2f3
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c07370e0>] mutex_lock_nested+0x30/0x38
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<f80fbb86>] snd_pcm_release+0x55/0x9e [snd_pcm]
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c04cd12a>] __fput+0xf0/0x187
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c04cd1da>] fput+0x19/0x1b
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c04b2e9f>] remove_vma+0x3e/0x5d
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c04b3b2a>] do_munmap+0x23c/0x259
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c04b3b77>] sys_munmap+0x30/0x3f
Apr 13 09:39:34 localhost kernel: [ 8569.455127]        [<c0738295>] syscall_call+0x7/0xb
Apr 13 09:39:34 localhost kernel: [ 8569.455127] 
Apr 13 09:39:34 localhost kernel: [ 8569.455127] -> #3 (&mm->mmap_sem){++++++}:
Apr 13 09:39:34 localhost kernel: [ 8569.455127]        [<c045b60a>] __lock_acquire+0xa2d/0xbb7
Apr 13 09:39:34 localhost kernel: [ 8569.455127]        [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 09:39:34 localhost kernel: [ 8569.455127]        [<c04add1a>] might_fault+0x64/0x81
Apr 13 09:39:34 localhost kernel: [ 8569.455127]        [<c05b3828>] copy_to_user+0x2c/0xfc
Apr 13 09:39:34 localhost kernel: [ 8569.455127]        [<c04d784b>] filldir64+0x97/0xcd
Apr 13 09:39:34 localhost kernel: [ 8569.455127]        [<c04e299c>] dcache_readdir+0x5a/0x1af
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c04d7a5d>] vfs_readdir+0x68/0x94
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c04d7aec>] sys_getdents64+0x63/0xa0
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c0738295>] syscall_call+0x7/0xb
Apr 13 09:39:34 localhost kernel: [ 8569.456129] 
Apr 13 09:39:34 localhost kernel: [ 8569.456129] -> #2 (&sb->s_type->i_mutex_key#3){+.+.+.}:
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c045b60a>] __lock_acquire+0xa2d/0xbb7
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c0736d84>] __mutex_lock_common+0x35/0x2f3
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c07370e0>] mutex_lock_nested+0x30/0x38
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c051164f>] devpts_get_sb+0x1c0/0x29f
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c04ce0db>] vfs_kern_mount+0x86/0x11f
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c04ce1b8>] do_kern_mount+0x32/0xbe
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c04e02c2>] do_mount+0x671/0x6d0
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c04e0382>] sys_mount+0x61/0x8f
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c0738295>] syscall_call+0x7/0xb
Apr 13 09:39:34 localhost kernel: [ 8569.456129] 
Apr 13 09:39:34 localhost kernel: [ 8569.456129] -> #1 (&type->s_umount_key#19){++++..}:
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c045b60a>] __lock_acquire+0xa2d/0xbb7
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c0737310>] down_read+0x31/0x45
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c04e66cf>] sync_filesystems+0x73/0xd6
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c04e676e>] sys_sync+0x11/0x2d
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c0738295>] syscall_call+0x7/0xb
Apr 13 09:39:34 localhost kernel: [ 8569.458127] 
Apr 13 09:39:34 localhost kernel: [ 8569.458127] -> #0 (mutex){+.+...}:
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c045b517>] __lock_acquire+0x93a/0xbb7
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c0736d84>] __mutex_lock_common+0x35/0x2f3
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c07370e0>] mutex_lock_nested+0x30/0x38
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c04e6670>] sync_filesystems+0x14/0xd6
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c04e676e>] sys_sync+0x11/0x2d
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c04666c2>] hibernate+0x7d/0x18d
Apr 13 09:39:34 localhost kernel: [ 8569.459761]        [<c046551c>] state_store+0x56/0xa8
Apr 13 09:39:34 localhost kernel: [ 8569.459761]        [<c05acb19>] kobj_attr_store+0x1a/0x22
Apr 13 09:39:34 localhost kernel: [ 8569.459761]        [<c050f306>] sysfs_write_file+0xb9/0xe4
Apr 13 09:39:34 localhost kernel: [ 8569.459761]        [<c04cc821>] vfs_write+0x84/0xdf
Apr 13 09:39:34 localhost kernel: [ 8569.460128]        [<c04cc915>] sys_write+0x3b/0x60
Apr 13 09:39:34 localhost kernel: [ 8569.460128]        [<c0738295>] syscall_call+0x7/0xb
Apr 13 09:39:34 localhost kernel: [ 8569.460128] 
Apr 13 09:39:34 localhost kernel: [ 8569.460128] other info that might help us debug this:
Apr 13 09:39:34 localhost kernel: [ 8569.460128] 
Apr 13 09:39:34 localhost kernel: [ 8569.460128] 4 locks held by pm-hibernate/17348:
Apr 13 09:39:34 localhost kernel: [ 8569.460128]  #0:  (&buffer->mutex){+.+.+.}, at: [<c050f272>] sysfs_write_file+0x25/0xe4
Apr 13 09:39:34 localhost kernel: [ 8569.460128]  #1:  (s_active){++++.+}, at: [<c0510544>] sysfs_get_active_two+0x16/0x36
Apr 13 09:39:34 localhost kernel: [ 8569.461127]  #2:  (s_active){++++.+}, at: [<c051054f>] sysfs_get_active_two+0x21/0x36
Apr 13 09:39:34 localhost kernel: [ 8569.461127]  #3:  (pm_mutex){+.+.+.}, at: [<c0466658>] hibernate+0x13/0x18d
Apr 13 09:39:34 localhost kernel: [ 8569.461127] 
Apr 13 09:39:34 localhost kernel: [ 8569.461127] stack backtrace:
Apr 13 09:39:34 localhost kernel: [ 8569.461127] Pid: 17348, comm: pm-hibernate Not tainted 2.6.33.2v2 #4
Apr 13 09:39:34 localhost kernel: [ 8569.461127] Call Trace:
Apr 13 09:39:34 localhost kernel: [ 8569.461127]  [<c0735b79>] ? printk+0xf/0x16
Apr 13 09:39:34 localhost kernel: [ 8569.461127]  [<c045a8a0>] print_circular_bug+0x90/0x9c
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c045b517>] __lock_acquire+0x93a/0xbb7
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c042730d>] ? update_curr+0x177/0x17f
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c0459bf5>] ? mark_lock+0x1e/0x1ea
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c04e6670>] ? sync_filesystems+0x14/0xd6
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c0736d84>] __mutex_lock_common+0x35/0x2f3
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c04e6670>] ? sync_filesystems+0x14/0xd6
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c04e3423>] ? bdi_alloc_queue_work+0x84/0xa0
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c07370e0>] mutex_lock_nested+0x30/0x38
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c04e6670>] ? sync_filesystems+0x14/0xd6
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c04e6670>] sync_filesystems+0x14/0xd6
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c04e676e>] sys_sync+0x11/0x2d
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c04666c2>] hibernate+0x7d/0x18d
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c04654c6>] ? state_store+0x0/0xa8
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c046551c>] state_store+0x56/0xa8
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c04654c6>] ? state_store+0x0/0xa8
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c05acb19>] kobj_attr_store+0x1a/0x22
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c050f306>] sysfs_write_file+0xb9/0xe4
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c050f24d>] ? sysfs_write_file+0x0/0xe4
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c04cc821>] vfs_write+0x84/0xdf
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c04cc915>] sys_write+0x3b/0x60
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c0738295>] syscall_call+0x7/0xb
Apr 13 09:39:34 localhost kernel: [ 8569.484133] done.

Apr 13 09:39:34 localhost kernel: [ 8569.484223] Freezing user space processes ... (elapsed 0.04 seconds) done.
Apr 13 09:39:34 localhost kernel: [ 8569.528142] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
Apr 13 09:39:34 localhost kernel: [ 8569.539272] PM: Preallocating image memory... done (allocated 349210 pages)
Apr 13 09:39:34 localhost kernel: [ 8583.627118] PM: Allocated 1396840 kbytes in 14.08 seconds (99.20 MB/s)

Regards,
Richard


^ permalink raw reply

* Re: forcedeth driver hangs under heavy load
From: Eric Dumazet @ 2010-04-13 20:01 UTC (permalink / raw)
  To: stephen mulcahy
  Cc: Ben Hutchings, netdev, Ben Hutchings, Ayaz Abdulla, 572201
In-Reply-To: <4BC48CE0.1080504@gmail.com>

On Tuesday 13 April 2010 at 16:25 +0100, stephen mulcahy wrote:
> Eric Dumazet wrote:
> > OK, thanks for clarification.
> > 
> > Last question: did you try a vanilla kernel, e.g. 2.6.33.2?
> 
> I built a Debian package from the vanilla 2.6.33.2 and installed that on 
> all nodes and tried my reproducer with the same results - nodes becoming 
> unresponsive.
> 
> I didn't try changing the tso and tx settings with the 2.6.33.2 kernel 
> though. Let me know if that would be useful (and/or if there is another 
> kernel that you would like me to test with) and I'll try to fit it in.
> 

I tried 2.6.34-rc4 (64-bit) on an old machine I had lying around at home.



00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
	Subsystem: ASUSTeK Computer Inc. K8N4-E or A8N-E Mainboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
	Latency: 0 (250ns min, 5000ns max)
	Interrupt: pin A routed to IRQ 21
	Region 0: Memory at d4000000 (32-bit, non-prefetchable) [size=4K]
	Region 1: I/O ports at b000 [size=8]
	Capabilities: [44] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable+ DSel=0 DScale=0 PME-
	Kernel driver in use: forcedeth
	Kernel modules: forcedeth

I could not reproduce the problem you have.

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 31
model name	: AMD Athlon(tm) 64 Processor 3200+
stepping	: 0
cpu MHz		: 1000.000
cache size	: 512 KB
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good lahf_lm
bogomips	: 2010.09
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: ts fid vid ttp


RAM : 3 Gbytes 

The only strange thing I noticed is the ethtool -S output, which shows an insane tx_broadcast value:

# ethtool -S eth1
NIC statistics:
     tx_bytes: 90388
     tx_zero_rexmt: 348
     tx_one_rexmt: 0
     tx_many_rexmt: 0
     tx_late_collision: 0
     tx_fifo_errors: 0
     tx_carrier_errors: 0
     tx_excess_deferral: 0
     tx_retry_error: 0
     rx_frame_error: 0
     rx_extra_byte: 0
     rx_late_collision: 0
     rx_runt: 0
     rx_frame_too_long: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_align_error: 0
     rx_length_error: 0
     rx_unicast: 413
     rx_multicast: 22
     rx_broadcast: 2
     rx_packets: 437
     rx_errors_total: 0
     tx_errors_total: 0
     tx_deferral: 718
     tx_packets: 718
     rx_bytes: 718
     tx_pause: 718
     rx_pause: 718
     rx_drop_frame: 718
     tx_unicast: 15748
     tx_multicast: 5552
     tx_broadcast: 115174309658

[root@localhost ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:11:D8:9A:6D:06  
          inet addr:192.168.99.99  Bcast:192.168.99.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:466 errors:0 dropped:0 overruns:0 frame:0
          TX packets:354 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:50751 (49.5 KiB)  TX bytes:92974 (90.7 KiB)
          Interrupt:21 Base address:0x2000 

[root@localhost ~]# grep eth1 /proc/interrupts 
 21:        954   IO-APIC-fasteoi   eth1



^ permalink raw reply

* Re: [Bugme-new] [Bug 15777] New: Changing MTU after enabling GSO/GRO breaks incoming IPv6 neighbour discovery
From: Andrew Morton @ 2010-04-13 19:37 UTC (permalink / raw)
  To: netdev; +Cc: bugzilla-daemon, bugme-daemon, roman
In-Reply-To: <bug-15777-10286@https.bugzilla.kernel.org/>


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 13 Apr 2010 16:17:44 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=15777
> 
>            Summary: Changing MTU after enabling GSO/GRO breaks incoming
>                     IPv6 neighbour discovery
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.33
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV6
>         AssignedTo: yoshfuji@linux-ipv6.org
>         ReportedBy: roman@rm.pp.ru
>         Regression: No
> 
> 
> I have discovered that on one machine, if I enable either GSO or GRO (with
> ethtool -K), and then change the interface MTU, the machine ceases to be IPv6
> neighbour-discoverable. After the following commands:
> 
>   ethtool -K eth0 gro on gso on # doesn't matter which of them, or both
>   ifconfig eth0 mtu 4082
> 
> the machine is no longer ping6'able from LAN by "new" hosts (which haven't seen
> it recently) -- until something ELSE is adjusted on the same interface of that
> machine, e.g. the following command helps (I don't know why, the PROMISC mode
> is already disabled when it runs):
> 
>   ifconfig eth0 -promisc
> 
> The NIC (using the "skge" driver):
> 
>   00:08.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T
> [Marvell] (rev 10)
> 
> The system is a  Debian Squeeze with 2.6.33 kernel and ethtool 2.6.33.
> 
> The issue is 100% reproducible.


^ permalink raw reply

* Re: [PATCH Resubmission] drivers/net/usb: Add new driver ipheth
From: "L. Alberto Giménez" @ 2010-04-13 19:03 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, dgiagio, dborca, gregkh, jonas.sjoquist,
	steve.glendinning, torgny.johansson, dbrownell, omar.oberthur,
	linux-usb, netdev
In-Reply-To: <20100413.011540.115903049.davem@davemloft.net>

David Miller wrote:
> Unless you add a rule to drivers/net/Makefile, the build won't
> actually get to your driver unless one of the other USB networking
> devices is configured.
>   

Hi David,

Thanks for the info. I didn't know that I had to add an entry to the 
upper-level Makefile. I guess that something like 
obj-$(CONFIG_USB_IPHETH) += usb/ should be enough? (I got it from the 
other USB net drivers).

I won't be able to work on it today because I've been very busy, but I 
haven't given up :)

> Please fix this up and resubmit. 

I also have in my queue a patch from upstream that fixes the latest 
issues pointed out by Roland Dreier, and I need to test it a little bit 
more.

Thanks for your comments and patience!


Best regards,
L. Alberto Giménez

^ permalink raw reply

* Re: [PATCH] Add somaxconn to Documentation/sysctl/net.txt
From: Eric Dumazet @ 2010-04-13 18:40 UTC (permalink / raw)
  To: Rob Landley; +Cc: linux-kernel, linux-doc, netdev
In-Reply-To: <201004131325.29104.rob@landley.net>

On Tuesday 13 April 2010 at 13:25 -0500, Rob Landley wrote:
> From: Rob Landley <rob@landley.net>
> 
> Add somaxconn to Documentation/sysctl/net.txt
> 
> Signed-off-by: Rob Landley <rob@landley.net>
> ---
> 
>  Documentation/sysctl/net.txt |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
> index df38ef0..2740085 100644
> --- a/Documentation/sysctl/net.txt
> +++ b/Documentation/sysctl/net.txt
> @@ -90,6 +90,12 @@ optmem_max
>  Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
>  of struct cmsghdr structures with appended data.
>  
> +somaxconn
> +---------
> +
> +Maximum backlog of unanswered connections for a listening socket.  Provides
> +an upper bound on the "backlog" parameter of the listen() syscall.
> +
>  2. /proc/sys/net/unix - Parameters for Unix domain sockets
>  -------------------------------------------------------
>  
> 

Please cc netdev for such patches

Extract of Documentation/networking/ip-sysctl.txt

somaxconn - INTEGER
	Limit of socket listen() backlog, known in userspace as SOMAXCONN.
	Defaults to 128.  See also tcp_max_syn_backlog for additional tuning
	for TCP sockets.

I guess you need to change both files ?
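
As a rough illustration of the "upper bound" wording in both descriptions, the clamp can be modelled in userspace as below. This is only a sketch of the documented behaviour, not the kernel's code; `effective_backlog` and its parameter names are hypothetical.

```c
/* Toy model: the backlog a listening socket ends up with is the value
 * passed to listen(), capped by the somaxconn sysctl.
 * effective_backlog() is a hypothetical helper, not a kernel function. */
unsigned int effective_backlog(unsigned int requested,
			       unsigned int somaxconn)
{
	return requested > somaxconn ? somaxconn : requested;
}
```

With the default of 128 quoted above, a listen() call asking for a backlog of 1024 would be treated in this model as if it had asked for 128.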



^ permalink raw reply

* Re: [PATCH] tun: orphan an skb on tx
From: Eric Dumazet @ 2010-04-13 18:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jan Kiszka, David S. Miller, Herbert Xu, Paul Moore,
	David Woodhouse, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel
In-Reply-To: <20100413173919.GC26011@redhat.com>

On Tuesday 13 April 2010 at 20:39 +0300, Michael S. Tsirkin wrote:

> > When a socket with in-flight tx packets is closed, we don't block the
> > close; we only delay freeing the socket until all packets have been
> > delivered and freed.
> > 
> 
> Which is wrong, since this is under userspace control, so you get
> unkillable processes.
> 

We do not get unkillable processes, at least with the sockets I was thinking
about (TCP/UDP ones).

Maybe tun sockets can behave the same way?

Herbert Acked your patch, so I guess it's OK, but I think it can be
dangerous.

Anyway, my feeling is that we are trying to add various mechanisms to keep a
hostile user from flooding another one.

For example, UDP got memory accounting quite recently, and we added
socket backlog limits very recently. These were not considered necessary a
few years ago.




^ permalink raw reply

* Re: [PATCH] tun: orphan an skb on tx
From: Michael S. Tsirkin @ 2010-04-13 17:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jan Kiszka, David S. Miller, Herbert Xu, Paul Moore,
	David Woodhouse, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel
In-Reply-To: <1271176838.16881.537.camel@edumazet-laptop>

On Tue, Apr 13, 2010 at 06:40:38PM +0200, Eric Dumazet wrote:
> On Tuesday 13 April 2010 at 17:36 +0200, Jan Kiszka wrote:
> > Michael S. Tsirkin wrote:
> > > The following situation was observed in the field:
> > > tap1 sends packets, tap2 does not consume them, as a result
> > > tap1 can not be closed.
> > 
> > And before that, tap1 may not be able to send further packets to anyone
> > else on the bridge as its TX resources were blocked by tap2 - that's
> > what we saw in the field.
> > 
> 
> After the patch, tap1 is able to flood tap2, and tap3/tap4 are not able to
> send a single frame. Is that OK?

Yes :) This was always possible. The number of senders needed to flood
a receiver might vary depending on the send/recv queue sizes
that you set. External sources can also fill your RX queue
if you let them. In the end, we need to rely on the scheduler for fairness,
or apply packet shaping.

> Back to the problem : tap1 cannot be closed.
> 
> Why ? because of refcounts ?

Yes.

> When a socket with in-flight tx packets is closed, we don't block the
> close; we only delay freeing the socket until all packets have been
> delivered and freed.
> 

Which is wrong, since this is under userspace control, so you get
unkillable processes.

-- 
MST

^ permalink raw reply

* Re: [Bonding-devel] [v3 Patch 2/3] bridge: make bridge support netpoll
From: Stephen Hemminger @ 2010-04-13 17:33 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Cong Wang, Eric Dumazet, Neil Horman, netdev, Andy Gospodarek,
	bridge, linux-kernel, bonding-devel, Jeff Moyer, Matt Mackall,
	David Miller
In-Reply-To: <8304.1271177567@death.nxdomain.ibm.com>

On Tue, 13 Apr 2010 09:52:47 -0700
Jay Vosburgh <fubar@us.ibm.com> wrote:

> Cong Wang <amwang@redhat.com> wrote:
> 
> >Stephen Hemminger wrote:
> >> On Mon, 12 Apr 2010 12:38:57 +0200
> >> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> 
> >>> On Monday 12 April 2010 at 18:37 +0800, Cong Wang wrote:
> >>>> Stephen Hemminger wrote:
> >>>>> There is no protection on dev->priv_flags for SMP access.
> >>>>> It would be better as a bit value in dev->state if you are using it as a control flag.
> >>>>>
> >>>>> Then you could use 
> >>>>> 			if (unlikely(test_and_clear_bit(__IN_NETPOLL, &skb->dev->state)))
> >>>>> 				netpoll_send_skb(...)
> >>>>>
> >>>>>
> >>>> Hmm, I think we can't use ->state here, it is not for this kind of purpose,
> >>>> according to its comments.
> >>>>
> >>>> Also, I find other usages of IFF_XXX flags of ->priv_flags are also using
> >>>> &, | to set or clear the flags. So there must be some other things preventing
> >>>> the race...
> >>> Yes, it's RTNL that protects priv_flags changes, hopefully...
> >>>
> >> 
> >> The patch was not protecting priv_flags with RTNL.
> >> For example..
> >> 
> >> 
> >> @@ -308,7 +312,9 @@ static void netpoll_send_skb(struct netp
> >>  		     tries > 0; --tries) {
> >>  			if (__netif_tx_trylock(txq)) {
> >>  				if (!netif_tx_queue_stopped(txq)) {
> >> +					dev->priv_flags |= IFF_IN_NETPOLL;
> >>  					status = ops->ndo_start_xmit(skb, dev);
> >> +					dev->priv_flags &= ~IFF_IN_NETPOLL;
> >>  					if (status == NETDEV_TX_OK)
> >>  						txq_trans_update(txq);
> >
> >Hmm, but I checked the bonding case (IFF_BONDING), it doesn't
> >hold rtnl_lock. Strange.
> 
> 	I looked, and there are a couple of cases in bonding that don't
> have RTNL for adjusting priv_flags (in bond_ab_arp_probe when no slaves
> are up, and a couple of cases in 802.3ad).  I think the solution there
> is to move bonding away from priv_flags for some of this (e.g., convert
> bonding to use a frame hook like bridge and macvlan, and greatly
> simplify skb_bond_should_drop), but that's a separate topic.
> 
> 	The majority of the cases, however, do hold RTNL.  Bonding
> generally doesn't have to acquire RTNL itself, since whatever called
> into bonding is holding it already.  For example, the slave add and
> remove paths (bond_enslave, bond_release) are called either via sysfs or
> ioctl, both of which acquire RTNL.  All of the set and clear operations
> for IFF_BONDING fall into this category; look at bonding_store_slaves
> for an example.
> 
> 	Bonding does acquire RTNL itself when performing failovers,
> e.g., bond_mii_monitor holds RTNL prior to calling bond_miimon_commit,
> which will change priv_flags.
> 

All this was related to netpoll, and netpoll processing often needs to occur
in hard IRQ context. Therefore netpoll and RTNL (which is a mutex)
really don't mix well.  Keep RTNL for what it was meant for: network
reconfiguration. Don't turn it into a network-specific BKL.
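
To make the contrast in this thread concrete, here is a userspace sketch that uses C11 atomics as a stand-in for the kernel's atomic bit operations. It is not the netpoll code: `IN_NETPOLL_BIT`, `flag_set` and `flag_test_and_clear` are hypothetical names, and only the general contract of test_and_clear_bit() (clear the bit, report its previous value) is assumed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number, standing in for __IN_NETPOLL. */
enum { IN_NETPOLL_BIT = 0 };

/* Atomically set one bit. Unlike a plain "flags |= mask", a concurrent
 * update of another bit in the same word cannot be lost. */
void flag_set(atomic_ulong *state, unsigned int bit)
{
	atomic_fetch_or(state, 1UL << bit);
}

/* Atomically clear the bit and report whether it was set before. */
bool flag_test_and_clear(atomic_ulong *state, unsigned int bit)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_and(state, ~mask) & mask) != 0;
}
```

Neither helper takes a lock or sleeps, which is the property that matters for a path that may run in hard IRQ context, where a mutex such as RTNL cannot be taken.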



-- 

^ permalink raw reply

* Re: Network protocol (IP,IPv6,...) and TC actions
From: Jan Ceuleers @ 2010-04-13 17:31 UTC (permalink / raw)
  To: Grégoire Baron, netdev
In-Reply-To: <20100411190845.GA18383@n7mm.org>

Grégoire Baron wrote:
> As this .protocol member seems to be used at different moments when a
> packet is received, forwarded or sent, and could contain something like
> ETH_P_8021Q which isn't a network protocol ID, can we say the struct
> sk_buff .protocol member is guaranteed to contain a network protocol ID
> in the struct sk_buff used in the TC action executions?
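
For readers wondering why ETH_P_8021Q is not a network protocol ID: it marks a VLAN tag, and the real payload type sits four bytes further into the frame. The userspace sketch below walks a raw Ethernet frame to find that inner type. It does not answer the question about what the kernel guarantees for .protocol; `inner_ethertype` and the macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IPV4 0x0800
#define ETHERTYPE_VLAN 0x8100	/* same value as the kernel's ETH_P_8021Q */

/* Return the EtherType of the payload carried by an Ethernet frame,
 * skipping any 802.1Q tags, or 0 if the frame is too short. */
uint16_t inner_ethertype(const uint8_t *frame, size_t len)
{
	size_t off = 12;	/* EtherType follows the two 6-byte MAC addresses */

	while (off + 2 <= len) {
		uint16_t type = (uint16_t)((frame[off] << 8) | frame[off + 1]);

		if (type != ETHERTYPE_VLAN)
			return type;
		off += 4;	/* skip the 2-byte tag type and 2-byte TCI */
	}
	return 0;
}
```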

Grégoire,

I suggest that you ask your question on the netdev mailing list (netdev@vger.kernel.org).

Cheers, Jan

^ permalink raw reply

* Re: forcedeth driver hangs under heavy load
From: Xose Vazquez Perez @ 2010-04-13 17:22 UTC (permalink / raw)
  To: netdev

stephen mulcahy  wrote:

> running Hadoop[1] TeraSort[2] but I haven't identified a simpler 
> reproducer. I tried to recreate this with iperf and ping -f but neither 
> helped - it may be that the problem only occurs when systems are passing 
> large amounts of traffic and have very high cpu utilisation (when 

Did you try the ISIC (IP Stack Integrity Checker)[1] tools?

Network drivers often break when these tools are run against them.


[1] http://isic.sf.net needs libnet[2]
[2] http://github.com/sam-github/libnet

-- 
«Allá muevan feroz guerra, ciegos reyes por un palmo más de tierra;
que yo aquí tengo por mío cuanto abarca el mar bravío, a quien nadie
impuso leyes. Y no hay playa, sea cualquiera, ni bandera de esplendor,
que no sienta mi derecho y dé pecho a mi valor.»

^ permalink raw reply

* Re: [PATCH] tun: orphan an skb on tx
From: Jan Kiszka @ 2010-04-13 16:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michael S. Tsirkin, David S. Miller, Herbert Xu, Paul Moore,
	David Woodhouse, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel
In-Reply-To: <1271176838.16881.537.camel@edumazet-laptop>

Eric Dumazet wrote:
> On Tuesday 13 April 2010 at 17:36 +0200, Jan Kiszka wrote:
>> Michael S. Tsirkin wrote:
>>> The following situation was observed in the field:
>>> tap1 sends packets, tap2 does not consume them, as a result
>>> tap1 can not be closed.
>> And before that, tap1 may not be able to send further packets to anyone
>> else on the bridge as its TX resources were blocked by tap2 - that's
>> what we saw in the field.
>>
> 
> After the patch, tap1 is able to flood tap2, and tap3/tap4 are not able to
> send a single frame. Is that OK?

I think if that's a real issue, you have to apply traffic shaping to the
untrusted nodes. The existing flow-control scheme was fragile anyway as
you had to translate packet lengths on TX side to packet counts on RX.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply

* Re: [Bonding-devel] [v3 Patch 2/3] bridge: make bridge support netpoll
From: Jay Vosburgh @ 2010-04-13 16:52 UTC (permalink / raw)
  To: Cong Wang
  Cc: Stephen Hemminger, Eric Dumazet, Neil Horman, netdev,
	Andy Gospodarek, bridge, linux-kernel, bonding-devel, Jeff Moyer,
	Matt Mackall, David Miller
In-Reply-To: <4BC43214.6030009@redhat.com>

Cong Wang <amwang@redhat.com> wrote:

>Stephen Hemminger wrote:
>> On Mon, 12 Apr 2010 12:38:57 +0200
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> 
>>> On Monday 12 April 2010 at 18:37 +0800, Cong Wang wrote:
>>>> Stephen Hemminger wrote:
>>>>> There is no protection on dev->priv_flags for SMP access.
>>>>> It would be better as a bit value in dev->state if you are using it as a control flag.
>>>>>
>>>>> Then you could use 
>>>>> 			if (unlikely(test_and_clear_bit(__IN_NETPOLL, &skb->dev->state)))
>>>>> 				netpoll_send_skb(...)
>>>>>
>>>>>
>>>> Hmm, I think we can't use ->state here, it is not for this kind of purpose,
>>>> according to its comments.
>>>>
>>>> Also, I find other usages of IFF_XXX flags of ->priv_flags are also using
>>>> &, | to set or clear the flags. So there must be some other things preventing
>>>> the race...
>>> Yes, it's RTNL that protects priv_flags changes, hopefully...
>>>
>> 
>> The patch was not protecting priv_flags with RTNL.
>> For example..
>> 
>> 
>> @@ -308,7 +312,9 @@ static void netpoll_send_skb(struct netp
>>  		     tries > 0; --tries) {
>>  			if (__netif_tx_trylock(txq)) {
>>  				if (!netif_tx_queue_stopped(txq)) {
>> +					dev->priv_flags |= IFF_IN_NETPOLL;
>>  					status = ops->ndo_start_xmit(skb, dev);
>> +					dev->priv_flags &= ~IFF_IN_NETPOLL;
>>  					if (status == NETDEV_TX_OK)
>>  						txq_trans_update(txq);
>
>Hmm, but I checked the bonding case (IFF_BONDING), it doesn't
>hold rtnl_lock. Strange.

	I looked, and there are a couple of cases in bonding that don't
have RTNL for adjusting priv_flags (in bond_ab_arp_probe when no slaves
are up, and a couple of cases in 802.3ad).  I think the solution there
is to move bonding away from priv_flags for some of this (e.g., convert
bonding to use a frame hook like bridge and macvlan, and greatly
simplify skb_bond_should_drop), but that's a separate topic.

	The majority of the cases, however, do hold RTNL.  Bonding
generally doesn't have to acquire RTNL itself, since whatever called
into bonding is holding it already.  For example, the slave add and
remove paths (bond_enslave, bond_release) are called either via sysfs or
ioctl, both of which acquire RTNL.  All of the set and clear operations
for IFF_BONDING fall into this category; look at bonding_store_slaves
for an example.

	Bonding does acquire RTNL itself when performing failovers,
e.g., bond_mii_monitor holds RTNL prior to calling bond_miimon_commit,
which will change priv_flags.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH] tun: orphan an skb on tx
From: Eric Dumazet @ 2010-04-13 16:40 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Michael S. Tsirkin, David S. Miller, Herbert Xu, Paul Moore,
	David Woodhouse, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel
In-Reply-To: <4BC48F79.5090409@siemens.com>

On Tuesday 13 April 2010 at 17:36 +0200, Jan Kiszka wrote:
> Michael S. Tsirkin wrote:
> > The following situation was observed in the field:
> > tap1 sends packets, tap2 does not consume them, as a result
> > tap1 can not be closed.
> 
> And before that, tap1 may not be able to send further packets to anyone
> else on the bridge as its TX resources were blocked by tap2 - that's
> what we saw in the field.
> 

After the patch, tap1 is able to flood tap2, and tap3/tap4 are not able to
send a single frame. Is that OK?

Back to the problem : tap1 cannot be closed.

Why ? because of refcounts ?

When a socket with in-flight tx packets is closed, we don't block the
close; we only delay freeing the socket until all packets have been
delivered and freed.
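
The "close returns, freeing is delayed" behaviour described here can be pictured with a toy reference-count model. This is an illustration only: the real kernel accounting is more involved, and every name below (`toy_sock`, `toy_send`, `toy_close`, `toy_packet_done`) is hypothetical.

```c
#include <stdbool.h>

/* Toy socket: freed when the last reference goes away. */
struct toy_sock {
	int refs;	/* one for the open descriptor, one per in-flight packet */
	bool freed;
};

/* Drop one reference and free the socket if it was the last one. */
void toy_put(struct toy_sock *sk)
{
	if (--sk->refs == 0)
		sk->freed = true;
}

/* Queueing a packet for transmit pins the socket. */
void toy_send(struct toy_sock *sk)
{
	sk->refs++;
}

/* close() returns at once; it only drops the descriptor's reference. */
void toy_close(struct toy_sock *sk)
{
	toy_put(sk);
}

/* Called when the receiver finally consumes (or drops) the packet. */
void toy_packet_done(struct toy_sock *sk)
{
	toy_put(sk);
}
```

In this model, orphaning a packet at transmit time would correspond to calling toy_packet_done() right after toy_send(), so a receiver that never consumes its queue can no longer pin the sender.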




^ permalink raw reply

* Re: SO_REUSEADDR with UDP (again)
From: Eric Dumazet @ 2010-04-13 16:36 UTC (permalink / raw)
  To: Michal Svoboda; +Cc: netdev
In-Reply-To: <20100413162326.GD16595@myhost.felk.cvut.cz>

On Tuesday 13 April 2010 at 18:23 +0200, Michal Svoboda wrote:
> Eric Dumazet wrote:
> > sock1 = socket(AF_INET, SOCK_DGRAM, 0);
> > setsockopt(sock1, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
> > addr.sin_addr.s_addr = htonl(0x7f000001);
> > if (bind(sock1, (struct sockaddr *)&addr, sizeof(addr)))
> > 	perror("bind1");
> > 
> > sock2 = socket(AF_INET, SOCK_DGRAM, 0);
> > setsockopt(sock2, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
> > addr.sin_addr.s_addr = INADDR_ANY; /* or htonl(0x7f000001); */
> > if (bind(sock2, (struct sockaddr *)&addr, sizeof(addr)))
> > 	perror("bind2");
> > }
> 
> Well, now if I send to 127.0.0.1, who gets the datagram? I guess sock2,
> so it steals from sock1. What practical use does this have?
> 

No, sock1 will get the frame.

In the kernel's UDP receive path, we choose the socket with the highest score.
A socket bound to an IP address (not 0.0.0.0) has a bonus.
A connected socket has an extra bonus.
A socket bound to a device has an extra bonus.
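
The three bonuses listed above can be sketched as a small scoring function. This is a simplified userspace model, not the kernel's lookup code: port matching is left out, the weights are made up (only the ordering matters), and `toy_udp_sock` and `toy_score` are hypothetical names.

```c
#include <stdint.h>

struct toy_udp_sock {
	uint32_t bound_addr;	/* 0 means bound to 0.0.0.0 */
	uint32_t peer_addr;	/* 0 means not connected */
	int bound_ifindex;	/* 0 means not bound to a device */
};

/* Score a socket against an incoming datagram; -1 means "cannot match". */
int toy_score(const struct toy_udp_sock *sk, uint32_t daddr,
	      uint32_t saddr, int ifindex)
{
	int score = 0;

	if (sk->bound_addr) {		/* bound to a specific IP address */
		if (sk->bound_addr != daddr)
			return -1;
		score++;
	}
	if (sk->peer_addr) {		/* connected socket */
		if (sk->peer_addr != saddr)
			return -1;
		score++;
	}
	if (sk->bound_ifindex) {	/* bound to a device */
		if (sk->bound_ifindex != ifindex)
			return -1;
		score++;
	}
	return score;
}
```

For the two sockets in the quoted example, the one bound to 127.0.0.1 scores higher than the one bound to INADDR_ANY for a datagram sent to 127.0.0.1, which matches the statement that sock1 gets the frame.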

> > Therefore, applications should not use REUSEADDR on unicast UDP, unless
> > it is a non security issue (for example, if it is able to react to any
> > new IP addresses added by the administrator on the machine, and complain
> > loudly if another application could bind() before itself)
> 
> I don't think that in that case REUSEADDR would be useful because you
> can already claim new addresses without it, either by binding a separate
> socket to each IP or by binding to 0.0.0.0. Moreover the detection of
> the "complain" case would be very tricky, at least on first sight.
> 
> > REUSEADDR has a meaning for multicast, but for unicast... this is hardly
> > useful?
> 
> So would it be somehow possible to deliver the datagram to both sockets
> (for example if they would be SO_BROADCAST as well)?

Not without a change in the kernel. AFAIK no other OS does that anyway.



^ permalink raw reply

* RE: [Patch 3/3] net: reserve ports for applications using fixed port numbers
From: Sean Hefty @ 2010-04-13 16:32 UTC (permalink / raw)
  To: 'Tetsuo Handa', amwang, rolandd
  Cc: opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm,
	linux-kernel
In-Reply-To: <201004132207.GAJ52684.OJFtMQVFHOSFLO@I-love.SAKURA.ne.jp>

>Sean and Roland, is below patch correct?
>inet_is_reserved_local_port() is the new function proposed in this patchset.

It looks correct to me.  I didn't test the patch series, but if I comment out
the call to inet_is_reserved_local_port() in the patch provided below, the changes
worked fine for me.

Acked-by: Sean Hefty <sean.hefty@intel.com>

^ permalink raw reply

