Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 7/8] trivial: fix many typos s/untill/until/
From: Ivo van Doorn @ 2009-10-16 10:04 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: trivial, linux-rdma, linux-kernel, bonding-devel, netdev,
	linux-wireless, users, linux-scsi, devel, linux-ext4,
	linux-bluetooth, linux-sctp
In-Reply-To: <1255638072-6236-1-git-send-email-cascardo@holoscopio.com>

On Thursday 15 October 2009, Thadeu Lima de Souza Cascardo wrote:
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>

For rt2800usb

Acked-by: Ivo van Doorn <IvDoorn@gmail.com>

> ---
>  drivers/infiniband/ulp/iser/iser_verbs.c |    2 +-
>  drivers/net/bonding/bond_alb.c           |    2 +-
>  drivers/net/wireless/rt2x00/rt2800usb.c  |    2 +-
>  drivers/scsi/bnx2i/bnx2i_iscsi.c         |    2 +-
>  drivers/staging/rtl8187se/r8180.h        |    2 +-
>  fs/ext4/balloc.c                         |    2 +-
>  net/bluetooth/bnep/core.c                |    2 +-
>  net/sctp/sm_statefuns.c                  |    2 +-
>  8 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
> index ea9e155..8579f32 100644
> --- a/drivers/infiniband/ulp/iser/iser_verbs.c
> +++ b/drivers/infiniband/ulp/iser/iser_verbs.c
> @@ -499,7 +499,7 @@ void iser_conn_init(struct iser_conn *ib_conn)
>  
>   /**
>   * starts the process of connecting to the target
> - * sleeps untill the connection is established or rejected
> + * sleeps until the connection is established or rejected
>   */
>  int iser_connect(struct iser_conn   *ib_conn,
>  		 struct sockaddr_in *src_addr,
> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
> index 9b5936f..a1e7eb9 100644
> --- a/drivers/net/bonding/bond_alb.c
> +++ b/drivers/net/bonding/bond_alb.c
> @@ -559,7 +559,7 @@ static void rlb_update_rx_clients(struct bonding *bond)
>  		}
>  	}
>  
> -	/* do not update the entries again untill this counter is zero so that
> +	/* do not update the entries again until this counter is zero so that
>  	 * not to confuse the clients.
>  	 */
>  	bond_info->rlb_update_delay_counter = RLB_UPDATE_DELAY;
> diff --git a/drivers/net/wireless/rt2x00/rt2800usb.c b/drivers/net/wireless/rt2x00/rt2800usb.c
> index a084077..449886c 100644
> --- a/drivers/net/wireless/rt2x00/rt2800usb.c
> +++ b/drivers/net/wireless/rt2x00/rt2800usb.c
> @@ -1257,7 +1257,7 @@ static int rt2800usb_init_registers(struct rt2x00_dev *rt2x00dev)
>  	unsigned int i;
>  
>  	/*
> -	 * Wait untill BBP and RF are ready.
> +	 * Wait until BBP and RF are ready.
>  	 */
>  	for (i = 0; i < REGISTER_BUSY_COUNT; i++) {
>  		rt2x00usb_register_read(rt2x00dev, MAC_CSR0, &reg);
> diff --git a/drivers/scsi/bnx2i/bnx2i_iscsi.c b/drivers/scsi/bnx2i/bnx2i_iscsi.c
> index cafb888..10110be 100644
> --- a/drivers/scsi/bnx2i/bnx2i_iscsi.c
> +++ b/drivers/scsi/bnx2i/bnx2i_iscsi.c
> @@ -1883,7 +1883,7 @@ static void bnx2i_ep_disconnect(struct iscsi_endpoint *ep)
>  
>  	bnx2i_ep = ep->dd_data;
>  
> -	/* driver should not attempt connection cleanup untill TCP_CONNECT
> +	/* driver should not attempt connection cleanup until TCP_CONNECT
>  	 * completes either successfully or fails. Timeout is 9-secs, so
>  	 * wait for it to complete
>  	 */
> diff --git a/drivers/staging/rtl8187se/r8180.h b/drivers/staging/rtl8187se/r8180.h
> index 8216d7e..35ed60b 100644
> --- a/drivers/staging/rtl8187se/r8180.h
> +++ b/drivers/staging/rtl8187se/r8180.h
> @@ -521,7 +521,7 @@ typedef struct r8180_priv
>  	//u32 NumTxOkInPeriod;   //YJ,del,080828
>  	u8   TxPollingTimes;
>  
> -	bool	bApBufOurFrame;// TRUE if AP buffer our unicast data , we will keep eAwake untill receive data or timeout.
> +	bool	bApBufOurFrame;// TRUE if AP buffer our unicast data , we will keep eAwake until receive data or timeout.
>  	u8	WaitBufDataBcnCount;
>  	u8	WaitBufDataTimeOut;
>  
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index 1d04189..2565b8c 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -519,7 +519,7 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
>  		metadata = 1;
>  
>  	/* We need to make sure we don't reuse
> -	 * block released untill the transaction commit.
> +	 * block released until the transaction commit.
>  	 * writeback mode have weak data consistency so
>  	 * don't force data as metadata when freeing block
>  	 * for writeback mode.
> diff --git a/net/bluetooth/bnep/core.c b/net/bluetooth/bnep/core.c
> index 1bd9398..9ac0497 100644
> --- a/net/bluetooth/bnep/core.c
> +++ b/net/bluetooth/bnep/core.c
> @@ -629,7 +629,7 @@ int bnep_del_connection(struct bnep_conndel_req *req)
>  	s = __bnep_get_session(req->dst);
>  	if (s) {
>  		/* Wakeup user-space which is polling for socket errors.
> -		 * This is temporary hack untill we have shutdown in L2CAP */
> +		 * This is temporary hack until we have shutdown in L2CAP */
>  		s->sock->sk->sk_err = EUNATCH;
>  
>  		/* Kill session thread */
> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> index c8fae19..ba2f66d 100644
> --- a/net/sctp/sm_statefuns.c
> +++ b/net/sctp/sm_statefuns.c
> @@ -3569,7 +3569,7 @@ sctp_disposition_t sctp_sf_do_asconf(const struct sctp_endpoint *ep,
>  	 * To do this properly, we'll set the destination address of the chunk
>  	 * and at the transmit time, will try look up the transport to use.
>  	 * Since ASCONFs may be bundled, the correct transport may not be
> -	 * created untill we process the entire packet, thus this workaround.
> +	 * created until we process the entire packet, thus this workaround.
>  	 */
>  	asconf_ack->dest = chunk->source;
>  	sctp_add_cmd_sf(commands, SCTP_CMD_REPLY, SCTP_CHUNK(asconf_ack));



^ permalink raw reply

* Re: TCP_DEFER_ACCEPT is missing counter update
From: Ilpo Järvinen @ 2009-10-16 10:08 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: Willy Tarreau, David Miller, Netdev, Eric Dumazet
In-Reply-To: <Pine.LNX.4.58.0910140912420.2714@u.domain.uli>

On Wed, 14 Oct 2009, Julian Anastasov wrote:

> 
> 	Hello,
> 
> On Wed, 14 Oct 2009, Willy Tarreau wrote:
> 
> > > > I was trying to use TCP_DEFER_ACCEPT and noticed that if the
> > > > client does not talk, the connection is never accepted and
> > > > remains in SYN_RECV state until the retransmits expire, where
> > > > it finally is deleted. This is bad when some firewall such as
> > > 
> > > 	I think, this is by design, there is big comment in
> > > tcp_check_req().
> > 
> > I'm not sure. That would considerably reduce the usefulness of
> > the feature. The comment I see there is just a one line explaining
> > why we drop the ACK. It does not indicate any strategy on what to
> > do when the counter expires.
> 
> 	There were attempts to switch the server socket to
> established state, no SYN-ACK retransmissions after client ACK
> and send FIN (or RST) to client on TCP_DEFER_ACCEPT expiration
> but due to some locking problems it was reverted:
> 
> http://marc.info/?t=121318919000001&r=1&w=2

For the record,

I think you underestimate that particular change by putting it like that. 
After being there and figuring out all the what that change did I'd say it 
attempted even more ambitious goal, ie., to allow ofo data to be queued. 
And that was the cause for all the troubles as the full state was created 
without a wakeup resulting in plurality in locks that one had to acquire 
when waking up and all the havoc broke loose due to lack of locking (or 
deadlockish ordering)...

...So the problem is not in the "ESTABLISHED" (or like) state itself, but 
in the fact that full state was created before the wakeup which needs to 
access the listening socket state too.

-- 
 i.

^ permalink raw reply

* [PATCH 0/2] Reduce number of GFP_ATOMIC allocation failures
From: Mel Gorman @ 2009-10-16 10:37 UTC (permalink / raw)
  To: Andrew Morton, stable
  Cc: Rafael J. Wysocki, David Miller, Frans Pop, reinette chatre,
	Kalle Valo, John W. Linville, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, netdev,
	linux-kernel, linux-mm@kvack.org", Mel Gorman

The following two patches against 2.6.32-rc4 should reduce allocation
failure reports for GFP_ATOMIC allocations that have being cropping up
since 2.6.31-rc1.

I believe these are candidates for -stable as they address issues in the
following commit introduced in the v2.6.30..v2.6.31-rc1 window

11e33f6 page allocator: break up the allocator entry point into fast and slow paths

The patch should not have made functional changes but at least two slipped
by. The first patch wakes kswapd up each time after OOM-killing has been
considered. This can be important to high-order allocations where kswapd
needs to reclaim at a higher order. The second patch corrects a problem
whereby a process that is exiting and making a __GFP_NOFAIL allocation can
ignore watermarks.

These patches in combination should help reduce the number of GFP_ATOMIC
failures in the following bug.

[Bug #14141] order 2 page allocation failures in iwlagn

However, this bug should not yet be closed as there are still problems in
the driver itself that increase the number of GFP_ATOMIC allocations that bug.

The patches should also help the following bugs as well and testing there
would be appreciated.

[Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100

It might also have helped the following bug although that driver has already
been fixed by not making high-order atomic allocations.

[Bug #14016] mm/ipw2200 regression

 mm/page_alloc.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH 1/2] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed
From: Mel Gorman @ 2009-10-16 10:37 UTC (permalink / raw)
  To: Andrew Morton, stable
  Cc: Rafael J. Wysocki, David Miller, Frans Pop, reinette chatre,
	Kalle Valo, John W. Linville, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, netdev,
	linux-kernel, linux-mm@kvack.org", Mel Gorman
In-Reply-To: <1255689446-3858-1-git-send-email-mel@csn.ul.ie>

If a direct reclaim makes no forward progress, it considers whether it
should go OOM or not. Whether OOM is triggered or not, it may retry the
application afterwards. In times past, this would always wake kswapd as well
but currently, kswapd is not woken up after direct reclaim fails. For order-0
allocations, this makes little difference but if there is a heavy mix of
higher-order allocations that direct reclaim is failing for, it might mean
that kswapd is not rewoken for higher orders as much as it did previously.

This patch wakes up kswapd when an allocation is being retried after a direct
reclaim failure. It would be expected that kswapd is already awake, but
this has the effect of telling kswapd to reclaim at the higher order as well.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/page_alloc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bf72055..dfa4362 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1817,9 +1817,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
 		goto nopage;

+restart:
 	wake_all_kswapd(order, zonelist, high_zoneidx);

-restart:
 	/*
 	 * OK, we're below the kswapd watermark and have kicked background
 	 * reclaim. Now things get more complex, so set up alloc_flags according
-- 
1.6.3.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* [PATCH 2/2] page allocator: Direct reclaim should always obey watermarks
From: Mel Gorman @ 2009-10-16 10:37 UTC (permalink / raw)
  To: Andrew Morton, stable
  Cc: Rafael J. Wysocki, David Miller, Frans Pop, reinette chatre,
	Kalle Valo, John W. Linville, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, netdev,
	linux-kernel, linux-mm@kvack.org", Mel Gorman
In-Reply-To: <1255689446-3858-1-git-send-email-mel@csn.ul.ie>

ALLOC_NO_WATERMARKS should be cleared when trying to allocate from the
free-lists after a direct reclaim. If it's not, __GFP_NOFAIL allocations
from a process that is exiting can ignore watermarks. __GFP_NOFAIL is not
often used but the journal layer is one of those places. This is suspected of
causing an increase in the number of GFP_ATOMIC allocation failures reported.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/page_alloc.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dfa4362..a3e5fed 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1860,7 +1860,8 @@ rebalance:
 	page = __alloc_pages_direct_reclaim(gfp_mask, order,
 					zonelist, high_zoneidx,
 					nodemask,
-					alloc_flags, preferred_zone,
+					alloc_flags & ~ALLOC_NO_WATERMARKS,
+					preferred_zone,
 					migratetype, &did_some_progress);
 	if (page)
 		goto got_pg;
-- 
1.6.3.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* Re: TCP_DEFER_ACCEPT is missing counter update
From: Eric Dumazet @ 2009-10-16 10:40 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: Willy Tarreau, David Miller, netdev
In-Reply-To: <Pine.LNX.4.58.0910161137490.3886@u.domain.uli>

Julian Anastasov a écrit :
> 	Hello,
> 
> On Fri, 16 Oct 2009, Willy Tarreau wrote:
> 
>>> 	This will need little change in inet_csk_reqsk_queue_prune()
>>> but it saves SYN-ACK traffic during deferring period in the
>>> common case when client sends ACK. If such compromise is
>>> acceptable I can prepare and test some patch.
>> I would personally like this a lot ! This will satisfy people who
>> expect it to establish at the end of the "TCP_DEFER_ACCEPT delay"
>> as can be interpreted from the man page, will reduce the number of
>> useless SYN-ACKs that annoy other people while still making no
>> visible change for anyone who would rely on the current behaviour.
> 
> 	OK, I don't have much time now, this is what I'm
> going to test later today and later can provide proper comments:
> 
> Signed-off-by: Julian Anastasov <ja@ssi.bg>

I tested both patches and they perform very well, thank you !

For the minimum 1 sec value, tcpdump looks like :
12:32:03.850456 IP 127.0.0.1.20000 > 127.0.0.1.2222: S 1879889239:1879889239(0) win 32792 <mss 16396,nop,nop,timestamp 952803 0,nop,wscale 6>
12:32:03.850463 IP 127.0.0.1.2222 > 127.0.0.1.20000: S 1890330616:1890330616(0) ack 1879889240 win 32768 <mss 16396,nop,nop,timestamp 952803 952803,nop,wscale 6>
12:32:03.850469 IP 127.0.0.1.20000 > 127.0.0.1.2222: . ack 1 win 513 <nop,nop,timestamp 952803 952803>

12:32:06.849989 IP 127.0.0.1.2222 > 127.0.0.1.20000: S 1890330616:1890330616(0) ack 1879889240 win 32768 <mss 16396,nop,nop,timestamp 955803 952803,nop,wscale 6>
12:32:06.849996 IP 127.0.0.1.20000 > 127.0.0.1.2222: . ack 1 win 513 <nop,nop,timestamp 955803 955803>

So listening application gets the accept() 3 seconds after initial SYN


# ss -emoian | grep SYN-RECV
SYN-RECV   0      0                 127.0.0.1:2222             127.0.0.1:20000  timer:(on,24sec,4) ino:0 sk:f6f0ec80

I wonder if tcp_diag should be extented a bit to reflect fact that the ACK was received from client
(ie forward the inet_rsk(req)->acked information to idiag_rqueue)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index cb73fde..c172bd4 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -589,7 +589,7 @@ static int inet_diag_fill_req(struct sk_buff *skb, struct sock *sk,
 	r->id.idiag_src[0] = ireq->loc_addr;
 	r->id.idiag_dst[0] = ireq->rmt_addr;
 	r->idiag_expires = jiffies_to_msecs(tmo);
- 	r->idiag_rqueue = 0;
+	r->idiag_rqueue = ireq->acked;
 	r->idiag_wqueue = 0;
 	r->idiag_uid = sock_i_uid(sk);
 	r->idiag_inode = 0;



-> 
# ss -emoian | grep SYN-RECV
SYN-RECV   1      0                 127.0.0.1:2222             127.0.0.1:20000  timer:(on,24sec,4) ino:0 sk:f6f0ec80


^ permalink raw reply related

* Re: [PATCH 1/2] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed
From: Pekka Enberg @ 2009-10-16 10:52 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, stable, Rafael J. Wysocki, David Miller, Frans Pop,
	reinette chatre, Kalle Valo, John W. Linville,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, netdev,
	linux-kernel, linux-mm@kvack.org"
In-Reply-To: <1255689446-3858-2-git-send-email-mel@csn.ul.ie>

On Fri, Oct 16, 2009 at 1:37 PM, Mel Gorman <mel@csn.ul.ie> wrote:
> If a direct reclaim makes no forward progress, it considers whether it
> should go OOM or not. Whether OOM is triggered or not, it may retry the
> application afterwards. In times past, this would always wake kswapd as well
> but currently, kswapd is not woken up after direct reclaim fails. For order-0
> allocations, this makes little difference but if there is a heavy mix of
> higher-order allocations that direct reclaim is failing for, it might mean
> that kswapd is not rewoken for higher orders as much as it did previously.
>
> This patch wakes up kswapd when an allocation is being retried after a direct
> reclaim failure. It would be expected that kswapd is already awake, but
> this has the effect of telling kswapd to reclaim at the higher order as well.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>

> ---
>  mm/page_alloc.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bf72055..dfa4362 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1817,9 +1817,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>        if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
>                goto nopage;
>
> +restart:
>        wake_all_kswapd(order, zonelist, high_zoneidx);
>
> -restart:
>        /*
>         * OK, we're below the kswapd watermark and have kicked background
>         * reclaim. Now things get more complex, so set up alloc_flags according
> --
> 1.6.3.3
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 2/4 v2] net: Use sk_tx_queue_mapping for connected sockets
From: Krishna Kumar2 @ 2009-10-16 12:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, herbert, netdev
In-Reply-To: <4AD84188.7030804@gmail.com>

Eric Dumazet <eric.dumazet@gmail.com> wrote on 10/16/2009 03:18:56 PM:

> Are we sure that for selection done by skb_tx_hash(dev, skb),
> rx packets will use the same queue/cpu ?
>
> Probably not, since it uses sk->sk_hash (tcp/udp port) :

Yes, I expect the same.

> If NIC has some proprietary hash, and selects rx queue 3 for feeding us
> packets, it would be nice to also use tx queue 3 for transmit.

I had written: "This is not saved if either skb rx was recorded, or
...". It is somewhat misleading, and I will try to correct it. The
idea is that if the same incoming skb is used for response after the
connection is setup, which I think is not possible in this case, then
txq will be same as rxq. See more below...

> We would have to record in sk the rx queue chosen by the device
> when processing SYN / SYN-ACK packet for example for tcp flows.

I wait for tcp_v4_syn_recv_sock to finish and create a connection.
sk_setup_caps/__sk_dst_set are called for the new socket resulting
in sk_dst_cache getting set (I cannot set the cache till the
connection is finished and dst is set). I guess that means that I
will wait for another skb to come and the response will use skb_tx_hash
instead of using the rx recorded.

Will this suffice, but if not, is it something I should handle or an
add-on patch should do instead?

Thanks,

- KK

^ permalink raw reply

* Re: [PATCH 1/4 v2] net: Introduce sk_tx_queue_mapping
From: Krishna Kumar2 @ 2009-10-16 12:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, herbert, netdev
In-Reply-To: <4AD84375.9020400@gmail.com>

Eric Dumazet <eric.dumazet@gmail.com> wrote on 10/16/2009 03:27:09 PM:

>
> > Introduce sk_tx_queue_mapping; and functions that set, test and get
> > this value. Reset sk_tx_queue_mapping to -1 whenever the dst cache
> > is set/reset, and in socket alloc & free (free probably doesn't need
> > it).
> >
>
> Could you please use an u16 instead, and take the convention of 0
> being the 'unitialized value' ?

I wanted to use a signed number to avoid doing an unnecessary sub
on every iteration. I feel u16 is good for incoming path since
skb->queue_mapping is set to 0 by default, and only those drivers
doing rx recording needs to set this value. So queue_mapping reqd
the logic of using !0 to mean rx is set and the subsequent sub on
all retrieves. But on tx path where sk->txq# is saved permanently,
I feel it is not required.

If that sounds reasonable, I can change to a signed short.

> And define sk_tx_queue_clear(sk) instead of sk_record_tx_queue(sk, -1);
>
> I also suggest using following names :
>
> static inline void sk_tx_queue_set(struct sock *sk, u16 tx_queue)
> {
>    sk->sk_tx_queue_mapping = tx_queue + 1;
> }
>
> static inline u16 sk_tx_queue_get(const struct sock *sk)
> {
>    return sk->sk_tx_queue_mapping - 1;
> }
>
> static inline u16 sk_tx_queue_clear(struct sock *sk) // or _reset
> {
>    sk->sk_tx_queue_mapping = 0;
> }
>
> static inline bool sk_tx_queue_recorded(const struct sock *sk)
> {
>    return (sk && sk->sk_tx_queue_mapping > 0);
> }
> >
> > @@ -1016,6 +1019,8 @@ static void sk_prot_free(struct proto *p
> >     struct kmem_cache *slab;
> >     struct module *owner;
> >
> > +   sk_record_tx_queue(sk, -1);
> > +
> >     owner = prot->owner;
> >     slab = prot->slab;
> >
>
> This is not necessary, we are going to kfree(sk) anyway !

Yes, will make these changes, but please let me know your opinion on
"return sk->sk_tx_queue_mapping - 1" vs "return sk->sk_tx_queue_mapping".

Thanks,

- KK

^ permalink raw reply

* Re: [PATCH 7/8] trivial: fix many typos s/untill/until/
From: Jiri Kosina @ 2009-10-16 13:24 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bonding-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	users-IdT//7ivld+3fTS+yoWyeR8Sddau0WhX,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	linux-sctp-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1255638072-6236-1-git-send-email-cascardo-DmMZpsCg3uxeGPcbtGPokg@public.gmane.org>

On Thu, 15 Oct 2009, Thadeu Lima de Souza Cascardo wrote:

> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo-DmMZpsCg3uxeGPcbtGPokg@public.gmane.org>
> ---
>  drivers/infiniband/ulp/iser/iser_verbs.c |    2 +-
>  drivers/net/bonding/bond_alb.c           |    2 +-
>  drivers/net/wireless/rt2x00/rt2800usb.c  |    2 +-
>  drivers/scsi/bnx2i/bnx2i_iscsi.c         |    2 +-
>  drivers/staging/rtl8187se/r8180.h        |    2 +-
>  fs/ext4/balloc.c                         |    2 +-
>  net/bluetooth/bnep/core.c                |    2 +-
>  net/sctp/sm_statefuns.c                  |    2 +-
>  8 files changed, 8 insertions(+), 8 deletions(-)

Merged together with patches 2-8 from your series and applied to trivial 
queue.

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: [PATCH 6/8] trivial: fix typo s/refference/reference/ in comment
From: Jiri Kosina @ 2009-10-16 13:25 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: Marcel Holtmann, David S. Miller, Stephen Hemminger, Wang Chen,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1255637783-6109-1-git-send-email-cascardo-DmMZpsCg3uxeGPcbtGPokg@public.gmane.org>

On Thu, 15 Oct 2009, Thadeu Lima de Souza Cascardo wrote:

> diff --git a/net/bluetooth/bnep/core.c b/net/bluetooth/bnep/core.c
> index cafe9f5..1bd9398 100644
> --- a/net/bluetooth/bnep/core.c
> +++ b/net/bluetooth/bnep/core.c
> @@ -78,7 +78,7 @@ static struct bnep_session *__bnep_get_session(u8 *dst)
>  static void __bnep_link_session(struct bnep_session *s)
>  {
>  	/* It's safe to call __module_get() here because sessions are added
> -	   by the socket layer which has to hold the refference to this module.
> +	   by the socket layer which has to hold the reference to this module.
>  	 */
>  	__module_get(THIS_MODULE);
>  	list_add(&s->list, &bnep_session_list);

Merged together with patches 2-8 from your series and applied to trivial
queue.

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: [PATCH 5/8] trivial: fix typo s/assocate/associate/ in comment
From: Jiri Kosina @ 2009-10-16 13:25 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: Jiri Slaby, Nick Kossifidis, Luis R. Rodriguez, Bob Copeland,
	John W. Linville, linux-wireless, ath5k-devel, netdev,
	linux-kernel
In-Reply-To: <1255637738-6069-1-git-send-email-cascardo@holoscopio.com>

On Thu, 15 Oct 2009, Thadeu Lima de Souza Cascardo wrote:

> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
> ---
>  drivers/net/wireless/ath/ath5k/base.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/ath5k/base.h b/drivers/net/wireless/ath/ath5k/base.h
> index b14ba07..b73e7d3 100644
> --- a/drivers/net/wireless/ath/ath5k/base.h
> +++ b/drivers/net/wireless/ath/ath5k/base.h
> @@ -192,7 +192,7 @@ struct ath5k_softc {
>  	struct ath5k_txq	*cabq;		/* content after beacon */
>  
>  	int 			power_level;	/* Requested tx power in dbm */
> -	bool			assoc;		/* assocate state */
> +	bool			assoc;		/* associate state */
>  	bool			enable_beacon;	/* true if beacons are on */
>  };

Merged together with patches 2-8 from your series and applied to trivial
queue.

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: [PATCH 4/8] trivial: fix some typos and punctuation in comments
From: Jiri Kosina @ 2009-10-16 13:27 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: linville, IvDoorn, johannes, users, linux-kernel, linux-wireless,
	netdev
In-Reply-To: <1255636103-5817-1-git-send-email-cascardo@holoscopio.com>

On Thu, 15 Oct 2009, Thadeu Lima de Souza Cascardo wrote:

> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
> ---
>  drivers/net/wireless/rt2x00/rt61pci.c |   36 ++++++++++++++++----------------
>  1 files changed, 18 insertions(+), 18 deletions(-)

Merged together with patches 2-8 from your series and applied to trivial
queue.

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply

* [PATCH 0/2] net: vbus-enet fixes
From: Gregory Haskins @ 2009-10-16 13:36 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, alacrityvm-devel

The following apply to the linux-next branch for AlacrityVM:

git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git

The series fixes a few problems discovered via community feedback and
lockdep.  Please see the patch headers for more details.

Kind Regards,
-Greg

---

Gregory Haskins (2):
      venet: fix locking issue with dev_kfree_skb()
      net: fix vbus-enet Kconfig dependencies


 drivers/net/Kconfig     |    2 +
 drivers/net/vbus-enet.c |   71 +++++++++++++++++++++++------------------------
 2 files changed, 36 insertions(+), 37 deletions(-)

-- 
Signature

^ permalink raw reply

* [PATCH 1/2] net: fix vbus-enet Kconfig dependencies
From: Gregory Haskins @ 2009-10-16 13:36 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, alacrityvm-devel
In-Reply-To: <20091016133411.13423.36106.stgit@dev.haskins.net>

We currently select VBUS_PROXY when vbus-enet is enabled, which is
the wrong direction.  Not all platforms will define VBUS-PROXY, and
venet depends on its inclusion.  Therefore, lets fix vbus-enet to
properly depend on the presence of VBUS_PROXY to get this right.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 drivers/net/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 47dfa04..c9128ea 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -3233,7 +3233,7 @@ config VIRTIO_NET
 config VBUS_ENET
 	tristate "VBUS Ethernet Driver"
 	default n
-	select VBUS_PROXY
+	depends on VBUS_PROXY
 	help
 	   A virtualized 802.x network device based on the VBUS
 	   "virtual-ethernet" interface.  It can be used with any

^ permalink raw reply related

* [PATCH 2/2] venet: fix locking issue with dev_kfree_skb()
From: Gregory Haskins @ 2009-10-16 13:36 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, alacrityvm-devel
In-Reply-To: <20091016133411.13423.36106.stgit@dev.haskins.net>

We currently hold the priv->lock with interrupts disabled while calling
dev_kfree_skb().  lockdep indicated that this arrangement is problematic
with higher stack components which handle the wmem facility.  It is
probably a bad idea to hold the lock/interrupts over the duration of this
function independent of the lock-conflict issue, so lets rectify this.

This new design switches to a finer-grained model, where we acquire/release
the lock for each packet that we reap from the tx queue.  This adds
theoretical lock acquistion overhead, but gains the ability to release
the skbs without holding a lock and while improving critical section
granularity.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 drivers/net/vbus-enet.c |   71 +++++++++++++++++++++++------------------------
 1 files changed, 35 insertions(+), 36 deletions(-)

diff --git a/drivers/net/vbus-enet.c b/drivers/net/vbus-enet.c
index 9d48674..228c366 100644
--- a/drivers/net/vbus-enet.c
+++ b/drivers/net/vbus-enet.c
@@ -883,7 +883,7 @@ vbus_enet_tx_start(struct sk_buff *skb, struct net_device *dev)
 	priv->dev->stats.tx_packets++;
 	priv->dev->stats.tx_bytes += skb->len;
 
-	__skb_queue_tail(&priv->tx.outstanding, skb);
+	skb_queue_tail(&priv->tx.outstanding, skb);
 
 	/*
 	 * This advances both indexes together implicitly, and then
@@ -914,7 +914,7 @@ vbus_enet_skb_complete(struct vbus_enet_priv *priv, struct sk_buff *skb)
 	PDEBUG(priv->dev, "completed sending %d bytes\n",
 	       skb->len);
 
-	__skb_unlink(skb, &priv->tx.outstanding);
+	skb_unlink(skb, &priv->tx.outstanding);
 	dev_kfree_skb(skb);
 }
 
@@ -923,12 +923,16 @@ vbus_enet_skb_complete(struct vbus_enet_priv *priv, struct sk_buff *skb)
  *
  * assumes priv->lock held
  */
-static void
-vbus_enet_tx_reap(struct vbus_enet_priv *priv)
+static struct sk_buff *
+vbus_enet_tx_reap_one(struct vbus_enet_priv *priv)
 {
+	struct sk_buff *skb = NULL;
 	struct ioq_iterator iter;
+	unsigned long flags;
 	int ret;
 
+	spin_lock_irqsave(&priv->lock, flags);
+
 	/*
 	 * We want to iterate on the head of the valid index, but we
 	 * do not want the iter_pop (below) to flip the ownership, so
@@ -941,29 +945,15 @@ vbus_enet_tx_reap(struct vbus_enet_priv *priv)
 	ret = ioq_iter_seek(&iter, ioq_seek_head, 0, 0);
 	BUG_ON(ret < 0);
 
-	/*
-	 * We are done once we find the first packet either invalid or still
-	 * owned by the south-side
-	 */
-	while (iter.desc->valid && !iter.desc->sown) {
-
-		if (!priv->evq.txc) {
-			struct sk_buff *skb;
+	if (iter.desc->valid && !iter.desc->sown) {
 
-			if (priv->sg) {
-				struct venet_sg *vsg;
-
-				vsg = (struct venet_sg *)iter.desc->cookie;
-				skb = (struct sk_buff *)vsg->cookie;
-			} else
-				skb = (struct sk_buff *)iter.desc->cookie;
+		if (priv->sg) {
+			struct venet_sg *vsg;
 
-			/*
-			 * If TXC is not enabled, we are required to free
-			 * the buffer resources now
-			 */
-			vbus_enet_skb_complete(priv, skb);
-		}
+			vsg = (struct venet_sg *)iter.desc->cookie;
+			skb = (struct sk_buff *)vsg->cookie;
+		} else
+			skb = (struct sk_buff *)iter.desc->cookie;
 
 		/* Reset the descriptor */
 		iter.desc->valid  = 0;
@@ -982,19 +972,35 @@ vbus_enet_tx_reap(struct vbus_enet_priv *priv)
 		PDEBUG(priv->dev, "re-enabling tx queue\n");
 		netif_wake_queue(priv->dev);
 	}
+
+	spin_unlock_irqrestore(&priv->lock, flags);
+
+	return skb;
+}
+
+static void
+vbus_enet_tx_reap(struct vbus_enet_priv *priv)
+{
+	struct sk_buff *skb;
+
+	while ((skb = vbus_enet_tx_reap_one(priv))) {
+		if (!priv->evq.txc)
+			/*
+			 * We are responsible for freeing the packet upon
+			 * reap if TXC is not enabled
+			 */
+			vbus_enet_skb_complete(priv, skb);
+	}
 }
 
 static void
 vbus_enet_timeout(struct net_device *dev)
 {
 	struct vbus_enet_priv *priv = netdev_priv(dev);
-	unsigned long flags;
 
 	dev_dbg(&dev->dev, "Transmit timeout\n");
 
-	spin_lock_irqsave(&priv->lock, flags);
 	vbus_enet_tx_reap(priv);
-	spin_unlock_irqrestore(&priv->lock, flags);
 }
 
 static void
@@ -1014,13 +1020,10 @@ static void
 deferred_tx_isr(unsigned long data)
 {
 	struct vbus_enet_priv *priv = (struct vbus_enet_priv *)data;
-	unsigned long flags;
 
 	PDEBUG(priv->dev, "deferred_tx_isr\n");
 
-	spin_lock_irqsave(&priv->lock, flags);
 	vbus_enet_tx_reap(priv);
-	spin_unlock_irqrestore(&priv->lock, flags);
 
 	ioq_notify_enable(priv->tx.veq.queue, 0);
 }
@@ -1063,14 +1066,10 @@ evq_txc_event(struct vbus_enet_priv *priv,
 {
 	struct venet_event_txc *event =
 		(struct venet_event_txc *)header;
-	unsigned long flags;
-
-	spin_lock_irqsave(&priv->lock, flags);
 
 	vbus_enet_tx_reap(priv);
-	vbus_enet_skb_complete(priv, (struct sk_buff *)event->cookie);
 
-	spin_unlock_irqrestore(&priv->lock, flags);
+	vbus_enet_skb_complete(priv, (struct sk_buff *)event->cookie);
 }
 
 static void


^ permalink raw reply related

* not supported broadcom (?) NIC revision
From: Wojtek Sawasciuk @ 2009-10-16 13:03 UTC (permalink / raw)
  To: netdev; +Cc: mchan, davem, jgarzik

Hi,

I'v got server with built-in 9 NIC ports. Two ports are not recognized 
by any driver, I suppose it *should* be by tg3 or bnx2
Below is lspci info, not supported NIC are in slots 00:10.0 and 00:10.2
strange thing is that this vendor is described as Serverworks, but lspci 
says its Broadcom.

# grep 1166 include/linux/pci_ids.h
#define PCI_VENDOR_ID_SERVERWORKS         0x1166

Can someone help here ?


# lspci
00:00.0 Host bridge: Broadcom GCNB-LE Host Bridge (rev 32)
00:00.1 Host bridge: Broadcom GCNB-LE Host Bridge
00:02.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet 
Controller (rev 05)
00:0f.0 ISA bridge: Broadcom CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: Broadcom CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: Broadcom OSB4/CSB5 OHCI USB Controller (rev 05)
00:0f.3 Host bridge: Broadcom CSB5 LPC bridge

00:10.0 Host bridge: Broadcom CIOB-E I/O Bridge with Gigabit Ethernet 
(rev 12)
00:10.2 Host bridge: Broadcom CIOB-E I/O Bridge with Gigabit Ethernet 
(rev 12)

02:02.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet 
Controller (rev 03)
02:02.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet 
Controller (rev 03)
02:04.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet 
Controller (rev 03)
02:04.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet 
Controller (rev 03)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 
Gigabit Ethernet (rev 02)
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 
Gigabit Ethernet (rev 02)


00:10.0 0600: 1166:0110 (rev 12)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort+ >SERR- <PERR-
        Capabilities: [60] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=512 OST=8
                Status: Dev=00:00.0 64bit+ 133MHz+ SCD- USC- DC=bridge 
DMMRBC=512 DMOST=8 DMCRS=8 RSCEM- 266MHz- 533MHz-

00:10.2 0600: 1166:0110 (rev 12)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort+ >SERR- <PERR-
        Capabilities: [60] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=512 OST=8
                Status: Dev=00:00.0 64bit+ 133MHz+ SCD- USC- DC=bridge 
DMMRBC=512 DMOST=8 DMCRS=8 RSCEM- 266MHz- 533MHz-






^ permalink raw reply

* [PATCH] af_packet: Avoid cache line dirtying
From: Eric Dumazet @ 2009-10-16 14:02 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Netdev List, Luca Deri

While doing multiple captures, I found af_packet was dirtying cache line
containing its prot_hook.

This slow down machines where several cpus are necessary to handle capture
traffic, as each prot_hook is traversed for each packet coming in or out
the host.

This patches moves "struct packet_type prot_hook" to the end of 
packet_sock, and uses a ____cacheline_aligned_in_smp to make sure
this remains shared by all cpus.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index bf3a295..dac775e 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -188,7 +188,6 @@ struct packet_sock {
 	struct packet_ring_buffer	tx_ring;
 	int			copy_thresh;
 #endif
-	struct packet_type	prot_hook;
 	spinlock_t		bind_lock;
 	struct mutex		pg_vec_lock;
 	unsigned int		running:1,	/* prot_hook is attached*/
@@ -204,6 +203,7 @@ struct packet_sock {
 	unsigned int		tp_reserve;
 	unsigned int		tp_loss:1;
 #endif
+	struct packet_type	prot_hook ____cacheline_aligned_in_smp;
 };

 struct packet_skb_cb {

^ permalink raw reply related

* RE: PATCH: Network Device Naming mechanism and policy
From: Narendra_K @ 2009-10-16 14:02 UTC (permalink / raw)
  To: dannf; +Cc: netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose
In-Reply-To: <20091016003245.GD29672@ldl.fc.hp.com>


>On Tue, Oct 13, 2009 at 11:36:38AM -0600, dann frazier wrote:
>> Right - so any reason this couldn't be implemented completely in 
>> userspace by having udev manipulate plain text files under say 
>> /etc/udev/net/?
>> 
>> I do agree that it would be nice for admins/installers to tweak/use 
>> nic names in a similar way to storage names (udev rules), 
>and it might 
>> let us take advantage of a lot of the existing udev code.
>
>Is there interest in this approach?
> - modify udev to manage network devices names as regular (non-device)
>   files (stored in /etc/udev, /dev/netdev, or wherever)

Yes. Would you elaborate little more on "modify udev to manage network
devices as regular files". Does it mean some custom rules which will
generate a regular file under, say, /dev/netdev/ or extend udev itself ?
And how would the regular file look like in terms of holding ifindex of
the interface, which can be passed to libnetdevname.


> - use the existing udev rules to manage symlinks to these files
> - point libnetdevname at these text files for its name resolution
>
>I've started prototyping this, and it certainly looks possible 
>w/o any kernel changes. However, I could probably use some 
>advice from a udev person to do a proper implementation.

With regards,
Narendra K  

^ permalink raw reply

* Re: [PATCH 2/4 v2] net: Use sk_tx_queue_mapping for connected sockets
From: Eric Dumazet @ 2009-10-16 14:07 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: davem, herbert, netdev
In-Reply-To: <OFAE5CC975.C26B600E-ON65257651.00401683-65257651.004721BC@in.ibm.com>

Krishna Kumar2 a écrit :

> I wait for tcp_v4_syn_recv_sock to finish and create a connection.
> sk_setup_caps/__sk_dst_set are called for the new socket resulting
> in sk_dst_cache getting set (I cannot set the cache till the
> connection is finished and dst is set). I guess that means that I
> will wait for another skb to come and the response will use skb_tx_hash
> instead of using the rx recorded.

Well, I see no relation between rx mapping and tx mapping currently.

> 
> Will this suffice, but if not, is it something I should handle or an
> add-on patch should do instead?
> 

Your patch is fine, I was only giving ideas for future patches :)




^ permalink raw reply

* TCP_NODELAY and CORK - should they be added for network fs case?
From: Steve French @ 2009-10-16 14:53 UTC (permalink / raw)
  To: netdev

Looking at calls to kernel_sendmsg, and thinking about why I only see
a few places that do TCP_NODELAY and TCP_CORK in kernel.

Looking at the cifs example.  cifs is trying to send packets which
vary from about 50-100 bytes for common calls (like lookup) to about
56K for file writes (can be larger if override wsize and max buffer
size via insmod parameter), and cifs always uses kernel_sendmsg.   For
the cifs case, sending individual SMB/CIFS requests to a particular
server (socket) are serialized, protected by a mutex, even if many
processes are writing to different remote files at one time.
Usually one kernel_sendmsg is all that is needed to send an SMB
request - does kernel_sendmsg implicitly "cork" the request so that
the SMB is not unnecessarily fragmented?  If the socket is full, and
only a few bytes are sent, multiple sendmsg's may be required to send
one smb - should cifs be doing a cork before the loop which calls
kernel_sendmsg in smb_sendv in fs/cifs/transport.c and uncork
afterward (since the server can't do much processing without getting
the whole SMB request except in one narrow case of receivefile on
certain write requests)?    Especially if we add code to allow setting
"TCP_NODELAY" ... to improve GigE performance

Are there any cases where we should be setting LOWDELAY instead for
this kind of socket?

-- 
Thanks,

Steve

^ permalink raw reply

* Re: PATCH: Network Device Naming mechanism and policy
From: dann frazier @ 2009-10-16 15:20 UTC (permalink / raw)
  To: Narendra_K
  Cc: netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave, Charles_Rose
In-Reply-To: <EDA0A4495861324DA2618B4C45DCB3EE589579@blrx3m08.blr.amer.dell.com>

On Fri, Oct 16, 2009 at 07:32:50PM +0530, Narendra_K@Dell.com wrote:
> 
> >On Tue, Oct 13, 2009 at 11:36:38AM -0600, dann frazier wrote:
> >> Right - so any reason this couldn't be implemented completely in 
> >> userspace by having udev manipulate plain text files under say 
> >> /etc/udev/net/?
> >> 
> >> I do agree that it would be nice for admins/installers to tweak/use 
> >> nic names in a similar way to storage names (udev rules), 
> >and it might 
> >> let us take advantage of a lot of the existing udev code.
> >
> >Is there interest in this approach?
> > - modify udev to manage network devices names as regular (non-device)
> >   files (stored in /etc/udev, /dev/netdev, or wherever)
> 
> Yes. Would you elaborate little more on "modify udev to manage network
> devices as regular files".

Sure. We already get an event when netifs get added/removed - udev
just doesn't create a device file for it. And since all we care about
is the file's name (and the symlinks to it), there's really no point
in creating a real device file anyway.

So, instead of 'mknod /dev/netdev/eth0', why not just 'touch
/dev/netdev/eth0'? A file exists, so we can still maintain
aliases as symlinks, we just don't need to modify the kernel.

> Does it mean some custom rules which will
> generate a regular file under, say, /dev/netdev/ or extend udev
> itself ?

I believe we have to extend udev itself. We could probably do this
completely within udev rules by running programs that do the touching
and symlinking, but it would be nicer and more consistent/familiar to
take advantage of the udev syntax (SYMLINK) to do this
natively. Besides, udev already has the logic to know when/how to
instantiate and unlink symlinks, it would suck to duplicate that.

So, udev would need to be modified to know how to go through the
normal "node" creation for net devices, and to call creat() instead of
mknod().

> And how would the regular file look like in terms of holding ifindex of
> the interface, which can be passed to libnetdevname.

I can't think of anything we need to store in the regular file. If we
have the kernel name for the device, we can look up the ifindex in
/sys. Correct me if I'm wrong, but storing it ourselves seems
redundant.

> 
> 
> > - use the existing udev rules to manage symlinks to these files
> > - point libnetdevname at these text files for its name resolution
> >
> >I've started prototyping this, and it certainly looks possible 
> >w/o any kernel changes. However, I could probably use some 
> >advice from a udev person to do a proper implementation.
> 
> With regards,
> Narendra K  

-- 
dann frazier

^ permalink raw reply

* Re: PATCH: Network Device Naming mechanism and policy
From: Ben Hutchings @ 2009-10-16 15:33 UTC (permalink / raw)
  To: dann frazier
  Cc: Narendra_K, netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave,
	Charles_Rose
In-Reply-To: <20091016152053.GA9862@ldl.fc.hp.com>

On Fri, 2009-10-16 at 09:20 -0600, dann frazier wrote:
> On Fri, Oct 16, 2009 at 07:32:50PM +0530, Narendra_K@Dell.com wrote:
[...]
> > And how would the regular file look like in terms of holding ifindex of
> > the interface, which can be passed to libnetdevname.
> 
> I can't think of anything we need to store in the regular file. If we
> have the kernel name for the device, we can look up the ifindex in
> /sys. Correct me if I'm wrong, but storing it ourselves seems
> redundant.

But the name of a netdev can change whereas its ifindex never does.
Identifying netdevs by name would require additional work to update the
links when a netdev is renamed and would still be prone to race
conditions.  This is why Narendra and Matt were proposing to store the
ifindex in the node all along...

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: PATCH: Network Device Naming mechanism and policy
From: dann frazier @ 2009-10-16 15:41 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Narendra_K, netdev, linux-hotplug, Matt_Domsch, Jordan_Hargrave,
	Charles_Rose
In-Reply-To: <1255707193.2869.12.camel@achroite>

On Fri, Oct 16, 2009 at 04:33:13PM +0100, Ben Hutchings wrote:
> On Fri, 2009-10-16 at 09:20 -0600, dann frazier wrote:
> > On Fri, Oct 16, 2009 at 07:32:50PM +0530, Narendra_K@Dell.com wrote:
> [...]
> > > And how would the regular file look like in terms of holding ifindex of
> > > the interface, which can be passed to libnetdevname.
> > 
> > I can't think of anything we need to store in the regular file. If we
> > have the kernel name for the device, we can look up the ifindex in
> > /sys. Correct me if I'm wrong, but storing it ourselves seems
> > redundant.
> 
> But the name of a netdev can change whereas its ifindex never does.
> Identifying netdevs by name would require additional work to update the
> links when a netdev is renamed and would still be prone to race
> conditions.  This is why Narendra and Matt were proposing to store the
> ifindex in the node all along...

ah, yes - I see that now - the ability to rename an interface is what
prevents this from working. Thanks for the explanation.
-- 
dann frazier


^ permalink raw reply

* Re: [PATCH 2/4 v2] net: Use sk_tx_queue_mapping for connected sockets
From: Krishna Kumar2 @ 2009-10-16 15:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, herbert, netdev
In-Reply-To: <4AD87E22.9040500@gmail.com>

Hi Eric,

Eric Dumazet <eric.dumazet@gmail.com> wrote on 10/16/2009 07:37:30 PM:

> > I wait for tcp_v4_syn_recv_sock to finish and create a connection.
> > sk_setup_caps/__sk_dst_set are called for the new socket resulting
> > in sk_dst_cache getting set (I cannot set the cache till the
> > connection is finished and dst is set). I guess that means that I
> > will wait for another skb to come and the response will use skb_tx_hash
> > instead of using the rx recorded.
>
> Well, I see no relation between rx mapping and tx mapping currently.

You are suggesting an enhancement to assign the txq# at ack processing
using the skb->rx#. As you said it will help cards that are hashing the
rx# internally. I will try that separately after this patch.

> > Will this suffice, but if not, is it something I should handle or an
> > add-on patch should do instead?
> >
>
> Your patch is fine, I was only giving ideas for future patches :)

Thanks for the suggestions, I just wanted to make sure.

- KK


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox