Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [net-next-2.6 PATCH] net: netif_set_real_num_rx_queues may cap num_rx_queues at init time
From: Matt Carlson @ 2010-10-06 18:14 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Eric Dumazet, Matthew Carlson, John Fastabend,
	netdev@vger.kernel.org, therbert@google.com
In-Reply-To: <1286379105.2371.15.camel@achroite.uk.solarflarecom.com>

On Wed, Oct 06, 2010 at 08:31:45AM -0700, Ben Hutchings wrote:
> On Wed, 2010-10-06 at 17:23 +0200, Eric Dumazet wrote:
> > Le mercredi 06 octobre 2010 ?? 16:07 +0100, Ben Hutchings a ??crit :
> > 
> > > The waste of memory is minimal now that we only allocate kobjects for
> > > real_num_rx_queues.
> > 
> > Thats strange, here with tg3 (and mono queue device) :
> > 
> > 0a:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5715S
> > Gigabit Ethernet (rev a3)
> > 
> > grep . /sys/class/net/eth2/queues/rx-*/rps_cpus
> > /sys/class/net/eth2/queues/rx-0/rps_cpus:00000000
> > /sys/class/net/eth2/queues/rx-1/rps_cpus:00000000
> > /sys/class/net/eth2/queues/rx-2/rps_cpus:00000000
> > /sys/class/net/eth2/queues/rx-3/rps_cpus:00000000
> > /sys/class/net/eth2/queues/rx-4/rps_cpus:00000000
> 
> It looks like I missed a necessary call to
> netif_set_real_num_rx_queues() in tg3.  I suggest that Matt should check
> and correct this since I got the numbers wrong last time.

Yes.  We were missing a call to this function in the legacy case.


[PATCH net-next] tg3: Set real_num_rx_queues for non-multiq devs

Commit 2ddaad397c47de012dfb956b0c05540da1a0dde5 entitled "tg3: Use
netif_set_real_num_{rx,tx}_queues()" added a new call to
netif_set_real_num_rx_queues in tg3_enable_msix().  This call also needs
to be added to the legacy path to correctly reflect the actual number of
rx queues.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
---
 drivers/net/tg3.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 16e1a95..e5b9ec5 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -8906,6 +8906,7 @@ defcfg:
 		tp->irq_cnt = 1;
 		tp->napi[0].irq_vec = tp->pdev->irq;
 		netif_set_real_num_tx_queues(tp->dev, 1);
+		netif_set_real_num_rx_queues(tp->dev, 1);
 	}
 }
 
-- 
1.7.2.2



^ permalink raw reply related

* Re: [PATCH] SIW: iWARP Protocol headers
From: David Dillow @ 2010-10-06 18:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bernard Metzler, Bart Van Assche,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20101006172518.GI24268-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Wed, 2010-10-06 at 11:25 -0600, Jason Gunthorpe wrote:
> On Wed, Oct 06, 2010 at 02:52:46PM +0200, Bernard Metzler wrote:
> It is actually a little more complicated than just this. I assume you
> are casting the structures over packet payloads? In this case you
> have to guarentee alignment (or used packed everywhere). Does iwarp
> have provisions for alignment? If so you can construct your bitfields
> using the alignment type, ie if iWarp guarantees 4 bytes then the
> biggest type you can use is u32 - then you can avoid using packed.
> 
> Mind you, I'm not sure how to guarentee alignment when you consider
> all the possible sources of unalignment in the stack: TCP, IP, L2 stack ?

You don't have to. The TCP stack, IIRC, requires the architecture to fix
up misaligned accesses, or at least it used to. It will try to keep
things aligned if possible -- see the comment above NET_IP_ALIGN in
include/linux/skbuff.h.

The structures in Bernard's header have all fields naturally aligned, so
no need for the packed attribute and its baggage.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] SIW: iWARP Protocol headers
From: Jason Gunthorpe @ 2010-10-06 18:37 UTC (permalink / raw)
  To: David Dillow
  Cc: Bernard Metzler, Bart Van Assche, linux-rdma, linux-rdma-owner,
	netdev
In-Reply-To: <1286389375.26136.24.camel@lap75545.ornl.gov>

On Wed, Oct 06, 2010 at 02:22:55PM -0400, David Dillow wrote:
> On Wed, 2010-10-06 at 11:25 -0600, Jason Gunthorpe wrote:
> > On Wed, Oct 06, 2010 at 02:52:46PM +0200, Bernard Metzler wrote:
> > It is actually a little more complicated than just this. I assume you
> > are casting the structures over packet payloads? In this case you
> > have to guarentee alignment (or used packed everywhere). Does iwarp
> > have provisions for alignment? If so you can construct your bitfields
> > using the alignment type, ie if iWarp guarantees 4 bytes then the
> > biggest type you can use is u32 - then you can avoid using packed.
> > 
> > Mind you, I'm not sure how to guarentee alignment when you consider
> > all the possible sources of unalignment in the stack: TCP, IP, L2 stack ?
> 
> You don't have to. The TCP stack, IIRC, requires the architecture to fix
> up misaligned accesses, or at least it used to. It will try to keep
> things aligned if possible -- see the comment above NET_IP_ALIGN in
> include/linux/skbuff.h.

I was under the impression that this was just to align the IP
header. There are quite a variety of options and additional stuff that
can be inserted at each layer, VLAN for ethernet, IP option/extension
headers, TCP option headers, etc, etc. I seem to remember that because
of all this there are cases where processing unaligned payloads is
unavoidable? Hence my question if iWarp has restrictions (particularly
on TCP) that might guarentee alignment.

> The structures in Bernard's header have all fields naturally
> aligned, so no need for the packed attribute and its baggage.

No, they arent:

> +struct iwarp_rdma_write {
> +	struct iwarp_ctrl	ctrl;
> +	__u32			sink_stag;
> +	__u64			sink_to;
> +} __attribute__((__packed__));

For instance has sink_to unaligned if the maximum aligment of the
payload is only guarenteed to be 4. The proper thing to do in this
case is probably __attribute__((__packed__, __aligned__(4))) if that
works the way I think.. :)

Jason

^ permalink raw reply

* Re: [net-next-2.6 PATCH] net: netif_set_real_num_rx_queues may cap num_rx_queues at init time
From: Eric Dumazet @ 2010-10-06 18:49 UTC (permalink / raw)
  To: Matt Carlson
  Cc: Ben Hutchings, John Fastabend, netdev@vger.kernel.org,
	therbert@google.com
In-Reply-To: <20101006181458.GA6817@mcarlson.broadcom.com>

Le mercredi 06 octobre 2010 à 11:14 -0700, Matt Carlson a écrit :

> Yes.  We were missing a call to this function in the legacy case.
> 
> 
> [PATCH net-next] tg3: Set real_num_rx_queues for non-multiq devs
> 
> Commit 2ddaad397c47de012dfb956b0c05540da1a0dde5 entitled "tg3: Use
> netif_set_real_num_{rx,tx}_queues()" added a new call to
> netif_set_real_num_rx_queues in tg3_enable_msix().  This call also needs
> to be added to the legacy path to correctly reflect the actual number of
> rx queues.
> 
> Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
> ---
>  drivers/net/tg3.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index 16e1a95..e5b9ec5 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -8906,6 +8906,7 @@ defcfg:
>  		tp->irq_cnt = 1;
>  		tp->napi[0].irq_vec = tp->pdev->irq;
>  		netif_set_real_num_tx_queues(tp->dev, 1);
> +		netif_set_real_num_rx_queues(tp->dev, 1);
>  	}
>  }
>  

Thanks, this fixed the thing for me, once device is UP

# insmod drivers/net/tg3.ko
# grep . /sys/class/net/eth3/queues/*/rps_cpus
/sys/class/net/eth3/queues/rx-0/rps_cpus:00000000
/sys/class/net/eth3/queues/rx-1/rps_cpus:00000000
/sys/class/net/eth3/queues/rx-2/rps_cpus:00000000
/sys/class/net/eth3/queues/rx-3/rps_cpus:00000000
/sys/class/net/eth3/queues/rx-4/rps_cpus:00000000
# echo ff >/sys/class/net/eth3/queues/rx-2/rps_cpus
# grep . /sys/class/net/eth3/queues/*/rps_cpus
/sys/class/net/eth3/queues/rx-0/rps_cpus:00000000
/sys/class/net/eth3/queues/rx-1/rps_cpus:00000000
/sys/class/net/eth3/queues/rx-2/rps_cpus:000000ff
/sys/class/net/eth3/queues/rx-3/rps_cpus:00000000
/sys/class/net/eth3/queues/rx-4/rps_cpus:00000000
# ifconfig eth3 up
# grep . /sys/class/net/eth3/queues/*/rps_cpus
00000000

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

^ permalink raw reply

* Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Michael S. Tsirkin @ 2010-10-06 19:03 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: anthony, arnd, avi, davem, kvm, netdev, rusty, herbert
In-Reply-To: <OF806283F6.38268772-ON652577B4.005E805A-652577B4.00611B1C@in.ibm.com>

On Wed, Oct 06, 2010 at 11:13:31PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 10/05/2010 11:53:23 PM:
> 
> > > > Any idea where does this come from?
> > > > Do you see more TX interrupts? RX interrupts? Exits?
> > > > Do interrupts bounce more between guest CPUs?
> > > > 4. Identify reasons for single netperf BW regression.
> > >
> > > After testing various combinations of #txqs, #vhosts, #netperf
> > > sessions, I think the drop for 1 stream is due to TX and RX for
> > > a flow being processed on different cpus.
> >
> > Right. Can we fix it?
> 
> I am not sure how to. My initial patch had one thread but gave
> small gains and ran into limitations once number of sessions
> became large.

Sure. We will need multiple RX queues, and have a single
thread handle a TX and RX pair. Then we need to make sure packets
from a given flow on TX land on the same thread on RX.
As flows can be hashed differently, for this to work we'll have to
expose this info in host/guest interface.
But since multiqueue implies host/guest ABI changes anyway,
this point is moot.

BTW, an interesting approach could be using bonding
and multiple virtio-net interfaces.
What are the disadvantages of such a setup?  One advantage
is it can be made to work in existing guests.

> > >  I did two more tests:
> > >     1. Pin vhosts to same CPU:
> > >         - BW drop is much lower for 1 stream case (- 5 to -8% range)
> > >         - But performance is not so high for more sessions.
> > >     2. Changed vhost to be single threaded:
> > >           - No degradation for 1 session, and improvement for upto
> > >          8, sometimes 16 streams (5-12%).
> > >           - BW degrades after that, all the way till 128 netperf
> sessions.
> > >           - But overall CPU utilization improves.
> > >             Summary of the entire run (for 1-128 sessions):
> > >                 txq=4:  BW: (-2.3)      CPU: (-16.5)    RCPU: (-5.3)
> > >                 txq=16: BW: (-1.9)      CPU: (-24.9)    RCPU: (-9.6)
> > >
> > > I don't see any reasons mentioned above.  However, for higher
> > > number of netperf sessions, I see a big increase in retransmissions:
> >
> > Hmm, ok, and do you see any errors?
> 
> I haven't seen any in any statistics, messages, etc.

Herbert, could you help out debugging this increase in retransmissions
please?  Older mail on netdev in this thread has some numbers that seem
to imply that we start hitting retransmissions much more as # of flows
goes up.

> Also no
> retranmissions for txq=1.

While it's nice that we have this parameter, the need to choose between
single stream and multi stream performance when you start the vm makes
this patch much less interesting IMHO.

-- 
MST

^ permalink raw reply

* Re: [PATCH] SIW: iWARP Protocol headers
From: David Dillow @ 2010-10-06 19:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bernard Metzler, Bart Van Assche,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20101006183734.GK24268-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Wed, 2010-10-06 at 12:37 -0600, Jason Gunthorpe wrote:
> On Wed, Oct 06, 2010 at 02:22:55PM -0400, David Dillow wrote:
> > On Wed, 2010-10-06 at 11:25 -0600, Jason Gunthorpe wrote:
> > > On Wed, Oct 06, 2010 at 02:52:46PM +0200, Bernard Metzler wrote:
> > > It is actually a little more complicated than just this. I assume you
> > > are casting the structures over packet payloads? In this case you
> > > have to guarentee alignment (or used packed everywhere). Does iwarp
> > > have provisions for alignment? If so you can construct your bitfields
> > > using the alignment type, ie if iWarp guarantees 4 bytes then the
> > > biggest type you can use is u32 - then you can avoid using packed.
> > > 
> > > Mind you, I'm not sure how to guarentee alignment when you consider
> > > all the possible sources of unalignment in the stack: TCP, IP, L2 stack ?
> > 
> > You don't have to. The TCP stack, IIRC, requires the architecture to fix
> > up misaligned accesses, or at least it used to. It will try to keep
> > things aligned if possible -- see the comment above NET_IP_ALIGN in
> > include/linux/skbuff.h.
> 
> I was under the impression that this was just to align the IP
> header.

It is, and it assumes an ethernet header of 14 bytes. As you note, VLANs
and everything else can mess with that, which is why the TCP stack
requires the architecture to handle unaligned accesses. It's a bonus if
iWarp can guarantee an alignment WRT the IP or TCP header, but not
required.

> > The structures in Bernard's header have all fields naturally
> > aligned, so no need for the packed attribute and its baggage.
> 
> No, they arent:
> 
> > +struct iwarp_rdma_write {
> > +	struct iwarp_ctrl	ctrl;
> > +	__u32			sink_stag;
> > +	__u64			sink_to;
> > +} __attribute__((__packed__));
> 
> For instance has sink_to unaligned if the maximum aligment of the
> payload is only guarenteed to be 4. The proper thing to do in this
> case is probably __attribute__((__packed__, __aligned__(4))) if that
> works the way I think.. :)

struct iwarp_ctrl consists of 2 u16s, so sink_stag is aligned to 4 bytes
from the start of the struct iwarp_rdma_write, and sink_to is at 8 bytes
-- so they are naturally aligned with respect to the structure. The
compiler will not add padding, so no need for the packed attribute.

Note I'm not talking about alignment in memory -- that'll depend on
where the data starts, and as I mentioned, the stack doesn't make any
guarantees about that, relying on the architecture to fixup any
misaligned accesses.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH wireless-2.6 or stable] iwlwifi: return error when fail to start scanning
From: John W. Linville @ 2010-10-06 20:01 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Florian Mickler, stable-DgEjT+Ai2ygdnm+yROfE0A,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	wey-yi.w.guy-ral2JQCrhuEAvxtiuMwx3w,
	reinette.chatre-ral2JQCrhuEAvxtiuMwx3w,
	ilw-VuQAYsv1563Yd54FQh9/CA, johannes.berg-ral2JQCrhuEAvxtiuMwx3w,
	ben.m.cahill-ral2JQCrhuEAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20101006160434.GB24581-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Wed, Oct 06, 2010 at 06:04:35PM +0200, Stanislaw Gruszka wrote:
> This is stable friendly backport of wireless-testing commit
> 3eecce527c7434dfd16517ce08b49632c8a26717 "iwlwifi: unify scan start
> checks". Wireless-testing version include also a lot of cleanups.
> 
> On error case in {iwl3945,iwlagn}_request_scan we queue abort_scan work,
> which in a fact do nothing, so we never complete the scan. Thats gives
> "WARNING: at net/wireless/core.c:614 wdev_cleanup_work" and stopped
> network connection until reload iwlwifi modules. Return of -EIO, and
> stopping queuing any work is a fix.
> 
> Note there are still many cases when we can get: 
> 
> WARNING: at net/wireless/core.c:614 wdev_cleanup_work  
> WARNING: at net/mac80211/scan.c:266 ieee80211_scan_completed
> WARNING: at net/mac80211/scan.c:269 ieee80211_scan_complete
> 
> which are not fixed. Unfortunately does not exist single, small fix
> for that problems, and we probably will see some more bug reports
> with scan warnings. As solution we rewrite iwl-scan.c code to avoid
> all possible race conditions. That quite big patch set is queued
> for 2.6.37 .
> 
> Signed-off-by: Stanislaw Gruszka <sgruszka-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
> Patch is intended for wireless-2.6, or in case when it does not
> make 2.6.36 release, for -stable 2.6.36.y 

I'm fine with this patch for -stable, but I'm not sure it is worthwhile
to rush it into 2.6.36.  The problem it fixes is intermittent and
not always very severe, and we've already been living with it for
4 releases.

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org			might be all we have.  Be ready.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [GIT PULL net-next-2.6] vhost-net patchset for 2.6.37
From: David Miller @ 2010-10-06 20:11 UTC (permalink / raw)
  To: mst; +Cc: kvm, virtualization, netdev, linux-kernel, jasowang
In-Reply-To: <20101005182732.GA25796@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 5 Oct 2010 20:27:32 +0200

> It looks like it was a quiet cycle for vhost-net:
> probably because most of energy was spent on bugfixes
> that went in for 2.6.36.
> People are working on multiqueue, tracing but I'm not
> sure it'll get done in time for 2.6.37 - so here's
> a tree with a single patch that helps windows guests
> which we definitely want in the next kernel.
> 
> Please merge for 2.6.37.

Pulled, thanks Michael.

^ permalink raw reply

* Re: drivers/net/Makefile - question about link order comment introduced in 2.3.43
From: David Miller @ 2010-10-06 20:15 UTC (permalink / raw)
  To: joe; +Cc: netdev, sam
In-Reply-To: <1286363049.2156.113.camel@Joe-Laptop>

From: Joe Perches <joe@perches.com>
Date: Wed, 06 Oct 2010 04:04:09 -0700

> It wasn't that way in 2.3.43-pre7 , but it is in 2.3.43
> and has remained there since.  There's nothing in lkml
> history as far as I can find to describe a reason for it.
> 
> http://lkml.org/lkml/2000/2/10
> (or thereabout)
> 
> Anyone know why it's like that or if that Makefile link
> ordering is still needed?

I suspect it's not needed any more.

I also suspect it's an obsolete sparc specific hack to get the primary
ethernet interface to always show up as eth0.

^ permalink raw reply

* Segmentation fault when using ping6
From: Rogerio Pimentel @ 2010-10-06 20:27 UTC (permalink / raw)
  To: netdev, linux-arm-kernel

Hi,

I'm having a "segmentation fault" problem when using ping6.

When testing on ARM9 platforms (i.MX25 and i.MX27), it returns the error:

root@freescale ~$ ping6 ::1
PING ::1 (::1): 56 data bytes
Segmentation fault

When testing on ARM11 and ARM Cortex A8 platforms (i.MX31 and i.MX51), 
it works:

root@freescale ~$ ping6 ::1
PING ::1 (::1): 56 data bytes
64 bytes from ::1: seq=0 ttl=64 time=0.331 ms
64 bytes from ::1: seq=1 ttl=64 time=0.214 ms
64 bytes from ::1: seq=2 ttl=64 time=0.176 ms
64 bytes from ::1: seq=3 ttl=64 time=0.155 ms

On all cases, I'm using Linux Kernel 2.6.35 (from Mainline) and Busybox 
1.15.0

Does anybody have any idea?

Thanks in advance,

Rogerio Pimentel

^ permalink raw reply

* Re: [net-next-2.6 PATCH] net: netif_set_real_num_rx_queues may cap num_rx_queues at init time
From: David Miller @ 2010-10-06 20:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: mcarlson, bhutchings, john.r.fastabend, netdev, therbert
In-Reply-To: <1286390945.9417.46.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 06 Oct 2010 20:49:05 +0200

> Le mercredi 06 octobre 2010 à 11:14 -0700, Matt Carlson a écrit :
> 
>> Yes.  We were missing a call to this function in the legacy case.
>> 
>> 
>> [PATCH net-next] tg3: Set real_num_rx_queues for non-multiq devs
>> 
>> Commit 2ddaad397c47de012dfb956b0c05540da1a0dde5 entitled "tg3: Use
>> netif_set_real_num_{rx,tx}_queues()" added a new call to
>> netif_set_real_num_rx_queues in tg3_enable_msix().  This call also needs
>> to be added to the legacy path to correctly reflect the actual number of
>> rx queues.
>> 
>> Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
...
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks everyone.

^ permalink raw reply

* Re: [PATCH net-next 00/19] bnx2x, cnic, bnx2i: Supporting 57712 device
From: David Miller @ 2010-10-06 21:11 UTC (permalink / raw)
  To: dmitry; +Cc: netdev, eilong, mchan
In-Reply-To: <1286370813.30552.7.camel@lb-tlvb-dmitry>

From: "Dmitry Kravkov" <dmitry@broadcom.com>
Date: Wed, 6 Oct 2010 15:13:33 +0200

> 01 cnic: Pass cp pointer to BNX2X_HW_CID.
> 02 cnic: Use pfid for internal memory offsets.
> 03 cnic: Fine-tune ring init code.
> 04 bnx2x: create folder for bnx2x firmware files
> 05 bnx2x: add 6.0.34 fw files
> 06 bnx2x, cnic, bnx2i: use new FW/HSI
> 07 bnx2x: remove old FW files 
> 08 bnx2x: rename MF related fields
> 09 bnx2x: change type of spq_left to atomic
> 10 bnx2x: Add 57712 support
> 11 bnx2x: remove unused parameter in reuse_rx_skb()
> 12 bnx2x: remove unused fields in main driver structure
> 13 bnx2x: use proper constants for dma_unmap* calls
> 14 bnx2x: use L1_CACHE_BYTES instead of magic number
> 15 bnx2x: move msix table initialization to probe()
> 16 bnx2x, cnic: Fix SPQ return credit
> 17 bnx2x: code beautify
> 18 bnx2x: properly initialize FW stats
> 19 bnx2x: update version to 1.60.00-1
> 
> Please consider applying to net-next.

All applied, thanks.

^ permalink raw reply

* Re: [PATCH] net/fec: fix link/queue interaction, wake queue iff link is up
From: David Miller @ 2010-10-06 21:15 UTC (permalink / raw)
  To: oskar; +Cc: netdev, dan, bigeasy, hjk
In-Reply-To: <1286364718-20070-1-git-send-email-oskar@linutronix.de>

From: Oskar Schirmer <oskar@linutronix.de>
Date: Wed,  6 Oct 2010 13:31:58 +0200

> with hardware slow in negotiation, the system did freeze
> while trying to mount root on nfs at boot time.
> 
> while the driver did report being busy when the link is down
> or no transmission buffers are available, it did not stop the
> queue, causing instant retries. furthermore, transmission being
> triggered with link down was caused by unconditional queue
> wakes, especially on timeouts.
> 
> now, wake queue only iff link is up and transmission
> buffers are available, and dont forget to wake queue
> when link has been adjusted. next, add stop queue notification
> upon driver induced transmission problems, so network stack has
> a chance to handle the situation.
> 
> Signed-off-by: Oskar Schirmer <oskar@linutronix.de>

You should never have to do this.

As long as netif_carrier_on() and netif_carrier_off() are invoked
properly by the driver, the core networking will never send you
packets when netif_carrier_ok() is false.

I am not applying this patch, drivers should not manage tx queue
state using link status as a condition, that's the job of the
core networking.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 2/3] bonding: add Speed/Duplex information to /proc/net/bonding/bond
From: David Miller @ 2010-10-06 21:24 UTC (permalink / raw)
  To: ole; +Cc: fubar, bonding-devel, netdev
In-Reply-To: <4ca4b878.KGc4gLtXHLSFlAdV%ole@ans.pl>

From: Krzysztof Piotr Oledzki <ole@ans.pl>
Date: Thu, 30 Sep 2010 18:19:04 +0200

> Subject: bonding: add Speed/Duplex information to /proc/net/bonding/bond*
> 
> Effect:
>  Slave Interface: eth5
>  MII Status: up
>  Speed: 10000 Mbps
>  Duplex: full
>  Link Failure Count: 0
>  Permanent HW addr: XX:XX:XX:XX:XX:XX
>  Slave queue ID: 0
> 
> Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>

Changing the layout of a procfs file is pretty much always unsafe.

Some useland program somewhere depends upon the current layout, and
therefore very likely could break if we change the contents.

We really can't apply a patch like this.  A most portable and
extensible interface (netlink, ethtool) should be used to provide this
information to userspace.

^ permalink raw reply

* Re: Regarding to your linux kernel CL
From: Chung-Yih Wang (王崇懿) @ 2010-10-06 21:23 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, timo.teras, netdev
In-Reply-To: <20101006.011448.112578977.davem@davemloft.net>

Do you mean if we should move the genid/cookie logic into sk_dst_check
and remove the dst->ops->check()? If so, I dont think it will be a
good idea to remove 'dst->ops->check' since it is per dst entry's
check not just for ipv4 entry. At least, the fact of obsolete=2
reveals that the entry is freed and is ready for being released. That
will be weird if we still use the obsolete entry for routing the
socket's traffic(for sure, that's why the packets of the socket goes
nowhere but dropped).

On Wed, Oct 6, 2010 at 1:14 AM, David Miller <davem@davemloft.net> wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Wed, 6 Oct 2010 15:59:46 +0800
>
>> Perhaps we should move the genid/cookie logic into the dst so that
>> we can eliminate the dst->check call or at least make it a lot less
>> frequent.
>
> That sounds like a good idea.
>

^ permalink raw reply

* Re: [net-next-2.6 PATCH 1/3] bonding: print information about speed and duplex seen by the driver
From: David Miller @ 2010-10-06 21:28 UTC (permalink / raw)
  To: ole; +Cc: fubar, bonding-devel, netdev
In-Reply-To: <4ca4b873.iZ0Dea7Y1OFcVmf8%ole@ans.pl>

From: Krzysztof Piotr Oledzki <ole@ans.pl>
Date: Thu, 30 Sep 2010 18:18:59 +0200

>>From d4fb10933bb88f2bf410b0abe42f151c13e3df85 Mon Sep 17 00:00:00 2001
> From: Krzysztof Piotr Oledzki <ole@ans.pl>
> Date: Thu, 30 Sep 2010 17:05:19 +0200
> Subject: bonding: print information about speed and duplex seen by the driver
> 
> before:
>  bonding: bond0: link status definitely up for interface eth5
>  bonding: bond0: link status definitely up for interface eth0
> 
> after:
>  bonding: bond0: link status definitely up for interface eth5, 100 Mbps full duplex.
>  bonding: bond0: link status definitely up for interface eth0, 100 Mbps full duplex.
> 
> 
> Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 3/3] bonding: reread information about speed and duplex when interface goes up
From: David Miller @ 2010-10-06 21:28 UTC (permalink / raw)
  To: ole; +Cc: fubar, bonding-devel, netdev
In-Reply-To: <4ca4b87e.NzbQ35P2qRrA2iuz%ole@ans.pl>

From: Krzysztof Piotr Oledzki <ole@ans.pl>
Date: Thu, 30 Sep 2010 18:19:10 +0200

>>From 43285224a785e90c7d4cff2be0766ca8df6ddfb9 Mon Sep 17 00:00:00 2001
> From: Krzysztof Piotr Oledzki <ole@ans.pl>
> Date: Thu, 30 Sep 2010 17:09:02 +0200
> Subject: bonding: reread information about speed and duplex when interface goes up
> 
> When an interface was enslaved when it was down, bonding thinks
> it has speed -1 even after it goes up. This leads into selecting
> a wrong active interface in active/backup mode on mixed 10G/1G or
> 1G/100M environment.
> 
> before:
>  bonding: bond0: link status definitely up for interface eth5, 100 Mbps full duplex.
>  bonding: bond0: link status definitely up for interface eth0, 100 Mbps full duplex.
> 
> after:
>  bonding: bond0: link status definitely up for interface eth5, 10000 Mbps full duplex.
>  bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex.
> 
> Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 2/3] bonding: add Speed/Duplex information to /proc/net/bonding/bond
From: Krzysztof Olędzki @ 2010-10-06 21:32 UTC (permalink / raw)
  To: David Miller; +Cc: fubar, bonding-devel, netdev
In-Reply-To: <20101006.142424.246555203.davem@davemloft.net>

On 2010-10-06 23:24, David Miller wrote:
> From: Krzysztof Piotr Oledzki <ole@ans.pl>
> Date: Thu, 30 Sep 2010 18:19:04 +0200
>
>> Subject: bonding: add Speed/Duplex information to /proc/net/bonding/bond*
>>
>> Effect:
>>   Slave Interface: eth5
>>   MII Status: up
>>   Speed: 10000 Mbps
>>   Duplex: full
>>   Link Failure Count: 0
>>   Permanent HW addr: XX:XX:XX:XX:XX:XX
>>   Slave queue ID: 0
>>
>> Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
>
> Changing the layout of a procfs file is pretty much always unsafe.
>
> Some useland program somewhere depends upon the current layout, and
> therefore very likely could break if we change the contents.
>
> We really can't apply a patch like this.  A most portable and
> extensible interface (netlink, ethtool) should be used to provide this
> information to userspace.

Sure, no problem.

Please note however that this procfs file was recently changed ("Slave 
queue ID" was added) so if this is indeed a problem than we would 
already heard about this.

Best regards,

			Krzysztof Olędzki

^ permalink raw reply

* Re: [PATCHv4 net-next-2.6 4/5] XFRM,IPv6: Add IRO remapping hook in xfrm_input()
From: Arnaud Ebalard @ 2010-10-06 21:42 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, Eric Dumazet, Hideaki YOSHIFUJI, netdev
In-Reply-To: <20101006012509.GA3540@gondor.apana.org.au>

Hi Herbert,

Herbert Xu <herbert@gondor.apana.org.au> writes:

> On Wed, Oct 06, 2010 at 01:28:42AM +0200, Arnaud Ebalard wrote:
>>
>>  1) First, current net-next-2.6 (no patches applied)
>>  2) then, with my patches applied and CONFIG_XFRM_SUB_POLICY enabled
>>  3) then, with my patches applied and CONFIG_XFRM_SUB_POLICY disabled
>
> To measure the effect of this properly you should use null
> encryption/hashing and look at the CPU utilisation with minimum
> packet sizes.

I did that w/ the following kernels on the receiver (same as before):

 - kernel 1: w/o patches (net-next-2.6)
 - kernel 2: patches applied w/ CONFIG_XFRM_SUB_POLICY
 - kernel 3: patches applied w/o CONFIG_XFRM_SUB_POLICY

5 runs of sending 1073741824 bytes over TCP protected using ESP (null
enc and null auth) in transport mode, forcing small packets:

dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x <v4orv6dst> 1234 

5 runs for each kernel for IPv4 and IPv6. I keep only the 3 central
values provided by dd indicating the completion of the emission (remove
highest and smallest) and do an average w/ those 3 values: 

              IPv4          IPv6
kernel 1    70.0815s      72.4843s
kernel 2    69.8335s      72.3462s
kernel 3    69.9758s      72.3588s

I used the exact same method for the tests for the 3 cases (reboot,
started the test, nothing else running). I am unable to explain why it's
longer for the test to complete with an unpatched kernel but this is
what I get. Maybe it is just an artefact and the impact is just simply
too small to be mesured. Anyway, the whole set of results are at the end
of the email. 

>> > With your remapping, would it be possible to add dummy xfrm_state
>> > objects with the remapped destination address that could then call
>> > xfrm6_input_addr?
>> >
>> > That way normal IPsec users would not be affected at all while
>> > preserving your new functionality.
>> 
>> I don't think I can do that easily (at all?) with what XFRM provides,
>> can I? Or at least I don't see how it is possible because I would need
>> some kind of policy for the state to be applied and the only trigger I
>> see is the src/dst address mismatch when processing the IPsec packet.
>
> So do you know the remapped destination addresses a priori?

I have both the HoA and the CoA (this is x->coaddr in my state. '::' is
allowed if I want to allow anything), but the appplication of the state
needs to be done for traffic meant for the HoA and not blindly for all
the IPsec traffic received with the CoA (as source or destination).

it's not possible to just blindly remap things based on the on-wire
address and the fact it is IPsec traffic. For instance, if some tunnel
mode IPsec traffic between a MN and its HA is used in parallel, I cannot
remap the CoA from received packets on the MN to the HoA.

That's the reason why the lookup is done via something stable (i.e. the
HoA) which is derived from the SPI during SA lookup. 

> If not then then other possibility would be to add the code hook
> in case of xfrm_state_lookup failure.

This would work for destination address. But it has the drawback of
requiring a first lookup (guaranteed to fail if destination remapping
for the feature) and then a second (somehow w/o using the destination
address). Source check would still be done in all cases.

> But more importantly you need to solve the hash collission issue
> that I mentioned earlier.  Without that it won't work at all.

You are correct. The fact that the byspi hash table contains all states
(for in and out traffic) and that the state lookup does not involve the
direction but only the destination address (damn, the one I remove ;-))
is a real issue. It just makes the state lookup unreliable when I pass
NULL as daddr for incoming traffic. Thanks for pointing that.

I don't see any good solution yet (state have no direction) but I will
definitely focus on that issue and spend some time on it tomorrow.

Cheers,

a+

The full set of results for the tests:

* kernel 1

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x 192.168.0.14 1234
1073741824 bytes (1.1 GB) copied, 70.0393 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 70.2442 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 70.1995 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 70.0057 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 69.9887 s, 15.3 MB/s

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x fdff::1 1234 
1073741824 bytes (1.1 GB) copied, 72.5605 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.9725 s, 14.7 MB/s
1073741824 bytes (1.1 GB) copied, 72.3099 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.435 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.4573 s, 14.8 MB/s

* kernel 2

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x 192.168.0.14 1234
1073741824 bytes (1.1 GB) copied, 69.6845 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 69.9419 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 69.8615 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 70.0142 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 69.6970 s, 15.4 MB/s

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x fdff::1 1234 
1073741824 bytes (1.1 GB) copied, 72.4252 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 71.5411 s, 15.0 MB/s
1073741824 bytes (1.1 GB) copied, 72.2388 s, 14.9 MB/s
1073741824 bytes (1.1 GB) copied, 72.3745 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.6553 s, 14.8 MB/s

* kernel 3

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x 192.168.0.14 1234
1073741824 bytes (1.1 GB) copied, 69.8569 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 70.4445 s, 15.2 MB/s
1073741824 bytes (1.1 GB) copied, 69.8989 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 70.0238 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 70.0047 s, 15.3 MB/s

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x fdff::1 1234 
1073741824 bytes (1.1 GB) copied, 72.4677 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.2757 s, 14.9 MB/s
1073741824 bytes (1.1 GB) copied, 72.2261 s, 14.9 MB/s
1073741824 bytes (1.1 GB) copied, 72.3331 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.8262 s, 14.7 MB/s

^ permalink raw reply

* [PATCH] Documentation/networking/ixgbevf.txt: Change file permissions to 644
From: Joe Perches @ 2010-10-06 22:09 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: netdev, Randy Dunlap, Greg Rose

Signed-off-by: Joe Perches <joe@perches.com>
---
diff --git a/Documentation/networking/ixgbevf.txt b/Documentation/networking/ixgbevf.txt
old mode 100755
new mode 100644



^ permalink raw reply

* Re: [PATCH] Documentation/networking/ixgbevf.txt: Change file permissions to 644
From: Jeff Kirsher @ 2010-10-06 22:16 UTC (permalink / raw)
  To: Joe Perches; +Cc: netdev, Randy Dunlap, Greg Rose, Ben Hutchings, linux-doc
In-Reply-To: <1286402990.32241.10.camel@Joe-Laptop>

On Wed, Oct 6, 2010 at 15:09, Joe Perches <joe@perches.com> wrote:
> Signed-off-by: Joe Perches <joe@perches.com>

Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com

> ---
> diff --git a/Documentation/networking/ixgbevf.txt b/Documentation/networking/ixgbevf.txt
> old mode 100755
> new mode 100644
>

FYI - Ben submitted this same patch 3 days ago to Randy and
linux-doc@vger.kernel.org.

-- 
Cheers,
Jeff

^ permalink raw reply

* Re: [PATCH] SIW: Module initialization
From: Stephen Hemminger @ 2010-10-06 22:26 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Bernard Metzler, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <AANLkTikL6u4Kzd1jibp77GMZfMebxusoDX39dPsNK3RW-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, 6 Oct 2010 18:06:52 +0200
Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:

> On Wed, Oct 6, 2010 at 12:02 AM, Stephen Hemminger
> <shemminger-ZtmgI6mnKB3QT0dZR+AlfA@public.gmane.org> wrote:
> > On Tue, 5 Oct 2010 12:57:21 +0200
> > Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
> >
> >> > + * TODO: Dynamic device management (network device registration/removal).
> >>
> >> The current implementation is such that one siw device is created for
> >> each network device found at kernel module load time. That means that
> >> you force the user to load the siw kernel module after all other
> >> kernel modules that register a network device. I'm not sure that's a
> >> good idea.
> >
> > Then device should be controlled by a netlink (rtnl_link_ops) style
> > interface see vlan_netlink.c. Using netlink is extensible and provides
> > a cleaner interface than all these other parameterization methods.
> 
> That will be a lot more work to implement than what Jason proposed:
> adding a sysfs variable per Ethernet driver, e.g.
> /sys/class/net/eth0/enable_iwarp.

But if we let code in with that API, then we have to maintain it.
Netlink is the way to go.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [net-next-2.6 PATCH v2] net: netif_set_real_num_rx_queues may cap num_rx_queues at init time
From: Jeff Kirsher @ 2010-10-07  0:05 UTC (permalink / raw)
  To: davem
  Cc: netdev, gospo, bphilips, Ben Hutchings, Eric Dumazet,
	John Fastabend, Jeff Kirsher

From: John Fastabend <john.r.fastabend@intel.com>

Do not set num_rx_queues in netif_set_real_num_rx_queues() some
drivers will increase the real_num_rx_queues later due to a feature
changes or available interrupts increasing. By setting num_rx_queues
here this ends up creating a cap on the number of rx queues
available.

For example the ixgbe driver sets the max number of queues it intends
to use ever then sets the current number in use with the
netif_set_num_{rx|tx}_queues calls. With the current implementation
the number of rx queues gets limited so when a feature such as DCB
or FCoE is enabled the queues are no longer available.

kobjects will only be allocated for real_num_rx_queues so the waste
in memory is minimal.

CC: Ben Hutchings <bhutchings@solarflare.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 net/core/dev.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 7d14955..fd1b75a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1593,8 +1593,6 @@ int netif_set_real_num_rx_queues(struct net_device *dev, unsigned int rxq)
 						  rxq);
 		if (rc)
 			return rc;
-	} else {
-		dev->num_rx_queues = rxq;
 	}

 	dev->real_num_rx_queues = rxq;

^ permalink raw reply related

* Re: [PATCH] Documentation/networking/ixgbevf.txt: Change file permissions to 644
From: David Miller @ 2010-10-07  0:33 UTC (permalink / raw)
  To: joe; +Cc: jeffrey.t.kirsher, netdev, rdunlap, gregory.v.rose
In-Reply-To: <1286402990.32241.10.camel@Joe-Laptop>

From: Joe Perches <joe@perches.com>
Date: Wed, 06 Oct 2010 15:09:50 -0700

> Signed-off-by: Joe Perches <joe@perches.com>
> ---
> diff --git a/Documentation/networking/ixgbevf.txt b/Documentation/networking/ixgbevf.txt
> old mode 100755
> new mode 100644

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] neigh: RCU conversion of struct neighbour
From: David Miller @ 2010-10-07  0:49 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1286380456.9417.40.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 06 Oct 2010 17:54:16 +0200

> This is the second step for neighbour RCU conversion.
> 
> (first was commit d6bf7817 : RCU conversion of neigh hash table)
> 
> neigh_lookup() becomes lockless, but still take a reference on found
> neighbour. (no more read_lock()/read_unlock() on tbl->lock)
> 
> struct neighbour gets an additional rcu_head field and is freed after an
> RCU grace period.
> 
> Future work would need to eventually not take a reference on neighbour
> for temporary dst (DST_NOCACHE), but this would need dst->_neighbour to
> use a noref bit like we did for skb->_dst.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox