Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: recommended way to support duplicate IP addresses on different VLANs?
From: Ben Greear @ 2011-07-11 15:56 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Rémi Denis-Courmont, Chris Friesen, netdev
In-Reply-To: <4E1B1B33.5060300@genband.com>

On 07/11/2011 08:48 AM, Chris Friesen wrote:
> On 07/11/2011 09:04 AM, Rémi Denis-Courmont wrote:
>> Le lundi 11 juillet 2011 17:58:14 Chris Friesen, vous avez écrit :
>>> Hi all,
>>>
>>> We've got a server that sits on multiple VLANs. Each VLAN is segregated
>>> and doesn't know about the others. The IP address ranges in each of the
>>> VLANs may overlap, and the server may be assigned the same IP address in
>>> multiple VLANs.
>
>>> Is there any other way to deal with this scenario?
>
>
>> Or then binding sockets to devices (SO_BINDTODEVICE) might work.
>
> Hmm...SO_BINDTODEVICE looks interesting. I would imagine we'd still need
> to do some funky stuff around ARP handling.

arp_filter should help.

Also, you may want to use conn-trck tables.  This lets packets coming
in one or more interfaces use a specific conn-track cache.  Might help
keep the identical IPs from colliding in their conn tracking.

iptables -t raw -A PREROUTING -i eth0.7 -j CT --zone 7

Thanks,
Ben

>
> Chris
>
>
>

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [ath5k-devel] [PATCH 3/5] ath5k: Add missing breaks in switch/case
From: Pavel Roskin @ 2011-07-11 15:50 UTC (permalink / raw)
  To: Joe Perches
  Cc: Jiri Slaby, Nick Kossifidis, Luis R. Rodriguez, Bob Copeland,
	netdev, ath5k-devel, linux-wireless, John W. Linville,
	linux-kernel
In-Reply-To: <e554bed5c753a3fe85d3be34983a15e6110a6e35.1310289795.git.joe@perches.com>

On 07/10/2011 05:28 AM, Joe Perches wrote:
> Signed-off-by: Joe Perches<joe@perches.com>

Acked-by: Pavel Roskin <proski@gnu.org>

> ---
>   drivers/net/wireless/ath/ath5k/desc.c |    3 +++
>   1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath5k/desc.c b/drivers/net/wireless/ath/ath5k/desc.c
> index 62172d5..f82383b 100644
> --- a/drivers/net/wireless/ath/ath5k/desc.c
> +++ b/drivers/net/wireless/ath/ath5k/desc.c
> @@ -107,10 +107,13 @@ ath5k_hw_setup_2word_tx_desc(struct ath5k_hw *ah, struct ath5k_desc *desc,
>   		case AR5K_PKT_TYPE_BEACON:
>   		case AR5K_PKT_TYPE_PROBE_RESP:
>   			frame_type = AR5K_AR5210_TX_DESC_FRAME_TYPE_NO_DELAY;
> +			break;
>   		case AR5K_PKT_TYPE_PIFS:
>   			frame_type = AR5K_AR5210_TX_DESC_FRAME_TYPE_PIFS;
> +			break;
>   		default:
>   			frame_type = type;
> +			break;
>   		}

The intention here is to replace frame types from enum ath5k_pkt_type 
with their AR5210-specific counterparts.  So the intention is definitely 
to have breaks here.

Unfortunately, AR5210 cards are extremely rare these days.  I have one, 
but it only works with old motherboards.  It would take me half a day to 
dust off that system, compile the kernel and check the patch.  But I 
assume your patch is fine.  At least it's very unlikely to break anything.

-- 
Regards,
Pavel Roskin

^ permalink raw reply

* Re: recommended way to support duplicate IP addresses on different VLANs?
From: Chris Friesen @ 2011-07-11 15:48 UTC (permalink / raw)
  To: Rémi Denis-Courmont; +Cc: Chris Friesen, netdev
In-Reply-To: <201107111804.26500.remi@remlab.net>

On 07/11/2011 09:04 AM, Rémi Denis-Courmont wrote:
> Le lundi 11 juillet 2011 17:58:14 Chris Friesen, vous avez écrit :
>> Hi all,
>>
>> We've got a server that sits on multiple VLANs.  Each VLAN is segregated
>> and doesn't know about the others.  The IP address ranges in each of the
>> VLANs may overlap, and the server may be assigned the same IP address in
>> multiple VLANs.

>> Is there any other way to deal with this scenario?

> Or then binding sockets to devices (SO_BINDTODEVICE) might work.

Hmm...SO_BINDTODEVICE looks interesting.  I would imagine we'd still 
need to do some funky stuff around ARP handling.

Chris

-- 
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com

^ permalink raw reply

* Re: [PATCH] bridge: mask forwarding of IEEE 802 local multicast groups
From: Stephen Hemminger @ 2011-07-11 15:27 UTC (permalink / raw)
  To: Nick Carter; +Cc: netdev, Michał Mirosław, David Lamparter, davem
In-Reply-To: <CAEJpZP1cr4B9qX6gH9O4iXwWvEdxs2aebL21FnX1BE0dJPHrEw@mail.gmail.com>

On Sun, 10 Jul 2011 17:04:30 +0100
Nick Carter <ncarter100@gmail.com> wrote:

> Updated diffs so they apply to net-next (Original diffs were based off 2.6.38).
> 
> Any chance of getting these diffs applied?  The default behaviour of
> the bridge code is unchanged.  They solve the problem of
> authenticating a virtual 802.1x supplicant machine against an external
> 802.1X authenticator.  It is also a general solution that allows the
> forwarding of any combination of the IEEE 802 local multicast groups.
> 
> Signed-off-by: Nick Carter <ncarter100@gmail.com>
> Reviewed-by: David Lamparter <equinox@diac24.net>

I am still undecided on this. Understand the need, but don't like idea
of bridge behaving in non-conforming manner. Will see if IEEE 802 committee
has any input.

Also, don't want to build more knobs in with sysfs that are per-bridge.
Eventually, the plan is to make all the setting per-port with sysctl's
like IPv6.

^ permalink raw reply

* Re: [PATCH v2 46/46] net: mark drivers that drop packets from rx queue head under memory pressure
From: Stephen Hemminger @ 2011-07-11 15:24 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: netdev, Hartley Sweeten, Michael Chan, Eilon Greenstein,
	Guo-Fu Tseng, Realtek linux nic maintainers, Francois Romieu,
	Matt Carlson, Jon Mason
In-Reply-To: <c84a04a957128ae664f4a80da23fad4b9f71a85f.1310339689.git.mirq-linux@rere.qmqm.pl>

On Mon, 11 Jul 2011 02:52:50 +0200 (CEST)
Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:

> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> ---
>  drivers/net/arm/ep93xx_eth.c    |    3 +++
>  drivers/net/bnx2.c              |    3 +++
>  drivers/net/bnx2x/bnx2x_cmn.c   |    3 +++
>  drivers/net/cassini.c           |    3 +++
>  drivers/net/jme.c               |    3 +++
>  drivers/net/mlx4/en_rx.c        |    6 ++++++
>  drivers/net/r8169.c             |    3 +++
>  drivers/net/skge.c              |    3 +++
>  drivers/net/sky2.c              |    2 ++
>  drivers/net/tg3.c               |    2 ++
>  drivers/net/tokenring/olympic.c |    2 ++
>  drivers/net/vxge/vxge-main.c    |    3 +++
>  12 files changed, 36 insertions(+), 0 deletions(-)
> 

Nak. This is normal behavior and putting in a compile warning is just
useless extra noise.

^ permalink raw reply

* Re: [PATCH v2 04/46] net/wireless: p54: remove useless dma_sync_single_for_device(DMA_FROM_DEVICE)
From: Pavel Roskin @ 2011-07-11 15:15 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Christian Lamparter,
	John W. Linville, linux-wireless-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <86c8bde08b005ca7eb4806ea77aec1f3212d63fc.1310339688.git.mirq-linux-CoA6ZxLDdyEEUmgCuDUIdw@public.gmane.org>

On 07/10/2011 08:52 PM, Michał Mirosław wrote:
> Also constify pointers used in frame parsers to verify assumptions.

Cleanups are better done separately.

> -	u16 type = le16_to_cpu(*((__le16 *)skb->data));
> +	u16 type = le16_to_cpu(*(const __le16 *)skb->data);

I think it would be more appropriate to use get_unaligned_le16() here. 
No casts should be needed then.

That's not an objection, just a suggestion :)

-- 
Regards,
Pavel Roskin
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: recommended way to support duplicate IP addresses on different VLANs?
From: Rémi Denis-Courmont @ 2011-07-11 15:04 UTC (permalink / raw)
  To: Chris Friesen; +Cc: netdev
In-Reply-To: <4E1B0F86.2040508@mail.usask.ca>

Le lundi 11 juillet 2011 17:58:14 Chris Friesen, vous avez écrit :
> Hi all,
> 
> We've got a server that sits on multiple VLANs.  Each VLAN is segregated
> and doesn't know about the others.  The IP address ranges in each of the
> VLANs may overlap, and the server may be assigned the same IP address in
> multiple VLANs.
> 
> We've got a messy solution now involving unique internal addresses and
> NATing between those and the duplicate external addresses, but I'm
> wondering if there is a cleaner way to handle this.
> 
> It seems like network namespaces would work, but it would require
> multiple instances of our software which is a dealbreaker.
> 
> Is there any other way to deal with this scenario?

Namespace file descriptors if/when they get accepted.

Or then binding sockets to devices (SO_BINDTODEVICE) might work.

-- 
Rémi Denis-Courmont
http://www.remlab.net/
http://fi.linkedin.com/in/remidenis

^ permalink raw reply

* recommended way to support duplicate IP addresses on different VLANs?
From: Chris Friesen @ 2011-07-11 14:58 UTC (permalink / raw)
  To: netdev

Hi all,

We've got a server that sits on multiple VLANs.  Each VLAN is segregated 
and doesn't know about the others.  The IP address ranges in each of the 
VLANs may overlap, and the server may be assigned the same IP address in 
multiple VLANs.

We've got a messy solution now involving unique internal addresses and 
NATing between those and the duplicate external addresses, but I'm 
wondering if there is a cleaner way to handle this.

It seems like network namespaces would work, but it would require 
multiple instances of our software which is a dealbreaker.

Is there any other way to deal with this scenario?

Thanks,
Chris

^ permalink raw reply

* Re: [PATCH 3/3] iwlagn: Add missing comma between constant string array
From: Guy, Wey-Yi @ 2011-07-11 13:42 UTC (permalink / raw)
  To: Joe Perches
  Cc: Intel Linux Wireless, John W. Linville,
	linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <2c8a6e15b0f320e4b4b761ae6434862fc3c15924.1310187270.git.joe@perches.com>

On Fri, 2011-07-08 at 23:20 -0700, Joe Perches wrote:
> Multiple quoted strings are concatenated without comma separators.
> 
> Make the array const while there.
> 
> Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
> ---
>  drivers/net/wireless/iwlwifi/iwl-agn.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/wireless/iwlwifi/iwl-agn.c b/drivers/net/wireless/iwlwifi/iwl-agn.c
> index 7e6c463..de1a0c1 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-agn.c
> +++ b/drivers/net/wireless/iwlwifi/iwl-agn.c
> @@ -1563,7 +1563,7 @@ static void iwl_ucode_callback(const struct firmware *ucode_raw, void *context)
>  	release_firmware(ucode_raw);
>  }
>  
> -static const char *desc_lookup_text[] = {
> +static const char * const desc_lookup_text[] = {
>  	"OK",
>  	"FAIL",
>  	"BAD_PARAM",
> @@ -1587,7 +1587,7 @@ static const char *desc_lookup_text[] = {
>  	"NMI_INTERRUPT_DATA_ACTION_PT",
>  	"NMI_TRM_HW_ER",
>  	"NMI_INTERRUPT_TRM",
> -	"NMI_INTERRUPT_BREAK_POINT"
> +	"NMI_INTERRUPT_BREAK_POINT",
>  	"DEBUG_0",
>  	"DEBUG_1",
>  	"DEBUG_2",

Thanks
Wey

^ permalink raw reply

* [PATCH] inetpeer: kill inet_putpeer race
From: Eric Dumazet @ 2011-07-11 12:49 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

We currently can free inetpeer entries too early :

[  782.636674] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (f130f44c)
[  782.636677] 1f7b13c100000000000000000000000002000000000000000000000000000000
[  782.636686]  i i i i u u u u i i i i u u u u i i i i u u u u u u u u u u u u
[  782.636694]                          ^
[  782.636696] 
[  782.636698] Pid: 4638, comm: ssh Not tainted 3.0.0-rc5+ #270 Hewlett-Packard HP Compaq 6005 Pro SFF PC/3047h
[  782.636702] EIP: 0060:[<c13fefbb>] EFLAGS: 00010286 CPU: 0
[  782.636707] EIP is at inet_getpeer+0x25b/0x5a0
[  782.636709] EAX: 00000002 EBX: 00010080 ECX: f130f3c0 EDX: f0209d30
[  782.636711] ESI: 0000bc87 EDI: 0000ea60 EBP: f0209ddc ESP: c173134c
[  782.636712]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  782.636714] CR0: 8005003b CR2: f0beca80 CR3: 30246000 CR4: 000006d0
[  782.636716] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  782.636717] DR6: ffff4ff0 DR7: 00000400
[  782.636718]  [<c13fbf76>] rt_set_nexthop.clone.45+0x56/0x220
[  782.636722]  [<c13fc449>] __ip_route_output_key+0x309/0x860
[  782.636724]  [<c141dc54>] tcp_v4_connect+0x124/0x450
[  782.636728]  [<c142ce43>] inet_stream_connect+0xa3/0x270
[  782.636731]  [<c13a8da1>] sys_connect+0xa1/0xb0
[  782.636733]  [<c13a99dd>] sys_socketcall+0x25d/0x2a0
[  782.636736]  [<c149deb8>] sysenter_do_call+0x12/0x28
[  782.636738]  [<ffffffff>] 0xffffffff

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/inetpeer.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index dafbf2c..90c5f0d 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -373,11 +373,14 @@ static int inet_peer_gc(struct inet_peer_base *base,
 	while (stackptr > stack) {
 		stackptr--;
 		p = rcu_deref_locked(**stackptr, base);
-		delta = (__u32)jiffies - p->dtime;
-		if (atomic_read(&p->refcnt) == 0 && delta >= ttl &&
-		    atomic_cmpxchg(&p->refcnt, 0, -1) == 0) {
-			p->gc_next = gchead;
-			gchead = p;
+		if (atomic_read(&p->refcnt) == 0) {
+			smp_rmb();
+			delta = (__u32)jiffies - p->dtime;
+			if (delta >= ttl &&
+			    atomic_cmpxchg(&p->refcnt, 0, -1) == 0) {
+				p->gc_next = gchead;
+				gchead = p;
+			}
 		}
 	}
 	while ((p = gchead) != NULL) {
@@ -456,6 +459,7 @@ EXPORT_SYMBOL_GPL(inet_getpeer);
 void inet_putpeer(struct inet_peer *p)
 {
 	p->dtime = (__u32)jiffies;
+	smp_mb__before_atomic_dec();
 	atomic_dec(&p->refcnt);
 }
 EXPORT_SYMBOL_GPL(inet_putpeer);



^ permalink raw reply related

* Re: [PATCH v2 00/46] Clean up RX copybreak and DMA handling
From: Ben Hutchings @ 2011-07-11 12:36 UTC (permalink / raw)
  To: David Miller; +Cc: mirq-linux, netdev
In-Reply-To: <20110710.235458.1549578255936886669.davem@davemloft.net>

On Sun, 2011-07-10 at 23:54 -0700, David Miller wrote:
> From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> Date: Mon, 11 Jul 2011 02:52:46 +0200 (CEST)
> 
> >   1. under packet storm and memory pressure NIC keeps generating interrupts
> >      (if non-NAPI) and indicating new buffers because it always has free
> >      RX buffers --- this only wastes CPU and bus bandwidth transferring
> >      data that is going to be immediately discarded;
> 
> Actually, this is exactly how I, and others advise people to implement
> drivers.  It is the right thing to do.
> 
> The worst thing that can happen is to let the RX ring empty of
> buffers.  Some cards hang as a result of this, and also it causes head
> of line blocking on multiqueue cards, etc.

The controllers you are familiar with might do head-of-line blocking
when a single RX queue is empty.  But any multiqueue controller that is
supposed to support untrusted queues (required for SR-IOV) had better
not.  This is certainly not done on Solarflare controllers (packets for
that queue just get dropped until it's refilled) and I doubt it's done
on many others.

I also think it's quite reasonable for the RX queue to stop interrupting
when the host is already too busy to refill it.  Some drivers might not
recover correctly, but this is not a hardware issue.

> So the first thing the driver should do is try to allocate a
> replacement buffer.
> 
> And if that fails, it should give the RX packet right back to the
> card, and not pass it up the stack.

I agree this is a reasonable and generic way to deal with empty RX
queues.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH 1/2] neigh: Store hash shift instead of mask.
From: Michał Mirosław @ 2011-07-11 11:58 UTC (permalink / raw)
  To: David Miller; +Cc: roland, johnwheffner, mj, netdev
In-Reply-To: <20110711.014841.1004194674075047305.davem@davemloft.net>

2011/7/11 David Miller <davem@davemloft.net>:
> And mask the hash function result by simply shifting
> down the "->hash_shift" most significant bits.
>
> Currently which bits we use is arbitrary since jhash
> produces entropy evenly across the whole hash function
> result.
>
> But soon we'll be using universal hashing functions,
> and in those cases more entropy exists in the higher
> bits than the lower bits, because they use multiplies.

You could use some evil shift tricks to cut some instructions if you like. ;-)
Examples below.

[...]
> -       for (i = 0; i <= nht->hash_mask; i++) {
> +       for (i = 0; i < (1 << nht->hash_shift); i++) {

for (i = 0; !(i >> nth->hash_shift); i++)

[...]
> -       size_t size = entries * sizeof(struct neighbour *);
> +       size_t size = (1 << shift) * sizeof(struct neighbour *);

size_t size = sizeof(struct neighbour *) << shift;

Or, since later get_order(size) is used:

unsinged int size_shift = shift + order_base_2(sizeof(struct neighbour *));

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [RFC 43/72] a2065/ariadne: Move the a2065/ariadne drivers
From: David Miller @ 2011-07-11 11:41 UTC (permalink / raw)
  To: hch; +Cc: jeffrey.t.kirsher, geert, netdev
In-Reply-To: <20110711113128.GA21187@infradead.org>

From: Christoph Hellwig <hch@infradead.org>
Date: Mon, 11 Jul 2011 07:31:28 -0400

> But what whole vendor split, even if consistently implemented seems
> like a lot of wanking for very little juice.

There's simply too much stuff under drivers/net right now, I
want TAB completion to actually start being useful again.

^ permalink raw reply

* Re: [RFC 43/72] a2065/ariadne: Move the a2065/ariadne drivers
From: Christoph Hellwig @ 2011-07-11 11:31 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Geert Uytterhoeven, davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <1310377198.26989.90.camel@jtkirshe-mobl>

On Mon, Jul 11, 2011 at 02:39:57AM -0700, Jeff Kirsher wrote:
> It may just be me, these statements seem negative and bitter and is not
> helping us "solve" the issue.  While the statements may be true,  I
> would like to try and find a solution, what ever it may be, to better
> organize drivers/net/ethernet/ drivers to help with maintenance and
> future development.

It would help what the whole point of shuffling these drivers around
in arbitrary ways is.  Splitting them up for the actual protocols might
make some sense, as would giving a subdirectory to every non-trivial
driver.  But what whole vendor split, even if consistently implemented
seems like a lot of wanking for very little juice.

^ permalink raw reply

* Re: [PATCH v2 00/46] Clean up RX copybreak and DMA handling
From: Michał Mirosław @ 2011-07-11 11:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110711.031128.635136897396133223.davem@davemloft.net>

On Mon, Jul 11, 2011 at 03:11:28AM -0700, David Miller wrote:
> From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> Date: Mon, 11 Jul 2011 11:47:23 +0200
> 
> > Catch 22: The chips are not tested because they have always free buffers,
> > they are provided with endless rx buffers because they are not being
> > tested. I'd rather test them well rather than workaround a phantom issue.
> > Tripping on empty free rx buffer ring is still possible even with scheme
> > #1 (eg. with lots of NICs receiving heavy traffic) --- just harder to
> > recognise and debug (if it breaks at all) when it happens.
> I do not support taking this risk.

The problem will be waiting, just pushed into dark corner. ;)
The card+driver need to survive empty rx buffer list anyway.

This issue will come back with people attacking bufferbloat issues
(they will want to reduce queue sizes and so increase the frequency
when free rx buffers run out).

> Please do not submit patches which move away from the long
> standing allocation failure handling scheme.

Sure. In this series it's patch 16 --- because it was trivial to do. Pending
patches only wrap the existing behaviour (where clearly scheme #1 or #2
is used).

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: introduce build_skb()
From: Michał Mirosław @ 2011-07-11 10:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1310363206.2512.26.camel@edumazet-laptop>

On Mon, Jul 11, 2011 at 07:46:46AM +0200, Eric Dumazet wrote:
> Le lundi 11 juillet 2011 à 02:52 +0200, Michał Mirosław a écrit :
> > Introduce __netdev_alloc_skb_aligned() to return skb with skb->data
> > aligned at specified 2^n multiple.
> > 
> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> Hi Michal
> 
> Could we synchronize our work to not introduce things that might
> disappear shortly ?

Sure. Are you saying that you'll convert all drivers to build_skb()? :-)

> Here is the RFC patch about build_skb() :
> 
> [PATCH] net: introduce build_skb()
> 
> One of the thing we discussed during netdev 2011 conference was the idea
> to change network drivers to allocate/populate their skb at RX
> completion time, right before feeding the skb to network stack.
> 
> Right now, we allocate skbs when populating the RX ring, and thats a
> waste of CPU cache, since allocating skb means a full memset() to clear
> the skb and its skb_shared_info portion. By the time NIC fills a frame
> in data buffer and host can get it, cpu probably threw away the cache
> lines from its caches, because of huge RX ring sizes.
> 
> So the deal would be to allocate only the data buffer for the NIC to
> populate its RX ring buffer. And use build_skb() at RX completion to
> attach a data buffer (now filled with an ethernet frame) to a new skb,
> initialize the skb_shared_info portion, and give the hot skb to network
> stack.
> 
> build_skb() is the function to allocate an skb, caller providing the
> data buffer that should be attached to it. Drivers are expected to call 
> skb_reserve() right after build_skb() to let skb->data points to the
> Ethernet frame (usually skipping NET_SKB_PAD and NET_IP_ALIGN)
[...]
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -234,6 +234,54 @@ nodata:
[...]
>  /**
> + * build_skb - build a network buffer
> + * @data: data buffer provider by caller
> + * @size: size of data buffer, not including skb_shared_info
> + *
> + * Allocate a new &sk_buff. Caller provides space holding head and
> + * skb_shared_info. Mostly used in driver RX path.
> + * The return is the buffer. On a failure the return is %NULL.
> + * Notes :
> + *  Before IO, driver allocates only data buffer where NIC put incoming frame
> + *  Driver SHOULD add room at head (NET_SKB_PAD) and
> + *  MUST add room tail (to hold skb_shared_info)
> + *  After IO, driver calls build_skb(), to get a hot skb instead of a cold one
> + *  before giving packet to stack. RX rings only contains data buffers, not
> + *  full skbs.
> + */
> +struct sk_buff *build_skb(void *data, unsigned int size)
> +{
> +	struct skb_shared_info *shinfo;
> +	struct sk_buff *skb;
> +
> +	skb = kmem_cache_alloc(skbuff_head_cache, GFP_ATOMIC);
> +	if (!skb)
> +		return NULL;
> +
> +	size = SKB_DATA_ALIGN(size);
> +
> +	memset(skb, 0, offsetof(struct sk_buff, tail));
> +	skb->truesize = size + sizeof(struct sk_buff);
> +	atomic_set(&skb->users, 1);
> +	skb->head = data;
> +	skb->data = data;
> +	skb_reset_tail_pointer(skb);
> +	skb->end = skb->tail + size;
> +#ifdef NET_SKBUFF_DATA_USES_OFFSET
> +	skb->mac_header = ~0U;
> +#endif
> +
> +	/* make sure we initialize shinfo sequentially */
> +	shinfo = skb_shinfo(skb);
> +	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
> +	atomic_set(&shinfo->dataref, 1);
> +	kmemcheck_annotate_variable(shinfo->destructor_arg);
> +
> +	return skb;
> +}
> +EXPORT_SYMBOL(build_skb);

I like the idea. From driver writer perspective I would like to also
see a function that given max frame size and DMA aligment would allocate
the buffer for me.

In short:

 * rx_refill:

	[ptr, size] = alloc_rx_buffer(size, alignment, offset);
	[dma_addr] = map_buffer(ptr, size);
	append to rx buffer list

 * rx_poll:

	unmap_buffer(dma_addr, size);
	[skb] = build_skb(ptr, size);
	if (!skb)
		reuse_buffer
		return
	skb_reserve(skb, rx_offset);
	skb_put(skb, pkt_len);
	(indicate offloads)
	rx_skb(skb);
	call rx_refill

 * rx_poll with copybreak:

	sync_buffer_to_cpu(dma_addr, data_len);
	[skb, copied] = build_or_copy_skb(ptr, size, hw_rx_offset, pkt_len);
	if (copied || !skb)
		append to rx buffer list
	else
		unmap_buffer(dma_addr, size);
	if (!skb)
		return;
	(indicate offloads)
	rx_skb(skb);
	if (!copied)
		call rx_refill


For even less driver code this could happen:

 * rx_refill(ptr, dma_addr)
	[size/alignment stored in queue or net_device struct]

	if (is_rx_free_list_full)
		return -EBUSY;
	append to rx buffer list [ptr, dma_addr, size]
	return !is_rx_free_list_full;

 * rx_poll with copybreak:
	[copy threshold stored in queue or net_device struct]

	[skb] = finish_rx(ptr, dma_addr, size, hw_rx_offset, pkt_len);
	if (!skb)
		return;
	(fill in offloads: checksum/etc)
	rx_skb(skb);


This could be extended to handle frames spanning multiple buffers.

BTW, napi_get_frags() + napi_gro_frags() use similar idea of allocating
skb late.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH v2 46/46] net: mark drivers that drop packets from rx queue head under memory pressure
From: Eilon Greenstein @ 2011-07-11 10:16 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: netdev@vger.kernel.org, Hartley Sweeten, Michael Chan,
	Guo-Fu Tseng, Realtek linux nic maintainers, Francois Romieu,
	Stephen Hemminger, Matthew Carlson, Jon Mason
In-Reply-To: <20110711100447.GA7532@rere.qmqm.pl>

On Mon, 2011-07-11 at 03:04 -0700, Michał Mirosław wrote:
> On Mon, Jul 11, 2011 at 09:47:08AM +0300, Eilon Greenstein wrote:
> > On Sun, 2011-07-10 at 17:52 -0700, Michał Mirosław wrote:
> > > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > > ---
> > 
> > > diff --git a/drivers/net/bnx2x/bnx2x_cmn.c b/drivers/net/bnx2x/bnx2x_cmn.c
> > > index 4f9164c..a6da01a 100644
> > > --- a/drivers/net/bnx2x/bnx2x_cmn.c
> > > +++ b/drivers/net/bnx2x/bnx2x_cmn.c
> > > @@ -673,6 +673,9 @@ int bnx2x_rx_int(struct bnx2x_fastpath *fp, int budget)
> > >  				goto reuse_rx;
> > >  			}
> > >  
> > > +#warning drops packets from rx queue head on memory pressure
> > > +#warning (like dev_skb_finish_rx_dma_refill() users)
> > > +
> > 
> > We have the dropless_fc module parameter that can be configured if the
> > user prefers pausing on host memory pressure - the problem with that
> > feature is that it is enough that one of the ring runs out of memory and
> > the entire port is stopped. When running with 16 rings, this can lead to
> > serious throughput degradation - this is why it is kept as a user
> > configurable option.
> 
> From the code it look like dropless_fc just enables sending of pause
> frames.  If that's disabled, then what happens when one queue runs out
> of free rx buffers?
> 

Actually, I was too fast before and did not read it all through. After I
did, I saw that Dave already replied...

The dropless_fc is not really related to this case. It is about the
driver not keeping up with the FW/HW and not about the driver failing to
allocate a buffer (well, not directly - if that will happen with the
suggested patch we will run out of space on the ring). If the ring is
full, the FW will drop the packet. But if the FW is not fast enough and
the internal chip buffer is getting full - the HW will send pause. So
when pause is enabled, without dropless_fc packets will still be dropped
if the host is too slow but not if the chip is too slow (when exceeding
the chip max PPS with small packets). When dropless_fc is set, packet
will not be dropped but in multi-ring scenario we are likely to be under
utilizing the link in case some (possibly only one) ring on one CPU is
not keeping up while the other rings (on other CPUs) still have room and
possibly idling. 

Regards,
Eilon

^ permalink raw reply

* Re: [PATCH v2 00/46] Clean up RX copybreak and DMA handling
From: David Miller @ 2011-07-11 10:11 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev
In-Reply-To: <20110711094723.GC6380@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Mon, 11 Jul 2011 11:47:23 +0200

> Catch 22: The chips are not tested because they have always free buffers,
> they are provided with endless rx buffers because they are not being
> tested. I'd rather test them well rather than workaround a phantom issue.
> Tripping on empty free rx buffer ring is still possible even with scheme
> #1 (eg. with lots of NICs receiving heavy traffic) --- just harder to
> recognise and debug (if it breaks at all) when it happens.

I do not support taking this risk.

Please do not submit patches which move away from the long
standing allocation failure handling scheme.

^ permalink raw reply

* Re: [PATCH v2 46/46] net: mark drivers that drop packets from rx queue head under memory pressure
From: Michał Mirosław @ 2011-07-11 10:04 UTC (permalink / raw)
  To: Eilon Greenstein
  Cc: netdev@vger.kernel.org, Hartley Sweeten, Michael Chan,
	Guo-Fu Tseng, Realtek linux nic maintainers, Francois Romieu,
	Stephen Hemminger, Matthew Carlson, Jon Mason
In-Reply-To: <1310366828.22731.1.camel@lb-tlvb-eilong.il.broadcom.com>

On Mon, Jul 11, 2011 at 09:47:08AM +0300, Eilon Greenstein wrote:
> On Sun, 2011-07-10 at 17:52 -0700, Michał Mirosław wrote:
> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > ---
> 
> > diff --git a/drivers/net/bnx2x/bnx2x_cmn.c b/drivers/net/bnx2x/bnx2x_cmn.c
> > index 4f9164c..a6da01a 100644
> > --- a/drivers/net/bnx2x/bnx2x_cmn.c
> > +++ b/drivers/net/bnx2x/bnx2x_cmn.c
> > @@ -673,6 +673,9 @@ int bnx2x_rx_int(struct bnx2x_fastpath *fp, int budget)
> >  				goto reuse_rx;
> >  			}
> >  
> > +#warning drops packets from rx queue head on memory pressure
> > +#warning (like dev_skb_finish_rx_dma_refill() users)
> > +
> 
> We have the dropless_fc module parameter that can be configured if the
> user prefers pausing on host memory pressure - the problem with that
> feature is that it is enough that one of the ring runs out of memory and
> the entire port is stopped. When running with 16 rings, this can lead to
> serious throughput degradation - this is why it is kept as a user
> configurable option.

From the code it look like dropless_fc just enables sending of pause
frames.  If that's disabled, then what happens when one queue runs out
of free rx buffers?

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [RFC 43/72] a2065/ariadne: Move the a2065/ariadne drivers
From: Geert Uytterhoeven @ 2011-07-11  9:51 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <1310377198.26989.90.camel@jtkirshe-mobl>

On Mon, Jul 11, 2011 at 11:39, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> On Sun, 2011-07-10 at 23:33 -0700, Geert Uytterhoeven wrote:
>> On Mon, Jul 11, 2011 at 02:48, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
>> > On Sun, 2011-07-10 at 12:34 -0700, Geert Uytterhoeven wrote:
>> >> On Sat, Jul 9, 2011 at 16:30, Jeff Kirsher
>> >> <jeffrey.t.kirsher@intel.com> wrote:
>> >> > On Tue, 2011-06-28 at 13:33 -0700, Geert Uytterhoeven wrote:
>> >> >> And (in some other patch) 82596.c is an Intel driver, not a
>> >> Motorola driver.
>> >> >
>> >> > 82596.c is not an Intel driver, it is an Intel part.  The driver was
>> >>
>> >> Sorry, I meant "driver for an Intel part".
>> >>
>> >> > written and support by someone other than Intel.  I am looking at
>> >> how to
>> >>
>> >> Sure. But I'm strongly against classifying drivers based on who wrote
>> >> them ;-)
>> >
>> > I agree to some extent, because if that were the case, we would have a
>> > donald_becker/ directory for several of the drivers. :)
>> >
>> > Here is one of the problem's I keep running into and there is no simple
>> > answer.  While most of the drivers can be grouped together by the
>> > hardware they use, that does not work "logically" for every driver.
>> >
>> > In addition, if vendor 'A' makes a part and vendor 'B' uses same part in
>> > a device/system/NIC and vendor 'B' creates the driver, supports the
>> > driver and maintains the driver.  Should the part be categorized under
>> > vendor 'A'?  IMHO, I think it should be categorized as a vendor 'B'
>> > driver.
>>
>> Several of the Ethernet drivers are of a third type: vendor A chip, vendor B
>> card, entity C software.
>>
>> > I started this work with the idea of trying to organize the drivers in
>> > the same way that the drivers were to be in the Kconfig, which tended to
>> > be drivers/net/ethernet/<manufacturer>.
>> >
>> > One of the problems that arise in this organization is what do we do
>> > when vendor A is bought by vendor B, and vendor B takes on the support
>> > of all the old vendor A parts/drivers?
>>
>> We don't care. We don't sort drivers by who support them. Eventually, vendors
>> lose interest and they all end up under "Linux kernel community" anyway.
>
> It may just be me, these statements seem negative and bitter and is not
> helping us "solve" the issue.  While the statements may be true,  I
> would like to try and find a solution, what ever it may be, to better
> organize drivers/net/ethernet/ drivers to help with maintenance and
> future development.
>
>> > So I am open to suggestions.  The process I have using to organize the
>> > drivers has been to group drivers that use common libraries and/or code
>> > first, then group by either manufacturer, maintainer, or common
>> > platform.
>> >
>> > I would like to keep the lasi_82506.c, sni_82596.c, 82506.c and similar
>> > drivers out of the intel/ directory because we would not be supporting
>> > the drivers and they are not similar to our drivers that we do support
>> > that would be in the intel/ directory.
>> >
>> > Again, I open to suggestions on how to best organize these types of
>> > drivers.  Maybe create a misc/ or <bus_type>/ for these types of
>> > drivers?
>>
>> "Similar" drivers should be together and consolidated (if someone has time
>> to do it). They can even be of different brands.
>> I.e. not all Tulip-compatibles were manufactured by Digital or Intel.
>
> I agree and understand, that is why I am taking the time to do it.  The
> drivers/net/ethernet/8390/, drivers/net/ethernet/tulip and
> drivers/net/ethernet/sun/ are some examples of this.

I have no problems with moving all 82596 drivers to drivers/net/ethernet/intel/.

>> >> > better organize the 82596.c, lasi_82596.c, lib82596.c, and
>> >> sni_82596.c
>> >> > which all use an Intel Ethernet chip but were written and supported
>> >> by
>> >> > someone other than Intel.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH v2 00/46] Clean up RX copybreak and DMA handling
From: Michał Mirosław @ 2011-07-11  9:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110711.022403.532062888492669176.davem@davemloft.net>

On Mon, Jul 11, 2011 at 02:24:03AM -0700, David Miller wrote:
> From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> Date: Mon, 11 Jul 2011 11:16:49 +0200
> > Packet is indicated in queue N, theres no memory for new skb, so its
> > dropped, and the buffer goes back to free list. In parallel, queue M
> > (!= N) indicates new packet. Still, there's no memory for new skb so
> > its also dropped and its buffer is reused. The effect is that all
> > packets are dropped, whatever queue they appear on.
> Why would queue M (!= N) fail just because N did?  They may be
> allocating out of different NUMA nodes, and thus succeed.
> 
> > The HOL blocking does not matter here, because there's only one head
> > --- the system memory. If I misunderstood this point, please explain
> > it further.
> Multiqueue drivers are moving towards placing the queues on different
> NUMA nodes, and in that scenerio one queue might succeed even if the
> other fails.

I assumed that all queues get buffers from the same rx free ring. If
queues have their own free list, then we can treat them as separate NICs
for this discussion. Queues on one NUMA node stall, others go on normally.

> Back to the hardware hanging issue, it's real.  Getting into a
> situation where the RX ring lacks any buffers at all is the least
> tested path for these chips.
> 
> Testing fate is a really bad idea, and this is why I always propose to
> keep the hardware with RX buffers to use in all circumstances.

Catch 22: The chips are not tested because they have always free buffers,
they are provided with endless rx buffers because they are not being
tested. I'd rather test them well rather than workaround a phantom issue.
Tripping on empty free rx buffer ring is still possible even with scheme
#1 (eg. with lots of NICs receiving heavy traffic) --- just harder to
recognise and debug (if it breaks at all) when it happens.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH v2 03/46] net drivers: remove unnecessary dma_sync_to_device(DMA_FROM_DEVICE)
From: Vlad Zolotarov @ 2011-07-11  9:46 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: linux-wireless@vger.kernel.org, Eilon Greenstein, Gary Zambrano,
	Stephen Hemminger, Stefano Brivio,
	e1000-devel@lists.sourceforge.net, Matthew Carlson,
	Jesse Brandeburg, Francois Romieu, Realtek linux nic maintainers,
	John W. Linville, Ron Mercer, Michael Chan, Jitendra Kalsaria,
	Divy Le Ray, netdev@vger.kernel.org, Bruce Allan, Hartley Sweeten,
	John Ronciak, Jon 
In-Reply-To: <20110711092909.GB6380@rere.qmqm.pl>

On Monday 11 July 2011 12:29:09 Michał Mirosław wrote:
> On Mon, Jul 11, 2011 at 11:30:39AM +0300, Vlad Zolotarov wrote:
> > >         prod_rx_buf->skb = skb;
> > > 
> > > diff --git a/drivers/net/bnx2x/bnx2x_cmn.h
> > > b/drivers/net/bnx2x/bnx2x_cmn.h index c016e20..c9e49a0 100644
> > > --- a/drivers/net/bnx2x/bnx2x_cmn.h
> > > +++ b/drivers/net/bnx2x/bnx2x_cmn.h
> > > @@ -923,16 +923,11 @@ static inline int bnx2x_alloc_rx_skb(struct bnx2x
> > > *bp, static inline void bnx2x_reuse_rx_skb(struct bnx2x_fastpath *fp,
> > > 
> > >                                       u16 cons, u16 prod)
> > >  
> > >  {
> > > 
> > > -       struct bnx2x *bp = fp->bp;
> > > 
> > >         struct sw_rx_bd *cons_rx_buf = &fp->rx_buf_ring[cons];
> > >         struct sw_rx_bd *prod_rx_buf = &fp->rx_buf_ring[prod];
> > >         struct eth_rx_bd *cons_bd = &fp->rx_desc_ring[cons];
> > >         struct eth_rx_bd *prod_bd = &fp->rx_desc_ring[prod];
> > > 
> > > -       dma_sync_single_for_device(&bp->pdev->dev,
> > > -                                  dma_unmap_addr(cons_rx_buf,
> > > mapping), -                                  RX_COPY_THRESH,
> > > DMA_FROM_DEVICE); -
> > > 
> > >         dma_unmap_addr_set(prod_rx_buf, mapping,
> > >         
> > >                            dma_unmap_addr(cons_rx_buf, mapping));
> > >         
> > >         prod_rx_buf->skb = cons_rx_buf->skb;
> > 
> > Michal, pls., note that this function is only called for buffers which
> > were previously dma_synced towards CPU (your "[PATCH v2 05/46] net:
> > bnx2x: fix DMA sync direction" properly fixes the direction of the first
> > call which was incorrect). Then, according to the 3d edition of the
> > "Linux device drivers" book, chapter 15, "Setting up streaming DMA
> > mappings" article, end of the page 449, when we call for
> > dma_syc_single_for_cpu() the buffer ownership gets to the CPU and CPU
> > may safely access the buffer (in particular, we read it). Then the
> > author says: "Before the device accesses the buffer, however, ownership
> > should be transfered back to it with: dma_sync_single_for_device().
> > 
> > The DMA-API.txt document u've referenced doesn't refer the above
> > function, so, it's unclear how your fix may be based on it. On the other
> > hand it clearly contradicts the "Linux device driver" book.
> 
> DMA-API.txt describes what synchronization points are necessary for what
> DMA mapping types (direction). dma_sync_single_for_cpu/device() are
> functions realising those points. Note that example DMA-API-HOWTO.txt is
> misleading as it has dma_sync_single_for_device() where its not required
> by DMA-API.txt.
> 
> In this case, you don't need to sync to device for mappings that haven't
> been written to by CPU. CPU caches will be invalidated anyway by next
> dma_sync_single_for_cpu() or dma_unmap_single() and the CPU should not
> ever write to cachelines that belong to FROM_DEVICE mappings.

Okay, I see the section in the doc u r talking about... I agree. We may drop 
these sync_single() in the bnx2x_reuse_rx_skb().

> 
> The best source is the code. 

Hmmm... The code is bug prone, so I'd stick to the Doc...;)

Thanks, Michal.
vlad


------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security 
threats, fraudulent activity, and more. Splunk takes this data and makes 
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply

* Re: [RFC 43/72] a2065/ariadne: Move the a2065/ariadne drivers
From: Jeff Kirsher @ 2011-07-11  9:39 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <CAMuHMdXfpbbfv7oT+0oDWJd0=x0BBqEeCcGWaxMHagTXoeCscQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4130 bytes --]

On Sun, 2011-07-10 at 23:33 -0700, Geert Uytterhoeven wrote:
> On Mon, Jul 11, 2011 at 02:48, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> > On Sun, 2011-07-10 at 12:34 -0700, Geert Uytterhoeven wrote:
> >> On Sat, Jul 9, 2011 at 16:30, Jeff Kirsher
> >> <jeffrey.t.kirsher@intel.com> wrote:
> >> > On Tue, 2011-06-28 at 13:33 -0700, Geert Uytterhoeven wrote:
> >> >> And (in some other patch) 82596.c is an Intel driver, not a
> >> Motorola driver.
> >> >
> >> > 82596.c is not an Intel driver, it is an Intel part.  The driver was
> >>
> >> Sorry, I meant "driver for an Intel part".
> >>
> >> > written and support by someone other than Intel.  I am looking at
> >> how to
> >>
> >> Sure. But I'm strongly against classifying drivers based on who wrote
> >> them ;-)
> >
> > I agree to some extent, because if that were the case, we would have a
> > donald_becker/ directory for several of the drivers. :)
> >
> > Here is one of the problem's I keep running into and there is no simple
> > answer.  While most of the drivers can be grouped together by the
> > hardware they use, that does not work "logically" for every driver.
> >
> > In addition, if vendor 'A' makes a part and vendor 'B' uses same part in
> > a device/system/NIC and vendor 'B' creates the driver, supports the
> > driver and maintains the driver.  Should the part be categorized under
> > vendor 'A'?  IMHO, I think it should be categorized as a vendor 'B'
> > driver.
> 
> Several of the Ethernet drivers are of a third type: vendor A chip, vendor B
> card, entity C software.
> 
> > I started this work with the idea of trying to organize the drivers in
> > the same way that the drivers were to be in the Kconfig, which tended to
> > be drivers/net/ethernet/<manufacturer>.
> >
> > One of the problems that arise in this organization is what do we do
> > when vendor A is bought by vendor B, and vendor B takes on the support
> > of all the old vendor A parts/drivers?
> 
> We don't care. We don't sort drivers by who support them. Eventually, vendors
> lose interest and they all end up under "Linux kernel community" anyway.

It may just be me, these statements seem negative and bitter and is not
helping us "solve" the issue.  While the statements may be true,  I
would like to try and find a solution, what ever it may be, to better
organize drivers/net/ethernet/ drivers to help with maintenance and
future development.

> 
> > So I am open to suggestions.  The process I have using to organize the
> > drivers has been to group drivers that use common libraries and/or code
> > first, then group by either manufacturer, maintainer, or common
> > platform.
> >
> > I would like to keep the lasi_82506.c, sni_82596.c, 82506.c and similar
> > drivers out of the intel/ directory because we would not be supporting
> > the drivers and they are not similar to our drivers that we do support
> > that would be in the intel/ directory.
> >
> > Again, I open to suggestions on how to best organize these types of
> > drivers.  Maybe create a misc/ or <bus_type>/ for these types of
> > drivers?
> 
> "Similar" drivers should be together and consolidated (if someone has time
> to do it). They can even be of different brands.
> I.e. not all Tulip-compatibles were manufactured by Digital or Intel.

I agree and understand, that is why I am taking the time to do it.  The
drivers/net/ethernet/8390/, drivers/net/ethernet/tulip and
drivers/net/ethernet/sun/ are some examples of this.

> 
> >> > better organize the 82596.c, lasi_82596.c, lib82596.c, and
> >> sni_82596.c
> >> > which all use an Intel Ethernet chip but were written and supported
> >> by
> >> > someone other than Intel.
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: [PATCH v2 03/46] net drivers: remove unnecessary dma_sync_to_device(DMA_FROM_DEVICE)
From: Michał Mirosław @ 2011-07-11  9:29 UTC (permalink / raw)
  To: Vlad Zolotarov
  Cc: linux-wireless@vger.kernel.org, Eilon Greenstein, Gary Zambrano,
	Stephen Hemminger, Stefano Brivio,
	e1000-devel@lists.sourceforge.net, Matthew Carlson,
	Jesse Brandeburg, Francois Romieu, Realtek linux nic maintainers,
	John W. Linville, Ron Mercer, Michael Chan, Jitendra Kalsaria,
	Divy Le Ray, netdev@vger.kernel.org, Bruce Allan, Hartley Sweeten,
	John Ronciak, Jon 
In-Reply-To: <201107111130.39629.vladz@broadcom.com>

On Mon, Jul 11, 2011 at 11:30:39AM +0300, Vlad Zolotarov wrote:
> >         prod_rx_buf->skb = skb;
> > diff --git a/drivers/net/bnx2x/bnx2x_cmn.h b/drivers/net/bnx2x/bnx2x_cmn.h
> > index c016e20..c9e49a0 100644
> > --- a/drivers/net/bnx2x/bnx2x_cmn.h
> > +++ b/drivers/net/bnx2x/bnx2x_cmn.h
> > @@ -923,16 +923,11 @@ static inline int bnx2x_alloc_rx_skb(struct bnx2x
> > *bp, static inline void bnx2x_reuse_rx_skb(struct bnx2x_fastpath *fp,
> >                                       u16 cons, u16 prod)
> >  {
> > -       struct bnx2x *bp = fp->bp;
> >         struct sw_rx_bd *cons_rx_buf = &fp->rx_buf_ring[cons];
> >         struct sw_rx_bd *prod_rx_buf = &fp->rx_buf_ring[prod];
> >         struct eth_rx_bd *cons_bd = &fp->rx_desc_ring[cons];
> >         struct eth_rx_bd *prod_bd = &fp->rx_desc_ring[prod];
> > 
> > -       dma_sync_single_for_device(&bp->pdev->dev,
> > -                                  dma_unmap_addr(cons_rx_buf, mapping),
> > -                                  RX_COPY_THRESH, DMA_FROM_DEVICE);
> > -
> >         dma_unmap_addr_set(prod_rx_buf, mapping,
> >                            dma_unmap_addr(cons_rx_buf, mapping));
> >         prod_rx_buf->skb = cons_rx_buf->skb;
> Michal, pls., note that this function is only called for buffers which were 
> previously dma_synced towards CPU (your "[PATCH v2 05/46] net: bnx2x: fix DMA 
> sync direction" properly fixes the direction of the first call which was 
> incorrect). Then, according to the 3d edition of the "Linux device drivers" 
> book, chapter 15, "Setting up streaming DMA mappings" article, end of the page 
> 449, when we call for dma_syc_single_for_cpu() the buffer ownership gets to 
> the CPU and CPU may safely access the buffer (in particular, we read it). Then 
> the author says: "Before the device accesses the buffer, however, ownership 
> should be transfered back to it with: dma_sync_single_for_device().
> 
> The DMA-API.txt document u've referenced doesn't refer the above function, so, 
> it's unclear how your fix may be based on it. On the other hand it clearly 
> contradicts the "Linux device driver" book.

DMA-API.txt describes what synchronization points are necessary for what DMA
mapping types (direction). dma_sync_single_for_cpu/device() are functions
realising those points. Note that example DMA-API-HOWTO.txt is misleading
as it has dma_sync_single_for_device() where its not required by DMA-API.txt.

In this case, you don't need to sync to device for mappings that haven't
been written to by CPU. CPU caches will be invalidated anyway by next
dma_sync_single_for_cpu() or dma_unmap_single() and the CPU should not
ever write to cachelines that belong to FROM_DEVICE mappings.

The best source is the code. I looked through random implementations of
dma_sync_*_to_*() and in to_device() cases these are CPU write buffer
flushes and bounce buffer copying to the mapping - both actions are useless
(and potentially harmful in the bounce-buffer case) when the mapping hasn't
been written to after sync_to_cpu().

Best Regards,
Michał Mirosław

------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security 
threats, fraudulent activity, and more. Splunk takes this data and makes 
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply

* Re: [PATCH v2 00/46] Clean up RX copybreak and DMA handling
From: David Miller @ 2011-07-11  9:24 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev
In-Reply-To: <20110711091649.GA6380@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Mon, 11 Jul 2011 11:16:49 +0200

> Packet is indicated in queue N, theres no memory for new skb, so its
> dropped, and the buffer goes back to free list. In parallel, queue M
> (!= N) indicates new packet. Still, there's no memory for new skb so
> its also dropped and its buffer is reused. The effect is that all
> packets are dropped, whatever queue they appear on.

Why would queue M (!= N) fail just because N did?  They may be
allocating out of different NUMA nodes, and thus succeed.

> The HOL blocking does not matter here, because there's only one head
> --- the system memory. If I misunderstood this point, please explain
> it further.

Multiqueue drivers are moving towards placing the queues on different
NUMA nodes, and in that scenerio one queue might succeed even if the
other fails.

Back to the hardware hanging issue, it's real.  Getting into a
situation where the RX ring lacks any buffers at all is the least
tested path for these chips.

Testing fate is a really bad idea, and this is why I always propose to
keep the hardware with RX buffers to use in all circumstances.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox