Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Benjamin Herrenschmidt @ 2011-10-10  9:24 UTC (permalink / raw)
  To: Eli Cohen
  Cc: David Laight, netdev, Eli Cohen, linuxppc-dev,
	Thadeu Lima de Souza Cascardo, Yevgeny Petrilin
In-Reply-To: <20111010091611.GN2681@mtldesk30>

On Mon, 2011-10-10 at 11:16 +0200, Eli Cohen wrote:

> Until then I think we need to have the logic working right on ppc and
> measure if blue flame buys us any improvement in ppc. If that's not
> the case (e.g because write combining is not working), then maybe we
> should avoid using blueflame in ppc.
> Could any of the guys from IBM check this and give us feedback?

I don't have the necessary hardware myself to test that but maybe Thadeu
can.

Note that for WC to work, things must be mapped non-guarded. You can do
that by using ioremap_prot() with pgprot_noncached_wc(PAGE_KERNEL) or
ioremap_wc() (dunno how "generic" the later is).

>From there, you should get write combining provided that you don't have
barriers between every access (ie those copy operations in their current
form should do the trick).

Cheers,
Ben.

> > Maybe it's time for us to revive those discussions about providing a
> > good set of relaxed MMIO accessors with explicit barriers :-)
> > 
> > Cheers,
> > Ben.
> >  

^ permalink raw reply

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Eli Cohen @ 2011-10-10  9:16 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Laight, netdev, Eli Cohen, linuxppc-dev,
	Thadeu Lima de Souza Cascardo, Yevgeny Petrilin
In-Reply-To: <1318237284.29415.422.camel@pasglop>

On Mon, Oct 10, 2011 at 11:01:24AM +0200, Benjamin Herrenschmidt wrote:
> 
> The case where things get a bit more nasty is when you try to use MMIO
> for low latency small-data type transfers instead of DMA, in which case
> you do want the ability for the chipset to write-combine and control the
> barriers more precisely.
> 
> However, this is hard and Linux doesn't provide very good accessors to
> do so, thus you need to be extra careful (see my example about wmb()
> 
> In the case of the iomap "copy" operations, my problem is that they
> don't properly advertise their lack of ordering since normal iomap does
> have full ordering.
> 
> I believe they should provide ordering with a barrier before & a barrier
> after, eventually with _relaxed variants or _raw variants for those who
> "know what they are doing".

Until then I think we need to have the logic working right on ppc and
measure if blue flame buys us any improvement in ppc. If that's not
the case (e.g because write combining is not working), then maybe we
should avoid using blueflame in ppc.
Could any of the guys from IBM check this and give us feedback?
> 
> Maybe it's time for us to revive those discussions about providing a
> good set of relaxed MMIO accessors with explicit barriers :-)
> 
> Cheers,
> Ben.
>  

^ permalink raw reply

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Benjamin Herrenschmidt @ 2011-10-10  9:01 UTC (permalink / raw)
  To: Eli Cohen
  Cc: David Laight, netdev, Eli Cohen, linuxppc-dev,
	Thadeu Lima de Souza Cascardo, Yevgeny Petrilin
In-Reply-To: <20111010084726.GM2681@mtldesk30>

On Mon, 2011-10-10 at 10:47 +0200, Eli Cohen wrote:
> On Mon, Oct 10, 2011 at 09:40:17AM +0100, David Laight wrote:
> > 
> > Actually memory barriers shouldn't really be added to
> > any of these 'accessor' functions.
> > (Or, at least, ones without barriers should be provided.)
> > 
> > The driver may want to to a series of writes, then a
> > single barrier, before a final write of a command (etc).
> > 
> > in_le32() from io.h is specially horrid!
> > 
> > 	David
> > 
> The driver would like to control if and when we want to put a memory
> barrier. We really don't want it to be done under the hood. In this
> respect we prefer raw functions which are still available to all
> platforms.

 ... but not necessarily the corresponding barriers.

That's why on powerpc we had to make all rmb,wmb and mb the same, aka a
full sync, because our weaker barriers don't order cachable vs.
non-cachable.

In any case, the raw functions are a bit nasty to use because they both
don't have barriers -and- don't handle endianness. So you have to be
extra careful.

In 90% of the cases, the barriers are what you want anyway. For example
in the else case of the driver, the doorbell MMIO typically wants it, so
using writel() is fine (or iowrite32be) and will have the necessary
barriers.

The case where things get a bit more nasty is when you try to use MMIO
for low latency small-data type transfers instead of DMA, in which case
you do want the ability for the chipset to write-combine and control the
barriers more precisely.

However, this is hard and Linux doesn't provide very good accessors to
do so, thus you need to be extra careful (see my example about wmb()

In the case of the iomap "copy" operations, my problem is that they
don't properly advertise their lack of ordering since normal iomap does
have full ordering.

I believe they should provide ordering with a barrier before & a barrier
after, eventually with _relaxed variants or _raw variants for those who
"know what they are doing".

Maybe it's time for us to revive those discussions about providing a
good set of relaxed MMIO accessors with explicit barriers :-)

Cheers,
Ben.
 

^ permalink raw reply

* RE: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Benjamin Herrenschmidt @ 2011-10-10  8:53 UTC (permalink / raw)
  To: David Laight
  Cc: Eli Cohen, netdev, Eli Cohen, linuxppc-dev,
	Thadeu Lima de Souza Cascardo, Yevgeny Petrilin
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6D8AE78@saturn3.aculab.com>

On Mon, 2011-10-10 at 09:40 +0100, David Laight wrote:
> > What is this __iowrite64_copy... oh I see
> > 
> > Nice, somebody _AGAIN_ added a bunch of "generic" IO 
> > accessors that are utterly wrong on all archs except
> > x86 (ok, -almost-).
> > There isn't a single bloody memory barrier in there !
> 
> Actually memory barriers shouldn't really be added to
> any of these 'accessor' functions.
> (Or, at least, ones without barriers should be provided.)

As long as they are documented to provide no guarantee of ordering
between the stores... And x86 driver writers have any clue that they
will not be ordered vs. surrounding accesses.

> The driver may want to to a series of writes, then a
> single barrier, before a final write of a command (etc).
> 
> in_le32() from io.h is specially horrid!

The reason for that is that drivers expect fully ordered writel() vs
everything (including DMA).

Unfortunately, this is how Linux defines those semantics. I would much
prefer to require barriers explicitely but the decision was made back
then simply because the vast majority of driver writers do not
understand weakly ordered memory models and "everything should be made
to look like x86".

It would be great to come up with a set of more relaxed accessors along
with the appropriate barrier to use for drivers who "know better" but so
far all attempts at doing so have failed due to the inability to agree
on their precise semantics. Tho that was a while ago, we should probably
give it a new shot.

Cheers,
Ben. 

^ permalink raw reply

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Eli Cohen @ 2011-10-10  8:47 UTC (permalink / raw)
  To: David Laight
  Cc: Yevgeny Petrilin, Eli Cohen, Thadeu Lima de Souza Cascardo,
	netdev, linuxppc-dev
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6D8AE78@saturn3.aculab.com>

On Mon, Oct 10, 2011 at 09:40:17AM +0100, David Laight wrote:
> 
> Actually memory barriers shouldn't really be added to
> any of these 'accessor' functions.
> (Or, at least, ones without barriers should be provided.)
> 
> The driver may want to to a series of writes, then a
> single barrier, before a final write of a command (etc).
> 
> in_le32() from io.h is specially horrid!
> 
> 	David
> 
The driver would like to control if and when we want to put a memory
barrier. We really don't want it to be done under the hood. In this
respect we prefer raw functions which are still available to all
platforms.

^ permalink raw reply

* Re: [PATCH] dev: use name hash for dev_seq_ops.
From: Mihai Maruseac @ 2011-10-10  8:43 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: davem, eric.dumazet, mirq-linux, therbert, jpirko, netdev,
	linux-kernel, dbaluta, Mihai Maruseac
In-Reply-To: <20111007092445.4f097ed9@nehalam.linuxnetplumber.net>

On Fri, Oct 7, 2011 at 7:24 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> On Fri,  7 Oct 2011 18:20:49 +0300
> Mihai Maruseac <mihai.maruseac@gmail.com> wrote:
>
>> Instead of using the dev->next chain and trying to resync at each call to
>> dev_seq_start, use this hash and store bucket number and bucket offset in
>> seq->private field.
>>
>> As one can notice the improvement is of 1 order of magnitude.
>
> Good idea,
> This will change the ordering of entries in /proc which may upset
> some program, not a critical flaw but worth noting.
>
> Rather than recording the bucket and offset of last entry, another
> alternative would be to just record the ifindex.
>

I tried to record the ifindex but I think that using it and
dev_get_by_index can result in an infinite loop or a NULL
dereferrence. If a device is removed and ifindex points to it we'll
get a NULL from dev_get_by_index. Checking for NULL and calling again
dev_get_by_index will end in an infinite loop at the end of the hlist.

Augmenting the structure to also contain the number of indexes when
the seq_file is opened returns to the current situation with two ints.
Also, it is more prone to bugs caused by device removal while
printing.

-- 
Mihai

^ permalink raw reply

* RE: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: David Laight @ 2011-10-10  8:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Eli Cohen
  Cc: netdev, Eli Cohen, linuxppc-dev, Thadeu Lima de Souza Cascardo,
	Yevgeny Petrilin
In-Reply-To: <1318145118.29415.371.camel@pasglop>

 
> What is this __iowrite64_copy... oh I see
> 
> Nice, somebody _AGAIN_ added a bunch of "generic" IO 
> accessors that are utterly wrong on all archs except
> x86 (ok, -almost-).
> There isn't a single bloody memory barrier in there !

Actually memory barriers shouldn't really be added to
any of these 'accessor' functions.
(Or, at least, ones without barriers should be provided.)

The driver may want to to a series of writes, then a
single barrier, before a final write of a command (etc).

in_le32() from io.h is specially horrid!

	David

^ permalink raw reply

* RE: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Benjamin Herrenschmidt @ 2011-10-10  8:29 UTC (permalink / raw)
  To: David Laight
  Cc: Eli Cohen, netdev, Eli Cohen, linuxppc-dev,
	Thadeu Lima de Souza Cascardo, Yevgeny Petrilin
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6D8AE77@saturn3.aculab.com>

On Mon, 2011-10-10 at 09:20 +0100, David Laight wrote:
> 
> For the above I'd actually suggest making 'doorbell_qpn' have the
> correct endianness in order to avoid the (potential) swap every
> time it is set.

Well, the problem is that either you'll end up swapping on x86 or you'll
end up swapping on ppc, there is no "native" MMIO accessor that allow
you to do a no-swap access whatever the arch you are on. Or rather,
there is the __raw_ one but you shouldn't use it for most things :-)
(Because it also doesn't have the right memory barriers).

So I'd rather they do it right using the simpler method, the cost of
swap is going to be negligible, probably not even measurable, and if and
only if they think they can improve on that in a second step, then
consider doing otherwise with appropriate measurements showing a
significant difference.

> You also need to treble-check the required endianness for the
> 'vlan_tag' in the tx descriptor. What would be needed is the
> MAC PCI slave were on an x86 (LE) system.

Cheers,
Ben.

^ permalink raw reply

* RE: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: David Laight @ 2011-10-10  8:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Eli Cohen
  Cc: netdev, Eli Cohen, linuxppc-dev, Thadeu Lima de Souza Cascardo,
	Yevgeny Petrilin
In-Reply-To: <1318153939.29415.401.camel@pasglop>

 
> Then, this statement:
> 
> *(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;

...
> instead do ... :

> 	*(u32 *) (&tx_desc->ctrl.vlan_tag) |=
cpu_to_be32(ring->doorbell_qpn);
> 
> (Also get rid of that cast and define vlan_tag as a __be32 to start
> with).

Agreed, casts that change the type of memory - *(foo *)&xxx - are
generally bad news unless you are casting a generic 'buffer' to
a specific structure.
I've seen far to much code that ends up being depending on the
endianness and system word size.

For the above I'd actually suggest making 'doorbell_qpn' have the
correct endianness in order to avoid the (potential) swap every
time it is set.

You also need to treble-check the required endianness for the
'vlan_tag' in the tx descriptor. What would be needed is the
MAC PCI slave were on an x86 (LE) system.

	David

^ permalink raw reply

* Re: [PATCH] af_packet: tpacket_destruct_skb, deref skb after BUG_ON assertion
From: danborkmann @ 2011-10-10  8:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev
In-Reply-To: <1318193866.21116.3.camel@edumazet-laptop>

Hi Eric,

Quoting Eric Dumazet <eric.dumazet@gmail.com>:
> Le dimanche 09 octobre 2011 à 17:19 +0200, danborkmann@iogearbox.net a
> écrit :
>> This tiny patch derefs the skb only after BUG_ON(skb==NULL) was evaluated
>> and not before. Patched against latest Linus tree.
>>
>> Thanks,
>> Daniel
>>
>> Signed-off-by: Daniel Borkmann <danborkmann@iogearbox.net>
>>
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index fabb4fa..d9d833b 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -1167,11 +1167,12 @@ ring_is_full:
>>
>>   static void tpacket_destruct_skb(struct sk_buff *skb)
>>   {
>> -	struct packet_sock *po = pkt_sk(skb->sk);
>> +	struct packet_sock *po;
>>   	void *ph;
>>
>>   	BUG_ON(skb == NULL);
>>
>> +	po = pkt_sk(skb->sk);
>>   	if (likely(po->tx_ring.pg_vec)) {
>>   		ph = skb_shinfo(skb)->destructor_arg;
>>   		BUG_ON(__packet_get_status(po, ph) != TP_STATUS_SENDING);
>>
>>
>
> Well, to be honest, this BUG_ON(!skb) is absolutely useless for two
> reasons.
>
> 1) If skb happens to be NULL, the NULL dereference is trapped and stack
> trace dumped as well.
>
> 2) Of course, tpacket_destruct_skb() being an skb destructor, skb cannot
> be NULL at this point by design.
>
> Please remove the BUG_ON() instead of trying to move it ;)

Thanks, you're absolutely right! Here's the trivial patch:

af_packet: removed unnecessary BUG_ON assertion in tpacket_destruct_skb

If skb is NULL, then stack trace is thrown on anyway on dereference.  
Therefore,
the stack trace triggered by BUG_ON is duplicate.

Signed-off-by: Daniel Borkmann <danborkmann@googlemail.com>

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index fabb4fa..886ae50 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1170,8 +1170,6 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
  	struct packet_sock *po = pkt_sk(skb->sk);
  	void *ph;

-	BUG_ON(skb == NULL);
-
  	if (likely(po->tx_ring.pg_vec)) {
  		ph = skb_shinfo(skb)->destructor_arg;
  		BUG_ON(__packet_get_status(po, ph) != TP_STATUS_SENDING);

^ permalink raw reply related

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
From: Benjamin Herrenschmidt @ 2011-10-10  7:32 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin,
	netdev@vger.kernel.org, Eli Cohen, linuxppc-dev
In-Reply-To: <20111009103020.GL2681@mtldesk30>

On Sun, 2011-10-09 at 12:30 +0200, Eli Cohen wrote:

> > Ideally you want to avoid that swapping altogether and use the right
> > accessor that indicates that your register is BE to start with. IE.
> > remove the swab32 completely and then use something like 
> > iowrite32be() instead of writel().
> I agree, this looks better but does it work on memory mapped io or
> only on io pci space? All our registers are memory mapped...

The iomap functions work on both.

> > Basically, the problem you have is that writel() has an implicit "write
> > to LE register" semantic. Your register is BE. the "iomap" variants
> > provide you with more fine grained "be" variants to use in that case.
> > There's also writel_be() but that one doesn't exist on every
> > architecture afaik.
> So writel_be is the function I should use for memory mapped io? If it
> does not exist for all platforms it's a pitty :-(

Just use the iomap variant. Usually you also use pci_iomap() instead of
ioremap() but afaik, for straight MMIO, it works with normal ioremap as
well.

> > Now, once the mmio problem is out of the way, let's look back at how you
> > then use that qpn.
> > 
> > With the current code, you've generated something in memory which is
> > byte reversed, so essentially "LE" on ppc and "BE" on x86.
> > 
> > Then, this statement:
> > 
> > *(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
> > 
> > Will essentially write it out as-is in memory for use by the chip. The chip,
> > from what you say, expects BE, so this will be broken on PPC.
> I see. So this field is layed in le for ppc and the rest of the
> descriptor is be. so I assum that __iowrite64_copy() does not swap
> anything but we still have tx_desc->ctrl.vlan_tag in the wrong
> endianess.

Yes because you had swapped it initially. IE your original swab32 is
what busts it for you on ppc.

> > Here too, the right solution is to instead not do that swab32 to begin
> > with (ring->doorbell_qpn remains a native endian value) and instead do,
> > in addition to the above mentioned change to the writel:
> > 
> > 	*(u32 *) (&tx_desc->ctrl.vlan_tag) |= cpu_to_be32(ring->doorbell_qpn);
> > 
> > (Also get rid of that cast and define vlan_tag as a __be32 to start
> > with).
> > 
> > Cheers,
> > Ben.
> 
> Thanks for your review. I will send another patch which should fix the
> deficiencies.

Cheers,
Ben.

^ permalink raw reply

* Re: [RFC] truesize lies
From: Eric Dumazet @ 2011-10-10  6:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20111009.181023.328192272226333109.davem@davemloft.net>

Le dimanche 09 octobre 2011 à 18:10 -0400, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 09 Oct 2011 23:55:20 +0200
> 
> > Some drivers splits a page in two (or more) pieces, so we cant know what
> > was really reserved by a driver for paged skbs.
> > 
> > Only thing we could enforce at the moment is the proper accounting for
> > skb head :
> > 
> > WARN_ON(skb->truesize < (skb_end_pointer(skb) - skb->head) + sizeof(sk_buff) + skb->data_len);
> 
> This is partly true.
> 
> Drivers that use page pools divide pages up into different pools, each
> with some specific block size.  At least this is how NIU works.
> 
> So NIU knows exactly how much of the block is logically part of that
> SKB yet unused.
> 
> And if we really wanted to we could add a frag[].reserved field that
> keeps track of this for debugging.

Hmm, this would still be possible for a driver to lie and consume lot of
ram on 64bit arches.

How about using a "struct page" field then, since for a given page, we
can assume/enforce it was divided in equal units (kind of a negative
order) by a driver.

Instead of using alloc_page(order) a driver could use

alloc_page_truesize(gfp_t mask, int page_order, int subunit_order)

For example, an x86 driver using 4KB page splitted in two pieces would
use alloc_page_truesize(gfp, 0, 1)

To get frag[i].reserved (aka truesize), we then would do

static inline int frag_truesize(const skb_frag_t *frag)
{
	return compound_head(frag->page)->truesize;
}

(Not even sure compound_head() is even necessary here, but I see NIU
uses it in niu_rbr_add_page() ?)

^ permalink raw reply

* Re: [PATCH net] vlan:make mtu of vlan equal to physical dev
From: David Miller @ 2011-10-10  4:00 UTC (permalink / raw)
  To: panweiping3; +Cc: netdev, herbert
In-Reply-To: <bc33a6142ce48b71b9c232a9154ed76f6048f0cb.1318068629.git.panweiping3@gmail.com>

From: Weiping Pan <panweiping3@gmail.com>
Date: Sat,  8 Oct 2011 18:12:15 +0800

[ Herbert, Weiping is suggesting to remove the "vlandev->mtu <= dev->mtu"
  test in your patch below. ]

> Default mtu of vlan device is the same with mtu of physical device,
> for example 1500, but when change physics mtu to 1600,
> VLAN device's mtu is still 1500.
> Certainly, you can change vlan device's mtu to 1600 manually,
> but I think when you change physics device's mtu, VLAN's mtu should be changed
> automatically instead of by manually.
> 
> Steps to Reproduce:
> 1.vconfig add eth4 3
> 2.ifconfig eth4 mtu 1600
> 3.check mtu on eth4.3
> 
> And what's worse is that if you decrease mtu of pyhsical device,
> and when you want to increase it, the mtu of vlan device won't be changed.
> 
> Steps to Reproduce:
> 1.vconfig add eth4 3
> 2.ifconfig eth4 mtu 100
> 3.ifconfig eth4 mtu 1500
> 4.the mtu of eth4.3 is still 100
> 
> This bug is reported by Liang Zheng(lzheng@redhat.com).
> 
> Signed-off-by: Weiping Pan <panweiping3@gmail.com>

But now this setting is unconditional, and thus value user made
settings will be ignored.

The code that is there now is just a safety measure, and the
test appears to very much be intentional:

--------------------
commit 2e477c9bd2bb6a1606e498adb53ba913378ecdf2
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Mon Jul 20 07:35:37 2009 -0700

    vlan: Propagate physical MTU changes
    
    When the physical MTU changes we want to ensure that all existing
    VLAN device MTUs do not exceed the new underlying MTU.  This patch
    adds that propagation.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index fe64908..6d37b7e 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -468,6 +468,19 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event,
 		}
 		break;
 
+	case NETDEV_CHANGEMTU:
+		for (i = 0; i < VLAN_GROUP_ARRAY_LEN; i++) {
+			vlandev = vlan_group_get_device(grp, i);
+			if (!vlandev)
+				continue;
+
+			if (vlandev->mtu <= dev->mtu)
+				continue;
+
+			dev_set_mtu(vlandev, dev->mtu);
+		}
+		break;
+
 	case NETDEV_FEAT_CHANGE:
 		/* Propagate device features to underlying device */
 		for (i = 0; i < VLAN_GROUP_ARRAY_LEN; i++) {

^ permalink raw reply related

* Re: [PATCH 2/2 net-next] r6040: bump version to 0.28 and date to 07Oct2011.
From: David Miller @ 2011-10-10  3:51 UTC (permalink / raw)
  To: florian; +Cc: netdev
In-Reply-To: <201110071136.28266.florian@openwrt.org>

From: Florian Fainelli <florian@openwrt.org>
Date: Fri, 7 Oct 2011 11:36:28 +0200

> Signed-off-by: Florian Fainelli <florian@openwrt.org>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2 net-next] r6040: invoke phy_{start,stop} when appropriate
From: David Miller @ 2011-10-10  3:51 UTC (permalink / raw)
  To: florian; +Cc: netdev
In-Reply-To: <201110071136.22107.florian@openwrt.org>

From: Florian Fainelli <florian@openwrt.org>
Date: Fri, 7 Oct 2011 11:36:22 +0200

> Joe reported to me that right after a bring up of a r6040 interface
> the ethtool output had no consistent output with respect to link duplex
> and speed. Fix this by adding a missing phy_start call in r6040_up and
> conversely a phy_stop call in r6040_down to properly initialize phy states.
> 
> Reported-by: Joe Chou <Joe.Chou@rdc.com.tw>
> Signed-off-by: Florian Fainelli <florian@openwrt.org>

Applied.

^ permalink raw reply

* Re: [PATCH 0/7] mlx4_en: bug fixes and performance enhancement
From: David Miller @ 2011-10-10  3:44 UTC (permalink / raw)
  To: alexg; +Cc: netdev, yevgenyp
In-Reply-To: <4E91BD1B.2040902@mellanox.com>

From: Alexander Guller <alexg@mellanox.com>
Date: Sun, 9 Oct 2011 17:26:19 +0200

> This is a series of patches for the mlx4_en driver generated against
> net-next tree.  The patches include several bug fixes and a fix for
> a better initialization of TX interrupts.

Applied.

For some reason another copy of patch #3 was appended after the
end of patch #4.  Make sure that doesn't happen next time.

^ permalink raw reply

* Re: [PATCH] bluetooth: Properly clone LSM attributes to newly created child connections
From: James Morris @ 2011-10-10  0:31 UTC (permalink / raw)
  To: Paul Moore; +Cc: netdev, linux-security-module, selinux
In-Reply-To: <20111007194059.12345.13398.stgit@sifl>

On Fri, 7 Oct 2011, Paul Moore wrote:

> Reported-by: James M. Cape <jcape@ignore-your.tv>
> Signed-off-by: Paul Moore <pmoore@redhat.com>

Acked-by: James Morris <jmorris@namei.org>


-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply

* Re: [RFC] truesize lies
From: David Miller @ 2011-10-09 22:10 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1318197320.21116.11.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 09 Oct 2011 23:55:20 +0200

> Some drivers splits a page in two (or more) pieces, so we cant know what
> was really reserved by a driver for paged skbs.
> 
> Only thing we could enforce at the moment is the proper accounting for
> skb head :
> 
> WARN_ON(skb->truesize < (skb_end_pointer(skb) - skb->head) + sizeof(sk_buff) + skb->data_len);

This is partly true.

Drivers that use page pools divide pages up into different pools, each
with some specific block size.  At least this is how NIU works.

So NIU knows exactly how much of the block is logically part of that
SKB yet unused.

And if we really wanted to we could add a frag[].reserved field that
keeps track of this for debugging.

^ permalink raw reply

* Re: [RFC] truesize lies
From: Eric Dumazet @ 2011-10-09 21:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20111009.172739.2151675769111763947.davem@davemloft.net>

Le dimanche 09 octobre 2011 à 17:27 -0400, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 09 Oct 2011 21:02:28 +0200
> 
> > I guess most driver writers adjust truesize to please TCP stack and get
> > nice performance numbers.
> > 
> > skb->truesize = frame_len + sizeof(sk_buff);
> > 
> > What should we do exactly ?
> 
> We should pick a specification that accounts for all the metadata as well
> as unused data areas properly, and enforce this in netif_receive_skb().
> 
> That will get the inaccurate cases fixed real fast.
> 
> We've had similar problems in the past and dealt with it in the same
> exact way.


Some drivers splits a page in two (or more) pieces, so we cant know what
was really reserved by a driver for paged skbs.

Only thing we could enforce at the moment is the proper accounting for
skb head :

WARN_ON(skb->truesize < (skb_end_pointer(skb) - skb->head) + sizeof(sk_buff) + skb->data_len);

^ permalink raw reply

* Re: [RFC] truesize lies
From: David Miller @ 2011-10-09 21:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1318186948.5276.49.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 09 Oct 2011 21:02:28 +0200

> I guess most driver writers adjust truesize to please TCP stack and get
> nice performance numbers.
> 
> skb->truesize = frame_len + sizeof(sk_buff);
> 
> What should we do exactly ?

We should pick a specification that accounts for all the metadata as well
as unused data areas properly, and enforce this in netif_receive_skb().

That will get the inaccurate cases fixed real fast.

We've had similar problems in the past and dealt with it in the same
exact way.

^ permalink raw reply

* Re: [PATCH] af_packet: tpacket_destruct_skb, deref skb after BUG_ON assertion
From: Eric Dumazet @ 2011-10-09 20:57 UTC (permalink / raw)
  To: danborkmann; +Cc: David S. Miller, netdev
In-Reply-To: <20111009171919.10922hrx8qjm2f7b@webmail.your-server.de>

Le dimanche 09 octobre 2011 à 17:19 +0200, danborkmann@iogearbox.net a
écrit :
> This tiny patch derefs the skb only after BUG_ON(skb==NULL) was evaluated
> and not before. Patched against latest Linus tree.
> 
> Thanks,
> Daniel
> 
> Signed-off-by: Daniel Borkmann <danborkmann@iogearbox.net>
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index fabb4fa..d9d833b 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1167,11 +1167,12 @@ ring_is_full:
> 
>   static void tpacket_destruct_skb(struct sk_buff *skb)
>   {
> -	struct packet_sock *po = pkt_sk(skb->sk);
> +	struct packet_sock *po;
>   	void *ph;
> 
>   	BUG_ON(skb == NULL);
> 
> +	po = pkt_sk(skb->sk);
>   	if (likely(po->tx_ring.pg_vec)) {
>   		ph = skb_shinfo(skb)->destructor_arg;
>   		BUG_ON(__packet_get_status(po, ph) != TP_STATUS_SENDING);
> 
> 

Well, to be honest, this BUG_ON(!skb) is absolutely useless for two
reasons.

1) If skb happens to be NULL, the NULL dereference is trapped and stack
trace dumped as well.

2) Of course, tpacket_destruct_skb() being an skb destructor, skb cannot
be NULL at this point by design.

Please remove the BUG_ON() instead of trying to move it ;)

Thanks

^ permalink raw reply

* [PATCH V2] Add platform driver support to the CS890x driver
From: Jaccon Bastiaansen @ 2011-10-09 20:51 UTC (permalink / raw)
  To: netdev; +Cc: s.hauer, Uwe Kleine-König, kernel
In-Reply-To: <1317148076-14931-1-git-send-email-jaccon.bastiaansen@gmail.com>

Hello,

This patch hasn't been sent to the netdev mailing list before, sorry for that.

Jaccon

---------- Forwarded message ----------
From: Jaccon Bastiaansen <jaccon.bastiaansen@gmail.com>
Date: 2011/9/27
Subject: [PATCH V2] Add platform driver support to the CS890x driver
To: s.hauer@pengutronix.de, u.kleine-koenig@pengutronix.de,
kernel@pengutronix.de
Cc: linux-arm-kernel@lists.infradead.org, Jaccon Bastiaansen
<jaccon.bastiaansen@gmail.com>


The CS89x0 ethernet controller is used on a number of evaluation
boards, such as the MX31ADS. The current driver has memory address and
IRQ settings for each board on which this controller is used. Driver
updates are therefore required to support other boards that also use
the CS89x0. To avoid these driver updates, a better mechanism
(platform driver support) is added to communicate the board dependent
settings to the driver.

Signed-off-by: Jaccon Bastiaansen <jaccon.bastiaansen@gmail.com>
---
 arch/arm/mach-imx/mach-mx31ads.c               |   40 ++++++-
 arch/arm/mach-ixp2000/ixdp2x01.c               |   24 +++-
 arch/arm/mach-ixp2000/pci.c                    |    3 +-
 arch/arm/mach-ixp23xx/ixdp2351.c               |   21 +++
 arch/arm/mach-ixp23xx/pci.c                    |    3 +-
 arch/arm/plat-mxc/include/mach/board-mx31ads.h |   33 -----
 drivers/net/Kconfig                            |   16 ++-
 drivers/net/Space.c                            |    2 +-
 drivers/net/cs89x0.c                           |  175 +++++++++++++++---------
 9 files changed, 210 insertions(+), 107 deletions(-)
 delete mode 100644 arch/arm/plat-mxc/include/mach/board-mx31ads.h

diff --git a/arch/arm/mach-imx/mach-mx31ads.c b/arch/arm/mach-imx/mach-mx31ads.c
index f4dee02..007bd38 100644
--- a/arch/arm/mach-imx/mach-mx31ads.c
+++ b/arch/arm/mach-imx/mach-mx31ads.c
@@ -28,7 +28,6 @@
 #include <asm/memory.h>
 #include <asm/mach/map.h>
 #include <mach/common.h>
-#include <mach/board-mx31ads.h>
 #include <mach/iomux-mx3.h>

 #ifdef CONFIG_MACH_MX31ADS_WM1133_EV1
@@ -62,6 +61,7 @@
 #define PBC_INTMASK_CLEAR_REG  (PBC_INTMASK_CLEAR + PBC_BASE_ADDRESS)
 #define EXPIO_PARENT_INT       IOMUX_TO_IRQ(MX31_PIN_GPIO1_4)

+#define MXC_EXP_IO_BASE                (MXC_BOARD_IRQ_START)
 #define MXC_IRQ_TO_EXPIO(irq)  ((irq) - MXC_EXP_IO_BASE)

 #define EXPIO_INT_XUART_INTA   (MXC_EXP_IO_BASE + 10)
@@ -69,6 +69,18 @@

 #define MXC_MAX_EXP_IO_LINES   16

+
+/* Base address of PBC controller */
+#define PBC_BASE_ADDRESS        MX31_CS4_BASE_ADDR_VIRT
+
+/* Offsets for the PBC Controller register */
+
+/* Ethernet Controller IO addresses */
+#define PBC_CS8900A_IOBASE      0x020000
+#define CS8900A_START          (PBC_BASE_ADDRESS + PBC_CS8900A_IOBASE + 0x300)
+
+#define EXPIO_INT_ENET_INT     (MXC_EXP_IO_BASE + 8)
+
 /*
 * The serial port definition structure.
 */
@@ -101,11 +113,36 @@ static struct platform_device serial_device = {
       },
 };

+static struct resource mx31ads_cs8900_resources[] = {
+       {
+               .start  = (u32)CS8900A_START,
+               .end    = (u32)CS8900A_START + 0x1000 - 1,
+               .flags  = IORESOURCE_MEM,
+       },
+       {
+               .start  = EXPIO_INT_ENET_INT,
+               .end    = EXPIO_INT_ENET_INT,
+               .flags  = IORESOURCE_IRQ,
+       },
+};
+
+static struct platform_device mx31ads_cs8900_device = {
+       .name           = "cs89x0",
+       .id             = 0,
+       .num_resources  = ARRAY_SIZE(mx31ads_cs8900_resources),
+       .resource       = mx31ads_cs8900_resources,
+};
+
 static int __init mxc_init_extuart(void)
 {
       return platform_device_register(&serial_device);
 }

+static void __init mxc_init_ext_ethernet(void)
+{
+       platform_device_register(&mx31ads_cs8900_device);
+}
+
 static const struct imxuart_platform_data uart_pdata __initconst = {
       .flags = IMXUART_HAVE_RTSCTS,
 };
@@ -520,6 +557,7 @@ static void __init mx31ads_init(void)
       mxc_init_imx_uart();
       mxc_init_i2c();
       mxc_init_audio();
+       mxc_init_ext_ethernet();
 }

 static void __init mx31ads_timer_init(void)
diff --git a/arch/arm/mach-ixp2000/ixdp2x01.c b/arch/arm/mach-ixp2000/ixdp2x01.c
index 84835b2..1ec4a74 100644
--- a/arch/arm/mach-ixp2000/ixdp2x01.c
+++ b/arch/arm/mach-ixp2000/ixdp2x01.c
@@ -394,9 +394,31 @@ static struct platform_device ixdp2x01_i2c_controller = {
       .num_resources  = 0
 };

+static struct resource ixdp2x01_cs8900_resources[] = {
+       {
+               .start  = (u32)IXDP2X01_CS8900_VIRT_BASE,
+               .end    = (u32)IXDP2X01_CS8900_VIRT_BASE + 0x1000 - 1,
+               .flags  = IORESOURCE_MEM,
+       },
+       {
+               .start  = IRQ_IXDP2X01_CS8900,
+               .end    = IRQ_IXDP2X01_CS8900,
+               .flags  = IORESOURCE_IRQ,
+       },
+};
+
+static struct platform_device ixdp2x01_cs8900 = {
+       .name           = "cs89x0",
+       .id             = 0,
+       .num_resources  = ARRAY_SIZE(ixdp2x01_cs8900_resources),
+       .resource       = ixdp2x01_cs8900_resources,
+};
+
+
 static struct platform_device *ixdp2x01_devices[] __initdata = {
       &ixdp2x01_flash,
-       &ixdp2x01_i2c_controller
+       &ixdp2x01_i2c_controller,
+       &ixdp2x01_cs8900
 };

 static void __init ixdp2x01_init_machine(void)
diff --git a/arch/arm/mach-ixp2000/pci.c b/arch/arm/mach-ixp2000/pci.c
index f797c5f..54bf1b8 100644
--- a/arch/arm/mach-ixp2000/pci.c
+++ b/arch/arm/mach-ixp2000/pci.c
@@ -130,7 +130,8 @@ static struct pci_ops ixp2000_pci_ops = {
       .write  = ixp2000_pci_write_config
 };

-struct pci_bus *ixp2000_pci_scan_bus(int nr, struct pci_sys_data *sysdata)
+struct pci_bus __devinit
+*ixp2000_pci_scan_bus(int nr, struct pci_sys_data *sysdata)
 {
       return pci_scan_bus(sysdata->busnr, &ixp2000_pci_ops, sysdata);
 }
diff --git a/arch/arm/mach-ixp23xx/ixdp2351.c b/arch/arm/mach-ixp23xx/ixdp2351.c
index 8dcba17..d559110 100644
--- a/arch/arm/mach-ixp23xx/ixdp2351.c
+++ b/arch/arm/mach-ixp23xx/ixdp2351.c
@@ -311,6 +311,26 @@ static struct platform_device ixdp2351_flash = {
       .resource       = &ixdp2351_flash_resource,
 };

+static struct resource ixdp2351_cs8900_resources[] = {
+       {
+               .start  = (u32)IXDP2351_VIRT_CS8900_BASE,
+               .end    = (u32)IXDP2351_VIRT_CS8900_BASE + 0x1000 - 1,
+               .flags  = IORESOURCE_MEM,
+       },
+       {
+               .start  = IRQ_IXDP2351_CS8900,
+               .end    = IRQ_IXDP2351_CS8900,
+               .flags  = IORESOURCE_IRQ,
+       },
+};
+
+static struct platform_device ixdp2351_cs8900 = {
+       .name           = "cs89x0",
+       .id             = 0,
+       .num_resources  = ARRAY_SIZE(ixdp2351_cs8900_resources),
+       .resource       = ixdp2351_cs8900_resources,
+};
+
 static void __init ixdp2351_init(void)
 {
       platform_device_register(&ixdp2351_flash);
@@ -323,6 +343,7 @@ static void __init ixdp2351_init(void)
       IXP23XX_EXP_CS0[2] |= IXP23XX_FLASH_WRITABLE;
       IXP23XX_EXP_CS0[3] |= IXP23XX_FLASH_WRITABLE;

+       platform_device_register(&ixdp2351_cs8900);
       ixp23xx_sys_init();
 }

diff --git a/arch/arm/mach-ixp23xx/pci.c b/arch/arm/mach-ixp23xx/pci.c
index 563819a..f82bb91 100644
--- a/arch/arm/mach-ixp23xx/pci.c
+++ b/arch/arm/mach-ixp23xx/pci.c
@@ -141,7 +141,8 @@ struct pci_ops ixp23xx_pci_ops = {
       .write  = ixp23xx_pci_write_config,
 };

-struct pci_bus *ixp23xx_pci_scan_bus(int nr, struct pci_sys_data *sysdata)
+struct pci_bus __devinit
+*ixp23xx_pci_scan_bus(int nr, struct pci_sys_data *sysdata)
 {
       return pci_scan_bus(sysdata->busnr, &ixp23xx_pci_ops, sysdata);
 }
diff --git a/arch/arm/plat-mxc/include/mach/board-mx31ads.h
b/arch/arm/plat-mxc/include/mach/board-mx31ads.h
deleted file mode 100644
index 94b60dd..0000000
--- a/arch/arm/plat-mxc/include/mach/board-mx31ads.h
+++ /dev/null
@@ -1,33 +0,0 @@
-/*
- * Copyright 2005-2007 Freescale Semiconductor, Inc. All Rights Reserved.
- */
-
-/*
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#ifndef __ASM_ARCH_MXC_BOARD_MX31ADS_H__
-#define __ASM_ARCH_MXC_BOARD_MX31ADS_H__
-
-#include <mach/hardware.h>
-
-/*
- * These symbols are used by drivers/net/cs89x0.c.
- * This is ugly as hell, but we have to provide them until
- * someone fixed the driver.
- */
-
-/* Base address of PBC controller */
-#define PBC_BASE_ADDRESS        MX31_CS4_BASE_ADDR_VIRT
-/* Offsets for the PBC Controller register */
-
-/* Ethernet Controller IO base address */
-#define PBC_CS8900A_IOBASE      0x020000
-
-#define MXC_EXP_IO_BASE                (MXC_BOARD_IRQ_START)
-
-#define EXPIO_INT_ENET_INT     (MXC_EXP_IO_BASE + 8)
-
-#endif /* __ASM_ARCH_MXC_BOARD_MX31ADS_H__ */
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 93359fa..2873f86 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1497,8 +1497,7 @@ config FORCEDETH

 config CS89x0
       tristate "CS89x0 support"
-       depends on NET_ETHERNET && (ISA || EISA || MACH_IXDP2351 \
-               || ARCH_IXDP2X01 || MACH_MX31ADS || MACH_QQ2440)
+       depends on NET_ETHERNET
       ---help---
         Support for CS89x0 chipset based Ethernet cards. If you have a
         network (Ethernet) card of this type, say Y and read the
@@ -1509,10 +1508,15 @@ config CS89x0
         To compile this driver as a module, choose M here. The module
         will be called cs89x0.

-config CS89x0_NONISA_IRQ
-       def_bool y
-       depends on CS89x0 != n
-       depends on MACH_IXDP2351 || ARCH_IXDP2X01 || MACH_MX31ADS || MACH_QQ2440
+config CS89x0_PLATFORM
+       bool "CS89x0 platform driver support"
+       depends on CS89x0
+       help
+         Say Y to compile the cs890x0 driver as a platform driver. This
+         makes this driver suitable for use on certain evaluation boards
+         such as the IMX21ADS.
+
+         If you are unsure, say N.

 config TC35815
       tristate "TOSHIBA TC35815 Ethernet support"
diff --git a/drivers/net/Space.c b/drivers/net/Space.c
index 068c356..3c53ab1 100644
--- a/drivers/net/Space.c
+++ b/drivers/net/Space.c
@@ -189,7 +189,7 @@ static struct devprobe2 isa_probes[] __initdata = {
 #ifdef CONFIG_SEEQ8005
       {seeq8005_probe, 0},
 #endif
-#ifdef CONFIG_CS89x0
+#if defined(CONFIG_CS89x0) && !defined(CONFIG_CS89x0_PLATFORM)
       {cs89x0_probe, 0},
 #endif
 #ifdef CONFIG_AT1700
diff --git a/drivers/net/cs89x0.c b/drivers/net/cs89x0.c
index 537a4b2..b7fb3bc 100644
--- a/drivers/net/cs89x0.c
+++ b/drivers/net/cs89x0.c
@@ -12,6 +12,14 @@
        The author may be reached at nelson@crynwr.com, Crynwr
        Software, 521 Pleasant Valley Rd., Potsdam, NY 13676

+Sources
+
+       Crynwr packet driver epktisa.
+
+       Crystal Semiconductor data sheets.
+
+
+
  Changelog:

  Mike Cruse        : mcruse@cti-ltd.com
@@ -98,39 +106,14 @@
  Domenico Andreoli : cavokz@gmail.com
                    : QQ2440 platform support

+  Jaccon Bastiaansen: jaccon.bastiaansen@gmail.com
+                   : added platform driver support
 */

-/* Always include 'config.h' first in case the user wants to turn on
-   or override something. */
-#include <linux/module.h>
-
-/*
- * Set this to zero to disable DMA code
- *
- * Note that even if DMA is turned off we still support the 'dma' and
 'use_dma'
- * module options so we don't break any startup scripts.
- */
-#ifndef CONFIG_ISA_DMA_API
-#define ALLOW_DMA      0
-#else
-#define ALLOW_DMA      1
-#endif
-
-/*
- * Set this to zero to remove all the debug statements via
- * dead code elimination
- */
-#define DEBUGGING      1
-
-/*
-  Sources:
-
-       Crynwr packet driver epktisa.
-
-       Crystal Semiconductor data sheets.
-
-*/
+#define pr_fmt(fmt)    KBUILD_MODNAME ": " fmt

+#include <linux/module.h>
+#include <linux/printk.h>
 #include <linux/errno.h>
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
@@ -147,21 +130,36 @@
 #include <linux/bitops.h>
 #include <linux/delay.h>
 #include <linux/gfp.h>
-
 #include <asm/system.h>
 #include <asm/io.h>
 #include <asm/irq.h>
-#if ALLOW_DMA
+#include <linux/platform_device.h>
+#include "cs89x0.h"
+
+/*
+ * Set this to zero to disable DMA code
+ *
+ * Note that even if DMA is turned off we still support the 'dma' and
 'use_dma'
+ * module options so we don't break any startup scripts.
+ */
+#ifndef CONFIG_ISA_DMA_API
+#define ALLOW_DMA      0
+#else
+#define ALLOW_DMA      1
 #include <asm/dma.h>
 #endif

-#include "cs89x0.h"
-
-static char version[] __initdata =
-"cs89x0.c: v2.4.3-pre1 Russell Nelson <nelson@crynwr.com>, Andrew Morton\n";
+/*
+ * Set this to zero to remove all the debug statements via
+ * dead code elimination
+ */
+#define DEBUGGING      1

 #define DRV_NAME "cs89x0"

+static char version[] =
+"cs89x0.c: v2.4.3-pre1 Russell Nelson <nelson@crynwr.com>, Andrew Morton\n";
+
 /* First, a few definitions that the brave might change.
   A zero-terminated list of I/O addresses to be probed. Some special flags..
      Addr & 1 = Read back the address port, look for signature and reset
@@ -173,22 +171,9 @@ static char version[] __initdata =
 /* The cs8900 has 4 IRQ pins, software selectable. cs8900_irq_map maps
   them to system IRQ numbers. This mapping is card specific and is set to
   the configuration of the Cirrus Eval board for this chip. */
-#if defined(CONFIG_MACH_IXDP2351)
-static unsigned int netcard_portlist[] __used __initdata =
{IXDP2351_VIRT_CS8900_BASE, 0};
-static unsigned int cs8900_irq_map[] = {IRQ_IXDP2351_CS8900, 0, 0, 0};
-#elif defined(CONFIG_ARCH_IXDP2X01)
-static unsigned int netcard_portlist[] __used __initdata =
{IXDP2X01_CS8900_VIRT_BASE, 0};
-static unsigned int cs8900_irq_map[] = {IRQ_IXDP2X01_CS8900, 0, 0, 0};
-#elif defined(CONFIG_MACH_QQ2440)
-#include <mach/qq2440.h>
-static unsigned int netcard_portlist[] __used __initdata = {
QQ2440_CS8900_VIRT_BASE + 0x300, 0 };
-static unsigned int cs8900_irq_map[] = { QQ2440_CS8900_IRQ, 0, 0, 0 };
-#elif defined(CONFIG_MACH_MX31ADS)
-#include <mach/board-mx31ads.h>
-static unsigned int netcard_portlist[] __used __initdata = {
-       PBC_BASE_ADDRESS + PBC_CS8900A_IOBASE + 0x300, 0
-};
-static unsigned cs8900_irq_map[] = {EXPIO_INT_ENET_INT, 0, 0, 0};
+#if defined(CONFIG_CS89x0_PLATFORM)
+static unsigned int netcard_portlist[] __used __initdata = {0, 0};
+static unsigned int cs8900_irq_map[] = {0, 0, 0, 0};
 #else
 static unsigned int netcard_portlist[] __used __initdata =
   { 0x300, 0x320, 0x340, 0x360, 0x200, 0x220, 0x240, 0x260, 0x280,
0x2a0, 0x2c0, 0x2e0, 0};
@@ -424,7 +409,7 @@ writereg(struct net_device *dev, u16 regno, u16 value)
       writeword(dev->base_addr, DATA_PORT, value);
 }

-static int __init
+static int
 wait_eeprom_ready(struct net_device *dev)
 {
       int timeout = jiffies;
@@ -437,7 +422,7 @@ wait_eeprom_ready(struct net_device *dev)
       return 0;
 }

-static int __init
+static int
 get_eeprom_data(struct net_device *dev, int off, int len, int *buffer)
 {
       int i;
@@ -455,7 +440,7 @@ get_eeprom_data(struct net_device *dev, int off,
int len, int *buffer)
        return 0;
 }

-static int  __init
+static int
 get_eeprom_cksum(int off, int len, int *buffer)
 {
       int i, cksum;
@@ -503,7 +488,7 @@ static const struct net_device_ops net_ops = {
   Return 0 on success.
 */

-static int __init
+static int
 cs89x0_probe1(struct net_device *dev, int ioaddr, int modular)
 {
       struct net_local *lp = netdev_priv(dev);
@@ -529,9 +514,6 @@ cs89x0_probe1(struct net_device *dev, int ioaddr,
int modular)
               lp->force = g_cs89x0_media__force;
 #endif

-#if defined(CONFIG_MACH_QQ2440)
-               lp->force |= FORCE_RJ45 | FORCE_FULL;
-#endif
        }

       /* Grab the region so we can find another board if autoIRQ fails. */
@@ -737,7 +719,7 @@ cs89x0_probe1(struct net_device *dev, int ioaddr,
int modular)
       } else {
               i = lp->isa_config & INT_NO_MASK;
               if (lp->chip_type == CS8900) {
-#ifdef CONFIG_CS89x0_NONISA_IRQ
+#ifdef CONFIG_CS89x0_PLATFORM
                       i = cs8900_irq_map[0];
 #else
                       /* Translate the IRQ using the IRQ mapping table. */
@@ -1228,7 +1210,7 @@ net_open(struct net_device *dev)
       }
       else
       {
-#ifndef CONFIG_CS89x0_NONISA_IRQ
+#ifndef CONFIG_CS89x0_PLATFORM
               if (((1 << dev->irq) & lp->irq_map) == 0) {
                       printk(KERN_ERR "%s: IRQ %d is not in our map
of allowable IRQs, which is %x\n",
                               dev->name, dev->irq, lp->irq_map);
@@ -1746,7 +1728,7 @@ static int set_mac_address(struct net_device
*dev, void *p)
       return 0;
 }

-#ifdef MODULE
+#if defined(MODULE) && !defined(CONFIG_CS89x0_PLATFORM)

 static struct net_device *dev_cs89x0;

@@ -1900,7 +1882,74 @@ cleanup_module(void)
       release_region(dev_cs89x0->base_addr, NETCARD_IO_EXTENT);
       free_netdev(dev_cs89x0);
 }
-#endif /* MODULE */
+#endif /* MODULE && !CONFIG_CS89x0_PLATFORM */
+
+#ifdef CONFIG_CS89x0_PLATFORM
+static int cs89x0_platform_probe(struct platform_device *pdev)
+{
+       struct net_device *dev = alloc_etherdev(sizeof(struct net_local));
+       struct resource *mem_res;
+       struct resource *irq_res;
+       int err;
+
+       if (!dev)
+               return -ENODEV;
+
+       mem_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+       irq_res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
+       if (mem_res == NULL || irq_res == NULL) {
+               pr_warning("memory and/or interrupt resource missing.\n");
+               err = -ENOENT;
+               goto out;
+       }
+
+       cs8900_irq_map[0] = irq_res->start;
+       err = cs89x0_probe1(dev, mem_res->start, 0);
+       if (err) {
+               pr_warning("no cs8900 or cs8920 detected.\n");
+               goto out;
+       }
+
+       platform_set_drvdata(pdev, dev);
+       return 0;
+out:
+       free_netdev(dev);
+       return err;
+}
+
+static int cs89x0_platform_remove(struct platform_device *pdev)
+{
+       struct net_device *dev = platform_get_drvdata(pdev);
+
+       unregister_netdev(dev);
+       free_netdev(dev);
+       return 0;
+}
+
+static struct platform_driver cs89x0_platform_driver = {
+       .driver = {
+               .name   = DRV_NAME,
+               .owner  = THIS_MODULE,
+       },
+       .probe  = cs89x0_platform_probe,
+       .remove = cs89x0_platform_remove,
+};
+
+static int __init cs89x0_init(void)
+{
+       return platform_driver_register(&cs89x0_platform_driver);
+}
+
+module_init(cs89x0_init);
+
+static void __exit cs89x0_cleanup(void)
+{
+       platform_driver_unregister(&cs89x0_platform_driver);
+}
+
+module_exit(cs89x0_cleanup);
+
+#endif

 /*
 * Local variables:
--
1.7.1

^ permalink raw reply related

* Re: [RFC] truesize lies
From: Andi Kleen @ 2011-10-09 19:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1318186948.5276.49.camel@edumazet-laptop>

Eric Dumazet <eric.dumazet@gmail.com> writes:

> After some analysis, I found full sized tcp skbs (len=1440) had a 2864
> bytes truesize on my wifi adapter. Ouch...

It's probably more like 4096. slab rounds to powers of two.

> What should we do exactly ?

Give up? It's always been a lie.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply

* [RFC] truesize lies
From: Eric Dumazet @ 2011-10-09 19:02 UTC (permalink / raw)
  To: netdev

I noticed on my laptop a difference if I use wifi or wired internet
connectivity.

On wifi mode, I can see netstat -s giving sign of bad rcvbuf tuning :

   371 packets collapsed in receive queue due to low socket buffer

After some analysis, I found full sized tcp skbs (len=1440) had a 2864
bytes truesize on my wifi adapter. Ouch...

On tg3 adapter, truesize is MTU + NET_SKB_PAD + sizeof(sk_buff).

On some devices (say... NIU, but other drivers have same problem :
bnx2x, ), truesize is sizeof(sk_buff) + frame_length.

So if 'truesize' really means to account exact memory cost of frames
(including the empty space after used part), I would say :

1) NIU is lying (it should account in niu_rx_skb_append() not the used
part of the page, but reserved part : PAGE_SIZE/rbr_blocks_per_page)

2) Autotuning is a bit pessimistic : It works only if truesize is MSS +
sizeof(sk_buff) + 16 + MAX_TCP_HEADER


I guess most driver writers adjust truesize to please TCP stack and get
nice performance numbers.

skb->truesize = frame_len + sizeof(sk_buff);

What should we do exactly ?

^ permalink raw reply

* Re: [net-next 02/11] igb: Use node specific allocations for the q_vectors and rings
From: Andi Kleen @ 2011-10-09 18:08 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Alexander Duyck, netdev, gospo, sassmann
In-Reply-To: <1318056461-19562-3-git-send-email-jeffrey.t.kirsher@intel.com>

Jeff Kirsher <jeffrey.t.kirsher@intel.com> writes:
>  
>  	for (i = 0; i < adapter->num_tx_queues; i++) {
> -		ring = kzalloc(sizeof(struct igb_ring), GFP_KERNEL);
> +		if (orig_node == -1) {
> +			int cur_node = next_online_node(adapter->node);
> +			if (cur_node == MAX_NUMNODES)
> +				cur_node = first_online_node;

RR seems quite arbitrary. Who guarantees those nodes have any
relationship with the CPUs submitting on those queues? Or the node
the device is on.

Anyways if it's a good idea probably need to add a
dma_alloc_coherent_node() too

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox