Netdev List
* Re: ipv4: Simplify ARP hash function.
From: Martin Mares @ 2011-07-08 17:40 UTC
  To: David Miller; +Cc: netdev
In-Reply-To: <20110708.101056.89960389404725087.davem@davemloft.net>

Hello!

> Using Jenkins is over the top.
> 
> If the premise is that the hash_rnd is a random unpredictable key,
> then:
> 
> 	key ^ dev->ifindex ^ hash_rnd
> 
> results in an unpredictable hash result, even if an attacker
> controls 'key' and 'dev->ifindex' completely.
> 
> Therefore, if this hash result is unpredictable, then the
> final fold phase of:
> 
> 	(val >> 8) ^ (val >> 16) ^ (val >> 24)
> 
> is unpredictable as well.

If I understand the new hash function correctly, it should be very easy
for an outside attacker to force arbitrary collisions.

The hash function is linear, so it can be reduced to:

	a = key ^ dev->ifindex
	return (a >> 8) ^ (a >> 16) ^ (a >> 24)				// (1)
	     ^ (hash_rnd >> 8) ^ (hash_rnd >> 16) ^ (hash_rnd >> 24)	// (2)

Here (1) is under the attacker's control and (2) is not, but the
only effect of (2) is a random permutation of the hash buckets.

I.e., the attacker can generate arbitrarily long collision chains,
although he cannot pick the bucket where the collisions happen :)
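The reduction above can be checked mechanically. The C sketch below is an illustration only: `fold_orig()` and `fold_orig_is_linear()` are hypothetical names, the fold is `(v >> 8) ^ (v >> 16) ^ (v >> 24)` as quoted from the proposal, and 32-bit unsigned arithmetic is assumed. Because a logical right shift distributes over XOR, `fold_orig(a ^ r)` should equal `fold_orig(a) ^ fold_orig(r)` for every pair.

```c
#include <stdint.h>

/* Fold from the original proposal (hypothetical helper name). */
static uint32_t fold_orig(uint32_t v)
{
	return (v >> 8) ^ (v >> 16) ^ (v >> 24);
}

/* Returns 1 if fold_orig(a ^ r) == fold_orig(a) ^ fold_orig(r) holds
 * for n pseudo-random pairs (a, r), 0 on the first counterexample. */
static int fold_orig_is_linear(unsigned n)
{
	uint32_t a = 0x12345678u, r = 0x9abcdef0u;

	while (n--) {
		if (fold_orig(a ^ r) != (fold_orig(a) ^ fold_orig(r)))
			return 0;
		/* Step both values with simple LCGs; any sequence would do. */
		a = a * 1664525u + 1013904223u;
		r = r * 22695477u + 1u;
	}
	return 1;
}
```

For a 256-slot table the index is `fold_orig(a) & 0xff` XORed with the constant `fold_orig(hash_rnd) & 0xff`; XOR with a constant permutes the buckets but never separates keys that already collide.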

Am I right?

				Have a nice fortnight
-- 
Martin `MJ' Mares                          <mj@ucw.cz>   http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
If going to a church makes you a Christian, does going to a garage make you a car?

* Re: ipv4: Simplify ARP hash function.
From: David Miller @ 2011-07-08 17:47 UTC
  To: mj; +Cc: netdev
In-Reply-To: <mj+md-20110708.173556.16517.nikam@ucw.cz>

From: Martin Mares <mj@ucw.cz>
Date: Fri, 8 Jul 2011 19:40:55 +0200

> The hash function is linear, so it can be reduced to:
> 
> 	a = key ^ dev->ifindex
> 	return (a >> 8) ^ (a >> 16) ^ (a >> 24)				// (1)
> 	     ^ (hash_rnd >> 8) ^ (hash_rnd >> 16) ^ (hash_rnd >> 24)	// (2)

Is this really the same?  The inclusion of a full 32-bit xor
with hash_rnd before folding was intentional, so that the
final folding occurs on a completely "random" value.

> Where (1) is under control of the attacker and while (2) is not, the
> only effect of (2) is a random permutation on the hash buckets.
> 
> I.e., the attacker can generate arbitrarily long collision chains,
> although he cannot pick the bucket where the collisions happen :)
> 
> Am I right?

Please give an example :-)

* Re: ipv4: Simplify ARP hash function.
From: David Miller @ 2011-07-08 17:54 UTC
  To: mj; +Cc: netdev
In-Reply-To: <20110708.104739.169518036069870432.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Fri, 08 Jul 2011 10:47:39 -0700 (PDT)

> From: Martin Mares <mj@ucw.cz>
> Date: Fri, 8 Jul 2011 19:40:55 +0200
> 
>> The hash function is linear, so it can be reduced to:
>> 
>> 	a = key ^ dev->ifindex
>> 	return (a >> 8) ^ (a >> 16) ^ (a >> 24)				// (1)
>> 	     ^ (hash_rnd >> 8) ^ (hash_rnd >> 16) ^ (hash_rnd >> 24)	// (2)
> 
> Is this really the same?  The inclusion of a full 32-bit xor
> with hash_rnd before folding was intentional, so that the
> final folding occurs on a completely "random" value.

For example, try out this test program.  Run it as "./x ${RANDOM_VALUE}";
it shows that an attacker cannot simply iterate by the number of
hash table slots to create collisions, assuming a hash table of
256 slots:

--------------------
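#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argp)
{
	unsigned int i, rnd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s RANDOM_VALUE\n", argp[0]);
		return 1;
	}
	rnd = strtoul(argp[1], NULL, 0);
	for (i = 1; i < (64 * 1024); i += 256) {
		unsigned int x = (i ^ rnd);

		/* Fold the upper bytes down into the low byte. */
		x ^= (x >> 8) ^ (x >> 16) ^ (x >> 24);

		x &= 0xff;

		printf("%u\n", x);
	}
	return 0;
}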


* Re: ipv4: Simplify ARP hash function.
From: John Heffner @ 2011-07-08 18:03 UTC
  To: David Miller; +Cc: mj, netdev
In-Reply-To: <20110708.104739.169518036069870432.davem@davemloft.net>

On Fri, Jul 8, 2011 at 1:47 PM, David Miller <davem@davemloft.net> wrote:
> From: Martin Mares <mj@ucw.cz>
> Date: Fri, 8 Jul 2011 19:40:55 +0200
>
>> The hash function is linear, so it can be reduced to:
>>
>>       a = key ^ dev->ifindex
>>       return (a >> 8) ^ (a >> 16) ^ (a >> 24)                         // (1)
>>            ^ (hash_rnd >> 8) ^ (hash_rnd >> 16) ^ (hash_rnd >> 24)    // (2)
>
> Is this really the same?  The inclusion of a full 32-bit xor
> with hash_rnd before folding was intentional, so that the
> final folding occurs on a completely "random" value.

Martin's reduction looks exactly correct to me.

  -John

* Re: ipv4: Simplify ARP hash function.
From: David Miller @ 2011-07-08 18:06 UTC
  To: johnwheffner; +Cc: mj, netdev
In-Reply-To: <CABrhC0kaRCpvmKw=TdPZ+7dRFzdcrjF7hoc-VhbtAfXu=WBU_Q@mail.gmail.com>

From: John Heffner <johnwheffner@gmail.com>
Date: Fri, 8 Jul 2011 14:03:45 -0400

> On Fri, Jul 8, 2011 at 1:47 PM, David Miller <davem@davemloft.net> wrote:
>> From: Martin Mares <mj@ucw.cz>
>> Date: Fri, 8 Jul 2011 19:40:55 +0200
>>
>>> The hash function is linear, so it can be reduced to:
>>>
>>>       a = key ^ dev->ifindex
>>>       return (a >> 8) ^ (a >> 16) ^ (a >> 24)                         // (1)
>>>            ^ (hash_rnd >> 8) ^ (hash_rnd >> 16) ^ (hash_rnd >> 24)    // (2)
>>
>> Is this really the same?  The inclusion of a full 32-bit xor
>> with hash_rnd before folding was intentional, so that the
>> final folding occurs on a completely "random" value.
> 
> Martin's reduction looks exactly correct to me.

Ok, there was also an unintended bug in my original patch,
I lost the bottom 8 bits in the fold, the hash function
should instead be:

+static inline u32 arp_hashfn(u32 key, const struct net_device *dev, u32 hash_rnd)
+{
+	u32 val = key ^ dev->ifindex ^ hash_rnd;
+
+	return val ^ (val >> 8) ^ (val >> 16) ^ (val >> 24);
+}
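The effect of the lost bottom byte shows up with a 256-slot mask. In this sketch (hypothetical helper names; `ifindex` and `hash_rnd` are omitted because under this fold their contribution is only a constant XOR on the result), values that differ only in their lowest byte share a bucket under the old fold but are separated by the corrected one.

```c
#include <stdint.h>

/* Fold from the first version of the patch: byte 0 of v never reaches
 * the low 8 bits of the result. */
static uint32_t fold_old(uint32_t v)
{
	return (v >> 8) ^ (v >> 16) ^ (v >> 24);
}

/* Corrected fold, as in arp_hashfn() above. */
static uint32_t fold_new(uint32_t v)
{
	return v ^ (v >> 8) ^ (v >> 16) ^ (v >> 24);
}

/* Bucket index in a table of 2^bits slots (bits in 1..31). */
static uint32_t bucket(uint32_t (*fold)(uint32_t), uint32_t v, unsigned bits)
{
	return fold(v) & ((1u << bits) - 1);
}
```

With `bits = 8`, `0x12345600` and `0x123456ff` both map to bucket `0x70` under `fold_old`, while `fold_new` sends them to `0x70` and `0x8f`.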

* Re: [PATCH] net/fec: gasket needs to be enabled for some i.mx
From: Troy Kisky @ 2011-07-08 18:38 UTC
  To: Shawn Guo
  Cc: Sascha Hauer, netdev, linux-arm-kernel, u.kleine-koenig, LW,
	David S. Miller
In-Reply-To: <20110708101810.GE6069@pengutronix.de>

On 7/8/2011 3:18 AM, Sascha Hauer wrote:
> On Fri, Jul 01, 2011 at 06:11:22PM +0800, Shawn Guo wrote:
>> On the recent i.mx (mx25/50/53), there is a gasket inside fec
>> controller which needs to be enabled no matter phy works in MII
>> or RMII mode.
>>
>> The current code enables the gasket only when phy interface is RMII.
>> It's broken when the driver works with a MII phy.  The patch uses
>> platform_device_id to distinguish the SoCs that have the gasket and
>> enables it on these SoCs for both MII and RMII mode.
>>
>> Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>

While you're making changes, you can change this to
Reported-by: Troy Kisky

Thanks

>> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
>> Cc: David S. Miller <davem@davemloft.net>
>> Cc: Sascha Hauer <s.hauer@pengutronix.de>
>> ---
>>  arch/arm/mach-imx/clock-imx25.c                 |    2 +-
>>  arch/arm/mach-imx/clock-imx27.c                 |    2 +-
>>  arch/arm/mach-imx/clock-imx35.c                 |    2 +-
>>  arch/arm/mach-mx5/clock-mx51-mx53.c             |    4 +-
>>  arch/arm/plat-mxc/devices/platform-fec.c        |   17 ++++++++-------
>>  arch/arm/plat-mxc/include/mach/devices-common.h |    1 +
>>  drivers/net/fec.c                               |   26 ++++++++++++++++++++--
>>  7 files changed, 38 insertions(+), 16 deletions(-)
> 
> Just realized that this change breaks m68k support. You shouldn't remove
> DRIVER_NAME from fec_devtype[]
> 
> 
> Sascha
> 

* Re: ipv4: Simplify ARP hash function.
From: Roland Dreier @ 2011-07-08 19:26 UTC
  To: David Miller; +Cc: johnwheffner, mj, netdev
In-Reply-To: <20110708.110659.1816173367050101549.davem@davemloft.net>

On Fri, Jul 8, 2011 at 11:06 AM, David Miller <davem@davemloft.net> wrote:
> Ok, there was also an unintended bug in my original patch,
> I lost the bottom 8 bits in the fold, the hash function
> should instead be:
>
> +static inline u32 arp_hashfn(u32 key, const struct net_device *dev, u32 hash_rnd)
> +{
> +       u32 val = key ^ dev->ifindex ^ hash_rnd;
> +
> +       return val ^ (val >> 8) ^ (val >> 16) ^ (val >> 24);
> +}

Doesn't seem to matter much -- this is now equivalent to

      a = key ^ dev->ifindex
       return (a ^ (a >> 8) ^ (a >> 16) ^ (a >> 24))           // (1)
            ^ (rnd ^ (rnd >> 8) ^ (rnd >> 16) ^ (rnd >> 24))   // (2)

where again the attacker controls (1), and (2) is a constant.

* Re: ipv4: Simplify ARP hash function.
From: David Miller @ 2011-07-08 19:27 UTC
  To: roland; +Cc: johnwheffner, mj, netdev
In-Reply-To: <CAL1RGDWbCi_nE8tRp9zyX3Z1yBHKP9ygxNV6TNKp4Up+0g2EPA@mail.gmail.com>

From: Roland Dreier <roland@purestorage.com>
Date: Fri, 8 Jul 2011 12:26:17 -0700

> On Fri, Jul 8, 2011 at 11:06 AM, David Miller <davem@davemloft.net> wrote:
>> Ok, there was also an unintended bug in my original patch,
>> I lost the bottom 8 bits in the fold, the hash function
>> should instead be:
>>
>> +static inline u32 arp_hashfn(u32 key, const struct net_device *dev, u32 hash_rnd)
>> +{
>> +       u32 val = key ^ dev->ifindex ^ hash_rnd;
>> +
>> +       return val ^ (val >> 8) ^ (val >> 16) ^ (val >> 24);
>> +}
> 
> Doesn't seem to matter much -- this is now equivalent to
> 
>       a = key ^ dev->ifindex
>        return (a ^ (a >> 8) ^ (a >> 16) ^ (a >> 24))           // (1)
>             ^ (rnd ^ (rnd >> 8) ^ (rnd >> 16) ^ (rnd >> 24))   // (2)
> 
> where again the attacker controls (1), and (2) is a constant.

Right, but how can you attack it?  Show me how you can grow
a hash chain of arbitrary length by modulating the key in
a deterministic way.

Nobody has done this yet.

* Re: ipv4: Simplify ARP hash function.
From: Michał Mirosław @ 2011-07-08 19:39 UTC
  To: David Miller; +Cc: roland, johnwheffner, mj, netdev
In-Reply-To: <20110708.122742.1006323245708104141.davem@davemloft.net>

2011/7/8 David Miller <davem@davemloft.net>:
> From: Roland Dreier <roland@purestorage.com>
> Date: Fri, 8 Jul 2011 12:26:17 -0700
>
>> On Fri, Jul 8, 2011 at 11:06 AM, David Miller <davem@davemloft.net> wrote:
>>> Ok, there was also an unintended bug in my original patch,
>>> I lost the bottom 8 bits in the fold, the hash function
>>> should instead be:
>>>
>>> +static inline u32 arp_hashfn(u32 key, const struct net_device *dev, u32 hash_rnd)
>>> +{
>>> +       u32 val = key ^ dev->ifindex ^ hash_rnd;
>>> +
>>> +       return val ^ (val >> 8) ^ (val >> 16) ^ (val >> 24);
>>> +}
>>
>> Doesn't seem to matter much -- this is now equivalent to
>>
>>       a = key ^ dev->ifindex
>>        return (a ^ (a >> 8) ^ (a >> 16) ^ (a >> 24))           // (1)
>>             ^ (rnd ^ (rnd >> 8) ^ (rnd >> 16) ^ (rnd >> 24))   // (2)
>>
>> where again the attacker controls (1), and (2) is a constant.
>
> Right, but how can you attack it?  Show me how you can grow
> a hash chain of arbitrary length by modulating the key in
> a deterministic way.

For 256 buckets it's easy:

hash_index = b[0] ^ b[1] ^ b[2] ^ b[3];
(b[i] are bytes of the key)

With b[3] = b[0] ^ b[1] ^ b[2] you get 2^24 keys that hash to the same bucket.

Best Regards,
Michał Mirosław

* Re: ipv4: Simplify ARP hash function.
From: David Miller @ 2011-07-08 19:51 UTC
  To: mirqus; +Cc: roland, johnwheffner, mj, netdev
In-Reply-To: <CAHXqBFKE_hLvFgL1_7F+k+pQ0+tEuhBqeRUOtgaM1yrjvXQQww@mail.gmail.com>

From: Michał Mirosław <mirqus@gmail.com>
Date: Fri, 8 Jul 2011 21:39:18 +0200

> With b[3] = b[0] ^ b[1] ^ b[2] you get 2^24 keys that hash to the same bucket.

Ok, I'm convinced, thanks :-)

--------------------
#include <stdlib.h>
#include <stdio.h>

int hashfn(unsigned int key, unsigned int rnd)
{
	unsigned int x = key ^ rnd;

	x ^= (x >> 8) ^ (x >> 16) ^ (x >> 24);

	return x & 0xff;
}

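unsigned int count[256];

/* Set the low byte to b0 ^ b1 ^ b2 so that the XOR of all four key bytes
 * is zero: every such key then lands in the same 256-slot bucket. */
unsigned int collide(unsigned int key)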
{
	unsigned int b0 = key >> 24;
	unsigned int b1 = (key >> 16) & 0xff;
	unsigned int b2 = (key >> 8) & 0xff;

	key &= ~0xff;
	key |= (b0 ^ b1 ^ b2);

	return key;
}

int main(int argc, char **argp)
{
	unsigned int rnd, i;

	if (argc < 2) {
		fprintf(stderr, "usage: %s RANDOM_VALUE\n", argp[0]);
		return 1;
	}
	rnd = strtoul(argp[1], NULL, 0);

	for (i = 0; i < (64 * 1024); i++) {
		unsigned int key = i << 8;
		unsigned int hash;

		key = collide(key);
		hash = hashfn(key, rnd);
		printf("%u: %u\n", key, hash);
		count[hash]++;
	}
	for (i = 0; i < 256; i++)
		printf("COUNT[%3u]=%3u\n", i, count[i]);
	return 0;
}

* Re: ipv4: Simplify ARP hash function.
From: David Miller @ 2011-07-08 19:59 UTC
  To: mirqus; +Cc: roland, johnwheffner, mj, netdev
In-Reply-To: <20110708.125118.886216418938741383.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Fri, 08 Jul 2011 12:51:18 -0700 (PDT)

> From: Michał Mirosław <mirqus@gmail.com>
> Date: Fri, 8 Jul 2011 21:39:18 +0200
> 
>> With b[3] = b[0] ^ b[1] ^ b[2] you get 2^24 keys that hash to the same bucket.
> 
> Ok, I'm convinced, thanks :-)

Although, actually it's not this simple.  The attack doesn't work.

As they "attack" us, the ARP hash table grows and thus the hash mask
changes to match.  Then his old collisions won't collide any more.

We could even adjust the fold shifts as the table grows to make this
effect even more pronounced.

* Re: ipv4: Simplify ARP hash function.
From: Michał Mirosław @ 2011-07-08 20:10 UTC
  To: David Miller; +Cc: roland, johnwheffner, mj, netdev
In-Reply-To: <20110708.125912.1535057393082512441.davem@davemloft.net>

2011/7/8 David Miller <davem@davemloft.net>:
> From: David Miller <davem@davemloft.net>
> Date: Fri, 08 Jul 2011 12:51:18 -0700 (PDT)
>
>> From: Michał Mirosław <mirqus@gmail.com>
>> Date: Fri, 8 Jul 2011 21:39:18 +0200
>>
>>> With b[3] = b[0] ^ b[1] ^ b[2] you get 2^24 keys that hash to the same bucket.
>>
>> Ok, I'm convinced, thanks :-)
>
> Although, actually it's not this simple.  The attack doesn't work.
>
> As they "attack" us, the ARP hash table grows and thus the hash mask
> changes to match.  Then his old collisions won't collide any more.
>
> We could even adjust the fold shifts as the table grows to make this
> effect even more pronounced.

There will still be 2^32/n_buckets known values that hash to the same
bucket for every n_buckets. So if the attacker knows when and how the
hash size changes, he can adapt accordingly. It should be easier to
see when you get rid of the XOR [random, but] constant part.
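This point can be made concrete. Write the key bytes as k0 (least significant) through k3 and let F(v) = v ^ (v >> 8) ^ (v >> 16) ^ (v >> 24), the corrected fold from this thread. The low 16 bits of F(key) are zero exactly when k0 = 0 and k1 = k2 ^ k3. Because F is linear over XOR, every such key lands in bucket `F(ifindex ^ hash_rnd) & mask` for any mask up to 0xffff. That gives 2^16 keys that keep colliding as the table grows from 256 to 65536 slots. The helper names below are hypothetical, and the device is reduced to its ifindex.

```c
#include <stdint.h>

static uint32_t fold(uint32_t v)
{
	return v ^ (v >> 8) ^ (v >> 16) ^ (v >> 24);
}

/* Corrected arp_hashfn() with the device reduced to its ifindex, for a
 * table of (mask + 1) slots. */
static uint32_t arp_bucket(uint32_t key, uint32_t ifindex, uint32_t rnd,
			   uint32_t mask)
{
	return fold(key ^ ifindex ^ rnd) & mask;
}

/* The i-th (i < 65536) of 2^16 keys whose fold has 16 zero low bits:
 * k3 and k2 are taken from i, k1 = k2 ^ k3, k0 = 0. */
static uint32_t colliding_key(uint32_t i)
{
	uint32_t k3 = (i >> 8) & 0xff, k2 = i & 0xff;

	return (k3 << 24) | (k2 << 16) | ((k2 ^ k3) << 8);
}

/* Returns 1 if all 2^16 keys share one bucket for the given secret at
 * every mask 0xff, 0x1ff, ..., 0xffff, and 0 otherwise. */
static int collides_for_all_masks(uint32_t ifindex, uint32_t rnd)
{
	for (uint32_t mask = 0xff; mask <= 0xffff; mask = (mask << 1) | 1) {
		uint32_t target = arp_bucket(colliding_key(0), ifindex, rnd, mask);

		for (uint32_t i = 1; i < 65536; i++)
			if (arp_bucket(colliding_key(i), ifindex, rnd, mask) != target)
				return 0;
	}
	return 1;
}
```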

Best Regards,
Michał Mirosław

* Re: ipv4: Simplify ARP hash function.
From: Michał Mirosław @ 2011-07-08 20:34 UTC
  To: David Miller; +Cc: roland, johnwheffner, mj, netdev
In-Reply-To: <CAHXqBFL6fT2V0iFTN1mJmQjEVUEfGEAQVOS=Y46BREUr_KWoow@mail.gmail.com>

On 8 July 2011 22:10, Michał Mirosław <mirqus@gmail.com> wrote:
> 2011/7/8 David Miller <davem@davemloft.net>:
>> From: David Miller <davem@davemloft.net>
>> Date: Fri, 08 Jul 2011 12:51:18 -0700 (PDT)
>>
>>> From: Michał Mirosław <mirqus@gmail.com>
>>> Date: Fri, 8 Jul 2011 21:39:18 +0200
>>>
>>>> With b[3] = b[0] ^ b[1] ^ b[2] you get 2^24 keys that hash to the same bucket.
>>>
>>> Ok, I'm convinced, thanks :-)
>>
>> Although, actually it's not this simple.  The attack doesn't work.
>>
>> As they "attack" us, the ARP hash table grows and thus the hash mask
>> changes to match.  Then his old collisions won't collide any more.
>>
>> We could even adjust the fold shifts as the table grows to make this
>> effect even more pronounced.
>
> There will still be 2^32/n_buckets known values that hash to the same
> bucket for every n_buckets. So if the attacker knows when and how the
> hash size changes, he can adapt accordingly. It should be easier to
> see when you get rid of the XOR [random, but] constant part.

BTW, am I correct, that neighbour hash tables never shrink? Looking at
net/core/neighbour.c it seems that after the table reaches gc_thresh3
capacity, it is never reallocated again.

Best Regards,
Michał Mirosław

* Re: ipv4: Simplify ARP hash function.
From: David Miller @ 2011-07-08 20:35 UTC
  To: mirqus; +Cc: roland, johnwheffner, mj, netdev
In-Reply-To: <CAHXqBFKKRXFjr7y+v12ApuWTpjf-G3H5iBtfQg1gzC5iPfRO=g@mail.gmail.com>

From: Michał Mirosław <mirqus@gmail.com>
Date: Fri, 8 Jul 2011 22:34:22 +0200

> BTW, am I correct, that neighbour hash tables never shrink? Looking at
> net/core/neighbour.c it seems that after the table reaches gc_thresh3
> capacity, it is never reallocated again.

Currently, yes, but I plan to remove that limit.

* Re: ipv4: Simplify ARP hash function.
From: Roland Dreier @ 2011-07-08 20:44 UTC
  To: David Miller; +Cc: johnwheffner, mj, netdev
In-Reply-To: <20110708.122742.1006323245708104141.davem@davemloft.net>

>> Doesn't seem to matter much -- this is now equivalent to
>>
>>       a = key ^ dev->ifindex
>>        return (a ^ (a >> 8) ^ (a >> 16) ^ (a >> 24))           // (1)
>>             ^ (rnd ^ (rnd >> 8) ^ (rnd >> 16) ^ (rnd >> 24))   // (2)
>>
>> where again the attacker controls (1), and (2) is a constant.

> Right, but how can you attack it?  Show me how you can grow
> a hash chain of arbitrary length by modulating the key in
> a deterministic way.

Well, if two things hash to different buckets with the full hash
function, then they already hashed to different buckets without
the extra randomness.  So why bother with hash_rnd?

The answer is that you have to mix hash_rnd into the hash
in a nonlinear way, so that an attacker can't know if two values
end up in the same bucket or not.

With your hash function, the attacker can just compute the
hash (without hash_rnd) for all the values of key ^ ifindex
and then use all the values that end up in the same bucket.
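The attack described above needs no knowledge of hash_rnd. In this hypothetical sketch, the attacker computes the bucket of each candidate key with the secret taken as zero and keeps the keys that share cand[0]'s bucket. Those keys then share a bucket for whatever value s = ifindex ^ hash_rnd the kernel ends up using. The fold is the corrected one from this thread, and all names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t fold(uint32_t v)
{
	return v ^ (v >> 8) ^ (v >> 16) ^ (v >> 24);
}

/* Attacker side, no secret needed: copy into out[] every candidate whose
 * un-keyed bucket matches that of cand[0]. Returns the number of keys
 * copied; out must have room for n entries. */
static size_t pick_colliding(const uint32_t *cand, size_t n,
			     uint32_t *out, uint32_t mask)
{
	uint32_t target = fold(cand[0]) & mask;
	size_t m = 0;

	for (size_t i = 0; i < n; i++)
		if ((fold(cand[i]) & mask) == target)
			out[m++] = cand[i];
	return m;
}

/* Victim side: 1 if all m keys share a bucket under the secret
 * s = ifindex ^ hash_rnd, 0 otherwise. */
static int same_bucket(const uint32_t *keys, size_t m, uint32_t s,
		       uint32_t mask)
{
	for (size_t i = 1; i < m; i++)
		if ((fold(keys[i] ^ s) & mask) != (fold(keys[0] ^ s) & mask))
			return 0;
	return 1;
}
```

With the candidates 0..65535 and a 256-slot mask, the search keeps the 256 keys whose two low bytes are equal, and they stay together under any secret.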

 - R.

* Re: ipv4: Simplify ARP hash function.
From: David Miller @ 2011-07-08 22:32 UTC
  To: roland; +Cc: johnwheffner, mj, netdev
In-Reply-To: <CAL1RGDWTfchzBSv9rbgfg5DWvqB-Gi-RBpzDPGRUg96fNcf4Bw@mail.gmail.com>

From: Roland Dreier <roland@purestorage.com>
Date: Fri, 8 Jul 2011 13:44:42 -0700

> The answer is that you have to mix hash_rnd into the hash
> in a nonlinear way, so that an attacker can't know if two values
> end up in the same bucket or not.
> 
> With your hash function, the attacker can just compute the
> hash (without hash_rnd) for all the values of key ^ ifindex
> and then use all the values that end up in the same bucket.

Ok, thanks everyone for explaining things.

So what is the cheapest non-linear function we could use?

* network related lockdep splat in 3.0-rc6+
From: Ben Greear @ 2011-07-08 22:38 UTC
  To: netdev

This has some additional NFS patches as well, but is otherwise un-tainted.



=======================================================
[ INFO: possible circular locking dependency detected ]
3.0.0-rc6+ #13
-------------------------------------------------------
gnuserver/2266 is trying to acquire lock:
  (rcu_node_level_0){..-...}, at: [<ffffffff810a7439>] rcu_report_unblock_qs_rnp+0x52/0x72

but task is already holding lock:
  (&rq->lock){-.-.-.}, at: [<ffffffff81045da5>] sched_ttwu_pending+0x34/0x58

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&rq->lock){-.-.-.}:
        [<ffffffff8107b4d5>] lock_acquire+0xf4/0x14b
        [<ffffffff8147e451>] _raw_spin_lock+0x36/0x45
        [<ffffffff8103da13>] __task_rq_lock+0x5b/0x89
        [<ffffffff81046ab6>] wake_up_new_task+0x41/0x116
        [<ffffffff810496d5>] do_fork+0x207/0x2f1
        [<ffffffff81010d25>] kernel_thread+0x70/0x72
        [<ffffffff8146598d>] rest_init+0x21/0xd7
        [<ffffffff81aa9c76>] start_kernel+0x3bd/0x3c8
        [<ffffffff81aa92cd>] x86_64_start_reservations+0xb8/0xbc
        [<ffffffff81aa93d2>] x86_64_start_kernel+0x101/0x110

-> #2 (&p->pi_lock){-.-.-.}:
        [<ffffffff8107b4d5>] lock_acquire+0xf4/0x14b
        [<ffffffff8147e565>] _raw_spin_lock_irqsave+0x4e/0x60
        [<ffffffff810468d0>] try_to_wake_up+0x29/0x1a0
        [<ffffffff81046a54>] default_wake_function+0xd/0xf
        [<ffffffff8106751a>] autoremove_wake_function+0x13/0x38
        [<ffffffff810395d0>] __wake_up_common+0x49/0x7f
        [<ffffffff8103c79a>] __wake_up+0x34/0x48
        [<ffffffff810a74a9>] rcu_report_exp_rnp+0x50/0x89
        [<ffffffff810a8032>] __rcu_read_unlock+0x1e9/0x24e
        [<ffffffff8111a41e>] rcu_read_unlock+0x21/0x23
        [<ffffffff8111a556>] fget_light+0xa2/0xac
        [<ffffffff81128bc1>] do_sys_poll+0x1ff/0x3e5
        [<ffffffff81128f3b>] sys_poll+0x50/0xba
        [<ffffffff81484d52>] system_call_fastpath+0x16/0x1b

-> #1 (sync_rcu_preempt_exp_wq.lock){......}:
        [<ffffffff8107b4d5>] lock_acquire+0xf4/0x14b
        [<ffffffff8147e565>] _raw_spin_lock_irqsave+0x4e/0x60
        [<ffffffff8103c783>] __wake_up+0x1d/0x48
        [<ffffffff810a74a9>] rcu_report_exp_rnp+0x50/0x89
        [<ffffffff810a8c2c>] sync_rcu_preempt_exp_init.clone.0+0x3e/0x53
        [<ffffffff810a8d1c>] synchronize_rcu_expedited+0xdb/0x1c3
        [<ffffffff813c0db3>] synchronize_net+0x25/0x2e
        [<ffffffff813c3382>] rollback_registered_many+0xee/0x1e1
        [<ffffffff813c3489>] unregister_netdevice_many+0x14/0x55
        [<ffffffff813c361b>] default_device_exit_batch+0x98/0xb4
        [<ffffffff813bd95e>] ops_exit_list+0x46/0x4e
        [<ffffffff813bde8c>] cleanup_net+0xeb/0x17d
        [<ffffffff81061343>] process_one_work+0x230/0x41d
        [<ffffffff8106379f>] worker_thread+0x133/0x217
        [<ffffffff81066f9c>] kthread+0x7d/0x85
        [<ffffffff81485ee4>] kernel_thread_helper+0x4/0x10

-> #0 (rcu_node_level_0){..-...}:
        [<ffffffff8107ace2>] __lock_acquire+0xae6/0xdd5
        [<ffffffff8107b4d5>] lock_acquire+0xf4/0x14b
        [<ffffffff8147e451>] _raw_spin_lock+0x36/0x45
        [<ffffffff810a7439>] rcu_report_unblock_qs_rnp+0x52/0x72
        [<ffffffff810a7ff0>] __rcu_read_unlock+0x1a7/0x24e
        [<ffffffff8103d34d>] rcu_read_unlock+0x21/0x23
        [<ffffffff8103d3a2>] cpuacct_charge+0x53/0x5b
        [<ffffffff81044d04>] update_curr+0x11f/0x15a
        [<ffffffff81045a37>] enqueue_task_fair+0x46/0x22a
        [<ffffffff8103d2c0>] enqueue_task+0x61/0x68
        [<ffffffff8103d2ef>] activate_task+0x28/0x30
        [<ffffffff81040b3b>] ttwu_activate+0x12/0x34
        [<ffffffff81045d5f>] ttwu_do_activate.clone.4+0x2d/0x3f
        [<ffffffff81045db4>] sched_ttwu_pending+0x43/0x58
        [<ffffffff81045dd2>] scheduler_ipi+0x9/0xb
        [<ffffffff81021e10>] smp_reschedule_interrupt+0x25/0x27
        [<ffffffff81485c73>] reschedule_interrupt+0x13/0x20
        [<ffffffff813fce40>] rcu_read_unlock+0x21/0x23
        [<ffffffff813fd52d>] ip_queue_xmit+0x35e/0x3b1
        [<ffffffff8140f5f3>] tcp_transmit_skb+0x785/0x7c3
        [<ffffffff81411e23>] tcp_write_xmit+0x806/0x8f5
        [<ffffffff81411f63>] __tcp_push_pending_frames+0x20/0x4d
        [<ffffffff8140411f>] tcp_push+0x84/0x86
        [<ffffffff81406577>] tcp_sendmsg+0x674/0x775
        [<ffffffff81423d68>] inet_sendmsg+0x61/0x6a
        [<ffffffff813af67a>] __sock_sendmsg_nosec+0x58/0x61
        [<ffffffff813b0db5>] __sock_sendmsg+0x3d/0x48
        [<ffffffff813b1631>] sock_sendmsg+0xa3/0xbc
        [<ffffffff813b1bf5>] sys_sendto+0xfa/0x11f
        [<ffffffff81484d52>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

Chain exists of:
   rcu_node_level_0 --> &p->pi_lock --> &rq->lock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&rq->lock);
                                lock(&p->pi_lock);
                                lock(&rq->lock);
   lock(rcu_node_level_0);

  *** DEADLOCK ***

2 locks held by gnuserver/2266:
  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff81405f24>] tcp_sendmsg+0x21/0x775
  #1:  (&rq->lock){-.-.-.}, at: [<ffffffff81045da5>] sched_ttwu_pending+0x34/0x58

stack backtrace:
Pid: 2266, comm: gnuserver Not tainted 3.0.0-rc6+ #13
Call Trace:
  <IRQ>  [<ffffffff8107a13d>] print_circular_bug+0x1fe/0x20f
  [<ffffffff8107ace2>] __lock_acquire+0xae6/0xdd5
  [<ffffffff810a7439>] ? rcu_report_unblock_qs_rnp+0x52/0x72
  [<ffffffff8107b4d5>] lock_acquire+0xf4/0x14b
  [<ffffffff810a7439>] ? rcu_report_unblock_qs_rnp+0x52/0x72
  [<ffffffff8147e451>] _raw_spin_lock+0x36/0x45
  [<ffffffff810a7439>] ? rcu_report_unblock_qs_rnp+0x52/0x72
  [<ffffffff8147ea3b>] ? _raw_spin_unlock+0x45/0x52
  [<ffffffff810a7439>] rcu_report_unblock_qs_rnp+0x52/0x72
  [<ffffffff810a7f25>] ? __rcu_read_unlock+0xdc/0x24e
  [<ffffffff810a7ff0>] __rcu_read_unlock+0x1a7/0x24e
  [<ffffffff8103d34d>] rcu_read_unlock+0x21/0x23
  [<ffffffff8103d3a2>] cpuacct_charge+0x53/0x5b
  [<ffffffff81044d04>] update_curr+0x11f/0x15a
  [<ffffffff81045a37>] enqueue_task_fair+0x46/0x22a
  [<ffffffff8103d2c0>] enqueue_task+0x61/0x68
  [<ffffffff8103d2ef>] activate_task+0x28/0x30
  [<ffffffff81040b3b>] ttwu_activate+0x12/0x34
  [<ffffffff81045d5f>] ttwu_do_activate.clone.4+0x2d/0x3f
  [<ffffffff81045db4>] sched_ttwu_pending+0x43/0x58
  [<ffffffff81045dd2>] scheduler_ipi+0x9/0xb
  [<ffffffff81021e10>] smp_reschedule_interrupt+0x25/0x27
  [<ffffffff81485c73>] reschedule_interrupt+0x13/0x20
  <EOI>  [<ffffffff810a7e9b>] ? __rcu_read_unlock+0x52/0x24e
  [<ffffffff813fce40>] rcu_read_unlock+0x21/0x23
  [<ffffffff813fd52d>] ip_queue_xmit+0x35e/0x3b1
  [<ffffffff813fd1cf>] ? ip_send_reply+0x247/0x247
  [<ffffffff8140f5f3>] tcp_transmit_skb+0x785/0x7c3
  [<ffffffff81411e23>] tcp_write_xmit+0x806/0x8f5
  [<ffffffff810e646f>] ? might_fault+0x4e/0x9e
  [<ffffffff81403e25>] ? copy_from_user+0x2a/0x2c
  [<ffffffff81411f63>] __tcp_push_pending_frames+0x20/0x4d
  [<ffffffff8140411f>] tcp_push+0x84/0x86
  [<ffffffff81406577>] tcp_sendmsg+0x674/0x775
  [<ffffffff81423d68>] inet_sendmsg+0x61/0x6a
  [<ffffffff813af67a>] __sock_sendmsg_nosec+0x58/0x61
  [<ffffffff813b0db5>] __sock_sendmsg+0x3d/0x48
  [<ffffffff813b1631>] sock_sendmsg+0xa3/0xbc
  [<ffffffff810ea6ae>] ? handle_pte_fault+0x7fc/0x84d
  [<ffffffff8111a4e9>] ? fget_light+0x35/0xac
  [<ffffffff813b16b2>] ? sockfd_lookup_light+0x1b/0x53
  [<ffffffff813b1bf5>] sys_sendto+0xfa/0x11f
  [<ffffffff811326ac>] ? mntput_no_expire+0x52/0x109
  [<ffffffff81132784>] ? mntput+0x21/0x23
  [<ffffffff8111af94>] ? fput+0x1a3/0x1b2
  [<ffffffff8109f0a1>] ? audit_syscall_entry+0x119/0x145
  [<ffffffff81484d52>] system_call_fastpath+0x16/0x1b

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: network related lockdep splat in 3.0-rc6+
From: David Miller @ 2011-07-08 22:48 UTC
  To: greearb; +Cc: netdev
In-Reply-To: <4E1786D9.3020209@candelatech.com>

From: Ben Greear <greearb@candelatech.com>
Date: Fri, 08 Jul 2011 15:38:17 -0700

> This has some additional NFS patches as well, but is otherwise
> un-tainted.
> 

"networking related" when it's honking in the scheduler
wakeup paths?

* Re: ipv4: Simplify ARP hash function.
From: Roland Dreier @ 2011-07-08 23:11 UTC
  To: David Miller; +Cc: johnwheffner, mj, netdev
In-Reply-To: <20110708.153258.1997707802176810939.davem@davemloft.net>

On Fri, Jul 8, 2011 at 3:32 PM, David Miller <davem@davemloft.net> wrote:
> So what is the cheapest non-linear function we could use?

I'm not comfortable giving cryptographic advice, but even + (addition
with carry) is nonlinear when combined with ^.  However that seems
like the low-order bits might be too predictable.

Maybe * of hash key with a random odd value is good enough?
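Here is a sketch of the multiply-by-a-random-odd-value idea. It assumes the bucket index is taken from the top bits of the 32-bit product, as in multiplicative hashing. Taking the top bits matters because the low k bits of a product depend only on the low k bits of the key. The function name and the fixed multiplier are illustrative; a real table would draw the odd multiplier at random. Nothing here shows that the scheme resists a determined attacker. It only shows that multiplication is not linear over XOR. The keys 0x00000000 and 0x00000101, whose bytes XOR to the same value and which therefore collide under the XOR fold for any hash_rnd, are separated here.

```c
#include <stdint.h>

/* Multiplicative hash: (key ^ ifindex) times an odd secret, keeping the
 * top 'bits' bits of the 32-bit product (bits in 1..31). */
static uint32_t mul_bucket(uint32_t key, uint32_t ifindex, uint32_t odd_rnd,
			   unsigned bits)
{
	uint32_t product = (key ^ ifindex) * (odd_rnd | 1u);	/* mod 2^32 */

	return product >> (32 - bits);
}
```

With multiplier 0x9E3779B1 and a 256-slot table, key 0 maps to bucket 0 and key 0x101 to bucket 0xD5.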

 - R.

* [RFC PATCH] net: clean up rx_copybreak handling
From: Michał Mirosław @ 2011-07-08 23:27 UTC
  To: netdev

Things noticed:
 - some drivers drop front packets when they can't allocate new RX skb
 - tg3: left alone because of complexity
 - 82596: left alone because of virt_to_bus(); should probably be based on lib82596.c
 - sgiseeq: does it really loop back transmitted frames?

Patch side effects:
 - use ETH_FCS_LEN in some drivers
 - make rx_copybreak consistent: meaning copy packets <= threshold
 - implicit partial conversion to generic DMA API (from PCI specific calls)
 - DMA-unmap whole buffers instead of pkt_len (couple of drivers)
 - consistently use netdev_alloc_skb_ip_align() for copies

I made the common functions inline because they are called from hot path
and have a lot of arguments. This should allow the compiler to optimize
the calls depending on RX handler code.
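The copybreak decision these helpers centralize can be modelled in plain user-space C. This is a hypothetical sketch, not the `dev_skb_finish_rx_dma()` helper from the patch. `struct rx_slot` and `rx_finish()` are made-up names, DMA syncing and unmapping are left out, and `malloc()` stands in for skb allocation. The sketch follows the rule stated above: copy packets of length <= threshold.

```c
#include <stdlib.h>
#include <string.h>

/* One RX ring slot holding a full-sized receive buffer. */
struct rx_slot {
	unsigned char *buf;
	size_t size;
};

/* Hand a received packet of pkt_len bytes up the stack. Small packets
 * are copied into a fresh, tightly sized buffer so the ring buffer stays
 * in place. Large packets (or a failed allocation) give away the ring
 * buffer itself and leave slot->buf NULL for the caller to refill.
 * *copied is set to 1 when the packet was copied. */
static unsigned char *rx_finish(struct rx_slot *slot, size_t pkt_len,
				size_t copybreak, int *copied)
{
	unsigned char *pkt;

	if (pkt_len <= copybreak) {
		pkt = malloc(pkt_len);
		if (pkt) {
			memcpy(pkt, slot->buf, pkt_len);
			*copied = 1;
			return pkt;
		}
	}
	/* Pass the ring buffer up; the slot must be refilled. */
	pkt = slot->buf;
	slot->buf = NULL;
	*copied = 0;
	return pkt;
}
```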

Builds on x86 allyesconfig.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/net/3c59x.c             |   23 ++-----
 drivers/net/epic100.c           |   32 +++-------
 drivers/net/fealnx.c            |   39 ++----------
 drivers/net/hamachi.c           |   43 +++-----------
 drivers/net/ibmveth.c           |    2 +-
 drivers/net/lib82596.c          |   66 ++++----------------
 drivers/net/natsemi.c           |   30 ++--------
 drivers/net/pcnet32.c           |   52 +++-------------
 drivers/net/sgiseeq.c           |   66 ++++++++------------
 drivers/net/sis190.c            |   38 ++----------
 drivers/net/starfire.c          |   27 +++------
 drivers/net/sundance.c          |   26 ++------
 drivers/net/tg3.c               |    1 +
 drivers/net/tulip/de2104x.c     |   38 +++---------
 drivers/net/tulip/interrupt.c   |   77 +++++------------------
 drivers/net/tulip/winbond-840.c |   25 ++------
 drivers/net/typhoon.c           |   26 ++------
 drivers/net/via-rhine.c         |   37 ++----------
 drivers/net/via-velocity.c      |   59 +++---------------
 drivers/net/yellowfin.c         |   27 ++------
 include/linux/skbuff.h          |  129 +++++++++++++++++++++++++++++++++++++++
 21 files changed, 290 insertions(+), 573 deletions(-)

diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
index 8cc2256..456726c 100644
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -2576,25 +2576,14 @@ boomerang_rx(struct net_device *dev)
 				pr_debug("Receiving packet size %d status %4.4x.\n",
 					   pkt_len, rx_status);
 
-			/* Check if the packet is long enough to just accept without
-			   copying to a properly sized skbuff. */
-			if (pkt_len < rx_copybreak && (skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
-				skb_reserve(skb, 2);	/* Align IP on 16 byte boundaries */
-				pci_dma_sync_single_for_cpu(VORTEX_PCI(vp), dma, PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
-				/* 'skb_put()' points to the start of sk_buff data area. */
-				memcpy(skb_put(skb, pkt_len),
-					   vp->rx_skbuff[entry]->data,
-					   pkt_len);
-				pci_dma_sync_single_for_device(VORTEX_PCI(vp), dma, PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
+			skb = dev_skb_finish_rx_dma(&vp->rx_skbuff[entry],
+				pkt_len, rx_copybreak,
+				&VORTEX_PCI(vp)->dev, dma, PKT_BUF_SZ);
+			if (skb)
 				vp->rx_copy++;
-			} else {
-				/* Pass up the skbuff already on the Rx ring. */
-				skb = vp->rx_skbuff[entry];
-				vp->rx_skbuff[entry] = NULL;
-				skb_put(skb, pkt_len);
-				pci_unmap_single(VORTEX_PCI(vp), dma, PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
+			else
 				vp->rx_nocopy++;
-			}
+
 			skb->protocol = eth_type_trans(skb, dev);
 			{					/* Use hardware checksum info. */
 				int csum_bits = rx_status & 0xee000000;
diff --git a/drivers/net/epic100.c b/drivers/net/epic100.c
index 814c187..0a22072 100644
--- a/drivers/net/epic100.c
+++ b/drivers/net/epic100.c
@@ -1188,37 +1188,21 @@ static int epic_rx(struct net_device *dev, int budget)
 		} else {
 			/* Malloc up new buffer, compatible with net-2e. */
 			/* Omit the four octet CRC from the length. */
-			short pkt_len = (status >> 16) - 4;
+			short pkt_len = (status >> 16) - ETH_FCS_LEN;
 			struct sk_buff *skb;
 
-			if (pkt_len > PKT_BUF_SZ - 4) {
+			if (pkt_len > PKT_BUF_SZ - ETH_FCS_LEN) {
 				printk(KERN_ERR "%s: Oversized Ethernet frame, status %x "
 					   "%d bytes.\n",
 					   dev->name, status, pkt_len);
 				pkt_len = 1514;
 			}
-			/* Check if the packet is long enough to accept without copying
-			   to a minimally-sized skbuff. */
-			if (pkt_len < rx_copybreak &&
-			    (skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
-				skb_reserve(skb, 2);	/* 16 byte align the IP header */
-				pci_dma_sync_single_for_cpu(ep->pci_dev,
-							    ep->rx_ring[entry].bufaddr,
-							    ep->rx_buf_sz,
-							    PCI_DMA_FROMDEVICE);
-				skb_copy_to_linear_data(skb, ep->rx_skbuff[entry]->data, pkt_len);
-				skb_put(skb, pkt_len);
-				pci_dma_sync_single_for_device(ep->pci_dev,
-							       ep->rx_ring[entry].bufaddr,
-							       ep->rx_buf_sz,
-							       PCI_DMA_FROMDEVICE);
-			} else {
-				pci_unmap_single(ep->pci_dev,
-					ep->rx_ring[entry].bufaddr,
-					ep->rx_buf_sz, PCI_DMA_FROMDEVICE);
-				skb_put(skb = ep->rx_skbuff[entry], pkt_len);
-				ep->rx_skbuff[entry] = NULL;
-			}
+
+			skb = dev_skb_finish_rx_dma(&ep->rx_skbuff[entry],
+				pkt_len, rx_copybreak,
+				&ep->pci_dev->dev, ep->rx_ring[entry].bufaddr,
+				ep->rx_buf_sz);
+
 			skb->protocol = eth_type_trans(skb, dev);
 			netif_receive_skb(skb);
 			dev->stats.rx_packets++;
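The epic100, fealnx, hamachi and via-rhine hunks only replace a literal 4 with ETH_FCS_LEN. The hardware-reported frame length includes the trailing 4-byte Ethernet frame check sequence, so the subtraction is unchanged. The following is a minimal user-space illustration of that arithmetic. It assumes an epic100-style status word carrying the length in bits 31..16, and uses a hypothetical helper name `epic_pkt_len()`:

```c
#include <assert.h>

/* Same value as ETH_FCS_LEN in <linux/if_ether.h>: octets in the FCS. */
#ifndef ETH_FCS_LEN
#define ETH_FCS_LEN 4
#endif

/* Hypothetical helper: length field in bits 31..16, minus the trailing CRC. */
static int epic_pkt_len(unsigned int status)
{
	assert((status >> 16) >= ETH_FCS_LEN);	/* frame must at least hold an FCS */
	return (int)(status >> 16) - ETH_FCS_LEN;
}
```

A maximum-size untagged frame reported as 1518 bytes yields a 1514-byte packet. The low status bits do not affect the result.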
diff --git a/drivers/net/fealnx.c b/drivers/net/fealnx.c
index fa8677c..b692a4d 100644
--- a/drivers/net/fealnx.c
+++ b/drivers/net/fealnx.c
@@ -1693,46 +1693,21 @@ static int netdev_rx(struct net_device *dev)
 
 			struct sk_buff *skb;
 			/* Omit the four octet CRC from the length. */
-			short pkt_len = ((rx_status & FLNGMASK) >> FLNGShift) - 4;
+			short pkt_len = ((rx_status & FLNGMASK) >> FLNGShift) - ETH_FCS_LEN;
 
 #ifndef final_version
 			if (debug)
 				printk(KERN_DEBUG "  netdev_rx() normal Rx pkt length %d"
 				       " status %x.\n", pkt_len, rx_status);
 #endif
+			skb = dev_skb_finish_rx_dma(&np->cur_rx->skbuff,
+				pkt_len, rx_copybreak,
+				&np->pci_dev->dev, np->cur_rx->buffer,
+				np->rx_buf_sz);
 
-			/* Check if the packet is long enough to accept without copying
-			   to a minimally-sized skbuff. */
-			if (pkt_len < rx_copybreak &&
-			    (skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
-				skb_reserve(skb, 2);	/* 16 byte align the IP header */
-				pci_dma_sync_single_for_cpu(np->pci_dev,
-							    np->cur_rx->buffer,
-							    np->rx_buf_sz,
-							    PCI_DMA_FROMDEVICE);
-				/* Call copy + cksum if available. */
-
-#if ! defined(__alpha__)
-				skb_copy_to_linear_data(skb,
-					np->cur_rx->skbuff->data, pkt_len);
-				skb_put(skb, pkt_len);
-#else
-				memcpy(skb_put(skb, pkt_len),
-					np->cur_rx->skbuff->data, pkt_len);
-#endif
-				pci_dma_sync_single_for_device(np->pci_dev,
-							       np->cur_rx->buffer,
-							       np->rx_buf_sz,
-							       PCI_DMA_FROMDEVICE);
-			} else {
-				pci_unmap_single(np->pci_dev,
-						 np->cur_rx->buffer,
-						 np->rx_buf_sz,
-						 PCI_DMA_FROMDEVICE);
-				skb_put(skb = np->cur_rx->skbuff, pkt_len);
-				np->cur_rx->skbuff = NULL;
+			if (!np->cur_rx->skbuff)
 				--np->really_rx_count;
-			}
+
 			skb->protocol = eth_type_trans(skb, dev);
 			netif_rx(skb);
 			dev->stats.rx_packets++;
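The call sites imply a convention for dev_skb_finish_rx_dma(), whose body is not in this excerpt. After the call, a non-NULL ring slot means the packet was copied and the ring buffer stays in place. A NULL slot means the ring buffer itself was handed up and must be refilled. Examples are fealnx's `if (!np->cur_rx->skbuff)`, sis190's `/* copied */` and starfire's `/* not copied */`. The callers also dereference the returned skb unconditionally.

The following is a user-space model of that contract, not the kernel helper. It makes these assumptions:
- `struct buf` and `finish_rx()` are hypothetical stand-ins.
- malloc()/memcpy() replace skb allocation.
- DMA sync/unmap is omitted.
- Packets are copied when pkt_len <= copybreak.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an sk_buff: a data area plus a used length. */
struct buf {
	unsigned char data[1536];
	size_t len;
};

/*
 * Model of the copybreak decision. Short packets are copied into a new
 * buffer and the slot keeps its ring buffer. Otherwise (or if the copy
 * buffer cannot be allocated) the ring buffer is handed up and the
 * slot is cleared, so the return value is never NULL.
 */
static struct buf *finish_rx(struct buf **slot, size_t pkt_len, size_t copybreak)
{
	struct buf *ring = *slot;

	assert(ring != NULL && pkt_len <= sizeof(ring->data));
	if (pkt_len <= copybreak) {
		struct buf *copy = malloc(sizeof(*copy));

		if (copy) {
			memcpy(copy->data, ring->data, pkt_len);
			copy->len = pkt_len;
			return copy;		/* slot still owns the ring buffer */
		}
		/* allocation failed: fall back to passing up the ring buffer */
	}
	ring->len = pkt_len;
	*slot = NULL;				/* caller must refill this slot */
	return ring;
}
```

The fallback on a failed copy allocation mirrors what the removed open-coded paths did: `if (pkt_len < rx_copybreak && (skb = dev_alloc_skb(...)) != NULL)`, else pass up the ring skb.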
diff --git a/drivers/net/hamachi.c b/drivers/net/hamachi.c
index c274b3d..68ee914 100644
--- a/drivers/net/hamachi.c
+++ b/drivers/net/hamachi.c
@@ -1465,7 +1465,7 @@ static int hamachi_rx(struct net_device *dev)
 		} else {
 			struct sk_buff *skb;
 			/* Omit CRC */
-			u16 pkt_len = (frame_status & 0x07ff) - 4;
+			u16 pkt_len = (frame_status & 0x07ff) - ETH_FCS_LEN;
 #ifdef RX_CHECKSUM
 			u32 pfck = *(u32 *) &buf_addr[data_size - 8];
 #endif
@@ -1485,42 +1485,15 @@ static int hamachi_rx(struct net_device *dev)
 					   *(s32*)&(buf_addr[data_size - 8]),
 					   *(s32*)&(buf_addr[data_size - 4]));
 #endif
-			/* Check if the packet is long enough to accept without copying
-			   to a minimally-sized skbuff. */
-			if (pkt_len < rx_copybreak &&
-			    (skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
-#ifdef RX_CHECKSUM
-				printk(KERN_ERR "%s: rx_copybreak non-zero "
-				  "not good with RX_CHECKSUM\n", dev->name);
-#endif
-				skb_reserve(skb, 2);	/* 16 byte align the IP header */
-				pci_dma_sync_single_for_cpu(hmp->pci_dev,
-							    leXX_to_cpu(hmp->rx_ring[entry].addr),
-							    hmp->rx_buf_sz,
-							    PCI_DMA_FROMDEVICE);
-				/* Call copy + cksum if available. */
-#if 1 || USE_IP_COPYSUM
-				skb_copy_to_linear_data(skb,
-					hmp->rx_skbuff[entry]->data, pkt_len);
-				skb_put(skb, pkt_len);
-#else
-				memcpy(skb_put(skb, pkt_len), hmp->rx_ring_dma
-					+ entry*sizeof(*desc), pkt_len);
-#endif
-				pci_dma_sync_single_for_device(hmp->pci_dev,
-							       leXX_to_cpu(hmp->rx_ring[entry].addr),
-							       hmp->rx_buf_sz,
-							       PCI_DMA_FROMDEVICE);
-			} else {
-				pci_unmap_single(hmp->pci_dev,
-						 leXX_to_cpu(hmp->rx_ring[entry].addr),
-						 hmp->rx_buf_sz, PCI_DMA_FROMDEVICE);
-				skb_put(skb = hmp->rx_skbuff[entry], pkt_len);
-				hmp->rx_skbuff[entry] = NULL;
-			}
+
+			skb = dev_skb_finish_rx_dma(&hmp->rx_skbuff[entry],
+				pkt_len, rx_copybreak,
+				&hmp->pci_dev->dev,
+				leXX_to_cpu(hmp->rx_ring[entry].addr),
+				hmp->rx_buf_sz);
+
 			skb->protocol = eth_type_trans(skb, dev);
 
-
 #ifdef RX_CHECKSUM
 			/* TCP or UDP on ipv4, DIX encoding */
 			if (pfck>>24 == 0x91 || pfck>>24 == 0x51) {
diff --git a/drivers/net/ibmveth.c b/drivers/net/ibmveth.c
index 06514bc..f9fb9d8 100644
--- a/drivers/net/ibmveth.c
+++ b/drivers/net/ibmveth.c
@@ -1076,7 +1076,7 @@ restart_poll:
 			skb = ibmveth_rxq_get_buffer(adapter);
 
 			new_skb = NULL;
-			if (length < rx_copybreak)
+			if (length <= rx_copybreak)
 				new_skb = netdev_alloc_skb(netdev, length);
 
 			if (new_skb) {
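ibmveth keeps its open-coded test but changes `length < rx_copybreak` to `length <= rx_copybreak`. This matches the `len <= rx_copybreak` that de2104x already used. Presumably the aim is for rx_copybreak to mean "largest length that is still copied" across drivers, though the helper's own comparison is not visible in this excerpt. The two tests differ only when the length equals the threshold, as this small sketch (hypothetical function names) shows:

```c
#include <assert.h>
#include <stdbool.h>

/* Test used by most drivers before this patch: copy strictly below the threshold. */
static bool copy_old(int len, int copybreak)
{
	return len < copybreak;
}

/* Test ibmveth switches to (and de2104x already used): threshold inclusive. */
static bool copy_new(int len, int copybreak)
{
	return len <= copybreak;
}
```

With rx_copybreak = 256, a 256-byte packet was previously passed up in place and is now copied. Packets of 255 or 257 bytes are treated the same either way.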
diff --git a/drivers/net/lib82596.c b/drivers/net/lib82596.c
index 9e04289..d19e05b 100644
--- a/drivers/net/lib82596.c
+++ b/drivers/net/lib82596.c
@@ -679,67 +679,29 @@ static inline int i596_rx(struct net_device *dev)
 		if (rbd != NULL && (rfd->stat & SWAP16(STAT_OK))) {
 			/* a good frame */
 			int pkt_len = SWAP16(rbd->count) & 0x3fff;
-			struct sk_buff *skb = rbd->skb;
-			int rx_in_place = 0;
+			struct sk_buff *skb;
+			dma_addr_t dma_addr;
 
 			DEB(DEB_RXADDR, print_eth(rbd->v_data, "received"));
 			frames++;
 
-			/* Check if the packet is long enough to just accept
-			 * without copying to a properly sized skbuff.
-			 */
+			dma_addr = SWAP32(rbd->b_data);
+			skb = dev_skb_finish_rx_dma_refill(&rbd->skb,
+				pkt_len, rx_copybreak, NET_IP_ALIGN, 0,
+				dev->dev.parent, &dma_addr, PKT_BUF_SZ);
+			rbd->v_data = rbd->skb->data;
+			rbd->b_data = SWAP32(dma_addr);
+			DMA_WBACK_INV(dev, rbd, sizeof(struct i596_rbd));
 
-			if (pkt_len > rx_copybreak) {
-				struct sk_buff *newskb;
-				dma_addr_t dma_addr;
-
-				dma_unmap_single(dev->dev.parent,
-						 (dma_addr_t)SWAP32(rbd->b_data),
-						 PKT_BUF_SZ, DMA_FROM_DEVICE);
-				/* Get fresh skbuff to replace filled one. */
-				newskb = netdev_alloc_skb_ip_align(dev,
-								   PKT_BUF_SZ);
-				if (newskb == NULL) {
-					skb = NULL;	/* drop pkt */
-					goto memory_squeeze;
-				}
-
-				/* Pass up the skb already on the Rx ring. */
-				skb_put(skb, pkt_len);
-				rx_in_place = 1;
-				rbd->skb = newskb;
-				dma_addr = dma_map_single(dev->dev.parent,
-							  newskb->data,
-							  PKT_BUF_SZ,
-							  DMA_FROM_DEVICE);
-				rbd->v_data = newskb->data;
-				rbd->b_data = SWAP32(dma_addr);
-				DMA_WBACK_INV(dev, rbd, sizeof(struct i596_rbd));
-			} else
-				skb = netdev_alloc_skb_ip_align(dev, pkt_len);
-memory_squeeze:
-			if (skb == NULL) {
-				/* XXX tulip.c can defer packets here!! */
-				printk(KERN_ERR
-				       "%s: i596_rx Memory squeeze, dropping packet.\n",
-				       dev->name);
-				dev->stats.rx_dropped++;
-			} else {
-				if (!rx_in_place) {
-					/* 16 byte align the data fields */
-					dma_sync_single_for_cpu(dev->dev.parent,
-								(dma_addr_t)SWAP32(rbd->b_data),
-								PKT_BUF_SZ, DMA_FROM_DEVICE);
-					memcpy(skb_put(skb, pkt_len), rbd->v_data, pkt_len);
-					dma_sync_single_for_device(dev->dev.parent,
-								   (dma_addr_t)SWAP32(rbd->b_data),
-								   PKT_BUF_SZ, DMA_FROM_DEVICE);
-				}
-				skb->len = pkt_len;
+			if (likely(skb)) {
 				skb->protocol = eth_type_trans(skb, dev);
 				netif_rx(skb);
 				dev->stats.rx_packets++;
 				dev->stats.rx_bytes += pkt_len;
+			} else {
+				netdev_err(dev,
+				       "i596_rx Memory squeeze, dropping packet.\n");
+				dev->stats.rx_dropped++;
 			}
 		} else {
 			DEB(DEB_ERRORS, printk(KERN_DEBUG
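The dev_skb_finish_rx_dma_refill() variant (lib82596, pcnet32, sgiseeq, de2104x) appears, from its call sites, to keep the ring slot populated at all times:
- lib82596 reads `rbd->skb->data` right after the call.
- The new DMA address comes back through a pointer.
- A NULL return means the packet was dropped for lack of memory.

Its body is not part of this excerpt. The following is only a user-space model of that contract under those assumptions. It uses a hypothetical `finish_rx_refill()` and malloc() in place of skb allocation. DMA mapping and the two alignment/offset arguments are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an sk_buff. */
struct buf {
	unsigned char data[1536];
	size_t len;
};

/*
 * Model of the "refill" flavour: a new buffer is allocated up front.
 * If that fails, NULL is returned, the packet is dropped and the slot
 * keeps its old buffer for the hardware to reuse. Otherwise short
 * packets are copied (slot unchanged) and long ones are handed up with
 * the new buffer placed in the slot.
 */
static struct buf *finish_rx_refill(struct buf **slot, size_t pkt_len,
				    size_t copybreak)
{
	struct buf *ring = *slot;
	struct buf *fresh;

	assert(ring != NULL && pkt_len <= sizeof(ring->data));
	fresh = malloc(sizeof(*fresh));
	if (!fresh)
		return NULL;			/* drop; slot keeps its buffer */

	if (pkt_len <= copybreak) {
		memcpy(fresh->data, ring->data, pkt_len);
		fresh->len = pkt_len;
		return fresh;			/* copied; ring buffer reused */
	}
	ring->len = pkt_len;			/* hand up the filled buffer... */
	*slot = fresh;				/* ...and refill the slot */
	return ring;
}
```

Allocating before the decision matches the removed lib82596 code, which dropped the packet on allocation failure in both the copy and the pass-up path.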
diff --git a/drivers/net/natsemi.c b/drivers/net/natsemi.c
index 8f8b65a..b461321 100644
--- a/drivers/net/natsemi.c
+++ b/drivers/net/natsemi.c
@@ -2340,31 +2340,11 @@ static void netdev_rx(struct net_device *dev, int *work_done, int work_to_do)
 			 */
 		} else {
 			struct sk_buff *skb;
-			/* Omit CRC size. */
-			/* Check if the packet is long enough to accept
-			 * without copying to a minimally-sized skbuff. */
-			if (pkt_len < rx_copybreak &&
-			    (skb = dev_alloc_skb(pkt_len + RX_OFFSET)) != NULL) {
-				/* 16 byte align the IP header */
-				skb_reserve(skb, RX_OFFSET);
-				pci_dma_sync_single_for_cpu(np->pci_dev,
-					np->rx_dma[entry],
-					buflen,
-					PCI_DMA_FROMDEVICE);
-				skb_copy_to_linear_data(skb,
-					np->rx_skbuff[entry]->data, pkt_len);
-				skb_put(skb, pkt_len);
-				pci_dma_sync_single_for_device(np->pci_dev,
-					np->rx_dma[entry],
-					buflen,
-					PCI_DMA_FROMDEVICE);
-			} else {
-				pci_unmap_single(np->pci_dev, np->rx_dma[entry],
-						 buflen + NATSEMI_PADDING,
-						 PCI_DMA_FROMDEVICE);
-				skb_put(skb = np->rx_skbuff[entry], pkt_len);
-				np->rx_skbuff[entry] = NULL;
-			}
+
+			skb = dev_skb_finish_rx_dma(&np->rx_skbuff[entry],
+				pkt_len, rx_copybreak,
+				&np->pci_dev->dev, np->rx_dma[entry], buflen);
+
 			skb->protocol = eth_type_trans(skb, dev);
 			netif_receive_skb(skb);
 			dev->stats.rx_packets++;
diff --git a/drivers/net/pcnet32.c b/drivers/net/pcnet32.c
index b48aba9..f736dd5 100644
--- a/drivers/net/pcnet32.c
+++ b/drivers/net/pcnet32.c
@@ -588,7 +588,7 @@ static void pcnet32_realloc_rx_ring(struct net_device *dev,
 	/* now allocate any new buffers needed */
 	for (; new < size; new++) {
 		struct sk_buff *rx_skbuff;
-		new_skb_list[new] = dev_alloc_skb(PKT_BUF_SKB);
+		new_skb_list[new] = netdev_alloc_skb_ip_align(dev, PKT_BUF_SKB);
 		rx_skbuff = new_skb_list[new];
 		if (!rx_skbuff) {
 			/* keep the original lists and buffers */
@@ -596,7 +596,6 @@ static void pcnet32_realloc_rx_ring(struct net_device *dev,
 				  __func__);
 			goto free_all_new;
 		}
-		skb_reserve(rx_skbuff, NET_IP_ALIGN);
 
 		new_dma_addr_list[new] =
 			    pci_map_single(lp->pci_dev, rx_skbuff->data,
@@ -1147,51 +1146,18 @@ static void pcnet32_rx_entry(struct net_device *dev,
 		return;
 	}
 
-	if (pkt_len > rx_copybreak) {
-		struct sk_buff *newskb;
-
-		newskb = dev_alloc_skb(PKT_BUF_SKB);
-		if (newskb) {
-			skb_reserve(newskb, NET_IP_ALIGN);
-			skb = lp->rx_skbuff[entry];
-			pci_unmap_single(lp->pci_dev,
-					 lp->rx_dma_addr[entry],
-					 PKT_BUF_SIZE,
-					 PCI_DMA_FROMDEVICE);
-			skb_put(skb, pkt_len);
-			lp->rx_skbuff[entry] = newskb;
-			lp->rx_dma_addr[entry] =
-					    pci_map_single(lp->pci_dev,
-							   newskb->data,
-							   PKT_BUF_SIZE,
-							   PCI_DMA_FROMDEVICE);
-			rxp->base = cpu_to_le32(lp->rx_dma_addr[entry]);
-			rx_in_place = 1;
-		} else
-			skb = NULL;
-	} else
-		skb = dev_alloc_skb(pkt_len + NET_IP_ALIGN);
+	skb = dev_skb_finish_rx_dma_refill(&lp->rx_skbuff[entry],
+		pkt_len, rx_copybreak, NET_IP_ALIGN, 0,
+		&lp->pci_dev->dev, &lp->rx_dma_addr[entry],
+		PKT_BUF_SIZE);
+	rxp->base = cpu_to_le32(lp->rx_dma_addr[entry]);
 
 	if (skb == NULL) {
 		netif_err(lp, drv, dev, "Memory squeeze, dropping packet\n");
 		dev->stats.rx_dropped++;
 		return;
 	}
-	if (!rx_in_place) {
-		skb_reserve(skb, NET_IP_ALIGN);
-		skb_put(skb, pkt_len);	/* Make room */
-		pci_dma_sync_single_for_cpu(lp->pci_dev,
-					    lp->rx_dma_addr[entry],
-					    pkt_len,
-					    PCI_DMA_FROMDEVICE);
-		skb_copy_to_linear_data(skb,
-				 (unsigned char *)(lp->rx_skbuff[entry]->data),
-				 pkt_len);
-		pci_dma_sync_single_for_device(lp->pci_dev,
-					       lp->rx_dma_addr[entry],
-					       pkt_len,
-					       PCI_DMA_FROMDEVICE);
-	}
+
 	dev->stats.rx_bytes += skb->len;
 	skb->protocol = eth_type_trans(skb, dev);
 	netif_receive_skb(skb);
@@ -2271,7 +2237,8 @@ static int pcnet32_init_ring(struct net_device *dev)
 	for (i = 0; i < lp->rx_ring_size; i++) {
 		struct sk_buff *rx_skbuff = lp->rx_skbuff[i];
 		if (rx_skbuff == NULL) {
-			lp->rx_skbuff[i] = dev_alloc_skb(PKT_BUF_SKB);
+			lp->rx_skbuff[i] =
+				netdev_alloc_skb_ip_align(dev, PKT_BUF_SKB);
 			rx_skbuff = lp->rx_skbuff[i];
 			if (!rx_skbuff) {
 				/* there is not much we can do at this point */
@@ -2279,7 +2246,6 @@ static int pcnet32_init_ring(struct net_device *dev)
 					  __func__);
 				return -1;
 			}
-			skb_reserve(rx_skbuff, NET_IP_ALIGN);
 		}
 
 		rmb();
diff --git a/drivers/net/sgiseeq.c b/drivers/net/sgiseeq.c
index 52fb7ed..a4c6d93 100644
--- a/drivers/net/sgiseeq.c
+++ b/drivers/net/sgiseeq.c
@@ -340,9 +340,8 @@ static inline void sgiseeq_rx(struct net_device *dev, struct sgiseeq_private *sp
 {
 	struct sgiseeq_rx_desc *rd;
 	struct sk_buff *skb = NULL;
-	struct sk_buff *newskb;
 	unsigned char pkt_status;
-	int len = 0;
+	int packet_ok, len = 0;
 	unsigned int orig_end = PREV_RX(sp->rx_new);
 
 	/* Service every received packet. */
@@ -350,53 +349,38 @@ static inline void sgiseeq_rx(struct net_device *dev, struct sgiseeq_private *sp
 	dma_sync_desc_cpu(dev, rd);
 	while (!(rd->rdma.cntinfo & HPCDMA_OWN)) {
 		len = PKT_BUF_SZ - (rd->rdma.cntinfo & HPCDMA_BCNT) - 3;
-		dma_unmap_single(dev->dev.parent, rd->rdma.pbuf,
+		dma_sync_single_for_cpu(dev->dev.parent, rd->rdma.pbuf,
 				 PKT_BUF_SZ, DMA_FROM_DEVICE);
 		pkt_status = rd->skb->data[len];
 		if (pkt_status & SEEQ_RSTAT_FIG) {
 			/* Packet is OK. */
 			/* We don't want to receive our own packets */
-			if (memcmp(rd->skb->data + 6, dev->dev_addr, ETH_ALEN)) {
-				if (len > rx_copybreak) {
-					skb = rd->skb;
-					newskb = netdev_alloc_skb(dev, PKT_BUF_SZ);
-					if (!newskb) {
-						newskb = skb;
-						skb = NULL;
-						goto memory_squeeze;
-					}
-					skb_reserve(newskb, 2);
-				} else {
-					skb = netdev_alloc_skb_ip_align(dev, len);
-					if (skb)
-						skb_copy_to_linear_data(skb, rd->skb->data, len);
-
-					newskb = rd->skb;
-				}
-memory_squeeze:
-				if (skb) {
-					skb_put(skb, len);
-					skb->protocol = eth_type_trans(skb, dev);
-					netif_rx(skb);
-					dev->stats.rx_packets++;
-					dev->stats.rx_bytes += len;
-				} else {
-					printk(KERN_NOTICE "%s: Memory squeeze, deferring packet.\n",
-						dev->name);
-					dev->stats.rx_dropped++;
-				}
-			} else {
-				/* Silently drop my own packets */
-				newskb = rd->skb;
-			}
+			packet_ok = memcmp(rd->skb->data + 6, dev->dev_addr, ETH_ALEN);
 		} else {
 			record_rx_errors(dev, pkt_status);
-			newskb = rd->skb;
+			packet_ok = 0;
+		}
+		dma_sync_single_for_device(dev->dev.parent, rd->rdma.pbuf,
+			PKT_BUF_SZ, DMA_FROM_DEVICE);
+
+		if (packet_ok) {
+			dma_addr_t dma = rd->rdma.pbuf;
+			skb = dev_skb_finish_rx_dma_refill(&rd->skb,
+				len, rx_copybreak, 0, 2,
+				dev->dev.parent, &dma, PKT_BUF_SZ);
+			rd->rdma.pbuf = dma;
+
+			if (likely(skb)) {
+				skb->protocol = eth_type_trans(skb, dev);
+				netif_rx(skb);
+				dev->stats.rx_packets++;
+				dev->stats.rx_bytes += len;
+			} else {
+				printk(KERN_NOTICE "%s: Memory squeeze, dropping packet.\n",
+					dev->name);
+				dev->stats.rx_dropped++;
+			}
 		}
-		rd->skb = newskb;
-		rd->rdma.pbuf = dma_map_single(dev->dev.parent,
-					       newskb->data - 2,
-					       PKT_BUF_SZ, DMA_FROM_DEVICE);
 
 		/* Return the entry to the ring pool. */
 		rd->rdma.cntinfo = RCNTINFO_INIT;
diff --git a/drivers/net/sis190.c b/drivers/net/sis190.c
index 8ad7bfb..6836e0d 100644
--- a/drivers/net/sis190.c
+++ b/drivers/net/sis190.c
@@ -530,29 +530,6 @@ static u32 sis190_rx_fill(struct sis190_private *tp, struct net_device *dev,
 	return cur - start;
 }
 
-static bool sis190_try_rx_copy(struct sis190_private *tp,
-			       struct sk_buff **sk_buff, int pkt_size,
-			       dma_addr_t addr)
-{
-	struct sk_buff *skb;
-	bool done = false;
-
-	if (pkt_size >= rx_copybreak)
-		goto out;
-
-	skb = netdev_alloc_skb_ip_align(tp->dev, pkt_size);
-	if (!skb)
-		goto out;
-
-	pci_dma_sync_single_for_cpu(tp->pci_dev, addr, tp->rx_buf_sz,
-				PCI_DMA_FROMDEVICE);
-	skb_copy_to_linear_data(skb, sk_buff[0]->data, pkt_size);
-	*sk_buff = skb;
-	done = true;
-out:
-	return done;
-}
-
 static inline int sis190_rx_pkt_err(u32 status, struct net_device_stats *stats)
 {
 #define ErrMask	(OVRUN | SHORT | LIMIT | MIIER | NIBON | COLON | ABORT)
@@ -612,19 +589,14 @@ static int sis190_rx_interrupt(struct net_device *dev,
 				continue;
 			}
 
-
-			if (sis190_try_rx_copy(tp, &skb, pkt_size, addr)) {
-				pci_dma_sync_single_for_device(pdev, addr,
-					tp->rx_buf_sz, PCI_DMA_FROMDEVICE);
+			skb = dev_skb_finish_rx_dma(&tp->Rx_skbuff[entry],
+				pkt_size, rx_copybreak,
+				&pdev->dev, addr, tp->rx_buf_sz);
+			if (tp->Rx_skbuff[entry])	/* copied */
 				sis190_give_to_asic(desc, tp->rx_buf_sz);
-			} else {
-				pci_unmap_single(pdev, addr, tp->rx_buf_sz,
-						 PCI_DMA_FROMDEVICE);
-				tp->Rx_skbuff[entry] = NULL;
+			else
 				sis190_make_unusable_by_asic(desc);
-			}
 
-			skb_put(skb, pkt_size);
 			skb->protocol = eth_type_trans(skb, dev);
 
 			sis190_rx_skb(skb);
diff --git a/drivers/net/starfire.c b/drivers/net/starfire.c
index 860a508..1664f25 100644
--- a/drivers/net/starfire.c
+++ b/drivers/net/starfire.c
@@ -1475,26 +1475,15 @@ static int __netdev_rx(struct net_device *dev, int *quota)
 
 		if (debug > 4)
 			printk(KERN_DEBUG "  netdev_rx() normal Rx pkt length %d, quota %d.\n", pkt_len, *quota);
-		/* Check if the packet is long enough to accept without copying
-		   to a minimally-sized skbuff. */
-		if (pkt_len < rx_copybreak &&
-		    (skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
-			skb_reserve(skb, 2);	/* 16 byte align the IP header */
-			pci_dma_sync_single_for_cpu(np->pci_dev,
-						    np->rx_info[entry].mapping,
-						    pkt_len, PCI_DMA_FROMDEVICE);
-			skb_copy_to_linear_data(skb, np->rx_info[entry].skb->data, pkt_len);
-			pci_dma_sync_single_for_device(np->pci_dev,
-						       np->rx_info[entry].mapping,
-						       pkt_len, PCI_DMA_FROMDEVICE);
-			skb_put(skb, pkt_len);
-		} else {
-			pci_unmap_single(np->pci_dev, np->rx_info[entry].mapping, np->rx_buf_sz, PCI_DMA_FROMDEVICE);
-			skb = np->rx_info[entry].skb;
-			skb_put(skb, pkt_len);
-			np->rx_info[entry].skb = NULL;
+
+		skb = dev_skb_finish_rx_dma(&np->rx_info[entry].skb,
+			pkt_len, rx_copybreak,
+			&np->pci_dev->dev, np->rx_info[entry].mapping,
+			np->rx_buf_sz);
+
+		if (!np->rx_info[entry].skb)	/* not copied */
 			np->rx_info[entry].mapping = 0;
-		}
+
 #ifndef final_version			/* Remove after testing. */
 		/* You will want this info for the initial debug. */
 		if (debug > 5) {
diff --git a/drivers/net/sundance.c b/drivers/net/sundance.c
index 4793df8..0a16798 100644
--- a/drivers/net/sundance.c
+++ b/drivers/net/sundance.c
@@ -1355,26 +1355,12 @@ static void rx_poll(unsigned long data)
 					   ", bogus_cnt %d.\n",
 					   pkt_len, boguscnt);
 #endif
-			/* Check if the packet is long enough to accept without copying
-			   to a minimally-sized skbuff. */
-			if (pkt_len < rx_copybreak &&
-			    (skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
-				skb_reserve(skb, 2);	/* 16 byte align the IP header */
-				dma_sync_single_for_cpu(&np->pci_dev->dev,
-						le32_to_cpu(desc->frag[0].addr),
-						np->rx_buf_sz, DMA_FROM_DEVICE);
-				skb_copy_to_linear_data(skb, np->rx_skbuff[entry]->data, pkt_len);
-				dma_sync_single_for_device(&np->pci_dev->dev,
-						le32_to_cpu(desc->frag[0].addr),
-						np->rx_buf_sz, DMA_FROM_DEVICE);
-				skb_put(skb, pkt_len);
-			} else {
-				dma_unmap_single(&np->pci_dev->dev,
-					le32_to_cpu(desc->frag[0].addr),
-					np->rx_buf_sz, DMA_FROM_DEVICE);
-				skb_put(skb = np->rx_skbuff[entry], pkt_len);
-				np->rx_skbuff[entry] = NULL;
-			}
+			skb = dev_skb_finish_rx_dma(&np->rx_skbuff[entry],
+				pkt_len, rx_copybreak,
+				&np->pci_dev->dev,
+				le32_to_cpu(desc->frag[0].addr),
+				np->rx_buf_sz);
+
 			skb->protocol = eth_type_trans(skb, dev);
 			/* Note: checksum -> skb->ip_summed = CHECKSUM_UNNECESSARY; */
 			netif_rx(skb);
diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 8e76705..ae3e110 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -4973,6 +4973,7 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
 
 			skb_size = tg3_alloc_rx_skb(tp, tpr, opaque_key,
 						    *post_ptr);
+#warning same problem as dev_skb_finish_rx_dma_refill()
 			if (skb_size < 0)
 				goto drop_it;
 
diff --git a/drivers/net/tulip/de2104x.c b/drivers/net/tulip/de2104x.c
index ce90efc..80a34b6 100644
--- a/drivers/net/tulip/de2104x.c
+++ b/drivers/net/tulip/de2104x.c
@@ -409,8 +409,7 @@ static void de_rx (struct de_private *de)
 	while (--rx_work) {
 		u32 status, len;
 		dma_addr_t mapping;
-		struct sk_buff *skb, *copy_skb;
-		unsigned copying_skb, buflen;
+		struct sk_buff *skb;
 
 		skb = de->rx_skb[rx_tail].skb;
 		BUG_ON(!skb);
@@ -432,42 +431,22 @@ static void de_rx (struct de_private *de)
 			goto rx_next;
 		}
 
-		copying_skb = (len <= rx_copybreak);
-
 		netif_dbg(de, rx_status, de->dev,
 			  "rx slot %d status 0x%x len %d copying? %d\n",
-			  rx_tail, status, len, copying_skb);
+			  rx_tail, status, len, len <= rx_copybreak);
 
-		buflen = copying_skb ? (len + RX_OFFSET) : de->rx_buf_sz;
-		copy_skb = dev_alloc_skb (buflen);
-		if (unlikely(!copy_skb)) {
+		skb = dev_skb_finish_rx_dma_refill(&de->rx_skb[rx_tail].skb,
+			len, rx_copybreak, 0, RX_OFFSET,
+			&de->pdev->dev, &mapping, de->rx_buf_sz);
+		de->rx_skb[rx_tail].mapping = mapping;
+
+		if (unlikely(!skb)) {
 			de->net_stats.rx_dropped++;
 			drop = 1;
 			rx_work = 100;
 			goto rx_next;
 		}
 
-		if (!copying_skb) {
-			pci_unmap_single(de->pdev, mapping,
-					 buflen, PCI_DMA_FROMDEVICE);
-			skb_put(skb, len);
-
-			mapping =
-			de->rx_skb[rx_tail].mapping =
-				pci_map_single(de->pdev, copy_skb->data,
-					       buflen, PCI_DMA_FROMDEVICE);
-			de->rx_skb[rx_tail].skb = copy_skb;
-		} else {
-			pci_dma_sync_single_for_cpu(de->pdev, mapping, len, PCI_DMA_FROMDEVICE);
-			skb_reserve(copy_skb, RX_OFFSET);
-			skb_copy_from_linear_data(skb, skb_put(copy_skb, len),
-						  len);
-			pci_dma_sync_single_for_device(de->pdev, mapping, len, PCI_DMA_FROMDEVICE);
-
-			/* We'll reuse the original ring buffer. */
-			skb = copy_skb;
-		}
-
 		skb->protocol = eth_type_trans (skb, de->dev);
 
 		de->net_stats.rx_packets++;
@@ -1292,6 +1271,7 @@ static int de_refill_rx (struct de_private *de)
 		de->rx_skb[i].mapping = pci_map_single(de->pdev,
 			skb->data, de->rx_buf_sz, PCI_DMA_FROMDEVICE);
 		de->rx_skb[i].skb = skb;
+		skb_reserve(skb, RX_OFFSET);
 
 		de->rx_ring[i].opts1 = cpu_to_le32(DescOwn);
 		if (i == (DE_RX_RING_SIZE - 1))
diff --git a/drivers/net/tulip/interrupt.c b/drivers/net/tulip/interrupt.c
index 5350d75..990f2a7 100644
--- a/drivers/net/tulip/interrupt.c
+++ b/drivers/net/tulip/interrupt.c
@@ -200,32 +200,14 @@ int tulip_poll(struct napi_struct *napi, int budget)
 						dev->stats.rx_fifo_errors++;
                                }
                        } else {
-                               struct sk_buff *skb;
-
-                               /* Check if the packet is long enough to accept without copying
-                                  to a minimally-sized skbuff. */
-                               if (pkt_len < tulip_rx_copybreak &&
-                                   (skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
-                                       skb_reserve(skb, 2);    /* 16 byte align the IP header */
-                                       pci_dma_sync_single_for_cpu(tp->pdev,
-								   tp->rx_buffers[entry].mapping,
-								   pkt_len, PCI_DMA_FROMDEVICE);
-#if ! defined(__alpha__)
-                                       skb_copy_to_linear_data(skb, tp->rx_buffers[entry].skb->data,
-                                                        pkt_len);
-                                       skb_put(skb, pkt_len);
-#else
-                                       memcpy(skb_put(skb, pkt_len),
-                                              tp->rx_buffers[entry].skb->data,
-                                              pkt_len);
-#endif
-                                       pci_dma_sync_single_for_device(tp->pdev,
-								      tp->rx_buffers[entry].mapping,
-								      pkt_len, PCI_DMA_FROMDEVICE);
-                               } else {        /* Pass up the skb already on the Rx ring. */
-                                       char *temp = skb_put(skb = tp->rx_buffers[entry].skb,
-                                                            pkt_len);
+				struct sk_buff *skb = dev_skb_finish_rx_dma(
+					&tp->rx_buffers[entry].skb,
+					pkt_len, tulip_rx_copybreak,
+					&tp->pdev->dev,
+					tp->rx_buffers[entry].mapping,
+					PKT_BUF_SZ);
 
+				if (!tp->rx_buffers[entry].skb) {
 #ifndef final_version
                                        if (tp->rx_buffers[entry].mapping !=
                                            le32_to_cpu(tp->rx_ring[entry].buffer1)) {
@@ -233,14 +215,9 @@ int tulip_poll(struct napi_struct *napi, int budget)
 						       "Internal fault: The skbuff addresses do not match in tulip_rx: %08x vs. %08llx %p / %p\n",
 						       le32_to_cpu(tp->rx_ring[entry].buffer1),
 						       (unsigned long long)tp->rx_buffers[entry].mapping,
-						       skb->head, temp);
+						       skb->head, skb->data);
                                        }
 #endif
-
-                                       pci_unmap_single(tp->pdev, tp->rx_buffers[entry].mapping,
-                                                        PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
-
-                                       tp->rx_buffers[entry].skb = NULL;
                                        tp->rx_buffers[entry].mapping = 0;
                                }
                                skb->protocol = eth_type_trans(skb, dev);
@@ -426,32 +403,14 @@ static int tulip_rx(struct net_device *dev)
 					dev->stats.rx_fifo_errors++;
 			}
 		} else {
-			struct sk_buff *skb;
-
-			/* Check if the packet is long enough to accept without copying
-			   to a minimally-sized skbuff. */
-			if (pkt_len < tulip_rx_copybreak &&
-			    (skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
-				skb_reserve(skb, 2);	/* 16 byte align the IP header */
-				pci_dma_sync_single_for_cpu(tp->pdev,
-							    tp->rx_buffers[entry].mapping,
-							    pkt_len, PCI_DMA_FROMDEVICE);
-#if ! defined(__alpha__)
-				skb_copy_to_linear_data(skb, tp->rx_buffers[entry].skb->data,
-						 pkt_len);
-				skb_put(skb, pkt_len);
-#else
-				memcpy(skb_put(skb, pkt_len),
-				       tp->rx_buffers[entry].skb->data,
-				       pkt_len);
-#endif
-				pci_dma_sync_single_for_device(tp->pdev,
-							       tp->rx_buffers[entry].mapping,
-							       pkt_len, PCI_DMA_FROMDEVICE);
-			} else { 	/* Pass up the skb already on the Rx ring. */
-				char *temp = skb_put(skb = tp->rx_buffers[entry].skb,
-						     pkt_len);
+			struct sk_buff *skb = dev_skb_finish_rx_dma(
+				&tp->rx_buffers[entry].skb,
+				pkt_len, tulip_rx_copybreak,
+				&tp->pdev->dev,
+				tp->rx_buffers[entry].mapping,
+				PKT_BUF_SZ);
 
+			if (!tp->rx_buffers[entry].skb) {
 #ifndef final_version
 				if (tp->rx_buffers[entry].mapping !=
 				    le32_to_cpu(tp->rx_ring[entry].buffer1)) {
@@ -459,14 +418,10 @@ static int tulip_rx(struct net_device *dev)
 						"Internal fault: The skbuff addresses do not match in tulip_rx: %08x vs. %Lx %p / %p\n",
 						le32_to_cpu(tp->rx_ring[entry].buffer1),
 						(long long)tp->rx_buffers[entry].mapping,
-						skb->head, temp);
+						skb->head, skb->data);
 				}
 #endif
 
-				pci_unmap_single(tp->pdev, tp->rx_buffers[entry].mapping,
-						 PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
-
-				tp->rx_buffers[entry].skb = NULL;
 				tp->rx_buffers[entry].mapping = 0;
 			}
 			skb->protocol = eth_type_trans(skb, dev);
diff --git a/drivers/net/tulip/winbond-840.c b/drivers/net/tulip/winbond-840.c
index 862eadf..c023512 100644
--- a/drivers/net/tulip/winbond-840.c
+++ b/drivers/net/tulip/winbond-840.c
@@ -1228,26 +1228,11 @@ static int netdev_rx(struct net_device *dev)
 				netdev_dbg(dev, "  netdev_rx() normal Rx pkt length %d status %x\n",
 					   pkt_len, status);
 #endif
-			/* Check if the packet is long enough to accept without copying
-			   to a minimally-sized skbuff. */
-			if (pkt_len < rx_copybreak &&
-			    (skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
-				skb_reserve(skb, 2);	/* 16 byte align the IP header */
-				pci_dma_sync_single_for_cpu(np->pci_dev,np->rx_addr[entry],
-							    np->rx_skbuff[entry]->len,
-							    PCI_DMA_FROMDEVICE);
-				skb_copy_to_linear_data(skb, np->rx_skbuff[entry]->data, pkt_len);
-				skb_put(skb, pkt_len);
-				pci_dma_sync_single_for_device(np->pci_dev,np->rx_addr[entry],
-							       np->rx_skbuff[entry]->len,
-							       PCI_DMA_FROMDEVICE);
-			} else {
-				pci_unmap_single(np->pci_dev,np->rx_addr[entry],
-							np->rx_skbuff[entry]->len,
-							PCI_DMA_FROMDEVICE);
-				skb_put(skb = np->rx_skbuff[entry], pkt_len);
-				np->rx_skbuff[entry] = NULL;
-			}
+			skb = dev_skb_finish_rx_dma(&np->rx_skbuff[entry],
+				pkt_len, rx_copybreak,
+				&np->pci_dev->dev, np->rx_addr[entry],
+				np->rx_buf_sz);
+
 #ifndef final_version				/* Remove after testing. */
 			/* You will want this info for the initial debug. */
 			if (debug > 5)
diff --git a/drivers/net/typhoon.c b/drivers/net/typhoon.c
index 1d5091a..6bae56d 100644
--- a/drivers/net/typhoon.c
+++ b/drivers/net/typhoon.c
@@ -1646,7 +1646,7 @@ typhoon_rx(struct typhoon *tp, struct basic_ring *rxRing, volatile __le32 * read
 	   volatile __le32 * cleared, int budget)
 {
 	struct rx_desc *rx;
-	struct sk_buff *skb, *new_skb;
+	struct sk_buff *new_skb;
 	struct rxbuff_ent *rxb;
 	dma_addr_t dma_addr;
 	u32 local_ready;
@@ -1663,7 +1663,6 @@ typhoon_rx(struct typhoon *tp, struct basic_ring *rxRing, volatile __le32 * read
 		rx = (struct rx_desc *) (rxRing->ringBase + rxaddr);
 		idx = rx->addr;
 		rxb = &tp->rxbuffers[idx];
-		skb = rxb->skb;
 		dma_addr = rxb->dma_addr;
 
 		typhoon_inc_rx_index(&rxaddr, 1);
@@ -1675,25 +1674,14 @@ typhoon_rx(struct typhoon *tp, struct basic_ring *rxRing, volatile __le32 * read
 
 		pkt_len = le16_to_cpu(rx->frameLen);
 
-		if(pkt_len < rx_copybreak &&
-		   (new_skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
-			skb_reserve(new_skb, 2);
-			pci_dma_sync_single_for_cpu(tp->pdev, dma_addr,
-						    PKT_BUF_SZ,
-						    PCI_DMA_FROMDEVICE);
-			skb_copy_to_linear_data(new_skb, skb->data, pkt_len);
-			pci_dma_sync_single_for_device(tp->pdev, dma_addr,
-						       PKT_BUF_SZ,
-						       PCI_DMA_FROMDEVICE);
-			skb_put(new_skb, pkt_len);
+		new_skb = dev_skb_finish_rx_dma(&rxb->skb,
+			pkt_len, rx_copybreak,
+			&tp->pdev->dev, dma_addr, PKT_BUF_SZ);
+		if (!rxb->skb)
 			typhoon_recycle_rx_skb(tp, idx);
-		} else {
-			new_skb = skb;
-			skb_put(new_skb, pkt_len);
-			pci_unmap_single(tp->pdev, dma_addr, PKT_BUF_SZ,
-				       PCI_DMA_FROMDEVICE);
+		else
 			typhoon_alloc_rx_skb(tp, idx);
-		}
+
 		new_skb->protocol = eth_type_trans(new_skb, tp->dev);
 		csum_bits = rx->rxStatus & (TYPHOON_RX_IP_CHK_GOOD |
 			TYPHOON_RX_UDP_CHK_GOOD | TYPHOON_RX_TCP_CHK_GOOD);
diff --git a/drivers/net/via-rhine.c b/drivers/net/via-rhine.c
index 7f23ab9..c97265e 100644
--- a/drivers/net/via-rhine.c
+++ b/drivers/net/via-rhine.c
@@ -1764,40 +1764,13 @@ static int rhine_rx(struct net_device *dev, int limit)
 		} else {
 			struct sk_buff *skb = NULL;
 			/* Length should omit the CRC */
-			int pkt_len = data_size - 4;
+			int pkt_len = data_size - ETH_FCS_LEN;
 			u16 vlan_tci = 0;
 
-			/* Check if the packet is long enough to accept without
-			   copying to a minimally-sized skbuff. */
-			if (pkt_len < rx_copybreak)
-				skb = netdev_alloc_skb_ip_align(dev, pkt_len);
-			if (skb) {
-				pci_dma_sync_single_for_cpu(rp->pdev,
-							    rp->rx_skbuff_dma[entry],
-							    rp->rx_buf_sz,
-							    PCI_DMA_FROMDEVICE);
-
-				skb_copy_to_linear_data(skb,
-						 rp->rx_skbuff[entry]->data,
-						 pkt_len);
-				skb_put(skb, pkt_len);
-				pci_dma_sync_single_for_device(rp->pdev,
-							       rp->rx_skbuff_dma[entry],
-							       rp->rx_buf_sz,
-							       PCI_DMA_FROMDEVICE);
-			} else {
-				skb = rp->rx_skbuff[entry];
-				if (skb == NULL) {
-					netdev_err(dev, "Inconsistent Rx descriptor chain\n");
-					break;
-				}
-				rp->rx_skbuff[entry] = NULL;
-				skb_put(skb, pkt_len);
-				pci_unmap_single(rp->pdev,
-						 rp->rx_skbuff_dma[entry],
-						 rp->rx_buf_sz,
-						 PCI_DMA_FROMDEVICE);
-			}
+			skb = dev_skb_finish_rx_dma(&rp->rx_skbuff[entry],
+				pkt_len, rx_copybreak,
+				&rp->pdev->dev, rp->rx_skbuff_dma[entry],
+				rp->rx_buf_sz);
 
 			if (unlikely(desc_length & DescTag))
 				vlan_tci = rhine_get_vlan_tci(skb, data_size);
diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c
index f929242..cdc51de 100644
--- a/drivers/net/via-velocity.c
+++ b/drivers/net/via-velocity.c
@@ -1987,37 +1987,6 @@ static inline void velocity_rx_csum(struct rx_desc *rd, struct sk_buff *skb)
 }
 
 /**
- *	velocity_rx_copy	-	in place Rx copy for small packets
- *	@rx_skb: network layer packet buffer candidate
- *	@pkt_size: received data size
- *	@rd: receive packet descriptor
- *	@dev: network device
- *
- *	Replace the current skb that is scheduled for Rx processing by a
- *	shorter, immediately allocated skb, if the received packet is small
- *	enough. This function returns a negative value if the received
- *	packet is too big or if memory is exhausted.
- */
-static int velocity_rx_copy(struct sk_buff **rx_skb, int pkt_size,
-			    struct velocity_info *vptr)
-{
-	int ret = -1;
-	if (pkt_size < rx_copybreak) {
-		struct sk_buff *new_skb;
-
-		new_skb = netdev_alloc_skb_ip_align(vptr->dev, pkt_size);
-		if (new_skb) {
-			new_skb->ip_summed = rx_skb[0]->ip_summed;
-			skb_copy_from_linear_data(*rx_skb, new_skb->data, pkt_size);
-			*rx_skb = new_skb;
-			ret = 0;
-		}
-
-	}
-	return ret;
-}
-
-/**
  *	velocity_iph_realign	-	IP header alignment
  *	@vptr: velocity we are handling
  *	@skb: network layer packet buffer
@@ -2027,10 +1996,10 @@ static int velocity_rx_copy(struct sk_buff **rx_skb, int pkt_size,
  *	configured by the user.
  */
 static inline void velocity_iph_realign(struct velocity_info *vptr,
-					struct sk_buff *skb, int pkt_size)
+					struct sk_buff *skb)
 {
 	if (vptr->flags & VELOCITY_FLAGS_IP_ALIGN) {
-		memmove(skb->data + 2, skb->data, pkt_size);
+		memmove(skb->data + 2, skb->data, skb->len);
 		skb_reserve(skb, 2);
 	}
 }
@@ -2064,9 +2033,6 @@ static int velocity_receive_frame(struct velocity_info *vptr, int idx)
 
 	skb = rd_info->skb;
 
-	pci_dma_sync_single_for_cpu(vptr->pdev, rd_info->skb_dma,
-				    vptr->rx.buf_sz, PCI_DMA_FROMDEVICE);
-
 	/*
 	 *	Drop frame not meeting IEEE 802.3
 	 */
@@ -2078,30 +2044,25 @@ static int velocity_receive_frame(struct velocity_info *vptr, int idx)
 		}
 	}
 
-	pci_action = pci_dma_sync_single_for_device;
+	skb = dev_skb_finish_rx_dma(&rd_info->skb,
+		pkt_len - ETH_FCS_LEN, rx_copybreak,
+		&vptr->pdev->dev, rd_info->skb_dma, vptr->rx.buf_sz);
+	if (!rd_info->skb)
+		/* not copied */
+		velocity_iph_realign(vptr, skb);
 
 	velocity_rx_csum(rd, skb);
 
-	if (velocity_rx_copy(&skb, pkt_len, vptr) < 0) {
-		velocity_iph_realign(vptr, skb, pkt_len);
-		pci_action = pci_unmap_single;
-		rd_info->skb = NULL;
-	}
-
-	pci_action(vptr->pdev, rd_info->skb_dma, vptr->rx.buf_sz,
-		   PCI_DMA_FROMDEVICE);
-
-	skb_put(skb, pkt_len - 4);
 	skb->protocol = eth_type_trans(skb, vptr->dev);
 
+	stats->rx_bytes += skb->len;
+
 	if (vptr->vlgrp && (rd->rdesc0.RSR & RSR_DETAG)) {
 		vlan_hwaccel_rx(skb, vptr->vlgrp,
 				swab16(le16_to_cpu(rd->rdesc1.PQTAG)));
 	} else
 		netif_rx(skb);
 
-	stats->rx_bytes += pkt_len;
-
 	return 0;
 }
 
diff --git a/drivers/net/yellowfin.c b/drivers/net/yellowfin.c
index 3e5ac60..e1aa4a6 100644
--- a/drivers/net/yellowfin.c
+++ b/drivers/net/yellowfin.c
@@ -1124,27 +1124,12 @@ static int yellowfin_rx(struct net_device *dev)
 				printk(KERN_DEBUG "  %s() normal Rx pkt length %d of %d, bogus_cnt %d\n",
 				       __func__, pkt_len, data_size, boguscnt);
 #endif
-			/* Check if the packet is long enough to just pass up the skbuff
-			   without copying to a properly sized skbuff. */
-			if (pkt_len > rx_copybreak) {
-				skb_put(skb = rx_skb, pkt_len);
-				pci_unmap_single(yp->pci_dev,
-					le32_to_cpu(yp->rx_ring[entry].addr),
-					yp->rx_buf_sz,
-					PCI_DMA_FROMDEVICE);
-				yp->rx_skbuff[entry] = NULL;
-			} else {
-				skb = dev_alloc_skb(pkt_len + 2);
-				if (skb == NULL)
-					break;
-				skb_reserve(skb, 2);	/* 16 byte align the IP header */
-				skb_copy_to_linear_data(skb, rx_skb->data, pkt_len);
-				skb_put(skb, pkt_len);
-				pci_dma_sync_single_for_device(yp->pci_dev,
-								le32_to_cpu(desc->addr),
-								yp->rx_buf_sz,
-								PCI_DMA_FROMDEVICE);
-			}
+			skb = dev_skb_finish_rx_dma(&yp->rx_skbuff[entry],
+				pkt_len, rx_copybreak,
+				&yp->pci_dev->dev,
+				le32_to_cpu(desc->addr),
+				yp->rx_buf_sz);
+
 			skb->protocol = eth_type_trans(skb, dev);
 			netif_rx(skb);
 			dev->stats.rx_packets++;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 3e77b0f..496aac0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -30,6 +30,7 @@
 #include <linux/dmaengine.h>
 #include <linux/hrtimer.h>
 #include <linux/netdev_features.h>
+#include <linux/dma-mapping.h>
 
 /* Don't change this without changing skb_csum_unnecessary! */
 #define CHECKSUM_NONE 0
@@ -2283,5 +2284,133 @@ static inline void skb_checksum_none_assert(struct sk_buff *skb)
 
 bool skb_partial_csum_set(struct sk_buff *skb, u16 start, u16 off);
 
+/**
+ * __dev_skb_finish_rx_dma - finish skb after DMA'd RX
+ * @skb: skb to finish
+ * @len: packet data length
+ * @copybreak: maximum packet size to copy
+ * @dma_dev: device used for DMA
+ * @dma_buf: DMA mapping address
+ * @dma_len: DMA mapping length
+ *
+ * This function finishes DMA mapping (sync for copied, unmap otherwise) for
+ * a packet and copies it to a new skb if its size is at or below the
+ * @copybreak threshold.
+ *
+ * Returns new skb or NULL if the copy wasn't made.
+ */
+static inline struct sk_buff *__dev_skb_finish_rx_dma(
+	struct sk_buff *skb, unsigned int len, unsigned int copybreak,
+	struct device *dma_dev, dma_addr_t dma_buf, size_t dma_len)
+{
+	if (len <= copybreak) {
+		struct sk_buff *skb2 = netdev_alloc_skb_ip_align(skb->dev, len);
+		if (likely(skb2)) {
+			dma_sync_single_for_cpu(dma_dev, dma_buf, dma_len,
+				DMA_FROM_DEVICE);
+			skb_copy_to_linear_data(skb2, skb->data, len);
+			dma_sync_single_for_device(dma_dev, dma_buf, dma_len,
+				DMA_FROM_DEVICE);
+			return skb2;
+		}
+	}
+
+	/* too big to copy, or the allocation failed */
+
+	dma_unmap_single(dma_dev, dma_buf, dma_len, DMA_FROM_DEVICE);
+	return NULL;
+}
+
+/**
+ * dev_skb_finish_rx_dma - finish skb after DMA'd RX
+ * @pskb: pointer to variable holding skb to finish
+ * @len: packet data length
+ * @copybreak: maximum packet size to copy
+ * @dma_dev: device used for DMA
+ * @dma_buf: DMA mapping address
+ * @dma_len: DMA mapping length
+ *
+ * This function finishes DMA mapping (sync for copied, unmap otherwise) for
+ * a packet and copies it to a new skb if its size is at or below the
+ * @copybreak threshold, as __dev_skb_finish_rx_dma() does.
+ *
+ * Returns the skb - old or copied. *pskb is cleared if the skb wasn't copied.
+ */
+static inline struct sk_buff *dev_skb_finish_rx_dma(
+	struct sk_buff **pskb, unsigned int len, unsigned int copybreak,
+	struct device *dma_dev, dma_addr_t dma_buf, size_t dma_len)
+{
+	struct sk_buff *skb2;
+
+	skb2 = __dev_skb_finish_rx_dma(*pskb, len, copybreak,
+		dma_dev, dma_buf, dma_len);
+
+	if (!skb2) {
+		/* not copied */
+		skb2 = *pskb;
+		*pskb = NULL;
+	}
+
+	skb_put(skb2, len);
+	return skb2;
+}
+
+/**
+ * dev_skb_finish_rx_dma_refill - finish skb after DMA'd RX and refill the slot
+ * @pskb: pointer to variable holding skb to finish
+ * @len: packet data length
+ * @copybreak: maximum packet size to copy
+ * @ip_align: new skb's alignment offset
+ * @rx_offset: count of bytes prepended by HW before packet's data
+ * @dma_dev: device used for DMA
+ * @dma_buf: DMA mapping address
+ * @dma_len: DMA mapping length
+ *
+ * This function finishes DMA mapping (sync for copied, unmap otherwise) for
+ * a packet and copies it to a new skb if its size is at or below the
+ * @copybreak threshold, as __dev_skb_finish_rx_dma() does.
+ *
+ * *pskb is filled with new mapped skb if the skb wasn't copied.
+ * Returns the skb - old or copied, or NULL if refill failed.
+ *
+ * NOTE:
+ * This will effectively drop the packet in case of memory pressure. This
+ * might not be wanted when swapping over network. It's better to throttle
+ * the receiver queue (refill later) as the packet might be needed to
+ * reclaim some memory.
+ */
+static inline __deprecated struct sk_buff *dev_skb_finish_rx_dma_refill(
+	struct sk_buff **pskb, unsigned int len, unsigned int copybreak,
+	unsigned int ip_align, unsigned int rx_offset,
+	struct device *dma_dev, dma_addr_t *dma_buf, size_t dma_len)
+{
+	struct sk_buff *skb;
+
+	skb = __dev_skb_finish_rx_dma(*pskb, len, copybreak,
+		dma_dev, *dma_buf, dma_len);
+
+	if (!skb) {
+		/* not copied */
+		skb = *pskb;
+		/* open-coded netdev_alloc_skb_ip_align() with caller's offsets */
+		*pskb = netdev_alloc_skb(skb->dev, dma_len + ip_align);
+		if (likely(*pskb))
+			skb_reserve(*pskb, ip_align + rx_offset);
+		else {
+			/* no memory - drop packet */
+			*pskb = skb;
+			skb = NULL;
+		}
+
+		*dma_buf = dma_map_single(dma_dev, (*pskb)->data - rx_offset,
+			dma_len, DMA_FROM_DEVICE);
+	}
+
+	if (likely(skb))
+		skb_put(skb, len);
+
+	return skb;
+}
+
 #endif	/* __KERNEL__ */
 #endif	/* _LINUX_SKBUFF_H */
-- 
1.7.5.4
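A userspace sketch of the ownership contract that dev_skb_finish_rx_dma() documents above may help when reviewing the driver conversions. It is a model, not kernel code: struct buf and finish_rx() are hypothetical stand-ins for struct sk_buff and the helper, and the DMA sync/unmap calls are omitted. What it shows is that a copied packet leaves the ring slot populated (so the caller recycles the old buffer), while a packet passed up as-is clears the slot (so the caller must allocate a replacement).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff: a fixed-size data area. */
struct buf {
	unsigned char data[2048];
	unsigned int len;
};

/*
 * Model of dev_skb_finish_rx_dma() without DMA:
 *  - len <= copybreak and the allocation succeeds: copy into a fresh
 *    buffer and leave *pslot alone, so the ring can reuse it;
 *  - otherwise: hand the original buffer up and clear *pslot, so the
 *    caller knows the slot needs a newly allocated buffer.
 * Either way, the returned buffer's length is set to len (the skb_put()).
 */
static struct buf *finish_rx(struct buf **pslot, unsigned int len,
			     unsigned int copybreak)
{
	struct buf *out = NULL;

	assert(*pslot != NULL && len <= sizeof((*pslot)->data));

	if (len <= copybreak) {
		out = malloc(sizeof(*out));
		if (out)
			memcpy(out->data, (*pslot)->data, len);
	}

	if (!out) {
		/* too big to copy, or the allocation failed */
		out = *pslot;
		*pslot = NULL;
	}

	out->len = len;
	return out;
}
```

After the call, a NULL slot means "allocate a new buffer" and a non-NULL slot means "recycle the existing one".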


^ permalink raw reply related

* Re: ipv4: Simplify ARP hash function.
From: Stephen Hemminger @ 2011-07-08 23:41 UTC (permalink / raw)
  To: David Miller; +Cc: roland, johnwheffner, mj, netdev
In-Reply-To: <20110708.153258.1997707802176810939.davem@davemloft.net>

What about using murmur hash which has a four byte pass as well.
  https://sites.google.com/site/murmurhash/
---
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define u32 uint32_t

/* Do a one-pass MurmurHash2 */
static u32 arp_hashfn(u32 key, int ifindex, u32 hash_rnd)
{
	/* MurmurHash mixing constants */
	const unsigned int m = 0x5bd1e995;
	const int r = 24;

	/* Initialize the hash to a 'random' value */
	unsigned int h = ifindex ^ hash_rnd;
	unsigned int k = key;

	k *= m;
	k ^= k >> r;
	k *= m;

	h *= m;
	h ^= k;

	/* Do a few final mixes of the hash to ensure the last few
	 * bytes are well-incorporated.
	 */

	h ^= h >> 13;
	h *= m;
	h ^= h >> 15;

	return h;
}

int main(int argc, char **argv)
{
	u32 rnd, key, hash;
	int ifindex;

	if (argc != 4) {
		fprintf(stderr, "usage: %s key ifindex hash_rnd\n", argv[0]);
		return 1;
	}

	/* strtoul() covers the full u32 range (and 0x... input); atoi() does not */
	key = strtoul(argv[1], NULL, 0);
	ifindex = atoi(argv[2]);
	rnd = strtoul(argv[3], NULL, 0);

	hash = arp_hashfn(key, ifindex, rnd);
	printf("%u, %d, %u => %u\n", key, ifindex, rnd, hash);
	return 0;
}

^ permalink raw reply

* Re: ipv4: Simplify ARP hash function.
From: David Miller @ 2011-07-08 23:47 UTC (permalink / raw)
  To: shemminger; +Cc: roland, johnwheffner, mj, netdev
In-Reply-To: <20110708164128.50155c9c@nehalam.ftrdhcpuser.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Fri, 8 Jul 2011 16:41:28 -0700

> What about using murmur hash which has a four byte pass as well.
>   https://sites.google.com/site/murmurhash/

I'm trying to avoid multiplies that are not done in hardware on some
cpus.

Right now I'm looking at one of Thomas Wang's hashes, referenced on
Bob Jenkins's hash analysis page:

u32 hashint(u32 a)
{
	a += ~(a<<15);
	a ^=  (a>>10);
	a +=  (a<<3);
	a ^=  (a>>6);
	a += ~(a<<11);
	a ^=  (a>>16);

	return a;
}

It's 15 instructions, and produces better entropy in the low bits of
the result than the high bits, which is fine for how we'll use this
thing.
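As a rough illustration of how such a hash might be used for bucket selection, the sketch below reuses hashint() as quoted and keeps the low bits of the result for a power-of-two table. Feeding it key ^ ifindex ^ hash_rnd follows the input discussed earlier in this thread; arp_bucket() and hash_shift are assumptions for illustration, not the eventual kernel code. Because the mix uses additions as well as XOR-shifts, it is not linear over XOR, so the reduction shown for the plain shift-and-XOR fold does not carry over directly.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Thomas Wang's 32-bit integer hash, as quoted above. */
static u32 hashint(u32 a)
{
	a += ~(a << 15);
	a ^=  (a >> 10);
	a +=  (a << 3);
	a ^=  (a >> 6);
	a += ~(a << 11);
	a ^=  (a >> 16);
	return a;
}

/*
 * Hypothetical ARP bucket selection: fold the secret into the input
 * before mixing, then keep the low bits, where the hash is strongest.
 * hash_shift is log2 of the (power-of-two) table size.
 */
static u32 arp_bucket(u32 key, u32 ifindex, u32 hash_rnd,
		      unsigned int hash_shift)
{
	assert(hash_shift < 32);	/* 1u << 32 would be undefined */
	return hashint(key ^ ifindex ^ hash_rnd) & ((1u << hash_shift) - 1);
}
```

Each step of hashint() is invertible modulo 2^32 (the two "+= ~(a << n)" steps multiply by an odd constant and subtract 1), so distinct inputs only collide once the result is masked down to a bucket index.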

^ permalink raw reply

* Re: [RFC PATCH] net: clean up rx_copybreak handling
From: David Dillow @ 2011-07-09  1:37 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev
In-Reply-To: <f73b8ed0717151cd2c72e5c23d275cc0de28d277.1310167326.git.mirq-linux@rere.qmqm.pl>

On Sat, 2011-07-09 at 01:27 +0200, Michał Mirosław wrote:
> diff --git a/drivers/net/typhoon.c b/drivers/net/typhoon.c

> @@ -1675,25 +1674,14 @@ typhoon_rx(struct typhoon *tp, struct basic_ring *rxRing, volatile __le32 * read
>  
>                 pkt_len = le16_to_cpu(rx->frameLen);
>  
> -               if(pkt_len < rx_copybreak &&
> -                  (new_skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
> -                       skb_reserve(new_skb, 2);
> -                       pci_dma_sync_single_for_cpu(tp->pdev, dma_addr,
> -                                                   PKT_BUF_SZ,
> -                                                   PCI_DMA_FROMDEVICE);
> -                       skb_copy_to_linear_data(new_skb, skb->data, pkt_len);
> -                       pci_dma_sync_single_for_device(tp->pdev, dma_addr,
> -                                                      PKT_BUF_SZ,
> -                                                      PCI_DMA_FROMDEVICE);
> -                       skb_put(new_skb, pkt_len);
> +               new_skb = dev_skb_finish_rx_dma(&rxb->skb,
> +                       pkt_len, rx_copybreak,
> +                       &tp->pdev->dev, dma_addr, PKT_BUF_SZ);

Needs a few more tabs in front of the arguments. It looks like
		new_skb = dev_skb_finish_rx_dma(&rxb->skb, pkt_len,
						rx_copybreak, &tp->pdev->dev,
						dma_addr, PKT_BUF_SZ);

would fit the style around it better.

> +               if (!rxb->skb)
>                         typhoon_recycle_rx_skb(tp, idx);

I think you either meant
		if (rxb->skb)
			typhoon_recycle_rx_skb(tp, idx);

or to swap typhoon_{recycle,alloc}_rx_skb(). As it stands, you'll reload
the NIC with a NULL skb pointer, and it will DMA to the old location
when it eventually uses this descriptor.

> -               } else {
> -                       new_skb = skb;
> -                       skb_put(new_skb, pkt_len);
> -                       pci_unmap_single(tp->pdev, dma_addr, PKT_BUF_SZ,
> -                                      PCI_DMA_FROMDEVICE);
> +               else
>                         typhoon_alloc_rx_skb(tp, idx);
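The rule behind the suggested fix can be stated as an invariant: after the helper returns, the ring slot must hold a buffer before the descriptor is handed back to the NIC. The sketch below models that check in plain C; struct rx_slot, alloc_rx_buf() and rearm_slot() are hypothetical names, with alloc_rx_buf() standing in for typhoon_alloc_rx_skb() and the untouched-slot branch standing in for typhoon_recycle_rx_skb().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical RX ring slot; buf == NULL means the helper handed the
 * original buffer up the stack and the slot must be refilled. */
struct rx_slot {
	void *buf;
};

/* Hypothetical fixed pool standing in for typhoon_alloc_rx_skb(). */
static char spare_bufs[4][2048];
static int next_spare;

static void *alloc_rx_buf(void)
{
	return next_spare < 4 ? spare_bufs[next_spare++] : NULL;
}

/*
 * Re-arm a slot after dev_skb_finish_rx_dma(&slot->buf, ...).
 * Slot still populated: the packet was copied, so the buffer is
 * reposted as-is (typhoon_recycle_rx_skb() in the driver).
 * Slot empty: the buffer went up the stack, so a new one is needed
 * (typhoon_alloc_rx_skb() in the driver).
 * Returns 0 if the slot holds a buffer afterwards, -1 otherwise.
 */
static int rearm_slot(struct rx_slot *slot)
{
	if (!slot->buf)
		slot->buf = alloc_rx_buf();
	return slot->buf ? 0 : -1;
}
```

With the condition inverted, as in the patch, the empty slot would be "recycled" and the NIC would be given a NULL buffer, which is the failure described above.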



^ permalink raw reply

* Re: ipv4: Simplify ARP hash function.
From: Stephen Hemminger @ 2011-07-09  3:08 UTC (permalink / raw)
  To: David Miller; +Cc: roland, johnwheffner, mj, netdev
In-Reply-To: <20110708.164751.1543109601226116469.davem@davemloft.net>

On Fri, 08 Jul 2011 16:47:51 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Fri, 8 Jul 2011 16:41:28 -0700
> 
> > What about using murmur hash which has a four byte pass as well.
> >   https://sites.google.com/site/murmurhash/
> 
> I'm trying to avoid multiplies that are not done in hardware on some
> cpus.
> 
> Right now I'm looking at one of Thomas Wang's hashes, referenced on
> Bob Jenkins's hash analysis page:
> 
> u32 hashint(u32 a)
> {
> 	a += ~(a<<15);
> 	a ^=  (a>>10);
> 	a +=  (a<<3);
> 	a ^=  (a>>6);
> 	a += ~(a<<11);
> 	a ^=  (a>>16);
> 
> 	return a;
> }
> 
> It's 15 instructions, and produces better entropy in the low bits of
> the result than the high bits, which is fine for how we'll use this
> thing.

OK, but you really have to sell those Sparcs while they are still
worth something on eBay :-)

^ permalink raw reply


