Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: IPv6 path discovery oddities - flushing the routing cache resolves
From: Valentijn Sessink @ 2013-10-17 10:53 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20131016154841.GC18135@order.stressinduktion.org>

Hi,

Thanks for your fast response, please see below. Sorry for the rather 
long mail, due to the routing tables. Even though they're really short, 
the output of ipv6_route is extensive...

On 16-10-13 17:48, Hannes Frederic Sowa wrote:
> I do think these two issues are connected. Could you send me a the
> corresponding ip route output and /proc/net/ipv6_route output for when it
> works and mtus are correctly handled and when it does not work?

When it works (please note that the routing cache at first doesn't show 
an MTU, that only happens after a "too big" has been received, hence the 
difference before and after a "ps uaxww"), I have the following - this 
is from last night:

# ip -6 ro li
2a01:4f8:200:546b::/64 dev br0  proto kernel  metric 256
fe80::/64 dev eth0  proto kernel  metric 256
fe80::/64 dev br0  proto kernel  metric 256
fe80::/64 dev vnet0  proto kernel  metric 256
fe80::/64 dev vnet1  proto kernel  metric 256
default via fe80::1 dev br0  metric 1024

# ip -6 ro li cache
2001:7b8:1529:0:cd7a:7681:b261:af09 via fe80::1 dev br0  metric 0
     cache
2001:1af8:ff03:3:219:66ff:fe26:6dd via fe80::1 dev br0  metric 0
     cache

# ps uaxww
[... output cut for brevity ...]

# ip -6 ro li cache
2001:7b8:1529:0:cd7a:7681:b261:af09 via fe80::1 dev br0  metric 0
     cache
2001:1af8:ff03:3:219:66ff:fe26:6dd via fe80::1 dev br0  metric 0
     cache  expires 577sec mtu 1280

# cat /proc/net/ipv6_route
200107b815290000cd7a7681b261af09 80 00000000000000000000000000000000 00
fe800000000000000000000000000001 00000000 00000001 000062b2 01000023
   br0
20011af8ff030003021966fffe2606dd 80 00000000000000000000000000000000 00
fe800000000000000000000000000001 00000000 00000001 00000002 01400023
   br0
2a0104f80000a0a10000000000020001 80 00000000000000000000000000000000 00
fe800000000000000000000000000001 00000000 00000000 00000001 01000003
   br0
2a0104f80200546b0000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
   br0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
  eth0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
   br0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
vnet0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
vnet1
00000000000000000000000000000000 00 00000000000000000000000000000000 00
fe800000000000000000000000000001 00000400 00000000 00000000 00000003
   br0
00000000000000000000000000000000 00 00000000000000000000000000000000 00
00000000000000000000000000000000 ffffffff 00000001 0001b230 00200200
    lo
00000000000000000000000000000001 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 00000037 80200001
    lo
2a0104f80200546b0000000000000002 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000003 000e6bc4 80200001
    lo
fe80000000000000d63d7efffee306e9 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 0000003b 80200001
    lo
fe80000000000000d63d7efffee306e9 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 00000000 80200001
    lo
fe80000000000000fc5400fffe17ee5b 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 00000000 80200001
    lo
fe80000000000000fc5400fffe497447 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 00000000 80200001
    lo
ff000000000000000000000000000000 08 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
   br0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
  eth0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
vnet0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
vnet1
00000000000000000000000000000000 00 00000000000000000000000000000000 00
00000000000000000000000000000000 ffffffff 00000001 0001b230 00200200
    lo

(Please note, while trying, I found a rather interesting output, showing 
negative time; if it's normal, please ignore:
# ip -statistics -6 ro li cache
2001:1af8:ff03:3:219:66ff:fe26:6dd via fe80::1 dev br0  metric 0
     cache  expires -23sec users 2 used 2 age 637sec mtu 1280)


Anyway, today, suddenly it doesn't work anymore:

# ip -6 ro li
2a01:4f8:200:546b::/64 dev br0  proto kernel  metric 256
fe80::/64 dev eth0  proto kernel  metric 256
fe80::/64 dev br0  proto kernel  metric 256
fe80::/64 dev vnet0  proto kernel  metric 256
fe80::/64 dev vnet1  proto kernel  metric 256
fe80::/64 dev vnet2  proto kernel  metric 256
default via fe80::1 dev br0  metric 1024

# ip -6 ro li cache
2001:7b8:2ff:37a::2 via fe80::1 dev br0  metric 0
     cache
2001:7b8:1529::f via fe80::1 dev br0  metric 0
     cache
2001:7b8:1529:0:213:8fff:feda:140c via fe80::1 dev br0  metric 0
     cache
2001:7b8:1529:0:b845:bb25:5d99:2a07 via fe80::1 dev br0  metric 0
     cache
2a01:4f8:200:546b::5 via 2a01:4f8:200:546b::5 dev br0  metric 0
     cache
2a01:4f8:200:546b::2eef via 2a01:4f8:200:546b::2eef dev br0  metric 0
     cache

# ip -statistics -6 ro li cache
2001:7b8:1529:0:213:8fff:feda:140c via fe80::1 dev br0  metric 0
     cache  users 4 used 9448
2001:7b8:1529:0:b845:bb25:5d99:2a07 via fe80::1 dev br0  metric 0
     cache  users 4 used 260467 age 0sec
2a01:4f8:200:546b::2eef via 2a01:4f8:200:546b::2eef dev br0  metric 0
     cache  users 1 used 987 age 205sec
# ip -6 ro li cache
2001:7b8:1529:0:213:8fff:feda:140c via fe80::1 dev br0  metric 0
     cache
2001:7b8:1529:0:b845:bb25:5d99:2a07 via fe80::1 dev br0  metric 0
     cache
2a01:4f8:200:546b::2eef via 2a01:4f8:200:546b::2eef dev br0  metric 0
     cache
(The second time suddenly a couple of routes were gone - I don't know 
why). After "cat /proc/net/ipv6_route > /tmp/route-bad" I typed a "route 
-6 flush cache" and TCP started flowing again. Here's ipv6_route:

200107b802ff037a0000000000000002 80 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000000 00000000 00000006 01000003 
   br0
200107b815290000000000000000000f 80 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000000 00000000 00000006 01000003 
   br0
200107b81529000002138ffffeda140c 80 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000000 00000004 000024aa 01000023 
   br0
200107b815290000b845bb255d992a07 80 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000000 00000004 0003f79f 01000023 
   br0
2a0104f80200546b0000000000000005 80 00000000000000000000000000000000 00 
2a0104f80200546b0000000000000005 00000000 00000000 00000018 01000001 
   br0
2a0104f80200546b0000000000002eef 80 00000000000000000000000000000000 00 
2a0104f80200546b0000000000002eef 00000000 00000001 000003db 01000001 
   br0
2a0104f80200546b0000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
   br0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
  eth0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
   br0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet1
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet2
00000000000000000000000000000000 00 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000400 00000000 00000000 00000003 
   br0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 
00000000000000000000000000000000 ffffffff 00000001 00064566 00200200 
    lo
00000000000000000000000000000001 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000167 80200001 
    lo
2a0104f80200546b0000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
2a0104f80200546b0000000000000002 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 0000000b 002a4bfe 80200001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe80000000000000d63d7efffee306e9 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 000001ed 80200001 
    lo
fe80000000000000d63d7efffee306e9 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 80200001 
    lo
fe80000000000000fc5400fffe17ee5b 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 80200001 
    lo
fe80000000000000fc5400fffe497447 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 80200001 
    lo
fe80000000000000fc5400fffedd5a81 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 80200001 
    lo
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
   br0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
  eth0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet1
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet2
00000000000000000000000000000000 00 00000000000000000000000000000000 00 
00000000000000000000000000000000 ffffffff 00000001 00064566 00200200 
    lo

I hope this helps.

Best regards,

Valentijn

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: David Vrabel @ 2013-10-17 10:31 UTC (permalink / raw)
  To: jianhai luan
  Cc: Jan Beulich, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FB9BC.9010608@oracle.com>

On 17/10/13 11:19, jianhai luan wrote:
> 
> On 2013-10-17 17:15, David Vrabel wrote:
>> On 17/10/13 10:02, jianhai luan wrote:
>>> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>>>
>>>>> If netfront sends at a very low rate, the time between subsequent
>>>>> calls
>>>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>>>> timer_after_eq() will be incorrect.  Credit will not be replenished
>>>>> and
>>>>> the guest may become unable to send (e.g., if prior to the long
>>>>> gap, all
>>>>> credit was exhausted).
>>>>>
>>>>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2.
>>>>> Because
>>>>> the fact now must be not before than expire, time_before(now, expire)
>>>>> == true
>>>>> will verify the scenario.
>>>>>       time_after_eq(now, next_credit) || time_before (now, expire)
>>>>>       ==
>>>>>       !time_in_range_open(now, expire, next_credit)
>>>> So first of all this must be with a 32-bit netback. And the not
>>>> coverable gap between activity is well over 240 days long. _If_
>>>> this really needs dealing with, then why is extending this from
>>>> 240+ to 480+ days sufficient? I.e. why don't you simply
>>>> change to 64-bit jiffy values, and use time_after_eq64()?
>>> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond
>>> MAX_ULONG/2 in 64-bit will need long long time)
>>>
>>> I think the gap should be think all environment even now extending 480+.
>>> if now fall in the gap,  one timer will be pending and replenish will be
>>> in time.  Please run the attachment test program.
>>>
>>> If use time_after_eq64(), expire ,next_credit and other member will must
>>> be u64.
>> Yes, you'll need to store next_credit as a u64 in vif instead of
>> calculating it in tx_credit_exceeded from expires (which is only an
>> unsigned long).
> 
> I know that.  Even we use u64, time_after_eq()  will also do wrong judge
> in theory (not in reality because need long long time).

If jiffies_64 has millisecond resolution that would be more than
500,000,000 years.

> I think the two better fixed way is below:
>   - By time_before() to judge if now beyond MAX_ULONG/2

This is broken, so no.

David

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: jianhai luan @ 2013-10-17 10:19 UTC (permalink / raw)
  To: David Vrabel
  Cc: Jan Beulich, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FAABE.5080806@citrix.com>


On 2013-10-17 17:15, David Vrabel wrote:
> On 17/10/13 10:02, jianhai luan wrote:
>> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>>
>>>> If netfront sends at a very low rate, the time between subsequent calls
>>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>>> timer_after_eq() will be incorrect.  Credit will not be replenished and
>>>> the guest may become unable to send (e.g., if prior to the long gap, all
>>>> credit was exhausted).
>>>>
>>>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2.
>>>> Because
>>>> the fact now must be not before than expire, time_before(now, expire)
>>>> == true
>>>> will verify the scenario.
>>>>       time_after_eq(now, next_credit) || time_before (now, expire)
>>>>       ==
>>>>       !time_in_range_open(now, expire, next_credit)
>>> So first of all this must be with a 32-bit netback. And the not
>>> coverable gap between activity is well over 240 days long. _If_
>>> this really needs dealing with, then why is extending this from
>>> 240+ to 480+ days sufficient? I.e. why don't you simply
>>> change to 64-bit jiffy values, and use time_after_eq64()?
>> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond
>> MAX_ULONG/2 in 64-bit will need long long time)
>>
>> I think the gap should be think all environment even now extending 480+.
>> if now fall in the gap,  one timer will be pending and replenish will be
>> in time.  Please run the attachment test program.
>>
>> If use time_after_eq64(), expire ,next_credit and other member will must
>> be u64.
> Yes, you'll need to store next_credit as a u64 in vif instead of
> calculating it in tx_credit_exceeded from expires (which is only an
> unsigned long).

I know that.  Even we use u64, time_after_eq()  will also do wrong judge 
in theory (not in reality because need long long time).
I think the two better fixed way is below:
   - By time_before() to judge if now beyond MAX_ULONG/2
   - Add another timer to check and update expire in MAX_ULONG>>2 period.

Because second way isn't  be verified in practical (need more time to 
waiting jiffes increase),  I chose the first.
>
> David

^ permalink raw reply

* Re: [PATCH] WAN: Adding support for Infineon PEF2256 E1 chipset
From: Mark Rutland @ 2013-10-17 10:17 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: rob.herring@calxeda.com, Pawel Moll, Stephen Warren, Ian Campbell,
	Rob Landley, grant.likely@linaro.org, Krzysztof Halasa,
	devicetree@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	jerome.chantelauze@c-s.fr
In-Reply-To: <201310161525.r9GFPZI5006238@localhost.localdomain>

On Wed, Oct 16, 2013 at 04:25:35PM +0100, Christophe Leroy wrote:
> The patch adds WAN support for Infineon PEF2256 E1 Chipset.
> 
> Signed-off-by: Jerome Chantelauze <jerome.chantelauze@c-s.fr>
> Acked-by: Christophe Leroy <christophe.leroy@c-s.fr>

> +static ssize_t fs_attr_mode_store(struct device *dev,
> +                       struct device_attribute *attr,  const char *buf,
> +                       size_t count)
> +{
> +       struct net_device *ndev = dev_get_drvdata(dev);
> +       struct pef2256_dev_priv *priv = dev_to_hdlc(ndev)->priv;
> +       u32 value;
> +       int ret = kstrtol(buf, 10, (long int *)&value);

u32 is not the same as long int.

> +       int reconfigure = (value != priv->mode);
> +
> +       if (ret != 0)
> +               return ret;
> +
> +       if (value != MASTER_MODE && value != SLAVE_MODE)
> +               return -EINVAL;
> +
> +       priv->mode = value;
> +       if (reconfigure && priv->init_done) {
> +               pef2256_close(ndev);
> +               init_FALC(priv);
> +               pef2256_open(ndev);
> +       }
> +
> +       return count;

What if count is not the number of characters read?

[...]

> +
> +       /* TS 0 is reserved */
> +       if (value & 0x80000000)
> +               return -EINVAL;

Magic numbers should be turned into constants.

> +static ssize_t fs_attr_Rx_TS_show(struct device *dev,
> +                       struct device_attribute *attr, char *buf)
> +{
> +       struct net_device *ndev = dev_get_drvdata(dev);
> +       struct pef2256_dev_priv *priv = dev_to_hdlc(ndev)->priv;
> +
> +       return sprintf(buf, "0x%08x\n", priv->Rx_TS);
> +}
> +
> +
> +static ssize_t fs_attr_Rx_TS_store(struct device *dev,
> +                       struct device_attribute *attr,  const char *buf,
> +                       size_t count)
> +{
> +       struct net_device *ndev = dev_get_drvdata(dev);
> +       struct pef2256_dev_priv *priv = dev_to_hdlc(ndev)->priv;
> +       u32 value;
> +       int ret = kstrtol(buf, 10, (long int *)&value);

I'm not sure what the rules are regarding this, but why do we show this
in hexadecimal but read it in decimal?

[...]

> +int Config_HDLC(struct pef2256_dev_priv *priv)
> +{
> +       int i;
> +       int TS_idx;
> +       struct pef2256_regs *base_addr;

That sounds suspicious. Using structs for the offsets of registers isn't
very portable...

It would be preferable to #define the offsets.

> +       u8 dummy;
> +
> +       /* Set framer E1 address */
> +       base_addr = (struct pef2256_regs *)priv->base_addr;

That looks even more suspicious...

> +
> +       /* Read to remove pending IT */
> +       dummy = base_addr->ISR0;
> +       dummy = base_addr->ISR1;

You should use MMIO accessors here (readl, writel, etc). You have no
idea how the compiler may reorganise, coalese or throw away accesses,
nor how those accesses will be made. Additionally, without the requisite
barriers you have no guarantee the CPU won't reorder these accesses.

The compiler is within its rights here to throw away these accesses as
the results are never used. This is broken.

With some constants for the register offsets, this would be:

readb(base_addr + REG_ISR0);
readb(base_addr + REG_ISR1);

Which won't be reordered or thrown away by either the compiler or CPU.

> +
> +       /* Mask HDLC 1 Transmit IT */
> +       base_addr->IMR1 |= 1;
> +       base_addr->IMR1 |= 1 << 4;
> +       base_addr->IMR1 |= 1 << 5;
> +
> +       /* Mask HDLC 1 Receive IT */
> +       base_addr->IMR0 |= 1;
> +       base_addr->IMR0 |= 1 << 7;
> +       base_addr->IMR1 |= 1 << 6;
> +
> +       udelay((2 * 32) * 125);

Why the udelay, and how was the delay period (2 * 32 * 125) derived?

Is this to account for the lack of barriers, or does the hardware have a
requirement that there's a delay? 

If the former, please fix. If the later, please coment the udelay to
make this clear.

> +
> +       /* MODE.HRAC = 0 (Receiver inactive)
> +          MODE.DIV = 0 (Data normal operation)
> +          for FALC V2.2 : MODE.HDLCI = 0 (normal operation) */
> +       /* MODE.MDS2:0 = 100 (No address comparison) */
> +       /* MODE.HRAC = 1 (Receiver active) */
> +       out_8(&(base_addr->MODE), 1 << 3);

Why are you using an MMIO accessor here but not elsewhere?

Not all architectures seem to have out_8, but I think iowrite8/writeb
will work (though I'm not sure what the intended difference between
writeb and out_8 is).

> +       /* CCR1.EITS = 1 (Enable internal Time Slot 31:0 Signaling)
> +          CCR1.XMFA = 0 (No transmit multiframe alignment)
> +          CCR1.RFT1:0 = 00 (RFIFO sur 32 bytes) */
> +       /* setting up Interframe Time Fill */
> +       /* CCR1.ITF = 1 (Interframe Time Fill Continuous flag) */
> +       out_8(&(base_addr->CCR1), 0x10 | (1 << 3));
> +       /* CCR2.XCRC = 0 (Transmit CRC ON)
> +          CCR2.RCRC = 0 (Receive CRC ON, no write in RFIFO)
> +          CCR2.RADD = 0 (No write address in RFIFO) */
> +       out_8(&(base_addr->CCR2), 0x00);
> +
> +       udelay((2 * 32) * 125);

Please explain all udelay instances.

[...]

> +                               setbits8(&(base_addr->TTR1), 1 << i);

I'm not aware of a generic equivalent to setbits8, but it seems like
writeb(readb(ADDR) | bits), ADDR) would do the same.

[...]

> +static int pef2256_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
> +{
> +       int ret;
> +
> +       ret = hdlc_ioctl(dev, ifr, cmd);
> +       return ret;
> +}

This seems a bit useless -- can't you just assign hdlc_ioctl to
pef2256_ops::ndo_do_ioctl directly?

> +static const struct of_device_id pef2256_match[];
> +static int pef2256_probe(struct platform_device *ofdev)

s/ofdev/pdev -- platform_device has nothing to do with OF.

> +{
> +       const struct of_device_id *match;
> +       struct pef2256_dev_priv *priv;
> +       int ret = -ENOMEM;
> +       struct net_device *netdev;
> +       hdlc_device *hdlc;
> +       int sys_ret;
> +       struct pef2256_regs *base_addr;
> +       struct device_node *np = (&ofdev->dev)->of_node;
> +       const u32 *data;
> +       int len;
> +
> +       match = of_match_device(pef2256_match, &ofdev->dev);
> +       if (!match)
> +               return -EINVAL;

Why not:

if (!pdev->dev.of_node)
	return -EINVAL;

You shouldn't have an of_node unless one of your compatible strings
matched, and this way you don't have to iterate over the list again.

> +
> +       dev_err(&ofdev->dev, "Found PEF2256\n");
> +
> +       priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> +       if (!priv)
> +               return ret;
> +
> +       priv->dev = &ofdev->dev;
> +
> +       data = of_get_property(np, "data-rate", &len);
> +       if (!data || len != 4) {

Use of_property_read_u32.

> +               dev_err(&ofdev->dev, "failed to read data-rate -> using 8Mb\n");
> +               priv->data_rate = DATA_RATE_8M;
> +       } else
> +               priv->data_rate = *data;
> +
> +       data = of_get_property(np, "channel-phase", &len);
> +       if (!data || len != 4) {

Use of_property_read_u32.

> +               dev_err(&ofdev->dev, "failed to read channel phase -> using 0\n");
> +               priv->channel_phase = CHANNEL_PHASE_0;
> +       } else
> +               priv->channel_phase = *data;
> +
> +       data = of_get_property(np, "rising-edge-sync-pulse", NULL);

Use of_property_read_string.

> +       if (!data) {
> +               dev_err(&ofdev->dev, "failed to read rising edge sync pulse -> using \"transmit\"\n");
> +               strcpy(priv->rising_edge_sync_pulse, "transmit");
> +       } else if (strcmp((char *)data, "transmit") &&
> +                       strcmp((char *)data, "receive")) {
> +               dev_err(&ofdev->dev, "invalid rising edge sync pulse -> using \"transmit\"\n");
> +               strcpy(priv->rising_edge_sync_pulse, "transmit");
> +       } else
> +               strncpy(priv->rising_edge_sync_pulse, (char *)data, 10);
> +
> +       priv->irq = of_irq_to_resource(np, 0, NULL);
> +       if (!priv->irq) {
> +               dev_err(priv->dev, "no irq defined\n");
> +               return -EINVAL;
> +       }

The irq will have already been parsed, and will be in your
platform_device's set of resources. You can use platform_get_irq to get
at it rather than getting the of_ code to attempt to map it again.

Why are you storing the IRQ resource, rather than the irq itself? Surely
the irq number is easier to deal with?

> +       netdev = alloc_hdlcdev(priv);
> +       if (!netdev) {
> +               ret = -ENOMEM;
> +               return ret;

You leak priv and the priv->base_addr mapping here.

[...]

> +       ret = register_hdlc_device(netdev);
> +       if (ret < 0) {
> +               pr_err("unable to register\n");
> +               return ret;

You leak the priv, priv->base_addr, and netdev here.

> +       }
> +
> +       sys_ret = 0;
> +       sys_ret |= device_create_file(priv->dev, &dev_attr_mode);
> +       sys_ret |= device_create_file(priv->dev, &dev_attr_Tx_TS);
> +       sys_ret |= device_create_file(priv->dev, &dev_attr_Rx_TS);
> +       sys_ret |= device_create_file(priv->dev, &dev_attr_regs);

Huh? can't any of these fail individually?

> +
> +       if (sys_ret) {
> +               device_remove_file(priv->dev, &dev_attr_mode);

What about the other files?

> +               unregister_hdlc_device(priv->netdev);
> +               free_netdev(priv->netdev);

What about priv and priv->base_addr?

Why is there not a return here? We'll fall out to the main body and
return 0, as if everything's OK...

> +       }
> +
> +       priv->init_done = 0;
> +
> +       return 0;
> +}

[...]

> +
> +
> +/*
> + * Suppression du module
> + */
> +static int pef2256_remove(struct platform_device *ofdev)
> +{
> +       struct net_device *ndev = dev_get_drvdata(&ofdev->dev);
> +       struct pef2256_dev_priv *priv = dev_to_hdlc(ndev)->priv;
> +
> +       device_remove_file(priv->dev, &dev_attr_Rx_TS);
> +       device_remove_file(priv->dev, &dev_attr_Tx_TS);
> +       device_remove_file(priv->dev, &dev_attr_mode);
> +
> +       unregister_hdlc_device(priv->netdev);
> +       free_netdev(priv->netdev);

What about priv and priv->base_addr?

> +
> +       /* Do E1 stuff */
> +
> +       dev_set_drvdata(&ofdev->dev, NULL);
> +       kfree(ofdev);

Is that meant to be done here? Isn't that the job of the core code?

[...]

> +static int __init pef2256_init(void)
> +{
> +       int ret;
> +       ret = platform_driver_register(&pef2256_driver);
> +       return ret;
> +}
> +module_init(pef2256_init);
> +
> +
> +static void __exit pef2256_exit(void)
> +{
> +       platform_driver_unregister(&pef2256_driver);
> +}
> +module_exit(pef2256_exit);

Use module_platform_driver?

> +/* Framer E1 registers */
> +union pef2256_Fifo {
> +       u8      XFIFO[sizeof(u16)];             /* Transmit FIFO */
> +       u8      RFIFO[sizeof(u16)];             /* Receive FIFO */
> +};

Huh? Why sizeof(u16) rather than 2?

> +struct pef2256_regs {
> +       union pef2256_Fifo      FIFO;   /* 0x00/0x01    FIFO (Tx or rx) */
> +       unsigned char   CMDR;   /* 0x02 Command Register */
> +       unsigned char   MODE;   /* 0x03 Mode Register */
> +       unsigned char   RAH1;   /* 0x04 Receive Address High 1 */
> +       unsigned char   RAH2;   /* 0x05 Receive Address High 2 */
> +       unsigned char   RAL1;   /* 0x06 Receive Address Low 1 */

[...]

Please do not use structures for calculation of register offsets.

> +       unsigned short  FEC;    /* 0x50/0x51 Framing Error Counter */
> +       unsigned short  CVC;    /* 0x52/0x53 Code Violation Counter */
> +       unsigned short  CEC1;   /* 0x54/0x55 CRC Error Counter 1 */
> +       unsigned short  EBC;    /* 0x56/0x57 E-Bit Error Counter */
> +       unsigned short  CEC2;   /* 0x58/0x59 CRC Error Counter 2 */
> +       unsigned short  CEC3;   /* 0x5A/0x5B CRC Error Counter 3 */

These may not be the size you expect.

> diff -urN a/Documentation/devicetree/bindings/net/pef2256.txt b/Documentation/devicetree/bindings/net/pef2256.txt
> --- a/Documentation/devicetree/bindings/net/pef2256.txt 1970-01-01 01:00:00.000000000 +0100
> +++ b/Documentation/devicetree/bindings/net/pef2256.txt 2013-10-13 15:05:42.000000000 +0200
> @@ -0,0 +1,29 @@
> +* Wan on Infineon pef2256 E1 controller

A brief description would be helpful. Is there any publicly available
documentation?

> +
> +Required properties:
> +- compatible: Should be "infineon,pef2256"

s/Should be/Should contain/ -- variants may exist in future.

> +- reg: Address and length of the register set for the device

Is there only the one register bank?

> +- interrupts: Should contain interrupts

How many? What do they correspond to?

> +
> +Optional properties:
> +- data-rate: Data rate on the system highway.
> +  Supported values are: 2, 4, 8, 16.
> +  8 if not defined.

What is the "system highway"? Is this configuration, or is this a
property of the device that cannot be probed?

> +- channel-phase: First time slot transmission channel phase.
> +  Supported values are: 0, 1, 2, 3, 4, 5, 6, 7.
> +  0 if not defined.

Similarly?

> +- rising-edge-sync-pulse: rising edge synchronous pulse.
> +  Supported values are: "receive", "transmit".
> +  "transmit" if not defined.

I'm not sure what this means. Could you elaborate?

Thanks,
Mark.

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: jianhai luan @ 2013-10-17  9:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: david.vrabel, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FC98002000078000FBBB5@nat28.tlf.novell.com>


On 2013-10-17 17:26, Jan Beulich wrote:
>>>> On 17.10.13 at 11:02, jianhai luan <jianhai.luan@oracle.com> wrote:
>> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>>
>>>> If netfront sends at a very low rate, the time between subsequent calls
>>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>>> timer_after_eq() will be incorrect.  Credit will not be replenished and
>>>> the guest may become unable to send (e.g., if prior to the long gap, all
>>>> credit was exhausted).
>>>>
>>>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2.
>> Because
>>>> the fact now must be not before than expire, time_before(now, expire) ==
>> true
>>>> will verify the scenario.
>>>>       time_after_eq(now, next_credit) || time_before (now, expire)
>>>>       ==
>>>>       !time_in_range_open(now, expire, next_credit)
>>> So first of all this must be with a 32-bit netback. And the not
>>> coverable gap between activity is well over 240 days long. _If_
>>> this really needs dealing with, then why is extending this from
>>> 240+ to 480+ days sufficient? I.e. why don't you simply
>>> change to 64-bit jiffy values, and use time_after_eq64()?
>> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond
>> MAX_ULONG/2 in 64-bit will need long long time)
>>
>> I think the gap should be think all environment even now extending 480+.
>> if now fall in the gap,  one timer will be pending and replenish will be
>> in time.  Please run the attachment test program.
> Not sure what this is supposed to tell me. I recognize that there
> are overflow conditions not handled properly, but (a) I have a
> hard time thinking of a sensible guest that sits idle for over 240
> days (host uptime usually isn't even coming close to that due to
> maintenance requirements) and (b) if there is such a sensible
> guest, then I can't see why dealing with one being idle for over
> 480 days should be required too.

The issue can be reproduced when now beyond MAX_ULONG/2 (if the gust 
will send lesser package).
Jiffies beyond than MAX_UNLONG/2 will need below time:
     HZ         days
    100        248.55        (((0xffffffff/2)/HZ)/3600)/24
    250        99.42          (((0xffffffff/2)/HZ)/3600)/24
   1000       24.86          (((0xffffffff/2)/HZ)/3600)/24

Because we use 250,  the issue be found when uptime large than 100 days.

Jason
>> If use time_after_eq64(), expire ,next_credit and other member will must
>> be u64.
> Exactly - that's what I was telling you to do.
>
> Jan
>

^ permalink raw reply

* Re: [PATCH ipsec] xfrm: prevent ipcomp scratch buffer race condition
From: Steffen Klassert @ 2013-10-17  9:55 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Michal Kubecek, David S. Miller, netdev
In-Reply-To: <20131016123205.GA9982@gondor.apana.org.au>

On Wed, Oct 16, 2013 at 08:32:05PM +0800, Herbert Xu wrote:
> On Mon, Oct 14, 2013 at 06:03:34PM +0200, Michal Kubecek wrote:
> > In ipcomp_compress(), sortirq is enabled too early, allowing the
> > per-cpu scratch buffer to be rewritten by ipcomp_decompress()
> > (called on the same CPU in softirq context) between populating
> > the buffer and copying the compressed data to the skb.
> 
> Good catch.
> 
> > Add similar protection into ipcomp_decompress() as it can be
> > called from process context as well (even if such scenario seems
> > a bit artificial).
> 
> I don't think this is possible or otherwise xfrm_input will
> dead-lock.
> 

Michal, please incorporate the feedback from Herbert and Eric,
I'll take it into the ipsec tree then. Thanks!

^ permalink raw reply

* Re: [PATCH net-next] {selinux, af_key} Rework pfkey_sadb2xfrm_user_sec_ctx
From: Steffen Klassert @ 2013-10-17  9:51 UTC (permalink / raw)
  To: Fan Du; +Cc: Paul Moore, davem, netdev
In-Reply-To: <525F3EBD.80406@windriver.com>

On Thu, Oct 17, 2013 at 09:34:53AM +0800, Fan Du wrote:
> 
> 
> On 2013年10月16日 23:15, Paul Moore wrote:
> >
> >The fact that you are now changing sadb_x_sec_ctx->sadb_x_sec_len whenever
> >pfkey_sadb2xfrm_user_sec_ctx() is called raises an eyebrow.  Can you elaborate
> >on why this is not a problem?
> >
> Thanks for your attention, Paul.
> 
> sadb_x_sec_ctx is extra headers passed down from user space, the usage of
> of this data structure falls down to one of pfkey_funcs function only for
> one time, more specifically speaking, it's only used by SELINUX for security
> checking for each operation. In other words, sadb_x_sec_ctx involves with a
> one shot business here. So the original codes seems do a lots of extra job
> which could easily be avoid using casting operation.
> 

Since the selinux people have to live with that change in the fist place,
I'd like to see an ack of one of the selinux maintainers before I take
in into ipsec-next, Paul?

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: Jan Beulich @ 2013-10-17  9:26 UTC (permalink / raw)
  To: jianhai luan
  Cc: david.vrabel, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FA79F.8060601@oracle.com>

>>> On 17.10.13 at 11:02, jianhai luan <jianhai.luan@oracle.com> wrote:
> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>
>>> If netfront sends at a very low rate, the time between subsequent calls
>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>> timer_after_eq() will be incorrect.  Credit will not be replenished and
>>> the guest may become unable to send (e.g., if prior to the long gap, all
>>> credit was exhausted).
>>>
>>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2. 
> Because
>>> the fact now must be not before than expire, time_before(now, expire) == 
> true
>>> will verify the scenario.
>>>      time_after_eq(now, next_credit) || time_before (now, expire)
>>>      ==
>>>      !time_in_range_open(now, expire, next_credit)
>> So first of all this must be with a 32-bit netback. And the not
>> coverable gap between activity is well over 240 days long. _If_
>> this really needs dealing with, then why is extending this from
>> 240+ to 480+ days sufficient? I.e. why don't you simply
>> change to 64-bit jiffy values, and use time_after_eq64()?
> 
> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond 
> MAX_ULONG/2 in 64-bit will need long long time)
> 
> I think the gap should be think all environment even now extending 480+. 
> if now fall in the gap,  one timer will be pending and replenish will be 
> in time.  Please run the attachment test program.

Not sure what this is supposed to tell me. I recognize that there
are overflow conditions not handled properly, but (a) I have a
hard time thinking of a sensible guest that sits idle for over 240
days (host uptime usually isn't even coming close to that due to
maintenance requirements) and (b) if there is such a sensible
guest, then I can't see why dealing with one being idle for over
480 days should be required too.

> If use time_after_eq64(), expire ,next_credit and other member will must 
> be u64.

Exactly - that's what I was telling you to do.

Jan

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: David Vrabel @ 2013-10-17  9:15 UTC (permalink / raw)
  To: jianhai luan
  Cc: Jan Beulich, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FA79F.8060601@oracle.com>

On 17/10/13 10:02, jianhai luan wrote:
> 
> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>
>>> If netfront sends at a very low rate, the time between subsequent calls
>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>> timer_after_eq() will be incorrect.  Credit will not be replenished and
>>> the guest may become unable to send (e.g., if prior to the long gap, all
>>> credit was exhausted).
>>>
>>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2.
>>> Because
>>> the fact now must be not before than expire, time_before(now, expire)
>>> == true
>>> will verify the scenario.
>>>      time_after_eq(now, next_credit) || time_before (now, expire)
>>>      ==
>>>      !time_in_range_open(now, expire, next_credit)
>> So first of all this must be with a 32-bit netback. And the not
>> coverable gap between activity is well over 240 days long. _If_
>> this really needs dealing with, then why is extending this from
>> 240+ to 480+ days sufficient? I.e. why don't you simply
>> change to 64-bit jiffy values, and use time_after_eq64()?
> 
> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond
> MAX_ULONG/2 in 64-bit will need long long time)
> 
> I think the gap should be think all environment even now extending 480+.
> if now fall in the gap,  one timer will be pending and replenish will be
> in time.  Please run the attachment test program.
> 
> If use time_after_eq64(), expire ,next_credit and other member will must
> be u64.

Yes, you'll need to store next_credit as a u64 in vif instead of
calculating it in tx_credit_exceeded from expires (which is only an
unsigned long).

David

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: jianhai luan @ 2013-10-17  9:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: david.vrabel, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FA79F.8060601@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 2985 bytes --]


On 2013-10-17 17:02, jianhai luan wrote:
>
> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>
>>> If netfront sends at a very low rate, the time between subsequent calls
>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>> timer_after_eq() will be incorrect.  Credit will not be replenished and
>>> the guest may become unable to send (e.g., if prior to the long gap, 
>>> all
>>> credit was exhausted).
>>>
>>> We should add the scenario which now beyond 
>>> next_credit+MAX_UNLONG/2. Because
>>> the fact now must be not before than expire, time_before(now, 
>>> expire) == true
>>> will verify the scenario.
>>>      time_after_eq(now, next_credit) || time_before (now, expire)
>>>      ==
>>>      !time_in_range_open(now, expire, next_credit)
>> So first of all this must be with a 32-bit netback. And the not
>> coverable gap between activity is well over 240 days long. _If_
>> this really needs dealing with, then why is extending this from
>> 240+ to 480+ days sufficient? I.e. why don't you simply
>> change to 64-bit jiffy values, and use time_after_eq64()?
>
> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond 
> MAX_ULONG/2 in 64-bit will need long long time)
>
> I think the gap should be think all environment even now extending 
> 480+. if now fall in the gap,  one timer will be pending and replenish 
> will be in time.  Please run the attachment test program.
>

Sorry for miss the attachment in previous letter. Please check the 
attachment.
> If use time_after_eq64(), expire ,next_credit and other member will 
> must be u64.
>>
>> Jan
>>
>>> Signed-off-by: Jason Luan <jianhai.luan@oracle.com>
>>> ---
>>>   drivers/net/xen-netback/netback.c |    7 +++++--
>>>   1 files changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/xen-netback/netback.c
>>> b/drivers/net/xen-netback/netback.c
>>> index f3e591c..31eedaf 100644
>>> --- a/drivers/net/xen-netback/netback.c
>>> +++ b/drivers/net/xen-netback/netback.c
>>> @@ -1194,8 +1194,11 @@ static bool tx_credit_exceeded(struct xenvif 
>>> *vif,
>>> unsigned size)
>>>       if (timer_pending(&vif->credit_timeout))
>>>           return true;
>>>   -    /* Passed the point where we can replenish credit? */
>>> -    if (time_after_eq(now, next_credit)) {
>>> +    /* Credit should be replenished when now does not fall into the
>>> +     * range from expires to next_credit, and time_in_range_open()
>>> +     * is used to verify whether this case happens.
>>> +     */
>>> +    if (!time_in_range_open(now, vif->credit_timeout.expires, 
>>> next_credit)) {
>>>           vif->credit_timeout.expires = now;
>>>           tx_add_credit(vif);
>>>       }
>>> -- 
>>> 1.7.6.5
>>>
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xen.org
>>> http://lists.xen.org/xen-devel
>>
>>
>


[-- Attachment #2: main.c --]
[-- Type: text/plain, Size: 961 bytes --]

#include <stdio.h>

#define typecheck(type,x) \
({	type __dummy; \
	typeof(x) __dummy2; \
	(void)(&__dummy == &__dummy2); \
	1; \
})

#define time_after(a, b)		\
	(typecheck(unsigned char, a) && \
	 typecheck(unsigned char, b) && \
	 ((char)((b) - (a)) < 0))
#define time_before(a,b)	time_after(b,a)

#define time_after_eq(a,b)		\
	(typecheck(unsigned char, a) && \
	 typecheck(unsigned char, b) && \
	 ((char)((a) -(b)) >= 0))
#define time_before_eq(a, b) time_after_eq(b,a)

void do_nothing()
{
	return;
}

int main()
{
	unsigned char expire, now, next;
	unsigned char delta = 10;
	int i, j;

	for(i = 0; i < 256; i++) {
		expire = i;
		next = expire + delta;

		printf("\n\n\n[%u ... %u]\n", expire, next);
		now = expire;
		for(j=0; j < 1024; j++, now++) {	
			if(j%256 == 0) printf("\n");

			if (time_after_eq(now, next) ||
				time_before(now, expire)) {
				do_nothing();
			}
			else {
				printf("    now=%d\n", (char)now);
			}
		}
	}
	
	return 0;
}

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: jianhai luan @ 2013-10-17  9:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: david.vrabel, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FBB4F02000078000FBB30@nat28.tlf.novell.com>


On 2013-10-17 16:26, Jan Beulich wrote:
>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>
>> If netfront sends at a very low rate, the time between subsequent calls
>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>> timer_after_eq() will be incorrect.  Credit will not be replenished and
>> the guest may become unable to send (e.g., if prior to the long gap, all
>> credit was exhausted).
>>
>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2. Because
>> the fact now must be not before than expire, time_before(now, expire) == true
>> will verify the scenario.
>>      time_after_eq(now, next_credit) || time_before (now, expire)
>>      ==
>>      !time_in_range_open(now, expire, next_credit)
> So first of all this must be with a 32-bit netback. And the not
> coverable gap between activity is well over 240 days long. _If_
> this really needs dealing with, then why is extending this from
> 240+ to 480+ days sufficient? I.e. why don't you simply
> change to 64-bit jiffy values, and use time_after_eq64()?

Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond 
MAX_ULONG/2 in 64-bit will need long long time)

I think the gap should be think all environment even now extending 480+. 
if now fall in the gap,  one timer will be pending and replenish will be 
in time.  Please run the attachment test program.

If use time_after_eq64(), expire ,next_credit and other member will must 
be u64.
>
> Jan
>
>> Signed-off-by: Jason Luan <jianhai.luan@oracle.com>
>> ---
>>   drivers/net/xen-netback/netback.c |    7 +++++--
>>   1 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/netback.c
>> b/drivers/net/xen-netback/netback.c
>> index f3e591c..31eedaf 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -1194,8 +1194,11 @@ static bool tx_credit_exceeded(struct xenvif *vif,
>> unsigned size)
>>   	if (timer_pending(&vif->credit_timeout))
>>   		return true;
>>   
>> -	/* Passed the point where we can replenish credit? */
>> -	if (time_after_eq(now, next_credit)) {
>> +	/* Credit should be replenished when now does not fall into the
>> +	 * range from expires to next_credit, and time_in_range_open()
>> +	 * is used to verify whether this case happens.
>> +	 */
>> +	if (!time_in_range_open(now, vif->credit_timeout.expires, next_credit)) {
>>   		vif->credit_timeout.expires = now;
>>   		tx_add_credit(vif);
>>   	}
>> -- 
>> 1.7.6.5
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>
>

^ permalink raw reply

* RE: [Xen-devel] [PATCH net-next v4 0/5] xen-netback: IPv6 offload support
From: Paul Durrant @ 2013-10-17  8:42 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org
In-Reply-To: <1381942838.30409.26.camel@kazak.uk.xensource.com>

> -----Original Message-----
> From: Ian Campbell
> Sent: 16 October 2013 18:01
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org
> Subject: Re: [Xen-devel] [PATCH net-next v4 0/5] xen-netback: IPv6 offload
> support
> 
> On Wed, 2013-10-16 at 17:53 +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Ian Campbell
> > > Sent: 16 October 2013 17:20
> > > To: Paul Durrant
> > > Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org
> > > Subject: Re: [Xen-devel] [PATCH net-next v4 0/5] xen-netback: IPv6
> offload
> > > support
> > >
> > > On Fri, 2013-10-11 at 16:06 +0100, Paul Durrant wrote:
> > > > This patch series adds support for checksum and large packet offloads
> into
> > > > xen-netback.
> > > > Testing has mainly been done using the Microsoft network hardware
> > > > certification suite running in Server 2008R2 VMs with Citrix PV
> frontends.
> > >
> > > Are there any Linux netfront patches in existence/the pipeline to take
> > > advantage of this?
> > >
> >
> > I was waiting for the backend patches to be accepted first ;-)
> 
> I think it would be useful to get et least an RFC so others can try it
> etc.
> 

Well, everyone can build and use the Windows frontend :-) (https://github.com/xenserver/win-xenvif/tree/upstream)

I'll try to hack up something in xen-netfront soon.

  Paul

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Ingo Molnar @ 2013-10-17  8:41 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131017003421.GA31470@hmsreliant.think-freely.org>


* Neil Horman <nhorman@tuxdriver.com> wrote:

> On Mon, Oct 14, 2013 at 03:18:47PM -0700, Eric Dumazet wrote:
> > On Mon, 2013-10-14 at 14:19 -0700, Eric Dumazet wrote:
> > > On Mon, 2013-10-14 at 16:28 -0400, Neil Horman wrote:
> > > 
> > > > So, early testing results today.  I wrote a test module that, allocated a 4k
> > > > buffer, initalized it with random data, and called csum_partial on it 100000
> > > > times, recording the time at the start and end of that loop.  Results on a 2.4
> > > > GHz Intel Xeon processor:
> > > > 
> > > > Without patch: Average execute time for csum_partial was 808 ns
> > > > With patch: Average execute time for csum_partial was 438 ns
> > > 
> > > Impressive, but could you try again with data out of cache ?
> > 
> > So I tried your patch on a GRE tunnel and got following results on a
> > single TCP flow. (short result : no visible difference)
> > 
> > 
> 
> So I went to reproduce these results, but was unable to (due to the fact that I
> only have a pretty jittery network to do testing accross at the moment with
> these devices).  So instead I figured that I would go back to just doing
> measurements with the module that I cobbled together (operating under the
> assumption that it would give me accurate, relatively jitter free results (I've
> attached the module code for reference below).  My results show slightly
> different behavior:
> 
> Base results runs:
> 89417240
> 85170397
> 85208407
> 89422794
> 91645494
> 103655144
> 86063791
> 75647774
> 83502921
> 85847372
> AVG = 875 ns
>
> Prefetch only runs:
> 70962849
> 77555099
> 81898170
> 68249290
> 72636538
> 83039294
> 78561494
> 83393369
> 85317556
> 79570951
> AVG = 781 ns
> 
> Parallel addition only runs:
> 42024233
> 44313064
> 48304416
> 64762297
> 42994259
> 41811628
> 55654282
> 64892958
> 55125582
> 42456403
> AVG = 510 ns
> 
> 
> Both prefetch and parallel addition:
> 41329930
> 40689195
> 61106622
> 46332422
> 49398117
> 52525171
> 49517101
> 61311153
> 43691814
> 49043084
> AVG = 494 ns
> 
> 
> For reference, each of the above large numbers is the number of 
> nanoseconds taken to compute the checksum of a 4kb buffer 100000 times.  
> To get my average results, I ran the test in a loop 10 times, averaged 
> them, and divided by 100000.
> 
> Based on these, prefetching is obviously a a good improvement, but not 
> as good as parallel execution, and the winner by far is doing both.

But in the actual usecase mentioned the packet data was likely cache-cold, 
it just arrived in the NIC and an IRQ got sent. Your testcase uses a 
super-hot 4K buffer that fits into the L1 cache. So it's apples to 
oranges.

To correctly simulate the workload you'd have to:

 - allocate a buffer larger than your L2 cache.

 - to measure the effects of the prefetches you'd also have to randomize
   the individual buffer positions. See how 'perf bench numa' implements a
   random walk via --data_rand_walk, in tools/perf/bench/numa.c.
   Otherwise the CPU might learn your simplistic stream direction and the
   L2 cache might hw-prefetch your data, interfering with any explicit 
   prefetches the code does. In many real-life usecases packet buffers are
   scattered.

Also, it would be nice to see standard deviation noise numbers when two 
averages are close to each other, to be able to tell whether differences 
are statistically significant or not.

For example 'perf stat --repeat' will output stddev for you:

  comet:~/tip> perf stat --repeat 20 --null bash -c 'usleep $((RANDOM*10))'

   Performance counter stats for 'bash -c usleep $((RANDOM*10))' (20 runs):

       0.189084480 seconds time elapsed                                          ( +- 11.95% )

The last '+-' percentage is the noise of the measurement.

Also note that you can inspect many cache behavior details of your 
algorithm via perf stat - the -ddd option will give you a laundry list:

  aldebaran:~> perf stat --repeat 20 -ddd perf bench sched messaging
  ...

     Total time: 0.095 [sec]

 Performance counter stats for 'perf bench sched messaging' (20 runs):

       1519.128721 task-clock (msec)         #   12.305 CPUs utilized            ( +-  0.34% )
            22,882 context-switches          #    0.015 M/sec                    ( +-  2.84% )
             3,927 cpu-migrations            #    0.003 M/sec                    ( +-  2.74% )
            16,616 page-faults               #    0.011 M/sec                    ( +-  0.17% )
     2,327,978,366 cycles                    #    1.532 GHz                      ( +-  1.61% ) [36.43%]
     1,715,561,189 stalled-cycles-frontend   #   73.69% frontend cycles idle     ( +-  1.76% ) [38.05%]
       715,715,454 stalled-cycles-backend    #   30.74% backend  cycles idle     ( +-  2.25% ) [39.85%]
     1,253,106,346 instructions              #    0.54  insns per cycle        
                                             #    1.37  stalled cycles per insn  ( +-  1.71% ) [49.68%]
       241,181,126 branches                  #  158.763 M/sec                    ( +-  1.43% ) [47.83%]
         4,232,053 branch-misses             #    1.75% of all branches          ( +-  1.23% ) [48.63%]
       431,907,354 L1-dcache-loads           #  284.313 M/sec                    ( +-  1.00% ) [48.37%]
        20,550,528 L1-dcache-load-misses     #    4.76% of all L1-dcache hits    ( +-  0.82% ) [47.61%]
         7,435,847 LLC-loads                 #    4.895 M/sec                    ( +-  0.94% ) [36.11%]
         2,419,201 LLC-load-misses           #   32.53% of all LL-cache hits     ( +-  2.93% ) [ 7.33%]
       448,638,547 L1-icache-loads           #  295.326 M/sec                    ( +-  2.43% ) [21.75%]
        22,066,490 L1-icache-load-misses     #    4.92% of all L1-icache hits    ( +-  2.54% ) [30.66%]
       475,557,948 dTLB-loads                #  313.047 M/sec                    ( +-  1.96% ) [37.96%]
         6,741,523 dTLB-load-misses          #    1.42% of all dTLB cache hits   ( +-  2.38% ) [37.05%]
     1,268,628,660 iTLB-loads                #  835.103 M/sec                    ( +-  1.75% ) [36.45%]
            74,192 iTLB-load-misses          #    0.01% of all iTLB cache hits   ( +-  2.88% ) [36.19%]
         4,466,526 L1-dcache-prefetches      #    2.940 M/sec                    ( +-  1.61% ) [36.17%]
         2,396,311 L1-dcache-prefetch-misses #    1.577 M/sec                    ( +-  1.55% ) [35.71%]

       0.123459566 seconds time elapsed                                          ( +-  0.58% )

There's also a number of prefetch counters that might be useful:

 aldebaran:~> perf list | grep prefetch
  L1-dcache-prefetches                               [Hardware cache event]
  L1-dcache-prefetch-misses                          [Hardware cache event]
  LLC-prefetches                                     [Hardware cache event]
  LLC-prefetch-misses                                [Hardware cache event]
  node-prefetches                                    [Hardware cache event]
  node-prefetch-misses                               [Hardware cache event]

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
From: Simon Horman @ 2013-10-17  8:30 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Julian Anastasov, lvs-devel, netdev, netfilter-devel,
	Wensong Zhang
In-Reply-To: <20131017081142.GA5324@localhost>

On Thu, Oct 17, 2013 at 10:11:42AM +0200, Pablo Neira Ayuso wrote:
> On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote:
> > On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> > > 
> > > 	Hello,
> > > 
> > > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> > > 
> > > > I can enqueue this fix to nf if you like. No need to resend, I can
> > > > manually apply.
> > > > 
> > > > Let me know.
> > > 
> > > 	It is not critical. I waited weeks the net tree to be
> > > copied into net-next because it collides with the recent
> > > "ipvs: make the service replacement more robust" change in
> > > net tree :) But if a rcu_barrier in the netns cleanup looks
> > > scary enough you can push it to nf. IMHO, it just adds
> > > unneeded delay there.
> > 
> > If it is not critical I would prefer for it to travel through
> > nf-next. Though I do not feel strongly about this.
> 
> Will enqueue for nf-next.
> 
> I'd appreciate if you can recover the tradition of attaching a short
> evaluation in the cover letter as I do when I send pull requests to
> David. Thanks!

Sure, will do.

^ permalink raw reply

* Re: [PATCH net-next] netfilter: xt_socket: use sock_gen_put()
From: Pablo Neira Ayuso @ 2013-10-17  8:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, netfilter-devel
In-Reply-To: <1381507405.4971.108.camel@edumazet-glaptop.roam.corp.google.com>

On Fri, Oct 11, 2013 at 09:03:25AM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> TCP listener refactoring, part 7 :
> 
> Use sock_gen_put() instead of xt_socket_put_sk() for future
> SYN_RECV support.

Applied, thanks Eric.

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: Jan Beulich @ 2013-10-17  8:26 UTC (permalink / raw)
  To: Jason Luan
  Cc: david.vrabel, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <1381944167-24918-1-git-send-email-jianhai.luan@oracle.com>

>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
> time_after_eq() only works if the delta is < MAX_ULONG/2.
> 
> If netfront sends at a very low rate, the time between subsequent calls
> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
> timer_after_eq() will be incorrect.  Credit will not be replenished and
> the guest may become unable to send (e.g., if prior to the long gap, all
> credit was exhausted).
> 
> We should add the scenario which now beyond next_credit+MAX_UNLONG/2. Because
> the fact now must be not before than expire, time_before(now, expire) == true
> will verify the scenario.
>     time_after_eq(now, next_credit) || time_before (now, expire)
>     ==
>     !time_in_range_open(now, expire, next_credit)

So first of all this must be with a 32-bit netback. And the not
coverable gap between activity is well over 240 days long. _If_
this really needs dealing with, then why is extending this from
240+ to 480+ days sufficient? I.e. why don't you simply
change to 64-bit jiffy values, and use time_after_eq64()?

Jan

> Signed-off-by: Jason Luan <jianhai.luan@oracle.com>
> ---
>  drivers/net/xen-netback/netback.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c 
> b/drivers/net/xen-netback/netback.c
> index f3e591c..31eedaf 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1194,8 +1194,11 @@ static bool tx_credit_exceeded(struct xenvif *vif, 
> unsigned size)
>  	if (timer_pending(&vif->credit_timeout))
>  		return true;
>  
> -	/* Passed the point where we can replenish credit? */
> -	if (time_after_eq(now, next_credit)) {
> +	/* Credit should be replenished when now does not fall into the
> +	 * range from expires to next_credit, and time_in_range_open()
> +	 * is used to verify whether this case happens.
> +	 */
> +	if (!time_in_range_open(now, vif->credit_timeout.expires, next_credit)) {
>  		vif->credit_timeout.expires = now;
>  		tx_add_credit(vif);
>  	}
> -- 
> 1.7.6.5
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org 
> http://lists.xen.org/xen-devel 

^ permalink raw reply

* View the attached file
From: Microsoft Promotion @ 2013-10-17  8:10 UTC (permalink / raw)

In-Reply-To: <1381940933.56384.YahooMailNeo@web5705.biz.mail.ne1.yahoo.com>

[-- Attachment #1: Type: text/plain, Size: 22 bytes --]

View the attached file

[-- Attachment #2: MICROSOFT AWARD PROMOTION.doc --]
[-- Type: application/msword, Size: 109568 bytes --]

^ permalink raw reply

* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
From: Pablo Neira Ayuso @ 2013-10-17  8:11 UTC (permalink / raw)
  To: Simon Horman
  Cc: Julian Anastasov, lvs-devel, netdev, netfilter-devel,
	Wensong Zhang
In-Reply-To: <20131017004939.GB21728@verge.net.au>

On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote:
> On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> > 
> > 	Hello,
> > 
> > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> > 
> > > I can enqueue this fix to nf if you like. No need to resend, I can
> > > manually apply.
> > > 
> > > Let me know.
> > 
> > 	It is not critical. I waited weeks the net tree to be
> > copied into net-next because it collides with the recent
> > "ipvs: make the service replacement more robust" change in
> > net tree :) But if a rcu_barrier in the netns cleanup looks
> > scary enough you can push it to nf. IMHO, it just adds
> > unneeded delay there.
> 
> If it is not critical I would prefer for it to travel through
> nf-next. Though I do not feel strongly about this.

Will enqueue for nf-next.

I'd appreciate if you can recover the tradition of attaching a short
evaluation in the cover letter as I do when I send pull requests to
David. Thanks!

^ permalink raw reply

* Re: [PATCH 5/5] net: rfkill: gpio: add ACPI support
From: Mika Westerberg @ 2013-10-17  7:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Heikki Krogerus, John W. Linville, Johannes Berg, Rhyland Klein,
	linux-acpi, linux-wireless, netdev
In-Reply-To: <2878506.7lf24R85t6@vostro.rjw.lan>

On Wed, Oct 16, 2013 at 10:55:01PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, October 16, 2013 01:53:43 PM Heikki Krogerus wrote:
> > Including ACPI ID for Broadcom GPS receiver BCM4752.
> > 
> > Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
> > ---
> >  net/rfkill/rfkill-gpio.c | 31 ++++++++++++++++++++++++++++++-
> >  1 file changed, 30 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/rfkill/rfkill-gpio.c b/net/rfkill/rfkill-gpio.c
> > index 2dd78c6..5620d3c 100644
> > --- a/net/rfkill/rfkill-gpio.c
> > +++ b/net/rfkill/rfkill-gpio.c
> > @@ -24,6 +24,8 @@
> >  #include <linux/platform_device.h>
> >  #include <linux/clk.h>
> >  #include <linux/slab.h>
> > +#include <linux/acpi.h>
> > +#include <linux/acpi_gpio.h>
> >  
> >  #include <linux/rfkill-gpio.h>
> >  
> > @@ -70,6 +72,23 @@ static const struct rfkill_ops rfkill_gpio_ops = {
> >  	.set_block = rfkill_gpio_set_power,
> >  };
> >  
> > +static int rfkill_gpio_acpi_probe(struct device *dev,
> > +				  struct rfkill_gpio_data *rfkill)
> > +{
> > +	const struct acpi_device_id *id;
> > +
> > +	id = acpi_match_device(dev->driver->acpi_match_table, dev);
> > +	if (!id)
> > +		return -ENODEV;
> > +
> > +	rfkill->name = dev_name(dev);
> > +	rfkill->type = (unsigned)id->driver_data;
> > +	rfkill->reset_gpio = acpi_get_gpio_by_index(dev, 0, NULL);
> > +	rfkill->shutdown_gpio = acpi_get_gpio_by_index(dev, 1, NULL);
> > +
> > +	return 0;
> > +}
> > +
> >  static int rfkill_gpio_probe(struct platform_device *pdev)
> >  {
> >  	struct rfkill_gpio_platform_data *pdata = pdev->dev.platform_data;
> > @@ -82,7 +101,11 @@ static int rfkill_gpio_probe(struct platform_device *pdev)
> >  	if (!rfkill)
> >  		return -ENOMEM;
> >  
> > -	if (pdata) {
> > +	if (ACPI_HANDLE(&pdev->dev)) {
> > +		ret = rfkill_gpio_acpi_probe(&pdev->dev, rfkill);
> > +		if (ret)
> > +			return ret;
> > +	} else if (pdata) {
> >  		clk_name = pdata->power_clk_name;
> >  		rfkill->name = pdata->name;
> >  		rfkill->type = pdata->type;
> > @@ -170,12 +193,18 @@ static int rfkill_gpio_remove(struct platform_device *pdev)
> >  	return 0;
> >  }
> >  
> > +static const struct acpi_device_id rfkill_acpi_match[] = {
> > +	{ "BCM4752", RFKILL_TYPE_GPS },
> > +	{ },
> > +};
> > +
> >  static struct platform_driver rfkill_gpio_driver = {
> >  	.probe = rfkill_gpio_probe,
> >  	.remove = rfkill_gpio_remove,
> >  	.driver = {
> >  		.name = "rfkill_gpio",
> >  		.owner = THIS_MODULE,
> > +		.acpi_match_table = ACPI_PTR(rfkill_acpi_match),
> >  	},
> >  };
> 
> Looks good to me.
> 
> Has Mika seen this?

Yes, saw it now and looks good to me as well.

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

for the whole series, for what it's worth.

^ permalink raw reply

* [PATCH] be2net: pass if_id for v1 and V2 versions of TX_CREATE cmd
From: Sathya Perla @ 2013-10-17  6:17 UTC (permalink / raw)
  To: netdev

From: Vasundhara Volam <vasundhara.volam@emulex.com>

It is a required field for all TX_CREATE cmd versions > 0.
This fixes a driver initialization failure, caused by recent SH-R Firmwares
(versions > 10.0.639.0) failing the TX_CREATE cmd when if_id field is
not passed.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---

Dave, pls consider queuing this patch for -stable tree as it fixes
a driver load failure. Thanks.

 drivers/net/ethernet/emulex/benet/be_cmds.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index bd0e0c0..c08fd32 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -1198,7 +1198,6 @@ int be_cmd_txq_create(struct be_adapter *adapter, struct be_tx_obj *txo)
 
 	if (lancer_chip(adapter)) {
 		req->hdr.version = 1;
-		req->if_id = cpu_to_le16(adapter->if_handle);
 	} else if (BEx_chip(adapter)) {
 		if (adapter->function_caps & BE_FUNCTION_CAPS_SUPER_NIC)
 			req->hdr.version = 2;
@@ -1206,6 +1205,8 @@ int be_cmd_txq_create(struct be_adapter *adapter, struct be_tx_obj *txo)
 		req->hdr.version = 2;
 	}
 
+	if (req->hdr.version > 0)
+		req->if_id = cpu_to_le16(adapter->if_handle);
 	req->num_pages = PAGES_4K_SPANNED(q_mem->va, q_mem->size);
 	req->ulp_num = BE_ULP1_NUM;
 	req->type = BE_ETH_TX_RING_TYPE_STANDARD;
-- 
1.7.1

^ permalink raw reply related

* RE: [PATCH 0/2] be2net: patch set
From: Sathya Perla @ 2013-10-17  6:00 UTC (permalink / raw)
  To: Sathya Perla, netdev@vger.kernel.org
In-Reply-To: <1381837869-9543-1-git-send-email-sathya.perla@emulex.com>


> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf
> Of Sathya Perla
> 
> Pls apply the following fixes to the net tree. Thanks.
> 
> Vasundhara Volam (2):
>   be2net: pass if_id for v1 and V2 versions of TX_CREATE cmd
>   be2net: drop non-tso frames longer than mtu
> 
Dave, I'll re-send this patch-set without the "drop non-tso frames" patch.
Based on Eric's comments, it seems better to place the mtu checks in pktgen and
before ndo_start_xmit() invocation so that all drivers/devices may benefit from the check.

thanks,
-Sathya

^ permalink raw reply

* RE: [PATCH net v2] be2net: Warn users of possible broken functionality on BE2 cards with very old FW versions with latest driver
From: Somnath Kotur @ 2013-10-17  5:41 UTC (permalink / raw)
  To: Ivan Vecera; +Cc: David Miller, netdev@vger.kernel.org
In-Reply-To: <525EA9E7.409@redhat.com>

> -----Original Message-----
> From: Ivan Vecera [mailto:ivecera@redhat.com]
> Sent: Wednesday, October 16, 2013 8:30 PM
> To: Somnath Kotur
> Cc: David Miller; netdev@vger.kernel.org
> Subject: Re: [PATCH net v2] be2net: Warn users of possible broken
> functionality on BE2 cards with very old FW versions with latest driver
> 
> On 10/07/2013 06:31 PM, David Miller wrote:
> > From: Somnath Kotur <somnath.kotur@emulex.com>
> > Date: Thu, 3 Oct 2013 15:34:29 +0530
> >
> >> +	if (BE2_chip(adapter) && memcmp(adapter->fw_ver, "4.", 2) < 0) {
> >> +		dev_err(dev, "Firmware version is too old.IRQs may not
> work\n");
> >
> > So many grammatical mistakes in one line.
> >
> > First sentence got a period, second one did not.
> >
> > Missing space between period and second sentence.
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org More majordomo
> info
> > at  http://vger.kernel.org/majordomo-info.html
> >
> Som, any plan to send v3?
> 
> Ivan
HI Ivan,
   Yes , the problem was I was trying to stick within the 80 chars line limit , the missing space above would have pushed it over
needlessly warranting an extra line to accommodate a character.
Will rework the sentence using Joe Perches's suggestions as well as address Ben Hutching's concerns.
Stay tuned.

Thanks
Som

^ permalink raw reply

* [PATCH net-next v3 9/9] net: switch net_secret key generation to net_get_random_once
From: Hannes Frederic Sowa @ 2013-10-17  5:32 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, Eric Dumazet, David S. Miller
In-Reply-To: <1381987923-1524-1-git-send-email-hannes@stressinduktion.org>

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/core/secure_seq.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 3f1ec15..b02fd16 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -7,6 +7,7 @@
 #include <linux/hrtimer.h>
 #include <linux/ktime.h>
 #include <linux/string.h>
+#include <linux/net.h>
 
 #include <net/secure_seq.h>
 
@@ -16,18 +17,7 @@ static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
 
 static void net_secret_init(void)
 {
-	u32 tmp;
-	int i;
-
-	if (likely(net_secret[0]))
-		return;
-
-	for (i = NET_SECRET_SIZE; i > 0;) {
-		do {
-			get_random_bytes(&tmp, sizeof(tmp));
-		} while (!tmp);
-		cmpxchg(&net_secret[--i], 0, tmp);
-	}
+	net_get_random_once(net_secret, sizeof(net_secret));
 }
 
 #ifdef CONFIG_INET
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next v3 8/9] tcp: switch tcp_fastopen key generation to net_get_random_once
From: Hannes Frederic Sowa @ 2013-10-17  5:32 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, Yuchung Cheng, Eric Dumazet, David S. Miller
In-Reply-To: <1381987923-1524-1-git-send-email-hannes@stressinduktion.org>

Changed key initialization of tcp_fastopen cookies to net_get_random_once.

If the user sets a custom key net_get_random_once must be called at
least once to ensure we don't overwrite the user provided key when the
first cookie is generated later on.

Cc: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/tcp.h          |  2 +-
 net/ipv4/sysctl_net_ipv4.c |  5 +++++
 net/ipv4/tcp_fastopen.c    | 27 ++++++++++++++++-----------
 3 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0fa8fdc..027b663 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1322,7 +1322,7 @@ extern struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
 int tcp_fastopen_reset_cipher(void *key, unsigned int len);
 void tcp_fastopen_cookie_gen(__be32 src, __be32 dst,
 			     struct tcp_fastopen_cookie *foc);
-
+void tcp_fastopen_init_key_once(bool publish);
 #define TCP_FASTOPEN_KEY_LENGTH 16
 
 /* Fastopen key context */
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index c08f096..4b161d5 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -274,6 +274,11 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write,
 			ret = -EINVAL;
 			goto bad_key;
 		}
+		/* Generate a dummy secret but don't publish it. This
+		 * is needed so we don't regenerate a new key on the
+		 * first invocation of tcp_fastopen_cookie_gen
+		 */
+		tcp_fastopen_init_key_once(false);
 		tcp_fastopen_reset_cipher(user_key, TCP_FASTOPEN_KEY_LENGTH);
 	}
 
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index ab7bd35..766032b 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -14,6 +14,20 @@ struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
 
 static DEFINE_SPINLOCK(tcp_fastopen_ctx_lock);
 
+void tcp_fastopen_init_key_once(bool publish)
+{
+	static u8 key[TCP_FASTOPEN_KEY_LENGTH];
+
+	/* tcp_fastopen_reset_cipher publishes the new context
+	 * atomically, so we allow this race happening here.
+	 *
+	 * All call sites of tcp_fastopen_cookie_gen also check
+	 * for a valid cookie, so this is an acceptable risk.
+	 */
+	if (net_get_random_once(key, sizeof(key)) && publish)
+		tcp_fastopen_reset_cipher(key, sizeof(key));
+}
+
 static void tcp_fastopen_ctx_free(struct rcu_head *head)
 {
 	struct tcp_fastopen_context *ctx =
@@ -70,6 +84,8 @@ void tcp_fastopen_cookie_gen(__be32 src, __be32 dst,
 	__be32 path[4] = { src, dst, 0, 0 };
 	struct tcp_fastopen_context *ctx;
 
+	tcp_fastopen_init_key_once(true);
+
 	rcu_read_lock();
 	ctx = rcu_dereference(tcp_fastopen_ctx);
 	if (ctx) {
@@ -78,14 +94,3 @@ void tcp_fastopen_cookie_gen(__be32 src, __be32 dst,
 	}
 	rcu_read_unlock();
 }
-
-static int __init tcp_fastopen_init(void)
-{
-	__u8 key[TCP_FASTOPEN_KEY_LENGTH];
-
-	get_random_bytes(key, sizeof(key));
-	tcp_fastopen_reset_cipher(key, sizeof(key));
-	return 0;
-}
-
-late_initcall(tcp_fastopen_init);
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next v3 7/9] inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once
From: Hannes Frederic Sowa @ 2013-10-17  5:32 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, Eric Dumazet, David S. Miller
In-Reply-To: <1381987923-1524-1-git-send-email-hannes@stressinduktion.org>

Initialize the ehash and ipv6_hash_secrets with net_get_random_once.

Each compilation unit gets its own secret now:
  ipv4/inet_hashtables.o
  ipv4/udp.o
  ipv6/inet6_hashtables.o
  ipv6/udp.o
  rds/connection.o

The functions still get inlined into the hashing functions. In the fast
path we have at most two (needed in ipv6) if (unlikely(...)).

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/inet_sock.h     |  4 ----
 net/ipv4/af_inet.c          | 27 ---------------------------
 net/ipv4/inet_hashtables.c  |  4 ++++
 net/ipv4/udp.c              |  6 +++++-
 net/ipv6/af_inet6.c         |  5 -----
 net/ipv6/inet6_hashtables.c | 15 ++++++++++++---
 net/ipv6/udp.c              | 17 ++++++++++++++---
 net/rds/connection.c        | 12 +++++++++---
 8 files changed, 44 insertions(+), 46 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 7a6c7f8..1833c3f 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -204,10 +204,6 @@ static inline void inet_sk_copy_descendant(struct sock *sk_to,
 
 int inet_sk_rebuild_header(struct sock *sk);
 
-extern u32 inet_ehash_secret;
-extern u32 ipv6_hash_secret;
-void build_ehash_secret(void);
-
 static inline unsigned int __inet_ehashfn(const __be32 laddr,
 					  const __u16 lport,
 					  const __be32 faddr,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 35913fb..b6bdd82 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -245,29 +245,6 @@ out:
 }
 EXPORT_SYMBOL(inet_listen);
 
-u32 inet_ehash_secret __read_mostly;
-EXPORT_SYMBOL(inet_ehash_secret);
-
-u32 ipv6_hash_secret __read_mostly;
-EXPORT_SYMBOL(ipv6_hash_secret);
-
-/*
- * inet_ehash_secret must be set exactly once, and to a non nul value
- * ipv6_hash_secret must be set exactly once.
- */
-void build_ehash_secret(void)
-{
-	u32 rnd;
-
-	do {
-		get_random_bytes(&rnd, sizeof(rnd));
-	} while (rnd == 0);
-
-	if (cmpxchg(&inet_ehash_secret, 0, rnd) == 0)
-		get_random_bytes(&ipv6_hash_secret, sizeof(ipv6_hash_secret));
-}
-EXPORT_SYMBOL(build_ehash_secret);
-
 /*
  *	Create an inet socket.
  */
@@ -284,10 +261,6 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
 	int try_loading_module = 0;
 	int err;
 
-	if (unlikely(!inet_ehash_secret))
-		if (sock->type != SOCK_RAW && sock->type != SOCK_DGRAM)
-			build_ehash_secret();
-
 	sock->state = SS_UNCONNECTED;
 
 	/* Look for the requested type/protocol pair. */
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 18aa668..8b9cf27 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -28,6 +28,10 @@ static unsigned int inet_ehashfn(struct net *net, const __be32 laddr,
 				 const __u16 lport, const __be32 faddr,
 				 const __be16 fport)
 {
+	static u32 inet_ehash_secret __read_mostly;
+
+	net_get_random_once(&inet_ehash_secret, sizeof(inet_ehash_secret));
+
 	return __inet_ehashfn(laddr, lport, faddr, fport,
 			      inet_ehash_secret + net_hash_mix(net));
 }
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index b4437c7..89909dd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -411,8 +411,12 @@ static unsigned int udp_ehashfn(struct net *net, const __be32 laddr,
 				 const __u16 lport, const __be32 faddr,
 				 const __be16 fport)
 {
+	static u32 udp_ehash_secret __read_mostly;
+
+	net_get_random_once(&udp_ehash_secret, sizeof(udp_ehash_secret));
+
 	return __inet_ehashfn(laddr, lport, faddr, fport,
-			      inet_ehash_secret + net_hash_mix(net));
+			      udp_ehash_secret + net_hash_mix(net));
 }
 
 
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index a2cb07c..20af1fb 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -110,11 +110,6 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
 	int try_loading_module = 0;
 	int err;
 
-	if (sock->type != SOCK_RAW &&
-	    sock->type != SOCK_DGRAM &&
-	    !inet_ehash_secret)
-		build_ehash_secret();
-
 	/* Look for the requested type/protocol pair. */
 lookup_protocol:
 	err = -ESOCKTNOSUPPORT;
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index fa7dd38..262e13c 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -29,10 +29,19 @@ static unsigned int inet6_ehashfn(struct net *net,
 				  const struct in6_addr *faddr,
 				  const __be16 fport)
 {
-	const u32 lhash = (__force u32)laddr->s6_addr32[3];
-	const u32 fhash = __ipv6_addr_jhash(faddr, ipv6_hash_secret);
+	static u32 inet6_ehash_secret __read_mostly;
+	static u32 ipv6_hash_secret __read_mostly;
+
+	u32 lhash, fhash;
+
+	net_get_random_once(&inet6_ehash_secret, sizeof(inet6_ehash_secret));
+	net_get_random_once(&ipv6_hash_secret, sizeof(ipv6_hash_secret));
+
+	lhash = (__force u32)laddr->s6_addr32[3];
+	fhash = __ipv6_addr_jhash(faddr, ipv6_hash_secret);
+
 	return __inet6_ehashfn(lhash, lport, fhash, fport,
-			       inet_ehash_secret + net_hash_mix(net));
+			       inet6_ehash_secret + net_hash_mix(net));
 }
 
 static int inet6_sk_ehashfn(const struct sock *sk)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 324bd36..44fc4e3 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -59,10 +59,21 @@ static unsigned int udp6_ehashfn(struct net *net,
 				  const struct in6_addr *faddr,
 				  const __be16 fport)
 {
-	const u32 lhash = (__force u32)laddr->s6_addr32[3];
-	const u32 fhash = __ipv6_addr_jhash(faddr, ipv6_hash_secret);
+	static u32 udp6_ehash_secret __read_mostly;
+	static u32 udp_ipv6_hash_secret __read_mostly;
+
+	u32 lhash, fhash;
+
+	net_get_random_once(&udp6_ehash_secret,
+			    sizeof(udp6_ehash_secret));
+	net_get_random_once(&udp_ipv6_hash_secret,
+			    sizeof(udp_ipv6_hash_secret));
+
+	lhash = (__force u32)laddr->s6_addr32[3];
+	fhash = __ipv6_addr_jhash(faddr, udp_ipv6_hash_secret);
+
 	return __inet6_ehashfn(lhash, lport, fhash, fport,
-			       inet_ehash_secret + net_hash_mix(net));
+			       udp_ipv6_hash_secret + net_hash_mix(net));
 }
 
 int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2)
diff --git a/net/rds/connection.c b/net/rds/connection.c
index 45e2366..378c3a6 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -51,10 +51,16 @@ static struct kmem_cache *rds_conn_slab;
 
 static struct hlist_head *rds_conn_bucket(__be32 laddr, __be32 faddr)
 {
+	static u32 rds_hash_secret __read_mostly;
+
+	unsigned long hash;
+
+	net_get_random_once(&rds_hash_secret, sizeof(rds_hash_secret));
+
 	/* Pass NULL, don't need struct net for hash */
-	unsigned long hash = __inet_ehashfn(be32_to_cpu(laddr), 0,
-					    be32_to_cpu(faddr), 0,
-					    inet_ehash_secret);
+	hash = __inet_ehashfn(be32_to_cpu(laddr), 0,
+			      be32_to_cpu(faddr), 0,
+			      rds_hash_secret);
 	return &rds_conn_hash[hash & RDS_CONNECTION_HASH_MASK];
 }
 
-- 
1.8.3.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox