Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
From: Joe Jin @ 2012-12-19  3:04 UTC (permalink / raw)
  To: Fujinaka, Todd
  Cc: Ethan Zhao, Ben Hutchings, Mary Mcgrath, netdev@vger.kernel.org,
	e1000-devel@lists.sf.net, linux-kernel@vger.kernel.org, linux-pci
In-Reply-To: <9B4A1B1917080E46B64F07F2989DADD638C60A9B@ORSMSX102.amr.corp.intel.com>

Hi all,

I backported mps commits and ask customer pass "pci=pcie_bus_peer2pee" to kernel
to limited MPS to 128 and issue disappeared, sound like this is a BIOS bug.

Thanks all of your help.

Best Regards,
Joe

On 11/29/12 23:52, Fujinaka, Todd wrote:
> Someone else pointed this out to me locally. If you have a non-client BIOS, you should be able to set the MaxPayloadSize using setpci. You have to make sure that you're being consistent throughout all the associated links.
> 
> Todd Fujinaka
> Technical Marketing Engineer
> LAN Access Division (LAD)
> Intel Corporation
> todd.fujinaka@intel.com
> (503) 712-4565
> 
> 
> -----Original Message-----
> From: Ethan Zhao [mailto:ethan.kernel@gmail.com] 
> Sent: Wednesday, November 28, 2012 7:10 PM
> To: Fujinaka, Todd
> Cc: Joe Jin; Ben Hutchings; Mary Mcgrath; netdev@vger.kernel.org; e1000-devel@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
> 
> Joe,
>     Possibly your customer is running a kernel without source code on a platform whose vendor wouldn't like to fix BIOS issue( Is that a HP/Dell server ?).
>     Anyway, to see if is a payload issue or,  you could change the payload size with setpci tool to those devices and set the link retrain bit to trigger the link retraining to debug the issue and identity the root cause.  I thinks it is much easier than modify the BIOS or  eeprom of NIC.
> 
>     e.g.
>    set device control register to 0f 00   (128 bytes payload size)
>    #   setpci -v -s 00:02.0 98.w=000f
>    set device link control register to 60h (retrain the link)
>    #  setpci -v -s 00:02.0 a0.b=60
> 
>   Hope it works,  Just my 2 cents.
> 
> Ethan.zhao@oracle.com
> 
> On Wed, Nov 28, 2012 at 11:53 PM, Fujinaka, Todd <todd.fujinaka@intel.com> wrote:
>> The only EEPROM I know about or can speak to is the one attached to the 82571 and it doesn't set the MaxPayloadSize. That's done by the BIOS.
>>
>> Todd Fujinaka
>> Technical Marketing Engineer
>> LAN Access Division (LAD)
>> Intel Corporation
>> todd.fujinaka@intel.com
>> (503) 712-4565
>>
>>
>> -----Original Message-----
>> From: Joe Jin [mailto:joe.jin@oracle.com]
>> Sent: Wednesday, November 28, 2012 12:31 AM
>> To: Ben Hutchings
>> Cc: Fujinaka, Todd; Mary Mcgrath; netdev@vger.kernel.org; 
>> e1000-devel@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
>> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
>>
>> On 11/28/12 02:10, Ben Hutchings wrote:
>>> On Tue, 2012-11-27 at 17:32 +0000, Fujinaka, Todd wrote:
>>>> Forgive me if I'm being too repetitious as I think some of this has 
>>>> been mentioned in the past.
>>>>
>>>> We (and by we I mean the Ethernet part and driver) can only change 
>>>> the advertised availability of a larger MaxPayloadSize. The size is 
>>>> negotiated by both sides of the link when the link is established.
>>>> The driver should not change the size of the link as it would be 
>>>> poking at registers outside of its scope and is controlled by the 
>>>> upstream bridge (not us).
>>> [...]
>>>
>>> MaxPayloadSize (MPS) is not negotiated between devices but is 
>>> programmed by the system firmware (at least for devices present at 
>>> boot - the kernel may be responsible in case of hotplug).  You can 
>>> use the kernel parameter 'pci=pcie_bus_perf' (or one of several 
>>> others) to set a policy that overrides this, but no policy will allow 
>>> setting MPS above the device's MaxPayloadSizeSupported (MPSS).
>>>
>>
>> Ben,
>>
>> Unfortunately I'm using 3.0.x kernel and this is not included in the kernel.
>> So I'm trying to use ethtool modify it from eeprom to see if help or no.
>>
>>
>> Todd, I'll review all MaxPayload for all devices, but need to say if it mismatch, customer could not modify it from BIOS for there was not entry at there, to test it, we have to find how to verify if this is the root cause, so still need to find the offset in eeprom.
>>
>> Thanks in advance,
>> Joe
>>


-- 
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing 

^ permalink raw reply

* Re: [GIT PULL net-next 04/17] ndisc: Introduce ndisc_fill_redirect_hdr_option().
From: YOSHIFUJI Hideaki @ 2012-12-19  3:08 UTC (permalink / raw)
  To: 'netdev@vger.kernel.org', David Miller; +Cc: YOSHIFUJI Hideaki
In-Reply-To: <50D04B4B.7060002@linux-ipv6.org>

YOSHIFUJI Hideaki wrote:
> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> ---
>  net/ipv6/ndisc.c |   21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index a181113..0a4f3a9 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -1332,6 +1332,19 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
>  	icmpv6_notify(skb, NDISC_REDIRECT, 0, 0);
>  }
>  
> +static u8 *ndisc_fill_redirect_hdr_option(u8 *opt, struct sk_buff *orig_skb,
> +					  int rd_len)
> +{
> +	memset(opt, 0, 8);
> +	*(opt++) = ND_OPT_REDIRECT_HDR;
> +	*(opt++) = (rd_len >> 3);
> +	opt += 6;
> +
> +	memcpy(opt, ipv6_hdr(orig_skb), rd_len - 8);
> +
> +	return opt;
> +}
> +
>  void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
>  {
>  	struct net_device *dev = skb->dev;

By the way, is it really safe to use memcpy here?
Should we use skb_copy_bits() instead?

--yoshfuji

^ permalink raw reply

* Re: [GIT PULL net-next 01/17] ndisc: Fix size calculation for headers.
From: YOSHIFUJI Hideaki @ 2012-12-19  3:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, YOSHIFUJI Hideaki
In-Reply-To: <20121218.162338.1594192050394527214.davem@davemloft.net>

David Miller wrote:
> From: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> Date: Tue, 18 Dec 2012 19:52:04 +0900
> 
>> We used to allocate MAX_HEADER bytes more than needed but
>> reserved hlen only (not MAX_HEADER + hlen) and the MAX_HEADER
>> was left behind.
>>
>> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> 
> This change is wrong.
> 
> The MAX_HEADER is being used in order to accomodate any
> possible encapsulation that may occur.

As I tried to explain in the commit message, ndisc_build_skb() does tI would say his:

|        int hlen = LL_RESERVED_SPACE(dev);
:
|        skb = sock_alloc_send_skb(sk,
|                                  (MAX_HEADER + sizeof(struct ipv6hdr) +
|                                   len + hlen + tlen),
|                                  1, &err);
|        if (!skb) {
|                ND_PRINTK(0, err, "ND: %s failed to allocate an skb, err=%d\n",
|                          __func__, err);
|                return NULL;
|        }
|
|        skb_reserve(skb, hlen);

This means, MAX_HEADER has been placed at the TAIL of
the buffer, instead of the HEAD of the buffer, like this.

head        data                   tail                        end
+--------------------------------------------------------------+
+           |                      |          |                |
+--------------------------------------------------------------+
|<- hlen  ->|<---ipv6 packet------>|<--tlen-->|<--MAX_HEADER-->|
|  = LL_
     RESERVED_
     SPACE(dev)

MAX_HEADER will not be used at all and I would say it is waste of
memory.

Or do you expect something like this?

+--------------------------------------------------------------+
+                |           |                      |          |
+--------------------------------------------------------------+
|<--MAX_HEADER-->|<---hlen-->|<---ipv6 packet------>|<--tlen-->|
head                       data                   tail       end

or like this:

+--------------------------------------------------+
+                |                      |          |
+--------------------------------------------------+
|<--MAX_HEADER-->|<---ipv6 packet------>|<--tlen-->|
head           data                   tail       end

If so, we should (re)visit almost all users of
sock_alloc_send_skb() and friends, anyway, I think.

--yoshfuji

^ permalink raw reply

* Re: [RFC PATCH v3 2/2] tun: fix LSM/SELinux labeling of tun/tap devices
From: Jason Wang @ 2012-12-19  5:46 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paul Moore, netdev, linux-security-module, selinux
In-Reply-To: <20121218230808.GC1135@redhat.com>

On 12/19/2012 07:08 AM, Michael S. Tsirkin wrote:
> On Tue, Dec 18, 2012 at 05:53:52PM -0500, Paul Moore wrote:
>> This patch corrects some problems with LSM/SELinux that were introduced
>> with the multiqueue patchset.  The problem stems from the fact that the
>> multiqueue work changed the relationship between the tun device and its
>> associated socket; before the socket persisted for the life of the
>> device, however after the multiqueue changes the socket only persisted
>> for the life of the userspace connection (fd open).  For non-persistent
>> devices this is not an issue, but for persistent devices this can cause
>> the tun device to lose its SELinux label.
>>
>> We correct this problem by adding an opaque LSM security blob to the
>> tun device struct which allows us to have the LSM security state, e.g.
>> SELinux labeling information, persist for the lifetime of the tun
>> device.  In the process we tweak the LSM hooks to work with this new
>> approach to TUN device/socket labeling and introduce a new LSM hook,
>> security_tun_dev_attach_queue(), to approve requests to attach to a
>> TUN queue via TUNSETQUEUE.
>>
>> The SELinux code has been adjusted to match the new LSM hooks, the
>> other LSMs do not make use of the LSM TUN controls.  This patch makes
>> use of the recently added "tun_socket:attach_queue" permission to
>> restrict access to the TUNSETQUEUE operation.  On older SELinux
>> policies which do not define the "tun_socket:attach_queue" permission
>> the access control decision for TUNSETQUEUE will be handled according
>> to the SELinux policy's unknown permission setting.
>>
>> Signed-off-by: Paul Moore <pmoore@redhat.com>
>
> Looks good to me. A comment not directly related to this patch, below.

Good to me too, will do some test on this.
>
>> ---
>>  drivers/net/tun.c                 |   27 +++++++++++++----
>>  include/linux/security.h          |   59 +++++++++++++++++++++++++++++--------
>>  security/capability.c             |   24 +++++++++++++--
>>  security/security.c               |   28 ++++++++++++++----
>>  security/selinux/hooks.c          |   50 ++++++++++++++++++++++++-------
>>  security/selinux/include/objsec.h |    4 +++
>>  6 files changed, 154 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 504f7f1..4b7754c 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -186,6 +186,7 @@ struct tun_struct {
>>  	unsigned long ageing_time;
>>  	unsigned int numdisabled;
>>  	struct list_head disabled;
>> +	void *security;
>>  };
>>  
[...]
>>  
>> @@ -1805,12 +1813,18 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
>>  
>>  	if (ifr->ifr_flags & IFF_ATTACH_QUEUE) {
>>  		tun = tfile->detached;
>> -		if (!tun)
>> +		if (!tun) {
>>  			ret = -EINVAL;
>> -		else if (tun_not_capable(tun))
>> +			goto unlock;
>> +		}
>> +		if (tun_not_capable(tun)) {
> By the way I don't think we need to limit set_queue
> by tun_not_capable. It simply makes a queue active/inactive
> which seems harmless. Jason?

Maybe we should keep this, when a tun has a owner it would not expected
any other user except the one has CAP_NET_ADMIN to active or de-active a
queue.
>
>>  			ret = -EPERM;
>> -		else
[...]

^ permalink raw reply

* Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
From: Yijing Wang @ 2012-12-19  5:52 UTC (permalink / raw)
  To: Joe Jin
  Cc: Fujinaka, Todd, Ethan Zhao, Ben Hutchings, Mary Mcgrath,
	netdev@vger.kernel.org, e1000-devel@lists.sf.net,
	linux-kernel@vger.kernel.org, linux-pci, Jon Mason, Bjorn Helgaas
In-Reply-To: <50D12EA2.3030907@oracle.com>

On 2012/12/19 11:04, Joe Jin wrote:
> Hi all,
> 
> I backported mps commits and ask customer pass "pci=pcie_bus_peer2pee" to kernel
> to limited MPS to 128 and issue disappeared, sound like this is a BIOS bug.
> 

Hi Joe,
   I found similar problem when I do pci hotplug, discussion is here:http://marc.info/?l=linux-pci&m=134810569924220&w=2.
We try to improve Linux kernel to debug this problem easily based Bjorn's suggestion. Jon sent out the first version patch http://marc.info/?l=linux-pci&m=135002016005274&w=2.
I think we can do further here, http://marc.info/?l=linux-pci&m=135115581307869&w=2. I hope this information can help you.

Thanks!
Yijing.

> Thanks all of your help.
> 
> Best Regards,
> Joe
> 
> On 11/29/12 23:52, Fujinaka, Todd wrote:
>> Someone else pointed this out to me locally. If you have a non-client BIOS, you should be able to set the MaxPayloadSize using setpci. You have to make sure that you're being consistent throughout all the associated links.
>>
>> Todd Fujinaka
>> Technical Marketing Engineer
>> LAN Access Division (LAD)
>> Intel Corporation
>> todd.fujinaka@intel.com
>> (503) 712-4565
>>
>>
>> -----Original Message-----
>> From: Ethan Zhao [mailto:ethan.kernel@gmail.com] 
>> Sent: Wednesday, November 28, 2012 7:10 PM
>> To: Fujinaka, Todd
>> Cc: Joe Jin; Ben Hutchings; Mary Mcgrath; netdev@vger.kernel.org; e1000-devel@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
>> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
>>
>> Joe,
>>     Possibly your customer is running a kernel without source code on a platform whose vendor wouldn't like to fix BIOS issue( Is that a HP/Dell server ?).
>>     Anyway, to see if is a payload issue or,  you could change the payload size with setpci tool to those devices and set the link retrain bit to trigger the link retraining to debug the issue and identity the root cause.  I thinks it is much easier than modify the BIOS or  eeprom of NIC.
>>
>>     e.g.
>>    set device control register to 0f 00   (128 bytes payload size)
>>    #   setpci -v -s 00:02.0 98.w=000f
>>    set device link control register to 60h (retrain the link)
>>    #  setpci -v -s 00:02.0 a0.b=60
>>
>>   Hope it works,  Just my 2 cents.
>>
>> Ethan.zhao@oracle.com
>>
>> On Wed, Nov 28, 2012 at 11:53 PM, Fujinaka, Todd <todd.fujinaka@intel.com> wrote:
>>> The only EEPROM I know about or can speak to is the one attached to the 82571 and it doesn't set the MaxPayloadSize. That's done by the BIOS.
>>>
>>> Todd Fujinaka
>>> Technical Marketing Engineer
>>> LAN Access Division (LAD)
>>> Intel Corporation
>>> todd.fujinaka@intel.com
>>> (503) 712-4565
>>>
>>>
>>> -----Original Message-----
>>> From: Joe Jin [mailto:joe.jin@oracle.com]
>>> Sent: Wednesday, November 28, 2012 12:31 AM
>>> To: Ben Hutchings
>>> Cc: Fujinaka, Todd; Mary Mcgrath; netdev@vger.kernel.org; 
>>> e1000-devel@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
>>> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
>>>
>>> On 11/28/12 02:10, Ben Hutchings wrote:
>>>> On Tue, 2012-11-27 at 17:32 +0000, Fujinaka, Todd wrote:
>>>>> Forgive me if I'm being too repetitious as I think some of this has 
>>>>> been mentioned in the past.
>>>>>
>>>>> We (and by we I mean the Ethernet part and driver) can only change 
>>>>> the advertised availability of a larger MaxPayloadSize. The size is 
>>>>> negotiated by both sides of the link when the link is established.
>>>>> The driver should not change the size of the link as it would be 
>>>>> poking at registers outside of its scope and is controlled by the 
>>>>> upstream bridge (not us).
>>>> [...]
>>>>
>>>> MaxPayloadSize (MPS) is not negotiated between devices but is 
>>>> programmed by the system firmware (at least for devices present at 
>>>> boot - the kernel may be responsible in case of hotplug).  You can 
>>>> use the kernel parameter 'pci=pcie_bus_perf' (or one of several 
>>>> others) to set a policy that overrides this, but no policy will allow 
>>>> setting MPS above the device's MaxPayloadSizeSupported (MPSS).
>>>>
>>>
>>> Ben,
>>>
>>> Unfortunately I'm using 3.0.x kernel and this is not included in the kernel.
>>> So I'm trying to use ethtool modify it from eeprom to see if help or no.
>>>
>>>
>>> Todd, I'll review all MaxPayload for all devices, but need to say if it mismatch, customer could not modify it from BIOS for there was not entry at there, to test it, we have to find how to verify if this is the root cause, so still need to find the offset in eeprom.
>>>
>>> Thanks in advance,
>>> Joe
>>>
> 
> 


-- 
Thanks!
Yijing

^ permalink raw reply

* [PATCH net-next v2] xfrm: removes a superfluous check and add a statistic
From: roy.qing.li @ 2012-12-19  5:26 UTC (permalink / raw)
  To: netdev

From: Li RongQing <roy.qing.li@gmail.com>

Remove the check if x->km.state equal to XFRM_STATE_VALID in
xfrm_state_check_expire(), which will be done before call
xfrm_state_check_expire().

add a LINUX_MIB_XFRMOUTSTATEINVALID statistic to record the
outbound error due to invalid xfrm state.

v2: move newly adding statistic LINUX_MIB_XFRMOUTSTATEINVALID
to the end of the enumeration

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
 include/uapi/linux/snmp.h |    1 +
 net/xfrm/xfrm_output.c    |    6 ++++++
 net/xfrm/xfrm_proc.c      |    1 +
 net/xfrm/xfrm_state.c     |    3 ---
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index fdfba23..b49eab8 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -278,6 +278,7 @@ enum
 	LINUX_MIB_XFRMOUTPOLDEAD,		/* XfrmOutPolDead */
 	LINUX_MIB_XFRMOUTPOLERROR,		/* XfrmOutPolError */
 	LINUX_MIB_XFRMFWDHDRERROR,		/* XfrmFwdHdrError*/
+	LINUX_MIB_XFRMOUTSTATEINVALID,		/* XfrmOutStateInvalid */
 	__LINUX_MIB_XFRMMAX
 };
 
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 95a338c..3670526 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -61,6 +61,12 @@ static int xfrm_output_one(struct sk_buff *skb, int err)
 		}
 
 		spin_lock_bh(&x->lock);
+
+		if (unlikely(x->km.state != XFRM_STATE_VALID)) {
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTSTATEINVALID);
+			goto error_nolock;
+		}
+
 		err = xfrm_state_check_expire(x);
 		if (err) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTSTATEEXPIRED);
diff --git a/net/xfrm/xfrm_proc.c b/net/xfrm/xfrm_proc.c
index d0a1af8..e4cd441 100644
--- a/net/xfrm/xfrm_proc.c
+++ b/net/xfrm/xfrm_proc.c
@@ -39,6 +39,7 @@ static const struct snmp_mib xfrm_mib_list[] = {
 	SNMP_MIB_ITEM("XfrmOutStateModeError", LINUX_MIB_XFRMOUTSTATEMODEERROR),
 	SNMP_MIB_ITEM("XfrmOutStateSeqError", LINUX_MIB_XFRMOUTSTATESEQERROR),
 	SNMP_MIB_ITEM("XfrmOutStateExpired", LINUX_MIB_XFRMOUTSTATEEXPIRED),
+	SNMP_MIB_ITEM("XfrmOutStateInvalid", LINUX_MIB_XFRMOUTSTATEINVALID),
 	SNMP_MIB_ITEM("XfrmOutPolBlock", LINUX_MIB_XFRMOUTPOLBLOCK),
 	SNMP_MIB_ITEM("XfrmOutPolDead", LINUX_MIB_XFRMOUTPOLDEAD),
 	SNMP_MIB_ITEM("XfrmOutPolError", LINUX_MIB_XFRMOUTPOLERROR),
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 3459692..05db236 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1370,9 +1370,6 @@ int xfrm_state_check_expire(struct xfrm_state *x)
 	if (!x->curlft.use_time)
 		x->curlft.use_time = get_seconds();
 
-	if (x->km.state != XFRM_STATE_VALID)
-		return -EINVAL;
-
 	if (x->curlft.bytes >= x->lft.hard_byte_limit ||
 	    x->curlft.packets >= x->lft.hard_packet_limit) {
 		x->km.state = XFRM_STATE_EXPIRED;
-- 
1.7.10.4

^ permalink raw reply related

* Re: 82571EB: Detected Hardware Unit Hang
From: Joe Jin @ 2012-12-19  6:13 UTC (permalink / raw)
  To: Yijing Wang
  Cc: netdev@vger.kernel.org, Jon Mason, linux-kernel@vger.kernel.org,
	Bjorn Helgaas, e1000-devel@lists.sf.net, Mary Mcgrath, linux-pci,
	Ben Hutchings, Ethan Zhao
In-Reply-To: <50D15600.6060001@huawei.com>

Hi Yijing,

Thanks for your reference, the patch looks good for me, but I have no chance
to test it on customer's env.

Best Regards,
Joe

On 12/19/12 13:52, Yijing Wang wrote:
> On 2012/12/19 11:04, Joe Jin wrote:
>> Hi all,
>>
>> I backported mps commits and ask customer pass "pci=pcie_bus_peer2pee" to kernel
>> to limited MPS to 128 and issue disappeared, sound like this is a BIOS bug.
>>
> 
> Hi Joe,
>    I found similar problem when I do pci hotplug, discussion is here:http://marc.info/?l=linux-pci&m=134810569924220&w=2.
> We try to improve Linux kernel to debug this problem easily based Bjorn's suggestion. Jon sent out the first version patch http://marc.info/?l=linux-pci&m=135002016005274&w=2.
> I think we can do further here, http://marc.info/?l=linux-pci&m=135115581307869&w=2. I hope this information can help you.
> 
> Thanks!
> Yijing.
> 
>> Thanks all of your help.
>>
>> Best Regards,
>> Joe
>>
>> On 11/29/12 23:52, Fujinaka, Todd wrote:
>>> Someone else pointed this out to me locally. If you have a non-client BIOS, you should be able to set the MaxPayloadSize using setpci. You have to make sure that you're being consistent throughout all the associated links.
>>>
>>> Todd Fujinaka
>>> Technical Marketing Engineer
>>> LAN Access Division (LAD)
>>> Intel Corporation
>>> todd.fujinaka@intel.com
>>> (503) 712-4565
>>>
>>>
>>> -----Original Message-----
>>> From: Ethan Zhao [mailto:ethan.kernel@gmail.com] 
>>> Sent: Wednesday, November 28, 2012 7:10 PM
>>> To: Fujinaka, Todd
>>> Cc: Joe Jin; Ben Hutchings; Mary Mcgrath; netdev@vger.kernel.org; e1000-devel@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
>>> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
>>>
>>> Joe,
>>>     Possibly your customer is running a kernel without source code on a platform whose vendor wouldn't like to fix BIOS issue( Is that a HP/Dell server ?).
>>>     Anyway, to see if is a payload issue or,  you could change the payload size with setpci tool to those devices and set the link retrain bit to trigger the link retraining to debug the issue and identity the root cause.  I thinks it is much easier than modify the BIOS or  eeprom of NIC.
>>>
>>>     e.g.
>>>    set device control register to 0f 00   (128 bytes payload size)
>>>    #   setpci -v -s 00:02.0 98.w=000f
>>>    set device link control register to 60h (retrain the link)
>>>    #  setpci -v -s 00:02.0 a0.b=60
>>>
>>>   Hope it works,  Just my 2 cents.
>>>
>>> Ethan.zhao@oracle.com
>>>
>>> On Wed, Nov 28, 2012 at 11:53 PM, Fujinaka, Todd <todd.fujinaka@intel.com> wrote:
>>>> The only EEPROM I know about or can speak to is the one attached to the 82571 and it doesn't set the MaxPayloadSize. That's done by the BIOS.
>>>>
>>>> Todd Fujinaka
>>>> Technical Marketing Engineer
>>>> LAN Access Division (LAD)
>>>> Intel Corporation
>>>> todd.fujinaka@intel.com
>>>> (503) 712-4565
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Joe Jin [mailto:joe.jin@oracle.com]
>>>> Sent: Wednesday, November 28, 2012 12:31 AM
>>>> To: Ben Hutchings
>>>> Cc: Fujinaka, Todd; Mary Mcgrath; netdev@vger.kernel.org; 
>>>> e1000-devel@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
>>>> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
>>>>
>>>> On 11/28/12 02:10, Ben Hutchings wrote:
>>>>> On Tue, 2012-11-27 at 17:32 +0000, Fujinaka, Todd wrote:
>>>>>> Forgive me if I'm being too repetitious as I think some of this has 
>>>>>> been mentioned in the past.
>>>>>>
>>>>>> We (and by we I mean the Ethernet part and driver) can only change 
>>>>>> the advertised availability of a larger MaxPayloadSize. The size is 
>>>>>> negotiated by both sides of the link when the link is established.
>>>>>> The driver should not change the size of the link as it would be 
>>>>>> poking at registers outside of its scope and is controlled by the 
>>>>>> upstream bridge (not us).
>>>>> [...]
>>>>>
>>>>> MaxPayloadSize (MPS) is not negotiated between devices but is 
>>>>> programmed by the system firmware (at least for devices present at 
>>>>> boot - the kernel may be responsible in case of hotplug).  You can 
>>>>> use the kernel parameter 'pci=pcie_bus_perf' (or one of several 
>>>>> others) to set a policy that overrides this, but no policy will allow 
>>>>> setting MPS above the device's MaxPayloadSizeSupported (MPSS).
>>>>>
>>>>
>>>> Ben,
>>>>
>>>> Unfortunately I'm using 3.0.x kernel and this is not included in the kernel.
>>>> So I'm trying to use ethtool modify it from eeprom to see if help or no.
>>>>
>>>>
>>>> Todd, I'll review all MaxPayload for all devices, but need to say if it mismatch, customer could not modify it from BIOS for there was not entry at there, to test it, we have to find how to verify if this is the root cause, so still need to find the offset in eeprom.
>>>>
>>>> Thanks in advance,
>>>> Joe
>>>>
>>
>>
> 
> 


-- 
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing 

------------------------------------------------------------------------------
LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
Remotely access PCs and mobile devices and provide instant support
Improve your efficiency, and focus on delivering more value-add services
Discover what IT Professionals Know. Rescue delivers
http://p.sf.net/sfu/logmein_12329d2d
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply

* [RFC PATCH] fix IP_ECN_set_ce
From: roy.qing.li @ 2012-12-19  6:21 UTC (permalink / raw)
  To: netdev

From: Li RongQing <roy.qing.li@gmail.com>

1. ECN uses the two least significant (right-most) bits of the DiffServ
field in the IPv4, so it should be in iph->tos, not in (iph->tos+1)

2. When setting CE, we should check if ECN Capable Transport supports,
both 10 and 01 mean ECN Capable Transport, so only check 10 is not enough
    00: Non ECN-Capable Transport — Non-ECT
    10: ECN Capable Transport — ECT(0)
    01: ECN Capable Transport — ECT(1)
    11: Congestion Encountered — CE

3. Remove the misunderstand comment

4. fix the checksum computation

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
 include/net/inet_ecn.h |   22 ++++------------------
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
index aab7375..545a683 100644
--- a/include/net/inet_ecn.h
+++ b/include/net/inet_ecn.h
@@ -73,27 +73,13 @@ static inline void INET_ECN_dontxmit(struct sock *sk)
 
 static inline int IP_ECN_set_ce(struct iphdr *iph)
 {
-	u32 check = (__force u32)iph->check;
-	u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
-
-	/*
-	 * After the last operation we have (in binary):
-	 * INET_ECN_NOT_ECT => 01
-	 * INET_ECN_ECT_1   => 10
-	 * INET_ECN_ECT_0   => 11
-	 * INET_ECN_CE      => 00
-	 */
-	if (!(ecn & 2))
+	u32 ecn = iph->tos & INET_ECN_MASK;
+
+	if (INET_ECN_is_ce(ecn) || INET_ECN_is_not_ect(ecn))
 		return !ecn;
 
-	/*
-	 * The following gives us:
-	 * INET_ECN_ECT_1 => check += htons(0xFFFD)
-	 * INET_ECN_ECT_0 => check += htons(0xFFFE)
-	 */
-	check += (__force u16)htons(0xFFFB) + (__force u16)htons(ecn);
+	csum_replace2(&iph->check, iph->tos, iph->tos|INET_ECN_CE);
 
-	iph->check = (__force __sum16)(check + (check>=0xFFFF));
 	iph->tos |= INET_ECN_CE;
 	return 1;
 }
-- 
1.7.10.4

^ permalink raw reply related

* Re: TCP delayed ACK heuristic
From: Cong Wang @ 2012-12-19  6:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Ben Greear, David Miller, Stephen Hemminger, Rick Jones,
	Thomas Graf
In-Reply-To: <1355848224.9380.30.camel@edumazet-glaptop>

On Tue, 2012-12-18 at 08:30 -0800, Eric Dumazet wrote:
>  
> 
> ACKS might also be delayed because of bidirectional traffic, and is more
> controlled by the application response time. TCP stack can not easily
> estimate it.

So we still need a knob?

> 
> If you focus on bulk receive, LRO/GRO should already lower number of
> ACKS to an acceptable level and without major disruption.

Indeed.

> 
> Stretch acks are not only the receiver concern, there are issues for the
> sender that you cannot always control/change.
> 
> I recommend reading RFC2525 2.13
> 

Very helpful information!

On the sender's side, it needs to "notify" the receiver not to send
stretch acks when it is in slow-start. But I think the receiver can
detect slow-start too on its own side (based one the window size?).

Thanks!

^ permalink raw reply

* RE: TCP delayed ACK heuristic
From: Cong Wang @ 2012-12-19  7:00 UTC (permalink / raw)
  To: David Laight
  Cc: netdev, Ben Greear, David Miller, Eric Dumazet, Stephen Hemminger,
	Rick Jones, Thomas Graf
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B70F4@saturn3.aculab.com>

On Tue, 2012-12-18 at 16:39 +0000, David Laight wrote:
> There are problems with only implementing the acks
> specified by RFC1122. 

Yeah, the problem is if we can violate this RFC for getting better
performance. Or it is just a no-no?

Although RFC 2525 mentions this as "Stretch ACK Violation", I am still
not sure if that means we can violate RFC1122 legally.

Thanks.

^ permalink raw reply

* Re: [PATCH] bridge: Correctly encode addresses when dumping mdb entries
From: Cong Wang @ 2012-12-19  7:11 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1355867648-17316-1-git-send-email-vyasevic@redhat.com>

On Tue, 18 Dec 2012 at 21:54 GMT, Vlad Yasevich <vyasevic@redhat.com> wrote:
> When dumping mdb table, set the addresses the kernel returns
> based on the address protocol type.
>
> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>

Looks good.

Acked-by: Cong Wang <amwang@redhat.com>

^ permalink raw reply

* Re: [PATCH v2 2/2] net: asix: handle packets crossing URB boundaries
From: Christian Riesch @ 2012-12-19  7:50 UTC (permalink / raw)
  To: Lucas Stach
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Oliver Neukum,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <1355844083-14285-2-git-send-email-dev-8ppwABl0HbeELgA04lAiVw@public.gmane.org>

Hi Lucas,

On 2012-12-18 16:21, Lucas Stach wrote:
> ASIX AX88772B started to pack data even more tightly. Packets and the ASIX packet
> header may now cross URB boundaries. To handle this we have to introduce
> some state between individual calls to asix_rx_fixup().

cc'ed Eric Dumazet, he did some asix_rx_fixup fixes in spring.

>
> Signed-off-by: Lucas Stach <dev-8ppwABl0HbeELgA04lAiVw@public.gmane.org>
> ---
> I've running this patch for some weeks already now and it gets rid of all
> the commonly seen rx failures with AX88772B.
>
> v2: don't forget to free driver_private
> ---
>   drivers/net/usb/asix.h         | 11 ++++++
>   drivers/net/usb/asix_common.c  | 81 +++++++++++++++++++++++++++++-------------
>   drivers/net/usb/asix_devices.c | 17 +++++++++
>   3 Dateien geändert, 85 Zeilen hinzugefügt(+), 24 Zeilen entfernt(-)

Your patch breaks the driver of the AX88172A in 
drivers/net/usb/ax88172a.c. It uses the asix_rx_fixup() function as 
well, but you did not add the struct asix_rx_fixup_info to its 
driver_priv struct.

Regards, Christian

>
> diff --git a/drivers/net/usb/asix.h b/drivers/net/usb/asix.h
> index 7afe8ac..4dfdbf6 100644
> --- a/drivers/net/usb/asix.h
> +++ b/drivers/net/usb/asix.h
> @@ -167,6 +167,17 @@ struct asix_data {
>   	u8 res;
>   };
>
> +struct asix_rx_fixup_info {
> +	struct sk_buff *ax_skb;
> +	u32 header;
> +	u16 size;
> +	bool split_head;
> +};
> +
> +struct asix_private {
> +	struct asix_rx_fixup_info rx_fixup_info;
> +};
> +
>   /* ASIX specific flags */
>   #define FLAG_EEPROM_MAC		(1UL << 0)  /* init device MAC from eeprom */
>
> diff --git a/drivers/net/usb/asix_common.c b/drivers/net/usb/asix_common.c
> index 50d1673..17f9801 100644
> --- a/drivers/net/usb/asix_common.c
> +++ b/drivers/net/usb/asix_common.c
> @@ -53,44 +53,77 @@ void asix_write_cmd_async(struct usbnet *dev, u8 cmd, u16 value, u16 index,
>
>   int asix_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
>   {
> +	struct asix_private *dp = dev->driver_priv;
> +	struct asix_rx_fixup_info *rx = &dp->rx_fixup_info;
>   	int offset = 0;
>
> -	while (offset + sizeof(u32) < skb->len) {
> -		struct sk_buff *ax_skb;
> -		u16 size;
> -		u32 header = get_unaligned_le32(skb->data + offset);
> -
> -		offset += sizeof(u32);
> -
> -		/* get the packet length */
> -		size = (u16) (header & 0x7ff);
> -		if (size != ((~header >> 16) & 0x07ff)) {
> -			netdev_err(dev->net, "asix_rx_fixup() Bad Header Length\n");
> -			return 0;
> +	while (offset + sizeof(u16) <= skb->len) {
> +		u16 remaining = 0;
> +		unsigned char *data;
> +
> +		if (!rx->size) {
> +			if ((skb->len - offset == sizeof(u16)) ||
> +			    rx->split_head) {
> +				if(!rx->split_head) {
> +					rx->header = get_unaligned_le16(
> +							skb->data + offset);
> +					rx->split_head = true;
> +					offset += sizeof(u16);
> +					break;
> +				} else {
> +					rx->header |= (get_unaligned_le16(
> +							skb->data + offset)
> +							<< 16);
> +					rx->split_head = false;
> +					offset += sizeof(u16);
> +				}
> +			} else {
> +				rx->header = get_unaligned_le32(skb->data +
> +								offset);
> +				offset += sizeof(u32);
> +			}
> +
> +			/* get the packet length */
> +			rx->size = (u16) (rx->header & 0x7ff);
> +			if (rx->size != ((~rx->header >> 16) & 0x7ff)) {
> +				netdev_err(dev->net, "asix_rx_fixup() Bad Header Length 0x%x, offset %d\n",
> +					   rx->header, offset);
> +				rx->size = 0;
> +				return 0;
> +			}
> +			rx->ax_skb = netdev_alloc_skb_ip_align(dev->net,
> +							       rx->size);
> +			if (!rx->ax_skb)
> +				return 0;
>   		}
>
> -		if ((size > dev->net->mtu + ETH_HLEN + VLAN_HLEN) ||
> -		    (size + offset > skb->len)) {
> +		if (rx->size > dev->net->mtu + ETH_HLEN + VLAN_HLEN) {
>   			netdev_err(dev->net, "asix_rx_fixup() Bad RX Length %d\n",
> -				   size);
> +				   rx->size);
> +			kfree_skb(rx->ax_skb);
>   			return 0;
>   		}
> -		ax_skb = netdev_alloc_skb_ip_align(dev->net, size);
> -		if (!ax_skb)
> -			return 0;
>
> -		skb_put(ax_skb, size);
> -		memcpy(ax_skb->data, skb->data + offset, size);
> -		usbnet_skb_return(dev, ax_skb);
> +		if (rx->size > skb->len - offset) {
> +			remaining = rx->size - (skb->len - offset);
> +			rx->size = skb->len - offset;
> +		}
> +
> +		data = skb_put(rx->ax_skb, rx->size);
> +		memcpy(data, skb->data + offset, rx->size);
> +		if (!remaining)
> +			usbnet_skb_return(dev, rx->ax_skb);
>
> -		offset += (size + 1) & 0xfffe;
> +		offset += (rx->size + 1) & 0xfffe;
> +		rx->size = remaining;
>   	}
>
>   	if (skb->len != offset) {
> -		netdev_err(dev->net, "asix_rx_fixup() Bad SKB Length %d\n",
> -			   skb->len);
> +		netdev_err(dev->net, "asix_rx_fixup() Bad SKB Length %d, %d\n",
> +			   skb->len, offset);
>   		return 0;
>   	}
> +
>   	return 1;
>   }
>
> diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
> index 0ecc3bc..c651243 100644
> --- a/drivers/net/usb/asix_devices.c
> +++ b/drivers/net/usb/asix_devices.c
> @@ -495,9 +495,19 @@ static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
>   		dev->rx_urb_size = 2048;
>   	}
>
> +	dev->driver_priv = kzalloc(sizeof(struct asix_private), GFP_KERNEL);
> +	if (!dev->driver_priv)
> +		return -ENOMEM;
> +
>   	return 0;
>   }
>
> +void ax88772_unbind(struct usbnet *dev, struct usb_interface *intf)
> +{
> +	if (dev->driver_priv)
> +		kfree(dev->driver_priv);
> +}
> +
>   static const struct ethtool_ops ax88178_ethtool_ops = {
>   	.get_drvinfo		= asix_get_drvinfo,
>   	.get_link		= asix_get_link,
> @@ -829,6 +839,10 @@ static int ax88178_bind(struct usbnet *dev, struct usb_interface *intf)
>   		dev->rx_urb_size = 2048;
>   	}
>
> +	dev->driver_priv = kzalloc(sizeof(struct asix_private), GFP_KERNEL);
> +	if (!dev->driver_priv)
> +			return -ENOMEM;
> +
>   	return 0;
>   }
>
> @@ -875,6 +889,7 @@ static const struct driver_info hawking_uf200_info = {
>   static const struct driver_info ax88772_info = {
>   	.description = "ASIX AX88772 USB 2.0 Ethernet",
>   	.bind = ax88772_bind,
> +	.unbind = ax88772_unbind,
>   	.status = asix_status,
>   	.link_reset = ax88772_link_reset,
>   	.reset = ax88772_reset,
> @@ -886,6 +901,7 @@ static const struct driver_info ax88772_info = {
>   static const struct driver_info ax88772b_info = {
>   	.description = "ASIX AX88772B USB 2.0 Ethernet",
>   	.bind = ax88772_bind,
> +	.unbind = ax88772_unbind,
>   	.status = asix_status,
>   	.link_reset = ax88772_link_reset,
>   	.reset = ax88772_reset,
> @@ -899,6 +915,7 @@ static const struct driver_info ax88772b_info = {
>   static const struct driver_info ax88178_info = {
>   	.description = "ASIX AX88178 USB 2.0 Ethernet",
>   	.bind = ax88178_bind,
> +	.unbind = ax88772_unbind,
>   	.status = asix_status,
>   	.link_reset = ax88178_link_reset,
>   	.reset = ax88178_reset,
>

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC PATCH] fix IP_ECN_set_ce
From: Julian Anastasov @ 2012-12-19  8:11 UTC (permalink / raw)
  To: roy.qing.li; +Cc: netdev
In-Reply-To: <1355898095-7444-1-git-send-email-roy.qing.li@gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2288 bytes --]


	Hello,

On Wed, 19 Dec 2012, roy.qing.li@gmail.com wrote:

> From: Li RongQing <roy.qing.li@gmail.com>
> 
> 1. ECN uses the two least significant (right-most) bits of the DiffServ
> field in the IPv4, so it should be in iph->tos, not in (iph->tos+1)
> 
> 2. When setting CE, we should check if ECN Capable Transport supports,
> both 10 and 01 mean ECN Capable Transport, so only check 10 is not enough
>     00: Non ECN-Capable Transport — Non-ECT
>     10: ECN Capable Transport — ECT(0)
>     01: ECN Capable Transport — ECT(1)
>     11: Congestion Encountered — CE
> 
> 3. Remove the misunderstand comment
> 
> 4. fix the checksum computation
> 
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
> ---
>  include/net/inet_ecn.h |   22 ++++------------------
>  1 file changed, 4 insertions(+), 18 deletions(-)
> 
> diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
> index aab7375..545a683 100644
> --- a/include/net/inet_ecn.h
> +++ b/include/net/inet_ecn.h
> @@ -73,27 +73,13 @@ static inline void INET_ECN_dontxmit(struct sock *sk)
>  
>  static inline int IP_ECN_set_ce(struct iphdr *iph)
>  {
> -	u32 check = (__force u32)iph->check;
> -	u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
> -
> -	/*
> -	 * After the last operation we have (in binary):
> -	 * INET_ECN_NOT_ECT => 01
> -	 * INET_ECN_ECT_1   => 10
> -	 * INET_ECN_ECT_0   => 11
> -	 * INET_ECN_CE      => 00
> -	 */

	I think, the above comment explains how an
increment (iph->tos + 1) serves the purpose to check
for ECT_1 and ECT_0, there is no such thing as
addressing the next byte from header. It is just an
optimized logic that avoids complex INET_ECN_is_XXX
checks.

> -	if (!(ecn & 2))
> +	u32 ecn = iph->tos & INET_ECN_MASK;
> +
> +	if (INET_ECN_is_ce(ecn) || INET_ECN_is_not_ect(ecn))
>  		return !ecn;

	May be return INET_ECN_is_ce(ecn) ?

>  
> -	/*
> -	 * The following gives us:
> -	 * INET_ECN_ECT_1 => check += htons(0xFFFD)
> -	 * INET_ECN_ECT_0 => check += htons(0xFFFE)
> -	 */
> -	check += (__force u16)htons(0xFFFB) + (__force u16)htons(ecn);
> +	csum_replace2(&iph->check, iph->tos, iph->tos|INET_ECN_CE);
>  
> -	iph->check = (__force __sum16)(check + (check>=0xFFFF));
>  	iph->tos |= INET_ECN_CE;
>  	return 1;
>  }
> -- 
> 1.7.10.4

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH V2 00/12] Add basic VLAN support to bridges
From: Shmulik Ladkani @ 2012-12-19  8:10 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst
In-Reply-To: <1355857263-31197-1-git-send-email-vyasevic@redhat.com>

Thanks Vlad,

On Tue, 18 Dec 2012 14:00:51 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
> A single vlan may also be designated as untagged.  Any untagged traffic
> recieved by the port will be assigned to this vlan.

Why the "untagged vlan" is per-bridge global?
Usually, 802.1q switches define the PVID (port's VID) which controls
the value of VID, in case ingress frame is either untagged or
priority-tagged (per port configuration).
This gives greater flexibility.

> Any traffic exiting
> the port with a VID matching the untagged vlan will exit untagged (the
> bridge will strip the vlan header).  This is similar to "Native Vlan" support
> available in most switches.

802.1q switches usually allow conifguring per-vlan, per-port
tagged/untagged egress policy: each vid has its port membership map and
an accompanying port egress-policy map.
This gives great flexibility defining all sorts of configurations.

Personally, I'd prefer a fully flexible vlan bridge allowing all sorts
of configurations (as available in 802.1q switches).

What's the reason limiting such configurations?

Regards,
Shmulik

^ permalink raw reply

* Re: [patch net-next 1/4] net: add change_carrier netdev op
From: Jiri Pirko @ 2012-12-19  8:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: netdev, davem, edumazet, bhutchings, mirqus, shemminger, greearb,
	fbl, john.r.fastabend
In-Reply-To: <1355871771.30992.44.camel@dcbw.foobar.com>

Wed, Dec 19, 2012 at 12:02:51AM CET, dcbw@redhat.com wrote:
>On Tue, 2012-12-18 at 23:14 +0100, Jiri Pirko wrote:
>> This allows a driver to register change_carrier callback which will be
>> called whenever user will like to change carrier state. This is useful
>> for devices like dummy, gre, team and so on.
>> 
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>  include/linux/netdevice.h |  9 +++++++++
>>  net/core/dev.c            | 19 +++++++++++++++++++
>>  2 files changed, 28 insertions(+)
>> 
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 02e0f6b..8047330 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -891,6 +891,11 @@ struct netdev_fcoe_hbainfo {
>>   * int (*ndo_bridge_setlink)(struct net_device *dev, struct nlmsghdr *nlh)
>>   * int (*ndo_bridge_getlink)(struct sk_buff *skb, u32 pid, u32 seq,
>>   *			     struct net_device *dev)
>> + *
>> + * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier);
>> + *	Called to update device carrier. Soft-devices which do not manage
>> + *	real hardware like dummy, team, etc. can define this to provide
>> + *	possibility to set their carrier state.
>
>How about something like:
>
>---
>Called to change device carrier.  Virtual devices (like dummy, team,
>etc) which do not represent real hardware may define this to allow their
>userspace components to manage their virtual carrier state.  Devices
>that determine carrier state from physical hardware properties (eg
>network cables) or protocol-dependent mechanisms (eg
>USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.
>---
>
>I'm just worried that without some clearer wording, somebody will end up
>implementing the function for an actual hardware driver down the road
>and expect things to work when they change the carrier to "on" but the
>protocol isn't set up or cable isn't plugged in.  I'm also worried that
>it might be used to simply work around bugs in the drivers that should
>be fixed in the drivers instead.

Okay - I'll repost this single patch with your text as comment.
Thanks.

>
>Dan
>
>>  struct net_device_ops {
>>  	int			(*ndo_init)(struct net_device *dev);
>> @@ -1008,6 +1013,8 @@ struct net_device_ops {
>>  	int			(*ndo_bridge_getlink)(struct sk_buff *skb,
>>  						      u32 pid, u32 seq,
>>  						      struct net_device *dev);
>> +	int			(*ndo_change_carrier)(struct net_device *dev,
>> +						      bool new_carrier);
>>  };
>>  
>>  /*
>> @@ -2194,6 +2201,8 @@ extern int		dev_set_mtu(struct net_device *, int);
>>  extern void		dev_set_group(struct net_device *, int);
>>  extern int		dev_set_mac_address(struct net_device *,
>>  					    struct sockaddr *);
>> +extern int		dev_change_carrier(struct net_device *,
>> +					   bool new_carrier);
>>  extern int		dev_hard_start_xmit(struct sk_buff *skb,
>>  					    struct net_device *dev,
>>  					    struct netdev_queue *txq);
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index d0cbc93..268a714 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -5027,6 +5027,25 @@ int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa)
>>  }
>>  EXPORT_SYMBOL(dev_set_mac_address);
>>  
>> +/**
>> + *	dev_change_carrier - Change device carrier
>> + *	@dev: device
>> + *	@new_carries: new value
>> + *
>> + *	Change device carrier
>> + */
>> +int dev_change_carrier(struct net_device *dev, bool new_carrier)
>> +{
>> +	const struct net_device_ops *ops = dev->netdev_ops;
>> +
>> +	if (!ops->ndo_change_carrier)
>> +		return -EOPNOTSUPP;
>> +	if (!netif_device_present(dev))
>> +		return -ENODEV;
>> +	return ops->ndo_change_carrier(dev, new_carrier);
>> +}
>> +EXPORT_SYMBOL(dev_change_carrier);
>> +
>>  /*
>>   *	Perform the SIOCxIFxxx calls, inside rcu_read_lock()
>>   */
>
>

^ permalink raw reply

* Re: [PATCH V2 00/12] Add basic VLAN support to bridges
From: Jiri Pirko @ 2012-12-19  8:27 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst
In-Reply-To: <50D0F23D.4020508@redhat.com>

Tue, Dec 18, 2012 at 11:46:21PM CET, vyasevic@redhat.com wrote:
>On 12/18/2012 05:32 PM, Jiri Pirko wrote:
>>
>>
>>I see that this patchset replicates a lot of code which is already
>>present in net/8021q/ or include/linux/if_vlan.h. I think it would
>>be nice to move this code into some "common" place, wouldn't it?
>>
>
>The only replication that I am aware of is in br_vlan_untag().  I
>thought about pulling that piece out, but I think there is a reason
>why it's not available when 801q support isn't turned on.  I noted that
>openvswitch implemented its own vlan header manipulation functions as well.

openvswitch should use the "common" code as well.

>
>What else are you seeing that's duplicate?

For example I spotted check of ndo_vlan_rx_[add/kill]_vid and
NETIF_F_HW_VLAN_FILTER and ndo_vlan_rx_[add/kill]_vid call


>
>-vlad
>
>>Jiri
>>
>>Tue, Dec 18, 2012 at 08:00:51PM CET, vyasevic@redhat.com wrote:
>>>This series of patches provides an ability to add VLANs to the bridge
>>>ports.  This is similar to what can be found in most switches.  The bridge
>>>port may have any number of VLANs added to it including vlan 0 priority tagged
>>>traffic.  When vlans are added to the port, only traffic tagged with particular
>>>vlan will forwarded over this port.  Additionally, vlan ids are added to FDB
>>>entries and become part of the lookup.  This way we correctly identify the FDB
>>>entry.
>>>
>>>A single vlan may also be designated as untagged.  Any untagged traffic
>>>recieved by the port will be assigned to this vlan.  Any traffic exiting
>>>the port with a VID matching the untagged vlan will exit untagged (the
>>>bridge will strip the vlan header).  This is similar to "Native Vlan" support
>>>available in most switches.
>>>
>>>The default behavior ofthe bridge is unchanged if no vlans have been
>>>configured.
>>>
>>>Changes since v1:
>>>- Fixed some forwarding bugs.
>>>- Add vlan to local fdb entries.  New local entries are created per vlan
>>>   to facilite correct forwarding to bridge interface.
>>>- Allow configuration of vlans directly on the bridge master device
>>>   in addition to ports.
>>>
>>>Changes since rfc v2:
>>>- Per-port vlan bitmap is gone and is replaced with a vlan list.
>>>- Added bridge vlan list, which is referenced by each port.  Entries in
>>>   the birdge vlan list have port bitmap that shows which port are parts
>>>   of which vlan.
>>>- Netlink API changes.
>>>- Dropped sysfs support for now.  If people think this is really usefull,
>>>   can add it back.
>>>- Support for native/untagged vlans.
>>>
>>>Changes since rfc v1:
>>>- Comments addressed regarding formatting and RCU usage
>>>- iocts have been removed and changed over the netlink interface.
>>>- Added support of user added ndb entries.
>>>- changed sysfs interface to export a bitmap.  Also added a write interface.
>>>   I am not sure how much I like it, but it made my testing easier/faster.  I
>>>   might change the write interface to take text instead of binary.
>>>
>>>
>>>Vlad Yasevich (12):
>>>  bridge: Add vlan filtering infrastructure
>>>  bridge: Validate that vlan is permitted on ingress
>>>  bridge: Verify that a vlan is allowed to egress on give port
>>>  bridge: Cache vlan in the cb for faster egress lookup.
>>>  bridge: Add vlan to unicast fdb entries
>>>  bridge: Add vlan id to multicast groups
>>>  bridge: Add netlink interface to configure vlans on bridge ports
>>>  bridge: Add vlan support to static neighbors
>>>  bridge: Add the ability to configure untagged vlans
>>>  bridge: Implement untagged vlan handling
>>>  bridge: Dump vlan information from a bridge port
>>>  bridge: Add vlan support for local fdb entries
>>>
>>>drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    5 +-
>>>drivers/net/macvlan.c                         |    2 +-
>>>drivers/net/vxlan.c                           |    3 +-
>>>include/linux/netdevice.h                     |    4 +-
>>>include/uapi/linux/if_bridge.h                |   23 ++-
>>>include/uapi/linux/neighbour.h                |    1 +
>>>include/uapi/linux/rtnetlink.h                |    1 +
>>>net/bridge/br_device.c                        |   34 ++-
>>>net/bridge/br_fdb.c                           |  253 ++++++++++++---
>>>net/bridge/br_forward.c                       |  160 ++++++++++
>>>net/bridge/br_if.c                            |  404 ++++++++++++++++++++++++-
>>>net/bridge/br_input.c                         |   65 ++++-
>>>net/bridge/br_multicast.c                     |   71 +++--
>>>net/bridge/br_netlink.c                       |  178 ++++++++++--
>>>net/bridge/br_private.h                       |   71 ++++-
>>>net/core/rtnetlink.c                          |   40 ++-
>>>16 files changed, 1190 insertions(+), 125 deletions(-)
>>>
>>>--
>>>1.7.7.6
>>>
>>>--
>>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>the body of a message to majordomo@vger.kernel.org
>>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [RFC PATCH] fix IP_ECN_set_ce
From: RongQing Li @ 2012-12-19  8:41 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev
In-Reply-To: <alpine.LFD.2.00.1212191001530.1620@ja.ssi.bg>

2012/12/19 Julian Anastasov <ja@ssi.bg>:
>
>         Hello,
>
> On Wed, 19 Dec 2012, roy.qing.li@gmail.com wrote:
>
>> From: Li RongQing <roy.qing.li@gmail.com>
>>
>> 1. ECN uses the two least significant (right-most) bits of the DiffServ
>> field in the IPv4, so it should be in iph->tos, not in (iph->tos+1)
>>
>> 2. When setting CE, we should check if ECN Capable Transport supports,
>> both 10 and 01 mean ECN Capable Transport, so only check 10 is not enough
>>     00: Non ECN-Capable Transport — Non-ECT
>>     10: ECN Capable Transport — ECT(0)
>>     01: ECN Capable Transport — ECT(1)
>>     11: Congestion Encountered — CE
>>
>> 3. Remove the misunderstand comment
>>
>> 4. fix the checksum computation
>>
>> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
>> ---
>>  include/net/inet_ecn.h |   22 ++++------------------
>>  1 file changed, 4 insertions(+), 18 deletions(-)
>>
>> diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
>> index aab7375..545a683 100644
>> --- a/include/net/inet_ecn.h
>> +++ b/include/net/inet_ecn.h
>> @@ -73,27 +73,13 @@ static inline void INET_ECN_dontxmit(struct sock *sk)
>>
>>  static inline int IP_ECN_set_ce(struct iphdr *iph)
>>  {
>> -     u32 check = (__force u32)iph->check;
>> -     u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
>> -
>> -     /*
>> -      * After the last operation we have (in binary):
>> -      * INET_ECN_NOT_ECT => 01
>> -      * INET_ECN_ECT_1   => 10
>> -      * INET_ECN_ECT_0   => 11
>> -      * INET_ECN_CE      => 00
>> -      */
>
>         I think, the above comment explains how an
> increment (iph->tos + 1) serves the purpose to check
> for ECT_1 and ECT_0, there is no such thing as
> addressing the next byte from header. It is just an
> optimized logic that avoids complex INET_ECN_is_XXX
> checks.
Thanks for your reply.
Do you mean this comment are valuable?


>
>> -     if (!(ecn & 2))
>> +     u32 ecn = iph->tos & INET_ECN_MASK;
>> +
>> +     if (INET_ECN_is_ce(ecn) || INET_ECN_is_not_ect(ecn))
>>               return !ecn;
>
>         May be return INET_ECN_is_ce(ecn) ?
>

I like to set the return value to void, since noone cares about the
return value.

-Roy

>>
>> -     /*
>> -      * The following gives us:
>> -      * INET_ECN_ECT_1 => check += htons(0xFFFD)
>> -      * INET_ECN_ECT_0 => check += htons(0xFFFE)
>> -      */
>> -     check += (__force u16)htons(0xFFFB) + (__force u16)htons(ecn);
>> +     csum_replace2(&iph->check, iph->tos, iph->tos|INET_ECN_CE);
>>
>> -     iph->check = (__force __sum16)(check + (check>=0xFFFF));
>>       iph->tos |= INET_ECN_CE;
>>       return 1;
>>  }
>> --
>> 1.7.10.4
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [RFC PATCH] fix IP_ECN_set_ce
From: Julian Anastasov @ 2012-12-19  8:58 UTC (permalink / raw)
  To: RongQing Li; +Cc: netdev
In-Reply-To: <CAJFZqHyhMS22pKuW50D0uZLA5-WDUYBEZurPGCW6gSyRY-6JHg@mail.gmail.com>


	Hello,

On Wed, 19 Dec 2012, RongQing Li wrote:

> >>  static inline int IP_ECN_set_ce(struct iphdr *iph)
> >>  {
> >> -     u32 check = (__force u32)iph->check;
> >> -     u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
> >> -
> >> -     /*
> >> -      * After the last operation we have (in binary):
> >> -      * INET_ECN_NOT_ECT => 01
> >> -      * INET_ECN_ECT_1   => 10
> >> -      * INET_ECN_ECT_0   => 11
> >> -      * INET_ECN_CE      => 00
> >> -      */
> >
> >         I think, the above comment explains how an
> > increment (iph->tos + 1) serves the purpose to check
> > for ECT_1 and ECT_0, there is no such thing as
> > addressing the next byte from header. It is just an
> > optimized logic that avoids complex INET_ECN_is_XXX
> > checks.
> Thanks for your reply.
> Do you mean this comment are valuable?

	It looks better to me with the comment and the
original checks. But I can't comment the correctness of
the other changes in your patch.

> >> -     if (!(ecn & 2))
> >> +     u32 ecn = iph->tos & INET_ECN_MASK;
> >> +
> >> +     if (INET_ECN_is_ce(ecn) || INET_ECN_is_not_ect(ecn))
> >>               return !ecn;
> >
> >         May be return INET_ECN_is_ce(ecn) ?
> >
> 
> I like to set the return value to void, since noone cares about the
> return value.

	It is used by INET_ECN_set_ce and its users in
net/sched/

> -Roy

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* [patch net-next V3 1/4] net: add change_carrier netdev op
From: Jiri Pirko @ 2012-12-19  9:00 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, bhutchings, mirqus, shemminger, greearb, fbl,
	john.r.fastabend
In-Reply-To: <1355868858-1713-2-git-send-email-jiri@resnulli.us>

This allows a driver to register change_carrier callback which will be
called whenever user will like to change carrier state. This is useful
for devices like dummy, gre, team and so on.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/netdevice.h | 12 ++++++++++++
 net/core/dev.c            | 19 +++++++++++++++++++
 2 files changed, 31 insertions(+)

V2->V3: updated comment by Dan Williams

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 02e0f6b..22ca4a8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -891,6 +891,14 @@ struct netdev_fcoe_hbainfo {
  * int (*ndo_bridge_setlink)(struct net_device *dev, struct nlmsghdr *nlh)
  * int (*ndo_bridge_getlink)(struct sk_buff *skb, u32 pid, u32 seq,
  *			     struct net_device *dev)
+ *
+ * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier);
+ *	Called to change device carrier. Soft-devices (like dummy, team, etc)
+ *	which do not represent real hardware may define this to allow their
+ *	userspace components to manage their virtual carrier state. Devices
+ *	that determine carrier state from physical hardware properties (eg
+ *	network cables) or protocol-dependent mechanisms (eg
+ *	USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1008,6 +1016,8 @@ struct net_device_ops {
 	int			(*ndo_bridge_getlink)(struct sk_buff *skb,
 						      u32 pid, u32 seq,
 						      struct net_device *dev);
+	int			(*ndo_change_carrier)(struct net_device *dev,
+						      bool new_carrier);
 };
 
 /*
@@ -2194,6 +2204,8 @@ extern int		dev_set_mtu(struct net_device *, int);
 extern void		dev_set_group(struct net_device *, int);
 extern int		dev_set_mac_address(struct net_device *,
 					    struct sockaddr *);
+extern int		dev_change_carrier(struct net_device *,
+					   bool new_carrier);
 extern int		dev_hard_start_xmit(struct sk_buff *skb,
 					    struct net_device *dev,
 					    struct netdev_queue *txq);
diff --git a/net/core/dev.c b/net/core/dev.c
index d0cbc93..268a714 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5027,6 +5027,25 @@ int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa)
 }
 EXPORT_SYMBOL(dev_set_mac_address);
 
+/**
+ *	dev_change_carrier - Change device carrier
+ *	@dev: device
+ *	@new_carries: new value
+ *
+ *	Change device carrier
+ */
+int dev_change_carrier(struct net_device *dev, bool new_carrier)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_change_carrier)
+		return -EOPNOTSUPP;
+	if (!netif_device_present(dev))
+		return -ENODEV;
+	return ops->ndo_change_carrier(dev, new_carrier);
+}
+EXPORT_SYMBOL(dev_change_carrier);
+
 /*
  *	Perform the SIOCxIFxxx calls, inside rcu_read_lock()
  */
-- 
1.8.0

^ permalink raw reply related

* Re: [RFC PATCH] fix IP_ECN_set_ce
From: RongQing Li @ 2012-12-19  9:11 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev
In-Reply-To: <alpine.LFD.2.00.1212191051590.1906@ja.ssi.bg>

2012/12/19 Julian Anastasov <ja@ssi.bg>:
>
>         Hello,
>
> On Wed, 19 Dec 2012, RongQing Li wrote:
>
>> >>  static inline int IP_ECN_set_ce(struct iphdr *iph)
>> >>  {
>> >> -     u32 check = (__force u32)iph->check;
>> >> -     u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
>> >> -
>> >> -     /*
>> >> -      * After the last operation we have (in binary):
>> >> -      * INET_ECN_NOT_ECT => 01
>> >> -      * INET_ECN_ECT_1   => 10
>> >> -      * INET_ECN_ECT_0   => 11
>> >> -      * INET_ECN_CE      => 00
>> >> -      */
>> >
>> >         I think, the above comment explains how an
>> > increment (iph->tos + 1) serves the purpose to check
>> > for ECT_1 and ECT_0, there is no such thing as
>> > addressing the next byte from header. It is just an
>> > optimized logic that avoids complex INET_ECN_is_XXX
>> > checks.
>> Thanks for your reply.
>> Do you mean this comment are valuable?
>
>         It looks better to me with the comment and the
> original checks. But I can't comment the correctness of
> the other changes in your patch.

I do not know how they are useful, and how the original check
works, since the value in comments are wrong, the correct is:

enum {
        INET_ECN_NOT_ECT = 0,
        INET_ECN_ECT_1 = 1,
        INET_ECN_ECT_0 = 2,
        INET_ECN_CE = 3,
        INET_ECN_MASK = 3,
};


   00: Non ECN-Capable Transport — Non-ECT
    10: ECN Capable Transport — ECT(0)
    01: ECN Capable Transport — ECT(1)
    11: Congestion Encountered — CE

-Roy


>
>> >> -     if (!(ecn & 2))
>> >> +     u32 ecn = iph->tos & INET_ECN_MASK;
>> >> +
>> >> +     if (INET_ECN_is_ce(ecn) || INET_ECN_is_not_ect(ecn))
>> >>               return !ecn;
>> >
>> >         May be return INET_ECN_is_ce(ecn) ?
>> >
>>
>> I like to set the return value to void, since noone cares about the
>> return value.
>
>         It is used by INET_ECN_set_ce and its users in
> net/sched/
>
>> -Roy
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* RE: [PATCH v2] netlink: align attributes on 64-bits
From: David Laight @ 2012-12-19  9:17 UTC (permalink / raw)
  To: Thomas Graf; +Cc: nicolas.dichtel, bhutchings, netdev, davem
In-Reply-To: <20121218171103.GI27746@casper.infradead.org>

> > Consider the structure:
> > 	struct bar {
> > 		__u32 foo1;
> > 		__u64 foo2;
> > 	}
> > On i386 it will have size 12 and foo2 will be at offset 4.
> > On sparc32 (and most 64bit) it will have size 16 with foo2
> > at offset 8 (and 4 bytes of pad after foo1).
> 
> This is a known problem and I can't think of anything
> that can be done about it except for memcpy()ing the
> data before accessing it.

You can't use memcpy() to copy a pointer to a misaligned
structure into an aligned buffer. The compiler assumes
the pointer is aligned and will use instructions that
depend on the alignment.

> If you have ideas, I'm more that willing to listen :)
> ... Netlink has and will be host bound. It also
> uses host byte order for that reason.

I think:
1) Alignment is only needed on systems that have 'strict alignment'
   requirements (maybe disable for testing?)
2) Alignment is only needed for parameters whose size is a
   multiple of the alignment (a structure containing a
   field that needs 8 byte alignment will always be a multiple
   of 8 bytes long).
3) You need to add NA_HDR_LEN to the write pointer before
   determining the size of the pad.

So a structure of three uint32_t will never need aligning.

	David

^ permalink raw reply

* Re: [patch net-next 1/4] net: add change_carrier netdev op
From: David Miller @ 2012-12-19  9:28 UTC (permalink / raw)
  To: jiri
  Cc: dcbw, netdev, edumazet, bhutchings, mirqus, shemminger, greearb,
	fbl, john.r.fastabend
In-Reply-To: <20121219082050.GA1637@minipsycho.orion>

From: Jiri Pirko <jiri@resnulli.us>
Date: Wed, 19 Dec 2012 09:20:50 +0100

> Okay - I'll repost this single patch with your text as comment.

Please always repost an entire series, rather than just part of
it.

^ permalink raw reply

* Re: [RFC PATCH] fix IP_ECN_set_ce
From: David Miller @ 2012-12-19  9:31 UTC (permalink / raw)
  To: roy.qing.li; +Cc: ja, netdev
In-Reply-To: <CAJFZqHzFvspqFyZuZsTSyOOKdCq7SS=xMY5hH3kycaqO+=bXGw@mail.gmail.com>

From: RongQing Li <roy.qing.li@gmail.com>
Date: Wed, 19 Dec 2012 17:11:59 +0800

> 2012/12/19 Julian Anastasov <ja@ssi.bg>:
>>
>>         Hello,
>>
>> On Wed, 19 Dec 2012, RongQing Li wrote:
>>
>>> >>  static inline int IP_ECN_set_ce(struct iphdr *iph)
>>> >>  {
>>> >> -     u32 check = (__force u32)iph->check;
>>> >> -     u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
>>> >> -
>>> >> -     /*
>>> >> -      * After the last operation we have (in binary):
>>> >> -      * INET_ECN_NOT_ECT => 01
>>> >> -      * INET_ECN_ECT_1   => 10
>>> >> -      * INET_ECN_ECT_0   => 11
>>> >> -      * INET_ECN_CE      => 00
>>> >> -      */
>>> >
>>> >         I think, the above comment explains how an
>>> > increment (iph->tos + 1) serves the purpose to check
>>> > for ECT_1 and ECT_0, there is no such thing as
>>> > addressing the next byte from header. It is just an
>>> > optimized logic that avoids complex INET_ECN_is_XXX
>>> > checks.
>>> Thanks for your reply.
>>> Do you mean this comment are valuable?
>>
>>         It looks better to me with the comment and the
>> original checks. But I can't comment the correctness of
>> the other changes in your patch.
> 
> I do not know how they are useful, and how the original check
> works, since the value in comments are wrong, the correct is:
> 
> enum {
>         INET_ECN_NOT_ECT = 0,
>         INET_ECN_ECT_1 = 1,
>         INET_ECN_ECT_0 = 2,
>         INET_ECN_CE = 3,
>         INET_ECN_MASK = 3,
> };
> 
> 
>    00: Non ECN-Capable Transport ― Non-ECT
>     10: ECN Capable Transport ― ECT(0)
>     01: ECN Capable Transport ― ECT(1)
>     11: Congestion Encountered ― CE

You really don't understand the comment, it is saying what
the incremented value corresponds to, ECN wise.

If iph->tos + 1 is 01, we had INET_ECN_NOT_ECT in iph->tos to
begine with, and so on an so forth.

Because you are having so much trouble with this most fundamental
aspect of this code, I have zero confidence in your being able to
make reasonable changes here.

I am not applying this patch.

^ permalink raw reply

* Re: [patch net-next 1/4] net: add change_carrier netdev op
From: Jiri Pirko @ 2012-12-19  9:32 UTC (permalink / raw)
  To: David Miller
  Cc: dcbw, netdev, edumazet, bhutchings, mirqus, shemminger, greearb,
	fbl, john.r.fastabend
In-Reply-To: <20121219.012825.1076148821070092794.davem@davemloft.net>

Wed, Dec 19, 2012 at 10:28:25AM CET, davem@davemloft.net wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Wed, 19 Dec 2012 09:20:50 +0100
>
>> Okay - I'll repost this single patch with your text as comment.
>
>Please always repost an entire series, rather than just part of
>it.

Sorry, I thought that in this case where the change is in one patch and
only cosmethic it wouldn't be a problem. Reposting the whole patchset.

^ permalink raw reply

* [patch net-next V3 0/4] net: allow to change carrier from userspace
From: Jiri Pirko @ 2012-12-19  9:35 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, bhutchings, mirqus, shemminger, greearb, fbl,
	john.r.fastabend

This is basically a V3 of a repost of my previous patchset:
"[patch net-next-2.6 0/2] net: allow to change carrier via sysfs" from Aug 30

The way net-sysfs stores values changed and this patchset reflects it.
Also, I exposed carrier via rtnetlink iface.

So far, only dummy driver uses carrier change ndo. In very near future
team driver will use that as well.

V2->V3:
 - updated ndo_change_carrier comment by Dan Williams

V1->v2:
 - added bigger comment to ndo and also note to operstate.txt documentation
   stating the clear purpose of this iface

Jiri Pirko (4):
  net: add change_carrier netdev op
  net: allow to change carrier via sysfs
  rtnl: expose carrier value with possibility to set it
  dummy: implement carrier change

 Documentation/networking/operstates.txt |  4 ++++
 drivers/net/dummy.c                     | 10 ++++++++++
 include/linux/netdevice.h               | 12 ++++++++++++
 include/uapi/linux/if_link.h            |  1 +
 net/core/dev.c                          | 19 +++++++++++++++++++
 net/core/net-sysfs.c                    | 15 ++++++++++++++-
 net/core/rtnetlink.c                    | 10 ++++++++++
 7 files changed, 70 insertions(+), 1 deletion(-)

-- 
1.8.0

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox