Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH 2/6] drivers/net/can/softing/softing_main.c: ensure a consistent return value in error case
From: Julia Lawall @ 2012-07-14 16:43 UTC (permalink / raw)
  To: Wolfgang Grandegger
  Cc: kernel-janitors, Marc Kleine-Budde, linux-can, netdev,
	linux-kernel
In-Reply-To: <1342284188-19176-1-git-send-email-Julia.Lawall@lip6.fr>

From: Julia Lawall <Julia.Lawall@lip6.fr>

Typically, the return value desired for the failure of a function with an
integer return value is a negative integer.  In these cases, the return
value is sometimes a negative integer and sometimes 0, due to a subsequent
initialization of the return variable within the loop.

A simplified version of the semantic match that finds this problem is:
(http://coccinelle.lip6.fr/)

//<smpl>
@r exists@
identifier ret;
position p;
constant C;
expression e1,e3,e4;
statement S;
@@

ret = -C
... when != ret = e3
    when any
if@p (...) S
... when any
if (\(ret != 0\|ret < 0\|ret > 0\) || ...) { ... return ...; }
... when != ret = e3
    when any
*if@p (...)
{
  ... when != ret = e4
  return ret;
}
//</smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>

---
 drivers/net/can/softing/softing_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/softing/softing_main.c b/drivers/net/can/softing/softing_main.c
index a7c77c7..f2a221e 100644
--- a/drivers/net/can/softing/softing_main.c
+++ b/drivers/net/can/softing/softing_main.c
@@ -826,12 +826,12 @@ static __devinit int softing_pdev_probe(struct platform_device *pdev)
 		goto sysfs_failed;
 	}
 
-	ret = -ENOMEM;
 	for (j = 0; j < ARRAY_SIZE(card->net); ++j) {
 		card->net[j] = netdev =
 			softing_netdev_create(card, card->id.chip[j]);
 		if (!netdev) {
 			dev_alert(&pdev->dev, "failed to make can[%i]", j);
+			ret = -ENOMEM;
 			goto netdev_failed;
 		}
 		priv = netdev_priv(card->net[j]);


^ permalink raw reply related

* Re: [RFC 1/2] PCI-Express Non-Transparent Bridge Support
From: Greg KH @ 2012-07-14 17:04 UTC (permalink / raw)
  To: Jon Mason; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <1342215900-3358-1-git-send-email-jon.mason@intel.com>

On Fri, Jul 13, 2012 at 02:44:59PM -0700, Jon Mason wrote:
> The NTB device driver is needed to configure these memory windows, doorbell, and
> scratch-pad registers as well as use them in such a way as they can be turned
> into a viable communication channel to the remote system.  ntb_hw.[ch]
> determines the usage model (NTB to NTB or NTB to Root Port) and abstracts away
> the underlying hardware to provide access and a common interface to the doorbell
> registers, scratch pads, and memory windows.  These hardware interfaces are
> exported so that other, non-mainlined kernel drivers can access these.

Why would you have non-mainlined drivers?

Can you submit the drivers at the same time so we see how you are using
these new interfaces?

> +++ b/drivers/ntb/ntb_hw.c
> @@ -0,0 +1,1283 @@
> +/*
> + * This file is provided under a dual BSD/GPLv2 license.  When using or
> + *   redistributing this file, you may do so under either license.
> + *
> + *   GPL LICENSE SUMMARY
> + *
> + *   Copyright(c) 2012 Intel Corporation. All rights reserved.
> + *
> + *   This program is free software; you can redistribute it and/or modify
> + *   it under the terms of version 2 of the GNU General Public License as
> + *   published by the Free Software Foundation.
> + *
> + *   This program is distributed in the hope that it will be useful, but
> + *   WITHOUT ANY WARRANTY; without even the implied warranty of
> + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + *   General Public License for more details.
> + *
> + *   You should have received a copy of the GNU General Public License
> + *   along with this program; if not, write to the Free Software
> + *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
> + *   The full GNU General Public License is included in this distribution
> + *   in the file called LICENSE.GPL.

You really want to track the office movements of the FSF for the next 40
years?  You should ask your lawyers if you can remove this paragraph.

> +/**
> + * ntb_hw_link_status() - return the hardware link status
> + * @ndev: pointer to ntb_device instance
> + *
> + * Returns true if the hardware is connected to the remote system
> + *
> + * RETURNS: true or false based on the hardware link state
> + */
> +bool ntb_hw_link_status(struct ntb_device *ndev)
> +{
> +	return ndev->link_status == NTB_LINK_UP;
> +}
> +EXPORT_SYMBOL(ntb_hw_link_status);

EXPORT_SYMBOL_GPL() perhaps, for these, and the other symbols you are
creating?

thanks,

greg k-h

^ permalink raw reply

* Re: PROBLEM: Silent data corruption when using sendfile()
From: Johannes Truschnigg @ 2012-07-14 17:09 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Eric Dumazet, Hillf Danton, linux-kernel, Linux-Netdev
In-Reply-To: <20120714131540.GQ16256@1wt.eu>

[-- Attachment #1: Type: text/plain, Size: 1039 bytes --]

On Sat, Jul 14, 2012 at 03:15:40PM +0200, Willy Tarreau wrote:
> On Sat, Jul 14, 2012 at 01:06:07PM +0200, Eric Dumazet wrote:
> [...]
> > Hmmm, this is supposed to fix a bug introduced in 3.4, no ?
> > 
> > So 3.3 kernel should work well ?
> 
> You're right indeed. So maybe it's not the same bug. Or maybe Johannes
> was affected by two different bugs in both versions, since Thorsten's
> report seems to point the finger at the same bug.
> 
> Johannes, are you certain that you were having the exact same issue
> with 3.3 ?

I still have the Linux 3.3-series kernel image around that I _think_ I first
saw the problem occur with. I'll try to reproduce the problem with that
kernel, but I cannot promise that results will be ready before Tuesday. I'll
keep you posted!

-- 
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   http://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@truschnigg.info

Please do not bother me with HTML-email or attachments. Thank you.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [RFC 1/2] PCI-Express Non-Transparent Bridge Support
From: Greg KH @ 2012-07-14 17:10 UTC (permalink / raw)
  To: Jon Mason; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <1342215900-3358-1-git-send-email-jon.mason@intel.com>

On Fri, Jul 13, 2012 at 02:44:59PM -0700, Jon Mason wrote:
> +static int max_num_cbs = 2;
> +module_param(max_num_cbs, uint, 0644);
> +MODULE_PARM_DESC(max_num_cbs, "Maximum number of NTB transport connections");
> +
> +static bool no_msix;
> +module_param(no_msix, bool, 0644);
> +MODULE_PARM_DESC(no_msix, "Do not allow MSI-X interrupts to be selected");

How would a user, or a distro, know to set these options?  Why are they
even options at all?


> +struct ntb_device {
> +	struct pci_dev *pdev;
> +	struct msix_entry *msix_entries;
> +	void __iomem *reg_base;
> +	struct ntb_mw mw[NTB_NUM_MW];
> +	struct {
> +		unsigned int max_spads;
> +		unsigned int max_db_bits;
> +		unsigned int msix_cnt;
> +	} limits;
> +	struct {
> +		void __iomem *pdb;
> +		void __iomem *pdb_mask;
> +		void __iomem *sdb;
> +		void __iomem *sbar2_xlat;
> +		void __iomem *sbar4_xlat;
> +		void __iomem *spad_write;
> +		void __iomem *spad_read;
> +		void __iomem *lnk_cntl;
> +		void __iomem *lnk_stat;
> +		void __iomem *spci_cmd;
> +	} reg_ofs;
> +	void *ntb_transport;
> +	void (*event_cb)(void *handle, unsigned int event);

Shouldn't the event be an enum?

> +	struct ntb_db_cb *db_cb;
> +	unsigned char hw_type;
> +	unsigned char conn_type;
> +	unsigned char dev_type;
> +	unsigned char num_msix;
> +	unsigned char bits_per_vector;
> +	unsigned char max_cbs;
> +	unsigned char link_status;
> +	struct delayed_work hb_timer;
> +	unsigned long last_ts;
> +};

Why isn't this either a 'struct device' itself, or why isn't the 'struct
pci_device' embedded within it?  What controls the lifetime of this
device?  Why doesn't it show up in sysfs?  Don't you want it to show up
in the global device tree?

> +static DEFINE_PCI_DEVICE_TABLE(ntb_pci_tbl) = {
> +	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_BWD)},
> +	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_JSF)},
> +	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_CLASSIC_JSF)},
> +	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_RP_JSF)},
> +	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_RP_SNB)},
> +	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_SNB)},
> +	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_CLASSIC_SNB)},
> +	{0}
> +};
> +MODULE_DEVICE_TABLE(pci, ntb_pci_tbl);
> +
> +static struct ntb_device *ntbdev;

You can really only have just one of these in the whole system?  Is that
wise?  Why isn't it dynamic and tied to the pci device itself as a
child?

thanks,

greg k-h

^ permalink raw reply

* Re: Crash in CIPSO_V4_TAG_LOCAL handling
From: Lin Ming @ 2012-07-14 17:22 UTC (permalink / raw)
  To: Alan Cox, Paul Moore; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 4634 bytes --]

On Sat, Jul 14, 2012 at 1:08 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> DaveM asked that this goes direct to the list so ...
>
> Looking at some static code analyser output and the results seem valid on
> this one:
>
> ip_options_compile called with skb = NULL and an IPOPT_CIPSO option calls
> into cipso_v4_validate, which if the option is CIPSO_V4_TAG_LOCAL then
> dereferences skb->dev and explodes.

Hi Alan,

I can trigger this oops with the attached simple program(cipso_test.c).

# netlabelctl cipsov4 add local doi:3
# ./cipso_test

It's caused by below code added in commit 15c45f7b.

                case CIPSO_V4_TAG_LOCAL:
                        /* This is a non-standard tag that we only allow for
                         * local connections, so if the incoming interface is
                         * not the loopback device drop the packet. */
                        if (!(skb->dev->flags & IFF_LOOPBACK)) {
                                err_offset = opt_iter;
                                goto validate_return_locked;
                        }

But I don't know how to check if it is loopback device
just after socket() syscall.

sockfd = socket(AF_INET, SOCK_DGRAM, 0);
setsockopt(sockfd, SOL_IP, IP_OPTIONS, &cipso, sizeof(struct cipso));

Obviously, at this time net_device is not assigned yet.
Any idea?

[  346.396250] BUG: unable to handle kernel NULL pointer dereference at 00000014
[  346.400024] IP: [<c171041b>] cipso_v4_validate+0x323/0x363
[  346.400024] *pdpt = 000000001efe4001 *pde = 0000000000000000 
[  346.400024] Oops: 0000 [#1] SMP 
[  346.400024] Modules linked in:
[  346.400024] 
[  346.400024] Pid: 2471, comm: cipso_test Not tainted 3.5.0-rc5-01438-gd322ef7 #28 Bochs Bochs
[  346.400024] EIP: 0060:[<c171041b>] EFLAGS: 00010246 CPU: 0
[  346.400024] EIP is at cipso_v4_validate+0x323/0x363
[  346.400024] EAX: 00000006 EBX: de77d4de ECX: 00000006 EDX: de77d380
[  346.400024] ESI: 00000000 EDI: 00000006 EBP: df61dcb8 ESP: df61dc5c
[  346.400024]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  346.400024] CR0: 8005003b CR2: 00000014 CR3: 1efac000 CR4: 000006b0
[  346.400024] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  346.400024] DR6: ffff0ff0 DR7: 00000400
[  346.400024] Process cipso_test (pid: 2471, ti=df61c000 task=df7f0530 task.ti=df61c000)
[  346.400024] Stack:
[  346.400024]  000000d0 00000040 df404600 de7debe0 0001cd2c df61dc8c 0000000c 0000001c
[  346.400024]  00000000 00000001 c1b2606c df61dc98 c12dfa20 df61dcf4 0c7f0530 de77d4d8
[  346.400024]  c106c380 06f73800 00000006 de77d398 de77d4c8 00000000 0000000c df61dd04
[  346.400024] Call Trace:
[  346.400024]  [<c12dfa20>] ? security_capable+0x1c/0x20
[  346.400024]  [<c106c380>] ? sys_sysctl+0x110/0x140
[  346.400024]  [<c16d2370>] ip_options_compile+0x3c4/0x423
[  346.400024]  [<c10f0999>] ? unlock_page+0x46/0x49
[  346.400024]  [<c110b5ef>] ? __do_fault+0x460/0x474
[  346.400024]  [<c16d1d57>] ? ip_options_get_alloc+0x1b/0x1d
[  346.400024]  [<c16d2401>] ip_options_get_finish+0x32/0x53
[  346.400024]  [<c16d24cf>] ip_options_get_from_user+0x5d/0x65
[  346.400024]  [<c16d5453>] do_ip_setsockopt+0xe3/0xa97
[  346.400024]  [<c105de56>] ? kmap_atomic_prot+0x110/0x112
[  346.400024]  [<c10f2f12>] ? zone_watermark_ok+0x30/0x37
[  346.400024]  [<c10f58e9>] ? get_page_from_freelist+0x38b/0x42c
[  346.400024]  [<c10ef736>] ? find_get_page+0x22/0x6d
[  346.400024]  [<c10f0d2f>] ? filemap_fault+0x7a/0x31a
[  346.400024]  [<c10f0999>] ? unlock_page+0x46/0x49
[  346.400024]  [<c110b5c7>] ? __do_fault+0x438/0x474
[  346.400024]  [<c1664565>] ? sk_prot_alloc+0x25/0xb8
[  346.400024]  [<c16645f8>] ? sk_prot_alloc+0xb8/0xb8
[  346.400024]  [<c16645c7>] ? sk_prot_alloc+0x87/0xb8
[  346.400024]  [<c12dfca1>] ? security_file_alloc+0x14/0x16
[  346.400024]  [<c11333c0>] ? get_empty_filp+0xfb/0x1cd
[  346.400024]  [<c16d5e31>] ip_setsockopt+0x2a/0x8c
[  346.400024]  [<c16ef873>] udp_setsockopt+0x42/0x49
[  346.400024]  [<c1662b91>] sock_common_setsockopt+0x23/0x29
[  346.400024]  [<c16613fa>] sys_setsockopt+0x6e/0x8a
[  346.400024]  [<c16627c4>] sys_socketcall+0x1fa/0x2bc
[  346.400024]  [<c1311ea0>] ? trace_hardirqs_on_thunk+0xc/0x10
[  346.400024]  [<c180f878>] sysenter_do_call+0x12/0x2d
[  346.400024] Code: 3b 4d ec 77 09 66 8b 74 0b 02 66 c1 c6 08 0f b7 ff 3b 7d e4 77 0d 0f b7 f6 89 75 e4 3b 4d ec 72 d1 eb 1d 83 c0 04 eb 2e 8b 75 c4 <8b> 4e 14 f6 81 e4 00 00 00 08 74 1f 80 7d eb 06 74 03 40 eb 16 
[  346.400024] EIP: [<c171041b>] cipso_v4_validate+0x323/0x363 SS:ESP 0068:df61dc5c


>
> All the other cases appear to correctly avoid or check it versus NULL
>
> Alan




[-- Attachment #2: cipso_test.c --]
[-- Type: text/x-csrc, Size: 660 bytes --]

#include <sys/types.h>
#include <sys/socket.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <string.h>

struct local_tag {
	char type;
	char length;
	char info[4];
};

struct cipso {
	char type;
	char length;
	char doi[4];
	struct local_tag local;	
};

int main(int argc, char **argv)
{
	int sockfd;

	struct cipso cipso = {
		.type = IPOPT_CIPSO,
		.length = sizeof(struct cipso),
		.local = {
			.type = 128,
			.length = sizeof(struct local_tag),
		},
	};

	memset(cipso.doi, 0, 4);
	cipso.doi[3] = 3;

	sockfd = socket(AF_INET, SOCK_DGRAM, 0);

	#define SOL_IP 0
	setsockopt(sockfd, SOL_IP, IP_OPTIONS, &cipso, sizeof(struct cipso));

	return 0;
}

^ permalink raw reply

* Re: [PATCH] iptables: xt_recent: Add optional mask option for xt_recent
From: Denys Fedoryshchenko @ 2012-07-14 17:55 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, Linux netdev
In-Reply-To: <20120714150527.GA6271@1984>

Hi Pablo

On 2012-07-14 18:05, Pablo Neira Ayuso wrote:
> Hi Denys,
>
> On Thu, May 17, 2012 at 11:08:57PM +0300, Denys Fedoryshchenko wrote:
>> Use case for this feature:
>> 1)In some occasions if you need to allow,block,match specific 
>> subnet.
>> 2)I can use recent as a trigger when netfilter rule matches, with 
>> mask 0.0.0.0
>>
>> Tested for backward compatibility:
>> )old (userspace) iptables, new kernel
>> )old kernel, new iptables
>> )new kernel, new iptables
>
> I've cleaned up this patch:
>
> 
> http://git.netfilter.org/cgi-bin/gitweb.cgi?p=iptables.git;a=commit;h=73bf03981dfaee48ac1d6da380d46501a96cc83e
>
> It's not yet in master. Please, check that this is correct.
>
> BTW, the man page update is still missing.

Thank you, everything looks ok, i will try to diff(to learn, not repeat 
coding style mistakes in next submissions) and test. Since i'm in desert 
now, it will take few days.
I will prepare also man page patch for iptables.

---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.

^ permalink raw reply

* Re: [PATCH] sctp: fix sparse warning for sctp_init_cause_fixed
From: Vlad Yasevich @ 2012-07-14 18:01 UTC (permalink / raw)
  To: Ioan Orghici, sri, davem, netdev
In-Reply-To: <1342199797-11305-1-git-send-email-ioanorghici@gmail.com>

Ioan Orghici <ioanorghici@gmail.com> wrote:

>Fix the following sparse warning:
>	* symbol 'sctp_init_cause_fixed' was not declared. Should it be
>	  static?
>
>Signed-off-by: Ioan Orghici <ioanorghici@gmail.com>

Looks good.

Acked-by: Vlad Yasevich <vyasevich@gmail.com>


>---
> net/sctp/sm_make_chunk.c |    2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
>diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
>index b6de71e..479a70e 100644
>--- a/net/sctp/sm_make_chunk.c
>+++ b/net/sctp/sm_make_chunk.c
>@@ -132,7 +132,7 @@ void  sctp_init_cause(struct sctp_chunk *chunk,
>__be16 cause_code,
>  * abort chunk.  Differs from sctp_init_cause in that it won't oops
>  * if there isn't enough space in the op error chunk
>  */
>-int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
>+static int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16
>cause_code,
> 		      size_t paylen)
> {
> 	sctp_errhdr_t err;
>-- 
>1.7.6.3


-- 
Sent from my Android phone with SkitMail. Please excuse my brevity.

^ permalink raw reply

* Re: [PATCH] sctp: Implement quick failover draft from tsvwg
From: Vlad Yasevich @ 2012-07-14 18:12 UTC (permalink / raw)
  To: Neil Horman, netdev; +Cc: Sridhar Samudrala, David S. Miller, linux-sctp
In-Reply-To: <1342203998-24037-1-git-send-email-nhorman@tuxdriver.com>

Neil Horman <nhorman@tuxdriver.com> wrote:

>I've seen several attempts recently made to do quick failover of sctp
>transports
>by reducing various retransmit timers and counters.  While its possible
>to
>implement a faster failover on multihomed sctp associations, its not
>particularly robust, in that it can lead to unneeded retransmits, as
>well as
>false connection failures due to intermittent latency on a network.
>
>Instead, lets implement the new ietf quick failover draft found here:
>http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
>
>This will let the sctp stack identify transports that have had a small
>number of
>errors, and avoid using them quickly until their reliability can be
>re-established.  I've tested this out on two virt guests connected via
>multiple
>isolated virt networks and believe its in compliance with the above
>draft and
>works well.
>
>Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>CC: Vlad Yasevich <vyasevich@gmail.com>
>CC: Sridhar Samudrala <sri@us.ibm.com>
>CC: "David S. Miller" <davem@davemloft.net>
>CC: linux-sctp@vger.kernel.org
>---
> Documentation/networking/ip-sysctl.txt |   14 +++++++++++++
> include/net/sctp/constants.h           |    1 +
> include/net/sctp/structs.h             |    4 +++
> include/net/sctp/user.h                |    1 +
>net/sctp/associola.c                   |   33
>+++++++++++++++++++++++++------
> net/sctp/outqueue.c                    |    6 +++-
>net/sctp/sm_sideeffect.c               |   33
>++++++++++++++++++++++++++++---
> net/sctp/sysctl.c                      |    9 ++++++++
> net/sctp/transport.c                   |    3 +-
> 9 files changed, 90 insertions(+), 14 deletions(-)
>
>diff --git a/Documentation/networking/ip-sysctl.txt
>b/Documentation/networking/ip-sysctl.txt
>index 47b6c79..c636f9c 100644
>--- a/Documentation/networking/ip-sysctl.txt
>+++ b/Documentation/networking/ip-sysctl.txt
>@@ -1408,6 +1408,20 @@ path_max_retrans - INTEGER
> 
> 	Default: 5
> 
>+pf_retrans - INTEGER
>+	The number of retransmissions that will be attempted on a given path
>+	before traffic is redirected to an alternate transport (should one
>+	exist).  Note this is distinct from path_max_retrans, as a path that
>+	passes the pf_retrans threshold can still be used.  Its only
>+	deprioritized when a transmission path is selected by the stack. 
>This
>+	setting is primarily used to enable fast failover mechanisms without
>+	having to reduce path_max_retrans to a very low value.  See:
>+	http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
>+	for details.  Note also that a value of pf_retrans > path_max_retrans
>+	disables this feature
>+
>+	Default: 0
>+
> rto_initial - INTEGER
>	The initial round trip timeout value in milliseconds that will be used
> 	in calculating round trip times.  This is the initial time interval
>diff --git a/include/net/sctp/constants.h
>b/include/net/sctp/constants.h
>index 942b864..d053d2e 100644
>--- a/include/net/sctp/constants.h
>+++ b/include/net/sctp/constants.h
>@@ -334,6 +334,7 @@ typedef enum {
> typedef enum {
> 	SCTP_TRANSPORT_UP,
> 	SCTP_TRANSPORT_DOWN,
>+	SCTP_TRANSPORT_PF,
> } sctp_transport_cmd_t;
> 
> /* These are the address scopes defined mainly for IPv4 addresses
>diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
>index e4652fe..22825abe 100644
>--- a/include/net/sctp/structs.h
>+++ b/include/net/sctp/structs.h
>@@ -160,6 +160,7 @@ extern struct sctp_globals {
> 	int max_retrans_association;
> 	int max_retrans_path;
> 	int max_retrans_init;
>+	int pf_retrans;
> 
> 	/*
> 	 * Policy for preforming sctp/socket accounting
>@@ -258,6 +259,7 @@ extern struct sctp_globals {
> #define sctp_sndbuf_policy	 	(sctp_globals.sndbuf_policy)
> #define sctp_rcvbuf_policy	 	(sctp_globals.rcvbuf_policy)
> #define sctp_max_retrans_path		(sctp_globals.max_retrans_path)
>+#define sctp_pf_retrans			(sctp_globals.pf_retrans)
> #define sctp_max_retrans_init		(sctp_globals.max_retrans_init)
> #define sctp_sack_timeout		(sctp_globals.sack_timeout)
> #define sctp_hb_interval		(sctp_globals.hb_interval)
>@@ -1660,6 +1662,8 @@ struct sctp_association {
> 	 */
> 	int max_retrans;
> 
>+	int pf_retrans;
>+
> 	/* Maximum number of times the endpoint will retransmit INIT  */
> 	__u16 max_init_attempts;
> 
>diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
>index 0842ef0..cece1bf 100644
>--- a/include/net/sctp/user.h
>+++ b/include/net/sctp/user.h
>@@ -649,6 +649,7 @@ struct sctp_paddrinfo {
>  */
> enum sctp_spinfo_state {
> 	SCTP_INACTIVE,
>+	SCTP_PF,
> 	SCTP_ACTIVE,
> 	SCTP_UNCONFIRMED,
> 	SCTP_UNKNOWN = 0xffff  /* Value used for transport state unknown */
>diff --git a/net/sctp/associola.c b/net/sctp/associola.c
>index 5bc9ab1..f3ebc23 100644
>--- a/net/sctp/associola.c
>+++ b/net/sctp/associola.c
>@@ -124,6 +124,8 @@ static struct sctp_association
>*sctp_association_init(struct sctp_association *a
> 	 * socket values.
> 	 */
> 	asoc->max_retrans = sp->assocparams.sasoc_asocmaxrxt;
>+	asoc->pf_retrans  = sctp_pf_retrans;
>+
> 	asoc->rto_initial = msecs_to_jiffies(sp->rtoinfo.srto_initial);
> 	asoc->rto_max = msecs_to_jiffies(sp->rtoinfo.srto_max);
> 	asoc->rto_min = msecs_to_jiffies(sp->rtoinfo.srto_min);
>@@ -840,6 +842,7 @@ void sctp_assoc_control_transport(struct
>sctp_association *asoc,
> 	struct sctp_ulpevent *event;
> 	struct sockaddr_storage addr;
> 	int spc_state = 0;
>+	bool ulp_notify = true;
> 
> 	/* Record the transition on the transport.  */
> 	switch (command) {
>@@ -853,6 +856,14 @@ void sctp_assoc_control_transport(struct
>sctp_association *asoc,
> 			spc_state = SCTP_ADDR_CONFIRMED;
> 		else
> 			spc_state = SCTP_ADDR_AVAILABLE;
>+		/* Don't inform ULP about transition from PF to
>+		 * active state and set cwnd to 1, see SCTP
>+		 * Quick failover draft section 5.1, point 5
>+		 */
>+		if (transport->state == SCTP_PF) {
>+			ulp_notify = false;
>+			transport->cwnd = 1;
>+		}
> 		transport->state = SCTP_ACTIVE;
> 		break;
> 
>@@ -871,6 +882,10 @@ void sctp_assoc_control_transport(struct
>sctp_association *asoc,
> 		spc_state = SCTP_ADDR_UNREACHABLE;
> 		break;
> 
>+	case SCTP_TRANSPORT_PF:
>+		transport->state = SCTP_PF;
>+		ulp_notify = false;
>+		break;
> 	default:
> 		return;
> 	}
>@@ -878,12 +893,15 @@ void sctp_assoc_control_transport(struct
>sctp_association *asoc,
> 	/* Generate and send a SCTP_PEER_ADDR_CHANGE notification to the
> 	 * user.
> 	 */
>-	memset(&addr, 0, sizeof(struct sockaddr_storage));
>-	memcpy(&addr, &transport->ipaddr,
>transport->af_specific->sockaddr_len);
>-	event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
>-				0, spc_state, error, GFP_ATOMIC);
>-	if (event)
>-		sctp_ulpq_tail_event(&asoc->ulpq, event);
>+	if (ulp_notify) {
>+		memset(&addr, 0, sizeof(struct sockaddr_storage));
>+		memcpy(&addr, &transport->ipaddr,
>+		       transport->af_specific->sockaddr_len);
>+		event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
>+					0, spc_state, error, GFP_ATOMIC);
>+		if (event)
>+			sctp_ulpq_tail_event(&asoc->ulpq, event);
>+	}
> 
> 	/* Select new active and retran paths. */
> 
>@@ -899,7 +917,8 @@ void sctp_assoc_control_transport(struct
>sctp_association *asoc,
> 			transports) {
> 
> 		if ((t->state == SCTP_INACTIVE) ||
>-		    (t->state == SCTP_UNCONFIRMED))
>+		    (t->state == SCTP_UNCONFIRMED) ||
>+		    (t->state == SCTP_PF))
> 			continue;
> 		if (!first || t->last_time_heard > first->last_time_heard) {
> 			second = first;
>diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
>index a0fa19f..e7aa177c 100644
>--- a/net/sctp/outqueue.c
>+++ b/net/sctp/outqueue.c
>@@ -792,7 +792,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int
>rtx_timeout)
> 			if (!new_transport)
> 				new_transport = asoc->peer.active_path;
> 		} else if ((new_transport->state == SCTP_INACTIVE) ||
>-			   (new_transport->state == SCTP_UNCONFIRMED)) {
>+			   (new_transport->state == SCTP_UNCONFIRMED) ||
>+			   (new_transport->state == SCTP_PF)) {
> 			/* If the chunk is Heartbeat or Heartbeat Ack,
> 			 * send it to chunk->transport, even if it's
> 			 * inactive.
>@@ -987,7 +988,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int
>rtx_timeout)
> 			new_transport = chunk->transport;
> 			if (!new_transport ||
> 			    ((new_transport->state == SCTP_INACTIVE) ||
>-			     (new_transport->state == SCTP_UNCONFIRMED)))
>+			     (new_transport->state == SCTP_UNCONFIRMED) ||
>+			     (new_transport->state == SCTP_PF)))
> 				new_transport = asoc->peer.active_path;
> 			if (new_transport->state == SCTP_UNCONFIRMED)
> 				continue;
>diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
>index c96d1a8..285e26a 100644
>--- a/net/sctp/sm_sideeffect.c
>+++ b/net/sctp/sm_sideeffect.c
>@@ -76,6 +76,8 @@ static int sctp_side_effects(sctp_event_t event_type,
>sctp_subtype_t subtype,
> 			     sctp_cmd_seq_t *commands,
> 			     gfp_t gfp);
> 
>+static void sctp_cmd_hb_timer_update(sctp_cmd_seq_t *cmds,
>+				     struct sctp_transport *t);
> /********************************************************************
>  * Helper functions
>  ********************************************************************/
>@@ -470,7 +472,8 @@ sctp_timer_event_t
>*sctp_timer_events[SCTP_NUM_TIMEOUT_TYPES] = {
>  * notification SHOULD be sent to the upper layer.
>  *
>  */
>-static void sctp_do_8_2_transport_strike(struct sctp_association
>*asoc,
>+static void sctp_do_8_2_transport_strike(sctp_cmd_seq_t *commands,
>+					 struct sctp_association *asoc,
> 					 struct sctp_transport *transport,
> 					 int is_hb)
> {
>@@ -495,6 +498,23 @@ static void sctp_do_8_2_transport_strike(struct
>sctp_association *asoc,
> 			transport->error_count++;
> 	}
> 
>+	/* If the transport error count is greater than the pf_retrans
>+	 * threshold, and less than pathmaxrtx, then mark this transport
>+	 * as Partially Failed, ee SCTP Quick Failover Draft, secon 5.1,
>+	 * point 1
>+	 */
>+	if ((transport->state != SCTP_PF) &&
>+	   (asoc->pf_retrans < transport->pathmaxrxt) &&
>+	   (transport->error_count > asoc->pf_retrans)) {
>+
>+		sctp_assoc_control_transport(asoc, transport,
>+					     SCTP_TRANSPORT_PF,
>+					     0);
>+
>+		/* Update the hb timer to resend a heartbeat every rto */
>+		sctp_cmd_hb_timer_update(commands, transport);
>+	}
>+
> 	if (transport->state != SCTP_INACTIVE &&
> 	    (transport->error_count > transport->pathmaxrxt)) {
> 		SCTP_DEBUG_PRINTK_IPADDR("transport_strike:association %p",
>@@ -699,6 +719,10 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t
>*cmds,
> 					     SCTP_HEARTBEAT_SUCCESS);
> 	}
> 
>+	if (t->state == SCTP_PF)
>+		sctp_assoc_control_transport(asoc, t, SCTP_TRANSPORT_UP,
>+					     SCTP_HEARTBEAT_SUCCESS);
>+
> 	/* The receiver of the HEARTBEAT ACK should also perform an
> 	 * RTT measurement for that destination transport address
> 	 * using the time value carried in the HEARTBEAT ACK chunk.
>@@ -1565,8 +1589,8 @@ static int sctp_cmd_interpreter(sctp_event_t
>event_type,
> 
> 		case SCTP_CMD_STRIKE:
> 			/* Mark one strike against a transport.  */
>-			sctp_do_8_2_transport_strike(asoc, cmd->obj.transport,
>-						    0);
>+			sctp_do_8_2_transport_strike(commands, asoc,
>+						    cmd->obj.transport, 0);
> 			break;
> 
> 		case SCTP_CMD_TRANSPORT_IDLE:
>@@ -1576,7 +1600,8 @@ static int sctp_cmd_interpreter(sctp_event_t
>event_type,
> 
> 		case SCTP_CMD_TRANSPORT_HB_SENT:
> 			t = cmd->obj.transport;
>-			sctp_do_8_2_transport_strike(asoc, t, 1);
>+			sctp_do_8_2_transport_strike(commands, asoc,
>+						     t, 1);
> 			t->hb_sent = 1;
> 			break;
> 
>diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
>index e5fe639..2b2bfe9 100644
>--- a/net/sctp/sysctl.c
>+++ b/net/sctp/sysctl.c
>@@ -141,6 +141,15 @@ static ctl_table sctp_table[] = {
> 		.extra2		= &int_max
> 	},
> 	{
>+		.procname	= "pf_retrans",
>+		.data		= &sctp_pf_retrans,
>+		.maxlen		= sizeof(int),
>+		.mode		= 0644,
>+		.proc_handler	= proc_dointvec_minmax,
>+		.extra1		= &zero,
>+		.extra2		= &int_max
>+	},
>+	{
> 		.procname	= "max_init_retransmits",
> 		.data		= &sctp_max_retrans_init,
> 		.maxlen		= sizeof(int),
>diff --git a/net/sctp/transport.c b/net/sctp/transport.c
>index b026ba0..4639ba2 100644
>--- a/net/sctp/transport.c
>+++ b/net/sctp/transport.c
>@@ -585,7 +585,8 @@ unsigned long sctp_transport_timeout(struct
>sctp_transport *t)
> {
> 	unsigned long timeout;
> 	timeout = t->rto + sctp_jitter(t->rto);
>-	if (t->state != SCTP_UNCONFIRMED)
>+	if ((t->state != SCTP_UNCONFIRMED) &&
>+	    (t->state != SCTP_PF))
> 		timeout += t->hbinterval;
> 	timeout += jiffies;
> 	return timeout;
>-- 
>1.7.7.6

One thing that seems to be missing is the API.  As a result you don't carry the value per transport which we'll need.  That caused you to add assoc parameter to some functions.  That's really the only missing item.
-- 
Sent from my Android phone with SkitMail. Please excuse my brevity.

^ permalink raw reply

* gateway icmp redirect handling problem (3.0.36-3.0.23)
From: Simon Roscic @ 2012-07-14 18:57 UTC (permalink / raw)
  To: netdev

Hello,

I´m experiencing the following problem with kernel versions 3.0.36 
(down to 3.0.23):

on our network we all have one default gateway, it´s 10.1.1.254, but 
there are some networks for which we have another gateway and for this 
networks the default gateway sends an icmp redirect.

lets assume my test machine has ip 10.1.20.79 netmask is 255.255.0.0 
and my default gateway is 10.1.1.254, i now ping the following ip: 
10.109.98.11, my default gateway (10.1.1.254) now sends me an icmp 
redirect to another gateway (10.1.1.1) ... and now everything works as 
expected, i get the replies from 10.109.98.11 but not for long, after 
approx. 60 (or so) seconds i only get "ping: sendmsg: Network is down".
(exact same problem with all other tcp/udp protocols, but i used ping 
for the tests because it also prints the redirect messages to the 
console)

so let´s have a closer look:

not ok - kernel versions 3.0.36 down to 3.0.23:
-----------------------------------------------

test-simon:~ # ping 10.109.98.10
...
64 bytes from 10.109.98.11: icmp_seq=62 ttl=60 time=12.1 ms
64 bytes from 10.109.98.11: icmp_seq=63 ttl=60 time=11.6 ms
ping: sendmsg: Network is down
ping: sendmsg: Network is down

when looking at "ip neigh" the "ping: sendmsg: Network is down" message 
appears in the exact moment when the arp entry for the default gateway 
(10.1.1.254) gets removed from the arp cache:

ping "OK"
test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE
10.1.1.254 dev eth0 lladdr 00:1a:64:8f:23:64 STALE

ping "dead"
test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE

so it seems that when the default gateway is removed from the arp cache 
something goes wrong in the kernel route handling. i don´t know the 
internals of the linux route handling, now i need your help, any ideas 
what´s going wrong?

i did a lot of tests, the problem i described first happens with kernel 
version 3.0.23, i found in the changelog of 3.0.23 the following two 
commits:
(http://www.kernel.org/pub/linux/kernel/v3.0/ChangeLog-3.0.23)

commit 42ab5316ddcaa0de23e88e8a3d363c767b9ab0b3
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Fri Nov 18 15:24:32 2011 -0500
ipv4: fix redirect handling

commit bebee22bcbf0026f92141990972bd5863ef9b69c
Author: Flavio Leitner <fbl@redhat.com>
Date:   Mon Oct 24 02:56:38 2011 -0400
route: fix ICMP redirect validation

i then took the net/ipv4/route.c file from kernel 3.0.22 and replaced 
the version in 3.0.23 with it, this reverts the two mentioned patches 
above (if i havent overlooked something) after that the problem 
disappears.
so those two patches surely fixed some problem but for kernel versions 
3.0.23-3.0.36 they broke the gateway icmp redirect handling as described 
by me here.

i did some further tests with different kernel versions:
3.5-rc6: OK
3.4.4: OK
3.2.22: OK
3.0.1 - 3.0.22: OK
3.0.23 - 3.0.36: not OK
2.6.35.13: OK

now lets have a closer look at a kernel version which works:
------------------------------------------------------------

this is from 3.5-rc6, but 3.4.4, 3.2.2 and 2.6.35.13 also behave 
exactly this way, 3.0.1-3.0.22 behave slightly different, see note 
below.

test-simon:~ # ping 10.109.98.11
PING 10.109.98.10 (10.109.98.11) 56(84) bytes of data.
64 bytes from 10.109.98.11: icmp_seq=1 ttl=60 time=15.2 ms
 From 10.1.1.254: icmp_seq=2 Redirect Host(New nexthop: 10.1.1.1)
...

test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE
10.1.1.254 dev eth0 lladdr 00:1a:64:8f:23:64 STALE

and after approx 60 or so seconds:

test-simon:~ # ip neigh
10.1.1.1 dev eth0 lladdr 00:00:0c:9f:f0:64 REACHABLE

and ping (and everything else) is as expected still working.

note:
-----

on 3.0.1-3.0.22:

i see lots of icmp redirects sent from the default gateway (10.1.1.254) 
to my test machine, while running tcpdump on the default gateway 
(10.1.1.254) i see every ping packet also arriving there and also some 
icmp redirect messages going out to my test machine.
but everything works so i think my test machine is correctly talking to 
the destination using the other gateway (10.1.1.1).
i also sniffed a windows 7 client pc, it looks the same there, so 
possibly no problem, but i mention this because kernel versions 3.5-rc6, 
3.4.4, 3.2.22 and 2.6.35.13 act differently (see below).

on 3.0.23-3.0.36:

i see lots of icmp redirects sent from the default gateway (10.1.1.254) 
to my test machine, while running tcpdump on the default gateway 
(10.1.1.254) i see up to 20 ping packets arriving there and also up to 
17 icmp redirect messages going out to my test machine, after the 20th 
ping packet i dont see further ping packets arriving at the default 
gateway. so my test machine is then only talking to the other gateway 
(10.1.1.1) i think.
...
17:48:41.643952 IP 10.1.1.254 > 10.1.20.79: ICMP redirect 10.109.98.11 
to host 10.1.1.1, length 92
...
17:48:44.649008 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
30733, seq 20, length 64
17:48:44.649018 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
30733, seq 20, length 64

on 3.5-rc6, 3.4.4, 3.2.22 and 2.6.35.13:

here it looks different, and for me this is the expected behavior, or 
at least the behavior i have seen from lots of linux machines on my 
network. i see 1-2 icmp redirects sent from the default gateway 
(10.1.1.254) to my test machine, while running tcpdump on the default 
gateway (10.1.1.254) i only see up to 2 ping packets arriving then 
nothing, so then my test machine seems to only talk to the other gateway 
(10.1.1.1).

17:50:58.995894 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
10766, seq 1, length 64
17:50:58.995914 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
10766, seq 1, length 64
17:50:59.997260 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
10766, seq 2, length 64
17:50:59.997277 IP 10.1.1.254 > 10.1.20.79: ICMP redirect 10.109.98.11 
to host 10.1.1.1, length 92
17:50:59.997287 IP 10.1.20.79 > 10.109.98.11: ICMP echo request, id 
10766, seq 2, length 64

...

(before someone asks why i "must" use kernel 3.0.x ... because this are 
SLES 11 SP2 VMs and they currently ship kernel 3.0.34)

i hope i described the problem in a way so that the kernel network 
stack maintainers can understand the problem, please conact me if you 
have further questions, and please CC me as i am not subscribed to 
linux-netdev. if you wish you can also CC this message or replies to 
linux-kernel.

kind regards,
Simon Roscic.

^ permalink raw reply

* Re: resurrecting tcphealth
From: David Miller @ 2012-07-14 19:29 UTC (permalink / raw)
  To: eric.dumazet; +Cc: a9702387, netdev, linux-kernel
In-Reply-To: <1342254423.3265.9028.camel@edumazet-glaptop>


Don't we report all of this shit in tcp_info already?

I really hate such cruddy patches like this.

^ permalink raw reply

* Re: 3.5rc6 sctp panic
From: David Miller @ 2012-07-14 20:02 UTC (permalink / raw)
  To: davej; +Cc: netdev, vyasevich, sri, nhorman
In-Reply-To: <20120711000831.GA10518@redhat.com>

From: Dave Jones <davej@redhat.com>
Date: Tue, 10 Jul 2012 20:08:32 -0400

> I just hit this while fuzz testing, and the box locked up immediately afterwards.
> The serial log was a little mangled, I did my best to clean it up..

Guys can we fix crashes like this one reported by Dave instead of
working on new features and cleanups?

Thanks.

> [22766.294255] general protection fault: 0000 [#1] PREEMPT SMP 
> [22766.295376] CPU 0 
> [22766.295384] Modules linked in:
> [22766.387137]  ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90 ffff880147c03a74
> [22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
> [22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
> [22766.387137] Stack:
> [22766.387140]  ffff880147c03a10
> [22766.387140]  ffffffffa169f2b6
> [22766.387140]  ffff88013ed95728
> [22766.387143]  0000000000000002
> [22766.387143]  0000000000000000
> [22766.387143]  ffff880003fad062
> [22766.387144]  ffff88013c120000
> [22766.387144] 
> [22766.387145] Call Trace:
> [22766.387145]  <IRQ> 
> [22766.387150]  [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0 [sctp]
> [22766.387154]  [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
> [22766.387157]  [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
> [22766.387161]  [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
> [22766.387163]  [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
> [22766.387166]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
> [22766.387168]  [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
> [22766.387169]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
> [22766.387171]  [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
> [22766.387172]  [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
> [22766.387174]  [<ffffffff81590c54>] ip_rcv+0x214/0x320
> [22766.387176]  [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
> [22766.387178]  [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
> [22766.387180]  [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
> [22766.387182]  [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
> [22766.387183]  [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
> [22766.387185]  [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
> [22766.387187]  [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
> [22766.387218]  [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
> [22766.387242]  [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
> [22766.387266]  [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
> [22766.387268]  [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
> [22766.387270]  [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
> [22766.387273]  [<ffffffff810734d0>] __do_softirq+0xe0/0x420
> [22766.387275]  [<ffffffff8169826c>] call_softirq+0x1c/0x30
> [22766.387278]  [<ffffffff8101db15>] do_softirq+0xd5/0x110
> [22766.387279]  [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
> [22766.387281]  [<ffffffff81698b03>] do_IRQ+0x63/0xd0
> [22766.387283]  [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
> [22766.387283]  <EOI> 
> [22766.387284] 
> [22766.387285]  [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
> [22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48 89 e5 48 83 
> ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00 48 89 fb 
> 49 89 f5 66 c1 c0 08 66 39 46 02 
> [22766.387307] 
> [22766.387307] RIP 
> [22766.387311]  [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
> [22766.387311]  RSP <ffff880147c039b0>
> [22766.387142]  ffffffffa16ab120
> [22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
> [22766.601221] Kernel panic - not syncing: Fatal exception in interrupt
> 
> 
> 
> Disassembly of the function shows that we oopsed here..
> 
> /* Is this the association we are looking for? */
> struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
>                                            const union sctp_addr *laddr,
>                                            const union sctp_addr *paddr)
> {
>     1070:       55                      push   %rbp
>     1071:       48 89 e5                mov    %rsp,%rbp
>     1074:       48 83 ec 20             sub    $0x20,%rsp
>     1078:       48 89 5d e8             mov    %rbx,-0x18(%rbp)
>     107c:       4c 89 65 f0             mov    %r12,-0x10(%rbp)
>     1080:       4c 89 6d f8             mov    %r13,-0x8(%rbp)
>     1084:       e8 00 00 00 00          callq  1089 <sctp_assoc_is_match+0x19>
>         struct sctp_transport *transport;
> 
>         if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
>     1089:       0f b7 87 98 00 00 00    movzwl 0x98(%rdi),%eax
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: Crash in CIPSO_V4_TAG_LOCAL handling
From: David Miller @ 2012-07-14 20:08 UTC (permalink / raw)
  To: mlin; +Cc: alan, paul.moore, netdev
In-Reply-To: <1342286550.23395.12.camel@monkey32>

From: Lin Ming <mlin@ss.pku.edu.cn>
Date: Sun, 15 Jul 2012 01:22:30 +0800

> It's caused by below code added in commit 15c45f7b.
> 
>                 case CIPSO_V4_TAG_LOCAL:
>                         /* This is a non-standard tag that we only allow for
>                          * local connections, so if the incoming interface is
>                          * not the loopback device drop the packet. */
>                         if (!(skb->dev->flags & IFF_LOOPBACK)) {
>                                 err_offset = opt_iter;
>                                 goto validate_return_locked;
>                         }

Paul please fix this, as shown 'skb' can easily be NULL in this
code path.

^ permalink raw reply

* [PATCH] drivers: connector: fixed coding style issues
From: Valentin Ilie @ 2012-07-14 20:26 UTC (permalink / raw)
  To: zbr; +Cc: netdev, Valentin Ilie

Fixed checkpatch warnings.

Signed-off-by: Valentin Ilie <valentin.ilie@gmail.com>
---
 drivers/connector/cn_proc.c   |   30 +++++++++++++++---------------
 drivers/connector/cn_queue.c  |    6 +++---
 drivers/connector/connector.c |    2 +-
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 77e1e6c..a76801c 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -46,7 +46,7 @@ static DEFINE_PER_CPU(__u32, proc_event_counts) = { 0 };
 static inline void get_seq(__u32 *ts, int *cpu)
 {
 	preempt_disable();
-	*ts = __this_cpu_inc_return(proc_event_counts) -1;
+	*ts = __this_cpu_inc_return(proc_event_counts) - 1;
 	*cpu = smp_processor_id();
 	preempt_enable();
 }
@@ -62,8 +62,8 @@ void proc_fork_connector(struct task_struct *task)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -93,8 +93,8 @@ void proc_exec_connector(struct task_struct *task)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -119,8 +119,8 @@ void proc_id_connector(struct task_struct *task, int which_id)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	ev->what = which_id;
 	ev->event_data.id.process_pid = task->pid;
 	ev->event_data.id.process_tgid = task->tgid;
@@ -134,7 +134,7 @@ void proc_id_connector(struct task_struct *task, int which_id)
 		ev->event_data.id.e.egid = cred->egid;
 	} else {
 		rcu_read_unlock();
-	     	return;
+		return;
 	}
 	rcu_read_unlock();
 	get_seq(&msg->seq, &ev->cpu);
@@ -241,8 +241,8 @@ void proc_exit_connector(struct task_struct *task)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -276,8 +276,8 @@ static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack)
 	if (atomic_read(&proc_event_num_listeners) < 1)
 		return;
 
-	msg = (struct cn_msg*)buffer;
-	ev = (struct proc_event*)msg->data;
+	msg = (struct cn_msg *)buffer;
+	ev = (struct proc_event *)msg->data;
 	msg->seq = rcvd_seq;
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -303,7 +303,7 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg,
 	if (msg->len != sizeof(*mc_op))
 		return;
 
-	mc_op = (enum proc_cn_mcast_op*)msg->data;
+	mc_op = (enum proc_cn_mcast_op *)msg->data;
 	switch (*mc_op) {
 	case PROC_CN_MCAST_LISTEN:
 		atomic_inc(&proc_event_num_listeners);
@@ -328,8 +328,8 @@ static int __init cn_proc_init(void)
 	int err;
 
 	if ((err = cn_add_callback(&cn_proc_event_id, "cn_proc",
-	 			   &cn_proc_mcast_ctl))) {
-		printk(KERN_WARNING "cn_proc failed to register\n");
+					&cn_proc_mcast_ctl))) {
+		pr_warn("cn_proc failed to register\n");
 		return err;
 	}
 	return 0;
diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
index c42c9d5..cd3bd7d 100644
--- a/drivers/connector/cn_queue.c
+++ b/drivers/connector/cn_queue.c
@@ -1,5 +1,5 @@
 /*
- * 	cn_queue.c
+ *	cn_queue.c
  *
  * 2004+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net>
  * All rights reserved.
@@ -40,7 +40,7 @@ cn_queue_alloc_callback_entry(struct cn_queue_dev *dev, const char *name,
 
 	cbq = kzalloc(sizeof(*cbq), GFP_KERNEL);
 	if (!cbq) {
-		printk(KERN_ERR "Failed to create new callback queue.\n");
+		pr_err("Failed to create new callback queue.\n");
 		return NULL;
 	}
 
@@ -149,7 +149,7 @@ void cn_queue_free_dev(struct cn_queue_dev *dev)
 	spin_unlock_bh(&dev->queue_lock);
 
 	while (atomic_read(&dev->refcnt)) {
-		printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
+		pr_info("Waiting for %s to become free: refcnt=%d.\n",
 		       dev->name, atomic_read(&dev->refcnt));
 		msleep(1000);
 	}
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index dde6a0f..36e89ae 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -1,5 +1,5 @@
 /*
- * 	connector.c
+ *	connector.c
  *
  * 2004+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net>
  * All rights reserved.
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH] drivers: connector: fixed coding style issues
From: Joe Perches @ 2012-07-14 20:33 UTC (permalink / raw)
  To: Valentin Ilie; +Cc: zbr, netdev
In-Reply-To: <1342297608-6872-1-git-send-email-valentin.ilie@gmail.com>

On Sat, 2012-07-14 at 23:26 +0300, Valentin Ilie wrote:
> Fixed checkpatch warnings.
[]
> diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
[]
> @@ -328,8 +328,8 @@ static int __init cn_proc_init(void)
>  	int err;
>  
>  	if ((err = cn_add_callback(&cn_proc_event_id, "cn_proc",
> -	 			   &cn_proc_mcast_ctl))) {
> -		printk(KERN_WARNING "cn_proc failed to register\n");
> +					&cn_proc_mcast_ctl))) {
> +		pr_warn("cn_proc failed to register\n");

I don't believe the first part of this change is a checkpatch
message.  I think it would be better written by hoisting the
function outside the if like:

	err = cn_add_callback(&cn_proc_event_id, "cn_proc", &cn_proc_mcast_ctl)
	if (err) {
		pr_warn("cn_proc failed to register\n");
		return err;
	}

^ permalink raw reply

* Re: 3.5rc6 sctp panic
From: Neil Horman @ 2012-07-14 21:14 UTC (permalink / raw)
  To: David Miller; +Cc: davej, netdev, vyasevich, sri
In-Reply-To: <20120714.130201.1562818146578811700.davem@davemloft.net>

On Sat, Jul 14, 2012 at 01:02:01PM -0700, David Miller wrote:
> From: Dave Jones <davej@redhat.com>
> Date: Tue, 10 Jul 2012 20:08:32 -0400
> 
> > I just hit this while fuzz testing, and the box locked up immediately afterwards.
> > The serial log was a little mangled, I did my best to clean it up..
> 
> Guys can we fix crashes like this one reported by Dave instead of
> working on new features and cleanups?
> 
> Thanks.
> 
Yeah, I'll put it at the top of my to do list, and track it down monday AM.
Neil

> > [22766.294255] general protection fault: 0000 [#1] PREEMPT SMP 
> > [22766.295376] CPU 0 
> > [22766.295384] Modules linked in:
> > [22766.387137]  ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90 ffff880147c03a74
> > [22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
> > [22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
> > [22766.387137] Stack:
> > [22766.387140]  ffff880147c03a10
> > [22766.387140]  ffffffffa169f2b6
> > [22766.387140]  ffff88013ed95728
> > [22766.387143]  0000000000000002
> > [22766.387143]  0000000000000000
> > [22766.387143]  ffff880003fad062
> > [22766.387144]  ffff88013c120000
> > [22766.387144] 
> > [22766.387145] Call Trace:
> > [22766.387145]  <IRQ> 
> > [22766.387150]  [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0 [sctp]
> > [22766.387154]  [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
> > [22766.387157]  [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
> > [22766.387161]  [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
> > [22766.387163]  [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
> > [22766.387166]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
> > [22766.387168]  [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
> > [22766.387169]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
> > [22766.387171]  [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
> > [22766.387172]  [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
> > [22766.387174]  [<ffffffff81590c54>] ip_rcv+0x214/0x320
> > [22766.387176]  [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
> > [22766.387178]  [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
> > [22766.387180]  [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
> > [22766.387182]  [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
> > [22766.387183]  [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
> > [22766.387185]  [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
> > [22766.387187]  [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
> > [22766.387218]  [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
> > [22766.387242]  [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
> > [22766.387266]  [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
> > [22766.387268]  [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
> > [22766.387270]  [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
> > [22766.387273]  [<ffffffff810734d0>] __do_softirq+0xe0/0x420
> > [22766.387275]  [<ffffffff8169826c>] call_softirq+0x1c/0x30
> > [22766.387278]  [<ffffffff8101db15>] do_softirq+0xd5/0x110
> > [22766.387279]  [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
> > [22766.387281]  [<ffffffff81698b03>] do_IRQ+0x63/0xd0
> > [22766.387283]  [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
> > [22766.387283]  <EOI> 
> > [22766.387284] 
> > [22766.387285]  [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
> > [22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48 89 e5 48 83 
> > ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00 48 89 fb 
> > 49 89 f5 66 c1 c0 08 66 39 46 02 
> > [22766.387307] 
> > [22766.387307] RIP 
> > [22766.387311]  [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
> > [22766.387311]  RSP <ffff880147c039b0>
> > [22766.387142]  ffffffffa16ab120
> > [22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
> > [22766.601221] Kernel panic - not syncing: Fatal exception in interrupt
> > 
> > 
> > 
> > Disassembly of the function shows that we oopsed here..
> > 
> > /* Is this the association we are looking for? */
> > struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
> >                                            const union sctp_addr *laddr,
> >                                            const union sctp_addr *paddr)
> > {
> >     1070:       55                      push   %rbp
> >     1071:       48 89 e5                mov    %rsp,%rbp
> >     1074:       48 83 ec 20             sub    $0x20,%rsp
> >     1078:       48 89 5d e8             mov    %rbx,-0x18(%rbp)
> >     107c:       4c 89 65 f0             mov    %r12,-0x10(%rbp)
> >     1080:       4c 89 6d f8             mov    %r13,-0x8(%rbp)
> >     1084:       e8 00 00 00 00          callq  1089 <sctp_assoc_is_match+0x19>
> >         struct sctp_transport *transport;
> > 
> >         if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
> >     1089:       0f b7 87 98 00 00 00    movzwl 0x98(%rdi),%eax
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 

^ permalink raw reply

* Re: [PATCH] sctp: Implement quick failover draft from tsvwg
From: Neil Horman @ 2012-07-14 21:22 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, Sridhar Samudrala, David S. Miller, linux-sctp
In-Reply-To: <fd33324e-93b0-4ced-8f57-f3c6032baf9b@email.android.com>

On Sat, Jul 14, 2012 at 02:12:36PM -0400, Vlad Yasevich wrote:
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> >I've seen several attempts recently made to do quick failover of sctp
> >transports
> >by reducing various retransmit timers and counters.  While its possible
> >to
> >implement a faster failover on multihomed sctp associations, its not
> >particularly robust, in that it can lead to unneeded retransmits, as
> >well as
> >false connection failures due to intermittent latency on a network.
> >
> >Instead, lets implement the new ietf quick failover draft found here:
> >http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
> >
> >This will let the sctp stack identify transports that have had a small
> >number of
> >errors, and avoid using them quickly until their reliability can be
> >re-established.  I've tested this out on two virt guests connected via
> >multiple
> >isolated virt networks and believe its in compliance with the above
> >draft and
> >works well.
> >
> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >CC: Vlad Yasevich <vyasevich@gmail.com>
> >CC: Sridhar Samudrala <sri@us.ibm.com>
> >CC: "David S. Miller" <davem@davemloft.net>
> >CC: linux-sctp@vger.kernel.org
> >---
> > Documentation/networking/ip-sysctl.txt |   14 +++++++++++++
> > include/net/sctp/constants.h           |    1 +
> > include/net/sctp/structs.h             |    4 +++
> > include/net/sctp/user.h                |    1 +
> >net/sctp/associola.c                   |   33
> >+++++++++++++++++++++++++------
> > net/sctp/outqueue.c                    |    6 +++-
> >net/sctp/sm_sideeffect.c               |   33
> >++++++++++++++++++++++++++++---
> > net/sctp/sysctl.c                      |    9 ++++++++
> > net/sctp/transport.c                   |    3 +-
> > 9 files changed, 90 insertions(+), 14 deletions(-)
> >
> >diff --git a/Documentation/networking/ip-sysctl.txt
> >b/Documentation/networking/ip-sysctl.txt
> >index 47b6c79..c636f9c 100644
> >--- a/Documentation/networking/ip-sysctl.txt
> >+++ b/Documentation/networking/ip-sysctl.txt
> >@@ -1408,6 +1408,20 @@ path_max_retrans - INTEGER
> > 
> > 	Default: 5
> > 
> >+pf_retrans - INTEGER
> >+	The number of retransmissions that will be attempted on a given path
> >+	before traffic is redirected to an alternate transport (should one
> >+	exist).  Note this is distinct from path_max_retrans, as a path that
> >+	passes the pf_retrans threshold can still be used.  Its only
> >+	deprioritized when a transmission path is selected by the stack. 
> >This
> >+	setting is primarily used to enable fast failover mechanisms without
> >+	having to reduce path_max_retrans to a very low value.  See:
> >+	http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
> >+	for details.  Note also that a value of pf_retrans > path_max_retrans
> >+	disables this feature
> >+
> >+	Default: 0
> >+
> > rto_initial - INTEGER
> >	The initial round trip timeout value in milliseconds that will be used
> > 	in calculating round trip times.  This is the initial time interval
> >diff --git a/include/net/sctp/constants.h
> >b/include/net/sctp/constants.h
> >index 942b864..d053d2e 100644
> >--- a/include/net/sctp/constants.h
> >+++ b/include/net/sctp/constants.h
> >@@ -334,6 +334,7 @@ typedef enum {
> > typedef enum {
> > 	SCTP_TRANSPORT_UP,
> > 	SCTP_TRANSPORT_DOWN,
> >+	SCTP_TRANSPORT_PF,
> > } sctp_transport_cmd_t;
> > 
> > /* These are the address scopes defined mainly for IPv4 addresses
> >diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> >index e4652fe..22825abe 100644
> >--- a/include/net/sctp/structs.h
> >+++ b/include/net/sctp/structs.h
> >@@ -160,6 +160,7 @@ extern struct sctp_globals {
> > 	int max_retrans_association;
> > 	int max_retrans_path;
> > 	int max_retrans_init;
> >+	int pf_retrans;
> > 
> > 	/*
> > 	 * Policy for preforming sctp/socket accounting
> >@@ -258,6 +259,7 @@ extern struct sctp_globals {
> > #define sctp_sndbuf_policy	 	(sctp_globals.sndbuf_policy)
> > #define sctp_rcvbuf_policy	 	(sctp_globals.rcvbuf_policy)
> > #define sctp_max_retrans_path		(sctp_globals.max_retrans_path)
> >+#define sctp_pf_retrans			(sctp_globals.pf_retrans)
> > #define sctp_max_retrans_init		(sctp_globals.max_retrans_init)
> > #define sctp_sack_timeout		(sctp_globals.sack_timeout)
> > #define sctp_hb_interval		(sctp_globals.hb_interval)
> >@@ -1660,6 +1662,8 @@ struct sctp_association {
> > 	 */
> > 	int max_retrans;
> > 
> >+	int pf_retrans;
> >+
> > 	/* Maximum number of times the endpoint will retransmit INIT  */
> > 	__u16 max_init_attempts;
> > 
> >diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
> >index 0842ef0..cece1bf 100644
> >--- a/include/net/sctp/user.h
> >+++ b/include/net/sctp/user.h
> >@@ -649,6 +649,7 @@ struct sctp_paddrinfo {
> >  */
> > enum sctp_spinfo_state {
> > 	SCTP_INACTIVE,
> >+	SCTP_PF,
> > 	SCTP_ACTIVE,
> > 	SCTP_UNCONFIRMED,
> > 	SCTP_UNKNOWN = 0xffff  /* Value used for transport state unknown */
> >diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> >index 5bc9ab1..f3ebc23 100644
> >--- a/net/sctp/associola.c
> >+++ b/net/sctp/associola.c
> >@@ -124,6 +124,8 @@ static struct sctp_association
> >*sctp_association_init(struct sctp_association *a
> > 	 * socket values.
> > 	 */
> > 	asoc->max_retrans = sp->assocparams.sasoc_asocmaxrxt;
> >+	asoc->pf_retrans  = sctp_pf_retrans;
> >+
> > 	asoc->rto_initial = msecs_to_jiffies(sp->rtoinfo.srto_initial);
> > 	asoc->rto_max = msecs_to_jiffies(sp->rtoinfo.srto_max);
> > 	asoc->rto_min = msecs_to_jiffies(sp->rtoinfo.srto_min);
> >@@ -840,6 +842,7 @@ void sctp_assoc_control_transport(struct
> >sctp_association *asoc,
> > 	struct sctp_ulpevent *event;
> > 	struct sockaddr_storage addr;
> > 	int spc_state = 0;
> >+	bool ulp_notify = true;
> > 
> > 	/* Record the transition on the transport.  */
> > 	switch (command) {
> >@@ -853,6 +856,14 @@ void sctp_assoc_control_transport(struct
> >sctp_association *asoc,
> > 			spc_state = SCTP_ADDR_CONFIRMED;
> > 		else
> > 			spc_state = SCTP_ADDR_AVAILABLE;
> >+		/* Don't inform ULP about transition from PF to
> >+		 * active state and set cwnd to 1, see SCTP
> >+		 * Quick failover draft section 5.1, point 5
> >+		 */
> >+		if (transport->state == SCTP_PF) {
> >+			ulp_notify = false;
> >+			transport->cwnd = 1;
> >+		}
> > 		transport->state = SCTP_ACTIVE;
> > 		break;
> > 
> >@@ -871,6 +882,10 @@ void sctp_assoc_control_transport(struct
> >sctp_association *asoc,
> > 		spc_state = SCTP_ADDR_UNREACHABLE;
> > 		break;
> > 
> >+	case SCTP_TRANSPORT_PF:
> >+		transport->state = SCTP_PF;
> >+		ulp_notify = false;
> >+		break;
> > 	default:
> > 		return;
> > 	}
> >@@ -878,12 +893,15 @@ void sctp_assoc_control_transport(struct
> >sctp_association *asoc,
> > 	/* Generate and send a SCTP_PEER_ADDR_CHANGE notification to the
> > 	 * user.
> > 	 */
> >-	memset(&addr, 0, sizeof(struct sockaddr_storage));
> >-	memcpy(&addr, &transport->ipaddr,
> >transport->af_specific->sockaddr_len);
> >-	event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
> >-				0, spc_state, error, GFP_ATOMIC);
> >-	if (event)
> >-		sctp_ulpq_tail_event(&asoc->ulpq, event);
> >+	if (ulp_notify) {
> >+		memset(&addr, 0, sizeof(struct sockaddr_storage));
> >+		memcpy(&addr, &transport->ipaddr,
> >+		       transport->af_specific->sockaddr_len);
> >+		event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
> >+					0, spc_state, error, GFP_ATOMIC);
> >+		if (event)
> >+			sctp_ulpq_tail_event(&asoc->ulpq, event);
> >+	}
> > 
> > 	/* Select new active and retran paths. */
> > 
> >@@ -899,7 +917,8 @@ void sctp_assoc_control_transport(struct
> >sctp_association *asoc,
> > 			transports) {
> > 
> > 		if ((t->state == SCTP_INACTIVE) ||
> >-		    (t->state == SCTP_UNCONFIRMED))
> >+		    (t->state == SCTP_UNCONFIRMED) ||
> >+		    (t->state == SCTP_PF))
> > 			continue;
> > 		if (!first || t->last_time_heard > first->last_time_heard) {
> > 			second = first;
> >diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> >index a0fa19f..e7aa177c 100644
> >--- a/net/sctp/outqueue.c
> >+++ b/net/sctp/outqueue.c
> >@@ -792,7 +792,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int
> >rtx_timeout)
> > 			if (!new_transport)
> > 				new_transport = asoc->peer.active_path;
> > 		} else if ((new_transport->state == SCTP_INACTIVE) ||
> >-			   (new_transport->state == SCTP_UNCONFIRMED)) {
> >+			   (new_transport->state == SCTP_UNCONFIRMED) ||
> >+			   (new_transport->state == SCTP_PF)) {
> > 			/* If the chunk is Heartbeat or Heartbeat Ack,
> > 			 * send it to chunk->transport, even if it's
> > 			 * inactive.
> >@@ -987,7 +988,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int
> >rtx_timeout)
> > 			new_transport = chunk->transport;
> > 			if (!new_transport ||
> > 			    ((new_transport->state == SCTP_INACTIVE) ||
> >-			     (new_transport->state == SCTP_UNCONFIRMED)))
> >+			     (new_transport->state == SCTP_UNCONFIRMED) ||
> >+			     (new_transport->state == SCTP_PF)))
> > 				new_transport = asoc->peer.active_path;
> > 			if (new_transport->state == SCTP_UNCONFIRMED)
> > 				continue;
> >diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> >index c96d1a8..285e26a 100644
> >--- a/net/sctp/sm_sideeffect.c
> >+++ b/net/sctp/sm_sideeffect.c
> >@@ -76,6 +76,8 @@ static int sctp_side_effects(sctp_event_t event_type,
> >sctp_subtype_t subtype,
> > 			     sctp_cmd_seq_t *commands,
> > 			     gfp_t gfp);
> > 
> >+static void sctp_cmd_hb_timer_update(sctp_cmd_seq_t *cmds,
> >+				     struct sctp_transport *t);
> > /********************************************************************
> >  * Helper functions
> >  ********************************************************************/
> >@@ -470,7 +472,8 @@ sctp_timer_event_t
> >*sctp_timer_events[SCTP_NUM_TIMEOUT_TYPES] = {
> >  * notification SHOULD be sent to the upper layer.
> >  *
> >  */
> >-static void sctp_do_8_2_transport_strike(struct sctp_association
> >*asoc,
> >+static void sctp_do_8_2_transport_strike(sctp_cmd_seq_t *commands,
> >+					 struct sctp_association *asoc,
> > 					 struct sctp_transport *transport,
> > 					 int is_hb)
> > {
> >@@ -495,6 +498,23 @@ static void sctp_do_8_2_transport_strike(struct
> >sctp_association *asoc,
> > 			transport->error_count++;
> > 	}
> > 
> >+	/* If the transport error count is greater than the pf_retrans
> >+	 * threshold, and less than pathmaxrtx, then mark this transport
> >+	 * as Partially Failed, ee SCTP Quick Failover Draft, secon 5.1,
> >+	 * point 1
> >+	 */
> >+	if ((transport->state != SCTP_PF) &&
> >+	   (asoc->pf_retrans < transport->pathmaxrxt) &&
> >+	   (transport->error_count > asoc->pf_retrans)) {
> >+
> >+		sctp_assoc_control_transport(asoc, transport,
> >+					     SCTP_TRANSPORT_PF,
> >+					     0);
> >+
> >+		/* Update the hb timer to resend a heartbeat every rto */
> >+		sctp_cmd_hb_timer_update(commands, transport);
> >+	}
> >+
> > 	if (transport->state != SCTP_INACTIVE &&
> > 	    (transport->error_count > transport->pathmaxrxt)) {
> > 		SCTP_DEBUG_PRINTK_IPADDR("transport_strike:association %p",
> >@@ -699,6 +719,10 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t
> >*cmds,
> > 					     SCTP_HEARTBEAT_SUCCESS);
> > 	}
> > 
> >+	if (t->state == SCTP_PF)
> >+		sctp_assoc_control_transport(asoc, t, SCTP_TRANSPORT_UP,
> >+					     SCTP_HEARTBEAT_SUCCESS);
> >+
> > 	/* The receiver of the HEARTBEAT ACK should also perform an
> > 	 * RTT measurement for that destination transport address
> > 	 * using the time value carried in the HEARTBEAT ACK chunk.
> >@@ -1565,8 +1589,8 @@ static int sctp_cmd_interpreter(sctp_event_t
> >event_type,
> > 
> > 		case SCTP_CMD_STRIKE:
> > 			/* Mark one strike against a transport.  */
> >-			sctp_do_8_2_transport_strike(asoc, cmd->obj.transport,
> >-						    0);
> >+			sctp_do_8_2_transport_strike(commands, asoc,
> >+						    cmd->obj.transport, 0);
> > 			break;
> > 
> > 		case SCTP_CMD_TRANSPORT_IDLE:
> >@@ -1576,7 +1600,8 @@ static int sctp_cmd_interpreter(sctp_event_t
> >event_type,
> > 
> > 		case SCTP_CMD_TRANSPORT_HB_SENT:
> > 			t = cmd->obj.transport;
> >-			sctp_do_8_2_transport_strike(asoc, t, 1);
> >+			sctp_do_8_2_transport_strike(commands, asoc,
> >+						     t, 1);
> > 			t->hb_sent = 1;
> > 			break;
> > 
> >diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
> >index e5fe639..2b2bfe9 100644
> >--- a/net/sctp/sysctl.c
> >+++ b/net/sctp/sysctl.c
> >@@ -141,6 +141,15 @@ static ctl_table sctp_table[] = {
> > 		.extra2		= &int_max
> > 	},
> > 	{
> >+		.procname	= "pf_retrans",
> >+		.data		= &sctp_pf_retrans,
> >+		.maxlen		= sizeof(int),
> >+		.mode		= 0644,
> >+		.proc_handler	= proc_dointvec_minmax,
> >+		.extra1		= &zero,
> >+		.extra2		= &int_max
> >+	},
> >+	{
> > 		.procname	= "max_init_retransmits",
> > 		.data		= &sctp_max_retrans_init,
> > 		.maxlen		= sizeof(int),
> >diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> >index b026ba0..4639ba2 100644
> >--- a/net/sctp/transport.c
> >+++ b/net/sctp/transport.c
> >@@ -585,7 +585,8 @@ unsigned long sctp_transport_timeout(struct
> >sctp_transport *t)
> > {
> > 	unsigned long timeout;
> > 	timeout = t->rto + sctp_jitter(t->rto);
> >-	if (t->state != SCTP_UNCONFIRMED)
> >+	if ((t->state != SCTP_UNCONFIRMED) &&
> >+	    (t->state != SCTP_PF))
> > 		timeout += t->hbinterval;
> > 	timeout += jiffies;
> > 	return timeout;
> >-- 
> >1.7.7.6
> 
> One thing that seems to be missing is the API.  As a result you don't carry the value per transport which we'll need.  That caused you to add assoc parameter to some functions.  That's really the only missing item.

I definately agree that a way to set the a per association pf threshold from
userspace (ostensibly from a socket option), but I'm not quite sure we're on the
same page about the semantics.  You say above that that I don't carry the value
per transport (I presume you mean the pf threshold).  According to the draft the
threshold is maintained per association, see point one of section 5.1:
1.  The sender maintains a new tunable parameter called Potentially-
       failed.Max.Retrans (PFMR).  The recommended value of PFMR = 0
       when quick failover is used.  When an association's PFMR >= PMR,
       quick failover is turned off.

So yes, we should have a way to change it programatically from the default via
an option, but I think the threshold is stored in the correct place (the
assocition struct).

Or am I misunderstanding what you're saying?

Regards
Neil

> -- 
> Sent from my Android phone with SkitMail. Please excuse my brevity.
> 

^ permalink raw reply

* Re: resurrecting tcphealth
From: Stephen Hemminger @ 2012-07-14 21:48 UTC (permalink / raw)
  To: Piotr Sawuk; +Cc: netdev, linux-kernel
In-Reply-To: <dee325378a2d3b7792da9640fc14905f.squirrel@webmail.univie.ac.at>

Maybe it would be better to use netlink info (for ss command)
rather than a /proc/net interface.
See how existing TCP values and MEMINFO are handled.

^ permalink raw reply

* Re: [PATCH] drivers: connector: fixed coding style issues
From: Valentin Ilie @ 2012-07-14 21:51 UTC (permalink / raw)
  To: Joe Perches; +Cc: zbr, netdev
In-Reply-To: <1342298026.8377.33.camel@joe2Laptop>

On 14 July 2012 23:33, Joe Perches <joe@perches.com> wrote:
> I think it would be better written by hoisting the
> function outside the if like [..]

OK.

Should I submit the whole patch again (with that change) or just
another small patch that does just that?

-- 
Valentin Ilie

^ permalink raw reply

* Re: [PATCH net-next] netem: refine early skb orphaning
From: Stephen Hemminger @ 2012-07-14 21:53 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Hagen Paul Pfeifer, Mark Gordon,
	Andreas Terzis, Yuchung Cheng
In-Reply-To: <1342271787.3265.9632.camel@edumazet-glaptop>

On Sat, 14 Jul 2012 15:16:27 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> netem does an early orphaning of skbs. Doing so breaks TCP Small Queue
> or any mechanism relying on socket sk_wmem_alloc feedback.
> 
> Ideally, we should perform this orphaning after the rate module and
> before the delay module, to mimic what happens on a real link :
> 
> skb orphaning is indeed normally done at TX completion, before the
> transit on the link.
> 
> +-------+   +--------+  +---------------+  +-----------------+
> + Qdisc +---> Device +--> TX completion +--> links / hops    +->
> +       +   +  xmit  +  + skb orphaning +  + propagation     +
> +-------+   +--------+  +---------------+  +-----------------+
>       < rate limiting >                  < delay, drops, reorders >
> 
> If netem is used without delay feature (drops, reorders, rate
> limiting), then we should avoid early skb orphaning, to keep pressure
> on sockets as long as packets are still in qdisc queue.
> 
> Ideally, netem should be refactored to implement delay module
> as the last stage. Current algorithm merges the two phases
> (rate limiting + delay) so its not correct.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Hagen Paul Pfeifer <hagen@jauu.net>
> Cc: Mark Gordon <msg@google.com>
> Cc: Andreas Terzis <aterzis@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> ---
>  net/sched/sch_netem.c |    9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index c412ad0..298c0dd 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -380,7 +380,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  		return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
>  	}
>  
> -	skb_orphan(skb);
> +	/* If a delay is expected, orphan the skb. (orphaning usually takes
> +	 * place at TX completion time, so _before_ the link transit delay)
> +	 * Ideally, this orphaning should be done after the rate limiting
> +	 * module, because this breaks TCP Small Queue, and other mechanisms
> +	 * based on socket sk_wmem_alloc.
> +	 */
> +	if (q->latency || q->jitter)
> +		skb_orphan(skb);
>  
>  	/*
>  	 * If we need to duplicate packet, then re-insert at top of the
> 

Acked-by: Stephen Hemminger <shemminger@vyatta.com>

^ permalink raw reply

* Re: [PATCH] drivers: connector: fixed coding style issues
From: Joe Perches @ 2012-07-14 21:56 UTC (permalink / raw)
  To: Valentin Ilie; +Cc: zbr, netdev
In-Reply-To: <CAHuBVBYXAj7OUgo3UmLERbcmBw-sK8c66TuCq_7Fsp=CVzzYSw@mail.gmail.com>

On Sun, 2012-07-15 at 00:51 +0300, Valentin Ilie wrote:
> On 14 July 2012 23:33, Joe Perches <joe@perches.com> wrote:
> > I think it would be better written by hoisting the
> > function outside the if like [..]
> 
> OK.
> 
> Should I submit the whole patch again (with that change) or just
> another small patch that does just that?

It's generally better to reply to the original patch
with [PATCH V2] with the entire patch content updated.

btw: there's a missing a semicolon in what I posted.

Also add a changelog after the --- so the new patch
email looks like:

Subject: [PATCH V2] drivers: connectors: etc

Patch description...

Signed-off-by: etc
---
v2: Updated to do whatever

actual patch...

cheers, Joe

^ permalink raw reply

* [net-next 0/7][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-07-14 23:06 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series contains updates to e1000e and ixgbe.

The following are changes since commit 141e369de698f2e17bf716b83fcc647ddcb2220c:
  xfrm: Initialize the struct xfrm_dst behind the dst_enty field
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Alexander Duyck (5):
  ixgbe: Simplify logic for getting traffic class from user priority
  ixgbe: Cleanup unpacking code for DCB
  ixgbe: Populate the prio_tc_map in ixgbe_setup_tc
  ixgbe: Add function for obtaining FCoE TC based on FCoE user priority
  ixgbe: Merge FCoE set_num and cache_ring calls into RSS/DCB config

Matthew Vick (1):
  e1000e: Program the correct register for ITR when using MSI-X.

Tushar Dave (1):
  e1000e: Cleanup code logic in e1000_check_for_serdes_link_82571()

 drivers/net/ethernet/intel/e1000e/82571.c       |   14 +-
 drivers/net/ethernet/intel/e1000e/e1000.h       |    1 +
 drivers/net/ethernet/intel/e1000e/ethtool.c     |    5 +-
 drivers/net/ethernet/intel/e1000e/netdev.c      |   32 ++-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h        |    1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c    |   74 ++++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h    |    1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c |   26 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c   |   15 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c    |  260 ++++++++++-------------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |   61 ++++--
 11 files changed, 265 insertions(+), 225 deletions(-)

-- 
1.7.10.4

^ permalink raw reply

* [net-next 1/7] e1000e: Cleanup code logic in e1000_check_for_serdes_link_82571()
From: Jeff Kirsher @ 2012-07-14 23:06 UTC (permalink / raw)
  To: davem; +Cc: Tushar Dave, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342307225-28917-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Tushar Dave <tushar.n.dave@intel.com>

Cleanup code to make it more clean and readable.

Signed-off-by: Tushar Dave <tushar.n.dave@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/82571.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/82571.c b/drivers/net/ethernet/intel/e1000e/82571.c
index 36db4df..19f4cb9 100644
--- a/drivers/net/ethernet/intel/e1000e/82571.c
+++ b/drivers/net/ethernet/intel/e1000e/82571.c
@@ -1677,16 +1677,18 @@ static s32 e1000_check_for_serdes_link_82571(struct e1000_hw *hw)
 			e_dbg("ANYSTATE  -> DOWN\n");
 		} else {
 			/*
-			 * Check several times, if Sync and Config
-			 * both are consistently 1 then simply ignore
-			 * the Invalid bit and restart Autoneg
+			 * Check several times, if SYNCH bit and CONFIG
+			 * bit both are consistently 1 then simply ignore
+			 * the IV bit and restart Autoneg
 			 */
 			for (i = 0; i < AN_RETRY_COUNT; i++) {
 				udelay(10);
 				rxcw = er32(RXCW);
-				if ((rxcw & E1000_RXCW_IV) &&
-				    !((rxcw & E1000_RXCW_SYNCH) &&
-				      (rxcw & E1000_RXCW_C))) {
+				if ((rxcw & E1000_RXCW_SYNCH) &&
+				    (rxcw & E1000_RXCW_C))
+					continue;
+
+				if (rxcw & E1000_RXCW_IV) {
 					mac->serdes_has_link = false;
 					mac->serdes_link_state =
 					    e1000_serdes_link_down;
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 2/7] e1000e: Program the correct register for ITR when using MSI-X.
From: Jeff Kirsher @ 2012-07-14 23:07 UTC (permalink / raw)
  To: davem; +Cc: Matthew Vick, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342307225-28917-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Matthew Vick <matthew.vick@intel.com>

When configuring interrupt throttling on 82574 in MSI-X mode, we need to
be programming the EITR registers instead of the ITR register.

-rc2: Renamed e1000_write_itr() to e1000e_write_itr(), fixed whitespace
      issues, and removed unnecessary !! operation.
-rc3: Reduced the scope of the loop variable in e1000e_write_itr().

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/e1000.h   |    1 +
 drivers/net/ethernet/intel/e1000e/ethtool.c |    5 ++---
 drivers/net/ethernet/intel/e1000e/netdev.c  |   32 +++++++++++++++++++++++----
 3 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h
index 6e6fffb..cd15332 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -514,6 +514,7 @@ extern void e1000e_set_interrupt_capability(struct e1000_adapter *adapter);
 extern void e1000e_reset_interrupt_capability(struct e1000_adapter *adapter);
 extern void e1000e_get_hw_control(struct e1000_adapter *adapter);
 extern void e1000e_release_hw_control(struct e1000_adapter *adapter);
+extern void e1000e_write_itr(struct e1000_adapter *adapter, u32 itr);
 
 extern unsigned int copybreak;
 
diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
index 905e214..105d554 100644
--- a/drivers/net/ethernet/intel/e1000e/ethtool.c
+++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
@@ -1897,7 +1897,6 @@ static int e1000_set_coalesce(struct net_device *netdev,
 			      struct ethtool_coalesce *ec)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
-	struct e1000_hw *hw = &adapter->hw;
 
 	if ((ec->rx_coalesce_usecs > E1000_MAX_ITR_USECS) ||
 	    ((ec->rx_coalesce_usecs > 4) &&
@@ -1916,9 +1915,9 @@ static int e1000_set_coalesce(struct net_device *netdev,
 	}
 
 	if (adapter->itr_setting != 0)
-		ew32(ITR, 1000000000 / (adapter->itr * 256));
+		e1000e_write_itr(adapter, adapter->itr);
 	else
-		ew32(ITR, 0);
+		e1000e_write_itr(adapter, 0);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index ca477e8..95b2453 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -2474,6 +2474,30 @@ set_itr_now:
 }
 
 /**
+ * e1000e_write_itr - write the ITR value to the appropriate registers
+ * @adapter: address of board private structure
+ * @itr: new ITR value to program
+ *
+ * e1000e_write_itr determines if the adapter is in MSI-X mode
+ * and, if so, writes the EITR registers with the ITR value.
+ * Otherwise, it writes the ITR value into the ITR register.
+ **/
+void e1000e_write_itr(struct e1000_adapter *adapter, u32 itr)
+{
+	struct e1000_hw *hw = &adapter->hw;
+	u32 new_itr = itr ? 1000000000 / (itr * 256) : 0;
+
+	if (adapter->msix_entries) {
+		int vector;
+
+		for (vector = 0; vector < adapter->num_vectors; vector++)
+			writel(new_itr, hw->hw_addr + E1000_EITR_82574(vector));
+	} else {
+		ew32(ITR, new_itr);
+	}
+}
+
+/**
  * e1000_alloc_queues - Allocate memory for all rings
  * @adapter: board private structure to initialize
  **/
@@ -3059,7 +3083,7 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
 	/* irq moderation */
 	ew32(RADV, adapter->rx_abs_int_delay);
 	if ((adapter->itr_setting != 0) && (adapter->itr != 0))
-		ew32(ITR, 1000000000 / (adapter->itr * 256));
+		e1000e_write_itr(adapter, adapter->itr);
 
 	ctrl_ext = er32(CTRL_EXT);
 	/* Auto-Mask interrupts upon ICR access */
@@ -3486,14 +3510,14 @@ void e1000e_reset(struct e1000_adapter *adapter)
 				dev_info(&adapter->pdev->dev,
 					"Interrupt Throttle Rate turned off\n");
 				adapter->flags2 |= FLAG2_DISABLE_AIM;
-				ew32(ITR, 0);
+				e1000e_write_itr(adapter, 0);
 			}
 		} else if (adapter->flags2 & FLAG2_DISABLE_AIM) {
 			dev_info(&adapter->pdev->dev,
 				 "Interrupt Throttle Rate turned on\n");
 			adapter->flags2 &= ~FLAG2_DISABLE_AIM;
 			adapter->itr = 20000;
-			ew32(ITR, 1000000000 / (adapter->itr * 256));
+			e1000e_write_itr(adapter, adapter->itr);
 		}
 	}
 
@@ -4576,7 +4600,7 @@ link_up:
 			    adapter->gorc - adapter->gotc) / 10000;
 		u32 itr = goc > 0 ? (dif * 6000 / goc + 2000) : 8000;
 
-		ew32(ITR, 1000000000 / (itr * 256));
+		e1000e_write_itr(adapter, itr);
 	}
 
 	/* Cause software interrupt to ensure Rx ring is cleaned */
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 3/7] ixgbe: Simplify logic for getting traffic class from user priority
From: Jeff Kirsher @ 2012-07-14 23:07 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342307225-28917-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch is meant to help simplify the logic for getting traffic classes
from user priorities. To do this I am adding a function named
ixgbe_dcb_get_tc_from_up that will go through the traffic classes in
reverse order in order to determine which traffic class contains a bit for
a given user priority.

Adding a declaration for this new function to the header so that
we have a centralized means for sorting out traffic classes belonging to
features such as FCoE.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c |   30 ++++++++++++++++++++------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
index 8bfaaee..4fd5a0d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
@@ -231,16 +231,32 @@ void ixgbe_dcb_unpack_prio(struct ixgbe_dcb_config *cfg, int direction,
 	}
 }
 
-void ixgbe_dcb_unpack_map(struct ixgbe_dcb_config *cfg, int direction, u8 *map)
+static u8 ixgbe_dcb_get_tc_from_up(struct ixgbe_dcb_config *cfg,
+				   int direction, u8 up)
 {
-	int i, up;
-	unsigned long bitmap;
+	struct tc_configuration *tc_config = &cfg->tc_config[0];
+	u8 prio_mask = 1 << up;
+	u8 tc;
 
-	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
-		bitmap = cfg->tc_config[i].path[direction].up_to_tc_bitmap;
-		for_each_set_bit(up, &bitmap, MAX_USER_PRIORITY)
-			map[up] = i;
+	/*
+	 * Test for TCs 7 through 1 and report the first match we find.  If
+	 * we find no match we can assume that the TC is 0 since the TC must
+	 * be set for all user priorities
+	 */
+	for (tc = MAX_TRAFFIC_CLASS - 1; tc; tc--) {
+		if (prio_mask & tc_config[tc].path[direction].up_to_tc_bitmap)
+			break;
 	}
+
+	return tc;
+}
+
+void ixgbe_dcb_unpack_map(struct ixgbe_dcb_config *cfg, int direction, u8 *map)
+{
+	u8 up;
+
+	for (up = 0; up < MAX_USER_PRIORITY; up++)
+		map[up] = ixgbe_dcb_get_tc_from_up(cfg, direction, up);
 }
 
 /**
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 4/7] ixgbe: Cleanup unpacking code for DCB
From: Jeff Kirsher @ 2012-07-14 23:07 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342307225-28917-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This is meant to be a generic clean-up of the remaining functions for
unpacking data from the DCB structures. The only real changes are:
replaced the variable i with tc for functions that were looping through the
traffic classes, and added a pointer for tc_class instead of path since
that way we only need to pull the pointer once instead of once per loop.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c |   47 ++++++++++++--------------
 1 file changed, 22 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
index 4fd5a0d..39ac2fe 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
@@ -180,55 +180,52 @@ out:
 
 void ixgbe_dcb_unpack_pfc(struct ixgbe_dcb_config *cfg, u8 *pfc_en)
 {
-	int i;
+	struct tc_configuration *tc_config = &cfg->tc_config[0];
+	int tc;
 
-	*pfc_en = 0;
-	for (i = 0; i < MAX_TRAFFIC_CLASS; i++)
-		*pfc_en |= !!(cfg->tc_config[i].dcb_pfc & 0xF) << i;
+	for (*pfc_en = 0, tc = 0; tc < MAX_TRAFFIC_CLASS; tc++) {
+		if (tc_config[tc].dcb_pfc != pfc_disabled)
+			*pfc_en |= 1 << tc;
+	}
 }
 
 void ixgbe_dcb_unpack_refill(struct ixgbe_dcb_config *cfg, int direction,
 			     u16 *refill)
 {
-	struct tc_bw_alloc *p;
-	int i;
+	struct tc_configuration *tc_config = &cfg->tc_config[0];
+	int tc;
 
-	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
-		p = &cfg->tc_config[i].path[direction];
-		refill[i] = p->data_credits_refill;
-	}
+	for (tc = 0; tc < MAX_TRAFFIC_CLASS; tc++)
+		refill[tc] = tc_config[tc].path[direction].data_credits_refill;
 }
 
 void ixgbe_dcb_unpack_max(struct ixgbe_dcb_config *cfg, u16 *max)
 {
-	int i;
+	struct tc_configuration *tc_config = &cfg->tc_config[0];
+	int tc;
 
-	for (i = 0; i < MAX_TRAFFIC_CLASS; i++)
-		max[i] = cfg->tc_config[i].desc_credits_max;
+	for (tc = 0; tc < MAX_TRAFFIC_CLASS; tc++)
+		max[tc] = tc_config[tc].desc_credits_max;
 }
 
 void ixgbe_dcb_unpack_bwgid(struct ixgbe_dcb_config *cfg, int direction,
 			    u8 *bwgid)
 {
-	struct tc_bw_alloc *p;
-	int i;
+	struct tc_configuration *tc_config = &cfg->tc_config[0];
+	int tc;
 
-	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
-		p = &cfg->tc_config[i].path[direction];
-		bwgid[i] = p->bwg_id;
-	}
+	for (tc = 0; tc < MAX_TRAFFIC_CLASS; tc++)
+		bwgid[tc] = tc_config[tc].path[direction].bwg_id;
 }
 
 void ixgbe_dcb_unpack_prio(struct ixgbe_dcb_config *cfg, int direction,
 			    u8 *ptype)
 {
-	struct tc_bw_alloc *p;
-	int i;
+	struct tc_configuration *tc_config = &cfg->tc_config[0];
+	int tc;
 
-	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
-		p = &cfg->tc_config[i].path[direction];
-		ptype[i] = p->prio_type;
-	}
+	for (tc = 0; tc < MAX_TRAFFIC_CLASS; tc++)
+		ptype[tc] = tc_config[tc].path[direction].prio_type;
 }
 
 static u8 ixgbe_dcb_get_tc_from_up(struct ixgbe_dcb_config *cfg,
-- 
1.7.10.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox