Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 2/6] drivers/net/can/softing/softing_main.c: ensure a consistent return value in error case
From: Kurt Van Dijck @ 2012-07-15 20:21 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Wolfgang Grandegger, kernel-janitors, Marc Kleine-Budde,
	linux-can, netdev, linux-kernel
In-Reply-To: <1342284188-19176-3-git-send-email-Julia.Lawall@lip6.fr>

I see the problem, and the solution. You may add
Acked-by: Kurt Van Dijck <kurt.van.dijck@eia.be>

On Sat, Jul 14, 2012 at 06:43:04PM +0200, Julia Lawall wrote:
> From: Julia Lawall <Julia.Lawall@lip6.fr>
> 
> Typically, the return value desired for the failure of a function with an
> integer return value is a negative integer.  In these cases, the return
> value is sometimes a negative integer and sometimes 0, due to a subsequent
> initialization of the return variable within the loop.
> 
> A simplified version of the semantic match that finds this problem is:
> (http://coccinelle.lip6.fr/)
> 
> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
> 
> ---
>  drivers/net/can/softing/softing_main.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/can/softing/softing_main.c b/drivers/net/can/softing/softing_main.c
> index a7c77c7..f2a221e 100644
> --- a/drivers/net/can/softing/softing_main.c
> +++ b/drivers/net/can/softing/softing_main.c
> @@ -826,12 +826,12 @@ static __devinit int softing_pdev_probe(struct platform_device *pdev)
>  		goto sysfs_failed;
>  	}
>  
> -	ret = -ENOMEM;
>  	for (j = 0; j < ARRAY_SIZE(card->net); ++j) {
>  		card->net[j] = netdev =
>  			softing_netdev_create(card, card->id.chip[j]);
>  		if (!netdev) {
>  			dev_alert(&pdev->dev, "failed to make can[%i]", j);
> +			ret = -ENOMEM;
>  			goto netdev_failed;
>  		}
>  		priv = netdev_priv(card->net[j]);
> 

^ permalink raw reply

* Re: 3.4.4/amd64 full interrupt hangs under big nfs copies
From: Marc MERLIN @ 2012-07-15 21:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev
In-Reply-To: <20120411052733.GA17352@merlins.org>

On Tue, Apr 10, 2012 at 10:27:33PM -0700, Marc MERLIN wrote:
> On Tue, Apr 10, 2012 at 08:11:03AM +0200, Eric Dumazet wrote:
> > Please try following patch, as it solved the problem for me (no more
> > order-1 allocations in tx path)
> 
> I applied our patch to 3.3.1 and cannot reproduce the problem anymore.
> 
> I'll leave a big wireless copy running overnight just in case, but I think
> you fixed it.

Mmmh, so I'm running 3.4.4 and I had another full machine hang while copying
big files (gigabytes) over wireless via NFS.
The laptop self recovered after 5mn or so (mouse cursor would not even
move) and I was able to kill -9 the process (midnight commander).
mc did not actually stop for another 4mn or so (i.e. it took that long for
the process to come out of kernel hung state), but the machine was usable
during that time.
Note that copying the same data with scp works fine.
NFS mount looks like this:
gargamel:/mnt/dshelf2/ /net/gargamel/mnt/dshelf2 nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.205.7,local_lock=none,addr=192.168.205.3 0 0

I didn't have anything like last time in the kernel logs, and more
annoyingly, ps -elf does not show anything for any process in WCHAN,
making pointing the finger a bit harder (procps-ng 3.3.3 does not show
anything other than '-' in WCHAN for any process with 3.4.4).

My understanding is that user space calling drivers that shut off all
interrupts for extended periods of time (as least I think so since my mouse
cursor would not move), is still a kernel bug.

For what it's worth, copying 1GB of data in lots of small files does not
cause problems, it seems that it's big files that cause a problem since they
likely fill a buffer somewhere while interrupts are disabled.

Do you have an idea of how I can find out where my mc process is stuck in
the kernel?
Should I reproduce with specific sysrq output?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply

* Re: resurrecting tcphealth
From: Piotr Sawuk @ 2012-07-15 22:17 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel
In-Reply-To: <1342346010.3265.10966.camel@edumazet-glaptop>

On So, 15.07.2012, 11:53, Eric Dumazet wrote:
> On Sun, 2012-07-15 at 11:17 +0200, Piotr Sawuk wrote:
>> On So, 15.07.2012, 09:16, Eric Dumazet wrote:
>> > On Sun, 2012-07-15 at 01:43 +0200, Piotr Sawuk wrote:
>> >
>> >> oh, and again I recommend the really short although outdated thesis
>> >>
>> >> [1] https://sacerdoti.org/tcphealth/tcphealth-paper.pdf
>> >
>> > A thesis saying SACK are not useful is highly suspect.
>> >
>> > Instead of finding why they behave not so good and fix the bugs, just
>> > say "SACK addition to TCP is not critical"
>> the actual quotation is "We also found that the number of unnecessary
>> duplicate packets were quite small potentially indicating that the SACK
>> addition to TCP is not critical."
>> >
>> > Really ?
>>
>> no, not really. he he actually said that SACK has been made mostly
>> obsolete
>> by "Linux 2.2 implements fast retransmits for up to two packet gaps, thus
>> reducing the need for course grained timeouts due to the lack of SACK."
>> and
>> he was a bit more careful and admitted that further tests with tcphealth
>> are
>> needed to check if SACK really makes that big a difference. he admitted
>> "It
>> could be that SACK's advantage lies in other areas such as very large
>> downloads or when using slow and unreliable network links." all these
>> things
>> could be checked again nowadays, with larger files available and
>> wlan-users
>> and higher traffic -- just find something without SACK...
>
> There are hundred of papers about TCP behavior. Many are very good.
>
> This one seems not the best of them by far, and is based on measures
> done on 2001 (!!!), on a single computer (!!!), connected to a
> particular ISP (!!!), using a wireless pcmcia network card. (!!!)
>
> At that time, almost no clients were using SACK. Because windows 98/XP
> dont negociate SACK by default (you need to tweak registry)
>
> Its like saying ECN is useless : If ECN users are under 1 % of total
> number of users, network is still under pressure and ECN benefits cannot
> rise because of misbehavior of other flows.

I don't think it's in any way similar, but then you definitely are more
knowledgeable on these matters. but it's quite straight-forward: SACK is
supposed to speed up things by informing when a packet has been received and
doesn't need to be resent. if the client doesn't get duplicate packets then
SACK wont make a difference. tcphealth counts how many packets have been
received correctly two or more times. why would tcphealth be useless for
answering if you'll need SACK?
>
> With RTT of 100 ms, SACK are clearly a win for long transferts.
>
> A single drop shall retransmit a single packet instead of ~500 packets.
> Only fools could deny this fact. Studying DuplicateAcks to detect
> retransmits is clearly wrong.

as far as I understood DuplicateAcks means either the Acks were lost in the
transfer or the same packet has been received twice one after the other.
i.e. duplicate packets received should always be smaller-or-equal than
duplicate acks sent. am I wrong? that's why tcphealth is only interested in
these two values.
>
> Really, dont recommend this paper, it contains a lot of false
> statements.
>
> One example : "we discovered some surprising things as the high
> percentage of lost or reordered packets from supposedly highly reliable
> and fast services as Akamai networks".
>
> I cant believe such nonsense can be written, and recommended.

I agree, it's quite some mickey-mouse thesis. but still a manual for what
tcphealth actually does do!
>
> So you can add more counters to TCP stack because having them is good to
> better understand TCP behavior and what can be done to improve it, but
> dont base this work on dubious 'tcphealth'.

what counters are you thinking of? how do you suggest an ordinary user could
improve the download-speed from sites? imho tcphealth offers a good method:
the more DuplicateAcks you send the less responsive the site is you want to
reach, so try at another time and do some other downloads now...

^ permalink raw reply

* Re: [RFC 1/2] PCI-Express Non-Transparent Bridge Support
From: Jon Mason @ 2012-07-15 23:50 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <20120714170411.GA25775@kroah.com>

On Sat, Jul 14, 2012 at 10:04:11AM -0700, Greg KH wrote:
> On Fri, Jul 13, 2012 at 02:44:59PM -0700, Jon Mason wrote:
> > The NTB device driver is needed to configure these memory windows, doorbell, and
> > scratch-pad registers as well as use them in such a way as they can be turned
> > into a viable communication channel to the remote system.  ntb_hw.[ch]
> > determines the usage model (NTB to NTB or NTB to Root Port) and abstracts away
> > the underlying hardware to provide access and a common interface to the doorbell
> > registers, scratch pads, and memory windows.  These hardware interfaces are
> > exported so that other, non-mainlined kernel drivers can access these.
> 
> Why would you have non-mainlined drivers?
> 
> Can you submit the drivers at the same time so we see how you are using
> these new interfaces?

There are none at this time.  In the near future, the transport will be modified to use IOAT instead of the CPU copy to improve throughput performance, and it may be beneficial to have that separate.  If you wish for me to remove the hooks until it is necessary for that, then I can.

> 
> > +++ b/drivers/ntb/ntb_hw.c
> > @@ -0,0 +1,1283 @@
> > +/*
> > + * This file is provided under a dual BSD/GPLv2 license.  When using or
> > + *   redistributing this file, you may do so under either license.
> > + *
> > + *   GPL LICENSE SUMMARY
> > + *
> > + *   Copyright(c) 2012 Intel Corporation. All rights reserved.
> > + *
> > + *   This program is free software; you can redistribute it and/or modify
> > + *   it under the terms of version 2 of the GNU General Public License as
> > + *   published by the Free Software Foundation.
> > + *
> > + *   This program is distributed in the hope that it will be useful, but
> > + *   WITHOUT ANY WARRANTY; without even the implied warranty of
> > + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + *   General Public License for more details.
> > + *
> > + *   You should have received a copy of the GNU General Public License
> > + *   along with this program; if not, write to the Free Software
> > + *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
> > + *   The full GNU General Public License is included in this distribution
> > + *   in the file called LICENSE.GPL.
> 
> You really want to track the office movements of the FSF for the next 40
> years?  You should ask your lawyers if you can remove this paragraph.

Standard boilerplate junk, but it never hurts to ask.
 
> > +/**
> > + * ntb_hw_link_status() - return the hardware link status
> > + * @ndev: pointer to ntb_device instance
> > + *
> > + * Returns true if the hardware is connected to the remote system
> > + *
> > + * RETURNS: true or false based on the hardware link state
> > + */
> > +bool ntb_hw_link_status(struct ntb_device *ndev)
> > +{
> > +	return ndev->link_status == NTB_LINK_UP;
> > +}
> > +EXPORT_SYMBOL(ntb_hw_link_status);
> 
> EXPORT_SYMBOL_GPL() perhaps, for these, and the other symbols you are
> creating?

Will do.

> 
> thanks,
> 
> greg k-h

^ permalink raw reply

* Re: [RFC 1/2] PCI-Express Non-Transparent Bridge Support
From: Greg KH @ 2012-07-15 23:53 UTC (permalink / raw)
  To: Jon Mason; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <20120715235041.GB7551@jonmason-lab>

On Sun, Jul 15, 2012 at 04:50:41PM -0700, Jon Mason wrote:
> On Sat, Jul 14, 2012 at 10:04:11AM -0700, Greg KH wrote:
> > On Fri, Jul 13, 2012 at 02:44:59PM -0700, Jon Mason wrote:
> > > The NTB device driver is needed to configure these memory windows, doorbell, and
> > > scratch-pad registers as well as use them in such a way as they can be turned
> > > into a viable communication channel to the remote system.  ntb_hw.[ch]
> > > determines the usage model (NTB to NTB or NTB to Root Port) and abstracts away
> > > the underlying hardware to provide access and a common interface to the doorbell
> > > registers, scratch pads, and memory windows.  These hardware interfaces are
> > > exported so that other, non-mainlined kernel drivers can access these.
> > 
> > Why would you have non-mainlined drivers?
> > 
> > Can you submit the drivers at the same time so we see how you are using
> > these new interfaces?
> 
> There are none at this time.  In the near future, the transport will
> be modified to use IOAT instead of the CPU copy to improve throughput
> performance, and it may be beneficial to have that separate.  If you
> wish for me to remove the hooks until it is necessary for that, then I
> can.

Yes, please do so, we don't add apis for things that are not in-kernel
as they almost always need to change once we actually get a user of
them, as you know.

thanks,

greg k-h

^ permalink raw reply

* Re: [RFC 1/2] PCI-Express Non-Transparent Bridge Support
From: Jon Mason @ 2012-07-15 23:55 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <20120714171015.GB25775@kroah.com>

On Sat, Jul 14, 2012 at 10:10:15AM -0700, Greg KH wrote:
> On Fri, Jul 13, 2012 at 02:44:59PM -0700, Jon Mason wrote:
> > +static int max_num_cbs = 2;
> > +module_param(max_num_cbs, uint, 0644);
> > +MODULE_PARM_DESC(max_num_cbs, "Maximum number of NTB transport connections");
> > +
> > +static bool no_msix;
> > +module_param(no_msix, bool, 0644);
> > +MODULE_PARM_DESC(no_msix, "Do not allow MSI-X interrupts to be selected");
> 
> How would a user, or a distro, know to set these options?  Why are they
> even options at all?

Good question.  There is actually a potential benefit to disabling MSI-X.  The NTB device on one of our platforms only has 3 MSI-X vectors.  In the current driver design, that would limit them to 3 client/virtual devices.  However, there are 15bits in the ISR that can be used for the same purpose.  So, if you disable MSI-X, you can have 15 instead of 3.  

> 
> 
> > +struct ntb_device {
> > +	struct pci_dev *pdev;
> > +	struct msix_entry *msix_entries;
> > +	void __iomem *reg_base;
> > +	struct ntb_mw mw[NTB_NUM_MW];
> > +	struct {
> > +		unsigned int max_spads;
> > +		unsigned int max_db_bits;
> > +		unsigned int msix_cnt;
> > +	} limits;
> > +	struct {
> > +		void __iomem *pdb;
> > +		void __iomem *pdb_mask;
> > +		void __iomem *sdb;
> > +		void __iomem *sbar2_xlat;
> > +		void __iomem *sbar4_xlat;
> > +		void __iomem *spad_write;
> > +		void __iomem *spad_read;
> > +		void __iomem *lnk_cntl;
> > +		void __iomem *lnk_stat;
> > +		void __iomem *spci_cmd;
> > +	} reg_ofs;
> > +	void *ntb_transport;
> > +	void (*event_cb)(void *handle, unsigned int event);
> 
> Shouldn't the event be an enum?

No, that would be too smart.

> 
> > +	struct ntb_db_cb *db_cb;
> > +	unsigned char hw_type;
> > +	unsigned char conn_type;
> > +	unsigned char dev_type;
> > +	unsigned char num_msix;
> > +	unsigned char bits_per_vector;
> > +	unsigned char max_cbs;
> > +	unsigned char link_status;
> > +	struct delayed_work hb_timer;
> > +	unsigned long last_ts;
> > +};
> 
> Why isn't this either a 'struct device' itself, or why isn't the 'struct
> pci_device' embedded within it?  What controls the lifetime of this
> device?  Why doesn't it show up in sysfs?  Don't you want it to show up
> in the global device tree?
> 
> > +static DEFINE_PCI_DEVICE_TABLE(ntb_pci_tbl) = {
> > +	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_BWD)},
> > +	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_JSF)},
> > +	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_CLASSIC_JSF)},
> > +	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_RP_JSF)},
> > +	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_RP_SNB)},
> > +	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_SNB)},
> > +	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_CLASSIC_SNB)},
> > +	{0}
> > +};
> > +MODULE_DEVICE_TABLE(pci, ntb_pci_tbl);
> > +
> > +static struct ntb_device *ntbdev;
> 
> You can really only have just one of these in the whole system?  Is that
> wise?  Why isn't it dynamic and tied to the pci device itself as a
> child?

Good point, I will fix that up.

Thanks for the review!

> 
> thanks,
> 
> greg k-h

^ permalink raw reply

* Re: [RFC 1/2] PCI-Express Non-Transparent Bridge Support
From: Greg KH @ 2012-07-16  0:19 UTC (permalink / raw)
  To: Jon Mason; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <20120715235548.GC7551@jonmason-lab>

On Sun, Jul 15, 2012 at 04:55:48PM -0700, Jon Mason wrote:
> On Sat, Jul 14, 2012 at 10:10:15AM -0700, Greg KH wrote:
> > On Fri, Jul 13, 2012 at 02:44:59PM -0700, Jon Mason wrote:
> > > +static int max_num_cbs = 2;
> > > +module_param(max_num_cbs, uint, 0644);
> > > +MODULE_PARM_DESC(max_num_cbs, "Maximum number of NTB transport connections");
> > > +
> > > +static bool no_msix;
> > > +module_param(no_msix, bool, 0644);
> > > +MODULE_PARM_DESC(no_msix, "Do not allow MSI-X interrupts to be selected");
> > 
> > How would a user, or a distro, know to set these options?  Why are they
> > even options at all?
> 
> Good question.  There is actually a potential benefit to disabling
> MSI-X.  The NTB device on one of our platforms only has 3 MSI-X
> vectors.  In the current driver design, that would limit them to 3
> client/virtual devices.  However, there are 15bits in the ISR that can
> be used for the same purpose.  So, if you disable MSI-X, you can have
> 15 instead of 3.  

But again, how would a user, or a distro, know to set these?  Where is
the documentation describing it?  Why really have these options at all
and not just fix the platform issues (only 3 MSI-X vectors?  Really?)

thanks,

greg k-h

^ permalink raw reply

* [PATCH net-next] bonding: Support for multi function NIC devices
From: Anirban Chakraborty @ 2012-07-16  1:08 UTC (permalink / raw)
  To: davem; +Cc: netdev, Dept_NX_Linux_NIC_Driver, Anirban Chakraborty

From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>

Add support to disable bonding of interfaces belonging to the same physical port. In
case of SRIOV or NIC partition mode, a single port of the adapter can have multiple
NIC functions. While bonding such interfaces, it is ensured that the NIC functions
belonging to the same physical port are not bonded together.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
---
 Documentation/networking/ifenslave.c |  208 +++++++++++++++++++++++++++++++++-
 1 files changed, 207 insertions(+), 1 deletions(-)

diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c
index ac5debb..a0bdab9 100644
--- a/Documentation/networking/ifenslave.c
+++ b/Documentation/networking/ifenslave.c
@@ -92,9 +92,14 @@
  *    - 2003/12/01 - Shmulik Hen <shmulik.hen at intel dot com>
  *	 - Code cleanup and style changes
  *	   set version to 1.1.0
+ *
+ *    - 2012/07/15 - Anirban Chakraborty <anirban.chakraborty at qlogic dot com>
+ *	 - Added support to disable bonding interfaces belonging to the
+ *	   same physical port.
+ *	   set version to 1.1.1
  */
 
-#define APP_VERSION	"1.1.0"
+#define APP_VERSION	"1.1.1"
 #define APP_RELDATE	"December 1, 2003"
 #define APP_NAME	"ifenslave"
 
@@ -111,6 +116,10 @@ static const char *usage_msg =
 "       ifenslave -c   <master-if> <slave-if>\n"
 "       ifenslave --help\n";
 
+static const char *misconfig_msg =
+"Use interfaces from different physical port for an ethernet adapter\n"
+"which has multiple NIC functions belonging to the same physical port\n";
+
 static const char *help_msg =
 "\n"
 "       To create a bond device, simply follow these three steps :\n"
@@ -208,6 +217,18 @@ struct dev_ifr {
 	int req_type;
 };
 
+/* port: physical port
+ * bus: PCI bus no.
+ * ifname: interface name
+ * driver: driver for this device
+ */
+struct dev_prop {
+	u8	port;
+	u16	bus;
+	char	ifname[IFNAMSIZ];
+	char	driver[IFNAMSIZ];
+};
+
 struct dev_ifr master_ifra[] = {
 	{&master_mtu,     "SIOCGIFMTU",     SIOCGIFMTU},
 	{&master_flags,   "SIOCGIFFLAGS",   SIOCGIFFLAGS},
@@ -224,6 +245,10 @@ struct dev_ifr slave_ifra[] = {
 
 static void if_print(char *ifname);
 static int get_drv_info(char *master_ifname);
+static int get_bus_info(struct dev_prop *interface);
+static int get_device_info(int count, struct dev_prop *prop, char *slaves[]);
+static int validate_slaves(char *master, char **spp);
+static int valid_slaves(int count, char *interfaces[]);
 static int get_if_settings(char *ifname, struct dev_ifr ifra[]);
 static int get_slave_flags(char *slave_ifname);
 static int set_master_hwaddr(char *master_ifname, struct sockaddr *hwaddr);
@@ -335,6 +360,15 @@ int main(int argc, char *argv[])
 		goto out;
 	}
 
+	/* validate the slave configuration */
+	if (!opt_d) {
+		res = validate_slaves(master_ifname, spp);
+		if (res) {
+			fprintf(stderr, "%s\n", misconfig_msg);
+			goto out;
+		}
+	}
+
 	slave_ifname = *spp++;
 
 	if (slave_ifname == NULL) {
@@ -643,6 +677,178 @@ out:
 	return 0;
 }
 
+/*
+ * Validate if specified interfaces do not belong to the same physical port
+ */
+static int validate_slaves(char *master, char **spp)
+{
+	int i, count = 0, res = 0;
+	struct ifreq ifr;
+	ifbond bond;
+	ifslave slv;
+	char *bslave;
+	char **slaves, **islaves, **tslaves;
+
+	/* Find a count for existing slave interfaces */
+	memset(&ifr, 0, sizeof(ifr));
+	ifr.ifr_data = (ifbond *)&bond;
+	strncpy(ifr.ifr_name, master, IFNAMSIZ);
+	if (ioctl(skfd, SIOCBONDINFOQUERY, &ifr) < 0) {
+		if (errno == EOPNOTSUPP) {
+			saved_errno = errno;
+			return 1;
+		}
+	}
+	islaves = spp;
+	while (*islaves++ != NULL)
+		count++;
+	if (!count)
+		return 0;
+	count += bond.num_slaves;
+	slaves = malloc(sizeof(char) * count * IFNAMSIZ);
+	if (slaves == NULL)
+		return 1;
+	memset(slaves, 0, (sizeof(char) * count * IFNAMSIZ));
+	tslaves = slaves;
+	/* find new interface names */
+	islaves = spp;
+
+	while (*islaves != NULL)
+		memcpy(slaves++, islaves++, IFNAMSIZ);
+	/* find existing slave interface names */
+	for (i = 0; i < bond.num_slaves; i++) {
+		memset(&slv, 0, sizeof(slv));
+		ifr.ifr_data = (ifslave *)&slv;
+		slv.slave_id = i;
+		if (ioctl(skfd, SIOCBONDSLAVEINFOQUERY, &ifr) < 0) {
+			if (errno == EOPNOTSUPP) {
+				saved_errno = errno;
+				res = 1;
+				goto out;
+			}
+		}
+		bslave = slv.slave_name;
+		memcpy(slaves++, &bslave, IFNAMSIZ);
+	}
+	res = valid_slaves(count, tslaves);
+out:
+	if (tslaves)
+		free(tslaves);
+	return res;
+}
+
+static int get_bus_info(struct dev_prop *interface)
+{
+	struct ifreq ifr;
+	struct ethtool_drvinfo info;
+	char *buf[4], *token, *tmp, *end;
+	int i = 0;
+
+	memset(&ifr, 0, sizeof(ifr));
+	strncpy(ifr.ifr_name, interface->ifname, IFNAMSIZ);
+	ifr.ifr_data = (caddr_t)&info;
+
+	info.cmd = ETHTOOL_GDRVINFO;
+	if (ioctl(skfd, SIOCETHTOOL, &ifr) < 0) {
+		if (errno == EOPNOTSUPP)
+			goto out;
+		saved_errno = errno;
+		v_print("Slave '%s': Error: get bonding info failed %s\n",
+			interface->ifname, strerror(saved_errno));
+		return 1;
+	}
+	memcpy(interface->driver, info.driver, strlen(info.driver) + 1);
+	token = strtok_r(info.bus_info, " :", &tmp);
+	buf[i] =  token;
+	while (token) {
+		token = strtok_r(NULL, " :.", &tmp);
+		buf[++i] = token;
+	}
+	interface->bus = strtoul(buf[1], &end, 16);
+	return 0;
+out:
+	return 1;
+}
+
+static int get_device_info(int count, struct dev_prop *prop, char *slaves[])
+{
+	int ret = -1, i;
+	struct ifreq ifr;
+	struct ethtool_cmd info;
+
+	for (i = 0; i < count; i++) {
+		strncpy(prop[i].ifname, slaves[i], IFNAMSIZ);
+		memset(&ifr, 0, sizeof(ifr));
+		strncpy(ifr.ifr_name, prop[i].ifname, IFNAMSIZ);
+		ifr.ifr_data = (caddr_t)&info;
+
+		info.cmd = ETHTOOL_GSET;
+
+		if (ioctl(skfd, SIOCETHTOOL, &ifr) < 0) {
+			if (errno == EOPNOTSUPP)
+				goto out;
+			saved_errno = errno;
+			v_print("Slave '%s': Error: get bonding info failed"
+				" %s\n", prop[i].ifname,
+				strerror(saved_errno));
+			ret = 1;
+			goto out;
+		}
+		prop[i].port = info.phy_address;
+		ret = get_bus_info(&prop[i]);
+		if (ret)
+			goto out;
+	}
+out:
+	return ret;
+}
+
+/* For a given set of interfaces, find out if they belong to the
+ * same physical port. Return true if two interfaces are found to
+ * be from same physical port, otherwise return false.
+ */
+
+static int valid_slaves(int count, char *slaves[])
+{
+	int i, j, ret = 0;
+	struct dev_prop *ifprop, *searchif;
+	struct dev_prop *prop;
+
+	prop = malloc(count * sizeof(struct dev_prop));
+	if (prop == NULL)
+		return 1;
+
+	memset(prop, 0, sizeof(struct dev_prop) * count);
+	ret = get_device_info(count, prop, slaves);
+	/* Iterate over the array of interfaces and find a match */
+	for (j = 0; j < count; j++) {
+		ifprop = &prop[j];
+		for (i = j + 1; i < count; i++) {
+			searchif = &prop[i];
+			/* Compare driver names */
+			if (!strncmp(ifprop->driver, searchif->driver, IFNAMSIZ)
+				&& !strncmp(ifprop->ifname, searchif->ifname,
+				IFNAMSIZ))
+				continue;
+			/* Compare physical port and bus of the interfaces */
+			if ((searchif->bus == ifprop->bus) &&
+				(searchif->port == ifprop->port)) {
+				ret = 1;
+				fprintf(stderr,
+					"slave interfaces %s and %s "
+					"belong to the same physical "
+					"port of the adapter\n",
+					searchif->ifname, ifprop->ifname);
+				goto out;
+			}
+		}
+	}
+out:
+	free(prop);
+	prop = NULL;
+	return ret;
+}
+
 static int change_active(char *master_ifname, char *slave_ifname)
 {
 	struct ifreq ifr;
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH] ipvs: fixed sparse warning
From: Simon Horman @ 2012-07-16  1:39 UTC (permalink / raw)
  To: Claudiu Ghioc
  Cc: netdev, davem, netfilter-devel, linux-kernel, wensong, ja, pablo,
	kaber, Claudiu Ghioc
In-Reply-To: <1342198642-22741-1-git-send-email-claudiu.ghioc@gmail.com>

On Fri, Jul 13, 2012 at 07:57:22PM +0300, Claudiu Ghioc wrote:
> Removed the following sparse warnings:
> *	warning: symbol 'ip_vs_control_net_init_sysctl' was not declared. Should
> 	it be static?
> *	warning: symbol 'ip_vs_control_net_cleanup_sysctl' was not
> 	declared. Should it be static?

Thanks.

I wonder if you could post a version of this patch which also fixes the
alternate implementation of ip_vs_control_net_init_sysctl around line 3757 and
the alternate implementation of ip_vs_control_net_cleanup_sysctl that
appears around line 3758.

I believe that you will see sparse warnings for these if
you disable CONFIG_SYSCTL by disabling CONFIG_PROC_SYSCTL.

> 
> Signed-off-by: Claudiu Ghioc <claudiu.ghioc@gmail.com>
> ---
>  net/netfilter/ipvs/ip_vs_ctl.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index d43e3c1..3542c6b1 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -3674,7 +3674,7 @@ static void ip_vs_genl_unregister(void)
>   * per netns intit/exit func.
>   */
>  #ifdef CONFIG_SYSCTL
> -int __net_init ip_vs_control_net_init_sysctl(struct net *net)
> +static int __net_init ip_vs_control_net_init_sysctl(struct net *net)
>  {
>  	int idx;
>  	struct netns_ipvs *ipvs = net_ipvs(net);
> @@ -3742,7 +3742,7 @@ int __net_init ip_vs_control_net_init_sysctl(struct net *net)
>  	return 0;
>  }
>  
> -void __net_exit ip_vs_control_net_cleanup_sysctl(struct net *net)
> +static void __net_exit ip_vs_control_net_cleanup_sysctl(struct net *net)
>  {
>  	struct netns_ipvs *ipvs = net_ipvs(net);
>  
> -- 
> 1.7.10.4
> 

^ permalink raw reply

* Re: [PATCH net-next] bonding: Support for multi function NIC devices
From: Jay Vosburgh @ 2012-07-16  1:40 UTC (permalink / raw)
  To: Anirban Chakraborty; +Cc: davem, netdev, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1342400890-32183-1-git-send-email-anirban.chakraborty@qlogic.com>

Anirban Chakraborty <anirban.chakraborty@qlogic.com> wrote:

>From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>
>Add support to disable bonding of interfaces belonging to the same physical port. In
>case of SRIOV or NIC partition mode, a single port of the adapter can have multiple
>NIC functions. While bonding such interfaces, it is ensured that the NIC functions
>belonging to the same physical port are not bonded together.
>
>Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>---
> Documentation/networking/ifenslave.c |  208 +++++++++++++++++++++++++++++++++-
> 1 files changed, 207 insertions(+), 1 deletions(-)
>
>diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c
>index ac5debb..a0bdab9 100644
>--- a/Documentation/networking/ifenslave.c
>+++ b/Documentation/networking/ifenslave.c
>@@ -92,9 +92,14 @@
>  *    - 2003/12/01 - Shmulik Hen <shmulik.hen at intel dot com>
>  *	 - Code cleanup and style changes
>  *	   set version to 1.1.0
>+ *
>+ *    - 2012/07/15 - Anirban Chakraborty <anirban.chakraborty at qlogic dot com>
>+ *	 - Added support to disable bonding interfaces belonging to the
>+ *	   same physical port.
>+ *	   set version to 1.1.1

	This patch is all implemented within the ifenslave user space
program, which, to my knowledge, is not currently used by any major
distro to configure bonding.

	The configuration for bonding is typically performed by packages
such as initscripts or sysconfig, and this functionality would likely
need to go there.

	The only real use for ifenslave.c is on kernels without sysfs
compiled in.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH net] caif: Fix access to freed pernet memory
From: Eric W. Biederman @ 2012-07-16  1:43 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: davem, netdev, sjurbren, Dmitry Tarnyagin
In-Reply-To: <1342383014-5525-1-git-send-email-sjur.brandeland@stericsson.com>

sjur.brandeland@stericsson.com writes:

> From: Sjur Brændeland <sjur.brandeland@stericsson.com>
>
> unregister_netdevice_notifier() must be called before
> unregister_pernet_subsys() to avoid accessing already freed
> pernet memory. This fixes the following oops when doing rmmod:

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

The patch looks good to me.

You might want to look at error handling during caif_device_init
as well, as it looks like the same ordering issue is present.
Although unlikely to bite anyone at that point.

Eric

>
> Call Trace:
>  [<ffffffffa0f802bd>] caif_device_notify+0x4d/0x5a0 [caif]
>  [<ffffffff81552ba9>] unregister_netdevice_notifier+0xb9/0x100
>  [<ffffffffa0f86dcc>] caif_device_exit+0x1c/0x250 [caif]
>  [<ffffffff810e7734>] sys_delete_module+0x1a4/0x300
>  [<ffffffff810da82d>] ? trace_hardirqs_on_caller+0x15d/0x1e0
>  [<ffffffff813517de>] ? trace_hardirqs_on_thunk+0x3a/0x3
>  [<ffffffff81696bad>] system_call_fastpath+0x1a/0x1f
>
> RIP
>  [<ffffffffa0f7f561>] caif_get+0x51/0xb0 [caif]
>
> Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
> ---
>
> Hi Dave,
>
> Can you please queue up this bugfix as appropriate for -net and -stable?
>
> I guess this bug has been around since introduction of network
> name spaces in CAIF, but it became visible after 3.4 and the commit:
> "net: In unregister_netdevice_notifier unregister the netdevices." 
>
> Thanks,
> Sjur
>
>  net/caif/caif_dev.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/caif/caif_dev.c b/net/caif/caif_dev.c
> index 554b312..8c83c17 100644
> --- a/net/caif/caif_dev.c
> +++ b/net/caif/caif_dev.c
> @@ -561,9 +561,9 @@ static int __init caif_device_init(void)
>  
>  static void __exit caif_device_exit(void)
>  {
> -	unregister_pernet_subsys(&caif_net_ops);
>  	unregister_netdevice_notifier(&caif_device_notifier);
>  	dev_remove_pack(&caif_packet_type);
> +	unregister_pernet_subsys(&caif_net_ops);
>  }
>  
>  module_init(caif_device_init);

^ permalink raw reply

* Re: [PATCH net-next] bonding: Support for multi function NIC devices
From: John Fastabend @ 2012-07-16  4:39 UTC (permalink / raw)
  To: Jay Vosburgh, Anirban Chakraborty; +Cc: davem, netdev, Dept_NX_Linux_NIC_Driver
In-Reply-To: <20657.1342402819@death.nxdomain>

On 7/15/2012 6:40 PM, Jay Vosburgh wrote:
> Anirban Chakraborty <anirban.chakraborty@qlogic.com> wrote:
>
>> From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>>
>> Add support to disable bonding of interfaces belonging to the same physical port. In
>> case of SRIOV or NIC partition mode, a single port of the adapter can have multiple
>> NIC functions. While bonding such interfaces, it is ensured that the NIC functions
>> belonging to the same physical port are not bonded together.
>>
>> Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>> ---
>> Documentation/networking/ifenslave.c |  208 +++++++++++++++++++++++++++++++++-
>> 1 files changed, 207 insertions(+), 1 deletions(-)
>>
>> diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c
>> index ac5debb..a0bdab9 100644
>> --- a/Documentation/networking/ifenslave.c
>> +++ b/Documentation/networking/ifenslave.c
>> @@ -92,9 +92,14 @@
>>   *    - 2003/12/01 - Shmulik Hen <shmulik.hen at intel dot com>
>>   *	 - Code cleanup and style changes
>>   *	   set version to 1.1.0
>> + *
>> + *    - 2012/07/15 - Anirban Chakraborty <anirban.chakraborty at qlogic dot com>
>> + *	 - Added support to disable bonding interfaces belonging to the
>> + *	   same physical port.
>> + *	   set version to 1.1.1
>
> 	This patch is all implemented within the ifenslave user space
> program, which, to my knowledge, is not currently used by any major
> distro to configure bonding.
>
> 	The configuration for bonding is typically performed by packages
> such as initscripts or sysconfig, and this functionality would likely
> need to go there.
>
> 	The only real use for ifenslave.c is on kernels without sysfs
> compiled in.
>
> 	-J
>

Also I'm not sure we need to explicitly block this.  It is clear from 
looking at 'ip' output what the topology is. And in the SR-IOV
case would this still work if the functions are direct assigned? How
about if I try to bond two stacked devices that are on the same
physical link. In both case iirc the bus info wont match up.

Seems easier to just call this a configuration error or not if for
some reason this is really what someone intended.

.John

^ permalink raw reply

* Re: [PATCH net-next] bonding: Support for multi function NIC devices
From: Anirban Chakraborty @ 2012-07-16  5:12 UTC (permalink / raw)
  To: John Fastabend, Jay Vosburgh
  Cc: David Miller, netdev, Dept-NX Linux NIC Driver
In-Reply-To: <50039B11.1010006@intel.com>



On 7/15/12 9:39 PM, "John Fastabend" <john.r.fastabend@intel.com> wrote:

>On 7/15/2012 6:40 PM, Jay Vosburgh wrote:
>> Anirban Chakraborty <anirban.chakraborty@qlogic.com> wrote:
>>
>>> From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>>>
>>> Add support to disable bonding of interfaces belonging to the same
>>>physical port. In
>>> case of SRIOV or NIC partition mode, a single port of the adapter can
>>>have multiple
>>> NIC functions. While bonding such interfaces, it is ensured that the
>>>NIC functions
>>> belonging to the same physical port are not bonded together.
>>>
>>> Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>>> ---
>>> Documentation/networking/ifenslave.c |  208
>>>+++++++++++++++++++++++++++++++++-
>>> 1 files changed, 207 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/Documentation/networking/ifenslave.c
>>>b/Documentation/networking/ifenslave.c
>>> index ac5debb..a0bdab9 100644
>>> --- a/Documentation/networking/ifenslave.c
>>> +++ b/Documentation/networking/ifenslave.c
>>> @@ -92,9 +92,14 @@
>>>   *    - 2003/12/01 - Shmulik Hen <shmulik.hen at intel dot com>
>>>   *	 - Code cleanup and style changes
>>>   *	   set version to 1.1.0
>>> + *
>>> + *    - 2012/07/15 - Anirban Chakraborty <anirban.chakraborty at
>>>qlogic dot com>
>>> + *	 - Added support to disable bonding interfaces belonging to the
>>> + *	   same physical port.
>>> + *	   set version to 1.1.1
>>
>> 	This patch is all implemented within the ifenslave user space
>> program, which, to my knowledge, is not currently used by any major
>> distro to configure bonding.
>>
>> 	The configuration for bonding is typically performed by packages
>> such as initscripts or sysconfig, and this functionality would likely
>> need to go there.
>>
>> 	The only real use for ifenslave.c is on kernels without sysfs
>> compiled in.
>>
>> 	-J
>>
>
>Also I'm not sure we need to explicitly block this.  It is clear from
>looking at 'ip' output what the topology is. And in the SR-IOV
>case would this still work if the functions are direct assigned? How
>about if I try to bond two stacked devices that are on the same
>physical link. In both case iirc the bus info wont match up.
>
>Seems easier to just call this a configuration error or not if for
>some reason this is really what someone intended.
>
>.John

I agree that for SR-IOV case with VFs assigned directly to the guest, bus
info won't
match up. However, I was thinking from the point of view of NIC
partitioned mode (NPAR),
and for the use case of SR-IOV VFs assigned to the hypervisor. It would be
nice to
prevent the user from getting into misconfiguration.

-Anirban

^ permalink raw reply

* Re: iptables CLAMP MSS to PMTU not working?
From: Timo Teras @ 2012-07-16  5:49 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev
In-Reply-To: <20120712132419.50b4acaf@vostro>

On Thu, 12 Jul 2012 13:24:19 +0300 Timo Teras <timo.teras@iki.fi> wrote:

> On Thu, 12 Jul 2012 12:00:21 +0300 Timo Teras <timo.teras@iki.fi>
> wrote:
> 
> > We recently noticed that CLAMPMSS to path MTU does not seem to be
> > working properly. Most recently tested version is linux-3.3.6 which
> > does not work. linux-2.6.35 works for sure, but I suspect it to have
> > broken somewhere around 3.0'ish with the inetpeer changes.
> > 
> > In my case, the destination is on gre tunnel (that gets routed to
> > Internet over IPsec transport mode).
> > 
> > 'ip route' command verifies that in both boxes the path-MTU is
> > detected properly. That, is on both cases the static route MTU is
> > higher. And after large packets sent, ICMP frag-needed is received
> > and the cache route is updated properly.
> > 
> > On the new kernel, I get info like:
> > # ip route get 10.x.x.x
> > 10.x.x.x via 172.16.y.y dev gre1  src 172.16.z.z 
> >     cache  expires 68sec ipid 0x3153 mtu 1422
> 
> CLAMP MSS sets MSS to 1432. Which implies MTU 1472. This matches the
> gre1 interface MTU:
> 
> 14: gre1: <UP,LOWER_UP> mtu 1472 qdisc noqueue state UNKNOWN 
> 
> So apparently CLAMPMSS is honoring the static route for gre1, instead
> of the cached pmtu route.
> 
> > And the older kernel:
> > # ip route get 10.x.x.x
> > 10.x.x.x via 172.16.y.y dev gre1  src 172.16.z.z 
> >     cache  expires 595sec ipid 0xd241 mtu 1422 advmss 1432 hoplimit
> > 64
> > 
> > For some reason, iptables CLAMPMSS seems to set incorrect MSS for
> > this route (or maybe it's using the static route instead?).
> 
> And in this case MSS is set to 1382. That is, it's properly calculated
> from the path MTU (1422-40=1382). I would expect the advmss of the
> cached route to get updated on the TCP connects on the older kernels
> (the above paste is after pinging with large packets and no TCP
> connection done for the cached entry).

Looking at the changelog, this would likely be side effect of:

commit 261663b0ee2ee8e3947f4c11c1a08be18cd2cea1
Author: Steffen Klassert <steffen.klassert@secunet.com>
Date:   Wed Nov 23 02:14:50 2011 +0000

    ipv4: Don't use the cached pmtu informations for input routes

At least from performance side, it would be better if CLAMPMSS to PMTU
would clamp to the learned, cached mtu.

^ permalink raw reply

* Re: [PATCH net-next] bonding: Support for multi function NIC devices
From: Jay Vosburgh @ 2012-07-16  5:50 UTC (permalink / raw)
  To: Anirban Chakraborty
  Cc: John Fastabend, David Miller, netdev, Dept-NX Linux NIC Driver
In-Reply-To: <CC28EE8B.36B53%anirban.chakraborty@qlogic.com>

Anirban Chakraborty <anirban.chakraborty@qlogic.com> wrote:
>On 7/15/12 9:39 PM, "John Fastabend" <john.r.fastabend@intel.com> wrote:
[...]
>>Also I'm not sure we need to explicitly block this.  It is clear from
>>looking at 'ip' output what the topology is. And in the SR-IOV
>>case would this still work if the functions are direct assigned? How
>>about if I try to bond two stacked devices that are on the same
>>physical link. In both case iirc the bus info wont match up.
>>
>>Seems easier to just call this a configuration error or not if for
>>some reason this is really what someone intended.
>>
>>.John
>
>I agree that for SR-IOV case with VFs assigned directly to the guest, bus
>info won't
>match up. However, I was thinking from the point of view of NIC
>partitioned mode (NPAR),
>and for the use case of SR-IOV VFs assigned to the hypervisor. It would be
>nice to
>prevent the user from getting into misconfiguration.

	If I'm understanding correctly, to hit the case you're worried
about here would require assigning multiple VFs from one PF to the same
linux instance as the PF itself, and then bonding those VFs together.

	Heck, there might be some arcane reason that somebody wants to
do that on purpose, or the test may inadvertently prohibit legal
configurations that happen to match the criteria.

	Has this been a real problem in practice?  I'm not seeing a
compelling argument for doing this.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH net-next] bonding: Support for multi function NIC devices
From: Anirban Chakraborty @ 2012-07-16  6:10 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: John Fastabend, David Miller, netdev, Dept-NX Linux NIC Driver
In-Reply-To: <22437.1342417803@death.nxdomain>



On 7/15/12 10:50 PM, "Jay Vosburgh" <fubar@us.ibm.com> wrote:

>Anirban Chakraborty <anirban.chakraborty@qlogic.com> wrote:
>>On 7/15/12 9:39 PM, "John Fastabend" <john.r.fastabend@intel.com> wrote:
>[...]
>>>Also I'm not sure we need to explicitly block this.  It is clear from
>>>looking at 'ip' output what the topology is. And in the SR-IOV
>>>case would this still work if the functions are direct assigned? How
>>>about if I try to bond two stacked devices that are on the same
>>>physical link. In both case iirc the bus info wont match up.
>>>
>>>Seems easier to just call this a configuration error or not if for
>>>some reason this is really what someone intended.
>>>
>>>.John
>>
>>I agree that for SR-IOV case with VFs assigned directly to the guest, bus
>>info won't
>>match up. However, I was thinking from the point of view of NIC
>>partitioned mode (NPAR),
>>and for the use case of SR-IOV VFs assigned to the hypervisor. It would
>>be
>>nice to
>>prevent the user from getting into misconfiguration.
>
>	If I'm understanding correctly, to hit the case you're worried
>about here would require assigning multiple VFs from one PF to the same
>linux instance as the PF itself, and then bonding those VFs together.
>
>	Heck, there might be some arcane reason that somebody wants to
>do that on purpose, or the test may inadvertently prohibit legal
>configurations that happen to match the criteria.
>
>	Has this been a real problem in practice?  I'm not seeing a
>compelling argument for doing this.
>
>	-J

The real problem that we have faced so far is the case of NPAR, where
multiple
physical functions are assigned to the linux host and the customer tried
to create
a bond of interfaces from the same physical port. In this case, linux host
was running
without any guest OS and the NICs are used for carrying host traffic only.

-Anirban

^ permalink raw reply

* (unknown)
From: Tess Bradley @ 2012-07-16  6:14 UTC (permalink / raw)




 Need urgent Loan? email me for more info.

^ permalink raw reply

* Re: 3.4.4/amd64 full interrupt hangs under big nfs copies
From: Eric Dumazet @ 2012-07-16  6:18 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev
In-Reply-To: <20120715215935.GF24420@merlins.org>

On Sun, 2012-07-15 at 14:59 -0700, Marc MERLIN wrote:
> On Tue, Apr 10, 2012 at 10:27:33PM -0700, Marc MERLIN wrote:
> > On Tue, Apr 10, 2012 at 08:11:03AM +0200, Eric Dumazet wrote:
> > > Please try following patch, as it solved the problem for me (no more
> > > order-1 allocations in tx path)
> > 
> > I applied our patch to 3.3.1 and cannot reproduce the problem anymore.
> > 
> > I'll leave a big wireless copy running overnight just in case, but I think
> > you fixed it.
> 
> Mmmh, so I'm running 3.4.4 and I had another full machine hang while copying
> big files (gigabytes) over wireless via NFS.
> The laptop self recovered after 5mn or so (mouse cursor would not even
> move) and I was able to kill -9 the process (midnight commander).
> mc did not actually stop for another 4mn or so (i.e. it took that long for
> the process to come out of kernel hung state), but the machine was usable
> during that time.
> Note that copying the same data with scp works fine.
> NFS mount looks like this:
> gargamel:/mnt/dshelf2/ /net/gargamel/mnt/dshelf2 nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.205.7,local_lock=none,addr=192.168.205.3 0 0
> 
> I didn't have anything like last time in the kernel logs, and more
> annoyingly, ps -elf does not show anything for any process in WCHAN,
> making pointing the finger a bit harder (procps-ng 3.3.3 does not show
> anything other than '-' in WCHAN for any process with 3.4.4).
> 
> My understanding is that user space calling drivers that shut off all
> interrupts for extended periods of time (as least I think so since my mouse
> cursor would not move), is still a kernel bug.
> 
> For what it's worth, copying 1GB of data in lots of small files does not
> cause problems, it seems that it's big files that cause a problem since they
> likely fill a buffer somewhere while interrupts are disabled.
> 
> Do you have an idea of how I can find out where my mc process is stuck in
> the kernel?
> Should I reproduce with specific sysrq output?

Just to clarify, you get this freeze when transferring a big file from a
remote NFS server to your PC, (aka a download), not the reverse way ?

If so, you might hit OOM condition because iwlwifi uses big/fat RX
buffers, I never understood why...

(amsdu_size_8K = 1)

Storing an MTU=1500 frams in 8KB of memory sounds really bad.


diff --git a/drivers/net/wireless/iwlwifi/iwl-drv.c b/drivers/net/wireless/iwlwifi/iwl-drv.c
index cc41cfa..434b924 100644
--- a/drivers/net/wireless/iwlwifi/iwl-drv.c
+++ b/drivers/net/wireless/iwlwifi/iwl-drv.c
@@ -1006,7 +1006,7 @@ void iwl_drv_stop(struct iwl_drv *drv)
 
 /* shared module parameters */
 struct iwl_mod_params iwlwifi_mod_params = {
-	.amsdu_size_8K = 1,
+	.amsdu_size_8K = 0,
 	.restart_fw = 1,
 	.plcp_check = true,
 	.bt_coex_active = true,

^ permalink raw reply related

* Re: iptables CLAMP MSS to PMTU not working?
From: Timo Teras @ 2012-07-16  6:20 UTC (permalink / raw)
  To: David S. Miller, Steffen Klassert; +Cc: netdev
In-Reply-To: <20120716084946.67b91a69@vostro>

On Mon, 16 Jul 2012 08:49:46 +0300 Timo Teras <timo.teras@iki.fi> wrote:

> On Thu, 12 Jul 2012 13:24:19 +0300 Timo Teras <timo.teras@iki.fi>
> wrote:
> 
> > On Thu, 12 Jul 2012 12:00:21 +0300 Timo Teras <timo.teras@iki.fi>
> > wrote:
> > 
> > > We recently noticed that CLAMPMSS to path MTU does not seem to be
> > > working properly. Most recently tested version is linux-3.3.6
> > > which does not work. linux-2.6.35 works for sure, but I suspect
> > > it to have broken somewhere around 3.0'ish with the inetpeer
> > > changes.
> > > 
> > > In my case, the destination is on gre tunnel (that gets routed to
> > > Internet over IPsec transport mode).
> > > 
> > > 'ip route' command verifies that in both boxes the path-MTU is
> > > detected properly. That, is on both cases the static route MTU is
> > > higher. And after large packets sent, ICMP frag-needed is received
> > > and the cache route is updated properly.
> > > 
> > > On the new kernel, I get info like:
> > > # ip route get 10.x.x.x
> > > 10.x.x.x via 172.16.y.y dev gre1  src 172.16.z.z 
> > >     cache  expires 68sec ipid 0x3153 mtu 1422
> > 
> > CLAMP MSS sets MSS to 1432. Which implies MTU 1472. This matches the
> > gre1 interface MTU:
> > 
> > 14: gre1: <UP,LOWER_UP> mtu 1472 qdisc noqueue state UNKNOWN 
> > 
> > So apparently CLAMPMSS is honoring the static route for gre1,
> > instead of the cached pmtu route.
> > 
> > > And the older kernel:
> > > # ip route get 10.x.x.x
> > > 10.x.x.x via 172.16.y.y dev gre1  src 172.16.z.z 
> > >     cache  expires 595sec ipid 0xd241 mtu 1422 advmss 1432
> > > hoplimit 64
> > > 
> > > For some reason, iptables CLAMPMSS seems to set incorrect MSS for
> > > this route (or maybe it's using the static route instead?).
> > 
> > And in this case MSS is set to 1382. That is, it's properly
> > calculated from the path MTU (1422-40=1382). I would expect the
> > advmss of the cached route to get updated on the TCP connects on
> > the older kernels (the above paste is after pinging with large
> > packets and no TCP connection done for the cached entry).
> 
> Looking at the changelog, this would likely be side effect of:
> 
> commit 261663b0ee2ee8e3947f4c11c1a08be18cd2cea1
> Author: Steffen Klassert <steffen.klassert@secunet.com>
> Date:   Wed Nov 23 02:14:50 2011 +0000
> 
>     ipv4: Don't use the cached pmtu informations for input routes
> 
> At least from performance side, it would be better if CLAMPMSS to PMTU
> would clamp to the learned, cached mtu.

Actually, this is worse. Since XFRM is ignored - it breaks
fragmentation for IPsec targets.

Could this be reverted?

^ permalink raw reply

* RE: Is TCP vulneribility patch (as in RFC 5961) done in linux?
From: Kiran (Kiran Kumar) Kella @ 2012-07-16  7:06 UTC (permalink / raw)
  To: netdev@vger.kernel.org
In-Reply-To: <68700EDA775E5E47B5EBA9FF8AC0F15C076B03@SJEXCHMB09.corp.ad.broadcom.com>

Looking into the file tcp_input.c in the latest stable linux release 3.4.4 source, I understand the fix for this recommendation is not implemented in Linux.
Any reason why it was not addressed?

Regards,
Kiran

-----Original Message-----
From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Kiran (Kiran Kumar) Kella
Sent: Friday, July 13, 2012 11:48 AM
To: netdev@vger.kernel.org
Subject: Is TCP vulneribility patch (as in RFC 5961) done in linux?

Hi,

I just now checked in the kernel archives if the patch in section 3.2 mentioned in RFC 5961 for RST attacks with predictable sequence numbers.
I see some discussion happened in 2004 timeframe.
I was just wondering if in the latest linux source, the patch is made available.

Appreciate your quick response in this regard.

Thanks,
Kiran

^ permalink raw reply

* Re: iptables CLAMP MSS to PMTU not working?
From: Steffen Klassert @ 2012-07-16  7:23 UTC (permalink / raw)
  To: Timo Teras; +Cc: David S. Miller, netdev
In-Reply-To: <20120716092058.270f6008@vostro>

On Mon, Jul 16, 2012 at 09:20:58AM +0300, Timo Teras wrote:
> On Mon, 16 Jul 2012 08:49:46 +0300 Timo Teras <timo.teras@iki.fi> wrote:
> 
> > Looking at the changelog, this would likely be side effect of:
> > 
> > commit 261663b0ee2ee8e3947f4c11c1a08be18cd2cea1
> > Author: Steffen Klassert <steffen.klassert@secunet.com>
> > Date:   Wed Nov 23 02:14:50 2011 +0000
> > 
> >     ipv4: Don't use the cached pmtu informations for input routes
> > 
> > At least from performance side, it would be better if CLAMPMSS to PMTU
> > would clamp to the learned, cached mtu.
> 
> Actually, this is worse. Since XFRM is ignored - it breaks
> fragmentation for IPsec targets.
> 
> Could this be reverted?

I did this patch to avoid to propagate learned PMTU informations.
It restores the behaviour we had before we moved the PMTU informations
to the inetpeer. Unfortunately CLAMPMSS really wants to have the PMTU
informations of an input route, which is not possible any more after
this patch.

Anyway, this patch seems to be obsolete in the net-next tree, as
the cached pmtu informations are back in the route. So we should remove
the check for an output route from ipv4_mtu() in the net-next tree.
This should bring CLAMPMSS back to work, at least for upcoming
kernel versions.

^ permalink raw reply

* Re: [PATCH 2/6] drivers/net/can/softing/softing_main.c: ensure a consistent return value in error case
From: Marc Kleine-Budde @ 2012-07-16  7:34 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Wolfgang Grandegger, kernel-janitors, linux-can, netdev,
	linux-kernel
In-Reply-To: <1342284188-19176-3-git-send-email-Julia.Lawall@lip6.fr>

[-- Attachment #1: Type: text/plain, Size: 2284 bytes --]

On 07/14/2012 06:43 PM, Julia Lawall wrote:
> From: Julia Lawall <Julia.Lawall@lip6.fr>
> 
> Typically, the return value desired for the failure of a function with an
> integer return value is a negative integer.  In these cases, the return
> value is sometimes a negative integer and sometimes 0, due to a subsequent
> initialization of the return variable within the loop.
> 
> A simplified version of the semantic match that finds this problem is:
> (http://coccinelle.lip6.fr/)
> 
> //<smpl>
> @r exists@
> identifier ret;
> position p;
> constant C;
> expression e1,e3,e4;
> statement S;
> @@
> 
> ret = -C
> ... when != ret = e3
>     when any
> if@p (...) S
> ... when any
> if (\(ret != 0\|ret < 0\|ret > 0\) || ...) { ... return ...; }
> ... when != ret = e3
>     when any
> *if@p (...)
> {
>   ... when != ret = e4
>   return ret;
> }
> //</smpl>
> 
> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>

Thanks, applied to can-next

Marc
> 
> ---
>  drivers/net/can/softing/softing_main.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/can/softing/softing_main.c b/drivers/net/can/softing/softing_main.c
> index a7c77c7..f2a221e 100644
> --- a/drivers/net/can/softing/softing_main.c
> +++ b/drivers/net/can/softing/softing_main.c
> @@ -826,12 +826,12 @@ static __devinit int softing_pdev_probe(struct platform_device *pdev)
>  		goto sysfs_failed;
>  	}
>  
> -	ret = -ENOMEM;
>  	for (j = 0; j < ARRAY_SIZE(card->net); ++j) {
>  		card->net[j] = netdev =
>  			softing_netdev_create(card, card->id.chip[j]);
>  		if (!netdev) {
>  			dev_alert(&pdev->dev, "failed to make can[%i]", j);
> +			ret = -ENOMEM;
>  			goto netdev_failed;
>  		}
>  		priv = netdev_priv(card->net[j]);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* Re: iptables CLAMP MSS to PMTU not working?
From: Timo Teras @ 2012-07-16  7:55 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David S. Miller, netdev
In-Reply-To: <20120716072305.GJ1869@secunet.com>

On Mon, 16 Jul 2012 09:23:05 +0200 Steffen Klassert
<steffen.klassert@secunet.com> wrote:

> On Mon, Jul 16, 2012 at 09:20:58AM +0300, Timo Teras wrote:
> > On Mon, 16 Jul 2012 08:49:46 +0300 Timo Teras <timo.teras@iki.fi>
> > wrote:
> > 
> > > Looking at the changelog, this would likely be side effect of:
> > > 
> > > commit 261663b0ee2ee8e3947f4c11c1a08be18cd2cea1
> > > Author: Steffen Klassert <steffen.klassert@secunet.com>
> > > Date:   Wed Nov 23 02:14:50 2011 +0000
> > > 
> > >     ipv4: Don't use the cached pmtu informations for input routes
> > > 
> > > At least from performance side, it would be better if CLAMPMSS to
> > > PMTU would clamp to the learned, cached mtu.
> > 
> > Actually, this is worse. Since XFRM is ignored - it breaks
> > fragmentation for IPsec targets.
> > 
> > Could this be reverted?
> 
> I did this patch to avoid to propagate learned PMTU informations.
> It restores the behaviour we had before we moved the PMTU informations
> to the inetpeer. Unfortunately CLAMPMSS really wants to have the PMTU
> informations of an input route, which is not possible any more after
> this patch.
>
> Anyway, this patch seems to be obsolete in the net-next tree, as
> the cached pmtu informations are back in the route. So we should
> remove the check for an output route from ipv4_mtu() in the net-next
> tree. This should bring CLAMPMSS back to work, at least for upcoming
> kernel versions.

Right, saw those commits. But before net-next hits release, I'd really
need a fix for 3.3/3.4/3.5. Non-working fragmentation with IPsec, and
this CLAMPMSS thingy are an upgrade stopper for me.

Would it be safe to just revert this commit, with the side-effect of
exposing cached pmtu too agressively?

Or would it be better to try to backport the relevant changes of moving
pmtu back to route table?

^ permalink raw reply

* [PATCH] ipv6: fix incorrect route 'expires' value passed to userspace.
From: Li Wei @ 2012-07-16  8:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Stephen Hemminger


When userspace use RTM_GETROUTE to dump route table, with a already
expired route entry, we always got an 'expires' value(2147157)
calculated base on INT_MAX.

The reason of this problem is in the following satement:
	rt->dst.expires - jiffies < INT_MAX
gcc promoted the type of both sides of '<' to unsigned long, thus
a small negative value would be considered greater than INT_MAX.

This patch fix this by cast the result of subtraction to an 'int'
which I think is large enough for the expires.

Also we should do some fix in rtnl_put_cacheinfo() which use
jiffies_to_clock_t(which take an unsigned log as parameter) to
convert jiffies to clock_t to handle the negative expires.
---
 net/core/rtnetlink.c |    3 ++-
 net/ipv6/route.c     |    2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 21318d1..f92f3d8 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -641,7 +641,8 @@ int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst, u32 id,
 	};
 
 	if (expires)
-		ci.rta_expires = jiffies_to_clock_t(expires);
+		ci.rta_expires = expires > 0 ? jiffies_to_clock_t(expires)
+			: -jiffies_to_clock_t(-expires);
 
 	return nla_put(skb, RTA_CACHEINFO, sizeof(ci), &ci);
 }
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index becb048..a7fec9d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2516,7 +2516,7 @@ static int rt6_fill_node(struct net *net,
 		goto nla_put_failure;
 	if (!(rt->rt6i_flags & RTF_EXPIRES))
 		expires = 0;
-	else if (rt->dst.expires - jiffies < INT_MAX)
+	else if ((int)(rt->dst.expires - jiffies) < INT_MAX)
 		expires = rt->dst.expires - jiffies;
 	else
 		expires = INT_MAX;
-- 
1.7.1

^ permalink raw reply related

* Re: linux-next: manual merge of the net-next tree with the infiniband tree
From: Jack Morgenstein @ 2012-07-16  8:33 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David Miller, netdev, linux-next, linux-kernel, Roland Dreier,
	linux-rdma, Hadar Hen Zion, Or Gerlitz
In-Reply-To: <20120712120950.223be053857381046b7d5db6@canb.auug.org.au>

On Thursday 12 July 2012 05:09, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the net-next tree got a conflict in
> drivers/net/ethernet/mellanox/mlx4/main.c between commit 6634961c14d3
> ("mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct
> and paravirtualize them") from the infiniband tree and commit
> 0ff1fb654bec ("{NET, IB}/mlx4: Add device managed flow steering firmware
> API") from the net-next tree.
> 
> Just context changes (I think).  I have fixed it up (see below) and can
> carry the fix as necessary.

Thanks, Stephen!

Ack for IB side.

-Jack

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox