Netdev List


* Re: [PATCH v6 0/3] netdev/of/phy: MDIO bus multiplexer support.
From: David Daney @ 2012-05-24 18:50 UTC (permalink / raw)
  To: Timur Tabi
  Cc: devicetree-discuss@lists.ozlabs.org, netdev@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <4FBE7DD8.509@freescale.com>

On 05/24/2012 11:28 AM, Timur Tabi wrote:
> David Daney wrote:
>> Yes.  You may note in the DTS file I attached in the parent (sorry for
>> the fubar mime types), that there are two, almost identical, MDIO
>> masters.  smi0 has two directly attached PHYs.  smi1 goes to the mux,
>> and each child of the mux has four attached PHYs.
>
> I'm still having trouble understanding all this.  I'm just hacking things up
> in order to help me understand it, but it's a slow and painful process.
>
> This call in mdio_mux_init() is failing:
>
> 	parent_bus = of_mdio_find_bus(parent_bus_node);
>

Well, the MDIO bus must have an associated device tree node.

For my OCTEON code, the MDIO bus device is created as a result of the 
call to of_platform_bus_probe(), which takes care of filling in all the 
device tree nodes of the devices it finds and creates.

> It returns NULL.  Here is my MDIO node:
>
> 		fman0: fman@400000 {
> 			enet0: ethernet@e0000 {
> 				tbi-handle =<&tbi0>;
> 				phy-handle =<&phy0>;
> 				phy-connection-type = "sgmii";
> 			};
>
> 			mdio0: mdio@e1120 {
> 				gpios =<&gpio0 0 0
> 					&gpio0 1 0>;
>
> 				tbi0: tbi-phy@8 {
> 					reg =<0x8>;
> 					device_type = "tbi-phy";
> 				};
>
> 				phy0: ethernet-phy@1c {
> 					reg =<0x1c>;
> 				};
> 			};
> 		};
>
> What am I missing?

For starters, I do not see any compatible properties that would allow 
the proper drivers to be bound to anything.

Also I see no MDIO mux node there, so it is unclear why you are even 
asking these questions.

David Daney
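
The point about the bus needing an associated device tree node can be
illustrated with a toy userspace model. This is only a sketch: it assumes
the lookup compares the node pointer stored on each registered bus with the
node being searched for, and every toy_* name below is made up for
illustration and is not a kernel structure or function.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's structures. */
struct toy_node { const char *name; };
struct toy_bus  { const char *id; const struct toy_node *of_node; };

#define TOY_MAX_BUSES 4
static struct toy_bus *toy_buses[TOY_MAX_BUSES];
static size_t toy_nbuses;

/* Register a bus; of_node may be NULL if the driver never filled it in. */
static void toy_register_bus(struct toy_bus *bus)
{
	assert(toy_nbuses < TOY_MAX_BUSES);
	toy_buses[toy_nbuses++] = bus;
}

/* Return the registered bus whose of_node is 'np', or NULL if none matches. */
static struct toy_bus *toy_find_bus(const struct toy_node *np)
{
	size_t i;

	for (i = 0; i < toy_nbuses; i++)
		if (np && toy_buses[i]->of_node == np)
			return toy_buses[i];
	return NULL;
}

/* Hypothetical node and buses for the usage note below. */
static const struct toy_node toy_mdio0 = { "mdio@e1120" };
static struct toy_bus toy_bus_without_node = { "bus-a", NULL };
static struct toy_bus toy_bus_with_node = { "bus-b", &toy_mdio0 };
```

In this model, registering toy_bus_without_node alone still leaves
toy_find_bus(&toy_mdio0) returning NULL; only a bus registered with its
of_node pointing at the node can be found.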

^ permalink raw reply

* Re: [PATCH v6 0/3] netdev/of/phy: MDIO bus multiplexer support.
From: Timur Tabi @ 2012-05-24 19:03 UTC (permalink / raw)
  To: David Daney
  Cc: devicetree-discuss@lists.ozlabs.org, netdev@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <4FBE82F2.6080100@gmail.com>

David Daney wrote:

> Well, the MDIO bus must have an associated device tree node.
> 
> For my OCTEON code, the MDIO bus device is created as a result of the 
> call to of_platform_bus_probe(), which takes care of filling in all the 
> device tree nodes of the devices it finds and creates.

Ok, let me give you some background.  We actually already have MDIO muxing
code in-house, but it's different from yours.  So now I'm rewriting it to
use your design instead.

So our current code looks for "virtual MDIO nodes", and we call
mdiobus_alloc() and then of_mdiobus_register().  I think this is what I'm
missing now.

I just don't know what to do next.  Part of the problem is that I don't
have much experience with MDIO drivers.

>> It returns NULL.  Here is my MDIO node:
>>
>> 		fman0: fman@400000 {
>> 			enet0: ethernet@e0000 {
>> 				tbi-handle =<&tbi0>;
>> 				phy-handle =<&phy0>;
>> 				phy-connection-type = "sgmii";
>> 			};
>>
>> 			mdio0: mdio@e1120 {
>> 				gpios =<&gpio0 0 0
>> 					&gpio0 1 0>;
>>
>> 				tbi0: tbi-phy@8 {
>> 					reg =<0x8>;
>> 					device_type = "tbi-phy";
>> 				};
>>
>> 				phy0: ethernet-phy@1c {
>> 					reg =<0x1c>;
>> 				};
>> 			};
>> 		};
>>
>> What am I missing?
> 
> For starters, I do not see any compatible properties that would allow 
> the proper drivers to be bound to anything.

Ok, that makes sense.

> Also I see no MDIO mux node there, so it is unclear why you are even 
> asking these questions.

I only gave you part of the device tree.  Here's my mdio mux node:

	mdio-mux {
		compatible = "mdio-mux-gpio";
		gpios = <&gpio0 0 0>, <&gpio0 1 0>;
		mdio-parent-bus = <&mdio0>;
		#address-cells = <1>;
		#size-cells = <0>;

		mdio@2 {
			reg = <2>;
			#address-cells = <1>;
			#size-cells = <0>;

			phy21: ethernet-phy@1 {
				reg = <1>;
//				compatible = "marvell,88e1149r", "ethernet-phy-ieee802.3-c22";
				marvell,reg-init = <3 0x10 0 0x5777>,
					<3 0x11 0 0x00aa>,
					<3 0x12 0 0x4105>,
					<3 0x13 0 0x0a60>;
				interrupt-parent = <&gpio0>;
//				interrupts = <10 8>; /* Pin 10, active low */
			};
		};
	};
};




> 
> David Daney
> 


-- 
Timur Tabi
Linux kernel developer at Freescale

^ permalink raw reply

* Re: [PATCH v6 0/3] netdev/of/phy: MDIO bus multiplexer support.
From: David Daney @ 2012-05-24 19:19 UTC (permalink / raw)
  To: Timur Tabi
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org
In-Reply-To: <4FBE8605.2020507-KZfg59tc24xl57MIdRCFDg@public.gmane.org>

On 05/24/2012 12:03 PM, Timur Tabi wrote:
> David Daney wrote:
>
>> Well, the MDIO bus must have an associated device tree node.
>>
>> For my OCTEON code, the MDIO bus device is created as a result of the
>> call to of_platform_bus_probe(), which takes care of filling in all the
>> device tree nodes of the devices it finds and creates.
>
> Ok, let me give you some background.  We actually already have MDIO muxing
> code in-house, but it's different from yours.  So now I'm rewriting it to
> use your design instead.
>
> So our current code looks for "virtual MDIO nodes", and we call
> mdiobus_alloc() and then of_mdiobus_register().  I think this is what I'm
> missing now.
>
> I just don't know what to do next.

You will have to debug it and find out why the device match is failing, 
then fix it.

David Daney

^ permalink raw reply

* Re: [PATCH 1/3] TIPC: Removing EXPERIMENTAL label
From: Paul Gortmaker @ 2012-05-24 19:58 UTC (permalink / raw)
  To: David Miller; +Cc: jon.maloy, netdev, tipc-discussion, allan.stephens, maloy
In-Reply-To: <20120521.023926.548567931208958037.davem@davemloft.net>

[Re: [PATCH 1/3] TIPC: Removing EXPERIMENTAL label] On 21/05/2012 (Mon 02:39) David Miller wrote:

> From: Jon Maloy <jon.maloy@ericsson.com>
> Date: Mon, 21 May 2012 01:59:12 -0400
> 
> > With the latest series of patches from Paul Gortmaker and Allan
> > Stephens TIPC is now functionally mature and stable enough to
> > justify removal of the EXPERIMENTAL label.
> > 
> > Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
> 
> I'll let Paul Gortmaker decide whether this is warranted or
> not.

The EXPERIMENTAL thing has always been rather subjective, but
I'd like to see some level of confidence that a crafted bogus
TIPC message can't be used to DoS a machine with active TIPC
connections before removing EXPERIMENTAL.  Maybe the current code
is OK as-is in this respect but I'd feel better knowing that it
had been audited with this exact kind of thing in mind.

> 
> I don't really want to all of a sudden start seeing patches from
> people like you and the windriver folks, who effectively wrote off
> upstream and left poor Paul Gortmaker holding the bag and having to
> take care of EVERYTHING.

To be fair, I should note that Al did a lot of work in the background
getting commits onto a modern baseline and answering all my questions
since the out of tree sourceforge mess was highlighted here on netdev.

> 
> You can't just do nothing for years, end up making someone else
> do it, then say "Hey here I am, I feel like submitting upstream
> patches now" after I've spent this entire time starting to trust
> Paul for TIPC patches.

I've been thinking about this off and on, and I'm wondering what to
suggest going forward.  Dealing with the backlog was largely going over
maintenance and bugfix type patches and sanitizing them for integration
upstream.  It largely boiled down to being able to tell a crap patch
from a good one that matched upstream expectations.  I figured I could
manage to not screw that up too badly, hence why I volunteered to assist
with the backlog.

But for new TIPC development features, future direction, and things like
that -- making the right call requires intimate understanding of TIPC
and its users, which is something that a maintainer should have but
something I know I don't have.  (A man has to know his limitations.)

In this context, I'm not talking about these three trivial patches; but
more complicated stuff that I imagine will be floated in the future.

To that end, I can still review and call out issues in a crap patch when
I see them.  But I'd like to see new stuff sent to netdev, so that folks
smarter than me have a chance to catch when a patch appears generally OK
but is architecturally the wrong direction etc.

Paul.

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* pch_gbe:  backport fails to start sending
From: Andy Cress @ 2012-05-24 20:05 UTC (permalink / raw)
  To: netdev

Folks,

I now have a different case where the pch_gbe driver needs help.
I have backported the pch_gbe git head (v1.00) to kernel 2.6.32
(RHEL6.2) and when it loads, after the open completes and the link is
up, it fails to start sending and receiving.  
I also took the pch_gbe source which runs fine on 2.6.38 (Fedora 15) and
backported it, and get the same results. 
The much bulkier pch_gbe 0.91-NAPI driver does run on 2.6.32 (RHEL6.2)
but is not maintained.
I also tried backporting the mii.c from the 2.6.38 kernel, but that
didn't help, got the same symptoms.

After adding -DDEBUG to the 1.00 driver, I can see that it seems to get
just DMA Complete interrupts when it should be getting Transmit
complete, etc.   I'm not sure why it gets stuck there.  
Any ideas/input is welcome.  

Andy  

https://sendfile.kontron.com/message/KJKmrf171EhsuvbyhjnXpe 
Attached at this link are two files:  
 pch_gbe-100a.tar.gz = the backported pch_gbe 1.00 head source, includes
patches that were applied.  
 dmesg.tar.gz  =  Some dmesg output from test cases with debug:
   dmesg-pch10a-kern2632-bad.txt = backported 1.00 from git head on
kernel 2.6.32, fails
   dmesg-pch10-kern2632-bad.txt  = backported 1.00 from Fedora 15 on
kernel 2.6.32, fails
   dmesg-pch10-kern2638-good.txt = same 1.00 source from Fedora 15 on
kernel 2.6.38, works

^ permalink raw reply

* Re: [PATCH net-next 0/2] qlcnic: Bug fixes
From: David Miller @ 2012-05-24 20:06 UTC (permalink / raw)
  To: anirban.chakraborty; +Cc: netdev, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1337882816-2097-1-git-send-email-anirban.chakraborty@qlogic.com>

From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Date: Thu, 24 May 2012 14:06:54 -0400

> Please apply to net-next.

As I've stated at least 10 times this week, net-next is not open
and therefore submitting patches for net-next is not appropriate.

If people are not going to even read my announcements and
notifications of the states of the various GIT trees, I might as well
not make them at all.

^ permalink raw reply

* Re: [PATCH 1/3] TIPC: Removing EXPERIMENTAL label
From: David Miller @ 2012-05-24 20:12 UTC (permalink / raw)
  To: paul.gortmaker; +Cc: jon.maloy, netdev, tipc-discussion, allan.stephens, maloy
In-Reply-To: <20120524195816.GA6487@windriver.com>

From: Paul Gortmaker <paul.gortmaker@windriver.com>
Date: Thu, 24 May 2012 15:58:16 -0400

> But for new TIPC development features, future direction, and things like
> that -- making the right call requires intimate understanding of TIPC
> and its users, which is something that a maintainer should have but
> something I know I don't have.  (A man has to know his limitations.)
> 
> In this context, I'm not talking about these three trivial patches; but
> more complicated stuff that I imagine will be floated in the future.
> 
> To that end, I can still review and call out issues in a crap patch when
> I see them.  But I'd like to see new stuff sent to netdev, so that folks
> smarter than me have a chance to catch when a patch appears generally OK
> but is architecturally the wrong direction etc.

For maintainership, taste is more important than deep knowledge of the
specific technology.  Worst case you ask the submitter to explain the
background of their change more thoroughly and that information is an
absolute requirement in the commit message and code comments
anyways.

^ permalink raw reply

* Re: pch_gbe: backport fails to start sending
From: David Miller @ 2012-05-24 20:13 UTC (permalink / raw)
  To: andy.cress; +Cc: netdev
In-Reply-To: <40680C535D6FE6498883F1640FACD44DEA6714@ka-exchange-1.kontronamerica.local>

From: "Andy Cress" <andy.cress@us.kontron.com>
Date: Thu, 24 May 2012 13:05:05 -0700

> I have backported the pch_gbe git head (v1.00) to kernel 2.6.32
> (RHEL6.2) and when it loads, after the open completes and the link is
> up, it fails to start sending and receiving.  

Nobody here is going to help you with a vendor kernel backport,
sorry.  You're on your own.

^ permalink raw reply

* Re: [PATCH] MAINTAINERS
From: David Miller @ 2012-05-24 20:21 UTC (permalink / raw)
  To: jhs, hadi; +Cc: netdev
In-Reply-To: <1337863502.3513.15.camel@mojatatu>

From: jamal <hadi@cyberus.ca>
Date: Thu, 24 May 2012 08:45:02 -0400

> After about two decades, I am giving up on cyberus.
> Nabwaga Manyanga.
> 
> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

Applied.

^ permalink raw reply

* Re: [PATCH] xen/netback: Calculate the number of SKB slots required correctly
From: David Miller @ 2012-05-24 20:21 UTC (permalink / raw)
  To: simon.graham
  Cc: Ian.Campbell, konrad.wilk, xen-devel, netdev, bhutchings,
	adnan.misherfi
In-Reply-To: <1337876767-16041-1-git-send-email-simon.graham@citrix.com>

From: Simon Graham <simon.graham@citrix.com>
Date: Thu, 24 May 2012 12:26:07 -0400

> When calculating the number of slots required for a packet header, the code
> was reserving too many slots if the header crossed a page boundary. Since
> netbk_gop_skb copies the header to the start of the page, the count of
> slots required for the header should be based solely on the header size.
> 
> This problem is easy to reproduce if a VIF is bridged to a USB 3G modem
> device as the skb->data value always starts near the end of the first page.
> 
> Signed-off-by: Simon Graham <simon.graham@citrix.com>

Applied.
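
The difference described in the quoted commit message can be sketched in
plain C. This is a userspace illustration only, not the netback code: the
4096-byte page size and the function names are assumptions made for the
example.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size for the example */

/* Round-up division, in the spirit of the kernel's DIV_ROUND_UP(). */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}

/*
 * Pages touched by a header of 'len' bytes that starts 'offset' bytes
 * into a page. This over-counts if the header is later copied to the
 * start of a fresh page.
 */
static unsigned int slots_by_span(unsigned int offset, unsigned int len)
{
	return div_round_up(offset % TOY_PAGE_SIZE + len, TOY_PAGE_SIZE);
}

/* Pages needed once the header has been copied to the start of a page. */
static unsigned int slots_by_size(unsigned int len)
{
	return div_round_up(len, TOY_PAGE_SIZE);
}
```

For a 200-byte header that starts 4000 bytes into a page, slots_by_span()
gives 2 while slots_by_size() gives 1, which is the kind of
over-reservation the commit message describes.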

^ permalink raw reply

* Re: [PATCH] net/wanrouter: Deprecate and schedule for removal
From: David Miller @ 2012-05-24 20:21 UTC (permalink / raw)
  To: joe; +Cc: shemminger, greearb, jan.ceuleers, netdev
In-Reply-To: <1337879610.5070.17.camel@joe2Laptop>

From: Joe Perches <joe@perches.com>
Date: Thu, 24 May 2012 10:13:30 -0700

> No one uses this on current kernels anymore.
> 
> Let it be known it's going to be removed eventually.
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: qmi_wwan: Add Sierra Wireless device IDs
From: David Miller @ 2012-05-24 20:21 UTC (permalink / raw)
  To: bjorn; +Cc: netdev, linux-usb
In-Reply-To: <1337851172-28549-1-git-send-email-bjorn@mork.no>

From: Bjørn Mork <bjorn@mork.no>
Date: Thu, 24 May 2012 11:19:32 +0200

> Some additional Gobi3K IDs found in the BSD/GPL licensed
> out-of-tree GobiNet driver from Sierra Wireless.
> 
> Signed-off-by: Bjørn Mork <bjorn@mork.no>

Applied.

^ permalink raw reply

* Re: [PATCH] solos-pci: Fix DMA support
From: David Miller @ 2012-05-24 20:21 UTC (permalink / raw)
  To: dwmw2; +Cc: netdev, nathan
In-Reply-To: <1337871507.26314.132.camel@shinybook.infradead.org>

From: David Woodhouse <dwmw2@infradead.org>
Date: Thu, 24 May 2012 15:58:27 +0100

> DMA support has finally made its way to the top of the TODO list, having
> realised that a Geode using MMIO can't keep up with two ADSL2+ lines
> each running at 21Mb/s.
> 
> This patch fixes a couple of bugs in the DMA support in the driver, so
> once the corresponding FPGA update is complete and tested everything
> should work properly.
> 
> We weren't storing the currently-transmitting skb, so we were never
> unmapping it and never freeing/popping it when the TX was done.
> And the addition of pci_set_master() is fairly self-explanatory.
> 
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 0/2] qlcnic: Bug fixes
From: Joe Perches @ 2012-05-24 20:24 UTC (permalink / raw)
  To: David Miller; +Cc: anirban.chakraborty, netdev, Dept_NX_Linux_NIC_Driver
In-Reply-To: <20120524.160659.834400122540802357.davem@davemloft.net>

On Thu, 2012-05-24 at 16:06 -0400, David Miller wrote:
> From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
> Date: Thu, 24 May 2012 14:06:54 -0400
> 
> > Please apply to net-next.
> 
> As I've stated at least 10 times this week, net-next is not open
> and therefore submitting patches for net-next is not appropriate.

> If people are not going to even read my announcements and
> notifications of the states of the various GIT trees, I might as well
> not make them at all.

Perhaps setup a patchwork bot to autoreply to the
sender only that these won't be looked at until
after the merge window closes and train yourself
to ignore the patchwork queue until then?

^ permalink raw reply

* Re: [PATCH net-next 0/2] qlcnic: Bug fixes
From: Anirban Chakraborty @ 2012-05-24 20:24 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Dept-NX Linux NIC Driver
In-Reply-To: <20120524.160659.834400122540802357.davem@davemloft.net>



On 5/24/12 1:06 PM, "David Miller" <davem@davemloft.net> wrote:

>From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>Date: Thu, 24 May 2012 14:06:54 -0400
>
>> Please apply to net-next.
>
>As I've stated at least 10 times this week, net-next is not open
>and therefore submitting patches for net-next is not appropriate.
>
>If people are not going to even read my announcements and
>notifications of the states of the various GIT trees, I might as well
>not make them at all.

My mistake, will resend it when the window opens. Sorry for the trouble.

-Anirban

^ permalink raw reply

* Re: [PATCH net-next 0/2] qlcnic: Bug fixes
From: David Miller @ 2012-05-24 20:28 UTC (permalink / raw)
  To: joe; +Cc: anirban.chakraborty, netdev, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1337891078.5070.36.camel@joe2Laptop>

From: Joe Perches <joe@perches.com>
Date: Thu, 24 May 2012 13:24:38 -0700

> On Thu, 2012-05-24 at 16:06 -0400, David Miller wrote:
>> From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>> Date: Thu, 24 May 2012 14:06:54 -0400
>> 
>> > Please apply to net-next.
>> 
>> As I've stated at least 10 times this week, net-next is not open
>> and therefore submitting patches for net-next is not appropriate.
> 
>> If people are not going to even read my announcements and
>> notifications of the states of the various GIT trees, I might as well
>> not make them at all.
> 
> Perhaps setup a patchwork bot to autoreply to the
> sender only that these won't be looked at until
> after the merge window closes and train yourself
> to ignore the patchwork queue until then?

Sorry, people simply need to learn when it's appropriate to
submit patches.

Forcing them to resend at the appropriate time will train
their minds to take such things into consideration.

And if it's too bothersome to get them to resubmit, perhaps
they don't consider their patch important enough after all.

That's why we always handle situations like this by dropping things
and asking for a resend.

^ permalink raw reply

* [PATCH v4] xfrm: take net hdr len into account for esp payload size calculation
From: Benjamin Poirier @ 2012-05-24 21:32 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, linux-kernel,
	Steffen Klassert, Diego Beltrami
In-Reply-To: <20120517.200509.2290282427866555176.davem@davemloft.net>

Corrects the function that determines the esp payload size. The calculations
done in esp{4,6}_get_mtu() lead to overlength frames in transport mode for
certain mtu values and suboptimal frames for others.

According to what is done, mainly in esp{,6}_output() and tcp_mtu_to_mss(),
net_header_len must be taken into account before doing the alignment
calculation.

Signed-off-by: Benjamin Poirier <bpoirier@suse.de>

---
Changes since v3:
* also fix ipv6

Changes since v2:
* rename l3_adj to net_adj
* fix indentation

Changes since v1:
* introduce l3_adj to preserve the same returned value as before for tunnel
  mode

For example:
* on ipv4 with md5 AH and 3des ESP (transport mode):
mtu = 1499 leads to FRAGFAILS
mtu = 1500 the addition of padding in the esp header could be avoided
* on ipv6 with md5 AH and twofish-sha1 ESP (transport mode):
mtu = 1491 leads to Ip6FragFails
mtu = 1499 padding can be avoided

For details on how the formula is established, see
https://lkml.org/lkml/2012/5/10/597

Tested with
* transport mode E
* transport mode EA
* transport mode E + ah
* tunnel mode E

Not tested with BEET, but it should be the same as transport mode
	draft-nikander-esp-beet-mode-03.txt Section 5.2:
	"The wire packet format is identical to the ESP transport mode"
---
 net/ipv4/esp4.c |   24 +++++++++---------------
 net/ipv6/esp6.c |   18 +++++++-----------
 2 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 89a47b3..cb982a6 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -459,28 +459,22 @@ static u32 esp4_get_mtu(struct xfrm_state *x, int mtu)
 	struct esp_data *esp = x->data;
 	u32 blksize = ALIGN(crypto_aead_blocksize(esp->aead), 4);
 	u32 align = max_t(u32, blksize, esp->padlen);
-	u32 rem;
-
-	mtu -= x->props.header_len + crypto_aead_authsize(esp->aead);
-	rem = mtu & (align - 1);
-	mtu &= ~(align - 1);
+	unsigned int net_adj;
 
 	switch (x->props.mode) {
-	case XFRM_MODE_TUNNEL:
-		break;
-	default:
 	case XFRM_MODE_TRANSPORT:
-		/* The worst case */
-		mtu -= blksize - 4;
-		mtu += min_t(u32, blksize - 4, rem);
-		break;
 	case XFRM_MODE_BEET:
-		/* The worst case. */
-		mtu += min_t(u32, IPV4_BEET_PHMAXLEN, rem);
+		net_adj = sizeof(struct iphdr);
 		break;
+	case XFRM_MODE_TUNNEL:
+		net_adj = 0;
+		break;
+	default:
+		BUG();
 	}
 
-	return mtu - 2;
+	return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) -
+		 net_adj) & ~(align - 1)) + (net_adj - 2);
 }
 
 static void esp4_err(struct sk_buff *skb, u32 info)
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 1e62b75..db1521f 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -413,19 +413,15 @@ static u32 esp6_get_mtu(struct xfrm_state *x, int mtu)
 	struct esp_data *esp = x->data;
 	u32 blksize = ALIGN(crypto_aead_blocksize(esp->aead), 4);
 	u32 align = max_t(u32, blksize, esp->padlen);
-	u32 rem;
+	unsigned int net_adj;
 
-	mtu -= x->props.header_len + crypto_aead_authsize(esp->aead);
-	rem = mtu & (align - 1);
-	mtu &= ~(align - 1);
-
-	if (x->props.mode != XFRM_MODE_TUNNEL) {
-		u32 padsize = ((blksize - 1) & 7) + 1;
-		mtu -= blksize - padsize;
-		mtu += min_t(u32, blksize - padsize, rem);
-	}
+	if (x->props.mode != XFRM_MODE_TUNNEL)
+		net_adj = sizeof(struct ipv6hdr);
+	else
+		net_adj = 0;
 
-	return mtu - 2;
+	return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) -
+		 net_adj) & ~(align - 1)) + (net_adj - 2);
 }
 
 static void esp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
-- 
1.7.7
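
The return expression in the patch can be tried out in userspace with a
small sketch. The function below copies only the arithmetic; the parameter
names mirror the patch, but the concrete numbers in the note that follows
(a 16-byte or 36-byte header_len, no ESP authentication, 8-byte alignment)
are assumptions chosen for illustration, not values taken from the test
setups listed above.

```c
#include <assert.h>

/*
 * Payload size left for the upper layer, following the patch:
 *   ((mtu - header_len - authsize - net_adj) & ~(align - 1)) + (net_adj - 2)
 * net_adj is the network header size in transport/BEET mode and 0 in
 * tunnel mode. The trailing 2 bytes are the ESP pad-length and
 * next-header fields.
 */
static unsigned int esp_payload_mtu(unsigned int mtu, unsigned int header_len,
				    unsigned int authsize, unsigned int align,
				    unsigned int net_adj)
{
	unsigned int room;

	/* align must be a power of two (the patch aligns blksize to 4). */
	assert(align >= 4 && (align & (align - 1)) == 0);
	/* Keep the unsigned arithmetic below from wrapping. */
	assert(mtu >= header_len + authsize + net_adj + align);

	room = mtu - header_len - authsize - net_adj;
	return (room & ~(align - 1)) + net_adj - 2;
}
```

With the assumed transport-mode inputs (mtu 1500, header_len 16, authsize
0, align 8, net_adj 20) this yields 1482; with an assumed tunnel-mode
header_len of 36 and net_adj 0 it yields 1462.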

^ permalink raw reply related

* Re: [PATCH IPROUTE2] tc-codel: Update usage text
From: Stephen Hemminger @ 2012-05-24 22:02 UTC (permalink / raw)
  To: Vijay Subramanian; +Cc: netdev, Eric Dumazet, Dave Taht
In-Reply-To: <1337885287-31354-1-git-send-email-subramanian.vijay@gmail.com>

On Thu, 24 May 2012 11:48:07 -0700
Vijay Subramanian <subramanian.vijay@gmail.com> wrote:

> codel can take 'noecn' as an option. This also makes it consistent with the
> manpage.
> 
> Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
> ---
>  tc/q_codel.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/tc/q_codel.c b/tc/q_codel.c
> index 826285a..dc4b3f6 100644
> --- a/tc/q_codel.c
> +++ b/tc/q_codel.c
> @@ -54,7 +54,7 @@
>  static void explain(void)
>  {
>  	fprintf(stderr, "Usage: ... codel [ limit PACKETS ] [ target TIME]\n");
> -	fprintf(stderr, "                 [ interval TIME ] [ ecn ]\n");
> +	fprintf(stderr, "                 [ interval TIME ] [ ecn | noecn ]\n");
>  }
>  
>  static int codel_parse_opt(struct qdisc_util *qu, int argc, char **argv,

Applied, thanks.

^ permalink raw reply

* Re: [ovs-dev] [PATCH 04/21] vswitchd: Add iface_parse_tunnel
From: Simon Horman @ 2012-05-24 23:59 UTC (permalink / raw)
  To: Ben Pfaff; +Cc: dev, netdev
In-Reply-To: <20120524164738.GE26173@nicira.com>

On Thu, May 24, 2012 at 09:47:38AM -0700, Ben Pfaff wrote:
> The concept seems OK to me here.  I have only a few minor comments.
> 
> On Thu, May 24, 2012 at 06:08:57PM +0900, Simon Horman wrote:
> > +#define TNL_F_CSUM          (1 << 0) /* Checksum packets. */
> > +#define TNL_F_TOS_INHERIT	(1 << 1) /* Inherit ToS from inner packet. */
> > +#define TNL_F_TTL_INHERIT	(1 << 2) /* Inherit TTL from inner packet. */
> > +#define TNL_F_DF_INHERIT	(1 << 3) /* Inherit DF bit from inner packet. */
> > +#define TNL_F_DF_DEFAULT	(1 << 4) /* Set DF bit if inherit off or
> > +                                      * not IP. */
> > +#define TNL_F_PMTUD		    (1 << 5) /* Enable path MTU discovery. */
> > +#define TNL_F_HDR_CACHE		(1 << 6) /* Enable tunnel header caching. */
> > +#define TNL_F_IPSEC		    (1 << 7) /* Traffic is IPsec encrypted. */
> > +#define TNL_F_IN_KEY	    (1 << 8) /* Tunnel port has input key. */
> > +#define TNL_F_OUT_KEY	    (1 << 9) /* Tunnel port has output key. */
> 
> Some of the above definitions use all spaces, others use tabs.  It's
> OVS userspace code so it's better to use all spaces, I think.

Sorry about that. I have a bit of trouble remembering to switch
tabbing modes in my editor depending on if I am in user-space or the
datapath.

> > +    if (is_ipsec) {
> > +        char *file_name = xasprintf("%s/%s", ovs_rundir(),
> > +                "ovs-monitor-ipsec.pid");
> > +        pid_t pid = read_pidfile(file_name);
> > +        free(file_name);
> > +        if (pid < 0) {
> > +            VLOG_ERR("%s: IPsec requires the ovs-monitor-ipsec daemon",
> > +                     iface_cfg->name);
> > +            goto err;
> > +        }
> 
> I just noticed that we re-read this pidfile every time we parse an
> IPsec tunnel.  I guess that would be a big waste of time if we have a
> lot of IPsec tunnels.  I'll make a note to consider fixing this
> separately (it's not your problem).

I guess that it should be easy enough to set a flag if any of the parsed
configurations use ipsec and perform the pid check if so.

As it is, I wouldn't be at all surprised if my series breaks ipsec as
I haven't tested it (with or without my changes).
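
The idea of setting a flag and doing the pid check once could look roughly
like the sketch below. It is a toy model under stated assumptions: the
toy_* names are invented for the example, and toy_ipsec_daemon_running()
merely stands in for reading the ovs-monitor-ipsec pidfile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy interface configuration; not the OVS structure. */
struct toy_iface_cfg { const char *name; bool is_ipsec; };

static int toy_pid_checks; /* counts how often the "pidfile" is read */

/* Hypothetical stand-in for the pidfile check; always succeeds here. */
static bool toy_ipsec_daemon_running(void)
{
	toy_pid_checks++;
	return true;
}

/*
 * Scan all configurations first, remember whether any uses IPsec, and
 * only then perform a single daemon check. Returns true if the set of
 * configurations is acceptable.
 */
static bool toy_parse_all(const struct toy_iface_cfg *cfgs, size_t n)
{
	bool any_ipsec = false;
	size_t i;

	assert(cfgs != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (cfgs[i].is_ipsec)
			any_ipsec = true;

	return !any_ipsec || toy_ipsec_daemon_running();
}
```

With two IPsec tunnels and one plain tunnel, the check runs once rather
than twice; with no IPsec tunnels it does not run at all.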

^ permalink raw reply

* Re: [PATCH 03/21] odp-util: Add tun_key to parse_odp_key_attr()
From: Simon Horman @ 2012-05-25  0:01 UTC (permalink / raw)
  To: Ben Pfaff; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20120524162911.GD26173-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

On Thu, May 24, 2012 at 09:29:11AM -0700, Ben Pfaff wrote:
> On Thu, May 24, 2012 at 06:08:56PM +0900, Simon Horman wrote:
> > Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
> 
> But I don't see him CCed?

Strange. I asked git send-email to CC him explicitly.

> > +        ovs_be32 ipv4_src;
> > +        ovs_be32 ipv4_dst;
> > +        unsigned long long tun_flags;
> > +        int ipv4_tos;
> > +        int ipv4_ttl;
> > +        int n = -1;
> > +
> > +        if (sscanf(s, "ipv4_tunnel(tun_id=%31[x0123456789abcdefABCDEF]"
> > +                   ",flags=%llx,src="IP_SCAN_FMT",dst="IP_SCAN_FMT
> > +                   ",tos=%i,ttl=%i)%n",
> > +                   tun_id_s, &tun_flags,
> > +                   IP_SCAN_ARGS(&ipv4_src), IP_SCAN_ARGS(&ipv4_dst),
> > +                   &ipv4_tos, &ipv4_ttl, &n) > 0
> > +            && n > 0) {
> 
> Does this compile?  I don't see a declaration of tun_id_s.
> 
> In the ODP printer and parser, we usually require fields that are
> hexadecimal to be written with an explicit "0x" on output (using
> something like "0x%x" or "%#x" on output), and then use "%i" on input,
> so that it is always unambiguous at a glance whether a number is
> decimal or hexadecimal.  I'd appreciate it if we could maintain that
> here (I didn't look over at the printer code to see if it writes 0x,
> but I'd like it to).
> 
> Otherwise, this looks good, thank you.

Sorry, perhaps this is not the latest revision, somehow.
I did have it compiling, and I'll update the patch accordingly.
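
The convention Ben describes (print with an explicit "0x", read back with
"%i") can be shown with standard C only. The helper names below are made up
for the example and are not OVS functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write 'value' in hex with a leading "0x"; returns snprintf()'s result. */
static int format_hex(char *buf, size_t size, unsigned int value)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "%#x", value);
}

/*
 * Parse with "%i", which accepts decimal, "0x"-prefixed hex and
 * leading-0 octal. Returns 1 if a number was read, 0 otherwise.
 */
static int parse_int(const char *s, int *out)
{
	return sscanf(s, "%i", out) == 1;
}
```

format_hex() turns 31 into "0x1f", and parse_int() reads both "0x1f" and
"31" back as 31. A bare "1f" is read as decimal 1 with the "f" left
unparsed, which is the ambiguity the prefix avoids. Note that "%#x" prints
a value of 0 as plain "0".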

^ permalink raw reply

* Re: [PATCH 01/17] netfilter: add struct nf_proto_net for register l4proto sysctl
From: Gao feng @ 2012-05-25  1:05 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano,
	Gao feng
In-Reply-To: <20120524143854.GA15898@1984>

On 2012-05-24 22:38, Pablo Neira Ayuso wrote:
> On Thu, May 24, 2012 at 06:54:42PM +0800, Gao feng wrote:
> [...]
>>>>> I don't see why we need this new field.
>>>>>
>>>>> It seems to be set to 1 in each structure that has set:
>>>>>
>>>>> .ctl_compat_table
>>>>>
>>>>> to non-NULL. So, it's redundant.
>>>>>
>>>>> Moreover, you already know from the protocol tracker itself if you
>>>>> have to allocate the compat ctl table or not.
>>>>>
>>>>> In other words: You set compat to 1 for nf_conntrack_l4proto_generic.
>>>>> Then, you pass that compat value to generic_init_net via ->inet_net
>>>>> again, but this information (that determines if the compat has to be
>>>>> done or not) is already in the scope of the protocol tracker.
>>>>>
>>>>
>>>> Because some protocols, such as l4proto_tcp6 and l4proto_tcp, use the same
>>>> init_net function. l4proto_tcp6 doesn't need the compat sysctl, so we should
>>>> use this new field to decide whether to kmemdup compat_sysctl_table.
>>>
>>> Then, could you use two init_net functions? one for TCP for IPv4 and another
>>> for TCP for IPv6?
>>
>> Of course, if you prefer to implement it this way.
> 
> If this removes the .compat field that you added, then use two
> init_net functions, yes.

Sorry, I missed something.

nf_ct_l4proto_unregister_sysctl also uses .compat to decide whether it
can unregister the compat sysctl.

If we register both l4proto_tcp and l4proto_tcp6, then without .compat,
unregistering l4proto_tcp6 would also unregister the compat sysctl.

So maybe we do have to use .compat.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] mac80211: Use correct originator sequence number in a Path Reply
From: Julian Calaby @ 2012-05-25  2:24 UTC (permalink / raw)
  To: Qasim Javed; +Cc: linux-wireless, devel, netdev, linux-kernel, ravip
In-Reply-To: <1337922135-27846-1-git-send-email-qasimj@gmail.com>

Hi Qasim,

On Fri, May 25, 2012 at 3:02 PM, Qasim Javed <qasimj@gmail.com> wrote:
> Hi,
>
> I have not tested the patch yet. This is more of a heads-up email to let everyone know.

Just so you know, the usual practise when doing this is to mark the
patch as [RFC] rather than [PATCH] when you're asking for comments /
letting people know.

Thanks,

-- 
Julian Calaby

Email: julian.calaby@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/
.Plan: http://sites.google.com/site/juliancalaby/

^ permalink raw reply

* Re: [PATCH 01/17] netfilter: add struct nf_proto_net for register l4proto sysctl
From: Pablo Neira Ayuso @ 2012-05-25  2:54 UTC (permalink / raw)
  To: Gao feng
  Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano,
	Gao feng
In-Reply-To: <4FBEDADE.8040905@cn.fujitsu.com>

On Fri, May 25, 2012 at 09:05:34AM +0800, Gao feng wrote:
> On 2012-05-24 22:38, Pablo Neira Ayuso wrote:
> > On Thu, May 24, 2012 at 06:54:42PM +0800, Gao feng wrote:
> > [...]
> >>>>> I don't see why we need this new field.
> >>>>>
> >>>>> It seems to be set to 1 in each structure that has set:
> >>>>>
> >>>>> .ctl_compat_table
> >>>>>
> >>>>> to non-NULL. So, it's redundant.
> >>>>>
> >>>>> Moreover, you already know from the protocol tracker itself if you
> >>>>> have to allocate the compat ctl table or not.
> >>>>>
> >>>>> In other words: You set compat to 1 for nf_conntrack_l4proto_generic.
> >>>>> Then, you pass that compat value to generic_init_net via ->inet_net
> >>>>> again, but this information (that determines if the compat has to be
> >>>>> done or not) is already in the scope of the protocol tracker.
> >>>>>
> >>>>
> >>>> Because some protocols, such as l4proto_tcp6 and l4proto_tcp, use the same init_net
> >>>> function. l4proto_tcp6 doesn't need the compat sysctl, so we should use this new
> >>>> field to decide whether to kmemdup compat_sysctl_table.
> >>>
> >>> Then, could you use two init_net functions? one for TCP for IPv4 and another
> >>> for TCP for IPv6?
> >>
> >> Of course, if you prefer to implement it this way.
> > 
> > If this removes the .compat field that you added, then use two
> > init_net functions, yes.
> 
> Sorry, I missed something.
> 
> nf_ct_l4proto_unregister_sysctl also uses .compat to decide whether we
> can unregister the compat sysctl.
> 
> If we register both l4proto_tcp and l4proto_tcp6, then without .compat,
> unregistering l4proto_tcp6 will also unregister the compat sysctl.
> 
> So maybe we have to use .compat.

Could you resolve this by checking pn->ctl_compat_header != NULL ?
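
A minimal sketch of the kind of check being asked about, using hypothetical
stand-in types (sketch_header, sketch_proto_net) rather than the real kernel
structures. It only shows a NULL header pointer serving as the record of
whether the compat table was registered, in place of a separate flag. Whether
that is sufficient when two trackers share one per-net structure is the open
question in this thread and is not settled by the sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real kernel types differ. */
struct sketch_header {
	int unused;
};

struct sketch_proto_net {
	/* NULL means the compat sysctl was never registered. */
	struct sketch_header *ctl_compat_header;
};

/* Pretend registration: allocate a header and remember it. */
static int sketch_register_compat(struct sketch_proto_net *pn)
{
	pn->ctl_compat_header = calloc(1, sizeof(*pn->ctl_compat_header));
	return pn->ctl_compat_header ? 0 : -1;
}

/* Unregister only if something was registered; returns 1 if work was done. */
static int sketch_unregister_compat(struct sketch_proto_net *pn)
{
	if (pn->ctl_compat_header == NULL)
		return 0;
	free(pn->ctl_compat_header);
	pn->ctl_compat_header = NULL;
	return 1;
}
```

Because the pointer is reset to NULL after the first unregister, a second
call becomes a no-op in this sketch.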

^ permalink raw reply

* Re: [PATCH 05/17] netfilter: add namespace support for l4proto_tcp
From: Pablo Neira Ayuso @ 2012-05-25  3:00 UTC (permalink / raw)
  To: Gao feng; +Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano
In-Reply-To: <1336985547-31960-6-git-send-email-gaofeng@cn.fujitsu.com>

Hi Gao,

While having a look at this again, I have two new requests:

On Mon, May 14, 2012 at 04:52:15PM +0800, Gao feng wrote:
[...]
> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> index 4dfbfa8..dd19350 100644
> --- a/net/netfilter/nf_conntrack_proto_tcp.c
> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
[...]
> @@ -1549,10 +1532,80 @@ static struct ctl_table tcp_compat_sysctl_table[] = {
>  #endif /* CONFIG_NF_CONNTRACK_PROC_COMPAT */
>  #endif /* CONFIG_SYSCTL */
>
> +static int tcp_init_net(struct net *net, u_int8_t compat)
> +{
> +	int i;
> +	struct nf_tcp_net *tn = tcp_pernet(net);
> +	struct nf_proto_net *pn = (struct nf_proto_net *)tn;
> +#ifdef CONFIG_SYSCTL
> +#ifdef CONFIG_NF_CONNTRACK_PROC_COMPAT
> +	if (compat) {
> +		pn->ctl_compat_table = kmemdup(tcp_compat_sysctl_table,
> +					       sizeof(tcp_compat_sysctl_table),
> +					       GFP_KERNEL);
> +		if (!pn->ctl_compat_table)
> +			return -ENOMEM;
> +
> +		pn->ctl_compat_table[0].data = &tn->timeouts[TCP_CONNTRACK_SYN_SENT];
> +		pn->ctl_compat_table[1].data = &tn->timeouts[TCP_CONNTRACK_SYN_SENT2];
> +		pn->ctl_compat_table[2].data = &tn->timeouts[TCP_CONNTRACK_SYN_RECV];
> +		pn->ctl_compat_table[3].data = &tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> +		pn->ctl_compat_table[4].data = &tn->timeouts[TCP_CONNTRACK_FIN_WAIT];
> +		pn->ctl_compat_table[5].data = &tn->timeouts[TCP_CONNTRACK_CLOSE_WAIT];
> +		pn->ctl_compat_table[6].data = &tn->timeouts[TCP_CONNTRACK_LAST_ACK];
> +		pn->ctl_compat_table[7].data = &tn->timeouts[TCP_CONNTRACK_TIME_WAIT];
> +		pn->ctl_compat_table[8].data = &tn->timeouts[TCP_CONNTRACK_CLOSE];
> +		pn->ctl_compat_table[9].data = &tn->timeouts[TCP_CONNTRACK_RETRANS];
> +		pn->ctl_compat_table[10].data = &tn->tcp_loose;
> +		pn->ctl_compat_table[11].data = &tn->tcp_be_liberal;
> +		pn->ctl_compat_table[12].data = &tn->tcp_max_retrans;

You can make a generic function to set the ctl_data that you can
reuse for this code above and the one below.
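
One possible shape for such a helper, as a rough sketch only. The names
sketch_ctl_table and sketch_set_ctl_data are hypothetical, and the struct is a
simplified stand-in for struct ctl_table, not the kernel definition.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ctl_table. */
struct sketch_ctl_table {
	const char *procname;
	void *data;
};

/* Point the first n table entries at the given per-net variables, in order. */
static void sketch_set_ctl_data(struct sketch_ctl_table *table,
				void *const *vars, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		table[i].data = vars[i];
}
```

The compat and non-compat tables in the patch list slightly different
timeouts (SYN_SENT2 in one, UNACK in the other), so each caller would pass its
own array of pointers in its own order.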

> +	}
> +#endif
> +	if (!pn->ctl_table) {
> +#else
> +	if (!pn->user++) {
> +#endif
> +		for (i = 0; i < TCP_CONNTRACK_TIMEOUT_MAX; i++)
> +			tn->timeouts[i] = tcp_timeouts[i];
> +		tn->tcp_loose = nf_ct_tcp_loose;
> +		tn->tcp_be_liberal = nf_ct_tcp_be_liberal;
> +		tn->tcp_max_retrans = nf_ct_tcp_max_retrans;
> +#ifdef CONFIG_SYSCTL
> +		pn->ctl_table = kmemdup(tcp_sysctl_table,
> +					sizeof(tcp_sysctl_table),
> +					GFP_KERNEL);
> +		if (!pn->ctl_table) {
> +#ifdef CONFIG_NF_CONNTRACK_PROC_COMPAT
> +			if (compat) {
> +				kfree(pn->ctl_compat_table);
> +				pn->ctl_compat_table = NULL;
> +			}
> +#endif
> +			return -ENOMEM;
> +		}
> +		pn->ctl_table[0].data = &tn->timeouts[TCP_CONNTRACK_SYN_SENT];
> +		pn->ctl_table[1].data = &tn->timeouts[TCP_CONNTRACK_SYN_RECV];
> +		pn->ctl_table[2].data = &tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> +		pn->ctl_table[3].data = &tn->timeouts[TCP_CONNTRACK_FIN_WAIT];
> +		pn->ctl_table[4].data = &tn->timeouts[TCP_CONNTRACK_CLOSE_WAIT];
> +		pn->ctl_table[5].data = &tn->timeouts[TCP_CONNTRACK_LAST_ACK];
> +		pn->ctl_table[6].data = &tn->timeouts[TCP_CONNTRACK_TIME_WAIT];
> +		pn->ctl_table[7].data = &tn->timeouts[TCP_CONNTRACK_CLOSE];
> +		pn->ctl_table[8].data = &tn->timeouts[TCP_CONNTRACK_RETRANS];
> +		pn->ctl_table[9].data = &tn->timeouts[TCP_CONNTRACK_UNACK];
> +		pn->ctl_table[10].data = &tn->tcp_loose;
> +		pn->ctl_table[11].data = &tn->tcp_be_liberal;
> +		pn->ctl_table[12].data = &tn->tcp_max_retrans;
> +#endif

I have had bad experiences with code that has lots of #ifdefs.

Please, split all *_init_net into smaller functions.
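
To illustrate the kind of split meant here, a self-contained sketch with
hypothetical names (sketch_net, SKETCH_COMPAT, the sketch_* functions) and
arbitrary default values. It is not the real conntrack code. The point is that
the conditional compilation is confined to one helper with a stub fallback, so
the top-level init function has no #ifdef at all.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_COMPAT 1		/* stand-in for a config option */
#define SKETCH_NVALS 3

/* Hypothetical per-net state; tables are plain int arrays here. */
struct sketch_net {
	unsigned int vals[SKETCH_NVALS];
	int *table;
	int *compat_table;
};

static const int sketch_template[2] = { 0, 0 };

/* Duplicate a template, like kmemdup() would in the kernel. */
static int *sketch_dup(const int *src, size_t len)
{
	int *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/* Step 1: fill in default values (arbitrary numbers for illustration). */
static void sketch_init_defaults(struct sketch_net *sn)
{
	static const unsigned int defaults[SKETCH_NVALS] = { 1, 2, 3 };

	memcpy(sn->vals, defaults, sizeof(defaults));
}

/* Step 2: set up the main table. */
static int sketch_init_sysctl(struct sketch_net *sn)
{
	sn->table = sketch_dup(sketch_template, sizeof(sketch_template));
	return sn->table ? 0 : -1;
}

/* Step 3: compat table; the only place that needs conditional compilation. */
#ifdef SKETCH_COMPAT
static int sketch_init_compat_sysctl(struct sketch_net *sn)
{
	sn->compat_table = sketch_dup(sketch_template, sizeof(sketch_template));
	return sn->compat_table ? 0 : -1;
}
#else
static int sketch_init_compat_sysctl(struct sketch_net *sn)
{
	(void)sn;
	return 0;
}
#endif

/* Top-level init: no #ifdef, just calls and error unwinding. */
static int sketch_init_net(struct sketch_net *sn)
{
	sketch_init_defaults(sn);
	if (sketch_init_compat_sysctl(sn) < 0)
		return -1;
	if (sketch_init_sysctl(sn) < 0) {
		free(sn->compat_table);
		sn->compat_table = NULL;
		return -1;
	}
	return 0;
}
```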

^ permalink raw reply

* Re: [RFC:kvm] export host NUMA info to guest & make emulated device NUMA attr
From: Liu ping fan @ 2012-05-25  3:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar, Andrew Theurer, Rusty Russell, Shirley Ma, kvm,
	netdev, Shirley Ma, qemu-devel, linux-kernel, Tom Lendacky,
	Ryan Harper, Avi Kivity, Anthony Liguori, Srivatsa Vaddagiri
In-Reply-To: <20120523151604.GB30542@redhat.com>

On Wed, May 23, 2012 at 11:16 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Wed, May 23, 2012 at 09:52:15AM -0500, Andrew Theurer wrote:
>> On 05/22/2012 04:28 AM, Liu ping fan wrote:
>> >On Sat, May 19, 2012 at 12:14 AM, Shirley Ma<mashirle@us.ibm.com>  wrote:
>> >>On Thu, 2012-05-17 at 17:20 +0800, Liu Ping Fan wrote:
>> >>>Currently, the guest cannot know the NUMA info of the vcpu, which
>> >>>results in a performance drawback.
>> >>>
>> >>>This was discovered through experiments by
>> >>>         Shirley Ma<xma@us.ibm.com>
>> >>>         Krishna Kumar<krkumar2@in.ibm.com>
>> >>>         Tom Lendacky<toml@us.ibm.com>
>> >>>Refer to -
>> >>>http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html
>> >>>where we can see the big performance gap between NUMA-aware and
>> >>>NUMA-unaware setups.
>> >>>
>> >>>Inspired by their discovery, I think we can do more work, namely
>> >>>export the host's NUMA info to the guest.
>> >>
>> >>There are three problems we've found:
>> >>
>> >>1. KVM doesn't support a NUMA load balancer. Even when there are no other
>> >>workloads in the system, and the number of vcpus on the guest is smaller
>> >>than the number of cpus per node, the vcpus could be scheduled on
>> >>different nodes.
>> >>
>> >>Someone is working on an in-kernel solution. Andrew Theurer has a working
>> >>user-space NUMA-aware VM balancer; it requires libvirt and cgroups
>> >>(which is the default for RHEL6 systems).
>> >>
>> >Interesting. I found that "sched/numa: Introduce
>> >sys_numa_{t,m}bind()", committed by Peter and Ingo, may help.
>> >But I think the guest cannot tell whether two vcpus
>> >are on the same host node. For example, if
>> >vcpu-a is in node-A and vcpu-b is in node-B, guest load balancing will be
>> >more expensive if it pulls a task from vcpu-a and
>> >chooses vcpu-b to push it to.  My idea is to export such info to the guest;
>> >I am still working on it.
>>
>> The long-term solution is two-fold:
>> 1) Guests that are quite large (in that they cannot fit in a host
>> NUMA node) must have static multi-node NUMA topology implemented by
>> Qemu. That is here today, but we do not do it automatically, which
>> is probably going to be a VM management responsibility.
>> 2) Host scheduler and NUMA code must be enhanced to get better
>> placement of Qemu memory and threads.  For single-node vNUMA guests,
>> this is easy: put it all in one node.  For multi-node vNUMA guests,
>> the host must understand that some Qemu memory belongs with certain
>> vCPU threads (which make up one of the guest's vNUMA nodes), and then
>> place that memory/threads in a specific host node (and continue for
>> other memory/threads for each Qemu vNUMA node).
>
> And for IO, we need multiqueue devices such that each
> node can have its own queue in its local memory.
>
Yes, my patches include such a solution. Independent device sub-logic
units sit in different NUMA nodes; "subdev" in the patches
stands for such a logic unit. Each of them is backed by a
vhost thread.  On the other hand, for the virtio guest, the vqs (including
vrings) are allocated aligned to PAGE_SIZE, so their NUMA problem
will be resolved automatically by KVM (maybe a little more effort is
needed here).

I had thought of exporting the real host NUMA info to the virtio layer (not
the scheduler; that is another topic), so that we can create exactly as
many logic units as needed.
We could even increase or decrease the number of logic units.

But what makes me hesitate to move on is this: is it acceptable to create
an independent vhost thread for each node on the user's demand?
The scalability is then perVM * demand_node_num.  Any objections?


Thanks,
pingfan



> --
> MST

^ permalink raw reply
