Netdev List


* Re: [RFC PATCH net-next 3/3] virtio-net: Add accelerated RFS support
From: Ben Hutchings @ 2014-01-16 21:31 UTC (permalink / raw)
  To: Zhi Yong Wu; +Cc: netdev, therbert, edumazet, davem, Zhi Yong Wu
In-Reply-To: <1389795654-28381-4-git-send-email-zwu.kernel@gmail.com>

On Wed, 2014-01-15 at 22:20 +0800, Zhi Yong Wu wrote:
[...]
> +static int virtnet_init_rx_cpu_rmap(struct virtnet_info *vi)
> +{
> +	int rc = 0;
> +
> +#ifdef CONFIG_RFS_ACCEL
> +	struct virtio_device *vdev = vi->vdev;
> +	unsigned int irq;
> +	int i;
> +
> +	if (!vi->affinity_hint_set)
> +		goto out;
> +
> +	vi->dev->rx_cpu_rmap = alloc_irq_cpu_rmap(vi->max_queue_pairs);
> +	if (!vi->dev->rx_cpu_rmap) {
> +		rc = -ENOMEM;
> +		goto out;
> +	}
> +
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
> +		irq = virtqueue_get_vq_irq(vdev, vi->rq[i].vq);
> +		if (irq == -1)
> +			goto failed;

Jumping into an if-statement is confusing.  Also do you really want to
return 0 in this case?

Otherwise this looks fine.

Ben.

> +		rc = irq_cpu_rmap_add(vi->dev->rx_cpu_rmap, irq);
> +		if (rc) {
> +failed:
> +			virtnet_free_irq_cpu_rmap(vi);
> +			goto out;
> +		}
> +	}
> +out:
> +#endif
> +	return rc;
> +}
[...]

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH net] ipv6: simplify detection of first operational link-local address on interface
From: Jiri Pirko @ 2014-01-16 21:47 UTC (permalink / raw)
  To: netdev, fbl
In-Reply-To: <20140116191304.GC17529@order.stressinduktion.org>

Thu, Jan 16, 2014 at 08:13:04PM CET, hannes@stressinduktion.org wrote:
>In commit 1ec047eb4751e3 ("ipv6: introduce per-interface counter for
>dad-completed ipv6 addresses") I built the detection of the first
>operational link-local address much too complex. Additionally this code
>now has a race condition.
>
>Replace it with a much simpler variant, which just scans the address
>list when duplicate address detection completes, to check if this is
>the first valid link local address and send RS and MLD reports then.
>
>Fixes: 1ec047eb4751e3 ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses")
>Reported-by: Jiri Pirko <jiri@resnulli.us>
>Cc: Flavio Leitner <fbl@redhat.com>
>Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Acked-by: Jiri Pirko <jiri@resnulli.us>


* Re: [RFC PATCH net-next 3/3] virtio-net: Add accelerated RFS support
From: Zhi Yong Wu @ 2014-01-16 22:00 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Linux Netdev List, Tom Herbert, Eric Dumazet, David S. Miller,
	Zhi Yong Wu
In-Reply-To: <1389907887.11912.87.camel@bwh-desktop.uk.level5networks.com>

On Fri, Jan 17, 2014 at 5:31 AM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Wed, 2014-01-15 at 22:20 +0800, Zhi Yong Wu wrote:
> [...]
>> +static int virtnet_init_rx_cpu_rmap(struct virtnet_info *vi)
>> +{
>> +     int rc = 0;
>> +
>> +#ifdef CONFIG_RFS_ACCEL
>> +     struct virtio_device *vdev = vi->vdev;
>> +     unsigned int irq;
>> +     int i;
>> +
>> +     if (!vi->affinity_hint_set)
>> +             goto out;
>> +
>> +     vi->dev->rx_cpu_rmap = alloc_irq_cpu_rmap(vi->max_queue_pairs);
>> +     if (!vi->dev->rx_cpu_rmap) {
>> +             rc = -ENOMEM;
>> +             goto out;
>> +     }
>> +
>> +     for (i = 0; i < vi->max_queue_pairs; i++) {
>> +             irq = virtqueue_get_vq_irq(vdev, vi->rq[i].vq);
>> +             if (irq == -1)
>> +                     goto failed;
>
> Jumping into an if-statement is confusing.  Also do you really want to
> return 0 in this case?
No. If it fails to get the irq, I want it to exit as soon as possible;
otherwise irq_cpu_rmap_add() will be invoked with an incorrect irq
argument.

By the way, have you thought about whether it makes sense to add aRFS
support to virtio_net? For [patch 2/3], what do you think of the
missing pieces I listed?
Do you have any documentation you can share on how the indirection
table is implemented in the sfc NIC? Thanks.

>
> Otherwise this looks fine.
>
> Ben.
>
>> +             rc = irq_cpu_rmap_add(vi->dev->rx_cpu_rmap, irq);
>> +             if (rc) {
>> +failed:
>> +                     virtnet_free_irq_cpu_rmap(vi);
>> +                     goto out;
>> +             }
>> +     }
>> +out:
>> +#endif
>> +     return rc;
>> +}
> [...]
>
> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
>



-- 
Regards,

Zhi Yong Wu


* Re: [PATCH-next v2] net/ipv4: don't use module_init in non-modular gre_offload
From: Eric Dumazet @ 2014-01-16 22:05 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: David S. Miller, netdev, Eric Dumazet
In-Reply-To: <1389802795-27442-1-git-send-email-paul.gortmaker@windriver.com>

On Wed, 2014-01-15 at 11:19 -0500, Paul Gortmaker wrote:
> Recent commit 438e38fadca2f6e57eeecc08326c8a95758594d4
> ("gre_offload: statically build GRE offloading support") added
> new module_init/module_exit calls to the gre_offload.c file.
...
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
> 
> v2: dump gre_offload_exit entirely as suggested by Eric.

Acked-by: Eric Dumazet <edumazet@google.com>

Thanks !


* Re: [PATCH v2] ipv6: send Change Status Report after DAD is completed
From: Flavio Leitner @ 2014-01-16 22:13 UTC (permalink / raw)
  To: netdev; +Cc: Hideaki YOSHIFUJI, Hannes Frederic Sowa
In-Reply-To: <1389907679-15346-1-git-send-email-fbl@redhat.com>


This is for net-next.
fbl

On Thu, Jan 16, 2014 at 07:27:59PM -0200, Flavio Leitner wrote:
> RFC 3810 defines two types of messages for multicast
> listeners. The "Current State Report" message, as the name
> implies, refreshes the *current* state to the querier.
> Since the querier sends Query messages periodically, there
> is no need to retransmit the report.
[...] 


* [PATCH] DT: net: davinci_emac: "phy-handle" property is actually optional
From: Sergei Shtylyov @ 2014-01-16 22:32 UTC (permalink / raw)
  To: netdev, robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	rob, devicetree
  Cc: linux-doc, davinci-linux-open-source

Though described as required, the "phy-handle" property for the DaVinci EMAC
binding is actually optional, as the driver will happily function without it,
assuming 100/FULL link; the property is not specified either in the example
device node or in the actual EMAC device nodes for DA850 and AM3517 device
trees.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
The patch is against DaveM's 'net-next.git' repo.  Though being a fix, it does
not seem important enough for 'net.git' repo at this time. Not sure if it should
be considered for the stable kernels...

 Documentation/devicetree/bindings/net/davinci_emac.txt |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: renesas/Documentation/devicetree/bindings/net/davinci_emac.txt
===================================================================
--- renesas.orig/Documentation/devicetree/bindings/net/davinci_emac.txt
+++ renesas/Documentation/devicetree/bindings/net/davinci_emac.txt
@@ -12,8 +12,6 @@ Required properties:
 - ti,davinci-ctrl-ram-size: size of control module ram
 - ti,davinci-rmii-en: use RMII
 - ti,davinci-no-bd-ram: has the emac controller BD RAM
-- phy-handle: Contains a phandle to an Ethernet PHY.
-              if not, davinci_emac driver defaults to 100/FULL
 - interrupts: interrupt mapping for the davinci emac interrupts sources:
               4 sources: <Receive Threshold Interrupt
 			  Receive Interrupt
@@ -21,6 +19,8 @@ Required properties:
 			  Miscellaneous Interrupt>
 
 Optional properties:
+- phy-handle: Contains a phandle to an Ethernet PHY.
+              If absent, davinci_emac driver defaults to 100/FULL.
 - local-mac-address : 6 bytes, mac address
 
 Example (enbw_cmc board):


* Re: [PATCH net-next 0/6] bonding: only rely on arp packets if arp monitor is used
From: Jay Vosburgh @ 2014-01-16 22:38 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: netdev, Andy Gospodarek, David S. Miller
In-Reply-To: <20140116084102.GM1867@redhat.com>

Veaceslav Falico <vfalico@redhat.com> wrote:

>On Wed, Jan 15, 2014 at 09:09:57PM -0800, Jay Vosburgh wrote:
>>Veaceslav Falico <vfalico@redhat.com> wrote:
>>
>>>Currently, if arp_validate is off (0), slave_last_rx() returns the
>>>slave->dev->last_rx, which is always updated on *any* packet received by
>>>slave, and not only arps. This means that, if the validation of arps is
>>>off, we're treating *any* incoming packet as a proof of slave being up, and
>>>not only arps.
>>
>>	The "any incoming packet" part is intentional.
>>
>>>This might seem logical at the first glance, however it can cause a lot of
>>>troubles and false-positives, one example would be:
>>>
>>>The arp_ip_target is NOT accessible, however someone in the broadcast domain
>>>spams with any broadcast traffic. This way bonding will be tricked that the
>>>slave is still up (as in - can access arp_ip_target), while it's not.
>>
>>	This type of situation is why arp_validate was added.
>>
>>	The specific situation was when multiple hosts using bonding
>>with the ARP monitor were set up behind a common gateway (in the same
>>Ethernet broadcast domain).  The arp_ip_target is unreachable for
>>whatever reason.  In that case, the various bonding instances on the
>>different hosts will each issue broadcast ARP requests, and (in the
>>absence of arp_validate) those requests would trick the other bonds into
>>believing that they are up.
>>
>>	I don't think this patch set will resolve that problem, since
>>you explicitly permit any incoming ARP to count.
>
>I've said it was tricky at first glance :).
>
>For this situation the arp_validate is *indeed* the cure. I'm not disabling
>(or even working with) how arp_validate=1/2 works, I'm working with
>arp_validate == 0.
>
>Before the patchset (with arp_validate == 0 ):
>
>*Any* packet (arp and non-arp) will signal us that the slave is up -
>because we use slave->dev->last_rx (updated on *every* incoming packet in
>bond_handle_frame).
>
>After the patchset (with arp_validate == 0 ):
>
>*ONLY ARP* packets signal us that the slave is up - because we use
>slave->last_arp_rx that is updated every time we see an ARP packet.

	But that's effectively making arp_validate the only setting if
the host is on a quiet network segment without much other host ARP
traffic.  This may not be an issue for the active-backup mode, but the
load balance modes will have serious issues (more on that below).

	Actually, thinking about it, for active-backup, if a backup
slave is in a different Ethernet broadcast domain than the active slave,
your change will likely break those configurations.  With arp_validate
enabled or your change applied, the backup slaves in active-backup
depend on the ARP request broadcasts to be forwarded by the switch to
the backup slave.  Currently, without arp_validate, that dependency is
not there; the backup slave can receive any traffic (although in most
configurations, the ARP broadcast is what it will receive).

	That would be a legal (if very odd) bonding configuration.  I'm
not aware of anybody setting things up that way.

>The way the modes work with arp_validate > 0 don't change :), as we're
>updating slave->last_arp_rx the old way in this case - after validation.
>
>So that the scenario you've described still works flawlessly, and now
>already we won't be tricked by some weird broadcast traffic even with
>arp_validate == 0.

	So why not just enable arp_validate and get the same effect?

>>>The documentation for arp_validate also states that *ARPs* will (not) be
>>>validated if it's on/off, and that the arp monitoring works on arps as
>>>traffic generators.
>>
>>	I wrote most of that text in the documentation, and the intent
>>was not to imply that only ARPs should count for "up-ness" even without
>>arp_validate enabled.  The intent was to distinguish it from
>>"non-validate," in which any incoming traffic counted for "up-ness."
>>
>>	The main reason for preserving the non-validate behavior (any
>>traffic counts) is for the loadbalance (xor and rr) modes.  In those
>>modes, the switch decides which slave receives the incoming traffic, and
>>so it's to our advantage to permit any incoming traffic to count for
>>"up-ness."  The arp_validate option is not allowed in these modes
>>because it won't work.
>>
>>	With these changes, I suspect that the loadbalance ARP monitor
>>will be less reliable (granted that it's already a
>>bit dodgy in its dependence on the switch to hit all slaves with
>>incoming packets regularly).  Particularly if the switch ports are
>>configured into an Etherchannel ("static link aggregation") group, in
>>which case only one slave will receive any given frame (broadcast /
>>multicast traffic will not be duplicated across all slaves).
>
>The non-AB modes also gave me a headache, however after thinking a bit I've
>decided to change them also (mainly, it's the change of
>arp_loadbalance_mon function).

	I had that same headache when I was implementing the
arp_validate stuff.  But I disagree with changing this.

>The usual usage, however, is to generate traffic via arps. If we don't see
>arp replies - this means that arp_ip_target is down, and thus the slave is
>down.

	The issue with the loadbalance (-xor and -rr) modes is that the
incoming ARPs will be balanced by the switch, and won't be delivered to
all slaves, or at least not at a rate such that each slave sees an ARP
every arp_interval or so.

	Currently, the loadbalance modes will work with the ARP monitor
as long as sufficient traffic volume (of any kind) flows across the
slaves; with your change, that will no longer be true.

	In this case, the ARP monitor currently is essentially using
"slave can send and receive packets" as the test for availability.

>>	I'm not sure that this change (the "only count ARPs even without
>>arp_validate" bit) won't break existing configurations.  Did you test
>>the -rr and -xor modes with ARP monitor after your changes (with and
>>without configuring a channel group on the switch ports)?
>
>Sure, all works fine, afaics. Obviously, these were basic tests, and bugs
>might exist.

	I set up bonding to run some tests.  I used three slaves,
connected to a switch with those ports set for Etherchannel, balancing
according to source IP (Cisco port-channel load-balance src-ip).  For
non-IP packets (like ARP), this will balance by source MAC.  I set
arp_interval to 1000 (1 second), and used one arp_ip_target.

	Without your patches, I can get all slaves to stay up with a
sufficient traffic load (ping, netperf, etc).  This is dependent upon
the switch distributing the packets egressing the switch to all ports of
the Etherchannel.  This is not by any means an ideal situation, but has
been the state for some time now.

	With your patches, one slave stays up (it is receiving the ARP
replies from the arp_ip_target).  Regular traffic of any kind does not
keep the second and third slaves up.  In my opinion, this is a
regression from previous behavior.
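
	The difference between the two behaviours in this test can be
modelled roughly as follows.  This is only a user-space sketch: the
struct, the enum and the function name are hypothetical and are not the
bonding driver's code, and the simple "seen within one interval" check
is an assumption that ignores the driver's real timing rules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policies: what counts as proof that a slave is alive. */
enum rx_policy {
	RX_ANY_TRAFFIC,	/* current non-validate behaviour: any frame counts */
	RX_ARP_ONLY	/* behaviour after the patch set: only ARP counts */
};

/* Hypothetical per-slave timestamps, in arbitrary ticks. */
struct slave_times {
	unsigned long last_rx;		/* updated on every received frame */
	unsigned long last_arp_rx;	/* updated only on received ARP */
};

/* Return 1 if the slave saw qualifying traffic within one interval. */
int slave_seen_recently(const struct slave_times *s, enum rx_policy policy,
			unsigned long now, unsigned long interval)
{
	unsigned long last;

	assert(s != NULL);
	last = (policy == RX_ARP_ONLY) ? s->last_arp_rx : s->last_rx;
	return now - last <= interval;
}
```

	With three slaves that all receive regular traffic, but where
the switch hashes every ARP reply to the first slave, RX_ANY_TRAFFIC
reports all three as up and RX_ARP_ONLY reports only the first.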

	How did you test this such that all slaves stayed up?  My
suspicion is that you did not configure the switch ports for
Etherchannel (static link aggregation), and thus all broadcast ARP
requests were flooded to all slaves.  This would mark all slaves up, but
not have any real reliance on ARP replies coming from the arp_ip_target
(i.e., if there were no ARP replies, the slaves would still remain up
due to the broadcast ARPs generated by the bond itself).

>The only possible scenario of breakage for someone, from my POV, is:
>
>1) arp monitor is used with loadbalance mode
>2) arp_ip_targets are set but _any_ arp replies are never received
>3) the user relies on that every slave will receive at least one packet per
>arp_interval

	For 2, above, the issue is that, after your changes, each slave
must receive an ARP during each arp_interval.

>This use case:
>
>1) contradicts with documentation
>2) contradicts with logic (arp monitor, arp ip targets etc. are used
>without, actually, meaning something)
>3) is really unstable

	On 1, I'll grant that the documentation is ambiguous, but the
acceptance of any incoming packet for the non-validate ARP monitor is
functioning as designed.  At worst, this means the documentation should
be updated to be clearer, not that the code is functioning incorrectly
because it can be shown to contradict a reading of the documentation.

	Heck, looking at the bonding.txt documentation, it still says
that device drivers have to update ->last_rx and ->trans_start, which
isn't true.  So, yah, the documentation needs some work.

	On 2, sure they mean something; (without arp_validate or this
change) the ARP request / replies are one source, but not the only
source, of traffic to determine slave state.

	On 3, that is, indeed, the case, and this is mentioned in the
documentation somewhere (that a continuous flow of traffic is necessary
to maintain correct up/down status for the slaves).  The whole point of
accepting any incoming frame (not just ARPs) is to permit this very
scenario to function at maximum efficiency, even if that's not 100%
reliable in all cases.

>In this case, indeed, it won't work. Two points, though:
>
>1) It shouldn't work in the first place, is unstable etc.
>2) Can be easily fixed by the following oneliner (though I *really* wouldn't
>like to do it, as it's useless and dangerous). Basically, for non-ab mode
>we set last_arp_rx for every packet. Again, I wouldn't like to do it, but
>if you know any use case scenario for this usage (no working arps but
>continuous receive traffic) - I can send it as v2 or 7/6 patch (whatever
>suits David better, as he already applied it but didn't push).

	It's not useless and dangerous, it's preserving the intended
behavior for backwards compatibility with existing configurations.
Further, the current behavior is what allows ARP monitor for the
loadbalance modes (-xor and -rr) to work at all.

	The use case is any -xor or -rr bond using ARP monitor against a
switch configured for Etherchannel (static link aggregation) on the
bonded ports.  In those cases, the incoming ARP replies from an ARP
target will be delivered to only one slave (because switches generally
do load balancing by hash), and the other slaves will not see any
incoming ARP traffic from the arp_target.  Even if the switch did round
robin, the slaves still won't see enough ARPs coming in to reliably keep
the slaves up, even if there is lots of regular traffic.

	The fact that "accept only ARP" does not work at all for the
loadbalance modes is the reason that arp_validate is not permitted for
those modes.

	I think the bottom line here is pretty simple:

	Using the ARP monitor with the loadbalance modes is not a common
configuration in my experience, and making it work is tricky.  However,
anyone using it today will be relying on the current behavior, which we
therefore must not change.

	-J


>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 0f613ae..87358e5 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2286,6 +2286,11 @@ int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
> 	__be32 sip, tip;
> 	int alen;
> 
>+	if (!USES_PRIMARY(bond->params.mode)) {
>+		slave->last_arp_rx = jiffies;
>+		return RX_HANDLER_ANOTHER;
>+	}
>+
> 	if (skb->protocol != __cpu_to_be16(ETH_P_ARP))
> 		return RX_HANDLER_ANOTHER;
> 
>
>>
>>>Also, the net_device->last_rx is already used in a lot of drivers (even
>>>though the comment states to NOT do it :)), and it's also ugly to modify it
>>>from bonding.
>>
>>	I didn't check, but I suspect those are mostly leftovers from
>>the distant past, when the drivers were expected to update last_rx, or
>>perhaps drivers using it for their own purposes.
>
>It's really a mix. Somebody just updates them, somebody uses it for their
>own purposes etc.
>
>>
>>	I don't really see an issue in decoupling bonding from the
>>net_device->last_rx; it's pretty much the same thing that was done for
>>trans_start some time ago.
>
>trans_start removal is also in queue :). Though still needs some
>polishing...
>
>>
>>	-J
>>
>>---
>>	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
>>
>>
>>>So, to fix this, remove the last_rx from bonding, *always* call
>>>bond_arp_rcv() in slave's rx_handler (bond_handle_frame), and if we spot an
>>>arp there - update the slave->last_arp_rx - and use it instead of
>>>net_device->last_rx. Finally, rename slave_last_rx() to slave_last_arp_rx()
>>>to reflect the changes.
>>>
>>>As the changes touch really sensitive parts, I've tried to split them as
>>>much as possible, for easier debugging/bisecting.
>>>
>>>CC: Jay Vosburgh <fubar@us.ibm.com>
>>>CC: Andy Gospodarek <andy@greyhouse.net>
>>>CC: "David S. Miller" <davem@davemloft.net>
>>>CC: netdev@vger.kernel.org
>>>Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
>>>
>>>---
>>> drivers/net/bonding/bond_main.c    | 18 ++++++++----------
>>> drivers/net/bonding/bond_options.c | 12 ++----------
>>> drivers/net/bonding/bonding.h      | 16 ++++++----------
>>> include/linux/netdevice.h          |  8 +-------
>>> 4 files changed, 17 insertions(+), 37 deletions(-)

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* [PATCH net-next] net: eth_type_trans() should use skb_header_pointer()
From: Eric Dumazet @ 2014-01-16 23:03 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev
In-Reply-To: <1389745269.2025.228.camel@bwh-desktop.uk.level5networks.com>

From: Eric Dumazet <edumazet@google.com>

eth_type_trans() can read uninitialized memory as drivers
do not necessarily pull more than 14 bytes in skb->head before
calling it.

As David suggested, we can use skb_header_pointer() to
fix this without breaking some drivers that might not expect
eth_type_trans() pulling 2 additional bytes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
---
Since this bug is very old, I cooked the patch on net-next

 net/ethernet/eth.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 8f032bae60ad..5dc638cad2e1 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -156,7 +156,9 @@ EXPORT_SYMBOL(eth_rebuild_header);
  */
 __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev)
 {
-	struct ethhdr *eth;
+	unsigned short _service_access_point;
+	const unsigned short *sap;
+	const struct ethhdr *eth;
 
 	skb->dev = dev;
 	skb_reset_mac_header(skb);
@@ -194,7 +196,8 @@ __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev)
 	 *      layer. We look for FFFF which isn't a used 802.2 SSAP/DSAP. This
 	 *      won't work for fault tolerant netware but does for the rest.
 	 */
-	if (unlikely(skb->len >= 2 && *(unsigned short *)(skb->data) == 0xFFFF))
+	sap = skb_header_pointer(skb, 0, sizeof(*sap), &_service_access_point);
+	if (sap && *sap == 0xFFFF)
 		return htons(ETH_P_802_3);
 
 	/*
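
The idea behind the fix can be illustrated outside the kernel.  The
sketch below is a user-space analogue only: struct fake_skb and
fake_header_pointer() are hypothetical stand-ins and not the real
sk_buff API.  It shows the pattern the patch relies on: return a
pointer into the linear area when the requested bytes are there,
otherwise copy them into a caller-supplied buffer, and return NULL when
the packet is too short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical two-part packet: a linear head plus one fragment. */
struct fake_skb {
	const unsigned char *linear;
	size_t linear_len;
	const unsigned char *frag;
	size_t frag_len;
};

/*
 * Return a pointer to len bytes starting at off.  The bytes are copied
 * into buf only when they are not fully inside the linear area.
 * Returns NULL if the packet does not hold off + len bytes.
 */
const unsigned char *fake_header_pointer(const struct fake_skb *skb,
					 size_t off, size_t len,
					 unsigned char *buf)
{
	size_t copied = 0;

	assert(skb != NULL && buf != NULL);
	if (off + len <= skb->linear_len)
		return skb->linear + off;
	if (off + len > skb->linear_len + skb->frag_len)
		return NULL;

	if (off < skb->linear_len) {
		/* Take the tail of the linear area first. */
		copied = skb->linear_len - off;
		memcpy(buf, skb->linear + off, copied);
		off = 0;
	} else {
		off -= skb->linear_len;
	}
	memcpy(buf + copied, skb->frag + off, len - copied);
	return buf;
}

/* Return 1 if the first two payload bytes are 0xFF 0xFF. */
int looks_like_raw_802_3(const struct fake_skb *skb)
{
	unsigned char tmp[2];
	const unsigned char *sap;

	sap = fake_header_pointer(skb, 0, sizeof(tmp), tmp);
	return sap != NULL && sap[0] == 0xFF && sap[1] == 0xFF;
}
```

Comparing bytes rather than dereferencing an unsigned short keeps the
sketch free of alignment assumptions; that is a choice made for this
example, not something taken from the patch.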


* Re: [RFC PATCH net-next 3/3] virtio-net: Add accelerated RFS support
From: Ben Hutchings @ 2014-01-16 23:16 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: Linux Netdev List, Tom Herbert, Eric Dumazet, David S. Miller,
	Zhi Yong Wu
In-Reply-To: <CAEH94LjHcxkXLvs6AMNop_yzKG0zaT2mAgsAb21i+RtMbkXbmQ@mail.gmail.com>

On Fri, 2014-01-17 at 06:00 +0800, Zhi Yong Wu wrote:
> On Fri, Jan 17, 2014 at 5:31 AM, Ben Hutchings
> <bhutchings@solarflare.com> wrote:
> > On Wed, 2014-01-15 at 22:20 +0800, Zhi Yong Wu wrote:
> > [...]
> >> +static int virtnet_init_rx_cpu_rmap(struct virtnet_info *vi)
> >> +{
> >> +     int rc = 0;
> >> +
> >> +#ifdef CONFIG_RFS_ACCEL
> >> +     struct virtio_device *vdev = vi->vdev;
> >> +     unsigned int irq;
> >> +     int i;
> >> +
> >> +     if (!vi->affinity_hint_set)
> >> +             goto out;
> >> +
> >> +     vi->dev->rx_cpu_rmap = alloc_irq_cpu_rmap(vi->max_queue_pairs);
> >> +     if (!vi->dev->rx_cpu_rmap) {
> >> +             rc = -ENOMEM;
> >> +             goto out;
> >> +     }
> >> +
> >> +     for (i = 0; i < vi->max_queue_pairs; i++) {
> >> +             irq = virtqueue_get_vq_irq(vdev, vi->rq[i].vq);
> >> +             if (irq == -1)
> >> +                     goto failed;
> >
> > Jumping into an if-statement is confusing.  Also do you really want to
> > return 0 in this case?
> No. If it fails to get the irq, I want it to exit as soon as possible;
> otherwise irq_cpu_rmap_add() will be invoked with an incorrect irq
> argument.

Well currently this goto does result in returning 0, as rc has not been
changed after its initialisation to 0.
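
As a rough illustration, the error path could be restructured so that
every failure sets rc and jumps to a single cleanup label outside the
if-statement.  This is a user-space sketch only: struct fake_rmap,
fake_rmap_add() and the irq_table argument are hypothetical stand-ins
for the kernel helpers, and -ENOENT is an assumed error code, not
something the patch specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FAKE_RMAP_MAX 8

/* Hypothetical stand-in for the IRQ reverse map; not the kernel type. */
struct fake_rmap {
	int used;
	int irqs[FAKE_RMAP_MAX];
};

/* Hypothetical stand-in for irq_cpu_rmap_add(). */
static int fake_rmap_add(struct fake_rmap *rmap, int irq)
{
	if (rmap->used >= FAKE_RMAP_MAX)
		return -ENOSPC;
	rmap->irqs[rmap->used++] = irq;
	return 0;
}

/*
 * irq_table[i] plays the role of virtqueue_get_vq_irq() for queue i
 * and holds -1 when no IRQ is available.
 */
int init_rx_cpu_rmap(struct fake_rmap *rmap, const int *irq_table,
		     int nqueues)
{
	int rc, i;

	assert(rmap != NULL && irq_table != NULL);
	rmap->used = 0;

	for (i = 0; i < nqueues; i++) {
		if (irq_table[i] < 0) {
			rc = -ENOENT;	/* assumed error code */
			goto err_free;
		}
		rc = fake_rmap_add(rmap, irq_table[i]);
		if (rc)
			goto err_free;
	}
	return 0;

err_free:
	rmap->used = 0;	/* stands in for virtnet_free_irq_cpu_rmap() */
	return rc;
}
```

With this shape a missing IRQ can no longer fall through to a return
value of 0, and no goto jumps into the middle of a block.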

> By the way, have you thought about whether it makes sense to add aRFS
> support to virtio_net? For [patch 2/3], what do you think of the
> missing pieces I listed?
> Do you have any documentation you can share on how the indirection
> table is implemented in the sfc NIC? Thanks.

Going through that list:

> 1.)  guest virtio_net driver should have one filter table and its
> entries can be expired periodically;

In sfc, we keep a count of how many entries have been inserted in each NAPI
context.  Whenever the NAPI poll function is about to call
napi_complete() and the count for that context has reached a trigger
level, it will scan some quota of filter entries for expiry.
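
A minimal sketch of that trigger-and-quota scheme is below.  It is not
the sfc code: the table size, trigger level, quota, field names and the
choice to reset the counter after a scan are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>

#define FILTER_TABLE_SIZE 16	/* assumed, tiny for illustration */
#define EXPIRE_TRIGGER 4	/* assumed trigger level */
#define EXPIRE_QUOTA 8		/* assumed entries scanned per pass */

struct filter_entry {
	int in_use;
	unsigned long last_seen;	/* arbitrary time units */
};

struct filter_ctx {
	struct filter_entry table[FILTER_TABLE_SIZE];
	unsigned int inserted;	/* insertions since the last scan */
	unsigned int scan_pos;	/* where the next scan resumes */
};

/*
 * Meant to be called where the poll function is about to complete.
 * Scans at most EXPIRE_QUOTA entries, and only once enough filters
 * have been inserted.  Returns the number of entries expired.
 */
unsigned int filter_maybe_expire(struct filter_ctx *ctx, unsigned long now,
				 unsigned long max_age)
{
	unsigned int n, expired = 0;

	assert(ctx != NULL);
	if (ctx->inserted < EXPIRE_TRIGGER)
		return 0;
	ctx->inserted = 0;

	for (n = 0; n < EXPIRE_QUOTA; n++) {
		struct filter_entry *e = &ctx->table[ctx->scan_pos];

		ctx->scan_pos = (ctx->scan_pos + 1) % FILTER_TABLE_SIZE;
		if (e->in_use && now - e->last_seen > max_age) {
			e->in_use = 0;
			expired++;
		}
	}
	return expired;
}
```

Resuming from scan_pos spreads the expiry work over successive polls
instead of walking the whole table at once.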

> 2.)  guest virtio_net driver should pass rx queue index and filter
> info down to the emulated virtio_net NIC in QEMU.
> 3.) the emulated virtio_net NIC should have its indirect table to
> store the flow to rx queue mapping.
> 4.) the emulated virtio_net NIC should classify the rx packet to
> selected queue by applying the filter.

I think the most efficient way to do this would be to put a hash table
in some shared memory that both guest and host can read and write.  The
virtio control path would only be used to set up and tear down the
table.  I don't know whether virtio allows for that.

However, to take advantage of ARFS on a physical net driver, it would be
necessary to send a control request for part 2.

> 5.) update virtio spec.
> Am I missing anything? If so, please correct me.
> For 3.) and 4.), do you have any doc about how they are implemented in
> physical NICs? e.g. mlx4_en or sfc, etc.

The Programmer's Reference Manuals for Solarflare controllers are only
available under NDA.  I can describe the hardware filtering briefly, but
actually I don't think it's very relevant to virtio_net.

There is a typical RSS hash indirection table (128 entries), but for
ARFS we use a different RX filter table which has 8K entries
(RX_FILTER_TBL0 on SFC4000/SFC9000 family).

Solarflare controllers support user-level networking, which requires
perfect filtering to deliver each application's flows into that
application's dedicated RX queue(s).  Lookups in this larger filter
table are still hash-based, but each entry specifies a TCP/IP or UDP/IP
4-tuple or local 2-tuple to match.  ARFS uses the 4-tuple type only.

To allow for hash collisions, a secondary hash function generates an
increment to be added to the initial table index repeatedly for hash
chaining.  There is a control register which tells the controller the
maximum hash chain length to search for each IP filter type; after this
it will fall back to checking MAC filters and then default filters.
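
The lookup scheme can be sketched in software as below.  The two hash
functions are placeholders invented for the example (the real hardware
hashes are not described here), as are the struct and function names;
only the structure follows the description: an initial index, a
per-key increment, and a bounded chain length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FILTER_TBL_SIZE 8192u	/* power of two, as in the 8K table */

struct tuple4 {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

struct chain_entry {
	int valid;
	struct tuple4 key;
	uint16_t rxq;
};

/* Placeholder mixing function; not the hardware hash. */
static uint32_t mix32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x45d9f3bu;
	x ^= x >> 16;
	return x;
}

static uint32_t key_word(const struct tuple4 *k)
{
	return k->saddr ^ (k->daddr * 31u) ^
	       ((uint32_t)k->sport << 16) ^ k->dport;
}

static uint32_t hash_index(const struct tuple4 *k)
{
	return mix32(key_word(k)) & (FILTER_TBL_SIZE - 1);
}

/* Secondary hash: forced odd so repeated steps visit every slot. */
static uint32_t hash_step(const struct tuple4 *k)
{
	return (mix32(key_word(k) ^ 0x9e3779b9u) & (FILTER_TBL_SIZE - 1)) | 1u;
}

static int same_key(const struct tuple4 *a, const struct tuple4 *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Return the RX queue for k, or -1 if not found within max_chain probes. */
int chain_lookup(const struct chain_entry *tbl, const struct tuple4 *k,
		 unsigned int max_chain)
{
	uint32_t idx = hash_index(k), step = hash_step(k);
	unsigned int depth;

	assert(tbl != NULL && k != NULL);
	for (depth = 0; depth < max_chain; depth++) {
		if (tbl[idx].valid && same_key(&tbl[idx].key, k))
			return tbl[idx].rxq;
		idx = (idx + step) & (FILTER_TBL_SIZE - 1);
	}
	return -1;
}

/* Insert or update k; return 0, or -1 if the chain has no free slot. */
int chain_insert(struct chain_entry *tbl, const struct tuple4 *k,
		 uint16_t rxq, unsigned int max_chain)
{
	uint32_t idx = hash_index(k), step = hash_step(k);
	struct chain_entry *free_slot = NULL;
	unsigned int depth;

	assert(tbl != NULL && k != NULL);
	for (depth = 0; depth < max_chain; depth++) {
		struct chain_entry *e = &tbl[idx];

		if (e->valid && same_key(&e->key, k)) {
			e->rxq = rxq;	/* update the existing filter */
			return 0;
		}
		if (!e->valid && free_slot == NULL)
			free_slot = e;
		idx = (idx + step) & (FILTER_TBL_SIZE - 1);
	}
	if (free_slot == NULL)
		return -1;
	free_slot->valid = 1;
	free_slot->key = *k;
	free_slot->rxq = rxq;
	return 0;
}
```

A lookup that returns -1 corresponds to the point at which the
controller would fall back to the MAC and default filters.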

On the SFC9100 family, filter updates and lookups are implemented by
firmware and the driver doesn't manage the filter table itself, but I
know it is still a hash table of perfect filters.

For ARFS, perfect filtering is not needed.  I think it would be
preferable to use a fairly big hash table and make insertion fail in
case of a collision.  Since the backend for virtio_net will do RX queue
selection in software, the entire table of queue indices should fit into
its L1 cache.
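
Something along these lines is what I have in mind, as a sketch only.
All names and sizes are made up for the example, and keeping a separate
array of owning flow hashes so that insertion can detect a collision is
an assumption about how the bookkeeping might be done; none of this is
part of any virtio specification.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ARFS_TBL_SIZE 1024u	/* assumed size, power of two */
#define ARFS_NO_QUEUE 0xffffu	/* marks an empty slot */

struct arfs_table {
	/* Compact array consulted per packet: 2 KiB at this size. */
	uint16_t queue[ARFS_TBL_SIZE];
	/* Flow hash that claimed each slot; used only on insertion. */
	uint32_t owner[ARFS_TBL_SIZE];
};

void arfs_init(struct arfs_table *t)
{
	uint32_t i;

	assert(t != NULL);
	for (i = 0; i < ARFS_TBL_SIZE; i++) {
		t->queue[i] = ARFS_NO_QUEUE;
		t->owner[i] = 0;
	}
}

/*
 * Steer flow_hash to rxq.  Fails with -EBUSY if a different flow
 * already owns the slot; re-inserting the same flow updates its queue.
 */
int arfs_insert(struct arfs_table *t, uint32_t flow_hash, uint16_t rxq)
{
	uint32_t idx = flow_hash & (ARFS_TBL_SIZE - 1);

	assert(t != NULL && rxq != ARFS_NO_QUEUE);
	if (t->queue[idx] != ARFS_NO_QUEUE && t->owner[idx] != flow_hash)
		return -EBUSY;
	t->owner[idx] = flow_hash;
	t->queue[idx] = rxq;
	return 0;
}

/*
 * Imperfect lookup: any flow hashing to an occupied slot gets that
 * slot's queue.  Returns -1 for an empty slot.
 */
int arfs_lookup(const struct arfs_table *t, uint32_t flow_hash)
{
	uint16_t q;

	assert(t != NULL);
	q = t->queue[flow_hash & (ARFS_TBL_SIZE - 1)];
	return q == ARFS_NO_QUEUE ? -1 : (int)q;
}
```

Because the lookup never compares keys, a colliding flow is simply
steered to the owner's queue, which is acceptable when filtering does
not have to be perfect.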

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* [RFC PATCH 0/3] Use cached allocations in place of order-3 allocations for sk_page_frag_refill() and __netdev_alloc_frag()
From: Debabrata Banerjee @ 2014-01-16 23:17 UTC (permalink / raw)
  To: eric.dumazet, fw, netdev; +Cc: dbanerje, johunt, jbaron, davem, linux-mm

This is a hack against 3.10.y to see if using cached allocations works
better here. An unintended consequence is that in the reference benchmark
it performs ~7% better than the existing code, even with a hacked, slower
get_page()/put_page().

The intent was to avoid very slow order-3 allocations (and really
pathological retries under failure), which can cause lots of problems,
from OOM killer invocation to direct reclaim/compaction cycles that take
up nearly all CPU and end up reaping large amounts of page cache that
would otherwise have been useful. This is a regression from the same code
when it used order-0 allocations, since those are easy and fast because
they are cached per-cpu, and this code is under very heavy alloc/free
behavior. This patch eliminates a majority of that because slab caches
the allocations, though it could still be improved by slab holding onto
freed slabs longer; this seems like an unoptimized case when
object size == slab size.

vmstat output of bad behavior: http://pastebin.ubuntu.com/6687527/

This patchset could be fixed for submission either by making another pool
of cached frag buffers specifically for page_frag (not using slab), or by
converting the whole stack to not use get_page()/put_page() to reference
count and free page allocations, so that hacking swap.c is not necessary
and slab can be used normally.

Benchmark:
ifconfig lo mtu 16436
perf record ./netperf -t UDP_STREAM ; perf report

With order-0 allocations:

UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
Socket  Message  Elapsed      Messages                
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

262144   65507   10.00      820758      0    43012.26
262144           10.00      820754           43012.05

# Overhead  Command      Shared Object                                      Symbol
# ........  .......  .................  ..........................................
#
    46.15%  netperf  [kernel.kallsyms]  [k] copy_user_generic_string              
     7.89%  netperf  [kernel.kallsyms]  [k] skb_append_datato_frags               
     6.06%  netperf  [kernel.kallsyms]  [k] get_page_from_freelist                
     3.87%  netperf  [kernel.kallsyms]  [k] __rmqueue                             
     1.36%  netperf  [kernel.kallsyms]  [k] __alloc_pages_nodemask                
     1.11%  netperf  [kernel.kallsyms]  [k] alloc_pages_current                   

linux-3.10.y stock order-3 allocations:

UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
Socket  Message  Elapsed      Messages                
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00     1054158      0    55243.69
212992           10.00     1019505           53427.68

# Overhead  Command      Shared Object                                      Symbol
# ........  .......  .................  ..........................................
#
    59.80%  netperf  [kernel.kallsyms]  [k] copy_user_generic_string              
     2.35%  netperf  [kernel.kallsyms]  [k] get_page_from_freelist                
     1.95%  netperf  [kernel.kallsyms]  [k] skb_append_datato_frags               
     1.27%  netperf  [ip_tables]        [k] ipt_do_table                          
     1.26%  netperf  [kernel.kallsyms]  [k] udp_sendmsg                           
     1.03%  netperf  [kernel.kallsyms]  [k] enqueue_task_fair                     
     1.00%  netperf  [kernel.kallsyms]  [k] ip_finish_output                              

With this patchset:

UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
Socket  Message  Elapsed      Messages                
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00     1127089      0    59065.70
212992           10.00     1072997           56230.98


# Overhead  Command      Shared Object                                      Symbol
# ........  .......  .................  ..........................................
#
    69.16%  netperf  [kernel.kallsyms]  [k] copy_user_generic_string
     2.56%  netperf  [kernel.kallsyms]  [k] skb_append_datato_frags
     1.00%  netperf  [ip_tables]        [k] ipt_do_table
     0.96%  netperf  [kernel.kallsyms]  [k] sock_alloc_send_pskb
     0.93%  netperf  [kernel.kallsyms]  [k] _raw_spin_lock
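
For anyone cross-checking the figures above: the throughput column appears to be simply messages x message size x 8 / elapsed seconds, in units of 10^6 bits/sec. A minimal sketch of that arithmetic follows; the function name is made up for illustration and is not part of netperf.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Recompute the "10^6bits/sec" column from the message count, the
 * message size in bytes and the elapsed time in seconds.
 * udp_stream_mbps() is a hypothetical helper, not a netperf function.
 */
double udp_stream_mbps(uint64_t messages, uint64_t msg_bytes, double secs)
{
	assert(secs > 0.0);
	return (double)(messages * msg_bytes * 8) / secs / 1e6;
}
```

Fed the send-side rows above (820758 and 1127089 messages of 65507 bytes in 10.00 s), this gives roughly 43012.3 for order-0 and 59065.8 with the patchset, which agrees with the reported values to within the rounding of the elapsed time.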



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [RFC PATCH 1/3] Supporting hacks to test slab-allocated buffers in place of page_frag without rewriting lots of net code. We make several assumptions here: first, that the slab allocator is selected; second, that no one is doing get_page or put_page on pages marked PG_slab; third, that all slabs we make these calls on are allocated page-aligned.
From: Debabrata Banerjee @ 2014-01-16 23:17 UTC (permalink / raw)
  To: eric.dumazet, fw, netdev; +Cc: dbanerje, johunt, jbaron, davem, linux-mm
In-Reply-To: <1389914224-10453-1-git-send-email-dbanerje@akamai.com>

---
 include/linux/mm.h |  6 ++++++
 mm/slab.c          |  8 ++++++++
 mm/swap.c          | 13 ++++++++++++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e0c8528..de21a92 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -398,12 +398,18 @@ static inline void get_huge_page_tail(struct page *page)
 }
 
 extern bool __get_page_tail(struct page *page);
+extern struct page *slabpage_to_headpage(struct page *page);
 
 static inline void get_page(struct page *page)
 {
 	if (unlikely(PageTail(page)))
 		if (likely(__get_page_tail(page)))
 			return;
+
+	//Hack for slab page
+	if (unlikely(page->flags & (1L << PG_slab)))
+		page = slabpage_to_headpage(page);
+
 	/*
 	 * Getting a normal page or the head of a compound page
 	 * requires to already have an elevated page->_count.
diff --git a/mm/slab.c b/mm/slab.c
index bd88411..36d5176 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -483,6 +483,14 @@ static inline unsigned int obj_to_index(const struct kmem_cache *cache,
 	return reciprocal_divide(offset, cache->reciprocal_buffer_size);
 }
 
+struct page *slabpage_to_headpage(struct page *page)
+{
+	//Hack to support get_page/put_page on slabs bigger than a page
+	unsigned int idx = obj_to_index(page->slab_cache, page->slab_page, page_address(page));
+	return virt_to_page(index_to_obj(page->slab_cache, page->slab_page, idx));
+}
+EXPORT_SYMBOL(slabpage_to_headpage);
+
 static struct arraycache_init initarray_generic =
     { {0, BOOT_CPUCACHE_ENTRIES, 1, 0} };
 
diff --git a/mm/swap.c b/mm/swap.c
index 9f2225f..94c75bc 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -172,9 +172,20 @@ skip_lock_tail:
 	}
 }
 
+extern struct page *slabpage_to_headpage(struct page *page);
+
 void put_page(struct page *page)
 {
-	if (unlikely(PageCompound(page)))
+	if (unlikely(page->flags & (1L << PG_slab))) {
+		struct page *head_page = slabpage_to_headpage(page);
+		//Hack. Assume we have >PAGE_SIZE and aligned slabs, and no one is dumb enough
+		//to do a put_page to 0 on a slab page without meaning to free it from the slab.
+		if (put_page_testzero(head_page)) {
+			get_page(head_page); //restore 1 _count for slab
+			kmem_cache_free(page->slab_cache, page_address(head_page));
+		}
+	}
+	else if (unlikely(PageCompound(page)))
 		put_compound_page(page);
 	else if (put_page_testzero(page))
 		__put_single_page(page);
-- 
1.8.3.4

^ permalink raw reply related

* [RFC PATCH 2/3] Use slab allocations for netdev page_frag receive buffers
From: Debabrata Banerjee @ 2014-01-16 23:17 UTC (permalink / raw)
  To: eric.dumazet, fw, netdev; +Cc: dbanerje, johunt, jbaron, davem, linux-mm
In-Reply-To: <1389914224-10453-1-git-send-email-dbanerje@akamai.com>

---
 net/core/skbuff.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d9e8736..7ecb7a8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -368,6 +368,8 @@ struct netdev_alloc_cache {
 };
 static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
 
+struct kmem_cache *netdev_page_frag_cache;
+
 static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
 {
 	struct netdev_alloc_cache *nc;
@@ -379,18 +381,22 @@ static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
 	nc = &__get_cpu_var(netdev_alloc_cache);
 	if (unlikely(!nc->frag.page)) {
 refill:
-		for (order = NETDEV_FRAG_PAGE_MAX_ORDER; ;) {
-			gfp_t gfp = gfp_mask;
-
-			if (order)
-				gfp |= __GFP_COMP | __GFP_NOWARN;
-			nc->frag.page = alloc_pages(gfp, order);
-			if (likely(nc->frag.page))
-				break;
-			if (--order < 0)
-				goto end;
+		if (NETDEV_FRAG_PAGE_MAX_ORDER > 0) {
+			void *kmem = kmem_cache_alloc(netdev_page_frag_cache, gfp_mask | __GFP_NOWARN);
+			if (likely(kmem)) {
+				nc->frag.page = virt_to_page(kmem);
+				nc->frag.size = PAGE_SIZE << NETDEV_FRAG_PAGE_MAX_ORDER;
+				goto recycle;
+			}
 		}
-		nc->frag.size = PAGE_SIZE << order;
+
+		nc->frag.page = alloc_page(gfp_mask);
+
+		if (likely(nc->frag.page))
+			nc->frag.size = PAGE_SIZE;
+		else
+			goto end;
+
 recycle:
 		atomic_set(&nc->frag.page->_count, NETDEV_PAGECNT_MAX_BIAS);
 		nc->pagecnt_bias = NETDEV_PAGECNT_MAX_BIAS;
@@ -3092,6 +3098,11 @@ void __init skb_init(void)
 						0,
 						SLAB_HWCACHE_ALIGN|SLAB_PANIC,
 						NULL);
+	netdev_page_frag_cache = kmem_cache_create("netdev_page_frag_cache",
+						PAGE_SIZE << NETDEV_FRAG_PAGE_MAX_ORDER,
+						PAGE_SIZE,
+						SLAB_HWCACHE_ALIGN,
+						NULL);
 }
 
 /**
-- 
1.8.3.4

^ permalink raw reply related

* [RFC PATCH 3/3] Use slab allocations for sk page_frag send buffers
From: Debabrata Banerjee @ 2014-01-16 23:17 UTC (permalink / raw)
  To: eric.dumazet, fw, netdev; +Cc: dbanerje, johunt, jbaron, davem, linux-mm
In-Reply-To: <1389914224-10453-1-git-send-email-dbanerje@akamai.com>

---
 net/core/sock.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 6565431..dbbd2f9 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1792,10 +1792,12 @@ EXPORT_SYMBOL(sock_alloc_send_skb);
 
 /* On 32bit arches, an skb frag is limited to 2^15 */
 #define SKB_FRAG_PAGE_ORDER	get_order(32768)
+struct kmem_cache *sk_page_frag_cache;
 
 bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
 {
 	int order;
+	gfp_t gfp_mask = sk->sk_allocation;
 
 	if (pfrag->page) {
 		if (atomic_read(&pfrag->page->_count) == 1) {
@@ -1807,21 +1809,25 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
 		put_page(pfrag->page);
 	}
 
-	/* We restrict high order allocations to users that can afford to wait */
-	order = (sk->sk_allocation & __GFP_WAIT) ? SKB_FRAG_PAGE_ORDER : 0;
+	order = SKB_FRAG_PAGE_ORDER;
 
-	do {
-		gfp_t gfp = sk->sk_allocation;
-
-		if (order)
-			gfp |= __GFP_COMP | __GFP_NOWARN;
-		pfrag->page = alloc_pages(gfp, order);
-		if (likely(pfrag->page)) {
+	if (order > 0) {
+		void *kmem = kmem_cache_alloc(sk_page_frag_cache, gfp_mask | __GFP_NOWARN);
+		if (likely(kmem)) {
+			pfrag->page = virt_to_page(kmem);
 			pfrag->offset = 0;
 			pfrag->size = PAGE_SIZE << order;
 			return true;
 		}
-	} while (--order >= 0);
+	}
+
+	pfrag->page = alloc_page(gfp_mask);
+
+	if (likely(pfrag->page)) {
+		pfrag->offset = 0;
+		pfrag->size = PAGE_SIZE;
+		return true;
+	}
 
 	sk_enter_memory_pressure(sk);
 	sk_stream_moderate_sndbuf(sk);
@@ -2822,13 +2828,18 @@ static __net_init int proto_init_net(struct net *net)
 {
 	if (!proc_create("protocols", S_IRUGO, net->proc_net, &proto_seq_fops))
 		return -ENOMEM;
-
+	sk_page_frag_cache = kmem_cache_create("sk_page_frag_cache",
+			  PAGE_SIZE << SKB_FRAG_PAGE_ORDER,
+			  PAGE_SIZE,
+			  SLAB_HWCACHE_ALIGN,
+			  NULL);
 	return 0;
 }
 
 static __net_exit void proto_exit_net(struct net *net)
 {
 	remove_proc_entry("protocols", net->proc_net);
+	kmem_cache_destroy(sk_page_frag_cache);
 }
 
 
-- 
1.8.3.4
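
Stripped of kernel specifics, the refill logic in patches 2/3 and 3/3 reduces to: try one large buffer from the dedicated cache, and fall back to a single page if that fails. The following is a userspace sketch of that shape only. The struct, the constants and the allocator callbacks are hypothetical stand-ins, not the kernel's page_frag, kmem_cache_alloc() or alloc_page().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE  4096u	/* assumed page size */
#define SKETCH_FRAG_ORDER 3u	/* assumed order, i.e. 8 pages */

/* Hypothetical stand-in for struct page_frag. */
struct frag_buf {
	void *mem;
	size_t offset;
	size_t size;
};

/* Allocator callback, so that a failing "large" allocator can be simulated. */
typedef void *(*alloc_fn)(size_t);

/* Returns 1 if the fragment was refilled, 0 if both allocations failed. */
int frag_refill(struct frag_buf *f, alloc_fn alloc_large, alloc_fn alloc_small)
{
	size_t large = (size_t)SKETCH_PAGE_SIZE << SKETCH_FRAG_ORDER;

	assert(f != NULL);

	/* First choice: one large buffer. */
	f->mem = alloc_large(large);
	if (f->mem) {
		f->offset = 0;
		f->size = large;
		return 1;
	}

	/* Fallback: a single page. */
	f->mem = alloc_small(SKETCH_PAGE_SIZE);
	if (f->mem) {
		f->offset = 0;
		f->size = SKETCH_PAGE_SIZE;
		return 1;
	}

	f->size = 0;
	return 0;
}

/* Allocator that always fails, for exercising the fallback path. */
void *always_fail(size_t n)
{
	(void)n;
	return NULL;
}
```

Note that, unlike the loop being removed, this shape has no intermediate orders: it is either the full-size buffer or a single page.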

^ permalink raw reply related

* Re: [PATCH 00/13] Assorted mvneta fixes and improvements
From: David Miller @ 2014-01-16 23:21 UTC (permalink / raw)
  To: w; +Cc: netdev, thomas.petazzoni, gregory.clement, arno, eric.dumazet,
	ben
In-Reply-To: <1389856819-6503-1-git-send-email-w@1wt.eu>

From: Willy Tarreau <w@1wt.eu>
Date: Thu, 16 Jan 2014 08:20:06 +0100

> this series provides some fixes for a number of issues met with the
> mvneta driver, then adds some improvements. Patches 1-5 are fixes
> and would be needed in 3.13 and likely -stable. The next ones are
> performance improvements and cleanups :

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH v2 0/9] net: stmmac PM related fixes.
From: David Miller @ 2014-01-16 23:24 UTC (permalink / raw)
  To: srinivas.kandagatla; +Cc: netdev, peppe.cavallaro, linux-kernel
In-Reply-To: <1389869321-27411-1-git-send-email-srinivas.kandagatla@st.com>

From: <srinivas.kandagatla@st.com>
Date: Thu, 16 Jan 2014 10:48:41 +0000

> During PM_SUSPEND_FREEZE testing, I have noticed that PM support in STMMAC is
> partly broken. I had to re-arrange the code to do PM correctly. There were lot
> of things I did not like personally and some bits did not work in the first
> place. I thought this is the nice opportunity to clean the mess up.

Series applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH net-next v4 1/6] net: allow > 0 order atomic page alloc in skb_page_frag_refill
From: David Miller @ 2014-01-16 23:28 UTC (permalink / raw)
  To: mwdalton; +Cc: mst, netdev, virtualization, edumazet, bhutchings
In-Reply-To: <1389901950-3854-1-git-send-email-mwdalton@google.com>

All 6 patches applied.

Next time, PLEASE, give me a header email ala "[PATCH net-next v4 0/6]" giving
a broad overview of the series.

This serves several purposes.

First, it gives me a single top-level email to reply to when I want to let
you know that I've either applied or rejected this series.  Because you
didn't provide a header posting, I have to pick an arbitrary one of
the patches to use for this purpose as I have done here.

Second, it gives a place for you to describe at a high level what the patch
series is doing.  I create dummy merge commits and place that descriptive
text into it, so that anyone else looking at the GIT history can see that
these patches go together as a coherent unit and what that unit is trying
to achieve.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next v4 1/6] net: allow > 0 order atomic page alloc in skb_page_frag_refill
From: David Miller @ 2014-01-16 23:30 UTC (permalink / raw)
  To: mwdalton; +Cc: netdev, edumazet, rusty, mst, jasowang, bhutchings,
	virtualization
In-Reply-To: <20140116.152800.901405996505782677.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 16 Jan 2014 15:28:00 -0800 (PST)

> All 6 patches applied.

Actually, I reverted, please resubmit this series with the following
build warning corrected:

net/core/net-sysfs.c: In function ‘rx_queue_add_kobject’:
net/core/net-sysfs.c:767:21: warning: ignoring return value of ‘sysfs_create_group’, declared with attribute warn_unused_result [-Wunused-result]

Thanks.
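
For readers unfamiliar with this class of warning: a function declared with the warn_unused_result attribute (which is what the kernel's __must_check expands to under GCC) triggers it whenever a caller drops the return value. The sketch below shows one common way to address it, by propagating the error; both functions are hypothetical stand-ins, and this is not necessarily how the series was actually fixed.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a function marked __must_check. */
__attribute__((warn_unused_result))
int create_group(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Hypothetical caller: checks and propagates the result instead of dropping it. */
int add_queue_object(int should_fail)
{
	int error = create_group(should_fail);

	if (error)
		return error;

	/* ... further setup would go here ... */
	return 0;
}
```

Calling create_group() as a bare statement inside add_queue_object() would reproduce the same kind of -Wunused-result warning.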

^ permalink raw reply

* Re: [PATCH net-next] net: eth_type_trans() should use skb_header_pointer()
From: David Miller @ 2014-01-16 23:30 UTC (permalink / raw)
  To: eric.dumazet; +Cc: bhutchings, netdev
In-Reply-To: <1389913411.31367.430.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 16 Jan 2014 15:03:31 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> eth_type_trans() can read uninitialized memory as drivers
> do not necessarily pull more than 14 bytes in skb->head before
> calling it.
> 
> As David suggested, we can use skb_header_pointer() to
> fix this without breaking some drivers that might not expect
> eth_type_trans() pulling 2 additional bytes.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Ben Hutchings <bhutchings@solarflare.com>
> ---
> Since this bug is very old, I cooked the patch on net-next

Applied, thanks a lot Eric.
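
As background for the approach described in the quoted changelog: a header-pointer helper returns a pointer straight into the linear area when the requested bytes are already there, and otherwise copies them into a caller-supplied buffer without pulling anything. Below is a userspace sketch of that idea, assuming a packet made of a linear head plus a single fragment. The struct and function are hypothetical and much simpler than the real skb_header_pointer().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet: a linear head followed by one extra fragment. */
struct pkt {
	const unsigned char *head;
	size_t head_len;
	const unsigned char *frag;
	size_t frag_len;
};

/*
 * Return a pointer to 'len' bytes at offset 'off'. If they all sit in
 * the linear head, point directly at them; otherwise copy them into
 * 'buf' and return 'buf'. Return NULL if the packet is too short.
 */
const void *pkt_header_pointer(const struct pkt *p, size_t off, size_t len,
			       void *buf)
{
	unsigned char *out = buf;
	size_t i;

	assert(p != NULL);
	if (len > SIZE_MAX - off)
		return NULL;			/* off + len would overflow */
	if (off + len <= p->head_len)
		return p->head + off;		/* fast path: no copy */
	if (off + len > p->head_len + p->frag_len)
		return NULL;			/* not enough data */

	for (i = 0; i < len; i++) {
		size_t pos = off + i;

		out[i] = pos < p->head_len ? p->head[pos]
					   : p->frag[pos - p->head_len];
	}
	return out;
}
```

With a 14-byte head, a request for 2 bytes at offset 14 takes the copy path and leaves the head untouched, which is the property the changelog is after.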

^ permalink raw reply

* Re: [RFC] sysfs_rename_link() and its usage
From: Eric W. Biederman @ 2014-01-16 23:34 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: Tejun Heo, Greg KH, linux-kernel, netdev
In-Reply-To: <20140116001116.GA27182@redhat.com>

Veaceslav Falico <vfalico@redhat.com> writes:

> On Wed, Jan 15, 2014 at 03:25:16PM -0800, Eric W. Biederman wrote:
>>Tejun Heo <tj@kernel.org> writes:
>>
>>> Hey, Veaceslav, Eric.
>
> Hi Tejun, Eric,
>
>>>
>>> On Tue, Jan 14, 2014 at 05:35:23PM -0800, Eric W. Biederman wrote:
>>>> >>> >>This works like a charm. However, if I want to use (obviously, with the
>>>> >>> >>symlink present):
>>>> >>> >>
>>>> >>> >>sysfs_rename_link(&(a->dev.kobj), &(b->dev.kobj), oldname, newname);
>>>> >>> >
>>>> >>> >You forgot the namespace option to this call, what kernel version are
>>>> >>> >you using here?
>>>> >>>
>>>> >>> It's git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ,
>>>> >>> 3.13-rc6 with some networking patches on top of it.
>>>
>>> Does this work on 3.12?  How about Greg's driver-core-next?  Do you
>>> have a minimal test case that I can use to reproduce the issue?
>
> Sorry for the latency in responses; I'll update once I manage to test it
> on those.
>
> ...snip...
>>> Veaceslav, please confirm whether the issue is reproducible w/ v3.12.
>>
>>Anyway, since a symlink living in a different namespace from its target
>>is just nonsense, this (only compile tested) patch should fix the issue,
>>and make sysfs_rename_link usable for people without a master's degree in
>>sysfs again.
>
> It's still there :-(. I've used your patch and added my small addition[1] to
> test the sysfs_rename_link() (on top of net-next, 3.13-rc7), however the
> issue is still there:

I expect the bug is that my quick patch missed testing sysfs_ns_type to
see if we care at all about namespaces in sysfs_rename_link.

Fixing that would make just the sysfs_rename_link bit look something
like the diff below.

Eric


diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 3ae3f1bf1a09..8b51d1b6cc21 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -194,15 +194,13 @@ EXPORT_SYMBOL_GPL(sysfs_remove_link);
  *     @targ:  object we're pointing to.
  *     @old:   previous name of the symlink.
  *     @new:   new name of the symlink.
- *     @new_ns: new namespace of the symlink.
- *
  *     A helper function for the common rename symlink idiom.
  */
-int sysfs_rename_link_ns(struct kobject *kobj, struct kobject *targ,
-                        const char *old, const char *new, const void *new_ns)
+int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
+                     const char *old, const char *new)
 {
        struct sysfs_dirent *parent_sd, *sd = NULL;
-       const void *old_ns = NULL;
+       const void *old_ns = NULL, *new_ns = NULL;
        int result;
 
        if (!kobj)
@@ -224,13 +222,16 @@ int sysfs_rename_link_ns(struct kobject *kobj, struct kobject *targ,
        if (sd->s_symlink.target_sd->s_dir.kobj != targ)
                goto out;
 
+       if (sysfs_ns_type(parent_sd))
+               new_ns = kobject_namespace(targ);
+
        result = sysfs_rename(sd, parent_sd, new, new_ns);
 
 out:
        sysfs_put(sd);
        return result;
 }
-EXPORT_SYMBOL_GPL(sysfs_rename_link_ns);
+EXPORT_SYMBOL_GPL(sysfs_rename_link);
 
 static int sysfs_get_target_path(struct sysfs_dirent *parent_sd,
                                 struct sysfs_dirent *target_sd, char *path)



>
> [   79.038340] net bond0: renaming to bondbla
> [   79.038380] ------------[ cut here ]------------
> [   79.038411] WARNING: CPU: 1 PID: 5318 at fs/sysfs/dir.c:618
> sysfs_find_dirent+0x84/0x110()
> [   79.038449] sysfs: ns invalid in 'bridge0' for 'lower_bond0'
> ...snip...
> [   79.038877]  [<ffffffff810ae826>] warn_slowpath_fmt+0x46/0x50
> [   79.038903]  [<ffffffff812ba890>] ? sysfs_get_dirent_ns+0x30/0x80
> [   79.038930]  [<ffffffff812b97c4>] sysfs_find_dirent+0x84/0x110
> [   79.038957]  [<ffffffff812ba89e>] sysfs_get_dirent_ns+0x3e/0x80
> [   79.038983]  [<ffffffff812baf87>] sysfs_rename_link+0x57/0xe0
> [   79.039030]  [<ffffffff81689e72>] netdev_adjacent_rename_links+0xa2/0x160
>
> The current scheme (sysfs_remove_link() + sysfs_add_link()) works perfectly
> well without any namespaces. I'll dig into it once I have some spare time,
> it's not at all critical.
>
> [1]: the patch (I've included your patch too, just in case):
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 67b180d..0c9377a 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1825,9 +1825,8 @@ int device_rename(struct device *dev, const char *new_name)
>  	}
>  	if (dev->class) {
> -		error = sysfs_rename_link_ns(&dev->class->p->subsys.kobj,
> -					     kobj, old_device_name,
> -					     new_name, kobject_namespace(kobj));
> +		error = sysfs_rename_link(&dev->class->p->subsys.kobj,
> +					  kobj, old_device_name, new_name);
>  		if (error)
>  			goto out;
>  	}
> diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
> index 3ae3f1b..651444a 100644
> --- a/fs/sysfs/symlink.c
> +++ b/fs/sysfs/symlink.c
> @@ -194,12 +194,10 @@ EXPORT_SYMBOL_GPL(sysfs_remove_link);
>   *	@targ:	object we're pointing to.
>   *	@old:	previous name of the symlink.
>   *	@new:	new name of the symlink.
> - *	@new_ns: new namespace of the symlink.
> - *
>   *	A helper function for the common rename symlink idiom.
>   */
> -int sysfs_rename_link_ns(struct kobject *kobj, struct kobject *targ,
> -			 const char *old, const char *new, const void *new_ns)
> +int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
> +		      const char *old, const char *new)
>  {
>  	struct sysfs_dirent *parent_sd, *sd = NULL;
>  	const void *old_ns = NULL;
> @@ -224,13 +222,13 @@ int sysfs_rename_link_ns(struct kobject *kobj, struct kobject *targ,
>  	if (sd->s_symlink.target_sd->s_dir.kobj != targ)
>  		goto out;
>  -	result = sysfs_rename(sd, parent_sd, new, new_ns);
> +	result = sysfs_rename(sd, parent_sd, new, kobject_namespace(targ));
>  out:
>  	sysfs_put(sd);
>  	return result;
>  }
> -EXPORT_SYMBOL_GPL(sysfs_rename_link_ns);
> +EXPORT_SYMBOL_GPL(sysfs_rename_link);
>  static int sysfs_get_target_path(struct sysfs_dirent *parent_sd,
>  				 struct sysfs_dirent *target_sd, char *path)
> diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
> index 6695040..093d992 100644
> --- a/include/linux/sysfs.h
> +++ b/include/linux/sysfs.h
> @@ -213,9 +213,8 @@ int __must_check sysfs_create_link_nowarn(struct kobject *kobj,
>  					  const char *name);
>  void sysfs_remove_link(struct kobject *kobj, const char *name);
>  -int sysfs_rename_link_ns(struct kobject *kobj, struct kobject *target,
> -			 const char *old_name, const char *new_name,
> -			 const void *new_ns);
> +int sysfs_rename_link(struct kobject *kobj, struct kobject *target,
> +		      const char *old_name, const char *new_name);
>  void sysfs_delete_link(struct kobject *dir, struct kobject *targ,
>  			const char *name);
> @@ -341,9 +340,8 @@ static inline void sysfs_remove_link(struct kobject *kobj, const char *name)
>  {
>  }
>  -static inline int sysfs_rename_link_ns(struct kobject *k, struct kobject *t,
> -				       const char *old_name,
> -				       const char *new_name, const void *ns)
> +static inline int sysfs_rename_link(struct kobject *k, struct kobject *t,
> +				    const char *old_name, const char *new_name)
>  {
>  	return 0;
>  }
> @@ -455,12 +453,6 @@ static inline void sysfs_remove_file(struct kobject *kobj,
>  	return sysfs_remove_file_ns(kobj, attr, NULL);
>  }
>  -static inline int sysfs_rename_link(struct kobject *kobj, struct kobject
> *target,
> -				    const char *old_name, const char *new_name)
> -{
> -	return sysfs_rename_link_ns(kobj, target, old_name, new_name, NULL);
> -}
> -
>  static inline struct sysfs_dirent *
>  sysfs_get_dirent(struct sysfs_dirent *parent_sd, const unsigned char *name)
>  {
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 9957557..5d24d8e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5005,19 +5005,20 @@ EXPORT_SYMBOL(netdev_upper_dev_unlink);
>  void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
>  {
>  	struct netdev_adjacent *iter;
> +	char old_linkname[IFNAMSIZ+7], new_linkname[IFNAMSIZ+7];
>  	list_for_each_entry(iter, &dev->adj_list.upper, list) {
> -		netdev_adjacent_sysfs_del(iter->dev, oldname,
> -					  &iter->dev->adj_list.lower);
> -		netdev_adjacent_sysfs_add(iter->dev, dev,
> -					  &iter->dev->adj_list.lower);
> +		sprintf(old_linkname, "lower_%s", oldname);
> +		sprintf(new_linkname, "lower_%s", dev->name);
> +		sysfs_rename_link(&(iter->dev->dev.kobj), &(dev->dev.kobj),
> +				  old_linkname, new_linkname);
>  	}
>  	list_for_each_entry(iter, &dev->adj_list.lower, list) {
> -		netdev_adjacent_sysfs_del(iter->dev, oldname,
> -					  &iter->dev->adj_list.upper);
> -		netdev_adjacent_sysfs_add(iter->dev, dev,
> -					  &iter->dev->adj_list.upper);
> +		sprintf(old_linkname, "upper_%s", oldname);
> +		sprintf(new_linkname, "upper_%s", dev->name);
> +		sysfs_rename_link(&(iter->dev->dev.kobj), &(dev->dev.kobj),
> +				  old_linkname, new_linkname);
>  	}
>  }
>  

^ permalink raw reply related

* Re: [net-next 0/6] Intel Wired LAN Driver Updates
From: David Miller @ 2014-01-16 23:35 UTC (permalink / raw)
  To: aaron.f.brown; +Cc: netdev, gospo, sassmann
In-Reply-To: <1389868210-24035-1-git-send-email-aaron.f.brown@intel.com>

From: Aaron Brown <aaron.f.brown@intel.com>
Date: Thu, 16 Jan 2014 02:30:04 -0800

> This series contains updates to ixgbe and ixgbevf.

Series applied, thanks.

^ permalink raw reply

* Re: TI CPSW Ethernet Tx performance regression
From: Florian Fainelli @ 2014-01-16 23:35 UTC (permalink / raw)
  To: Mugunthan V N; +Cc: Ben Hutchings, netdev
In-Reply-To: <52D77716.1020205@ti.com>

2014/1/15 Mugunthan V N <mugunthanvnm@ti.com>:
> Hi
>
> On Thursday 16 January 2014 02:51 AM, Florian Fainelli wrote:
>> 2014/1/15 Ben Hutchings <bhutchings@solarflare.com>:
>>> On Wed, 2014-01-15 at 18:18 +0530, Mugunthan V N wrote:
>>>> Hi
>>>>
>>>> I am seeing a performance regression with the CPSW driver on the AM335x EVM.
>>>> The AM335x EVM CPSW has 3.2 kernel support [1] and mainline support from 3.7.
>>>> Comparing 3.2 and 3.13-rc4, the TCP receive performance of CPSW is the same
>>>> (~180Mbps), but TCP transmit performance is poor compared to the 3.2 kernel:
>>>> in 3.2 it is *256Mbps* and in 3.13-rc4 it is *70Mbps*.
>>>>
>>>> Iperf version is *iperf version 2.0.5 (08 Jul 2010) pthreads* on both PC and EVM
>>>>
>>>> UDP transmit performance is also down compared to the 3.2 kernel. In 3.2 it is
>>>> 196Mbps for 200Mbps bandwidth and in 3.13-rc4 it is 92Mbps.
>>>>
>>>> Can someone point out where I can look to improve Tx performance? I also
>>>> checked whether there is a Tx descriptor overflow, and there is none. I have
>>>> tried 3.11 and some older kernels; all give only ~75Mbps transmit
>>>> performance.
>>>>
>>>> [1] - http://arago-project.org/git/projects/?p=linux-am33x.git;a=summary
>>> If you don't get any specific suggestions, you could try bisecting to
>>> find out which specific commit(s) changed the performance.
>> Not necessarily related to that issue, but there are a few
>> weird/unusual things done in the CPSW interrupt handler:
>>
>> static irqreturn_t cpsw_interrupt(int irq, void *dev_id)
>> {
>>         struct cpsw_priv *priv = dev_id;
>>
>>         cpsw_intr_disable(priv);
>>         if (priv->irq_enabled == true) {
>>                 cpsw_disable_irq(priv);
>>                 priv->irq_enabled = false;
>>         }
>>
>>         if (netif_running(priv->ndev)) {
>>                 napi_schedule(&priv->napi);
>>                 return IRQ_HANDLED;
>>         }
>>
>> Checking for netif_running() should not be required, you should not
>> get any TX/RX interrupts if your interface is not running.
>
> The driver also supports Dual EMAC with one physical device. More
> description can be found in [1] under the topic *9.2.1.5.2 Dual Mac
> Mode*. If the first interface is down and the second interface is up,
> without checking the interface we will not know which napi to schedule.
>
>>
>>
>>         priv = cpsw_get_slave_priv(priv, 1);
>>         if (!priv)
>>                 return IRQ_NONE;
>>
>> Should not this be moved up as the very first conditional check to do?
>> is not there a risk to leave the interrupts disabled and not
>> re-enabled due to the first 5 lines at the top?
>
> This has to be kept here to check if the interrupt is triggered by the
> second Ethernet port interface when the first interface is down.
>
>>
>>
>>         if (netif_running(priv->ndev)) {
>>                 napi_schedule(&priv->napi);
>>                 return IRQ_HANDLED;
>>         }
>>
>> This was done before, why do it again?
>>
>> drivers/net/ethernet/ti/davinci_cpdma.c::cpdma_chan_process()
>> treats an error processing a packet (where it will stop) the same as
>> successfully processing num_tx packets; is that also
>> intentional? Should you attempt to keep processing "quota" packets?
>
> I tried it in my local build but no success.
>
>>
>> As Ben suggests, bisecting what is causing the regression is your best bet here.
>
> I can do a bisect but the issue is I don't have a good commit to bisect
> as 3.2 kernel is TI maintained repo and is not upstreamed as is. CPSW
> with base port support is available in mainline kernel from v3.7, and I
> have tested till v3.7 and the Transmit performance is poor when compared
> to v3.2 kernel maintained by TI.

Whenever I have had bad TX performance with hardware, the culprit was that
transmit buffers were not freed quickly enough, so the transmit
scheduler could not push as many packets as expected. When this happened,
the root cause for me was a bad TX interrupt which messed up the TX flow
control, but there is plenty of other stuff that can go wrong.

You could check a few things, like the TX interrupt rate for the
same workload on both kernels, or dump the queue usage every few seconds,
etc.
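
To make the flow-control point concrete, here is a toy model of a TX descriptor ring: the queue is stopped when the ring fills and is only woken when completions reclaim descriptors, so slow reclaim directly throttles submission. Everything in it (struct, function names, the stop/wake flag) is hypothetical and far simpler than the CPSW/cpdma code.

```c
#include <assert.h>

/* Hypothetical TX descriptor ring; head and tail are free-running counters. */
struct tx_ring {
	unsigned int size;	/* number of descriptors */
	unsigned int head;	/* next descriptor to fill */
	unsigned int tail;	/* next descriptor to reclaim */
	int stopped;		/* nonzero while the stack must not submit */
};

unsigned int tx_in_flight(const struct tx_ring *r)
{
	return r->head - r->tail;	/* unsigned math handles wraparound */
}

/* Queue one packet; returns 0 on success, -1 if the ring is full. */
int tx_xmit(struct tx_ring *r)
{
	assert(r->size > 0);
	if (tx_in_flight(r) == r->size) {
		r->stopped = 1;
		return -1;
	}
	r->head++;
	if (tx_in_flight(r) == r->size)
		r->stopped = 1;		/* stop before the next xmit would fail */
	return 0;
}

/* Reclaim up to 'done' completed descriptors; returns how many were freed. */
unsigned int tx_complete(struct tx_ring *r, unsigned int done)
{
	unsigned int n = tx_in_flight(r);

	if (done < n)
		n = done;
	r->tail += n;
	if (r->stopped && tx_in_flight(r) < r->size)
		r->stopped = 0;		/* room again: let the stack resume */
	return n;
}
```

In this model, if tx_complete() runs rarely (for instance because TX interrupts are late or lost), the ring stays full and tx_xmit() keeps refusing packets, which would show up as low throughput with no descriptor overflow.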

>
> [1] - http://www.ti.com/lit/ug/sprugz8e/sprugz8e.pdf
>
> Regards
> Mugunthan V N



-- 
Florian

^ permalink raw reply

* Re: [PATCH] e1000e: Fix compilation warning when !CONFIG_PM_SLEEP
From: David Miller @ 2014-01-16 23:36 UTC (permalink / raw)
  To: mika.westerberg
  Cc: davidx.m.ertman, aaron.f.brown, jeffrey.t.kirsher, bruce.w.allan,
	netdev
In-Reply-To: <1389875979-30340-1-git-send-email-mika.westerberg@linux.intel.com>

From: Mika Westerberg <mika.westerberg@linux.intel.com>
Date: Thu, 16 Jan 2014 14:39:39 +0200

> Commit 7509963c703b (e1000e: Fix a compile flag mis-match for
> suspend/resume) moved suspend and resume hooks to be available when
> CONFIG_PM is set. However, it can be set even if CONFIG_PM_SLEEP is not set
> causing following warnings to be emitted:
> 
> drivers/net/ethernet/intel/e1000e/netdev.c:6178:12: warning:
>   	‘e1000_suspend’ defined but not used [-Wunused-function]
> 
> drivers/net/ethernet/intel/e1000e/netdev.c:6185:12: warning:
> 	‘e1000_resume’ defined but not used [-Wunused-function]
> 
> To fix this make the hooks to be available only when CONFIG_PM_SLEEP is set
> and remove CONFIG_PM wrapping from driver ops because this is already
> handled by SET_SYSTEM_SLEEP_PM_OPS() and SET_RUNTIME_PM_OPS().
> 
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>

Applied, thanks.
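
The pattern described in the quoted changelog can be reproduced outside the kernel: if the macro that references the hooks expands to nothing under some configuration, the hooks must be compiled out under the same condition, or they end up defined but unused. A reduced sketch follows, with a made-up SKETCH_PM_SLEEP switch standing in for CONFIG_PM_SLEEP and a made-up ops macro standing in for SET_SYSTEM_SLEEP_PM_OPS().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch standing in for CONFIG_PM_SLEEP; left undefined here. */
/* #define SKETCH_PM_SLEEP 1 */

struct sketch_pm_ops {
	const char *name;
	int (*suspend)(void);
	int (*resume)(void);
};

#ifdef SKETCH_PM_SLEEP
/* Hooks are only built when something can actually reference them. */
static int sketch_suspend(void) { return 0; }
static int sketch_resume(void) { return 0; }
#define SKETCH_SLEEP_OPS(s, r) .suspend = (s), .resume = (r),
#else
/* Expands to nothing, so the hook names are never referenced. */
#define SKETCH_SLEEP_OPS(s, r)
#endif

const struct sketch_pm_ops sketch_ops = {
	SKETCH_SLEEP_OPS(sketch_suspend, sketch_resume)
	.name = "sketch",
};
```

Guarding the two static functions with a broader condition than the macro uses is what would produce "defined but not used" warnings of the kind quoted above.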

^ permalink raw reply

* Re: [PATCH V3 net-next 1/3] ipv6: add the IPV6_FL_F_REFLECT flag to IPV6_FL_A_GET
From: Hannes Frederic Sowa @ 2014-01-16 23:41 UTC (permalink / raw)
  To: Florent Fourcot; +Cc: netdev
In-Reply-To: <1389889158-1710-1-git-send-email-florent.fourcot@enst-bretagne.fr>

On Thu, Jan 16, 2014 at 05:19:16PM +0100, Florent Fourcot wrote:
> diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
> index cbc9351..55823f1 100644
> --- a/net/ipv6/ip6_flowlabel.c
> +++ b/net/ipv6/ip6_flowlabel.c
> @@ -486,6 +486,11 @@ int ipv6_flowlabel_opt_get(struct sock *sk, struct in6_flowlabel_req *freq)
>  	struct ipv6_pinfo *np = inet6_sk(sk);
>  	struct ipv6_fl_socklist *sfl;
>  
> +	if (np->repflow) {
> +		freq->flr_label = np->flow_label;
> +		return 0;
> +	}
> +
>  	rcu_read_lock_bh();

I am still not sure if we should allow querying the label on repflow, if the sender
can change it and we don't update the np->flow_label.

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH V3 net-next 1/3] ipv6: add the IPV6_FL_F_REFLECT flag to IPV6_FL_A_GET
From: Hannes Frederic Sowa @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Florent Fourcot, netdev
In-Reply-To: <20140116234103.GE17529@order.stressinduktion.org>

On Fri, Jan 17, 2014 at 12:41:03AM +0100, Hannes Frederic Sowa wrote:
> On Thu, Jan 16, 2014 at 05:19:16PM +0100, Florent Fourcot wrote:
> > diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
> > index cbc9351..55823f1 100644
> > --- a/net/ipv6/ip6_flowlabel.c
> > +++ b/net/ipv6/ip6_flowlabel.c
> > @@ -486,6 +486,11 @@ int ipv6_flowlabel_opt_get(struct sock *sk, struct in6_flowlabel_req *freq)
> >  	struct ipv6_pinfo *np = inet6_sk(sk);
> >  	struct ipv6_fl_socklist *sfl;
> >  
> > +	if (np->repflow) {
> > +		freq->flr_label = np->flow_label;
> > +		return 0;
> > +	}
> > +
> >  	rcu_read_lock_bh();
> 
> I am still not sure if we should allow querying the label on repflow, if the sender
> can change it and we don't update the np->flow_label.

Disregard this comment, it was wrong.

Sorry,

  Hannes

^ permalink raw reply

* Re: [PATCH] net: sk == 0xffffffff fix - not for commit
From: Andrew Ruder @ 2014-01-17  0:01 UTC (permalink / raw)
  To: Andrzej Pietrasiewicz
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Kyungmin Park, Felipe Balbi,
	Greg Kroah-Hartman, Marek Szyprowski, Michal Nazarewicz,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1386589672-5830-1-git-send-email-andrzej.p-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>

On Mon, Dec 09, 2013 at 12:47:52PM +0100, Andrzej Pietrasiewicz wrote:
> With g_ether loaded the sk occasionally becomes 0xffffffff.
> It usually happens after transferring a few hundred kilobytes to a few
> tens of megabytes. If sk is 0xffffffff then dereferencing it causes a
> kernel panic.

I don't know if this is relevant, but a very similar stack trace came up
for me a few days ago (below).  I am working on a PXA 270 (XScale) with
gcc version 4.8.2 (Buildroot 2013.11-rc1-00028-gf388663).  I am going to
see whether I can reproduce it more readily before I start trying to
narrow down what is causing it.

===
Unable to handle kernel NULL pointer dereference at virtual address 00000011
pgd = d18e0000
[00000011] *pgd=a6d03831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT ARM
Modules linked in: zeusvirt(O) zeus16550(O) 8390p ipv6
CPU: 0 PID: 2365 Comm: sshd Tainted: G           O 3.12.0+ #201
task: d7216f00 ti: d7144000 task.ti: d7144000
PC is at tcp_v4_early_demux+0xe8/0x154
LR is at __inet_lookup_established+0x1bc/0x2e0
pc : [<c0341cfc>]    lr : [<c0329bd8>]    psr: a0000013
sp : d7145b20  ip : d7145ae8  fp : d7145b44
r10: c0576c28  r9 : 00000008  r8 : d7998800
r7 : d7063800  r6 : c6cf2480  r5 : ffffffff  r4 : c6cf2480
r3 : c02ec018  r2 : d7145ad0  r1 : d7b66a28  r0 : ffffffff
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0000397f  Table: b18e0000  DAC: 00000015
Process sshd (pid: 2365, stack limit = 0xd71441c8)
Stack: (0xd7145b20 to 0xd7146000)
5b20: 17bf3f0a 00000016 00000003 c0026d90 d71f4634 d71f4600 d7145b6c d7145b48
5b40: c03211b4 c0341c20 000005ea d7bb0538 d7063800 00000034 d71f4600 c6cf2480
5b60: d7145b9c d7145b70 c03218dc c0321158 00001001 00000000 c0576c1c 00000000
5b80: c0577e84 c0576c14 00000000 00000000 d7145be4 d7145ba0 c02fae04 c03215d4
5ba0: c0590330 c057fc08 d7145bfc c6cf2480 c02571a0 c0576c28 000007e1 c05a3dc0
5bc0: 00000000 00000001 c05a3d60 c05a3d74 c05a3d60 c05a3d68 d7145bfc d7145be8
5be0: c02fb990 c02fa8f0 c05a3dc0 00000000 d7145c24 d7145c00 c02fc46c c02fb968
5c00: c02fc3dc c05a3dc0 c05a3d60 00000001 0000012c 00000040 d7145c64 d7145c28
5c20: c02fbcd0 c02fc3e8 00000000 d78af3c0 d7145c5c 00008d99 00000000 00000001
5c40: c05a81f0 00000003 00000100 3fa57e1c d7144028 c05a81ec d7145cb4 d7145c68
5c60: c0026a44 c02fbc10 d7145c8c d7145c78 c00538dc c0056ce4 00000000 00008d98
5c80: 00400100 0000000a c0228594 60000093 c0590330 00000000 d7145d54 00000001
5ca0: d7bb0480 000005b4 d7145ccc d7145cb8 c0026ca4 c00268f4 00000000 d7144010
5cc0: d7145ce4 d7145cd0 c0026f58 c0026c58 000000ab 0000001a d7145d04 d7145ce8
5ce0: c000f7d0 c0026ed0 00140000 d7145d20 a0000013 ffffffff d7145d1c d7145d08
5d00: c00085bc c000f768 c02f0048 c00ca7d8 d7145d7c d7145d20 c03a7dc0 c0008590
5d20: 000118ed 00000000 c05a474c c05d41cc d7bb0180 d18ed800 d7801080 000006a3
5d40: 00000001 d7bb0480 000005b4 d7145d7c d7145d80 d7145d68 c02f0048 c00ca7d8
5d60: a0000013 ffffffff c05a4738 d7bb0180 d7145dac d7145d80 c02f0048 c00ca7b0
5d80: 00000001 00c63fc0 d7b66a00 d7b66a00 00004040 000005b4 00000000 d7b66a00
5da0: d7145dcc d7145db0 c032e340 c02effd0 d7145e98 00004040 0008c414 00000000
5dc0: d7145e54 d7145dd0 c032f368 c032e310 d7145e24 c02ea81c c03a6040 c03a9c6c
5de0: 00000000 00000000 d7145ee8 00000000 000005b4 00000000 d7b66adc 00000000
5e00: 00000000 d7144000 00001854 000005b4 000027ec 00000040 d7116d80 000005b4
5e20: 00000000 00000000 d7145e6c d7b66a00 d7145ee8 d7145e98 00004040 00004040
5e40: 00004040 00020000 d7145e74 d7145e58 c03526c8 c032eb0c d7145e78 d7116d80
5e60: d7145ee0 d7116d80 d7145ed4 d7145e78 c02e63a4 c0352688 c05a3dc0 d7142000
5e80: 00000040 00004040 d76701c0 d7145ee0 00000000 d7145e98 00000000 00000000
5ea0: d7145ee0 00000001 00000000 00000000 00000040 d7145ee8 c6cf2900 00000000
5ec0: 00000000 d7145f78 d7145f44 d7145ed8 c00d1c64 c02e62e4 00000000 00000000
5ee0: 00089c28 00004040 d7116d80 00000000 00000000 d7145e78 d7216f00 00000000
5f00: 00000000 00000000 00000000 00000000 00004040 00000000 00000000 00000000
5f20: 00089c28 d7116d80 00089c28 d7145f78 00004040 00089c28 d7145f74 d7145f48
5f40: c00d23a0 c00d1bf4 00000000 00000000 00000000 00000000 d7116d80 00000000
5f60: 00089c28 00004040 d7145fa4 d7145f78 c00d2948 c00d22c0 00000000 00000000
5f80: beed167c 00000003 000614dc 00000004 c000ea28 d7144000 00000000 d7145fa8
5fa0: c000e7e0 c00d2908 beed167c 00000003 00000003 00089c28 00004040 beed167c
5fc0: beed167c 00000003 000614dc 00000004 00089c28 00060a88 0000093e beed17a0
5fe0: beed167c beed1648 00029910 b6dc821c 60000010 00000003 ffffffff ffffffff
[<c0341cfc>] (tcp_v4_early_demux+0xe8/0x154) from [<c03211b4>] (ip_rcv_finish+0x68/0x2c0)
[<c03211b4>] (ip_rcv_finish+0x68/0x2c0) from [<c03218dc>] (ip_rcv+0x314/0x398)
[<c03218dc>] (ip_rcv+0x314/0x398) from [<c02fae04>] (__netif_receive_skb_core+0x520/0x5d8)
[<c02fae04>] (__netif_receive_skb_core+0x520/0x5d8) from [<c02fb990>] (__netif_receive_skb+0x34/0x88)
[<c02fb990>] (__netif_receive_skb+0x34/0x88) from [<c02fc46c>] (process_backlog+0x90/0x148)
[<c02fc46c>] (process_backlog+0x90/0x148) from [<c02fbcd0>] (net_rx_action+0xcc/0x258)
[<c02fbcd0>] (net_rx_action+0xcc/0x258) from [<c0026a44>] (__do_softirq+0x15c/0x2e0)
[<c0026a44>] (__do_softirq+0x15c/0x2e0) from [<c0026ca4>] (do_softirq+0x58/0x64)
[<c0026ca4>] (do_softirq+0x58/0x64) from [<c0026f58>] (irq_exit+0x94/0xf0)
[<c0026f58>] (irq_exit+0x94/0xf0) from [<c000f7d0>] (handle_IRQ+0x74/0x90)
[<c000f7d0>] (handle_IRQ+0x74/0x90) from [<c00085bc>] (ichp_handle_irq+0x38/0x40)
[<c00085bc>] (ichp_handle_irq+0x38/0x40) from [<c03a7dc0>] (__irq_svc+0x40/0x6c)
Exception stack(0xd7145d20 to 0xd7145d68)
5d20: 000118ed 00000000 c05a474c c05d41cc d7bb0180 d18ed800 d7801080 000006a3
5d40: 00000001 d7bb0480 000005b4 d7145d7c d7145d80 d7145d68 c02f0048 c00ca7d8
5d60: a0000013 ffffffff
[<c03a7dc0>] (__irq_svc+0x40/0x6c) from [<c00ca7d8>] (ksize+0x34/0xc8)
[<c00ca7d8>] (ksize+0x34/0xc8) from [<c02f0048>] (__alloc_skb+0x84/0x15c)
[<c02f0048>] (__alloc_skb+0x84/0x15c) from [<c032e340>] (sk_stream_alloc_skb+0x3c/0x108)
[<c032e340>] (sk_stream_alloc_skb+0x3c/0x108) from [<c032f368>] (tcp_sendmsg+0x868/0xd34)
[<c032f368>] (tcp_sendmsg+0x868/0xd34) from [<c03526c8>] (inet_sendmsg+0x4c/0x78)
[<c03526c8>] (inet_sendmsg+0x4c/0x78) from [<c02e63a4>] (sock_aio_write+0xcc/0xdc)
[<c02e63a4>] (sock_aio_write+0xcc/0xdc) from [<c00d1c64>] (do_sync_write+0x7c/0xa0)
[<c00d1c64>] (do_sync_write+0x7c/0xa0) from [<c00d23a0>] (vfs_write+0xec/0x194)
[<c00d23a0>] (vfs_write+0xec/0x194) from [<c00d2948>] (SyS_write+0x4c/0x7c)
[<c00d2948>] (SyS_write+0x4c/0x7c) from [<c000e7e0>] (ret_fast_syscall+0x0/0x2c)
Code: 0a000019 e59f306c e5845010 e5843068 (e5d53012)
---[ end trace 5a028e59aa5bc81a ]---
Kernel panic - not syncing: Fatal exception in interrupt
===
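
A note on reading the trace above: the fault address can be cross-checked
against the register dump.  The word in parentheses on the "Code:" line,
e5d53012, is the faulting instruction, and as an ARM (A32) load/store with
an immediate offset it decodes to "ldrb r3, [r5, #18]".  With r5 = ffffffff
the 32-bit address wraps around to 0x00000011, which matches "virtual
address 00000011".  So the "NULL pointer dereference" looks like a byte
read at offset 0x12 from a pointer of 0xffffffff, consistent with the
sk == 0xffffffff symptom in the original report.  That r5 actually holds sk
at this point is an assumption.  The sketch below is a minimal decoder for
this one instruction class only, and the names decode_ldst_imm() and
ldst_imm_addr() are made up for illustration.

```c
#include <stdint.h>

/* Fields of an ARM (A32) LDR/STR/LDRB/STRB with a 12-bit immediate offset. */
struct ldst_imm {
	unsigned rn;     /* base register number (bits 19:16) */
	unsigned rd;     /* transfer register number (bits 15:12) */
	uint32_t imm12;  /* unsigned immediate offset (bits 11:0) */
	int pre;         /* P bit: 1 = offset applied before the access */
	int add;         /* U bit: 1 = add the offset, 0 = subtract it */
	int byte;        /* B bit: 1 = byte access, 0 = word access */
	int load;        /* L bit: 1 = load, 0 = store */
};

/* Returns 0 on success, -1 if insn is not in this instruction class. */
static int decode_ldst_imm(uint32_t insn, struct ldst_imm *d)
{
	/* Bits 27:25 == 0b010 select load/store with immediate offset. */
	if (((insn >> 25) & 0x7) != 0x2)
		return -1;

	d->pre   = (insn >> 24) & 1;
	d->add   = (insn >> 23) & 1;
	d->byte  = (insn >> 22) & 1;
	d->load  = (insn >> 20) & 1;
	d->rn    = (insn >> 16) & 0xf;
	d->rd    = (insn >> 12) & 0xf;
	d->imm12 = insn & 0xfff;
	return 0;
}

/* Address accessed for a given base register value.  uint32_t arithmetic
 * wraps modulo 2^32, like the address computation on a 32-bit CPU. */
static uint32_t ldst_imm_addr(const struct ldst_imm *d, uint32_t base)
{
	if (!d->pre)
		return base;	/* post-indexed: the access uses the base as is */
	return d->add ? base + d->imm12 : base - d->imm12;
}
```

Calling decode_ldst_imm(0xe5d53012, &d) gives rn = 5, rd = 3, imm12 = 0x12
with the load and byte bits set, and ldst_imm_addr(&d, 0xffffffff) then
gives 0x11.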

^ permalink raw reply
