Netdev List
 help / color / mirror / Atom feed
* Re: suspicious RCU usage in net/ipv4/ip_tunnel.c:80
From: Eric Dumazet @ 2014-01-13  8:20 UTC (permalink / raw)
  To: Cong Wang; +Cc: Tom Herbert, netdev
In-Reply-To: <CAHA+R7Pe_pGFw5cA3v_RnGvOurAykJNVH5=DeeEVHG84faWi2Q@mail.gmail.com>

On Sun, 2014-01-12 at 22:36 -0800, Cong Wang wrote:
> > Please read rcu_dereference_protected() documentation in
> > include/linux/rcupdate.h
> 
> I did before I replied.



> 
> >
> > Also you can run sparse, with CONFIG_SPARSE_RCU_POINTER=y in
> > your .config
> >
> > make C=2 net/ipv4/ip_tunnel.o
> >
> > And then you'll know the answer to this question.
> >
> 
> Sounds like it is only to shut up a sparse warning, then its name
> is misleading, we clearly don't dereference it here.

Historical reasons, you should have been there when Paul invented the
name and lazy people like us let him do so !

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b62730baea32f86fe91a7930e4b7ee8d82778b79

You are lucky, there is plenty of documentation, maybe too much..

^ permalink raw reply

* Re: ipv6: default route for link local address is not added while assigning a address
From: sohny thomas @ 2014-01-13  8:49 UTC (permalink / raw)
  To: netdev, linux-kernel, yoshfuji, davem, kumuda, hannes
In-Reply-To: <52D2F201.1090903@gmail.com>

On Friday 10 January 2014 10:46 PM, Hannes Frederic Sowa wrote:
> On Fri, Jan 10, 2014 at 05:33:08PM +0100, Nicolas Dichtel wrote:
>> CC: netdev
>>
>> Le 10/01/2014 13:20, sohny thomas a écrit :
>>> Default route for link local address is configured automatically if
>>> NETWORKING_IPV6=yes is in ifcfg-eth*.
>>> When the route table for the interface is flushed and a new address is
>>> added to
>>> the same device with out removing linklocal addr, default route for link
>>> local
>>> address has to added by default.
>> I would say that removing the link local route but not the link local
>> address
>> is a configuration problem.
>> If you remove a connected route but not the associated address, you will
>> have
>> the same problem.
> We have some user accessible routes that are essential for IPv6 stack
> to work at all. So I don't know if I can agree with that.
>
> Maybe flush is a bit too aggressive?
>
Hi ,

Thank you for the inputs.

In the test for ipv6 default address selection , we are testing the rule
2 as specified in RFC 6724

   If Scope(SA) < Scope(SB): If Scope(SA) < Scope(D), then prefer SB
     and otherwise prefer SA.
     Similarly, if Scope(SB) < Scope(SA): If Scope(SB) < Scope(D), then
     prefer SA and otherwise prefer SB.

Test:

   Check 04:
      Destination: ff08::2(OS)
      Candidate Source Addresses: fec0::1(SS) or LLA(LS)
      Result: fec0::1(SS)

      Scope(LLA) < Scope(fec0::1): If Scope(LLA) < Scope(ff08::2),  yes, 
prefer fec0::1


Now in the test its flushing all the routes and adding an address ,
which in causes to add route into the routing table including the link
local routes.
Earlier in 2.6.32 it used to work fine now due to the above mentioned
check-in this is not happening

Of course we can still just delete a route and add , but even if we
delete the link local route, IMHO i think it should update the LLA route
when the interface is next added an address or bought up which ever is
the case.

Regards,
Sohny

^ permalink raw reply

* Re: [PATCH net-next v2 4/4] virtio-net: initial debugfs support, export mergeable rx buffer size
From: Michael S. Tsirkin @ 2014-01-13  9:40 UTC (permalink / raw)
  To: Michael Dalton; +Cc: netdev, lf-virt, Eric Dumazet, David S. Miller
In-Reply-To: <CANJ5vP+Vu8=haCetZkcefPgR8pGe0iDZ03EZhJHLSgunCaHJsQ@mail.gmail.com>

On Sun, Jan 12, 2014 at 03:32:28PM -0800, Michael Dalton wrote:
> Hi Michael,
> 
> On Sun, Jan 12, 2014 at 9:09 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > Can't we add struct attribute * to netdevice, and pass that in when
> > creating the kobj?
> 
> I like that idea, I think that will work and should be better than
> the alternatives. The actual kobjs for RX queues (struct netdev_rx_queue)
> are allocated and deallocated by calls to net_rx_queue_update_kobjects,
> which resizes RX queue kobjects when the netdev RX queues are resized.
> 
> Is this what you had in mind:
> (1) Add a pointer to an attribute group to struct net_device, used for
>     per-netdev rx queue attributes and initialized before the call to
>     register_netdevice().
> (2) Declare an attribute group containing the mergeable_rx_buffer_size
>     attribute in virtio-net, and initialize the per-netdevice group pointer
>     to the address of this group in virtnet_probe before register_netdevice
> (3) In net-sysfs, modify net_rx_queue_update_kobjects
>     (or rx_queue_add_kobject) to call sysfs_create_group on the
>     per-netdev attribute group (if non-NULL), adding the attributes in
>     the group to the RX queue kobject.


Exactly.

> That should allow us to have per-RX queue attributes that are
> device-specific. I'm not a sysfs expert, but it seems that rx_queue_ktype
> and rx_queue_sysfs_ops presume that all rx queue sysfs operations are
> performed on attributes of type rx_queue_attribute. That type will need
> to be moved from net-sysfs.c to a header file like netdevice.h so that
> the type can be used in virtio-net when we declare the
> mergeable_rx_buffer_size attribute.
> 
> The last issue is how the rx_queue_attribute 'show' function
> implementation for mergeable_rx_buffer_size will access the appropriate
> per-receive queue EWMA data. The arguments to the show function will be
> the netdev_rx_queue and the attribute itself. We can get to the
> struct net_device from the netdev_rx_queue.  If we extended
> netdev_rx_queue to indicate the queue_index or to store a void *priv_data
> pointer, that would be sufficient to allow us to resolve this issue.

Hmm netdev_rx_queue is not defined unless CONFIG_RPS is set.
Maybe we should use a different structure.


> Please let me know if the above sounds good or if you see a better way
> to accomplish this goal. Thanks!
> 
> Best,
> 
> Mike

Sounds good to me.

^ permalink raw reply

* RE: Use of ENOTSUPP in drivers?
From: David Laight @ 2014-01-13  9:41 UTC (permalink / raw)
  To: 'Ben Hutchings', Sabrina Dubroca; +Cc: netdev@vger.kernel.org
In-Reply-To: <1389559622.3720.115.camel@deadeye.wl.decadent.org.uk>

From: Ben Hutchings
> I believe they should all be patched.  According to
> include/linux/errno.h, ENOTSUPP is meant for use in the NFSv3 code only.
> (But it's apparently erroneously used *all over* the tree, not just in
> net drivers!)

The 'real' problem is that there are quite a few very specific errno
values, but very few general ones.
I suspect that once (way back in the 1980s - the definitions predate linux)
adding new errno values was easy, and some that might have been general
got documented for their single use - the text for EAGAIN used to be
"No more processes".
New errno values were added for some subsystems (like NFS) but documented
with subsystem-specific texts.
So there are few generic errno codes except EINVAL and ENXIO - neither
of which is helpful.
Nothing for thinks like:
- I don't understand the request.
- I don't support the request.
- Some internal limit reached.
etc.

	David


^ permalink raw reply

* RE: [PATCH net-next v3 8/9] xen-netback: Timeout packets in RX path
From: Paul Durrant @ 2014-01-13  9:53 UTC (permalink / raw)
  To: Zoltan Kiss, Ian Campbell, Wei Liu,
	xen-devel@lists.xenproject.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Jonathan Davies
In-Reply-To: <52D33138.4090903@citrix.com>

> -----Original Message-----
> From: Zoltan Kiss
> Sent: 13 January 2014 00:20
> To: Paul Durrant; Ian Campbell; Wei Liu; xen-devel@lists.xenproject.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jonathan Davies
> Subject: Re: [PATCH net-next v3 8/9] xen-netback: Timeout packets in RX
> path
> 
> On 09/01/14 09:20, Paul Durrant wrote:
> >> We are adding the skb to vif->rx_queue even when
> >> xenvif_rx_ring_slots_available(vif, min_slots_needed) said there is no
> >> space for that. Or am I missing something? Paul?
> >>
> > That's correct. Part of the flow control improvement was to get rid of
> needless packet drops. For your purposes, you basically need to avoid using
> the queuing discipline and take packets into netback's vif->rx_queue
> regardless of the state of the shared ring so that you can drop them if they
> get beyond a certain age. So, perhaps you should never stop the netif
> queue, place an upper limit on vif->rx_queue (either packet or byte count)
> and drop when that is exceeded (i.e. mimicking pfifo or bfifo internally).
> >
> How about this:
> - when the timer fires first we wake up the thread an tell it to drop
> all the packets in rx_queue
> - start_xmit then can drain the qdisc queue into the device queue
> - additionally, the RX thread should stop that timer when it was able to
> do some work
> 

Yes, you could do it that way.

  Paul

^ permalink raw reply

* RE: [PATCH net-next 3/3] packet: use percpu mmap tx frame pending refcount
From: David Laight @ 2014-01-13  9:55 UTC (permalink / raw)
  To: 'Cong Wang', Daniel Borkmann; +Cc: David Miller, netdev
In-Reply-To: <CAHA+R7NnaxXgOGW=jTEGkeMByO7wXVr9bGR-J9rLzwL4ne52bA@mail.gmail.com>

From: Cong Wang
> On Sun, Jan 12, 2014 at 8:22 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> > +static void packet_inc_pending(struct packet_ring_buffer *rb)
> > +{
> > +       this_cpu_inc(*rb->pending_refcnt);
> > +}
> > +
> > +static void packet_dec_pending(struct packet_ring_buffer *rb)
> > +{
> > +       this_cpu_dec(*rb->pending_refcnt);
> > +}
> > +
> > +static int packet_read_pending(const struct packet_ring_buffer *rb)
> > +{
> > +       int i, refcnt = 0;
> > +
> > +       /* We don't use pending refcount in rx_ring. */
> > +       if (rb->pending_refcnt == NULL)
> > +               return 0;
> > +
> > +       for_each_possible_cpu(i)
> > +               refcnt += *per_cpu_ptr(rb->pending_refcnt, i);
> > +
> > +       return refcnt;
> > +}
> 
> How is this supposed to work? Since there is no lock,
> you can't read accurate refcnt. Take a look at lib/percpu_counter.c.
> 
> I guess for some reason you don't care the accuracy?
> Then at least you need to comment in the code.

Hmmm... did the code ever work?

The value looks like the number of active transmits (s/pending/tx_pending/)
The test of the number of pending tx is at the bottom of a code loop,
I'd expect that some action should be locked against this check - but even
the atomic read doesn't do this.

Check what happens if the count reads as 1 just before another cpu
decrements it to zero (or reads 0 just before being incremented).
In one of those cases something is likely to go wrong.

I'm actually surprised that there isn't a mutex covering the code path.

	David

^ permalink raw reply

* RE: [PATCH net-next 6/6] net: mvneta: implement rx_copybreak
From: David Laight @ 2014-01-13 10:13 UTC (permalink / raw)
  To: 'Willy Tarreau', davem@davemloft.net
  Cc: netdev@vger.kernel.org, Thomas Petazzoni, Gregory CLEMENT
In-Reply-To: <1389525848-1814-7-git-send-email-w@1wt.eu>

From: Willy Tarreau
> calling dma_map_single()/dma_unmap_single() is quite expensive compared
> to copying a small packet. So let's copy short frames and keep the buffers
> mapped. We set the limit to 256 bytes which seems to give good results both
> on the XP-GP board and on the AX3/4.

Which architecture is this?
I presume it is one that needs iommu setup and/or cache flushing.

> The Rx small packet rate increased by 16.4% doing this, from 486kpps to
> 573kpps. It is worth noting that even the call to the function
> dma_sync_single_range_for_cpu() is expensive (300 ns) although less
> than dma_unmap_single(). Without it, the packet rate raises to 711kpps
> (+24% more). Thus on systems where coherency from device to CPU is
> guaranteed by a snoop control unit, this patch should provide even more
> gains, and probably rx_copybreak could be increased.

Is that the right way around?
If cache coherency is guaranteed then I'd have thought that the dma sync
would be a nop.
...
> +			memcpy(skb_put(skb, rx_bytes),
> +			       data + MVNETA_MH_SIZE + NET_SKB_PAD,
> +			       rx_bytes);

You can probably arrange for the copy to be fully aligned since
the partial words at both ends can be safely read and written.
That might speed things up further.

	David

^ permalink raw reply

* Re: [PATCH 1/3 RESUBMIT 2nd time] b43: Fix lockdep splat
From: Stanislaw Gruszka @ 2014-01-13 10:45 UTC (permalink / raw)
  To: Larry Finger
  Cc: linville-2XuSBdqkA4R54TAoqtyWWQ,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1389470266-13150-1-git-send-email-Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>

On Sat, Jan 11, 2014 at 01:57:46PM -0600, Larry Finger wrote:
> From: Stanislaw Gruszka <sgruszka-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Just friendly remark that from formal linux patch posting rules point
of view you can not post patch authored by me without my Signed-off-by.
Sorry for not spot this earlier.

Anyway, further patch V2 posted on mailing list is fine. Please don't
change it :-)

On the future, you can use rarely used "Originally-from:" mark for
posting not signed patches from others, like I did on commit 55eaa7c1f
where I sent Linus' code.

Thanks
Stanislaw
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next 6/6] net: mvneta: implement rx_copybreak
From: Willy Tarreau @ 2014-01-13 10:49 UTC (permalink / raw)
  To: David Laight
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Thomas Petazzoni,
	Gregory CLEMENT
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D458FB3@AcuExch.aculab.com>

Hi David,

On Mon, Jan 13, 2014 at 10:13:16AM +0000, David Laight wrote:
> From: Willy Tarreau
> > calling dma_map_single()/dma_unmap_single() is quite expensive compared
> > to copying a small packet. So let's copy short frames and keep the buffers
> > mapped. We set the limit to 256 bytes which seems to give good results both
> > on the XP-GP board and on the AX3/4.
> 
> Which architecture is this?

It's an ARMv7.

> I presume it is one that needs iommu setup and/or cache flushing.

Just wait for cache snoop completion.

> > The Rx small packet rate increased by 16.4% doing this, from 486kpps to
> > 573kpps. It is worth noting that even the call to the function
> > dma_sync_single_range_for_cpu() is expensive (300 ns) although less
> > than dma_unmap_single(). Without it, the packet rate raises to 711kpps
> > (+24% more). Thus on systems where coherency from device to CPU is
> > guaranteed by a snoop control unit, this patch should provide even more
> > gains, and probably rx_copybreak could be increased.
> 
> Is that the right way around?
> If cache coherency is guaranteed then I'd have thought that the dma sync
> would be a nop.

It's a bit more tricky. I found that the DMA API is not optimal for such
an architecture, because we need to wait *once* for the cache snooping to
complete at the beginning of the Rx loop, and then all other access may be
done with a NOP. However, since we don't currently have the ability to do
a first call and replace the other ones with a NOP, we still have to do
a dma_sync_single_for_cpu() for each packet, resulting in waiting for cache
snoop completion for each packet.
 
I've hacked the DMA API to add support for ops->iobarrier and test for it
at the beginning of the loop, call it, then avoid doing dma_sync_* afterwards
if it's defined. That way I reach the higher performance mentionned above.

But in my opinion, this is only material for future discussions.

> ...
> > +			memcpy(skb_put(skb, rx_bytes),
> > +			       data + MVNETA_MH_SIZE + NET_SKB_PAD,
> > +			       rx_bytes);
> 
> You can probably arrange for the copy to be fully aligned since
> the partial words at both ends can be safely read and written.
> That might speed things up further.

In fact it does not, this is what the very first patch did but I did not
see any difference.

Thanks,
Willy

^ permalink raw reply

* RE: PROBLEM: usbnet / ax88179_178a: Panic in usb_hcd_map_urb_for_dma
From: David Laight @ 2014-01-13 11:08 UTC (permalink / raw)
  To: 'Bjørn Mork', Thomas Kear
  Cc: Ben Hutchings, netdev,
	linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <87zjn3z6d5.fsf-lbf33ChDnrE/G1V5fR+Y7Q@public.gmane.org>

From: Bjørn Mork
> Thomas Kear <thomas@kear.co.nz> writes:
> 
> > On Sat, Jan 11, 2014 at 11:09 AM, Bjrn Mork <bjorn@mork.no> wrote:
> >> But looking at the code I think I found and obvious miss in the SG list
> >> initialisation.  I'll post a proposed fix for that.  Would be good if
> >> someone was able to test it.
> >
> > I've built 3.13.0-rc7-next-20140110 with your patch applied.
> > Unfortunately since this bug has taken anywhere from minutes to days
> > to manifest previously I'm not sure how quickly I'll be able to report
> > on its efficacy.
> 
> Thanks for testing it.
> 
> If I'm correct, then your problem is caused by usbnet incrementing
> urb->num_sgs above the value sg_init_table was called with. This happens
> if usbnet adds padding to a fragmented skb.  Unfortunately I have no
> idea how you can create fragmented skbs with a certain length.  But I'm
> sure others here know?

I've managed to send fragmented skb, but not of specific lengths.
Maybe Nagle can be used on a TCP to get the required merging.
Send an initial fragment to 'prime' Nagle, then send a few fragments
that get sent together when the timer expires (or send data the other
way to force the send).

The patch you submitted is wrong.
Whoever wrote the sg interface was on crack.
The 'last' marker needs moving as well.

	David


^ permalink raw reply

* Re: [PATCH net-next 3/3] packet: use percpu mmap tx frame pending refcount
From: Daniel Borkmann @ 2014-01-13 11:19 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, netdev
In-Reply-To: <CAHA+R7NnaxXgOGW=jTEGkeMByO7wXVr9bGR-J9rLzwL4ne52bA@mail.gmail.com>

On 01/13/2014 06:51 AM, Cong Wang wrote:
> On Sun, Jan 12, 2014 at 8:22 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
>> +static void packet_inc_pending(struct packet_ring_buffer *rb)
>> +{
>> +       this_cpu_inc(*rb->pending_refcnt);
>> +}
>> +
>> +static void packet_dec_pending(struct packet_ring_buffer *rb)
>> +{
>> +       this_cpu_dec(*rb->pending_refcnt);
>> +}
>> +
>> +static int packet_read_pending(const struct packet_ring_buffer *rb)
>> +{
>> +       int i, refcnt = 0;
>> +
>> +       /* We don't use pending refcount in rx_ring. */
>> +       if (rb->pending_refcnt == NULL)
>> +               return 0;
>> +
>> +       for_each_possible_cpu(i)
>> +               refcnt += *per_cpu_ptr(rb->pending_refcnt, i);
>> +
>> +       return refcnt;
>> +}
>
> How is this supposed to work? Since there is no lock,
> you can't read accurate refcnt. Take a look at lib/percpu_counter.c.
>
> I guess for some reason you don't care the accuracy?

Yep, not per se. Look at how we do net device reference counting.
The reason is that we call packet_read_pending() *only* after we
finished processing all frames in TX_RING and we wait for
completion in case MSG_DONTWAIT is *not set*, when that happens
we're back to 0.

But I think I found a different problem with this idea. It could
happen with net devices as well, but probably less likely as there
might be a better distribution of hold/puts among CPUs. However,
for TX_RING, if we pin the process to a particular CPU, and since
the destructor is invoked through ksoftirqd, we could end up with
a misbalance and if the process runs long enough eventually
overflow for one particular CPU. We could work around that, but I
think it's not worth the effort.

Dave, please drop the 3rd patch of the series, thanks.

> Then at least you need to comment in the code.
>
> Thanks.
>

^ permalink raw reply

* RE: [PATCH net-next v2 1/3] net: add skb_checksum_setup
From: Paul Durrant @ 2014-01-13 11:26 UTC (permalink / raw)
  To: Paul Durrant, netdev@vger.kernel.org, xen-devel@lists.xen.org
  Cc: David Miller, Eric Dumazet, Veaceslav Falico, Alexander Duyck,
	Nicolas Dichtel
In-Reply-To: <1389261768-30606-2-git-send-email-paul.durrant@citrix.com>

> -----Original Message-----
> From: Paul Durrant [mailto:paul.durrant@citrix.com]
> Sent: 09 January 2014 10:03
> To: netdev@vger.kernel.org; xen-devel@lists.xen.org
> Cc: Paul Durrant; David Miller; Eric Dumazet; Veaceslav Falico; Alexander
> Duyck; Nicolas Dichtel
> Subject: [PATCH net-next v2 1/3] net: add skb_checksum_setup
> 
> This patch adds a function to set up the partial checksum offset for IP
> packets (and optionally re-calculate the pseudo-header checksum) into the
> core network code.
> The implementation was previously private and duplicated between xen-
> netback
> and xen-netfront, however it is not xen-specific and is potentially useful
> to any network driver.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: David Miller <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Veaceslav Falico <vfalico@redhat.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Ping?

  Paul

> ---
>  include/linux/skbuff.h |    2 +
>  net/core/skbuff.c      |  273
> ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 275 insertions(+)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index d97f2d0..48b7605 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2893,6 +2893,8 @@ static inline void skb_checksum_none_assert(const
> struct sk_buff *skb)
> 
>  bool skb_partial_csum_set(struct sk_buff *skb, u16 start, u16 off);
> 
> +int skb_checksum_setup(struct sk_buff *skb, bool recalculate);
> +
>  u32 __skb_get_poff(const struct sk_buff *skb);
> 
>  /**
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 1d641e7..15057d2 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -65,6 +65,7 @@
>  #include <net/dst.h>
>  #include <net/sock.h>
>  #include <net/checksum.h>
> +#include <net/ip6_checksum.h>
>  #include <net/xfrm.h>
> 
>  #include <asm/uaccess.h>
> @@ -3549,6 +3550,278 @@ bool skb_partial_csum_set(struct sk_buff *skb,
> u16 start, u16 off)
>  }
>  EXPORT_SYMBOL_GPL(skb_partial_csum_set);
> 
> +static int skb_maybe_pull_tail(struct sk_buff *skb, unsigned int len,
> +			       unsigned int max)
> +{
> +	if (skb_headlen(skb) >= len)
> +		return 0;
> +
> +	/* If we need to pullup then pullup to the max, so we
> +	 * won't need to do it again.
> +	 */
> +	if (max > skb->len)
> +		max = skb->len;
> +
> +	if (__pskb_pull_tail(skb, max - skb_headlen(skb)) == NULL)
> +		return -ENOMEM;
> +
> +	if (skb_headlen(skb) < len)
> +		return -EPROTO;
> +
> +	return 0;
> +}
> +
> +/* This value should be large enough to cover a tagged ethernet header
> plus
> + * maximally sized IP and TCP or UDP headers.
> + */
> +#define MAX_IP_HDR_LEN 128
> +
> +static int skb_checksum_setup_ip(struct sk_buff *skb, bool recalculate)
> +{
> +	unsigned int off;
> +	bool fragment;
> +	int err;
> +
> +	fragment = false;
> +
> +	err = skb_maybe_pull_tail(skb,
> +				  sizeof(struct iphdr),
> +				  MAX_IP_HDR_LEN);
> +	if (err < 0)
> +		goto out;
> +
> +	if (ip_hdr(skb)->frag_off & htons(IP_OFFSET | IP_MF))
> +		fragment = true;
> +
> +	off = ip_hdrlen(skb);
> +
> +	err = -EPROTO;
> +
> +	if (fragment)
> +		goto out;
> +
> +	switch (ip_hdr(skb)->protocol) {
> +	case IPPROTO_TCP:
> +		err = skb_maybe_pull_tail(skb,
> +					  off + sizeof(struct tcphdr),
> +					  MAX_IP_HDR_LEN);
> +		if (err < 0)
> +			goto out;
> +
> +		if (!skb_partial_csum_set(skb, off,
> +					  offsetof(struct tcphdr, check))) {
> +			err = -EPROTO;
> +			goto out;
> +		}
> +
> +		if (recalculate)
> +			tcp_hdr(skb)->check =
> +				~csum_tcpudp_magic(ip_hdr(skb)->saddr,
> +						   ip_hdr(skb)->daddr,
> +						   skb->len - off,
> +						   IPPROTO_TCP, 0);
> +		break;
> +	case IPPROTO_UDP:
> +		err = skb_maybe_pull_tail(skb,
> +					  off + sizeof(struct udphdr),
> +					  MAX_IP_HDR_LEN);
> +		if (err < 0)
> +			goto out;
> +
> +		if (!skb_partial_csum_set(skb, off,
> +					  offsetof(struct udphdr, check))) {
> +			err = -EPROTO;
> +			goto out;
> +		}
> +
> +		if (recalculate)
> +			udp_hdr(skb)->check =
> +				~csum_tcpudp_magic(ip_hdr(skb)->saddr,
> +						   ip_hdr(skb)->daddr,
> +						   skb->len - off,
> +						   IPPROTO_UDP, 0);
> +		break;
> +	default:
> +		goto out;
> +	}
> +
> +	err = 0;
> +
> +out:
> +	return err;
> +}
> +
> +/* This value should be large enough to cover a tagged ethernet header
> plus
> + * an IPv6 header, all options, and a maximal TCP or UDP header.
> + */
> +#define MAX_IPV6_HDR_LEN 256
> +
> +#define OPT_HDR(type, skb, off) \
> +	(type *)(skb_network_header(skb) + (off))
> +
> +static int skb_checksum_setup_ipv6(struct sk_buff *skb, bool recalculate)
> +{
> +	int err;
> +	u8 nexthdr;
> +	unsigned int off;
> +	unsigned int len;
> +	bool fragment;
> +	bool done;
> +
> +	fragment = false;
> +	done = false;
> +
> +	off = sizeof(struct ipv6hdr);
> +
> +	err = skb_maybe_pull_tail(skb, off, MAX_IPV6_HDR_LEN);
> +	if (err < 0)
> +		goto out;
> +
> +	nexthdr = ipv6_hdr(skb)->nexthdr;
> +
> +	len = sizeof(struct ipv6hdr) + ntohs(ipv6_hdr(skb)->payload_len);
> +	while (off <= len && !done) {
> +		switch (nexthdr) {
> +		case IPPROTO_DSTOPTS:
> +		case IPPROTO_HOPOPTS:
> +		case IPPROTO_ROUTING: {
> +			struct ipv6_opt_hdr *hp;
> +
> +			err = skb_maybe_pull_tail(skb,
> +						  off +
> +						  sizeof(struct ipv6_opt_hdr),
> +						  MAX_IPV6_HDR_LEN);
> +			if (err < 0)
> +				goto out;
> +
> +			hp = OPT_HDR(struct ipv6_opt_hdr, skb, off);
> +			nexthdr = hp->nexthdr;
> +			off += ipv6_optlen(hp);
> +			break;
> +		}
> +		case IPPROTO_AH: {
> +			struct ip_auth_hdr *hp;
> +
> +			err = skb_maybe_pull_tail(skb,
> +						  off +
> +						  sizeof(struct ip_auth_hdr),
> +						  MAX_IPV6_HDR_LEN);
> +			if (err < 0)
> +				goto out;
> +
> +			hp = OPT_HDR(struct ip_auth_hdr, skb, off);
> +			nexthdr = hp->nexthdr;
> +			off += ipv6_authlen(hp);
> +			break;
> +		}
> +		case IPPROTO_FRAGMENT: {
> +			struct frag_hdr *hp;
> +
> +			err = skb_maybe_pull_tail(skb,
> +						  off +
> +						  sizeof(struct frag_hdr),
> +						  MAX_IPV6_HDR_LEN);
> +			if (err < 0)
> +				goto out;
> +
> +			hp = OPT_HDR(struct frag_hdr, skb, off);
> +
> +			if (hp->frag_off & htons(IP6_OFFSET | IP6_MF))
> +				fragment = true;
> +
> +			nexthdr = hp->nexthdr;
> +			off += sizeof(struct frag_hdr);
> +			break;
> +		}
> +		default:
> +			done = true;
> +			break;
> +		}
> +	}
> +
> +	err = -EPROTO;
> +
> +	if (!done || fragment)
> +		goto out;
> +
> +	switch (nexthdr) {
> +	case IPPROTO_TCP:
> +		err = skb_maybe_pull_tail(skb,
> +					  off + sizeof(struct tcphdr),
> +					  MAX_IPV6_HDR_LEN);
> +		if (err < 0)
> +			goto out;
> +
> +		if (!skb_partial_csum_set(skb, off,
> +					  offsetof(struct tcphdr, check))) {
> +			err = -EPROTO;
> +			goto out;
> +		}
> +
> +		if (recalculate)
> +			tcp_hdr(skb)->check =
> +				~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
> +						 &ipv6_hdr(skb)->daddr,
> +						 skb->len - off,
> +						 IPPROTO_TCP, 0);
> +		break;
> +	case IPPROTO_UDP:
> +		err = skb_maybe_pull_tail(skb,
> +					  off + sizeof(struct udphdr),
> +					  MAX_IPV6_HDR_LEN);
> +		if (err < 0)
> +			goto out;
> +
> +		if (!skb_partial_csum_set(skb, off,
> +					  offsetof(struct udphdr, check))) {
> +			err = -EPROTO;
> +			goto out;
> +		}
> +
> +		if (recalculate)
> +			udp_hdr(skb)->check =
> +				~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
> +						 &ipv6_hdr(skb)->daddr,
> +						 skb->len - off,
> +						 IPPROTO_UDP, 0);
> +		break;
> +	default:
> +		goto out;
> +	}
> +
> +	err = 0;
> +
> +out:
> +	return err;
> +}
> +
> +/**
> + * skb_checksum_setup - set up partial checksum offset
> + * @skb: the skb to set up
> + * @recalculate: if true the pseudo-header checksum will be recalculated
> + */
> +int skb_checksum_setup(struct sk_buff *skb, bool recalculate)
> +{
> +	int err;
> +
> +	switch (skb->protocol) {
> +	case htons(ETH_P_IP):
> +		err = skb_checksum_setup_ip(skb, recalculate);
> +		break;
> +
> +	case htons(ETH_P_IPV6):
> +		err = skb_checksum_setup_ipv6(skb, recalculate);
> +		break;
> +
> +	default:
> +		err = -EPROTO;
> +		break;
> +	}
> +
> +	return err;
> +}
> +EXPORT_SYMBOL(skb_checksum_setup);
> +
>  void __skb_warn_lro_forwarding(const struct sk_buff *skb)
>  {
>  	net_warn_ratelimited("%s: received packets cannot be forwarded
> while LRO is enabled\n",
> --
> 1.7.10.4

^ permalink raw reply

* Re: US$200 Million Global Humanitarian Projects
From: Mr. William Billington @ 2014-01-13  7:51 UTC (permalink / raw)
  To: Recipients

This is based on blind trust hoping you can be trusted. The above sum is avaliable for worldwide Humanitarian projects and you will get 20% of the total sum for your involvement if interested.

Regards,
Mr. William Billington

^ permalink raw reply

* Re: PROBLEM: usbnet / ax88179_178a: Panic in usb_hcd_map_urb_for_dma
From: Bjørn Mork @ 2014-01-13 11:28 UTC (permalink / raw)
  To: David Laight
  Cc: Thomas Kear, Ben Hutchings, netdev, linux-usb@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D458FEE@AcuExch.aculab.com>

David Laight <David.Laight@ACULAB.COM> writes:

> The patch you submitted is wrong.
> Whoever wrote the sg interface was on crack.
> The 'last' marker needs moving as well.

I'm afraid I don't understand what you meant by this.

sg_init_table() set the 'last' marker.  AFAICS, you don't need to change
it unless you want to chain lists.

Care to explain with some code?


Bjørn

^ permalink raw reply

* build_skb() and data corruption
From: Jonas Jensen @ 2014-01-13 11:47 UTC (permalink / raw)
  To: netdev
  Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, bhutchings, alexander.h.duyck,
	Arnd Bergmann, Florian Fainelli

Please help,

I think I see memory corruption with a driver recently added to linux-next.

The following error occur downloading a large file with wget (or
ncftp): "read error: Bad address"
wget exits and leaves the file unfinished.

The error goes away when build_skb() is patched out, in this case by
allocating pages directly in RX loop.

I currently have two branches with code placed under ifdef USE_BUILD_SKB:
https://bitbucket.org/Kasreyn/linux-next/src/faa7c408a655fdfd7c383f79259031da5a05d61e/drivers/net/ethernet/moxa/moxart_ether.c#cl-472

If build_skb() is the cause, is there something the driver can do about it?

A quick search on "build_skb memory corruption" reveals the following
commit, "igb: Revert support for build_skb in igb"
f9d40f6a9921cc7d9385f64362314054e22152bd:

"The reason for reverting this patch is that it can lead to data corruption.
The following flow was pointed out by Ben Hutchings:
1. skb is forwarded to another device
2. Packet headers are modified and it's put into a queue
3. Second packet is received into the other half of this page
4. Page cannot be reused, so is DMA-unmapped
5. The DMA mapping was non-coherent, so unmap copies or invalidates
cache
The headers added in step 2 get trashed in step 5."


This is extra interesting, errors only happen on a locally mounted
ext3 filesystem, never on tmpfs e.g.:

# mount
/dev/mmcblk0p1 on / type ext3
(rw,relatime,errors=continue,barrier=1,data=ordered)
tmpfs on /dev/shm type tmpfs (rw,relatime,mode=777)
tmpfs on /tmp type tmpfs (rw,relatime)

#cd /tmp
# wget -c ftp://149.20.4.69/pub/linux/kernel/v2.6/linux-2.6.11.11.tar.gz
Connecting to 149.20.4.69 (149.20.4.69:21)
linux-2.6.11.11.tar.  25% |*******                        | 11374k
0:01:36 ETAwget: short write
[  153.560000] wget (383) used greatest stack depth: 4776 bytes left
# rm linux-2.6.11.11.tar.gz
# wget -c ftp://149.20.4.69/pub/linux/kernel/v2.6/linux-2.6.11.11.tar.gz
Connecting to 149.20.4.69 (149.20.4.69:21)
linux-2.6.11.11.tar.  25% |*******                        | 11315k
0:01:34 ETAwget: short write
# rm linux-2.6.11.11.tar.gz
# wget -c ftp://149.20.4.69/pub/linux/kernel/v2.6/linux-2.6.11.11.tar.gz
Connecting to 149.20.4.69 (149.20.4.69:21)
linux-2.6.11.11.tar.  25% |*******                        | 11473k
0:01:38 ETAwget: short write
[  237.300000] wget (387) used greatest stack depth: 4752 bytes left

# cd /root/
# wget -c ftp://149.20.4.69/pub/linux/kernel/v2.6/linux-2.6.11.11.tar.gz
Connecting to 149.20.4.69 (149.20.4.69:21)
linux-2.6.11.11.tar.   3% |*                              |  1647k  0:03:02 ETA
wget: read error: Bad address


Regards,
Jonas

^ permalink raw reply

* RE: PROBLEM: usbnet / ax88179_178a: Panic in usb_hcd_map_urb_for_dma
From: David Laight @ 2014-01-13 11:52 UTC (permalink / raw)
  To: 'Bjørn Mork'
  Cc: Thomas Kear, Ben Hutchings, netdev, linux-usb@vger.kernel.org
In-Reply-To: <87zjn086iz.fsf@nemi.mork.no>

From: Bjørn Mork
> David Laight <David.Laight@ACULAB.COM> writes:
> 
> > The patch you submitted is wrong.
> > Whoever wrote the sg interface was on crack.
> > The 'last' marker needs moving as well.
> 
> I'm afraid I don't understand what you meant by this.
> 
> sg_init_table() set the 'last' marker.  AFAICS, you don't need to change
> it unless you want to chain lists.
> 
> Care to explain with some code?

Just assuming that there will be some code, somewhere, that will try
to process the entire sg list - so won't like the entry with a
NULL pointer and zero length at the end.

If all the places that process the list are given an explicit
number of entries, or don't care about the NULL it doesn't matter.

	David


^ permalink raw reply

* RE: [PATCH net-next 3/3] packet: use percpu mmap tx frame pending refcount
From: David Laight @ 2014-01-13 11:58 UTC (permalink / raw)
  To: 'Daniel Borkmann', Cong Wang; +Cc: David Miller, netdev
In-Reply-To: <52D3CBA5.4080301@redhat.com>

From: Daniel Borkmann
...
> But I think I found a different problem with this idea. It could
> happen with net devices as well, but probably less likely as there
> might be a better distribution of hold/puts among CPUs. However,
> for TX_RING, if we pin the process to a particular CPU, and since
> the destructor is invoked through ksoftirqd, we could end up with
> a misbalance and if the process runs long enough eventually
> overflow for one particular CPU. We could work around that, but I
> think it's not worth the effort.

The sum will be 'correct' when summed across all the cpu even if
one of the values has wrapped - provided all the arithmetic is
unsigned and the variables all the same type.

	David

^ permalink raw reply

* Re: PROBLEM: usbnet / ax88179_178a: Panic in usb_hcd_map_urb_for_dma
From: Bjørn Mork @ 2014-01-13 12:25 UTC (permalink / raw)
  To: David Laight
  Cc: Thomas Kear, Ben Hutchings, netdev, linux-usb@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D45904D-VkEWCZq2GCInGFn1LkZF6NBPR1lH4CV8@public.gmane.org>

David Laight <David.Laight-ZS65k/vG3HxXrIkS9f7CXA@public.gmane.org> writes:
> From: Bjørn Mork
>> David Laight <David.Laight-ZS65k/vG3HxXrIkS9f7CXA@public.gmane.org> writes:
>> 
>> > The patch you submitted is wrong.
>> > Whoever wrote the sg interface was on crack.
>> > The 'last' marker needs moving as well.
>> 
>> I'm afraid I don't understand what you meant by this.
>> 
>> sg_init_table() set the 'last' marker.  AFAICS, you don't need to change
>> it unless you want to chain lists.
>> 
>> Care to explain with some code?
>
> Just assuming that there will be some code, somewhere, that will try
> to process the entire sg list - so won't like the entry with a
> NULL pointer and zero length at the end.
>
> If all the places that process the list are given an explicit
> number of entries, or don't care about the NULL it doesn't matter.

I believe all processing use the urb->num_sgs field to limit the number
of entries.  Common interfaces like dma_map_sg() and for_each_sg() limit
their processing to "nents" entries, and the USB code use the value of
urb->num_sgs for this parameter.


Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next 3/3] packet: use percpu mmap tx frame pending refcount
From: Daniel Borkmann @ 2014-01-13 12:57 UTC (permalink / raw)
  To: David Laight; +Cc: Cong Wang, David Miller, netdev
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D459065@AcuExch.aculab.com>

On 01/13/2014 12:58 PM, David Laight wrote:
> From: Daniel Borkmann
> ...
>> But I think I found a different problem with this idea. It could
>> happen with net devices as well, but probably less likely as there
>> might be a better distribution of hold/puts among CPUs. However,
>> for TX_RING, if we pin the process to a particular CPU, and since
>> the destructor is invoked through ksoftirqd, we could end up with
>> a misbalance and if the process runs long enough eventually
>> overflow for one particular CPU. We could work around that, but I
>> think it's not worth the effort.
>
> The sum will be 'correct' when summed across all the cpu even if
> one of the values has wrapped - provided all the arithmetic is
> unsigned and the variables all the same type.

You are right, sorry, too much coffee this morning, I missed that.

Then, imho, this should work.

^ permalink raw reply

* [PATCH net 1/2] net: dev: add netdev_upper_dev_rename()
From: Ding Tianhong @ 2014-01-13 13:08 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, Eric Dumazet, David S. Miller,
	Netdev

The function will deal with the links for dev and upper_dev
when the dev's name changed.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 include/linux/netdevice.h |  3 +++
 net/core/dev.c            | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ce2a1f5..1120278 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2896,6 +2896,9 @@ void *netdev_lower_dev_get_private_rcu(struct net_device *dev,
 				       struct net_device *lower_dev);
 void *netdev_lower_dev_get_private(struct net_device *dev,
 				   struct net_device *lower_dev);
+int netdev_upper_dev_rename(struct net_device *dev,
+			    struct net_device *upper_dev,
+			    const char *old_name, const char *new_name);
 int skb_checksum_help(struct sk_buff *skb);
 struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
 				  netdev_features_t features, bool tx_path);
diff --git a/net/core/dev.c b/net/core/dev.c
index 0ce469e..31bff66 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4997,6 +4997,63 @@ void *netdev_lower_dev_get_private(struct net_device *dev,
 }
 EXPORT_SYMBOL(netdev_lower_dev_get_private);
 
+int netdev_upper_dev_rename(struct net_device *dev,
+			    struct net_device *upper_dev,
+			    const char *old_name, const char *new_name)
+{
+	int ret = 0;
+	struct netdev_adjacent *i, *to_i;
+
+	sysfs_remove_link(&(dev->dev.kobj), old_name);
+	ret = sysfs_create_link(&(dev->dev.kobj), &(upper_dev->dev.kobj),
+				new_name);
+	if (ret) {
+		pr_warn("%s unable to rename link %s to %s.\n",
+			dev->name, old_name, new_name);
+		goto rollback_name;
+	}
+
+	list_for_each_entry(i, &dev->all_adj_list.lower, list) {
+		sysfs_remove_link(&(i->dev->dev.kobj), old_name);
+		ret = sysfs_create_link(&(i->dev->dev.kobj),
+					&(upper_dev->dev.kobj),
+					new_name);
+		if (ret) {
+			pr_warn("%s unable to rename link %s to %s.\n",
+				dev->name, old_name, new_name);
+			goto rollback_lower_name;
+		}
+	}
+
+	return ret;
+
+rollback_lower_name:
+	to_i = i;
+	list_for_each_entry(i, &dev->all_adj_list.lower, list) {
+		sysfs_remove_link(&(i->dev->dev.kobj), new_name);
+		if (sysfs_create_link(&(i->dev->dev.kobj),
+				      &(upper_dev->dev.kobj),
+				      old_name)) {
+			pr_err("%s unable to rename link %s to %s.\n",
+			       i->dev->name, new_name, old_name);
+			BUG();
+		}
+		if (i == to_i)
+			break;
+	}
+rollback_name:
+	sysfs_remove_link(&(dev->dev.kobj), new_name);
+	if (sysfs_create_link(&(dev->dev.kobj), &(upper_dev->dev.kobj),
+			      old_name)) {
+		pr_err("%s unable to rename link %s to %s.\n",
+		       dev->name, new_name, old_name);
+		BUG();
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(netdev_upper_dev_rename);
+
 static void dev_change_rx_flags(struct net_device *dev, int flags)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
-- 
1.8.0

^ permalink raw reply related

* [PATCH net 0/2] bonding: fix the sysfs warning when change the master's name
From: Ding Tianhong @ 2014-01-13 13:08 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, David S. Miller, Netdev,
	Eric Dumazet

When I change the master's name, and then rebuild the master and ensalve a nic again,
than I got the calltrace:

[329215.749344] WARNING: CPU: 0 PID: 4778 at fs/sysfs/dir.c:486 sysfs_warn_dup+0x87/0xa0()
[329215.749347] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:03.0/0000:02:00.0/net/eth100/upper_bond0'
[329215.749350] Modules linked in: bonding(O) igb af_packet edd bridge stp llc cpufreq_conservative cpufreq_userspace cpufreq_powersave microcode fuse loop dm_mod iTCO_wdt iTCO_vendor_support ipv6 dca i2c_algo_bit ptp i2c_i801 pps_core hid_generic bnx2 lpc_ich i2c_core button acpi_cpufreq serio_raw sg ehci_pci mfd_core pcspkr ext3 jbd mbcache usbhid hid uhci_hcd ehci_hcd sd_mod usbcore usb_common crc_t10dif crct10dif_common processor thermal_sys hwmon scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh ata_generic ata_piix libata megaraid_sas scsi_mod [last unloaded: bonding]
[329215.749437] CPU: 0 PID: 4778 Comm: bash Tainted: G           O 3.13.0-rc6-0.7-default+ #32
[329215.749440] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285          /BC11BTSA              , BIOS CTSAV036 04/27/2011
[329215.749443]  00000000000001e6 ffff88032d4dbaf8 ffffffff814e27b8 ffff88032d4dbb38
[329215.749449]  ffffffff81048d07 ffff880330f7ae80 ffff88032cc24000 ffff88032cc24000
[329215.749455]  ffff880330f7ae80 ffff880330f7ae80 0000000000000001 ffff88032d4dbb98
[329215.749461] Call Trace:
[329215.749470]  [<ffffffff814e27b8>] dump_stack+0x6a/0x7a
[329215.749477]  [<ffffffff81048d07>] warn_slowpath_common+0x87/0xb0
[329215.749482]  [<ffffffff81048dd1>] warn_slowpath_fmt+0x41/0x50
[329215.749489]  [<ffffffff81292194>] ? strlcat+0x74/0x90
[329215.749494]  [<ffffffff81205b27>] sysfs_warn_dup+0x87/0xa0
[329215.749500]  [<ffffffff81205eed>] sysfs_add_one+0x4d/0x50
[329215.749505]  [<ffffffff81206f9e>] sysfs_do_create_link_sd+0xbe/0x210
[329215.749511]  [<ffffffff812951a0>] ? sprintf+0x40/0x50
[329215.749516]  [<ffffffff8120714b>] sysfs_create_link+0x2b/0x30
[329215.749523]  [<ffffffff8140a708>] __netdev_adjacent_dev_insert+0x1b8/0x270
[329215.749528]  [<ffffffff8140a7f8>] __netdev_adjacent_dev_link_lists+0x38/0x90
[329215.749533]  [<ffffffff8140a98b>] __netdev_upper_dev_link+0x13b/0x470
[329215.749538]  [<ffffffff8141319c>] ? __ethtool_get_settings+0x5c/0x90
[329215.749547]  [<ffffffffa0722179>] ? bond_update_speed_duplex+0x29/0x70 [bonding]
[329215.749552]  [<ffffffff8140acd1>] netdev_master_upper_dev_link_private+0x11/0x20
[329215.749561]  [<ffffffffa0729246>] bond_enslave+0x806/0xe40 [bonding]
[329215.749570]  [<ffffffffa073241f>] bonding_store_slaves+0x18f/0x1c0 [bonding]
[329215.749576]  [<ffffffff813757ab>] dev_attr_store+0x1b/0x20
[329215.749581]  [<ffffffff812049cc>] sysfs_write_file+0x15c/0x1f0
[329215.749587]  [<ffffffff81188897>] vfs_write+0xc7/0x1e0
[329215.749593]  [<ffffffff814e8d49>] ? retint_swapgs+0xe/0x13
[329215.749598]  [<ffffffff81188acd>] SyS_write+0x5d/0xa0
[329215.749604]  [<ffffffff814f11a2>] system_call_fastpath+0x16/0x1b

-----------------------[ cut here ]-----------------------------------

The reason is that when I change the master's name, there is no way to change
it for slave's upper_xxxx link, it is still the original name, when I release
the master, the link is still there, so if I rebuild a master and the slave
dev to this master again, the warning will occurs.

I fix the problem by export and use a new function netdev_upper_dev_rename(),
it will clean the old link and rebuild a new link for slave when the master's
name changed every time.

I beleave if the slave's name changed, the master still has the old dev link for
this slave, so I will test and fix it in next patchset.

Ding Tianhong (2):
  net: dev: add netdev_upper_dev_rename()
  bonding: rename the dev upper link if the master's name changed

 drivers/net/bonding/bond_main.c | 35 +++++++++++++++++++++++++
 include/linux/netdevice.h       |  3 +++
 net/core/dev.c                  | 57 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 95 insertions(+)

-- 
1.8.0

^ permalink raw reply

* [PATCH net 2/2] bonding: rename the dev upper link if the master's, name changed
From: Ding Tianhong @ 2014-01-13 13:08 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, Eric Dumazet, David S. Miller,
	Netdev

The bond_maste_rename() will rename the links for slave dev's upper dev link,
if faild, it will rollback and rename the new name to old name for slave dev.

Add a new parameter called name to save the old bonding name in struct bonding.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_main.c | 35 +++++++++++++++++++++++++++++++++++
 drivers/net/bonding/bonding.h   |  1 +
 2 files changed, 36 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 4b8c58b..8c044c0 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2799,11 +2799,41 @@ re_arm:
 
 /*-------------------------- netdev event handling --------------------------*/
 
+static int bond_master_rename(struct bonding *bond)
+{
+	struct slave *slave;
+	struct list_head *iter;
+	char ori_linkname[IFNAMSIZ + 7], new_linkname[IFNAMSIZ + 7];
+	int err = 0;
+
+	sprintf(ori_linkname, "upper_%s", bond->name);
+	sprintf(new_linkname, "upper_%s", bond->dev->name);
+
+	bond_for_each_slave(bond, slave, iter) {
+
+		err = netdev_upper_dev_rename(slave->dev, bond->dev, ori_linkname,
+					new_linkname);
+		if (err) {
+			pr_err("slave %s unable to rename link %s to %s.\n",
+			       slave->dev->name, bond->name, bond->dev->name);
+			break;
+		}
+	}
+	if (!err)
+		strlcpy(bond->name, bond->dev->name, IFNAMSIZ);
+	return err;
+}
 /*
  * Change device name
  */
 static int bond_event_changename(struct bonding *bond)
 {
+	/* If a rename fails, the rollback will cause another
+	 * rename call with the existing name.
+	 */
+	if (bond_master_rename(bond))
+		return NOTIFY_BAD;
+
 	bond_remove_proc_entry(bond);
 	bond_create_proc_entry(bond);
 
@@ -4418,6 +4448,7 @@ unsigned int bond_get_num_tx_queues(void)
 int bond_create(struct net *net, const char *name)
 {
 	struct net_device *bond_dev;
+	struct bonding *bond;
 	int res;
 
 	rtnl_lock();
@@ -4438,6 +4469,10 @@ int bond_create(struct net *net, const char *name)
 
 	netif_carrier_off(bond_dev);
 
+	bond = netdev_priv(bond_dev);
+
+	strlcpy(bond->name, bond_dev->name, IFNAMSIZ);
+
 	rtnl_unlock();
 	if (res < 0)
 		bond_destructor(bond_dev);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index a9f4f9f..d279f3a 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -232,6 +232,7 @@ struct bonding {
 	struct   proc_dir_entry *proc_entry;
 	char     proc_file_name[IFNAMSIZ];
 #endif /* CONFIG_PROC_FS */
+	char	 name[IFNAMSIZ];
 	struct   list_head bond_list;
 	u32      rr_tx_counter;
 	struct   ad_bond_info ad_info;
-- 
1.8.0

^ permalink raw reply related

* RE: PROBLEM: usbnet / ax88179_178a: Panic in usb_hcd_map_urb_for_dma
From: David Laight @ 2014-01-13 13:26 UTC (permalink / raw)
  To: 'Bjørn Mork'
  Cc: Thomas Kear, Ben Hutchings, netdev,
	linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <87vbxo83wp.fsf-lbf33ChDnrE/G1V5fR+Y7Q@public.gmane.org>

From: Bjørn Mork [mailto:bjorn@mork.no]
> David Laight <David.Laight@ACULAB.COM> writes:
> > From: Bjørn Mork
> >> David Laight <David.Laight@ACULAB.COM> writes:
> >>
> >> > The patch you submitted is wrong.
> >> > Whoever wrote the sg interface was on crack.
> >> > The 'last' marker needs moving as well.
> >>
> >> I'm afraid I don't understand what you meant by this.
> >>
> >> sg_init_table() set the 'last' marker.  AFAICS, you don't need to change
> >> it unless you want to chain lists.
> >>
> >> Care to explain with some code?
> >
> > Just assuming that there will be some code, somewhere, that will try
> > to process the entire sg list - so won't like the entry with a
> > NULL pointer and zero length at the end.
> >
> > If all the places that process the list are given an explicit
> > number of entries, or don't care about the NULL it doesn't matter.
> 
> I believe all processing use the urb->num_sgs field to limit the number
> of entries.  Common interfaces like dma_map_sg() and for_each_sg() limit
> their processing to "nents" entries, and the USB code use the value of
> urb->num_sgs for this parameter.

Which mostly means that the sg_xxx functions are doing a whole load
of unnecessary instructions and memory accesses...

This probably has a lot to do with the significant difference in the
cpu use for the usb3 and 'normal' ethernet interfaces.

While each bit doesn't seem significant, they soon add up.

	David


^ permalink raw reply

* Re: build_skb() and data corruption
From: Arnd Bergmann @ 2014-01-13 13:29 UTC (permalink / raw)
  To: Jonas Jensen
  Cc: netdev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, bhutchings, alexander.h.duyck,
	Florian Fainelli
In-Reply-To: <CACmBeS35RAEQ+t2vtYtFTKNT-VdQw20hURjTi193Jk8HG7UECA@mail.gmail.com>

On Monday 13 January 2014, Jonas Jensen wrote:
> Please help,
> 
> I think I see memory corruption with a driver recently added to linux-next.
> 
> The following error occur downloading a large file with wget (or
> ncftp): "read error: Bad address"
> wget exits and leaves the file unfinished.
> 
> The error goes away when build_skb() is patched out, in this case by
> allocating pages directly in RX loop.
> 
> I currently have two branches with code placed under ifdef USE_BUILD_SKB:
> https://bitbucket.org/Kasreyn/linux-next/src/faa7c408a655fdfd7c383f79259031da5a05d61e/drivers/net/ethernet/moxa/moxart_ether.c#cl-472
> 

The problem appears to be incorrect coherency management of the
dma buffers. You are using the streaming DMA mapping API on a
platform that is not cache coherent, and you are reusing buffers.

The solution to this problem is to add the appropriate
dma_sync_single_for_device() and dma_sync_single_for_cpu()
calls at the point where buffer ownership changes between the
dma master and the CPU. This will flush the caches after the
filling them on the tx path and invalidate caches when the
dma master fills them on rx.

See Documentation/DMA-API-HOWTO.txt for details.

	Arnd

^ permalink raw reply

* Re: [PATCH 0/2] tun: add the RFS support
From: Zhi Yong Wu @ 2014-01-13 13:29 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Netdev List, Eric Dumazet, Zhi Yong Wu
In-Reply-To: <CA+mtBx8b_D+JQaphS4LCbAVKRWE5Tz1-P_eVxJJdG-FbMdWPdw@mail.gmail.com>

On Wed, Jan 1, 2014 at 6:33 AM, Tom Herbert <therbert@google.com> wrote:
> Zhi,
HI, Tom
>
> Thanks for following up on these patches. It would still be nice to
> have performance numbers to show the impact. These will be helpful for
> the next task of integrating RFS into virtio-net.
I don't get why RFS need to be integrated into virtio-net since
virtio-net has supported mq. Can you give some clue? thanks.

>
> Tom
>
> On Tue, Dec 31, 2013 at 10:32 AM, David Miller <davem@davemloft.net> wrote:
>> From: Zhi Yong Wu <zwu.kernel@gmail.com>
>> Date: Sun, 22 Dec 2013 18:54:30 +0800
>>
>>> Since Tom Herbert's hash related patchset was modified and got merged,
>>> his pachset about adding support for RFS on tun flows also need to get
>>> adjusted accordingly. I tried to update them, and before i will start
>>> to do some perf tests, i hope to get one correct code base, so it's time
>>> to post them out now. Any constructive comments are welcome, thanks.
>>
>> Series applied to net-next, thanks.



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox