Netdev List


* [PATCH] ipv4: do not cache looped multicasts
From: Julian Anastasov @ 2012-11-22 21:04 UTC (permalink / raw)
  To: netdev; +Cc: Maxime Bizon

	Starting from 3.6 we cache output routes for
multicasts only when using route to 224/4. For local receivers
we can set RTCF_LOCAL flag depending on the membership but
in such case we use maddr and saddr which are not caching
keys as before. Additionally, we can not use same place to
cache routes that differ in RTCF_LOCAL flag value.

	Fix it by caching only RTCF_MULTICAST entries
without RTCF_LOCAL (send-only, no loopback). As a side effect,
we avoid unneeded lookup for fnhe when not caching because
multicasts are not redirected and they do not learn PMTU.

	Thanks to Maxime Bizon for showing the caching
problems in __mkroute_output for 3.6 kernels: different
RTCF_LOCAL flag in cache can lead to wrong ip_mc_output or
ip_output call and the visible problem is that traffic can
not reach local receivers via loopback.

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Tested-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---

	Patch applies to both trees but should go
to net tree.

 net/ipv4/route.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5b58788..0d73f86 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1785,6 +1785,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 	if (dev_out->flags & IFF_LOOPBACK)
 		flags |= RTCF_LOCAL;
 
+	do_cache = true;
 	if (type == RTN_BROADCAST) {
 		flags |= RTCF_BROADCAST | RTCF_LOCAL;
 		fi = NULL;
@@ -1793,6 +1794,8 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 		if (!ip_check_mc_rcu(in_dev, fl4->daddr, fl4->saddr,
 				     fl4->flowi4_proto))
 			flags &= ~RTCF_LOCAL;
+		else
+			do_cache = false;
 		/* If multicast route do not exist use
 		 * default one, but do not gateway in this case.
 		 * Yes, it is hack.
@@ -1802,8 +1805,8 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 	}
 
 	fnhe = NULL;
-	do_cache = fi != NULL;
-	if (fi) {
+	do_cache &= fi != NULL;
+	if (do_cache) {
 		struct rtable __rcu **prth;
 		struct fib_nh *nh = &FIB_RES_NH(*res);
 
-- 
1.7.3.4

^ permalink raw reply related

* Re: [PATCH] ipv4: do not cache looped multicasts
From: David Miller @ 2012-11-22 21:08 UTC (permalink / raw)
  To: ja; +Cc: netdev, mbizon
In-Reply-To: <1353618254-1874-1-git-send-email-ja@ssi.bg>

From: Julian Anastasov <ja@ssi.bg>
Date: Thu, 22 Nov 2012 23:04:14 +0200

> 	Starting from 3.6 we cache output routes for
> multicasts only when using route to 224/4. For local receivers
> we can set RTCF_LOCAL flag depending on the membership but
> in such case we use maddr and saddr which are not caching
> keys as before. Additionally, we can not use same place to
> cache routes that differ in RTCF_LOCAL flag value.
> 
> 	Fix it by caching only RTCF_MULTICAST entries
> without RTCF_LOCAL (send-only, no loopback). As a side effect,
> we avoid unneeded lookup for fnhe when not caching because
> multicasts are not redirected and they do not learn PMTU.
> 
> 	Thanks to Maxime Bizon for showing the caching
> problems in __mkroute_output for 3.6 kernels: different
> RTCF_LOCAL flag in cache can lead to wrong ip_mc_output or
> ip_output call and the visible problem is that traffic can
> not reach local receivers via loopback.
> 
> Reported-by: Maxime Bizon <mbizon@freebox.fr>
> Tested-by: Maxime Bizon <mbizon@freebox.fr>
> Signed-off-by: Julian Anastasov <ja@ssi.bg>

Applied and queued up for -stable, thanks!
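
Editorial sketch: the caching rule from the patch can be modelled in plain
C outside the kernel. This is only an illustration under stated
assumptions. The enum, the function name and the boolean parameters are
hypothetical stand-ins for the route type, for "fi != NULL" and for the
result of ip_check_mc_rcu(); none of them is kernel API.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's route types; only the cases
 * the patch touches are modelled. */
enum route_type { ROUTE_UNICAST, ROUTE_BROADCAST, ROUTE_MULTICAST };

/*
 * Decide whether an output route may be stored in the nexthop cache.
 *
 * has_fib_info       models "fi != NULL" on entry
 * has_local_receiver models ip_check_mc_rcu() returning true, i.e. the
 *                    route keeps RTCF_LOCAL and traffic is looped back
 */
bool may_cache_output_route(enum route_type type, bool has_fib_info,
                            bool has_local_receiver)
{
    bool do_cache = true;

    if (type == ROUTE_BROADCAST) {
        /* the patch context sets fi = NULL for broadcast routes */
        has_fib_info = false;
    } else if (type == ROUTE_MULTICAST && has_local_receiver) {
        /* looped multicast: the cache slot is not keyed on membership */
        do_cache = false;
    }

    /* mirrors "do_cache &= fi != NULL" */
    return do_cache && has_fib_info;
}
```

Under this model a send-only multicast route is still cached, while the
same route with a local receiver is rebuilt on every lookup.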

^ permalink raw reply

* Re: [PATCH 085/493] net/wireless: remove use of __devexit_p
From: Hin-Tak Leung @ 2012-11-22  6:04 UTC (permalink / raw)
  To: gregkh, Bill Pemberton; +Cc: linux-wireless, netdev
In-Reply-To: <1353349642-3677-85-git-send-email-wfp5p@virginia.edu>



--- On Mon, 19/11/12, Bill Pemberton <wfp5p@virginia.edu> wrote:

> CONFIG_HOTPLUG is going away as an
> option so __devexit_p is no longer
> needed.
> 
> Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
> Cc: "John W. Linville" <linville@tuxdriver.com>
> 
> Cc: Jiri Slaby <jirislaby@gmail.com>
> 
> Cc: Nick Kossifidis <mickflemm@gmail.com>
> 
> Cc: "Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>
> 
> Cc: Simon Kelley <simon@thekelleys.org.uk>
> 
> Cc: Stefano Brivio <stefano.brivio@polimi.it>
> 
> Cc: Stanislav Yakovlev <stas.yakovlev@gmail.com>
> 
> Cc: Dan Williams <dcbw@redhat.com>
> 
> Cc: Christian Lamparter <chunkeey@googlemail.com>
> 
> Cc: Ivo van Doorn <IvDoorn@gmail.com>
> 
> Cc: Gertjan van Wingerde <gwingerde@gmail.com>
> 
> Cc: Helmut Schaa <helmut.schaa@googlemail.com>
> 
> Cc: Herton Ronaldo Krzesinski <herton@canonical.com>
> 
> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
> 
> Cc: Larry Finger <Larry.Finger@lwfinger.net>
> 
> Cc: Luciano Coelho <coelho@ti.com> 
> Cc: linux-wireless@vger.kernel.org
> 
> Cc: netdev@vger.kernel.org
> 
> Cc: ath5k-devel@lists.ath5k.org
> 
> Cc: b43-dev@lists.infradead.org
> 
> Cc: brcm80211-dev-list@broadcom.com
> 
> Cc: libertas-dev@lists.infradead.org
> 
> Cc: users@rt2x00.serialmonkey.com
> 
> ---

Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>

The rtl818x parts.

^ permalink raw reply

* Re: [PATCH 471/493] net: remove use of __devexit
From: Hin-Tak Leung @ 2012-11-22  6:11 UTC (permalink / raw)
  To: gregkh, Bill Pemberton; +Cc: netdev, linux-wireless
In-Reply-To: <1353349642-3677-471-git-send-email-wfp5p@virginia.edu>



--- On Mon, 19/11/12, Bill Pemberton <wfp5p@virginia.edu> wrote:

> CONFIG_HOTPLUG is going away as an
> option so __devexit is no
> longer needed.
> 
> Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
> Cc: Wolfgang Grandegger <wg@grandegger.com>
> 
> Cc: Marc Kleine-Budde <mkl@pengutronix.de>
> 
> Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
> 
> Cc: Jes Sorensen <jes@trained-monkey.org>
> 
> Cc: Samuel Ortiz <samuel@sortiz.org>
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> 
> Cc: Shreyas Bhatewara <sbhatewara@vmware.com>
> 
> Cc: "VMware, Inc." <pv-drivers@vmware.com>
> 
> Cc: Francois Romieu <romieu@fr.zoreil.com>
> 
> Cc: Krzysztof Halasa <khc@pm.waw.pl> 
> Cc: "John W. Linville" <linville@tuxdriver.com>
> 
> Cc: Jiri Slaby <jirislaby@gmail.com>
> 
> Cc: Nick Kossifidis <mickflemm@gmail.com>
> 
> Cc: "Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>
> 
> Cc: Simon Kelley <simon@thekelleys.org.uk>
> 
> Cc: Stefano Brivio <stefano.brivio@polimi.it>
> 
> Cc: Stanislav Yakovlev <stas.yakovlev@gmail.com>
> 
> Cc: Dan Williams <dcbw@redhat.com>
> 
> Cc: Christian Lamparter <chunkeey@googlemail.com>
> 
> Cc: Herton Ronaldo Krzesinski <herton@canonical.com>
> 
> Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
> 
> Cc: Larry Finger <Larry.Finger@lwfinger.net>
> 
> Cc: Luciano Coelho <coelho@ti.com> 
> Cc: netdev@vger.kernel.org
> 
> Cc: linux-can@vger.kernel.org
> 
> Cc: linux-hippi@sunsite.dk
> 
> Cc: virtualization@lists.linux-foundation.org
> 
> Cc: linux-wireless@vger.kernel.org
> 
> Cc: ath5k-devel@lists.ath5k.org
> 
> Cc: b43-dev@lists.infradead.org
> 
> Cc: libertas-dev@lists.infradead.org
> 
> Cc: xen-devel@lists.xensource.com
> 
> ---


Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>

The rtl818x parts.

^ permalink raw reply

* Re: [PATCH v6] can: kvaser_usb: Add support for Kvaser CAN/USB devices
From: Greg KH @ 2012-11-22 21:30 UTC (permalink / raw)
  To: Olivier Sobrie
  Cc: Wolfgang Grandegger, Marc Kleine-Budde, linux-can, netdev,
	linux-usb, Daniel Berglund
In-Reply-To: <20121122150149.GB11612@hposo>

On Thu, Nov 22, 2012 at 04:01:49PM +0100, Olivier Sobrie wrote:
> Hi linux-usb folks,
> 
> Is there someone who can help me to fix the following errors?
> 
> smatch warnings:
> 
> + drivers/net/can/usb/kvaser_usb.c:431 kvaser_usb_send_simple_msg() error: doing
> +dma on the stack ((null))
> + drivers/net/can/usb/kvaser_usb.c:1073 kvaser_usb_set_opt_mode() error: doing
> +dma on the stack ((null))
> + drivers/net/can/usb/kvaser_usb.c:1174 kvaser_usb_flush_queue() error: doing
> +dma on the stack ((null))
> + drivers/net/can/usb/kvaser_usb.c:1384 kvaser_usb_set_bittiming() error: doing
> +dma on the stack ((null))
> 
> I assume it's due to the buffer I pass to the function usb_bulk_msg()
> which is on the stack and can't be.
> Do I just have to kmalloc a buffer and give it to the usb_bulk_msg()
> function? That's what I understood by reading
> "Documentation/DMA-API-HOWTO.txt" section "What memory is DMA'able?"...
> and from commit
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=32ec4576c3fb37316b1d11a04b220527822f3f0d

Yes, that is all that is needed.

thanks,

greg k-h
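
Editorial sketch: the pattern confirmed above (copy the message into
heap memory before handing it to the transfer routine) can be shown with
a userspace model. Everything here is an assumption made for
illustration: fake_bulk_transfer() is a hypothetical stand-in for
usb_bulk_msg(), and malloc()/free() stand in for what would be
kmalloc(len, GFP_KERNEL)/kfree() in a driver. It is not the kvaser_usb
code.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* State recorded by the hypothetical transfer routine. */
unsigned char last_sent[64];
size_t last_len;

/* Hypothetical stand-in for usb_bulk_msg(): remembers what it was given. */
int fake_bulk_transfer(void *data, size_t len)
{
    if (len > sizeof(last_sent))
        return -1;
    memcpy(last_sent, data, len);
    last_len = len;
    return 0;
}

/*
 * Send a message that the caller built on its own stack.
 * The bytes are copied into a heap buffer first, so the transfer
 * routine never sees a stack address.
 */
int send_simple_msg(const void *msg, size_t len)
{
    void *buf;
    int err;

    buf = malloc(len);          /* kmalloc(len, GFP_KERNEL) in a driver */
    if (!buf)
        return -1;
    memcpy(buf, msg, len);

    err = fake_bulk_transfer(buf, len);

    free(buf);                  /* kfree(buf) in a driver */
    return err;
}
```

The buffer is freed on every path after the call returns, which is safe
here because the modelled transfer is synchronous.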

^ permalink raw reply

* Re: [PATCH] 8139cp: set ring address after enabling C+ mode
From: Francois Romieu @ 2012-11-22 21:39 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: David Miller, dwmw2, jasowang, netdev, slacky, rggjan, gilboad,
	Hayes Wang
In-Reply-To: <50ADAFB7.7070704@pobox.com>

[-- Attachment #1: Type: text/plain, Size: 1003 bytes --]

Jeff Garzik <jgarzik@pobox.com> :
> On 11/21/2012 11:39 PM, David Miller wrote:
> >From: Jeff Garzik <jgarzik@pobox.com>
> >Date: Wed, 21 Nov 2012 22:47:39 -0500
> >
> >>State A:  pre-b01af457, known working
> >>State B:  b01af457, known broken
> >
> >State A is also known buggy on the largest consumer of this driver,
> >the emulated hardware.
>
> >Please evaluate this realistically.
> 
> If the simulator fails to match the hardware, that is a simulator bug.

Yes.

> It is disappointing to work around someone else's software bug in
> the kernel.

Yes. :o/

I like David Woodhouse's C (attached patch) since 1) Realtek does
not seem to care about oldies 2) the emulation will not be fixed in a
decent timeframe 3) real 8139cp users care.

It would be nice if gilboad could give it a try (users Cced).

Btw David W., could consider adding artificial delays between the writes
and see if / when things start to fail (CpCmd write in cp_start_hw is an
unflushed posted write for instance).

-- 
Ueimor

[-- Attachment #2: 8139cp.patch.gz --]
[-- Type: application/x-gzip, Size: 934 bytes --]

^ permalink raw reply

* VXLAN multicast receive not working
From: Bernhard Schmidt @ 2012-11-21 23:27 UTC (permalink / raw)
  To: netdev

Hello,

I'm just trying to play with VXLAN a bit and wanted to build a Linux
gateway routing into separate VXLAN segments.

Debian Wheezy, running 3.7-rc6, with current git HEAD of iproute2.
It's a VMware VM but that should not matter much.

Two vmxnet3 NICs, one with management and one with my VXLAN transport
network.

4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UNKNOWN qlen 1000
    link/ether 00:50:56:8e:0d:c8 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.250/24 scope global eth1
    inet6 fe80::250:56ff:fe8e:dc8/64 scope link 
       valid_lft forever preferred_lft forever

In the same network segment are two VMware ESXi 5.0 hosts with Nexus
1000V for VLAN termination (10.0.0.1 and 10.0.0.2)

On top of that there is a VXLAN interface defined, with ID 12340 and
group 239.0.0.42.

6: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue
state UNKNOWN mode DEFAULT 
    link/ether f6:59:e7:db:82:92 brd ff:ff:ff:ff:ff:ff
    vxlan id 12340 group 239.0.0.42 dev eth1 port 32768 61000 ageing 300 

That interface has an address as well

6: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue
state UNKNOWN 
    link/ether f6:59:e7:db:82:92 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.1/24 scope global vxlan0
    inet6 fe80::f459:e7ff:fedb:8292/64 scope link 
       valid_lft forever preferred_lft forever

The same VXLAN domain is defined on the Nexus 1000V and a VM is attached
to it. When I send some broadcast traffic down vxlan0 (i.e. ping
10.1.1.2 which generates an ARP request) the VM sees the packet just
fine.

When I do it the other way around (the VM sends a broadcast ARP for
10.1.1.3) I see a packet coming into eth1 on the multicast group, but
vxlan0 stays silent. 

I have captured one of those packets. Wireshark does not support
dissecting it yet, but in my eyes the packet is correct. I've put it
online at http://users.birkenwald.de/~berni/temp/vxlan.pcap

Weirdly enough, as soon as I populate the ARP and VXLAN forwarding table
by pinging back from the destination to the source (so the source can
learn both MAC->Nexthop for VXLAN and IP->MAC from the ARP request) it
starts working. 

To summarize, Multicast/Broadcast from N1k to Linux seems to be broken,
the encapsulated packet is seen on the Ethernet but the decapsulated
packet is not seen on vxlan0. Broadcast/Multicast in the other direction
works just fine as well as Unicast in both directions.

Thanks for any pointers,
Bernhard

^ permalink raw reply

* Re: [PATCH] 8139cp: set ring address after enabling C+ mode
From: David Woodhouse @ 2012-11-21 22:32 UTC (permalink / raw)
  To: Francois Romieu
  Cc: Jeff Garzik, Jason Wang, David S. Miller, netdev, Hayes Wang,
	gilboad
In-Reply-To: <20121121204045.GA17627@electric-eye.fr.zoreil.com>

[-- Attachment #1: Type: text/plain, Size: 1875 bytes --]

On Wed, 2012-11-21 at 21:40 +0100, Francois Romieu wrote:
> Straight to -stable ?

That's the way it works. You put the Cc: stable on the *original* commit
that goes upstream. There's no sane way to retroactively add that tag
after it's already been merged and tested.

Yes, you can bug Greg manually to 'please add this upstream commit which
we forgot to mark as Cc: stable' but that isn't the way it's usually
done.

> Afaik nobody complained from the original (pre b01af457) problem on
> real hardware.
>
> May be someone @realtek (hi Hayes) can give an explanation regarding
> the CpCmd, RingAddr, Cmd init sequence and the start of DMA.

That would be really useful; thanks. To recap for Hayes' benefit: the
concern is that if we follow the instructions in §6.33 of the data
sheet:

Recommendation to C+ mode programming: Enable C+ mode functions in C+CR
register first, => Enable transmit/receive in Command register (offset
37h), => Configure other related registers (ex. Descriptor start
address, TCR, RCR, ...).

... then we appear to be starting up the DMA before we actually tell it
the descriptor ring addresses, which will cause stray DMA to random
unconfigured addresses!

Is there some detail of the hardware which prevents this from actually
happening? Or if not, is my proposed workaround (enabling Tx/Rx in the
C+ Command Register *first*, then setting the descriptor addresses, and
enabling Tx/Rx in the old-style Command register last) OK?

It was observed that when setting the descriptor addresses *first*, the
Transmit Descriptor Start Address Register was getting overwritten with
garbage when we enabled Tx in the C+ Command Register.

I note that we're also setting a bunch of other things in the Rx and Tx
config registers *after* operation all seems to have started up... is
that OK too?
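
Editorial sketch: the two initialisation orders discussed above can be
compared with a toy register model. This is an assumption-laden
illustration, not a description of the real chip. struct toy_nic, the
toy_* functions and TOY_GARBAGE are hypothetical, and the only behaviour
modelled is the reported observation that enabling C+ mode overwrites a
previously written Tx ring address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary marker value; the real hardware value is not known here. */
#define TOY_GARBAGE UINT64_C(0xdeadbeef)

struct toy_nic {
    bool cplus_enabled;     /* models the C+ Command Register write */
    bool txrx_enabled;      /* models the old-style Command register */
    uint64_t tx_ring_addr;  /* models TxRingAddr */
};

void toy_enable_cplus(struct toy_nic *nic)
{
    nic->cplus_enabled = true;
    nic->tx_ring_addr = TOY_GARBAGE;   /* modelled side effect */
}

void toy_set_tx_ring(struct toy_nic *nic, uint64_t addr)
{
    nic->tx_ring_addr = addr;
}

void toy_enable_txrx(struct toy_nic *nic)
{
    nic->txrx_enabled = true;
}

/* Ring address first, then C+ mode, then Tx/Rx. */
uint64_t toy_init_ring_first(uint64_t ring)
{
    struct toy_nic nic = {0};

    toy_set_tx_ring(&nic, ring);
    toy_enable_cplus(&nic);
    toy_enable_txrx(&nic);
    return nic.tx_ring_addr;
}

/* Proposed order: C+ mode first, then ring address, Tx/Rx last. */
uint64_t toy_init_proposed(uint64_t ring)
{
    struct toy_nic nic = {0};

    toy_enable_cplus(&nic);
    toy_set_tx_ring(&nic, ring);
    toy_enable_txrx(&nic);
    return nic.tx_ring_addr;
}
```

In this model only the proposed order leaves the ring address intact at
the moment Tx/Rx is enabled.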

-- 
dwmw2

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply

* Re: [PATCH] 8139cp: set ring address after enabling C+ mode
From: David Woodhouse @ 2012-11-22 23:12 UTC (permalink / raw)
  To: Francois Romieu
  Cc: Jeff Garzik, David Miller, jasowang, netdev, slacky, rggjan,
	gilboad, Hayes Wang
In-Reply-To: <20121122213950.GA8873@electric-eye.fr.zoreil.com>

[-- Attachment #1: Type: text/plain, Size: 1662 bytes --]

On Thu, 2012-11-22 at 22:39 +0100, Francois Romieu wrote:
> Btw David W., could consider adding artificial delays between the
> writes and see if / when things start to fail (CpCmd write in
> cp_start_hw is an unflushed posted write for instance).

That's how I tracked it down to the CpCmd write. I littered the whole of
the init path with
 printk("at line %d TxRingAddr %08x%08x (sb %08x)\n", __LINE__,
         cpr32(TxRingAddr+4), cpr32(TxRingAddr), cp->ring_dma + whatever);
... until the output looked something like this:

root@geos:~# insmod ./8139cp.ko 
[ 1331.492486] 8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
[ 1331.500388] 8139cp 0000:00:0a.0: eth0: RTL-8139Cx at 0xd10a6000, 00:0a:fa:22:
00:96, IRQ 10
[ 1331.509608] 8139cp 0000:00:0b.0: eth1: RTL-8139Cx at 0xd10a8100, 00:0a:fa:22:
00:97, IRQ 11
root@geos:~# [ 1331.644393] at line 995 TxRingAddr   000000000f3c6400 (sb f3c6400)
[ 1331.650579] at line 960 TxRingAddr   000000000f3c6400 (sb f3c6400)
[ 1331.656820] at line 962 TxRingAddr   000000000f3e4400 (sb f3c6400)
[ 1331.663020] at line 964 TxRingAddr   000000000f3e4400 (sb f3c6400)
[ 1331.669205] at line 998 TxRingAddr   000000000f3e4400 (sb f3c6400)
[ 1331.675412] at line 1001 TxRingAddr   000000000f3e4400 (sb f3c6400)
[ 1331.681706] at line 1003 TxRingAddr   000000000f3e4400 (sb f3c6400)
[ 1331.687977] at line 1005 TxRingAddr   000000000f3e4400 (sb f3c6400)

Each of those printks will have effectively flushed any prior posted
writes... not that this AMD Geode platform actually *does* post writes,
to my knowledge. And at 115200 baud, each one was about a 6ms delay.

-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply

* Re: [PATCH] 8139cp: set ring address after enabling C+ mode
From: David Woodhouse @ 2012-11-21 22:52 UTC (permalink / raw)
  To: David Miller; +Cc: romieu, jgarzik, jasowang, netdev, hayeswang, gilboad
In-Reply-To: <20121121.174014.268181096393133050.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 619 bytes --]

On Wed, 2012-11-21 at 17:40 -0500, David Miller wrote:
> On the contrary, for networking I submit everything manually and I
> remove the CC: tags.
> 
> I have a queue on patchwork that I add such patches to, so that they
> do not get lost.

Ah, right. Thanks for the correction. Is it even worth giving the hint
that this should be for the stable tree (from v3.5 onwards), or should I
leave you to work that all out for yourself? And if it *is* worth giving
that hint, is it better to do it in a comment after --- at the end of
the commit comment, rather than the "normal" 'Cc: stable' tag?

-- 
dwmw2

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply

* Fwd: Re: [PATCH] net: ipv6: change %8s to %s for rt->dst.dev->name in seq_printf of rt6_info_route
From: Chen Gang @ 2012-11-23  3:35 UTC (permalink / raw)
  To: Shan Wei, Eric Dumazet, David Miller; +Cc: netdev
In-Reply-To: <50ADE447.8030300@asianux.com>


1) About the proof:
 Sorry, I currently cannot find a device whose name is longer than 8 characters.
 Maybe they (Asianux users) use a system call from user mode to assign a new name to the device.
   Please refer to: dev_ioctl -> dev_ifsioc -> dev_change_name in net/core/dev.c.
   I do not know why they want to change the net device name (but they certainly can).

2) About %*s:
 Since the kernel is an open system, IFNAMSIZ belongs to the OS API that is exposed to the outside.
   It affects both individual kernel modules and user-mode system calls.
   We need to obey this rule, and %8s does not match it.
   So %8s is not suitable (and we now have to choose between %16s and %s).

 As for the format of the information that seq_printf outputs:
   it does not belong to the OS API exposed to the outside (at least, that is true for the current case).
   So we do not need to keep it 'compatible', and %16s is not necessary.

 For keeping the source code simple and clear:
   %s is better than %16s.

 So, as a result, we should choose %s only (neither %16s nor %8s).

3) About my original mail:
 Why did my original mail (the first mail related to this patch) say %16s?
 My goals were:
   i)   to confirm whether it is suitable to discuss RHEL* on *@vger.kernel.org.
   ii)  to confirm whether *@vger.kernel.org welcomes such a minor patch (at least, that it is not treated as spam).
   iii) to confirm whether *@vger.kernel.org is focused on coding.
        (so I intentionally used %16s and 'beautiful')
        (I have seen too many other organizations that are not focused on coding)
 After getting feedback from Eric Dumazet:
   i)   it is not suitable to discuss RHEL* on *@vger.kernel.org.
   ii)  *@vger.kernel.org welcomes such a minor patch.
   iii) *@vger.kernel.org is focused on coding.
        (so I am sure that I can contribute to *@vger.kernel.org through code review)
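
Editorial sketch: the difference between a field width (%8s), a
precision (%.8s) and a bare %s, which point 2 above and the quoted patch
description rely on, can be checked in userspace with snprintf(). The
three helper names are hypothetical and exist only for this
illustration; the kernel's seq_printf is assumed to follow the same
conversion rules for these specifiers.

```c
#include <stdio.h>

/* "%8s": minimum field width 8, right-aligned; never truncates. */
int format_padded(char *out, size_t n, const char *name)
{
    return snprintf(out, n, "%8s", name);
}

/* "%.8s": precision 8; prints at most 8 characters of the name. */
int format_truncated(char *out, size_t n, const char *name)
{
    return snprintf(out, n, "%.8s", name);
}

/* "%s": prints the name as-is, with no padding and no truncation. */
int format_plain(char *out, size_t n, const char *name)
{
    return snprintf(out, n, "%s", name);
}
```

A short name such as "eth0" comes out as "    eth0" with %8s, while a
13-character name is printed in full by %8s and %s and cut to its first
8 characters by %.8s.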


 Regards

gchen.

-------- Original Message --------
Subject: Re: [PATCH] net: ipv6: change %8s to %s for rt->dst.dev->name in seq_printf of rt6_info_route
Date: Thu, 22 Nov 2012 16:37:27 +0800
From: Chen Gang <gang.chen@asianux.com>
To: Shan Wei <shanwei88@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,  David Miller <davem@davemloft.net>, netdev <netdev@vger.kernel.org>

On 2012-11-22 13:28, Shan Wei wrote:
> Hi chen gang:
> 
> For length of device name which less than 8 char，
> your patch changes them to be print from align right 
> to align left. But at least since 2005(git age-time),
> we keep this style so far.
> Maybe, since birth of this code, just align right. :-)
> 

  Originally, it was a fixed-length output; the length is "#define
RT6_INFO_LEN (32 + 4 + 32 + 4 + 32 + 40 + 5 + 1)",
  and RHEL5 (kernel-2.6.18-308.20.el5) still uses it.
  It assumes that the length of rt->rt6i_dev->name (in RHEL5) is 8.

> Why we *should* change this style?
> just keep be consistent with the case which length of device
> name greater than 8 char?
> 

  As a fixed length, 8 is not suitable. At first I suggested '%16s' (I
called it 'beautiful', but for RHEL5 it is a correctness issue),
  and Eric Dumazet suggested that '%s' is better, since the output is no
longer fixed-length (seq_printf has already replaced arg->buffer).
  I think that, as a result, what he said is reasonable.

> Not only old name rule i.e. eth0,eth1, but also new name rule
> base on pci address ,i.e. em1,p3p1. most of them are less than 8 char.
> Should not we take more attention on the case less than 8 char?
> 

  I have seen a device name longer than 8 characters before.
  I am not quite sure which one: maybe eth-route* or eth-usb* ...
  I will check it in the coming days; please wait a few days.


> By addition, if we want to add new field in the future,
> align right is a better choice.
> 

  Maybe what you said is better (still keeping it 'beautiful', but we need to use
'%16s' instead of '%8s').

  Eric Dumazet may have his own opinion on this.


 Regards

gchen.

> 
> Chen Gang said, at 2012/11/22 10:52:
>> Hi Shan Wei, Eric Dumazet
>>
>>   is this patch integrated into main branch ?
>>   if need me for additional completion (such as: merge another 2 trivial patches into this patch, too)
>>   please tell me, I will do. 
>>
>>   I understand you are working overtime, maybe no time for any minor and trivial patches.
>>   if surely it is, I think:
>>     you can modify these code manually, and obsolete these minor and trivial patches which I provided.
>>     I do not mind whether mention me in another new patches (you can mention me or not mention me, both are OK).
>>     since our goal is to provide contributes to outside, efficiently.
>>
>>  regards
>>
>> gchen
>>
>>
>> On 2012-11-05 11:02, Chen Gang wrote:
>>>
>>> 1. not to send same patch triple times. 
>>
>>   thanks, I shall notice, next time.
>>   (I shall 'believe' another members).
>>
>>> 2. config your email client,because tab is changed to space.
>>>    you can read Documentation/email-clients.txt.
>>
>>   1) thanks. I shall notice, next time.
>>   2) now, I get gvim as extention editor for thounderbird
>>   3) the patch is generated by `git format-patch -s --summary --stat`
>>      it use "' '\t" as head, I do not touch it, maybe it is correct.
>>
>> welcome any members to giving additional suggestions and completions.
>>
>> thanks
>>
>> the modified contents are below,
>> -----------------------------------------------------------------------------------
>>
>>   the length of rt->dst.dev->name is 16 (IFNAMSIZ)
>>   in seq_printf, it is not suitable to use %8s for rt->dst.dev->name.
>>   so change it to %s, since each line has not been solid any more.
>>
>>   additional information:
>>
>>     %8s  limit the width, not for the original string output length
>>          if name length is more than 8, it still can be fully displayed.
>>          if name length is less than 8, the ' ' will be filled before name.
>>
>>     %.8s truly limit the original string output length (precision)
>>
>> Signed-off-by: Chen Gang <gang.chen@asianux.com>
>> ---
>>  net/ipv6/route.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index c42650c..b60bc52 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -2835,7 +2835,7 @@ static int rt6_info_route(struct rt6_info *rt, void *p_arg)
>>  	} else {
>>  		seq_puts(m, "00000000000000000000000000000000");
>>  	}
>> -	seq_printf(m, " %08x %08x %08x %08x %8s\n",
>> +	seq_printf(m, " %08x %08x %08x %08x %s\n",
>>  		   rt->rt6i_metric, atomic_read(&rt->dst.__refcnt),
>>  		   rt->dst.__use, rt->rt6i_flags,
>>  		   rt->dst.dev ? rt->dst.dev->name : "");
>>
>>
>>
> 
> 
> 


-- 
Chen Gang

Asianux Corporation

^ permalink raw reply

* [PATCH net-next] tcp: remove dead prototype for tcp_v4_get_peer()
From: Neal Cardwell @ 2012-11-23  3:48 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Neal Cardwell

This function no longer exists.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 include/net/tcp.h |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6feeccd..3202bde 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -369,7 +369,6 @@ extern void tcp_shutdown (struct sock *sk, int how);
 extern void tcp_v4_early_demux(struct sk_buff *skb);
 extern int tcp_v4_rcv(struct sk_buff *skb);
 
-extern struct inet_peer *tcp_v4_get_peer(struct sock *sk);
 extern int tcp_v4_tw_remember_stamp(struct inet_timewait_sock *tw);
 extern int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		       size_t size);
-- 
1.7.7.3

^ permalink raw reply related

* Re: [PATCH] 8139cp: set ring address after enabling C+ mode
From: Jason Wang @ 2012-11-23  3:53 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: David Miller, dwmw2, netdev
In-Reply-To: <50ADAFB7.7070704@pobox.com>

On 11/22/2012 12:53 PM, Jeff Garzik wrote:
> On 11/21/2012 11:39 PM, David Miller wrote:
>> From: Jeff Garzik <jgarzik@pobox.com>
>> Date: Wed, 21 Nov 2012 22:47:39 -0500
>>
>>> State A:  pre-b01af457, known working
>>> State B:  b01af457, known broken
>>
>> State A is also known buggy on the largest consumer of this driver,
>> the emulated hardware.
>>
>> Please evaluate this realistically.
>
> If the simulator fails to match the hardware, that is a simulator bug.
Resending this mail because it failed to post to the list yesterday.

CC Realtek Linux driver maintainer (nic_swsd@realtek.com)

The problem is that the behaviour of the hardware is subtle, and we cannot
just infer it from the datasheet. Another issue is that in some situations
the datasheet conflicts with what the real hardware does; one example is
the Cfg9346 issue mentioned by David (I also met it during qemu
development).

If the hardware always puts garbage into the TxRingAddr register when
"plus mode" is enabled, it may unexpectedly send something from memory to
the wire, which looks really strange. If it does not change
RxRingAddr when enabling C+, another method is to keep setting the rx
address before enabling C+ but set the tx address after.
>
> It is disappointing to work around someone else's software bug in the 
> kernel.
>

Qemu also has some workarounds for buggy kernels, including for this
case: it initializes RxRingAddr to 0 and checks it when receiving. If
the address is still zero (which may mean the rx ring address is set
only after C+ is enabled), it does not do the receive, to prevent
corruption. So reverting is safe for rx now.
>     Jeff
>
>
>

^ permalink raw reply

* Re: [PATCHv4] virtio-spec: virtio network device RFS support
From: Jason Wang @ 2012-11-23  5:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: rusty, virtualization, netdev, kvm
In-Reply-To: <20121122144645.GA28284@redhat.com>

On 11/22/2012 10:46 PM, Michael S. Tsirkin wrote:
> Add RFS support to virtio network device.
> Add a new feature flag VIRTIO_NET_F_RFS for this feature, a new
> configuration field max_virtqueue_pairs to detect supported number of
> virtqueues as well as a new command VIRTIO_NET_CTRL_RFS to program
> packet steering for unidirectional protocols.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> --
>
> Changes from v3:
> - rename multiqueue -> rfs this is what we support
> - Be more explicit about what driver should do.
> - Simplify layout making VQs functionality depend on feature.
> - Remove unused commands, only leave in programming # of queues
>
> Changes from v2:
> Address Jason's comments on v2:
> - Changed STEERING_HOST to STEERING_RX_FOLLOWS_TX:
>    this is both clearer and easier to support.
>    It does not look like we need a separate steering command
>    since host can just watch tx packets as they go.
> - Moved RX and TX steering sections near each other.
> - Add motivation for other changes in v2
>
> Changes from Jason's rfc:
> - reserved vq 3: this makes all rx vqs even and tx vqs odd, which
>    looks nicer to me.
> - documented packet steering, added a generalized steering programming
>    command. Current modes are single queue and host driven multiqueue,
>    but I envision support for guest driven multiqueue in the future.
> - make default vqs unused when in mq mode - this wastes some memory
>    but makes it more efficient to switch between modes as
>    we can avoid this causing packet reordering.
>
> Rusty, could you please take a look and comment soon?
> If this looks OK to everyone, we can proceed with finalizing the
> implementation. Would be nice to try and put it in 3.8.
>
> ---
>
> diff --git a/virtio-spec.lyx b/virtio-spec.lyx
> index d2f0da9..c1fa3e4 100644
> --- a/virtio-spec.lyx
> +++ b/virtio-spec.lyx
> @@ -59,6 +59,7 @@
>   \author -608949062 "Rusty Russell,,,"
>   \author -385801441 "Cornelia Huck" cornelia.huck@de.ibm.com
>   \author 1531152142 "Paolo Bonzini,,,"
> +\author 1986246365 "Michael S. Tsirkin"
>   \end_header
>   
>   \begin_body
> @@ -4170,9 +4171,42 @@ ID 1
>   \end_layout
>   
>   \begin_layout Description
> -Virtqueues 0:receiveq.
> - 1:transmitq.
> - 2:controlq
> +Virtqueues 0:receiveq
> +\change_inserted 1986246365 1352742829
> +0
> +\change_unchanged
> +.
> + 1:transmitq
> +\change_inserted 1986246365 1352742832
> +0
> +\change_deleted 1986246365 1352742947
> +.
> +
> +\change_inserted 1986246365 1352742952
> +.
> + ....
> + 2N
> +\begin_inset Foot
> +status open
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 1986246365 1352743187
> +N=0 if VIRTIO_NET_F_RFS is not negotiated, otherwise N is indicated by max_
> +\emph on
> +virtqueue_pairs control
> +\emph default
> + field.
> +
> +\end_layout
> +
> +\end_inset
> +
> +: receivqN.
> + 2N+1: transmitqN.
> + 2N+
> +\change_unchanged
> +2:controlq
>   \begin_inset Foot
>   status open
>   
> @@ -4343,6 +4377,16 @@ VIRTIO_NET_F_CTRL_VLAN
>   
>   \begin_layout Description
>   VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets.
> +\change_inserted 1986246365 1352742767
> +
> +\end_layout
> +
> +\begin_layout Description
> +
> +\change_inserted 1986246365 1352742808
> +VIRTIO_NET_F_RFS(2) Device supports Receive Flow Steering.
> +\change_unchanged

should be 22
> +
>   \end_layout
>   
>   \end_deeper
> @@ -4355,11 +4399,44 @@ configuration
>   \begin_inset space ~
>   \end_inset
>   
> -layout Two configuration fields are currently defined.
> +layout
> +\change_deleted 1986246365 1352743300
> +Two
> +\change_inserted 1986246365 1352743301
> +Four
> +\change_unchanged
> + configuration fields are currently defined.
>    The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC
>    is set), and the status field only exists if VIRTIO_NET_F_STATUS is set.
>    Two read-only bits are currently defined for the status field: VIRTIO_NET_S_LIN
>   K_UP and VIRTIO_NET_S_ANNOUNCE.
> +
> +\change_inserted 1986246365 1353595219
> + The following read-only field,
> +\emph on
> +max_virtqueue_pairs
> +\emph default
> + only exists if VIRTIO_NET_F_RFS is set.
> + This field specifies the maximum number of each of transmit and receive
> + virtqueues (receiveq0..receiveq
> +\emph on
> +N
> +\emph default
> + and transmitq0..transmitq
> +\emph on
> +N
> +\emph default
> + respectively;
> +\emph on
> +N
> +\emph default
> +=
> +\emph on
> +max_virtqueue_pairs
> +\emph default
> +) that can be configured once VIRTIO_NET_F_RFS is negotiated.
> +
> +\change_unchanged

So the virtqueues used in single-queue mode are still reserved in 
multiqueue mode, since when max_virtqueue_pairs is N we end up with N+1 
virtqueue pairs? This also looks like it conflicts with the description in 
"Packet Receive Flow Steering":

"specifying the number of the last transmit and receive queue that is 
going to be used; thus out of transmitq0..transmitqn and 
receiveq0..receiveqn where n=virtqueue_pairs will be used."

In this description, it looks like n+1 virtqueue pairs (including receiveq0 
and transmitq0) could be used in RFS mode.
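
To make the counting concrete, here is a minimal sketch of the index
arithmetic as the proposed text reads (receiveqK at 2K, transmitqK at 2K+1,
controlq at 2N+2). The helper names are made up for illustration; they are
not from the spec or from any driver.

```c
#include <stdint.h>

/* Virtqueue index of receiveqK in the proposed layout: 0, 2, 4, ... */
static inline uint32_t rfs_rx_index(uint32_t k)
{
	return 2 * k;
}

/* Virtqueue index of transmitqK in the proposed layout: 1, 3, 5, ... */
static inline uint32_t rfs_tx_index(uint32_t k)
{
	return 2 * k + 1;
}

/* controlq follows the last pair; n is N from the text (0 without RFS). */
static inline uint32_t rfs_ctrl_index(uint32_t n)
{
	return 2 * n + 2;
}

/* Queues 0..N inclusive are addressable, i.e. N+1 pairs in total. */
static inline uint32_t rfs_pair_count(uint32_t n)
{
	return n + 1;
}
```

With N=0 this gives the existing layout (0, 1, 2); with N=3 the last pair
is at indices 6 and 7, controlq is at 8, and four pairs exist.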
>    
>   \begin_inset listings
>   inline false
> @@ -4410,7 +4487,24 @@ Device Initialization
>   
>   \begin_layout Enumerate
>   The initialization routine should identify the receive and transmission
> - virtqueues.
> + virtqueues
> +\change_inserted 1986246365 1352744077
> +, up to N+1 of each kind
> +\change_unchanged
> +.
> +
> +\change_inserted 1986246365 1352743942
> + If VIRTIO_NET_F_RFS feature bit is negotiated,
> +\emph on
> +N=max_virtqueue_pairs
> +\emph default
> +, otherwise identify
> +\emph on
> +N=0
> +\emph default
> +.
> +\change_unchanged
> +
>   \end_layout
>   
>   \begin_layout Enumerate
> @@ -4455,7 +4549,11 @@ status
>   \end_layout
>   
>   \begin_layout Enumerate
> -The receive virtqueue should be filled with receive buffers.
> +The receive virtqueue
> +\change_inserted 1986246365 1352743953
> +s
> +\change_unchanged
> + should be filled with receive buffers.
>    This is described in detail below in
>   \begin_inset Quotes eld
>   \end_inset
> @@ -4550,8 +4648,15 @@ Device Operation
>   \end_layout
>   
>   \begin_layout Standard
> -Packets are transmitted by placing them in the transmitq, and buffers for
> - incoming packets are placed in the receiveq.
> +Packets are transmitted by placing them in the transmitq
> +\change_inserted 1986246365 1353593685
> +0..transmitqN
> +\change_unchanged
> +, and buffers for incoming packets are placed in the receiveq
> +\change_inserted 1986246365 1353593692
> +0..receiveqN
> +\change_unchanged
> +.
>    In each case, the packet itself is preceeded by a header:
>   \end_layout
>   
> @@ -4861,6 +4966,17 @@ If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at least the
>   struct virtio_net_hdr
>   \family default
>   .
> +\change_inserted 1986246365 1353594518
> +
> +\end_layout
> +
> +\begin_layout Standard
> +
> +\change_inserted 1986246365 1353594638
> +If VIRTIO_NET_F_RFS is negotiated, each of the receiveq0...receiveqN that will
> + be used should be populated with receive buffers.
> +\change_unchanged
> +
>   \end_layout
>   
>   \begin_layout Subsection*
> @@ -5293,8 +5409,125 @@ Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control vq.
>    
>   \end_layout
>   
> -\begin_layout Enumerate
> +\begin_layout Subsection*
> +
> +\change_inserted 1986246365 1353593879
> +Packet Receive Flow Steering
> +\end_layout
> +
> +\begin_layout Standard
> +
> +\change_inserted 1986246365 1353594403
> +If the driver negotiates the VIRTIO_NET_F_RFS (depends on VIRTIO_NET_F_CTRL_VQ),
> + it can transmit outgoing packets on one of the multiple transmitq0..transmitqN
> + and ask the device to queue incoming packets into one the multiple receiveq0..rec
> +eiveqN depending on the packet flow.
> +\change_unchanged
> +
> +\end_layout
> +
> +\begin_layout Standard
> +
> +\change_inserted 1986246365 1353594292
> +\begin_inset listings
> +inline false
> +status open
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 1986246365 1353594178
> +
> +struct virtio_net_ctrl_rfs {
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 1986246365 1353594212
> +
> +	u16 virtqueue_pairs;
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 1986246365 1353594172
> +
> +};
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 1986246365 1353594172
> +
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 1986246365 1353594263
> +
> +#define VIRTIO_NET_CTRL_RFC    1

RFS
> +\end_layout
> +
> +\begin_layout Plain Layout
> +
> +\change_inserted 1986246365 1353594273
> +
> + #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET        0
> +\end_layout
> +
> +\end_inset
> +
> +
> +\end_layout
> +
> +\begin_layout Standard
> +
> +\change_inserted 1986246365 1353594884
> +RFS acceleration is disabled by default.
> + Driver enables RFS by executing the VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET command,
> + specifying the number of the last transmit and receive queue that is going
> + to be used; thus out of transmitq0..transmitqn and receiveq0..receiveqn where
> +
> +\emph on
> +n=virtqueue
> +\emph default
> +_pairs will be used.
> + All these virtqueues must have been pre-configured in advance.
> +\end_layout
> +
> +\begin_layout Standard
> +
> +\change_inserted 1986246365 1353595328
> +Programming of the receive flow classificator is implicit.
> + Transmitting a packet of a specific flow on transmitqX will cause incoming
> + packets for this flow to be steered to receiveqX.
> + For uni-directional protocols, or where no packets have been transmitted
> + yet, device will steer a packet to a random queue out of the specified
> + receiveq0..receiveqn.
> +\change_unchanged
> +
> +\end_layout
> +
> +\begin_layout Standard
> +
> +\change_inserted 1986246365 1353595040
> +RFS acceleration is disabled by setting
> +\emph on
> +virtqueue_pairs = 0

Zero looks a little misleading; using 1 here would be clearer, since we 
would still use one queue pair.
> +\emph default
> + (this is the default).
> + Following this, driver should not transmit new packets on virtqueues other
> + than transmitq0 and device will not steer new packets on virtqueues other
> + than receiveq0.
> +\change_unchanged
> +
> +\end_layout
> +
> +\begin_layout Standard
> +
> +\change_deleted 1986246365 1353593873
>   .
> +
> +\change_unchanged
>    
>   \end_layout
>   
> @@ -6152,13 +6385,7 @@ Virtqueues 0:receiveq(port0).
>   status open
>   
>   \begin_layout Plain Layout
> -Ports
> -\change_inserted 1986246365 1347188327
> -1
> -\change_deleted 1986246365 1347188327
> -2
> -\change_unchanged
> - onwards only if VIRTIO_CONSOLE_F_MULTIPORT is set
> +Ports 12 onwards only if VIRTIO_CONSOLE_F_MULTIPORT is set

The changes here and below look unrelated.
>   \end_layout
>   
>   \end_inset
> @@ -6185,13 +6412,8 @@ VIRTIO_CONSOLE_F_SIZE
>   
>   \begin_layout Description
>   VIRTIO_CONSOLE_F_MULTIPORT(1) Device has support for multiple ports; configurati
> -on fields nr_ports and max_nr_ports are valid
> -\change_inserted 1986246365 1347188404
> -; if this bit is negotiated,
> -\change_deleted 1986246365 1347188406
> - and
> -\change_unchanged
> - control virtqueues will be used.
> +on fields nr_ports and max_nr_ports are valid; if this bit is negotiated,
> + and control virtqueues will be used.
>   \end_layout
>   
>   \end_deeper
> @@ -6260,8 +6482,7 @@ If the VIRTIO_CONSOLE_F_MULTIPORT feature is negotiated, the driver can
>    spawn multiple ports, not all of which may be attached to a console.
>    Some could be generic ports.
>    In this case, the control virtqueues are enabled and according to the max_nr_po
> -rts configuration-space value, an appropriate number of virtqueues are
> - created.
> +rts configuration-space value, an appropriate number of virtqueues are created.
>    A control message indicating the driver is ready is sent to the host.
>    The host can then send control messages for adding new ports to the device.
>    After creating and initializing each port, a VIRTIO_CONSOLE_PORT_READY
> @@ -6699,14 +6920,9 @@ The driver constructs an array of addresses of memory pages it has previously
>   \end_layout
>   
>   \begin_layout Enumerate
> -If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is
> -\change_inserted 1986246365 1347188540
> -negotiated
> -\change_deleted 1986246365 1347188542
> -set
> -\change_unchanged
> -, the guest may not use these requested pages until that descriptor in the
> - deflateq has been used by the device.
> +If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is negotiatedset, the guest
> + may not use these requested pages until that descriptor in the deflateq
> + has been used by the device.
>   \end_layout
>   
>   \begin_layout Enumerate

^ permalink raw reply

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
From: Stefan Hajnoczi @ 2012-11-23  7:02 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel@nongnu.org, netdev
In-Reply-To: <50AE36E0.8000307@dlhnet.de>

On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
> is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
> a bridge from sending packets to a tap device?
> 
> My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
> which is based on Linux 3.2.33.
> 
> I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
> the tap does not have any TX packets. RX is working fine. I see the packets coming in at
> the physical interface on the host, but they are not forwarded to the tap interface.
> The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
> It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
> It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
> is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
> bridge) again. Unfortunately it seems not to be reproducible.

Not sure but this patch from Michael Tsirkin may help - it solves an
issue with persistent tap devices:

http://patchwork.ozlabs.org/patch/198598/

Stefan

^ permalink raw reply

* Re: [BUG] Kernel receives DNS reply, but doesn't deliver it to a waiting application
From: Andrew Savchenko @ 2012-11-23  7:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <20121021032543.09d1844f.bircoph@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2113 bytes --]

Hello,

On Sun, 21 Oct 2012 03:25:43 +0400 Andrew Savchenko wrote:
> > On Sat, 13 Oct 2012 15:44:20 +0200 Eric Dumazet wrote:
[...]
> > > You should investigate and check where the incoming packet is lost
> > > 
> > > Tools :
> > > 
> > > netstat -s
> > > 
> > > drop_monitor module and dropwatch command
> > > 
> > > cat /proc/net/udp
> > 
> > Thank you for your reply; I updated my kernel to 3.4.14, enabled
> > CONFIG_NET_DROP_MONITOR, and installed the dropwatch utility.
> > 
> > I will report back when the bug strikes again.
> > This may take a week or two, however.
> 
> This bug is back again on kernel 3.4.14, but this time I was able to
> get debug data and to recover running kernel without reboot.
> 
> Dropwatch showed that DNS UDP replies are always dropped here:
> 1 drops at __udp_queue_rcv_skb+61 (0xffffffff813bd670)
> 
> Other observations:
> - only UDP replies are lost, TCP works fine;
> - if network load is reduced dramatically (ip_forward disabled, most
> network daemons stopped), UDP DNS queries work again; but with a
> gradual load increase, replies first become slow and then cease altogether.
> - CPU load is very low (load average reported by uptime is below 0.05),
> so this shouldn't be an insufficient computing power issue.
> 
> I found the __udp_queue_rcv_skb function in net/ipv4/udp.c. From the code
> and the observations above it follows that this is likely to be an ENOMEM
> condition leading to packet loss.
[...]
> net.ipv4.udp_mem = 100000       150000  200000
> 
> This solved my issue, at least for a while: DNS queries are working
> fine now.

But this solved the problem only temporarily: after 40 days of uptime the
same problem struck again with the same symptoms. I "solved" it
by increasing UDP memory again:

net.ipv4.udp_mem = 200000  300000  400000

Of course, this is only a temporary workaround. Such
behaviour increases my suspicion of some kind of memory leak.
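
To see whether UDP memory really keeps growing toward the udp_mem limit,
one option is to sample the "UDP:" line of /proc/net/sockstat over time.
Below is a minimal sketch of a parser for that line. The function name is
hypothetical, and the line layout ("UDP: inuse N mem M", with mem counted
in pages like udp_mem) is an assumption based on what typical kernels
print.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the "inuse" and "mem" counters from the "UDP:" line of text in
 * /proc/net/sockstat format. Returns 0 on success, or -1 if no parsable
 * UDP line is found.
 */
static int parse_udp_sockstat(const char *text, long *inuse, long *mem_pages)
{
	const char *line = text;

	while (line && *line) {
		/* "UDP:" cannot match "UDPLITE:" because of the colon. */
		if (strncmp(line, "UDP:", 4) == 0 &&
		    sscanf(line, "UDP: inuse %ld mem %ld",
			   inuse, mem_pages) == 2)
			return 0;

		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return -1;
}
```

Feeding it the contents of /proc/net/sockstat every few minutes and logging
mem_pages would show whether the counter climbs steadily even while the
number of sockets in use stays flat.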

This host is still on 3.4.14, however: I can't reboot now due to
workload. I will try the 3.7 branch as soon as possible.

Best regards,
Andrew Savchenko

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* [PATCH 1/1] ARM: dts: am335x-evmsk: Add cpsw phy_id
From: Mugunthan V N @ 2012-11-23  8:32 UTC (permalink / raw)
  To: b-cousson
  Cc: netdev, devicetree-discuss, linux-arm-kernel, linux-omap, paul,
	Mugunthan V N

Add phy id for CPSW

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
---
The patch is verified with CPSW patches present in the following git repo
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git

 arch/arm/boot/dts/am335x-evmsk.dts |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/arm/boot/dts/am335x-evmsk.dts b/arch/arm/boot/dts/am335x-evmsk.dts
index 6f53879..c629086 100644
--- a/arch/arm/boot/dts/am335x-evmsk.dts
+++ b/arch/arm/boot/dts/am335x-evmsk.dts
@@ -164,3 +164,11 @@
 		};
 	};
 };
+
+&cpsw_emac0 {
+	phy_id = <&davinci_mdio>, <0>;
+};
+
+&cpsw_emac1 {
+	phy_id = <&davinci_mdio>, <1>;
+};
-- 
1.7.0.4


^ permalink raw reply related

* Re: [PATCH v6] can: kvaser_usb: Add support for Kvaser CAN/USB devices
From: Marc Kleine-Budde @ 2012-11-23  8:48 UTC (permalink / raw)
  To: Greg KH
  Cc: Olivier Sobrie, Wolfgang Grandegger, linux-can, netdev, linux-usb,
	Daniel Berglund
In-Reply-To: <20121122213022.GB1461@kroah.com>

[-- Attachment #1: Type: text/plain, Size: 1594 bytes --]

On 11/22/2012 10:30 PM, Greg KH wrote:
> On Thu, Nov 22, 2012 at 04:01:49PM +0100, Olivier Sobrie wrote:
>> Hi linux-usb folks,
>>
>> Is there someone who can help me to fix the following errors?
>>
>> smatch warnings:
>>
>> + drivers/net/can/usb/kvaser_usb.c:431 kvaser_usb_send_simple_msg() error: doing
>> +dma on the stack ((null))
>> + drivers/net/can/usb/kvaser_usb.c:1073 kvaser_usb_set_opt_mode() error: doing
>> +dma on the stack ((null))
>> + drivers/net/can/usb/kvaser_usb.c:1174 kvaser_usb_flush_queue() error: doing
>> +dma on the stack ((null))
>> + drivers/net/can/usb/kvaser_usb.c:1384 kvaser_usb_set_bittiming() error: doing
>> +dma on the stack ((null))
>>
>> I assume it's due to the buffer I pass to the function usb_bulk_msg()
>> which is on the stack and can't be.
>> Do I just have to kmalloc a buffer and give it to the usb_bulk_msg()
>> function? That's what I understood by reading
>> "Documentation/DMA-API-HOWTO.txt" section "What memory is DMA'able?"...
>> and from commit
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=32ec4576c3fb37316b1d11a04b220527822f3f0d
> 
> Yes, that is all that is needed.

Thanks Greg. Olivier, you can post an incremental patch, I'll squash it
before sending the patches upstream.

regards,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 261 bytes --]

^ permalink raw reply

* [PATCH 1/3] net: stmmac: change GMAC control register for SGMII
From: Byungho An @ 2012-11-23  9:04 UTC (permalink / raw)
  To: davem, peppe.cavallaro, jeffrey.t.kirsher; +Cc: netdev, kgene.kim, linux-kernel


This patch changes the GMAC control register (TC (Transmit
Configuration) and PS (Port Selection) bits) for SGMII.
In case of SGMII, the TC bit is 1 and the PS bit is 0.

Signed-off-by: Byungho An <bh74.an@samsung.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c6cdbc4..a719c87 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1037,6 +1037,7 @@ static int stmmac_open(struct net_device *dev)
 {
 	struct stmmac_priv *priv = netdev_priv(dev);
 	int ret;
+	u32 value;
 
 #ifdef CONFIG_STMMAC_TIMER
 	priv->tm = kzalloc(sizeof(struct stmmac_timer *), GFP_KERNEL);
@@ -1088,6 +1089,15 @@ static int stmmac_open(struct net_device *dev)
 	/* Initialize the MAC Core */
 	priv->hw->mac->core_init(priv->ioaddr);
 
+	if (priv->phydev->interface == PHY_INTERFACE_MODE_SGMII) {
+		value = readl(priv->ioaddr);
+		/* GMAC_CONTROL_TC : transmit config in RGMII/SGMII */
+		value |= 0x1000000;
+		/* GMAC_CONTROL_PS : Port Selection for GMII */
+		value &= ~(0x8000);
+		writel(value, priv->ioaddr);
+	}
+
 	/* Request the IRQ lines */
 	ret = request_irq(dev->irq, stmmac_interrupt,
 			 IRQF_SHARED, dev->name, dev);
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 2/3] net: stmmac: add SGMII RAL control bit
From: Byungho An @ 2012-11-23  9:04 UTC (permalink / raw)
  To: davem, peppe.cavallaro, jeffrey.t.kirsher; +Cc: netdev, kgene.kim, linux-kernel


This patch sets SGMRAL bit in AN control register.
This bit forces the SGMII RAL block to operate in the
speed configured in the Speed and Port Select bits of
the GMAC Configuration register.

Signed-off-by: Byungho An <bh74.an@samsung.com>
---
 drivers/net/ethernet/stmicro/stmmac/Kconfig       |    7 +++++++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |   11 +++++++++++
 2 files changed, 18 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig b/drivers/net/ethernet/stmicro/stmmac/Kconfig
index 9f44827..d65d63b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
+++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
@@ -54,6 +54,13 @@ config STMMAC_DA
 	  By default, the DMA arbitration scheme is based on Round-robin
 	  (rx:tx priority is 1:1).
 
+config STMMAC_SGMRAL
+	bool "STMMAC SGMII RAL Control"
+	default n
+	---help---
+	  SGMII RAL block to operate in the speed configured in the speed
+	  and port select bits of the MAC Configuration register.
+
 config STMMAC_TIMER
 	bool "STMMAC Timer optimisation"
 	default n
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index a719c87..670e585 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1096,6 +1096,17 @@ static int stmmac_open(struct net_device *dev)
 		/* GMAC_CONTROL_PS : Port Selection for GMII */
 		value &= ~(0x8000);
 		writel(value, priv->ioaddr);
+
+#ifdef CONFIG_STMMAC_SGMRAL
+		value = readl(priv->ioaddr + 0xc0);
+		/*
+		 * forces RAL block to operate in speed configured
+		 * in the speed and port select bits of GMAC
+		 * configuration register
+		 */
+		value |= 0x40000;
+		writel(value, priv->ioaddr + 0xc0);
+#endif
 	}
 
 	/* Request the IRQ lines */
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH 1/3] net: stmmac: change GMAC control register for SGMII
From: Giuseppe CAVALLARO @ 2012-11-23  9:31 UTC (permalink / raw)
  To: Byungho An; +Cc: davem, jeffrey.t.kirsher, netdev, kgene.kim, linux-kernel
In-Reply-To: <004b01cdc959$80af8030$820e8090$%an@samsung.com>

Hello An

On 11/23/2012 10:04 AM, Byungho An wrote:
>
> This patch changes the GMAC control register (TC (Transmit
> Configuration) and PS (Port Selection) bits) for SGMII.
> In case of SGMII, the TC bit is 1 and the PS bit is 0.

I was looking at this too. In particular, I was working on the RGMII 
interrupt, so I guess we could improve this part together.

My first note is that I would prefer this kind of code never be 
placed in stmmac_main. It should stay in the core part.
Also, I'd like to avoid the Kconfig option where possible.
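
As a rough illustration only, the bit handling from the patch could be
expressed as a small pure helper with named masks. This is a sketch, not a
proposal for the final code: the bit values are the ones used in the patch,
the macro names follow its comments, and the helper name is hypothetical.
A real version would live in the core part and do the readl()/writel()
around it.

```c
#include <stdint.h>

/* Bit values taken from the patch; names follow its comments. */
#define GMAC_CONTROL_PS	0x00008000u	/* Port Selection */
#define GMAC_CONTROL_TC	0x01000000u	/* Transmit Configuration */

/*
 * Hypothetical helper: given the current GMAC control register value,
 * return the value to write back for SGMII (TC set, PS cleared).
 * All other bits are left untouched.
 */
static inline uint32_t gmac_control_for_sgmii(uint32_t value)
{
	value |= GMAC_CONTROL_TC;
	value &= ~GMAC_CONTROL_PS;
	return value;
}
```

Keeping the computation separate from the register access also makes it
easy to check on its own, without hardware.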

At any rate, I'll come back with further details soon.

BR,
Peppe

>
> Signed-off-by: Byungho An <bh74.an@samsung.com>
> ---
>   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |   10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index c6cdbc4..a719c87 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -1037,6 +1037,7 @@ static int stmmac_open(struct net_device *dev)
>   {
>   	struct stmmac_priv *priv = netdev_priv(dev);
>   	int ret;
> +	u32 value;
>
>   #ifdef CONFIG_STMMAC_TIMER
>   	priv->tm = kzalloc(sizeof(struct stmmac_timer *), GFP_KERNEL);
> @@ -1088,6 +1089,15 @@ static int stmmac_open(struct net_device *dev)
>   	/* Initialize the MAC Core */
>   	priv->hw->mac->core_init(priv->ioaddr);
>
> +	if (priv->phydev->interface == PHY_INTERFACE_MODE_SGMII) {
> +		value = readl(priv->ioaddr);
> +		/* GMAC_CONTROL_TC : transmit config in RGMII/SGMII */
> +		value |= 0x1000000;
> +		/* GMAC_CONTROL_PS : Port Selection for GMII */
> +		value &= ~(0x8000);
> +		writel(value, priv->ioaddr);
> +	}
> +
>   	/* Request the IRQ lines */
>   	ret = request_irq(dev->irq, stmmac_interrupt,
>   			 IRQF_SHARED, dev->name, dev);
>

^ permalink raw reply

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
From: Peter Lieven @ 2012-11-23  9:41 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, netdev, mst
In-Reply-To: <20121123070211.GC22787@stefanha-thinkpad.hitronhub.home>

Am 23.11.2012 um 08:02 schrieb Stefan Hajnoczi:

> On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
>> is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
>> a bridge from sending packets to a tap device?
>> 
>> My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
>> which is based on Linux 3.2.33.
>> 
>> I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
>> the tap does not have any TX packets. RX is working fine. I see the packets coming in at
>> the physical interface on the host, but they are not forwarded to the tap interface.
>> The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
>> It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
>> It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
>> is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
>> bridge) again. Unfortunately it seems not to be reproducible.
> 
> Not sure but this patch from Michael Tsirkin may help - it solves an
> issue with persistent tap devices:
> 
> http://patchwork.ozlabs.org/patch/198598/

Hi Stefan,

thanks for the pointer. I have seen this patch, but I neglected it because it deals
with persistent taps. But maybe the taps in the kernel are not deleted immediately.
Can you remember what the symptoms of the above issue were? Sorry for
being vague, but I currently have no clue what's going on.

Can someone with more internal knowledge of the bridging/tap code say whether qemu can
be responsible at all if the tap device is not receiving packets from the bridge?

Say I have the following config: packets come in via the physical interface eth1.123,
and there is a bridge called br123. I further have a virtual machine with tap0. Both eth1.123
and tap0 are members of br123.

When the issue occurs, the vServer has no inbound network connectivity. If I send a ping
from the vServer, I see it on tap0 and leaving on eth1.123. I also see the ARP reply coming
in via eth1.123, but the reply can't be seen on tap0.

Peter

> 
> Stefan

^ permalink raw reply

* [TCP] Flags returned by poll on connection request to closed peer?
From: Yi Li @ 2012-11-23 10:07 UTC (permalink / raw)
  To: netdev

Hi List,
I issue a non-blocking connection request to a closed peer and 
call select() to get the status of the socket. When I run this from
many threads, I get the following statistics:

  POLLIN_SET  POLLOUT_SET	9980000
  !POLLIN_SET !POLLOUT_SET	0
  POLLIN_SET  !POLLOUT_SET	0
  !POLLIN_SET POLLOUT_SET	20000

POLLIN_SET && POLLOUT_SET means a connection error (as expected, since we are attempting to connect
to a closed peer). But what is the meaning of !POLLIN_SET POLLOUT_SET?

Here is my test program; my test command is:
./client -d $SERVERS -s $max_range_start -e $max_range_end -t 20000

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <pthread.h>
#include <stdlib.h>
#include <arpa/inet.h>
#include <netdb.h>

#define BUFSIZE 255
#define POOL_SIZE 1000

unsigned short port_start, port_end;
uint32_t server_ip;
int total_count;
unsigned short port_pool[POOL_SIZE];

void usage(const char *name){
     printf("%s: -d SERVER_IP -s SERVER_PORT_S -e SERVER_PORT_E  -t TOTAL_COUNT -h\n", name);
     printf("-d SERVER_IP: server ip is SERVER_IP\n");
     printf("-s SERVER_PORT_S: closed server port range start, one thread per port\n");
     printf("-e SERVER_PORT_E: closed server port range end, one thread per port\n");
     printf("-t TOTAL_COUNT: each thread tries TOTAL_COUNT connection requests\n");
     printf("-h: print this help message\n");
     return;
}

void* talk_to_server(void* arg){
     int sockfd, flags, ret, i = 0;
     struct timeval timeout;
     fd_set rset, wset;
     struct sockaddr_in server_addr;
     /* the thread index is passed by value inside the pointer argument */
     int index = (int)(long)arg;
	
     server_addr.sin_family = AF_INET;
     server_addr.sin_addr.s_addr = server_ip;
     server_addr.sin_port = htons(port_pool[index]);

     while(i++ < total_count){
	if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0){
	    perror("client: socket create error");
	    goto exit;
	}

	FD_ZERO(&rset);
	FD_SET(sockfd, &rset);
	wset = rset;
	timeout.tv_sec = 10;
	timeout.tv_usec = 0;
     
	flags = fcntl(sockfd, F_GETFL, 0);
	fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);
     
	if ((ret = connect(sockfd, (struct sockaddr *)&server_addr, sizeof(server_addr))) <= 0){
	    if(ret == 0){
		fprintf(stderr, "client: connect established.\n");
		goto sockfd_exit;
	    }
	    if(errno != EINPROGRESS){
		perror("client: connect failed.");
		goto sockfd_exit;
	    }
	}
     
	if( (ret = select(sockfd+1, &rset, &wset, NULL, &timeout)) < 0){
	    perror("client: select failed.");
	    goto sockfd_exit;
	}

	if(FD_ISSET(sockfd, &rset) && FD_ISSET(sockfd, &wset)){
	    fprintf(stdout, "client: sockfd=%d, with POLLIN_SET POLLOUT_SET\n", sockfd);
	}else if(FD_ISSET(sockfd, &rset) && !FD_ISSET(sockfd, &wset)){
	    fprintf(stdout, "client: sockfd=%d, with POLLIN_SET !POLLOUT_SET\n", sockfd);
	}else if(!FD_ISSET(sockfd, &rset) && FD_ISSET(sockfd, &wset)){
	    fprintf(stdout, "client: sockfd=%d, with !POLLIN_SET POLLOUT_SET\n", sockfd);
	}else{
	    fprintf(stdout, "client: sockfd=%d, with !POLLIN_SET !POLLOUT_SET\n", sockfd);
	}
	
     sockfd_exit:
	close(sockfd);
     }
  exit:
     pthread_exit(NULL);
}

int parse_options(int argc, char *argv[]){
     int ret;
     struct hostent *hptr;
     char buf[BUFSIZE];

     if(argc < 6){
	usage(argv[0]);
	return -1;
     }
     
     while((ret = getopt(argc, argv, "d:s:e:t:h")) != -1){
	switch(ret){
	case 'd':
	    if( (hptr = gethostbyname(optarg)) == NULL){
		fprintf(stderr, "client: gethostbyname error: %s\n", hstrerror(h_errno));
		return -1;
	    }
	    switch(hptr->h_addrtype){
	    case AF_INET:
		server_ip =((struct in_addr*)hptr->h_addr)->s_addr;
		break;
	    default:
		fprintf(stderr, "client: unknown address type\n");
		return -1;
	    }
	    break;
	case 's':
	    port_start = atoi(optarg);
	    break;
	case 'e':
	    port_end = atoi(optarg);
	    break;
	case 't':
	    total_count = atoi(optarg);
	    break;
	case 'h':
	    usage(argv[0]);
	    return -1;
	case '?':
	default:
	    fprintf(stderr, "unknown option %c\n", optopt);
	    return -1;
	}
     }
     return 0;
}

int main(int argc, char *argv[]){
     int i;
     pthread_t tid;
     pthread_attr_t child_thread_attr;
     
     if(parse_options(argc, argv) < 0)
	return 0;

     if( port_end - port_start+1 > POOL_SIZE)
	port_end = port_start + POOL_SIZE -1;
     
     /*initialize port pool*/
     for(i = 0; port_start + i <= port_end; i++){
	port_pool[i] = port_start + i;
     }
     
     /*create threads, one thread per server port*/
     pthread_attr_init(&child_thread_attr);
     pthread_attr_setdetachstate(&child_thread_attr, PTHREAD_CREATE_DETACHED);
     for( i = 0; port_start + i <= port_end ; i++){
	    if( pthread_create(&tid, &child_thread_attr, talk_to_server, (void *)(long)i) != 0 )
		fprintf(stderr, "client: pthread create failed thread %d port %d\n",
			i, port_start+i);
     }
     pthread_exit(NULL);
}
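
For reference, the readable/writable bits alone do not say whether a
non-blocking connect() succeeded or failed. A common way to find out is to
read SO_ERROR with getsockopt() once select() has returned. A minimal
sketch follows; the helper name connect_status is made up for
illustration and is not part of the test program above.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * After select() reports the socket readable and/or writable, fetch the
 * pending error of a non-blocking connect(). Returns 0 if no error is
 * pending (the connect succeeded), a positive errno value such as
 * ECONNREFUSED if it failed, or -1 if getsockopt() itself failed.
 */
static int connect_status(int sockfd)
{
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return -1;
	return err;
}
```

Calling this right after select() in talk_to_server() and printing the
result next to the POLLIN/POLLOUT state would show which error, if any,
is attached to the sockets in the !POLLIN_SET POLLOUT_SET case.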

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox