Netdev List
 help / color / mirror / Atom feed
* Re: [RFC] GRO scalability
From: Herbert Xu @ 2012-10-06 10:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jesse Gross
In-Reply-To: <1349506825.21172.241.camel@edumazet-glaptop>

On Sat, Oct 06, 2012 at 09:00:25AM +0200, Eric Dumazet wrote:
> On Sat, 2012-10-06 at 08:22 +0200, Eric Dumazet wrote:
> 
> > All drivers.
> > 
> > If the napi->poll() consumes all budget, we dont call napi_complete()
> 
> Probably nobody noticed, because TCP stack retransmits packets
> eventually.
> 
> But this adds unexpected latencies.
> 
> I'll send a patch.

Yeah we should definitely do a flush in that case.  Thanks!
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [net:master 1/9] pch_gbe_main.c:(.text+0x510370): undefined reference to `pch_ch_control_write'
From: Francois Romieu @ 2012-10-06 11:31 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: David S. Miller, netdev, Haicheng Li, Ben Hutchings
In-Reply-To: <20121006075954.GA29618@localhost>

Fengguang Wu <fengguang.wu@intel.com> :
[...]
> FYI, kernel build failed on
> 
> tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
> head:   c0b8b99287235626a5850ef7e5bfc842d1ebcecd
> commit: da1586461e53a4dd045738cce309ab488970f0ef [1/9] pch_gbe: Fix PTP dependencies.
> config: x86_64-randconfig-s052 (attached as .config)

7c236c43b838221e17220bcb39e8e8d8c7123713 does something like the patch
below for the sfc driver. It would be worth checking if gianfar and
intel get it right too.

/me must leave.

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
index 9730241..1dd9e33 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
@@ -21,15 +21,12 @@ config PCH_GBE
 	  ML7223/ML7831 is companion chip for Intel Atom E6xx series.
 	  ML7223/ML7831 is completely compatible for Intel EG20T PCH.
 
-if PCH_GBE
-
 config PCH_PTP
 	bool "PCH PTP clock support"
 	default n
+	depends on GBE && PTP_1588_CLOCK && !(GBE=y && PTP_1588_CLOCK=m)
 	select PTP_1588_CLOCK_PCH
 	---help---
 	  Say Y here if you want to use Precision Time Protocol (PTP) in the
 	  driver. PTP is a method to precisely synchronize distributed clocks
 	  over Ethernet networks.
-
-endif # PCH_GBE
diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index b2a94d0..351a585 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -262,6 +262,7 @@ static int hwtstamp_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
 	case HWTSTAMP_FILTER_NONE:
 		adapter->hwts_rx_en = 0;
 		break;
+#ifdef CONFIG_PCH_PTP
 	case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
 		adapter->hwts_rx_en = 0;
 		pch_ch_control_write(pdev, SLAVE_MODE | CAP_MODE0);
@@ -282,6 +283,7 @@ static int hwtstamp_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
 		strcpy(station, PTP_L2_MULTICAST_SA);
 		pch_set_station_address(station, pdev);
 		break;
+#endif
 	default:
 		return -ERANGE;
 	}

^ permalink raw reply related

* Re: [net:master 1/9] pch_gbe_main.c:(.text+0x510370): undefined reference to `pch_ch_control_write'
From: Haicheng Li @ 2012-10-06 12:07 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: David S. Miller, netdev, linux-kernel@vger.kernel.org
In-Reply-To: <20121006075954.GA29618@localhost>

The failure is due to the CONFIG_PPS is not set there, consequently 
CONFIG_PTP_1588_CLOCK can not be set as =y anyway.

So David's patch of "da1586461e53a4dd045738cce309ab488970f0ef [1/9] pch_gbe: 
Fix PTP dependencies" is buggy. Furthermore, I think using "selects" to 
resolve such dependency issue is not good idea as it won't visit the dependencies.

David, I would still suggest to take my original patch:
https://lkml.org/lkml/2012/9/28/70

+    depends on PTP_1588_CLOCK_PCH && (PCH_GBE=m || PTP_1588_CLOCK_PCH=y)

or simply like:
---
From: Haicheng Li <haicheng.lee@gmail.com>

Fix build error caused by broken PCH_PTP module dependency.
The .config is:
         CONFIG_PCH_GBE=y
         CONFIG_PCH_PTP=y
         CONFIG_PTP_1588_CLOCK=m

The build error:

drivers/built-in.o: In function `pch_tx_timestamp':
.../pch_gbe_main.c:215: undefined reference to `pch_ch_event_read'
.../pch_gbe_main.c:225: undefined reference to `pch_tx_snap_read'
.../pch_gbe_main.c:231: undefined reference to `pch_ch_event_write'

.../pch_gbe_main.c:170: undefined reference to `pch_ch_event_read'
.../pch_gbe_main.c:175: undefined reference to `pch_src_uuid_lo_read'
.../pch_gbe_main.c:176: undefined reference to `pch_src_uuid_hi_read'
.../pch_gbe_main.c:190: undefined reference to `pch_ch_event_write'
.../pch_gbe_main.c:184: undefined reference to `pch_rx_snap_read'

.../pch_gbe_main.c:267: undefined reference to `pch_ch_control_write'
.../pch_gbe_main.c:271: undefined reference to `pch_ch_control_write'
.../pch_gbe_main.c:275: undefined reference to `pch_ch_control_write'
.../pch_gbe_main.c:281: undefined reference to `pch_ch_control_write'
.../pch_gbe_main.c:283: undefined reference to `pch_set_station_address'
.../pch_gbe_main.c:290: undefined reference to `pch_ch_event_write'

Signed-off-by: Haicheng Li <haicheng.lee@gmail.com>
---
  drivers/net/ethernet/oki-semi/pch_gbe/Kconfig |    4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig 
b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
index bce0164..df1e649 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
@@ -21,12 +21,12 @@ config PCH_GBE
        ML7223/ML7831 is companion chip for Intel Atom E6xx series.
        ML7223/ML7831 is completely compatible for Intel EG20T PCH.

-if PCH_GBE
+if PTP_1588_CLOCK_PCH

  config PCH_PTP
      bool "PCH PTP clock support"
      default n
-    depends on PTP_1588_CLOCK_PCH
+    depends on PTP_1588_CLOCK_PCH=y || PCH_GBE=m
      ---help---
        Say Y here if you want to use Precision Time Protocol (PTP) in the
        driver. PTP is a method to precisely synchronize distributed clocks



On 10/06/2012 03:59 PM, Fengguang Wu wrote:
> Hi David,
>
> FYI, kernel build failed on
>
> tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
> head:   c0b8b99287235626a5850ef7e5bfc842d1ebcecd
> commit: da1586461e53a4dd045738cce309ab488970f0ef [1/9] pch_gbe: Fix PTP dependencies.
> config: x86_64-randconfig-s052 (attached as .config)
>
> All error/warnings:
>
> drivers/built-in.o: In function `pch_gbe_ioctl':
> pch_gbe_main.c:(.text+0x510370): undefined reference to `pch_ch_control_write'
> pch_gbe_main.c:(.text+0x510393): undefined reference to `pch_ch_control_write'
> pch_gbe_main.c:(.text+0x5103b3): undefined reference to `pch_ch_control_write'
> pch_gbe_main.c:(.text+0x5103e1): undefined reference to `pch_set_station_address'
> pch_gbe_main.c:(.text+0x510403): undefined reference to `pch_ch_control_write'
> pch_gbe_main.c:(.text+0x510431): undefined reference to `pch_set_station_address'
> pch_gbe_main.c:(.text+0x51043e): undefined reference to `pch_ch_event_write'
> drivers/built-in.o: In function `pch_gbe_xmit_frame':
> pch_gbe_main.c:(.text+0x510fe4): undefined reference to `pch_ch_event_read'
> pch_gbe_main.c:(.text+0x511049): undefined reference to `pch_tx_snap_read'
> pch_gbe_main.c:(.text+0x51106f): undefined reference to `pch_ch_event_write'
> drivers/built-in.o: In function `pch_gbe_napi_poll':
> pch_gbe_main.c:(.text+0x511537): undefined reference to `pch_ch_event_read'
> pch_gbe_main.c:(.text+0x511562): undefined reference to `pch_src_uuid_lo_read'
> pch_gbe_main.c:(.text+0x51156d): undefined reference to `pch_src_uuid_hi_read'
> pch_gbe_main.c:(.text+0x511659): undefined reference to `pch_ch_event_write'
> pch_gbe_main.c:(.text+0x511dc1): undefined reference to `pch_rx_snap_read'
>
> ---
> 0-DAY kernel build testing backend         Open Source Technology Center
> Fengguang Wu, Yuanhan Liu                              Intel Corporation

^ permalink raw reply related

* [PATCH] pppoatm: don't send frames to destroyed vcc
From: Krzysztof Mazur @ 2012-10-06 12:19 UTC (permalink / raw)
  To: mitch, netdev; +Cc: davem, linux-kernel, dwmw2, Krzysztof Mazur

The pppoatm_send() uses vcc->send() directly and does not check if vcc
is ready for send(). This causes Oops when send() is used after
vcc_destroy_socket() at least with usbatm driver:

Oops: 0000 [#1] PREEMPT
Pid: 0, comm: swapper Not tainted 3.6.0-krzysiek-00001-gb7cd93b-dirty #60    /AK32
EIP: 0060:[<c01413c6>] EFLAGS: 00010082 CPU: 0
EIP is at __wake_up_common+0x16/0x70
EAX: 30707070 EBX: 00000292 ECX: 00000001 EDX: dca75fc0
ESI: 00000000 EDI: de7f500f EBP: df409f24 ESP: df409f08
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
CR0: 8005003b CR2: 30707070 CR3: 1c920000 CR4: 000007d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process swapper (pid: 0, ti=df408000 task=c07bd4e0 task.ti=c07b0000)
Stack:
 00000000 00000001 00000001 dca75fc0 00000292 00000000 de7f500f df409f3c
 c0143299 00000000 00000000 dc84f000 dc84f000 df409f4c c0602bf0 00000000
 dc84f000 df409f58 c0604301 dc840cc0 df409fb4 c04672e5 c076a240 00000000
Call Trace:
 [<c0143299>] __wake_up+0x29/0x50
 [<c0602bf0>] vcc_write_space+0x40/0x80
 [<c0604301>] atm_pop_raw+0x21/0x30
 [<c04672e5>] usbatm_tx_process+0x2a5/0x380
 [<c0126cf9>] tasklet_action+0x39/0x70
 [<c0126f1f>] __do_softirq+0x7f/0x120
 [<c0126ea0>] ? local_bh_enable_ip+0xa0/0xa0
 <IRQ>

Now pppoatm_send(), like vcc_sendmsg(), checks for vcc flags that
indicate that vcc is not ready.

Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
---
This bug can be easily triggered with 9d02daf754238adac48fa075ee79e7edd3d79ed3
(pppoatm: Fix excessive queue bloat) present in Linux >= 3.5-rc1.
Steps to reproduce (tested with 9d02daf75 and v3.6):
$ ping -i 0.1 some.host.via.modem &
unplug ADSL line
sleep 3
plug ADSL line
The Oops occurs almost always.

On Linux v3.0.44, v3.4.12 the above steps did not trigger the bug, but
I saw similar Oops also on kernels < 3.5-rc1, but I was never able to
reproduce it or save kernel log.

This bug also exists in some stable kernels.

Maybe it's better to drop frame:
	if (..) {
		kfree_skb(skb);
		return 1;
	}
instead of
	goto nospace;

The full Oops + some logs from Linux 3.6.0 with some minor changes
(copying kernel log to VRAM after crash and enabled usbatm debugging with
some additional debug).

ATM dev 0: usbatm_atm_close entered
ATM dev 0: usbatm_atm_close: deallocating vcc 0xdca75e20 with vpi 0 vci 35
ATM dev 0: usbatm_cancel_send entered
ATM dev 0: usbatm_cancel_send: popping skb 0xdc840840
ATM dev 0: usbatm_cancel_send: popping current skb (0xdcb26740)
ATM dev 0: usbatm_cancel_send done
ATM dev 0: usbatm_atm_close successful
ATM dev 0: usbatm_atm_send entered (skb 0xdc840cc0, len 10) vcc=dc84f000
ATM dev 0: usbatm_atm_send queue skb: dc840cc0 vcc dc84f000
ATM dev 0: usbatm_atm_send entered (skb 0xdc84c920, len 10) vcc=dc84f000
ATM dev 0: usbatm_atm_send queue skb: dc84c920 vcc dc84f000
ATM dev 0: usbatm_tx_process skb dcb26740
  len=86
ATM dev 0: usbatm_tx_process skb dc840cc0
  len=10
usbatm_rx_process: 4991 callbacks suppressed
ATM dev 0: usbatm_rx_process: status 0 in frame 0!
ATM dev 0: usbatm_rx_process: status 0 in frame 1!
ATM dev 0: usbatm_rx_process: status 0 in frame 2!
ATM dev 0: usbatm_rx_process: status 0 in frame 0!
ATM dev 0: usbatm_rx_process: status 0 in frame 1!
ATM dev 0: usbatm_rx_process: status 0 in frame 2!
ATM dev 0: usbatm_rx_process: status 0 in frame 0!
ATM dev 0: usbatm_rx_process: status 0 in frame 1!
ATM dev 0: usbatm_rx_process: status 0 in frame 2!
ATM dev 0: usbatm_rx_process: status 0 in frame 0!
ATM dev 0: usbatm_atm_open: vpi 0, vci 35
ATM dev 0: usbatm_atm_open: allocated vcc data 0xdca75c20
ATM dev 0: usbatm_atm_send entered (skb 0xdcda8b40, len 12) vcc=d2761c00
ATM dev 0: usbatm_atm_send queue skb: dcda8b40 vcc d2761c00
ATM dev 0: usbatm_tx_process skb dc840cc0
  len=10
ATM dev 0: usbatm_atm_send entered (skb 0xdc8406c0, len 12) vcc=d2761c00
ATM dev 0: usbatm_atm_send queue skb: dc8406c0 vcc d2761c00
ATM dev 0: usbatm_tx_process skb dc840cc0
  len=10
ATM dev 0: usbatm_tx_process skb dc840cc0
  len=10
ATM dev 0: usbatm_tx_process pop skb dc840cc0, vcc dc84f000%
BUG: unable to handle kernel paging request at 30707070
IP: [<c01413c6>] __wake_up_common+0x16/0x70
*pde = 00000000 
Oops: 0000 [#1] PREEMPT 
Pid: 0, comm: swapper Not tainted 3.6.0-krzysiek-00001-gb7cd93b-dirty #60    /AK32 
EIP: 0060:[<c01413c6>] EFLAGS: 00010082 CPU: 0
EIP is at __wake_up_common+0x16/0x70
EAX: 30707070 EBX: 00000292 ECX: 00000001 EDX: dca75fc0
ESI: 00000000 EDI: de7f500f EBP: df409f24 ESP: df409f08
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
CR0: 8005003b CR2: 30707070 CR3: 1c920000 CR4: 000007d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process swapper (pid: 0, ti=df408000 task=c07bd4e0 task.ti=c07b0000)
Stack:
 00000000 00000001 00000001 dca75fc0 00000292 00000000 de7f500f df409f3c
 c0143299 00000000 00000000 dc84f000 dc84f000 df409f4c c0602bf0 00000000
 dc84f000 df409f58 c0604301 dc840cc0 df409fb4 c04672e5 c076a240 00000000
Call Trace:
 [<c0143299>] __wake_up+0x29/0x50
 [<c0602bf0>] vcc_write_space+0x40/0x80
 [<c0604301>] atm_pop_raw+0x21/0x30
 [<c04672e5>] usbatm_tx_process+0x2a5/0x380
 [<c0126cf9>] tasklet_action+0x39/0x70
 [<c0126f1f>] __do_softirq+0x7f/0x120
 [<c0126ea0>] ? local_bh_enable_ip+0xa0/0xa0
 <IRQ> 

 [<c01271de>] ? irq_exit+0x6e/0x90
 [<c0103be3>] ? do_IRQ+0x43/0xb0
 [<c0144f9c>] ? sched_clock_local.constprop.1+0x4c/0x1a0
 [<c0626369>] ? common_interrupt+0x29/0x30
 [<c02f458b>] ? acpi_idle_enter_simple+0xf5/0x122
 [<c04bd81e>] ? cpuidle_enter+0x1e/0x30
 [<c04bdb08>] ? cpuidle_idle_call+0x68/0xd0
 [<c0109155>] ? cpu_idle+0x45/0x80
 [<c060b4df>] ? rest_init+0x63/0x74
 [<c07fd851>] ? start_kernel+0x25e/0x264
 [<c07fd42e>] ? repair_env_string+0x51/0x51
 [<c07fd26e>] ? i386_start_kernel+0x44/0x46
Code: c3 8d 74 26 00 31 f6 31 db eb cc 66 90 0f b6 4d e8 d3 eb eb bb 55 89 e5 57 56 53 83 ec 10 89 45 f0 89 55 ec 89 c2 8b 00 89 4d e8 <8b> 18 8d 70 f4 83 eb 0c 39 c2 75 0a eb 37 8d 74 26 00 89 de 89
EIP: [<c01413c6>] __wake_up_common+0x16/0x70 SS:ESP 0068:df409f08
CR2: 0000000030707070
---[ end trace bc86ff846f3d97ec ]---
Kernel panic - not syncing: Fatal exception in interrupt

 net/atm/pppoatm.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/atm/pppoatm.c b/net/atm/pppoatm.c
index 226dca9..0dcb5dc 100644
--- a/net/atm/pppoatm.c
+++ b/net/atm/pppoatm.c
@@ -269,6 +269,8 @@ static inline int pppoatm_may_send(struct pppoatm_vcc *pvcc, int size)
 static int pppoatm_send(struct ppp_channel *chan, struct sk_buff *skb)
 {
 	struct pppoatm_vcc *pvcc = chan_to_pvcc(chan);
+	struct atm_vcc *vcc;
+
 	ATM_SKB(skb)->vcc = pvcc->atmvcc;
 	pr_debug("(skb=0x%p, vcc=0x%p)\n", skb, pvcc->atmvcc);
 	if (skb->data[0] == '\0' && (pvcc->flags & SC_COMP_PROT))
@@ -301,6 +303,12 @@ static int pppoatm_send(struct ppp_channel *chan, struct sk_buff *skb)
 		return 1;
 	}
 
+	vcc = ATM_SKB(skb)->vcc;
+	if (test_bit(ATM_VF_RELEASED, &vcc->flags)
+			|| test_bit(ATM_VF_CLOSE, &vcc->flags)
+			|| !test_bit(ATM_VF_READY, &vcc->flags))
+		goto nospace;
+
 	atomic_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
 	ATM_SKB(skb)->atm_options = ATM_SKB(skb)->vcc->atm_options;
 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n",
-- 
1.7.12.2.2.g1c3c581

^ permalink raw reply related

* Re: [PATCH 0/3] virtio-net: inline header support
From: Paolo Bonzini @ 2012-10-06 12:54 UTC (permalink / raw)
  To: Rusty Russell
  Cc: kvm, Michael S. Tsirkin, netdev, linux-kernel, virtualization,
	Sasha Levin, avi, Thomas Lendacky
In-Reply-To: <87391t1nkq.fsf__40391.6521034718$1349505001$gmane$org@rustcorp.com.au>

Il 05/10/2012 07:43, Rusty Russell ha scritto:
>> >     struct virtio_scsi_req_cmd {
>> >         // Read-only
>> >         u8 lun[8];
>> >         u64 id;
>> >         u8 task_attr;
>> >         u8 prio;
>> >         u8 crn;
>> >         char cdb[cdb_size];
>> >         char dataout[];
>> >         // Write-only part
>> >         u32 sense_len;
>> >         u32 residual;
>> >         u16 status_qualifier;
>> >         u8 status;
>> >         u8 response;
>> >         u8 sense[sense_size];
>> >         char datain[];
>> >     };
>> >
>> > where cdb_size and sense_size come from configuration space.  The device
>> > right now expects everything before dataout/datain to be in a single
>> > descriptor, but that's in no way part of the spec.  Am I missing
>> > something egregious?
> Since you wrote it, I hope not :)

Yeah, I guess the confusion came from cdb_size and sense_size being in
configuration space.

> That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
> said to Anthony, the best rules are "always" and "never", so I'd really
> rather not have to grandfather that in.

It is, but we can add a rule that if the (transport) flag
VIRTIO_RING_F_ANY_HEADER_SG is set, the cdb field is always 32 bytes in
virtio-blk.

Paolo

^ permalink raw reply

* Re: [net:master 1/9] pch_gbe_main.c:(.text+0x510370): undefined reference to `pch_ch_control_write'
From: David Miller @ 2012-10-06 13:22 UTC (permalink / raw)
  To: haicheng.li; +Cc: fengguang.wu, netdev, linux-kernel
In-Reply-To: <50701EEC.7080805@linux.intel.com>

From: Haicheng Li <haicheng.li@linux.intel.com>
Date: Sat, 06 Oct 2012 20:07:08 +0800

> The failure is due to the CONFIG_PPS is not set there, consequently
> CONFIG_PTP_1588_CLOCK can not be set as =y anyway.
> 
> So David's patch of "da1586461e53a4dd045738cce309ab488970f0ef [1/9]
> pch_gbe: Fix PTP dependencies" is buggy. Furthermore, I think using
> "selects" to resolve such dependency issue is not good idea as it
> won't visit the dependencies.
> 
> David, I would still suggest to take my original patch:
> https://lkml.org/lkml/2012/9/28/70
> 
> + depends on PTP_1588_CLOCK_PCH && (PCH_GBE=m || PTP_1588_CLOCK_PCH=y)
> 
> or simply like:

This is all very rediculous if you ask me.

Why should the user have to know a detail like the underlying
PTP chip type just to enable PTP on his networking card?

Because that is what you are making him do with your change.

Select removed the necessity of the user having to know these
things.

^ permalink raw reply

* Re: [net:master 1/9] pch_gbe_main.c:(.text+0x510370): undefined reference to `pch_ch_control_write'
From: David Miller @ 2012-10-06 13:22 UTC (permalink / raw)
  To: romieu; +Cc: fengguang.wu, netdev, haicheng.li, ben
In-Reply-To: <20121006113126.GA17783@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Sat, 6 Oct 2012 13:31:26 +0200

> Fengguang Wu <fengguang.wu@intel.com> :
> [...]
>> FYI, kernel build failed on
>> 
>> tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
>> head:   c0b8b99287235626a5850ef7e5bfc842d1ebcecd
>> commit: da1586461e53a4dd045738cce309ab488970f0ef [1/9] pch_gbe: Fix PTP dependencies.
>> config: x86_64-randconfig-s052 (attached as .config)
> 
> 7c236c43b838221e17220bcb39e8e8d8c7123713 does something like the patch
> below for the sfc driver. It would be worth checking if gianfar and
> intel get it right too.

They use select and it should work here too, I don't want to
force users to have to know about an obscure PTP chip type
in order to enable PTP for their networking card.

It should be completely hidden from them.

That's why we should use select for this.

^ permalink raw reply

* Re: [PATCH] pppoatm: don't send frames to destroyed vcc
From: David Woodhouse @ 2012-10-06 13:32 UTC (permalink / raw)
  To: Krzysztof Mazur; +Cc: mitch, netdev, davem, linux-kernel
In-Reply-To: <1349525991-21462-1-git-send-email-krzysiek@podlesie.net>

[-- Attachment #1: Type: text/plain, Size: 407 bytes --]

On Sat, 2012-10-06 at 14:19 +0200, Krzysztof Mazur wrote:
> Now pppoatm_send(), like vcc_sendmsg(), checks for vcc flags that
> indicate that vcc is not ready.

And what locking prevents the flag from being set immediately after we
check it?

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation



[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply

* Re: [net:master 1/9] pch_gbe_main.c:(.text+0x510370): undefined reference to `pch_ch_control_write'
From: Haicheng Li @ 2012-10-06 14:07 UTC (permalink / raw)
  To: David Miller; +Cc: fengguang.wu, netdev, linux-kernel
In-Reply-To: <20121006.092207.1280541690613202532.davem@davemloft.net>

On 10/06/2012 09:22 PM, David Miller wrote:
> From: Haicheng Li<haicheng.li@linux.intel.com>
> Date: Sat, 06 Oct 2012 20:07:08 +0800
>
>> The failure is due to the CONFIG_PPS is not set there, consequently
>> CONFIG_PTP_1588_CLOCK can not be set as =y anyway.
>>
>> So David's patch of "da1586461e53a4dd045738cce309ab488970f0ef [1/9]
>> pch_gbe: Fix PTP dependencies" is buggy. Furthermore, I think using
>> "selects" to resolve such dependency issue is not good idea as it
>> won't visit the dependencies.
>>
>> David, I would still suggest to take my original patch:
>> https://lkml.org/lkml/2012/9/28/70
>>
>> + depends on PTP_1588_CLOCK_PCH&&  (PCH_GBE=m || PTP_1588_CLOCK_PCH=y)
>>
>> or simply like:
>
> This is all very rediculous if you ask me.
>
> Why should the user have to know a detail like the underlying
> PTP chip type just to enable PTP on his networking card?
>
> Because that is what you are making him do with your change.
>
> Select removed the necessity of the user having to know these
> things.
However it possibly breaks the build...

IMHO, the reason why the dependency of PCH_PTP becomes so tricky is that the 
code of these two modules call the functions of each other (bad code 
structure?). To fix it neatly, either we restructure the code or just simply 
make it:
+ depends on PTP_1588_CLOCK_PCH=y

For PCH_GBE=m case, it does be able to pass the build test, but I'm afraid it 
won't be smoothly workable via "insmod" due to the codependency of these two 
when PCH_PTP is enabled.

^ permalink raw reply

* Re: [net:master 1/9] pch_gbe_main.c:(.text+0x510370): undefined reference to `pch_ch_control_write'
From: David Miller @ 2012-10-06 14:21 UTC (permalink / raw)
  To: haicheng.li; +Cc: fengguang.wu, netdev, linux-kernel
In-Reply-To: <50703B1B.2040705@linux.intel.com>

From: Haicheng Li <haicheng.li@linux.intel.com>
Date: Sat, 06 Oct 2012 22:07:23 +0800

> On 10/06/2012 09:22 PM, David Miller wrote:
>> From: Haicheng Li<haicheng.li@linux.intel.com>
>> Date: Sat, 06 Oct 2012 20:07:08 +0800
>>
>>> The failure is due to the CONFIG_PPS is not set there, consequently
>>> CONFIG_PTP_1588_CLOCK can not be set as =y anyway.
>>>
>>> So David's patch of "da1586461e53a4dd045738cce309ab488970f0ef [1/9]
>>> pch_gbe: Fix PTP dependencies" is buggy. Furthermore, I think using
>>> "selects" to resolve such dependency issue is not good idea as it
>>> won't visit the dependencies.
>>>
>>> David, I would still suggest to take my original patch:
>>> https://lkml.org/lkml/2012/9/28/70
>>>
>>> + depends on PTP_1588_CLOCK_PCH&&  (PCH_GBE=m || PTP_1588_CLOCK_PCH=y)
>>>
>>> or simply like:
>>
>> This is all very rediculous if you ask me.
>>
>> Why should the user have to know a detail like the underlying
>> PTP chip type just to enable PTP on his networking card?
>>
>> Because that is what you are making him do with your change.
>>
>> Select removed the necessity of the user having to know these
>> things.
> However it possibly breaks the build...
> 
> IMHO, the reason why the dependency of PCH_PTP becomes so tricky is
> that the code of these two modules call the functions of each other
> (bad code structure?). To fix it neatly, either we restructure the
> code or just simply make it:
> + depends on PTP_1588_CLOCK_PCH=y
> 
> For PCH_GBE=m case, it does be able to pass the build test, but I'm
> afraid it won't be smoothly workable via "insmod" due to the
> codependency of these two when PCH_PTP is enabled.

Then why does it work for IXGBE and others who use select?

^ permalink raw reply

* Re: [net:master 1/9] pch_gbe_main.c:(.text+0x510370): undefined reference to `pch_ch_control_write'
From: Richard Cochran @ 2012-10-06 14:56 UTC (permalink / raw)
  To: Haicheng Li; +Cc: David Miller, fengguang.wu, netdev, linux-kernel
In-Reply-To: <50703B1B.2040705@linux.intel.com>

On Sat, Oct 06, 2012 at 10:07:23PM +0800, Haicheng Li wrote:
> 
> IMHO, the reason why the dependency of PCH_PTP becomes so tricky is
> that the code of these two modules call the functions of each other
> (bad code structure?).

Yes, that is the source of the problem. I don't recall how this habit
of having the PTP option as a module got started (and I hope it wasn't
me), but I think the right way to handle this option is with a simple
boolean connected to ifdefs for the MAC driver.

Come to think of it, it may have all started with the gianfar ptp
module. In principle, the time stamping function of a MAC driver can
be completely separate from the clock function, and indeed that is how
the pair of gianfar drivers work.

But for other hardware, it might not be practical to keep the
functions separate, and in that case I would say, just keep it as one
driver.

Thanks,
Richard

PS It looks like no one is really looking after this PCH thing, so
does anyone want to send me a board so I can take care of it? Let me
know off list.

^ permalink raw reply

* Re: [PATCH] pppoatm: don't send frames to destroyed vcc
From: Krzysztof Mazur @ 2012-10-06 15:38 UTC (permalink / raw)
  To: David Woodhouse; +Cc: mitch, netdev, davem, linux-kernel
In-Reply-To: <1349530370.6524.2.camel@shinybook.infradead.org>

On Sat, Oct 06, 2012 at 02:32:50PM +0100, David Woodhouse wrote:
> On Sat, 2012-10-06 at 14:19 +0200, Krzysztof Mazur wrote:
> > Now pppoatm_send(), like vcc_sendmsg(), checks for vcc flags that
> > indicate that vcc is not ready.
> 
> And what locking prevents the flag from being set immediately after we
> check it?
> 

nothing, this patch should fix this. 

The vcc_sendmsg() uses lock_sock(sk). This lock is used by
vcc_release(), so vcc_destroy_socket() will not be called between
check and during ->send(). The vcc_release_async() sets ATM_VF_CLOSE,
but it should be safe to call ->send() after it, because
vcc->dev->ops->close() is not called.

The pppoatm_send() is called with bottom halfs disabled, so
bh_lock_sock() should be used instead of lock_sock().

diff --git a/net/atm/pppoatm.c b/net/atm/pppoatm.c
index 0dcb5dc..29afc68 100644
--- a/net/atm/pppoatm.c
+++ b/net/atm/pppoatm.c
@@ -270,6 +270,7 @@ static int pppoatm_send(struct ppp_channel *chan, struct sk_buff *skb)
 {
 	struct pppoatm_vcc *pvcc = chan_to_pvcc(chan);
 	struct atm_vcc *vcc;
+	int ret;
 
 	ATM_SKB(skb)->vcc = pvcc->atmvcc;
 	pr_debug("(skb=0x%p, vcc=0x%p)\n", skb, pvcc->atmvcc);
@@ -304,17 +305,22 @@ static int pppoatm_send(struct ppp_channel *chan, struct sk_buff *skb)
 	}
 
 	vcc = ATM_SKB(skb)->vcc;
+	bh_lock_sock(sk_atm(vcc));
 	if (test_bit(ATM_VF_RELEASED, &vcc->flags)
 			|| test_bit(ATM_VF_CLOSE, &vcc->flags)
-			|| !test_bit(ATM_VF_READY, &vcc->flags))
+			|| !test_bit(ATM_VF_READY, &vcc->flags)) {
+		bh_unlock_sock(sk_atm(vcc));
 		goto nospace;
+	}
 
 	atomic_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
 	ATM_SKB(skb)->atm_options = ATM_SKB(skb)->vcc->atm_options;
 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n",
 		 skb, ATM_SKB(skb)->vcc, ATM_SKB(skb)->vcc->dev);
-	return ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
+	ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
 	    ? DROP_PACKET : 1;
+	bh_unlock_sock(sk_atm(vcc));
+	return ret;
 nospace:
 	/*
 	 * We don't have space to send this SKB now, but we might have

-- 
Krzysiek

^ permalink raw reply related

* Re: [PATCH] pppoatm: don't send frames to destroyed vcc
From: Krzysztof Mazur @ 2012-10-06 15:46 UTC (permalink / raw)
  To: David Woodhouse; +Cc: mitch, netdev, davem, linux-kernel
In-Reply-To: <20121006153804.GA13564@shrek.podlesie.net>

On Sat, Oct 06, 2012 at 05:38:04PM +0200, Krzysztof Mazur wrote:
> On Sat, Oct 06, 2012 at 02:32:50PM +0100, David Woodhouse wrote:
> > On Sat, 2012-10-06 at 14:19 +0200, Krzysztof Mazur wrote:
> > > Now pppoatm_send(), like vcc_sendmsg(), checks for vcc flags that
> > > indicate that vcc is not ready.
> > 
> > And what locking prevents the flag from being set immediately after we
> > check it?
> > 
> 
> nothing, this patch should fix this. 
> 

I think there is another problem here. The pppoatm gets a reference
to atmvcc, but I don't see anything that protects against removal
of that vcc.

The vcc uses vcc->sk socket for reference counting, so sock_hold()
and sock_put() should be used by pppoatm.

diff --git a/net/atm/pppoatm.c b/net/atm/pppoatm.c
index 29afc68..126f57f 100644
--- a/net/atm/pppoatm.c
+++ b/net/atm/pppoatm.c
@@ -154,6 +154,7 @@ static void pppoatm_unassign_vcc(struct atm_vcc *atmvcc)
 	tasklet_kill(&pvcc->wakeup_tasklet);
 	ppp_unregister_channel(&pvcc->chan);
 	atmvcc->user_back = NULL;
+	sock_put(sk_atm(pvcc->atmvcc));
 	kfree(pvcc);
 	/* Gee, I hope we have the big kernel lock here... */
 	module_put(THIS_MODULE);
@@ -371,6 +372,7 @@ static int pppoatm_assign_vcc(struct atm_vcc *atmvcc, void __user *arg)
 	if (pvcc == NULL)
 		return -ENOMEM;
 	pvcc->atmvcc = atmvcc;
+	sock_hold(sk_atm(atmvcc));
 
 	/* Maximum is zero, so that we can use atomic_inc_not_zero() */
 	atomic_set(&pvcc->inflight, NONE_INFLIGHT);
@@ -385,6 +387,7 @@ static int pppoatm_assign_vcc(struct atm_vcc *atmvcc, void __user *arg)
 	pvcc->wakeup_tasklet.data = (unsigned long) &pvcc->chan;
 	err = ppp_register_channel(&pvcc->chan);
 	if (err != 0) {
+		sock_put(sk_atm(atmvcc));
 		kfree(pvcc);
 		return err;
 	}

-- 
Krzysiek

^ permalink raw reply related

* Re: Filter matching problems
From: John A. Sullivan III @ 2012-10-06 16:30 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1349459175.18710.112.camel@denise.theartistscloset.com>

On Fri, 2012-10-05 at 13:46 -0400, John A. Sullivan III wrote:
> Hello, all.  Can anyone tell me why this filter does not match anything
> when I think it should match everything:
> 
> tc filter replace dev bond3 parent ffff: protocol ip prio 1 u32 match u8
> 0 0 action mirred egress redirect dev ifb0
> 
> I get:
> tc -s filter show dev bond3 parent ffff:
> filter protocol ip pref 1 u32
> filter protocol ip pref 1 u32 fh 800: ht divisor 1
> filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0
> (rule hit 1216169263 success 0)
>   match 00000000/00000000 at 0 (success 1216169263 )
>         action order 1: mirred (Egress Redirect to device ifb0) stolen
>         index 2 ref 1 bind 1 installed 11362879 sec used 11362879 sec
>         Action statistics:
>         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>         rate 0bit 0pps backlog 0b 0p requeues 0
> 
> which I think is telling me I'm not getting any matches.  Here are more
> details.
> 
> 
> We are redirecting ingress into ifb0 and redirecting egress into ifb1
> (because several interfaces, e.g., OpenVPN tun devices) need the exact
> same traffic shaping.  We realize that we cannot place the ingress
> filter at ffff: and the egress filter at root as those appear to be
> treated as one and the same, i.e., the first match of those two filters
> will be used and the other will thus never see any packets.
> 
> To get around that problem, we inserted a priomap on the interface and
> apply the egress redirect filter on it's child like this:
> 
> tc qdisc replace dev bond3 root handle 1: prio bands 2 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> tc filter replace dev bond3 parent 1:0 protocol ip prio 1 u32 match u8 0 0 flowid 1:1 action mirred egress redirect dev ifb1
> tc qdisc replace dev bond3 ingress
> tc filter replace dev bond3 parent ffff: protocol ip prio 1 u32 match u8 0 0 action mirred egress redirect dev ifb0
> 
> So what have I mangled? Thanks - John
> 
<snip>
I believe I've solved it.  We were not specifying a flowid.  Debian
Squeeze in our test lab did not object but the production system in this
case is CentOS 5.7.  I am guessing the older tc did not like the missing
flowid.  We added flowid :1 and all started working.  I am a little
confused though.  What meaning does the flowid have on an ingress qdisc
and what is class 0:1? Thanks - John

^ permalink raw reply

* Re: [net:master 1/9] pch_gbe_main.c:(.text+0x510370): undefined reference to `pch_ch_control_write'
From: Haicheng Li @ 2012-10-06 17:21 UTC (permalink / raw)
  To: David Miller; +Cc: fengguang.wu, netdev, linux-kernel
In-Reply-To: <20121006.102132.1497404004255213029.davem@davemloft.net>



On 10/06/2012 10:21 PM, David Miller wrote:
> From: Haicheng Li<haicheng.li@linux.intel.com>
> Date: Sat, 06 Oct 2012 22:07:23 +0800
>
>> On 10/06/2012 09:22 PM, David Miller wrote:
>>> From: Haicheng Li<haicheng.li@linux.intel.com>
>>> Date: Sat, 06 Oct 2012 20:07:08 +0800
>>>
>>>> The failure is due to the CONFIG_PPS is not set there, consequently
>>>> CONFIG_PTP_1588_CLOCK can not be set as =y anyway.
>>>>
>>>> So David's patch of "da1586461e53a4dd045738cce309ab488970f0ef [1/9]
>>>> pch_gbe: Fix PTP dependencies" is buggy. Furthermore, I think using
>>>> "selects" to resolve such dependency issue is not good idea as it
>>>> won't visit the dependencies.
>>>>
>>>> David, I would still suggest to take my original patch:
>>>> https://lkml.org/lkml/2012/9/28/70
>>>>
>>>> + depends on PTP_1588_CLOCK_PCH&&   (PCH_GBE=m || PTP_1588_CLOCK_PCH=y)
>>>>
>>>> or simply like:
>>>
>>> This is all very rediculous if you ask me.
>>>
>>> Why should the user have to know a detail like the underlying
>>> PTP chip type just to enable PTP on his networking card?
>>>
>>> Because that is what you are making him do with your change.
>>>
>>> Select removed the necessity of the user having to know these
>>> things.
>> However it possibly breaks the build...
>>
>> IMHO, the reason why the dependency of PCH_PTP becomes so tricky is
>> that the code of these two modules call the functions of each other
>> (bad code structure?). To fix it neatly, either we restructure the
>> code or just simply make it:
>> + depends on PTP_1588_CLOCK_PCH=y
>>
>> For PCH_GBE=m case, it does be able to pass the build test, but I'm
>> afraid it won't be smoothly workable via "insmod" due to the
>> codependency of these two when PCH_PTP is enabled.
>
> Then why does it work for IXGBE and others who use select?

They explicitly select all the possible dependencies if they are bug-free (I 
didn't strictly check them).

Take IXGBE_PTP as example, it explicitly selects PPS, and also depends on 
EXPERIMENTAL:
config IXGBE_PTP
         bool "PTP Clock Support"
         default n
         depends on IXGBE && EXPERIMENTAL
         select PPS
         select PTP_1588_CLOCK

So if you stick to use "select" as the convention of such build issue fixing, 
fengguang's build failure would be fixed by:
	+ depends on EXPERIMENTAL
	+ select PPS
	+ select PTP_1588_CLOCK

would you prefer this way?

^ permalink raw reply

* [PATCH] net: gro: selective flush of packets
From: Eric Dumazet @ 2012-10-06 18:08 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, netdev, Jesse Gross, Tom Herbert, Yuchung Cheng
In-Reply-To: <20121006105618.GA28591@gondor.apana.org.au>

From: Eric Dumazet <edumazet@google.com>

Current GRO can hold packets in gro_list for almost unlimited
time, in case napi->poll() handler consumes its budget over and over.

In this case, napi_complete()/napi_gro_flush() are not called.

Another problem is that gro_list is flushed in non friendly way :
We scan the list and complete packets in the reverse order.
(youngest packets first, oldest packets last)
This defeats priorities that sender could have cooked.

Since GRO currently only store TCP packets, we dont really notice the
bug because of retransmits, but this behavior can add unexpected
latencies, particularly on mice flows clamped by elephant flows.

This patch makes sure no packet can stay more than 1 ms in queue, and
only in stress situations.

It also complete packets in the right order to minimize latencies.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
 drivers/net/ethernet/marvell/skge.c   |    2 -
 drivers/net/ethernet/realtek/8139cp.c |    2 -
 include/linux/netdevice.h             |   15 +++++----
 net/core/dev.c                        |   38 +++++++++++++++++++-----
 4 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/marvell/skge.c b/drivers/net/ethernet/marvell/skge.c
index 5a30bf8..eb3cba0 100644
--- a/drivers/net/ethernet/marvell/skge.c
+++ b/drivers/net/ethernet/marvell/skge.c
@@ -3189,7 +3189,7 @@ static int skge_poll(struct napi_struct *napi, int to_do)
 	if (work_done < to_do) {
 		unsigned long flags;
 
-		napi_gro_flush(napi);
+		napi_gro_flush(napi, false);
 		spin_lock_irqsave(&hw->hw_lock, flags);
 		__napi_complete(napi);
 		hw->intr_mask |= napimask[skge->port];
diff --git a/drivers/net/ethernet/realtek/8139cp.c b/drivers/net/ethernet/realtek/8139cp.c
index 995d0cf..1c81825 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -563,7 +563,7 @@ rx_next:
 		if (cpr16(IntrStatus) & cp_rx_intr_mask)
 			goto rx_status_loop;
 
-		napi_gro_flush(napi);
+		napi_gro_flush(napi, false);
 		spin_lock_irqsave(&cp->lock, flags);
 		__napi_complete(napi);
 		cpw16_f(IntrMask, cp_intr_mask);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 01646aa..63efc28 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1497,19 +1497,22 @@ struct napi_gro_cb {
 	/* This indicates where we are processing relative to skb->data. */
 	int data_offset;
 
-	/* This is non-zero if the packet may be of the same flow. */
-	int same_flow;
-
 	/* This is non-zero if the packet cannot be merged with the new skb. */
 	int flush;
 
 	/* Number of segments aggregated. */
-	int count;
+	u16	count;
+
+	/* This is non-zero if the packet may be of the same flow. */
+	u8	same_flow;
 
 	/* Free the skb? */
-	int free;
+	u8	free;
 #define NAPI_GRO_FREE		  1
 #define NAPI_GRO_FREE_STOLEN_HEAD 2
+
+	/* jiffies when first packet was created/queued */
+	unsigned long age;
 };
 
 #define NAPI_GRO_CB(skb) ((struct napi_gro_cb *)(skb)->cb)
@@ -2157,7 +2160,7 @@ extern gro_result_t	dev_gro_receive(struct napi_struct *napi,
 extern gro_result_t	napi_skb_finish(gro_result_t ret, struct sk_buff *skb);
 extern gro_result_t	napi_gro_receive(struct napi_struct *napi,
 					 struct sk_buff *skb);
-extern void		napi_gro_flush(struct napi_struct *napi);
+extern void		napi_gro_flush(struct napi_struct *napi, bool flush_old);
 extern struct sk_buff *	napi_get_frags(struct napi_struct *napi);
 extern gro_result_t	napi_frags_finish(struct napi_struct *napi,
 					  struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 1e0a184..8c960e8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3471,17 +3471,31 @@ out:
 	return netif_receive_skb(skb);
 }
 
-inline void napi_gro_flush(struct napi_struct *napi)
+/* napi->gro_list contains packets ordered by age.
+ * youngest packets at the head of it.
+ * Complete skbs in reverse order to reduce latencies.
+ */
+void napi_gro_flush(struct napi_struct *napi, bool flush_old)
 {
-	struct sk_buff *skb, *next;
+	struct sk_buff *skb, *prev = NULL;
 
-	for (skb = napi->gro_list; skb; skb = next) {
-		next = skb->next;
+	/* scan list and build reverse chain */
+	for (skb = napi->gro_list; skb != NULL; skb = skb->next) {
+		skb->prev = prev;
+		prev = skb;
+	}
+
+	for (skb = prev; skb; skb = prev) {
 		skb->next = NULL;
+
+		if (flush_old && NAPI_GRO_CB(skb)->age == jiffies)
+			return;
+
+		prev = skb->prev;
 		napi_gro_complete(skb);
+		napi->gro_count--;
 	}
 
-	napi->gro_count = 0;
 	napi->gro_list = NULL;
 }
 EXPORT_SYMBOL(napi_gro_flush);
@@ -3542,6 +3556,7 @@ enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 
 	napi->gro_count++;
 	NAPI_GRO_CB(skb)->count = 1;
+	NAPI_GRO_CB(skb)->age = jiffies;
 	skb_shinfo(skb)->gso_size = skb_gro_len(skb);
 	skb->next = napi->gro_list;
 	napi->gro_list = skb;
@@ -3876,7 +3891,7 @@ void napi_complete(struct napi_struct *n)
 	if (unlikely(test_bit(NAPI_STATE_NPSVC, &n->state)))
 		return;
 
-	napi_gro_flush(n);
+	napi_gro_flush(n, false);
 	local_irq_save(flags);
 	__napi_complete(n);
 	local_irq_restore(flags);
@@ -3981,8 +3996,17 @@ static void net_rx_action(struct softirq_action *h)
 				local_irq_enable();
 				napi_complete(n);
 				local_irq_disable();
-			} else
+			} else {
+				if (n->gro_list) {
+					/* flush too old packets
+					 * If HZ < 1000, flush all packets.
+					 */
+					local_irq_enable();
+					napi_gro_flush(n, HZ >= 1000);
+					local_irq_disable();
+				}
 				list_move_tail(&n->poll_list, &sd->poll_list);
+			}
 		}
 
 		netpoll_poll_unlock(have);

^ permalink raw reply related

* [RFC] ipv6: gro: IPV6_GRO_CB(skb)->proto problem
From: Eric Dumazet @ 2012-10-06 19:15 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, netdev

It seems IPV6_GRO_CB(skb)->proto can be destroyed in skb_gro_receive()
if a new skb is allocated (to serve as an anchor for frag_list)

At line 3049 we copy NAPI_GRO_CB() only (not the IPV6 specific part)

*NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p);

So we leave IPV6_GRO_CB(nskb)->proto to 0 (fresh skb allocation) instead
of PROTO_TCP

So ipv6_gro_complete() wont be able to call ops->gro_complete()
[ tcp6_gro_complete() ]

I would fix this by moving proto in NAPI_GRO_CB() [ ie getting rid of
IPV6_GRO_CB ]

Am I missing something ?

(I'll submit a proper patch once/if prior GRO ones are accepted/merged
by David)

Thanks

 include/linux/netdevice.h |    2 ++
 net/ipv6/af_inet6.c       |   11 ++---------
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 01646aa..3f13441 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1510,6 +1510,8 @@ struct napi_gro_cb {
 	int free;
 #define NAPI_GRO_FREE		  1
 #define NAPI_GRO_FREE_STOLEN_HEAD 2
+
+	u8 proto;
 };
 
 #define NAPI_GRO_CB(skb) ((struct napi_gro_cb *)(skb)->cb)
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index e22e6d8..5c7d809 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -822,13 +822,6 @@ out:
 	return segs;
 }
 
-struct ipv6_gro_cb {
-	struct napi_gro_cb napi;
-	int proto;
-};
-
-#define IPV6_GRO_CB(skb) ((struct ipv6_gro_cb *)(skb)->cb)
-
 static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
 					 struct sk_buff *skb)
 {
@@ -874,7 +867,7 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
 		iph = ipv6_hdr(skb);
 	}
 
-	IPV6_GRO_CB(skb)->proto = proto;
+	NAPI_GRO_CB(skb)->proto = proto;
 
 	flush--;
 	nlen = skb_network_header_len(skb);
@@ -927,7 +920,7 @@ static int ipv6_gro_complete(struct sk_buff *skb)
 				 sizeof(*iph));
 
 	rcu_read_lock();
-	ops = rcu_dereference(inet6_protos[IPV6_GRO_CB(skb)->proto]);
+	ops = rcu_dereference(inet6_protos[NAPI_GRO_CB(skb)->proto]);
 	if (WARN_ON(!ops || !ops->gro_complete))
 		goto out_unlock;
 

^ permalink raw reply related

* Re: [net:master 1/9] pch_gbe_main.c:(.text+0x510370): undefined reference to `pch_ch_control_write'
From: David Miller @ 2012-10-06 21:17 UTC (permalink / raw)
  To: haicheng.li; +Cc: fengguang.wu, netdev, linux-kernel
In-Reply-To: <50706885.1030908@linux.intel.com>

From: Haicheng Li <haicheng.li@linux.intel.com>
Date: Sun, 07 Oct 2012 01:21:09 +0800

> Take IXGBE_PTP as example, it explicitly selects PPS, and also depends
> on EXPERIMENTAL:
> config IXGBE_PTP
>         bool "PTP Clock Support"
>         default n
>         depends on IXGBE && EXPERIMENTAL
>         select PPS
>         select PTP_1588_CLOCK
> 
> So if you stick to use "select" as the convention of such build issue
> fixing, fengguang's build failure would be fixed by:
> 	+ depends on EXPERIMENTAL
> 	+ select PPS
> 	+ select PTP_1588_CLOCK
> 
> would you prefer this way?

Yes, I would.

Thanks.

^ permalink raw reply

* Re: [PATCH] net: gro: selective flush of packets
From: Herbert Xu @ 2012-10-07  0:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Jesse Gross, Tom Herbert, Yuchung Cheng
In-Reply-To: <1349546929.21172.1598.camel@edumazet-glaptop>

On Sat, Oct 06, 2012 at 08:08:49PM +0200, Eric Dumazet wrote:
>
> @@ -3981,8 +3996,17 @@ static void net_rx_action(struct softirq_action *h)
>  				local_irq_enable();
>  				napi_complete(n);
>  				local_irq_disable();
> -			} else
> +			} else {
> +				if (n->gro_list) {
> +					/* flush too old packets
> +					 * If HZ < 1000, flush all packets.
> +					 */
> +					local_irq_enable();
> +					napi_gro_flush(n, HZ >= 1000);
> +					local_irq_disable();
> +				}
>  				list_move_tail(&n->poll_list, &sd->poll_list);
> +			}

Why don't we just always flush everything?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] flexcan: disable bus error interrupts for the i.MX28
From: Shawn Guo @ 2012-10-07  3:09 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: Linux Netdev List, Linux-CAN, Hui Wang, Dong Aisheng
In-Reply-To: <5065A35B.3020702@grandegger.com>

On Fri, Sep 28, 2012 at 03:17:15PM +0200, Wolfgang Grandegger wrote:
> Due to a bug in most Flexcan cores, the bus error interrupt needs
> to be enabled. Otherwise we don't get any error warning or passive
> interrupts. This is _not_ necessay for the i.MX28 and this patch
> disables bus error interrupts if "berr-reporting" is not requested.
> This avoids bus error flooding, which might harm, especially on
> low-end systems.
> 
> To handle such quirks of the Flexcan cores, a hardware feature flag
> has been introduced, also replacing the "hw_ver" variable. So far
> nobody could tell what Flexcan core version is available on what
> Freescale SOC, apart from the i.MX6Q and P1010, and which bugs or
> features are present on the various "hw_rev".
> 
> CC: Hui Wang <jason77.wang@gmail.com>
> CC: Shawn Guo <shawn.guo@linaro.org>
> Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
> ---
> 
> Concerning the bug, I know that the i.MX35 does have it. Maybe other
> Flexcan cores than on the i.MX28 does *not* have it either. If you
> have a chance, please check on the P1010, i.MX6Q, i.MX51, i.MX53,
> etc.

>From what I can tell, i.MX35, i.MX51 and i.MX53 use the same version,
so they should all have the bug.  And for i.MX6Q, since it uses a newer
version even than i.MX28, I would believe it's affected by the bug.
But I'm copying Dong who should have better knowledge about this to
confirm. 

Shawn

> 
> Wolfgang.
> 
> 
>  drivers/net/can/flexcan.c |   29 +++++++++++++++++++----------
>  1 files changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
> index c5f1431..c78ecfc 100644
> --- a/drivers/net/can/flexcan.c
> +++ b/drivers/net/can/flexcan.c
> @@ -144,6 +144,10 @@
>  
>  #define FLEXCAN_MB_CODE_MASK		(0xf0ffffff)
>  
> +/* FLEXCAN hardware feature flags */
> +#define FLEXCAN_HAS_V10_FEATURES	BIT(1) /* For core version >= 10 */
> +#define FLEXCAN_HAS_BROKEN_ERR_STATE	BIT(2) /* Broken error state handling */
> +
>  /* Structure of the message buffer */
>  struct flexcan_mb {
>  	u32 can_ctrl;
> @@ -178,7 +182,7 @@ struct flexcan_regs {
>  };
>  
>  struct flexcan_devtype_data {
> -	u32 hw_ver;	/* hardware controller version */
> +	u32 features;	/* hardware controller features */
>  };
>  
>  struct flexcan_priv {
> @@ -197,11 +201,11 @@ struct flexcan_priv {
>  };
>  
>  static struct flexcan_devtype_data fsl_p1010_devtype_data = {
> -	.hw_ver = 3,
> +	.features = FLEXCAN_HAS_BROKEN_ERR_STATE,
>  };
> -
> +static struct flexcan_devtype_data fsl_imx28_devtype_data;
>  static struct flexcan_devtype_data fsl_imx6q_devtype_data = {
> -	.hw_ver = 10,
> +	.features = FLEXCAN_HAS_V10_FEATURES | FLEXCAN_HAS_BROKEN_ERR_STATE,
>  };
>  
>  static const struct can_bittiming_const flexcan_bittiming_const = {
> @@ -741,15 +745,19 @@ static int flexcan_chip_start(struct net_device *dev)
>  	 * enable tx and rx warning interrupt
>  	 * enable bus off interrupt
>  	 * (== FLEXCAN_CTRL_ERR_STATE)
> -	 *
> -	 * _note_: we enable the "error interrupt"
> -	 * (FLEXCAN_CTRL_ERR_MSK), too. Otherwise we don't get any
> -	 * warning or bus passive interrupts.
>  	 */
>  	reg_ctrl = flexcan_read(&regs->ctrl);
>  	reg_ctrl &= ~FLEXCAN_CTRL_TSYN;
>  	reg_ctrl |= FLEXCAN_CTRL_BOFF_REC | FLEXCAN_CTRL_LBUF |
> -		FLEXCAN_CTRL_ERR_STATE | FLEXCAN_CTRL_ERR_MSK;
> +		FLEXCAN_CTRL_ERR_STATE;
> +	/*
> +	 * enable the "error interrupt" (FLEXCAN_CTRL_ERR_MSK),
> +	 * on most Flexcan cores, too. Otherwise we don't get
> +	 * any error warning or passive interrupts.
> +	 */
> +	if (priv->devtype_data->features & FLEXCAN_HAS_BROKEN_ERR_STATE ||
> +	    priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING)
> +		reg_ctrl |= FLEXCAN_CTRL_ERR_MSK;
>  
>  	/* save for later use */
>  	priv->reg_ctrl_default = reg_ctrl;
> @@ -772,7 +780,7 @@ static int flexcan_chip_start(struct net_device *dev)
>  	flexcan_write(0x0, &regs->rx14mask);
>  	flexcan_write(0x0, &regs->rx15mask);
>  
> -	if (priv->devtype_data->hw_ver >= 10)
> +	if (priv->devtype_data->features & FLEXCAN_HAS_V10_FEATURES)
>  		flexcan_write(0x0, &regs->rxfgmask);
>  
>  	flexcan_transceiver_switch(priv, 1);
> @@ -954,6 +962,7 @@ static void __devexit unregister_flexcandev(struct net_device *dev)
>  
>  static const struct of_device_id flexcan_of_match[] = {
>  	{ .compatible = "fsl,p1010-flexcan", .data = &fsl_p1010_devtype_data, },
> +	{ .compatible = "fsl,imx28-flexcan", .data = &fsl_imx28_devtype_data, },
>  	{ .compatible = "fsl,imx6q-flexcan", .data = &fsl_imx6q_devtype_data, },
>  	{ /* sentinel */ },
>  };
> -- 
> 1.7.7.6
> 

^ permalink raw reply

* Re: [RFC:PATCH 3.6.0-] net/ipconfig: Extend ipconfig retries to device open
From: David Miller @ 2012-10-07  4:11 UTC (permalink / raw)
  To: srinivas.kandagatla; +Cc: netdev
In-Reply-To: <1349365123-25826-1-git-send-email-srinivas.kandagatla@st.com>

From: Srinivas KANDAGATLA <srinivas.kandagatla@st.com>
Date: Thu,  4 Oct 2012 16:38:43 +0100

> This patch adds retries to ipconfig at device open, the reason to do
> this is: Lets say If some mdio bus driver decide to use defered probe
> when it does not find any phys on the bus. The same mdio-bus driver is
> re-probed as part of lateinit calls. However ipconfig also fits into
> lateinit calls, so if ipconfig is called before the re-probe of mdio-bus
> driver, the mac driver will fail to find a valid PHY on the mdio-bus.

Real device drivers for real devices should not probe using late
initcalls.

The whole point of late initcalls is that you can be certain that they
run after such things.

Fix the virus not the symptom.

I'm not applying this patch.

^ permalink raw reply

* Re: [PATCH] vlan: don't deliver frames for unknown vlans to protocols
From: David Miller @ 2012-10-07  4:14 UTC (permalink / raw)
  To: florz; +Cc: kaber, eric.dumazet, netdev, linux-kernel, jpirko
In-Reply-To: <20121004165020.GT1162@florz.florz.dyndns.org>

From: Florian Zumbiehl <florz@florz.de>
Date: Thu, 4 Oct 2012 18:50:20 +0200

> 6a32e4f9dd9219261f8856f817e6655114cfec2f made the vlan code skip marking
> vlan-tagged frames for not locally configured vlans as PACKET_OTHERHOST if
> there was an rx_handler, as the rx_handler could cause the frame to be received
> on a different (virtual) vlan-capable interface where that vlan might be
> configured.
> 
> As rx_handlers do not necessarily return RX_HANDLER_ANOTHER, this could cause
> frames for unknown vlans to be delivered to the protocol stack as if they had
> been received untagged.
> 
> For example, if an ipv6 router advertisement that's tagged for a locally not
> configured vlan is received on an interface with macvlan interfaces attached,
> macvlan's rx_handler returns RX_HANDLER_PASS after delivering the frame to the
> macvlan interfaces, which caused it to be passed to the protocol stack, leading
> to ipv6 addresses for the announced prefix being configured even though those
> are completely unusable on the underlying interface.
> 
> The fix moves marking as PACKET_OTHERHOST after the rx_handler so the
> rx_handler, if there is one, sees the frame unchanged, but afterwards,
> before the frame is delivered to the protocol stack, it gets marked whether
> there is an rx_handler or not.
> 
> Signed-off-by: Florian Zumbiehl <florz@florz.de>

I agree with your analysis but I do not like your fix.

This is one of the hottest paths in the networking stack and I don't
want to add more "pass pointer to local variable" type interfaces.

Fix this in a cleaner way please, thanks

^ permalink raw reply

* Re: [RFC] ipv6: gro: IPV6_GRO_CB(skb)->proto problem
From: Herbert Xu @ 2012-10-07  4:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1349550927.21172.1759.camel@edumazet-glaptop>

On Sat, Oct 06, 2012 at 09:15:27PM +0200, Eric Dumazet wrote:
> It seems IPV6_GRO_CB(skb)->proto can be destroyed in skb_gro_receive()
> if a new skb is allocated (to serve as an anchor for frag_list)
> 
> At line 3049 we copy NAPI_GRO_CB() only (not the IPV6 specific part)
> 
> *NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p);
> 
> So we leave IPV6_GRO_CB(nskb)->proto to 0 (fresh skb allocation) instead
> of PROTO_TCP
> 
> So ipv6_gro_complete() wont be able to call ops->gro_complete()
> [ tcp6_gro_complete() ]
> 
> I would fix this by moving proto in NAPI_GRO_CB() [ ie getting rid of
> IPV6_GRO_CB ]
> 
> Am I missing something ?
> 
> (I'll submit a proper patch once/if prior GRO ones are accepted/merged
> by David)

No I think you're absolutely right.

>  include/linux/netdevice.h |    2 ++
>  net/ipv6/af_inet6.c       |   11 ++---------
>  2 files changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 01646aa..3f13441 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1510,6 +1510,8 @@ struct napi_gro_cb {
>  	int free;
>  #define NAPI_GRO_FREE		  1
>  #define NAPI_GRO_FREE_STOLEN_HEAD 2
> +
> +	u8 proto;

I'd prefer to keep it as an int since we're not really running
short on space.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [patch v4 1/2] netlink: add reference of module in netlink_dump_start
From: David Miller @ 2012-10-07  4:31 UTC (permalink / raw)
  To: gaofeng-BthXqXjhjHXQFUHtdCDX3A
  Cc: eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	steffen.klassert-opNxpl+3fjRBDgjK7y7TUQ,
	pablo-Cap9r6Oaw4JrovVCs/uTlw, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	stephen.hemminger-ZtmgI6mnKB3QT0dZR+AlfA, jengelh-9+2X+4sQBs8
In-Reply-To: <1349417749-24869-1-git-send-email-gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>

From: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
Date: Fri, 5 Oct 2012 14:15:48 +0800

> I get a panic when I use ss -a and rmmod inet_diag at the
> same time.
> 
> it's because netlink_dump use inet_diag_dump witch function
> belongs to module inet_diag.
> 
> I search the codes and find many modules have the same problem.
> We need add reference of the module which the cb->dump belongs
> to.
> 
> Thanks for all help from Stephen,Jan,Eric,Steffen and Pablo.
> 
> Change From v3:
> change netlink_dump_start to inline,suggestion from Pablo and
> Eric.
> 
> Change From v2:
> delete netlink_dump_done,and call module_put in netlink_dump
> and netlink_sock_destruct.
> 
> Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [patch v4 2/2] infiniband: pass rdma_cm module to netlink_dump_start
From: David Miller @ 2012-10-07  4:31 UTC (permalink / raw)
  To: gaofeng-BthXqXjhjHXQFUHtdCDX3A
  Cc: eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	steffen.klassert-opNxpl+3fjRBDgjK7y7TUQ,
	pablo-Cap9r6Oaw4JrovVCs/uTlw, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	stephen.hemminger-ZtmgI6mnKB3QT0dZR+AlfA, jengelh-9+2X+4sQBs8,
	roland-DgEjT+Ai2ygdnm+yROfE0A, sean.hefty-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <1349417749-24869-2-git-send-email-gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>

From: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
Date: Fri, 5 Oct 2012 14:15:49 +0800

> set netlink_dump_control.module to avoid panic.
> 
> Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox