Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: [PATCH net-next 01/10] qlcnic: Print informational messages only once during driver load.
From: Sucheta Chakraborty @ 2013-10-10 12:41 UTC (permalink / raw)
  To: Stephen Hemminger, Himanshu Madhani
  Cc: David Miller, netdev, Dept-NX Linux NIC Driver
In-Reply-To: <20131004142938.4ab6c6aa@nehalam.linuxnetplumber.net>

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, October 05, 2013 3:00 AM
> To: Himanshu Madhani
> Cc: David Miller; netdev; Dept-NX Linux NIC Driver; Sucheta Chakraborty
> Subject: Re: [PATCH net-next 01/10] qlcnic: Print informational
> messages only once during driver load.
> 
> On Fri, 4 Oct 2013 14:30:48 -0400
> Himanshu Madhani <himanshu.madhani@qlogic.com> wrote:
> 
> > From: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
> >
> > Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
> > Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
> > ---
> >  drivers/net/ethernet/qlogic/qlcnic/qlcnic.h        |  1 +
> >  .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c    | 12 -----------
> >  .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_vnic.c  | 25
> ++++++++++++++++++----
> >  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c   |  1 +
> >  4 files changed, 23 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
> > b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
> > index 81bf836..a3c4379 100644
> > --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
> > +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
> > @@ -1199,6 +1199,7 @@ struct qlcnic_npar_info {
> >  	u8	promisc_mode;
> >  	u8	offload_flags;
> >  	u8      pci_func;
> > +	u8      mac[ETH_ALEN];
> >  };
> >
> 
> >
> 
> There is a field in netdevice which should probably be used for this
> perm_addr.
> 

This information is logged from the management function in Virtual NIC mode.
Management function do not a have access to the netdev info structure of other functions.
So we cannot make the changes as suggested.

> And then this could be corrected:
> 
> static void qlcnic_dcb_get_perm_hw_addr(struct net_device *netdev, u8
> *addr) {
> 	memcpy(addr, netdev->dev_addr, netdev->addr_len); }

Will incorporate this change in "dcb cleanup patch".

Thanks,
Sucheta.

^ permalink raw reply

* Re: [PATCH] usbnet: smsc95xx: Add device tree input for MAC address
From: Dan Murphy @ 2013-10-10 12:47 UTC (permalink / raw)
  To: Ming Lei
  Cc: Steve Glendinning, Network Development, Linux Kernel Mailing List,
	linux-usb, mugunthanvnm
In-Reply-To: <CACVXFVP-KgqcNTnk+JZsLc188pYk6iUXvW6=kPXNaurSuevGeA@mail.gmail.com>

On 10/10/2013 07:39 AM, Ming Lei wrote:
> On Thu, Oct 10, 2013 at 8:08 PM, Dan Murphy <dmurphy@ti.com> wrote:
>> You are correct I don't expect a match for PnP devices only devices that are hard wired.
>>
>> After thinking of it I should move the OF code below the EEPROM code as the EEPROM should take preference over the DT code.
>>
>> I will need to post V2 for that.
> Your patch still doesn't deal with two smsc95xx devices built in
> one board.

Is there a board that has 2 built in smsc devices?

This is all dependent on what the dev.id is of udev.

>
> The problem is a generic one, looks it is better to do it when OF supports
> discoverable bus.
>
>
> Thanks,


-- 
------------------
Dan Murphy

^ permalink raw reply

* [PATCH net] bridge: allow receiption on disabled port
From: Felix Fietkau @ 2013-10-10 12:52 UTC (permalink / raw)
  To: netdev; +Cc: stephen

When an ethernet device is enslaved to a bridge, and the bridge STP
detects loss of carrier (or operational state down), then normally
packet receiption is blocked.

This breaks control applications like WPA which maybe expecting to
receive packets to negotiate to bring link up. The bridge needs to
block forwarding packets from these disabled ports, but there is no
hard requirement to not allow local packet delivery.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
---
 net/bridge/br_input.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index a2fd37e..0a8a8cd 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -146,9 +146,11 @@ static int br_handle_local_finish(struct sk_buff *skb)
 	struct net_bridge_port *p = br_port_get_rcu(skb->dev);
 	u16 vid = 0;
 
-	br_vlan_get_tag(skb, &vid);
-	if (p->flags & BR_LEARNING)
-		br_fdb_update(p->br, p, eth_hdr(skb)->h_source, vid);
+	if (p->state != BR_STATE_DISABLED) {
+		br_vlan_get_tag(skb, &vid);
+		if (p->flags & BR_LEARNING)
+			br_fdb_update(p->br, p, eth_hdr(skb)->h_source, vid);
+	}
 	return 0;	 /* process further */
 }
 
@@ -218,6 +220,18 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
 
 forward:
 	switch (p->state) {
+	case BR_STATE_DISABLED:
+		if (ether_addr_equal(p->br->dev->dev_addr, dest))
+			skb->pkt_type = PACKET_HOST;
+
+		if (NF_HOOK(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
+			br_handle_local_finish))
+			break;
+
+		BR_INPUT_SKB_CB(skb)->brdev = p->br->dev;
+		br_pass_frame_up(skb);
+		break;
+
 	case BR_STATE_FORWARDING:
 		rhook = rcu_dereference(br_should_route_hook);
 		if (rhook) {
-- 
1.8.0.2

^ permalink raw reply related

* Re: [PATCH net] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Neil Horman @ 2013-10-10 13:11 UTC (permalink / raw)
  To: Fan Du; +Cc: vyasevich, steffen.klassert, davem, netdev
In-Reply-To: <1381384296-1821-1-git-send-email-fan.du@windriver.com>

On Thu, Oct 10, 2013 at 01:51:36PM +0800, Fan Du wrote:
> igb/ixgbe have hardware sctp checksum support, when this feature is enabled
> and also IPsec is armed to protect sctp traffic, ugly things happened as
> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum every thing
> up and pack the 16bits result in the checksum field). The result is fail
> establishment of sctp communication.
> 
Shouldn't this be fixed in the xfrm code then?  E.g. check the device features
for SCTP checksum offloading and and skip the checksum during xfrm output if its
available?

Or am I missing something?
Neil

^ permalink raw reply

* [PATCH] l2tp: must disable bh before calling l2tp_xmit_skb()
From: Eric Dumazet @ 2013-10-10 13:30 UTC (permalink / raw)
  To: François Cachereul; +Cc: James Chapman, David S. Miller, netdev
In-Reply-To: <5256592E.3080600@alphalink.fr>

From: Eric Dumazet <edumazet@google.com>

François Cachereul made a very nice bug report and suspected
the bh_lock_sock() / bh_unlok_sock() pair used in l2tp_xmit_skb() from
process context was not good.

This problem was added by commit
("l2tp: Fix locking in l2tp_core.c").

l2tp_eth_dev_xmit() runs from BH context, so we must disable BH
from other l2tp_xmit_skb() users.


[  452.060011] BUG: soft lockup - CPU#1 stuck for 23s! [accel-pppd:6662]
[  452.061757] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core pppoe pppox 
ppp_generic slhc ipv6 ext3 mbcache jbd virtio_balloon xfs exportfs dm_mod 
virtio_blk ata_generic virtio_net floppy ata_piix libata virtio_pci virtio_ring virtio [last unloaded: scsi_wait_scan]
[  452.064012] CPU 1
[  452.080015] BUG: soft lockup - CPU#2 stuck for 23s! [accel-pppd:6643]
[  452.080015] CPU 2
[  452.080015]
[  452.080015] Pid: 6643, comm: accel-pppd Not tainted 3.2.46.mini #1 Bochs Bochs
[  452.080015] RIP: 0010:[<ffffffff81059f6c>]  [<ffffffff81059f6c>] do_raw_spin_lock+0x17/0x1f
[  452.080015] RSP: 0018:ffff88007125fc18  EFLAGS: 00000293
[  452.080015] RAX: 000000000000aba9 RBX: ffffffff811d0703 RCX: 0000000000000000
[  452.080015] RDX: 00000000000000ab RSI: ffff8800711f6896 RDI: ffff8800745c8110
[  452.080015] RBP: ffff88007125fc18 R08: 0000000000000020 R09: 0000000000000000
[  452.080015] R10: 0000000000000000 R11: 0000000000000280 R12: 0000000000000286
[  452.080015] R13: 0000000000000020 R14: 0000000000000240 R15: 0000000000000000
[  452.080015] FS:  00007fdc0cc24700(0000) GS:ffff8800b6f00000(0000) knlGS:0000000000000000
[  452.080015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  452.080015] CR2: 00007fdb054899b8 CR3: 0000000074404000 CR4: 00000000000006a0
[  452.080015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  452.080015] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  452.080015] Process accel-pppd (pid: 6643, threadinfo ffff88007125e000, task ffff8800b27e6dd0)
[  452.080015] Stack:
[  452.080015]  ffff88007125fc28 ffffffff81256559 ffff88007125fc98 ffffffffa01b2bd1
[  452.080015]  ffff88007125fc58 000000000000000c 00000000029490d0 0000009c71dbe25e
[  452.080015]  000000000000005c 000000080000000e 0000000000000000 ffff880071170600
[  452.080015] Call Trace:
[  452.080015]  [<ffffffff81256559>] _raw_spin_lock+0xe/0x10
[  452.080015]  [<ffffffffa01b2bd1>] l2tp_xmit_skb+0x189/0x4ac [l2tp_core]
[  452.080015]  [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp]
[  452.080015]  [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24
[  452.080015]  [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6
[  452.080015]  [<ffffffff81254e88>] ? __schedule+0x5c1/0x616
[  452.080015]  [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c
[  452.080015]  [<ffffffff810bbd21>] ? fget_light+0x75/0x89
[  452.080015]  [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56
[  452.080015]  [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b
[  452.080015]  [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b
[  452.080015] Code: 81 48 89 e5 72 0c 31 c0 48 81 ff 45 66 25 81 0f 92 c0 5d c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 0f b6 d4 38 d0 74 06 f3 90 <8a> 07 eb f6 5d c3 90 90 55 48 89 e5 9c 58 0f 1f 44 00 00 5d c3
[  452.080015] Call Trace:
[  452.080015]  [<ffffffff81256559>] _raw_spin_lock+0xe/0x10
[  452.080015]  [<ffffffffa01b2bd1>] l2tp_xmit_skb+0x189/0x4ac [l2tp_core]
[  452.080015]  [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp]
[  452.080015]  [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24
[  452.080015]  [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6
[  452.080015]  [<ffffffff81254e88>] ? __schedule+0x5c1/0x616
[  452.080015]  [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c
[  452.080015]  [<ffffffff810bbd21>] ? fget_light+0x75/0x89
[  452.080015]  [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56
[  452.080015]  [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b
[  452.080015]  [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b
[  452.064012]
[  452.064012] Pid: 6662, comm: accel-pppd Not tainted 3.2.46.mini #1 Bochs Bochs
[  452.064012] RIP: 0010:[<ffffffff81059f6e>]  [<ffffffff81059f6e>] do_raw_spin_lock+0x19/0x1f
[  452.064012] RSP: 0018:ffff8800b6e83ba0  EFLAGS: 00000297
[  452.064012] RAX: 000000000000aaa9 RBX: ffff8800b6e83b40 RCX: 0000000000000002
[  452.064012] RDX: 00000000000000aa RSI: 000000000000000a RDI: ffff8800745c8110
[  452.064012] RBP: ffff8800b6e83ba0 R08: 000000000000c802 R09: 000000000000001c
[  452.064012] R10: ffff880071096c4e R11: 0000000000000006 R12: ffff8800b6e83b18
[  452.064012] R13: ffffffff8125d51e R14: ffff8800b6e83ba0 R15: ffff880072a589c0
[  452.064012] FS:  00007fdc0b81e700(0000) GS:ffff8800b6e80000(0000) knlGS:0000000000000000
[  452.064012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  452.064012] CR2: 0000000000625208 CR3: 0000000074404000 CR4: 00000000000006a0
[  452.064012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  452.064012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  452.064012] Process accel-pppd (pid: 6662, threadinfo ffff88007129a000, task ffff8800744f7410)
[  452.064012] Stack:
[  452.064012]  ffff8800b6e83bb0 ffffffff81256559 ffff8800b6e83bc0 ffffffff8121c64a
[  452.064012]  ffff8800b6e83bf0 ffffffff8121ec7a ffff880072a589c0 ffff880071096c62
[  452.064012]  0000000000000011 ffffffff81430024 ffff8800b6e83c80 ffffffff8121f276
[  452.064012] Call Trace:
[  452.064012]  <IRQ>
[  452.064012]  [<ffffffff81256559>] _raw_spin_lock+0xe/0x10
[  452.064012]  [<ffffffff8121c64a>] spin_lock+0x9/0xb
[  452.064012]  [<ffffffff8121ec7a>] udp_queue_rcv_skb+0x186/0x269
[  452.064012]  [<ffffffff8121f276>] __udp4_lib_rcv+0x297/0x4ae
[  452.064012]  [<ffffffff8121c178>] ? raw_rcv+0xe9/0xf0
[  452.064012]  [<ffffffff8121f4a7>] udp_rcv+0x1a/0x1c
[  452.064012]  [<ffffffff811fe385>] ip_local_deliver_finish+0x12b/0x1a5
[  452.064012]  [<ffffffff811fe54e>] ip_local_deliver+0x53/0x84
[  452.064012]  [<ffffffff811fe1d0>] ip_rcv_finish+0x2bc/0x2f3
[  452.064012]  [<ffffffff811fe78f>] ip_rcv+0x210/0x269
[  452.064012]  [<ffffffff8101911e>] ? kvm_clock_get_cycles+0x9/0xb
[  452.064012]  [<ffffffff811d88cd>] __netif_receive_skb+0x3a5/0x3f7
[  452.064012]  [<ffffffff811d8eba>] netif_receive_skb+0x57/0x5e
[  452.064012]  [<ffffffff811cf30f>] ? __netdev_alloc_skb+0x1f/0x3b
[  452.064012]  [<ffffffffa0049126>] virtnet_poll+0x4ba/0x5a4 [virtio_net]
[  452.064012]  [<ffffffff811d9417>] net_rx_action+0x73/0x184
[  452.064012]  [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core]
[  452.064012]  [<ffffffff810343b9>] __do_softirq+0xc3/0x1a8
[  452.064012]  [<ffffffff81013b56>] ? ack_APIC_irq+0x10/0x12
[  452.064012]  [<ffffffff81256559>] ? _raw_spin_lock+0xe/0x10
[  452.064012]  [<ffffffff8125e0ac>] call_softirq+0x1c/0x26
[  452.064012]  [<ffffffff81003587>] do_softirq+0x45/0x82
[  452.064012]  [<ffffffff81034667>] irq_exit+0x42/0x9c
[  452.064012]  [<ffffffff8125e146>] do_IRQ+0x8e/0xa5
[  452.064012]  [<ffffffff8125676e>] common_interrupt+0x6e/0x6e
[  452.064012]  <EOI>
[  452.064012]  [<ffffffff810b82a1>] ? kfree+0x8a/0xa3
[  452.064012]  [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core]
[  452.064012]  [<ffffffffa01b2c25>] ? l2tp_xmit_skb+0x1dd/0x4ac [l2tp_core]
[  452.064012]  [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp]
[  452.064012]  [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24
[  452.064012]  [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6
[  452.064012]  [<ffffffff81254e88>] ? __schedule+0x5c1/0x616
[  452.064012]  [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c
[  452.064012]  [<ffffffff810bbd21>] ? fget_light+0x75/0x89
[  452.064012]  [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56
[  452.064012]  [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b
[  452.064012]  [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b
[  452.064012] Code: 89 e5 72 0c 31 c0 48 81 ff 45 66 25 81 0f 92 c0 5d c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 0f b6 d4 38 d0 74 06 f3 90 8a 07 <eb> f6 5d c3 90 90 55 48 89 e5 9c 58 0f 1f 44 00 00 5d c3 55 48
[  452.064012] Call Trace:
[  452.064012]  <IRQ>  [<ffffffff81256559>] _raw_spin_lock+0xe/0x10
[  452.064012]  [<ffffffff8121c64a>] spin_lock+0x9/0xb
[  452.064012]  [<ffffffff8121ec7a>] udp_queue_rcv_skb+0x186/0x269
[  452.064012]  [<ffffffff8121f276>] __udp4_lib_rcv+0x297/0x4ae
[  452.064012]  [<ffffffff8121c178>] ? raw_rcv+0xe9/0xf0
[  452.064012]  [<ffffffff8121f4a7>] udp_rcv+0x1a/0x1c
[  452.064012]  [<ffffffff811fe385>] ip_local_deliver_finish+0x12b/0x1a5
[  452.064012]  [<ffffffff811fe54e>] ip_local_deliver+0x53/0x84
[  452.064012]  [<ffffffff811fe1d0>] ip_rcv_finish+0x2bc/0x2f3
[  452.064012]  [<ffffffff811fe78f>] ip_rcv+0x210/0x269
[  452.064012]  [<ffffffff8101911e>] ? kvm_clock_get_cycles+0x9/0xb
[  452.064012]  [<ffffffff811d88cd>] __netif_receive_skb+0x3a5/0x3f7
[  452.064012]  [<ffffffff811d8eba>] netif_receive_skb+0x57/0x5e
[  452.064012]  [<ffffffff811cf30f>] ? __netdev_alloc_skb+0x1f/0x3b
[  452.064012]  [<ffffffffa0049126>] virtnet_poll+0x4ba/0x5a4 [virtio_net]
[  452.064012]  [<ffffffff811d9417>] net_rx_action+0x73/0x184
[  452.064012]  [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core]
[  452.064012]  [<ffffffff810343b9>] __do_softirq+0xc3/0x1a8
[  452.064012]  [<ffffffff81013b56>] ? ack_APIC_irq+0x10/0x12
[  452.064012]  [<ffffffff81256559>] ? _raw_spin_lock+0xe/0x10
[  452.064012]  [<ffffffff8125e0ac>] call_softirq+0x1c/0x26
[  452.064012]  [<ffffffff81003587>] do_softirq+0x45/0x82
[  452.064012]  [<ffffffff81034667>] irq_exit+0x42/0x9c
[  452.064012]  [<ffffffff8125e146>] do_IRQ+0x8e/0xa5
[  452.064012]  [<ffffffff8125676e>] common_interrupt+0x6e/0x6e
[  452.064012]  <EOI>  [<ffffffff810b82a1>] ? kfree+0x8a/0xa3
[  452.064012]  [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core]
[  452.064012]  [<ffffffffa01b2c25>] ? l2tp_xmit_skb+0x1dd/0x4ac [l2tp_core]
[  452.064012]  [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp]
[  452.064012]  [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24
[  452.064012]  [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6
[  452.064012]  [<ffffffff81254e88>] ? __schedule+0x5c1/0x616
[  452.064012]  [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c
[  452.064012]  [<ffffffff810bbd21>] ? fget_light+0x75/0x89
[  452.064012]  [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56
[  452.064012]  [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b
[  452.064012]  [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b


Reported-by: François Cachereul <f.cachereul@alphalink.fr>
Tested-by: François Cachereul <f.cachereul@alphalink.fr>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
---
 net/l2tp/l2tp_ppp.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index f0a7ada..ffda81e 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -353,7 +353,9 @@ static int pppol2tp_sendmsg(struct kiocb *iocb, struct socket *sock, struct msgh
 		goto error_put_sess_tun;
 	}
 
+	local_bh_disable();
 	l2tp_xmit_skb(session, skb, session->hdr_len);
+	local_bh_enable();
 
 	sock_put(ps->tunnel_sock);
 	sock_put(sk);
@@ -422,7 +424,9 @@ static int pppol2tp_xmit(struct ppp_channel *chan, struct sk_buff *skb)
 	skb->data[0] = ppph[0];
 	skb->data[1] = ppph[1];
 
+	local_bh_disable();
 	l2tp_xmit_skb(session, skb, session->hdr_len);
+	local_bh_enable();
 
 	sock_put(sk_tun);
 	sock_put(sk);

^ permalink raw reply related

* LOAN OFFER.
From: tenger @ 2013-10-10 12:53 UTC (permalink / raw)
  To: Recipients

Apply for loan at 3% interest rate. Contact us with your requirement via Email; tenger.g@mail.com, for more information. Try us and be convinced.

^ permalink raw reply

* Re: [PATCH net] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Vlad Yasevich @ 2013-10-10 14:11 UTC (permalink / raw)
  To: Fan Du, nhorman, steffen.klassert; +Cc: davem, netdev
In-Reply-To: <1381384296-1821-1-git-send-email-fan.du@windriver.com>

On 10/10/2013 01:51 AM, Fan Du wrote:
> igb/ixgbe have hardware sctp checksum support, when this feature is enabled
> and also IPsec is armed to protect sctp traffic, ugly things happened as
> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum every thing
> up and pack the 16bits result in the checksum field). The result is fail
> establishment of sctp communication.
>
> And I saw another point in this part of code, when IPsec is not armed,
> sctp communication is good, however setting setting CHECKSUM_PARTIAL will
> make xfrm_output compute dummy checksum values which will be overwritten by
> hardware lately.
>
> So this patch try to solve above two issues together.
>
> Signed-off-by: Fan Du <fan.du@windriver.com>
> ---
> note:
>    igb/ixgbe hardware is not handy on my side, so just build test only.
>
> ---
>   net/sctp/output.c |   27 +++++++++++++++++++--------
>   1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index 0ac3a65..f0b9cc5 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -372,6 +372,16 @@ static void sctp_packet_set_owner_w(struct sk_buff *skb, struct sock *sk)
>   	atomic_inc(&sk->sk_wmem_alloc);
>   }
>
> +static int is_xfrm_armed(struct dst_entry *dst)
> +{
> +#ifdef CONFIG_XFRM
> +	/* If dst->xfrm is valid, this skb needs to be transformed */
> +	return dst->xfrm != NULL;
> +#else
> +	return 0;
> +#endif
> +}
> +
>   /* All packets are sent to the network through this function from
>    * sctp_outq_tail().
>    *
> @@ -536,20 +546,21 @@ int sctp_packet_transmit(struct sctp_packet *packet)
>   	 * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
>   	 */
>   	if (!sctp_checksum_disable) {
> -		if (!(dst->dev->features & NETIF_F_SCTP_CSUM)) {
> +		if ((!(dst->dev->features & NETIF_F_SCTP_CSUM)) ||
> +			is_xfrm_armed(dst)) {
> +
>   			__u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
>
>   			/* 3) Put the resultant value into the checksum field in the
>   			 *    common header, and leave the rest of the bits unchanged.
>   			 */
>   			sh->checksum = sctp_end_cksum(crc32);
> -		} else {
> -			/* no need to seed pseudo checksum for SCTP */
> -			nskb->ip_summed = CHECKSUM_PARTIAL;
> -			nskb->csum_start = (skb_transport_header(nskb) -
> -			                    nskb->head);
> -			nskb->csum_offset = offsetof(struct sctphdr, checksum);
> -		}
> +		} else
> +			/* Mark skb as CHECKSUM_UNNECESSARY to let hardware compute
> +			 * the checksum, and also avoid xfrm_output to do unceccessary
> +			 * checksum.
> +			 */
> +			nskb->ip_summed = CHECKSUM_UNNECESSARY;
>   	}

In addition to what Niel said, the use of CHECKSUM_UNNECESSARY is
incorrect as it will cause the nic to not compute the checksum.
The checksum offload depends on the use of CHECKSUM_PARTIAL.

-vlad


>
>   	/* IP layer ECN support
>

^ permalink raw reply

* LOAN OFFER.
From: tenger @ 2013-10-10 12:53 UTC (permalink / raw)
  To: Recipients

Apply for loan at 3% interest rate. Contact us with your requirement via Email; tenger.g@mail.com, for more information. Try us and be convinced.

^ permalink raw reply

* [PATCH net-next v3 0/5] xen-netback: IPv6 offload support
From: Paul Durrant @ 2013-10-10 15:25 UTC (permalink / raw)
  To: xen-devel, netdev

This patch series adds support for checksum and large packet offloads into
xen-netback.
Testing has mainly been done using the Microsoft network hardware
certification suite running in Server 2008R2 VMs with Citrix PV frontends.

v2:
- Fixed Wei's email address in Cc lines

v3:
- Responded to Wei's comments:
 - netif.h now updated with comments and a definition of
   XEN_NETIF_GSO_TYPE_NONE.
 - limited number of pullups
- Responded to Annie's comments:
 - New GSO_BIT macro

^ permalink raw reply

* [PATCH net-next v3 1/5] xen-netback: add support for IPv6 checksum offload to guest
From: Paul Durrant @ 2013-10-10 15:25 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381418733-20470-1-git-send-email-paul.durrant@citrix.com>

Check xenstore flag feature-ipv6-csum-offload to determine if a
guest is happy to accept IPv6 packets with only partial checksum.
Also check analogous feature-ip-csum-offload to determine if a
guest is happy to accept IPv4 packets with only partial checksum
as a replacement for a negated feature-no-csum-offload value and
add a comment to deprecate use of feature-no-csum-offload.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/common.h    |    3 ++-
 drivers/net/xen-netback/interface.c |   10 +++++++---
 drivers/net/xen-netback/xenbus.c    |   24 +++++++++++++++++++++++-
 include/xen/interface/io/netif.h    |   10 ++++++++++
 4 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 5715318..b4a9a3c 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -153,7 +153,8 @@ struct xenvif {
 	u8 can_sg:1;
 	u8 gso:1;
 	u8 gso_prefix:1;
-	u8 csum:1;
+	u8 ip_csum:1;
+	u8 ipv6_csum:1;
 
 	/* Internal feature information. */
 	u8 can_queue:1;	    /* can queue packets for receiver? */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 01bb854..8e92783 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -216,8 +216,10 @@ static netdev_features_t xenvif_fix_features(struct net_device *dev,
 		features &= ~NETIF_F_SG;
 	if (!vif->gso && !vif->gso_prefix)
 		features &= ~NETIF_F_TSO;
-	if (!vif->csum)
+	if (!vif->ip_csum)
 		features &= ~NETIF_F_IP_CSUM;
+	if (!vif->ipv6_csum)
+		features &= ~NETIF_F_IPV6_CSUM;
 
 	return features;
 }
@@ -306,7 +308,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->domid  = domid;
 	vif->handle = handle;
 	vif->can_sg = 1;
-	vif->csum = 1;
+	vif->ip_csum = 1;
 	vif->dev = dev;
 
 	vif->credit_bytes = vif->remaining_credit = ~0UL;
@@ -316,7 +318,9 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->credit_timeout.expires = jiffies;
 
 	dev->netdev_ops	= &xenvif_netdev_ops;
-	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
+	dev->hw_features = NETIF_F_SG |
+		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
+		NETIF_F_TSO;
 	dev->features = dev->hw_features;
 	SET_ETHTOOL_OPS(dev, &xenvif_ethtool_ops);
 
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 1b08d87..99343ca 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -571,10 +571,32 @@ static int connect_rings(struct backend_info *be)
 		val = 0;
 	vif->gso_prefix = !!val;
 
+	/* Before feature-ipv6-csum-offload was introduced, checksum offload
+	 * was turned on by a zero value in (or lack of)
+	 * feature-no-csum-offload. Therefore, for compatibility, assume
+	 * that if feature-ip-csum-offload is missing that IP (v4) offload
+	 * is turned on. If this is not the case then the flag will be
+	 * cleared when we subsequently sample feature-no-csum-offload.
+	 */
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ip-csum-offload",
+			 "%d", &val) < 0)
+		val = 1;
+	vif->ip_csum = !!val;
+
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ipv6-csum-offload",
+			 "%d", &val) < 0)
+		val = 0;
+	vif->ipv6_csum = !!val;
+
+	/* This is deprecated. Frontends should set IP v4 and v6 checksum
+	 * offload using feature-ip-csum-offload and
+	 * feature-ipv6-csum-offload respectively.
+	 */
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-no-csum-offload",
 			 "%d", &val) < 0)
 		val = 0;
-	vif->csum = !val;
+	if (val)
+		vif->ip_csum = 0;
 
 	/* Map the shared frame, irq etc. */
 	err = xenvif_connect(vif, tx_ring_ref, rx_ring_ref,
diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index eb262e3..d9fb44739 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -51,6 +51,16 @@
  */
 
 /*
+ * "feature-no-csum-offload" was used to turn off IPv4 TCP/UDP checksum
+ * offload but is now deprecated. Two new feature flags should now be used
+ * to control checksum offload:
+ * "feature-ip-csum-offload" should be used to turn IPv4 TCP/UDP checksum
+ * offload on or off. If it is missing then the feature is assumed to be on.
+ * "feature-ipv6-csum-offload" should be used to turn IPv6 TCP/UDP checksum
+ * offload on or off. If it is missing then the feature is assumed to be off.
+ */
+
+/*
  * This is the 'wire' format for packets:
  *  Request 1: xen_netif_tx_request  -- XEN_NETTXF_* (any flags)
  * [Request 2: xen_netif_extra_info]    (only if request 1 has XEN_NETTXF_extra_info)
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next v3 3/5] xen-netback: Unconditionally set NETIF_F_RXCSUM
From: Paul Durrant @ 2013-10-10 15:25 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381418733-20470-1-git-send-email-paul.durrant@citrix.com>

There is no mechanism to insist that a guest always generates a packet
with good checksum (at least for IPv4) so we must handle checksum
offloading from the guest and hence should set NETIF_F_RXCSUM.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/interface.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 8e92783..cb0d8ea 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -321,7 +321,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	dev->hw_features = NETIF_F_SG |
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
 		NETIF_F_TSO;
-	dev->features = dev->hw_features;
+	dev->features = dev->hw_features | NETIF_F_RXCSUM;
 	SET_ETHTOOL_OPS(dev, &xenvif_ethtool_ops);
 
 	dev->tx_queue_len = XENVIF_QUEUE_LENGTH;
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next v3 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest
From: Paul Durrant @ 2013-10-10 15:25 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381418733-20470-1-git-send-email-paul.durrant@citrix.com>

This patch adds a xenstore feature flag, festure-gso-tcpv6, to advertise
that netback can handle IPv6 TCP GSO packets. It creates SKB_GSO_TCPV6 skbs
if the frontend passes an extra segment with the new type
XEN_NETIF_GSO_TYPE_TCPV6 added to netif.h.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/netback.c |   11 ++++++++---
 drivers/net/xen-netback/xenbus.c  |    7 +++++++
 include/xen/interface/io/netif.h  |   10 +++++++++-
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 4c12030..afc055c 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1096,15 +1096,20 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
 		return -EINVAL;
 	}
 
-	/* Currently only TCPv4 S.O. is supported. */
-	if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) {
+	switch (gso->u.gso.type) {
+	case XEN_NETIF_GSO_TYPE_TCPV4:
+		skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
+		break;
+	case XEN_NETIF_GSO_TYPE_TCPV6:
+		skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
+		break;
+	default:
 		netdev_err(vif->dev, "Bad GSO type %d.\n", gso->u.gso.type);
 		xenvif_fatal_tx_err(vif);
 		return -EINVAL;
 	}
 
 	skb_shinfo(skb)->gso_size = gso->u.gso.size;
-	skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
 
 	/* Header must be checked, and gso_segs computed. */
 	skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 9c9b37d..7e4dcc9 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -105,6 +105,13 @@ static int netback_probe(struct xenbus_device *dev,
 			goto abort_transaction;
 		}
 
+		err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv6",
+				    "%d", sg);
+		if (err) {
+			message = "writing feature-gso-tcpv6";
+			goto abort_transaction;
+		}
+
 		/* We support partial checksum setup for IPv6 packets */
 		err = xenbus_printf(xbt, dev->nodename,
 				    "feature-ipv6-csum-offload",
diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index d9fb44739..d7dd8d7 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -61,6 +61,13 @@
  */
 
 /*
+ * "feature-gso-tcpv4" and "feature-gso-tcpv6" advertise the capability to
+ * handle large TCP packets (in IPv4 or IPv6 form respectively). Neither
+ * frontends nor backends are assumed to be capable unless the flags are
+ * present.
+ */
+
+/*
  * This is the 'wire' format for packets:
  *  Request 1: xen_netif_tx_request  -- XEN_NETTXF_* (any flags)
  * [Request 2: xen_netif_extra_info]    (only if request 1 has XEN_NETTXF_extra_info)
@@ -105,8 +112,9 @@ struct xen_netif_tx_request {
 #define _XEN_NETIF_EXTRA_FLAG_MORE	(0)
 #define  XEN_NETIF_EXTRA_FLAG_MORE	(1U<<_XEN_NETIF_EXTRA_FLAG_MORE)
 
-/* GSO types - only TCPv4 currently supported. */
+/* GSO types */
 #define XEN_NETIF_GSO_TYPE_TCPV4	(1)
+#define XEN_NETIF_GSO_TYPE_TCPV6	(2)
 
 /*
  * This structure needs to fit within both netif_tx_request and
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next v3 5/5] xen-netback: enable IPv6 TCP GSO to the guest
From: Paul Durrant @ 2013-10-10 15:25 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381418733-20470-1-git-send-email-paul.durrant@citrix.com>

This patch adds code to handle SKB_GSO_TCPV6 skbs and construct appropriate
extra or prefix segments to pass the large packet to the frontend. New
xenstore flags, feature-gso-tcpv6 and feature-gso-tcpv6-prefix, are sampled
to determine if the frontend is capable of handling such packets.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/common.h    |    9 +++++--
 drivers/net/xen-netback/interface.c |    6 +++--
 drivers/net/xen-netback/netback.c   |   48 +++++++++++++++++++++++++++--------
 drivers/net/xen-netback/xenbus.c    |   29 +++++++++++++++++++--
 include/xen/interface/io/netif.h    |    1 +
 5 files changed, 77 insertions(+), 16 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index b4a9a3c..55b8dec 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -87,9 +87,13 @@ struct pending_tx_info {
 struct xenvif_rx_meta {
 	int id;
 	int size;
+	int gso_type;
 	int gso_size;
 };
 
+#define GSO_BIT(type) \
+	(1 << XEN_NETIF_GSO_TYPE_ ## type)
+
 /* Discriminate from any valid pending_idx value. */
 #define INVALID_PENDING_IDX 0xFFFF
 
@@ -150,9 +154,10 @@ struct xenvif {
 	u8               fe_dev_addr[6];
 
 	/* Frontend feature information. */
+	int gso_mask;
+	int gso_prefix_mask;
+
 	u8 can_sg:1;
-	u8 gso:1;
-	u8 gso_prefix:1;
 	u8 ip_csum:1;
 	u8 ipv6_csum:1;
 
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index cb0d8ea..e4aa267 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -214,8 +214,10 @@ static netdev_features_t xenvif_fix_features(struct net_device *dev,
 
 	if (!vif->can_sg)
 		features &= ~NETIF_F_SG;
-	if (!vif->gso && !vif->gso_prefix)
+	if (~(vif->gso_mask | vif->gso_prefix_mask) & GSO_BIT(TCPV4))
 		features &= ~NETIF_F_TSO;
+	if (~(vif->gso_mask | vif->gso_prefix_mask) & GSO_BIT(TCPV6))
+		features &= ~NETIF_F_TSO6;
 	if (!vif->ip_csum)
 		features &= ~NETIF_F_IP_CSUM;
 	if (!vif->ipv6_csum)
@@ -320,7 +322,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	dev->netdev_ops	= &xenvif_netdev_ops;
 	dev->hw_features = NETIF_F_SG |
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
-		NETIF_F_TSO;
+		NETIF_F_TSO | NETIF_F_TSO6;
 	dev->features = dev->hw_features | NETIF_F_RXCSUM;
 	SET_ETHTOOL_OPS(dev, &xenvif_ethtool_ops);
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index afc055c..bb31921 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -140,7 +140,7 @@ static int max_required_rx_slots(struct xenvif *vif)
 	int max = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
 
 	/* XXX FIXME: RX path dependent on MAX_SKB_FRAGS */
-	if (vif->can_sg || vif->gso || vif->gso_prefix)
+	if (vif->can_sg || vif->gso_mask || vif->gso_prefix_mask)
 		max += MAX_SKB_FRAGS + 1; /* extra_info + frags */
 
 	return max;
@@ -312,6 +312,7 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif,
 	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
 
 	meta = npo->meta + npo->meta_prod++;
+	meta->gso_type = 0;
 	meta->gso_size = 0;
 	meta->size = 0;
 	meta->id = req->id;
@@ -334,6 +335,7 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 	struct gnttab_copy *copy_gop;
 	struct xenvif_rx_meta *meta;
 	unsigned long bytes;
+	int gso_type;
 
 	/* Data must not cross a page boundary. */
 	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
@@ -392,7 +394,14 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		}
 
 		/* Leave a gap for the GSO descriptor. */
-		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
+			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+		else
+			gso_type = XEN_NETIF_GSO_TYPE_NONE;
+
+		if (*head && ((1 << gso_type) & vif->gso_mask))
 			vif->rx.req_cons++;
 
 		*head = 0; /* There must be something in this buffer now. */
@@ -423,14 +432,28 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 	unsigned char *data;
 	int head = 1;
 	int old_meta_prod;
+	int gso_type;
+	int gso_size;
 
 	old_meta_prod = npo->meta_prod;
 
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
+		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+		gso_size = skb_shinfo(skb)->gso_size;
+	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
+		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+		gso_size = skb_shinfo(skb)->gso_size;
+	} else {
+		gso_type = 0;
+		gso_size = 0;
+	}
+
 	/* Set up a GSO prefix descriptor, if necessary */
-	if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
+	if ((1 << skb_shinfo(skb)->gso_type) & vif->gso_prefix_mask) {
 		req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
 		meta = npo->meta + npo->meta_prod++;
-		meta->gso_size = skb_shinfo(skb)->gso_size;
+		meta->gso_type = gso_type;
+		meta->gso_size = gso_size;
 		meta->size = 0;
 		meta->id = req->id;
 	}
@@ -438,10 +461,13 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
 	meta = npo->meta + npo->meta_prod++;
 
-	if (!vif->gso_prefix)
-		meta->gso_size = skb_shinfo(skb)->gso_size;
-	else
+	if ((1 << gso_type) & vif->gso_mask) {
+		meta->gso_type = gso_type;
+		meta->gso_size = gso_size;
+	} else {
+		meta->gso_type = 0;
 		meta->gso_size = 0;
+	}
 
 	meta->size = 0;
 	meta->id = req->id;
@@ -587,7 +613,8 @@ void xenvif_rx_action(struct xenvif *vif)
 
 		vif = netdev_priv(skb->dev);
 
-		if (vif->meta[npo.meta_cons].gso_size && vif->gso_prefix) {
+		if ((1 << vif->meta[npo.meta_cons].gso_type) &
+		    vif->gso_prefix_mask) {
 			resp = RING_GET_RESPONSE(&vif->rx,
 						 vif->rx.rsp_prod_pvt++);
 
@@ -624,7 +651,8 @@ void xenvif_rx_action(struct xenvif *vif)
 					vif->meta[npo.meta_cons].size,
 					flags);
 
-		if (vif->meta[npo.meta_cons].gso_size && !vif->gso_prefix) {
+		if ((1 << vif->meta[npo.meta_cons].gso_type) &
+		    vif->gso_mask) {
 			struct xen_netif_extra_info *gso =
 				(struct xen_netif_extra_info *)
 				RING_GET_RESPONSE(&vif->rx,
@@ -632,8 +660,8 @@ void xenvif_rx_action(struct xenvif *vif)
 
 			resp->flags |= XEN_NETRXF_extra_info;
 
+			gso->u.gso.type = vif->meta[npo.meta_cons].gso_type;
 			gso->u.gso.size = vif->meta[npo.meta_cons].gso_size;
-			gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
 			gso->u.gso.pad = 0;
 			gso->u.gso.features = 0;
 
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 7e4dcc9..6a005f1 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -577,15 +577,40 @@ static int connect_rings(struct backend_info *be)
 		val = 0;
 	vif->can_sg = !!val;
 
+	vif->gso_mask = 0;
+	vif->gso_prefix_mask = 0;
+
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4",
 			 "%d", &val) < 0)
 		val = 0;
-	vif->gso = !!val;
+	if (val)
+		vif->gso_mask |= GSO_BIT(TCPV4);
 
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4-prefix",
 			 "%d", &val) < 0)
 		val = 0;
-	vif->gso_prefix = !!val;
+	if (val)
+		vif->gso_prefix_mask |= GSO_BIT(TCPV4);
+
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6",
+			 "%d", &val) < 0)
+		val = 0;
+	if (val)
+		vif->gso_mask |= GSO_BIT(TCPV6);
+
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6-prefix",
+			 "%d", &val) < 0)
+		val = 0;
+	if (val)
+		vif->gso_prefix_mask |= GSO_BIT(TCPV6);
+
+	if (vif->gso_mask & vif->gso_prefix_mask) {
+		xenbus_dev_fatal(dev, err,
+				 "%s: gso and gso prefix flags are not "
+				 "mutually exclusive",
+				 dev->otherend);
+		return -EOPNOTSUPP;
+	}
 
 	/* Before feature-ipv6-csum-offload was introduced, checksum offload
 	 * was turned on by a zero value in (or lack of)
diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index d7dd8d7..41674bc 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -113,6 +113,7 @@ struct xen_netif_tx_request {
 #define  XEN_NETIF_EXTRA_FLAG_MORE	(1U<<_XEN_NETIF_EXTRA_FLAG_MORE)
 
 /* GSO types */
+#define XEN_NETIF_GSO_TYPE_NONE         (0)
 #define XEN_NETIF_GSO_TYPE_TCPV4	(1)
 #define XEN_NETIF_GSO_TYPE_TCPV6	(2)
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next v3 2/5] xen-netback: add support for IPv6 checksum offload from guest
From: Paul Durrant @ 2013-10-10 15:25 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381418733-20470-1-git-send-email-paul.durrant@citrix.com>

For performance of VM to VM traffic on a single host it is better to avoid
calculation of TCP/UDP checksum in the sending frontend. To allow this this
patch adds the code necessary to set up partial checksum for IPv6 packets
and xenstore flag feature-ipv6-csum-offload to advertise that fact to
frontends.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/netback.c |  243 +++++++++++++++++++++++++++++++------
 drivers/net/xen-netback/xenbus.c  |    9 ++
 2 files changed, 213 insertions(+), 39 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index f3e591c..4c12030 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -109,15 +109,10 @@ static inline unsigned long idx_to_kaddr(struct xenvif *vif,
 	return (unsigned long)pfn_to_kaddr(idx_to_pfn(vif, idx));
 }
 
-/*
- * This is the amount of packet we copy rather than map, so that the
- * guest can't fiddle with the contents of the headers while we do
- * packet processing on them (netfilter, routing, etc).
+/* This is a miniumum size for the linear area to avoid lots of
+ * calls to __pskb_pull_tail() as we set up checksum offsets.
  */
-#define PKT_PROT_LEN    (ETH_HLEN + \
-			 VLAN_HLEN + \
-			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
-			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
+#define PKT_PROT_LEN 128
 
 static u16 frag_get_pending_idx(skb_frag_t *frag)
 {
@@ -1118,61 +1113,74 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
 	return 0;
 }
 
-static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
+static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
+{
+	if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
+		/* If we need to pullup then pullup to the max, so we
+		 * won't need to do it again.
+		 */
+		int target = min_t(int, skb->len, MAX_TCP_HEADER);
+		__pskb_pull_tail(skb, target - skb_headlen(skb));
+	}
+}
+
+static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
+			     int recalculate_partial_csum)
 {
-	struct iphdr *iph;
+	struct iphdr *iph = (void *)skb->data;
+	unsigned int header_size;
+	unsigned int off;
 	int err = -EPROTO;
-	int recalculate_partial_csum = 0;
 
-	/*
-	 * A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
-	 * peers can fail to set NETRXF_csum_blank when sending a GSO
-	 * frame. In this case force the SKB to CHECKSUM_PARTIAL and
-	 * recalculate the partial checksum.
-	 */
-	if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
-		vif->rx_gso_checksum_fixup++;
-		skb->ip_summed = CHECKSUM_PARTIAL;
-		recalculate_partial_csum = 1;
-	}
+	off = sizeof(struct iphdr);
 
-	/* A non-CHECKSUM_PARTIAL SKB does not require setup. */
-	if (skb->ip_summed != CHECKSUM_PARTIAL)
-		return 0;
+	header_size = skb->network_header + off + MAX_IPOPTLEN;
+	maybe_pull_tail(skb, header_size);
 
-	if (skb->protocol != htons(ETH_P_IP))
-		goto out;
+	off = iph->ihl * 4;
 
-	iph = (void *)skb->data;
 	switch (iph->protocol) {
 	case IPPROTO_TCP:
-		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
+		if (!skb_partial_csum_set(skb, off,
 					  offsetof(struct tcphdr, check)))
 			goto out;
 
 		if (recalculate_partial_csum) {
 			struct tcphdr *tcph = tcp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct tcphdr);
+			maybe_pull_tail(skb, header_size);
+
 			tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-							 skb->len - iph->ihl*4,
+							 skb->len - off,
 							 IPPROTO_TCP, 0);
 		}
 		break;
 	case IPPROTO_UDP:
-		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
+		if (!skb_partial_csum_set(skb, off,
 					  offsetof(struct udphdr, check)))
 			goto out;
 
 		if (recalculate_partial_csum) {
 			struct udphdr *udph = udp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct udphdr);
+			maybe_pull_tail(skb, header_size);
+
 			udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-							 skb->len - iph->ihl*4,
+							 skb->len - off,
 							 IPPROTO_UDP, 0);
 		}
 		break;
 	default:
 		if (net_ratelimit())
 			netdev_err(vif->dev,
-				   "Attempting to checksum a non-TCP/UDP packet, dropping a protocol %d packet\n",
+				   "Attempting to checksum a non-TCP/UDP packet, "
+				   "dropping a protocol %d packet\n",
 				   iph->protocol);
 		goto out;
 	}
@@ -1183,6 +1191,168 @@ out:
 	return err;
 }
 
+static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
+			       int recalculate_partial_csum)
+{
+	int err = -EPROTO;
+	struct ipv6hdr *ipv6h = (void *)skb->data;
+	u8 nexthdr;
+	unsigned int header_size;
+	unsigned int off;
+	bool done;
+	bool found;
+
+	done = false;
+	found = false;
+
+	off = sizeof(struct ipv6hdr);
+
+	header_size = skb->network_header + off;
+	maybe_pull_tail(skb, header_size);
+
+	nexthdr = ipv6h->nexthdr;
+
+	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len)) &&
+	       !done) {
+		switch (nexthdr) {
+		case IPPROTO_FRAGMENT: {
+			struct frag_hdr *hp = (void *)(skb->data + off);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct frag_hdr);
+			maybe_pull_tail(skb, header_size);
+
+			nexthdr = hp->nexthdr;
+			off += 8;
+			break;
+		}
+		case IPPROTO_DSTOPTS:
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING: {
+			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct ipv6_opt_hdr);
+			maybe_pull_tail(skb, header_size);
+
+			nexthdr = hp->nexthdr;
+			off += ipv6_optlen(hp);
+			break;
+		}
+		case IPPROTO_AH: {
+			struct ip_auth_hdr *hp = (void *)(skb->data + off);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct ip_auth_hdr);
+			maybe_pull_tail(skb, header_size);
+
+			nexthdr = hp->nexthdr;
+			off += (hp->hdrlen+2)<<2;
+			break;
+		}
+		case IPPROTO_TCP:
+		case IPPROTO_UDP:
+			found = true;
+			/* Fall through */
+		default:
+			done = true;
+			break;
+		}
+	}
+
+	if (!done) {
+		if (net_ratelimit())
+			netdev_err(vif->dev, "Failed to parse packet header\n");
+		goto out;
+	}
+
+	if (!found) {
+		if (net_ratelimit())
+			netdev_err(vif->dev,
+				   "Attempting to checksum a non-TCP/UDP packet, "
+				   "dropping a protocol %d packet\n",
+				   nexthdr);
+		goto out;
+	}
+
+	switch (nexthdr) {
+	case IPPROTO_TCP:
+		if (!skb_partial_csum_set(skb, off,
+					  offsetof(struct tcphdr, check)))
+			goto out;
+
+		if (recalculate_partial_csum) {
+			struct tcphdr *tcph = tcp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct tcphdr);
+			maybe_pull_tail(skb, header_size);
+
+			tcph->check = ~csum_ipv6_magic(&ipv6h->saddr,
+						       &ipv6h->daddr,
+						       skb->len - off,
+						       IPPROTO_TCP, 0);
+		}
+		break;
+	case IPPROTO_UDP:
+		if (!skb_partial_csum_set(skb, off,
+					  offsetof(struct udphdr, check)))
+			goto out;
+
+		if (recalculate_partial_csum) {
+			struct udphdr *udph = udp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct udphdr);
+			maybe_pull_tail(skb, header_size);
+
+			udph->check = ~csum_ipv6_magic(&ipv6h->saddr,
+						       &ipv6h->daddr,
+						       skb->len - off,
+						       IPPROTO_UDP, 0);
+		}
+		break;
+	}
+
+	err = 0;
+
+out:
+	return err;
+}
+
+static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
+{
+	int err = -EPROTO;
+	int recalculate_partial_csum = 0;
+
+	/* A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
+	 * peers can fail to set NETRXF_csum_blank when sending a GSO
+	 * frame. In this case force the SKB to CHECKSUM_PARTIAL and
+	 * recalculate the partial checksum.
+	 */
+	if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
+		vif->rx_gso_checksum_fixup++;
+		skb->ip_summed = CHECKSUM_PARTIAL;
+		recalculate_partial_csum = 1;
+	}
+
+	/* A non-CHECKSUM_PARTIAL SKB does not require setup. */
+	if (skb->ip_summed != CHECKSUM_PARTIAL)
+		return 0;
+
+	if (skb->protocol == htons(ETH_P_IP))
+		err = checksum_setup_ip(vif, skb, recalculate_partial_csum);
+	else if (skb->protocol == htons(ETH_P_IPV6))
+		err = checksum_setup_ipv6(vif, skb, recalculate_partial_csum);
+
+	return err;
+}
+
 static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 {
 	unsigned long now = jiffies;
@@ -1428,12 +1598,7 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
 
 		xenvif_fill_frags(vif, skb);
 
-		/*
-		 * If the initial fragment was < PKT_PROT_LEN then
-		 * pull through some bytes from the other fragments to
-		 * increase the linear region to PKT_PROT_LEN bytes.
-		 */
-		if (skb_headlen(skb) < PKT_PROT_LEN && skb_is_nonlinear(skb)) {
+		if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {
 			int target = min_t(int, skb->len, PKT_PROT_LEN);
 			__pskb_pull_tail(skb, target - skb_headlen(skb));
 		}
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 99343ca..9c9b37d 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -105,6 +105,15 @@ static int netback_probe(struct xenbus_device *dev,
 			goto abort_transaction;
 		}
 
+		/* We support partial checksum setup for IPv6 packets */
+		err = xenbus_printf(xbt, dev->nodename,
+				    "feature-ipv6-csum-offload",
+				    "%d", 1);
+		if (err) {
+			message = "writing feature-ipv6-csum-offload";
+			goto abort_transaction;
+		}
+
 		/* We support rx-copy path. */
 		err = xenbus_printf(xbt, dev->nodename,
 				    "feature-rx-copy", "%d", 1);
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH RFC 50/77] mlx5: Update MSI/MSI-X interrupts enablement code
From: Eli Cohen @ 2013-10-10 15:29 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Bjorn Helgaas, Ralf Baechle,
	Michael Ellerman, Benjamin Herrenschmidt, Martin Schwidefsky,
	Ingo Molnar, Tejun Heo, Dan Williams, Andy King, Jon Mason,
	Matt Porter, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-mips-6z/3iImG2C8G8FEW9MqTrA,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux390-tA70FqPdS9bQT0dZR+AlfA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, x86-DgEjT+Ai2ygdnm+yROfE0A,
	linux-ide-u79uwXL29TY76Z2rM5mHXA, iss_storagedev-VXdhtT5mjnY,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	e1000-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-driver-h88ZbnxC6KDQT0dZR+AlfA, Solarflare linux maintainers,
	"VMware, Inc." <pv-dr
In-Reply-To: <20131003194837.GA27636-hdGaXg0bp3uRXgp2RCiI5R/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>

On Thu, Oct 03, 2013 at 09:48:39PM +0200, Alexander Gordeev wrote:
> 
> pci_enable_msix() may fail, but it can not return a positive number.
>

That is true according to the current logic but the comment on top of
pci_enable_msix() still says:
"A return of < 0 indicates a failure.  Or a return of > 0 indicates
that driver request is exceeding the number of irqs or MSI-X vectors
available"

So you're counting on an implementation that may change in the future.
I think leaving the code as it is now is safer.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next] tcp: tcp_transmit_skb() optimizations
From: Eric Dumazet @ 2013-10-10 15:43 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, MF Nowlan, Yuchung Cheng, Neal Cardwell

From: Eric Dumazet <edumazet@google.com>

1) We need to take a timestamp only for skb that should be cloned.

Other skbs are not in write queue and no rtt estimation is done on them.

2) the unlikely() hint is wrong for receivers (they send pure ACK)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: MF Nowlan <fitz@cs.yale.edu>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_output.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index faec813..c40f608 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -850,15 +850,15 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 
 	BUG_ON(!skb || !tcp_skb_pcount(skb));
 
-	/* If congestion control is doing timestamping, we must
-	 * take such a timestamp before we potentially clone/copy.
-	 */
-	if (icsk->icsk_ca_ops->flags & TCP_CONG_RTT_STAMP)
-		__net_timestamp(skb);
-
-	if (likely(clone_it)) {
+	if (clone_it) {
 		const struct sk_buff *fclone = skb + 1;
 
+		/* If congestion control is doing timestamping, we must
+		 * take such a timestamp before we potentially clone/copy.
+		 */
+		if (icsk->icsk_ca_ops->flags & TCP_CONG_RTT_STAMP)
+			__net_timestamp(skb);
+
 		if (unlikely(skb->fclone == SKB_FCLONE_ORIG &&
 			     fclone->fclone == SKB_FCLONE_CLONE))
 			NET_INC_STATS_BH(sock_net(sk),

^ permalink raw reply related

* Re: [PATCH net-next] tcp: tcp_transmit_skb() optimizations
From: Yuchung Cheng @ 2013-10-10 15:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, MF Nowlan, Neal Cardwell
In-Reply-To: <1381419780.4971.84.camel@edumazet-glaptop.roam.corp.google.com>

On Thu, Oct 10, 2013 at 8:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> 1) We need to take a timestamp only for skb that should be cloned.
>
> Other skbs are not in write queue and no rtt estimation is done on them.
>
> 2) the unlikely() hint is wrong for receivers (they send pure ACK)
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: MF Nowlan <fitz@cs.yale.edu>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
Acked-By: Yuchung Cheng <ycheng@google.com>
> ---
>  net/ipv4/tcp_output.c |   14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index faec813..c40f608 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -850,15 +850,15 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
>
>         BUG_ON(!skb || !tcp_skb_pcount(skb));
>
> -       /* If congestion control is doing timestamping, we must
> -        * take such a timestamp before we potentially clone/copy.
> -        */
> -       if (icsk->icsk_ca_ops->flags & TCP_CONG_RTT_STAMP)
> -               __net_timestamp(skb);
> -
> -       if (likely(clone_it)) {
> +       if (clone_it) {
>                 const struct sk_buff *fclone = skb + 1;
>
> +               /* If congestion control is doing timestamping, we must
> +                * take such a timestamp before we potentially clone/copy.
> +                */
> +               if (icsk->icsk_ca_ops->flags & TCP_CONG_RTT_STAMP)
> +                       __net_timestamp(skb);
> +
>                 if (unlikely(skb->fclone == SKB_FCLONE_ORIG &&
>                              fclone->fclone == SKB_FCLONE_CLONE))
>                         NET_INC_STATS_BH(sock_net(sk),
>
>

^ permalink raw reply

* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: H. Peter Anvin @ 2013-10-10 16:28 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Benjamin Herrenschmidt, linux-kernel, Bjorn Helgaas, Ralf Baechle,
	Michael Ellerman, Martin Schwidefsky, Ingo Molnar, Tejun Heo,
	Dan Williams, Andy King, Jon Mason, Matt Porter, linux-pci,
	linux-mips, linuxppc-dev, linux390, linux-s390, x86, linux-ide,
	iss_storagedev, linux-nvme, linux-rdma, netdev, e1000-devel,
	linux-dr
In-Reply-To: <20131010101704.GC11874@dhcp-26-207.brq.redhat.com>

On 10/10/2013 03:17 AM, Alexander Gordeev wrote:
> On Wed, Oct 09, 2013 at 03:24:08PM +1100, Benjamin Herrenschmidt wrote:
> 
> Ok, this suggestion sounded in one or another form by several people.
> What about name it pcim_enable_msix_range() and wrap in couple more
> helpers to complete an API:
> 
> int pcim_enable_msix_range(pdev, msix_entries, nvec, minvec);
> 	<0 - error code
> 	>0 - number of MSIs allocated, where minvec >= result <= nvec
> 
> int pcim_enable_msix(pdev, msix_entries, nvec);
> 	<0 - error code
> 	>0 - number of MSIs allocated, where 1 >= result <= nvec 
> 
> int pcim_enable_msix_exact(pdev, msix_entries, nvec);
> 	<0 - error code
> 	>0 - number of MSIs allocated, where result == nvec
> 
> The latter's return value seems odd, but I can not help to make
> it consistent with the first two.
> 

Is there a reason for the wrappers, as opposed to just specifying either
1 or nvec as the minimum?

	-hpa

^ permalink raw reply

* [PATCH] ipv6: Initialize ip6_tnl.hlen in gre tunnel even if no route is found
From: Oussama Ghorbel @ 2013-10-10 17:50 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, linux-kernel, Hannes Frederic Sowa, Oussama Ghorbel

The ip6_tnl.hlen (gre and ipv6 headers length) is independent from the
outgoing interface, so it would be better to initialize it even when no
route is found, otherwise its value will be zero.
While I'm not sure if this could happen in real life, but doing that
will avoid to call the skb_push function with a zero in ip6gre_header
function.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
---
 net/ipv6/ip6_gre.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 90747f1..8e4d42e 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -978,6 +978,7 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu)
 		if (t->parms.o_flags&GRE_SEQ)
 			addend += 4;
 	}
+	t->hlen = addend;
 
 	if (p->flags & IP6_TNL_F_CAP_XMIT) {
 		int strict = (ipv6_addr_type(&p->raddr) &
@@ -1004,8 +1005,6 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu)
 		}
 		ip6_rt_put(rt);
 	}
-
-	t->hlen = addend;
 }
 
 static int ip6gre_tnl_change(struct ip6_tnl *t,
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH] carl9170: fix leaks at failure path in carl9170_usb_probe()
From: John W. Linville @ 2013-10-10 17:59 UTC (permalink / raw)
  To: Alexey Khoroshilov
  Cc: Fabio Estevam, Christian Lamparter, linux-wireless,
	netdev@vger.kernel.org, linux-kernel, ldv-project
In-Reply-To: <52466624.4040106@ispras.ru>

On Sat, Sep 28, 2013 at 01:16:20AM -0400, Alexey Khoroshilov wrote:
> On 28.09.2013 00:17, Fabio Estevam wrote:
> >On Sat, Sep 28, 2013 at 12:51 AM, Alexey Khoroshilov
> ><khoroshilov@ispras.ru> wrote:
> >
> >>-       return request_firmware_nowait(THIS_MODULE, 1, CARL9170FW_NAME,
> >>+       err = request_firmware_nowait(THIS_MODULE, 1, CARL9170FW_NAME,
> >>                 &ar->udev->dev, GFP_KERNEL, ar, carl9170_usb_firmware_step2);
> >>+       if (err) {
> >>+               usb_put_dev(udev);
> >>+               usb_put_dev(udev);
> >You are doing the same free twice.
> Yes, because it was get twice.
> >I guess you meant to also free: usb_put_dev(ar->udev)
> udev and ar->udev are equal, so technically the patch is correct.
> 
> I agree that there is some inconsistency, but I would prefer to fix
> it at usb_get_dev() side with a comment about reasons for the double
> get.

What is the reason for the double get?

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Alexander Gordeev @ 2013-10-10 18:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Benjamin Herrenschmidt, linux-kernel, Bjorn Helgaas, Ralf Baechle,
	Michael Ellerman, Martin Schwidefsky, Ingo Molnar, Tejun Heo,
	Dan Williams, Andy King, Jon Mason, Matt Porter, linux-pci,
	linux-mips, linuxppc-dev, linux390, linux-s390, x86, linux-ide,
	iss_storagedev, linux-nvme, linux-rdma, netdev, e1000-devel,
	linux-dr
In-Reply-To: <5256D5AB.4050105@zytor.com>

On Thu, Oct 10, 2013 at 09:28:27AM -0700, H. Peter Anvin wrote:
> On 10/10/2013 03:17 AM, Alexander Gordeev wrote:
> > On Wed, Oct 09, 2013 at 03:24:08PM +1100, Benjamin Herrenschmidt wrote:
> > 
> > Ok, this suggestion sounded in one or another form by several people.
> > What about name it pcim_enable_msix_range() and wrap in couple more
> > helpers to complete an API:
> > 
> > int pcim_enable_msix_range(pdev, msix_entries, nvec, minvec);
> > 	<0 - error code
> > 	>0 - number of MSIs allocated, where minvec >= result <= nvec
> > 
> > int pcim_enable_msix(pdev, msix_entries, nvec);
> > 	<0 - error code
> > 	>0 - number of MSIs allocated, where 1 >= result <= nvec 
> > 
> > int pcim_enable_msix_exact(pdev, msix_entries, nvec);
> > 	<0 - error code
> > 	>0 - number of MSIs allocated, where result == nvec
> > 
> > The latter's return value seems odd, but I can not help to make
> > it consistent with the first two.
> > 
> 
> Is there a reason for the wrappers, as opposed to just specifying either
> 1 or nvec as the minimum?

The wrappers are more handy IMO.

I.e. can imagine people start struggling to figure out what minvec to provide:
1 or 0? Why 1? Oh.. okay.. Or should we tolerate 0 (as opposite to -ERANGE)?

Well, do not know.. pcim_enable_msix(pdev, msix_entries, nvec, nvec) is 
less readable for me than just pcim_enable_msix_exact(pdev, msix_entries,
nvec).

> 	-hpa

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply

* Re: [PATCH] carl9170: fix leaks at failure path in carl9170_usb_probe()
From: Christian Lamparter @ 2013-10-10 18:17 UTC (permalink / raw)
  To: John W. Linville
  Cc: Alexey Khoroshilov, Fabio Estevam, linux-wireless,
	netdev@vger.kernel.org, linux-kernel, ldv-project
In-Reply-To: <20131010175952.GG2691@tuxdriver.com>

On Thursday, October 10, 2013 01:59:52 PM John W. Linville wrote:
> On Sat, Sep 28, 2013 at 01:16:20AM -0400, Alexey Khoroshilov wrote:
> > On 28.09.2013 00:17, Fabio Estevam wrote:
> > >On Sat, Sep 28, 2013 at 12:51 AM, Alexey Khoroshilov
> > ><khoroshilov@ispras.ru> wrote:
> > >
> > >>-       return request_firmware_nowait(THIS_MODULE, 1, CARL9170FW_NAME,
> > >>+       err = request_firmware_nowait(THIS_MODULE, 1, CARL9170FW_NAME,
> > >>                 &ar->udev->dev, GFP_KERNEL, ar, carl9170_usb_firmware_step2);
> > >>+       if (err) {
> > >>+               usb_put_dev(udev);
> > >>+               usb_put_dev(udev);
> > >You are doing the same free twice.
> > Yes, because it was get twice.
> > >I guess you meant to also free: usb_put_dev(ar->udev)
> > udev and ar->udev are equal, so technically the patch is correct.
> > 
> > I agree that there is some inconsistency, but I would prefer to fix
> > it at usb_get_dev() side with a comment about reasons for the double
> > get.
> 
> What is the reason for the double get?

The idea is:
One (extra) reference protects the asynchronous firmware loader callback
from disappearing "udev".

Regards,
Chr
 
 

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: fix vport-netdev unregister
From: Pravin Shelar @ 2013-10-10 18:21 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Jesse Gross, Jiri Pirko, dev@openvswitch.org,
	netdev
In-Reply-To: <CAMEtUuzP1b_VNqP1texkrQ7kvCRxEukUoRz=Dj-pMX9gsO9AWw@mail.gmail.com>

On Wed, Oct 9, 2013 at 11:26 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Wed, Oct 9, 2013 at 11:07 PM, Pravin Shelar <pshelar@nicira.com> wrote:
>> On Wed, Oct 9, 2013 at 9:11 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>> On Wed, Oct 9, 2013 at 8:02 PM, Pravin Shelar <pshelar@nicira.com> wrote:
>>>> On Tue, Oct 8, 2013 at 8:07 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>>>> The combination of two commits
>>>>>
>>>>> commit 8e4e1713e4
>>>>> ("openvswitch: Simplify datapath locking.")
>>>>>
>>>>> and
>>>>>
>>>>> commit 2537b4dd0a
>>>>> ("openvswitch:: link upper device for port devices")
>>>>>
>>>>> introduced a bug where upper_dev wasn't unlinked upon
>>>>> netdev_unregister notification
>>>>>
>>>>> The following steps:
>>>>>
>>>>>   modprobe openvswitch
>>>>>   ovs-dpctl add-dp test
>>>>>   ip tuntap add dev tap1 mode tap
>>>>>   ovs-dpctl add-if test tap1
>>>>>   ip tuntap del dev tap1 mode tap
>>>>>
>>>>> are causing multiple warnings:
>>>>> diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
>>>>> index c323567..e9380bd 100644
>>>>> --- a/net/openvswitch/dp_notify.c
>>>>> +++ b/net/openvswitch/dp_notify.c
>>>>> @@ -88,6 +88,11 @@ static int dp_device_event(struct notifier_block *unused, unsigned long event,
>>>>>                 return NOTIFY_DONE;
>>>>>
>>>>>         if (event == NETDEV_UNREGISTER) {
>>>>> +               /* rx_handler_unregister and upper_dev_unlink immediately */
>>>>> +               if (dev->reg_state == NETREG_UNREGISTERING)
>>>>> +                       ovs_netdev_unlink_dev(vport);
>>>>> +
>>>>
>>>> Rather than doing vport destroy here, we can just unlink upper device
>>>> and let workq do rest of work.
>>>
>>> isn't it what it's doing?
>>
>> I meant just call netdev_upper_dev_unlink() here in event handler and
>> rest of vport destroy can be done in workq.
>
> netdev_upper_dev_unlink() without netdev_rx_handler_unregister() ?!
> that's dangerous.
why is it dangerous? ovs already had ref to net-device.

> If that is acceptable, then there was no reason to link them in the first place.
>
I do not see any harm in linking device hierarchy for ovs.

> notifier asks to unregister. imo the only acceptable deferred task
> here is to delay dev_put,
> since ovs structures are still referring to it.

^ permalink raw reply

* Re: [PATCH net-next] inet: rename ir_loc_port to ir_num
From: David Miller @ 2013-10-10 18:39 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1381388677.4971.65.camel@edumazet-glaptop.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 10 Oct 2013 00:04:37 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> In commit 634fb979e8f ("inet: includes a sock_common in request_sock")
> I forgot that the two ports in sock_common do not have same byte order :
> 
> skc_dport is __be16 (network order), but skc_num is __u16 (host order)
> 
> So sparse complains because ir_loc_port (mapped into skc_num) is
> considered as __u16 while it should be __be16
> 
> Let rename ir_loc_port to ireq->ir_num (analogy with inet->inet_num),
> and perform appropriate htons/ntohs conversions.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Wu Fengguang <fengguang.wu@intel.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH v2 tip/core/rcu 07/13] ipv6/ip6_tunnel: Apply rcu_access_pointer() to avoid sparse false positive
From: Paul E. McKenney @ 2013-10-10 19:05 UTC (permalink / raw)
  To: Eric Dumazet, Josh Triplett, linux-kernel, mingo, laijs, dipankar,
	akpm, mathieu.desnoyers, niv, tglx, peterz, rostedt, dhowells,
	edumazet, darren, fweisbec, sbw, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, netdev
In-Reply-To: <20131010020422.GB24368@order.stressinduktion.org>

On Thu, Oct 10, 2013 at 04:04:22AM +0200, Hannes Frederic Sowa wrote:
> On Wed, Oct 09, 2013 at 05:28:33PM -0700, Paul E. McKenney wrote:
> > On Wed, Oct 09, 2013 at 05:12:40PM -0700, Eric Dumazet wrote:
> > > On Wed, 2013-10-09 at 16:40 -0700, Josh Triplett wrote:
> > > 
> > > > that.  Constructs like list_del_rcu are much clearer, and not
> > > > open-coded.  Open-coding synchronization code is almost always a Bad
> > > > Idea.
> > > 
> > > OK, so you think there is synchronization code.
> > > 
> > > I will shut up then, no need to waste time.
> > 
> > As you said earlier, we should at least get rid of the memory barrier
> > as long as we are changing the code.
> 
> Interesting thread!
> 
> Sorry to chime in and asking a question:
> 
> Why do we need an ACCESS_ONCE here if rcu_assign_pointer can do without one?
> In other words I wonder why rcu_assign_pointer is not a static inline function
> to use the sequence point in argument evaluation (if I remember correctly this
> also holds for inline functions) to not allow something like this:
> 
> E.g. we want to publish which lock to take first to prevent an ABBA problem
> (extreme example):
> 
> rcu_assign_pointer(lockptr, min(lptr1, lptr2));
> 
> Couldn't a compiler spill the lockptr memory location as a temporary buffer
> if the compiler is under register pressure? (yes, this seems unlikely if we
> flushed out most registers to memory because of the barrier, but still... ;) )
> 
> This seems to be also the case if we publish a multi-dereferencing pointers
> e.g. ptr->ptr->ptr.

IIRC, sequence points only confine volatile accesses.  For non-volatile
accesses, the so-called "as-if rule" allows compiler writers to do some
surprisingly global reordering.

The reason that rcu_assign_pointer() isn't an inline function is because
it needs to be type-generic, in other words, it needs to be OK to use
it on any type of pointers as long as the C types of the two pointers
match (the sparse types can vary a bit).

One of the reasons for wanting a volatile cast in rcu_assign_pointer() is
to prevent compiler mischief such as you described in your last two
paragraphs.  That said, it would take a very brave compiler to pull
a pointer-referenced memory location into a register and keep it there.
Unfortunately, increasing compiler bravery seems to be a solid long-term
trend.

							Thanx, Paul

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox