Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: 8% performance improved by change tap interact with kernel stack
From: Qin Chuanyu @ 2014-01-29  7:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: jasowang, Michael S. Tsirkin, Anthony Liguori, KVM list, netdev,
	Peter Klausler
In-Reply-To: <1390920560.28432.8.camel@edumazet-glaptop2.roam.corp.google.com>

On 2014/1/28 22:49, Eric Dumazet wrote:
> On Tue, 2014-01-28 at 16:14 +0800, Qin Chuanyu wrote:
>> according perf test result，I found that there are 5%-8% cpu cost on
>> softirq by use netif_rx_ni called in tun_get_user.
>>
>> so I changed the function which cause skb transmitted more quickly.
>> from
>> 	tun_get_user	->
>> 		 netif_rx_ni(skb);
>> to
>> 	tun_get_user	->
>> 		rcu_read_lock_bh();
>> 		netif_receive_skb(skb);
>> 		rcu_read_unlock_bh();
>
> No idea why you use rcu here ?

In my first version, I forgot to add lock when called netif_receive_skb
then I met a dad spinlock when using tcpdump.

tcpdump receive skb in netif_receive_skb but also in dev_queue_xmit.
and I have notice dev_queue_xmit add rcu_read_lock_bh before 
transmitting skb, and this lock avoid race between softirq and transmit 
thread.
	/* Disable soft irqs for various locks below. Also
	 * stops preemption for RCU.
	 */
	rcu_read_lock_bh();
Now I try to xmit skb in vhost thread, so I did the same thing.

^ permalink raw reply

* Re: [PATCH v2] tun: add device name(iff) field to proc fdinfo entry
From: David Miller @ 2014-01-29  7:19 UTC (permalink / raw)
  To: yamato; +Cc: netdev
In-Reply-To: <1390973614-3929-1-git-send-email-yamato@redhat.com>

From: Masatake YAMATO <yamato@redhat.com>
Date: Wed, 29 Jan 2014 14:33:34 +0900

>     (v2): indent iff just like the other fdinfo fields are.
>           Suggested by David Miller <davem@davemloft.net>.

Please fix the build warnings generated by your changes:

drivers/net/tun.c: In function ‘tun_chr_show_fdinfo’:
drivers/net/tun.c:2237:6: warning: unused variable ‘ret’ [-Wunused-variable]

^ permalink raw reply

* Re: [PATCH stable 3.11+] can: bcm: add skb destructor
From: Andre Naujoks @ 2014-01-29  7:40 UTC (permalink / raw)
  To: Eric Dumazet, Oliver Hartkopp; +Cc: David Miller, Linux Netdev List
In-Reply-To: <1390953066.28432.26.camel@edumazet-glaptop2.roam.corp.google.com>

On 29.01.2014 00:51, schrieb Eric Dumazet:
> On Tue, 2014-01-28 at 23:49 +0100, Oliver Hartkopp wrote:
> 
>> The sbk->sk reference is used to make sure in AF_CAN to identify the
>> originating socket (if any) to not deliver echoed CAN frames to the
>> originating application.
>>
>> See first check in raw_rcv() in net/can/raw.c
> 
> Nice, this is buggy.
> 
>>
>>>
>>> Instead of understanding the issue, it seems this patch exactly shutup
>>> the useful warning.
>>
>> I would have been happy to have this a warning and not a bug as you
>> implemented it.
>>
> 
> Yes, I understand you are not happy of our work to discover CAN bugs.
> 
>> I don't need this warning as I'm using skb_alloc in the cases where CAN frames
>> are generated autonomously. They are not triggered through a direct socket
>> write operation nor do they need to take case about any sock wmem.
>>
>> The useful warning/bug might be nice for common use cases. I'm using plain
>> skb_alloc here for fire-and-forget skbs.
>>
>> So I need to shutup the useful warning or revert the two commits at
>> skb_orphan(). I would prefer the latter.
>>
>>>
>>> If you set skb->sk, then you expect a future reader of skb->sk to be
>>> 100% sure the socket did not disappear.
>>
>> It's a fire-and-forget skb. I don't need to care if the socket disappears.
>> If it disappears no new traffic is generated. That's enough.
>>
>>>
>>> I do not see this explained in the changelog.
>>>
>>
>> I hopefully was able to make it more clearly.
>> See Documentation/networking/can.txt
>>
> 
> 
> Just take a reference on the damn socket, and we do not have to worry.
> 
> bcm_tx_send() suffers from the same problem
> 
> can_send() is buggy as well :
> 
> newskb->sk = skb->sk; // line 293
> 
> dev_queue_xmit() can queue a packet a long time, and some packet qdisc
> even look at skb->sk.
> 
> So this is really wrong to assume only net/can can assume things about
> skb->sk, and not care of net/core or net/sched users.
> 
> I absolutely disagree with your patch. You need quite different _real_
> fixes.

Hi.

Even if this is a bug in the CAN BCM implementation. Your "fix" just
enabled a user space application to shut down any machine with a kernel
containing the BUG_ON patch.

If the BCM implementation is broken, it needs to be fixed. But this is a
regression that causes Kernel crashes, where there were none before.

As I am using the BCM, I would rather have a flawed but working
implementation than an unusable one. If the empty socket destructor
enables the system to work again, then I would like to see it. But, like
Oliver, I would prefer the BUG_ON patch reverted at least until this
issue is resolved.

Regards
  Andre

> 
> 
> 

^ permalink raw reply

* Re: 8% performance improved by change tap interact with kernel stack
From: Qin Chuanyu @ 2014-01-29  7:41 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: jasowang, Anthony Liguori, KVM list, netdev
In-Reply-To: <20140128103325.GA17794@redhat.com>

On 2014/1/28 18:33, Michael S. Tsirkin wrote:

>>> Nice.
>>> What about CPU utilization?
>>> It's trivially easy to speed up networking by
>>> burning up a lot of CPU so we must make sure it's
>>> not doing that.
>>> And I think we should see some tests with TCP as well, and
>>> try several message sizes.
>>>
>>>
>> Yes, by burning up more CPU we could get better performance easily.
>> So I have bond vhost thread and interrupt of nic on CPU1 while testing.
>>
>> modified before, the idle of CPU1 is 0%-1% while testing.
>> and after modify, the idle of CPU1 is 2%-3% while testing
>>
>> TCP also could gain from this, but pps is less than UDP, so I think
>> the improvement would be not so obviously.
>
> Still need to test this doesn't regress but overall looks convincing to me.
> Could you send a patch, accompanied by testing results for
> throughput latency and cpu utilization for tcp and udp
> with various message sizes?
>
> Thanks!
>
because of spring festival of china, the test result would be given two 
week later.
throughput would be test by netperf, and latency would be tested by 
qperf. Is that OK?


^ permalink raw reply

* Re: [PATCH 0/2] DT: net: davinci_emac: couple more properties actually optional
From: David Miller @ 2014-01-29  7:43 UTC (permalink / raw)
  To: sergei.shtylyov
  Cc: netdev, robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak,
	rob, devicetree, linux-doc, davinci-linux-open-source
In-Reply-To: <201401280245.35745.sergei.shtylyov@cogentembedded.com>

From: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Date: Tue, 28 Jan 2014 02:45:34 +0300

>    Though described as required, couple more properties in the DaVinci EMAC
> binding are actually optional, as the driver will happily function without them.
> The patchset is against DaveM's 'net.git' tree this time.
> 
> [1/2] DT: net: davinci_emac: "ti,davinci-rmii-en" property is actually optional
> [2/2] DT: net: davinci_emac: "ti,davinci-no-bd-ram" property is actually optional

Series applied with the "has/have" thing fixed.

Thanks.

^ permalink raw reply

* [PATCH v3] tun: add device name(iff) field to proc fdinfo entry
From: Masatake YAMATO @ 2014-01-29  7:43 UTC (permalink / raw)
  To: davem; +Cc: netdev, yamato
In-Reply-To: <20140128.231947.1827551289841225189.davem@davemloft.net>

A file descriptor opened for /dev/net/tun and a tun device are
connected with ioctl.  Though understanding the connection is
important for trouble shooting, no way is given to a user to know
the connected device for a given file descriptor at userland.

This patch adds a new fdinfo field for the device name connected to
a file descriptor opened for /dev/net/tun.

Here is an example of the field:

    # lsof | grep tun
    qemu-syst 4565         qemu   25u      CHR             10,200       0t138      12921 /dev/net/tun
    ...

    # cat /proc/4565/fdinfo/25
    pos:	138
    flags:	0104002
    iff:	vnet0

    # ip link show dev vnet0
    8: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...

changelog:

    v2: indent iff just like the other fdinfo fields are.
    v3: remove unused variable.
        Both are suggested by David Miller <davem@davemloft.net>.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
---
 drivers/net/tun.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bcf01af..44c4db8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -69,6 +69,7 @@
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
 #include <net/sock.h>
+#include <linux/seq_file.h>
 
 #include <asm/uaccess.h>
 
@@ -2228,6 +2229,27 @@ static int tun_chr_close(struct inode *inode, struct file *file)
 	return 0;
 }
 
+#ifdef CONFIG_PROC_FS
+static int tun_chr_show_fdinfo(struct seq_file *m, struct file *f)
+{
+	struct tun_struct *tun;
+	struct ifreq ifr;
+
+	memset(&ifr, 0, sizeof(ifr));
+
+	rtnl_lock();
+	tun = tun_get(f);
+	if (tun)
+		tun_get_iff(current->nsproxy->net_ns, tun, &ifr);
+	rtnl_unlock();
+
+	if (tun)
+		tun_put(tun);
+
+	return seq_printf(m, "iff:\t%s\n", ifr.ifr_name);
+}
+#endif
+
 static const struct file_operations tun_fops = {
 	.owner	= THIS_MODULE,
 	.llseek = no_llseek,
@@ -2242,7 +2264,10 @@ static const struct file_operations tun_fops = {
 #endif
 	.open	= tun_chr_open,
 	.release = tun_chr_close,
-	.fasync = tun_chr_fasync
+	.fasync = tun_chr_fasync,
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo = tun_chr_show_fdinfo,
+#endif
 };
 
 static struct miscdevice tun_miscdev = {
-- 
1.8.5.3

^ permalink raw reply related

* Re: [PATCH stable 3.11+] can: bcm: add skb destructor
From: David Miller @ 2014-01-29  7:46 UTC (permalink / raw)
  To: nautsch2; +Cc: eric.dumazet, socketcan, netdev
In-Reply-To: <52E8B053.2030808@gmail.com>

From: Andre Naujoks <nautsch2@gmail.com>
Date: Wed, 29 Jan 2014 08:40:03 +0100

> Even if this is a bug in the CAN BCM implementation. Your "fix" just
> enabled a user space application to shut down any machine with a kernel
> containing the BUG_ON patch.

Rather, he detected a potential stray pointer reference to freed data
that was caused by the CAN code which would difficult if not
impossible to detect otherwise.

That's even more dangerous, and you should be thanking him.

^ permalink raw reply

* Re: [PATCH v3] tun: add device name(iff) field to proc fdinfo entry
From: David Miller @ 2014-01-29  7:47 UTC (permalink / raw)
  To: yamato; +Cc: netdev
In-Reply-To: <1390981411-16146-1-git-send-email-yamato@redhat.com>

From: Masatake YAMATO <yamato@redhat.com>
Date: Wed, 29 Jan 2014 16:43:31 +0900

> A file descriptor opened for /dev/net/tun and a tun device are
> connected with ioctl.  Though understanding the connection is
> important for trouble shooting, no way is given to a user to know
> the connected device for a given file descriptor at userland.
> 
> This patch adds a new fdinfo field for the device name connected to
> a file descriptor opened for /dev/net/tun.
> 
> Here is an example of the field:
...
> Signed-off-by: Masatake YAMATO <yamato@redhat.com>

This looks a lot better, applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] bonding: fix locking in bond_loadbalance_arp_mon()
From: David Miller @ 2014-01-29  7:48 UTC (permalink / raw)
  To: dingtianhong; +Cc: fubar, vfalico, netdev, andy
In-Reply-To: <52E728A5.5070701@huawei.com>

From: Ding Tianhong <dingtianhong@huawei.com>
Date: Tue, 28 Jan 2014 11:48:53 +0800

> The commit 1d3ee88ae0d605629bf369
> (bonding: add netlink attributes to slave link dev)
> has add rtmsg_ifinfo() in bond_set_active_slave() and
> bond_set_backup_slave(), so the two function need to
> called in RTNL lock, but bond_loadbalance_arp_mon()
> only calling these functions in RCU, warning message
> will occurs.
> 
> fix this by add a new function bond_slave_state_change(),
> which will reset the slave's state after slave link check,
> so remove the bond_set_xxx_slave() from the cycle and only
> record the slave_state_changed, this will call the new
> function to set all slaves to new state in RTNL later.
> 
> Cc: Jay Vosburgh <fubar@us.ibm.com>
> Cc: Veaceslav Falico <vfalico@redhat.com>
> Cc: Andy Gospodarek <andy@greyhouse.net>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH 0/2] [BUG FIXES - 3.10.27] sit: More backports
From: David Miller @ 2014-01-29  7:52 UTC (permalink / raw)
  To: rostedt
  Cc: linux-kernel, netdev, nicolas.dichtel, stable, williams, lclaudio,
	jkacur
In-Reply-To: <20140128205756.074448668@goodmis.org>

From: Steven Rostedt <rostedt@goodmis.org>
Date: Tue, 28 Jan 2014 15:57:56 -0500

> At Red Hat we base our real-time kernel off of 3.10.27 and do lots of
> stress testing on that kernel. This has discovered some bugs that we
> can hit with the vanilla 3.10.27 kernel (no -rt patches applied).
> 
> I sent out a bug fix that can cause a crash with the current 3.10.27
> when you add and then remove the sit module. That patch is obsoleted by
> these patches, as that patch was not enough.
> 
> A previous patch that was backported:
> 
>   Upstream commit 205983c43700ac3a81e7625273a3fa83cd2759b5
>   sit: allow to use rtnl ops on fb tunnel
> 
> Had a depenency on commit 5e6700b3bf98 ("sit: add support of x-netns")
> which was not backported. The dependency was only on part of that
> commit which is what I backported.
> 
> The other upstream commit 9434266f2c645d4fcf62a03a8e36ad8075e37943
> sit: fix use after free of fb_tunnel_dev
> 
> fixes another bug we encountered, it also fixes the 3.10.27 bug
> where removing the sit module cause the crash. This is the patch
> that obsoletes my previous patch.

Greg, these have my blessing, please apply these to 3.10 -stable,
thanks a lot!

^ permalink raw reply

* Re: 8% performance improved by change tap interact with kernel stack
From: Michael S. Tsirkin @ 2014-01-29  7:56 UTC (permalink / raw)
  To: Qin Chuanyu; +Cc: jasowang, Anthony Liguori, KVM list, netdev
In-Reply-To: <52E8B0A4.6030806@huawei.com>

On Wed, Jan 29, 2014 at 03:41:24PM +0800, Qin Chuanyu wrote:
> On 2014/1/28 18:33, Michael S. Tsirkin wrote:
> 
> >>>Nice.
> >>>What about CPU utilization?
> >>>It's trivially easy to speed up networking by
> >>>burning up a lot of CPU so we must make sure it's
> >>>not doing that.
> >>>And I think we should see some tests with TCP as well, and
> >>>try several message sizes.
> >>>
> >>>
> >>Yes, by burning up more CPU we could get better performance easily.
> >>So I have bond vhost thread and interrupt of nic on CPU1 while testing.
> >>
> >>modified before, the idle of CPU1 is 0%-1% while testing.
> >>and after modify, the idle of CPU1 is 2%-3% while testing
> >>
> >>TCP also could gain from this, but pps is less than UDP, so I think
> >>the improvement would be not so obviously.
> >
> >Still need to test this doesn't regress but overall looks convincing to me.
> >Could you send a patch, accompanied by testing results for
> >throughput latency and cpu utilization for tcp and udp
> >with various message sizes?
> >
> >Thanks!
> >
> because of spring festival of china, the test result would be given
> two week later.
> throughput would be test by netperf, and latency would be tested by
> qperf. Is that OK?

For testing - sounds good. Run vmstat in host to check host cpu utilization.
Pls don't forget to address all issues raised in this thread and in
the old one Eric mentioned:
http://patchwork.ozlabs.org/patch/52963/
either address in code or address in commit log why it doesn't apply
anymore.

-- 
MST

^ permalink raw reply

* Question about skb_segment()
From: Arnaud Ebalard @ 2014-01-29  8:12 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, Eric Dumazet, Willy Tarreau, netdev

Hi Herbert,

I wonder if you could share some knowledge on the behaviour of
skb_segment() as it is implemented in 3.13.0: when passed a GSO
packet to be segmented, can the skb result have skb->next == NULL?

One would expect the number of segments of the result to usually match
tcp_skb_pcount() of passed packet and hence having skb->next != NULL: 
AFAICT, this what usually happens when skb_segment() is called in
tcp_gso_segment() but I noticed some cases where the number of segments
(length of chained sk_buff) is lower than the expected value and also
have two backtraces resulting from a skb delivered with a NULL skb->next. 

The whole thread leading to my question is here, in case you want to
take a look:

 http://thread.gmane.org/gmane.linux.network/301587

Thanks in advance,

a+

^ permalink raw reply

* [PATCH net] qeth: fix build of s390 allmodconfig
From: Frank Blaschka @ 2014-01-29  8:23 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390

From: Eugene Crosser <Eugene.Crosser@ru.ibm.com>

commit 949efd1c "qeth: bridgeport support - basic control" broke
s390 allmodconfig. This patch fixes this by eliminating one of the
cross-module calls, and by making two other calls via function
pointers in the qeth_discipline structure.

Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 drivers/s390/net/qeth_core.h      |    5 +---
 drivers/s390/net/qeth_core_main.c |   18 +++++-----------
 drivers/s390/net/qeth_l2_main.c   |   41 ++++++++++++++++++++++++++++++++------
 drivers/s390/net/qeth_l3_main.c   |    8 +++++++
 4 files changed, 51 insertions(+), 21 deletions(-)

--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -738,6 +738,8 @@ struct qeth_discipline {
 	int (*freeze)(struct ccwgroup_device *);
 	int (*thaw) (struct ccwgroup_device *);
 	int (*restore)(struct ccwgroup_device *);
+	int (*control_event_handler)(struct qeth_card *card,
+					struct qeth_ipa_cmd *cmd);
 };
 
 struct qeth_vlan_vid {
@@ -948,13 +950,10 @@ int qeth_query_card_info(struct qeth_car
 int qeth_send_control_data(struct qeth_card *, int, struct qeth_cmd_buffer *,
 	int (*reply_cb)(struct qeth_card *, struct qeth_reply*, unsigned long),
 	void *reply_param);
-void qeth_bridge_state_change(struct qeth_card *card, struct qeth_ipa_cmd *cmd);
-void qeth_bridgeport_query_support(struct qeth_card *card);
 int qeth_bridgeport_query_ports(struct qeth_card *card,
 	enum qeth_sbp_roles *role, enum qeth_sbp_states *state);
 int qeth_bridgeport_setrole(struct qeth_card *card, enum qeth_sbp_roles role);
 int qeth_bridgeport_an_set(struct qeth_card *card, int enable);
-void qeth_bridge_host_event(struct qeth_card *card, struct qeth_ipa_cmd *cmd);
 int qeth_get_priority_queue(struct qeth_card *, struct sk_buff *, int, int);
 int qeth_get_elements_no(struct qeth_card *, struct sk_buff *, int);
 int qeth_get_elements_for_frags(struct sk_buff *);
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -69,6 +69,7 @@ static void qeth_clear_output_buffer(str
 static int qeth_init_qdio_out_buf(struct qeth_qdio_out_q *, int);
 
 struct workqueue_struct *qeth_wq;
+EXPORT_SYMBOL_GPL(qeth_wq);
 
 static void qeth_close_dev_handler(struct work_struct *work)
 {
@@ -616,15 +617,12 @@ static struct qeth_ipa_cmd *qeth_check_i
 				qeth_schedule_recovery(card);
 				return NULL;
 			case IPA_CMD_SETBRIDGEPORT:
-				if (cmd->data.sbp.hdr.command_code ==
-					IPA_SBP_BRIDGE_PORT_STATE_CHANGE) {
-					qeth_bridge_state_change(card, cmd);
-					return NULL;
-				} else
-					return cmd;
 			case IPA_CMD_ADDRESS_CHANGE_NOTIF:
-				qeth_bridge_host_event(card, cmd);
-				return NULL;
+				if (card->discipline->control_event_handler
+								(card, cmd))
+					return cmd;
+				else
+					return NULL;
 			case IPA_CMD_MODCCID:
 				return cmd;
 			case IPA_CMD_REGISTER_LOCAL_ADDR:
@@ -4973,10 +4971,6 @@ retriable:
 		qeth_query_setadapterparms(card);
 	if (qeth_adp_supported(card, IPA_SETADP_SET_DIAG_ASSIST))
 		qeth_query_setdiagass(card);
-	qeth_bridgeport_query_support(card);
-	if (card->options.sbp.supported_funcs)
-		dev_info(&card->gdev->dev,
-		"The device represents a HiperSockets Bridge Capable Port\n");
 	return 0;
 out:
 	dev_warn(&card->gdev->dev, "The qeth device driver failed to recover "
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -33,6 +33,11 @@ static int qeth_l2_send_setdelmac(struct
 					    unsigned long));
 static void qeth_l2_set_multicast_list(struct net_device *);
 static int qeth_l2_recover(void *);
+static void qeth_bridgeport_query_support(struct qeth_card *card);
+static void qeth_bridge_state_change(struct qeth_card *card,
+					struct qeth_ipa_cmd *cmd);
+static void qeth_bridge_host_event(struct qeth_card *card,
+					struct qeth_ipa_cmd *cmd);
 
 static int qeth_l2_do_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 {
@@ -989,6 +994,10 @@ static int __qeth_l2_set_online(struct c
 		rc = -ENODEV;
 		goto out_remove;
 	}
+	qeth_bridgeport_query_support(card);
+	if (card->options.sbp.supported_funcs)
+		dev_info(&card->gdev->dev,
+		"The device represents a HiperSockets Bridge Capable Port\n");
 	qeth_trace_features(card);
 
 	if (!card->dev && qeth_l2_setup_netdev(card)) {
@@ -1233,6 +1242,26 @@ out:
 	return rc;
 }
 
+/* Returns zero if the command is successfully "consumed" */
+static int qeth_l2_control_event(struct qeth_card *card,
+					struct qeth_ipa_cmd *cmd)
+{
+	switch (cmd->hdr.command) {
+	case IPA_CMD_SETBRIDGEPORT:
+		if (cmd->data.sbp.hdr.command_code ==
+				IPA_SBP_BRIDGE_PORT_STATE_CHANGE) {
+			qeth_bridge_state_change(card, cmd);
+			return 0;
+		} else
+			return 1;
+	case IPA_CMD_ADDRESS_CHANGE_NOTIF:
+		qeth_bridge_host_event(card, cmd);
+		return 0;
+	default:
+		return 1;
+	}
+}
+
 struct qeth_discipline qeth_l2_discipline = {
 	.start_poll = qeth_qdio_start_poll,
 	.input_handler = (qdio_handler_t *) qeth_qdio_input_handler,
@@ -1246,6 +1275,7 @@ struct qeth_discipline qeth_l2_disciplin
 	.freeze = qeth_l2_pm_suspend,
 	.thaw = qeth_l2_pm_resume,
 	.restore = qeth_l2_pm_resume,
+	.control_event_handler = qeth_l2_control_event,
 };
 EXPORT_SYMBOL_GPL(qeth_l2_discipline);
 
@@ -1463,7 +1493,8 @@ static void qeth_bridge_state_change_wor
 	kfree(data);
 }
 
-void qeth_bridge_state_change(struct qeth_card *card, struct qeth_ipa_cmd *cmd)
+static void qeth_bridge_state_change(struct qeth_card *card,
+					struct qeth_ipa_cmd *cmd)
 {
 	struct qeth_sbp_state_change *qports =
 		 &cmd->data.sbp.data.state_change;
@@ -1488,7 +1519,6 @@ void qeth_bridge_state_change(struct qet
 			sizeof(struct qeth_sbp_state_change) + extrasize);
 	queue_work(qeth_wq, &data->worker);
 }
-EXPORT_SYMBOL(qeth_bridge_state_change);
 
 struct qeth_bridge_host_data {
 	struct work_struct worker;
@@ -1528,7 +1558,8 @@ static void qeth_bridge_host_event_worke
 	kfree(data);
 }
 
-void qeth_bridge_host_event(struct qeth_card *card, struct qeth_ipa_cmd *cmd)
+static void qeth_bridge_host_event(struct qeth_card *card,
+					struct qeth_ipa_cmd *cmd)
 {
 	struct qeth_ipacmd_addr_change *hostevs =
 		 &cmd->data.addrchange;
@@ -1560,7 +1591,6 @@ void qeth_bridge_host_event(struct qeth_
 			sizeof(struct qeth_ipacmd_addr_change) + extrasize);
 	queue_work(qeth_wq, &data->worker);
 }
-EXPORT_SYMBOL(qeth_bridge_host_event);
 
 /* SETBRIDGEPORT support; sending commands */
 
@@ -1683,7 +1713,7 @@ static int qeth_bridgeport_query_support
  * Sets bitmask of supported setbridgeport subfunctions in the qeth_card
  * strucutre: card->options.sbp.supported_funcs.
  */
-void qeth_bridgeport_query_support(struct qeth_card *card)
+static void qeth_bridgeport_query_support(struct qeth_card *card)
 {
 	struct qeth_cmd_buffer *iob;
 	struct qeth_ipa_cmd *cmd;
@@ -1709,7 +1739,6 @@ void qeth_bridgeport_query_support(struc
 	}
 	card->options.sbp.supported_funcs = cbctl.data.supported;
 }
-EXPORT_SYMBOL_GPL(qeth_bridgeport_query_support);
 
 static int qeth_bridgeport_query_ports_cb(struct qeth_card *card,
 	struct qeth_reply *reply, unsigned long data)
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3593,6 +3593,13 @@ out:
 	return rc;
 }
 
+/* Returns zero if the command is successfully "consumed" */
+static int qeth_l3_control_event(struct qeth_card *card,
+					struct qeth_ipa_cmd *cmd)
+{
+	return 1;
+}
+
 struct qeth_discipline qeth_l3_discipline = {
 	.start_poll = qeth_qdio_start_poll,
 	.input_handler = (qdio_handler_t *) qeth_qdio_input_handler,
@@ -3606,6 +3613,7 @@ struct qeth_discipline qeth_l3_disciplin
 	.freeze = qeth_l3_pm_suspend,
 	.thaw = qeth_l3_pm_resume,
 	.restore = qeth_l3_pm_resume,
+	.control_event_handler = qeth_l3_control_event,
 };
 EXPORT_SYMBOL_GPL(qeth_l3_discipline);
 

^ permalink raw reply

* Re: [PATCH net] qeth: fix build of s390 allmodconfig
From: David Miller @ 2014-01-29  8:43 UTC (permalink / raw)
  To: blaschka; +Cc: netdev, linux-s390
In-Reply-To: <20140129082348.GA34455@tuxmaker.boeblingen.de.ibm.com>

From: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Date: Wed, 29 Jan 2014 09:23:48 +0100

> From: Eugene Crosser <Eugene.Crosser@ru.ibm.com>
> 
> commit 949efd1c "qeth: bridgeport support - basic control" broke
> s390 allmodconfig. This patch fixes this by eliminating one of the
> cross-module calls, and by making two other calls via function
> pointers in the qeth_discipline structure.
> 
> Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com>
> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
> Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH stable 3.11+] can: bcm: add skb destructor
From: Andre Naujoks @ 2014-01-29  8:47 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, socketcan, netdev
In-Reply-To: <20140128.234630.1768378245126172951.davem@davemloft.net>

On 29.01.2014 08:46, schrieb David Miller:
> From: Andre Naujoks <nautsch2@gmail.com>
> Date: Wed, 29 Jan 2014 08:40:03 +0100
> 
>> Even if this is a bug in the CAN BCM implementation. Your "fix" just
>> enabled a user space application to shut down any machine with a kernel
>> containing the BUG_ON patch.
> 
> Rather, he detected a potential stray pointer reference to freed data
> that was caused by the CAN code which would difficult if not
> impossible to detect otherwise.
> 
> That's even more dangerous, and you should be thanking him.

"potential" is the keyword here. But its a definite kernel crash as it
is right now with a standard use case for the BCM.

Don't get me wrong. If there are bugs in the code, they should be fixed,
but I don't think breaking a working (even if flawed) part of the kernel
is the right thing to do here.

Regards
  Andre

> 

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2014-01-29  8:55 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


Several fixups, of note:

1) Fix unlock of not held spinlock in RXRPC code, from Alexey
   Khoroshilov.

2) Call pci_disable_device() from the correct shutdown path in
   bnx2x driver, from Yuval Mintz.

3) Fix qeth build on s390 for some configurations, from Eugene
   Crosser.

4) Cure locking bugs in bond_loadbalance_arp_mon(), from Ding
   Tianhong.

5) Must do netif_napi_add() before registering netdevice in sky2
   driver, from Stanislaw Gruszka.

6) Fix lost bug fix during merge due to code movement in ieee802154,
   noticed and fixed by the eagle eyed Stephen Rothwell.

7) Get rid of resource leak in xen-netfront driver, from Annie Li.

8) Bounds checks in qlcnic driver are off by one, from Manish Chopra.

9) TPROXY can leak sockets when TCP early demux is enabled, fix from
   Holger Eitzenberger.

Please pull, thanks a lot!

The following changes since commit 77d143de75812596a58d126606f42d1214e09dde:

  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml (2014-01-26 11:06:16 -0800)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to c044dc2132d19d8c643cdd340f21afcec177c046:

  qeth: fix build of s390 allmodconfig (2014-01-29 00:43:33 -0800)

----------------------------------------------------------------
Alexey Khoroshilov (1):
      RxRPC: do not unlock unheld spinlock in rxrpc_connect_exclusive()

Annie Li (1):
      xen-netfront: fix resource leak in netfront

Dave Jones (2):
      i40e: Add missing braces to i40e_dcb_need_reconfig()
      llc: remove noisy WARN from llc_mac_hdr_init

David S. Miller (4):
      Merge branch 'bonding'
      Merge branch 'qlcnic'
      Merge tag 'rxrpc-20140126' of git://git.kernel.org/.../dhowells/linux-fs
      Merge branch 'DT'

Ding Tianhong (1):
      bonding: fix locking in bond_loadbalance_arp_mon()

Duan Jiong (1):
      net: gre: use icmp_hdr() to get inner ip header

Eugene Crosser (1):
      qeth: fix build of s390 allmodconfig

Florian Westphal (1):
      net: add and use skb_gso_transport_seglen()

Geert Uytterhoeven (1):
      net/apne: Remove unused variable ei_local

Haiyang Zhang (1):
      hyperv: Add support for physically discontinuous receive buffer

Hans de Goede (2):
      net: stmmac: Silence PTP init errors on hw without PTP
      net: stmmac: Log MAC address only once

Holger Eitzenberger (1):
      net: Fix memory leak if TPROXY used with TCP early demux

Manish Chopra (1):
      qlcnic: Correct off-by-one errors in bounds checks

Martin Schwenke (1):
      net: Document promote_secondaries

Masanari Iida (1):
      net: Fix warning on make htmldocs caused by skbuff.c

Masatake YAMATO (1):
      tun: add device name(iff) field to proc fdinfo entry

Neil Horman (1):
      AF_PACKET: Add documentation for queue mapping fanout mode

Rajesh Borundia (2):
      qlcnic: Fix initialization of vlan list.
      qlcnic: Fix tx timeout.

Sachin Kamat (1):
      net: ipv4: Use PTR_ERR_OR_ZERO

Sergei Shtylyov (2):
      DT: net: davinci_emac: "ti, davinci-rmii-en" property is actually optional
      DT: net: davinci_emac: "ti, davinci-no-bd-ram" property is actually optional

Shahed Shaikh (1):
      qlcnic: Fix loopback test failure

Stanislaw Gruszka (1):
      sky2: initialize napi before registering device

Stephen Rothwell (1):
      net: 6lowpan: fixup for code movement

Tim Smith (2):
      af_rxrpc: Avoid setting up double-free on checksum error
      af_rxrpc: Handle frames delivered from another VM

Veaceslav Falico (2):
      bonding: RCUify bond_ab_arp_probe
      bonding: restructure locking of bond_ab_arp_probe()

Yaniv Rosner (1):
      bnx2x: Fix generic option settings

Yuval Mintz (1):
      bnx2x: More Shutdown revisions

 Documentation/devicetree/bindings/net/davinci_emac.txt   |  4 +-
 Documentation/networking/ip-sysctl.txt                   |  6 +++
 Documentation/networking/packet_mmap.txt                 |  1 +
 drivers/hv/channel.c                                     | 14 +++---
 drivers/net/bonding/bond_main.c                          | 96 ++++++++++++++++++++++++------------------
 drivers/net/bonding/bonding.h                            | 13 ++++++
 drivers/net/ethernet/8390/apne.c                         |  1 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c      | 78 +++++++++++++++++-----------------
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c         |  4 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c              |  3 +-
 drivers/net/ethernet/marvell/sky2.c                      |  4 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c           | 19 ++++++---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c         |  9 +---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c | 11 +++--
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c        |  6 +--
 drivers/net/hyperv/hyperv_net.h                          |  2 +-
 drivers/net/hyperv/netvsc.c                              |  7 +--
 drivers/net/tun.c                                        | 27 +++++++++++-
 drivers/net/xen-netfront.c                               | 88 ++++++++++++--------------------------
 drivers/s390/net/qeth_core.h                             |  5 +--
 drivers/s390/net/qeth_core_main.c                        | 18 +++-----
 drivers/s390/net/qeth_l2_main.c                          | 41 +++++++++++++++---
 drivers/s390/net/qeth_l3_main.c                          |  8 ++++
 include/linux/skbuff.h                                   |  1 +
 net/core/skbuff.c                                        | 27 +++++++++++-
 net/ieee802154/6lowpan_iphc.c                            |  2 +-
 net/ipv4/ip_gre.c                                        |  2 +-
 net/ipv4/ip_input.c                                      |  2 +-
 net/ipv4/ip_tunnel.c                                     |  3 +-
 net/ipv6/ip6_input.c                                     |  2 +-
 net/llc/llc_output.c                                     |  2 +-
 net/rxrpc/ar-connection.c                                |  2 +
 net/rxrpc/ar-recvmsg.c                                   |  7 ++-
 net/sched/sch_tbf.c                                      | 13 ++----
 34 files changed, 303 insertions(+), 225 deletions(-)

^ permalink raw reply

* [PATCH] xfrm: avoid creating temporary SA when there are no listeners
From: Horia Geanta @ 2014-01-29  9:12 UTC (permalink / raw)
  To: Steffen Klassert, David S. Miller; +Cc: netdev

In the case when KMs have no listeners, km_query() will fail and
temporary SAs are garbage collected immediately after their allocation.
This causes strain on memory allocation, leading even to OOM since
temporary SA alloc/free cycle is performed for every packet
and garbage collection does not keep up the pace.

The sane thing to do is to make sure we have audience before
temporary SA allocation.

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
---
Resending - initially posted as RFC:
http://www.spinics.net/lists/netdev/msg268454.html

Please apply.

 include/net/xfrm.h    | 15 +++++++++++++++
 net/key/af_key.c      | 20 ++++++++++++++++++++
 net/xfrm/xfrm_state.c | 31 +++++++++++++++++++++++++++++++
 net/xfrm/xfrm_user.c  |  6 ++++++
 4 files changed, 72 insertions(+)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index cd7c46f..449a867 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -594,6 +594,7 @@ struct xfrm_mgr {
 					   const struct xfrm_migrate *m,
 					   int num_bundles,
 					   const struct xfrm_kmaddress *k);
+	bool			(*is_alive)(const struct km_event *c);
 };
 
 int xfrm_register_km(struct xfrm_mgr *km);
@@ -1646,6 +1647,20 @@ static inline int xfrm_aevent_is_on(struct net *net)
 	rcu_read_unlock();
 	return ret;
 }
+
+static inline int xfrm_acquire_is_on(struct net *net)
+{
+	struct sock *nlsk;
+	int ret = 0;
+
+	rcu_read_lock();
+	nlsk = rcu_dereference(net->xfrm.nlsk);
+	if (nlsk)
+		ret = netlink_has_listeners(nlsk, XFRMNLGRP_ACQUIRE);
+	rcu_read_unlock();
+
+	return ret;
+}
 #endif
 
 static inline int xfrm_alg_len(const struct xfrm_algo *alg)
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 1a04c13..12eb0ad 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3059,6 +3059,25 @@ static u32 get_acqseq(void)
 	return res;
 }
 
+static bool pfkey_is_alive(const struct km_event *c)
+{
+	struct netns_pfkey *net_pfkey = net_generic(c->net, pfkey_net_id);
+	struct sock *sk;
+	struct hlist_node *node;
+	bool is_alive = false;
+
+	rcu_read_lock();
+	sk_for_each_rcu(sk, node, &net_pfkey->table) {
+		if (pfkey_sk(sk)->registered) {
+			is_alive = true;
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return is_alive;
+}
+
 static int pfkey_send_acquire(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *xp)
 {
 	struct sk_buff *skb;
@@ -3784,6 +3803,7 @@ static struct xfrm_mgr pfkeyv2_mgr =
 	.new_mapping	= pfkey_send_new_mapping,
 	.notify_policy	= pfkey_send_policy_notify,
 	.migrate	= pfkey_send_migrate,
+	.is_alive	= pfkey_is_alive,
 };
 
 static int __net_init pfkey_net_init(struct net *net)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 8d11d28..e79f376 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -161,6 +161,7 @@ static DEFINE_SPINLOCK(xfrm_state_gc_lock);
 int __xfrm_state_delete(struct xfrm_state *x);
 
 int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol);
+bool km_is_alive(const struct km_event *c);
 void km_state_expired(struct xfrm_state *x, int hard, u32 portid);
 
 static DEFINE_SPINLOCK(xfrm_type_lock);
@@ -788,6 +789,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 	struct xfrm_state *best = NULL;
 	u32 mark = pol->mark.v & pol->mark.m;
 	unsigned short encap_family = tmpl->encap_family;
+	struct km_event c;
 
 	to_put = NULL;
 
@@ -832,6 +834,17 @@ found:
 			error = -EEXIST;
 			goto out;
 		}
+
+		c.net = net;
+		/* If the KMs have no listeners (yet...), avoid allocating an SA
+		 * for each and every packet - garbage collection might not
+		 * handle the flood.
+		 */
+		if (!km_is_alive(&c)) {
+			error = -ESRCH;
+			goto out;
+		}
+
 		x = xfrm_state_alloc(net);
 		if (x == NULL) {
 			error = -ENOMEM;
@@ -1793,6 +1806,24 @@ int km_report(struct net *net, u8 proto, struct xfrm_selector *sel, xfrm_address
 }
 EXPORT_SYMBOL(km_report);
 
+bool km_is_alive(const struct km_event *c)
+{
+	struct xfrm_mgr *km;
+	bool is_alive = false;
+
+	read_lock(&xfrm_km_lock);
+	list_for_each_entry(km, &xfrm_km_list, list) {
+		if (km->is_alive && km->is_alive(c)) {
+			is_alive = true;
+			break;
+		}
+	}
+	read_unlock(&xfrm_km_lock);
+
+	return is_alive;
+}
+EXPORT_SYMBOL(km_is_alive);
+
 int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen)
 {
 	int err;
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 3348566..b53a489 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2981,6 +2981,11 @@ static int xfrm_send_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr,
 	return nlmsg_multicast(net->xfrm.nlsk, skb, 0, XFRMNLGRP_MAPPING, GFP_ATOMIC);
 }
 
+static bool xfrm_is_alive(const struct km_event *c)
+{
+	return (bool)xfrm_acquire_is_on(c->net);
+}
+
 static struct xfrm_mgr netlink_mgr = {
 	.id		= "netlink",
 	.notify		= xfrm_send_state_notify,
@@ -2990,6 +2995,7 @@ static struct xfrm_mgr netlink_mgr = {
 	.report		= xfrm_send_report,
 	.migrate	= xfrm_send_migrate,
 	.new_mapping	= xfrm_send_mapping,
+	.is_alive	= xfrm_is_alive,
 };
 
 static int __net_init xfrm_user_net_init(struct net *net)
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] xfrm: avoid creating temporary SA when there are no listeners
From: Horia Geantă @ 2014-01-29  9:17 UTC (permalink / raw)
  To: Steffen Klassert, David S. Miller; +Cc: netdev
In-Reply-To: <1390986731-12861-1-git-send-email-horia.geanta@freescale.com>

On 1/29/2014 11:12 AM, Horia Geanta wrote:
> In the case when KMs have no listeners, km_query() will fail and
> temporary SAs are garbage collected immediately after their allocation.
> This causes strain on memory allocation, leading even to OOM since
> temporary SA alloc/free cycle is performed for every packet
> and garbage collection does not keep up the pace.
>
> The sane thing to do is to make sure we have audience before
> temporary SA allocation.
>
> Signed-off-by: Horia Geanta<horia.geanta@freescale.com>
> ---
> Resending - initially posted as RFC:
> http://www.spinics.net/lists/netdev/msg268454.html
>
> Please apply.

This is for ipsec-next, sorry for not mentioning in the first place.

Regards,
Horia

^ permalink raw reply

* ethtool 3.13 released
From: Ben Hutchings @ 2014-01-28 22:00 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 528 bytes --]

ethtool version 3.13 has been released.

Home page: https://ftp.kernel.org/pub/software/network/ethtool/
Download link:
https://ftp.kernel.org/pub/software/network/ethtool/ethtool-3.13.tar.xz

Release notes:

	* Doc: Update GPL text to include current address of the FSF
	* Fix: Spelling fixes
	* Doc: Fix advertising flag values in manual page for 20G link modes,
	  and add missing 1G, 10G and 40G link modes

Ben.
-- 
Ben Hutchings
It is a miracle that curiosity survives formal education. - Albert Einstein

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* [PATCH] net: eth: cpsw: Correctly attach to GPIO bitbang MDIO driver
From: Stefan Roese @ 2014-01-29 10:32 UTC (permalink / raw)
  To: netdev; +Cc: Lukas Stockmann, Mugunthan V N

When the GPIO bitbang MDIO driver is used instead of the Davinci MDIO driver
we need to configure the phy_id string differently. Otherwise this string
looks like this "gpio.6" instead of "gpio-0" and the PHY is not found when
phy_connect() is called.

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Lukas Stockmann <lukas.stockmann@siemens.com>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
---
 drivers/net/ethernet/ti/cpsw.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index a3ca3dd..43e6591 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1858,8 +1858,18 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 		mdio_node = of_find_node_by_phandle(be32_to_cpup(parp));
 		phyid = be32_to_cpup(parp+1);
 		mdio = of_find_device_by_node(mdio_node);
-		snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
-			 PHY_ID_FMT, mdio->name, phyid);
+
+		if (strncmp(mdio->name, "gpio", 4) == 0) {
+			/* GPIO bitbang MDIO driver attached */
+			struct mii_bus *bus = dev_get_drvdata(&mdio->dev);
+
+			snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
+				 PHY_ID_FMT, bus->id, phyid);
+		} else {
+			/* davinci MDIO driver attached */
+			snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
+				 PHY_ID_FMT, mdio->name, phyid);
+		}
 
 		mac_addr = of_get_mac_address(slave_node);
 		if (mac_addr)
-- 
1.8.5.3

^ permalink raw reply related

* Re: [PATCH] ipv6: default route for link local address is not added while assigning a address
From: Nicolas Dichtel @ 2014-01-29 10:38 UTC (permalink / raw)
  To: Sohny Thomas, netdev, linux-kernel, yoshfuji, davem, kumuda
In-Reply-To: <52E8A2AA.3090507@linux.vnet.ibm.com>

Le 29/01/2014 07:41, Sohny Thomas a écrit :
> Resending this on netdev mailing list:
> Default route for link local address is configured automatically if
> NETWORKING_IPV6=yes is in ifcfg-eth*.
> When the route table for the interface is flushed and a new address is added to
> the same device with out removing linklocal addr, default route for link local
> address has to added by default.
>
> I have found the issue to be caused by this checkin
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/ipv6?id=62b54dd91567686a1cb118f76a72d5f4764a86dd
>
>
> According to this change :
> He removes adding a link local route if any other address is added , applicable
> across all interfaces though there's mentioned only lo interface
> So below patch fixes for other devices
>
> Signed-off-by: Sohny THomas <sohthoma@linux.vnet.ibm.com>
Your email client has corrupted the patch, it cannot be applied.
Please read Documentation/email-clients.txt

About the patch, I still think that the flush is too agressive. Link local
routes are marked as 'proto kernel', removing them without the link local
address is wrong.

With this patch, you will add a link local route even if you don't have a link
local address.

^ permalink raw reply

* Re: [PATCH 2/2] net: ip, ipv6: handle gso skbs in forwarding path
From: Florian Westphal @ 2014-01-29 10:53 UTC (permalink / raw)
  To: Eric Dumazet, Florian Westphal, netdev
In-Reply-To: <20140128002707.GA12308@order.stressinduktion.org>

Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> > TCP stream not using DF flag are very unlikely to care if we adjust
> > their MTU (lowering gso_size) at this point ?
> 
> UDP shouldn't be a problem, too.

Sorry for late reply, but how can this be safe for UDP?
We should make sure that peer sees original, unchanged datagram?

And only solution for UDP that I can see is to do sw segmentation (i.e.
create ip fragments).

Thanks,
Florian

^ permalink raw reply

* Re: [PATCH RFC v3 0/12] vti4: prepare namespace and interfamily support.
From: Christophe Gouault @ 2014-01-29 10:55 UTC (permalink / raw)
  To: Steffen Klassert, netdev; +Cc: Saurabh Mohan
In-Reply-To: <1390818577-19589-1-git-send-email-steffen.klassert@secunet.com>

On 01/27/2014 11:29 AM, Steffen Klassert wrote:
> This patchset prepares vti4 for proper namespace and interfamily support.
>
> Currently the receive hook is in the middle of the decapsulation
> process, some of the header pointers point still into the IPsec packet
> others point already into the decapsulated packet. This makes it
> very unflexible and proper namespace and interfamily support can't
> be done as it is.
>
> The patchset that implements an IPsec protocol multiplexer, so that vti
> can register it's own receive path hooks. Further it makes the i_key
> usable for vti and changes the vti code to do the following:
>
> vti uses the IPsec protocol multiplexer to register it's
> own receive side hooks for ESP, AH and IPCOMP.
>
> Vti does the following on receive side:
>
> 1. Do an input policy check for the IPsec packet we received.
>     This is required because this packet could be already
>     processed by IPsec (tunnel in tunnel or a block policy
>     is present), so an inbound policy check is needed.
>
> 2. Mark the packet with the i_key. The policy and the state
>     must match this key now. Policy and state belong to the vti
>     namespace and policy enforcement is done at the further layers.

Hi Steffen,

I did some tests, and it seems there is no inbound policy check against
a vti SP after the ipsec decryption:

To confirm the problem, I added some logs in the kernel to track the
outbound SPD lookup and inbound policy check.

I tested a ping from HostL(10.22.1.1) to HostR(10.24.1.201), that must
be encapsulated via a vti interface (vti1, mark 1, ifindex 8) between
IPsecVTI(10.23.1.101) and HostR(10.23.1.201).

. 10.22.1.0/24 10.23.1.0/24 10.24.1.0/24
. (HostL) ------------(IPsecVTI)============(HostR)------------
. .1 .101 .201

Here is the trace:

(1) xfrm_lookup: oif=8 mark=0 saddr=10.22.1.1 daddr=10.24.1.201
(2) xfrm_lookup: oif=8 mark=1 saddr=10.22.1.1 daddr=10.24.1.201
(3) vti_rcv: found tunnel vti1
(4) __xfrm_policy_check: dir=0 iif=0 mark=0 saddr=10.23.1.201
daddr=10.23.1.101 skb->sp->len=0
(5) __xfrm_policy_check: dir=2 iif=0 mark=0 saddr=10.24.1.201
daddr=10.22.1.1 skb->sp->len=0

And the analysis:

- A first SPD lookup is done before entering vti1 in (1), seeking for
a "global SP".
- A second SPD lookup is done after entering vti1 in (2), with mark 1,
seeking for a "vti SP"
- the icmp request is encapsulated and sent to HostR
- the esp-encrypted icmp reply is received, the packet enters vti1
and an inbound policy check is performed on the ESP packet itself in
(4), with mark 0, seeking for a "global SP".
- the packet is decrypted and its mark set to 1, but no vti inbound
policy check is done. Then the skb mark and secpath are reset by
skb_scrub_params (called by vti_rcv_cb).
- Then only an inbound policy check is performed on the icmp
reply in (5), seeking for a "global SP". It is considered a plaintext
packet, with no mark or secpath.

=> there is no check that the forward vti security policy was
enforced.

Best Regards,
Christophe.

>
> 3. Call the generic xfrm layer to do decryption and decapsulation.
>
> 4. Wait for a callback from the xfrm layer to properly clean the skb to
>     not leak informations on namespace transitions and to update the device
>     statistics.
>
> On transmit side:
>
> 1. Mark the packet with the o_key. The policy and the state
>     must match this key now.
>
> 2. Do a xfrm_lookup on the original packet with the mark applied.
>
> 3. Check if we got an IPsec route.
>
> 4. Clean the skb to not leak informations on namespace
>     transitions.
>
> 5. Attach the dst_enty we got from the xfrm_lookup to the skb.
>
> 6. Call dst_output to do the IPsec processing.
>
> 7. Do the device statistics.
>
>
> Changes from v1:
>
> - Rebased to current net-next.
> - Fix a rcu lockdep complaint in xfrm protocol registration/deregistration.
> - Fix usage of a ipv4 specific callback handler in generic code.
> - Use skb_scrub_packet() to clear the skb in vti_rcv(), suggested by
>    Nicolas Dichtel.
> - Add support for IPCOMP.
> - Support inter address family tunneling.
>
> Changes from v2:
>
> - Rebased to current net-next.
> - Check for matching tunnel endpoints of the xfrm state and
>    the vti interface.
> - Use a own error handler to not create dependencies to the
>    generic IPsec protocol handlers.
> - Change the receive path to do the namespace transition after
>    decapsulation. With this the xfrm lookups are done in the outer
>    namespace for xmit and receive, thanks to Christophe Gouault
>    for pointing this out.
> - Enable namespace changing of vti devices.
>
> I'd take this into the ipsec-next tree after the merge window closes
> if noone has further suggestions or objections.
>

^ permalink raw reply

* Re: [PATCH 2/2] net: ip, ipv6: handle gso skbs in forwarding path
From: Florian Westphal @ 2014-01-29 11:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, Herbert Xu, netdev
In-Reply-To: <1390926883.28432.12.camel@edumazet-glaptop2.roam.corp.google.com>

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > Its still unclear if this is the best strategy.
> > > 
> > > TCP stream not using DF flag are very unlikely to care if we adjust
> > > their MTU (lowering gso_size) at this point ?
> > 
> > Thanks for this suggestion.  It would indeed be nice to avoid sw
> > segmentation.  I tried:
> > 
> > static void ip_gso_adjust_seglen(struct sk_buff *skb)
> > {
> >         unsigned int mtu;
> > 
> >         if (!skb_is_gso(skb))
> >                 return;
> > 
> >         mtu = ip_dst_mtu_maybe_forward(skb_dst(skb), true);
> >         skb_shinfo(skb)->gso_size = mtu - sizeof(struct iphdr);

FWIW Erics idea works fine with:

        headerlen = skb_transport_header(skb) - skb_network_header(skb);
        if (likely(skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)))
                headerlen += tcp_hdrlen(skb);
        skb_shinfo(skb)->gso_size = mtu - headerlen;

and disabling 'sg' on outgoing (lower-mtu) interface.  [ else BUG() ]

> > [   28.644776] kernel BUG at net/net/core/skbuff.c:2984!
> 
> Yep, lets CC Herbert Xu, as he 'owns' skb_segment()
> 
> BUG_ON(skb_headlen(fskb));

^ permalink raw reply

* Re: [PATCH 2/2] net: ip, ipv6: handle gso skbs in forwarding path
From: Hannes Frederic Sowa @ 2014-01-29 11:04 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Eric Dumazet, netdev
In-Reply-To: <20140129105347.GF30123@breakpoint.cc>

On Wed, Jan 29, 2014 at 11:53:47AM +0100, Florian Westphal wrote:
> Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> > > TCP stream not using DF flag are very unlikely to care if we adjust
> > > their MTU (lowering gso_size) at this point ?
> > 
> > UDP shouldn't be a problem, too.
> 
> Sorry for late reply, but how can this be safe for UDP?
> We should make sure that peer sees original, unchanged datagram?

Peer as in original destination? Of course, we must not alter the datagram but
can only do fragmentation or send back frag_needed.

> And only solution for UDP that I can see is to do sw segmentation (i.e.
> create ip fragments).

Hardware(-UFO) would do fragmentation in hardware, too, because there is
no other way to split UDP data in any other way. If UFO is not supported
manual sw segmentation would create the required fragments in output
path, too.

Greetings,

  Hannes

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox