* Re: [PATCH 5/5] tcp: ipv4 listen state scaled
From: Alexey Kuznetsov @ 2010-10-27 15:04 UTC (permalink / raw)
To: Dmitry Popov, netdev
In-Reply-To: <AANLkTikRsOevLBHn0xb0S_YvfPMWpAdw373bxQUc+xbV@mail.gmail.com>
Hello!
It looks like there is at least one hole here.
You take lock, check syn table and drop lock in tcp_v4_hnd_req().
Then you immediately enter tcp_v4_conn_request() and grab lock again.
Oops, in the tiny hole while lock was dropped the request can be already
created (even funnier, the whole socket can be already created and even accepted).
So, if you drop lock, you have to restart the whole tcp_v4_rcv_listen()
(which seems to be impossible without additional tricks)
Alexey
^ permalink raw reply
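Editor's sketch of the check-then-act hole described above. It is a single-threaded user-space model, not kernel code: the names syn_table, racy_check, racy_insert and insert_if_absent are hypothetical, and the "lock" is a plain flag used only to mark where the critical sections begin and end. The two replay functions run the same pair of duplicate SYNs through both patterns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_MAX 8

/* Toy stand-in for the SYN table: a flat list of pending request keys. */
struct syn_table {
    unsigned int keys[TABLE_MAX];
    size_t count;
    bool locked;                 /* models the listen-socket lock */
};

static void table_lock(struct syn_table *t)   { assert(!t->locked); t->locked = true; }
static void table_unlock(struct syn_table *t) { assert(t->locked);  t->locked = false; }

/* Both helpers require the caller to hold the lock. */
static bool table_contains(const struct syn_table *t, unsigned int key)
{
    assert(t->locked);
    for (size_t i = 0; i < t->count; i++)
        if (t->keys[i] == key)
            return true;
    return false;
}

static void table_add(struct syn_table *t, unsigned int key)
{
    assert(t->locked && t->count < TABLE_MAX);
    t->keys[t->count++] = key;
}

/* Racy pattern, step 1: look up under the lock, then drop it. */
static bool racy_check(struct syn_table *t, unsigned int key)
{
    table_lock(t);
    bool found = table_contains(t, key);
    table_unlock(t);
    return found;
}

/* Racy pattern, step 2: re-take the lock and insert, trusting the stale lookup. */
static void racy_insert(struct syn_table *t, unsigned int key)
{
    table_lock(t);
    table_add(t, key);
    table_unlock(t);
}

/* Lookup and insert in one critical section. Returns true if the key was added. */
static bool insert_if_absent(struct syn_table *t, unsigned int key)
{
    bool added = false;

    table_lock(t);
    if (!table_contains(t, key)) {
        table_add(t, key);
        added = true;
    }
    table_unlock(t);
    return added;
}

/* Replays two CPUs handling the same SYN, with CPU B running inside the hole. */
static size_t replay_racy(unsigned int key)
{
    struct syn_table t = { .count = 0, .locked = false };
    bool a_found = racy_check(&t, key);   /* CPU A: not found, lock dropped */
    bool b_found = racy_check(&t, key);   /* CPU B: also not found */

    if (!a_found)
        racy_insert(&t, key);
    if (!b_found)
        racy_insert(&t, key);
    return t.count;                       /* number of requests created */
}

/* Same two SYNs, but each one does its lookup and insert under one lock hold. */
static size_t replay_atomic(unsigned int key)
{
    struct syn_table t = { .count = 0, .locked = false };

    insert_if_absent(&t, key);
    insert_if_absent(&t, key);
    return t.count;
}
```

In this model replay_racy() ends with two entries for one key, while replay_atomic() ends with one. Whether a duplicate request is harmful in the real listen path is exactly what the thread is debating; the sketch only shows how the duplicate arises.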
* [net-2.6 PATCH 1/1] qlge: bugfix: Restoring the vlan setting.
From: Ron Mercer @ 2010-10-27 14:58 UTC (permalink / raw)
To: davem; +Cc: netdev, ron.mercer, jitendra.kalsaria, ying.lok
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
drivers/net/qlge/qlge_main.c | 17 +++++++++++++++++
1 files changed, 17 insertions(+), 0 deletions(-)
diff --git a/drivers/net/qlge/qlge_main.c b/drivers/net/qlge/qlge_main.c
index e621056..c30e0fe 100644
--- a/drivers/net/qlge/qlge_main.c
+++ b/drivers/net/qlge/qlge_main.c
@@ -2385,6 +2385,20 @@ static void qlge_vlan_rx_kill_vid(struct net_device *ndev, u16 vid)
}
+static void qlge_restore_vlan(struct ql_adapter *qdev)
+{
+ qlge_vlan_rx_register(qdev->ndev, qdev->vlgrp);
+
+ if (qdev->vlgrp) {
+ u16 vid;
+ for (vid = 0; vid < VLAN_N_VID; vid++) {
+ if (!vlan_group_get_device(qdev->vlgrp, vid))
+ continue;
+ qlge_vlan_rx_add_vid(qdev->ndev, vid);
+ }
+ }
+}
+
/* MSI-X Multiple Vector Interrupt Handler for inbound completions. */
static irqreturn_t qlge_msix_rx_isr(int irq, void *dev_id)
{
@@ -3960,6 +3974,9 @@ static int ql_adapter_up(struct ql_adapter *qdev)
clear_bit(QL_PROMISCUOUS, &qdev->flags);
qlge_set_multicast_list(qdev->ndev);
+ /* Restore vlan setting. */
+ qlge_restore_vlan(qdev);
+
ql_enable_interrupts(qdev);
ql_enable_all_completion_interrupts(qdev);
netif_tx_start_all_queues(qdev->ndev);
--
1.6.0.2
^ permalink raw reply related
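Editor's sketch of the restore loop in the patch above, as a user-space model. It assumes a software VLAN table that survives a reset and a hardware filter that does not; sw_vlans, hw_filter, hw_reset, hw_add_vid and restore_vlans are hypothetical names, not qlge functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_VID 4096               /* size of the 12-bit VLAN ID space */

/* Software view that survives a reset; stands in for the driver's VLAN group. */
struct sw_vlans {
    bool present[N_VID];
};

/* Hardware filter table; a reset wipes it. */
struct hw_filter {
    bool accept[N_VID];
    size_t programmed;
};

static void hw_reset(struct hw_filter *hw)
{
    memset(hw, 0, sizeof(*hw));
}

static void hw_add_vid(struct hw_filter *hw, uint16_t vid)
{
    assert(vid < N_VID);
    if (!hw->accept[vid]) {
        hw->accept[vid] = true;
        hw->programmed++;
    }
}

/* Re-program every VID the software table still knows about.
 * Returns the number of VIDs restored. */
static size_t restore_vlans(const struct sw_vlans *sw, struct hw_filter *hw)
{
    size_t restored = 0;

    for (uint16_t vid = 0; vid < N_VID; vid++) {
        if (!sw->present[vid])
            continue;            /* nothing configured for this VID */
        hw_add_vid(hw, vid);
        restored++;
    }
    return restored;
}
```

The point of the pattern is that the bring-up path calls the restore step after every reset, so the hardware filter is rebuilt from state the reset cannot touch.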
* [net-next PATCH 2/2] qlge: Version change to v1.00.00.27
From: Ron Mercer @ 2010-10-27 14:58 UTC (permalink / raw)
To: davem; +Cc: netdev, ron.mercer, jitendra.kalsaria, ying.lok
In-Reply-To: <1288191507-1994-1-git-send-email-ron.mercer@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
drivers/net/qlge/qlge.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/qlge/qlge.h b/drivers/net/qlge/qlge.h
index 0474d20..69c4780 100644
--- a/drivers/net/qlge/qlge.h
+++ b/drivers/net/qlge/qlge.h
@@ -16,7 +16,7 @@
*/
#define DRV_NAME "qlge"
#define DRV_STRING "QLogic 10 Gigabit PCI-E Ethernet Driver "
-#define DRV_VERSION "v1.00.00.25.00.00-01"
+#define DRV_VERSION "v1.00.00.27.00.00-01"
#define WQ_ADDR_ALIGN 0x3 /* 4 byte alignment */
--
1.6.0.2
^ permalink raw reply related
* [net-next PATCH 1/2] qlge: Add firmware info to ethtool get regs.
From: Ron Mercer @ 2010-10-27 14:58 UTC (permalink / raw)
To: davem; +Cc: netdev, ron.mercer, jitendra.kalsaria, ying.lok
By default we add firmware information to ethtool get regs.
Optionally firmware info can instead be sent to log.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
drivers/net/qlge/qlge.h | 2 ++
drivers/net/qlge/qlge_dbg.c | 21 ++++++++++++++++++++-
drivers/net/qlge/qlge_ethtool.c | 19 ++++++++++++++++---
3 files changed, 38 insertions(+), 4 deletions(-)
diff --git a/drivers/net/qlge/qlge.h b/drivers/net/qlge/qlge.h
index a478786..0474d20 100644
--- a/drivers/net/qlge/qlge.h
+++ b/drivers/net/qlge/qlge.h
@@ -2221,6 +2221,7 @@ int ql_write_mpi_reg(struct ql_adapter *qdev, u32 reg, u32 data);
int ql_unpause_mpi_risc(struct ql_adapter *qdev);
int ql_pause_mpi_risc(struct ql_adapter *qdev);
int ql_hard_reset_mpi_risc(struct ql_adapter *qdev);
+int ql_soft_reset_mpi_risc(struct ql_adapter *qdev);
int ql_dump_risc_ram_area(struct ql_adapter *qdev, void *buf,
u32 ram_addr, int word_count);
int ql_core_dump(struct ql_adapter *qdev,
@@ -2237,6 +2238,7 @@ int ql_mb_set_mgmnt_traffic_ctl(struct ql_adapter *qdev, u32 control);
int ql_mb_get_port_cfg(struct ql_adapter *qdev);
int ql_mb_set_port_cfg(struct ql_adapter *qdev);
int ql_wait_fifo_empty(struct ql_adapter *qdev);
+void ql_get_dump(struct ql_adapter *qdev, void *buff);
void ql_gen_reg_dump(struct ql_adapter *qdev,
struct ql_reg_dump *mpi_coredump);
netdev_tx_t ql_lb_send(struct sk_buff *skb, struct net_device *ndev);
diff --git a/drivers/net/qlge/qlge_dbg.c b/drivers/net/qlge/qlge_dbg.c
index 4747492..fca804f 100644
--- a/drivers/net/qlge/qlge_dbg.c
+++ b/drivers/net/qlge/qlge_dbg.c
@@ -1317,9 +1317,28 @@ void ql_gen_reg_dump(struct ql_adapter *qdev,
status = ql_get_ets_regs(qdev, &mpi_coredump->ets[0]);
if (status)
return;
+}
+
+void ql_get_dump(struct ql_adapter *qdev, void *buff)
+{
+ /*
+ * If the dump has already been taken and is stored
+ * in our internal buffer and if force dump is set then
+ * just start the spool to dump it to the log file
+ * and also, take a snapshot of the general regs
+ * to the user's buffer or else take complete dump
+ * to the user's buffer if force is not set.
+ */
- if (test_bit(QL_FRC_COREDUMP, &qdev->flags))
+ if (!test_bit(QL_FRC_COREDUMP, &qdev->flags)) {
+ if (!ql_core_dump(qdev, buff))
+ ql_soft_reset_mpi_risc(qdev);
+ else
+ netif_err(qdev, drv, qdev->ndev, "coredump failed!\n");
+ } else {
+ ql_gen_reg_dump(qdev, buff);
ql_get_core_dump(qdev);
+ }
}
/* Coredump to messages log file using separate worker thread */
diff --git a/drivers/net/qlge/qlge_ethtool.c b/drivers/net/qlge/qlge_ethtool.c
index 4892d64..8149cc9 100644
--- a/drivers/net/qlge/qlge_ethtool.c
+++ b/drivers/net/qlge/qlge_ethtool.c
@@ -375,7 +375,10 @@ static void ql_get_drvinfo(struct net_device *ndev,
strncpy(drvinfo->bus_info, pci_name(qdev->pdev), 32);
drvinfo->n_stats = 0;
drvinfo->testinfo_len = 0;
- drvinfo->regdump_len = 0;
+ if (!test_bit(QL_FRC_COREDUMP, &qdev->flags))
+ drvinfo->regdump_len = sizeof(struct ql_mpi_coredump);
+ else
+ drvinfo->regdump_len = sizeof(struct ql_reg_dump);
drvinfo->eedump_len = 0;
}
@@ -547,7 +550,12 @@ static void ql_self_test(struct net_device *ndev,
static int ql_get_regs_len(struct net_device *ndev)
{
- return sizeof(struct ql_reg_dump);
+ struct ql_adapter *qdev = netdev_priv(ndev);
+
+ if (!test_bit(QL_FRC_COREDUMP, &qdev->flags))
+ return sizeof(struct ql_mpi_coredump);
+ else
+ return sizeof(struct ql_reg_dump);
}
static void ql_get_regs(struct net_device *ndev,
@@ -555,7 +563,12 @@ static void ql_get_regs(struct net_device *ndev,
{
struct ql_adapter *qdev = netdev_priv(ndev);
- ql_gen_reg_dump(qdev, p);
+ ql_get_dump(qdev, p);
+ qdev->core_is_dumped = 0;
+ if (!test_bit(QL_FRC_COREDUMP, &qdev->flags))
+ regs->len = sizeof(struct ql_mpi_coredump);
+ else
+ regs->len = sizeof(struct ql_reg_dump);
}
static int ql_get_coalesce(struct net_device *dev, struct ethtool_coalesce *c)
--
1.6.0.2
^ permalink raw reply related
* [PATCH] tunnels: Fix tunnels change rcu protection
From: Pavel Emelyanov @ 2010-10-27 15:43 UTC (permalink / raw)
To: David Miller; +Cc: Eric Dumazet, Linux Netdev List
After making rcu protection for tunnels (ipip, gre, sit and ip6) a bug
was introduced into the SIOCCHGTUNNEL code.
The tunnel is first unlinked, then addresses change, then it is linked
back probably into another bucket. But while changing the parms, the
hash table is unlocked to readers and they can look up the wrong tunnel.
Respective commits are b7285b79 (ipip: get rid of ipip_lock), 1507850b
(gre: get rid of ipgre_lock), 3a43be3c (sit: get rid of ipip6_lock) and
94767632 (ip6tnl: get rid of ip6_tnl_lock).
The quick fix is to wait for quiescent state to pass after unlinking,
but if it is inappropriate I can invent something better, just let me
know.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
---
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index d0ffcbe..01087e0 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1072,6 +1072,7 @@ ipgre_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
break;
}
ipgre_tunnel_unlink(ign, t);
+ synchronize_net();
t->parms.iph.saddr = p.iph.saddr;
t->parms.iph.daddr = p.iph.daddr;
t->parms.i_key = p.i_key;
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index e9b816e..cd300aa 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -676,6 +676,7 @@ ipip_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
}
t = netdev_priv(dev);
ipip_tunnel_unlink(ipn, t);
+ synchronize_net();
t->parms.iph.saddr = p.iph.saddr;
t->parms.iph.daddr = p.iph.daddr;
memcpy(dev->dev_addr, &p.iph.saddr, 4);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 38b9a56..2a59610 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1284,6 +1284,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
t = netdev_priv(dev);
ip6_tnl_unlink(ip6n, t);
+ synchronize_net();
err = ip6_tnl_change(t, &p);
ip6_tnl_link(ip6n, t);
netdev_state_change(dev);
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 367a6cc..d6bfaec 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -963,6 +963,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
}
t = netdev_priv(dev);
ipip6_tunnel_unlink(sitn, t);
+ synchronize_net();
t->parms.iph.saddr = p.iph.saddr;
t->parms.iph.daddr = p.iph.daddr;
memcpy(dev->dev_addr, &p.iph.saddr, 4);
^ permalink raw reply related
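Editor's sketch of the unlink, wait, modify, relink ordering that the patch above enforces. This is a single-threaded user-space model with hypothetical names (tunnel_table, tunnel_change_addr, wait_for_readers). The wait step is an empty stand-in for synchronize_net(); in the kernel that call blocks until readers that may still hold a pointer to the tunnel have finished, which a single-threaded model cannot show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 16

struct tunnel {
    uint32_t daddr;              /* lookup key */
    struct tunnel *next;         /* bucket chain */
};

struct tunnel_table {
    struct tunnel *bucket[HASH_SIZE];
};

static unsigned int hash_addr(uint32_t addr)
{
    return (addr ^ (addr >> 4)) & (HASH_SIZE - 1);
}

static void tunnel_link(struct tunnel_table *tbl, struct tunnel *t)
{
    struct tunnel **head = &tbl->bucket[hash_addr(t->daddr)];

    t->next = *head;
    *head = t;
}

/* Removes t from its chain. t->next is deliberately left intact: under RCU a
 * reader that already reached t must still be able to walk past it. */
static void tunnel_unlink(struct tunnel_table *tbl, struct tunnel *t)
{
    struct tunnel **pp = &tbl->bucket[hash_addr(t->daddr)];

    for (; *pp; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            return;
        }
    }
}

static struct tunnel *tunnel_lookup(const struct tunnel_table *tbl, uint32_t daddr)
{
    for (struct tunnel *t = tbl->bucket[hash_addr(daddr)]; t; t = t->next)
        if (t->daddr == daddr)
            return t;
    return NULL;
}

/* Stand-in for synchronize_net(); a no-op in this single-threaded model. */
static void wait_for_readers(void)
{
}

/* Change the key only while no reader can still see the tunnel. */
static void tunnel_change_addr(struct tunnel_table *tbl, struct tunnel *t,
                               uint32_t new_daddr)
{
    assert(tunnel_lookup(tbl, t->daddr) == t);
    tunnel_unlink(tbl, t);       /* new lookups can no longer find t */
    wait_for_readers();          /* old lookups have drained */
    t->daddr = new_daddr;        /* safe to edit the key */
    tunnel_link(tbl, t);         /* may land in a different bucket */
}
```

Without the wait, a reader that found the tunnel just before the unlink could observe the address fields while they are being rewritten, which is the window the commit message describes.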
* Re: [PATCH net-next-2.6 1/2] be2net: Adding an option to use INTx instead of MSI-X
From: David Miller @ 2010-10-27 15:45 UTC (permalink / raw)
To: michael; +Cc: bhutchings, somnath.kotur, netdev, linux-pci
In-Reply-To: <1288075928.6578.185.camel@concordia>
From: Michael Ellerman <michael@ellerman.id.au>
Date: Tue, 26 Oct 2010 17:52:08 +1100
> That horse has really really bolted, it's gawn.
>
> I count 26 drivers with "disable MSI/X" parameters. Some even have more
> than one.
>
> 11 of them are network drivers, 9 scsi, 3 ata.
>
> I agree it's a mess for users, but it's probably preferable to a
> non-working driver.
Stupid inappropriate things being in the tree doesn't mean I need
to accept more of them.
^ permalink raw reply
* Re: [PATCH net-next-2.6 1/2] be2net: Adding an option to use INTx instead of MSI-X
From: David Miller @ 2010-10-27 15:46 UTC (permalink / raw)
To: michael; +Cc: bhutchings, somnath.kotur, netdev, linux-pci
In-Reply-To: <1288135235.16778.14.camel@concordia>
From: Michael Ellerman <michael@ellerman.id.au>
Date: Wed, 27 Oct 2010 10:20:35 +1100
> On Tue, 2010-10-26 at 14:32 +0100, Ben Hutchings wrote:
>> Michael Ellerman wrote:
>> > On Mon, 2010-10-25 at 16:25 -0700, David Miller wrote:
>> > > From: Ben Hutchings <bhutchings@solarflare.com>
>> > > Date: Mon, 25 Oct 2010 23:38:53 +0100
>
>> > Ethtool would be nice, but only for network drivers. Is there a generic
>> > solution, quirks are obviously not keeping people happy.
>>
>> Since this is (normally) a property of the system, pci=nomsi is the
>> generic solution.
>
> Sort of, it's a big hammer. Did all these driver writers not know about
> pci=nomsi or did they prefer to add a parameter to their driver for some
> reason?
Every time I've actually done the work to try and track down the
true issue, it always turned out to be a PCI chipset problem rather
than a device specific issue.
^ permalink raw reply
* Re: [PATCH] af_packet: account for VLAN when checking packet size
From: David Miller @ 2010-10-27 15:48 UTC (permalink / raw)
To: horms; +Cc: mst, eric.dumazet, netdev, johann.baudy
In-Reply-To: <20101022084052.GA2118@verge.net.au>
From: Simon Horman <horms@verge.net.au>
Date: Fri, 22 Oct 2010 10:41:26 +0200
> Incidentally, I believe that this problem will only become more acute
> and complex if support is added for 802.1ad (Provider Bridges, aka Q-in-Q),
> 802.1ah (Provider Backbone Bridges, aka MAC-in-MAC) or other standards
> which further extend the maximum frame size.
No doubt.
> Dave, you were mentioning to me the other day that the kernel
> already supports some notion of Q-in-Q (though it's not 802.1ad).
> Does the current implementation allow for frames > 1504 bytes?
It's only going to hardware offload and allow the extra space
for the outer-most VLAN tag. Everything inside of the outer
tag will be handled in software as far as Linux is concerned.
> Is that a complication to the change proposed here?
For now, I don't think so.
^ permalink raw reply
* Re: [PATCH] ipv6: addrconf: clear IPv6 addresses and routes when losing link
From: David Miller @ 2010-10-27 15:51 UTC (permalink / raw)
To: lorenzo; +Cc: brian.haley, shemminger, netdev
In-Reply-To: <AANLkTik2Q=fF9j6eiBQx33e2JU7EG3Sc5hnXyDH24+mt@mail.gmail.com>
From: Lorenzo Colitti <lorenzo@google.com>
Date: Tue, 26 Oct 2010 15:53:50 -0700
> As the patch stands, they don't. Only autoconfigured addresses will be
> cleared, because addrconf_ifdown() does not remove any addresses that
> are permanent (unless they are link-local, in which case they are
> recreated as soon as link comes back).
Ok, and that brings us back to the issue of losing a TCP connection
over a link-local et al. address during a minor link flap.
I think some financial services people will really dislike that
behavior :)
^ permalink raw reply
* Re: [PATCH] tcp: md5 signature check scaling
From: Stephen Hemminger @ 2010-10-27 15:56 UTC (permalink / raw)
To: Dmitry Popov
Cc: David S. Miller, William.Allen.Simpson, Eric Dumazet,
Andreas Petlund, Ilpo Järvinen, Alexey Kuznetsov,
Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
Patrick McHardy, Herbert Xu, Gilad Ben-Yossef, Yony Amit, Zhu Yi,
linux-kernel, netdev
In-Reply-To: <AANLkTi=U1smX6XXDMyBTicyqbUU5V-t56jmH7qtX2XW5@mail.gmail.com>
On Wed, 27 Oct 2010 16:52:30 +0400
Dmitry Popov <dp@highloadlab.com> wrote:
> From: Dmitry Popov <dp@highloadlab.com>
>
> TCP MD5 signature checking without socket lock.
>
> Each tcp_sock has 2 RCU-protected arrays (tcp[46]_md5sig_info) of
> tcp[46]_md5sig_key address-key pairs.
> Each key (tcp_md5sig_key) has kref struct so that there is no need to
> lock the whole array to work with one key.
>
> MD5 functions were rewritten according to above statement and hash
> check (tcp_v4_inbound_md5_hash) was moved before socket lock.
>
> Signed-off-by: Dmitry Popov <dp@highloadlab.com>
You traded locking for ref counting which may not be as big
a win as you think.
Also, the overhead of RCU here might impact tests that involve
lots of socket creation and destruction.
--
^ permalink raw reply
* Re: [PATCH] ipv6: addrconf: clear IPv6 addresses and routes when losing link
From: Lorenzo Colitti @ 2010-10-27 16:01 UTC (permalink / raw)
To: David Miller; +Cc: brian.haley, shemminger, netdev
In-Reply-To: <20101027.085115.104074771.davem@davemloft.net>
On Wed, Oct 27, 2010 at 8:51 AM, David Miller <davem@davemloft.net> wrote:
>> As the patch stands, they don't. Only autoconfigured addresses will be
>> cleared, because addrconf_ifdown() does not remove any addresses that
>> are permanent (unless they are link-local, in which case they are
>> recreated as soon as link comes back).
>
> Ok, and that brings us back to the issue of losing a TCP connection
> over a link-local et al. address during a minor link flap.
TCP connections don't go away when the address they are using goes
away. They go away if their address goes away forever. As mentioned
above in the thread, I did test this :)
^ permalink raw reply
* Re: [PATCH] ipv6: addrconf: clear IPv6 addresses and routes when losing link
From: Lorenzo Colitti @ 2010-10-27 16:03 UTC (permalink / raw)
To: Maciej Żenczykowski; +Cc: netdev, Brian Haley
In-Reply-To: <AANLkTikTT5ie-R61rA9NucgGi5RBOPpXZLMYQRa_wb7j@mail.gmail.com>
On Wed, Oct 27, 2010 at 1:35 AM, Maciej Żenczykowski
<zenczykowski@gmail.com> wrote:
> So how does all this work with privacy addresses, established
> connections, and a link flap?
The current privacy address comes back because it's a time-based hash.
I think the old ones are gone. Still, I think it's better that
connections from 1 day ago don't work any more (the default for
privacy addresses is 1 day), than if all new and all old connections
don't work any more.
^ permalink raw reply
* Re: [PATCH] ipv6: addrconf: clear IPv6 addresses and routes when losing link
From: David Miller @ 2010-10-27 16:05 UTC (permalink / raw)
To: lorenzo; +Cc: brian.haley, shemminger, netdev
In-Reply-To: <AANLkTikhhsEu7zzYpEH-Dby_Ke77iSo0z1wu2t-xU+Jc@mail.gmail.com>
From: Lorenzo Colitti <lorenzo@google.com>
Date: Wed, 27 Oct 2010 09:01:25 -0700
> On Wed, Oct 27, 2010 at 8:51 AM, David Miller <davem@davemloft.net> wrote:
>>> As the patch stands, they don't. Only autoconfigured addresses will be
>>> cleared, because addrconf_ifdown() does not remove any addresses that
>>> are permanent (unless they are link-local, in which case they are
>>> recreated as soon as link comes back).
>>
>> Ok, and that brings us back to the issue of losing a TCP connection
>> over a link-local et al. address during a minor link flap.
>
> TCP connections don't go away when the address they are using goes
> away. They go away if their address goes away forever. As mentioned
> above in the thread, I did test this :)
Ok, that's good enough for me :)
^ permalink raw reply
* Re: [PATCH] ehea: fix use after free
From: Breno Leitao @ 2010-10-27 16:05 UTC (permalink / raw)
To: Eric Dumazet; +Cc: davem, netdev
In-Reply-To: <1288156867.2652.127.camel@edumazet-laptop>
> [PATCH] ehea: fix use after free
>
> ehea_start_xmit() dereferences skb after its freeing in ehea_xmit3() to
> get vlan tags.
>
> Move the offending block before the potential ehea_xmit3() call.
>
> Signed-off-by: Eric Dumazet<eric.dumazet@gmail.com>
Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
^ permalink raw reply
* Re: tap0 device stopped working in 2.6.36 (ok in 2.6.35)
From: Jim @ 2010-10-27 16:09 UTC (permalink / raw)
To: Nolan Leake; +Cc: netdev
In-Reply-To: <1288142302.26640.7.camel@voxel>
On 10/27/2010 03:18 AM, Nolan Leake wrote:
> On Sun, 2010-10-24 at 11:59 +0200, Jim wrote:
>> Thanks for explaining the purpose of the patch.
>> But it appears something is missing and I think it breaks current
>> userspace. I use this tap0 device together with VirtualBox, I have a
>> virtual machine setup as bridged to tap0, not a very odd or strange
>> setup (this used to be the only method).
>> On the host side I run dhcpd to hand out IP address to the virtual
>> machine, but despite the dhcpd running on the tap0 device it never got
>> 'ready' in the sense that no IP packets made it out from the host to the
>> guest.
>
> To make sure I understand the situation, is this correct (ignoring the
> exact names of the interfaces):
> br0 bridges between eth0 and tap0, and you run dhcpd on tap0?
>
> Since tap0 is part of the bridge, I think dhcpd should be running on
> br0. Does that work?
>
Not exactly, VirtualBox calls it "bridged adapter", it 'bridges' the
guest machine to the tap0 interface on the host for so called host-only
networking.
See eg. http://forums.virtualbox.org/viewtopic.php?f=1&t=165
And this sequence is now simply failing
tunctl -t tap0 -u tuxuser
ifconfig tap0 10.0.0.1 up
Jim
^ permalink raw reply
* Update your e-mail quota
From: khorst @ 2010-10-27 16:09 UTC (permalink / raw)
You have exceeded the storage limit for your mailbox.
You will not be able to send or receive new e-mail until you
update your e-mail quota.
Copy the link below and fill in the form to update your
account.
http://www.do.my/e-mailaccountactivation/
System-Administrator
192.168.0.1
^ permalink raw reply
* Re: [PATCH 5/5] tcp: ipv4 listen state scaled
From: Dmitry Popov @ 2010-10-27 16:44 UTC (permalink / raw)
To: Alexey Kuznetsov; +Cc: netdev
In-Reply-To: <20101027150434.GA13121@ms2.inr.ac.ru>
On Wed, Oct 27, 2010 at 7:04 PM, Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> wrote:
> Hello!
>
> It looks like there is at least one hole here.
>
> You take lock, check syn table and drop lock in tcp_v4_hnd_req().
> Then you immediately enter tcp_v4_conn_request() and grab lock again.
> Oops, in the tiny hole while lock was dropped the request can be already
> created (even funnier, the whole socket can be already created and even accepted).
> So, if you drop lock, you have to restart the whole tcp_v4_rcv_listen()
> (which seems to be impossible without additional tricks)
>
> Alexey
>
Hello, Alexey!
Yes, it may happen, but I don't see any problem. On 2 same SYN-packets
we will add 2 requests to syn table. Yes, it's not so good, but
nothing criminal, no?
Regards,
Dmitry.
^ permalink raw reply
* [PATCH] bonding: Fix lockdep warning after bond_vlan_rx_register()
From: Jarek Poplawski @ 2010-10-27 17:08 UTC (permalink / raw)
To: David Miller; +Cc: Eric Dumazet, netdev, Jesse Gross, Jay Vosburgh
In-Reply-To: <1288184416.2709.109.camel@edumazet-laptop>
On Wed, Oct 27, 2010 at 03:00:16PM +0200, Eric Dumazet wrote:
>
> Indeed this is the right fix, I wonder why I did not catch it before ?
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Thanks,
Jarek P.
------------------------>
Fix lockdep warning:
[ 52.991402] ======================================================
[ 52.991511] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[ 52.991569] 2.6.36-04573-g4b60626-dirty #65
[ 52.991622] ------------------------------------------------------
[ 52.991696] ip/4842 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
[ 52.991758] (&bond->lock){++++..}, at: [<efe4d300>] bond_set_multicast_list+0x60/0x2c0 [bonding]
[ 52.991966]
[ 52.991967] and this task is already holding:
[ 52.992008] (&bonding_netdev_addr_lock_key){+.....}, at: [<c04e5530>] dev_mc_sync+0x50/0xa0
[ 52.992008] which would create a new lock dependency:
[ 52.992008] (&bonding_netdev_addr_lock_key){+.....} -> (&bond->lock){++++..}
[ 52.992008]
[ 52.992008] but this new dependency connects a SOFTIRQ-irq-safe lock:
[ 52.992008] (&(&mc->mca_lock)->rlock){+.-...}
[ 52.992008] ... which became SOFTIRQ-irq-safe at:
[ 52.992008] [<c0272beb>] __lock_acquire+0x96b/0x1960
[ 52.992008] [<c027415e>] lock_acquire+0x7e/0xf0
[ 52.992008] [<c05f356d>] _raw_spin_lock_bh+0x3d/0x50
[ 52.992008] [<c0584e40>] mld_ifc_timer_expire+0xf0/0x280
[ 52.992008] [<c024cee6>] run_timer_softirq+0x146/0x310
[ 52.992008] [<c024591d>] __do_softirq+0xad/0x1c0
[ 52.992008]
[ 52.992008] to a SOFTIRQ-irq-unsafe lock:
[ 52.992008] (&bond->lock){++++..}
[ 52.992008] ... which became SOFTIRQ-irq-unsafe at:
[ 52.992008] ... [<c0272c3b>] __lock_acquire+0x9bb/0x1960
[ 52.992008] [<c027415e>] lock_acquire+0x7e/0xf0
[ 52.992008] [<c05f36b8>] _raw_write_lock+0x38/0x50
[ 52.992008] [<efe4cbe4>] bond_vlan_rx_register+0x24/0x70 [bonding]
[ 52.992008] [<c0598010>] register_vlan_dev+0xc0/0x280
[ 52.992008] [<c0599f3a>] vlan_newlink+0xaa/0xd0
[ 52.992008] [<c04ed4b4>] rtnl_newlink+0x404/0x490
[ 52.992008] [<c04ece35>] rtnetlink_rcv_msg+0x1e5/0x220
[ 52.992008] [<c050424e>] netlink_rcv_skb+0x8e/0xb0
[ 52.992008] [<c04ecbac>] rtnetlink_rcv+0x1c/0x30
[ 52.992008] [<c0503bfb>] netlink_unicast+0x24b/0x290
[ 52.992008] [<c0503e37>] netlink_sendmsg+0x1f7/0x310
[ 52.992008] [<c04cd41c>] sock_sendmsg+0xac/0xe0
[ 52.992008] [<c04ceb80>] sys_sendmsg+0x130/0x230
[ 52.992008] [<c04cf04e>] sys_socketcall+0xde/0x280
[ 52.992008] [<c0202d10>] sysenter_do_call+0x12/0x36
[ 52.992008]
[ 52.992008] other info that might help us debug this:
...
[ Full info at netdev: Wed, 27 Oct 2010 12:24:30 +0200
Subject: [BUG net-2.6 vlan/bonding] lockdep splats ]
Use BH variant of write_lock(&bond->lock) (as elsewhere in bond_main)
to prevent this dependency.
Fixes commit f35188faa0fbabefac476536994f4b6f3677380f [v2.6.36]
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
---
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index beb3b7c..bdb68a6 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -493,9 +493,9 @@ static void bond_vlan_rx_register(struct net_device *bond_dev,
struct slave *slave;
int i;
- write_lock(&bond->lock);
+ write_lock_bh(&bond->lock);
bond->vlgrp = grp;
- write_unlock(&bond->lock);
+ write_unlock_bh(&bond->lock);
bond_for_each_slave(bond, slave, i) {
struct net_device *slave_dev = slave->dev;
^ permalink raw reply related
* Re: [Security] TIPC security issues
From: David Miller @ 2010-10-27 17:29 UTC (permalink / raw)
To: torvalds; +Cc: drosenberg, jon.maloy, allan.stephens, netdev, security
In-Reply-To: <AANLkTi=V93A660+YS8C2TvC13kGUcJpFgjPHUvONd_WW@mail.gmail.com>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu, 21 Oct 2010 17:31:12 -0700
> Something like the appended UNTESTED patch. NOTE! it actually makes
> "verify_iovec()" *change* the iovec if it grows too big.
Ok I thought a bit about this patch.
How we can behave here depends upon the socket type.
For example, for a stream socket such as TCP a partial write is OK and
we could truncate the iovec like this. That's fine.
But for a datagram socket, we have to have a one-to-one correspondence
between write() calls and packets on the wire. So we'd either need to
accept the entire write() length or fail it with an error.
verify_iovec() (currently) doesn't have the socket type information
available, so it's not able to key off of that right now.
I agree that cutting off these cases at a high level would be the
thing to do long term, but right now verify_iovec() isn't positioned
such that we can do it just yet.
For now I'm going to look into specifically fixing the TIPC case and
also think longer term about another way to address this at a high
level.
^ permalink raw reply
* Re: [Security] TIPC security issues
From: Linus Torvalds @ 2010-10-27 17:37 UTC (permalink / raw)
To: David Miller; +Cc: drosenberg, jon.maloy, allan.stephens, netdev, security
In-Reply-To: <20101027.102940.112580564.davem@davemloft.net>
On Wed, Oct 27, 2010 at 10:29 AM, David Miller <davem@davemloft.net> wrote:
>
> But for a datagram socket, we have to have a one-to-one correspondence
> between write() calls and packets on the wire. So we'd either need to
> accept the entire write() length or fail it with an error.
I disagree. We had that exact issue with regular file read/write: in
theory, POSIX says that you should never do a partial write to a
regular file.
And the thing is, WE SIMPLY DON'T CARE. If somebody does a 2GB+ IO,
they damn well need to accept that it's not going to be some atomic
single event. It doesn't matter _how_ much actual real memory you
have, it's just stupid to even care about that situation. It's not
something any real app actually can reasonably ever expect to work, so
rather than say "we have to do it right or error out", you should just
see it as a "it's a stupid situation, we can do whatever the hell we
want, because anybody who cares is a f*cking moron that we don't care
about".
If you _really_ care deeply, then some packet-oriented protocol can
just have its own private packet size limit (which would be way less
than 2GB), and then just look at the total size and say "oh, the total
size is bigger than my limit, so I'll just error out". Then, the fact
that verify_iovec() may have truncated the message to 2GB-1 doesn't
matter at all.
(Practically speaking, I bet all packet-oriented protocols already
have a limit that is enforced simply by allocation patterns, so I
don't think it's actually a problem even now)
Linus
^ permalink raw reply
* Re: tap0 device stopped working in 2.6.36 (ok in 2.6.35)
From: Nolan Leake @ 2010-10-27 17:48 UTC (permalink / raw)
To: Jim; +Cc: netdev, David Miller
In-Reply-To: <4CC84EAD.7040506@xs4all.nl>
On Wed, 2010-10-27 at 18:09 +0200, Jim wrote:
> Not exactly, VirtualBox calls it "bridged adapter", it 'bridges' the
> guest machine to the tap0 interface on the host for so called host-only
> networking.
> See eg. http://forums.virtualbox.org/viewtopic.php?f=1&t=165
OK, so you have the tap0 device, and you assign an IP to it and run
dhcpd on it. Understood. Thank you for the explanatory link.
> And this sequence is now simply failing
> tunctl -t tap0 -u tuxuser
> ifconfig tap0 10.0.0.1 up
The link is not ready until some process has attached to the tap device.
tunctl simply attaches and then immediately detaches, leaving it
link-down until the virtualbox process starts and attaches.
But this doesn't cause the problem for me! I suspect that is because I
am running an ipv4 only kernel; the "ADDRCONF(NETDEV_UP): tap0: link is
not ready" error comes from net/ipv6/addrconf.c.
I have no idea why ipv6 vetoes the upping of a link-down interface, while
ipv4 doesn't care.
If this is all intended behavior, then I guess I'll need to make the old
"tap devices are always link-up" mode the default, and add a way for
newer software to opt in to correct link-state reporting.
David (CC'd), could you comment on this?
Thanks,
Nolan
^ permalink raw reply
* Re: [Security] TIPC security issues
From: David Miller @ 2010-10-27 17:50 UTC (permalink / raw)
To: torvalds; +Cc: drosenberg, jon.maloy, allan.stephens, netdev, security
In-Reply-To: <AANLkTinHVXDJbgJ0zBVRnr8R4sx=NtZCaFgbFC+8DE+q@mail.gmail.com>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 27 Oct 2010 10:37:46 -0700
> If you _really_ care deeply, then some packet-oriented protocol can
> just have its own private packet size limit (which would be way less
> than 2GB), and then just look at the total size and say "oh, the total
> size is bigger than my limit, so I'll just error out". Then, the fact
> that verify_iovec() may have truncated the message to 2GB-1 doesn't
> matter at all.
>
> (Practically speaking, I bet all packet-oriented protocols already
> have a limit that is enforced simply by allocation patterns, so I
> don't think it's actually a problem even now)
This is, as it turns out, effectively what the TIPC socket layer
already does.
Most of the send calls that propagate down to this code, which adds up
the iov_len lengths, get passed a maximum packet size.
Anyways, here is what I came up with to kill this specific bug in
TIPC:
From 39dc17049a5ed989bab8997945a048ffddf48387 Mon Sep 17 00:00:00 2001
From: David S. Miller <davem@davemloft.net>
Date: Wed, 27 Oct 2010 10:46:59 -0700
Subject: [PATCH] tipc: Fix iov_len handling in message send path.
Use size_t to add together iov_len's from the iovec, error
out with -EMSGSIZE if total is greater than INT_MAX.
Reported-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/tipc/msg.c | 13 ++++++++++---
1 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index ecb532f..b29eb8b 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -76,11 +76,15 @@ void tipc_msg_init(struct tipc_msg *m, u32 user, u32 type,
int tipc_msg_calc_data_size(struct iovec const *msg_sect, u32 num_sect)
{
- int dsz = 0;
+ size_t dsz = 0;
int i;
for (i = 0; i < num_sect; i++)
dsz += msg_sect[i].iov_len;
+
+ if (dsz > INT_MAX)
+ return -EMSGSIZE;
+
return dsz;
}
@@ -93,12 +97,15 @@ int tipc_msg_calc_data_size(struct iovec const *msg_sect, u32 num_sect)
*/
int tipc_msg_build(struct tipc_msg *hdr,
- struct iovec const *msg_sect, u32 num_sect,
- int max_size, int usrmem, struct sk_buff** buf)
+ struct iovec const *msg_sect, u32 num_sect,
+ int max_size, int usrmem, struct sk_buff** buf)
{
int dsz, sz, hsz, pos, res, cnt;
dsz = tipc_msg_calc_data_size(msg_sect, num_sect);
+ if (dsz < 0)
+ return dsz;
+
if (unlikely(dsz > TIPC_MAX_USER_MSG_SIZE)) {
*buf = NULL;
return -EINVAL;
--
1.7.3.2
^ permalink raw reply related
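Editor's sketch of why the accumulator width matters in the patch above. It is a user-space model with hypothetical names (naive_total32, checked_total) and plain arrays of lengths instead of iovecs. The checked variant also tests before each addition, which goes slightly beyond the patch as posted and is an assumption about how one might harden it further.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy pattern: a 32-bit accumulator silently wraps modulo 2^32, so a
 * huge request can look like a tiny one to later size checks. */
static uint32_t naive_total32(const uint64_t *lens, size_t n)
{
    uint32_t total = 0;

    assert(lens != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += (uint32_t)lens[i];
    return total;
}

/* Checked pattern: accumulate in a wide type and refuse anything that
 * would not fit in an int, testing before each addition. */
static int checked_total(const uint64_t *lens, size_t n)
{
    uint64_t total = 0;

    assert(lens != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* total <= INT_MAX always holds here, so the subtraction is safe. */
        if (lens[i] > (uint64_t)INT_MAX - total)
            return -EMSGSIZE;
        total += lens[i];
    }
    return (int)total;
}
```

For the two lengths 0x80000000 and 0x80000010, the naive sum comes out as 16, which would pass any maximum-size check, while the checked sum returns -EMSGSIZE.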
* Re: tap0 device stopped working in 2.6.36 (ok in 2.6.35)
From: David Miller @ 2010-10-27 17:52 UTC (permalink / raw)
To: nolan; +Cc: jim876, netdev
In-Reply-To: <1288201734.26640.43.camel@voxel>
From: Nolan Leake <nolan@cumulusnetworks.com>
Date: Wed, 27 Oct 2010 10:48:54 -0700
> I have no idea why ipv6 vetoes the upping of a link-down interface, while
> ipv4 doesn't care.
>
> If this is all intended behavior, then I guess I'll need to make the old
> "tap devices are always link-up" mode the default, and add a way for
> newer software to opt in to correct link-state reporting.
>
> David (CC'd), could you comment on this?
If ipv6 cannot send multicast packets for neighbour and router
discovery, which it must do in order to function properly over the
device, the interface is unusable.
^ permalink raw reply
* Re: [PATCH net-next-2.6] net: Fix some corner cases in dev_can_checksum()
From: David Miller @ 2010-10-27 18:00 UTC (permalink / raw)
To: bhutchings; +Cc: jesse, netdev
In-Reply-To: <1287756739.2316.11.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Fri, 22 Oct 2010 15:12:19 +0100
> dev_can_checksum() incorrectly returns true in these cases:
>
> 1. The skb has both out-of-band and in-band VLAN tags and the device
> supports checksum offload for the encapsulated protocol but only with
> one layer of encapsulation.
> 2. The skb has a VLAN tag and the device supports generic checksumming
> but not in conjunction with VLAN encapsulation.
>
> Rearrange the VLAN tag checks to avoid these.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied, thanks for fixing this Ben.
We really do need to figure out how to handle all of the various
"super vlan" encapsulation schemes (Cisco's RFC5517, 802.1ad,
802.1ah), and their offloads.
In fact, 802.1ad and 802.1ah use completely different framing than
the current VLAN bits.
^ permalink raw reply
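Editor's sketch of the nesting problem discussed above: counting in-band VLAN tags in a raw Ethernet frame and comparing the total depth with what a device can offload. The functions count_vlan_tags and can_offload_csum and the max_vlan_depth parameter are hypothetical and do not mirror dev_can_checksum(). The TPID values 0x8100 (802.1Q) and 0x88A8 (802.1ad) are the standard ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN       6
#define VLAN_TAG_LEN  4          /* 2-byte TPID + 2-byte TCI */
#define TPID_8021Q    0x8100
#define TPID_8021AD   0x88A8

static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Counts in-band VLAN tags and reports the encapsulated EtherType.
 * Returns -1 if the frame ends before a non-VLAN EtherType is found. */
static int count_vlan_tags(const uint8_t *frame, size_t len, uint16_t *inner_proto)
{
    size_t off = 2 * MAC_LEN;    /* skip destination and source MAC */
    int tags = 0;

    assert(frame != NULL && inner_proto != NULL);
    for (;;) {
        if (len < off + 2)
            return -1;           /* truncated frame */
        uint16_t proto = read_be16(frame + off);
        if (proto != TPID_8021Q && proto != TPID_8021AD) {
            *inner_proto = proto;
            return tags;
        }
        tags++;
        off += VLAN_TAG_LEN;     /* next EtherType follows the TCI */
    }
}

/* A device that can checksum through max_vlan_depth layers of VLAN
 * encapsulation must see the out-of-band tag counted as one more layer. */
static bool can_offload_csum(int in_band_tags, bool has_oob_tag, int max_vlan_depth)
{
    if (in_band_tags < 0)
        return false;
    return in_band_tags + (has_oob_tag ? 1 : 0) <= max_vlan_depth;
}
```

Case 1 in the quoted description corresponds to can_offload_csum(1, true, 1), which is false: one in-band tag plus one out-of-band tag is two layers on a device that handles only one.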
* Re: [PATCH net-next-2.6] net: NETIF_F_HW_CSUM does not imply FCoE CRC offload
From: David Miller @ 2010-10-27 18:01 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, yi.zou
In-Reply-To: <1287758306.2316.35.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Fri, 22 Oct 2010 15:38:26 +0100
> NETIF_F_HW_CSUM indicates the ability to update a TCP/IP-style 16-bit
> checksum with the checksum of an arbitrary part of the packet data,
> whereas the FCoE CRC is something entirely different.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> Cc: stable@kernel.org [2.6.32+]
Applied, thanks a lot Ben.
^ permalink raw reply