* Re: [PATCH 2/8] bitmap: Introduce bitmap_set, bitmap_clear, bitmap_find_next_zero_area
From: Akinobu Mita @ 2009-10-13 2:18 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, FUJITA Tomonori, David S. Miller, sparclinux,
Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
Greg Kroah-Hartman, Lothar Wassmann, linux-usb, Roland Dreier,
Yevgeny Petrilin, netdev, Tony Luck, Fenghua Yu, linux-ia64,
linux-altix
In-Reply-To: <20091009164100.85a36188.akpm@linux-foundation.org>
On Fri, Oct 09, 2009 at 04:41:00PM -0700, Andrew Morton wrote:
> On Fri, 9 Oct 2009 17:29:15 +0900
> Akinobu Mita <akinobu.mita@gmail.com> wrote:
>
> > This introduces new bitmap functions:
> >
> > bitmap_set: Set specified bit area
> > bitmap_clear: Clear specified bit area
> > bitmap_find_next_zero_area: Find free bit area
> >
> > These are stolen from iommu helper.
> >
> > I changed the return value of bitmap_find_next_zero_area if there is
> > no zero area.
> >
> > find_next_zero_area in iommu helper: returns -1
> > bitmap_find_next_zero_area: return >= bitmap size
>
> I'll plan to merge this patch into 2.6.32 so we can trickle all the
> other patches into subsystems in an orderly fashion.
Sounds good.
> > +void bitmap_set(unsigned long *map, int i, int len)
> > +{
> > + int end = i + len;
> > +
> > + while (i < end) {
> > + __set_bit(i, map);
> > + i++;
> > + }
> > +}
>
> This is really inefficient, isn't it? It's a pretty trivial matter to
> romp through memory 32 or 64 bits at a time.
OK. I'll do
> > +unsigned long bitmap_find_next_zero_area(unsigned long *map,
> > + unsigned long size,
> > + unsigned long start,
> > + unsigned int nr,
> > + unsigned long align_mask)
> > +{
> > + unsigned long index, end, i;
> > +again:
> > + index = find_next_zero_bit(map, size, start);
> > +
> > + /* Align allocation */
> > + index = (index + align_mask) & ~align_mask;
> > +
> > + end = index + nr;
> > + if (end >= size)
> > + return end;
> > + i = find_next_bit(map, end, index);
> > + if (i < end) {
> > + start = i + 1;
> > + goto again;
> > + }
> > + return index;
> > +}
> > +EXPORT_SYMBOL(bitmap_find_next_zero_area);
>
> This needs documentation, please. It appears that `size' is the size
> of the bitmap and `nr' is the number of zeroed bits we're looking for,
> but an inattentive programmer could get those reversed.
>
> Also the semantics of `align_mask' could benefit from spelling out. Is
> the alignment with respect to memory boundaries or with respect to
> `map' or with respect to map+start or what?
OK. I will document it.
And I plan to change bitmap_find_next_zero_area() to take the alignment
instead of an align_mask as Roland said.
> And why does align_mask exist at all? I was a bit surprised to see it
> there. In which scenarios will it be non-zero?
Because the users of iommu-helper and mlx4 need the alignment requirement
for the zero area.
arch/powerpc/kernel/iommu.c
arch/x86/kernel/amd_iommu.c
arch/x86/kernel/pci-gart_64.c
drivers/net/mlx4/alloc.c
^ permalink raw reply
* Re: bisect results of MSI-X related panic (help!)
From: Tejun Heo @ 2009-10-13 2:39 UTC (permalink / raw)
To: Brandeburg, Jesse
Cc: Jesse Brandeburg, Frans Pop, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, Ingo Molnar, hpa@zytor.com
In-Reply-To: <alpine.WNT.2.00.0910120907170.11804@jbrandeb-mobl2.amr.corp.intel.com>
Brandeburg, Jesse wrote:
> On Mon, 12 Oct 2009, Tejun Heo wrote:
>>> any other debugging tricks/ideas?
>> Hmm... stackprotector adds considerable amount of stack usage and it
>> could be you're seeing stack overflow which would also explain the
>> random crashes you've been seeing. Do you have DEBUG_STACKOVERFLOW
>> turned on? This is on x86_64, right?
>
> Hi, thanks for your response,
>
> [root@jbrandeb-hc linux-2.6.32-rc1]# grep STACKO .config
> CONFIG_DEBUG_STACKOVERFLOW=y
>
> [root@jbrandeb-hc linux-2.6.32-rc1]# grep X86_64 .config
> CONFIG_X86_64=y
> CONFIG_X86_64_SMP=y
> CONFIG_X86_64_ACPI_NUMA=y
>
> stack size is 8K
>
> I tried Jarek's suggestion of CPUMASK_OFFSTACK and still panic.
> [66027.266057] Kernel panic - not syncing: stack-protector: Kernel stack
> is corrupted in: ffffffff810b4eb0
> [66027.266059]
> [66027.266070] Kernel panic - not syncing: stack-protector: Kernel stack
> is corrupted in: ffffffff81472856
> [66027.266071]
> [66027.266081] Pid: 0, comm: swapper Tainted: G W
> 2.6.32-rc2-git-debug #6
> [66027.266086] Call Trace:
>
> that was all I got. Interesting double fault, that hadn't happened
> before.
>
> the symbols might be off slightly since I rebuilt the kernel, but this was
> initial poke at offsets above in gdb
> (gdb) l *0xffffffff810b4eb0
> 0xffffffff810b4eb0 is in dynamic_irq_cleanup (kernel/irq/chip.c:86).
> 81 desc->handle_irq = handle_bad_irq;
> 82 desc->chip = &no_irq_chip;
> 83 desc->name = NULL;
> 84 clear_kstat_irqs(desc);
> 85 spin_unlock_irqrestore(&desc->lock, flags);
> 86 }
Can you please apply the following patch and try to retrigger the
panic?
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index c166019..f5a1482 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -63,6 +63,9 @@ void dynamic_irq_cleanup(unsigned int irq)
struct irq_desc *desc = irq_to_desc(irq);
unsigned long flags;
+ printk("XXX dynamic_irq_cleanup() called on %u\n", irq);
+ dump_stack();
+
if (!desc) {
WARN(1, KERN_ERR "Trying to cleanup invalid IRQ%d\n", irq);
return;
--
tejun
^ permalink raw reply related
* TCP_DEFER_ACCEPT is missing counter update
From: Willy Tarreau @ 2009-10-13 5:07 UTC (permalink / raw)
To: netdev
Hello,
I was trying to use TCP_DEFER_ACCEPT and noticed that if the
client does not talk, the connection is never accepted and
remains in SYN_RECV state until the retransmits expire, where
it finally is deleted. This is bad when some firewall such as
netfilter sits between the client and the server because the
firewall sees the connection in ESTABLISHED state while the
server will finally silently drop it without sending an RST.
This behaviour contradicts the man page which says it should
wait only for some time :
TCP_DEFER_ACCEPT (since Linux 2.4)
Allows a listener to be awakened only when data arrives
on the socket. Takes an integer value (seconds), this
can bound the maximum number of attempts TCP will
make to complete the connection. This option should not
be used in code intended to be portable.
Also, looking at ipv4/tcp.c, a retransmit counter is correctly
computed :
case TCP_DEFER_ACCEPT:
icsk->icsk_accept_queue.rskq_defer_accept = 0;
if (val > 0) {
/* Translate value in seconds to number of
* retransmits */
while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
val > ((TCP_TIMEOUT_INIT / HZ) <<
icsk->icsk_accept_queue.rskq_defer_accept))
icsk->icsk_accept_queue.rskq_defer_accept++;
icsk->icsk_accept_queue.rskq_defer_accept++;
}
break;
==> rskq_defer_accept is used as a counter of retransmits.
But in tcp_minisocks.c, this counter is only checked. And in
fact, I have found no location which updates it. So I think
that what was intended was to decrease it in tcp_minisocks
whenever it is checked, which the trivial patch below does :
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index f8d67cc..1b443b0 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -645,6 +645,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
+ inet_csk(sk)->icsk_accept_queue.rskq_defer_accept--;
inet_rsk(req)->acked = 1;
return NULL;
}
Is there anything I am missing ?
Regards,
Willy
^ permalink raw reply related
* Re: Ping Is Broken
From: Jarek Poplawski @ 2009-10-13 5:10 UTC (permalink / raw)
To: Brian Haley
Cc: Rob.Townley, netdev, Omaha Linux User Group, CentOS mailing list
In-Reply-To: <4AD3BC12.6020409@hp.com>
On Mon, Oct 12, 2009 at 07:30:26PM -0400, Brian Haley wrote:
>
>
> Jarek Poplawski wrote:
> > Brian Haley wrote, On 10/12/2009 10:36 PM:
> >
> >> In this case ping is doing an SO_BINDTODEVICE to eth0, so the kernel is going
> >> to force the packets out of it, even if it isn't the "correct" interface. If
> >> you ran tcpdump you'd probably see an ARP resolution failure, or an ICMP from
> >> a gateway.
> >
> > BTW, SO_BINDTODEVICE is used only to acquire a source address, not the real
> > connection (unless I miss something).
>
> No, SO_BINDTODEVICE affects routing, as well as incoming packets - it's in macros
> like INET_MATCH(), from what I've seen it restricts a socket to only use the device
> specified, irregardless of the source address you're using. For example, you can
> bind to 127.0.0.1 and send packets out eth0 if sk_bound_dev_if is set to it if I
> remember correctly.
I've commented your: "In this case ping is doing an SO_BINDTODEVICE to eth0",
so meant: SO_BINDTODEVICE is used *by ping* only to acquire a source address.
Jarek P.
^ permalink raw reply
* Re: [PATCH -mm] connector: Passing required parameter for function cn_add_callback.
From: Rakib Mullick @ 2009-10-13 5:27 UTC (permalink / raw)
To: David Miller; +Cc: netdev, akpm, zbr, linux-kernel
In-Reply-To: <20091011.223502.77690228.davem@davemloft.net>
On 10/12/09, David Miller <davem@davemloft.net> wrote:
> From: Rakib Mullick <rakib.mullick@gmail.com>
> Date: Mon, 12 Oct 2009 08:44:33 +0600
>
>
> > */
> > -static void cn_proc_mcast_ctl(struct cn_msg *msg)
> > +static void cn_proc_mcast_ctl(struct cn_msg *msg, struct
> > netlink_skb_parms *nsp)
>
>
> Your patches are unusable because your email client corrupts the
> patch, here you can see it is splitting up long lines.
>
> Please fix this up and resubmit.
Oops..... sorry for that. This is the second time I'm facing this
problem. I'll try to fix it.
I'm resubmitting the patch. Here I've recreate the patch to split the
line into two.
And it just reaches its 80 line mark. Hopefully it will be okay.
Thanks,
----
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
--- linus/drivers/connector/cn_proc.c 2009-10-09 17:38:22.000000000 +0600
+++ rakib/drivers/connector/cn_proc.c 2009-10-13 12:29:37.000000000 +0600
@@ -225,9 +225,11 @@ static void cn_proc_ack(int err, int rcv
/**
* cn_proc_mcast_ctl
- * @data: message sent from userspace via the connector
+ * @msg: message sent from userspace via the connector
+ * @nsp: remains unused but we need it to keep it
*/
-static void cn_proc_mcast_ctl(struct cn_msg *msg)
+static void cn_proc_mcast_ctl(struct cn_msg *msg,
+ struct netlink_skb_parms *nsp)
{
enum proc_cn_mcast_op *mc_op = NULL;
int err = 0;
^ permalink raw reply
* [PATCH net-next] cnic: Fix build error on powerpc.
From: Michael Chan @ 2009-10-13 5:42 UTC (permalink / raw)
To: davem; +Cc: netdev, sfr, Michael Chan
Add missing #include <net/ip6_checksum.h>
Problem noted by Stephen Rothwell.
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/cnic.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/net/cnic.c b/drivers/net/cnic.c
index 6e7af7b..333b1d1 100644
--- a/drivers/net/cnic.c
+++ b/drivers/net/cnic.c
@@ -33,6 +33,7 @@
#include <net/route.h>
#include <net/ipv6.h>
#include <net/ip6_route.h>
+#include <net/ip6_checksum.h>
#include <scsi/iscsi_if.h>
#include "cnic_if.h"
--
1.6.4.GIT
^ permalink raw reply related
* [PATCH] [NIU] VLAN does not work with niu driver
From: Joyce Yu @ 2009-10-13 5:41 UTC (permalink / raw)
To: netdev
From 0bb77e878758bd72051577bcc568e2b95c87c203 Mon Sep 17 00:00:00 2001
From: Joyce Yu <joyce.yu@sun.com>
Date: Mon, 12 Oct 2009 11:03:54 -0700
Subject: [PATCH] VLAN does not work with niu driver
Signed-off-by: Joyce Yu <joyce.yu@sun.com>
---
drivers/net/niu.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/drivers/net/niu.c b/drivers/net/niu.c
index f9364d0..9559e42 100644
--- a/drivers/net/niu.c
+++ b/drivers/net/niu.c
@@ -3480,6 +3480,7 @@ static int niu_process_rx_pkt(struct napi_struct
*napi, struct niu *np,
unsigned int index = rp->rcr_index;
struct sk_buff *skb;
int len, num_rcr;
+ struct vlan_ethhdr *veth;
skb = netdev_alloc_skb(np->dev, RX_SKB_ALLOC_SIZE);
if (unlikely(!skb))
@@ -3545,7 +3546,15 @@ static int niu_process_rx_pkt(struct napi_struct
*napi, struct niu *np,
rp->rcr_index = index;
skb_reserve(skb, NET_IP_ALIGN);
- __pskb_pull_tail(skb, min(len, NIU_RXPULL_MAX));
+ __pskb_pull_tail(skb, min(len, VLAN_ETH_HLEN));
+
+ veth = (struct vlan_ethhdr *)skb->data;
+ if (veth->h_vlan_proto != __constant_htons(ETH_P_8021Q)) {
+ skb->tail -= 4;
+ skb->data_len += 4;
+ skb_shinfo(skb)->frags[0].page_offset -= 4;
+ skb_shinfo(skb)->frags[0].size += 4;
+ }
rp->rx_packets++;
rp->rx_bytes += skb->len;
--
1.6.4
--
^ permalink raw reply related
* Re: [PATCH] [NIU] VLAN does not work with niu driver
From: David Miller @ 2009-10-13 6:22 UTC (permalink / raw)
To: Joyce.Yu; +Cc: netdev
In-Reply-To: <4AD41310.7040209@Sun.COM>
From: Joyce Yu <Joyce.Yu@Sun.COM>
Date: Mon, 12 Oct 2009 22:41:36 -0700
> +++ b/drivers/net/niu.c
> @@ -3480,6 +3480,7 @@ static int niu_process_rx_pkt(struct napi_struct
> *napi, struct niu *np,
Your email client is still breaking up long lines.
Take your time and fix your setup correctly. Read
'linux/Documentation/email-clients.txt' for tips.
^ permalink raw reply
* Re: [PATCH 1/1] net: Introduce recvmmsg socket syscall
From: David Miller @ 2009-10-13 6:40 UTC (permalink / raw)
To: acme
Cc: netdev, caitlin.bestler, vanhoof, williams, nhorman, nir.tzachar,
niv, paul.moore, remi.denis-courmont, steve
In-Reply-To: <1255364440-23271-1-git-send-email-acme@redhat.com>
From: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Mon, 12 Oct 2009 13:20:40 -0300
> Meaning receive multiple messages, reducing the number of syscalls and
> net stack entry/exit operations.
>
> Next patches will introduce mechanisms where protocols that want to
> optimize this operation will provide an unlocked_recvmsg operation.
>
> This takes into account comments made by:
...
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
You missed the syscall entry addition for arch/sparc/kernel/systbls_32.S
but I fixed that up while applying this to net-next-2.6
Thanks.
^ permalink raw reply
* Re: [PATCH -mm] connector: Passing required parameter for function cn_add_callback.
From: David Miller @ 2009-10-13 6:42 UTC (permalink / raw)
To: rakib.mullick; +Cc: netdev, akpm, zbr, linux-kernel
In-Reply-To: <b9df5fa10910122227r710d6782m89fc9ec4273487b8@mail.gmail.com>
From: Rakib Mullick <rakib.mullick@gmail.com>
Date: Tue, 13 Oct 2009 11:27:49 +0600
> On 10/12/09, David Miller <davem@davemloft.net> wrote:
>> From: Rakib Mullick <rakib.mullick@gmail.com>
>> Date: Mon, 12 Oct 2009 08:44:33 +0600
>>
>>
>> > */
>> > -static void cn_proc_mcast_ctl(struct cn_msg *msg)
>> > +static void cn_proc_mcast_ctl(struct cn_msg *msg, struct
>> > netlink_skb_parms *nsp)
>>
>>
>> Your patches are unusable because your email client corrupts the
>> patch, here you can see it is splitting up long lines.
>>
>> Please fix this up and resubmit.
> Oops..... sorry for that. This is the second time I'm facing this
> problem. I'll try to fix it.
> I'm resubmitting the patch. Here I've recreate the patch to split the
> line into two.
> And it just reaches its 80 line mark. Hopefully it will be okay.
Upstream and my net-next-2.6 tree both have this function with
the proper parameters, maybe Andrew's -mm tree had some conflicts
and they weren't dealt with correctly.
^ permalink raw reply
* Re: [PATCH 0/8] gianfar: Add support for hibernation
From: David Miller @ 2009-10-13 6:57 UTC (permalink / raw)
To: avorontsov; +Cc: scottwood, linuxppc-dev, netdev, afleming
In-Reply-To: <20091012160000.GA32406@oksana.dev.rtsoft.ru>
From: Anton Vorontsov <avorontsov@ru.mvista.com>
Date: Mon, 12 Oct 2009 20:00:00 +0400
> Here are few patches that add support for hibernation for gianfar
> driver.
>
> Technically, we could just do gfar_close() and then gfar_enet_open()
> sequence to restore gianfar functionality after hibernation, but
> close/open does so many unneeded things (e.g. BDs buffers freeing and
> allocation, IRQ freeing and requesting), that I felt it would be much
> better to cleanup and refactor some code to make the hibernation [and
> not only hibernation] code a little bit prettier.
I applied all of this, it's a really nice patch set. If there are any
problems we can deal with it using follow-on fixups.
I noticed something, in patch #3 where you remove the spurious wrap
bit setting in startup_gfar(). It looks like that was not only
spurious but it was doing it wrong too.
It's writing garbage into the status word, because it's not using the
BD_LFLAG() macro to shift the value up 16 bits.
^ permalink raw reply
* Re: [PATCH -mm] connector: Passing required parameter for function cn_add_callback.
From: Rakib Mullick @ 2009-10-13 7:00 UTC (permalink / raw)
To: David Miller; +Cc: netdev, akpm, zbr, linux-kernel
In-Reply-To: <20091012.234214.110950674.davem@davemloft.net>
On 10/13/09, David Miller <davem@davemloft.net> wrote:
> From: Rakib Mullick <rakib.mullick@gmail.com>
>
> Date: Tue, 13 Oct 2009 11:27:49 +0600
>
>
> > On 10/12/09, David Miller <davem@davemloft.net> wrote:
> >> From: Rakib Mullick <rakib.mullick@gmail.com>
> >> Date: Mon, 12 Oct 2009 08:44:33 +0600
> >>
> >>
> >> > */
> >> > -static void cn_proc_mcast_ctl(struct cn_msg *msg)
> >> > +static void cn_proc_mcast_ctl(struct cn_msg *msg, struct
> >> > netlink_skb_parms *nsp)
> >>
> >>
> >> Your patches are unusable because your email client corrupts the
> >> patch, here you can see it is splitting up long lines.
> >>
> >> Please fix this up and resubmit.
> > Oops..... sorry for that. This is the second time I'm facing this
> > problem. I'll try to fix it.
> > I'm resubmitting the patch. Here I've recreate the patch to split the
> > line into two.
> > And it just reaches its 80 line mark. Hopefully it will be okay.
>
>
> Upstream and my net-next-2.6 tree both have this function with
> the proper parameters, maybe Andrew's -mm tree had some conflicts
> and they weren't dealt with correctly.
>
Ah, sorry again. I should've check the upstream first. Sorry for noise.
Thanks,
^ permalink raw reply
* Re: [PATCH 1/3] mdio: Advertise pause (flow control) settings even if autoneg is off
From: David Miller @ 2009-10-13 7:01 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1255375577.2931.11.camel@achroite>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 12 Oct 2009 20:26:17 +0100
> Currently, if pause autoneg is off we do not set either pause
> advertising flag. If autonegotiation of speed and duplex settings is
> enabled, there is no way for the link partner to distinguish this from
> our refusing to use pause frames.
>
> We should instead set the advertising flags according to the forced
> mode so that the link partner can follow our lead. This is consistent
> with the behaviour of other drivers.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied to net-next-2.6
^ permalink raw reply
* Re: [PATCH 2/3] mdio: Expose pause frame advertising flags to ethtool
From: David Miller @ 2009-10-13 7:02 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1255375597.2931.13.camel@achroite>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 12 Oct 2009 20:26:37 +0100
> In mdio45_ethtool_gset_npage() and mdio45_ethtool_gset(), check MDIO
> pause frame advertising flags and set the corresponding ethtool flags.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied to net-next-2.6
^ permalink raw reply
* Re: [PATCH 3/3] sfc: 10Xpress: Initialise pause advertising flags
From: David Miller @ 2009-10-13 7:02 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1255375627.2931.15.camel@achroite>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 12 Oct 2009 20:27:07 +0100
> The mdio module now handles reconfiguration of pause advertising
> through ethtool, but not initialisation. Add the necessary
> initialisation to tenxpress_phy_init().
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied to net-next-2.6
Thanks!
^ permalink raw reply
* Re: [PATCH] Fix IXP 2000 network driver building.
From: David Miller @ 2009-10-13 7:03 UTC (permalink / raw)
To: vince; +Cc: netdev
In-Reply-To: <1255376775-21390-1-git-send-email-vince@simtec.co.uk>
From: Vincent Sanders <vince@simtec.co.uk>
Date: Mon, 12 Oct 2009 20:46:15 +0100
> The IXP 2000 network driver was failing to build as it has its own
> statistics gathering which was not compatible with the recent network
> device operations changes. This patch fixes the driver in the obvious
> way and has been compile tested. I have been unable to get the ixp2000
> maintainer to comment or test this fix.
>
> Signed-off-by: Vincent Sanders <vince@simtec.co.uk>
Looks good enough for me, applied, thanks!
^ permalink raw reply
* Re: TCP_DEFER_ACCEPT is missing counter update
From: David Miller @ 2009-10-13 7:11 UTC (permalink / raw)
To: w; +Cc: netdev
In-Reply-To: <20091013050705.GA2194@1wt.eu>
From: Willy Tarreau <w@1wt.eu>
Date: Tue, 13 Oct 2009 07:07:06 +0200
> But in tcp_minisocks.c, this counter is only checked. And in
> fact, I have found no location which updates it. So I think
> that what was intended was to decrease it in tcp_minisocks
> whenever it is checked, which the trivial patch below does :
>
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index f8d67cc..1b443b0 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -645,6 +645,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
> if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
> TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
> + inet_csk(sk)->icsk_accept_queue.rskq_defer_accept--;
> inet_rsk(req)->acked = 1;
> return NULL;
> }
>
> Is there anything I am missing ?
Your logic looks sound and I can't come to any other conclusion
after going over this code, even going back to 2.4.x
Feel free to make a formal patch submission, thanks Willy.
^ permalink raw reply
* Re: [PATCH 2.6.32-rc4] net: VMware virtual Ethernet NIC driver: vmxnet3
From: David Miller @ 2009-10-13 7:16 UTC (permalink / raw)
To: sbhatewara
Cc: pv-drivers, netdev, linux-kernel, virtualization, chrisw, greg,
shemminger
In-Reply-To: <alpine.LRH.2.00.0910121447090.13205@sbhatewara-dev1.eng.vmware.com>
From: Shreyas Bhatewara <sbhatewara@vmware.com>
Date: Mon, 12 Oct 2009 15:18:42 -0700 (PDT)
>
> Ethernet NIC driver for VMware's vmxnet3
>
> From: Shreyas Bhatewara <sbhatewara@vmware.com>
>
> This patch adds driver support for VMware's virtual Ethernet NIC: vmxnet3
> Guests running on VMware hypervisors supporting vmxnet3 device will thus have
> access to improved network functionalities and performance.
>
> Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>
> Signed-off-by: Bhavesh Davda <bhavesh@vmware.com>
> Signed-off-by: Ronghua Zhang <ronghua@vmware.com>
Ok, looks good, applied to net-2.6, thanks!
^ permalink raw reply
* Re: TCP_DEFER_ACCEPT is missing counter update
From: Willy Tarreau @ 2009-10-13 7:19 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20091013.001106.226276168.davem@davemloft.net>
On Tue, Oct 13, 2009 at 12:11:06AM -0700, David Miller wrote:
> Your logic looks sound and I can't come to any other conclusion
> after going over this code, even going back to 2.4.x
>
> Feel free to make a formal patch submission, thanks Willy.
Ok, here it comes (I used my explanation as the commit message).
Thanks David,
Willy
>From da80c99a503bab1256706ed8d967e2ab3f71afe0 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w@1wt.eu>
Date: Tue, 13 Oct 2009 07:26:54 +0200
Subject: tcp: fix tcp_defer_accept to consider the timeout
I was trying to use TCP_DEFER_ACCEPT and noticed that if the
client does not talk, the connection is never accepted and
remains in SYN_RECV state until the retransmits expire, where
it finally is deleted. This is bad when some firewall such as
netfilter sits between the client and the server because the
firewall sees the connection in ESTABLISHED state while the
server will finally silently drop it without sending an RST.
This behaviour contradicts the man page which says it should
wait only for some time :
TCP_DEFER_ACCEPT (since Linux 2.4)
Allows a listener to be awakened only when data arrives
on the socket. Takes an integer value (seconds), this
can bound the maximum number of attempts TCP will
make to complete the connection. This option should not
be used in code intended to be portable.
Also, looking at ipv4/tcp.c, a retransmit counter is correctly
computed :
case TCP_DEFER_ACCEPT:
icsk->icsk_accept_queue.rskq_defer_accept = 0;
if (val > 0) {
/* Translate value in seconds to number of
* retransmits */
while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
val > ((TCP_TIMEOUT_INIT / HZ) <<
icsk->icsk_accept_queue.rskq_defer_accept))
icsk->icsk_accept_queue.rskq_defer_accept++;
icsk->icsk_accept_queue.rskq_defer_accept++;
}
break;
==> rskq_defer_accept is used as a counter of retransmits.
But in tcp_minisocks.c, this counter is only checked. And in
fact, I have found no location which updates it. So I think
that what was intended was to decrease it in tcp_minisocks
whenever it is checked, which the trivial patch below does.
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
net/ipv4/tcp_minisocks.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index f8d67cc..2f676f3 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -644,6 +644,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
/* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
+ inet_csk(sk)->icsk_accept_queue.rskq_defer_accept--;
inet_rsk(req)->acked = 1;
return NULL;
}
--
1.5.3.3
^ permalink raw reply related
* Re: [PATCH 0/2] Build fixes for todays next tree
From: David Miller @ 2009-10-13 7:21 UTC (permalink / raw)
To: alan; +Cc: netdev, linux-kernel
In-Reply-To: <20091012152247.5694.77290.stgit@localhost.localdomain>
From: Alan Cox <alan@linux.intel.com>
Date: Mon, 12 Oct 2009 16:27:17 +0100
> Couple of networking warns/build fails
I'll apply these, thanks Alan.
^ permalink raw reply
* Re: TCP_DEFER_ACCEPT is missing counter update
From: Eric Dumazet @ 2009-10-13 7:23 UTC (permalink / raw)
To: Willy Tarreau; +Cc: netdev
In-Reply-To: <20091013050705.GA2194@1wt.eu>
Willy Tarreau a écrit :
> Hello,
>
> I was trying to use TCP_DEFER_ACCEPT and noticed that if the
> client does not talk, the connection is never accepted and
> remains in SYN_RECV state until the retransmits expire, where
> it finally is deleted. This is bad when some firewall such as
> netfilter sits between the client and the server because the
> firewall sees the connection in ESTABLISHED state while the
> server will finally silently drop it without sending an RST.
>
> This behaviour contradicts the man page which says it should
> wait only for some time :
>
> TCP_DEFER_ACCEPT (since Linux 2.4)
> Allows a listener to be awakened only when data arrives
> on the socket. Takes an integer value (seconds), this
> can bound the maximum number of attempts TCP will
> make to complete the connection. This option should not
> be used in code intended to be portable.
>
> Also, looking at ipv4/tcp.c, a retransmit counter is correctly
> computed :
>
> case TCP_DEFER_ACCEPT:
> icsk->icsk_accept_queue.rskq_defer_accept = 0;
> if (val > 0) {
> /* Translate value in seconds to number of
> * retransmits */
> while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
> val > ((TCP_TIMEOUT_INIT / HZ) <<
> icsk->icsk_accept_queue.rskq_defer_accept))
> icsk->icsk_accept_queue.rskq_defer_accept++;
> icsk->icsk_accept_queue.rskq_defer_accept++;
> }
> break;
>
> ==> rskq_defer_accept is used as a counter of retransmits.
>
> But in tcp_minisocks.c, this counter is only checked. And in
> fact, I have found no location which updates it. So I think
> that what was intended was to decrease it in tcp_minisocks
> whenever it is checked, which the trivial patch below does :
>
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index f8d67cc..1b443b0 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -645,6 +645,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
> if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
> TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
> + inet_csk(sk)->icsk_accept_queue.rskq_defer_accept--;
> inet_rsk(req)->acked = 1;
> return NULL;
> }
>
I dont understand why you want to decrement rskq_defer_accept here.
We receive a pure ACK (wihout DATA).
We should receive exactly one such ACK.
(This is the third packet of the three way TCP handshake)
How this can solve the problem you mention ?
(ie, not sending an RST when we timeout the SYN_RECV session)
^ permalink raw reply
* Re: [PATCHv3] netxen: fix pci bar mapping
From: David Miller @ 2009-10-13 7:26 UTC (permalink / raw)
To: dhananjay; +Cc: netdev
In-Reply-To: <1255329573-17701-1-git-send-email-dhananjay@netxen.com>
From: Dhananjay Phadke <dhananjay@netxen.com>
Date: Sun, 11 Oct 2009 23:39:33 -0700
> Use resource_size_t for PCI resource remapping instead
> of unsigned long. Physical addresses can exceed range of
> long data type (e.g with PAE).
>
> Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Applied.
^ permalink raw reply
* Re: TCP_DEFER_ACCEPT is missing counter update
From: David Miller @ 2009-10-13 7:27 UTC (permalink / raw)
To: w; +Cc: netdev
In-Reply-To: <20091013071955.GA3587@1wt.eu>
From: Willy Tarreau <w@1wt.eu>
Date: Tue, 13 Oct 2009 09:19:55 +0200
> On Tue, Oct 13, 2009 at 12:11:06AM -0700, David Miller wrote:
>> Your logic looks sound and I can't come to any other conclusion
>> after going over this code, even going back to 2.4.x
>>
>> Feel free to make a formal patch submission, thanks Willy.
>
> Ok, here it comes (I used my explanation as the commit message).
Applied and queued up for -stable, thanks!
^ permalink raw reply
* Re: [PATCH] Teach pegasus driver to ignore bluetoother adapters with clashing Vendor:Product IDs
From: David Miller @ 2009-10-13 7:31 UTC (permalink / raw)
To: rankincj-/E1597aS9LQAvxtiuMwx3w
Cc: petkan-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f,
linux-usb-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <910710.29413.qm-7brg08ZX0I2B9c0Qi4KiSl5cfvJIxWXgQQ4Iyu8u01E@public.gmane.org>
From: Chris Rankin <rankincj-/E1597aS9LQAvxtiuMwx3w@public.gmane.org>
Date: Sun, 11 Oct 2009 06:57:50 -0700 (PDT)
> I submitted this patch months ago and then forgot about it, but I've
> noticed that it hasn't been picked up so I've regenerated it against
> 2.6.31.
>
> Short description:
>
> The Belkin F8T012xx1 bluetooth adaptor has the same vendor and product
> IDs as the Belkin F5D5050, so we need to teach the pegasus driver to
> ignore adaptors belonging to the "Wireless" class 0xE0. For this one
> case anyway, seeing as pegasus is a driver for "Wired" adaptors.
Applied, but please put your Signed-off-by: here at the end
of your commit message rather than after the patch.
Putting it after the patch makes it more difficult for
automated tools to pick it up, thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH kernel 2.6.32-rc3] 3c574_cs: spin_lock the set_multicast_list function
From: David Miller @ 2009-10-13 7:33 UTC (permalink / raw)
To: ken_kawasaki; +Cc: netdev
In-Reply-To: <20091011211225.35e6fb4a.ken_kawasaki@spring.nifty.jp>
From: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
Date: Sun, 11 Oct 2009 21:12:25 +0900
> 3c574_cs:
> spin_lock the set_multicast_list function.
>
> Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
Applied, thanks!
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox