Netdev List

* Re: Moving drivers into staging (was Re: [GIT PULL] SCSI fixes for 2.6.32-rc3)
From: Ingo Molnar @ 2009-10-14  6:33 UTC (permalink / raw)
  To: Joe Perches
  Cc: Greg KH, Luis R. Rodriguez, James Bottomley, Linus Torvalds,
	Theodore Tso, Andrew Morton, linux-scsi, linux-kernel, Jing Huang,
	netdev, linux-wireless
In-Reply-To: <1255497575.1851.16.camel@Joe-Laptop.home>

* Joe Perches <joe@perches.com> wrote:

> On Tue, 2009-10-13 at 21:45 -0700, Greg KH wrote:
> > How about when it was scheduled to be removed, we put it in staging and
> > I'll add it to my announcements about the staging tree every release?
> > Unless you can think of a better way?
> 
> staging/to_be_removed_unless_fixed_by/v.x.y ?

Yes, that's a real worry. Some time ago i suggested:

  drivers/staging/good/
  drivers/staging/bad/
  drivers/staging/ugly/

 good:  drivers that are to go upstream in the next cycle 
 bad:   outgoing drivers being obsoleted or abandoned
 ugly:  incoming messy drivers with active developers

The messaging of this looks nice and the names are short and obvious.

An added benefit is that this kind of separation makes it easy for 
people interested in drivers/staging to follow the 'status' of drivers. 
Once stuff goes into 'good' a different kind of review is needed than if 
a driver goes into 'ugly'.

The main disadvantage would be the PR angle: putting new drivers into a 
path named 'ugly'. Not something you want to put into a quarterly status 
report, right? If we put drivers/staging/ugly/ drivers into 
drivers/staging/ itself, we'd solve that problem. I.e. we'd keep the 
current scheme, but we'd also add drivers/staging/good/ and 
drivers/staging/bad/ as two extra stages for incoming and outgoing 
drivers.

A third version would be a more neutral name:

  drivers/staging/incoming/
  drivers/staging/outgoing/

I think it has many advantages, but (of course!) it all depends on 
whether Greg wants to have any separation like this.

	Ingo

^ permalink raw reply

* Re: Moving drivers into staging (was Re: [GIT PULL] SCSI fixes for 2.6.32-rc3)
From: Joe Perches @ 2009-10-14  5:19 UTC (permalink / raw)
  To: Greg KH
  Cc: Luis R. Rodriguez, Ingo Molnar, James Bottomley, Linus Torvalds,
	Theodore Tso, Andrew Morton, linux-scsi, linux-kernel, Jing Huang,
	netdev, linux-wireless
In-Reply-To: <20091014044519.GA19199-l3A5Bk7waGM@public.gmane.org>

On Tue, 2009-10-13 at 21:45 -0700, Greg KH wrote:
> How about when it was scheduled to be removed, we put it in staging and
> I'll add it to my announcements about the staging tree every release?
> Unless you can think of a better way?

staging/to_be_removed_unless_fixed_by/v.x.y ?


^ permalink raw reply

* Re: TCP_DEFER_ACCEPT is missing counter update
From: Willy Tarreau @ 2009-10-14  4:52 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: David Miller, netdev, Eric Dumazet
In-Reply-To: <Pine.LNX.4.58.0910132335390.3095@u.domain.uli>

Hello Julian,

On Wed, Oct 14, 2009 at 12:27:41AM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Tue, 13 Oct 2009, Willy Tarreau wrote:
> 
> > >From da80c99a503bab1256706ed8d967e2ab3f71afe0 Mon Sep 17 00:00:00 2001
> > From: Willy Tarreau <w@1wt.eu>
> > Date: Tue, 13 Oct 2009 07:26:54 +0200
> > Subject: tcp: fix tcp_defer_accept to consider the timeout
> > 
> > I was trying to use TCP_DEFER_ACCEPT and noticed that if the
> > client does not talk, the connection is never accepted and
> > remains in SYN_RECV state until the retransmits expire, where
> > it finally is deleted. This is bad when some firewall such as
> 
> 	I think, this is by design, there is big comment in
> tcp_check_req().

I'm not sure. That would considerably reduce the usefulness of
the feature. The comment I see there is just a one line explaining
why we drop the ACK. It does not indicate any strategy on what to
do when the counter expires.

> > netfilter sits between the client and the server because the
> > firewall sees the connection in ESTABLISHED state while the
> > server will finally silently drop it without sending an RST.
> 
> 	Client can stay ESTABLISHED for long time but
> RST will be sent when client sends DATA or FIN.

Yes you're right. In fact, this only weakens firewalls in case of
pure scans, but attacks on SYN cookies do that too, as well as
TTL-based attacks.

> > This behaviour contradicts the man page which says it should
> > wait only for some time :
> > 
> >        TCP_DEFER_ACCEPT (since Linux 2.4)
> >           Allows a listener to be awakened only when data arrives
> >           on the socket.  Takes an integer value  (seconds), this
> >           can  bound  the  maximum  number  of attempts TCP will
> >           make to complete the connection. This option should not
> >           be used in code intended to be portable.
> 
> 	This works properly in 2.6.31.3, I set TCP_SYNCNT=1
> and TCP_DEFER_ACCEPT then only 2 SYN-ACKs are sent.

That's what I observe too, but the connection is silently dropped
afterwards and I'm clearly not sure this was the intended behaviour.
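
For readers following along, this is roughly how a listener asks for the behaviour under discussion. It is only a minimal sketch: the helper name set_defer_accept() is invented for illustration, while setsockopt() and TCP_DEFER_ACCEPT (value in seconds, as in the man page excerpt above) are the real Linux interface.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel to wake the listener only once
 * data has arrived, with `seconds` bounding how long it keeps trying.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_defer_accept(int listen_fd, int seconds)
{
	return setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
			  &seconds, sizeof(seconds));
}
```

The option is Linux-specific, so portable code would normally wrap the call in an #ifdef TCP_DEFER_ACCEPT.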

> > Also, looking at ipv4/tcp.c, a retransmit counter is correctly
> > computed :
> 
> 	rskq_defer_accept is threshold, not counter
> 
> >         case TCP_DEFER_ACCEPT:
> >                 icsk->icsk_accept_queue.rskq_defer_accept = 0;
> >                 if (val > 0) {
> >                         /* Translate value in seconds to number of
> >                          * retransmits */
> >                         while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
> >                                val > ((TCP_TIMEOUT_INIT / HZ) <<
> >                                        icsk->icsk_accept_queue.rskq_defer_accept))
> >                                 icsk->icsk_accept_queue.rskq_defer_accept++;
> >                         icsk->icsk_accept_queue.rskq_defer_accept++;
> >                 }
> >                 break;
> > 
> > ==> rskq_defer_accept is used as a counter of retransmits.
> 
> 	as limit for retransmits, not as counter

yes if you want, that's what I mean.
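
The translation from seconds to a retransmit limit in the quoted case statement can be tried out in user space. The sketch below copies that loop; the function name and the 3 second constant are assumptions (3 seconds stands in for TCP_TIMEOUT_INIT / HZ as it is believed to be in 2.6.31).

```c
/* Assumption: initial retransmit timeout of 3 seconds. */
#define TIMEOUT_INIT_SECS 3ULL

/* User-space copy of the quoted loop: translate a timeout in seconds
 * into the retransmit limit stored in rskq_defer_accept.
 * The name defer_accept_limit is hypothetical. */
unsigned int defer_accept_limit(int val)
{
	unsigned int limit = 0;

	if (val > 0) {
		/* Each retransmit doubles the timeout: 3s, 6s, 12s, ... */
		while (limit < 32 &&
		       (unsigned long long)val > (TIMEOUT_INIT_SECS << limit))
			limit++;
		limit++;
	}
	return limit;
}
```

Under that assumption a request of 10 seconds becomes a limit of 3, and any value from 1 to 3 seconds becomes a limit of 1.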

> > But in tcp_minisocks.c, this counter is only checked. And in
> > fact, I have found no location which updates it. So I think
> > that what was intended was to decrease it in tcp_minisocks
> > whenever it is checked, which the trivial patch below does.
> 
> 	You can check net/ipv4/inet_connection_sock.c,
> inet_csk_reqsk_queue_prune() where TCP_DEFER_ACCEPT can extend
> the retransmission threshold for acked sockets above the
> applied 'thresh'.

So clearly this is in order to improve chances that the application
will receive the connection, no ?

> So, there are 2 options:
> 
> a) TCP_DEFER_ACCEPT is used as flag (eg. 1) or the period is below
> the TCP_SYNCNT period. In this case TCP_DEFER_ACCEPT does not
> extend the period for DATA (DATA must come before TCP_SYNCNT).
> Application is notified only when DATA comes.
> 
> or
> 
> b) TCP_DEFER_ACCEPT is set with seconds above the TCP_SYNCNT
> retrans limit and the first ACK extends the period up to
> TCP_DEFER_ACCEPT seconds (converted as retrans). By this
> way we provide more time for DATA after the empty ACKs.
> ACK again can come before TCP_SYNCNT but DATA after ACK
> can come even after TCP_SYNCNT but before TCP_DEFER_ACCEPT
> timeout. Again, application is notified only when DATA comes.

Yes this is what happens right now, but reading the man again
does not imply to me that the connection will not be accepted
once we reach the retransmit limit.

Maybe we have different usages and different interpretations of
the man can satisfy either, but I don't see what this would be
useful to in case we silently drop instead of finally accepting.

Regards,
Willy


^ permalink raw reply

* Re: Moving drivers into staging (was Re: [GIT PULL] SCSI fixes for 2.6.32-rc3)
From: Greg KH @ 2009-10-14  4:45 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Ingo Molnar, James Bottomley, Linus Torvalds, Theodore Tso,
	Andrew Morton, linux-scsi, linux-kernel, Jing Huang, netdev,
	linux-wireless
In-Reply-To: <43e72e890910131108m789110b4hc35e25601ad67bb7@mail.gmail.com>

On Tue, Oct 13, 2009 at 11:08:15AM -0700, Luis R. Rodriguez wrote:
> On Mon, Oct 12, 2009 at 4:24 PM, Greg KH <gregkh@suse.de> wrote:
> > On Mon, Oct 12, 2009 at 05:42:44PM +0200, Ingo Molnar wrote:
> >> Hm, i think i even gave drivers/staging/ its name?
> >
> > Yes you did, and I appreciate it :)
> >
> >> > [...] It seems that I'm the only one that has the ability to drop
> >> > drivers out of the kernel tree, which is a funny situation :)
> >>
> >> You are the only one who has the ability to send a warning shot towards
> >> drivers _without hurting users_, and by moving it into the focus of a
> >> team of cleanup oriented developers.
> >>
> >> I think that's an important distinction ;-)
> >
> > Good point.
> >
> >> > In thinking about this a lot more, I don't really mind it.  If people
> >> > want to push stuff out of "real" places in the kernel, into
> >> > drivers/staging/ and give the original authors and maintainers notice
> >> > about what is going on, _and_ provide a TODO file for what needs to
> >> > happen to get the code back into the main portion of the kernel tree,
> >> > then I'll be happy to help out with this and manage it.
> >> >
> >> > I think a 6-9 month window (basically 3 kernel releases) should be
> >> > sufficient time to have a driver that has been in drivers/staging/ be
> >> > cleaned up enough to move back into the main kernel tree.  If not, it
> >> > could be easily dropped.
> >> >
> >> > Any objections to this?
> >>
> >> Sounds excellent to me!
> >
> > Great, I'll await the patches to move stuff to drivers/staging/ now.
> >
> > Wireless developers, warm up your editors :)
> 
> OK -- prism54 seems like a good candidate, instead of removing it
> completely as I originally outlined on the feature removal schedule.
> Do we have a file to give notices to move drivers to staging because
> they are old as with the feature removal schedule? The more visible
> these things become the better it is for users.

We've found that the feature removal file is also ignored :)

How about when it was scheduled to be removed, we put it in staging and
I'll add it to my announcements about the staging tree every release?

Unless you can think of a better way?

thanks,

greg k-h

^ permalink raw reply

* query: tcpdump versus atomic?
From: William Allen Simpson @ 2009-10-14  4:23 UTC (permalink / raw)
  To: netdev

Anybody know what code path tcpdump changes to running atomic?

Is there a function to test whether you're running atomic?

Dutifully testing, and noticed a strange anomaly.  When the code is
running normally, no problems.  When running tcpdump on the client, no
problems.  When running tcpdump on the server:

[   88.997594] device eth0 entered promiscuous mode
[  114.827403] BUG: scheduling while atomic: swapper/0/0x10000100
[  114.827462] Modules linked in: lp snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd ppdev iTCO_wdt iTCO_vendor_support psmouse soundcore parport_pc intel_agp parport agpgart pcspkr serio_raw shpchp snd_page_alloc 8139too aic7xxx 8139cp 
scsi_transport_spi mii floppy
[  114.827493]
[  114.827497] Pid: 0, comm: swapper Not tainted (2.6.32-rc3 #4) Imperial
[  114.827501] EIP: 0060:[<c0123295>] EFLAGS: 00000246 CPU: 0
[  114.827512] EIP is at native_safe_halt+0x5/0x10
[  114.827515] EAX: c0740000 EBX: 00000000 ECX: ffff4b6e EDX: 00000000
[  114.827519] ESI: c07992c0 EDI: c0743000 EBP: c0741fa0 ESP: c0741fa0
[  114.827522]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  114.827525] CR0: 8005003b CR2: 09278fc4 CR3: 04b56000 CR4: 00000690
[  114.827529] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  114.827532] DR6: ffff0ff0 DR7: 00000400
[  114.827535] Call Trace:
[  114.827546]  [<c01098b5>] default_idle+0x65/0x90
[  114.827550]  [<c0102062>] cpu_idle+0x52/0x90
[  114.827558]  [<c056cc23>] rest_init+0x53/0x60
[  114.827565]  [<c079c93d>] start_kernel+0x328/0x390
[  114.827569]  [<c079c3ce>] ? unknown_bootoption+0x0/0x1f6
[  114.827574]  [<c079c07e>] i386_start_kernel+0x7e/0xa8
[  136.570632] device eth0 left promiscuous mode


^ permalink raw reply

* Re: [PATCH] net: add support for STMicroelectronics Ethernet controllers.
From: Eric Dumazet @ 2009-10-14  4:07 UTC (permalink / raw)
  To: Giuseppe CAVALLARO; +Cc: netdev
In-Reply-To: <4AD47425.7050003@st.com>

Giuseppe CAVALLARO a écrit :
> Hi Eric,
> 
> Giuseppe CAVALLARO wrote:
>> I'm going to post a new patch for the stmmac as soon as I fix all these
>> points.
> 
> Hoping to have well followed all your advice, I'm attaching the
> patch again.
> 
> As you will see, I've reviewed the tx clean process by removing the
> wrong check that improperly impacted the poll logic.
> I've noticed an improvement in terms of performances as well (*).
> Taking as example the tg3 driver, I've reviewed and modified the
> locking mechanism (fixing the issue on tso you had raised).
> 
> Please, let me know if you have other suggestions and/or advice.
> 
> Regards,
> Peppe
> 
> 
> P.S. (*)
> I've just done some stress tests on our STB (mb618 - STi7111 SH4-300
> @450MHz) and I cannot see any failures or strange issues at this time.
> 

Hi Giuseppe

I reviewed your code and found no obvious issues

Could you please avoid defining this

#define STMMAC_IP_ALIGN NET_IP_ALIGN

(I see tg3.c uses a similar TG3_RAW_IP_ALIGN, which is probably why you felt it was necessary.
In the case of tg3, TG3_RAW_IP_ALIGN is used where we always want to align the
IP/network header on dword boundaries, even on platforms where NET_IP_ALIGN is 0.)

I am not sure this is what you want for STM.

If it is not, please check the new netdev_alloc_skb_ip_align() helper:

+		skb = netdev_alloc_skb(dev, bfsize);
+		if (unlikely(skb == NULL)) {
+			pr_err("%s: Rx init fails; skb is NULL\n",
+			       __func__);
+			break;
+		}
+		skb_reserve(skb, STMMAC_IP_ALIGN);
+
+		priv->rx_skbuff[i] = skb;
+		priv->rx_skbuff_dma[i] = dma_map_single(priv->device,
+						skb->data,
+						bfsize - STMMAC_IP_ALIGN,
+						DMA_FROM_DEVICE);

becomes  (pktsize being real packet size, not packet size + NET_IP_ALIGN)

+		skb = netdev_alloc_skb_ip_align(dev, pktsize);
+		if (unlikely(skb == NULL)) {
+			pr_err("%s: Rx init fails; skb is NULL\n",
+			       __func__);
+			break;
+		}
+
+		priv->rx_skbuff[i] = skb;
+		priv->rx_skbuff_dma[i] = dma_map_single(priv->device,
+						skb->data,
+						pktsize,
+						DMA_FROM_DEVICE);
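
The reason for reserving NET_IP_ALIGN bytes at all is simple arithmetic, sketched below in user space. The function names are invented for illustration, and the value 2 used in the note is the usual default for NET_IP_ALIGN, taken here as an assumption.

```c
#define ETH_HEADER_LEN 14	/* Ethernet header: 6 + 6 + 2 bytes */

/* Offset of the IP header from the start of the buffer when `reserve`
 * bytes of headroom are skipped before the Ethernet frame. */
unsigned int ip_header_offset(unsigned int reserve)
{
	return reserve + ETH_HEADER_LEN;
}

/* Nonzero when the IP header lands on a 4-byte boundary, assuming the
 * buffer itself starts on one. */
int ip_header_aligned(unsigned int reserve)
{
	return ip_header_offset(reserve) % 4 == 0;
}
```

With a reserve of 2 the IP header starts at offset 16 and is dword aligned; with no reserve it starts at offset 14 and is not.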



^ permalink raw reply

* Re: [Bugme-new] [Bug 14330] New: Kernel
From: David Miller @ 2009-10-14  3:43 UTC (permalink / raw)
  To: akpm; +Cc: netdev, bugzilla-daemon, bugme-daemon, hannes
In-Reply-To: <20091013194806.e1403ca9.akpm@linux-foundation.org>

From: Andrew Morton <akpm@linux-foundation.org>
Date: Tue, 13 Oct 2009 19:48:06 -0700

> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).

Fixed by:

commit a21090cff296eb82b6de09304d64de466bdcaefc
Author: Stephen Hemminger <shemminger@vyatta.com>
Date:   Wed Oct 7 03:18:17 2009 -0700

    ipv4: arp_notify address list bug
    
    This fixes a bug with arp_notify.
    
    If arp_notify is enabled, kernel will crash if address is changed
    and no IP address is assigned.
      http://bugzilla.kernel.org/show_bug.cgi?id=14330
    
    Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>


^ permalink raw reply

* Re: [PATCH 2/8] bitmap: Introduce bitmap_set, bitmap_clear, bitmap_find_next_zero_area
From: Akinobu Mita @ 2009-10-14  3:39 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Fenghua Yu, x86, linux-ia64, Thomas Gleixner, David S. Miller,
	netdev, Greg Kroah-Hartman, linux-kernel, linux-altix,
	Yevgeny Petrilin, FUJITA Tomonori, linuxppc-dev, Tony Luck,
	Paul Mackerras, H. Peter Anvin, sparclinux, Andrew Morton,
	linux-usb, Ingo Molnar, Lothar Wassmann
In-Reply-To: <1255470887.21871.2.camel@concordia>

On Wed, Oct 14, 2009 at 08:54:47AM +1100, Michael Ellerman wrote:
> On Tue, 2009-10-13 at 18:10 +0900, Akinobu Mita wrote:
> > My user space testing exposed off-by-one error find_next_zero_area
> > in iommu-helper.
> 
> Why not merge those tests into the kernel as a configurable boot-time
> self-test?

Here is the test program that I used. It obviously needs better
diagnostic messages and cleanup before it could be added to the kernel tests.

#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>

#if 1 /* Copy and paste from kernel source */

#define BITS_PER_BYTE  8
#define BITS_PER_LONG (sizeof(long) * BITS_PER_BYTE)

#define BIT_WORD(nr)	((nr) / BITS_PER_LONG)
#define BITOP_WORD(nr)	((nr) / BITS_PER_LONG)

#define BITMAP_LAST_WORD_MASK(nbits)                                    \
(                                                                       \
        ((nbits) % BITS_PER_LONG) ?                                     \
                (1UL<<((nbits) % BITS_PER_LONG))-1 : ~0UL               \
)

#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) % BITS_PER_LONG))

void bitmap_set(unsigned long *map, int start, int nr)
{
	unsigned long *p = map + BIT_WORD(start);
	const int size = start + nr;
	int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
	unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);

	while (nr - bits_to_set >= 0) {
		*p |= mask_to_set;
		nr -= bits_to_set;
		bits_to_set = BITS_PER_LONG;
		mask_to_set = ~0UL;
		p++;
	}
	if (nr) {
		mask_to_set &= BITMAP_LAST_WORD_MASK(size);
		*p |= mask_to_set;
	}
}

void bitmap_clear(unsigned long *map, int start, int nr)
{
	unsigned long *p = map + BIT_WORD(start);
	const int size = start + nr;
	int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
	unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);

	while (nr - bits_to_clear >= 0) {
		*p &= ~mask_to_clear;
		nr -= bits_to_clear;
		bits_to_clear = BITS_PER_LONG;
		mask_to_clear = ~0UL;
		p++;
	}
	if (nr) {
		mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
		*p &= ~mask_to_clear;
	}
}

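static unsigned long __ffs(unsigned long word)
{
	int num = 0;

	/* Needed when long is 64 bits wide; the split shift keeps the
	 * expression well defined when long is only 32 bits. */
	if (sizeof(word) > 4 && (word & 0xffffffffUL) == 0) {
		num += 32;
		word = (word >> 16) >> 16;
	}
	if ((word & 0xffff) == 0) {
		num += 16;
		word >>= 16;
	}
	if ((word & 0xff) == 0) {
		num += 8;
		word >>= 8;
	}
	if ((word & 0xf) == 0) {
		num += 4;
		word >>= 4;
	}
	if ((word & 0x3) == 0) {
		num += 2;
		word >>= 2;
	}
	if ((word & 0x1) == 0)
		num += 1;
	return num;
}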

unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
			    unsigned long offset)
{
	const unsigned long *p = addr + BITOP_WORD(offset);
	unsigned long result = offset & ~(BITS_PER_LONG-1);
	unsigned long tmp;

	if (offset >= size)
		return size;
	size -= result;
	offset %= BITS_PER_LONG;
	if (offset) {
		tmp = *(p++);
		tmp &= (~0UL << offset);
		if (size < BITS_PER_LONG)
			goto found_first;
		if (tmp)
			goto found_middle;
		size -= BITS_PER_LONG;
		result += BITS_PER_LONG;
	}
	while (size & ~(BITS_PER_LONG-1)) {
		if ((tmp = *(p++)))
			goto found_middle;
		result += BITS_PER_LONG;
		size -= BITS_PER_LONG;
	}
	if (!size)
		return result;
	tmp = *p;

found_first:
	tmp &= (~0UL >> (BITS_PER_LONG - size));
	if (tmp == 0UL)		/* Are any bits set? */
		return result + size;	/* Nope. */
found_middle:
	return result + __ffs(tmp);
}

#define ffz(x)  __ffs(~(x))

unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
				 unsigned long offset)
{
	const unsigned long *p = addr + BITOP_WORD(offset);
	unsigned long result = offset & ~(BITS_PER_LONG-1);
	unsigned long tmp;

	if (offset >= size)
		return size;
	size -= result;
	offset %= BITS_PER_LONG;
	if (offset) {
		tmp = *(p++);
		tmp |= ~0UL >> (BITS_PER_LONG - offset);
		if (size < BITS_PER_LONG)
			goto found_first;
		if (~tmp)
			goto found_middle;
		size -= BITS_PER_LONG;
		result += BITS_PER_LONG;
	}
	while (size & ~(BITS_PER_LONG-1)) {
		if (~(tmp = *(p++)))
			goto found_middle;
		result += BITS_PER_LONG;
		size -= BITS_PER_LONG;
	}
	if (!size)
		return result;
	tmp = *p;

found_first:
	tmp |= ~0UL << size;
	if (tmp == ~0UL)	/* Are any bits zero? */
		return result + size;	/* Nope. */
found_middle:
	return result + ffz(tmp);
}

#define __ALIGN_MASK(x,mask) (((x)+(mask))&~(mask))

static inline int test_bit(int nr, const volatile unsigned long *addr)
{
	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
}

unsigned long bitmap_find_next_zero_area(unsigned long *map,
					 unsigned long size,
					 unsigned long start,
					 unsigned int nr,
					 unsigned long align_mask)
{
	unsigned long index, end, i;
again:
	index = find_next_zero_bit(map, size, start);

	/* Align allocation */
	index = __ALIGN_MASK(index, align_mask);

	end = index + nr;
#ifdef ORIGINAL
	if (end >= size)
#else
	if (end > size)
#endif
		return end;

#ifdef ORIGINAL
	for (i = index; i < end; i++) {
		if (test_bit(i, map)) {
			start = i+1;
			goto again;
		}
	}
#else
	i = find_next_bit(map, end, index);
	if (i < end) {
		start = i + 1;
		goto again;
	}
#endif
	return index;
}

#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
#define BITS_TO_LONGS(nr)       DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
#define DECLARE_BITMAP(name,bits) unsigned long name[BITS_TO_LONGS(bits)]

#endif /* Copy and paste from kernel source */

static DECLARE_BITMAP(bitmap, 1000);
static DECLARE_BITMAP(empty, 1000);
static DECLARE_BITMAP(full, 1000);

static void bitmap_dump(unsigned long *bitmap, int size)
{
	int i;

	for (i = 0; i < size; i++) {
		if (test_bit(i, bitmap))
			printf("1 ");
		else
			printf("0 ");
		if (i % 10 == 9)
			printf("\n");
	}
	printf("\n");
}

static int test1(int size)
{

	int start = random() % size;
	int nr = random() % (size - start);

	memset(bitmap, 0x00, BITS_TO_LONGS(size) * sizeof(unsigned long));

	bitmap_set(bitmap, start, nr);
	bitmap_clear(bitmap, start, nr);

	if (memcmp(empty, bitmap, BITS_TO_LONGS(size) * sizeof(unsigned long)))
		goto error;

	return 0;
error:
	bitmap_dump(bitmap, size);
	return 1;
}

int test2(int size)
{
	int start = random() % size;
	int nr = random() % (size - start);

	memset(bitmap, 0xff, BITS_TO_LONGS(size) * sizeof(unsigned long));

	bitmap_clear(bitmap, start, nr);
	bitmap_set(bitmap, start, nr);

	if (memcmp(full, bitmap, BITS_TO_LONGS(size) * sizeof(unsigned long)))
		goto error;

	return 0;
error:
	bitmap_dump(bitmap, size);
	return 1;
}

int test3(int size)
{
	int start = random() % size;
	int nr = random() % (size - start);
	unsigned long offset;

	memset(bitmap, 0x00, BITS_TO_LONGS(size) * sizeof(unsigned long));
	bitmap_set(bitmap, start, nr);
	if (start) {
		offset = bitmap_find_next_zero_area(bitmap, size, 0, start, 0);
		if (offset != 0) {
			printf("start %d nr %d\n", start, nr);
			printf("offset %lu != 0\n", offset);
			goto error;
		}
	}
	offset = bitmap_find_next_zero_area(bitmap, size, start,
						size - (start + nr), 0);
	if (offset != start + nr) {
		printf("start %d nr %d\n", start, nr);
		printf("offset %lu != start + nr %d\n", offset, start + nr);
		goto error;
	}

	return 0;
error:
	bitmap_dump(bitmap, size);

	return 1;
}

int test4(int size)
{
	int start = random() % size;
	int nr = random() % (size - start);
	unsigned long offset;

	memset(bitmap, 0xff, BITS_TO_LONGS(size) * sizeof(unsigned long));
	bitmap_clear(bitmap, start, nr);
	offset = bitmap_find_next_zero_area(bitmap, size, start, nr, 0);
	if (nr != 0) {
		if (offset != start) {
			printf("start %d nr %d\n", start, nr);
			printf("offset %lu != start %d\n", offset, start);
			goto error;
		}
	}
	return 0;
error:
	bitmap_dump(bitmap, size);

	return 1;
}

int main(int argc, char *argv[])
{
	int err = 0;

	srandom(time(NULL));

	memset(empty, 0x00, sizeof(empty));
	memset(full, 0xff, sizeof(full));

	while (!err) {
		err |= test1(1000);
		err |= test2(1000);
		err |= test3(1000);
		err |= test4(1000);
	}
	return err;
}
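
The off-by-one that the program exposes can also be shown in isolation. The two helpers below are hypothetical names that wrap only the bounds check from bitmap_find_next_zero_area(), in its ORIGINAL and corrected forms.

```c
/* Bounds check as inherited from iommu-helper: it rejects an area
 * whose end coincides exactly with the end of the bitmap. */
int area_fits_original(unsigned long index, unsigned long nr,
		       unsigned long size)
{
	return !(index + nr >= size);
}

/* Corrected check: bits index .. index + nr - 1 are all below size. */
int area_fits_fixed(unsigned long index, unsigned long nr,
		    unsigned long size)
{
	return !(index + nr > size);
}
```

For an empty 8-bit map, a request for 8 bits at index 0 is refused by the original check and accepted by the corrected one.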

^ permalink raw reply

* [PATCH -mmotm] Fix bitmap-introduce-bitmap_set-bitmap_clear-bitmap_find_next_zero_area.patch
From: Akinobu Mita @ 2009-10-14  3:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Fenghua Yu, Greg Kroah-Hartman, linux-ia64, Tony Luck, x86,
	netdev, linux-kernel, linux-altix, Yevgeny Petrilin,
	FUJITA Tomonori, linuxppc-dev, Ingo Molnar, Paul Mackerras,
	H. Peter Anvin, sparclinux, Thomas Gleixner, linux-usb,
	David S. Miller, Lothar Wassmann
In-Reply-To: <20091013091017.GA18431@localhost.localdomain>

Update PATCH 2/8 based on review comments by Andrew and bugfix
exposed by user space testing.

I didn't change argument of align_mask at this time because it
turned out that it needs more changes in iommu-helper users.

From: Akinobu Mita <akinobu.mita@gmail.com>
Subject: Fix bitmap-introduce-bitmap_set-bitmap_clear-bitmap_find_next_zero_area.patch

- Rewrite bitmap_set and bitmap_clear

  Set or clear whole words at a time instead of touching each bit
  individually.

- Fix off-by-one error in bitmap_find_next_zero_area

  This bug was derived from find_next_zero_area in iommu-helper.

- Add kerneldoc for bitmap_find_next_zero_area

This patch is supposed to be folded into
bitmap-introduce-bitmap_set-bitmap_clear-bitmap_find_next_zero_area.patch

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
 lib/bitmap.c |   60 +++++++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index 2415da4..84292c9 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -271,28 +271,62 @@ int __bitmap_weight(const unsigned long *bitmap, int bits)
 }
 EXPORT_SYMBOL(__bitmap_weight);
 
-void bitmap_set(unsigned long *map, int i, int len)
-{
-	int end = i + len;
+#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) % BITS_PER_LONG))
 
-	while (i < end) {
-		__set_bit(i, map);
-		i++;
+void bitmap_set(unsigned long *map, int start, int nr)
+{
+	unsigned long *p = map + BIT_WORD(start);
+	const int size = start + nr;
+	int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
+	unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
+
+	while (nr - bits_to_set >= 0) {
+		*p |= mask_to_set;
+		nr -= bits_to_set;
+		bits_to_set = BITS_PER_LONG;
+		mask_to_set = ~0UL;
+		p++;
+	}
+	if (nr) {
+		mask_to_set &= BITMAP_LAST_WORD_MASK(size);
+		*p |= mask_to_set;
 	}
 }
 EXPORT_SYMBOL(bitmap_set);
 
 void bitmap_clear(unsigned long *map, int start, int nr)
 {
-	int end = start + nr;
-
-	while (start < end) {
-		__clear_bit(start, map);
-		start++;
+	unsigned long *p = map + BIT_WORD(start);
+	const int size = start + nr;
+	int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
+	unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
+
+	while (nr - bits_to_clear >= 0) {
+		*p &= ~mask_to_clear;
+		nr -= bits_to_clear;
+		bits_to_clear = BITS_PER_LONG;
+		mask_to_clear = ~0UL;
+		p++;
+	}
+	if (nr) {
+		mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
+		*p &= ~mask_to_clear;
 	}
 }
 EXPORT_SYMBOL(bitmap_clear);
 
+/*
+ * bitmap_find_next_zero_area - find a contiguous aligned zero area
+ * @map: The address to base the search on
+ * @size: The bitmap size in bits
+ * @start: The bitnumber to start searching at
+ * @nr: The number of zeroed bits we're looking for
+ * @align_mask: Alignment mask for zero area
+ *
+ * The @align_mask should be one less than a power of 2; the effect is that
+ * the bit offset of all zero areas this function finds is multiples of that
+ * power of 2. A @align_mask of 0 means no alignment is required.
+ */
 unsigned long bitmap_find_next_zero_area(unsigned long *map,
 					 unsigned long size,
 					 unsigned long start,
@@ -304,10 +338,10 @@ again:
 	index = find_next_zero_bit(map, size, start);
 
 	/* Align allocation */
-	index = (index + align_mask) & ~align_mask;
+	index = __ALIGN_MASK(index, align_mask);
 
 	end = index + nr;
-	if (end >= size)
+	if (end > size)
 		return end;
 	i = find_next_bit(map, end, index);
 	if (i < end) {
-- 
1.5.4.3
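
Two pieces of arithmetic in the patch above are easy to check in user space: the first/last word masks used by bitmap_set() and bitmap_clear(), and the alignment step. This is only a sketch; the function names are invented, and the bodies mirror BITMAP_FIRST_WORD_MASK, BITMAP_LAST_WORD_MASK and __ALIGN_MASK.

```c
#include <limits.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Mirrors BITMAP_FIRST_WORD_MASK: bits start .. top of the word. */
unsigned long first_word_mask(unsigned int start)
{
	return ~0UL << (start % WORD_BITS);
}

/* Mirrors BITMAP_LAST_WORD_MASK: bits 0 .. nbits - 1 of the last word. */
unsigned long last_word_mask(unsigned int nbits)
{
	return (nbits % WORD_BITS) ? (1UL << (nbits % WORD_BITS)) - 1 : ~0UL;
}

/* Hypothetical helper: mask for bits start .. start + nr - 1, valid
 * only when nr > 0 and the whole range lies inside one word. */
unsigned long range_mask(unsigned int start, unsigned int nr)
{
	return first_word_mask(start) & last_word_mask(start + nr);
}

/* Mirrors __ALIGN_MASK: round index up to a multiple of align_mask + 1,
 * where align_mask is one less than a power of 2. */
unsigned long align_up(unsigned long index, unsigned long align_mask)
{
	return (index + align_mask) & ~align_mask;
}
```

For example, setting 4 bits starting at bit 3 touches the mask 0x78, and an index of 5 with an align_mask of 3 is rounded up to 8.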

^ permalink raw reply related

* Re: [Bugme-new] [Bug 14330] New: Kernel
From: Andrew Morton @ 2009-10-14  2:48 UTC (permalink / raw)
  To: netdev; +Cc: bugzilla-daemon, bugme-daemon, hannes
In-Reply-To: <bug-14330-10286@http.bugzilla.kernel.org/>


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 6 Oct 2009 00:16:31 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=14330
> 
>                URL: NULL pointer dereference with arp_notify and no ip
>                     address
>            Summary: Kernel
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.30.8-64.fc11.x86_64
>           Platform: All
>         OS/Version: Linux
>               Tree: Fedora
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: hannes@stressinduktion.org
>         Regression: No
> 
> 
> A NULL pointer dereference happens if arp_notify is enabled, no ip address is
> set
> and the mac address changes on an interface.
> 
> Steps to reproduce:
> sysctl -w net.ipv4.conf.all.arp_notify=1
> ip a f dev eth0
> ip l s eth0 a 02:00:00:00:00:00
> 
> It seems that in function inetdev_event (net/ipv4/devinet.c:1081) the
> dereference of in_dev->ifa_list->ifa_address blows up because in_dev->ifa_list
> is NULL.
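
The failure mode described in the report can be sketched in user space. The structures below are hypothetical stand-ins, not the kernel's definitions, and the guard is only an illustration of testing the list head before dereferencing it, not a copy of the actual fix.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's address list structures. */
struct ifaddr_sketch {
	unsigned int address;
	struct ifaddr_sketch *next;
};

struct in_device_sketch {
	struct ifaddr_sketch *ifa_list;	/* NULL when no address is set */
};

/* Returns 1 and stores the first address if one exists, 0 otherwise.
 * Without the NULL test, the dereference below would fault when the
 * interface has no address, which is the situation in the report. */
int first_address(const struct in_device_sketch *in_dev, unsigned int *out)
{
	if (!in_dev->ifa_list)
		return 0;
	*out = in_dev->ifa_list->address;
	return 1;
}
```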

^ permalink raw reply

* Re: [PATCH] net/hamradio: fix test in receive()
From: Andrew Morton @ 2009-10-14  2:10 UTC (permalink / raw)
  To: Roel Kluin; +Cc: netdev, davem, Thomas Sailer
In-Reply-To: <4ACB27E8.8060401@gmail.com>

On Tue, 06 Oct 2009 13:20:08 +0200 Roel Kluin <roel.kluin@gmail.com> wrote:

> The negation makes it a bool before the comparison and hence it
> will never evaluate to true.
> 
> Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
> ---
> Was this intended?
> 
> diff --git a/drivers/net/hamradio/baycom_epp.c b/drivers/net/hamradio/baycom_epp.c
> index 7bcaf7c..ee06a13 100644
> --- a/drivers/net/hamradio/baycom_epp.c
> +++ b/drivers/net/hamradio/baycom_epp.c
> @@ -596,7 +596,8 @@ static int receive(struct net_device *dev, int cnt)
>  						state = 0;
>  
>  					/* not flag received */
> -					else if (!(bitstream & (0x1fe << j)) != (0x0fc << j)) {
> +					else if ((bitstream & (0x1fe << j)) !=
> +							(0x0fc << j)) {
>  						if (state)
>  							do_rxpacket(dev);
>  						bc->hdlcrx.bufcnt = 0;

It does look like that is what was intended.

With this fix we're turning on a code path which probably hasn't been
executed by anyone in a decade or so.  I wonder what it'll do?
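
The precedence issue in the quoted hunk can be reproduced in user space. In the sketch below the function names are invented; the first body is the old expression, parenthesised the way the compiler parses it, and the second is the expression from the patch.

```c
/* The test as originally written: `!` binds tighter than `!=`, so the
 * masked value is reduced to 0 or 1 before the comparison. */
int not_flag_original(unsigned int bitstream, int j)
{
	return (!(bitstream & (0x1feU << j))) != (0x0fcU << j);
}

/* The test as rewritten in the patch: compares the masked bits
 * themselves against the shifted pattern. */
int not_flag_patched(unsigned int bitstream, int j)
{
	return (bitstream & (0x1feU << j)) != (0x0fcU << j);
}
```

Because the left operand of the original form is only ever 0 or 1, its result no longer depends on the received bits, whereas the patched form distinguishes the 0x0fc pattern from everything else.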


^ permalink raw reply

* Re: [PATCH] connector: Fix regression introduced by sid connector
From: David Rientjes @ 2009-10-14  0:53 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Andrew Morton, Oleg Nesterov, scott, zbr, linux-kernel,
	Matt Helsley, David S. Miller, netdev
In-Reply-To: <200910020816.41323.borntraeger@de.ibm.com>

On Fri, 2 Oct 2009, Christian Borntraeger wrote:

> since commit 02b51df1b07b4e9ca823c89284e704cadb323cd1 (proc connector: add event 
> for process becoming session leader) we have the following warning:
> Badness at kernel/softirq.c:143
> [...]
> Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0)
> [...]
> Call Trace:
> ([<000000013fe04100>] 0x13fe04100)
>  [<000000000048a946>] sk_filter+0x9a/0xd0
>  [<000000000049d938>] netlink_broadcast+0x2c0/0x53c
>  [<00000000003ba9ae>] cn_netlink_send+0x272/0x2b0
>  [<00000000003baef0>] proc_sid_connector+0xc4/0xd4
>  [<0000000000142604>] __set_special_pids+0x58/0x90
>  [<0000000000159938>] sys_setsid+0xb4/0xd8
>  [<00000000001187fe>] sysc_noemu+0x10/0x16
>  [<00000041616cb266>] 0x41616cb266
> 
> The warning is
> --->    WARN_ON_ONCE(in_irq() || irqs_disabled());
> 
> The network code must not be called with disabled interrupts but
> sys_setsid holds the tasklist_lock with spinlock_irq while calling
> the connector. 
> After a discussion we agreed that we can move proc_sid_connector
> from __set_special_pids to sys_setsid.
> We also agreed that it is sufficient to change the check from
> task_session(curr) != pid into err > 0, since if we don't change the
> session, this means we were already the leader and return -EPERM.
> 
> One last thing:
> There is also daemonize(), and some people might want to get a
> notification in that case. Since daemonize() is only needed if a user
> space does kernel_thread this does not look important (and there seems
> to be no consensus if this connector should be called in daemonize). If
> we really want this, we can add proc_sid_connector to daemonize() in an
> additional patch (Scott?)
> 
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> CCed: Scott James Remnant <scott@ubuntu.com>
> CCed: Matt Helsley <matthltc@us.ibm.com>
> CCed: David S. Miller <davem@davemloft.net>
> Acked-by: Oleg Nesterov <oleg@redhat.com>
> Acked-by: Evgeniy Polyakov <zbr@ioremap.net>

Acked-by: David Rientjes <rientjes@google.com>

I was getting the same softirq warnings, and later a warning in slub, in
slab_alloc() at might_sleep_if(gfpflags & __GFP_WAIT), reached from
__alloc_skb(..., GFP_KERNEL) in this context, which would spam my log
endlessly.


* Re: [PATCH 2/3] sky2: Reading registers in reset causes a hang
From: Mike McCormack @ 2009-10-13 23:08 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20091013090038.6ec27475@nehalam>

Stephen Hemminger wrote:
> On Tue, 13 Oct 2009 07:36:45 +0900
> Mike McCormack <mikem@ring3k.org> wrote:
> 
>> 2009/10/13 Stephen Hemminger <shemminger@linux-foundation.org>
>>> On Mon, 12 Oct 2009 23:06:37 +0900
>>> Mike McCormack <mikem@ring3k.org> wrote:
>>>
>>>> When sky2 hardware is in reset, reading registers with ethtool -d
>>>> causes a hard system hang. eg.
>>>>
>>>>     ifconfig eth1 down
>>>>     ethtool -d eth1
>>>>
>>>> Avoid reading FIFOs, descriptor and status unit, etc. after we've
>>>>  brought the interface down, as these seem to cause the issue.
>>>>
>>>> Assume the same is true for the second port, as my card only has
>>>>  one port.
>>> I don't see this on my cards. Let me investigate further before
>>> committing this. Also, the debugfs interface would also be screwed
>>> if the registers were unavailable.
>> I forgot to include one other piece of information... I'm running a
>> ping -f at the sky2 interface on a remote machine. I'll check debugfs
>> tonight.
>>
> 
> You are flooding a down interface??

I'm flooding the interface from before it goes down. (Another machine ran
ping -f 192.168.0.100 continuously while I was testing this...)

/sys/kernel/debug/sky2/eth1 no longer exists after ifconfig eth1 down,
so debugfs cannot cause this problem on my machine.

It may be possible to fix this with extra cleanup in sky2_down, however I'm
not really sure which part of the chip needs more resetting...

thanks,

Mike


* Re: [PATCH] [NIU] VLAN does not work with niu driver
From: David Miller @ 2009-10-13 22:12 UTC (permalink / raw)
  To: Joyce.Yu; +Cc: netdev
In-Reply-To: <4AD4F0BF.1040606@Sun.COM>

From: Joyce Yu <Joyce.Yu@Sun.COM>
Date: Tue, 13 Oct 2009 14:27:27 -0700

> @@ -3545,7 +3546,15 @@ static int niu_process_rx_pkt(struct napi_struct *napi, struct niu *np,
>  	rp->rcr_index = index;
>  
>  	skb_reserve(skb, NET_IP_ALIGN);
> -	__pskb_pull_tail(skb, min(len, NIU_RXPULL_MAX));
> +	__pskb_pull_tail(skb, min(len, VLAN_ETH_HLEN));
> +
> +	veth = (struct vlan_ethhdr *)skb->data;
> +	if (veth->h_vlan_proto != __constant_htons(ETH_P_8021Q)) {
> +		skb->tail -= 4;
> +		skb->data_len += 4;
> +		skb_shinfo(skb)->frags[0].page_offset -= 4;
> +		skb_shinfo(skb)->frags[0].size += 4;
> +	}

So you guys spent an enormous amount of time trying to figure out
why just a plain '__pskb_pull_tail(skb, min(len, VLAN_ETH_HLEN));'
doesn't work.

Don't you think the full details of your discovery might be relevant
either in the commit message or a comment?

Otherwise, how is anyone else in the world going to figure out why you
had to do these frag list adjustments?
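
For context, the condition in the quoted hunk compares h_vlan_proto with
ETH_P_8021Q. A rough userspace illustration of just that comparison follows.
It does not cover the frag adjustments, whose rationale the patch does not
give. It assumes a raw Ethernet frame in a byte buffer, and the helper name
is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN    14      /* dst MAC + src MAC + EtherType */
#define SKETCH_ETH_P_8021Q 0x8100  /* 802.1Q tag protocol identifier */

/*
 * Hypothetical helper: true if the two bytes after the MAC addresses
 * carry the 802.1Q TPID (stored big-endian on the wire).
 */
bool frame_has_vlan_tag(const uint8_t *frame, size_t len)
{
	uint16_t proto;

	if (len < SKETCH_ETH_HLEN)
		return false;

	proto = (uint16_t)((frame[12] << 8) | frame[13]);
	return proto == SKETCH_ETH_P_8021Q;
}
```

A frame with bytes 0x81 0x00 at offsets 12 and 13 is reported as tagged; an
IPv4 frame (0x08 0x00) is not.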


* Re: PATCH: Network Device Naming mechanism and policy
From: dann frazier @ 2009-10-13 22:08 UTC (permalink / raw)
  To: Greg KH
  Cc: Jordan_Hargrave, Narendra_K, netdev, linux-hotplug, Matt_Domsch,
	Charles_Rose
In-Reply-To: <20091013201931.GA4622@kroah.com>

On Tue, Oct 13, 2009 at 01:19:31PM -0700, Greg KH wrote:
> 
> A: No.
> Q: Should I include quotations after my reply?
> 
> http://daringfireball.net/2007/07/on_top
> 
> On Tue, Oct 13, 2009 at 03:00:59PM -0500, Jordan_Hargrave@Dell.com wrote:
> > We have developed a mapping library that will convert the
> > user-friendly symlink names to the kernel names necessary for socket
> > ioctls.  All network tools that normally take ethX as argument have
> > been modified to use this mapping library.  Usually it's just a
> > one-line addition when parsing the command line arguments.
> 
> Either I missed this in the first message in this thread, or this was
> never stated before, but that is nice.  Where is this library,

I read about it here:
  http://linux.dell.com/wiki/index.php/Oss/libnetdevname#libnetdevname

Source appears to be here:
  http://linux.dell.com/git/?p=libnetdevname.git;a=summary

> and will
> it be accepted by the upstream tool maintainers?

-- 
dann frazier



* Re: PATCH: Network Device Naming mechanism and policy
From: Matt Domsch @ 2009-10-13 22:05 UTC (permalink / raw)
  To: Greg KH
  Cc: Jordan_Hargrave, Narendra_K, dannf, netdev, linux-hotplug,
	Charles_Rose
In-Reply-To: <20091013201931.GA4622@kroah.com>

On Tue, Oct 13, 2009 at 01:19:31PM -0700, Greg KH wrote:
> On Tue, Oct 13, 2009 at 03:00:59PM -0500, Jordan_Hargrave@Dell.com wrote:
> > We have developed a mapping library that will convert the
> > user-friendly symlink names to the kernel names necessary for socket
> > ioctls.  All network tools that normally take ethX as argument have
> > been modified to use this mapping library.  Usually it's just a
> > one-line addition when parsing the command line arguments.
> 
> Either I missed this in the first message in this thread, or this was
> never stated before, but that is nice.  Where is this library,

It was not noted in the initial patch post, but I did note it
immediately thereafter.

 Let me also note that we are prepared to have userspace consumers of
 this new character device node.

 http://linux.dell.com/wiki/index.php/Oss/libnetdevname

 notes how the kernel patch will interact with udev, describes the new
 library helper function in libnetdevname, and has patches for
 net-tools, iproute2, and ethtool to make use of the helper function.

 As has been noted here, MAC addresses are not necessarily unique to an
 interface.  As such, we are not proposing a net/by-mac/* symlink to
 /dev/netdev/*.

> and will it be accepted by the upstream tool maintainers?

Unknown, we haven't proposed it to any yet as it's irrelevant until
there is general acceptance of the approach (kernel or otherwise).  I
figured we'd start with the kernel discussion, and show how it could
be used.

-- 
Matt Domsch
Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux


* [RFC net-next-2.6] udp: Dont use lock_sock()/release_sock() in rx path
From: Eric Dumazet @ 2009-10-13 21:59 UTC (permalink / raw)
  To: David S. Miller; +Cc: Christoph Lameter, Vegard Nossum, Linux Netdev List
In-Reply-To: <4ACFA012.6010409@gmail.com>

Eric Dumazet wrote:
> Sure, will do, but first I want to suppress the lock_sock()/release_sock() in
> rx path, that was added for sk_forward_alloc thing. This really hurts,
> because of the backlog handling.
> 
> I have preliminary patch that restore UDP latencies we had in the past ;)
> 
> Trick is for UDP, sk_forward_alloc is not updated by tx/rx, only rx.
> So we can use the sk_receive_queue.lock to forbid concurrent updates.
> 
> As this lock is already hot and only used by rx, we won't have to
> dirty the sk_lock, that will only be used by tx path.
> 
> Then we can carefully reorder struct sock to lower number of cache lines
> needed for each path.
> 

[RFC net-next-2.6] udp: Dont use lock_sock()/release_sock() in rx path

Two years ago we added memory accounting for UDP, and some people complain
about increased latencies, especially on multicast.

Because sk_forward_alloc is not atomic, we duplicated the protection we used for TCP,
ie use lock_sock()/release_sock() to guard sk_forward_alloc against concurrent updates.

When a frame is received by the NIC, the softirq handler has to lock the socket, and may
have to queue the frame into the backlog instead of the receive queue.

Then, the user application also has to lock the socket to dequeue a frame from the receive queue
and possibly process the backlog queue, leading to high latencies.

This lock is also used in tx path to guard cork structure against concurrent updates.

This causes false sharing of the socket lock between several cpus that could be avoided,
considering UDP touches sk_forward_alloc only on rx. (TCP uses it for both rx and tx)

Instead of using the socket lock, we can use the sk_receive_queue.lock that we have to
take anyway to queue a frame at softirq time (or dequeue it in the user application)

This avoids two atomic ops per packet in the softirq handler, one in the user app doing recvmsg(),
and we don't touch the backlog anymore.

This way, the socket lock is only used in tx path, as in the past, and we can improve
things by reordering struct sock fields into separate rx/tx groups, in a followup patch.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 core/sock.c |    6 +++++-
 ipv4/udp.c  |   34 ++++++++--------------------------
 ipv6/udp.c  |   36 ++++++++++--------------------------
 3 files changed, 23 insertions(+), 53 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 43ca2c9..d06b1a0 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -292,8 +292,13 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	if (err)
 		goto out;
 
+	skb_orphan(skb);
+
+	spin_lock_irqsave(&list->lock, flags);
+
 	if (!sk_rmem_schedule(sk, skb->truesize)) {
 		err = -ENOBUFS;
+		spin_unlock_irqrestore(&list->lock, flags);
 		goto out;
 	}
 
@@ -307,7 +312,6 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	 */
 	skb_len = skb->len;
 
-	spin_lock_irqsave(&list->lock, flags);
 	skb->dropcount = atomic_read(&sk->sk_drops);
 	__skb_queue_tail(list, skb);
 	spin_unlock_irqrestore(&list->lock, flags);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ee61b3f..f62cec3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -855,29 +855,21 @@ out:
  */
 static unsigned int first_packet_length(struct sock *sk)
 {
-	struct sk_buff_head list_kill, *rcvq = &sk->sk_receive_queue;
+	struct sk_buff_head *rcvq = &sk->sk_receive_queue;
 	struct sk_buff *skb;
 	unsigned int res;
 
-	__skb_queue_head_init(&list_kill);
-
 	spin_lock_bh(&rcvq->lock);
 	while ((skb = skb_peek(rcvq)) != NULL &&
 		udp_lib_checksum_complete(skb)) {
 		UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_INERRORS,
 				 IS_UDPLITE(sk));
 		__skb_unlink(skb, rcvq);
-		__skb_queue_tail(&list_kill, skb);
+		skb_kill_datagram(sk, skb, 0);
 	}
 	res = skb ? skb->len : 0;
 	spin_unlock_bh(&rcvq->lock);
 
-	if (!skb_queue_empty(&list_kill)) {
-		lock_sock(sk);
-		__skb_queue_purge(&list_kill);
-		sk_mem_reclaim_partial(sk);
-		release_sock(sk);
-	}
 	return res;
 }
 
@@ -1003,17 +995,17 @@ try_again:
 		err = ulen;
 
 out_free:
-	lock_sock(sk);
+	spin_lock_bh(&sk->sk_receive_queue.lock);
 	skb_free_datagram(sk, skb);
-	release_sock(sk);
+	spin_unlock_bh(&sk->sk_receive_queue.lock);
 out:
 	return err;
 
 csum_copy_err:
-	lock_sock(sk);
+	spin_lock_bh(&sk->sk_receive_queue.lock);
 	if (!skb_kill_datagram(sk, skb, flags))
-		UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
-	release_sock(sk);
+		UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
+	spin_unlock_bh(&sk->sk_receive_queue.lock);
 
 	if (noblock)
 		return -EAGAIN;
@@ -1095,7 +1087,6 @@ drop:
 int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct udp_sock *up = udp_sk(sk);
-	int rc;
 	int is_udplite = IS_UDPLITE(sk);
 
 	/*
@@ -1175,16 +1166,7 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 			goto drop;
 	}
 
-	rc = 0;
-
-	bh_lock_sock(sk);
-	if (!sock_owned_by_user(sk))
-		rc = __udp_queue_rcv_skb(sk, skb);
-	else
-		sk_add_backlog(sk, skb);
-	bh_unlock_sock(sk);
-
-	return rc;
+	return __udp_queue_rcv_skb(sk, skb);
 
 drop:
 	UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 1f8e2af..07468aa 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -288,23 +288,23 @@ try_again:
 		err = ulen;
 
 out_free:
-	lock_sock(sk);
+	spin_lock_bh(&sk->sk_receive_queue.lock);
 	skb_free_datagram(sk, skb);
-	release_sock(sk);
+	spin_unlock_bh(&sk->sk_receive_queue.lock);
 out:
 	return err;
 
 csum_copy_err:
-	lock_sock(sk);
+	spin_lock_bh(&sk->sk_receive_queue.lock);
 	if (!skb_kill_datagram(sk, skb, flags)) {
 		if (is_udp4)
-			UDP_INC_STATS_USER(sock_net(sk),
+			UDP_INC_STATS_BH(sock_net(sk),
 					UDP_MIB_INERRORS, is_udplite);
 		else
-			UDP6_INC_STATS_USER(sock_net(sk),
+			UDP6_INC_STATS_BH(sock_net(sk),
 					UDP_MIB_INERRORS, is_udplite);
 	}
-	release_sock(sk);
+	spin_unlock_bh(&sk->sk_receive_queue.lock);
 
 	if (flags & MSG_DONTWAIT)
 		return -EAGAIN;
@@ -468,21 +468,10 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	while ((sk2 = udp_v6_mcast_next(net, sk_nulls_next(sk2), uh->dest, daddr,
 					uh->source, saddr, dif))) {
 		struct sk_buff *buff = skb_clone(skb, GFP_ATOMIC);
-		if (buff) {
-			bh_lock_sock(sk2);
-			if (!sock_owned_by_user(sk2))
-				udpv6_queue_rcv_skb(sk2, buff);
-			else
-				sk_add_backlog(sk2, buff);
-			bh_unlock_sock(sk2);
-		}
+		if (buff)
+			udpv6_queue_rcv_skb(sk2, buff);
 	}
-	bh_lock_sock(sk);
-	if (!sock_owned_by_user(sk))
-		udpv6_queue_rcv_skb(sk, skb);
-	else
-		sk_add_backlog(sk, skb);
-	bh_unlock_sock(sk);
+	udpv6_queue_rcv_skb(sk, skb);
 out:
 	spin_unlock(&hslot->lock);
 	return 0;
@@ -597,12 +586,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 
 	/* deliver */
 
-	bh_lock_sock(sk);
-	if (!sock_owned_by_user(sk))
-		udpv6_queue_rcv_skb(sk, skb);
-	else
-		sk_add_backlog(sk, skb);
-	bh_unlock_sock(sk);
+	udpv6_queue_rcv_skb(sk, skb);
 	sock_put(sk);
 	return 0;
 


* Re: [PATCH 2/8] bitmap: Introduce bitmap_set, bitmap_clear, bitmap_find_next_zero_area
From: Michael Ellerman @ 2009-10-13 21:54 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: Fenghua Yu, x86, linux-ia64, Thomas Gleixner, David S. Miller,
	netdev, Greg Kroah-Hartman, linux-kernel, linux-altix,
	Yevgeny Petrilin, FUJITA Tomonori, linuxppc-dev, Tony Luck,
	Paul Mackerras, H. Peter Anvin, sparclinux, Andrew Morton,
	linux-usb, Ingo Molnar, Lothar Wassmann
In-Reply-To: <20091013091017.GA18431@localhost.localdomain>


On Tue, 2009-10-13 at 18:10 +0900, Akinobu Mita wrote:
> My user space testing exposed an off-by-one error in find_next_zero_area
> in iommu-helper.

Why not merge those tests into the kernel as a configurable boot-time
self-test?

cheers
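
As an illustration of the kind of check such a self-test could run, here is a
small userspace sketch. It is not the kernel's bitmap code: the helpers are
hypothetical, deliberately naive (bit by bit), and return `size` when no area
is found, which is a choice made for this sketch only. The boundary cases at
the end of the map are where an off-by-one would show up.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Return non-zero if bit i is set in map. */
int sketch_test_bit(const unsigned long *map, size_t i)
{
	return (map[i / SKETCH_BITS_PER_WORD] >>
		(i % SKETCH_BITS_PER_WORD)) & 1UL;
}

/* Set nr bits starting at bit start. */
void sketch_set_bits(unsigned long *map, size_t start, size_t nr)
{
	for (size_t i = start; i < start + nr; i++)
		map[i / SKETCH_BITS_PER_WORD] |=
			1UL << (i % SKETCH_BITS_PER_WORD);
}

/*
 * Find the first run of nr (> 0) zero bits at or after start within the
 * first size bits. Returns the index of the run, or size if none fits.
 */
size_t sketch_find_zero_area(const unsigned long *map, size_t size,
			     size_t start, size_t nr)
{
	size_t run = 0;

	for (size_t i = start; i < size; i++) {
		if (sketch_test_bit(map, i))
			run = 0;
		else if (++run == nr)
			return i + 1 - nr;
	}
	return size;
}
```

With bits 0-2 and 5 set, a search for 3 free bits in a 9-bit map finds the
area at index 6, while the same search in an 8-bit map must fail.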


* Re: [PATCH] ax25: unsigned cannot be less than 0 in ax25_ctl_ioctl()
From: Kevin Dawson @ 2009-10-13 21:50 UTC (permalink / raw)
  To: wharms; +Cc: linux-hams, netdev, Andrew Morton
In-Reply-To: <4AD44CF6.1070801@bfs.de>

walter harms wrote:
> 
> Kevin Dawson wrote:
>> 
>>     if (ax25_ctl.arg > ULONG_MAX / HZ && ...
> 
> i like this because it prevents a wrap around for stupid ax25_ctl.arg values but will not help when
> ax25_ctl.arg * HZ is used later. NTL i think HZ does not need to be constant these days but i am not an
> expert on that area.

I don't have a kernel source tree at the moment, so I haven't seen where 
it's used, but the same size-limiting principle using divisions would 
apply whether you have constants, variables or function calls.

If ax25_ctl.arg * HZ is used later, one should arrange that the thread 
won't get that far if it has already been deemed an excessive value.
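
A minimal sketch of that size-limiting principle, assuming nothing about the
actual ax25 code: the function name and the idea of returning the product
through a pointer are purely illustrative. The division-based check rejects
the value before the multiplication can wrap, so later code never sees an
overflowed result.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper: compute arg * hz only if it cannot overflow.
 * Returns false (and leaves *out untouched) for excessive values.
 */
bool scale_by_hz(unsigned long arg, unsigned long hz, unsigned long *out)
{
	if (hz != 0 && arg > ULONG_MAX / hz)
		return false;	/* arg * hz would wrap around */

	*out = arg * hz;
	return true;
}
```

The same check works whether hz is a compile-time constant or a run-time
value, which is the point made above.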

Kevin


* [PATCH] [NIU] VLAN does not work with niu driver
From: Joyce Yu @ 2009-10-13 21:27 UTC (permalink / raw)
  To: netdev

[-- Attachment #2: 0001-VLAN-does-not-work-with-niu-driver.patch --]
[-- Type: text/x-patch, Size: 1257 bytes --]

>From 0bb77e878758bd72051577bcc568e2b95c87c203 Mon Sep 17 00:00:00 2001
From: Joyce Yu <joyce.yu@sun.com>
Date: Mon, 12 Oct 2009 11:03:54 -0700
Subject: [PATCH] VLAN does not work with niu driver

---
 drivers/net/niu.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/net/niu.c b/drivers/net/niu.c
index f9364d0..9559e42 100644
--- a/drivers/net/niu.c
+++ b/drivers/net/niu.c
@@ -3480,6 +3480,7 @@ static int niu_process_rx_pkt(struct napi_struct *napi, struct niu *np,
 	unsigned int index = rp->rcr_index;
 	struct sk_buff *skb;
 	int len, num_rcr;
+	struct vlan_ethhdr *veth;
 
 	skb = netdev_alloc_skb(np->dev, RX_SKB_ALLOC_SIZE);
 	if (unlikely(!skb))
@@ -3545,7 +3546,15 @@ static int niu_process_rx_pkt(struct napi_struct *napi, struct niu *np,
 	rp->rcr_index = index;
 
 	skb_reserve(skb, NET_IP_ALIGN);
-	__pskb_pull_tail(skb, min(len, NIU_RXPULL_MAX));
+	__pskb_pull_tail(skb, min(len, VLAN_ETH_HLEN));
+
+	veth = (struct vlan_ethhdr *)skb->data;
+	if (veth->h_vlan_proto != __constant_htons(ETH_P_8021Q)) {
+		skb->tail -= 4;
+		skb->data_len += 4;
+		skb_shinfo(skb)->frags[0].page_offset -= 4;
+		skb_shinfo(skb)->frags[0].size += 4;
+	}
 
 	rp->rx_packets++;
 	rp->rx_bytes += skb->len;
-- 
1.6.4



* Re: TCP_DEFER_ACCEPT is missing counter update
From: Julian Anastasov @ 2009-10-13 21:27 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: David Miller, netdev
In-Reply-To: <20091013071955.GA3587@1wt.eu>


	Hello,

On Tue, 13 Oct 2009, Willy Tarreau wrote:

> >From da80c99a503bab1256706ed8d967e2ab3f71afe0 Mon Sep 17 00:00:00 2001
> From: Willy Tarreau <w@1wt.eu>
> Date: Tue, 13 Oct 2009 07:26:54 +0200
> Subject: tcp: fix tcp_defer_accept to consider the timeout
> 
> I was trying to use TCP_DEFER_ACCEPT and noticed that if the
> client does not talk, the connection is never accepted and
> remains in SYN_RECV state until the retransmits expire, where
> it finally is deleted. This is bad when some firewall such as

	I think this is by design; there is a big comment in
tcp_check_req().

> netfilter sits between the client and the server because the
> firewall sees the connection in ESTABLISHED state while the
> server will finally silently drop it without sending an RST.

	Client can stay ESTABLISHED for long time but
RST will be sent when client sends DATA or FIN.

> This behaviour contradicts the man page which says it should
> wait only for some time :
> 
>        TCP_DEFER_ACCEPT (since Linux 2.4)
>           Allows a listener to be awakened only when data arrives
>           on the socket.  Takes an integer value  (seconds), this
>           can  bound  the  maximum  number  of attempts TCP will
>           make to complete the connection. This option should not
>           be used in code intended to be portable.

	This works properly in 2.6.31.3, I set TCP_SYNCNT=1
and TCP_DEFER_ACCEPT then only 2 SYN-ACKs are sent.

> Also, looking at ipv4/tcp.c, a retransmit counter is correctly
> computed :

	rskq_defer_accept is threshold, not counter

>         case TCP_DEFER_ACCEPT:
>                 icsk->icsk_accept_queue.rskq_defer_accept = 0;
>                 if (val > 0) {
>                         /* Translate value in seconds to number of
>                          * retransmits */
>                         while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
>                                val > ((TCP_TIMEOUT_INIT / HZ) <<
>                                        icsk->icsk_accept_queue.rskq_defer_accept))
>                                 icsk->icsk_accept_queue.rskq_defer_accept++;
>                         icsk->icsk_accept_queue.rskq_defer_accept++;
>                 }
>                 break;
> 
> ==> rskq_defer_accept is used as a counter of retransmits.

	as limit for retransmits, not as counter
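
	To make the "limit, not counter" point concrete, here is a
userspace sketch of the seconds-to-retransmits conversion quoted above. It
assumes TCP_TIMEOUT_INIT / HZ is 3 seconds, which is believed to match
kernels of this era but should be checked against the tree in use. The
function name is hypothetical.

```c
/* Assumption for this sketch: initial SYN-ACK timeout of 3 seconds. */
#define SKETCH_TIMEOUT_INIT_SECS 3u

/*
 * Translate a TCP_DEFER_ACCEPT value in seconds into a retransmit
 * limit, following the loop quoted from net/ipv4/tcp.c. The result is
 * computed once at setsockopt() time and only compared afterwards.
 */
unsigned int defer_accept_secs_to_retrans(int val)
{
	unsigned int limit = 0;

	if (val > 0) {
		while (limit < 32 &&
		       (unsigned int)val > (SKETCH_TIMEOUT_INIT_SECS << limit))
			limit++;
		limit++;
	}
	return limit;
}
```

Under that assumption a value of 10 seconds maps to a limit of 3, and a
value of 1 (the "flag" usage) maps to a limit of 1.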

> But in tcp_minisocks.c, this counter is only checked. And in
> fact, I have found no location which updates it. So I think
> that what was intended was to decrease it in tcp_minisocks
> whenever it is checked, which the trivial patch below does.

	You can check net/ipv4/inet_connection_sock.c,
inet_csk_reqsk_queue_prune() where TCP_DEFER_ACCEPT can extend
the retransmission threshold for acked sockets above the
applied 'thresh'. So, there are 2 options:

a) TCP_DEFER_ACCEPT is used as flag (eg. 1) or the period is below
the TCP_SYNCNT period. In this case TCP_DEFER_ACCEPT does not
extend the period for DATA (DATA must come before TCP_SYNCNT).
Application is notified only when DATA comes.

or

b) TCP_DEFER_ACCEPT is set with seconds above the TCP_SYNCNT
retrans limit and the first ACK extends the period up to
TCP_DEFER_ACCEPT seconds (converted to retransmits). This
way we provide more time for DATA after the empty ACKs.
Again, the ACK can come before TCP_SYNCNT, but DATA after the ACK
can come even after TCP_SYNCNT, provided it arrives before the
TCP_DEFER_ACCEPT timeout. Again, the application is notified only
when DATA comes.

> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  net/ipv4/tcp_minisocks.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index f8d67cc..2f676f3 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -644,6 +644,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>  	/* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
>  	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
>  	    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {

	This is wrong considering inet_csk_reqsk_queue_prune()
and patch should not be applied except if I'm missing something:

> +		inet_csk(sk)->icsk_accept_queue.rskq_defer_accept--;
>  		inet_rsk(req)->acked = 1;
>  		return NULL;
>  	}

Regards

--
Julian Anastasov <ja@ssi.bg>


* Re: PATCH: Network Device Naming mechanism and policy
From: Greg KH @ 2009-10-13 20:19 UTC (permalink / raw)
  To: Jordan_Hargrave
  Cc: Narendra_K, dannf, netdev, linux-hotplug, Matt_Domsch,
	Charles_Rose
In-Reply-To: <5DDAB7BA7BDB58439DD0EED0B8E9A3AE02E8277A@ausx3mpc102.aus.amer.dell.com>


A: No.
Q: Should I include quotations after my reply?

http://daringfireball.net/2007/07/on_top

On Tue, Oct 13, 2009 at 03:00:59PM -0500, Jordan_Hargrave@Dell.com wrote:
> We have developed a mapping library that will convert the
> user-friendly symlink names to the kernel names necessary for socket
> ioctls.  All network tools that normally take ethX as argument have
> been modified to use this mapping library.  Usually it's just a
> one-line addition when parsing the command line arguments.

Either I missed this in the first message in this thread, or this was
never stated before, but that is nice.  Where is this library, and will
it be accepted by the upstream tool maintainers?

thanks,

greg k-h


* RE: PATCH: Network Device Naming mechanism and policy
From: Jordan_Hargrave @ 2009-10-13 20:00 UTC (permalink / raw)
  To: greg, Narendra_K; +Cc: dannf, netdev, linux-hotplug, Matt_Domsch, Charles_Rose
In-Reply-To: <20091013195117.GA3778@kroah.com>

We have developed a mapping library that will convert the user-friendly symlink names to the kernel names necessary for socket ioctls.  All network tools that normally take ethX as argument have been modified to use this mapping library.  Usually it's just a one-line addition when parsing the command line arguments.

--jordan hargrave
Dell Enterprise Linux Engineering

-----Original Message-----
From: Greg KH [mailto:greg@kroah.com]
Sent: Tue 10/13/2009 14:51
To: K, Narendra
Cc: dannf@hp.com; netdev@vger.kernel.org; linux-hotplug@vger.kernel.org; Domsch, Matt; Hargrave, Jordan; Rose, Charles
Subject: Re: PATCH: Network Device Naming mechanism and policy

On Tue, Oct 13, 2009 at 10:43:49PM +0530, Narendra_K@Dell.com wrote:
> 
> >> These device nodes are not functional at the moment - open() returns 
> >> -ENOSYS.  Their only purpose is to provide userspace with a kernel 
> >> name to ifindex mapping, in a form that udev can easily manage.
> >
> >If the idea is just to provide a userspace-visible mapping 
> >(and presumably take advantage of udev's infrastructure for 
> >naming) does this need kernel changes? Could this be a 
> >hierarchy under e.g. /etc/udev instead, using plain text 
> >files? It still means we need something like libnetdevname for 
> >apps to do the translation, but I'm not seeing why it matters 
> >how this map is stored. Is there some special property of the 
> >character devices (e.g. uevents) that we're not already 
> >getting with the existing interfaces?
> 
> Yes. The char device by itself doesn't help in any way. But it provides
> a flexible mechanism to provide multiple names for the same device, just
> the way it is for disks.

No, it's quite different than disks in that the symlinks, _and_ the
device nodes do absolutely nothing.  And any reference to a name that is
a symlink will not work with any existing network tool, you will have to
do some kind of lookup to determine which network device you really were
referring to.

These links end up being useless, and confusing, I still don't see how
you can use them for anything.

thanks,

greg k-h


* Re: PATCH: Network Device Naming mechanism and policy
From: John W. Linville @ 2009-10-13 19:53 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Dan Williams, Bill Nottingham, Scott James Remnant, Matt Domsch,
	Narendra K, netdev, linux-hotplug, jordan_hargrave
In-Reply-To: <1255459984.13438.2.camel@achroite>

On Tue, Oct 13, 2009 at 07:53:04PM +0100, Ben Hutchings wrote:
> On Tue, 2009-10-13 at 11:06 -0700, Dan Williams wrote:

> > And if we really want seamless support for MAC spoofing, we want
> > ETHTOOL_GPERMADDR for all drivers too, so that if your configuration
> > says "rename device XX:XX:XX:XX:XX:XX to YY:YY:YY:YY:YY:YY" we can
> > actually figure stuff out after the spoof.
> 
> ETHTOOL_GPERMADDR is handled in the ethtool core now.  Are you thinking
> of drivers that don't have ethtool ops?  Maybe it's time to add default
> operations.

Not quite true -- dev->perm_addr still has to be set by the driver.

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.
