Netdev List

Netdev List
 help / color / mirror / Atom feed

* Belay that... -- Re: Please pull 'upstream-jgarzik' branch of wireless-2.6
From: John W. Linville @ 2008-01-09  1:49 UTC (permalink / raw)
  To: jeff-o2qLIJkoznsdnm+yROfE0A
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <20080108224202.GE3086-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

Please don't pull yet -- I let a patch get in out of order.

I'll post a new pull request when I straighten this out...

John

On Tue, Jan 08, 2008 at 05:42:02PM -0500, John W. Linville wrote:
> On Tue, Jan 08, 2008 at 05:23:05PM -0500, John W. Linville wrote:
> > Jeff,
> > 
> > Another round of patches intended for 2.6.25...the biggest factions are
> > rt2x00 and b43 updates, as well as some Viro-isms... :-)
> > 
> > Please let me know if there are any problems!
> > 
> > John
> > 
> > P.S.  Copying Dave in case he is handling these requests...FWIW, it
> > will definitely depend on the patches already in netdev-2.6#upstream...
> 
> I left out a patch.  I have pushed it on top of the previous request.
> 
> Let me know if there are problems!
> 
> Thanks,
> 
> John
-- 
John W. Linville
linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org

^ permalink raw reply

* Re: SACK scoreboard
From: Lachlan Andrew @ 2008-01-09  1:34 UTC (permalink / raw)
  To: David Miller; +Cc: jheffner, ilpo.jarvinen, netdev, quetchen
In-Reply-To: <20080108.144456.173014334.davem@davemloft.net>

Greetings David,

On 08/01/2008, David Miller <davem@davemloft.net> wrote:
> From: John Heffner <jheffner@psc.edu>
>
> > I haven't thought about this too hard, but can we approximate this by
> > moving scaked data into a sacked queue, then if something bad happens
> > merge this back into the retransmit queue?
>
> That defeats the impetus for the change.
>
> We want to free up the data, say, 2 packets at a time as
> ACKs come in.  The key goal is smooth liberation of
> retransmit queue packets over time.

John also suggested freeing the packets as a lower priority task, just
doing it after they're acknowledged.

When the ACK finally comes, you could do something like moving John's
entire list of packets to a "to be freed" list, and free a few every
time (say) another ACK comes in.

$0.02,
Lachlan

-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Ph: +1 (626) 395-8820    Fax: +1 (626) 568-3603
http://netlab.caltech.edu/~lachlan

^ permalink raw reply

* Re: 2.6.24-rc6-mm1
From: Andrew Morton @ 2008-01-09  1:07 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: fujita.tomonori, mingo, just.for.lkml, tomof, jarkao2, herbert,
	linux-kernel, neilb, bfields, netdev, tom
In-Reply-To: <20080109095445N.fujita.tomonori@lab.ntt.co.jp>

On Wed, 09 Jan 2008 09:54:45 +0900
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:

> > > --- a/lib/iommu-helper.c~a
> > > +++ a/lib/iommu-helper.c
> > > @@ -8,15 +8,20 @@
> > >  static unsigned long find_next_zero_area(unsigned long *map,
> > >  					 unsigned long size,
> > >  					 unsigned long start,
> > > -					 unsigned int nr)
> > > +					 unsigned int nr,
> > > +					 unsigned long align_mask)
> > >  {
> > >  	unsigned long index, end, i;
> > >  again:
> > >  	index = find_next_zero_bit(map, size, start);
> > > +
> > > +	/* Align allocation */
> > > +	index = (index + align_mask) & ~align_mask;
> > 
> > The ALIGN() macro is the approved way of doing this.
> > 
> > (I don't think ALIGN adds much value really, especially given that you've
> > commented what's going on, but I guess it does make reviewing and reading a
> > little easier).
> 
> Would be better to use __ALIGN_MASK? I can find only one user who
> directly use __ALIGN_MASK. The POWER IOMMU calculates align_mask by
> itself so it's easier to pass align_mask as an argument.

ALIGN() should be OK - its aditional type coercion isn't useful in this
case but ALIGN() is the official interface.

I don't see any reason why vermilion.c had to reach for __ALIGN_MASK.  I'll
switch it to ALIGN().


^ permalink raw reply

* Re: 2.6.24-rc6-mm1
From: FUJITA Tomonori @ 2008-01-09  0:54 UTC (permalink / raw)
  To: akpm
  Cc: fujita.tomonori, mingo, just.for.lkml, tomof, jarkao2, herbert,
	linux-kernel, neilb, bfields, netdev, tom
In-Reply-To: <20080108162739.f2f577ce.akpm@linux-foundation.org>

On Tue, 8 Jan 2008 16:27:39 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 09 Jan 2008 08:57:53 +0900
> FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
> 
> > Andrew, can you replace
> > 
> > iommu-sg-add-iommu-helper-functions-for-the-free-area-management.patch
> > 
> > with the updated patch:
> > 
> > http://ozlabs.org/pipermail/linuxppc-dev/2007-December/048997.html
> > 
> > For your convenience I've attached the updated patch too.
> 
> <generates the incremental>

Thanks for putting the fix to -mm.


> > --- a/lib/iommu-helper.c~a
> > +++ a/lib/iommu-helper.c
> > @@ -8,15 +8,20 @@
> >  static unsigned long find_next_zero_area(unsigned long *map,
> >  					 unsigned long size,
> >  					 unsigned long start,
> > -					 unsigned int nr)
> > +					 unsigned int nr,
> > +					 unsigned long align_mask)
> >  {
> >  	unsigned long index, end, i;
> >  again:
> >  	index = find_next_zero_bit(map, size, start);
> > +
> > +	/* Align allocation */
> > +	index = (index + align_mask) & ~align_mask;
> 
> The ALIGN() macro is the approved way of doing this.
> 
> (I don't think ALIGN adds much value really, especially given that you've
> commented what's going on, but I guess it does make reviewing and reading a
> little easier).

Would be better to use __ALIGN_MASK? I can find only one user who
directly use __ALIGN_MASK. The POWER IOMMU calculates align_mask by
itself so it's easier to pass align_mask as an argument.

^ permalink raw reply

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
From: linux @ 2008-01-09  0:38 UTC (permalink / raw)
  To: linux, romieu; +Cc: akpm, davem, netdev
In-Reply-To: <20080108213640.GC4450@electric-eye.fr.zoreil.com>

> Can you try the patch below ?

Testing now... (I presume you noticed the one-character typo in my
earlier patch.  That should be "mc = mc->next", not "mv = mc->next".)

That doesn't seem to do it.  Not entirely, at least.  After downloading
and partially re-uploading an 800M file, slabtop reports:

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
341576 341574  99%    0.50K  42697        8    170788K kmalloc-512
342006 341953  99%    0.19K  16286       21     65144K kmalloc-192
 30592  30575  99%    2.00K   7648        4     61184K kmalloc-2048
 30213  30193  99%    0.44K   3357        9     13428K skbuff_fclone_cache
  7650   7643  99%    0.08K    150       51       600K sysfs_dir_cache
  4000   3938  98%    0.12K    125       32       500K kmalloc-128
   258    258 100%    1.15K     43        6       344K raid5-md5
   232    221  95%    1.00K     58        4       232K kmalloc-1024
  3136   3110  99%    0.06K     49       64       196K kmalloc-64
   264     80  30%    0.68K     24       11       192K ext3_inode_cache

The "kmalloc-2048" was down in the noise before the upload started.
This is in single-user mode, after sync and echo 3 > /proc/sys/vm/drop_caches.

I'll have to try this after this evening's social plans, but I'm thinking
of implementing more rapid bug detection: explicitly zero the sp->TxBuff
slot when the skb is freed, and check that it is zero before putting
anything else in there.  (And likewise for RxBuff.)

That way, I don't have to use up a noticeable amount of memory to see
the bug and reboot to clear up the damage each test cycle.

^ permalink raw reply

* Re: 2.6.24-rc6-mm1
From: Andrew Morton @ 2008-01-09  0:27 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: mingo, fujita.tomonori, just.for.lkml, tomof, jarkao2, herbert,
	linux-kernel, neilb, bfields, netdev, tom
In-Reply-To: <20080109085753O.fujita.tomonori@lab.ntt.co.jp>

On Wed, 09 Jan 2008 08:57:53 +0900
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:

> Andrew, can you replace
> 
> iommu-sg-add-iommu-helper-functions-for-the-free-area-management.patch
> 
> with the updated patch:
> 
> http://ozlabs.org/pipermail/linuxppc-dev/2007-December/048997.html
> 
> For your convenience I've attached the updated patch too.

<generates the incremental>


> --- a/lib/iommu-helper.c~a
> +++ a/lib/iommu-helper.c
> @@ -8,15 +8,20 @@
>  static unsigned long find_next_zero_area(unsigned long *map,
>  					 unsigned long size,
>  					 unsigned long start,
> -					 unsigned int nr)
> +					 unsigned int nr,
> +					 unsigned long align_mask)
>  {
>  	unsigned long index, end, i;
>  again:
>  	index = find_next_zero_bit(map, size, start);
> +
> +	/* Align allocation */
> +	index = (index + align_mask) & ~align_mask;

The ALIGN() macro is the approved way of doing this.

(I don't think ALIGN adds much value really, especially given that you've
commented what's going on, but I guess it does make reviewing and reading a
little easier).


>  	end = index + nr;
> -	if (end > size)
> +	if (end >= size)
>  		return -1;
> -	for (i = index + 1; i < end; i++) {
> +	for (i = index; i < end; i++) {
>  		if (test_bit(i, map)) {
>  			start = i+1;
>  			goto again;
> @@ -50,9 +55,8 @@ unsigned long iommu_area_alloc(unsigned 
>  {
>  	unsigned long index;
>  again:
> -	index = find_next_zero_area(map, size, start, nr);
> +	index = find_next_zero_area(map, size, start, nr, align_mask);
>  	if (index != -1) {
> -		index = (index + align_mask) & ~align_mask;
>  		if (is_span_boundary(index, nr, shift, boundary_size)) {
>  			/* we could do more effectively */
>  			start = index + 1;
> _
> 
> 

^ permalink raw reply

* Re: AF_UNIX MSG_PEEK bug?
From: Brent Casavant @ 2008-01-09  0:08 UTC (permalink / raw)
  To: Tom Spink; +Cc: Rick Jones, netdev, David Miller, linux-kernel
In-Reply-To: <7b9198260801081546u50f11e9ck32ec6f071efa300d@mail.gmail.com>

On Tue, 8 Jan 2008, Tom Spink wrote:

> Ach.  I *am* missing something... and what I'm missing is my
> understanding of the sendmsg and recvmsg calls.

No problem.  We all have those days. :)

Brent

-- 
Brent Casavant                          All music is folk music.  I ain't
bcasavan@sgi.com                        never heard a horse sing a song.
Silicon Graphics, Inc.                    -- Louis Armstrong

^ permalink raw reply

* Re: 2.6.24-rc6-mm1
From: FUJITA Tomonori @ 2008-01-08 23:57 UTC (permalink / raw)
  To: mingo, akpm
  Cc: fujita.tomonori, just.for.lkml, tomof, jarkao2, herbert,
	linux-kernel, neilb, bfields, netdev, tom
In-Reply-To: <20080108155948.GC26114@elte.hu>

On Tue, 8 Jan 2008 16:59:48 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
> 
> > The patches are available at:
> > 
> > http://www.kernel.org/pub/linux/kernel/people/tomo/iommu/
> > 
> > Or if you prefer the git tree:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git 
> > iommu-sg-fixes
> 
> btw., these improvements to the IOMMU code are in -mm and will go into 
> v2.6.25, right? The changes look robust to me.

Thanks, they have been in -mm though the iommu helper fix hasn't
yet. Balbir Singh found the bug in 2.6.24-rc6-mm1. I've just check
mmotm and found that the IOMMU helper patch doesn't include the fix.

Andrew, can you replace

iommu-sg-add-iommu-helper-functions-for-the-free-area-management.patch

with the updated patch:

http://ozlabs.org/pipermail/linuxppc-dev/2007-December/048997.html

For your convenience I've attached the updated patch too.

Hopefully, they will go into v2.6.25. At least, I hope that the
patches (0001-0011) that make the IOMMUs respect segment size limits
when merging sg lists will be merged. They are simple and I got ACKs
on POWER and PARISC.


Thanks,

=
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Subject: [PATCH] add IOMMU helper functions for the free area management

This adds IOMMU helper functions for the free area management.  These
functions take care of LLD's segment boundary limit for IOMMUs.  They would be
useful for IOMMUs that use bitmap for the free area management.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 include/linux/iommu-helper.h |    7 ++++
 lib/Makefile                 |    1 +
 lib/iommu-helper.c           |   80 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/iommu-helper.h
 create mode 100644 lib/iommu-helper.c

diff --git a/include/linux/iommu-helper.h b/include/linux/iommu-helper.h
new file mode 100644
index 0000000..4dd4c04
--- /dev/null
+++ b/include/linux/iommu-helper.h
@@ -0,0 +1,7 @@
+extern unsigned long iommu_area_alloc(unsigned long *map, unsigned long size,
+				      unsigned long start, unsigned int nr,
+				      unsigned long shift,
+				      unsigned long boundary_size,
+				      unsigned long align_mask);
+extern void iommu_area_free(unsigned long *map, unsigned long start,
+			    unsigned int nr);
diff --git a/lib/Makefile b/lib/Makefile
index b6793ed..0e7383f 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -64,6 +64,7 @@ obj-$(CONFIG_SMP) += percpu_counter.o
 obj-$(CONFIG_AUDIT_GENERIC) += audit.o
 
 obj-$(CONFIG_SWIOTLB) += swiotlb.o
+obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o
 obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o
 
 lib-$(CONFIG_GENERIC_BUG) += bug.o
diff --git a/lib/iommu-helper.c b/lib/iommu-helper.c
new file mode 100644
index 0000000..495575a
--- /dev/null
+++ b/lib/iommu-helper.c
@@ -0,0 +1,80 @@
+/*
+ * IOMMU helper functions for the free area management
+ */
+
+#include <linux/module.h>
+#include <linux/bitops.h>
+
+static unsigned long find_next_zero_area(unsigned long *map,
+					 unsigned long size,
+					 unsigned long start,
+					 unsigned int nr,
+					 unsigned long align_mask)
+{
+	unsigned long index, end, i;
+again:
+	index = find_next_zero_bit(map, size, start);
+
+	/* Align allocation */
+	index = (index + align_mask) & ~align_mask;
+
+	end = index + nr;
+	if (end >= size)
+		return -1;
+	for (i = index; i < end; i++) {
+		if (test_bit(i, map)) {
+			start = i+1;
+			goto again;
+		}
+	}
+	return index;
+}
+
+static inline void set_bit_area(unsigned long *map, unsigned long i,
+				int len)
+{
+	unsigned long end = i + len;
+	while (i < end) {
+		__set_bit(i, map);
+		i++;
+	}
+}
+
+static inline int is_span_boundary(unsigned int index, unsigned int nr,
+				   unsigned long shift,
+				   unsigned long boundary_size)
+{
+	shift = (shift + index) & (boundary_size - 1);
+	return shift + nr > boundary_size;
+}
+
+unsigned long iommu_area_alloc(unsigned long *map, unsigned long size,
+			       unsigned long start, unsigned int nr,
+			       unsigned long shift, unsigned long boundary_size,
+			       unsigned long align_mask)
+{
+	unsigned long index;
+again:
+	index = find_next_zero_area(map, size, start, nr, align_mask);
+	if (index != -1) {
+		if (is_span_boundary(index, nr, shift, boundary_size)) {
+			/* we could do more effectively */
+			start = index + 1;
+			goto again;
+		}
+		set_bit_area(map, index, nr);
+	}
+	return index;
+}
+EXPORT_SYMBOL(iommu_area_alloc);
+
+void iommu_area_free(unsigned long *map, unsigned long start, unsigned int nr)
+{
+	unsigned long end = start + nr;
+
+	while (start < end) {
+		__clear_bit(start, map);
+		start++;
+	}
+}
+EXPORT_SYMBOL(iommu_area_free);
-- 
1.5.3.4



^ permalink raw reply related

* Re: AF_UNIX MSG_PEEK bug?
From: Tom Spink @ 2008-01-08 23:46 UTC (permalink / raw)
  To: Brent Casavant; +Cc: Rick Jones, netdev, David Miller, linux-kernel
In-Reply-To: <7b9198260801081539x7aee72fbm53b5f298c5faf56@mail.gmail.com>

On 08/01/2008, Tom Spink <tspink@gmail.com> wrote:
> On 08/01/2008, Brent Casavant <bcasavan@sgi.com> wrote:
> > On Tue, 8 Jan 2008, Tom Spink wrote:
> >
> > > Where in the code is the message length being sent across the socket?
> >
> > In do_producer(), there are the following lines in the main loop:
> >
> >         /* Send random lengths of data */
> >         messages[i].length = (rand() % MAXLEN) + sizeof(size_t);
> >         iov[i].iov_len = messages[i].length;
> >
> > The entire "struct sockmsg" is sent across the socket, so the first
> > size_t in each message contains the length of the entire message
> > (including the size_t).  This size gets picked up at the
> > recv(...,MSG_PEEK) line in do_consumer().
> >
> > Thanks,
> > Brent
> >
> > --
> > Brent Casavant                          All music is folk music.  I ain't
> > bcasavan@sgi.com                        never heard a horse sing a song.
> > Silicon Graphics, Inc.                    -- Louis Armstrong
> >
>
> Hi,
>
> But you're not consuming the size_t on the other end.  You're only
> peeking it, i.e. you're doing the recv to peek at the message, but
> never calling recv to remove that data from the queue... or am I
> missing something?
>
> --
> Regards,
> Tom Spink
> University of Edinburgh
>

Ach.  I *am* missing something... and what I'm missing is my
understanding of the sendmsg and recvmsg calls.

A quick Google has sorted me out.

-- 
Regards,
Tom Spink
University of Edinburgh

^ permalink raw reply

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
From: Francois Romieu @ 2008-01-08 23:28 UTC (permalink / raw)
  To: David Miller; +Cc: linux, netdev, akpm
In-Reply-To: <20080108.150038.57735076.davem@davemloft.net>

David Miller <davem@davemloft.net> :
[...]
> Same kind of bug as the RX side :-)  I bet this fixes his
> problem...

I am not sure but the Rx side is probably just here to distract
from the real problem. Please don't ask... :o)

Anyway I'll poke an adapter in the test computer and give it a
try tomorrow. Nobody will complain if I crash it.

-- 
Ueimor

^ permalink raw reply

* Re: AF_UNIX MSG_PEEK bug?
From: Tom Spink @ 2008-01-08 23:39 UTC (permalink / raw)
  To: Brent Casavant; +Cc: Rick Jones, netdev, David Miller, linux-kernel
In-Reply-To: <alpine.BSF.1.00.0801081716010.24286@pkunk.americas.sgi.com>

On 08/01/2008, Brent Casavant <bcasavan@sgi.com> wrote:
> On Tue, 8 Jan 2008, Tom Spink wrote:
>
> > Where in the code is the message length being sent across the socket?
>
> In do_producer(), there are the following lines in the main loop:
>
>         /* Send random lengths of data */
>         messages[i].length = (rand() % MAXLEN) + sizeof(size_t);
>         iov[i].iov_len = messages[i].length;
>
> The entire "struct sockmsg" is sent across the socket, so the first
> size_t in each message contains the length of the entire message
> (including the size_t).  This size gets picked up at the
> recv(...,MSG_PEEK) line in do_consumer().
>
> Thanks,
> Brent
>
> --
> Brent Casavant                          All music is folk music.  I ain't
> bcasavan@sgi.com                        never heard a horse sing a song.
> Silicon Graphics, Inc.                    -- Louis Armstrong
>

Hi,

But you're not consuming the size_t on the other end.  You're only
peeking it, i.e. you're doing the recv to peek at the message, but
never calling recv to remove that data from the queue... or am I
missing something?

-- 
Regards,
Tom Spink
University of Edinburgh

^ permalink raw reply

* Re: [PATCH 3/3] drivers/net/ipg.c: fix horrible mdio_read and _write
From: linux @ 2008-01-08 23:33 UTC (permalink / raw)
  To: linux, romieu; +Cc: akpm, davem, netdev
In-Reply-To: <20080108204744.GB4450@electric-eye.fr.zoreil.com>

>> +	do {
>> +		/* IPG_PC_MGMTDATA is a power of 2; compiler knows to shift */
>> +		u8 d = ((data >> --len) & 1) * IPG_PC_MGMTDATA;
>> +		/* + rather than | lets compiler microoptimize better */
>> +		ipg_drive_phy_ctl_low_high(ioaddr, d + otherbits);
>> +	} while (len);

> Imho something is not quite right when the code needs a comment every line
> and I am mildly convinced that we really want to honk an "optimizing mdio
> methods is ok" signal around.

Oh, but those are SPACE-saving optimiztions. :-)
I know it's not time-critical; it's really pure hack value, but is it
that evil?

> "while (len--) {" is probably more akpm-ish btw.

Well spotted.

[...]
>>  static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
>>  {
>>  	void __iomem *ioaddr = ipg_ioaddr(dev);
>> +	u8 const polarity = ipg_r8(PHY_CTRL) &
>> +			(IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);

> (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY) appears twice. I would not
> mind a #define for it.

I'm hardly going to go to war over over the matter, but actually I disagree.

There's a non-zero mental cost to keeping track of an additional name,
and when it's only used two times, and is pretty simple, I think reducing
the number of layers of #defines to understand is a positive advantage.
The above reads "the two polarity bits from the PHY_CTRL register"
to a person who's never read ipg.h.  Adding IPG_PC_POLARITY_BITS just
requires mentally dereferencing another layer of pointers.

Think of it as a function small enough that it can be inlined.

>> @@ -221,75 +240,30 @@ static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
>[...]
>> -	for (i = 0; i < p[6].len; i++) {
>> -		p[6].field |=
>> -		    (read_phy_bit(ioaddr, polarity) << (p[6].len - 1 - i));
>> -	}
>> +	send_three_state(ioaddr, polarity);	/* TA first bit */
>> +	(void)read_phy_bit(ioaddr, polarity);		/* TA second bit */
>> +
>> +	for (i = 0; i < 16; i++)
>> +		data += data + read_phy_bit(ioaddr, polarity);
                ^^^^^^^^^^^^
> Huh ?

Okay, I guess you prefer
+		data = 2*data + read_phy_bit(ioaddr, polarity);

That's only one character longer and easier to understand.
Or even four characters:
+		data = (data<<1) + read_phy_bit(ioaddr, polarity);

That's just the synonym that happened to come out of my fingers at the
time.  There's no particular meaning to it.

>> @@ -299,11 +273,13 @@ static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
>>  static void mdio_write(struct net_device *dev, int phy_id, int phy_reg, int val)
[...]
>> +	mdio_write_bits(ioaddr, polarity, GMII_PREAMBLE, 32);
>> +	mdio_write_bits(ioaddr, polarity, GMII_ST << 14 | GMII_WRITE << 12 |
>> +	                                  phy_id << 7 | phy_reg << 2 |
>> +					  0x2, 16);

> Use the 80 cols luke:
>	                                  phy_id << 7 | phy_reg << 2 | 0x2, 16);

Good spotting, thanks.

Here's a revised patch:

    drivers/net/ipg.c: Fixed style problems that AKPM noticed.
    
    (And a few more while at it.  Including an actual bug in enabling multicast
    due to & vs. && confusion.)

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index 3860fcd..fb69374 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -188,9 +188,9 @@ static void send_end(void __iomem *ioaddr, u8 phyctrlpolarity)
 		phyctrlpolarity) & IPG_PC_RSVD_MASK, PHY_CTRL);
 }
 
-static u16 read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
+static unsigned read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
 {
-	u16 bit_data;
+	unsigned bit_data;
 
 	ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_LO | phyctrlpolarity);
 
@@ -202,12 +202,31 @@ static u16 read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
 }
 
 /*
+ * Transmit the given bits, MSB-first, through the MgmtData bit (bit 1)
+ * of the PhyCtrl register. 1 <= len <= 32.  "ioaddr" is the register
+ * address, and "otherbits" are the values of the other bits.
+ */
+static void mdio_write_bits(void __iomem *ioaddr, u8 otherbits, u32 data, unsigned len)
+{
+	otherbits |= IPG_PC_MGMTDIR;
+	while (len--) {
+		/* IPG_PC_MGMTDATA is a power of 2; compiler knows to shift */
+		u8 d = ((data >> len) & 1) * IPG_PC_MGMTDATA;
+		/* + rather than | allows slight code size microoptimization */
+		ipg_drive_phy_ctl_low_high(ioaddr, d + otherbits);
+	}
+}
+
+/*
  * Read a register from the Physical Layer device located
  * on the IPG NIC, using the IPG PHYCTRL register.
  */
 static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 {
 	void __iomem *ioaddr = ipg_ioaddr(dev);
+	u8 const polarity = ipg_r8(PHY_CTRL) &
+			(IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
+	unsigned i, data = 0;
 	/*
 	 * The GMII mangement frame structure for a read is as follows:
 	 *
@@ -218,78 +237,34 @@ static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 	 * A = bit of Physical Layer device address (MSB first)
 	 * R = bit of register address (MSB first)
 	 * z = High impedance state
-	 * D = bit of read data (MSB first)
+	 * 0 = preamble bit sent by PHY
+	 * D = bit of read data (MSB first), sent by PHY
 	 *
 	 * Transmission order is 'Preamble' field first, bits transmitted
-	 * left to right (first to last).
+	 * left to right (msbit-first).
 	 */
-	struct {
-		u32 field;
-		unsigned int len;
-	} p[] = {
-		{ GMII_PREAMBLE,	32 },	/* Preamble */
-		{ GMII_ST,		2  },	/* ST */
-		{ GMII_READ,		2  },	/* OP */
-		{ phy_id,		5  },	/* PHYAD */
-		{ phy_reg,		5  },	/* REGAD */
-		{ 0x0000,		2  },	/* TA */
-		{ 0x0000,		16 },	/* DATA */
-		{ 0x0000,		1  }	/* IDLE */
-	};
-	unsigned int i, j;
-	u8 polarity, data;
-
-	polarity  = ipg_r8(PHY_CTRL);
-	polarity &= (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
-
-	/* Create the Preamble, ST, OP, PHYAD, and REGAD field. */
-	for (j = 0; j < 5; j++) {
-		for (i = 0; i < p[j].len; i++) {
-			/* For each variable length field, the MSB must be
-			 * transmitted first. Rotate through the field bits,
-			 * starting with the MSB, and move each bit into the
-			 * the 1st (2^1) bit position (this is the bit position
-			 * corresponding to the MgmtData bit of the PhyCtrl
-			 * register for the IPG).
-			 *
-			 * Example: ST = 01;
-			 *
-			 *          First write a '0' to bit 1 of the PhyCtrl
-			 *          register, then write a '1' to bit 1 of the
-			 *          PhyCtrl register.
-			 *
-			 * To do this, right shift the MSB of ST by the value:
-			 * [field length - 1 - #ST bits already written]
-			 * then left shift this result by 1.
-			 */
-			data  = (p[j].field >> (p[j].len - 1 - i)) << 1;
-			data &= IPG_PC_MGMTDATA;
-			data |= polarity | IPG_PC_MGMTDIR;
-
-			ipg_drive_phy_ctl_low_high(ioaddr, data);
-		}
-	}
-
-	send_three_state(ioaddr, polarity);
-
-	read_phy_bit(ioaddr, polarity);
+	mdio_write_bits(ioaddr, polarity, GMII_PREAMBLE, 32);
+	mdio_write_bits(ioaddr, polarity, GMII_ST<<12 | GMII_READ << 10 |
+	                                  phy_id << 5 | phy_reg, 14);
 
 	/*
 	 * For a read cycle, the bits for the next two fields (TA and
 	 * DATA) are driven by the PHY (the IPG reads these bits).
 	 */
-	for (i = 0; i < p[6].len; i++) {
-		p[6].field |=
-		    (read_phy_bit(ioaddr, polarity) << (p[6].len - 1 - i));
-	}
+	send_three_state(ioaddr, polarity);	/* TA first bit */
+	(void)read_phy_bit(ioaddr, polarity);		/* TA second bit */
+
+	for (i = 0; i < 16; i++)
+		data = 2*data + read_phy_bit(ioaddr, polarity);
 
+	/* Trailing idle */
 	send_three_state(ioaddr, polarity);
 	send_three_state(ioaddr, polarity);
 	send_three_state(ioaddr, polarity);
 	send_end(ioaddr, polarity);
 
 	/* Return the value of the DATA field. */
-	return p[6].field;
+	return data;
 }
 
 /*
@@ -299,11 +274,13 @@ static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 static void mdio_write(struct net_device *dev, int phy_id, int phy_reg, int val)
 {
 	void __iomem *ioaddr = ipg_ioaddr(dev);
+	u8 const polarity = ipg_r8(PHY_CTRL) &
+			(IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
 	/*
-	 * The GMII mangement frame structure for a read is as follows:
+	 * The GMII mangement frame structure for a write is as follows:
 	 *
 	 * |Preamble|st|op|phyad|regad|ta|      data      |idle|
-	 * |< 32 1s>|01|10|AAAAA|RRRRR|z0|DDDDDDDDDDDDDDDD|z   |
+	 * |< 32 1s>|01|01|AAAAA|RRRRR|10|DDDDDDDDDDDDDDDD|z   |
 	 *
 	 * <32 1s> = 32 consecutive logic 1 values
 	 * A = bit of Physical Layer device address (MSB first)
@@ -312,66 +289,17 @@ static void mdio_write(struct net_device *dev, int phy_id, int phy_reg, int val)
 	 * D = bit of write data (MSB first)
 	 *
 	 * Transmission order is 'Preamble' field first, bits transmitted
-	 * left to right (first to last).
+	 * left to right (msbit-first).
 	 */
-	struct {
-		u32 field;
-		unsigned int len;
-	} p[] = {
-		{ GMII_PREAMBLE,	32 },	/* Preamble */
-		{ GMII_ST,		2  },	/* ST */
-		{ GMII_WRITE,		2  },	/* OP */
-		{ phy_id,		5  },	/* PHYAD */
-		{ phy_reg,		5  },	/* REGAD */
-		{ 0x0002,		2  },	/* TA */
-		{ val & 0xffff,		16 },	/* DATA */
-		{ 0x0000,		1  }	/* IDLE */
-	};
-	unsigned int i, j;
-	u8 polarity, data;
-
-	polarity  = ipg_r8(PHY_CTRL);
-	polarity &= (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
-
-	/* Create the Preamble, ST, OP, PHYAD, and REGAD field. */
-	for (j = 0; j < 7; j++) {
-		for (i = 0; i < p[j].len; i++) {
-			/* For each variable length field, the MSB must be
-			 * transmitted first. Rotate through the field bits,
-			 * starting with the MSB, and move each bit into the
-			 * the 1st (2^1) bit position (this is the bit position
-			 * corresponding to the MgmtData bit of the PhyCtrl
-			 * register for the IPG).
-			 *
-			 * Example: ST = 01;
-			 *
-			 *          First write a '0' to bit 1 of the PhyCtrl
-			 *          register, then write a '1' to bit 1 of the
-			 *          PhyCtrl register.
-			 *
-			 * To do this, right shift the MSB of ST by the value:
-			 * [field length - 1 - #ST bits already written]
-			 * then left shift this result by 1.
-			 */
-			data  = (p[j].field >> (p[j].len - 1 - i)) << 1;
-			data &= IPG_PC_MGMTDATA;
-			data |= polarity | IPG_PC_MGMTDIR;
-
-			ipg_drive_phy_ctl_low_high(ioaddr, data);
-		}
-	}
+	mdio_write_bits(ioaddr, polarity, GMII_PREAMBLE, 32);
+	mdio_write_bits(ioaddr, polarity, GMII_ST << 14 | GMII_WRITE << 12 |
+	                                  phy_id << 7 | phy_reg << 2 | 0x2, 16);
+	mdio_write_bits(ioaddr, polarity, val, 16);	/* DATA */
 
 	/* The last cycle is a tri-state, so read from the PHY. */
-	for (j = 7; j < 8; j++) {
-		for (i = 0; i < p[j].len; i++) {
-			ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_LO | polarity);
-
-			p[j].field |= ((ipg_r8(PHY_CTRL) &
-				IPG_PC_MGMTDATA) >> 1) << (p[j].len - 1 - i);
-
-			ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_HI | polarity);
-		}
-	}
+	ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_LO | polarity);
+	(void)ipg_r8(PHY_CTRL);
+	ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_HI | polarity);
 }
 
 /* Set LED_Mode JES20040127EEPROM */
@@ -379,18 +307,17 @@ static void ipg_set_led_mode(struct net_device *dev)
 {
 	struct ipg_nic_private *sp = netdev_priv(dev);
 	void __iomem *ioaddr = sp->ioaddr;
-	u32 mode;
+	u32 mode = ipg_r32(ASIC_CTRL);
 
-	mode = ipg_r32(ASIC_CTRL);
 	mode &= ~(IPG_AC_LED_MODE_BIT_1 | IPG_AC_LED_MODE | IPG_AC_LED_SPEED);
 
-	if ((sp->LED_Mode & 0x03) > 1)
-		mode |= IPG_AC_LED_MODE_BIT_1;	/* Write Asic Control Bit 29 */
-
-	if ((sp->LED_Mode & 0x01) == 1)
+	if (sp->LED_Mode & 0x01)
 		mode |= IPG_AC_LED_MODE;	/* Write Asic Control Bit 14 */
 
-	if ((sp->LED_Mode & 0x08) == 8)
+	if (sp->LED_Mode & 0x02)
+		mode |= IPG_AC_LED_MODE_BIT_1;	/* Write Asic Control Bit 29 */
+
+	if (sp->LED_Mode & 0x08)
 		mode |= IPG_AC_LED_SPEED;	/* Write Asic Control Bit 27 */
 
 	ipg_w32(mode, ASIC_CTRL);
@@ -401,11 +328,13 @@ static void ipg_set_phy_set(struct net_device *dev)
 {
 	struct ipg_nic_private *sp = netdev_priv(dev);
 	void __iomem *ioaddr = sp->ioaddr;
-	int physet;
+	u8 physet = ipg_r8(PHY_SET);
+	u8 newset = sp->LED_Mode >> 4;
 
-	physet = ipg_r8(PHY_SET);
-	physet &= ~(IPG_PS_MEM_LENB9B | IPG_PS_MEM_LEN9 | IPG_PS_NON_COMPDET);
-	physet |= ((sp->LED_Mode & 0x70) >> 4);
+	/* Change three bits of physet to values in newset */
+	newset ^= physet;
+	newset &= (IPG_PS_MEM_LENB9B | IPG_PS_MEM_LEN9 | IPG_PS_NON_COMPDET);
+	physet ^= newset;
 	ipg_w8(physet, PHY_SET);
 }
 
@@ -444,7 +373,7 @@ static int ipg_find_phyaddr(struct net_device *dev)
 	unsigned int phyaddr, i;
 
 	for (i = 0; i < 32; i++) {
-		u32 status;
+		unsigned status;
 
 		/* Search for the correct PHY address among 32 possible. */
 		phyaddr = (IPG_NIC_PHY_ADDRESS + i) % 32;
@@ -470,10 +399,7 @@ static int ipg_config_autoneg(struct net_device *dev)
 {
 	struct ipg_nic_private *sp = netdev_priv(dev);
 	void __iomem *ioaddr = sp->ioaddr;
-	unsigned int txflowcontrol;
-	unsigned int rxflowcontrol;
-	unsigned int fullduplex;
-	unsigned int gig;
+	bool txflowcontrol, rxflowcontrol, fullduplex, gig;
 	u32 mac_ctrl_val;
 	u32 asicctrl;
 	u8 phyctrl;
@@ -487,10 +413,10 @@ static int ipg_config_autoneg(struct net_device *dev)
 	/* Set flags for use in resolving auto-negotation, assuming
 	 * non-1000Mbps, half duplex, no flow control.
 	 */
-	fullduplex = 0;
-	txflowcontrol = 0;
-	rxflowcontrol = 0;
-	gig = 0;
+	fullduplex = false;
+	txflowcontrol = false;
+	rxflowcontrol = false;
+	gig = false;
 
 	/* To accomodate a problem in 10Mbps operation,
 	 * set a global flag if PHY running in 10Mbps mode.
@@ -512,7 +438,7 @@ static int ipg_config_autoneg(struct net_device *dev)
 		break;
 	case IPG_PC_LINK_SPEED_1000MBPS:
 		printk("1000Mbps.\n");
-		gig = 1;
+		gig = true;
 		break;
 	default:
 		printk("undefined!\n");
@@ -520,19 +446,20 @@ static int ipg_config_autoneg(struct net_device *dev)
 	}
 
 	if (phyctrl & IPG_PC_DUPLEX_STATUS) {
-		fullduplex = 1;
-		txflowcontrol = 1;
-		rxflowcontrol = 1;
+		fullduplex = true;
+		txflowcontrol = true;
+		rxflowcontrol = true;
 	}
 
 	/* Configure full duplex, and flow control. */
-	if (fullduplex == 1) {
+	if (fullduplex) {
 		/* Configure IPG for full duplex operation. */
 		printk(KERN_INFO "%s: setting full duplex, ", dev->name);
 
 		mac_ctrl_val |= IPG_MC_DUPLEX_SELECT_FD;
 
-		if (txflowcontrol == 1) {
+		/* FIXME: Does this variable always equal fullduplex? */
+		if (txflowcontrol) {
 			printk("TX flow control");
 			mac_ctrl_val |= IPG_MC_TX_FLOW_CONTROL_ENABLE;
 		} else {
@@ -540,7 +467,7 @@ static int ipg_config_autoneg(struct net_device *dev)
 			mac_ctrl_val &= ~IPG_MC_TX_FLOW_CONTROL_ENABLE;
 		}
 
-		if (rxflowcontrol == 1) {
+		if (rxflowcontrol) {
 			printk(", RX flow control.");
 			mac_ctrl_val |= IPG_MC_RX_FLOW_CONTROL_ENABLE;
 		} else {
@@ -568,9 +495,8 @@ static int ipg_config_autoneg(struct net_device *dev)
 static void ipg_nic_set_multicast_list(struct net_device *dev)
 {
 	void __iomem *ioaddr = ipg_ioaddr(dev);
-	struct dev_mc_list *mc_list_ptr;
-	unsigned int hashindex;
-	u32 hashtable[2];
+	struct dev_mc_list *mc;
+	u32 hashtable[2] = { 0, 0 };
 	u8 receivemode;
 
 	IPG_DEBUG_MSG("_nic_set_multicast_list\n");
@@ -581,52 +507,34 @@ static void ipg_nic_set_multicast_list(struct net_device *dev)
 		/* NIC to be configured in promiscuous mode. */
 		receivemode = IPG_RM_RECEIVEALLFRAMES;
 	} else if ((dev->flags & IFF_ALLMULTI) ||
-		   (dev->flags & IFF_MULTICAST &
-		    (dev->mc_count > IPG_MULTICAST_HASHTABLE_SIZE))) {
+		   (dev->flags & IFF_MULTICAST &&
+		    (dev->mc_count > 2*IPG_MULTICAST_HASHTABLE_SIZE))) {
 		/* NIC to be configured to receive all multicast
 		 * frames. */
 		receivemode |= IPG_RM_RECEIVEMULTICAST;
-	} else if (dev->flags & IFF_MULTICAST & (dev->mc_count > 0)) {
+	} else if (dev->flags & IFF_MULTICAST && (dev->mc_count > 0)) {
 		/* NIC to be configured to receive selected
 		 * multicast addresses. */
 		receivemode |= IPG_RM_RECEIVEMULTICASTHASH;
-	}
-
-	/* Calculate the bits to set for the 64 bit, IPG HASHTABLE.
-	 * The IPG applies a cyclic-redundancy-check (the same CRC
-	 * used to calculate the frame data FCS) to the destination
-	 * address all incoming multicast frames whose destination
-	 * address has the multicast bit set. The least significant
-	 * 6 bits of the CRC result are used as an addressing index
-	 * into the hash table. If the value of the bit addressed by
-	 * this index is a 1, the frame is passed to the host system.
-	 */
-
-	/* Clear hashtable. */
-	hashtable[0] = 0x00000000;
-	hashtable[1] = 0x00000000;
-
-	/* Cycle through all multicast addresses to filter. */
-	for (mc_list_ptr = dev->mc_list;
-	     mc_list_ptr != NULL; mc_list_ptr = mc_list_ptr->next) {
-		/* Calculate CRC result for each multicast address. */
-		hashindex = crc32_le(0xffffffff, mc_list_ptr->dmi_addr,
-				     ETH_ALEN);
 
-		/* Use only the least significant 6 bits. */
-		hashindex = hashindex & 0x3F;
-
-		/* Within "hashtable", set bit number "hashindex"
-		 * to a logic 1.
+		/*
+		 * The IPG uses the standard hash table technique to filter
+		 * incoming multicast packets.  It computes the CRC of the
+		 * incoming MAC address, and uses the low 6 bits of the
+		 * result to select a hash table bit.  If set (and the address
+		 * is a multicast address), the packet is received.
 		 */
-		set_bit(hashindex, (void *)hashtable);
-	}
+		for (mc = dev->mc_list; mc; mv = mc->next) {
+			/* Calculate CRC result for each multicast address. */
+			u32 hashindex = crc32_le(0xffffffff, mc->dmi_addr,
+					ETH_ALEN);
 
-	/* Write the value of the hashtable, to the 4, 16 bit
-	 * HASHTABLE IPG registers.
-	 */
-	ipg_w32(hashtable[0], HASHTABLE_0);
-	ipg_w32(hashtable[1], HASHTABLE_1);
+			/* Use low 6 bits to select hash table bit */
+			set_bit(hashindex & 63, (void *)hashtable);
+		}
+		ipg_w32(hashtable[0], HASHTABLE_0);
+		ipg_w32(hashtable[1], HASHTABLE_1);
+	}
 
 	ipg_w8(IPG_RM_RSVD_MASK & receivemode, RECEIVE_MODE);
 


^ permalink raw reply related

* Re: AF_UNIX MSG_PEEK bug?
From: Brent Casavant @ 2008-01-08 23:20 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, David Miller, linux-kernel
In-Reply-To: <4783FBD6.1000004@hp.com>

On Tue, 8 Jan 2008, Rick Jones wrote:

> Potential bugs notwithstanding, given that this is a STREAM socket, and as
> such shouldn't (I hope, or I'm eating toes for dinner again) have side effects
> like tossing the rest of a datagram, why are you using MSG_PEEK?  Why not
> simply read the N bytes of the message that will have the message length with
> a normal read/recv, and then read that many bytes in the next call?

That's entirely reasonable, and probably a worthwhile change to make.
But, as you say, it doesn't change whether or not this is a bug in
the MSG_PEEK code.

With a small bit of complication I certainly can do what you suggest.

The initial reasoning was that this made it easy to handle the case where
the caller of the library routine (my code which stumbled on this was
part of a small library I wrote as part of the application) did not supply
a sufficiently sized buffer for reception of the next message.  The "easy"
way to do this was a MSG_PEEK to validate the buffer size against the
message size, followed by a regular recv() if the buffer is large enough.
To do what you suggest (which effectively works around the bug) is certainly
possible, but also requires maintaining state between calls to the
"get message" routine, as I need to track whether or not the size has already
been read from the next message on the stream socket.

The change isn't terribly difficult, but wasn't the initial idea.

Plus it's always good to flush a bug out of hiding, right? ;)

Brent

-- 
Brent Casavant                          All music is folk music.  I ain't
bcasavan@sgi.com                        never heard a horse sing a song.
Silicon Graphics, Inc.                    -- Louis Armstrong

^ permalink raw reply

* Re: AF_UNIX MSG_PEEK bug?
From: Brent Casavant @ 2008-01-08 23:18 UTC (permalink / raw)
  To: Tom Spink; +Cc: Rick Jones, netdev, David Miller, linux-kernel
In-Reply-To: <7b9198260801081453s198af7efycc7c35668c65eaf1@mail.gmail.com>

On Tue, 8 Jan 2008, Tom Spink wrote:

> Where in the code is the message length being sent across the socket?

In do_producer(), there are the following lines in the main loop:

	/* Send random lengths of data */
	messages[i].length = (rand() % MAXLEN) + sizeof(size_t);
	iov[i].iov_len = messages[i].length;

The entire "struct sockmsg" is sent across the socket, so the first
size_t in each message contains the length of the entire message
(including the size_t).  This size gets picked up at the
recv(...,MSG_PEEK) line in do_consumer().

Thanks,
Brent

-- 
Brent Casavant                          All music is folk music.  I ain't
bcasavan@sgi.com                        never heard a horse sing a song.
Silicon Graphics, Inc.                    -- Louis Armstrong

^ permalink raw reply

* RE: [PATCH 3/7]: [NET]: Do not check netif_running() and carrier state in ->poll()
From: Ramkrishna Vepa @ 2008-01-08 23:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20080108.150735.182342472.davem@davemloft.net>

Dave,

Got it. These new napi interface changes were introduced by someone else
and we assumed it to be correct. We will make the fix and submit.

Thanks,
Ram

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Tuesday, January 08, 2008 3:08 PM
> To: Ramkrishna Vepa
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH 3/7]: [NET]: Do not check netif_running() and
carrier
> state in ->poll()
> 
> From: "Ramkrishna Vepa" <Ramkrishna.Vepa@neterion.com>
> Date: Tue, 8 Jan 2008 18:01:32 -0500
> 
> > Dave,
> > Sorry, should have been clearer. When I meant "brought down" did not
> > mean close, but when a adapter reset is initiated. The
napi_disable() is
> > called only on a close. When the driver does a reset, napi_disable()
is
> > not called.
> 
> You should be doing a napi_disable() during a reset, like every
> other driver does.
> 
> It is the only reliable way to prevent the code path from running.
> 
> Otherwise, you can start resetting the device right after
> that check in the ->poll() routine, and thus still touch
> the device during the reset sequence.
> 
> In short the check is wrong, because it doesn't fully prevent
> what you want it to prevent.  Only a napi_disable() would do
> that fully for you.

^ permalink raw reply

* my NAPI patches
From: David Miller @ 2008-01-08 23:08 UTC (permalink / raw)
  To: netdev

Can people do me a favor and do more constructive things like test
that my patches really do fix the "device down during packet flood"
bug instead of nit-picking all the individual driver fixups I made?

That's what is useful at this time, thanks.

^ permalink raw reply

* Re: [PATCH 3/7]: [NET]: Do not check netif_running() and carrier state in ->poll()
From: David Miller @ 2008-01-08 23:07 UTC (permalink / raw)
  To: Ramkrishna.Vepa; +Cc: netdev
In-Reply-To: <78C9135A3D2ECE4B8162EBDCE82CAD7702CAA9E8@nekter>

From: "Ramkrishna Vepa" <Ramkrishna.Vepa@neterion.com>
Date: Tue, 8 Jan 2008 18:01:32 -0500

> Dave,
> Sorry, should have been clearer. When I meant "brought down" did not
> mean close, but when a adapter reset is initiated. The napi_disable() is
> called only on a close. When the driver does a reset, napi_disable() is
> not called. 

You should be doing a napi_disable() during a reset, like every
other driver does.

It is the only reliable way to prevent the code path from running.

Otherwise, you can start resetting the device right after
that check in the ->poll() routine, and thus still touch
the device during the reset sequence.

In short the check is wrong, because it doesn't fully prevent
what you want it to prevent.  Only a napi_disable() would do
that fully for you.

^ permalink raw reply

* RE: [PATCH 3/7]: [NET]: Do not check netif_running() and carrier state in ->poll()
From: Ramkrishna Vepa @ 2008-01-08 23:01 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20080108.144731.168854521.davem@davemloft.net>

Dave,
Sorry, should have been clearer. When I meant "brought down" did not
mean close, but when a adapter reset is initiated. The napi_disable() is
called only on a close. When the driver does a reset, napi_disable() is
not called. 

Ram
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Tuesday, January 08, 2008 2:48 PM
> To: Ramkrishna Vepa
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH 3/7]: [NET]: Do not check netif_running() and
carrier
> state in ->poll()
> 
> From: "Ramkrishna Vepa" <Ramkrishna.Vepa@neterion.com>
> Date: Tue, 8 Jan 2008 13:17:03 -0500
> 
> > Dave,
> >
> > This change is not required as the macro, is_s2io_card_up() checks
for
> > an internal state of the adapter and not netif's state. We want to
make
> > sure that the adapter registers are not accessed when the adapter is
> > being brought down.
> 
> If the adapter is being brought down, you would have done
> a napi_disable() first and therefore never reach this code
> path.
> 
> The removal is correct, I read how your driver works.

^ permalink raw reply

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
From: David Miller @ 2008-01-08 23:00 UTC (permalink / raw)
  To: romieu; +Cc: linux, netdev, akpm
In-Reply-To: <20080108213640.GC4450@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Tue, 8 Jan 2008 22:36:40 +0100

> linux@horizon.com <linux@horizon.com> :
> > I take that back.  This patch does NOT fix the leak, at least if
> > ping: sendmsg: No buffer space available
> > is any indication...
> 
> Can you try the patch below ?

Same kind of bug as the RX side :-)  I bet this fixes his
problem...

^ permalink raw reply

* Re: [PATCH 7/7]: [NET]: Make ->poll() breakout consistent in Intel ethernet drivers.
From: David Miller @ 2008-01-08 22:55 UTC (permalink / raw)
  To: auke-jan.h.kok; +Cc: netdev, jesse.brandeburg, john.ronciak
In-Reply-To: <4783C95B.8060005@intel.com>

From: "Kok, Auke" <auke-jan.h.kok@intel.com>
Date: Tue, 08 Jan 2008 11:04:59 -0800

> this is exactly the change I was eyeballing and indeed this seems to be the
> general use case in most drivers anyway.
> 
> I'll try to see how this impacts the (especially 4-port) TX performance issue with
> e1000e, but this should be just fine for now, and we can address later anyway.
> 
> Acked-by: Auke Kok <auke-jan.h.kok@intel.com>
> 

Ok, thanks for reviewing.

^ permalink raw reply

* Re: Top 10 kernel oopses for the week ending January 5th, 2008
From: Rafael J. Wysocki @ 2008-01-08 22:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arjan van de Ven, Randy Dunlap, Kevin Winchester, J. Bruce Fields,
	Al Viro, Linux Kernel Mailing List, Andrew Morton, NetDev
In-Reply-To: <alpine.LFD.1.00.0801081126200.3148@woody.linux-foundation.org>

On Tuesday, 8 of January 2008, Linus Torvalds wrote:
> 
> On Tue, 8 Jan 2008, Arjan van de Ven wrote:
> > 
> > ok done; I had to fizzle a bit because some things aren't *exactly* a 
> > BUG() statement but I track them anyway (things like the "sleeping in 
> > invalid context" check), so I had to somewhat arbitrarily assign 
> > categories for those. I might fine tune these over time some; if you or 
> > someone else sees problems with categorization please let me know
> 
> Looking good. I wonder if we could also have some way to cross-ref these 
> things with the regression list (notably try to get pointers to them in 
> the regression list). 

I'm thinking about that, but haven't invented any automated solution yet.
I only can manually add references to kerneloops.org, for now.

Greetings,
Rafael

^ permalink raw reply

* Re: AF_UNIX MSG_PEEK bug?
From: Tom Spink @ 2008-01-08 22:53 UTC (permalink / raw)
  To: Rick Jones; +Cc: Brent Casavant, netdev, David Miller, linux-kernel
In-Reply-To: <4783FBD6.1000004@hp.com>

On 08/01/2008, Rick Jones <rick.jones2@hp.com> wrote:
> Potential bugs notwithstanding, given that this is a STREAM socket, and
> as such shouldn't (I hope, or I'm eating toes for dinner again) have
> side effects like tossing the rest of a datagram, why are you using
> MSG_PEEK?  Why not simply read the N bytes of the message that will have
> the message length with a normal read/recv, and then read that many
> bytes in the next call?
>
> rick jones
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

Hi,

Where in the code is the message length being sent across the socket?

-- 
Regards,
Tom Spink
University of Edinburgh

^ permalink raw reply

* One more patch... -- Re: Please pull 'upstream-jgarzik' branch of wireless-2.6
From: John W. Linville @ 2008-01-08 22:42 UTC (permalink / raw)
  To: jeff; +Cc: netdev, linux-wireless, davem
In-Reply-To: <20080108222305.GD3086@tuxdriver.com>

On Tue, Jan 08, 2008 at 05:23:05PM -0500, John W. Linville wrote:
> Jeff,
> 
> Another round of patches intended for 2.6.25...the biggest factions are
> rt2x00 and b43 updates, as well as some Viro-isms... :-)
> 
> Please let me know if there are any problems!
> 
> John
> 
> P.S.  Copying Dave in case he is handling these requests...FWIW, it
> will definitely depend on the patches already in netdev-2.6#upstream...

I left out a patch.  I have pushed it on top of the previous request.

Let me know if there are problems!

Thanks,

John

---

The following changes since commit f94de7b013f78ad8bbe1064c108dd55141efb177:
  Miguel Botón (1):
        iwlwifi: fix compilation warning in 'iwl-4965.c'

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-jgarzik

Michael Buesch (1):
      zd1211rw: fix alignment for QOS and WDS frames

 drivers/net/wireless/zd1211rw/zd_mac.c |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wireless/zd1211rw/zd_mac.c b/drivers/net/wireless/zd1211rw/zd_mac.c
index 14fb727..7b86930 100644
--- a/drivers/net/wireless/zd1211rw/zd_mac.c
+++ b/drivers/net/wireless/zd1211rw/zd_mac.c
@@ -623,6 +623,8 @@ int zd_mac_rx(struct ieee80211_hw *hw, const u8 *buffer, unsigned int length)
 	const struct rx_status *status;
 	struct sk_buff *skb;
 	int bad_frame = 0;
+	u16 fc;
+	bool is_qos, is_4addr, need_padding;
 
 	if (length < ZD_PLCP_HEADER_SIZE + 10 /* IEEE80211_1ADDR_LEN */ +
 	             FCS_LEN + sizeof(struct rx_status))
@@ -674,9 +676,22 @@ int zd_mac_rx(struct ieee80211_hw *hw, const u8 *buffer, unsigned int length)
 			&& !mac->pass_ctrl)
 		return 0;
 
-	skb = dev_alloc_skb(length);
+	fc = le16_to_cpu(*((__le16 *) buffer));
+
+	is_qos = ((fc & IEEE80211_FCTL_FTYPE) == IEEE80211_FTYPE_DATA) &&
+		 ((fc & IEEE80211_FCTL_STYPE) == IEEE80211_STYPE_QOS_DATA);
+	is_4addr = (fc & (IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) ==
+		   (IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS);
+	need_padding = is_qos ^ is_4addr;
+
+	skb = dev_alloc_skb(length + (need_padding ? 2 : 0));
 	if (skb == NULL)
 		return -ENOMEM;
+	if (need_padding) {
+		/* Make sure the the payload data is 4 byte aligned. */
+		skb_reserve(skb, 2);
+	}
+
 	memcpy(skb_put(skb, length), buffer, length);
 
 	ieee80211_rx_irqsafe(hw, skb, &stats);
-- 
John W. Linville
linville@tuxdriver.com

^ permalink raw reply related

* Re: [PATCH 3/7]: [NET]: Do not check netif_running() and carrier state in ->poll()
From: David Miller @ 2008-01-08 22:47 UTC (permalink / raw)
  To: Ramkrishna.Vepa; +Cc: netdev
In-Reply-To: <78C9135A3D2ECE4B8162EBDCE82CAD7702CAA89B@nekter>

From: "Ramkrishna Vepa" <Ramkrishna.Vepa@neterion.com>
Date: Tue, 8 Jan 2008 13:17:03 -0500

> Dave,
> 
> This change is not required as the macro, is_s2io_card_up() checks for
> an internal state of the adapter and not netif's state. We want to make
> sure that the adapter registers are not accessed when the adapter is
> being brought down.

If the adapter is being brought down, you would have done
a napi_disable() first and therefore never reach this code
path.

The removal is correct, I read how your driver works.

^ permalink raw reply

* Re: SACK scoreboard
From: David Miller @ 2008-01-08 22:44 UTC (permalink / raw)
  To: jheffner; +Cc: ilpo.jarvinen, lachlan.andrew, netdev, quetchen
In-Reply-To: <4783AA29.3080406@psc.edu>

From: John Heffner <jheffner@psc.edu>
Date: Tue, 08 Jan 2008 11:51:53 -0500

> I haven't thought about this too hard, but can we approximate this by 
> moving scaked data into a sacked queue, then if something bad happens 
> merge this back into the retransmit queue?

That defeats the impetus for the change.

We want to free up the data, say, 2 packets at a time as
ACKs come in.  The key goal is smooth liberation of
retransmit queue packets over time.

The big problem is that recovery from even a single packet loss in a
window makes us run kfree_skb() for a all the packets in a full
window's worth of data when recovery completes.

If we just move such packets to a seperate list, we still have to
iterate over all of them when the cumulative ACK arrives.

This problem, that retransmit queue liberation is not smooth, is the
biggest flaw in how SACK is specified.  I mean, consider Ilpo's
mentioned case of 500,000 packet windows.  The issue cannot be
ignored.  SACK is clearly broken.

You speak of a path in Linux where we can reneg on SACKs, but I doubt
it really ever runs because of how aggressive the queue collapser is.
Alexey even has a comment there:

	 * This must not ever occur. */

To be honest this code sits here because it was written before the
queue collapser was coded up.

Really, find me a box where the LINUX_MIB_OFOPRUNED or
LINUX_MIB_RECVPRUNED counters are anything other than zero.

So this is a non-issue and I did consider it before proposing that we
redefine SACK.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox