Netdev List


* Re: coding style question on indentation
From: Joe Perches @ 2011-07-05 20:01 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Sathya.Perla, netdev
In-Reply-To: <4E136C0C.8010806@genband.com>

On Tue, 2011-07-05 at 13:54 -0600, Chris Friesen wrote:
> On 07/01/2011 09:23 AM, Joe Perches wrote:
> > I think this is better:
> >           1         2         3         4         5         6         7         8
> > 12345678901234567890123456789012345678901234567890123456789012345678901234567890
> > 	dma_unmap_page(&adapter->pdev->dev, dma_unmap_addr(rx_page_info, bus),
> > 		       adapter->big_page_size, DMA_FROM_DEVICE);
> > maximally fill to 80 cols, then wrap with maximal tabs using spaces
> > if necessary to align args after opening parenthesis.
> > In this case, 2 tabs, 7 spaces.
> I think the most flexible is to use hard tabs to align the continued 
> line to the same level of indentation as the parent line, then use 
> spaces for "pretty" alignment.  That way, it all lines up regardless of 
> how many spaces the viewer has set for the equivalent size of hard tabs.

True, but none of the common editing tools do this.

You could produce and publish emacs/vi modes for this
and then teach checkpatch to understand this mode and
not complain about the spacing if it matters that much
to you.



* Re: IPv6 destination cache in Linux kernel
From: Chris Friesen @ 2011-07-05 19:58 UTC (permalink / raw)
  To: David Miller; +Cc: mparemm, netdev
In-Reply-To: <20110704.174619.1196575558517508080.davem@davemloft.net>

On 07/04/2011 06:46 PM, David Miller wrote:
> From: Mikhail Paremski<mparemm@gmail.com>
> Date: Mon, 4 Jul 2011 17:35:04 -0700
>
>> 1. What are the reasons to route datagrams differently from IPv4?
>
> It was just a design decision made when the ipv6 routing code
> was implemented.

Presumably there was some rationale for doing it differently than the 
existing IPv4 code.  Or was it just for fun?

>> 2. Where I could get details how IPv6 stack routes datagrams?
>
> By reading the source code.

While the source code has the final say, for someone new to a given part 
of the kernel it is helpful to have a high-level view of the overall 
flow.  It's a lot easier to understand something if you can place it in 
the overall scheme of things.

Chris

-- 
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com


* Re: coding style question on indentation
From: Chris Friesen @ 2011-07-05 19:54 UTC (permalink / raw)
  To: Joe Perches; +Cc: Sathya.Perla, netdev
In-Reply-To: <1309533793.7277.51.camel@Joe-Laptop>

On 07/01/2011 09:23 AM, Joe Perches wrote:

> I think this is better:
>
>           1         2         3         4         5         6         7         8
> 12345678901234567890123456789012345678901234567890123456789012345678901234567890
>
> 	dma_unmap_page(&adapter->pdev->dev, dma_unmap_addr(rx_page_info, bus),
> 		       adapter->big_page_size, DMA_FROM_DEVICE);
>
> maximally fill to 80 cols, then wrap with maximal tabs using spaces
> if necessary to align args after opening parenthesis.
> In this case, 2 tabs, 7 spaces.

I think the most flexible is to use hard tabs to align the continued 
line to the same level of indentation as the parent line, then use 
spaces for "pretty" alignment.  That way, it all lines up regardless of 
how many spaces the viewer has set for the equivalent size of hard tabs.
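
A small sketch of why this holds, assuming tab stops sit at multiples of the viewer's tab width. indent_width() and arg_column() are made-up helpers for illustration, not anything from checkpatch or the kernel tree. They measure how far the leading whitespace of the continuation line pushes the text for a given tab width:

```c
#include <assert.h>

/* Visual width of the leading tabs/spaces of s for a given tab width
 * (hypothetical helper, tab stops assumed at multiples of tabw). */
static int indent_width(const char *s, int tabw)
{
	int col = 0;

	assert(tabw > 0);
	for (; *s == '\t' || *s == ' '; s++) {
		if (*s == '\t')
			col = (col / tabw + 1) * tabw;	/* jump to next tab stop */
		else
			col++;
	}
	return col;
}

/* Offset where the first argument starts: one tab of indentation,
 * then "dma_unmap_page(" which is 15 characters. */
static int arg_column(int tabw)
{
	return indent_width("\t", tabw) + 15;
}

/* "maximal tabs" style: 2 tabs + 7 spaces */
static const char *max_tabs = "\t\t       adapter->big_page_size";
/* "tabs to indent, spaces to align" style: 1 tab + 15 spaces */
static const char *tab_then_spaces = "\t               adapter->big_page_size";
```

With 8-column tabs both forms put the argument at offset 23.  With 4-column tabs the opening argument moves to offset 19; the 1 tab + 15 spaces form follows it, while the 2 tabs + 7 spaces form lands at 15.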

Chris

-- 
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com


* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Neil Horman @ 2011-07-05 19:53 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexey Zaytsev, Michael Büsch, Andrew Morton, netdev,
	Gary Zambrano, bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <1309891516.2545.23.camel@edumazet-laptop>

On Tue, Jul 05, 2011 at 08:45:16PM +0200, Eric Dumazet wrote:
> On Tuesday 05 July 2011 at 20:32 +0200, Eric Dumazet wrote:
> 
> > Then, maybe the driver model is completely wrong, and should really
> > setup 512 buffers, or use less descs but set EOT on last one.
> > 
> > Currently it uses a 200 sliding window out of the 512 descs.
> > 
> > 
> 
> One thing we could do would be to allocate a special guard buffer and
> set all 'out of window' descriptors to point to this guard buffer, and
> periodically check if buffer is dirtied by the card.
> 
> (first word would be enough)
> 
> (instead of setting desc->addr to NULL, set to
> dma_map_single(guard_buffer))
> 
I think this is a good idea, at least for testing.  It seems odd to me that we
have the B44_DMARX_PTR value which indicates (ostensibly) the pointer to the
descriptor to be processed next (the documentation isn't very verbose on the
subject), along with the EOT bit on a descriptor.  It seems like both the
register and the bit are capable of conveying the same (or at least overlapping)
information.

I think what I'm having the most trouble with is understanding when the hw looks
at the EOT bit in the descriptor.  If it completes a DMA and sees the EOT bit
set, does the next DMA occur to the descriptor pointed to by the DMARX_ADDR
register?  Or does it stall until such time as the DMARX_PTR register is rotated
around?  What if it doesn't see the EOT bit set?  Does it just keep going with
the next descriptor?  

Also, there seems to be some inconsistency in the setting of the B44_DMARX_PTR
register.  In b44_init_hw it's set to the value of bp->rx_pending, which is
defined as being 200.  But in b44_rx it's advanced by sizeof(struct dma_desc) for
every iteration.  So in b44_init_hw we write the value 200 to it, ostensibly
indicating a limit of 200 descriptors, but in b44_rx we iteratively write the
values 0, 8, 16, 24...8*n to the register to indicate which descriptor we're
indexing?  Something really doesn't sit right with me there.  In the former case
we treat the register as holding a number of entries, and in the latter we treat
it as holding a byte offset into an array.  Or am I missing something?
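
To make the mismatch concrete, here is a sketch that assumes an 8-byte descriptor made of two 32-bit words.  The layout and the helper names below are assumptions for illustration, not copied from the driver headers:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed descriptor layout: two 32-bit words, 8 bytes in total. */
struct dma_desc {
	uint32_t ctrl;
	uint32_t addr;
};

/* Byte offset of descriptor idx, the way b44_rx appears to compute
 * the value it writes to the register. */
static uint32_t desc_byte_offset(uint32_t idx)
{
	return idx * (uint32_t)sizeof(struct dma_desc);
}

/* Descriptor index the hardware would see if it interprets the
 * register value as a byte offset. */
static uint32_t desc_index_from_offset(uint32_t off)
{
	assert(off % sizeof(struct dma_desc) == 0);
	return off / (uint32_t)sizeof(struct dma_desc);
}
```

Under that assumption, writing 200 and having it read as a byte offset names descriptor 25, whereas descriptor 200 would be byte offset 1600.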

Regards
Neil


* Re: divide error: 0000, in bictcp_cong_avoid, kernel 2.6.39
From: TB @ 2011-07-05 19:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20110705101618.744bbab0@nehalam.ftrdhcpuser.net>

[-- Attachment #1: Type: text/plain, Size: 1026 bytes --]

On 11-07-05 01:16 PM, Stephen Hemminger wrote:
> On Mon, 04 Jul 2011 14:10:16 -0400
> TB <lkml@techboom.com> wrote:
> 
>> On 11-07-04 01:36 PM, Stephen Hemminger wrote:
>>> Any data about the type of connection, kernel configuration or other
>>> information that might be useful in reproducing the problem?
>>>
>>> Also please try 2.6.39.2
>>
>> We haven't found a sure way of reproducing it.
>> It happened on 1.2% of our servers over the weekend and seems random.
>> Both are connected with 2 gigabit ports using bonding. Traffic tends to
>> be heavy, but doesn't seem to be a factor.
>>
>> Would a .config help ?
>>
>> Only the very basic filter module for iptables is compiled in.
>>
>> We will try 2.6.39.2 soon
> 
> Kernel config (and compiler version) would help in identifying which
> of the three divides is getting divide by zero.

# gcc --version
gcc (Debian 4.3.2-1.1) 4.3.2


# as --version
GNU assembler (GNU Binutils for Debian) 2.18.0.20080103
This assembler was configured for a target of `x86_64-linux-gnu'.

[-- Attachment #2: config.gz --]
[-- Type: application/x-gzip, Size: 13475 bytes --]


* Re: [PATCH 3/3 v2] ssb: use pci_dev->subsystem_{vendor,device}
From: John W. Linville @ 2011-07-05 19:34 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: mb, netdev
In-Reply-To: <201107012234.42426.sshtylyov@ru.mvista.com>

On Fri, Jul 01, 2011 at 10:34:42PM +0400, Sergei Shtylyov wrote:
> The SSB code reads PCI subsystem IDs from the PCI configuration registers while
> they are already stored by the PCI subsystem in the 'subsystem_{vendor|device}'
> fields of 'struct pci_dev'...
> 
> Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>

Applied.

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.


* Re: [PATCH 2/3 v2] ssb: use pci_dev->revision
From: John W. Linville @ 2011-07-05 19:34 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: mb, netdev
In-Reply-To: <201107012233.08489.sshtylyov@ru.mvista.com>

On Fri, Jul 01, 2011 at 10:33:08PM +0400, Sergei Shtylyov wrote:
> The SSB code reads PCI revision ID from the PCI configuration register while
> it's already stored by the PCI subsystem in the 'revision' field  of 'struct
> pci_dev'...
> 
> Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>

Applied.

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.


* Re: [PATCH 1/3 resend] ssb: PCI revision ID register is 8-bit wide
From: John W. Linville @ 2011-07-05 19:33 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: mb, netdev
In-Reply-To: <201107012235.28745.sshtylyov@ru.mvista.com>

On Fri, Jul 01, 2011 at 10:35:28PM +0400, Sergei Shtylyov wrote:
> The SSB code reads PCI revision ID register as 16-bit entity while the register
> is actually 8-bit only (the next 8 bits are the programming interface register).
> Fix the read and make the 'rev' field of 'struct ssb_boardinfo' 8-bit as well,
> to match the register size.
> 
> Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>

Applied.

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.


* Re: libpcap and tc filters
From: jamal @ 2011-07-05 19:19 UTC (permalink / raw)
  To: Adam Katz; +Cc: Eric Dumazet, netdev
In-Reply-To: <CAA0qwj6Wanp45ntTm0FK+fM37b_N8+T72VJfr9Et8oVb9LqOrg@mail.gmail.com>

On Tue, 2011-07-05 at 19:54 +0300, Adam Katz wrote:

> 
> unless someone suggests a better solution, it seems like i'll be
> adhering to the first law of engineering - "if it works, don't fix it"
> and simply install fedora core 15 instead of ubuntu.

My feeling is it has nothing to do with distro. Probably
your tcpreplay?

I am running ubuntu 10.04, but with my own compiled kernel (3.0-rc4).
tcpreplay is older than Eric's version:
--
tcpreplay version: 3.4.3 (build 2375)
Copyright 2001-2009 by Aaron Turner <aturner at synfin dot net>
Cache file supported: 04
Not compiled with libdnet.
Compiled against libpcap: 1.0.0
64 bit packet counters: enabled
Verbose printing via tcpdump: enabled
Packet editing: disabled
Fragroute engine: disabled
Injection method: PF_PACKET send()
---

cheers,
jamal



* RE: [RFC] non-preemptible kernel socket for RAMster
From: Dan Magenheimer @ 2011-07-05 19:18 UTC (permalink / raw)
  To: Loke, Chetan, netdev; +Cc: Konrad Wilk, linux-mm
In-Reply-To: <D3F292ADF945FB49B35E96C94C2061B91257D6FD@nsmail.netscout.com>

> From: Loke, Chetan [mailto:Chetan.Loke@netscout.com]
> > From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
> > Subject: RE: [RFC] non-preemptible kernel socket for RAMster
> >
> > > From: Loke, Chetan [mailto:Chetan.Loke@netscout.com]
> > > Sent: Tuesday, July 05, 2011 10:37 AM
> > > To: Dan Magenheimer; netdev@vger.kernel.org
> > > Cc: Konrad Wilk; linux-mm
> > > Subject: RE: [RFC] non-preemptible kernel socket for RAMster
> > >
> > > > In working on a kernel project called RAMster* (where RAM on a
> > > > remote system may be used for clean page cache pages and for swap
> > > > pages), I found I have need for a kernel socket to be used when
> > >
> > > How is RAMster+swap different than NBD's (pending etc?)support for
> > > SWAP over NBD?
> >
> > I may be ignorant of details about NBD, but did some quick
> > research using google.  If I understand correctly, swap over
> > NBD is still writing to a configured swap disk on the remote
> 
> Hi - I thought NBD-server needs a backing store(a file).
> Now the file itself could reside on a RAM-drive or disk-drive etc.
> And so a remote NBD(disk or RAM) can be mounted locally as a swap
> device.
> The local client should still see it as a block device.
> 
> I haven't used the RAM-drive feature myself but you may want to check if it
> works or even borrow that logic in your code.

Actually, RAMster is using a much more flexible type of
RAM-drive; it is built on top of Transcendent Memory
and on top of zcache (and thus on top of cleancache and
frontswap).  A RAM-drive is fixed size so is not very suitable
for the flexibility required for RAMster.  For example,
suppose you have two machines A and B.  At one point in
time A is overcommitted and needs to swap and B is relatively
idle.  Then later, B is overcommitted and needs to swap and
A is relatively idle.  RAMster can handle this entirely
dynamically, a RAM-drive cannot.
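
To put rough numbers on the A/B scenario (the figures below are invented for illustration, not measurements), compare the sum of each machine's own peak with the peak of the combined demand:

```c
#include <assert.h>
#include <stddef.h>

#define NMACH	2	/* machines A and B */
#define NSAMP	2	/* two points in time */

/* Invented RAM demand in GB: demand[machine][time]. */
static const unsigned int demand[NMACH][NSAMP] = {
	{ 6, 2 },	/* A: overcommitted first, idle later */
	{ 2, 6 },	/* B: idle first, overcommitted later */
};

/* RAM needed if every machine is sized for its own peak. */
static unsigned int sum_of_max(void)
{
	unsigned int total = 0;

	for (size_t m = 0; m < NMACH; m++) {
		unsigned int peak = 0;

		for (size_t t = 0; t < NSAMP; t++)
			if (demand[m][t] > peak)
				peak = demand[m][t];
		total += peak;
	}
	return total;
}

/* RAM needed if the group is sized for its combined peak. */
static unsigned int max_of_sums(void)
{
	unsigned int peak = 0;

	for (size_t t = 0; t < NSAMP; t++) {
		unsigned int sum = 0;

		for (size_t m = 0; m < NMACH; m++)
			sum += demand[m][t];
		if (sum > peak)
			peak = sum;
	}
	return peak;
}

/* Difference between the two sizing strategies. */
static unsigned int ram_saved(void)
{
	assert(sum_of_max() >= max_of_sums());
	return sum_of_max() - max_of_sums();
}
```

With these made-up numbers the two machines need 12 GB if each is sized for its own peak, but only 8 GB if RAM can move to whichever machine is busy.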

> > machine.  RAMster is swapping to *RAM* on the remote machine.
> > The idea is that most machines are very overprovisioned in
> > RAM, and are rarely using all of their RAM, especially when
> > a machine is (mostly) idle.  In other words, the "max of
> > the sums" of RAM usage on a group of machines is much lower
> > than the "sum of the max" of RAM usage.
> >
> > So if the network is sufficiently faster than disk for
> > moving a page of data, RAMster provides a significant
> > performance improvement.  OR RAMster may allow a significant
> > reduction in the total amount of RAM across a data center.
> >
> > The version of RAMster I am working on now is really
> > a proof-of-concept that works over sockets, using the
> > ocfs2 cluster layer.  One can easily envision a future
> > "exo-fabric" which allows one machine to write to the
> > RAM of another machine... for this future hardware,
> > RAMster becomes much more interesting.
> 
> Or you can also try scst-in-RAM mode(if you want to experiment with
> different fabrics).

Thanks.  Could you provide a pointer for this?  I found
the SCST sourceforge page but no obvious references to
scst-in-ram-mode.  (But also, since it appears to be
SCSI-related, I wonder if it also assumes a fixed size
target device, RAM or disk or ??)

Dan


* RE: [RFC] non-preemptible kernel socket for RAMster
From: Dan Magenheimer @ 2011-07-05 19:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Konrad Wilk, linux-mm
In-Reply-To: <1309890239.2545.10.camel@edumazet-laptop>

> > > > +++ linux-2.6.37-ramster/net/core/sock.c	2011-07-03 19:10:04.340980799 -0600
> > > > @@ -1587,6 +1587,14 @@ static void __lock_sock(struct sock *sk)
> > > >  	__acquires(&sk->sk_lock.slock)
> > > >  {
> > > >  	DEFINE_WAIT(wait);
> > > > +	if (!preemptible()) {
> > > > +		while (sock_owned_by_user(sk)) {
> > > > +			spin_unlock_bh(&sk->sk_lock.slock);
> > > > +			cpu_relax();
> > > > +			spin_lock_bh(&sk->sk_lock.slock);
> > > > +		}
> > > > +		return;
> > > > +	}
> > >
> > > Hmm, was this tested on UP machine ?
> >
> > Hi Eric --
> >
> > Thanks for the reply!
> >
> > I hadn't tested UP in awhile so am testing now, and it seems to
> > work OK so far.  However, I am just testing my socket, *not* testing
> > sockets in general.  Are you implying that this patch will
> > break (kernel) sockets in general on a UP machine?  If so,
> > could you be more specific as to why?  (Again, I said
> > I am a networking idiot. ;-)  I played a bit with adding
> > a new SOCK_ flag and triggering off of that, but this
> > version of the patch seemed much simpler.
> 
> Say you have two processes and socket S
> 
> One process locks socket S, and is preempted by another process.
> 
> This second process is non-preemptible and tries to lock the same socket.
> 
> -> deadlock, since P1 never releases socket S

Oh, OK.  My use model is that a socket that is used non-preemptible
must always be used non-preemptible.  In other words, this kind
of socket is an extreme form of non-blocking.  Doesn't that seem
like a reasonable constraint? 

Thanks,
Dan


* Re: [PATCH net-next 6/6] r8169: support RTL8111E-VL
From: Francois Romieu @ 2011-07-05 18:56 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1309859095-32031-6-git-send-email-hayeswang@realtek.com>

Hayes Wang <hayeswang@realtek.com> :
[...]
> +	rtl_writephy(tp, 0x1e, 0x002D);
                                   ^^ 2d :o)

Some rtl_eri_write calls probably deserve to be tabulated. It can wait though.

-- 
Ueimor


* Re: [PATCH net-next 5/6] r8169: fix wake on lan setting for 8111E
From: Francois Romieu @ 2011-07-05 18:55 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1309859095-32031-5-git-send-email-hayeswang@realtek.com>

Hayes Wang <hayeswang@realtek.com> :
> Only the 8111E needs RxConfig bits 0 ~ 3 enabled when suspending or
> shutting down with wake on lan supported.

Nit: if it only applies to the non-VL 8111E, it could make sense to apply
it after the 8168E-VL support patch and outline this fact in the comment.

Your call.

-- 
Ueimor


* Re: [PATCH net-next 4/6] r8169: add ERI functions
From: Francois Romieu @ 2011-07-05 18:55 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1309859095-32031-4-git-send-email-hayeswang@realtek.com>

Hayes Wang <hayeswang@realtek.com> :
[...]
> @@ -1046,6 +1046,49 @@ static u32 rtl_csi_read(void __iomem *ioaddr, int addr)
>  	return value;
>  }
>  
> +static
> +void rtl_eri_write(void __iomem *ioaddr, int addr, u32 mask, u32 val, int type)
> +{
> +	unsigned int i;
> +
> +	BUG_ON((addr & 3) || (mask == 0));
> +	RTL_W32(ERIDR, val);
> +	RTL_W32(ERIAR, ERIAR_WRITE_CMD | type | mask | addr);
> +
> +	for (i = 0; i < 100; i++) {
> +		udelay(100);
> +		if (!(RTL_R32(ERIAR) & ERIAR_FLAG))
> +			break;

I'd rather reverse those if possible : implicit posted write flushing read,
then udelay.
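
Something along these lines, as a standalone sketch.  reg_read() and delay_us() are stand-ins for RTL_R32(ERIAR) and udelay(), and the fake register only exists so the loop can be exercised outside the driver:

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_FLAG	0x80000000u	/* stand-in for ERIAR_FLAG */

static unsigned int reads_left;	/* fake hw: flag clears after this many reads */
static unsigned int delays;	/* number of delay_us() calls made */

/* Stand-in for RTL_R32(ERIAR); on real hardware the read would also
 * flush the posted write. */
static uint32_t reg_read(void)
{
	if (reads_left > 0) {
		reads_left--;
		return FAKE_FLAG;
	}
	return 0;
}

/* Stand-in for udelay(); it only counts calls. */
static void delay_us(unsigned int us)
{
	(void)us;
	delays++;
}

/* Read first, then delay.  Returns 0 once the flag is clear,
 * -1 if it is still set after max_tries reads. */
static int wait_flag_clear(unsigned int max_tries)
{
	unsigned int i;

	assert(max_tries > 0);
	for (i = 0; i < max_tries; i++) {
		if (!(reg_read() & FAKE_FLAG))
			return 0;
		delay_us(100);
	}
	return -1;
}
```

When the flag is already clear on the first read, the loop returns without delaying at all.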

-- 
Ueimor


* Re: [PATCH net-next 3/6] r8169: adjust the settings about RxConfig
From: Francois Romieu @ 2011-07-05 18:55 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1309859095-32031-3-git-send-email-hayeswang@realtek.com>

Hayes Wang <hayeswang@realtek.com> :
[...]
> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index cdbbe47..3aeae68 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -71,7 +71,7 @@ static const int multicast_filter_limit = 32;
>  
>  #define MAX_READ_REQUEST_SHIFT	12
>  #define RX_FIFO_THRESH	7	/* 7 means NO threshold, Rx buffer level before first PCI xfer. */
> -#define RX_DMA_BURST	6	/* Maximum PCI burst, '6' is 1024 */
> +#define RX_DMA_BURST	7	/* Maximum PCI burst, '7' is Unlimited */
>  #define TX_DMA_BURST	6	/* Maximum PCI burst, '6' is 1024 */
>  #define SafeMtu		0x1c20	/* ... actually life sucks beyond ~7k */
>  #define InterFrameGap	0x03	/* 3 means InterFrameGap = the shortest one */
> @@ -272,9 +272,6 @@ enum rtl_registers {
>  	IntrStatus	= 0x3e,
>  	TxConfig	= 0x40,
>  	RxConfig	= 0x44,
> -
> -#define RTL_RX_CONFIG_MASK		0xff7e1880u
> -

Nit: could you remove RxCfgFIFOShift and add something like 

#define RXCFG_FIFO_SHIFT	13
				/* No threshold before first PCI xfer. */
#define RXCFG_FIFO_THRESH	(7 << RXCFG_FIFO_SHIFT)
#define RXCFG_DMA_SHIFT		8
				/* Unlimited maximum PCI burst. */
#define	RXCFG_DMA_BURST		(7 << RXCFG_DMA_SHIFT)

(and move both RX128_INT_EN and RX_MULTI_EN near RxConfig in patch #1)

-- 
Ueimor


* Re: [PATCH net-next 2/6] r8169: modify the flow hw reset
From: Francois Romieu @ 2011-07-05 18:55 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1309859095-32031-2-git-send-email-hayeswang@realtek.com>

Hayes Wang <hayeswang@realtek.com> :
> Replace rtl8169_asic_down with rtl8169_hw_reset. Clear RxConfig
> bit 0 ~ 3 and do some checking before reset. Remove hw reset
> which is before hw_start because reset would be done in close or
> down.

The whole description of the changes ought to explain why things are
changed.

> Signed-off-by: Hayes Wang <hayeswang@realtek.com>
> ---
>  drivers/net/r8169.c |   43 ++++++++++++++++++++++++-------------------
>  1 files changed, 24 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index 701ab6b..cdbbe47 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -731,6 +731,8 @@ static int rtl8169_poll(struct napi_struct *napi, int budget);
>  static const unsigned int rtl8169_rx_config =
>  	(RX_FIFO_THRESH << RxCfgFIFOShift) | (RX_DMA_BURST << RxCfgDMAShift);
>  
> +static void rtl8169_init_ring_indexes(struct rtl8169_private *tp);
> +

rtl8169_init_ring_indexes is really short. Please move it before its first use
and avoid the forward declaration.

>  static u32 ocp_read(struct rtl8169_private *tp, u8 mask, u16 reg)
>  {
>  	void __iomem *ioaddr = tp->mmio_addr;
> @@ -1076,13 +1078,6 @@ static void rtl8169_irq_mask_and_ack(void __iomem *ioaddr)
>  	RTL_W16(IntrStatus, 0xffff);
>  }
>  
> -static void rtl8169_asic_down(void __iomem *ioaddr)
> -{
> -	RTL_W8(ChipCmd, 0x00);
> -	rtl8169_irq_mask_and_ack(ioaddr);
> -	RTL_R16(CPlusCmd);
> -}
> -
>  static unsigned int rtl8169_tbi_reset_pending(struct rtl8169_private *tp)
>  {
>  	void __iomem *ioaddr = tp->mmio_addr;
> @@ -3352,10 +3347,12 @@ static void rtl_hw_reset(struct rtl8169_private *tp)
>  
>  	/* Check that the chip has finished the reset. */
>  	for (i = 0; i < 100; i++) {
> +		udelay(100);
>  		if ((RTL_R8(ChipCmd) & CmdReset) == 0)
>  			break;

Nit: is it forbidden to perform the read - and thus the implicit PCI flush -
before the first 100 us delay ?

> -		msleep_interruptible(1);
>  	}
> +
> +	rtl8169_init_ring_indexes(tp);
>  }
>  
>  static int __devinit
> @@ -3737,6 +3734,16 @@ err_pm_runtime_put:
>  	goto out;
>  }
>  
> +static void rtl_rx_close(struct rtl8169_private *tp)
> +{
> +	void __iomem *ioaddr = tp->mmio_addr;
> +	u32 rxcfg = RTL_R32(RxConfig);
> +
> +	rxcfg &= ~(AcceptBroadcast | AcceptMulticast |
> +		   AcceptMyPhys | AcceptAllPhys);
> +	RTL_W32(RxConfig, rxcfg);
> +}
> +

Should not error and runt packets be considered too ?

<shot in the dark>
Is there any relationship with commit ca52efd5490f97f396d3c5863ba714624f272033 ?
</shot in the dark>

>  static void rtl8169_hw_reset(struct rtl8169_private *tp)
>  {
>  	void __iomem *ioaddr = tp->mmio_addr;
> @@ -3744,19 +3751,20 @@ static void rtl8169_hw_reset(struct rtl8169_private *tp)
>  	/* Disable interrupts */
>  	rtl8169_irq_mask_and_ack(ioaddr);
>  
> +	rtl_rx_close(tp);
> +
>  	if (tp->mac_version == RTL_GIGA_MAC_VER_27 ||
>  	    tp->mac_version == RTL_GIGA_MAC_VER_28 ||
>  	    tp->mac_version == RTL_GIGA_MAC_VER_31) {
>  		while (RTL_R8(TxPoll) & NPQ)
>  			udelay(20);
>  
> +	} else {
> +		RTL_W8(ChipCmd, RTL_R8(ChipCmd) | StopReq);
> +		udelay(100);

No posted PCI write flush ?

Please remove the empty line after the udelay(20). It should not
have been there in the first place.

>  	}
>  
> -	/* Reset the chipset */
> -	RTL_W8(ChipCmd, CmdReset);
> -
> -	/* PCI commit */
> -	RTL_R8(ChipCmd);
> +	rtl_hw_reset(tp);
>  }
>  
>  static void rtl_set_rx_tx_config_registers(struct rtl8169_private *tp)
> @@ -3776,8 +3784,6 @@ static void rtl_hw_start(struct net_device *dev)
>  {
>  	struct rtl8169_private *tp = netdev_priv(dev);
>  
> -	rtl_hw_reset(tp);
> -
>  	tp->hw_start(dev);
>  
>  	netif_start_queue(dev);
> @@ -4718,7 +4724,6 @@ static void rtl8169_reset_task(struct work_struct *work)
>  
>  	rtl8169_tx_clear(tp);
>  
> -	rtl8169_init_ring_indexes(tp);
>  	rtl_hw_start(dev);
>  	netif_wake_queue(dev);
>  	rtl8169_check_link_status(dev, tp, tp->mmio_addr);

I do not see where the ring indexes will be set when __rtl8169_resume()
schedules rtl8169_reset_task. It could hurt.

-- 
Ueimor


* Re: [PATCH net-next 1/6] r8169: adjust some registers
From: Francois Romieu @ 2011-07-05 18:53 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1309859095-32031-1-git-send-email-hayeswang@realtek.com>

Hayes Wang <hayeswang@realtek.com> :
> Define new registers and modify some existing ones.
> 
> Signed-off-by: Hayes Wang <hayeswang@realtek.com>
> ---
>  drivers/net/r8169.c |   30 +++++++++++++++++++++++-------
>  1 files changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index fbd6838..701ab6b 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -327,7 +327,7 @@ enum rtl8168_8101_registers {
>  #define	EPHYAR_REG_SHIFT		16
>  #define	EPHYAR_DATA_MASK		0xffff
>  	DLLPR			= 0xd0,
> -#define	PM_SWITCH			(1 << 6)
> +#define	PFM_EN				(1 << 6)
>  	DBG_REG			= 0xd1,
>  #define	FIX_NAK_1			(1 << 4)
>  #define	FIX_NAK_2			(1 << 3)
> @@ -335,6 +335,7 @@ enum rtl8168_8101_registers {
>  	MCU			= 0xd3,
>  #define	EN_NDP				(1 << 3)
>  #define	EN_OOB_RESET			(1 << 2)
> +#define NOW_IS_OOB			(1 << 7)
         ^^^ there should be a tab, not a space.

Nit : 3, 2, 7 is a bit surprising. 7, 3, 2 ?

@@ -345,18 +346,31 @@ enum rtl8168_8101_registers {
 };
 
 enum rtl8168_registers {
+	LED_FREQ		= 0x1a,
+	EEE_LED			= 0x1b,
+
+	/* TxConfig */
+#define AUTO_FIFO			(1 << 7)
+#define TX_EMPTY			(1 << 11)

Eventually move these close to the existing TxConfig register as :

+#define TXCFG_AUTO_FIFO		(1 <<  7) /* 8168e (?) */
+#define TXCFG_EMPTY			(1 << 11) /* 8168e (?) */

-- 
Ueimor


* [BUG] bd4265fe36 bridge: Only flood unreg groups... breaks DHCP setup
From: Michael Guntsche @ 2011-07-05 18:43 UTC (permalink / raw)
  To: xu; +Cc: netdev, David Miller

Hello,

After updating from 3.0.0-rc5 to rc6 I noticed that my cellphone was no
longer able to get an address assigned from my DHCP server. While trying
to figure out the problem I noticed that tracing with tcpdump made it
work again.

The setup I have here is the following:
PPC embedded board where the wired NIC and the wlan NIC are in a bridge.
Dnsmasq is listening on the Bridge device itself.

Looking at the changes between rc5 and rc6 I noticed commit

 bd4265fe365c0f3945d: bridge: Only flood unregistered groups to routers

For testing purposes I reverted it and the cellphone immediately got an
address even without running tcpdump. Now apparently the commit states
that the user can always force flooding behaviour to any given port by
marking it as a router but I did not find any documentation how to do
that.

Now my question: is it "normal" that this change breaks my setup here,
and if it is expected, how can I force my ports to the old behaviour
without reverting the commit?

Thank you very much in advance,
Michael 


* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Eric Dumazet @ 2011-07-05 18:45 UTC (permalink / raw)
  To: Neil Horman
  Cc: Alexey Zaytsev, Michael Büsch, Andrew Morton, netdev,
	Gary Zambrano, bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <1309890775.2545.17.camel@edumazet-laptop>

On Tuesday 05 July 2011 at 20:32 +0200, Eric Dumazet wrote:

> Then, maybe the driver model is completely wrong, and should really
> setup 512 buffers, or use less descs but set EOT on last one.
> 
> Currently it uses a 200 sliding window out of the 512 descs.
> 
> 

One thing we could do would be to allocate a special guard buffer and
set all 'out of window' descriptors to point to this guard buffer, and
periodically check if buffer is dirtied by the card.

(first word would be enough)

(instead of setting desc->addr to NULL, set to
dma_map_single(guard_buffer))
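
Roughly, in plain C with no DMA involved.  The names, the buffer size and the poison value are invented for this sketch; a real version would map the buffer for the device and do the check from the driver:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORDS	512		/* arbitrary size for the sketch */
#define GUARD_POISON	0xdeadbeefu	/* arbitrary pattern */

static uint32_t guard_buffer[GUARD_WORDS];

/* Fill the guard buffer with the poison pattern. */
static void guard_init(void)
{
	for (size_t i = 0; i < GUARD_WORDS; i++)
		guard_buffer[i] = GUARD_POISON;
}

/* Nonzero if the first word no longer holds the poison, i.e. something
 * (the card, in the real case) wrote through an 'out of window'
 * descriptor. */
static int guard_dirty(void)
{
	return guard_buffer[0] != GUARD_POISON;
}

/* Test helper that plays the role of the card writing into the buffer. */
static void fake_card_write(size_t word, uint32_t val)
{
	assert(word < GUARD_WORDS);
	guard_buffer[word] = val;
}
```

guard_dirty() would be the periodic check; it only looks at the first word, as suggested above.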





* Fwd: usbnet suspend issue
From: Chris Fries @ 2011-07-05 18:43 UTC (permalink / raw)
  To: netdev, stable, davem
In-Reply-To: <CADuEq9BkW8iHQjacohGJUZzwh--ZrqVEKYa4DzftH9XnMc=JNw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 801 bytes --]

(Plain text)

Hi David, Netdev, Stable,

Please evaluate the attached usbnet patch for inclusion in the kernel.

Thanks,
Chris



On Tue, Jul 5, 2011 at 4:49 AM, Oliver Neukum <oliver@neukum.org> wrote:
>
> On Tuesday, 5 July 2011, 04:25:39, Chris Fries wrote:
> > Hi Oliver,
> >
> > I noticed a bug in usbnet suspend, that's been around for a while and seems
> > to be still present in 3.0.  It causes some nasty behavior on ARM (Tegra)
> > when suspend/resume stops working due to a lost suspend count.
> > ...
>
> Thank you. That is a very good catch. Please resend to David Miller
> and netdev@vger.kernel.org and stable@kernel.org. Also please include
> your Signed-off-by: <mail> line. That way it'll be included fastest.
>
>        Regards
>                Oliver

[-- Attachment #2: usbnet-suspend.patch --]
[-- Type: application/octet-stream, Size: 1041 bytes --]

From 963360b955bae58963399124b4a542827f3272ea Mon Sep 17 00:00:00 2001
From: Chris Fries <C.Fries@motorola.com>
Date: Fri, 1 Jul 2011 15:28:58 -0500
Subject: [PATCH] usb: usbnet: suspend count gets lost when -EBUSY

When suspend is refused due to pending transmits, the
busy counter gets out of sync, and suspend will stop
working correctly.

Change-Id: I89d31d6584e8914192315671c8031a9c7658ca22
Signed-off-by: Chris Fries <C.Fries@motorola.com>
---
 drivers/net/usb/usbnet.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index cc95aad..9bcc3e3 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1451,6 +1451,7 @@ int usbnet_suspend (struct usb_interface *intf, pm_message_t message)
 		/* don't autosuspend while transmitting */
 		if (dev->txq.qlen && (message.event & PM_EVENT_AUTO)) {
 			spin_unlock_irq(&dev->txq.lock);
+			dev->suspend_count--;
 			return -EBUSY;
 		} else {
 			set_bit(EVENT_DEV_ASLEEP, &dev->flags);
-- 
1.7.0.4



* Re: [PATCH 2/2] packet: Add fanout support.
From: Victor Julien @ 2011-07-05 18:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Loke, Chetan, David Miller, netdev
In-Reply-To: <1309890415.2545.13.camel@edumazet-laptop>

On 07/05/2011 08:26 PM, Eric Dumazet wrote:
>>>> What about a hashing scheme based on just the ip addresses? Would make
>>>> rxhash useless for this purpose, but would be a lot simpler overall maybe...
>>>>
>>>
>>> What about loads where a single IP address is used ?
>>
>> How would that be a problem?
> 
> Say I want my program able to process 2.000.000 packets per second on my
> 64 cpu machine, but one cpu is only able to process 100.000 pps.
> 
> All these packets are coming from IP1, and go to IP2, but thousands of
> different flows are in flight.
> 
> We need a hash function taking into account source port and destination
> port.

Point taken. I see no other way either.
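
For what it's worth, a toy sketch of the difference.  FNV-1a is used only because it is short; this is not the kernel's rxhash, and all names and values here are made up:

```c
#include <assert.h>
#include <stdint.h>

struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* One FNV-1a step over a single byte. */
static uint32_t fnv1a_byte(uint32_t h, uint8_t b)
{
	return (h ^ b) * 16777619u;
}

/* Feed a 32-bit value, most significant byte first. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
	h = fnv1a_byte(h, (uint8_t)(v >> 24));
	h = fnv1a_byte(h, (uint8_t)(v >> 16));
	h = fnv1a_byte(h, (uint8_t)(v >> 8));
	return fnv1a_byte(h, (uint8_t)v);
}

/* Hash on addresses only: every flow between two hosts collides. */
static uint32_t hash_addrs(const struct flow *f)
{
	uint32_t h = 2166136261u;	/* FNV offset basis */

	h = fnv1a_u32(h, f->daddr);
	return fnv1a_u32(h, f->saddr);
}

/* Hash on addresses and ports: flows between two hosts can spread. */
static uint32_t hash_4tuple(const struct flow *f)
{
	uint32_t h = hash_addrs(f);

	h = fnv1a_byte(h, (uint8_t)(f->dport >> 8));
	h = fnv1a_byte(h, (uint8_t)f->dport);
	h = fnv1a_byte(h, (uint8_t)(f->sport >> 8));
	return fnv1a_byte(h, (uint8_t)f->sport);
}

/* Pick one of nsock fanout sockets. */
static unsigned int pick_socket(uint32_t hash, unsigned int nsock)
{
	assert(nsock > 0);
	return hash % nsock;
}
```

Two flows between the same pair of hosts always pick the same socket with hash_addrs(), whatever their ports; with hash_4tuple() flows that differ only in source port can end up on different sockets.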

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------



* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Eric Dumazet @ 2011-07-05 18:32 UTC (permalink / raw)
  To: Neil Horman
  Cc: Alexey Zaytsev, Michael Büsch, Andrew Morton, netdev,
	Gary Zambrano, bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <1309889634.2545.2.camel@edumazet-laptop>

On Tuesday 05 July 2011 at 20:13 +0200, Eric Dumazet wrote:
> On Tuesday 05 July 2011 at 14:06 -0400, Neil Horman wrote:
> > On Tue, Jul 05, 2011 at 06:47:21PM +0200, Eric Dumazet wrote:
> > > On Tuesday 05 July 2011 at 12:42 -0400, Neil Horman wrote:
> > > > On Tue, Jul 05, 2011 at 06:12:32PM +0200, Eric Dumazet wrote:
> > > 
> > > > > So all descriptors before prod are guaranteed to be ready for host
> > > > > consume... Fact that a dma access is running on 'next descriptor' should
> > > > > be irrelevant.
> > > > > 
> > > > But we handle more than one descriptor per b44_rx call - there's a while loop in
> > > > there where we do advance to the next descriptor.
> > > 
> > > Yes, but we advance up to 'prod', which is the very last safe
> > > descriptor.
> > > 
> > > If hardware advertises descriptor X being ready to be handled by host,
> > > while DMA on this X descriptor is not yet finished, this would be a
> > > really useless hardware ;)
> > > 
> > 
> > 
> > Something else just jumped out at me.  During b44_open, we call b44_init_rings.
> > This function allocates bp->rx_pending skb's and iteratively puts them in the rx
> > dma ring. bp->rx_pending is initialized to B44_DEF_RX_RING_PENDING, which is
> > defined as 200 (less than half of the 512 entries that the dma ring actually
> > supports in the hardware).  This is normally ok, as subsequent calls to
> > b44_alloc_rx_skb will fill in entries in the ring as those skbs are consumed.
> > The problem with this however is that b44_alloc_rx_skb only sets the
> > DESC_CTRL_EOT bit in the descriptor of the 512th entry, indicating that the
> > hardware should wrap around and reset the index counter.  If a large volume of
> > traffic is pushed through the adapter early on after initialization, or if the
> > cpu is busy during init, it would be possible that the ring buffer would fill up
> > prior to having additional entries added to the ring, the result being that the
> > dma engine would reach the end of the allocated descriptors, not see an EOT bit
> > set, and continue on using unallocated descriptors.
> > 
> > Just a theory, but it would be interesting to see if the problem subsided if you
> > ensured that you allocated a full descriptor ring on b44_open.
> > Neil
> >  
> > diff --git a/drivers/net/b44.c b/drivers/net/b44.c
> > index 3d247f3..1b58a7c 100644
> > --- a/drivers/net/b44.c
> > +++ b/drivers/net/b44.c
> > @@ -57,7 +57,7 @@
> >  #define B44_MAX_MTU			1500
> >  
> >  #define B44_RX_RING_SIZE		512
> > -#define B44_DEF_RX_RING_PENDING		200
> > +#define B44_DEF_RX_RING_PENDING		512
> >  #define B44_RX_RING_BYTES	(sizeof(struct dma_desc) * \
> >  				 B44_RX_RING_SIZE)
> >  #define B44_TX_RING_SIZE		512
> 
> No
> 
> Please take time to read the driver again.
> 
> 200 descs are set up, and the NIC is not allowed to use more than 200 descs.
> 
> ( B44_DMARX_PTR )
> 
> We carefully advance this pointer after new desc(s) are set up
> 
> 

Then, maybe the driver model is completely wrong, and should really
set up 512 buffers, or use fewer descs but set EOT on the last one.

Currently it uses a sliding window of 200 descs out of the 512.
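
A toy model of that sliding window may help; it is only a sketch, not the
b44 code. RING_SIZE and PENDING mirror B44_RX_RING_SIZE and
B44_DEF_RX_RING_PENDING from the quoted diff, while struct rx_ring, its
fields and the helper names are hypothetical. It assumes the host refills
exactly one descriptor for every one it consumes, and that 'prod' stands for
the limit written to B44_DMARX_PTR.

```c
#include <assert.h>

#define RING_SIZE 512	/* B44_RX_RING_SIZE in the driver */
#define PENDING   200	/* B44_DEF_RX_RING_PENDING in the driver */

/* Toy model only; the field names are illustrative, not the driver's. */
struct rx_ring {
	unsigned int cons;	/* next descriptor the host will consume */
	unsigned int prod;	/* limit given to the NIC: it must stop before this */
};

static void ring_init(struct rx_ring *r)
{
	r->cons = 0;
	r->prod = PENDING;	/* descriptors 0..199 have buffers attached */
}

/* Number of descriptors the NIC is currently allowed to fill. */
static unsigned int ring_window(const struct rx_ring *r)
{
	return (r->prod + RING_SIZE - r->cons) % RING_SIZE;
}

/*
 * Host consumes one packet, then sets up one fresh descriptor at 'prod'.
 * Both indexes wrap at RING_SIZE, which is the wrap the EOT bit on the
 * last descriptor tells the hardware to perform.
 */
static void ring_consume_and_refill(struct rx_ring *r)
{
	assert(ring_window(r) > 0);
	r->cons = (r->cons + 1) % RING_SIZE;
	r->prod = (r->prod + 1) % RING_SIZE;
}
```

In this model the window stays at 200 descriptors however many packets are
processed, and it slides across index 511 without the NIC ever being handed
a descriptor beyond 'prod'.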




^ permalink raw reply

* Re: [PATCH 2/2] packet: Add fanout support.
From: Eric Dumazet @ 2011-07-05 18:26 UTC (permalink / raw)
  To: Victor Julien; +Cc: Loke, Chetan, David Miller, netdev
In-Reply-To: <4E1346B7.4000105@inliniac.net>

On Tue, 2011-07-05 at 19:15 +0200, Victor Julien wrote:
> On 07/05/2011 06:23 PM, Eric Dumazet wrote:
> > On Tue, 2011-07-05 at 18:21 +0200, Victor Julien wrote:
> >> On 07/05/2011 06:16 PM, Eric Dumazet wrote:
> >>> Remember, the goal is that _all_ packets of a given flow end up in the same queue.
> >>>
> >>
> >> What about a hashing scheme based on just the ip addresses? Would make
> >> rxhash useless for this purpose, but would be a lot simpler overall maybe...
> >>
> > 
> > What about loads where a single IP address is used ?
> 
> How would that be a problem?

Say I want my program to be able to process 2,000,000 packets per second
on my 64-cpu machine, but one cpu is only able to process 100,000 pps.

All these packets are coming from IP1 and go to IP2, but thousands of
different flows are in flight.

We need a hash function taking into account source port and destination
port.
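
A minimal sketch of why the ports matter; it is not code from the fanout
patch. struct flow_key, mix32(), hash_addrs(), hash_4tuple() and pick_queue()
are hypothetical names, and the mixer is simply the 32-bit finalizer from
MurmurHash3, chosen for illustration rather than because the kernel's rxhash
uses it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key; names are illustrative, not from the patch. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Small integer mixer (the 32-bit finalizer from MurmurHash3). */
static uint32_t mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/* Addresses only: every IP1 -> IP2 flow collapses to a single value. */
static uint32_t hash_addrs(const struct flow_key *k)
{
	return mix32(k->saddr ^ mix32(k->daddr));
}

/* Full 4-tuple: flows between the same two hosts get different hashes. */
static uint32_t hash_4tuple(const struct flow_key *k)
{
	uint32_t ports = ((uint32_t)k->sport << 16) | k->dport;

	return mix32(hash_addrs(k) ^ mix32(ports));
}

/* Map a hash onto one of nqueues receive queues. */
static unsigned int pick_queue(uint32_t hash, unsigned int nqueues)
{
	assert(nqueues > 0);
	return hash % nqueues;
}
```

With hash_addrs() all of the IP1 -> IP2 traffic lands on one queue, so a
single cpu has to handle everything. With hash_4tuple() each flow still
always maps to the same queue, but different flows can be spread over all
of the queues.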

^ permalink raw reply

* RE: [RFC] non-preemptible kernel socket for RAMster
From: Eric Dumazet @ 2011-07-05 18:23 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: netdev, Konrad Wilk, linux-mm
In-Reply-To: <918f7b76-4904-41cc-9f55-c07adafb34b4@default>

On Tue, 2011-07-05 at 10:25 -0700, Dan Magenheimer wrote:
> > From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> > Sent: Tuesday, July 05, 2011 10:31 AM
> > To: Dan Magenheimer
> > Cc: netdev@vger.kernel.org; Konrad Wilk; linux-mm
> > Subject: Re: [RFC] non-preemptible kernel socket for RAMster
> > 
> > On Tue, 2011-07-05 at 08:54 -0700, Dan Magenheimer wrote:
> > > In working on a kernel project called RAMster* (where RAM on a
> > > remote system may be used for clean page cache pages and for swap
> > > pages), I found I have need for a kernel socket to be used when
> > > in non-preemptible state.  I admit to being a networking idiot,
> > > but I have been successfully using the following small patch.
> > > I'm not sure whether I am lucky so far... perhaps more
> > > sockets or larger/different loads will require a lot more
> > > changes (or maybe even make my objective impossible).
> > > So I thought I'd post it for comment.  I'd appreciate
> > > any thoughts or suggestions.
> > >
> > > Thanks,
> > > Dan
> > >
> > > * http://events.linuxfoundation.org/events/linuxcon/magenheimer
> > >
> > > diff -Napur linux-2.6.37/net/core/sock.c linux-2.6.37-ramster/net/core/sock.c
> > > --- linux-2.6.37/net/core/sock.c	2011-07-03 19:14:52.267853088 -0600
> > > +++ linux-2.6.37-ramster/net/core/sock.c	2011-07-03 19:10:04.340980799 -0600
> > > @@ -1587,6 +1587,14 @@ static void __lock_sock(struct sock *sk)
> > >  	__acquires(&sk->sk_lock.slock)
> > >  {
> > >  	DEFINE_WAIT(wait);
> > > +	if (!preemptible()) {
> > > +		while (sock_owned_by_user(sk)) {
> > > +			spin_unlock_bh(&sk->sk_lock.slock);
> > > +			cpu_relax();
> > > +			spin_lock_bh(&sk->sk_lock.slock);
> > > +		}
> > > +		return;
> > > +	}
> > 
> > Hmm, was this tested on a UP machine?
> 
> Hi Eric --
> 
> Thanks for the reply!
> 
> I hadn't tested UP in awhile so am testing now, and it seems to
> work OK so far.  However, I am just testing my socket, *not* testing
> sockets in general.  Are you implying that this patch will
> break (kernel) sockets in general on a UP machine?  If so,
> could you be more specific as to why?  (Again, I said
> I am a networking idiot. ;-)  I played a bit with adding
> a new SOCK_ flag and triggering off of that, but this
> version of the patch seemed much simpler.

Say you have two processes, P1 and P2, and a socket S.

P1 locks socket S, and is preempted by P2.

P2 is non-preemptible and tries to lock the same socket.

-> deadlock on UP: P2 spins forever, since P1 never gets to run and
release socket S
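
The scenario can be modelled in a few lines; this is a toy single-CPU sketch,
not kernel code. struct toy_sock and spin_for_socket() are hypothetical, and
max_spins only stands in for "forever" so that the example terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model; all names are illustrative. */
struct toy_sock {
	bool owned;	/* set while the first process (P1) owns the socket */
};

/*
 * The second process spins while the socket is owned.  On one CPU, with
 * the spinner non-preemptible, P1 is never scheduled, so nothing that runs
 * during this loop can clear 'owned'.
 * Returns true if the socket was acquired, false if the spin gave up.
 */
static bool spin_for_socket(struct toy_sock *sk, unsigned long max_spins)
{
	unsigned long i;

	for (i = 0; i < max_spins; i++) {
		if (!sk->owned) {
			sk->owned = true;
			return true;
		}
		/* cpu_relax() would go here; it does not let P1 run */
	}
	return false;
}
```

If P1 already owns the socket, the call fails no matter how large max_spins
is. On SMP the same loop can succeed because P1 may keep running on another
CPU and release the socket.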




^ permalink raw reply

* RE: [PATCH 2/2] packet: Add fanout support.
From: Eric Dumazet @ 2011-07-05 18:20 UTC (permalink / raw)
  To: Loke, Chetan; +Cc: Victor Julien, David Miller, netdev
In-Reply-To: <D3F292ADF945FB49B35E96C94C2061B91257D6E1@nsmail.netscout.com>

On Tue, 2011-07-05 at 13:35 -0400, Loke, Chetan wrote:

> Sure, a lookup is needed(to steer what I call - Hot/Cold flows) and
> was proposed by me on the oisf mailing list. Always, use the ip_id bit
> then? Another problem that needs to be solved is, what if some
> decoders are overloaded, then what? How will this scheme work? How
> will we utilize other CPUs? RPS is needed for sure.
> 
> If we maintain a i) per port lookup-table ii) 2^20 flows/table and
> iii) 16 bytes/flow(one can also squeeze it down to 8 bytes) then we
> will need around 32MB worth memory/port. It's not a huge memory
> pressure for folks who want to use linux for doing IPS/IDS sort of
> stuff.
> 
> User-space decoders end up copying the packet anyways. So fanout can
> be implemented in user-space to achieve effective CPU utilization.
> As long as we don't bounce on different CPU-socket we could be ok.

This is the problem we want to address.

Going into user-space to perform the fanout is what you already have
today, with one socket, one thread doing the fanout to worker threads.

David's patch is non-adaptive: it's a hash over N queues, with a fixed
hash function.

What you want is to add another 'control queue' where new flows are
directed. Then the user application is able to feed the "This flow should
go to queue X" information back into the kernel flow director.

Or, let the kernel do a mix of rxhash and load balancing: select a queue
for a new flow without userland control, using a flow hash table.
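
A rough sketch of that second option, under loud assumptions: nothing here
comes from the posted patch. struct flow_table, least_loaded() and
flow_queue() are hypothetical names, the table is direct-mapped and tiny,
and "load" is just a count of flows per queue rather than a measured packet
rate.

```c
#include <assert.h>
#include <stdint.h>

#define NQUEUES    4
#define TABLE_SIZE 1024	/* power of two; far smaller than a real table */

/* Direct-mapped flow table; must be zero-initialized before use. */
struct flow_table {
	uint8_t      used[TABLE_SIZE];
	uint32_t     hash[TABLE_SIZE];	/* rxhash of the flow owning the slot */
	uint8_t      queue[TABLE_SIZE];	/* queue that flow was assigned to */
	unsigned int load[NQUEUES];	/* flows currently assigned per queue */
};

/* Lowest-numbered queue with the fewest assigned flows. */
static unsigned int least_loaded(const struct flow_table *t)
{
	unsigned int q, best = 0;

	for (q = 1; q < NQUEUES; q++)
		if (t->load[q] < t->load[best])
			best = q;
	return best;
}

/* Known flows keep their queue; new flows go to the least-loaded queue. */
static unsigned int flow_queue(struct flow_table *t, uint32_t rxhash)
{
	unsigned int slot = rxhash & (TABLE_SIZE - 1);

	if (t->used[slot] && t->hash[slot] == rxhash)
		return t->queue[slot];

	/* New flow, or a collision that evicts the older entry. */
	if (t->used[slot])
		t->load[t->queue[slot]]--;
	t->used[slot] = 1;
	t->hash[slot] = rxhash;
	t->queue[slot] = (uint8_t)least_loaded(t);
	t->load[t->queue[slot]]++;
	return t->queue[slot];
}
```

The table is what keeps all packets of a flow on the same queue once the
choice has been made; the load counters are what make the choice adaptive
for new flows.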

^ permalink raw reply
