Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: RFS seems to have incompatiblities with bridged vlans
From: Tom Herbert @ 2010-06-09  1:52 UTC (permalink / raw)
  To: John Fastabend
  Cc: tim.gardner@canonical.com, Eric Dumazet, Stephen Hemminger,
	Peter Lieven, davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <4C0EE99B.8030300@intel.com>

>
> Problem with this is it doesn't address mis-aligned num_rx_queues.  For
> example with the bonding driver defaulting to 16 queues now. We could end up
> with a base driver with 16+ queues and a bond with 16.  Then we have the
> same issue again.
>
>  eth0  -------> bond / bridge ---------> vlan.id
> (nbrxq=64)      (nbrxq=16)              (nbrxq=X)
>

I don't think the intent is to solve this alignment problem here.  If
a driver allocates multiple queues, it should set the queue mapping
accordingly.  If this isn't the case, the meaning of queue mapping is
ambiguous (which driver is it relative to?), and using the value for
rps_cpus is probably not going to work well-- hacking the
queue-mapping to be a legal value in get_rps_cpus doesn't seem like
the right answer.

>>>
>>> On Tue, Jun 8, 2010 at 7:18 AM, Eric Dumazet<eric.dumazet@gmail.com>
>>>  wrote:
>>>>
>>>> Le lundi 07 juin 2010 à 15:30 -0700, John Fastabend a écrit :
>>>>
>>>>> There is always a possibility that the underlying device sets the
>>>>> queue_mapping to be greater then num_cpus.  Also I suspect the same
>>>>> issue exists with bonding devices.  Maybe something like the following
>>>>> is worth while? compile tested only,
>>>>>
>>>>> [PATCH] 8021q: vlan reassigns dev without check queue_mapping
>>>>>
>>>>> recv path reassigns skb->dev without sanity checking the
>>>>> queue_mapping field.  This can result in the queue_mapping
>>>>> field being set incorrectly if the new dev supports less
>>>>> queues then the underlying device.
>>>>>
>>>>> This patch just resets the queue_mapping to 0 which should
>>>>> resolve this issue?  Any thoughts?
>>>>>
>>>>> The same issue could happen on bonding devices as well.
>>>>>
>>>>> Signed-off-by: John Fastabend<john.r.fastabend@intel.com>
>>>>> ---
>>>>>
>>>>>   net/8021q/vlan_core.c |    6 ++++++
>>>>>   1 files changed, 6 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
>>>>> index bd537fc..ad309f8 100644
>>>>> --- a/net/8021q/vlan_core.c
>>>>> +++ b/net/8021q/vlan_core.c
>>>>> @@ -21,6 +21,9 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct
>>>>> vlan_group *grp,
>>>>>       if (!skb->dev)
>>>>>               goto drop;
>>>>>
>>>>> +     if (unlikely(skb->queue_mapping>= skb->dev->real_num_tx_queues))
>>>>> +             skb_set_queue_mapping(skb, 0);
>>>>> +
>>>>>       return (polling ? netif_receive_skb(skb) : netif_rx(skb));
>>>>>
>>>>>   drop:
>>>>> @@ -93,6 +96,9 @@ vlan_gro_common(struct napi_struct *napi, struct
>>>>> vlan_group *grp,
>>>>>       if (!skb->dev)
>>>>>               goto drop;
>>>>>
>>>>> +     if (unlikely(skb->queue_mapping>= skb->dev->real_num_tx_queues))
>>>>> +             skb_set_queue_mapping(skb, 0);
>>>>> +
>>>>>       for (p = napi->gro_list; p; p = p->next) {
>>>>>               NAPI_GRO_CB(p)->same_flow =
>>>>>                       p->dev == skb->dev&&  !compare_ether_header(
>>>>> --
>>>>
>>>> Only a workaround, added in hot path in a otherwise 'good' driver
>>>> (multiqueue enabled and ready)
>
> Agreed thanks!
>
>>>>
>>>> eth0  ------->  bond / bridge --------->  vlan.id
>>>> (nbtxq=8)      (ntxbq=1)        (nbtxq=X)
>>>>
>>>> X is capped to 1 because of bond/bridge, while bond has no "queue"
>>>> (LLTX driver)
>>>>
>>>> Solutions :
>>>>
>>>> 1) queue_mapping could be silently tested in get_rps_cpu()...
>>>>
>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>> index 6f330ce..3a3f7f6 100644
>>>> --- a/net/core/dev.c
>>>> +++ b/net/core/dev.c
>>>> @@ -2272,14 +2272,11 @@ static int get_rps_cpu(struct net_device *dev,
>>>> struct sk_buff *skb,
>>>>
>>>>        if (skb_rx_queue_recorded(skb)) {
>>>>                u16 index = skb_get_rx_queue(skb);
>>>> -               if (unlikely(index>= dev->num_rx_queues)) {
>>>> -                       if (net_ratelimit()) {
>>>> -                               pr_warning("%s received packet on queue
>>>> "
>>>> -                                       "%u, but number of RX queues is
>>>> %u\n",
>>>> -                                       dev->name, index,
>>>> dev->num_rx_queues);
>>>> -                       }
>>>> -                       goto done;
>>>> -               }
>>>> +               if (WARN_ONCE(index>= dev->num_rx_queues,
>>>> +                               KERN_WARNING "%s received packet on
>>>> queue %u, "
>>>> +                               "but number of RX queues is %u\n",
>>>> +                               dev->name, index, dev->num_rx_queues))
>>>> +                       index %= dev->num_rx_queues;
>>>>                rxqueue = dev->_rx + index;
>>>>        } else
>>>>                rxqueue = dev->_rx;
>>>>
>>>>
>
> Looks good to me.
>
>>>>
>>>> 2) bond/bridge should setup more queues, just in case.
>>>>   We probably need to be able to make things more dynamic,
>>>>   (propagate nbtxq between layers) but not for 2.6.35
>>>>
>>>>
>
> The bonding driver is already multiq per Andy Gospodarek's patch commit
> bb1d912, but unless the bond and bridge devices use the max num_rx_queues of
> there underlying devices this could still go wrong.
>
> The bonding driver would possibly need to increase num_rx_queues and
> num_tx_queues when a device is enslaved or be set to some maximum at init
> for this to work right.
>
>>>>
>>>> diff --git a/drivers/net/bonding/bond_main.c
>>>> b/drivers/net/bonding/bond_main.c
>>>> index 5e12462..ce813dd 100644
>>>> --- a/drivers/net/bonding/bond_main.c
>>>> +++ b/drivers/net/bonding/bond_main.c
>>>> @@ -5012,8 +5012,8 @@ int bond_create(struct net *net, const char *name)
>>>>
>>>>        rtnl_lock();
>>>>
>>>> -       bond_dev = alloc_netdev(sizeof(struct bonding), name ? name :
>>>> "",
>>>> -                               bond_setup);
>>>> +       bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name :
>>>> "",
>>>> +                                  bond_setup, max(64, nr_cpu_ids));
>>>>        if (!bond_dev) {
>>>>                pr_err("%s: eek! can't alloc netdev!\n", name);
>>>>                rtnl_unlock();
>>>>
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> Huh, so you guys are looking at the same issue (only my issue is RPS). See
>> http://marc.info/?l=linux-netdev&m=127603240621028&w=2. I'm in favor of
>> dropping the warning when no queues have been allocated.
>>
>> How about this (see attached).
>
> Prefer Eric's patch see first comment.
>
> Thanks,
> John
>
>>
>> rtg
>>
>>
>
>

^ permalink raw reply

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Linus Torvalds @ 2010-06-09  1:53 UTC (permalink / raw)
  To: Rafael J. Wysocki, Carl Worth, Eric Anholt, Venkatesh Pallipadi,
	Jens Axboe
  Cc: DRI, Linux SCSI List, Network Development, Linux Wireless List,
	Linux Kernel Mailing List, Linux ACPI, Andrew Morton,
	Kernel Testers List, Linux PM List, Maciej Rutecki
In-Reply-To: <Rlf_Qjov7bG.A.a8F.I8rDMB@chimera>



[ Added lots of cc's to direct specific people to look at the regressions 
  that may or may not be theirs... ]

On Wed, 9 Jun 2010, Rafael J. Wysocki wrote:
>
>  * Quite a few of the already reported regressions may be related to the bug
>    fixed by 386f40c86d6c8d5b717ef20620af1a750d0dacb4 (Revert "tty: fix a little
>    bug in scrup, vt.c"), so reporters please retest with this commit applied.]

>From a quick look, most of them look unrelated to that unfortunate bug. 
It's hard to tell for sure, of course (memory corruption can do pretty 
much anything), but I'm not seeing the 07200720.. pattern at least.

And some of them do seem to be bisected to likely culprits and/or have 
patches that are claimed to have fixed them.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16163
> Subject	: [2.6.35-rc1 Regression] i915: Commit cfecde causes VGA to stay off
> Submitter	: David John <davidjon@xenontk.org>
> Date		: 2010-06-02 12:52 (7 days old)
> Message-ID	: <4C065423.3000202@xenontk.org>
> References	: http://marc.info/?l=linux-kernel&m=127548313828613&w=2

That has a "reverting the commit fixes it", and a confirmation from Nick 
Bowler.

Eric, Carl: should I just revert that commit? Or do you have a fix?

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16145
> Subject	: Unable to boot after "ACPI: Don't let acpi_pad needlessly mark TSC unstable"
> Submitter	: Tom Gundersen <teg@jklm.no>
> Date		: 2010-06-07 13:11 (2 days old)
> Handled-By	: Venkatesh Pallipadi <venki@google.com>

Hmm. This does seem to be a properly bisected commit, but at the same time 
it looks from the bugzilla like it's just pure luck on that machine that 
the acpi_pad driver happened to mark TSC unstable - so while the commit 
bisected is the real one, it's not the "deeper reason" for the problem.

Venki, any updates?

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16129
> Subject	: BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
> Submitter	: Jan Kreuzer <kontrollator@gmx.de>
> Date		: 2010-06-05 06:15 (4 days old)

This seems to have been introduced by

	commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
	Author: Ingo Molnar <mingo@elte.hu>
	Date:   Sat Nov 8 17:05:38 2008 +0100

	    sched: optimize sched_clock() a bit
    
	    sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
	    variant of __cycles_2_ns().
    
	    Most of the time sched_clock() is called with irqs disabled already.
	    The few places that call it with irqs enabled need to be updated.
    
	    Signed-off-by: Ingo Molnar <mingo@elte.hu>

and this seems to be one of those calling cases that need to be updated.

Ingo? The call trace is:

	BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
	caller is native_sched_clock+0x3c/0x68
	Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
	Call Trace:
	[<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
	[<ffffffff8101059d>] native_sched_clock+0x3c/0x68
	[<ffffffff8101043d>] sched_clock+0x9/0xd
	[<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
	[<ffffffff81214d71>] get_request+0x1c4/0x2d0
	[<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
	[<ffffffff81215537>] __make_request+0x338/0x45b
	[<ffffffff812147c2>] generic_make_request+0x2bb/0x330
	[<ffffffff81214909>] submit_bio+0xd2/0xef
	[<ffffffff811413cb>] submit_bh+0xf4/0x116
	[<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
	[<ffffffff81144875>] block_write_full_page+0x15/0x17
	[<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
	[<ffffffff810e1f91>] __writepage+0x1a/0x39
	[<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
	[<ffffffff810e3406>] generic_writepages+0x27/0x29
	[<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
	[<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
	[<ffffffff811d0cba>] kjournald2+0x147/0x37a

(from the bugzilla thing)

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16122
> Subject	: 2.6.35-rc1: WARNING at fs/fs-writeback.c:1142 __mark_inode_dirty+0x103/0x170
> Submitter	: Larry Finger <Larry.Finger@lwfinger.net>
> Date		: 2010-06-04 13:18 (5 days old)

Jens?

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16120
> Subject	: Oops: 0000 [#1] SMP, unable to handle kernel NULL pointer dereference at (null)
> Submitter	: Alex Zhavnerchik <alex.vizor@gmail.com>
> Date		: 2010-06-04 09:25 (5 days old)
> Handled-By	: Eric Dumazet <eric.dumazet@gmail.com>

This one seems to have a patch in bugzilla.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16104
> Subject	: Radeon KMS does not start after merge of the new PM-Code
> Submitter	: Jan Kreuzer <kontrollator@gmx.de>
> Date		: 2010-06-02 07:47 (7 days old)

This one also has a patch in Bugzilla, I think Airlie is just waiting to 
calm down his queue and already removed the dependency on the temperature 
code. DaveA?

> Bug-Entry     : http://bugzilla.kernel.org/show_bug.cgi?id=16161
> Subject       : [2.6.35-rc1 regression] sysfs: cannot create duplicate filename ... XVR-600 related?
> Submitter     : Mikael Pettersson <mikpe@it.uu.se>
> Date          : 2010-06-01 19:57 (8 days old)
> Message-ID    : <19461.26166.427857.612983@pilspetsen.it.uu.se>
> References    : http://marc.info/?l=linux-kernel&m=127542227511925&w=2
>
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16090
> Subject	: sysfs: cannot create duplicate filename
> Submitter	: Tobias <devnull@plzk.org>
> Date		: 2010-06-01 15:59 (8 days old)
> Handled-By	: Jesse Barnes <jbarnes@virtuousgeek.org>

These two look related. Both are related to that "slot" thing in sysfs 
PCI, but only one of them is marked as Jesse's. Jesse?

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16037
> Subject	: NULL Pointer dereference in __ir_input_register/budget_ci_attach
> Submitter	: Sean Finney <seanius@debian.org>
> Date		: 2010-05-23 19:52 (17 days old)

Perhaps related to commit 13c24497086418010bf4f76378bcae241d7f59cd?

David Härdeman, Mauro Carvalho Chehab added to cc.

				Linus

^ permalink raw reply

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Mauro Carvalho Chehab @ 2010-06-09  2:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Carl Worth, Eric Anholt, Venkatesh Pallipadi,
	Jens Axboe, Dave Airlie, Jesse Barnes, Ingo Molnar,
	David Härdeman, Eric Dumazet, Linux Kernel Mailing List,
	Maciej Rutecki, Andrew Morton, Kernel Testers List,
	Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
	Linux Wireless List, DRI
In-Reply-To: <alpine.LFD.2.00.1006081814240.4506@i5.linux-foundation.org>

Em 08-06-2010 22:53, Linus Torvalds escreveu:

> 
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16037
>> Subject	: NULL Pointer dereference in __ir_input_register/budget_ci_attach
>> Submitter	: Sean Finney <seanius@debian.org>
>> Date		: 2010-05-23 19:52 (17 days old)
> 
> Perhaps related to commit 13c24497086418010bf4f76378bcae241d7f59cd?
> 
> David Härdeman, Mauro Carvalho Chehab added to cc.

This patch probably solves the issue:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=84b14f181a36eea6591779156ef356f8d198bbfd

The patch were already applied upstream. I've already asked the reporter to test it, via BZ.

Cheers,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Carl Worth @ 2010-06-09  2:38 UTC (permalink / raw)
  To: Linus Torvalds, Rafael J. Wysocki, Eric Anholt,
	Venkatesh Pallipadi, Jens Axboe
  Cc: Linux Kernel Mailing List, Maciej Rutecki, Andrew Morton,
	Kernel Testers List, Network Development, Linux ACPI,
	Linux PM List, Linux SCSI List, Linux Wireless List, DRI
In-Reply-To: <alpine.LFD.2.00.1006081814240.4506-GpypE611fyS63QaFMGN2QEqCLAeBNdoH@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1165 bytes --]

On Tue, 8 Jun 2010 18:53:55 -0700 (PDT), Linus Torvalds <torvalds@linux-foundation.org> wrote:
> And some of them do seem to be bisected to likely culprits and/or have 
> patches that are claimed to have fixed them.
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16163
> > Subject	: [2.6.35-rc1 Regression] i915: Commit cfecde causes VGA to stay off
> > Submitter	: David John <davidjon-XRr60H37pjdAfugRpC6u6w@public.gmane.org>
> > Date		: 2010-06-02 12:52 (7 days old)
> > Message-ID	: <4C065423.3000202-XRr60H37pjdAfugRpC6u6w@public.gmane.org>
> > References	: http://marc.info/?l=linux-kernel&m=127548313828613&w=2
> 
> That has a "reverting the commit fixes it", and a confirmation from Nick 
> Bowler.
> 
> Eric, Carl: should I just revert that commit? Or do you have a fix?

I'm not aware of any real fix here. That commit isn't supposed to change
much, but it clearly unmasks some broken driver code.

So reverting it for now to hide the broken code from poor users does
seem a good plan to me, (unless Eric has any updates or alternate
suggestions).

-Carl

-- 
carl.d.worth-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* RE: [PATCH] r8169: fix random mdio_write failures
From: hayeswang @ 2010-06-09  2:47 UTC (permalink / raw)
  To: 'Timo Teräs', 'Francois Romieu'; +Cc: netdev, davem
In-Reply-To: <4C0DDDCC.6010500@iki.fi>

I think the major point is that it needs a delay (20 us) after both read and
write completed indiacation before next command.
It needs almost 100 us for the indication to be completed, so our engineer
suggests us to check the indication per 100 us.
However, it is  ok to reduce the timer from 100 us to 25 us for checking .

Best Regards,
Hayes

-----Original Message-----
From: Timo Teräs [mailto:timo.teras@gmail.com] On Behalf Of Timo Teras
Sent: Tuesday, June 08, 2010 2:06 PM
To: Francois Romieu
Cc: Hayeswang; netdev@vger.kernel.org; davem@davemloft.net
Subject: Re: [PATCH] r8169: fix random mdio_write failures

On 06/08/2010 12:51 AM, Francois Romieu wrote:
> hayeswang <hayeswang@realtek.com> :
>> Our hardware engineer suggests that check the completed indication 
>> per 100 micro seconds. And it needs 20 micro seconds delay after the 
>> completed indication for the next command.
> 
> Should we do the same for mdio_read as well (100 us per iteration + an 
> extra 20 us) ?

Well, doing 100us per iteration will increase the latency that the code
notices "write complete" which slows down things. It'll also slightly
decrease bus traffic which is good. But I'd be just fine with 25us per
iteration. It sounds unlikely that polling the status register would slow
down the actual write operation (if that is the case then 100us would be
desirable).

Changing my 25us to 20us would good. The original 25us was just a guess.
The comment should be probably also updated that those delays are from
realtek hw specs then.

Would you like me to send a patch?

- Timo

------Please consider the environment before printing this e-mail. 

^ permalink raw reply

* #Very Important#
From: Mr. Jiang Jianmin @ 2010-06-09 11:36 UTC (permalink / raw)


Good Day,
 
I have a secured business proposal of $28,272,000.00. Only contact me via my private email(jiang_jianmin2011@9.cn) if interested.
 
Mr Jiang Jianmin.

^ permalink raw reply

* Re: [PATCH 1/2] Export firmware assigned labels of network devices to sysfs
From: Matt Domsch @ 2010-06-09  4:17 UTC (permalink / raw)
  To: Greg KH
  Cc: K, Narendra, netdev@vger.kernel.org,
	linux-hotplug@vger.kernel.org, linux-pci@vger.kernel.org,
	Hargrave, Jordan, Rose, Charles, Nijhawan, Vijay
In-Reply-To: <20100529045140.GA22563@mdomsch-pws380.aus.amer.dell.com>

On Fri, May 28, 2010 at 11:51:40PM -0500, Domsch, Matt wrote:
> On Fri, May 28, 2010 at 05:27:45PM -0500, Greg KH wrote:
> > Care to post that ECN publically?  And no, the Linux Foundation does not
> > have a PCI-SIG membership, the PCI-SIG keeps forbidding it.  Other
> > operating systems are allowed to join but not Linux.  Strange but
> > true...
> 
> I'm looking into it, and should know more next week.

I'm advised that I cannot post the ECN publically, due to it being an
in-progress work item of a SIG working group, and therefore falls
under the confidentiality rules that SIG members agree to.  Members of
the PCI SIG have access, which unfortunately is not everyone.

-- 
Matt Domsch
Technology Strategist
Dell | Office of the CTO

^ permalink raw reply

* Re: RFS seems to have incompatiblities with bridged vlans
From: Eric Dumazet @ 2010-06-09  4:42 UTC (permalink / raw)
  To: John Fastabend
  Cc: tim.gardner@canonical.com, Tom Herbert, Stephen Hemminger,
	Peter Lieven, davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <4C0EE99B.8030300@intel.com>

Le mardi 08 juin 2010 à 18:08 -0700, John Fastabend a écrit :


> 
> The bonding driver is already multiq per Andy Gospodarek's patch commit bb1d912, 
> but unless the bond and bridge devices use the max num_rx_queues of there 
> underlying devices this could still go wrong.
> 

This patch is in net-next-2.6, it is not scheduled for 2.6.35

We need some fix for 2.6.35

> The bonding driver would possibly need to increase num_rx_queues and 
> num_tx_queues when a device is enslaved or be set to some maximum at init for 
> this to work right.

Thats a bit complex and definitly not 2.6.35 material.




^ permalink raw reply

* [PATCH net-next-2.6] [PPPOE] cleanup: remove pppoe_xmit() declaration.
From: Rami Rosen @ 2010-06-09  5:07 UTC (permalink / raw)
  To: davem, netdev

[-- Attachment #1: Type: text/plain, Size: 204 bytes --]

Hi,
There is no need for pppoe_xmit() forward declaration in
drivers/net/pppoe.c. This patch removes this  pppoe_xmit() declaration.

Regards,
Rami Rosen



Signed-off-by: Rami Rosen <ramirose@gmail.com>

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 443 bytes --]

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index 7ebb8e8..344ef33 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -89,7 +89,6 @@
 #define PPPOE_HASH_SIZE (1 << PPPOE_HASH_BITS)
 #define PPPOE_HASH_MASK	(PPPOE_HASH_SIZE - 1)
 
-static int pppoe_xmit(struct ppp_channel *chan, struct sk_buff *skb);
 static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb);
 
 static const struct proto_ops pppoe_ops;

^ permalink raw reply related

* [PATCH] r8169: fix mdio_read and update mdio_write according to hw specs
From: Timo Teräs @ 2010-06-09  5:22 UTC (permalink / raw)
  To: netdev; +Cc: Timo Teräs, Francois Romieu, Hayeswang
In-Reply-To: <BD599B881C1D407DA7BC93A9244650B0@realtek.com.tw>

Realtek confirmed that a 20us delay is needed after mdio_read and
mdio_write operations. Reduce the delay in mdio_write, and add it
to mdio_read too. Also add a comment that the 20us is from hw specs.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayeswang <hayeswang@realtek.com>
---
 drivers/net/r8169.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 03a8318..96b6cfb 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -560,10 +560,10 @@ static void mdio_write(void __iomem *ioaddr, int reg_addr, int value)
 		udelay(25);
 	}
 	/*
-	 * Some configurations require a small delay even after the write
-	 * completed indication or the next write might fail.
+	 * According to hardware specs a 20us delay is required after write
+	 * complete indication, but before sending next command.
 	 */
-	udelay(25);
+	udelay(20);
 }
 
 static int mdio_read(void __iomem *ioaddr, int reg_addr)
@@ -583,6 +583,12 @@ static int mdio_read(void __iomem *ioaddr, int reg_addr)
 		}
 		udelay(25);
 	}
+	/*
+	 * According to hardware specs a 20us delay is required after read
+	 * complete indication, but before sending next command.
+	 */
+	udelay(20);
+
 	return value;
 }
 
-- 
1.7.0.4


^ permalink raw reply related

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Ingo Molnar @ 2010-06-09  5:34 UTC (permalink / raw)
  To: Linus Torvalds, Jens Axboe, Peter Zijlstra
  Cc: Rafael J. Wysocki, Carl Worth, Eric Anholt, Venkatesh Pallipadi,
	Jens Axboe, Dave Airlie, Jesse Barnes, David H?rdeman,
	Mauro Carvalho Chehab, Eric Dumazet, Linux Kernel Mailing List,
	Maciej Rutecki, Andrew Morton, Kernel Testers List,
	Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
	Linux Wireless List, DRI
In-Reply-To: <alpine.LFD.2.00.1006081814240.4506@i5.linux-foundation.org>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16129
> > Subject	: BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
> > Submitter	: Jan Kreuzer <kontrollator@gmx.de>
> > Date		: 2010-06-05 06:15 (4 days old)
> 
> This seems to have been introduced by
> 
> 	commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
> 	Author: Ingo Molnar <mingo@elte.hu>
> 	Date:   Sat Nov 8 17:05:38 2008 +0100
> 
> 	    sched: optimize sched_clock() a bit
>     
> 	    sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
> 	    variant of __cycles_2_ns().
>     
> 	    Most of the time sched_clock() is called with irqs disabled already.
> 	    The few places that call it with irqs enabled need to be updated.
>     
> 	    Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> and this seems to be one of those calling cases that need to be updated.

That's a commit from 2008.

> Ingo? The call trace is:
> 
> 	BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
> 	caller is native_sched_clock+0x3c/0x68
> 	Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
> 	Call Trace:
> 	[<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
> 	[<ffffffff8101059d>] native_sched_clock+0x3c/0x68
> 	[<ffffffff8101043d>] sched_clock+0x9/0xd
> 	[<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
> 	[<ffffffff81214d71>] get_request+0x1c4/0x2d0
> 	[<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
> 	[<ffffffff81215537>] __make_request+0x338/0x45b
> 	[<ffffffff812147c2>] generic_make_request+0x2bb/0x330
> 	[<ffffffff81214909>] submit_bio+0xd2/0xef
> 	[<ffffffff811413cb>] submit_bh+0xf4/0x116
> 	[<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
> 	[<ffffffff81144875>] block_write_full_page+0x15/0x17
> 	[<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
> 	[<ffffffff810e1f91>] __writepage+0x1a/0x39
> 	[<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
> 	[<ffffffff810e3406>] generic_writepages+0x27/0x29
> 	[<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
> 	[<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
> 	[<ffffffff811d0cba>] kjournald2+0x147/0x37a
> 
> (from the bugzilla thing)

The warning was introduced by this fresh commit (and a followup commit) merged 
in the .35 merge window:

 | commit 9195291e5f05e01d67f9a09c756b8aca8f009089
 | Author: Divyesh Shah <dpshah@google.com>
 | Date:   Thu Apr 1 15:01:41 2010 -0700
 | 
 |     blkio: Increment the blkio cgroup stats for real now

IIRC Jens posted a fix for the regression. Jens, what's the status of that?

As the code there started using a raw sched_clock() call for block statistics 
purposes, which was a poorly thought out (and buggy) approach:

 - it takes timestamps on different cpus and then compares then, but doesnt 
   consider that sched_clock() is not comparable between CPUs without extra 
   care

 - it doesnt consider the possibility for the sched_clock() result going
   backwards on certain platforms (such as x86)

 - it doesnt consider preemptability

(There's work ongoing to add a clock variant that can be used for such 
purposes, but that's .36 fodder.)

Thanks,

	Ingo

^ permalink raw reply

* pkt_sched: gen_estimator: more fuel for Jarek and Changli
From: Eric Dumazet @ 2010-06-09  6:13 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Changli Gao, David Miller, netdev, Stephen Hemminger,
	Patrick McHardy
In-Reply-To: <1276030354.2439.8.camel@edumazet-laptop>


With un-modified kernel, I ran following scripts on my machine

taskset 01 sh -c "while :;do iptables  -I INPUT -i eth0 -j RATEEST --rateest-name eth0 --rateest-interval 250ms --rateest-ewmalog 1000ms; done" &
taskset 02 sh -c "while :;do iptables  -F INPUT; done" &
taskset 02 sh -c "while :;do tc qdisc del dev eth0 root 2>/dev/null;done" &
taskset 08 sh -c "while :;do tc qdisc add dev eth0 root handle 1: est 250msec 1sec cbq avpkt 1000 rate 1000Mbit bandwidth 1000Mbit 2>/dev/null;done" &


I got following oops in about 10 seconds, and my machine had to be
rebooted, rtnl being locked forever, so many commands block hard in
rtnl_lock()

root      6016  0.0  0.0  2040  536 pts/0    D    07:14   0:00 tc qdisc del dev eth0 root
root      6021  0.0  0.0  2040  676 pts/0    D    07:14   0:00 tc qdisc add dev eth0 root handle 1: est 250msec 1sec cbq avpkt 1000 rate 1
root     19358  0.0  0.0  1752  252 ?        D    07:45   0:00 ip -o link ls dev eth0

[  753.892107] BUG: unable to handle kernel NULL pointer dereference at (null)
[  753.892132] IP: [<c116b6c8>] rb_insert_color+0xc6/0xd0
[  753.892156] *pdpt = 0000000032827001 *pde = 0000000000000000 
[  753.892177] Oops: 0002 [#1] PREEMPT SMP 
[  753.892196] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:01:04.6/class
[  753.892218] Modules linked in: xt_RATEEST iptable_filter ip_tables x_tables ipmi_devintf ipmi_si ipmi_msghandler ipv6 dm_mod button battery ac ehci_hcd uhci_hcd tg3 libphy bnx2x crc32c libcrc32c mdio [last unloaded: x_tables]
[  753.892314] 
[  753.892321] Pid: 5951, comm: tc Not tainted 2.6.35-rc1-00208-g50e3a9a #68 /ProLiant BL460c G6
[  753.892341] EIP: 0060:[<c116b6c8>] EFLAGS: 00010202 CPU: 3
[  753.892356] EIP is at rb_insert_color+0xc6/0xd0
[  753.892368] EAX: 00000000 EBX: f34c1750 ECX: f34c1750 EDX: c1b5a1bc
[  753.892384] ESI: 00000001 EDI: f34c1ae0 EBP: f34a0c0c ESP: f34a0bf8
[  753.892399]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  753.892413] Process tc (pid: 5951, ti=f34a0000 task=f43f2ac0 task.ti=f34a0000)
[  753.892430] Stack:
[  753.892465]  c1292899 c1b5a1bc f34c1aa8 f3ae47f4 f36baf78 f34a0c34 c1292a66 f36baf5c
[  753.892524] <0> 00000098 d8d43110 f36baf2c 00000000 f36baf00 f34a0ca0 00000000 f34a0c6c
[  753.892598] <0> c12aa80c d8d4310c c16ba5a0 00000000 f4160000 c1561fa0 f43f2a00 00000000
[  753.892681] Call Trace:
[  753.892707]  [<c1292899>] ? gen_new_estimator+0x55/0x247
[  753.892736]  [<c1292a66>] ? gen_new_estimator+0x222/0x247
[  753.892765]  [<c12aa80c>] ? qdisc_create+0x1e4/0x273
[  753.892793]  [<c12aabd8>] ? tc_modify_qdisc+0x33d/0x3be
[  753.892822]  [<c12aa89b>] ? tc_modify_qdisc+0x0/0x3be
[  753.892850]  [<c12a1c10>] ? rtnetlink_rcv_msg+0x197/0x1a6
[  753.892880]  [<c132d454>] ? mutex_lock_nested+0x26e/0x288
[  753.892909]  [<c12a1a79>] ? rtnetlink_rcv_msg+0x0/0x1a6
[  753.892938]  [<c12c74ec>] ? netlink_rcv_skb+0x32/0x73
[  753.892966]  [<c12a1a00>] ? rtnetlink_rcv+0x1b/0x22
[  753.892993]  [<c12c7045>] ? netlink_unicast+0x1b3/0x214
[  753.893021]  [<c12c72dc>] ? netlink_sendmsg+0x236/0x243
[  753.893050]  [<c1288262>] ? sock_sendmsg+0xc0/0xdb
[  753.893080]  [<c109f15a>] ? might_fault+0x36/0x70
[  753.893107]  [<c109f15a>] ? might_fault+0x36/0x70
[  753.893134]  [<c109f15a>] ? might_fault+0x36/0x70
[  753.893161]  [<c116f330>] ? _copy_from_user+0x39/0x4d
[  753.893189]  [<c1290a91>] ? verify_iovec+0x3e/0x6d
[  753.893217]  [<c1289b89>] ? sys_sendmsg+0x13f/0x18c
[  753.893244]  [<c12882cd>] ? sockfd_lookup_light+0x19/0x4b
[  753.893274]  [<c1094dea>] ? __lru_cache_add+0x64/0x7b
[  753.893302]  [<c102a200>] ? get_parent_ip+0x9/0x31
[  753.893332]  [<c105a62b>] ? lock_release_non_nested+0x88/0x245
[  753.893362]  [<c109f15a>] ? might_fault+0x36/0x70
[  753.893389]  [<c109f15a>] ? might_fault+0x36/0x70
[  753.893415]  [<c109f15a>] ? might_fault+0x36/0x70
[  753.893443]  [<c1289f62>] ? sys_socketcall+0x163/0x1a3
[  753.893472]  [<c116edd0>] ? trace_hardirqs_on_thunk+0xc/0x10
[  753.893501]  [<c100278c>] ? sysenter_do_call+0x12/0x32
[  753.893537] Code: cb 83 0b 01 89 f0 83 26 fe 8b 55 f0 e8 8e fe ff ff 8b 1f 83 e3 fc 74 0e 8b 33 f7 c6 01 00 00 00 0f 84 61 ff ff ff 8b 55 f0 8b 02 <83> 08 01 58 5a 5b 5e 5f 5d c3 55 89 e5 57 56 89 d6 53 89 c3 83 
[  753.893763] EIP: [<c116b6c8>] rb_insert_color+0xc6/0xd0 SS:ESP 0068:f34a0bf8
[  753.893799] CR2: 0000000000000000
[  753.894062] ---[ end trace da6bae989b9be023 ]---



Triggering the other bug is more difficult :

est_timer() should be interrupted 
(by hard irqs for example), right before spin_lock(e->stats_lock);

Then a caller of gen_kill_estimator() might freed stats_lock and 
est_timer() reference a freed spinlock.

This can be simulated with following patch, to inject a 100 ms delay.

diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index cf8e703..55ba060 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -120,6 +120,8 @@ static void est_timer(unsigned long arg)
                u32 npackets;
                u32 rate;
 
+               for (rate = 0; rate < 100; rate++)
+                       udelay(1000);
                spin_lock(e->stats_lock);
                read_lock(&est_lock);
                if (e->bstats == NULL)

My machine crash almost instantly in spin_lock(e->stats_lock)

I'll post v3 of the patch, with updated Changelog



^ permalink raw reply related

* Re: [PATCH] r8169: fix mdio_read and update mdio_write according to hw specs
From: Francois Romieu @ 2010-06-09  6:18 UTC (permalink / raw)
  To: Timo Teräs; +Cc: netdev, Hayeswang
In-Reply-To: <1276060930-15697-1-git-send-email-timo.teras@iki.fi>

Timo Teräs <timo.teras@iki.fi> :
> Realtek confirmed that a 20us delay is needed after mdio_read and
> mdio_write operations. Reduce the delay in mdio_write, and add it
> to mdio_read too. Also add a comment that the 20us is from hw specs.
> 
> Signed-off-by: Timo Teräs <timo.teras@iki.fi>

Acked-off-by: Francois Romieu <romieu@fr.zoreil.com>

^ permalink raw reply

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Eric Dumazet @ 2010-06-09  6:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Carl Worth, Eric Anholt, Venkatesh Pallipadi,
	Jens Axboe, Dave Airlie, Jesse Barnes, Ingo Molnar,
	David Härdeman, Mauro Carvalho Chehab,
	Linux Kernel Mailing List, Maciej Rutecki, Andrew Morton,
	Kernel Testers List, Network Development, Linux ACPI,
	Linux PM List, Linux SCSI List, Linux Wireless List, DRI
In-Reply-To: <alpine.LFD.2.00.1006081814240.4506-GpypE611fyS63QaFMGN2QEqCLAeBNdoH@public.gmane.org>

Le mardi 08 juin 2010 à 18:53 -0700, Linus Torvalds a écrit :

> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16120
> > Subject	: Oops: 0000 [#1] SMP, unable to handle kernel NULL pointer dereference at (null)
> > Submitter	: Alex Zhavnerchik <alex.vizor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> > Date		: 2010-06-04 09:25 (5 days old)
> > Handled-By	: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> 
> This one seems to have a patch in bugzilla.

Yep, commit 035320d54758e21227987e3aae0d46e7a04f4ddc in David net-2.6
tree, i'll be included in its next pull request.

http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=035320d54758e21227987e3aae0d46e7a04f4ddc

Thanks

^ permalink raw reply

* Re: pkt_sched: gen_estimator: more fuel for Jarek and Changli
From: Jarek Poplawski @ 2010-06-09  6:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, David Miller, netdev, Stephen Hemminger,
	Patrick McHardy
In-Reply-To: <1276063997.2439.650.camel@edumazet-laptop>

On Wed, Jun 09, 2010 at 08:13:17AM +0200, Eric Dumazet wrote:
> 
> With un-modified kernel, I ran following scripts on my machine

Why not modified with your other patch quite obviously needed by
rateest?:

> [PATCH net-2.6] pkt_sched: gen_estimator: add a new lock
> 
> gen_kill_estimator() / gen_new_estimator() is not always called with
> RTNL held.

Btw, I agree with Changli that adding RTNL to rateest (if possible),
and doing the RTNL replacement later, seems more proper.

Jarek P.

^ permalink raw reply

* Re: [PATCH][RFC] Check sk_buff states
From: Andi Kleen @ 2010-06-09  7:14 UTC (permalink / raw)
  To: David VomLehn; +Cc: netdev
In-Reply-To: <20100608003055.GA29405@dvomlehn-lnx2.corp.sa.net>

David VomLehn <dvomlehn@cisco.com> writes:
>
> Some of the errors that can be detected are:
> o	Double-freeing an skb_buff

slab debugging should already catch that (although at higher cost)

I would say the big question is how many bugs such a new infrastructure
find. Did you find any?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply

* Re: [PATCH][RFC] Infrastructure for compact call location representation
From: Andi Kleen @ 2010-06-09  7:17 UTC (permalink / raw)
  To: David VomLehn; +Cc: netdev
In-Reply-To: <20100608003052.GA29377@dvomlehn-lnx2.corp.sa.net>

David VomLehn <dvomlehn@cisco.com> writes:

First your mailer generates broken cc addresses like
"netdev@vger.kernel.org"@cisco.com

> This patch allows the location of a call to be recorded as a small integer,
> with each call location ("callsite") assigned a new value the first time
> the code in that location is executed. Locations can be recorded as a
> an address or as a __FILE__/__LINE__ pair. The later is easier to read, but
> requires significantly more space.

Seems overly complicated.

How about using a hash of __FILE__, __LINE__ instead? 
With some care you can write it in a way that it gets completely
evaluated by gcc at compile time, so it's just a constant then.

That may use a few more bits then, but that's far better
than having so much runtime overhead for this.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply

* RE: [RFC PATCH v7 11/19] Use callback to deal with skb_release_data() specially.
From: Xin, Xiaohui @ 2010-06-09  7:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	davem@davemloft.net, herbert@gondor.apana.org.au,
	jdike@linux.intel.com
In-Reply-To: <1275749773.5238.244.camel@edumazet-laptop>

>-----Original Message-----
>From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
>Sent: Saturday, June 05, 2010 10:56 PM
>To: Xin, Xiaohui
>Cc: netdev@vger.kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>mst@redhat.com; mingo@elte.hu; davem@davemloft.net; herbert@gondor.apana.org.au;
>jdike@linux.intel.com
>Subject: Re: [RFC PATCH v7 11/19] Use callback to deal with skb_release_data() specially.
>
>Le samedi 05 juin 2010 à 18:14 +0800, xiaohui.xin@intel.com a écrit :
>> From: Xin Xiaohui <xiaohui.xin@intel.com>
>>
>> If buffer is external, then use the callback to destruct
>> buffers.
>>
>> Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
>> Signed-off-by: Zhao Yu <yzhao81new@gmail.com>
>> Reviewed-by: Jeff Dike <jdike@linux.intel.com>
>> ---
>>  net/core/skbuff.c |   11 +++++++++++
>>  1 files changed, 11 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 37587f0..418457c 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -385,6 +385,11 @@ static void skb_clone_fraglist(struct sk_buff *skb)
>>
>>  static void skb_release_data(struct sk_buff *skb)
>>  {
>> +	/* check if the skb has external buffers, we have use destructor_arg
>> +	 * here to indicate
>> +	 */
>> +	struct skb_external_page *ext_page = skb_shinfo(skb)->destructor_arg;
>> +
>
>Oh well. This is v7 of your series, and nobody complained yet ?
>
>This is a new cache miss on a _critical_ path.

Ok, I would remove the declaration here to avoid the new cache miss.

>
>
>>  	if (!skb->cloned ||
>>  	    !atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
>>  			       &skb_shinfo(skb)->dataref)) {
>> @@ -397,6 +402,12 @@ static void skb_release_data(struct sk_buff *skb)
>>  		if (skb_has_frags(skb))
>>  			skb_drop_fraglist(skb);
>>
>> +		/* if the skb has external buffers, use destructor here,
>> +		 * since after that skb->head will be kfree, in case skb->head
>> +		 * from external buffer cannot use kfree to destroy.
>> +		 */
>
>Why not deferring here the access to skb_shinfo(skb)->destructor_arg ?

And references skb_shinfo(skb)->destructor_arg here.

>
>> +		if (dev_is_mpassthru(skb->dev) && ext_page && ext_page->dtor)
>> +			ext_page->dtor(ext_page);
>>  		kfree(skb->head);
>>  	}
>>  }
>
>if (dev_is_mpassthru(skb->dev)) {
>	struct skb_external_page *ext_page =
>				skb_shinfo(skb)->destructor_arg;
>	if (ext_page && ext_page->dtor)
>		ext_page->dtor(ext_page);
>}
>
>destructor_arg should me moved before frags[] if you really want to use it.

Thanks for the patch. But why destructor_arg before frags[] is better than after frags[]?
skb_release_data() will reference both of them........

>
>diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>index bf243fc..b136d90 100644
>--- a/include/linux/skbuff.h
>+++ b/include/linux/skbuff.h
>@@ -202,10 +202,11 @@ struct skb_shared_info {
>         */
>        atomic_t        dataref;
>
>-       skb_frag_t      frags[MAX_SKB_FRAGS];
>        /* Intermediate layers must ensure that destructor_arg
>         * remains valid until skb destructor */
>        void *          destructor_arg;
>+
>+       skb_frag_t      frags[MAX_SKB_FRAGS];
> };
>
> /* We divide dataref into two halves.  The higher 16 bits hold references
>
>


^ permalink raw reply

* [PATCH] [ath5k][leds] Ability to disable leds support. If leds support enabled do not force mac802.11 leds layer selection.
From: Dmytro Milinevskyy @ 2010-06-09  7:31 UTC (permalink / raw)
  To: ath5k-devel
  Cc: Jiri Slaby, Nick Kossifidis, Luis R. Rodriguez, Bob Copeland,
	John W. Linville, GeunSik Lim, Greg Kroah-Hartman, Lukas Turek,
	Mark Hindley, Jiri Kosina, Kalle Valo, Keng-Yu Lin, Luca Verdesca,
	Shahar Or, linux-wireless, netdev, linux-kernel

Hi!

Here is the patch to disable ath5k leds support on build stage.
However if the leds support was enabled do not force selection of 802.11 leds layer.
Depency on LEDS_CLASS is kept.

Suggestion given by Pavel Roskin and Bob Copeland applied.

Changes from the previos version of the patch:
 - Use bool directive in Kconfig for switching LEDs support
 - Move ath5k_hw_set_ledstate into led.c and change its semantic to align with other leds routines

Regards,

--Dima

Signed-off-by: Dmytro Milinevskyy <milinevskyy@gmail.com>
---
 drivers/net/wireless/ath/ath5k/Kconfig  |   12 +++++--
 drivers/net/wireless/ath/ath5k/Makefile |    2 +-
 drivers/net/wireless/ath/ath5k/ath5k.h  |   21 +++++++++++-
 drivers/net/wireless/ath/ath5k/attach.c |    2 +-
 drivers/net/wireless/ath/ath5k/base.c   |    6 ++--
 drivers/net/wireless/ath/ath5k/base.h   |   11 ++++--
 drivers/net/wireless/ath/ath5k/gpio.c   |   52 ------------------------------
 drivers/net/wireless/ath/ath5k/led.c    |   53 +++++++++++++++++++++++++++++++
 8 files changed, 94 insertions(+), 65 deletions(-)

diff --git a/drivers/net/wireless/ath/ath5k/Kconfig b/drivers/net/wireless/ath/ath5k/Kconfig
index eb83b7b..6b2a682 100644
--- a/drivers/net/wireless/ath/ath5k/Kconfig
+++ b/drivers/net/wireless/ath/ath5k/Kconfig
@@ -1,9 +1,6 @@
 config ATH5K
 	tristate "Atheros 5xxx wireless cards support"
 	depends on PCI && MAC80211
-	select MAC80211_LEDS
-	select LEDS_CLASS
-	select NEW_LEDS
 	---help---
 	  This module adds support for wireless adapters based on
 	  Atheros 5xxx chipset.
@@ -18,6 +15,15 @@ config ATH5K
 	  If you choose to build a module, it'll be called ath5k. Say M if
 	  unsure.
 
+
+config ATH5K_LEDS
+	bool "Atheros 5xxx wireless cards LEDs support"
+	depends on ATH5K
+	select NEW_LEDS
+	select LEDS_CLASS
+	---help---
+	  Atheros 5xxx LED support.
+
 config ATH5K_DEBUG
 	bool "Atheros 5xxx debugging"
 	depends on ATH5K
diff --git a/drivers/net/wireless/ath/ath5k/Makefile b/drivers/net/wireless/ath/ath5k/Makefile
index cc09595..6d552dd 100644
--- a/drivers/net/wireless/ath/ath5k/Makefile
+++ b/drivers/net/wireless/ath/ath5k/Makefile
@@ -10,8 +10,8 @@ ath5k-y				+= phy.o
 ath5k-y				+= reset.o
 ath5k-y				+= attach.o
 ath5k-y				+= base.o
-ath5k-y				+= led.o
 ath5k-y				+= rfkill.o
 ath5k-y				+= ani.o
 ath5k-$(CONFIG_ATH5K_DEBUG)	+= debug.o
+ath5k-$(CONFIG_ATH5K_LEDS) += led.o
 obj-$(CONFIG_ATH5K)		+= ath5k.o
diff --git a/drivers/net/wireless/ath/ath5k/ath5k.h b/drivers/net/wireless/ath/ath5k/ath5k.h
index 2785946..202d143 100644
--- a/drivers/net/wireless/ath/ath5k/ath5k.h
+++ b/drivers/net/wireless/ath/ath5k/ath5k.h
@@ -1148,11 +1148,31 @@ struct ath5k_hw {
 int ath5k_hw_attach(struct ath5k_softc *sc);
 void ath5k_hw_detach(struct ath5k_hw *ah);
 
+#ifdef CONFIG_ATH5K_LEDS
 /* LED functions */
+void ath5k_hw_set_ledstate(struct ath5k_softc *sc, unsigned int state);
 int ath5k_init_leds(struct ath5k_softc *sc);
 void ath5k_led_enable(struct ath5k_softc *sc);
 void ath5k_led_off(struct ath5k_softc *sc);
 void ath5k_unregister_leds(struct ath5k_softc *sc);
+#else
+static inline void ath5k_hw_set_ledstate(struct ath5k_softc *sc, unsigned int state)
+{
+}
+static inline int ath5k_init_leds(struct ath5k_softc *sc)
+{
+	return 0;
+}
+static inline void ath5k_led_enable(struct ath5k_softc *sc)
+{
+}
+static inline void ath5k_led_off(struct ath5k_softc *sc)
+{
+}
+static inline void ath5k_unregister_leds(struct ath5k_softc *sc)
+{
+}
+#endif
 
 /* Reset Functions */
 int ath5k_hw_nic_wakeup(struct ath5k_hw *ah, int flags, bool initial);
@@ -1233,7 +1253,6 @@ int ath5k_hw_set_slot_time(struct ath5k_hw *ah, unsigned int slot_time);
 int ath5k_hw_init_desc_functions(struct ath5k_hw *ah);
 
 /* GPIO Functions */
-void ath5k_hw_set_ledstate(struct ath5k_hw *ah, unsigned int state);
 int ath5k_hw_set_gpio_input(struct ath5k_hw *ah, u32 gpio);
 int ath5k_hw_set_gpio_output(struct ath5k_hw *ah, u32 gpio);
 u32 ath5k_hw_get_gpio(struct ath5k_hw *ah, u32 gpio);
diff --git a/drivers/net/wireless/ath/ath5k/attach.c b/drivers/net/wireless/ath/ath5k/attach.c
index e0c244b..7930a39 100644
--- a/drivers/net/wireless/ath/ath5k/attach.c
+++ b/drivers/net/wireless/ath/ath5k/attach.c
@@ -336,7 +336,7 @@ int ath5k_hw_attach(struct ath5k_softc *sc)
 	ath5k_hw_init_nfcal_hist(ah);
 
 	/* turn on HW LEDs */
-	ath5k_hw_set_ledstate(ah, AR5K_LED_INIT);
+	ath5k_hw_set_ledstate(sc, AR5K_LED_INIT);
 
 	return 0;
 err_free:
diff --git a/drivers/net/wireless/ath/ath5k/base.c b/drivers/net/wireless/ath/ath5k/base.c
index 2978359..64ea83d 100644
--- a/drivers/net/wireless/ath/ath5k/base.c
+++ b/drivers/net/wireless/ath/ath5k/base.c
@@ -3411,7 +3411,7 @@ static void ath5k_bss_info_changed(struct ieee80211_hw *hw,
 		sc->assoc = bss_conf->assoc;
 		if (sc->opmode == NL80211_IFTYPE_STATION)
 			set_beacon_filter(hw, sc->assoc);
-		ath5k_hw_set_ledstate(sc->ah, sc->assoc ?
+		ath5k_hw_set_ledstate(sc, sc->assoc ?
 			AR5K_LED_ASSOC : AR5K_LED_INIT);
 		if (bss_conf->assoc) {
 			ATH5K_DBG(sc, ATH5K_DEBUG_ANY,
@@ -3444,13 +3444,13 @@ static void ath5k_sw_scan_start(struct ieee80211_hw *hw)
 {
 	struct ath5k_softc *sc = hw->priv;
 	if (!sc->assoc)
-		ath5k_hw_set_ledstate(sc->ah, AR5K_LED_SCAN);
+		ath5k_hw_set_ledstate(sc, AR5K_LED_SCAN);
 }
 
 static void ath5k_sw_scan_complete(struct ieee80211_hw *hw)
 {
 	struct ath5k_softc *sc = hw->priv;
-	ath5k_hw_set_ledstate(sc->ah, sc->assoc ?
+	ath5k_hw_set_ledstate(sc, sc->assoc ?
 		AR5K_LED_ASSOC : AR5K_LED_INIT);
 }
 
diff --git a/drivers/net/wireless/ath/ath5k/base.h b/drivers/net/wireless/ath/ath5k/base.h
index 56221bc..8d289a3 100644
--- a/drivers/net/wireless/ath/ath5k/base.h
+++ b/drivers/net/wireless/ath/ath5k/base.h
@@ -186,9 +186,6 @@ struct ath5k_softc {
 
 	u8			bssidmask[ETH_ALEN];
 
-	unsigned int		led_pin,	/* GPIO pin for driving LED */
-				led_on;		/* pin setting for LED on */
-
 	struct tasklet_struct	restq;		/* reset tasklet */
 
 	unsigned int		rxbufsize;	/* rx size based on mtu */
@@ -196,7 +193,6 @@ struct ath5k_softc {
 	spinlock_t		rxbuflock;
 	u32			*rxlink;	/* link ptr in last RX desc */
 	struct tasklet_struct	rxtq;		/* rx intr tasklet */
-	struct ath5k_led	rx_led;		/* rx led */
 
 	struct list_head	txbuf;		/* transmit buffer */
 	spinlock_t		txbuflock;
@@ -204,7 +200,14 @@ struct ath5k_softc {
 	struct ath5k_txq	txqs[AR5K_NUM_TX_QUEUES];	/* tx queues */
 	struct ath5k_txq	*txq;		/* main tx queue */
 	struct tasklet_struct	txtq;		/* tx intr tasklet */
+
+
+#ifdef CONFIG_ATH5K_LEDS
+	unsigned int		led_pin,	/* GPIO pin for driving LED */
+				led_on;		/* pin setting for LED on */
+	struct ath5k_led	rx_led;		/* rx led */
 	struct ath5k_led	tx_led;		/* tx led */
+#endif
 
 	struct ath5k_rfkill	rf_kill;
 
diff --git a/drivers/net/wireless/ath/ath5k/gpio.c b/drivers/net/wireless/ath/ath5k/gpio.c
index 64a27e7..1e8b145 100644
--- a/drivers/net/wireless/ath/ath5k/gpio.c
+++ b/drivers/net/wireless/ath/ath5k/gpio.c
@@ -26,58 +26,6 @@
 #include "base.h"
 
 /*
- * Set led state
- */
-void ath5k_hw_set_ledstate(struct ath5k_hw *ah, unsigned int state)
-{
-	u32 led;
-	/*5210 has different led mode handling*/
-	u32 led_5210;
-
-	ATH5K_TRACE(ah->ah_sc);
-
-	/*Reset led status*/
-	if (ah->ah_version != AR5K_AR5210)
-		AR5K_REG_DISABLE_BITS(ah, AR5K_PCICFG,
-			AR5K_PCICFG_LEDMODE |  AR5K_PCICFG_LED);
-	else
-		AR5K_REG_DISABLE_BITS(ah, AR5K_PCICFG, AR5K_PCICFG_LED);
-
-	/*
-	 * Some blinking values, define at your wish
-	 */
-	switch (state) {
-	case AR5K_LED_SCAN:
-	case AR5K_LED_AUTH:
-		led = AR5K_PCICFG_LEDMODE_PROP | AR5K_PCICFG_LED_PEND;
-		led_5210 = AR5K_PCICFG_LED_PEND | AR5K_PCICFG_LED_BCTL;
-		break;
-
-	case AR5K_LED_INIT:
-		led = AR5K_PCICFG_LEDMODE_PROP | AR5K_PCICFG_LED_NONE;
-		led_5210 = AR5K_PCICFG_LED_PEND;
-		break;
-
-	case AR5K_LED_ASSOC:
-	case AR5K_LED_RUN:
-		led = AR5K_PCICFG_LEDMODE_PROP | AR5K_PCICFG_LED_ASSOC;
-		led_5210 = AR5K_PCICFG_LED_ASSOC;
-		break;
-
-	default:
-		led = AR5K_PCICFG_LEDMODE_PROM | AR5K_PCICFG_LED_NONE;
-		led_5210 = AR5K_PCICFG_LED_PEND;
-		break;
-	}
-
-	/*Write new status to the register*/
-	if (ah->ah_version != AR5K_AR5210)
-		AR5K_REG_ENABLE_BITS(ah, AR5K_PCICFG, led);
-	else
-		AR5K_REG_ENABLE_BITS(ah, AR5K_PCICFG, led_5210);
-}
-
-/*
  * Set GPIO inputs
  */
 int ath5k_hw_set_gpio_input(struct ath5k_hw *ah, u32 gpio)
diff --git a/drivers/net/wireless/ath/ath5k/led.c b/drivers/net/wireless/ath/ath5k/led.c
index 67aa52e..9cff1cf 100644
--- a/drivers/net/wireless/ath/ath5k/led.c
+++ b/drivers/net/wireless/ath/ath5k/led.c
@@ -41,6 +41,7 @@
 
 #include <linux/pci.h>
 #include "ath5k.h"
+#include "reg.h"
 #include "base.h"
 
 #define ATH_SDEVICE(subv,subd) \
@@ -86,6 +87,58 @@ static const struct pci_device_id ath5k_led_devices[] = {
 	{ }
 };
 
+/*
+ * Set led state
+ */
+void ath5k_hw_set_ledstate(struct ath5k_softc *sc, unsigned int state)
+{
+	u32 led;
+	/*5210 has different led mode handling*/
+	u32 led_5210;
+
+	ATH5K_TRACE(sc);
+
+	/*Reset led status*/
+	if (sc->ah->ah_version != AR5K_AR5210)
+		AR5K_REG_DISABLE_BITS(sc->ah, AR5K_PCICFG,
+			AR5K_PCICFG_LEDMODE |  AR5K_PCICFG_LED);
+	else
+		AR5K_REG_DISABLE_BITS(sc->ah, AR5K_PCICFG, AR5K_PCICFG_LED);
+
+	/*
+	 * Some blinking values, define at your wish
+	 */
+	switch (state) {
+	case AR5K_LED_SCAN:
+	case AR5K_LED_AUTH:
+		led = AR5K_PCICFG_LEDMODE_PROP | AR5K_PCICFG_LED_PEND;
+		led_5210 = AR5K_PCICFG_LED_PEND | AR5K_PCICFG_LED_BCTL;
+		break;
+
+	case AR5K_LED_INIT:
+		led = AR5K_PCICFG_LEDMODE_PROP | AR5K_PCICFG_LED_NONE;
+		led_5210 = AR5K_PCICFG_LED_PEND;
+		break;
+
+	case AR5K_LED_ASSOC:
+	case AR5K_LED_RUN:
+		led = AR5K_PCICFG_LEDMODE_PROP | AR5K_PCICFG_LED_ASSOC;
+		led_5210 = AR5K_PCICFG_LED_ASSOC;
+		break;
+
+	default:
+		led = AR5K_PCICFG_LEDMODE_PROM | AR5K_PCICFG_LED_NONE;
+		led_5210 = AR5K_PCICFG_LED_PEND;
+		break;
+	}
+
+	/*Write new status to the register*/
+	if (sc->ah->ah_version != AR5K_AR5210)
+		AR5K_REG_ENABLE_BITS(sc->ah, AR5K_PCICFG, led);
+	else
+		AR5K_REG_ENABLE_BITS(sc->ah, AR5K_PCICFG, led_5210);
+}
+
 void ath5k_led_enable(struct ath5k_softc *sc)
 {
 	if (test_bit(ATH_STAT_LEDSOFT, sc->status)) {
-- 
1.7.1

^ permalink raw reply related

* Re: pkt_sched: gen_estimator: more fuel for Jarek and Changli
From: Eric Dumazet @ 2010-06-09  7:36 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Changli Gao, David Miller, netdev, Stephen Hemminger,
	Patrick McHardy
In-Reply-To: <20100609065134.GA6679@ff.dom.local>

Le mercredi 09 juin 2010 à 06:51 +0000, Jarek Poplawski a écrit :
> On Wed, Jun 09, 2010 at 08:13:17AM +0200, Eric Dumazet wrote:
> > 
> > With un-modified kernel, I ran following scripts on my machine
> 
> Why not modified with your other patch quite obviously needed by
> rateest?:
> 

First patch is obvious, but I cooked same script to trigger both bugs,
because you needed some evidences.

> > [PATCH net-2.6] pkt_sched: gen_estimator: add a new lock
> > 
> > gen_kill_estimator() / gen_new_estimator() is not always called with
> > RTNL held.
> 
> Btw, I agree with Changli that adding RTNL to rateest (if possible),
> and doing the RTNL replacement later, seems more proper.
> 

I wont be the guy adding RTNL to netfilter, thats for sure. That would
be a step backward. 
Sometimes, the 'obvious' fix is also a dumb one.
Do you really think I didnt had this idea too ?

xt_RATEEST is an unfortunate domain intersection (netfilter / sched).

We can solve this using a fine grained lock, instead of interesting
lockdep issues, yet to be discovered.

You can submit your patch, but I wont Ack it, I'll Nack it for all these
reasons.

Why dont we remove all locks we have 'because we can use RTNL and be
with it' ?

qdisc_mod_lock could be removed quite instantly, qdisc_base could be
protected by RTNL... And so on...

^ permalink raw reply

* RE: [RFC PATCH v7 08/19] Make __alloc_skb() to get external buffer.
From: Xin, Xiaohui @ 2010-06-09  7:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	davem@davemloft.net, herbert@gondor.apana.org.au,
	jdike@linux.intel.com
In-Reply-To: <1275749604.5238.239.camel@edumazet-laptop>

>-----Original Message-----
>From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
>Sent: Saturday, June 05, 2010 10:53 PM
>To: Xin, Xiaohui
>Cc: netdev@vger.kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>mst@redhat.com; mingo@elte.hu; davem@davemloft.net; herbert@gondor.apana.org.au;
>jdike@linux.intel.com
>Subject: Re: [RFC PATCH v7 08/19] Make __alloc_skb() to get external buffer.
>
>Le samedi 05 juin 2010 à 18:14 +0800, xiaohui.xin@intel.com a écrit :
>> From: Xin Xiaohui <xiaohui.xin@intel.com>
>> 	child->fclone = SKB_FCLONE_UNAVAILABLE;
>>  	}
>> +	/* Record the external buffer info in this field. It's not so good,
>> +	 * but we cannot find another place easily.
>> +	 */
>> +	shinfo->destructor_arg = ext_page;
>> +
>
>
>Yes this is a big problem, its basically using a cache line that was not
>touched before.
>

Did your patch which moves destructor_arg before frags[] also fix this?

Thanks
Xiaohui

^ permalink raw reply

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Jens Axboe @ 2010-06-09  7:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Carl Worth, Eric Anholt, Venkatesh Pallipadi,
	Dave Airlie, Jesse Barnes, Ingo Molnar, David Härdeman,
	Mauro Carvalho Chehab, Eric Dumazet, Linux Kernel Mailing List,
	Maciej Rutecki, Andrew Morton, Kernel Testers List,
	Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
	Linux Wireless List, DRI
In-Reply-To: <alpine.LFD.2.00.1006081814240.4506@i5.linux-foundation.org>

On 2010-06-09 03:53, Linus Torvalds wrote:
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16129
>> Subject	: BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
>> Submitter	: Jan Kreuzer <kontrollator@gmx.de>
>> Date		: 2010-06-05 06:15 (4 days old)
> 
> This seems to have been introduced by
> 
> 	commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
> 	Author: Ingo Molnar <mingo@elte.hu>
> 	Date:   Sat Nov 8 17:05:38 2008 +0100
> 
> 	    sched: optimize sched_clock() a bit
>     
> 	    sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
> 	    variant of __cycles_2_ns().
>     
> 	    Most of the time sched_clock() is called with irqs disabled already.
> 	    The few places that call it with irqs enabled need to be updated.
>     
> 	    Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> and this seems to be one of those calling cases that need to be updated.
> 
> Ingo? The call trace is:
> 
> 	BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
> 	caller is native_sched_clock+0x3c/0x68
> 	Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
> 	Call Trace:
> 	[<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
> 	[<ffffffff8101059d>] native_sched_clock+0x3c/0x68
> 	[<ffffffff8101043d>] sched_clock+0x9/0xd
> 	[<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
> 	[<ffffffff81214d71>] get_request+0x1c4/0x2d0
> 	[<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
> 	[<ffffffff81215537>] __make_request+0x338/0x45b
> 	[<ffffffff812147c2>] generic_make_request+0x2bb/0x330
> 	[<ffffffff81214909>] submit_bio+0xd2/0xef
> 	[<ffffffff811413cb>] submit_bh+0xf4/0x116
> 	[<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
> 	[<ffffffff81144875>] block_write_full_page+0x15/0x17
> 	[<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
> 	[<ffffffff810e1f91>] __writepage+0x1a/0x39
> 	[<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
> 	[<ffffffff810e3406>] generic_writepages+0x27/0x29
> 	[<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
> 	[<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
> 	[<ffffffff811d0cba>] kjournald2+0x147/0x37a
> 
> (from the bugzilla thing)

This should be fixed by commit 28f4197e which was merged on friday.

>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16122
>> Subject	: 2.6.35-rc1: WARNING at fs/fs-writeback.c:1142 __mark_inode_dirty+0x103/0x170
>> Submitter	: Larry Finger <Larry.Finger@lwfinger.net>
>> Date		: 2010-06-04 13:18 (5 days old)
> 
> Jens?

Looking into this one.

-- 
Jens Axboe


^ permalink raw reply

* Re: [stable] Please apply ucc_geth patches to 2.6.32-stable
From: Pekka Enberg @ 2010-06-09  8:02 UTC (permalink / raw)
  To: Petri Lehtinen
  Cc: linux-kernel, stable, avorontsov, Linux Netdev List, David Miller
In-Reply-To: <20100609071623.GC20057@colossus>

On Wed, Jun 9, 2010 at 10:16 AM, Petri Lehtinen <petri.lehtinen@inoi.fi> wrote:
> The following three patches fix some serious flaws in the ucc_geth
> network driver in 2.6.32. From oldest to newest:
>
> 7583605b6d29f1f7f6fc505b883328089f3485ad ucc_geth: Fix empty TX queue processing
> 08b5e1c91ce95793c59a59529a362a1bcc81faae ucc_geth: Fix netdev watchdog triggering on link changes
> 34692421bc7d6145ef383b014860f4fde10b7505 ucc_geth: Fix full TX queue processing
>
> They apply cleanly on top of 2.6.32-stable master. They were also
> originally Cc'd to be included in 2.6.32-stable.

You forgot to CC stable@kernel.org, netdev, and the original author of
those patches.

^ permalink raw reply

* Re: pkt_sched: gen_estimator: more fuel for Jarek and Changli
From: Jarek Poplawski @ 2010-06-09  8:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, David Miller, netdev, Stephen Hemminger,
	Patrick McHardy
In-Reply-To: <1276068964.2442.15.camel@edumazet-laptop>

On Wed, Jun 09, 2010 at 09:36:04AM +0200, Eric Dumazet wrote:
> Le mercredi 09 juin 2010 ?? 06:51 +0000, Jarek Poplawski a écrit :
> > On Wed, Jun 09, 2010 at 08:13:17AM +0200, Eric Dumazet wrote:
> > > 
> > > With un-modified kernel, I ran following scripts on my machine
> > 
> > Why not modified with your other patch quite obviously needed by
> > rateest?:
> > 
> 
> First patch is obvious, but I cooked same script to trigger both bugs,
> because you needed some evidences.
> 
> > > [PATCH net-2.6] pkt_sched: gen_estimator: add a new lock
> > > 
> > > gen_kill_estimator() / gen_new_estimator() is not always called with
> > > RTNL held.
> > 
> > Btw, I agree with Changli that adding RTNL to rateest (if possible),
> > and doing the RTNL replacement later, seems more proper.
> > 
> 
> I wont be the guy adding RTNL to netfilter, thats for sure. That would
> be a step backward. 
> Sometimes, the 'obvious' fix is also a dumb one.
> Do you really think I didnt had this idea too ?
> 
> xt_RATEEST is an unfortunate domain intersection (netfilter / sched).
> 
> We can solve this using a fine grained lock, instead of interesting
> lockdep issues, yet to be discovered.
> 
> You can submit your patch, but I wont Ack it, I'll Nack it for all these
> reasons.

I'm not going to submit my patch. The reason for using the RTNL is
the current gen_estimator API. If rateest were done properly it would
have to use it, and I'm astonished Patrick missed it, since it took
place just around this API fix (with Patrick's part) with the patch I
mentioned:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0929c2dd83317813425b937fbc0041013b8685ff

So, the best would be to get Patrick's opinion. Otherwise, of course
your patch is the next alternative.

> 
> Why dont we remove all locks we have 'because we can use RTNL and be
> with it' ?
> 
> qdisc_mod_lock could be removed quite instantly, qdisc_base could be
> protected by RTNL... And so on...
> 

Yes, but it should be done independently from fixing bugs.

Jarek P.

^ permalink raw reply

* RE: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
From: Xin, Xiaohui @ 2010-06-09  8:29 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	davem@davemloft.net, herbert@gondor.apana.org.au,
	jdike@linux.intel.com
In-Reply-To: <20100606161348.427822fb@nehalam>

>-----Original Message-----
>From: Stephen Hemminger [mailto:shemminger@vyatta.com]
>Sent: Monday, June 07, 2010 7:14 AM
>To: Xin, Xiaohui
>Cc: netdev@vger.kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>mst@redhat.com; mingo@elte.hu; davem@davemloft.net; herbert@gondor.apana.org.au;
>jdike@linux.intel.com
>Subject: Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
>
>Still not sure this is a good idea for a couple of reasons:
>
>1. We already have lots of special cases with skb's (frags and fraglist),
>   and skb's travel through a lot of different parts of the kernel.  So any
>   new change like this creates lots of exposed points for new bugs. Look
>   at cases like MD5 TCP and netfilter, and forwarding these SKB's to ipsec
>   and ppp and ...
>

Yes, I agree on your concern at some extent. But the skbs which use external pages in our cases have short travels which start from NIC driver and then forward to the guest immediately. It will not travel into host kernel stack or any filters which may avoid many problems you have mentioned here.

Another point is that we try to make the solution more generic to different NIC drivers, then many drivers may use the way without modifications.

>2. SKB's can have infinite lifetime in the kernel. If these buffers come from
>   a fixed size pool in an external device, they can easily all get tied up
>   if you have a slow listener. What happens then?

The pool is not fixed size, it's the size of usable buffers submitted by guest virtio-net driver. Guest virtio-net will check how much buffers are filled and do resubmit. What a slow listener mean? A slow NIC driver?

Thanks
Xiaohui

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox