Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] ipv4: add a fib_type to fib_info
From: Eric Dumazet @ 2012-10-04 13:32 UTC (permalink / raw)
  To: Chris Clayton; +Cc: David Miller, netdev, gpiez, Dave Jones, Julian Anastasov
In-Reply-To: <506D8A4F.3020008@googlemail.com>

On Thu, 2012-10-04 at 14:08 +0100, Chris Clayton wrote:

> I've tested 3.6.0 with this patch applied and networking in a WinXP KVM 
> client is now working fine. The patch applies cleanly to 3.6.0, so I 
> assume the patch will be forwarded to stable in due course.
> 
> Tested-by: Chris Clayton <chris2553@googlemail.com>

Thanks for testing.

^ permalink raw reply

* Re: [PATCH 0/3] virtio-net: inline header support
From: Paolo Bonzini @ 2012-10-04 13:23 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Thomas Lendacky, kvm, netdev, linux-kernel,
	virtualization, avi, Sasha Levin
In-Reply-To: <87ipaq1jtt.fsf@rustcorp.com.au>

Il 04/10/2012 14:51, Rusty Russell ha scritto:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> Il 04/10/2012 02:11, Rusty Russell ha scritto:
>>>>> There's a reason I haven't done this.  I really, really dislike "my
>>>>> implemention isn't broken" feature bits.  We could have an infinite
>>>>> number of them, for each bug in each device.
>>>>
>>>> However, this bug affects (almost) all implementations and (almost) all
>>>> devices.  It even makes sense to reserve a transport feature bit for it
>>>> instead of a device feature bit.
>>>
>>> Perhaps, but we have to fix the bugs first!
>>
>> Yes. :)  Isn't that what mst's patch does?
>>
>>> As I said, my torture patch broke qemu immediately.  Since noone has
>>> leapt onto fixing that, I'll take a look now...
>>
>> I can look at virtio-scsi.
> 
> Actually, you can't, see my reply to Anthony...
> 
> Message-ID: <87lifm1y1n.fsf@rustcorp.com.au>

    struct virtio_scsi_req_cmd {
        // Read-only
        u8 lun[8];
        u64 id;
        u8 task_attr;
        u8 prio;
        u8 crn;
        char cdb[cdb_size];
        char dataout[];
        // Write-only part
        u32 sense_len;
        u32 residual;
        u16 status_qualifier;
        u8 status;
        u8 response;
        u8 sense[sense_size];
        char datain[];
    };

where cdb_size and sense_size come from configuration space.  The device
right now expects everything before dataout/datain to be in a single
descriptor, but that's in no way part of the spec.  Am I missing
something egregious?

Paolo

^ permalink raw reply

* Re: [PATCH] ipv4: add a fib_type to fib_info
From: Chris Clayton @ 2012-10-04 13:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, gpiez, Dave Jones, Julian Anastasov
In-Reply-To: <1349349926.16011.48.camel@edumazet-glaptop>



On 10/04/12 12:25, Eric Dumazet wrote:
> On Tue, 2012-10-02 at 17:48 +0200, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> On Tue, 2012-10-02 at 17:35 +0200, Eric Dumazet wrote:
>>> On Mon, 2012-10-01 at 22:04 +0200, Eric Dumazet wrote:
>>>> On Mon, 2012-10-01 at 16:01 -0400, David Miller wrote:
>>>
>>>>> If you can find a way to more reliably trigger the case, that would
>>>>> help us immensely.
>>>>
>>>> I am building a KMEMCHECK kernel, as a last try before my night ;)
>>>
>>> This was a total disaster. KMEMCHECK dies horribly on my machine
>>>
>>> David, shouldnt we use a nh_rth_forward instead of a nh_rth_input in
>>> __mkroute_input() ?
>>>
>>> (And change rt_cache_route() as well ?)
>>>
>>> I am testing a patch right now.
>>
>
> OK so I implemented David idea and it seems to work.
>
> Testers are needed, thanks ! ;)
>

I've tested 3.6.0 with this patch applied and networking in a WinXP KVM 
client is now working fine. The patch applies cleanly to 3.6.0, so I 
assume the patch will be forwarded to stable in due course.

Tested-by: Chris Clayton <chris2553@googlemail.com>

> [PATCH] ipv4: add a fib_type to fib_info
>
> commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
> introduced a regression for forwarding.
>
> This was hard to reproduce but the symptom was that packets were
> delivered to local host instead of being forwarded.
>
> David suggested to add fib_type to fib_info so that we dont
> inadvertently share same fib_info for different purposes.
>
> With help from Julian Anastasov who provided very helpful
> hints, reproduced here :
>
> <quote>
>          Can it be a problem related to fib_info reuse
> from different routes. For example, when local IP address
> is created for subnet we have:
>
> broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src
> 192.168.0.1
> 192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
> local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1
>
>          The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
> a reused fib_info structure where we put cached routes.
> The result can be same fib_info for 192.168.0.255 and
> 192.168.0.0/24. RTN_BROADCAST is cached only for input
> routes. Incoming broadcast to 192.168.0.255 can be cached
> and can cause problems for traffic forwarded to 192.168.0.0/24.
> So, this patch should solve the problem because it
> separates the broadcast from unicast traffic.
>
>          And the ip_route_input_slow caching will work for
> local and broadcast input routes (above routes 1 and 3) just
> because they differ in scope and use different fib_info.
>
> </quote>
>
> Many thanks to Chris Clayton for his patience and help.
>
> Reported-by: Chris Clayton <chris2553@googlemail.com>
> Bisected-by: Chris Clayton <chris2553@googlemail.com>
> Reported-by: Dave Jones <davej@redhat.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Julian Anastasov <ja@ssi.bg>
> ---
>   include/net/ip_fib.h     |    1 +
>   net/ipv4/fib_semantics.c |    2 ++
>   2 files changed, 3 insertions(+)
>
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 926142e..9497be1 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -102,6 +102,7 @@ struct fib_info {
>   	unsigned char		fib_dead;
>   	unsigned char		fib_protocol;
>   	unsigned char		fib_scope;
> +	unsigned char		fib_type;
>   	__be32			fib_prefsrc;
>   	u32			fib_priority;
>   	u32			*fib_metrics;
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index 3509065..2677530 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -314,6 +314,7 @@ static struct fib_info *fib_find_info(const struct fib_info *nfi)
>   		    nfi->fib_scope == fi->fib_scope &&
>   		    nfi->fib_prefsrc == fi->fib_prefsrc &&
>   		    nfi->fib_priority == fi->fib_priority &&
> +		    nfi->fib_type == fi->fib_type &&
>   		    memcmp(nfi->fib_metrics, fi->fib_metrics,
>   			   sizeof(u32) * RTAX_MAX) == 0 &&
>   		    ((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_F_DEAD) == 0 &&
> @@ -833,6 +834,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
>   	fi->fib_flags = cfg->fc_flags;
>   	fi->fib_priority = cfg->fc_priority;
>   	fi->fib_prefsrc = cfg->fc_prefsrc;
> +	fi->fib_type = cfg->fc_type;
>
>   	fi->fib_nhs = nhs;
>   	change_nexthops(fi) {
>
>

^ permalink raw reply

* Re: [PATCH 0/3] virtio-net: inline header support
From: Rusty Russell @ 2012-10-04 12:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Michael S. Tsirkin, Thomas Lendacky, kvm, netdev, linux-kernel,
	virtualization, avi, Sasha Levin
In-Reply-To: <506D3610.7000103@redhat.com>

Paolo Bonzini <pbonzini@redhat.com> writes:

> Il 04/10/2012 02:11, Rusty Russell ha scritto:
>> > > There's a reason I haven't done this.  I really, really dislike "my
>> > > implemention isn't broken" feature bits.  We could have an infinite
>> > > number of them, for each bug in each device.
>> >
>> > However, this bug affects (almost) all implementations and (almost) all
>> > devices.  It even makes sense to reserve a transport feature bit for it
>> > instead of a device feature bit.
>>
>> Perhaps, but we have to fix the bugs first!
>
> Yes. :)  Isn't that what mst's patch does?
>
>> As I said, my torture patch broke qemu immediately.  Since noone has
>> leapt onto fixing that, I'll take a look now...
>
> I can look at virtio-scsi.

Actually, you can't, see my reply to Anthony...

Message-ID: <87lifm1y1n.fsf@rustcorp.com.au>

Cheers,
Rusty.

^ permalink raw reply

* Re: policy routing vs dnat replies
From: Stephen Clark @ 2012-10-04 12:39 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Network Development
In-Reply-To: <CALCETrWs4NDb_4KzJqnQVAkWHuaUS3Xrguuk2wwZGGKo8TqSLA@mail.gmail.com>

On 10/03/2012 08:44 PM, Andy Lutomirski wrote:
> I hit an annoying policy routing corner case today.  I have a router
> with two WAN interfaces (and no BGP).  I have policy routing set up so
> that, if a source address matches either of my public networks, then
> outgoing packets use the correct interface.  If neither rule matches
> (e.g. the source is 0.0.0.0 for source address selection), then the
> default route is whichever one I prefer at the moment.  It looks like
> this:
>
> $ ip rule
> 0:	from all lookup local
> 32766:	from all lookup main
> 32767:	from all lookup default
> 40000:	from<net2>  lookup isp2
> 40001:	from<net1>  lookup isp1
> 40010:	from all lookup real_default
>
> The relevant routes are:
>
> default via<gw1>  dev eth0.2  table isp1  src<src1>
> default via<gw2>  dev eth0.3  table isp2  src<src2>
> default via<gw2>  dev eth0.3  table real_default  src<src2>   metric 101
> default via<gw1>  dev eth0.2  table real_default  src<src1>   metric 102
>
> (Yes, this is a bit verbose, but I don't know a more concise way to do this.)
>
> This works nicely: if I specifically bind to one of my public
> addresses, the corresponding WAN link is used, and if not or if I'm
> coming from a private address, then the metrics determine which link
> to use.
>
> DNAT breaks it.  I have a rule:
> -A PREROUTING -i eth0.2 -d<ip1>  -p tcp --dport<port>  -j DNAT --to
> <internal host>
>
> <ip1>  lives on isp1.  Someone sends a SYN.  It gets routed to the
> internal host, and that host sends a SYN/ACK back.  The SYN/ACK has a
> source ip that isn't on net1 or net2, so it matches the 'lookup
> real_default' rule and gets routed to *gw2*.  iptables rewrites the
> source address after the routing decision, and my router sends a
> packet with a source address belonging to isp1 to isp2's gateway.  The
> packet is then dropped.
>
> Is there any way I can either convince iptables to rewrite the source
> address in the prerouting hook or to query the conntrack source
> address from the policy rule?  Is there a better solution?  I'm
> currently using a somewhat gross combination of MARK and fwmark
> matches to work around this problem.  One possibility would be:
>
> Thanks,
> Andy
>
> P.S.  Linux 3.2 (at least) appears to have a bug: the SYN/ACK has
> ctdir ORIGINAL as seen from the the mangle PREROUTING chain.  I'll
> send a real bug report for that if I can reproduce it cleanly on a
> newer kernel.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>    
Well what I did faced with a similar problem was add a higher prio rule
that said if from ip1 lookup isp1.

-- 

"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety."  (Ben Franklin)

"The course of history shows that as a government grows, liberty
decreases."  (Thomas Jefferson)

^ permalink raw reply

* Re: [PATCH] tipc: flow control should not account for sk_rcvbuf
From: Erik Hugne @ 2012-10-04 12:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, Jon Maloy, ying.xue@windriver.com,
	paul.gortmaker@windriver.com
In-Reply-To: <1349346360.16011.36.camel@edumazet-glaptop>

> And this limit is tested _before_ queueing to backlog if socket is owned
> by the user ?
>
> You'll have to demonstrate this in the changelog.
>
> Again, I dont think this patch is safe, we need an explicit limit.
You're right Eric..

Another way of solving it is to increase the default sk_rcvbuf size to
(TIPC_FLOW_CONTROL_WIN * 2 * TIPC_MAX_USER_MSG_SIZE)
at socket creation.

Do you think that would be acceptable?

^ permalink raw reply

* [PATCH] ipv4: add a fib_type to fib_info
From: Eric Dumazet @ 2012-10-04 11:25 UTC (permalink / raw)
  To: David Miller; +Cc: chris2553, netdev, gpiez, Dave Jones, Julian Anastasov
In-Reply-To: <1349192919.12401.778.camel@edumazet-glaptop>

On Tue, 2012-10-02 at 17:48 +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> On Tue, 2012-10-02 at 17:35 +0200, Eric Dumazet wrote:
> > On Mon, 2012-10-01 at 22:04 +0200, Eric Dumazet wrote:
> > > On Mon, 2012-10-01 at 16:01 -0400, David Miller wrote:
> > 
> > > > If you can find a way to more reliably trigger the case, that would
> > > > help us immensely.
> > > 
> > > I am building a KMEMCHECK kernel, as a last try before my night ;)
> > 
> > This was a total disaster. KMEMCHECK dies horribly on my machine
> > 
> > David, shouldnt we use a nh_rth_forward instead of a nh_rth_input in
> > __mkroute_input() ?
> > 
> > (And change rt_cache_route() as well ?)
> > 
> > I am testing a patch right now.
> 

OK so I implemented David idea and it seems to work.

Testers are needed, thanks ! ;)

[PATCH] ipv4: add a fib_type to fib_info

commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
introduced a regression for forwarding.

This was hard to reproduce but the symptom was that packets were
delivered to local host instead of being forwarded.

David suggested to add fib_type to fib_info so that we dont
inadvertently share same fib_info for different purposes.

With help from Julian Anastasov who provided very helpful
hints, reproduced here :

<quote>
        Can it be a problem related to fib_info reuse
from different routes. For example, when local IP address
is created for subnet we have:

broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src
192.168.0.1
192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1

        The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
a reused fib_info structure where we put cached routes.
The result can be same fib_info for 192.168.0.255 and
192.168.0.0/24. RTN_BROADCAST is cached only for input
routes. Incoming broadcast to 192.168.0.255 can be cached
and can cause problems for traffic forwarded to 192.168.0.0/24.
So, this patch should solve the problem because it
separates the broadcast from unicast traffic.

        And the ip_route_input_slow caching will work for
local and broadcast input routes (above routes 1 and 3) just
because they differ in scope and use different fib_info.

</quote>

Many thanks to Chris Clayton for his patience and help.

Reported-by: Chris Clayton <chris2553@googlemail.com>
Bisected-by: Chris Clayton <chris2553@googlemail.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
---
 include/net/ip_fib.h     |    1 +
 net/ipv4/fib_semantics.c |    2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 926142e..9497be1 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -102,6 +102,7 @@ struct fib_info {
 	unsigned char		fib_dead;
 	unsigned char		fib_protocol;
 	unsigned char		fib_scope;
+	unsigned char		fib_type;
 	__be32			fib_prefsrc;
 	u32			fib_priority;
 	u32			*fib_metrics;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 3509065..2677530 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -314,6 +314,7 @@ static struct fib_info *fib_find_info(const struct fib_info *nfi)
 		    nfi->fib_scope == fi->fib_scope &&
 		    nfi->fib_prefsrc == fi->fib_prefsrc &&
 		    nfi->fib_priority == fi->fib_priority &&
+		    nfi->fib_type == fi->fib_type &&
 		    memcmp(nfi->fib_metrics, fi->fib_metrics,
 			   sizeof(u32) * RTAX_MAX) == 0 &&
 		    ((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_F_DEAD) == 0 &&
@@ -833,6 +834,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
 	fi->fib_flags = cfg->fc_flags;
 	fi->fib_priority = cfg->fc_priority;
 	fi->fib_prefsrc = cfg->fc_prefsrc;
+	fi->fib_type = cfg->fc_type;
 
 	fi->fib_nhs = nhs;
 	change_nexthops(fi) {

^ permalink raw reply related

* Re: [PATCH] tipc: flow control should not account for sk_rcvbuf
From: Eric Dumazet @ 2012-10-04 10:26 UTC (permalink / raw)
  To: Erik Hugne
  Cc: netdev@vger.kernel.org, Jon Maloy, ying.xue@windriver.com,
	paul.gortmaker@windriver.com
In-Reply-To: <506D5DEF.9050700@ericsson.com>

On Thu, 2012-10-04 at 11:59 +0200, Erik Hugne wrote:
> > What guarantee do we have this cannot use all kernel memory ?
> >
> > If sk->sk_rcvbuf is not an acceptable limit here, you must use a
> > different limit, but not infinity.
> >
> There is an implicit limit on how much data that can be buffered on each 
> socket, controlled by TIPC_FLOW_CONTROL_WIN.
> 
> This limit is:
> TIPC_FLOW_CONTROL_WIN * 2 * TIPC_MAX_USER_MSG_SIZE

And this limit is tested _before_ queueing to backlog if socket is owned
by the user ?

You'll have to demonstrate this in the changelog.

Again, I dont think this patch is safe, we need an explicit limit.

^ permalink raw reply

* Re: [net-next.git 3/8 (V2)] stmmac: add the initial tx coalesce schema
From: Giuseppe CAVALLARO @ 2012-10-04 10:06 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, netdev
In-Reply-To: <50580995.4040609@st.com>

On 9/18/2012 7:41 AM, Giuseppe CAVALLARO wrote:
> Hello David, Ben,
>
> On 9/14/2012 9:36 AM, Giuseppe CAVALLARO wrote:
>> On 9/13/2012 10:23 PM, David Miller wrote:
>>> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
>>> Date: Tue, 11 Sep 2012 08:55:09 +0200
>>>
>>>> +    unsigned long flags;
>>>> +
>>>> +    spin_lock_irqsave(&priv->tx_lock, flags);
>>>>
>>>> -    spin_lock(&priv->tx_lock);
>>>> +    priv->xstats.tx_clean++;
>>>
>>> You are changing the locking here for the sake of the new timer.
>>>
>>> But timers run in software interrupt context, so this change is
>>> completely unnecessary since NAPI runs in software interrupt context
>>> as well, and neither timers nor NAPI run in hardware interrupts
>>> context.
>>
>> Indeed It can be called by the ISR too in this new implementation.
>> I have added the spin_lock_irqsave/restore otherwise, testing with
>> CONFIG_PROVE_LOOKING, I get the following warning on ARM SMP.
>
> sorry if I disturb you again, any news on these patches?
> Please, let me know.

Hello David,

I'd like to review these patches and resend them again (not now that 
net-next is closed) especially because they improve the driver 
performances and other people can get these benefits.

I've some doubts about what exactly I have to do. Sorry if I bother you.

Some of these patches were also reviewed by Ben long time ago; now there 
is only pending the spinlock modifications where you've had some concerns.

Maybe, I was not so clear in my comments so let me provide you further 
details and please let me know.

The stmmac_tx (to clear the tx resources) is now called by both the sw 
timer and the Interrupt handler.

   Note:
   I have also added a threshold that allows to set the interrupt on
   completion bit after a number of frames (Ben also game me some useful
   info to fix this code). On my tests, using both SH4 and ARM (SMP)
   platforms, this helped to make the driver more reactive on heavy
   traffic. Only using the timer I noticed that the driver losses frames
   in some network benchmarks (with iperf, netperf, nuttcp).

On ARM SMP, with  CONFIG_PROVE_LOOKING enabled, I got the info about the 
inconsistent lock state. Indeed, it makes sense to me to protect the 
stmmac_tx from irq in this new implementation.

What do you think, welcome your feedback.

Best Regards
Peppe

>
> Regards
> Peppe
>
>>
>> [    8.030000]
>> [    8.030000] =================================
>> [    8.030000] [ INFO: inconsistent lock state ]
>> [    8.030000] 3.4.7_stm24_0302-b2000+ #103 Not tainted
>> [    8.030000] ---------------------------------
>> [    8.030000] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
>> [    8.030000] swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
>> [    8.030000]  (&(&priv->tx_lock)->rlock){?.-...}, at: [<802651d8>]
>> stmmac_tx+0x1c/0x388
>> [    8.030000] {HARDIRQ-ON-W} state was registered at:
>> [    8.030000]   [<800562b4>] __lock_acquire+0x638/0x179c
>> [    8.030000]   [<80057884>] lock_acquire+0x60/0x74
>> [    8.030000]   [<80428a08>] _raw_spin_lock+0x40/0x50
>> [    8.030000]   [<802651d8>] stmmac_tx+0x1c/0x388
>> [    8.030000]   [<80026be0>] run_timer_softirq+0x180/0x23c
>> [    8.030000]   [<80020ccc>] __do_softirq+0xa0/0x114
>> [    8.030000]   [<80021204>] irq_exit+0x58/0x7c
>> [    8.030000]   [<8000dc80>] handle_IRQ+0x7c/0xb8
>> [    8.030000]   [<80008464>] gic_handle_irq+0x34/0x58
>> [    8.030000]   [<80429684>] __irq_svc+0x44/0x78
>> [    8.030000]   [<8001c3f4>] vprintk+0x41c/0x480
>> [    8.030000]   [<8042097c>] printk+0x18/0x24
>> [    8.030000]   [<805aef6c>] prepare_namespace+0x1c/0x1a4
>> [    8.030000]   [<805ae980>] kernel_init+0x1c8/0x20c
>> [    8.030000]   [<8000deb8>] kernel_thread_exit+0x0/0x8
>> [    8.030000] irq event stamp: 254745
>> [    8.030000] hardirqs last  enabled at (254744): [<80429240>]
>> _raw_spin_unlock_irqrestore+0x3c/0x6c
>> [    8.030000] hardirqs last disabled at (254745): [<80429674>]
>> __irq_svc+0x34/0x78
>> [    8.030000] softirqs last  enabled at (254741): [<8035d964>]
>> dev_queue_xmit+0x6a4/0x724
>> [    8.030000] softirqs last disabled at (254737): [<8035d2d4>]
>> dev_queue_xmit+0x14/0x724
>> [    8.030000]
>> [    8.030000] other info that might help us debug this:
>> [    8.030000]  Possible unsafe locking scenario:
>> [    8.030000]
>> [    8.030000]        CPU0
>> [    8.030000]        ----
>> [    8.030000]   lock(&(&priv->tx_lock)->rlock);
>> [    8.030000]   <Interrupt>
>> [    8.030000]     lock(&(&priv->tx_lock)->rlock);
>> [    8.030000]
>> [    8.030000]  *** DEADLOCK ***
>>
>>> Therefore, disabling hardware interrupts for this lock is unnecessary
>>> and will decrease performance.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

^ permalink raw reply

* Re: [PATCH] tipc: flow control should not account for sk_rcvbuf
From: Erik Hugne @ 2012-10-04  9:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, Jon Maloy, ying.xue@windriver.com,
	paul.gortmaker@windriver.com
In-Reply-To: <1349342493.16011.32.camel@edumazet-glaptop>

> What guarantee do we have this cannot use all kernel memory ?
>
> If sk->sk_rcvbuf is not an acceptable limit here, you must use a
> different limit, but not infinity.
>
There is an implicit limit on how much data that can be buffered on each 
socket, controlled by TIPC_FLOW_CONTROL_WIN.

This limit is:
TIPC_FLOW_CONTROL_WIN * 2 * TIPC_MAX_USER_MSG_SIZE

//E

^ permalink raw reply

* Re: [PATCH 3/20] drivers/net/can/sja1000/peak_pci.c: fix error return code
From: Marc Kleine-Budde @ 2012-10-04  9:31 UTC (permalink / raw)
  To: Peter Senna Tschudin
  Cc: wg, kernel-janitors, davem, jj, linux-can, netdev, linux-kernel
In-Reply-To: <1349281090-10013-4-git-send-email-peter.senna@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 954 bytes --]

On 10/03/2012 06:17 PM, Peter Senna Tschudin wrote:
> From: Peter Senna Tschudin <peter.senna@gmail.com>
> 
> Convert a nonnegative error return code to a negative one, as returned
> elsewhere in the function.
> 
> A simplified version of the semantic match that finds this problem is as
> follows: (http://coccinelle.lip6.fr/)
> 
> // <smpl>
> (
> if@p1 (\(ret < 0\|ret != 0\))
>  { ... return ret; }
> |
> ret@p1 = 0
> )
> ... when != ret = e1
>     when != &ret
> *if(...)
> {
>   ... when != ret = e2
>       when forall
>  return ret;
> }
> // </smpl>
> 
> Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>

Applied to can.
Tnx, Marc
-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply

* Re: [PATCH 4/20] drivers/net/can/sja1000/peak_pcmcia.c: fix error return code
From: Marc Kleine-Budde @ 2012-10-04  9:30 UTC (permalink / raw)
  To: Peter Senna Tschudin
  Cc: wg, kernel-janitors, linux, linux-can, netdev, linux-kernel
In-Reply-To: <1349281090-10013-3-git-send-email-peter.senna@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 954 bytes --]

On 10/03/2012 06:17 PM, Peter Senna Tschudin wrote:
> From: Peter Senna Tschudin <peter.senna@gmail.com>
> 
> Convert a nonnegative error return code to a negative one, as returned
> elsewhere in the function.
> 
> A simplified version of the semantic match that finds this problem is as
> follows: (http://coccinelle.lip6.fr/)
> 
> // <smpl>
> (
> if@p1 (\(ret < 0\|ret != 0\))
>  { ... return ret; }
> |
> ret@p1 = 0
> )
> ... when != ret = e1
>     when != &ret
> *if(...)
> {
>   ... when != ret = e2
>       when forall
>  return ret;
> }
> // </smpl>
> 
> Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>

Applied to can.
tnx, Marc
-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply

* Re: [PATCH] team: set qdisc_tx_busylock to avoid LOCKDEP splat
From: Jiri Pirko @ 2012-10-04  9:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jiri Pirko
In-Reply-To: <1349342319.16011.30.camel@edumazet-glaptop>

Thu, Oct 04, 2012 at 11:18:39AM CEST, eric.dumazet@gmail.com wrote:
>From: Eric Dumazet <edumazet@google.com>
>
>If a qdisc is installed on a team device, its possible to get
>a lockdep splat under stress, because nested dev_queue_xmit() can
>lock busylock a second time (on a different device, so its a false
>positive)
>
>Avoid this problem using a distinct lock_class_key for team
>devices.
>
>Signed-off-by: Eric Dumazet <edumazet@google.com>
>Cc: Jiri Pirko <jpirko@redhat.com>
>---
> drivers/net/team/team.c |    2 ++
> 1 file changed, 2 insertions(+)
>
>diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>index 5c7547c..d44cca3 100644
>--- a/drivers/net/team/team.c
>+++ b/drivers/net/team/team.c
>@@ -1315,6 +1315,7 @@ static const struct team_option team_options[] = {
> 
> static struct lock_class_key team_netdev_xmit_lock_key;
> static struct lock_class_key team_netdev_addr_lock_key;
>+static struct lock_class_key team_tx_busylock_key;
> 
> static void team_set_lockdep_class_one(struct net_device *dev,
> 				       struct netdev_queue *txq,
>@@ -1327,6 +1328,7 @@ static void team_set_lockdep_class(struct net_device *dev)
> {
> 	lockdep_set_class(&dev->addr_list_lock, &team_netdev_addr_lock_key);
> 	netdev_for_each_tx_queue(dev, team_set_lockdep_class_one, NULL);
>+	dev->qdisc_tx_busylock = &team_tx_busylock_key;
> }
> 
> static int team_init(struct net_device *dev)
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thanks Eric.

Acked-by: Jiri Pirko <jiri@resnulli.us>

^ permalink raw reply

* Re: [PATCH] tipc: flow control should not account for sk_rcvbuf
From: Eric Dumazet @ 2012-10-04  9:21 UTC (permalink / raw)
  To: erik.hugne; +Cc: netdev, jon.maloy, ying.xue, paul.gortmaker
In-Reply-To: <1349342067-27586-1-git-send-email-erik.hugne@ericsson.com>

On Thu, 2012-10-04 at 11:14 +0200, erik.hugne@ericsson.com wrote:
> From: Erik Hugne <erik.hugne@ericsson.com>
> 
> The TIPC flow control is design around message count, and it should not
> account for the sk_rcvbuf when enqueueing messages to the socket
> receive queue.
> 
> This fixes a problem when the sk_add_backlog fails due to this check
> and TIPC_ERR_OVERLOAD is reported back to the sender.
> The sender would then drop it's side of the connection only, leaving
> a stale connection on the other end.
> 
> Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
> ---
>  net/tipc/socket.c |    6 ++----
>  1 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 09dc5b9..02fed90 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -1269,10 +1269,8 @@ static u32 dispatch(struct tipc_port *tport, struct sk_buff *buf)
>  	if (!sock_owned_by_user(sk)) {
>  		res = filter_rcv(sk, buf);
>  	} else {
> -		if (sk_add_backlog(sk, buf, sk->sk_rcvbuf))
> -			res = TIPC_ERR_OVERLOAD;
> -		else
> -			res = TIPC_OK;
> +		__sk_add_backlog(sk, buf);
> +		res = TIPC_OK;
>  	}
>  	bh_unlock_sock(sk);
>  


What guarantee do we have this cannot use all kernel memory ?

If sk->sk_rcvbuf is not an acceptable limit here, you must use a
different limit, but not infinity.

^ permalink raw reply

* [PATCH] team: set qdisc_tx_busylock to avoid LOCKDEP splat
From: Eric Dumazet @ 2012-10-04  9:18 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jiri Pirko

From: Eric Dumazet <edumazet@google.com>

If a qdisc is installed on a team device, its possible to get
a lockdep splat under stress, because nested dev_queue_xmit() can
lock busylock a second time (on a different device, so its a false
positive)

Avoid this problem using a distinct lock_class_key for team
devices.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/team/team.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 5c7547c..d44cca3 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1315,6 +1315,7 @@ static const struct team_option team_options[] = {
 
 static struct lock_class_key team_netdev_xmit_lock_key;
 static struct lock_class_key team_netdev_addr_lock_key;
+static struct lock_class_key team_tx_busylock_key;
 
 static void team_set_lockdep_class_one(struct net_device *dev,
 				       struct netdev_queue *txq,
@@ -1327,6 +1328,7 @@ static void team_set_lockdep_class(struct net_device *dev)
 {
 	lockdep_set_class(&dev->addr_list_lock, &team_netdev_addr_lock_key);
 	netdev_for_each_tx_queue(dev, team_set_lockdep_class_one, NULL);
+	dev->qdisc_tx_busylock = &team_tx_busylock_key;
 }
 
 static int team_init(struct net_device *dev)

^ permalink raw reply related

* [PATCH] tipc: flow control should not account for sk_rcvbuf
From: erik.hugne @ 2012-10-04  9:14 UTC (permalink / raw)
  To: netdev, jon.maloy; +Cc: ying.xue, paul.gortmaker, Erik Hugne

From: Erik Hugne <erik.hugne@ericsson.com>

The TIPC flow control is design around message count, and it should not
account for the sk_rcvbuf when enqueueing messages to the socket
receive queue.

This fixes a problem when the sk_add_backlog fails due to this check
and TIPC_ERR_OVERLOAD is reported back to the sender.
The sender would then drop it's side of the connection only, leaving
a stale connection on the other end.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
---
 net/tipc/socket.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 09dc5b9..02fed90 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1269,10 +1269,8 @@ static u32 dispatch(struct tipc_port *tport, struct sk_buff *buf)
 	if (!sock_owned_by_user(sk)) {
 		res = filter_rcv(sk, buf);
 	} else {
-		if (sk_add_backlog(sk, buf, sk->sk_rcvbuf))
-			res = TIPC_ERR_OVERLOAD;
-		else
-			res = TIPC_OK;
+		__sk_add_backlog(sk, buf);
+		res = TIPC_OK;
 	}
 	bh_unlock_sock(sk);
 
-- 
1.7.5.4

^ permalink raw reply related

* Re: IPVS-DR problem with neigh lookup in 3.6
From: Julian Anastasov @ 2012-10-04  9:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20121004.025808.2302904472507419433.davem@davemloft.net>


	Hello,

On Thu, 4 Oct 2012, David Miller wrote:

> From: Julian Anastasov <ja@ssi.bg>
> Date: Thu, 4 Oct 2012 09:52:47 +0300 (EEST)
> 
> > 	Is it a good idea to keep rt_gateway always
> > valid and to check RTCF_REDIRECTED where needed?
> 
> What exactly do you mean by this?  rt_gateway can only
> be explicit nexthop or zero for local subnet.
> 
> It cannot take on any other value, otherwise routes are
> not properly sharable.

	Oh, yes, that was the reason for value 0.

> > Because adding rt_dst does not look good, it will not
> > help to callforward_do_filter too.
> 
> Adding rt_dst is not to be seriously considered.

	OK, I'll think more for alternatives. May be
if there is a way to request output route that should
not be cached. Then we can set valid rt_gateway for
such private routes, fnhe can be used for info too
but this route should not be cached there. Or we
should cache the entry in fnhe, so that we can detect
that fnhe is reclaimed.

	Currently, we handle NETDEV_UNREGISTER
both for cached and uncached routes, so users like IPVS
can safely request such special routes. IPVS does the
rt caching itself.

	Now the question is how to request such host routes
via flowi. Archives remember for FLOWI_FLAG_RT_NOCACHE, may be
we can port it to the new caching as FLOWI_FLAG_RT_HOST?
FLOWI_FLAG_RT_HOST will mean: can cache in fnhe if present
or return uncached result as second option.

	BTW, I see another place with problem, ip_forward():

if (opt->is_strictroute && opt->nexthop != rt->rt_gateway)

	may be should be

if (opt->is_strictroute && opt->nexthop != rt->rt_gateway &&
    rt->rt_gateway)

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH 19/20] drivers/net/ethernet/marvell/skge.c: fix error return code
From: Peter Senna Tschudin @ 2012-10-04  9:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: mlindner, kernel-janitors, netdev, linux-kernel
In-Reply-To: <20121003092508.6a7da662@nehalam.linuxnetplumber.net>

>
> Thanks for looking into these kind of problems. The contents
> of the patch are correct, but the automated commit message is useless.
> You shouldn't just blindly say what the automated
> script was looking for, you should describe what the bug is so that evaluators
> can decide what the impact is and if it should be backported to stable
> and vendor kernels.
>
> Please resubmit the patchs with a reasonable analysis in the commit message.
> Something like:
>
>   There is a bug in skge driver. If alloc_etherdev() fails, then
>   skge_devinit() will return NULL, and the skge_probe function incorrectly
>   returns success 0. It should return -ENOMEM instead.
>
>

Stephen, I do not want to include function names on the commit
message. What do you think about this updated message, is it
acceptable?

--- --- ---
This patch fixes a bug related to the return value of the function. In some
error cases, the function return non-negative SUCCESS values, when the
correct would be a negative ERROR value.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>
--- --- ---


Thanks,

Peter

-- 
Peter

^ permalink raw reply

* [PATCH] bonding: set qdisc_tx_busylock to avoid LOCKDEP splat
From: Eric Dumazet @ 2012-10-04  9:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jay Vosburgh, Andy Gospodarek

From: Eric Dumazet <edumazet@google.com>

If a qdisc is installed on a bonding device, its possible to get
following lockdep splat under stress :

 =============================================
 [ INFO: possible recursive locking detected ]
 3.6.0+ #211 Not tainted
 ---------------------------------------------
 ping/4876 is trying to acquire lock:
  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830
 
 but task is already holding lock:
  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830
 
 other info that might help us debug this:
  Possible unsafe locking scenario:
 
        CPU0
        ----
   lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
   lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
 
  *** DEADLOCK ***
 
  May be due to missing lock nesting notation
 
 6 locks held by ping/4876:
  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff815e5030>] raw_sendmsg+0x600/0xc30
  #1:  (rcu_read_lock_bh){.+....}, at: [<ffffffff815ba4bd>] ip_finish_output+0x12d/0x870
  #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830
  #3:  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830
  #4:  (&bond->lock){++.?..}, at: [<ffffffffa02128c1>] bond_start_xmit+0x31/0x4b0 [bonding]
  #5:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830
 
 stack backtrace:
 Pid: 4876, comm: ping Not tainted 3.6.0+ #211
 Call Trace:
  [<ffffffff810a0145>] __lock_acquire+0x715/0x1b80
  [<ffffffff810a256b>] ? mark_held_locks+0x9b/0x100
  [<ffffffff810a1bf2>] lock_acquire+0x92/0x1d0
  [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830
  [<ffffffff81726b7c>] _raw_spin_lock+0x3c/0x50
  [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830
  [<ffffffff8106264d>] ? rcu_read_lock_bh_held+0x5d/0x90
  [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830
  [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570
  [<ffffffffa0212a6a>] bond_start_xmit+0x1da/0x4b0 [bonding]
  [<ffffffff815796d0>] dev_hard_start_xmit+0x240/0x6b0
  [<ffffffff81597c6e>] sch_direct_xmit+0xfe/0x2a0
  [<ffffffff8157a249>] dev_queue_xmit+0x199/0x830
  [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570
  [<ffffffff815ba96f>] ip_finish_output+0x5df/0x870
  [<ffffffff815ba4bd>] ? ip_finish_output+0x12d/0x870
  [<ffffffff815bb964>] ip_output+0x54/0xf0
  [<ffffffff815bad48>] ip_local_out+0x28/0x90
  [<ffffffff815bc444>] ip_send_skb+0x14/0x50
  [<ffffffff815bc4b2>] ip_push_pending_frames+0x32/0x40
  [<ffffffff815e536a>] raw_sendmsg+0x93a/0xc30
  [<ffffffff8128d570>] ? selinux_file_send_sigiotask+0x1f0/0x1f0
  [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80
  [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220
  [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80
  [<ffffffff815f6855>] inet_sendmsg+0x125/0x240
  [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220
  [<ffffffff8155cddb>] sock_sendmsg+0xab/0xe0
  [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0
  [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0
  [<ffffffff8155d18c>] __sys_sendmsg+0x37c/0x390
  [<ffffffff81195b2a>] ? fsnotify+0x2ca/0x7e0
  [<ffffffff811958e8>] ? fsnotify+0x88/0x7e0
  [<ffffffff81361f36>] ? put_ldisc+0x56/0xd0
  [<ffffffff8116f98a>] ? fget_light+0x3da/0x510
  [<ffffffff8155f6c4>] sys_sendmsg+0x44/0x80
  [<ffffffff8172fc22>] system_call_fastpath+0x16/0x1b

Avoid this problem using a distinct lock_class_key for bonding
devices.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
---
 drivers/net/bonding/bond_main.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7858c58..b721902 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4826,6 +4826,7 @@ static int bond_check_params(struct bond_params *params)
 
 static struct lock_class_key bonding_netdev_xmit_lock_key;
 static struct lock_class_key bonding_netdev_addr_lock_key;
+static struct lock_class_key bonding_tx_busylock_key;
 
 static void bond_set_lockdep_class_one(struct net_device *dev,
 				       struct netdev_queue *txq,
@@ -4840,6 +4841,7 @@ static void bond_set_lockdep_class(struct net_device *dev)
 	lockdep_set_class(&dev->addr_list_lock,
 			  &bonding_netdev_addr_lock_key);
 	netdev_for_each_tx_queue(dev, bond_set_lockdep_class_one, NULL);
+	dev->qdisc_tx_busylock = &bonding_tx_busylock_key;
 }
 
 /*

^ permalink raw reply related

* Question: Proper place for port details?
From: Stefan Raspl @ 2012-10-04  8:28 UTC (permalink / raw)
  To: netdev

Some network device drivers are capable of collecting further
information on the link partner, e.g. reporting details on the port
in the adjacent switch. Is there a good place to make these
available outside of the kernel? ethtool gives a bit of information
like the link partner's speed or duplex mode. But is adding further
information about the link partner to ethtool desired, or is there a
better place somewhere else?

Ciao,
Stefan

^ permalink raw reply

* Re: [patch v3 02/11] inet_diag: pass inet_diag module to netlink_dump_start
From: Gao feng @ 2012-10-04  8:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, steffen.klassert, netfilter-devel, linux-rdma, netdev,
	linux-crypto, pablo, stephen.hemminger, jengelh
In-Reply-To: <1349327316.16011.11.camel@edumazet-glaptop>

于 2012年10月04日 13:08, Eric Dumazet 写道:
> I believe Pablo suggestion was to make netlink_dump_start()
> automatically pass THIS_MODULE so that we dont need to change all call
> sites ?

Yes, since you and Pablo both think this should be better,
I will resend this patchset.

Thanks!

^ permalink raw reply

* Re: [PATCH 0/3] virtio-net: inline header support
From: Rusty Russell @ 2012-10-04  7:44 UTC (permalink / raw)
  To: Anthony Liguori, Michael S. Tsirkin, Thomas Lendacky
  Cc: Sasha Levin, virtualization, linux-kernel, avi, kvm, netdev
In-Reply-To: <87wqz6kgg4.fsf@codemonkey.ws>

Anthony Liguori <anthony@codemonkey.ws> writes:
>> lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
>> ever going to be merged, so I'm not sure what its status is?  But I'm
>> determined to fix qemu, and hence my torture patch to make sure this
>> doesn't creep in again.
>
> There are even more implementations out there and I'd wager they all
> rely on framing.

Worse, both virtio_blk (for scsi commands) and virtio_scsi explicitly
and inescapably rely on framing.  The spec conflicts clearly with
itself.

Such layering violations are always a mistake, but I can't blame anyone
else for my lack of attention :(

Here's the spec change:
commit 7e74459bb966ccbaad9e4bf361d1178b7f400b79
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Thu Oct 4 17:11:27 2012 +0930

No longer assume framing is independent of messages.  *sniff*

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

--- virtio-spec.txt	2012-10-04 17:13:04.988259234 +0930
+++ virtio-spec.txt.new	2012-10-04 17:12:54.624258969 +0930
@@ -880,19 +880,19 @@
 
   Message Framing
 
-The descriptors used for a buffer should not effect the semantics
-of the message, except for the total length of the buffer. For
-example, a network buffer consists of a 10 byte header followed
-by the network packet. Whether this is presented in the ring
-descriptor chain as (say) a 10 byte buffer and a 1514 byte
-buffer, or a single 1524 byte buffer, or even three buffers,
-should have no effect.
+Unless stated otherwise, it is expected that headers within a
+message are contained within their own descriptors. For example,
+a network buffer consists of a 10 or 12 byte header followed by
+the network packet. An implementation should expect that this
+header will be within the first descriptor, and that the
+remainder of the data will begin on the second descriptor.
 
-In particular, no implementation should use the descriptor
-boundaries to determine the size of any header in a request.[footnote:
-The current qemu device implementations mistakenly insist that
-the first descriptor cover the header in these cases exactly, so
-a cautious driver should arrange it so.
+[footnote:
+It was previously asserted that framing should be independent of
+message contents, yet invariably drivers layed out messages in
+reliable ways and devices assumed it. In addition, the
+specifications for virtio_blk and virtio_scsi require intuiting
+field lengths from frame boundaries.
 ]
 
   Device Improvements

^ permalink raw reply

* Re: [patch net] sky2: fix rx filter setup on link up
From: Jiri Pirko @ 2012-10-04  7:40 UTC (permalink / raw)
  To: Mirko Lindner
  Cc: Stephen Hemminger, netdev@vger.kernel.org, davem@davemloft.net,
	linux-kernel@vger.kernel.org
In-Reply-To: <175CCF5F49938B4D99B2E3EF7F558EBE1B63898F0B@SC-VEXCH4.marvell.com>

Tue, Sep 18, 2012 at 02:38:52AM CEST, mlindner@marvell.com wrote:
>>Mon, Sep 17, 2012 at 06:12:14PM CEST, shemminger@vyatta.com wrote:
>>>On Mon, 17 Sep 2012 17:10:17 +0200
>>>Jiri Pirko <jiri@resnulli.us> wrote:
>>>
>>>> In my case I have following problem. sky2_set_multicast() sets registers
>>>> GM_MC_ADDR_H[1-4] correctly to:
>>>> 0000 0800 0001 0410
>>>> However, when adapter gets link and sky2_link_up() is called, the values
>>>> are for some reason different:
>>>> 0000 0800 0016 0410
>>>
>>>Rather than papering over the problem, it would be better to
>>>trace back what is setting those registers and fix that code.
>
>>Yes, I did that. No code at sky2.[ch] is writing to this registers other
>>than sky2_set_multicast() and sky2_gmac_reset() (I hooked on sky2_write*()).
>>So I strongly believe this is a HW issue (maybe only issue of my revision
>>"Yukon-2 EC chip revision 2")
>
>I would like to check the registers as soon as I'm back in my office next week and report my findings.
>Could you also please check the hint from Stephen?

Mirko, did you have a chance to look at this?

^ permalink raw reply

* Re: [RFC PATCH 1/2] sctp: fix a typo in prototype of __sctp_rcv_lookup()
From: Nicolas Dichtel @ 2012-10-04  7:17 UTC (permalink / raw)
  To: David Miller; +Cc: linux-sctp, vyasevich, netdev
In-Reply-To: <20121003.162847.312544751568068374.davem@davemloft.net>

Le 03/10/2012 22:28, David Miller a écrit :
> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Date: Wed,  3 Oct 2012 17:43:21 +0200
>
>> Just to avoid confusion when people only reads this prototype.
>>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>
> I think we should apply this one, regardless of what happens
> to patch #2 in this series.
>
Yes, I catch it when working on the second patch, but I should send two seperate 
patches.

^ permalink raw reply

* Good News---Please Reply Now.
From: roberta.miller @ 2012-10-04  4:57 UTC (permalink / raw)


Greeting, I want you to stand as next of kin so that we can transfer very
huge amount of money and I have all the legal documents to back it up.

Regards,
Barrister Ulrich Claypole.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox