Netdev List
* Re: netconf notes and materials
From: David Miller @ 2010-11-08  1:53 UTC
  To: roszenrami; +Cc: netdev
In-Reply-To: <AANLkTimZFfft66L983BbOXb4+LN9yzx2e3=t-kfQyM-_@mail.gmail.com>

From: Rami Rosen <roszenrami@gmail.com>
Date: Sun, 7 Nov 2010 21:06:20 +0200

> David,
> 1)  Great, thanks for the link!
> 
> 2)  Regarding your "Linux Networking Futures 2010" slides:
>  You mention in the fifth slide:
> "XFS is in review state, 2.6.38 likely".
> 
> I suppose you mean the "XPS" patches,
> the Transmit Packet Steering patches by
> Tom Herbert. Or am I wrong and missing something?

You're correct, and the typo is mine :-)

* Re: [Security] [SECURITY] Fix leaking of kernel heap addresses via /proc
From: David Miller @ 2010-11-08  2:01 UTC
  To: andi
  Cc: drosenberg, chas3, tytso, torvalds, kuznet, pekkas, jmorris,
	yoshfuji, kaber, remi.denis-courmont, netdev, security
In-Reply-To: <20101107235610.GE17592@basil.fritz.box>

From: Andi Kleen <andi@firstfloor.org>
Date: Mon, 8 Nov 2010 00:56:10 +0100

> I would just remove the pointers from /proc and supply 
> gdb macros that extract the equivalent information from /proc/kcore.
> This is a bit racy, but for debugging it should be no
> problem to run them multiple times as needed.

I do not think at all that this is tenable for the kind of
things people use the socket pointers for when debugging
problems.

I definitely prefer the inode number to this idea.

* Re: [ovs-dev] Flow Control and Port Mirroring
From: Rusty Russell @ 2010-11-08  3:11 UTC
  To: virtualization
  Cc: dev, kvm, Michael S. Tsirkin, netdev, Jesse Gross, virtualization
In-Reply-To: <20101030025932.GG12842@verge.net.au>

On Sat, 30 Oct 2010 01:29:33 pm Simon Horman wrote:
> [ CCed VHOST contacts ]
> 
> On Thu, Oct 28, 2010 at 01:22:02PM -0700, Jesse Gross wrote:
> > On Thu, Oct 28, 2010 at 4:54 AM, Simon Horman <horms@verge.net.au> wrote:
> > > My reasoning is that in the non-mirroring case the guest is
> > > limited by the external interface through which the packets
> > > eventually flow - that is 1Gbit/s. But in the mirrored case either
> > > there is no flow control or the flow control is acting on the
> > > rate of dummy0, which is essentially infinite.
> > >
> > > Before investigating this any further I wanted to ask if
> > > this behaviour is intentional.
> > 
> > It's not intentional but I can take a guess at what is happening.
> > 
> > When we send the packet to a mirror, the skb is cloned but only the
> > original skb is charged to the sender.  If the original packet is
> > delivered to localhost then it will be freed quickly and no longer
> > accounted for, despite the fact that the "real" packet is still
> > sitting in the transmit queue on the NIC.  The UDP stack will then
> > send the next packet, limited only by the speed of the CPU.
> 
> That would explain what I have observed.

I can't find the thread (what is ovs-dev?), but I think the tap device
has this fundamental feature: you can blast as many packets as you want
through it.

If that's a bad thing, we have to look harder...

Cheers,
Rusty.

* Re: Sky2 2.6.36-09934-g2aab243 DMAR error with tcp timestamp enabled
From: Stephen Hemminger @ 2010-11-08  3:13 UTC
  To: Michael Breuer; +Cc: Stephen Hemminger, Jarek Poplawski, David Miller, netdev
In-Reply-To: <4CD58911.3050201@majjas.com>

On Sat, 06 Nov 2010 12:57:53 -0400
Michael Breuer <mbreuer@majjas.com> wrote:

> Basically, if I enable tcp timestamps (now disabled) I get a sky2 hang. 
> As with the earlier issue the effects are not seen until after a couple 
> days of uptime and seem exacerbated by load.
> 
> I can't 100% confirm that the problem is not occurring without tcp 
> timestamps, but will leave the system up for a while to try to confirm. 
> This didn't occur previously without tcp timestamps enabled, but I also 
> pulled git changes between the two events.
> 
> I'm now also on 2.6.37-rc1.... I did a quick scan and didn't see any 
> obvious commits between 2.6.36-09934 and -rc1 that would have affected this.
> 
>  From the log:
> Nov  2 05:41:54 mail kernel: DRHD: handling fault status reg 2
> Nov  2 05:41:54 mail kernel: DMAR:[DMA Read] Request device [06:00.0] 
> fault addr ffea3000
> Nov  2 05:41:54 mail kernel: DMAR:[fault reason 06] PTE Read access is 
> not set
> Nov  2 05:41:54 mail kernel: sky2 0000:06:00.0: error interrupt 
> status=0x80000000
> Nov  2 05:41:54 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
> Nov  2 05:42:01 mail clamd[9755]: SelfCheck: Database status OK.
> Nov  2 05:42:11 mail root: ping of potter failed
> Nov  2 05:42:16 mail kernel: ------------[ cut here ]------------
> Nov  2 05:42:16 mail kernel: WARNING: at net/sched/sch_generic.c:258 
> dev_watchdog+0x251/0x260()
> Nov  2 05:42:16 mail kernel: Hardware name: System Product Name
> Nov  2 05:42:16 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit 
> queue 0 timed out
> Nov  2 05:42:16 mail kernel: Modules linked in: cpufreq_stats 
> ip6table_filter ip6table_mangle ip6_tables ipt_MASQUERADE iptable_nat 
> nf_nat iptable_mangle iptable_raw ebtable_nat ebtables bridge stp 
> appletalk psnap llc nfsd lockd nfs_acl auth_rpcgss exportfs coretemp 
> sunrpc acpi_cpufreq mperf sit tunnel4 ipt_LOG nf_conntrack_netbios_ns 
> nf_conntrack_ftp xt_DSCP xt_dscp xt_mark nf_conntrack_ipv6 
> nf_defrag_ipv6 xt_state xt_multiport ipv6 kvm_intel kvm 
> snd_hda_codec_analog snd_ens1371 gameport snd_rawmidi snd_ac97_codec 
> snd_hda_intel snd_hda_codec ac97_bus snd_hwdep snd_seq snd_seq_device 
> snd_pcm gspca_spca505 gspca_main snd_timer videodev snd v4l1_compat 
> i2c_i801 sky2 v4l2_compat_ioctl32 iTCO_wdt pcspkr asus_atk0110 
> i7core_edac edac_core soundcore iTCO_vendor_support snd_page_alloc 
> microcode raid456 async_raid6_recov async_pq raid6_pq async_xor xor 
> async_memcpy async_tx raid1 ata_generic firewire_ohci pata_acpi 
> firewire_core crc_itu_t pata_marvell nouveau ttm drm_kms_helper drm 
> i2c_algo_bit i2c_core video output [
> Nov  2 05:42:16 mail kernel: last unloaded: ip6_tables]
> Nov  2 05:42:16 mail kernel: Pid: 0, comm: swapper Tainted: G        W   
> 2.6.36-09934-g2aab243 #44
> Nov  2 05:42:16 mail kernel: Call Trace:
> Nov  2 05:42:16 mail kernel: <IRQ>  [<ffffffff81058a4f>] 
> warn_slowpath_common+0x7f/0xc0
> Nov  2 05:42:16 mail kernel: [<ffffffff81058b46>] 
> warn_slowpath_fmt+0x46/0x50
> Nov  2 05:42:16 mail kernel: [<ffffffff814603d1>] dev_watchdog+0x251/0x260
> Nov  2 05:42:16 mail kernel: [<ffffffff8108a4a6>] ? 
> tick_program_event+0x26/0x30
> Nov  2 05:42:16 mail kernel: [<ffffffff8107eed4>] ? 
> hrtimer_interrupt+0x134/0x240
> Nov  2 05:42:16 mail kernel: [<ffffffff81068ab0>] 
> run_timer_softirq+0x160/0x390
> Nov  2 05:42:16 mail kernel: [<ffffffff8108a368>] ? 
> tick_dev_program_event+0x48/0x110
> Nov  2 05:42:16 mail kernel: [<ffffffff81460180>] ? dev_watchdog+0x0/0x260
> Nov  2 05:42:16 mail kernel: [<ffffffff8105f981>] __do_softirq+0xb1/0x220
> Nov  2 05:42:16 mail kernel: [<ffffffff8100cfdc>] call_softirq+0x1c/0x30
> Nov  2 05:42:16 mail kernel: [<ffffffff8100ea15>] do_softirq+0x65/0xa0
> Nov  2 05:42:16 mail kernel: [<ffffffff8105f845>] irq_exit+0x85/0x90
> Nov  2 05:42:16 mail kernel: [<ffffffff81511d61>] do_IRQ+0x71/0xf0
> Nov  2 05:42:16 mail kernel: [<ffffffff8150a7d3>] ret_from_intr+0x0/0x11
> Nov  2 05:42:16 mail kernel: <EOI>  [<ffffffff812e4165>] ? 
> intel_idle+0xd5/0x170
> Nov  2 05:42:16 mail kernel: [<ffffffff812e4148>] ? intel_idle+0xb8/0x170
> Nov  2 05:42:16 mail kernel: [<ffffffff81425b51>] 
> cpuidle_idle_call+0x91/0x150
> Nov  2 05:42:16 mail kernel: [<ffffffff8100aa8b>] cpu_idle+0xbb/0x150
> Nov  2 05:42:16 mail kernel: [<ffffffff814f1785>] rest_init+0x75/0x80
> Nov  2 05:42:16 mail kernel: [<ffffffff81b4ae9b>] start_kernel+0x3dc/0x3e7
> Nov  2 05:42:16 mail kernel: [<ffffffff81b4a346>] 
> x86_64_start_reservations+0x131/0x135
> Nov  2 05:42:16 mail kernel: [<ffffffff81b4a450>] 
> x86_64_start_kernel+0x106/0x115
> Nov  2 05:42:16 mail kernel: ---[ end trace d9d3a1889f8925bf ]---
> Nov  2 05:42:16 mail kernel: sky2 0000:06:00.0: eth0: tx timeout
> Nov  2 05:42:16 mail kernel: sky2 0000:06:00.0: eth0: transmit ring 29 
> .. 117 report=29 done=29
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Looks like a hardware issue, never saw it before.
Are you running MTU > 1500?
Does turning off TSO help?

One possibility is that NET_IP_ALIGN changed. Now the ethernet header is
aligned and the IP header is not.

* Re: Sky2 2.6.36-09934-g2aab243 DMAR error with tcp timestamp enabled
From: Michael Breuer @ 2010-11-08  3:38 UTC
  To: Stephen Hemminger
  Cc: Stephen Hemminger, Jarek Poplawski, David Miller, netdev
In-Reply-To: <20101107191304.4a6cdfa4@s6510>

On 11/7/2010 10:13 PM, Stephen Hemminger wrote:
> On Sat, 06 Nov 2010 12:57:53 -0400
> Michael Breuer<mbreuer@majjas.com>  wrote:
>
>> Basically, if I enable tcp timestamps (now disabled) I get a sky2 hang.
>> As with the earlier issue the effects are not seen until after a couple
>> days of uptime and seem exacerbated by load.
>>
>> I can't 100% confirm that the problem is not occurring without tcp
>> timestamps, but will leave the system up for a while to try to confirm.
>> This didn't occur previously without tcp timestamps enabled, but I also
>> pulled git changes between the two events.
>>
>> I'm now also on 2.6.37-rc1.... I did a quick scan and didn't see any
>> obvious commits between 2.6.36-09934 and -rc1 that would have affected this.
>>
>>   From the log:
>> Nov  2 05:41:54 mail kernel: DRHD: handling fault status reg 2
>> Nov  2 05:41:54 mail kernel: DMAR:[DMA Read] Request device [06:00.0]
>> fault addr ffea3000
>> Nov  2 05:41:54 mail kernel: DMAR:[fault reason 06] PTE Read access is
>> not set
>> Nov  2 05:41:54 mail kernel: sky2 0000:06:00.0: error interrupt
>> status=0x80000000
>> Nov  2 05:41:54 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
>> Nov  2 05:42:01 mail clamd[9755]: SelfCheck: Database status OK.
>> Nov  2 05:42:11 mail root: ping of potter failed
>> Nov  2 05:42:16 mail kernel: ------------[ cut here ]------------
>> Nov  2 05:42:16 mail kernel: WARNING: at net/sched/sch_generic.c:258
>> dev_watchdog+0x251/0x260()
>> Nov  2 05:42:16 mail kernel: Hardware name: System Product Name
>> Nov  2 05:42:16 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit
>> queue 0 timed out
>> Nov  2 05:42:16 mail kernel: Modules linked in: cpufreq_stats
>> ip6table_filter ip6table_mangle ip6_tables ipt_MASQUERADE iptable_nat
>> nf_nat iptable_mangle iptable_raw ebtable_nat ebtables bridge stp
>> appletalk psnap llc nfsd lockd nfs_acl auth_rpcgss exportfs coretemp
>> sunrpc acpi_cpufreq mperf sit tunnel4 ipt_LOG nf_conntrack_netbios_ns
>> nf_conntrack_ftp xt_DSCP xt_dscp xt_mark nf_conntrack_ipv6
>> nf_defrag_ipv6 xt_state xt_multiport ipv6 kvm_intel kvm
>> snd_hda_codec_analog snd_ens1371 gameport snd_rawmidi snd_ac97_codec
>> snd_hda_intel snd_hda_codec ac97_bus snd_hwdep snd_seq snd_seq_device
>> snd_pcm gspca_spca505 gspca_main snd_timer videodev snd v4l1_compat
>> i2c_i801 sky2 v4l2_compat_ioctl32 iTCO_wdt pcspkr asus_atk0110
>> i7core_edac edac_core soundcore iTCO_vendor_support snd_page_alloc
>> microcode raid456 async_raid6_recov async_pq raid6_pq async_xor xor
>> async_memcpy async_tx raid1 ata_generic firewire_ohci pata_acpi
>> firewire_core crc_itu_t pata_marvell nouveau ttm drm_kms_helper drm
>> i2c_algo_bit i2c_core video output [
>> Nov  2 05:42:16 mail kernel: last unloaded: ip6_tables]
>> Nov  2 05:42:16 mail kernel: Pid: 0, comm: swapper Tainted: G        W
>> 2.6.36-09934-g2aab243 #44
>> Nov  2 05:42:16 mail kernel: Call Trace:
>> Nov  2 05:42:16 mail kernel:<IRQ>   [<ffffffff81058a4f>]
>> warn_slowpath_common+0x7f/0xc0
>> Nov  2 05:42:16 mail kernel: [<ffffffff81058b46>]
>> warn_slowpath_fmt+0x46/0x50
>> Nov  2 05:42:16 mail kernel: [<ffffffff814603d1>] dev_watchdog+0x251/0x260
>> Nov  2 05:42:16 mail kernel: [<ffffffff8108a4a6>] ?
>> tick_program_event+0x26/0x30
>> Nov  2 05:42:16 mail kernel: [<ffffffff8107eed4>] ?
>> hrtimer_interrupt+0x134/0x240
>> Nov  2 05:42:16 mail kernel: [<ffffffff81068ab0>]
>> run_timer_softirq+0x160/0x390
>> Nov  2 05:42:16 mail kernel: [<ffffffff8108a368>] ?
>> tick_dev_program_event+0x48/0x110
>> Nov  2 05:42:16 mail kernel: [<ffffffff81460180>] ? dev_watchdog+0x0/0x260
>> Nov  2 05:42:16 mail kernel: [<ffffffff8105f981>] __do_softirq+0xb1/0x220
>> Nov  2 05:42:16 mail kernel: [<ffffffff8100cfdc>] call_softirq+0x1c/0x30
>> Nov  2 05:42:16 mail kernel: [<ffffffff8100ea15>] do_softirq+0x65/0xa0
>> Nov  2 05:42:16 mail kernel: [<ffffffff8105f845>] irq_exit+0x85/0x90
>> Nov  2 05:42:16 mail kernel: [<ffffffff81511d61>] do_IRQ+0x71/0xf0
>> Nov  2 05:42:16 mail kernel: [<ffffffff8150a7d3>] ret_from_intr+0x0/0x11
>> Nov  2 05:42:16 mail kernel:<EOI>   [<ffffffff812e4165>] ?
>> intel_idle+0xd5/0x170
>> Nov  2 05:42:16 mail kernel: [<ffffffff812e4148>] ? intel_idle+0xb8/0x170
>> Nov  2 05:42:16 mail kernel: [<ffffffff81425b51>]
>> cpuidle_idle_call+0x91/0x150
>> Nov  2 05:42:16 mail kernel: [<ffffffff8100aa8b>] cpu_idle+0xbb/0x150
>> Nov  2 05:42:16 mail kernel: [<ffffffff814f1785>] rest_init+0x75/0x80
>> Nov  2 05:42:16 mail kernel: [<ffffffff81b4ae9b>] start_kernel+0x3dc/0x3e7
>> Nov  2 05:42:16 mail kernel: [<ffffffff81b4a346>]
>> x86_64_start_reservations+0x131/0x135
>> Nov  2 05:42:16 mail kernel: [<ffffffff81b4a450>]
>> x86_64_start_kernel+0x106/0x115
>> Nov  2 05:42:16 mail kernel: ---[ end trace d9d3a1889f8925bf ]---
>> Nov  2 05:42:16 mail kernel: sky2 0000:06:00.0: eth0: tx timeout
>> Nov  2 05:42:16 mail kernel: sky2 0000:06:00.0: eth0: transmit ring 29
>> .. 117 report=29 done=29
>>
> Looks like a hardware issue, never saw it before.
> Are you running MTU>  1500?
> Does turning off TSO help?
>
> One possibility is that NET_IP_ALIGN changed. Now the ethernet header is
> aligned and the IP header is not.
>
MTU=1500
TCP timestamps seem to be the culprit: no issues with them disabled. I 
hit the problem after running about 18 hours with TCP timestamps 
enabled. It has been stable since rebuilding without timestamps, but 
another day would be more telling.

Didn't look into the header alignment - but would that be inconsistent 
with tcp timestamps being involved?

* [PATCH] net dst: need linux/cache.h for ____cacheline_aligned_in_smp.
From: Paul Mundt @ 2010-11-08  3:51 UTC
  To: David Miller; +Cc: John W. Linville, netdev

Presently the b43legacy build fails on an sh randconfig:

In file included from include/net/dst.h:12,
                 from drivers/net/wireless/b43legacy/xmit.c:32:
include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__' before '____cacheline_aligned_in_smp'
include/net/dst_ops.h: In function 'dst_entries_get_fast':
include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_get_slow':
include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_add':
include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_init':
include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_destroy':
include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named 'pcpuc_entries'
make[5]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
make[5]: *** Waiting for unfinished jobs....

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

---

 include/net/dst_ops.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index 1fa5306..51665b3 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -2,6 +2,7 @@
 #define _NET_DST_OPS_H
 #include <linux/types.h>
 #include <linux/percpu_counter.h>
+#include <linux/cache.h>
 
 struct dst_entry;
 struct kmem_cachep;

* Re: [PATCH] net dst: need linux/cache.h for ____cacheline_aligned_in_smp.
From: David Miller @ 2010-11-08  3:58 UTC
  To: lethal; +Cc: linville, netdev
In-Reply-To: <20101108035130.GA11477@linux-sh.org>

From: Paul Mundt <lethal@linux-sh.org>
Date: Mon, 8 Nov 2010 12:51:30 +0900

> Presently the b43legacy build fails on an sh randconfig:
 ...
> Signed-off-by: Paul Mundt <lethal@linux-sh.org>

Applied, thanks Paul.

* Re: [ovs-dev] Flow Control and Port Mirroring
From: Simon Horman @ 2010-11-08  4:59 UTC
  To: Rusty Russell
  Cc: virtualization, Jesse Gross, dev, virtualization, netdev, kvm,
	Michael S. Tsirkin
In-Reply-To: <201011081341.23529.rusty@rustcorp.com.au>

On Mon, Nov 08, 2010 at 01:41:23PM +1030, Rusty Russell wrote:
> On Sat, 30 Oct 2010 01:29:33 pm Simon Horman wrote:
> > [ CCed VHOST contacts ]
> > 
> > On Thu, Oct 28, 2010 at 01:22:02PM -0700, Jesse Gross wrote:
> > > On Thu, Oct 28, 2010 at 4:54 AM, Simon Horman <horms@verge.net.au> wrote:
> > > > My reasoning is that in the non-mirroring case the guest is
> > > > limited by the external interface through which the packets
> > > > eventually flow - that is 1Gbit/s. But in the mirrored case either
> > > > there is no flow control or the flow control is acting on the
> > > > rate of dummy0, which is essentially infinite.
> > > >
> > > > Before investigating this any further I wanted to ask if
> > > > this behaviour is intentional.
> > > 
> > > It's not intentional but I can take a guess at what is happening.
> > > 
> > > When we send the packet to a mirror, the skb is cloned but only the
> > > original skb is charged to the sender.  If the original packet is
> > > delivered to localhost then it will be freed quickly and no longer
> > > accounted for, despite the fact that the "real" packet is still
> > > sitting in the transmit queue on the NIC.  The UDP stack will then
> > > send the next packet, limited only by the speed of the CPU.
> > 
> > That would explain what I have observed.
> 
> I can't find the thread (what is ovs-dev?),

Sorry, yes, it's on ovs-dev.
http://openvswitch.org/pipermail/dev_openvswitch.org/2010-October/003806.html

> but I think the tap device
> has this fundamental feature: you can blast as many packets as you want
> through it.
> 
> If that's a bad thing, we have to look harder...

There does seem to be flow control in the non-mirrored case.
So I suspect it's occurring at the skb level, but that it breaks down when
a clone occurs. It would seem that fragment-level flow control would
help this problem (which is basically what Xen's netback/netfront has),
but by this point I am speculating wildly. I'll try to find out exactly
where the problem is occurring so that we can have a more informed
discussion.

* how to read one udp packet with more than one recvfrom() calls?
From: ranjith kumar @ 2010-11-08  7:08 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 582 bytes --]

Hi,

I have implemented client and server programs using the UDP
protocol (files are attached).
The UDP packet size is 500 bytes.

I want to read these 500 bytes in two calls to recvfrom(): first
reading 100 bytes and then 400 bytes.
How can I do this?

When I changed the third argument of recvfrom() (size_t len)
from 500 to 100, the first 100 bytes were read correctly.
But when I call recvfrom() a second time with len=400, it reads the
first 400 bytes of the "next UDP packet".
Why? Isn't it possible to read one UDP packet in two calls to
recvfrom()/read()?

Thanks in advance.

[-- Attachment #2: client.c --]
[-- Type: application/octet-stream, Size: 1169 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define BUFLEN 500
#define PORT  5000
#define NPACK  5

#define SRV_IP "107.109.38.32"

int main(void)
{
	struct sockaddr_in si_other;
	socklen_t slen = sizeof(si_other);
	int s, i;
	char buf[BUFLEN];

	if ((s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)) == -1) {
		perror("socket");
		exit(1);
	}

	memset(&si_other, 0, sizeof(si_other));
	si_other.sin_family = AF_INET;
	si_other.sin_port = htons(PORT);
	if (inet_aton(SRV_IP, &si_other.sin_addr) == 0) {
		fprintf(stderr, "inet_aton() failed\n");
		exit(1);
	}

	for (i = 0; i < NPACK; i++) {
		printf("Sending packet %d\n", i);
		/* Zero the buffer so the 500-byte datagram carries no stack garbage. */
		memset(buf, 0, sizeof(buf));
		snprintf(buf, sizeof(buf), "This is packet %d\n", i);
		if (sendto(s, buf, BUFLEN, 0, (struct sockaddr *)&si_other, slen) == -1)
			perror("sendto()");
	}

	close(s);
	return 0;
}


[-- Attachment #3: server.c --]
[-- Type: application/octet-stream, Size: 1083 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define BUFLEN 500
#define PORT  5000
#define NPACK  5

/* Print the error for the failed call and exit. */
void diep(const char *s)
{
	perror(s);
	exit(1);
}

int main(void)
{
	struct sockaddr_in si_me, si_other;
	socklen_t slen;
	int s, i;
	ssize_t n;
	char buf[BUFLEN + 1];	/* +1 leaves room for a terminating NUL */

	if ((s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)) == -1)
		diep("socket");

	memset(&si_me, 0, sizeof(si_me));
	si_me.sin_family = AF_INET;
	si_me.sin_port = htons(PORT);
	si_me.sin_addr.s_addr = htonl(INADDR_ANY);
	if (bind(s, (struct sockaddr *)&si_me, sizeof(si_me)) == -1)
		diep("bind");

	for (i = 0; i < NPACK; i++) {
		/* Reset the address length before each call; recvfrom() overwrites it. */
		slen = sizeof(si_other);
		n = recvfrom(s, buf, BUFLEN, 0, (struct sockaddr *)&si_other, &slen);
		if (n == -1)
			diep("recvfrom()");
		buf[n] = '\0';	/* the payload is not guaranteed to be NUL-terminated */
		printf("Received packet from %s:%d\nData: %s\n\n",
		       inet_ntoa(si_other.sin_addr), ntohs(si_other.sin_port), buf);
	}

	close(s);
	return 0;
}

* Re: [RFC v2] ipvs: allow transmit of GRO aggregated skbs
From: Simon Horman @ 2010-11-08  7:31 UTC
  To: Julian Anastasov; +Cc: lvs-devel, netdev, Herbert Xu
In-Reply-To: <20101106142817.GA27212@verge.net.au>

[ CCing Herbert Xu ]

On Sat, Nov 06, 2010 at 11:28:21PM +0900, Simon Horman wrote:
> On Sat, Nov 06, 2010 at 04:18:21PM +0200, Julian Anastasov wrote:
> > 
> > 	Hello,
> > 
> > On Sat, 6 Nov 2010, Simon Horman wrote:
> > 
> > >This is a first attempt at allowing LVS to transmit
> > >skbs of greater than MTU length that have been aggregated by GRO.
> > >
> > >I have lightly tested the ip_vs_dr_xmit() portion of this patch and
> > >although it seems to work I am unsure that netif_needs_gso() is the correct
> > >test to use.
> > 
> > 	ip_forward() uses !skb_is_gso(skb), so maybe it is
> > enough to check for GRO instead of using netif_needs_gso?
> 
> Thanks, I'll look into that.

Hi Julian,

Just to clarify: you think that !skb_is_gso(skb) should be
used in ip_vs_xmit.c? If so, I think that makes sense
and I'll re-spin my patch accordingly.

* Re: [Security] [SECURITY] Fix leaking of kernel heap addresses via /proc
From: Eric Dumazet @ 2010-11-08  7:33 UTC
  To: David Miller
  Cc: andi, drosenberg, chas3, tytso, torvalds, kuznet, pekkas, jmorris,
	yoshfuji, kaber, remi.denis-courmont, netdev, security
In-Reply-To: <20101107.180108.71121019.davem@davemloft.net>

On Sunday, 07 November 2010 at 18:01 -0800, David Miller wrote:
> From: Andi Kleen <andi@firstfloor.org>
> Date: Mon, 8 Nov 2010 00:56:10 +0100
> 
> > I would just remove the pointers from /proc and supply 
> > gdb macros that extract the equivalent information from /proc/kcore.
> > This is a bit racy, but for debugging it should be no
> > problem to run them multiple times as needed.
> 
> I do not think at all that this is tenable for the kind of
> things people use the socket pointers for when debugging
> problems.
> 
> I definitely prefer the inode number to this idea.

We currently have no guarantee that socket inode numbers are unique.
I admit the chances of a clash are low.

When a printk() happens right before a BUG(), how are we going to check
whether the dumped registers are close to the socket involved, if we don't
have access to the machine and have only the crash log?

BTW, any local user can look at "dmesg" and at crash reports. These
reports are even published on remote sites (bugzilla), so hostile
hackers can be fed.

I am OK with deleting socket pointers from /proc files for non-root users
(after checking that things like lsof continue to work correctly).
I don't remember using them while debugging.

BTW, rtnetlink also exposes socket pointers to non-root users:

$ ss -e dst 192.168.20.108
State      Recv-Q Send-Q    Local Address:Port    Peer Address:Port   
ESTAB      0      0         10.150.51.210:46979   192.168.20.108:ssh 
timer:(keepalive,119min,0) ino:136919 sk:ffff88002129d7c0


Mixing /proc pointer removal and printk() pointer removal in the same
patch seems wrong to me. These are very different problems.

* RE: [PATCH v13 10/16] Add a hook to intercept external buffers from NIC driver.
From: Xin, Xiaohui @ 2010-11-08  7:43 UTC
  To: David Miller
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	herbert@gondor.apana.org.au, jdike@linux.intel.com
In-Reply-To: <20101029.132836.115944599.davem@davemloft.net>

I have addressed this issue in the v14 patch set.

Thanks
Xiaohui

>-----Original Message-----
>From: David Miller [mailto:davem@davemloft.net]
>Sent: Saturday, October 30, 2010 4:29 AM
>To: Xin, Xiaohui
>Cc: netdev@vger.kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>mst@redhat.com; mingo@elte.hu; herbert@gondor.apana.org.au; jdike@linux.intel.com
>Subject: Re: [PATCH v13 10/16] Add a hook to intercept external buffers from NIC driver.
>
>From: "Xin, Xiaohui" <xiaohui.xin@intel.com>
>Date: Wed, 27 Oct 2010 09:33:12 +0800
>
>> It does not seem trivial to support it now. Can we support it
>> later, as a TODO on top of our current work?
>
>I would prefer that the feature work properly, rather than only in specific
>cases, before being integrated.

* Re:[PATCH v14 06/17] Use callback to deal with skb_release_data() specially.
From: xiaohui.xin @ 2010-11-08  8:03 UTC
  To: eric.dumazet, netdev, kvm, linux-kernel, mst, mingo, davem,
	herbert, jdi
  Cc: Xin Xiaohui
In-Reply-To: <1288861663.2659.47.camel@edumazet-laptop>

From: Xin Xiaohui <xiaohui.xin@intel.com>

>> Hmm, I suggest you read the comment two lines above.
>>
>> If destructor_arg is now cleared each time we allocate a new skb, then,
>> please move it before dataref in shinfo structure, so that the following
>> memset() does the job efficiently...
>
>
>Something like :
>
>diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>index e6ba898..2dca504 100644
>--- a/include/linux/skbuff.h
>+++ b/include/linux/skbuff.h
>@@ -195,6 +195,9 @@ struct skb_shared_info {
> 	__be32          ip6_frag_id;
> 	__u8		tx_flags;
> 	struct sk_buff	*frag_list;
>+	/* Intermediate layers must ensure that destructor_arg
>+	 * remains valid until skb destructor */
>+	void		*destructor_arg;
> 	struct skb_shared_hwtstamps hwtstamps;
>
> 	/*
>@@ -202,9 +205,6 @@ struct skb_shared_info {
> 	 */
> 	atomic_t	dataref;
>
>-	/* Intermediate layers must ensure that destructor_arg
>-	 * remains valid until skb destructor */
>-	void *		destructor_arg;
> 	/* must be last field, see pskb_expand_head() */
> 	skb_frag_t	frags[MAX_SKB_FRAGS];
> };
>
>

Will that affect the cache line?
Alternatively, we could move the line that clears destructor_arg to the end of __alloc_skb(),
as in the following patch. Which one do you prefer?

Thanks
Xiaohui

---
 net/core/skbuff.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c83b421..df852f2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -224,6 +224,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 
 		child->fclone = SKB_FCLONE_UNAVAILABLE;
 	}
+	shinfo->destructor_arg = NULL;
 out:
 	return skb;
 nodata:
@@ -343,6 +344,13 @@ static void skb_release_data(struct sk_buff *skb)
 		if (skb_has_frags(skb))
 			skb_drop_fraglist(skb);
 
+		if (skb->dev && dev_is_mpassthru(skb->dev)) {
+			struct skb_ext_page *ext_page =
+				skb_shinfo(skb)->destructor_arg;
+			if (ext_page && ext_page->dtor)
+				ext_page->dtor(ext_page);
+		}
+
 		kfree(skb->head);
 	}
 }
-- 
1.7.3

* Re: [PATCH 0/9] Fix leaking of kernel heap addresses in net/
From: Rémi Denis-Courmont @ 2010-11-08  8:04 UTC
  To: ext Dan Rosenberg
  Cc: chas@cmf.nrl.navy.mil, davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	pekkas@netcore.fi, jmorris@namei.org, yoshfuji@linux-ipv6.org,
	kaber@trash.net, netdev@vger.kernel.org, security@kernel.org,
	stable@kernel.org
In-Reply-To: <1289147492.3090.137.camel@Dan>

On Sunday 07 November 2010 18:31:32 ext Dan Rosenberg, you wrote:
> This patch series resolves the leakage of kernel heap addresses to
> userspace via network protocol /proc interfaces and public error
> messages.  Revealing this information is a bad idea from a security
> perspective for a number of reasons, the most obvious of which is it
> provides unprivileged users a mechanism by which to create a structure
> in the kernel heap containing function pointers, obtain the address of
> that structure, and overwrite those function pointers by leveraging
> other vulnerabilities.  It is my hope that by eliminating this
> information leakage, in conjunction with making statically-declared
> function pointer tables read-only (to be done in a separate patch
> series), we can at least add a small hurdle for the exploitation of a
> subset of kernel vulnerabilities.

Seems like this patch series is incomplete to me as far as /proc/net is 
concerned.

-- 
Rémi Denis-Courmont
Nokia Devices R&D, Maemo Software, Helsinki

* Re: [PATCH] firewire: net: rate-limit log spam at transmit failure
From: Stefan Richter @ 2010-11-08  8:12 UTC
  To: Maxim Levitsky; +Cc: linux1394-devel, netdev
In-Reply-To: <1289180517.4318.3.camel@maxim-laptop>

Maxim Levitsky wrote:
> But why is the timeout never set?

It is set to the default 0.1s per IEEE 1394 in core-card.c::fw_card_initialize.

If card->split_timeout_jiffies or card->split_timeout_cycles /ever/ become
zero, it can only be due to a memory-corrupting bug.

> Also, note that I see here that if I send a TCP stream from one system
> to another, then the system that receives the packets (and sends TCP
> acks) still overflows the queue (error 10, and confirmed by printks).

OK, I'll send a stricter version of "firewire: net: throttle TX queue before
running out of tlabels".
-- 
Stefan Richter
-=====-==-=- =-== -=---
http://arcgraph.de/sr/

* Re: Re:[PATCH v14 06/17] Use callback to deal with skb_release_data() specially.
From: Eric Dumazet @ 2010-11-08  8:24 UTC
  To: xiaohui.xin; +Cc: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike
In-Reply-To: <1289203430-5935-1-git-send-email-xiaohui.xin@intel.com>

On Monday, 08 November 2010 at 16:03 +0800, xiaohui.xin@intel.com wrote:
> From: Xin Xiaohui <xiaohui.xin@intel.com>
> 
> >> Hmm, I suggest you read the comment two lines above.
> >>
> >> If destructor_arg is now cleared each time we allocate a new skb, then,
> >> please move it before dataref in shinfo structure, so that the following
> >> memset() does the job efficiently...
> >
> >
> >Something like :
> >
> >diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >index e6ba898..2dca504 100644
> >--- a/include/linux/skbuff.h
> >+++ b/include/linux/skbuff.h
> >@@ -195,6 +195,9 @@ struct skb_shared_info {
> > 	__be32          ip6_frag_id;
> > 	__u8		tx_flags;
> > 	struct sk_buff	*frag_list;
> >+	/* Intermediate layers must ensure that destructor_arg
> >+	 * remains valid until skb destructor */
> >+	void		*destructor_arg;
> > 	struct skb_shared_hwtstamps hwtstamps;
> >
> > 	/*
> >@@ -202,9 +205,6 @@ struct skb_shared_info {
> > 	 */
> > 	atomic_t	dataref;
> >
> >-	/* Intermediate layers must ensure that destructor_arg
> >-	 * remains valid until skb destructor */
> >-	void *		destructor_arg;
> > 	/* must be last field, see pskb_expand_head() */
> > 	skb_frag_t	frags[MAX_SKB_FRAGS];
> > };
> >
> >
> 
> Will that affect the cache line?

What do you mean?

> Or, we can move the line to clear destructor_arg to the end of __alloc_skb().
> It would look like the following. Which one do you prefer?
> 
> Thanks
> Xiaohui
> 
> ---
>  net/core/skbuff.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index c83b421..df852f2 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -224,6 +224,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  
>  		child->fclone = SKB_FCLONE_UNAVAILABLE;
>  	}
> +	shinfo->destructor_arg = NULL;
>  out:
>  	return skb;
>  nodata:

I don't understand why you want to do this.

This adds an instruction and makes the code bigger, with no obvious gain
that I can see on the memory-transaction side.

If it is integrated into the existing memset(), the cost is just an extra
iteration to clear this field.

* RE: Re:[PATCH v14 06/17] Use callback to deal with skb_release_data() specially.
From: Xin, Xiaohui @ 2010-11-08  8:39 UTC
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
	davem@davemloft.net, herbert@gondor.apana.org.au,
	jdike@linux.intel.com
In-Reply-To: <1289204686.2478.375.camel@edumazet-laptop>

>I don't understand why you want to do this.
>
>This adds an instruction and makes the code bigger, with no obvious gain
>that I can see on the memory-transaction side.
>
>If it is integrated into the existing memset(), the cost is just an extra
>iteration to clear this field.
>
OK, thanks for the explanation. I will update the patch with your solution.

Thanks
Xiaohui

* Re: [PATCH 1/1] UDEV - Add 'udevlom' command line param to start_udev
From: Sujit K M @ 2010-11-08  8:42 UTC
  To: Matt Domsch
  Cc: Greg KH, K, Narendra, linux-hotplug@vger.kernel.org,
	netdev@vger.kernel.org, Hargrave, Jordan, Rose, Charles
In-Reply-To: <20101105025848.GA14021@pws490.domsch.com>

> At Linux Plumbers Conference today, this problem space was discussed
> once again, and I believe consensus on an approach was reached.  Here
> goes:

Was the patch a starting point for the discussion?

> * If a 70-persistent-net.rules file sets a name, honor that.  This
>  preserves existing installs.
>
> * If BIOS provides indexes for onboard devices, honor that.
> ** Rename onboard NICs "lom[1-N]" as BIOS reports (# matches chassis labels)
> ** No rename for all others "ethX" (no change for NICs in PCI slots/USB/others)
>
> * If neither are true, do not rename at all.

I would like to know what the difference in nomenclature is here.

>
> * Implementation will be:
> ** Udev rules to be included in upstream udev will read the index
>   value from sysfs (provided by SMBIOS 2.6 info on kernels >= 2.6.36,
>   PCI DSM info at some future point) if present, and rename LOMs
>   based on that index value.  Distros will use these rules by default
>   (Ubuntu and Fedora maintainers on board with the concept; I have
>   not spoken with other distros yet.)
> ** Legacy distros with older udev rules will invoke biosdevname on
>   kernels < 2.6.36 to get the same information, if present, and will
>   rename LOMs based on index value.

How will you manage these scenarios?

* [PATCH] pktgen: correct uninitialized queue_map
From: Junchang Wang @ 2010-11-08  9:19 UTC
  To: davem, robert.olsson, eric.dumazet, joe, andy.shevchenko, backyes; +Cc: netdev


This fixes a bug reported by backyes: the first time pktgen uses
queue_map, it has not yet been initialized by
set_cur_queue_map(pkt_dev).

Signed-off-by: Junchang Wang <junchangwang@gmail.com>
Signed-off-by: Backyes <backyes@mail.ustc.edu.cn>
---

 net/core/pktgen.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 2c0df0f..564d9ba 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2611,8 +2611,8 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 	/* Update any of the values, used when we're incrementing various
 	 * fields.
 	 */
-	queue_map = pkt_dev->cur_queue_map;
 	mod_cur_headers(pkt_dev);
+	queue_map = pkt_dev->cur_queue_map;
 
 	datalen = (odev->hard_header_len + 16) & ~0xf;
 
@@ -2975,8 +2975,8 @@ static struct sk_buff *fill_packet_ipv6(struct net_device *odev,
 	/* Update any of the values, used when we're incrementing various
 	 * fields.
 	 */
-	queue_map = pkt_dev->cur_queue_map;
 	mod_cur_headers(pkt_dev);
+	queue_map = pkt_dev->cur_queue_map;
 
 	skb = __netdev_alloc_skb(odev,
 				 pkt_dev->cur_pkt_size + 64
--

--Junchang

* Re: [RFC v2] ipvs: allow transmit of GRO aggregated skbs
From: Julian Anastasov @ 2010-11-08  9:36 UTC
  To: Simon Horman; +Cc: lvs-devel, netdev, Herbert Xu
In-Reply-To: <20101108073149.GA31384@verge.net.au>


 	Hello,

On Mon, 8 Nov 2010, Simon Horman wrote:

> [ CCing Herbert Xu ]
>
>>>> This is a first attempt at allowing LVS to transmit
>>>> skbs of greater than MTU length that have been aggregated by GRO.
>>>>
>>>> I have lightly tested the ip_vs_dr_xmit() portion of this patch and
>>>> although it seems to work I am unsure that netif_needs_gso() is the correct
>>>> test to use.
>>>
>>> 	ip_forward() uses !skb_is_gso(skb), so maybe it is
>>> enough to check for GRO instead of using netif_needs_gso?
>>
>> Thanks, I'll look into that.
>
> Hi Julian,
>
> just to clarify, you think that !skb_is_gso(skb) should be
> used in ip_vs_xmit.c? If so, yes I think that makes sense
> and I'll re-spin my patch accordingly.

	Yes, I think we should check for !skb_is_gso(skb),
as it looks like the correct check to avoid FRAG_NEEDED after GRO,
but let's wait for confirmation from Herbert Xu.
Also, a !skb->local_df check should help local IPVS clients
that set local_df.

	If you prefer, you can create such a helper in ip_vs_xmit.c:

/* Check if packet exceeds MTU */
static inline int ip_vs_mtu_exceeded(struct sk_buff *skb, unsigned int mtu)
{
	return skb->len > mtu && !skb_is_gso(skb) && !skb->local_df;
}

Regards

--
Julian Anastasov <ja@ssi.bg>

* Re: [Security] [SECURITY] Fix leaking of kernel heap addresses via /proc
From: Andi Kleen @ 2010-11-08  9:43 UTC
  To: Eric Dumazet
  Cc: David Miller, andi, drosenberg, chas3, tytso, torvalds, kuznet,
	pekkas, jmorris, yoshfuji, kaber, remi.denis-courmont, netdev,
	security
In-Reply-To: <1289201612.2478.371.camel@edumazet-laptop>

> When a printk() happens right before a BUG(), how are we going to check
> whether the dumped registers are possibly close to the socket involved, if we
> don't have access to the machine, and only have the crashlog?

Is that really something you do regularly? It seems highly obscure
to me.

Besides, if the kernel has timestamps enabled, you can easily
guess from the timestamps whether the printk and the oops
are related.

-Andi

* Re: [Security] [SECURITY] Fix leaking of kernel heap addresses via /proc
From: Eric Dumazet @ 2010-11-08 10:14 UTC
  To: Andi Kleen
  Cc: David Miller, drosenberg, chas3, tytso, torvalds, kuznet, pekkas,
	jmorris, yoshfuji, kaber, remi.denis-courmont, netdev, security
In-Reply-To: <20101108094358.GA22069@basil.fritz.box>

On Monday, 08 November 2010 at 10:43 +0100, Andi Kleen wrote:
> > When a printk() happens right before a BUG(), how are we going to check
> > whether the dumped registers are possibly close to the socket involved, if we
> > don't have access to the machine, and only have the crashlog?
> 
> Is that really something you do regularly? It seems highly obscure
> to me.

Yes, very regularly. I can find bugs thanks to every bit of information
found in kernel logs, including the code around the fault.

If people now say: "I have a kernel bug, but am not able to provide you
a kernel stack trace and previous printk() messages because of security.
You cannot have access to this machine, and the bug happens once in a
while. Kernel version is also hidden. Please help me."

Oh well, that's a challenge; maybe I'll use the crystal ball I have
somewhere in the attic ;)

* [PATCH] ucc_geth: Fix hung tasks.
From: Joakim Tjernlund @ 2010-11-08 10:23 UTC
  To: linuxppc-dev, netdev, Anton Vorontsov; +Cc: Joakim Tjernlund

We noticed a few hangs like this:

INFO: task ifconfig:572 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ifconfig      D 0ff65760     0   572    369 0x00000000
Call Trace:
[c6157be0] [c6008460] 0xc6008460 (unreliable)
[c6157ca0] [c0008608] __switch_to+0x4c/0x6c
[c6157cb0] [c028fecc] schedule+0x184/0x310
[c6157ce0] [c0290e54] __mutex_lock_slowpath+0xa4/0x150
[c6157d20] [c0290c48] mutex_lock+0x44/0x48
[c6157d30] [c01aba74] phy_stop+0x20/0x70
[c6157d40] [c01aef40] ucc_geth_stop+0x30/0x98
[c6157d60] [c01b18fc] ucc_geth_close+0x9c/0xdc
[c6157d80] [c01db0cc] __dev_close+0xa0/0xd0
[c6157d90] [c01deddc] __dev_change_flags+0x8c/0x148
[c6157db0] [c01def54] dev_change_flags+0x1c/0x64
[c6157dd0] [c0237ac8] devinet_ioctl+0x678/0x784
[c6157e50] [c0239a58] inet_ioctl+0xb0/0xbc
[c6157e60] [c01cafa8] sock_ioctl+0x174/0x2a0
[c6157e80] [c009a16c] vfs_ioctl+0xcc/0xe0
[c6157ea0] [c009a998] do_vfs_ioctl+0xc4/0x79c
[c6157f10] [c009b0b0] sys_ioctl+0x40/0x74
[c6157f40] [c00117c4] ret_from_syscall+0x0/0x38

I THINK this is due to a missing cancel_work_sync() in the driver,
although we cannot be sure. I found this by comparing
ucc_geth with gianfar.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 drivers/net/ucc_geth.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 97f9f7d..6647ed7 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -3556,6 +3556,7 @@ static int ucc_geth_close(struct net_device *dev)
 
 	napi_disable(&ugeth->napi);
 
+	cancel_work_sync(&ugeth->timeout_work);
 	ucc_geth_stop(ugeth);
 
 	free_irq(ugeth->ug_info->uf_info.irq, ugeth->ndev);
-- 
1.7.2.2

* Re: OOM when adding ipv6 route:  How to make available more per-cpu memory?
From: Eric Dumazet @ 2010-11-08 11:02 UTC
  To: Ben Greear; +Cc: NetDev, linux-kernel, Tejun Heo
In-Reply-To: <4CD58B9C.2030006@candelatech.com>

On Saturday, 06 November 2010 at 10:08 -0700, Ben Greear wrote:

> At least I don't see any percpu dumps in dmesg.  I vaguely remember
> someone posting some ipv6 address scalability patches some time back.
> I think they had to hack on /proc fs as well.  I'll see if I can
> dig those up.
> 
> > Make sure udev / hotplug is not the problem, if you create your devices
> > very fast.
> 
> We can create the macvlans w/out problem, though I'm sure that could
> be sped up.  The problem is when we try to add IPv6 addresses to
> them.

I see. Did you check the /proc/sys/net/ipv6/ tunables?

For example, I bet you need to set route/max_size to a bigger value than
the default (4096).

The following works for me:

echo 16384 >/proc/sys/net/ipv6/route/max_size
modprobe dummy numdummies=2000
for a in `seq 1 1999`
do
 ip -6 add add 4444::444:$a/24 dev dummy$a
done

ip -6 ro | wc -l
6008

* Re: [PATCH v4 0/2] Get and Set Feature Reports on HIDRAW (USB and Bluetooth)
From: Antonio Ospite @ 2010-11-08 11:17 UTC
  To: Jiri Kosina
  Cc: Alan Ott, Marcel Holtmann, David S. Miller, Stefan Achatz,
	Alexey Dobriyan, Tejun Heo, Alan Stern, Greg Kroah-Hartman,
	Stephane Chatty, Michael Poole, Bastien Nocera, Eric Dumazet,
	linux-input-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <alpine.LNX.2.00.1011011523150.15851-ztGlSCb7Y1iN3ZZ/Hiejyg@public.gmane.org>

On Mon, 1 Nov 2010 15:23:34 -0400 (EDT)
Jiri Kosina <jkosina-AlSwsSmVLrQ@public.gmane.org> wrote:

> On Wed, 22 Sep 2010, Jiri Kosina wrote:
> 
> > > > > This is version 4. Built against 2.6.35+ revision 320b2b8de12698 .
> > > > > 
> > > > > Alan Ott (2):
> > > > >   HID: Add Support for Setting and Getting Feature Reports from hidraw
> > > > >   Bluetooth: hidp: Add support for hidraw  HIDIOCGFEATURE  and
> > > > >     HIDIOCSFEATURE
[...]
> > > ... Marcel?
> > > 
> > > I'd really like not to miss the 2.6.37 merge window with this.
> > 
> > Seemingly I do not have enough power to get a statement from Marcel
> > these days/weeks.
> > 
> > Davem, would you perhaps be able to step in here?
> 
> Marcel, any word on this patchset by chance?
> 

Hopefully Alan will manage to send a v5 sometime soon. Alan, I don't want
to pressure you; just remember to CC Gustavo F. Padovan (see
MAINTAINERS), as he seems to be the most active Bluetooth maintainer
these days.

Thanks,
   Antonio

-- 
Antonio Ospite
http://ao2.it

PGP public key ID: 0x4553B001

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?
