* Re: [PATCH 1/2] net/netfilter/ipvs: Move #define KMSG_COMPONENT to Makefile
From: Joe Perches @ 2009-10-01 15:55 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Patrick McHardy, David S. Miller, Simon Horman, Julian Anastasov,
Netfilter Developer Mailing List, netdev,
Linux Kernel Mailing List, lvs-devel
In-Reply-To: <alpine.LSU.2.00.0910011016480.24025@obet.zrqbmnf.qr>
On Thu, 2009-10-01 at 10:27 +0200, Jan Engelhardt wrote:
> On Thursday 2009-10-01 02:50, Joe Perches wrote:
> >I imagine an eventual goal of standardizing the default
> >pr_fmt define in kernel.h to
> > #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >so that all pr_<level> calls get this unless otherwise
> >specified.
>
> I like that approach. Saves me adding that line to .c
> files repeatedly.
There aren't so many existing pr_<level> calls
that this couldn't be considered.
Files with pr_<level> without pr_fmt:
$ grep -rPl --include=*.[ch] \
"\bpr_(info|warning|err|alert|notice|crit)\b" * |
xargs grep -Lw "pr_fmt" | wc -l
569
Uses of pr_<level> without pr_fmt:
$ grep -rPl --include=*.[ch] \
"\bpr_(info|warning|err|alert|notice|crit)\b" * |
xargs grep -Lw "pr_fmt" |
xargs grep -P "\bpr_(info|warning|err|alert|notice|crit)\b" |
wc -l
2885
If you look at the pr_<level> calls, it's a nearly
mechanical job to strip the ones that carry some
sort of hard-coded prefix and add a #define pr_fmt
to replace it. Most of the ones without
prefixes might benefit from a standardized
#define pr_fmt(etc...) in kernel.h, so the
actual number of changes isn't that high.
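As a minimal userspace sketch of the pr_fmt idea (assumptions: `demo_printk()` is a hypothetical stand-in for printk() that formats into a buffer, the KERN_<level> prefix is omitted, and KBUILD_MODNAME is hard-coded to a made-up "demo" rather than supplied by Kbuild), the prefix is pasted onto the format string at compile time, so one `#define` per file replaces every hand-written prefix:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Kbuild normally supplies KBUILD_MODNAME per object; hard-coded here. */
#define KBUILD_MODNAME "demo"

/* Per-file prefix; in the kernel it must precede the kernel headers. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Hypothetical printk stand-in: formats into a buffer instead of the log. */
static char last_msg[256];

static int demo_printk(const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(fmt != NULL);
	va_start(ap, fmt);
	n = vsnprintf(last_msg, sizeof(last_msg), fmt, ap);
	va_end(ap);
	return n;
}

/* Same shape as the kernel macro: pr_fmt() is applied to the literal. */
#define pr_info(fmt, ...) demo_printk(pr_fmt(fmt), ##__VA_ARGS__)
```

With this, `pr_info("x=%d\n", 3)` produces `demo: x=3`. A kernel.h default would presumably stay wrapped in `#ifndef pr_fmt`, as the existing pass-through default is, so a file that defines its own pr_fmt before including the kernel headers still overrides it.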
> >Or perhaps better, to get rid of pr_fmt(fmt) altogether and
> >have printk emit the filename/modulename, function and/or
> >code offset by using something like %pS after the level.
> I object to that. You would be spamming the dmesg ring buffer
> with all that info
Of course plain printks could not change; there are far too
many of them to consider doing that globally.
But the printks emitted by pr_<level> might change.
Maybe by setting a bit in the string "<level>" or by
some other mechanism.
> filename: you would have to keep filename strings in the kernel.
> Surely I do not find that thrilling when there are ~18000
> non-arch .[ch] files whose pathnames amount to 542K.
> The same goes for function names.
>
> modulename: obj-y files would only get "<built-in>" or something
> for KBUILD_MODNAME. Printing that to dmesg is not too useful.
The removal of KBUILD_MODNAME could only be done
for builds with CONFIG_KALLSYMS or
CONFIG_DYNAMIC_DEBUG.
It might also be possible to use something like
CONFIG_DYNAMIC_DEBUG to control which modules get
MODNAME, __func__, __LINE__ or offset emitted
by the pr_<level> via some boot/module/sysconf
or FTRACE like parameters.
cheers, Joe
* Re: kernel doc / docbook pdfdocs question
From: Randy Dunlap @ 2009-10-01 15:57 UTC (permalink / raw)
To: Doug Maxey; +Cc: Stephen Hemminger, netdev
In-Reply-To: <27289.1254375747@jerryjeff.riw.enoyolf.org>
On Thu, 01 Oct 2009 00:42:27 -0500 Doug Maxey wrote:
>
> On Wed, 30 Sep 2009 17:30:02 PDT, Stephen Hemminger wrote:
> >On Wed, 30 Sep 2009 14:59:36 -0500
> >Doug Maxey <dwm@enoyolf.org> wrote:
> >
> >>
> >> Randy,
> >>
> >> This may be slightly off topic for this list, but it does involve an
> >> (as yet un-released) network driver. :)
> >>
> >> Do you have any insight that could guide me toward a fix for an issue
> >> seen with some header file constructs when trying to generate a pdf
> >> docbook?
> >>
> >
> >Why clutter docbook output (which is supposed to be about general kernel
> >APIs) with output for data structures in one driver?
>
> It would be a general mechanism, and it would be to document an API.
> There are other subsystems that use DECLARE_BITMAP() (e.g., scsi).
> Just none at the moment that attempt to describe such a member,
> possibly because there isn't a way to document it. Dunno. Build it
> and they will come. There is one party that is interested anyway.
>
> Finally did find where this was getting warned about / tossed, in
> kernel-doc itself. =)
Hi,
Sorry for the delayed reply. I was away yesterday.
What did you find in kernel-doc? Something like the
"cannot understand prototype" message or something else?
Features and support in kernel-doc are mostly added on an as-needed basis.
Now that you have provided a sample, I can try to add support for it,
but it's not exactly a high priority for me... or you can add support
for it to kernel-doc and send a patch for it. :)
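For reference, the construct under discussion looks roughly like the sketch below. The struct, its fields and the helpers are hypothetical, and the macros are userspace stand-ins written to match the kernel's DECLARE_BITMAP() definition. The member is declared through a macro call instead of a plain `type name;` line, which is presumably why kernel-doc's struct parser did not pick up `@features`; whether a particular kernel-doc version handles it is not confirmed here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Userspace stand-ins following the kernel's DECLARE_BITMAP() definition. */
#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(nr)	(((nr) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define DECLARE_BITMAP(name, bits) unsigned long name[BITS_TO_LONGS(bits)]

#define DEMO_NR_FEATURES 70	/* hypothetical; spans two longs on 64-bit */

/**
 * struct demo_adapter - hypothetical driver state, for illustration only
 * @id:       adapter index
 * @features: bitmap of supported features, one bit per feature number
 */
struct demo_adapter {
	int id;
	DECLARE_BITMAP(features, DEMO_NR_FEATURES);
};

/* Set bit @nr in the bitmap (non-atomic, unlike the kernel's set_bit()). */
static void demo_set_feature(struct demo_adapter *a, unsigned int nr)
{
	assert(nr < DEMO_NR_FEATURES);
	a->features[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/* Return whether bit @nr is set. */
static bool demo_test_feature(const struct demo_adapter *a, unsigned int nr)
{
	assert(nr < DEMO_NR_FEATURES);
	return (a->features[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}
```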
---
~Randy
* Re: [PATCH] skge: use unique IRQ name
From: Stephen Hemminger @ 2009-10-01 16:06 UTC (permalink / raw)
To: Michal Schmidt; +Cc: netdev
In-Reply-To: <20091001122720.3822bdd3@leela>
On Thu, 1 Oct 2009 12:27:20 +0200
Michal Schmidt <mschmidt@redhat.com> wrote:
> Most network drivers request their IRQ when the interface is activated.
> skge does it in ->probe() instead, because it can work with two-port
> cards where the two net_devices use the same IRQ. This works fine most
> of the time, except in some situations when the interface gets renamed.
> Consider this example:
>
> 1. modprobe skge
> The card is detected as eth0 and requests IRQ 17. Directory
> /proc/irq/17/eth0 is created.
> 2. There is a udev rule which says this interface should be called
> eth1, so udev renames eth0 -> eth1.
> 3. modprobe 8139too
> The Realtek card is detected as eth0. It will be using IRQ 17 too.
> 4. ip link set eth0 up
> Now 8139too requests IRQ 17.
>
> The result is:
> WARNING: at fs/proc/generic.c:590 proc_register ...
> proc_dir_entry '17/eth0' already registered
> ...
> And "ls /proc/irq/17" shows two subdirectories, both called eth0.
>
> Fix it by using a unique name for skge's IRQ, based on the PCI address.
> The naming from the example then looks like this:
> $ grep skge /proc/interrupts
> 17: 169 IO-APIC-fasteoi skge@0000:00:0a.0, eth0
>
> irqbalance daemon will have to be taught to recognize "skge@" as an
> Ethernet interrupt. This will be a one-liner addition in classify.c. I
> will send a patch to irqbalance if this change is accepted.
>
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
>
> Index: kernel/drivers/net/skge.c
> ===================================================================
> --- kernel.orig/drivers/net/skge.c
> +++ kernel/drivers/net/skge.c
> @@ -3895,6 +3895,7 @@ static int __devinit skge_probe(struct p
> struct net_device *dev, *dev1;
> struct skge_hw *hw;
> int err, using_dac = 0;
> + size_t irq_name_len;
>
> err = pci_enable_device(pdev);
> if (err) {
> @@ -3935,11 +3936,13 @@ static int __devinit skge_probe(struct p
> #endif
>
> err = -ENOMEM;
> - hw = kzalloc(sizeof(*hw), GFP_KERNEL);
> + irq_name_len = strlen(DRV_NAME) + strlen(dev_name(&pdev->dev)) + 2;
> + hw = kzalloc(sizeof(*hw) + irq_name_len, GFP_KERNEL);
> if (!hw) {
> dev_err(&pdev->dev, "cannot allocate hardware struct\n");
> goto err_out_free_regions;
> }
> + sprintf(hw->irq_name, DRV_NAME "@%s", dev_name(&pdev->dev));
I like this with one small change. Please use:
skge@pci:0000:00:02.0
This makes the driver follow the same format as the existing DRM graphics drivers.
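The length arithmetic for the requested format can be checked in isolation. This is a minimal userspace sketch, not the driver code: `make_irq_name()` is hypothetical, malloc() stands in for the extra space the patch allocates after `struct skge_hw`, and the PCI address is the one from the example above. The "@pci:" separator is 5 characters, plus 1 for the terminating NUL, so the constant in the length calculation grows from 2 to 6.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DRV_NAME "skge"

/*
 * Hypothetical helper: build "skge@pci:<pci address>".
 * Size = DRV_NAME + "@pci:" (5) + address + NUL (1), hence "+ 6".
 */
static char *make_irq_name(const char *pci_addr)
{
	size_t len = strlen(DRV_NAME) + strlen(pci_addr) + 6;
	char *name = malloc(len);
	int n;

	if (!name)
		return NULL;
	n = snprintf(name, len, DRV_NAME "@pci:%s", pci_addr);
	/* The formatted string must fill the buffer exactly, minus the NUL. */
	assert(n >= 0 && (size_t)n == len - 1);
	return name;
}
```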
Michal, could you follow up with additional patches for:
1. The sky2 driver has the same issue
2. irqbalance has a list of special drivers that needs to be updated
* Re: [PATCHv2] IPv4 TCP fails to send window scale option when window scale is zero
From: Gilad Ben-Yossef @ 2009-10-01 16:12 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Netdev, Ori Finkalman, Ilpo Järvinen
In-Reply-To: <4AC4BD1E.6060706@gmail.com>
Eric Dumazet wrote:
> Gilad Ben-Yossef a écrit :
>
>> From: Ori Finkelman <ori@comsleep.com>
>>
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index 5200aab..fcd278a 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -361,6 +361,7 @@ static inline int tcp_urg_mode(const struct tcp_sock
>> *tp)
>> #define OPTION_SACK_ADVERTISE (1 << 0)
>> #define OPTION_TS (1 << 1)
>> #define OPTION_MD5 (1 << 2)
>> +#define OPTION_WSCALE (1 << 3)
>>
>
> I manually applied your patch and tested it.
>
> So far so good, it works well.
>
Glad to hear. Thank you both Eric and Ilpo for the review.
> But you'll need to find the correct way to submit a patch so that your mailer doesn't
> mangle the content.
>
> File Documentation/email-clients.txt contains useful tips.
>
>
>
Arrggghh... I thought I had subdued Thunderbird, but it tricked me. My
prefs.js got rewritten somehow. My sincere apologies. This is not my
week for MUAs.
I also noticed I put the signed-off-by in the wrong place...
Next email is version 3 - same content, hopefully correct formatting.
Thanks again.
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.
Web: http://codefidence.com
Cell: +972-52-8260388
Skype: gilad_codefidence
Tel: +972-8-9316883 ext. 201
Fax: +972-8-9316884
Email: gilad@codefidence.com
Check out our Open Source technology and training blog - http://tuxology.net
"Now the world has gone to bed
Darkness won't engulf my head
I can see by infra-red
How I hate the night."
* [PATCH] make TLLAO option for NA packets configurable
From: Cosmin Ratiu @ 2009-10-01 16:16 UTC (permalink / raw)
To: netdev; +Cc: Octavian Purdila
Hello,
This is a patch that adds a sysctl to control the sending of the Target Link
Layer Address Option (TLLAO) with Neighbor Advertisements responding to
unicast NS. The patch was made for kernel 2.6.7 (yes, it is ancient), but the
code is similar in the current kernel and I can rework it if you want it in.
RFC 2461, page 24 suggests that this option should be included with NAs to
avoid a race with the sender clearing its cache after sending a unicast NS,
but before receiving an NA.
It seems there are some Juniper routers (MX series) that expect this option to
be included with all NAs.
Another solution is to always send this option, as it has little overhead.
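To put the "little overhead" in numbers, here is a hypothetical model of the length calculation in the patched NA send path. It reuses the kernel's NDISC_OPT_SPACE() rounding and assumes a base NA of an 8-byte ICMPv6 header plus a 16-byte target address; `na_len()` itself is illustrative, not kernel code. For an Ethernet device (6-byte address) the option costs 8 bytes.

```c
#include <assert.h>

/* Same rounding as the kernel's NDISC_OPT_SPACE(): 2 bytes of type/length
 * plus the link-layer address, rounded up to a multiple of 8 octets. */
#define NDISC_OPT_SPACE(len) (((len) + 2 + 7) & ~7)

/* Assumed base NA: ICMPv6 header (8 bytes) + target address (16 bytes). */
#define NA_BASE_LEN (8 + 16)

/* Hypothetical model of the patched logic: the sysctl forces inc_opt on. */
static int na_len(int inc_opt, int force_tllao, int addr_len)
{
	int len = NA_BASE_LEN;

	assert(addr_len >= 0);
	if (force_tllao)
		inc_opt = 1;
	if (inc_opt && addr_len)
		len += NDISC_OPT_SPACE(addr_len);
	return len;
}
```

So on Ethernet an NA grows from 24 to 32 bytes of ICMPv6 payload when the option is included, and devices without a link-layer address are unaffected.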
Please let me know what you think,
Cosmin.
Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
--- //packages/linux_2.6.7/main/src/include/linux/sysctl.h
+++ /home/z/w1/packages/linux_2.6.7/main/src/include/linux/sysctl.h
@@ -444,6 +444,7 @@
NET_IPV6_IP6FRAG_TIME=23,
NET_IPV6_IP6FRAG_SECRET_INTERVAL=24,
NET_IPV6_MLD_MAX_MSF=25,
+ NET_IPV6_NDISC_FORCE_TLLAO=26,
};
enum {
--- //packages/linux_2.6.7/main/src/include/net/ipv6.h
+++ /home/z/w1/packages/linux_2.6.7/main/src/include/net/ipv6.h
@@ -479,6 +479,7 @@
extern int sysctl_ip6frag_low_thresh;
extern int sysctl_ip6frag_time;
extern int sysctl_ip6frag_secret_interval;
+extern int sysctl_ndisc_force_tllao;
#endif /* __KERNEL__ */
#endif /* _NET_IPV6_H */
--- //packages/linux_2.6.7/main/src/net/ipv6/ndisc.c
+++ /home/z/w1/packages/linux_2.6.7/main/src/net/ipv6/ndisc.c
@@ -169,6 +169,8 @@
#define NDISC_OPT_SPACE(len) (((len)+2+7)&~7)
+int sysctl_ndisc_force_tllao;
+
static u8 *ndisc_fill_option(u8 *opt, int type, void *data, int data_len)
{
int space = NDISC_OPT_SPACE(data_len);
@@ -399,6 +401,9 @@
return;
}
+ if (sysctl_ndisc_force_tllao)
+ inc_opt = 1;
+
if (inc_opt) {
if (dev->addr_len)
len += NDISC_OPT_SPACE(dev->addr_len);
--- //packages/linux_2.6.7/main/src/net/ipv6/sysctl_net_ipv6.c
+++ /home/z/w1/packages/linux_2.6.7/main/src/net/ipv6/sysctl_net_ipv6.c
@@ -84,6 +84,14 @@
.mode = 0644,
.proc_handler = &proc_dointvec
},
+ {
+ .ctl_name = NET_IPV6_NDISC_FORCE_TLLAO,
+ .procname = "ndisc_force_tllao",
+ .data = &sysctl_ndisc_force_tllao,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
{ .ctl_name = 0 }
};
* Re: [PATCH 0/2] cfg80211: firmware and hardware version
From: Kalle Valo @ 2009-10-01 16:20 UTC (permalink / raw)
To: John W. Linville; +Cc: Luis R. Rodriguez, linux-wireless, netdev
In-Reply-To: <20091001151820.GA2895@tuxdriver.com>
"John W. Linville" <linville@tuxdriver.com> writes:
> On Thu, Oct 01, 2009 at 05:18:33PM +0300, Kalle Valo wrote:
>>
>> I'm not worried about the implementation complexity, and as your
>> patches show it was easy. My concern is the overall design for
>> wireless devices. Instead of using nl80211 for everything, with some
>> features we would use nl80211/iw and with some ethtool. That's just
>> confusing and I don't like that. I would prefer that nl80211 provides
>> everything, it makes things so much easier.
>
> Well, if the hw/fw version numbers were the only thing then I'd
> probably say it's not a big deal. But having ethtool support is nice
> in that it makes a familiar tool work for us. Among other things,
> this probably helps with some distro scripts that don't work quite
> right without it. Plus, there is lots of debugging stuff that could
> be turned-on without having to write new tools.
Agreed, except for the distro scripts part. To me that just sounds
like a bug in the scripts.
> I suppose I understand the 'one API' idea, but why duplicate
> functionality?
Just because the overlap in functionality in this case isn't large enough.
I'm worried that we will use 10% of the functionality in ethtool and
the remaining 90% will be something we can't use, and we will have to
reimplement the rest in nl80211 anyway.
> Anyway, adding a couple of ioctl calls isn't a big deal.
Sure, but we need to support this forever. If, say after two years, we
decide that ethtool is not the way to go, it's very difficult to
remove it. The fewer interfaces we have, the easier they are to
maintain.
> And don't forget, we are still network drivers too...
I hope ethtool isn't a strict requirement for a network driver; at
least I haven't heard of such a requirement.
>> One example is the hw version: ethtool only provides a u32 to userspace
>> and moves the burden of translating the hw id to the user. For us a string
>> is a much better choice, because when debugging we often (or
>> always?) need to know the chip version.
>
> Look at the way most drivers set the version (using each byte as a
> field).
Yes, that's how it is also with wl1251. A number like '0x7030101' is
just not that user friendly.
> If you want prettier output, adding a parser to the userland ethtool
> is fairly trivial. It looks something like the patch below...
Oh wow, that's cool and a truly useful feature. One complaint less
from me :)
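A minimal sketch of the kind of userland parsing described above, assuming the driver packs one version field per byte, most significant byte first; `format_byte_version()` is a hypothetical helper, not ethtool's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: split a u32 version into four byte-wide fields,
 * most significant first, e.g. 0x07030101 -> "7.3.1.1".
 * Returns the snprintf() result (length of the full string).
 */
static int format_byte_version(uint32_t v, char *buf, size_t len)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%u.%u.%u.%u",
			(unsigned)((v >> 24) & 0xff),
			(unsigned)((v >> 16) & 0xff),
			(unsigned)((v >> 8) & 0xff),
			(unsigned)(v & 0xff));
}
```

With this, the '0x7030101' value mentioned above would be shown as "7.3.1.1".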
>> ethtool -c|--show-coalesce DEVNAME Show coalesce options
>> ethtool -C|--coalesce DEVNAME Set coalesce options
>> [adaptive-rx on|off]
>> [adaptive-tx on|off]
>> [rx-usecs N]
>> [rx-frames N]
>> [rx-usecs-irq N]
>> [rx-frames-irq N]
>> [tx-usecs N]
>> [tx-frames N]
>> [tx-usecs-irq N]
>> [tx-frames-irq N]
>> [stats-block-usecs N]
>> [pkt-rate-low N]
>> [rx-usecs-low N]
>> [rx-frames-low N]
>> [tx-usecs-low N]
>> [tx-frames-low N]
>> [pkt-rate-high N]
>> [rx-usecs-high N]
>> [rx-frames-high N]
>> [tx-usecs-high N]
>> [tx-frames-high N]
>> [sample-interval N]
>
> These _could_ be useful if wireless becomes more
> performance-oriented...
Maybe, or maybe not. We will only find out within the next few years.
And what will we do if the parameters are actually a bit different? Is
it ok to extend ethtool for supporting wireless or do we later on have
to add separate support to nl80211? The latter would suck big time.
>> ethtool -g|--show-ring DEVNAME Query RX/TX ring parameters
>> ethtool -G|--set-ring DEVNAME Set RX/TX ring parameters
>> [ rx N ]
>> [ rx-mini N ]
>> [ rx-jumbo N ]
>> [ tx N ]
>
> Wireless devices have ring buffers, no?
Yes, some hardware has them, but again the question is whether this is
relevant for wireless devices. In Ethernet the hardware is the
bottleneck, but in 802.11 the wireless medium is the bottleneck, so the
parameters we need to configure are usually different.
>> ethtool -r|--negotiate DEVNAME Restart N-WAY negotation
>
> Ethernet-specific...might could be overloaded for wireless to trigger
> reassoc...?
Please no, I don't want to see any reassociation or anything else
802.11-state-related in ethtool; nl80211 was created for this. This is
something I would object to loudly :)
> Anyway, it doesn't really matter if we don't use the whole API -- many
> older ethernet devices don't support all these features. The point
> is that the API exists and has some overlap with our needs. It is a
> driver-oriented API, with nitty-gritty stuff that need not clutter a
> configuration API like cfg80211. There is even the potential of us
> adding our own extensions (e.g. WoW) that are also device-oriented.
>
> Anyway, between the link detection and making distro scripts work
> plus enabling a familiar tool for basic driver info I think this is
> a win. So much the better if some drivers move to ethtool for register
> dumping, setting message verbosity, querying/changing eeprom values,
> etc, etc...
Sounds good enough. As I said in my earlier email, I'm not going to argue
about this for too long. You know this better than I do. So let's go
forward with ethtool.
Thanks for listening to my concerns.
--
Kalle Valo
* Re: [PATCH] make TLLAO option for NA packets configurable
From: Stephen Hemminger @ 2009-10-01 16:21 UTC (permalink / raw)
To: Cosmin Ratiu; +Cc: netdev, Octavian Purdila
In-Reply-To: <200910011916.40908.cratiu@ixiacom.com>
On Thu, 1 Oct 2009 19:16:40 +0300
Cosmin Ratiu <cratiu@ixiacom.com> wrote:
> Hello,
>
> This is a patch that adds a sysctl to control the sending of the Target Link
> Layer Address Option (TLLAO) with Neighbor Advertisements responding to
> unicast NS. The patch was made for kernel 2.6.7 (yes it is ancient), but the
> code is similar with the current kernel and I can rework it if you want it in.
>
> RFC 2461, page 24 suggests that this option should be included with NAs to
> avoid a race with the sender clearing its cache after sending an unicast NS,
> but before receiving a NA.
>
> It seems there are some Juniper routers (MX series) that expect this option to
> be included with all NAs.
>
> Another solution is to always send this option, as it has little overhead.
>
> Please let me know what you think,
> Cosmin.
>
> Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
> --- //packages/linux_2.6.7/main/src/include/linux/sysctl.h
> +++ /home/z/w1/packages/linux_2.6.7/main/src/include/linux/sysctl.h
> @@ -444,6 +444,7 @@
> NET_IPV6_IP6FRAG_TIME=23,
> NET_IPV6_IP6FRAG_SECRET_INTERVAL=24,
> NET_IPV6_MLD_MAX_MSF=25,
> + NET_IPV6_NDISC_FORCE_TLLAO=26,
Since numbered sysctl values are deprecated, can you use CTL_UNNUMBERED
to avoid having to add yet another value?
* Re: tg3: Badness at kernel/mutex.c:207
From: Matt Carlson @ 2009-10-01 16:25 UTC (permalink / raw)
To: Felix Radensky; +Cc: Matthew Carlson, netdev@vger.kernel.org
In-Reply-To: <4AC47826.1000809@embedded-sol.com>
On Thu, Oct 01, 2009 at 02:36:38AM -0700, Felix Radensky wrote:
> Hi, Matt
>
> Matt Carlson wrote:
> > On Sat, Sep 26, 2009 at 02:20:57PM -0700, Felix Radensky wrote:
> >
> >> Hi,
> >>
> >> I'm running linux-2.6.31 on a custom MPC8536 based board with BCM57760 chip.
> >> Both tg3 driver, and Broadcom PHY driver are modules.
> >>
> >> Each time I run ifconfig eth2 up, I get the following error message:
> >>
> >> Badness at kernel/mutex.c:207
> >> NIP: c025132c LR: c0251314 CTR: c0251334
> >> REGS: efbedbd0 TRAP: 0700 Not tainted (2.6.31)
> >> MSR: 00029000 <EE,ME,CE> CR: 24020422 XER: 00000000
> >> TASK = efacce10[1080] 'ifconfig' THREAD: efbec000
> >> GPR00: 00000000 efbedc80 efacce10 00000001 00007020 00000002 00000000
> >> 00000200
> >> GPR08: 00029000 c0350000 c0330000 00000001 24020424 10057d94 000002a0
> >> 1000d82c
> >> GPR16: 1000d81c 1000d814 10010000 10050000 ef897a0c efbede18 ffff8914
> >> ef897a00
> >> GPR24: 00008000 c034b480 efbec000 efb0122c c0350000 efacce10 ef82d2c0
> >> efb01228
> >> NIP [c025132c] __mutex_lock_slowpath+0x1f0/0x1f8
> >> LR [c0251314] __mutex_lock_slowpath+0x1d8/0x1f8
> >> Call Trace:
> >> [efbedcd0] [c025134c] mutex_lock+0x18/0x34
> >> [efbedcf0] [f534a228] tg3_chip_reset+0x7cc/0x9f8 [tg3]
> >> [efbedd20] [f534a8f0] tg3_reset_hw+0x58/0x2360 [tg3]
> >> [efbedd70] [f5351dd4] tg3_open+0x610/0x910 [tg3]
> >> [efbeddb0] [c01e1c6c] dev_open+0x100/0x138
> >> [efbeddd0] [c01dff20] dev_change_flags+0x80/0x1ac
> >> [efbeddf0] [c02232cc] devinet_ioctl+0x648/0x824
> >> [efbede60] [c0223de4] inet_ioctl+0xcc/0xf8
> >> [efbede70] [c01cdf44] sock_ioctl+0x60/0x300
> >> [efbede90] [c008a35c] vfs_ioctl+0x34/0x8c
> >> [efbedea0] [c008a580] do_vfs_ioctl+0x88/0x724
> >> [efbedf10] [c008ac5c] sys_ioctl+0x40/0x74
> >> [efbedf40] [c000f814] ret_from_syscall+0x0/0x3c
> >> Instruction dump:
> >> 0fe00000 4bfffe80 801a000c 5409016f 4182fe60 4bf0f6d9 2f830000 41befe54
> >> 3d20c035 8009c2c0 2f800000 40befe44 <0fe00000> 4bfffe3c 9421ffe0 7c0802a6
> >>
> >> Does it indicate a real problem, or something that can be ignored ?
> >>
> >> Additional information from kernel log:
> >>
> >> tg3.c:v3.99 (April 20, 2009)
> >> tg3 0002:05:00.0: enabling bus mastering
> >> tg3 0002:05:00.0: PME# disabled
> >> tg3 mdio bus: probed
> >> eth2: Tigon3 [partno(BCM57760) rev 57780001] (PCI Express) MAC address
> >> 00:10:18:00:00:00
> >> eth2: attached PHY driver [Broadcom BCM57780] (mii_bus:phy_addr=500:01)
> >> eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
> >> eth2: dma_rwctrl[76180000] dma_mask[64-bit]
> >> tg3 0002:05:00.0: PME# disabled
> >>
> >
> > Yes, this is a real problem. The driver is taking the MDIO bus lock
> > while holding the device's own spinlock. I think I may have a
> > workaround. Let me test it and get back to you.
> >
>
> Did you have a chance to look into it ?
Yes, and the fix seems to work. The patch changes the locking behavior
of the driver, so I'm being extra careful and checking for possible side
effects. That's why it's taking so long. Sorry for the delay.
* Re: [PATCH] pktgen: Fix delay handling
From: David Miller @ 2009-10-01 16:29 UTC (permalink / raw)
To: eric.dumazet; +Cc: shemminger, jdb, robert, netdev
In-Reply-To: <4AC47EB9.6070809@gmail.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 01 Oct 2009 12:04:41 +0200
> But it appears net/core/pktgen.c is different on net-next-2.6
>
> Stephen, David, I am a bit lost here, something went wrong in a merge process ?
>
net-next-2.6 is just a stale old tree, there is no new networking
work in there and it is simply Linus's tree as of a few weeks
ago.
It's only there so Stephen Rothwell has something to do a 'nop'
pull from into his linux-next tree.
I'll apply your fix, thanks!
* Re: [PATCH] Use sk_mark for routing lookup in more places
From: Eric Dumazet @ 2009-10-01 16:30 UTC (permalink / raw)
To: Atis Elsts; +Cc: Laszlo Attila Toth, David S. Miller, netdev
In-Reply-To: <200910011814.47689.atis@mikrotik.com>
Atis Elsts a écrit :
> This patch against v2.6.31 adds support for route lookup using sk_mark in some
> more places. The benefits from this patch are the following.
> First, SO_MARK option now has effect on UDP sockets too.
> Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing
> lookup correctly if TCP sockets with SO_MARK were used.
>
> Signed-off-by: Atis Elsts <atis@mikrotik.com>
Good catch, thanks !
I used SO_MARK on connected UDP sockets so did not notice the lack
of functionality. (ip_route_connect() does use sk->sk_mark)
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> net/ipv4/af_inet.c | 1 +
> net/ipv4/ip_output.c | 1 +
> net/ipv4/udp.c | 1 +
> 3 files changed, 3 insertions(+)
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 566ea6c..7917963 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1103,6 +1103,7 @@ int inet_sk_rebuild_header(struct sock *sk)
> {
> struct flowi fl = {
> .oif = sk->sk_bound_dev_if,
> + .mark = sk->sk_mark,
> .nl_u = {
> .ip4_u = {
> .daddr = daddr,
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 7ffcd96..e088a97 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -335,6 +335,7 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
>
> {
> struct flowi fl = { .oif = sk->sk_bound_dev_if,
> + .mark = sk->sk_mark,
> .nl_u = { .ip4_u =
> { .daddr = daddr,
> .saddr = inet->saddr,
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 80e3812..f90cdcc 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -688,6 +688,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk,
> struct msghdr *msg,
>
> if (rt == NULL) {
> struct flowi fl = { .oif = ipc.oif,
> + .mark = sk->sk_mark,
> .nl_u = { .ip4_u =
> { .daddr = faddr,
> .saddr = saddr,
> --
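From userspace, the case this patch fixes is an unconnected UDP socket tagged with SO_MARK. A minimal sketch follows; `open_marked_udp_socket()` is hypothetical, setting SO_MARK requires CAP_NET_ADMIN, and the fallback value 36 for SO_MARK is an assumption used only if the system headers lack it. With the patch applied, sendto() on such a socket should carry the mark into the route lookup in udp_sendmsg(), so policy routing rules that match on fwmark (for example `ip rule add fwmark 1 table 100`) should apply to it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_MARK
#define SO_MARK 36	/* assumption: asm-generic value, only if headers lack it */
#endif

/*
 * Hypothetical helper: create a UDP socket and tag it with a firewall mark.
 * Returns the fd, or -errno on failure (expect -EPERM without CAP_NET_ADMIN).
 */
static int open_marked_udp_socket(uint32_t mark)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -errno;
	if (setsockopt(fd, SOL_SOCKET, SO_MARK, &mark, sizeof(mark)) < 0) {
		int err = errno;

		close(fd);
		return -err;
	}
	return fd;
}
```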
* Re: [PATCH] pktgen: Fix delay handling
From: Eric Dumazet @ 2009-10-01 16:32 UTC (permalink / raw)
To: David Miller; +Cc: shemminger, jdb, robert, netdev
In-Reply-To: <20091001.092920.246088613.davem@davemloft.net>
David Miller a écrit :
> net-next-2.6 is just a stale old tree, there is no new networking
> work in there and it is simply Linus's tree as of a few weeks
> ago.
>
> It's only there so Stephen Rothwell has something to do a 'nop'
> pull from into his linux-next tree.
>
> I'll apply your fix, thanks!
Thanks for the explanation David.
* Re: r8169c: Support for Realtek 8168DP chip?
From: David Miller @ 2009-10-01 16:42 UTC (permalink / raw)
To: dave; +Cc: Rainer.Koenig, netdev
In-Reply-To: <1254404310.24972.4.camel@obelisk.thedillows.org>
From: David Dillow <dave@thedillows.org>
Date: Thu, 01 Oct 2009 09:38:30 -0400
> Hmm, patchwork doesn't seem to have picked it up, yet.
It's there, I just marked it in "RFC" state since that's exactly what
that patch is.
The default patchwork page for a project only lists patches that are in a
state other than one meaning the patch won't be applied in its
current form.
If you want to see "all patches in all states" click on "filter"
and set it to your needs.
* Re: [PATCH] make TLLAO option for NA packets configurable
From: David Miller @ 2009-10-01 16:43 UTC (permalink / raw)
To: shemminger; +Cc: cratiu, netdev, opurdila
In-Reply-To: <20091001092100.14ea024b@s6510>
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 1 Oct 2009 09:21:00 -0700
>> Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
>> --- //packages/linux_2.6.7/main/src/include/linux/sysctl.h
>> +++ /home/z/w1/packages/linux_2.6.7/main/src/include/linux/sysctl.h
>> @@ -444,6 +444,7 @@
>> NET_IPV6_IP6FRAG_TIME=23,
>> NET_IPV6_IP6FRAG_SECRET_INTERVAL=24,
>> NET_IPV6_MLD_MAX_MSF=25,
>> + NET_IPV6_NDISC_FORCE_TLLAO=26,
>
> Since numbered sysctl values are deprecated, can you use CTL_UNNUMBERED
> to avoid having to add yet another value?
Using CTL_UNNUMBERED is a must these days.
Also, please fix the prefixing of the paths in your patch.
See Documentation/SubmittingPatches in the kernel tree.
* Re: [PATCH] make TLLAO option for NA packets configurable
From: Cosmin Ratiu @ 2009-10-01 16:43 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, Octavian Purdila
In-Reply-To: <20091001092100.14ea024b@s6510>
On Thursday 01 October 2009 19:21:00 Stephen Hemminger wrote:
> On Thu, 1 Oct 2009 19:16:40 +0300
>
> Cosmin Ratiu <cratiu@ixiacom.com> wrote:
> > Hello,
> >
> > This is a patch that adds a sysctl to control the sending of the Target
> > Link Layer Address Option (TLLAO) with Neighbor Advertisements responding
> > to unicast NS. The patch was made for kernel 2.6.7 (yes it is ancient),
> > but the code is similar with the current kernel and I can rework it if
> > you want it in.
> >
> > RFC 2461, page 24 suggests that this option should be included with NAs
> > to avoid a race with the sender clearing its cache after sending an
> > unicast NS, but before receiving a NA.
> >
> > It seems there are some Juniper routers (MX series) that expect this
> > option to be included with all NAs.
> >
> > Another solution is to always send this option, as it has little
> > overhead.
> >
> > Please let me know what you think,
> > Cosmin.
> >
> > Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
> > --- //packages/linux_2.6.7/main/src/include/linux/sysctl.h
> > +++ /home/z/w1/packages/linux_2.6.7/main/src/include/linux/sysctl.h
> > @@ -444,6 +444,7 @@
> > NET_IPV6_IP6FRAG_TIME=23,
> > NET_IPV6_IP6FRAG_SECRET_INTERVAL=24,
> > NET_IPV6_MLD_MAX_MSF=25,
> > + NET_IPV6_NDISC_FORCE_TLLAO=26,
>
> Since numbered sysctl values are deprecated, can you use CTL_UNNUMBERED
> to avoid having to add yet another value?
Of course, but that is a detail.
If you decide on the sysctl solution, I'll do it.
If you decide on making this the default behavior, it's even better.
Cosmin.
* [PATCHv3] IPv4 TCP fails to send window scale option when window scale is zero
From: Gilad Ben-Yossef @ 2009-10-01 16:41 UTC (permalink / raw)
To: netdev; +Cc: ori, ilpo.jarvinen, eric.dumazet
From: Ori Finkelman <ori@comsleep.com>
Acknowledge TCP window scale support by inserting the proper option in SYN/ACK
and SYN headers even if our window scale is zero.
This fixes the following observed behavior:
1. A client sends a SYN with the TCP window scaling option and a non-zero
window scale value to a Linux box.
2. The Linux box notes the large receive window from the client.
3. Linux decides on a zero value of window scale for its part.
4. Because the option is only added when our window scale is non-zero, Linux
does not send the window scale TCP option on the SYN/ACK at all.
With the following result:
The client thinks TCP window scaling is not supported, since the SYN/ACK had no
TCP window scale option, while Linux thinks that TCP window scaling is
supported (and the scale might be non-zero), since the SYN had the TCP window
scale option. The client and server thus have mismatched ideas of the window
sizes.
It probably also fixes the following bug (not observed in practice):
1. The Linux box opens a TCP connection to some server.
2. Linux decides on a zero value of window scale.
3. Because the option is only added when the computed window scale is
non-zero, Linux does not set the window scale TCP option on the SYN.
The expected result is that the server OS does not use the window scale option,
since it did not receive one in the SYN, leading to suboptimal performance.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
---
Original bug reported and patch written by Ori Finkelman from Comsleep Ltd.
I've fixed the SYN header case based on feedback from Eric Dumazet and Ilpo
Jarvinen, as part of trying to get the patch mainlined.
The SYN/ACK behavior was observed with a Windows box as the client and the latest
Debian kernel, but to the best of my understanding this can happen with the latest
kernel versions and other client OSes (probably Linux too) as well.
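The difference between the old and new logic can be modelled in userspace. This is a sketch with hypothetical names (`old_wscale_size()`, `new_wscale_size()`, `write_wscale()`); the constants mirror include/net/tcp.h and the OPTION_WSCALE bit comes from the patch. The point is that the option's presence becomes an explicit flag, so a shift of 0 is still advertised as NOP, kind 3, length 3, shift 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the kernel's definitions in include/net/tcp.h. */
#define TCPOPT_NOP		1
#define TCPOPT_WINDOW		3
#define TCPOLEN_WINDOW		3
#define TCPOLEN_WSCALE_ALIGNED	4

#define OPTION_WSCALE	(1 << 3)	/* new flag bit from the patch */

struct out_opts {
	uint8_t options;	/* bit field of OPTION_* */
	uint8_t ws;		/* our receive window shift; 0 is legitimate */
};

/* Old behaviour: presence of the option inferred from a non-zero shift. */
static unsigned old_wscale_size(int wscale_ok, uint8_t rcv_wscale)
{
	return (wscale_ok && rcv_wscale) ? TCPOLEN_WSCALE_ALIGNED : 0;
}

/* Patched behaviour: presence is an explicit flag, independent of the shift. */
static unsigned new_wscale_size(int wscale_ok, uint8_t rcv_wscale,
				struct out_opts *opts)
{
	if (!wscale_ok)
		return 0;
	opts->ws = rcv_wscale;
	opts->options |= OPTION_WSCALE;
	return TCPOLEN_WSCALE_ALIGNED;
}

/* Emit NOP, kind, length, shift (same bytes as the kernel's htonl() word). */
static size_t write_wscale(const struct out_opts *opts, uint8_t *out)
{
	assert(out != NULL);
	if (!(opts->options & OPTION_WSCALE))
		return 0;
	out[0] = TCPOPT_NOP;
	out[1] = TCPOPT_WINDOW;
	out[2] = TCPOLEN_WINDOW;
	out[3] = opts->ws;
	return TCPOLEN_WSCALE_ALIGNED;
}
```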
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5200aab..fcd278a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -361,6 +361,7 @@ static inline int tcp_urg_mode(const struct tcp_sock *tp)
#define OPTION_SACK_ADVERTISE (1 << 0)
#define OPTION_TS (1 << 1)
#define OPTION_MD5 (1 << 2)
+#define OPTION_WSCALE (1 << 3)
struct tcp_out_options {
u8 options; /* bit field of OPTION_* */
@@ -427,7 +428,7 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
TCPOLEN_SACK_PERM);
}
- if (unlikely(opts->ws)) {
+ if (unlikely(OPTION_WSCALE & opts->options)) {
*ptr++ = htonl((TCPOPT_NOP << 24) |
(TCPOPT_WINDOW << 16) |
(TCPOLEN_WINDOW << 8) |
@@ -494,8 +495,8 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
}
if (likely(sysctl_tcp_window_scaling)) {
opts->ws = tp->rx_opt.rcv_wscale;
- if (likely(opts->ws))
- size += TCPOLEN_WSCALE_ALIGNED;
+ opts->options |= OPTION_WSCALE;
+ size += TCPOLEN_WSCALE_ALIGNED;
}
if (likely(sysctl_tcp_sack)) {
opts->options |= OPTION_SACK_ADVERTISE;
@@ -537,8 +538,8 @@ static unsigned tcp_synack_options(struct sock *sk,
if (likely(ireq->wscale_ok)) {
opts->ws = ireq->rcv_wscale;
- if (likely(opts->ws))
- size += TCPOLEN_WSCALE_ALIGNED;
+ opts->options |= OPTION_WSCALE;
+ size += TCPOLEN_WSCALE_ALIGNED;
}
if (likely(doing_ts)) {
opts->options |= OPTION_TS;
* Re: [PATCH] skge: use unique IRQ name
From: Stephen Hemminger @ 2009-10-01 16:50 UTC (permalink / raw)
To: Michal Schmidt, David Miller; +Cc: netdev
In-Reply-To: <20091001122720.3822bdd3@leela>
(revised to use pci:)
From: Michal Schmidt <mschmidt@redhat.com>
Most network drivers request their IRQ when the interface is activated.
skge does it in ->probe() instead, because it can work with two-port
cards where the two net_devices use the same IRQ. This works fine most
of the time, except in some situations when the interface gets renamed.
Consider this example:
1. modprobe skge
The card is detected as eth0 and requests IRQ 17. Directory
/proc/irq/17/eth0 is created.
2. There is a udev rule which says this interface should be called
eth1, so udev renames eth0 -> eth1.
3. modprobe 8139too
The Realtek card is detected as eth0. It will be using IRQ 17 too.
4. ip link set eth0 up
Now 8139too requests IRQ 17.
The result is:
WARNING: at fs/proc/generic.c:590 proc_register ...
proc_dir_entry '17/eth0' already registered
...
And "ls /proc/irq/17" shows two subdirectories, both called eth0.
Fix it by using a unique name for skge's IRQ, based on the PCI address.
The naming from the example then looks like this:
$ grep skge /proc/interrupts
17: 169 IO-APIC-fasteoi skge@pci:0000:00:0a.0, eth0
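A minimal userspace sketch of the naming scheme, assuming the PCI address string is what pci_name(pdev) returns; malloc() and snprintf() stand in for kzalloc() and sprintf(), and make_irq_name() is a hypothetical helper. Because the name is derived from the PCI address rather than the interface name, a later udev rename cannot make it collide with another driver's eth0 entry.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DRV_NAME "skge"

/* Bytes needed for DRV_NAME "@pci:" <address> plus the NUL:
 * "@pci:" is 5 characters, +1 for the terminator, hence the "+ 6"
 * used in the patch. */
static size_t irq_name_len(const char *pci_addr)
{
	return strlen(DRV_NAME) + strlen(pci_addr) + 6;
}

/* Builds the IRQ name in a freshly allocated buffer; caller frees.
 * pci_addr stands in for pci_name(pdev). */
static char *make_irq_name(const char *pci_addr)
{
	size_t len = irq_name_len(pci_addr);
	char *name = malloc(len);
	int n;

	if (!name)
		return NULL;
	n = snprintf(name, len, DRV_NAME "@pci:%s", pci_addr);
	assert(n >= 0 && (size_t)n == len - 1); /* exact fit, no truncation */
	return name;
}
```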
The irqbalance daemon will have to be taught to recognize "skge" as an
Ethernet interrupt. This will be a one-line addition to classify.c. I
will send a patch to irqbalance if this change is accepted.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/drivers/net/skge.c 2009-10-01 09:36:21.893095582 -0700
+++ b/drivers/net/skge.c 2009-10-01 09:46:22.064555679 -0700
@@ -3895,6 +3895,7 @@ static int __devinit skge_probe(struct p
struct net_device *dev, *dev1;
struct skge_hw *hw;
int err, using_dac = 0;
+ size_t irq_name_len;
err = pci_enable_device(pdev);
if (err) {
@@ -3935,11 +3936,14 @@ static int __devinit skge_probe(struct p
#endif
err = -ENOMEM;
- hw = kzalloc(sizeof(*hw), GFP_KERNEL);
+ /* space for skge@pci:0000:04:00.0 */
+ irq_name_len = strlen(DRV_NAME) + strlen(dev_name(&pdev->dev)) + 6;
+ hw = kzalloc(sizeof(*hw) + irq_name_len, GFP_KERNEL);
if (!hw) {
dev_err(&pdev->dev, "cannot allocate hardware struct\n");
goto err_out_free_regions;
}
+ sprintf(hw->irq_name, DRV_NAME "@pci:%s", pci_name(pdev));
hw->pdev = pdev;
spin_lock_init(&hw->hw_lock);
@@ -3974,7 +3978,7 @@ static int __devinit skge_probe(struct p
goto err_out_free_netdev;
}
- err = request_irq(pdev->irq, skge_intr, IRQF_SHARED, dev->name, hw);
+ err = request_irq(pdev->irq, skge_intr, IRQF_SHARED, hw->irq_name, hw);
if (err) {
dev_err(&pdev->dev, "%s: cannot assign irq %d\n",
dev->name, pdev->irq);
--- a/drivers/net/skge.h 2009-10-01 09:34:51.036505545 -0700
+++ b/drivers/net/skge.h 2009-10-01 09:47:38.096558002 -0700
@@ -2423,6 +2423,8 @@ struct skge_hw {
u16 phy_addr;
spinlock_t phy_lock;
struct tasklet_struct phy_task;
+
+ char irq_name[0]; /* skge@pci:000:04:00.0 */
};
enum pause_control {
* Re: [PATCH 0/2] cfg80211: firmware and hardware version
From: John W. Linville @ 2009-10-01 16:56 UTC (permalink / raw)
To: Ben Hutchings
Cc: Kalle Valo, Luis R. Rodriguez,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1254411199.2735.4.camel@achroite>
On Thu, Oct 01, 2009 at 04:33:19PM +0100, Ben Hutchings wrote:
> On Thu, 2009-10-01 at 11:18 -0400, John W. Linville wrote:
> [...]
> > > But here are the features which I doubt we will ever use:
> > >
> > > ethtool -s|--change DEVNAME Change generic options
> > > [ speed %%d ]
> > > [ duplex half|full ]
> > > [ port tp|aui|bnc|mii|fibre ]
> > > [ autoneg on|off ]
> > > [ advertise %%x ]
> > > [ phyad %%d ]
> > > [ xcvr internal|external ]
> > > [ wol p|u|m|b|a|g|s|d... ]
> > > [ sopass %%x:%%x:%%x:%%x:%%x:%%x ]
> > > [ msglvl %%d ]
> > > ethtool -a|--show-pause DEVNAME Show pause options
> > > ethtool -A|--pause DEVNAME Set pause options
> > > [ autoneg on|off ]
> > > [ rx on|off ]
> > > [ tx on|off ]
> >
> > I agree that the above are ethernet-specific.
> [...]
>
> Message level isn't and WoL arguably isn't. It's a shame that these
> original ethtool settings are still bundled together...
Oh, yes! Missed those in the noise...
John
--
John W. Linville Someday the world will need a hero, and you
linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org might be all we have. Be ready.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH] sky2: irqname based on pci address
From: Stephen Hemminger @ 2009-10-01 17:11 UTC (permalink / raw)
To: Michal Schmidt, David Miller; +Cc: netdev
In-Reply-To: <20091001122720.3822bdd3@leela>
This is based on Michal Schmidt fix for skge.
Most network drivers request their IRQ when the interface is activated.
sky2 does it in ->probe() instead, because it can work with two-port
cards where the two net_devices use the same IRQ. This works fine most
of the time, except in some situations when the interface gets renamed.
Consider this example:
1. modprobe sky2
The card is detected as eth0 and requests IRQ 17. Directory
/proc/irq/17/eth0 is created.
2. There is a udev rule which says this interface should be called
eth1, so udev renames eth0 -> eth1.
3. modprobe 8139too
The Realtek card is detected as eth0. It will be using IRQ 17 too.
4. ip link set eth0 up
Now 8139too requests IRQ 17.
The result is:
WARNING: at fs/proc/generic.c:590 proc_register ...
proc_dir_entry '17/eth0' already registered
The fix is for sky2 to name the IRQ based on the PCI device, as is done
by some other drivers (DRM, infiniband, ...), i.e. sky2@pci:0000:00:00
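The sky2 change stores the name in storage allocated together with struct sky2_hw. A hedged sketch of that pattern in plain C99, using a flexible array member instead of the GNU-style irq_name[0] from the patch; struct fake_hw and alloc_hw() are hypothetical, and calloc() stands in for kzalloc().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DRV_NAME "sky2"

/* Hypothetical stand-in for struct sky2_hw: fixed fields followed by
 * a C99 flexible array member holding the IRQ name. */
struct fake_hw {
	int flags;
	char irq_name[];
};

static struct fake_hw *alloc_hw(const char *pci_addr)
{
	/* Same size expression as the patch: prefix + address + NUL. */
	size_t extra = strlen(DRV_NAME "@pci:") + strlen(pci_addr) + 1;
	struct fake_hw *hw = calloc(1, sizeof(*hw) + extra); /* zeroed */
	int n;

	if (!hw)
		return NULL;
	n = snprintf(hw->irq_name, extra, DRV_NAME "@pci:%s", pci_addr);
	assert(n >= 0 && (size_t)n < extra);
	return hw;
}
```

A single free(hw) releases the name along with the structure, so no separate cleanup path for irq_name should be needed.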
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/drivers/net/sky2.c 2009-10-01 09:51:30.604556725 -0700
+++ b/drivers/net/sky2.c 2009-10-01 09:56:38.893342161 -0700
@@ -4487,13 +4487,16 @@ static int __devinit sky2_probe(struct p
wol_default = device_may_wakeup(&pdev->dev) ? WAKE_MAGIC : 0;
err = -ENOMEM;
- hw = kzalloc(sizeof(*hw), GFP_KERNEL);
+
+ hw = kzalloc(sizeof(*hw) + strlen(DRV_NAME "@pci:")
+ + strlen(pci_name(pdev)) + 1, GFP_KERNEL);
if (!hw) {
dev_err(&pdev->dev, "cannot allocate hardware struct\n");
goto err_out_free_regions;
}
hw->pdev = pdev;
+ sprintf(hw->irq_name, DRV_NAME "@pci:%s", pci_name(pdev));
hw->regs = ioremap_nocache(pci_resource_start(pdev, 0), 0x4000);
if (!hw->regs) {
@@ -4539,7 +4542,7 @@ static int __devinit sky2_probe(struct p
err = request_irq(pdev->irq, sky2_intr,
(hw->flags & SKY2_HW_USE_MSI) ? 0 : IRQF_SHARED,
- dev->name, hw);
+ hw->irq_name, hw);
if (err) {
dev_err(&pdev->dev, "cannot assign irq %d\n", pdev->irq);
goto err_out_unregister;
--- a/drivers/net/sky2.h 2009-10-01 09:51:17.553559116 -0700
+++ b/drivers/net/sky2.h 2009-10-01 09:51:42.069510492 -0700
@@ -2085,6 +2085,8 @@ struct sky2_hw {
struct timer_list watchdog_timer;
struct work_struct restart_work;
wait_queue_head_t msi_wait;
+
+ char irq_name[0];
};
static inline int sky2_is_copper(const struct sky2_hw *hw)
* Re: [PATCH] sky2: irqname based on pci address
From: Stephen Hemminger @ 2009-10-01 17:14 UTC (permalink / raw)
To: Michal Schmidt; +Cc: David Miller, netdev
In-Reply-To: <20091001101146.3368b4a4@s6510>
On Thu, 1 Oct 2009 10:11:46 -0700
Stephen Hemminger <shemminger@vyatta.com> wrote:
> This is based on Michal Schmidt fix for skge.
>
> Most network drivers request their IRQ when the interface is activated.
> sky2 does it in ->probe() instead, because it can work with two-port
> cards where the two net_devices use the same IRQ. This works fine most
> of the time, except in some situations when the interface gets renamed.
> Consider this example:
>
> 1. modprobe sky2
> The card is detected as eth0 and requests IRQ 17. Directory
> /proc/irq/17/eth0 is created.
> 2. There is a udev rule which says this interface should be called
> eth1, so udev renames eth0 -> eth1.
> 3. modprobe 8139too
> The Realtek card is detected as eth0. It will be using IRQ 17 too.
> 4. ip link set eth0 up
> Now 8139too requests IRQ 17.
One other note: the issue is less of a problem for most users of sky2,
because the driver is used mostly on systems that support MSI interrupts,
which can never be shared.
* Re: [PATCH 0/2] cfg80211: firmware and hardware version
From: John W. Linville @ 2009-10-01 17:07 UTC (permalink / raw)
To: Kalle Valo; +Cc: Luis R. Rodriguez, linux-wireless, netdev
In-Reply-To: <873a63qe6e.fsf@purkki.valot.fi>
On Thu, Oct 01, 2009 at 07:20:09PM +0300, Kalle Valo wrote:
> "John W. Linville" <linville@tuxdriver.com> writes:
>
> > On Thu, Oct 01, 2009 at 05:18:33PM +0300, Kalle Valo wrote:
> > Anyway, adding a couple of ioctl calls isn't a big deal.
>
> Sure, but we need to support this forever. If, say after two years, we
> decide that ethtool is not the way to go, it's very difficult to
> remove it. The less interfaces we have, the easier it is to maintain
> them.
Just to be clear, I was talking about adding ioctl calls to a
userland application (if you didn't want to use the ethtool utility).
The required ioctls are already defined for ethtool in the kernel.
> >> ethtool -r|--negotiate DEVNAME Restart N-WAY negotation
> >
> > Ethernet-specific...might be overloaded for wireless to trigger
> > reassoc...?
>
> Please no, I don't want to see any reassociation or anything else
> 802.11 state related in ethtool, nl80211 was created for this. This is
> something I would object loudly :)
Well, it was just a thought... :-)
> > Anyway, between the link detection and making distro scripts work
> > plus enabling a familiar tool for basic driver info I think this is
> > a win. So much the better if some drivers move to ethtool for register
> > dumping, setting message verbosity, querying/changing eeprom values,
> > etc, etc...
>
> Sounds good enough. As I said in my earlier email, I'm not going to argue
> about this for too long. You know this better than I do. So let's go
> forward with ethtool.
>
> Thanks for listening to my concerns.
Sure, np. And FWIW, I don't predict a huge problem if there are
valid extensions required for use by wireless drivers in the future.
But for now, I'd like to see us make use of some of the debugging
facilities available in the ethtool API -- hopefully the iwlwifi guys
are listening... ;-)
John
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we have. Be ready.
* Re: [PATCH] TI DaVinci EMAC: Minor macro related updates
From: David Miller @ 2009-10-01 17:16 UTC (permalink / raw)
To: chaithrika; +Cc: netdev, davinci-linux-open-source
In-Reply-To: <1254428719-13960-1-git-send-email-chaithrika@ti.com>
Date: Thu, 1 Oct 2009 16:25:19 -0400
Please fix the time on your computer.
* Re: [PATCHv3] IPv4 TCP fails to send window scale option when window scale is zero
From: Eric Dumazet @ 2009-10-01 17:20 UTC (permalink / raw)
To: Gilad Ben-Yossef; +Cc: netdev, ori, ilpo.jarvinen
In-Reply-To: <1254413613.665110.11357.nullmailer@tron.codefidence.com>
Gilad Ben-Yossef wrote:
> From: Ori Finkelman <ori@comsleep.com>
>
> Acknowledge TCP window scale support by inserting the proper option in SYN/ACK
> and SYN headers even if our window scale is zero.
>
> This fixes the following observed behavior:
>
> 1. Client sends a SYN with the TCP window scaling option and a non-zero window
> scale value to a Linux box.
> 2. Linux box notes the large receive window from the client.
> 3. Linux decides on a zero value of window scale for its part.
> 4. Because the option was only emitted for a non-zero window scale, Linux does
> not send the window scale TCP option header on the SYN/ACK at all.
>
> With the following result:
>
> The client thinks TCP window scaling is not supported, since the SYN/ACK had no
> TCP window scale option, while Linux thinks that TCP window scaling is
> supported (and the scale might be non-zero), since the SYN had a TCP window
> scale option. The client and server therefore disagree about
> window sizes.
>
> It probably also fixes the following bug (not observed in practice):
>
> 1. Linux box opens TCP connection to some server.
> 2. Linux decides on zero value of window scale.
> 3. Because the option was only emitted for a non-zero computed window scale,
> Linux does not set the window scale TCP option header on the SYN.
>
> The expected result is that the server OS does not use the window scale option,
> since it never received one in the SYN, leading to suboptimal
> performance.
>
> Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
> Signed-off-by: Ori Finkelman <ori@comsleep.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Note: to reproduce wscale being set to 0 on active connections,
you have to adjust the /proc/sys/net settings:
echo 65535 >/proc/sys/net/core/rmem_max
echo "4096 16384 32768" >/proc/sys/net/ipv4/tcp_rmem
-> wscale 0 -> SYN packet without WSCALE option (on unpatched kernels)
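These settings work because the window shift is derived from the largest receive buffer the socket may use. The following is a simplified C model of that selection, written as an assumption about the logic in tcp_select_initial_window() rather than a copy of it: the window clamp is ignored and pick_rcv_wscale() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model (assumption, not kernel code): start from the larger
 * of rmem_max and tcp_rmem[2], halve until it fits the 16-bit window
 * field, and cap the shift at 14 as RFC 1323 requires. */
static unsigned pick_rcv_wscale(uint32_t rmem_max, uint32_t tcp_rmem_max)
{
	uint32_t space = rmem_max > tcp_rmem_max ? rmem_max : tcp_rmem_max;
	unsigned wscale = 0;

	while (space > 65535 && wscale < 14) {
		space >>= 1;
		wscale++;
	}
	assert(wscale <= 14);
	return wscale;
}
```

Under this model, the values from the note (rmem_max 65535, tcp_rmem maximum 32768) give a shift of 0, while a 4 MiB tcp_rmem maximum gives 7.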
* [PATCHv3] IPv4 TCP fails to send window scale option when window scale is zero
From: Gilad Ben-Yossef @ 2009-10-01 16:24 UTC (permalink / raw)
To: netdev; +Cc: ori, ilpo.jarvinen, eric.dumazet
From: Ori Finkelman <ori@comsleep.com>
Acknowledge TCP window scale support by inserting the proper option in SYN/ACK
and SYN headers even if our window scale is zero.
This fixes the following observed behavior:
1. Client sends a SYN with the TCP window scaling option and a non-zero window
scale value to a Linux box.
2. Linux box notes the large receive window from the client.
3. Linux decides on a zero value of window scale for its part.
4. Because the option was only emitted for a non-zero window scale, Linux does
not send the window scale TCP option header on the SYN/ACK at all.
With the following result:
The client thinks TCP window scaling is not supported, since the SYN/ACK had no
TCP window scale option, while Linux thinks that TCP window scaling is
supported (and the scale might be non-zero), since the SYN had a TCP window
scale option. The client and server therefore disagree about
window sizes.
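A small arithmetic sketch of the mismatch. Per RFC 1323, scaling only takes effect when both the SYN and the SYN/ACK carry the option, so the client keeps sending unscaled window values, while Linux, having seen the option in the SYN, shifts them by the client's advertised scale. effective_window() is a hypothetical helper, and the field value 8192 and shift 8 are illustrative, not taken from the report.

```c
#include <assert.h>
#include <stdint.h>

/* Window a side believes is available, given the 16-bit header field
 * and the shift it thinks applies to that field. */
static uint32_t effective_window(uint16_t field, unsigned shift)
{
	assert(shift <= 14); /* RFC 1323 maximum shift */
	return (uint32_t)field << shift;
}
```

For a field of 8192 the client means 8,192 bytes, but Linux computes effective_window(8192, 8) = 2,097,152 bytes, a 256-fold overestimate of the client's receive window.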
It probably also fixes the following bug (not observed in practice):
1. Linux box opens TCP connection to some server.
2. Linux decides on zero value of window scale.
3. Because the option was only emitted for a non-zero computed window scale,
Linux does not set the window scale TCP option header on the SYN.
The expected result is that the server OS does not use the window scale option,
since it never received one in the SYN, leading to suboptimal
performance.
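Without the option, the window the server can advertise to the Linux box is limited to 65,535 bytes, and a window-limited connection can move at most one window per round trip. A back-of-the-envelope sketch; window_limited_bps() is a hypothetical helper and the RTT values are assumed, not measurements from the report.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on throughput for a connection limited only by the
 * peer's receive window: one window of data per round trip. */
static uint64_t window_limited_bps(uint32_t window_bytes, uint32_t rtt_ms)
{
	assert(rtt_ms > 0);
	return (uint64_t)window_bytes * 8u * 1000u / rtt_ms;
}
```

With a 65,535-byte window and a 100 ms RTT the bound is 5,242,800 bit/s (about 5.2 Mbit/s), regardless of link speed.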
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
---
Original bug reported and patch written by Ori Finkelman from Comsleep Ltd.
I've fixed the SYN header case based on feedback from Eric Dumazet and Ilpo
Jarvinen, as part of trying to get the patch mainlined.
The SYN/ACK behavior was observed with a Windows box as the client and the latest
Debian kernel, but to the best of my understanding this can happen with the latest
kernel versions and other client OSes (probably Linux as well).
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5200aab..fcd278a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -361,6 +361,7 @@ static inline int tcp_urg_mode(const struct tcp_sock *tp)
#define OPTION_SACK_ADVERTISE (1 << 0)
#define OPTION_TS (1 << 1)
#define OPTION_MD5 (1 << 2)
+#define OPTION_WSCALE (1 << 3)
struct tcp_out_options {
u8 options; /* bit field of OPTION_* */
@@ -427,7 +428,7 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
TCPOLEN_SACK_PERM);
}
- if (unlikely(opts->ws)) {
+ if (unlikely(OPTION_WSCALE & opts->options)) {
*ptr++ = htonl((TCPOPT_NOP << 24) |
(TCPOPT_WINDOW << 16) |
(TCPOLEN_WINDOW << 8) |
@@ -494,8 +495,8 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
}
if (likely(sysctl_tcp_window_scaling)) {
opts->ws = tp->rx_opt.rcv_wscale;
- if (likely(opts->ws))
- size += TCPOLEN_WSCALE_ALIGNED;
+ opts->options |= OPTION_WSCALE;
+ size += TCPOLEN_WSCALE_ALIGNED;
}
if (likely(sysctl_tcp_sack)) {
opts->options |= OPTION_SACK_ADVERTISE;
@@ -537,8 +538,8 @@ static unsigned tcp_synack_options(struct sock *sk,
if (likely(ireq->wscale_ok)) {
opts->ws = ireq->rcv_wscale;
- if (likely(opts->ws))
- size += TCPOLEN_WSCALE_ALIGNED;
+ opts->options |= OPTION_WSCALE;
+ size += TCPOLEN_WSCALE_ALIGNED;
}
if (likely(doing_ts)) {
opts->options |= OPTION_TS;
* [PATCHv3] IPv4 TCP fails to send window scale option when window scale is zero
From: Gilad Ben-Yossef @ 2009-10-01 16:13 UTC (permalink / raw)
To: netdev; +Cc: ori, ilpo.jarvinen, eric.dumazet
From: Ori Finkelman <ori@comsleep.com>
Acknowledge TCP window scale support by inserting the proper option in SYN/ACK
and SYN headers even if our window scale is zero.
This fixes the following observed behavior:
1. Client sends a SYN with the TCP window scaling option and a non-zero window
scale value to a Linux box.
2. Linux box notes the large receive window from the client.
3. Linux decides on a zero value of window scale for its part.
4. Because the option was only emitted for a non-zero window scale, Linux does
not send the window scale TCP option header on the SYN/ACK at all.
With the following result:
The client thinks TCP window scaling is not supported, since the SYN/ACK had no
TCP window scale option, while Linux thinks that TCP window scaling is
supported (and the scale might be non-zero), since the SYN had a TCP window
scale option. The client and server therefore disagree about
window sizes.
It probably also fixes the following bug (not observed in practice):
1. Linux box opens TCP connection to some server.
2. Linux decides on zero value of window scale.
3. Because the option was only emitted for a non-zero computed window scale,
Linux does not set the window scale TCP option header on the SYN.
The expected result is that the server OS does not use the window scale option,
since it never received one in the SYN, leading to suboptimal
performance.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
---
Original bug reported and patch written by Ori Finkelman from Comsleep Ltd.
I've fixed the SYN header case based on feedback from Eric Dumazet and Ilpo
Jarvinen, as part of trying to get the patch mainlined.
The SYN/ACK behavior was observed with a Windows box as the client and the latest
Debian kernel, but to the best of my understanding this can happen with the latest
kernel versions and other client OSes (probably Linux as well).
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5200aab..fcd278a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -361,6 +361,7 @@ static inline int tcp_urg_mode(const struct tcp_sock *tp)
#define OPTION_SACK_ADVERTISE (1 << 0)
#define OPTION_TS (1 << 1)
#define OPTION_MD5 (1 << 2)
+#define OPTION_WSCALE (1 << 3)
struct tcp_out_options {
u8 options; /* bit field of OPTION_* */
@@ -427,7 +428,7 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
TCPOLEN_SACK_PERM);
}
- if (unlikely(opts->ws)) {
+ if (unlikely(OPTION_WSCALE & opts->options)) {
*ptr++ = htonl((TCPOPT_NOP << 24) |
(TCPOPT_WINDOW << 16) |
(TCPOLEN_WINDOW << 8) |
@@ -494,8 +495,8 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
}
if (likely(sysctl_tcp_window_scaling)) {
opts->ws = tp->rx_opt.rcv_wscale;
- if (likely(opts->ws))
- size += TCPOLEN_WSCALE_ALIGNED;
+ opts->options |= OPTION_WSCALE;
+ size += TCPOLEN_WSCALE_ALIGNED;
}
if (likely(sysctl_tcp_sack)) {
opts->options |= OPTION_SACK_ADVERTISE;
@@ -537,8 +538,8 @@ static unsigned tcp_synack_options(struct sock *sk,
if (likely(ireq->wscale_ok)) {
opts->ws = ireq->rcv_wscale;
- if (likely(opts->ws))
- size += TCPOLEN_WSCALE_ALIGNED;
+ opts->options |= OPTION_WSCALE;
+ size += TCPOLEN_WSCALE_ALIGNED;
}
if (likely(doing_ts)) {
opts->options |= OPTION_TS;
* Re: [PATCH 00/31] Swap over NFS -v20
From: Christoph Hellwig @ 2009-10-01 17:42 UTC (permalink / raw)
To: Suresh Jayaraman
Cc: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm, netdev,
Neil Brown, Miklos Szeredi, Wouter Verhelst, Peter Zijlstra,
trond.myklebust
In-Reply-To: <1254405858-15651-1-git-send-email-sjayaraman@suse.de>
On Thu, Oct 01, 2009 at 07:34:18PM +0530, Suresh Jayaraman wrote:
> Hi,
>
> Here's the latest version of the swap over NFS series since -v19 last October by
> Peter Zijlstra. Peter does not have time to pursue this further (though he has
> not lost interest) and that led me to take over this patchset and try to get it
> merged upstream.
>
> The patches are against the current mmotm. They do not support SLQB yet.
> These patches can also be found online here:
> http://www.suse.de/~sjayaraman/patches/swap-over-nfs/
My advice is the same one I already gave to Peter long ago: it's almost
impossible to get a patchset that large, touching so many subsystems, merged.
Split it into smaller series that make sense on their own. One of them
would be the whole VM/net work needed just to make swap over nbd/iscsi safe.
The other really big one is adding a proper method for safe, page-backed
kernelspace I/O on files. That is not something like the grotty
swap-tied address_space operations in this patch, but more something in
the direction of the kernel direct I/O patches Jens Axboe did
for use in the loop driver. But even those aren't complete, as they
don't address the locking issue yet.
The latter especially is an absolutely essential step to make any
progress here, and an excellent patch series of its own, as there are
multiple users for it, like making swap safe on btrfs files, making
the MD bitmap code actually safe, or improving the loop driver.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>