* Re: [PATCH V10 01/15] time: Introduce timekeeping_inject_offset
From: John Stultz @ 2011-01-27 18:48 UTC
To: Richard Cochran, Richard Cochran
Cc: linux-kernel, linux-api, netdev, Alan Cox, Arnd Bergmann,
Christoph Lameter, David Miller, Krzysztof Halasa, Peter Zijlstra,
Rodolfo Giometti, Thomas Gleixner, Benjamin Herrenschmidt,
H. Peter Anvin, Ingo Molnar, Mike Frysinger, Paul Mackerras,
Russell King
In-Reply-To: <6aec014551fc1d34924d6a7bcf97769867c15ba9.1296124770.git.richard.cochran@omicron.at>
On Thu, 2011-01-27 at 11:54 +0100, John Stultz wrote:
> This adds a kernel-internal timekeeping interface to add or subtract
> a fixed amount from CLOCK_REALTIME. This makes it so kernel users or
> interfaces trying to do so do not have to read the time, then add an
> offset and then call settimeofday(), which adds some extra error in
> comparison to just simply adding the offset in the kernel timekeeping
> core.
>
> CC: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
> ---
Hey Richard,
Something seems wrong with your mail sending script. It looks like you're
sending the email under my name (John Stultz
<richardcochran@gmail.com>).
While I appreciate you preserving the patch author, and the signoffs are
right, you really should send the email under your own name.
The proper style is to keep the mail-header From: the same (ie: Richard
Cochran <richardcochran@gmail.com>), but as the first line of the mail
body put:
From: Author Name <author@email.com>
thanks
-john
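
The quoted patch description contrasts two ways of shifting CLOCK_REALTIME: read the time, add an offset, and set it again, versus handing the offset itself to the timekeeping core. The arithmetic both paths share is a normalized timespec addition. The following is only a userspace sketch of that arithmetic, not the kernel code from the patch; the name timespec_add_offset and the convention for negative offsets are assumptions made for illustration.

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Add a signed offset to a base time and normalize the result so that
 * tv_nsec ends up in [0, NSEC_PER_SEC). Both inputs are assumed to be
 * normalized already; a negative offset such as -0.5 s is written as
 * { .tv_sec = -1, .tv_nsec = 500000000 }.
 */
struct timespec timespec_add_offset(struct timespec base,
				    struct timespec offset)
{
	struct timespec sum;

	sum.tv_sec = base.tv_sec + offset.tv_sec;
	sum.tv_nsec = base.tv_nsec + offset.tv_nsec;
	if (sum.tv_nsec >= NSEC_PER_SEC) {
		/* Carry one second out of the nanosecond field. */
		sum.tv_nsec -= NSEC_PER_SEC;
		sum.tv_sec++;
	}
	return sum;
}
```

In the read/add/set approach, the base value is already stale by the time the result is written back, which is the extra error the description mentions. Applying the offset where the current time is held avoids that window.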
* Re: [PATCH net-next-2.6] drivers/net: remove some rcu sparse warnings
From: Michael Chan @ 2011-01-27 18:32 UTC
To: Eric Dumazet; +Cc: David Miller, netdev, Arnd Bergmann, Eilon Greenstein
In-Reply-To: <1296106103.1783.114.camel@edumazet-laptop>
On Wed, 2011-01-26 at 21:28 -0800, Eric Dumazet wrote:
> Add missing __rcu annotations and helpers.
> minor : Fix some rcu_dereference() calls in macvtap
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Arnd Bergmann <arnd@arndb.de>
> CC: Michael Chan <mchan@broadcom.com>
> CC: Eilon Greenstein <eilong@broadcom.com>
Thanks Eric. bnx2/bnx2x/cnic portions look good.
Acked-by: Michael Chan <mchan@broadcom.com>
* Re: skb_split in tcp_retransmit_skb question
From: Sergey Senozhatsky @ 2011-01-27 18:33 UTC
To: David S. Miller
Cc: Alexey Kuznetsov, Eric Dumazet, Pekka Savola (ipv6), netdev,
linux-kernel
In-Reply-To: <20110127152057.GA4153@swordfish.minsk.epam.com>
On (01/27/11 17:20), Sergey Senozhatsky wrote:
> Hello,
>
> Suppose we have the following scenario:
>
> tcp_write_timer ->
> tcp_retransmit_skb
>
> in tcp_retransmit_skb we have `if (skb->len > cur_mss)' evaluated to true, which leads
> to tcp_fragment(sk, skb, cur_mss, cur_mss) call. tcp_fragment calls skb_split(skb, buff, len)
> which, in turn, calls skb_split_no_header(skb, skb1, len, pos), where we have
> `skb_shinfo(skb)->nr_frags++' while in `for (i = 0; i < nfrags; i++)' loop.
Sorry for the noise. Alexey has pointed out that we have
skb_shinfo(skb)->nr_frags = 0 in skb_split_no_header. I have no idea how I missed it.
Thanks,
Sergey
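
The point Alexey raised can be seen in a small userspace model of the fragment loop. This is only a sketch, assuming simplified stand-ins (struct frag, struct buf, split_frags are hypothetical names, and page reference counting is left out); it is not the kernel's skb_split_no_header. What it shows is that the count is reset to zero before the loop, so the increments inside the loop rebuild it and never push it past the original number of fragments.

```c
#define MAX_FRAGS 8

/* Hypothetical stand-ins for skb_frag_t and the skb_shared_info fields. */
struct frag {
	int page_id;
	int offset;
	int size;
};

struct buf {
	int nr_frags;
	struct frag frags[MAX_FRAGS];
};

/*
 * Split the paged data of 'first' at byte 'len'; 'pos' is the number of
 * bytes that precede the first fragment (the linear header length).
 * Fragments wholly before 'len' stay in 'first', fragments wholly after
 * it move to 'second', and a fragment straddling 'len' is shared by both.
 */
void split_frags(struct buf *first, struct buf *second, int len, int pos)
{
	const int nfrags = first->nr_frags;
	int i, k = 0;

	first->nr_frags = 0;	/* restart the count before re-adding frags */

	for (i = 0; i < nfrags; i++) {
		int size = first->frags[i].size;

		if (pos + size > len) {
			second->frags[k] = first->frags[i];
			if (pos < len) {
				/* Straddling fragment: both halves keep it. */
				second->frags[k].offset += len - pos;
				second->frags[k].size -= len - pos;
				first->frags[i].size = len - pos;
				first->nr_frags++;
			}
			k++;
		} else {
			first->nr_frags++;
		}
		pos += size;
	}
	second->nr_frags = k;
}
```

With three fragments of 100, 200 and 300 bytes, no linear header, and a split at byte 150, the first buffer ends up with two fragments (100 and 50 bytes) and the second with two (150 and 300 bytes).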
* Re: question about nla_nest_cancel
From: Julia Lawall @ 2011-01-27 18:18 UTC
To: Ben Pfaff; +Cc: hadi, netdev
In-Reply-To: <87pqrikzd6.fsf@benpfaff.org>
On Thu, 27 Jan 2011, Ben Pfaff wrote:
> Julia Lawall <julia@diku.dk> writes:
>
> > I find numerous occurrences of code like the following, in which nest ends
> > up with the value NULL and then nla_nest_cancel is called with nest as the
> > second argument. But nla_nest_cancel just calls nlmsg_trim with the same
> > second argument, and nlmsg_trim does nothing if its second argument is
> > NULL. Is there any reason to keep these calls?
>
> I think that you are missing that NLA_PUT() contains an internal
> "goto nla_put_failure;". If that branch is taken, then
> nla_nest_cancel() trims off the nested attribute. So just
> removing the call to nla_nest_cancel() would change behavior in
> that case.
Indeed. Thank you for the explanation.
julia
* Re: question about nla_nest_cancel
From: Ben Pfaff @ 2011-01-27 17:52 UTC
To: Julia Lawall; +Cc: hadi, netdev
In-Reply-To: <Pine.LNX.4.64.1101271804530.13796@pc-004.diku.dk>
Julia Lawall <julia@diku.dk> writes:
> I find numerous occurrences of code like the following, in which nest ends
> up with the value NULL and then nla_nest_cancel is called with nest as the
> second argument. But nla_nest_cancel just calls nlmsg_trim with the same
> second argument, and nlmsg_trim does nothing if its second argument is
> NULL. Is there any reason to keep these calls?
I think that you are missing that NLA_PUT() contains an internal
"goto nla_put_failure;". If that branch is taken, then
nla_nest_cancel() trims off the nested attribute. So just
removing the call to nla_nest_cancel() would change behavior in
that case.
--
Ben Pfaff
http://benpfaff.org
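
The two ways of reaching the failure label can be modelled in userspace. This is a sketch under stated assumptions: struct msg_buf, nest_start, put, nest_end, nest_cancel and dump_nested are hypothetical stand-ins for the sk_buff and the nla_* helpers, and the explicit check on put() plays the role of the goto hidden inside NLA_PUT(). It is not the netlink implementation.

```c
#include <stddef.h>
#include <string.h>

#define NEST_HDR_LEN 4

struct msg_buf {
	unsigned char data[64];
	size_t len;
};

/* Reserve a nest header; return its start, or NULL if it does not fit. */
unsigned char *nest_start(struct msg_buf *m)
{
	unsigned char *mark;

	if (m->len + NEST_HDR_LEN > sizeof(m->data))
		return NULL;
	mark = m->data + m->len;
	memset(mark, 0, NEST_HDR_LEN);
	m->len += NEST_HDR_LEN;
	return mark;
}

/* Append a payload; return -1 if it does not fit. */
int put(struct msg_buf *m, const void *p, size_t n)
{
	if (m->len + n > sizeof(m->data))
		return -1;
	memcpy(m->data + m->len, p, n);
	m->len += n;
	return 0;
}

/* 'Commit' the nest by recording its total length in the header. */
void nest_end(struct msg_buf *m, unsigned char *mark)
{
	mark[0] = (unsigned char)(m->data + m->len - mark);
}

/* Trim the buffer back to 'mark'; a NULL mark leaves it untouched. */
void nest_cancel(struct msg_buf *m, const unsigned char *mark)
{
	if (mark)
		m->len = (size_t)(mark - m->data);
}

int dump_nested(struct msg_buf *m, const void *payload, size_t n)
{
	unsigned char *nest;

	nest = nest_start(m);
	if (nest == NULL)
		goto put_failure;	/* cancel below is a no-op */
	if (put(m, payload, n) < 0)
		goto put_failure;	/* cancel below must trim the header */
	nest_end(m, nest);
	return (int)m->len;

put_failure:
	nest_cancel(m, nest);
	return -1;
}
```

When the nest header does not fit, the cancel call does nothing, which matches Julia's observation. When the payload does not fit, the same call removes the header that was already written, which is the case the call at the label exists for.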
* Re: question about nla_nest_cancel
From: Julia Lawall @ 2011-01-27 17:25 UTC
To: Kurt Van Dijck; +Cc: hadi, netdev
In-Reply-To: <20110127172118.GA331@e-circ.dyndns.org>
On Thu, 27 Jan 2011, Kurt Van Dijck wrote:
> On Thu, Jan 27, 2011 at 06:08:34PM +0100, Julia Lawall wrote:
> >
> > I find numerous occurrences of code like the following, in which nest ends
> > up with the value NULL and then nla_nest_cancel is called with nest as the
> > second argument. But nla_nest_cancel just calls nlmsg_trim with the same
> > second argument, and nlmsg_trim does nothing if its second argument is
> > NULL. Is there any reason to keep these calls?
> I just learned this:
> nla_nest_start() adds data to the skb.
> nla_nest_end() 'commits' the proper length.
> nla_nest_cancel() reverts skb to the state before nla_nest_start(),
> as if nothing happened.
Yes, I can see this as well. But in this case, it seems to me that
nothing has happened, because nla_nest_start has returned NULL?
julia
* Re: question about nla_nest_cancel
From: Kurt Van Dijck @ 2011-01-27 17:21 UTC
To: Julia Lawall; +Cc: hadi, netdev
In-Reply-To: <Pine.LNX.4.64.1101271804530.13796@pc-004.diku.dk>
On Thu, Jan 27, 2011 at 06:08:34PM +0100, Julia Lawall wrote:
>
> I find numerous occurrences of code like the following, in which nest ends
> up with the value NULL and then nla_nest_cancel is called with nest as the
> second argument. But nla_nest_cancel just calls nlmsg_trim with the same
> second argument, and nlmsg_trim does nothing if its second argument is
> NULL. Is there any reason to keep these calls?
I just learned this:
nla_nest_start() adds data to the skb.
nla_nest_end() 'commits' the proper length.
nla_nest_cancel() reverts skb to the state before nla_nest_start(),
as if nothing happened.
Kurt
* question about nla_nest_cancel
From: Julia Lawall @ 2011-01-27 17:08 UTC
To: hadi, netdev
I find numerous occurrences of code like the following, in which nest ends
up with the value NULL and then nla_nest_cancel is called with nest as the
second argument. But nla_nest_cancel just calls nlmsg_trim with the same
second argument, and nlmsg_trim does nothing if its second argument is
NULL. Is there any reason to keep these calls?
thanks,
julia
static int tbf_dump(struct Qdisc *sch, struct sk_buff *skb)
{
	struct tbf_sched_data *q = qdisc_priv(sch);
	struct nlattr *nest;
	struct tc_tbf_qopt opt;

	nest = nla_nest_start(skb, TCA_OPTIONS);
	if (nest == NULL)
		goto nla_put_failure;

	opt.limit = q->limit;
	opt.rate = q->R_tab->rate;
	if (q->P_tab)
		opt.peakrate = q->P_tab->rate;
	else
		memset(&opt.peakrate, 0, sizeof(opt.peakrate));
	opt.mtu = q->mtu;
	opt.buffer = q->buffer;
	NLA_PUT(skb, TCA_TBF_PARMS, sizeof(opt), &opt);

	nla_nest_end(skb, nest);
	return skb->len;

nla_put_failure:
	nla_nest_cancel(skb, nest);
	return -1;
}
* Re: Realtek r8168C / r8169 driver VLAN TAG stripping
From: Francois Romieu @ 2011-01-27 16:50 UTC
To: Anand Raj Manickam; +Cc: netdev, Hayes
In-Reply-To: <AANLkTi=OwsMO8x9AOy=MmohU4SSQcv+o=TvwNs0NNsQR@mail.gmail.com>
Anand Raj Manickam <anandrm@gmail.com> :
> On Thu, Jan 27, 2011 at 8:37 PM, Francois Romieu <romieu@fr.zoreil.com> wrote:
> > Anand Raj Manickam <anandrm@gmail.com> :
[...]
> > - ip addr show
>
> 3: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
> link/ether 00:17:54:00:f6:62 brd ff:ff:ff:ff:ff:ff
> inet 172.16.1.1/16 brd 172.16.255.255 scope global eth0
> inet6 fe80::217:54ff:fe00:f662/64 scope link
> valid_lft forever preferred_lft forever
>
> 8: eth0.50@eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc noqueue
> link/ether 00:17:54:00:f6:62 brd ff:ff:ff:ff:ff:ff
> inet 172.16.10.10/24 brd 172.16.10.255 scope global eth0.50
> inet6 fe80::217:54ff:fe00:f662/64 scope link
> valid_lft forever preferred_lft forever
Could you try again after issuing :
ip addr del 172.16.1.1/16 brd 172.16.255.255 dev eth0
then send the unabbreviated "ip addr show" and "ip route show all" if
things do not perform better.
(no iptables / ip rules wizardry, right ?)
[...]
> > - ethtool -k eth0
>
> Offload parameters for eth0:
> rx-checksumming: on
> tx-checksumming: off
> scatter-gather: off
> tcp segmentation offload: off
> udp fragmentation offload: off
> generic segmentation offload: off
Ok.
[...]
> > I do not get the "VLAN tag gets stripped" concept, especially on Tx.
> > Does it mean "no packet" or "a packet whose content is wrong" ?
>
> Sorry for not being clear ;-)
>
> When we transmit a packet with a VLAN tag, the tag gets stripped when
> transmitted through the device; the other end (trunk port / sniffer)
> does NOT see a tag.
> Similarly, when a VLAN-tagged packet is sent from the other end, the
> tag gets stripped by the device and we do NOT see the tag.
But the data flows in both directions, right ?
> I use tcpdump -i eth0 -n -nn -e vlan 50
> to see if the packets are getting tagged or NOT.
>
> The same config works on forcedeth
What do you call "same config" ?
I am mildly convinced that your config is simple enough to isolate a
driver level vlan problem.
--
Ueimor
* Re: SO_REUSEPORT - can it be done in kernel?
From: Bill Sommerfeld @ 2011-01-27 15:55 UTC
To: Daniel Baluta; +Cc: therbert, netdev
In-Reply-To: <AANLkTimDtaV=WhZUUEivg3_vEUeUk3_WQSs09h7USiUj@mail.gmail.com>
On Thu, Jan 27, 2011 at 02:07, Daniel Baluta <daniel.baluta@gmail.com> wrote:
> How did you solve the issue regarding scaling TCP listeners?
> I think SO_REUSEPORT proposed by patch [1] can be a good
> start. Were there any follow ups?
Google is using the patch internally. I've recently joined google and
have picked up this work from Tom; I'm starting to rework how it
interacts with TCP (in particular, changing how it interacts with
request sockets and listen sockets so that incoming connections are
not prematurely bound to a specific listener sharing the port). I
have nothing worth sharing yet.
* Re: netconsole build breakage (Re: [GIT] Networking)
From: Ingo Molnar @ 2011-01-27 15:51 UTC
To: Nicholas A. Bellinger
Cc: Américo Wang, David Miller, James Bottomley, Randy Dunlap,
torvalds, akpm, netdev, linux-kernel, Joel Becker
In-Reply-To: <1295433231.21351.17.camel@haakon2.linux-iscsi.org>
* Nicholas A. Bellinger <nab@linux-iscsi.org> wrote:
> On Wed, 2011-01-19 at 18:08 +0800, Américo Wang wrote:
> > On Wed, Jan 19, 2011 at 10:59:20AM +0100, Ingo Molnar wrote:
> > >
> > >FYI, there's a .38-rc1 build failure that triggers rather often:
> > >
> > > drivers/built-in.o: In function `drop_netconsole_target':
> > > netconsole.c:(.text+0x130146): undefined reference to `config_item_put'
> > > drivers/built-in.o: In function `write_msg':
> > > netconsole.c:(.text+0x1301aa): undefined reference to `config_item_get'
> > > netconsole.c:(.text+0x130217): undefined reference to `config_item_put'
> > > drivers/built-in.o: In function `netconsole_netdev_event':
> > > netconsole.c:(.text+0x1302ab): undefined reference to `config_item_get'
> > > ...
> > >
> > >Triggered by this configuration:
> > >
> > > CONFIG_CONFIGFS_FS=m
> > > CONFIG_NETCONSOLE=y
> > >
> >
> > Should be "depends on CONFIGFS_FS=y".
>
> Sorry for breaking this one folks..
>
> Where this was left yesterday was to change NETCONSOLE_DYNAMIC, DLM and
> OCFS2_FS symbols to use 'select configfs' instead of 'depends on SYSFS
> && CONFIGFS':
>
> http://marc.info/?l=linux-kernel&m=129539400709508&w=2
>
> but unfortunately this did not make it into .38-rc1 in time..
>
> Using 'select CONFIGFS_FS' here for NETCONSOLE_DYNAMIC with the
> following patches should do the trick.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6.git for-linus
>
> Thanks,
>
> Nicholas Bellinger (3):
> net: Make NETCONSOLE_DYNAMIC use select CONFIGFS_FS
> dlm: Make DLM use select CONFIGFS_FS
> ocfs2: Make OCFS2_FS use select CONFIGFS_FS
>
> drivers/net/Kconfig | 3 ++-
> fs/dlm/Kconfig | 4 ++--
> fs/ocfs2/Kconfig | 3 ++-
> 3 files changed, 6 insertions(+), 4 deletions(-)
Ping? This is still broken in Linus's tree as of today ... simple builds like
allmodconfig still fail.
Thanks,
Ingo
* Re: Realtek r8168C / r8169 driver VLAN TAG stripping
From: Anand Raj Manickam @ 2011-01-27 15:31 UTC
To: Francois Romieu; +Cc: netdev, Hayes
In-Reply-To: <20110127150744.GA7925@electric-eye.fr.zoreil.com>
On Thu, Jan 27, 2011 at 8:37 PM, Francois Romieu <romieu@fr.zoreil.com> wrote:
> Anand Raj Manickam <anandrm@gmail.com> :
> [...]
>> We upgraded to 2.6.36 kernel . The result is SAME.
>> The VLAN tag gets stripped ;-)
>> Do let me know if you need more info .
>
> - ip addr show
3: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:17:54:00:f6:62 brd ff:ff:ff:ff:ff:ff
inet 172.16.1.1/16 brd 172.16.255.255 scope global eth0
inet6 fe80::217:54ff:fe00:f662/64 scope link
valid_lft forever preferred_lft forever
8: eth0.50@eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc noqueue
link/ether 00:17:54:00:f6:62 brd ff:ff:ff:ff:ff:ff
inet 172.16.10.10/24 brd 172.16.10.255 scope global eth0.50
inet6 fe80::217:54ff:fe00:f662/64 scope link
valid_lft forever preferred_lft forever
> - ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
> I do not get the "VLAN tag gets stripped" concept, especially on Tx.
> Does it mean "no packet" or "a packet whose content is wrong" ?
Sorry for not being clear ;-)
When we transmit a packet with a VLAN tag, the tag gets stripped when
transmitted through the device; the other end (trunk port / sniffer)
does NOT see a tag.
Similarly, when a VLAN-tagged packet is sent from the other end, the
tag gets stripped by the device and we do NOT see the tag.
I use tcpdump -i eth0 -n -nn -e vlan 50
to see if the packets are getting tagged or NOT.
The same config works on forcedeth
Thanks,
Anand
>
> --
> Ueimor
>
* skb_split in tcp_retransmit_skb question
From: Sergey Senozhatsky @ 2011-01-27 15:20 UTC
To: David S. Miller
Cc: Alexey Kuznetsov, Eric Dumazet, Pekka Savola (ipv6), netdev,
linux-kernel
Hello,
Suppose we have the following scenario:
tcp_write_timer ->
tcp_retransmit_skb
in tcp_retransmit_skb we have `if (skb->len > cur_mss)' evaluated to true, which leads
to tcp_fragment(sk, skb, cur_mss, cur_mss) call. tcp_fragment calls skb_split(skb, buff, len)
which, in turn, calls skb_split_no_header(skb, skb1, len, pos), where we have
`skb_shinfo(skb)->nr_frags++' while in `for (i = 0; i < nfrags; i++)' loop.
Now we fall back to:
tcp_retransmit_skb ->
tcp_transmit_skb ->
pskb_copy(skb, gfp_mask)
In pskb_copy we perform iteration on nr_frags:
	if (skb_shinfo(skb)->nr_frags) {
		int i;
		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
			skb_shinfo(n)->frags[i] = skb_shinfo(skb)->frags[i];
			get_page(skb_shinfo(n)->frags[i].page);
		}
		skb_shinfo(n)->nr_frags = i;
	}
The problem here is that nr_frags was increased in skb_split, yet new page was not allocated.
So, get_page(skb_shinfo(n)->frags[i].page) is actually get_page(NULL):
mov (%rdx), %eax
where %rdx is 0x00
Please correct me if I'm missing something.
Sergey
* Re: Realtek r8168C / r8169 driver VLAN TAG stripping
From: Francois Romieu @ 2011-01-27 15:07 UTC
To: Anand Raj Manickam; +Cc: netdev, Hayes
In-Reply-To: <AANLkTimA1C=trH+WGpBdPU=vBf-cN+xB4Br3+vJb-Qum@mail.gmail.com>
Anand Raj Manickam <anandrm@gmail.com> :
[...]
> We upgraded to 2.6.36 kernel . The result is SAME.
> The VLAN tag gets stripped ;-)
> Do let me know if you need more info .
- ip addr show
- ethtool -k eth0
I do not get the "VLAN tag gets stripped" concept, especially on Tx.
Does it mean "no packet" or "a packet whose content is wrong" ?
--
Ueimor
* Re: TSO/GRO/LRO/somethingO breaks LVS on 2.6.36
From: Simon Horman @ 2011-01-27 14:42 UTC
To: Eric Dumazet; +Cc: Simon Kirby, netdev
In-Reply-To: <1296114125.1783.139.camel@edumazet-laptop>
On Thu, Jan 27, 2011 at 08:42:05AM +0100, Eric Dumazet wrote:
> Le mercredi 26 janvier 2011 à 16:48 -0800, Simon Kirby a écrit :
> > On Thu, Jan 13, 2011 at 03:34:22PM +0900, Simon Horman wrote:
> >
> > > Hi Simon,
> > >
> > > thanks for prodding me to respond to this post offline and sorry for not
> > > responding earlier.
> > >
> > > Firstly, I think that this is a receive-side problem so I don't believe
> > > that GSO (generic segmentation offload) or other transmit-side options are
> > > likely to have any effect.
> > >
> > > My understanding is that on the receive-side there are two options which
> > > when enabled can result in the behaviour that you describe.
> > >
> > > * LRO (large receive offload)
> > >
> > > You have this disabled, and assuming it really is disabled it
> > > shouldn't be causing a problem.
> > >
> > > * GRO (generic receive offload)
> > >
> > > This does not seem to be in the output of your ethtool commands at all.
> > > So I wonder if your ethtool is too old to support this option?
> >
> > So, this was the case. Our ethtool (lenny) was too old to see the GRO
> > option, only GSO. Disabling GRO on eth1.39 has no effect, but disabling
> > it on eth1 caused it to stop receiving the merged frames, fixing the LVS
> > packet loss (due to no sending GSO support from LVS/IPVS).
> >
> > Speaking of this, did your patch for LVS/IPVS GSO support go anywhere?
> >
> > > In any case, I was able to reproduce the problem that you describe (or at
> > > least something very similar) using 2.6.36 with GRO enabled on eth1.1 and
> > > the problem did not manifest when I disabled GRO on eth1.1.
> >
> > It worked for you to do ethtool -K eth1.1 gro off, then? For me on
> > 2.6.37, it seemed to be that "ethtool -K eth1 gro off" was needed, even
> > though packets arrive on eth1.39.
> >
> > Also, strangely, 2.6.35.4's default state (with no received merged frames)
> > has GRO on for eth1 but off for eth1.39:
> >
> > # ethtool -k eth1
> > Offload parameters for eth1:
> > rx-checksumming: on
> > tx-checksumming: on
> > scatter-gather: on
> > tcp-segmentation-offload: on
> > udp-fragmentation-offload: off
> > generic-segmentation-offload: on
> > generic-receive-offload: on
> > large-receive-offload: off
> > ntuple-filters: off
> > receive-hashing: off
> >
> > # ethtool -k eth1.39
> > Offload parameters for eth1.39:
> > rx-checksumming: on
> > tx-checksumming: off
> > scatter-gather: off
> > tcp-segmentation-offload: off
> > udp-fragmentation-offload: off
> > generic-segmentation-offload: off
> > generic-receive-offload: off
> > large-receive-offload: off
> > ntuple-filters: off
> > receive-hashing: off
> >
> > If I set 2.6.37 to have all of the same options, I still see GRO frames
> > on 2.6.37 (tg3), which is weird.
> >
>
> Weird maybe, but GRO check/handling is done in dev_gro_receive(), on
> eth1 receive path.
>
> Frames are assembled by GRO layer using tg3 NAPI structure (holding GRO
> machine state) before being delivered to eth1.39
>
> It would be useless/expensive to add another GRO layer on eth1.39
>
> We might not report GRO state on vlan/bonding (or reflect real device
> GRO state)
That makes sense. So to clarify, this is the expected behaviour?
* Re: [autofs] [RFC 00/20] Proposal for remaining BKL users
From: Ian Kent @ 2011-01-27 14:41 UTC
To: Arnd Bergmann
Cc: linux-kernel, autofs, linux-cifs, linux-x25, netdev, dri-devel,
linux-fsdevel
In-Reply-To: <1295994174-5043-1-git-send-email-arnd@arndb.de>
On Tue, 2011-01-25 at 23:22 +0100, Arnd Bergmann wrote:
> I've gone through all the code in the kernel that
> uses the big kernel lock and come up with a solution
> that seems at least half-reasonable for each of them.
>
> The decisions are somewhat arbitrary, but here is
> what I'd suggest we do:
>
> * Remove in 2.6.39:
> i830, autofs3, smbfs
Yes please, ack to removing autofs v3.
It is worth keeping in mind that if people are using the v3 protocol and
there is a problem using it then a bug should be logged against the
autofs4 module so it can be fixed. It's likely there will be a problem
or two because of the huge amount of change that has occurred over time.
The same goes for the v4 protocol, since the testing done nowadays is
with the v5 protocol. The v2 protocol might work with the autofs4 module
since the code is still present but I'm not keen on trying to fix any
problems that might arise and think we should discourage its use. In any
case v2 users should be few and far between.
>
> * Move to staging now, kill in 2.6.41 (or later):
> appletalk, hpfs
>
> * Work around in an ugly way, but keep alive:
> * ufs, ipx, i810, cx25821
>
> * Fix properly:
> * usbip, go7007, adfs, x25
>
> Some of the patches are rather tricky and I haven't
> really done much proper testing, so I'd much prefer
> the maintainers to pick up the patches and do the
> necessary testing where possible, or even come up
> with a better solution.
>
> Arnd Bergmann (20):
> drm/i810: remove the BKL
> drm: remove i830 driver
> staging/usbip: convert to kthread
> staging/cx25821: serialize access to devlist
> staging/go7007: remove the BKL
> staging: Remove autofs3
> staging: remove smbfs
> adfs: remove the big kernel lock
> hpfs: rename big kernel lock to hpfs_lock
> hpfs: replace BKL with a global mutex
> hpfs: move to drivers/staging
> x25: remove the BKL
> appletalk: move to staging
> staging/appletalk: remove the BKL
> ufs: remove the BKL
> ipx: remove the BKL
> tracing: don't trace the BKL
> rtmutex-tester: remove BKL tests
> drivers: remove extraneous includes of smp_lock.h
> BKL: That's all, folks
>
> MAINTAINERS | 11 +-
> drivers/gpu/drm/Kconfig | 47 +-
> drivers/gpu/drm/Makefile | 1 -
> drivers/gpu/drm/i810/i810_dma.c | 18 +-
> drivers/gpu/drm/i810/i810_drv.c | 6 +-
> drivers/gpu/drm/i830/Makefile | 8 -
> drivers/gpu/drm/i830/i830_dma.c | 1560 ---------
> drivers/gpu/drm/i830/i830_drv.c | 107 -
> drivers/gpu/drm/i830/i830_drv.h | 295 --
> drivers/gpu/drm/i830/i830_irq.c | 186 --
> drivers/net/Makefile | 1 -
> drivers/net/appletalk/Makefile | 7 -
> drivers/scsi/megaraid/megaraid_sas_fp.c | 1 -
> drivers/scsi/megaraid/megaraid_sas_fusion.c | 1 -
> drivers/staging/Kconfig | 8 +-
> drivers/staging/Makefile | 4 +-
> drivers/{net => staging}/appletalk/Kconfig | 1 -
> {net => drivers/staging}/appletalk/Makefile | 7 +-
> {net => drivers/staging}/appletalk/aarp.c | 2 +-
> .../linux => drivers/staging/appletalk}/atalk.h | 0
> {net => drivers/staging}/appletalk/atalk_proc.c | 2 +-
> drivers/{net => staging}/appletalk/cops.c | 2 +-
> drivers/{net => staging}/appletalk/cops.h | 0
> drivers/{net => staging}/appletalk/cops_ffdrv.h | 0
> drivers/{net => staging}/appletalk/cops_ltdrv.h | 0
> {net => drivers/staging}/appletalk/ddp.c | 44 +-
> {net => drivers/staging}/appletalk/dev.c | 0
> drivers/{net => staging}/appletalk/ipddp.c | 2 +-
> drivers/{net => staging}/appletalk/ipddp.h | 0
> drivers/{net => staging}/appletalk/ltpc.c | 2 +-
> drivers/{net => staging}/appletalk/ltpc.h | 0
> .../staging}/appletalk/sysctl_net_atalk.c | 2 +-
> drivers/staging/autofs/Kconfig | 22 -
> drivers/staging/autofs/Makefile | 7 -
> drivers/staging/autofs/TODO | 8 -
> drivers/staging/autofs/autofs_i.h | 165 -
> drivers/staging/autofs/dirhash.c | 260 --
> drivers/staging/autofs/init.c | 52 -
> drivers/staging/autofs/inode.c | 288 --
> drivers/staging/autofs/root.c | 648 ----
> drivers/staging/autofs/symlink.c | 26 -
> drivers/staging/autofs/waitq.c | 205 --
> drivers/staging/cx25821/Kconfig | 1 -
> drivers/staging/cx25821/cx25821-alsa.c | 2 +
> drivers/staging/cx25821/cx25821-core.c | 16 +-
> drivers/staging/cx25821/cx25821-video.c | 9 +-
> drivers/staging/cx25821/cx25821.h | 3 +-
> drivers/staging/easycap/easycap.h | 1 -
> drivers/staging/easycap/easycap_ioctl.c | 1 -
> drivers/staging/go7007/Kconfig | 1 -
> drivers/staging/go7007/s2250-loader.c | 3 -
> {fs => drivers/staging}/hpfs/Kconfig | 5 +-
> {fs => drivers/staging}/hpfs/Makefile | 0
> drivers/staging/hpfs/TODO | 5 +
> {fs => drivers/staging}/hpfs/alloc.c | 0
> {fs => drivers/staging}/hpfs/anode.c | 0
> {fs => drivers/staging}/hpfs/buffer.c | 0
> {fs => drivers/staging}/hpfs/dentry.c | 0
> {fs => drivers/staging}/hpfs/dir.c | 23 +-
> {fs => drivers/staging}/hpfs/dnode.c | 0
> {fs => drivers/staging}/hpfs/ea.c | 0
> {fs => drivers/staging}/hpfs/file.c | 9 +-
> {fs => drivers/staging}/hpfs/hpfs.h | 0
> {fs => drivers/staging}/hpfs/hpfs_fn.h | 36 +
> {fs => drivers/staging}/hpfs/inode.c | 9 +-
> {fs => drivers/staging}/hpfs/map.c | 0
> {fs => drivers/staging}/hpfs/name.c | 0
> {fs => drivers/staging}/hpfs/namei.c | 49 +-
> {fs => drivers/staging}/hpfs/super.c | 21 +-
> drivers/staging/smbfs/Kconfig | 56 -
> drivers/staging/smbfs/Makefile | 18 -
> drivers/staging/smbfs/TODO | 8 -
> drivers/staging/smbfs/cache.c | 208 --
> drivers/staging/smbfs/dir.c | 699 ----
> drivers/staging/smbfs/file.c | 456 ---
> drivers/staging/smbfs/getopt.c | 64 -
> drivers/staging/smbfs/getopt.h | 14 -
> drivers/staging/smbfs/inode.c | 854 -----
> drivers/staging/smbfs/ioctl.c | 68 -
> drivers/staging/smbfs/proc.c | 3502 --------------------
> drivers/staging/smbfs/proto.h | 89 -
> drivers/staging/smbfs/request.c | 817 -----
> drivers/staging/smbfs/request.h | 70 -
> drivers/staging/smbfs/smb.h | 118 -
> drivers/staging/smbfs/smb_debug.h | 34 -
> drivers/staging/smbfs/smb_fs.h | 153 -
> drivers/staging/smbfs/smb_fs_i.h | 37 -
> drivers/staging/smbfs/smb_fs_sb.h | 100 -
> drivers/staging/smbfs/smb_mount.h | 65 -
> drivers/staging/smbfs/smbfs.txt | 8 -
> drivers/staging/smbfs/smbiod.c | 343 --
> drivers/staging/smbfs/smbno.h | 363 --
> drivers/staging/smbfs/sock.c | 385 ---
> drivers/staging/smbfs/symlink.c | 67 -
> drivers/staging/usbip/Kconfig | 2 +-
> drivers/staging/usbip/stub.h | 4 +-
> drivers/staging/usbip/stub_dev.c | 13 +-
> drivers/staging/usbip/stub_rx.c | 13 +-
> drivers/staging/usbip/stub_tx.c | 14 +-
> drivers/staging/usbip/usbip_common.c | 105 -
> drivers/staging/usbip/usbip_common.h | 20 +-
> drivers/staging/usbip/usbip_event.c | 31 +-
> drivers/staging/usbip/vhci.h | 4 +-
> drivers/staging/usbip/vhci_hcd.c | 10 +-
> drivers/staging/usbip/vhci_rx.c | 16 +-
> drivers/staging/usbip/vhci_sysfs.c | 9 +-
> drivers/staging/usbip/vhci_tx.c | 14 +-
> drivers/target/target_core_device.c | 1 -
> drivers/target/target_core_fabric_lib.c | 1 -
> drivers/target/target_core_file.c | 1 -
> drivers/target/target_core_hba.c | 1 -
> drivers/target/target_core_iblock.c | 1 -
> drivers/target/target_core_pscsi.c | 1 -
> drivers/target/target_core_rd.c | 1 -
> drivers/target/target_core_tpg.c | 1 -
> drivers/target/target_core_transport.c | 1 -
> drivers/tty/n_hdlc.c | 1 -
> drivers/tty/n_r3964.c | 1 -
> drivers/tty/pty.c | 1 -
> drivers/tty/tty_io.c | 1 -
> drivers/tty/tty_ldisc.c | 2 -
> drivers/tty/vt/selection.c | 1 -
> drivers/tty/vt/vc_screen.c | 1 -
> drivers/tty/vt/vt.c | 1 -
> drivers/tty/vt/vt_ioctl.c | 1 -
> fs/Kconfig | 1 -
> fs/Makefile | 1 -
> fs/adfs/Kconfig | 1 -
> fs/adfs/dir.c | 6 -
> fs/adfs/inode.c | 6 -
> fs/adfs/super.c | 13 +-
> fs/compat_ioctl.c | 1 -
> fs/ufs/Kconfig | 1 -
> fs/ufs/inode.c | 78 +-
> fs/ufs/namei.c | 35 +-
> fs/ufs/super.c | 55 +-
> fs/ufs/truncate.c | 5 +-
> fs/ufs/ufs.h | 6 +-
> include/drm/Kbuild | 1 -
> include/drm/i830_drm.h | 342 --
> include/linux/Kbuild | 1 -
> include/linux/hardirq.h | 9 +-
> include/linux/smp_lock.h | 65 -
> include/trace/events/bkl.h | 61 -
> init/Kconfig | 5 -
> kernel/rtmutex-tester.c | 39 +-
> kernel/sched.c | 7 -
> lib/Makefile | 1 -
> lib/kernel_lock.c | 143 -
> net/Kconfig | 1 -
> net/Makefile | 1 -
> net/ipx/Kconfig | 1 -
> net/ipx/af_ipx.c | 52 +-
> net/socket.c | 1 -
> net/x25/Kconfig | 1 -
> net/x25/af_x25.c | 61 +-
> net/x25/x25_out.c | 7 +-
> 157 files changed, 356 insertions(+), 13722 deletions(-)
> delete mode 100644 drivers/gpu/drm/i830/Makefile
> delete mode 100644 drivers/gpu/drm/i830/i830_dma.c
> delete mode 100644 drivers/gpu/drm/i830/i830_drv.c
> delete mode 100644 drivers/gpu/drm/i830/i830_drv.h
> delete mode 100644 drivers/gpu/drm/i830/i830_irq.c
> delete mode 100644 drivers/net/appletalk/Makefile
> rename drivers/{net => staging}/appletalk/Kconfig (98%)
> rename {net => drivers/staging}/appletalk/Makefile (56%)
> rename {net => drivers/staging}/appletalk/aarp.c (99%)
> rename {include/linux => drivers/staging/appletalk}/atalk.h (100%)
> rename {net => drivers/staging}/appletalk/atalk_proc.c (99%)
> rename drivers/{net => staging}/appletalk/cops.c (99%)
> rename drivers/{net => staging}/appletalk/cops.h (100%)
> rename drivers/{net => staging}/appletalk/cops_ffdrv.h (100%)
> rename drivers/{net => staging}/appletalk/cops_ltdrv.h (100%)
> rename {net => drivers/staging}/appletalk/ddp.c (98%)
> rename {net => drivers/staging}/appletalk/dev.c (100%)
> rename drivers/{net => staging}/appletalk/ipddp.c (99%)
> rename drivers/{net => staging}/appletalk/ipddp.h (100%)
> rename drivers/{net => staging}/appletalk/ltpc.c (99%)
> rename drivers/{net => staging}/appletalk/ltpc.h (100%)
> rename {net => drivers/staging}/appletalk/sysctl_net_atalk.c (98%)
> delete mode 100644 drivers/staging/autofs/Kconfig
> delete mode 100644 drivers/staging/autofs/Makefile
> delete mode 100644 drivers/staging/autofs/TODO
> delete mode 100644 drivers/staging/autofs/autofs_i.h
> delete mode 100644 drivers/staging/autofs/dirhash.c
> delete mode 100644 drivers/staging/autofs/init.c
> delete mode 100644 drivers/staging/autofs/inode.c
> delete mode 100644 drivers/staging/autofs/root.c
> delete mode 100644 drivers/staging/autofs/symlink.c
> delete mode 100644 drivers/staging/autofs/waitq.c
> rename {fs => drivers/staging}/hpfs/Kconfig (80%)
> rename {fs => drivers/staging}/hpfs/Makefile (100%)
> create mode 100644 drivers/staging/hpfs/TODO
> rename {fs => drivers/staging}/hpfs/alloc.c (100%)
> rename {fs => drivers/staging}/hpfs/anode.c (100%)
> rename {fs => drivers/staging}/hpfs/buffer.c (100%)
> rename {fs => drivers/staging}/hpfs/dentry.c (100%)
> rename {fs => drivers/staging}/hpfs/dir.c (97%)
> rename {fs => drivers/staging}/hpfs/dnode.c (100%)
> rename {fs => drivers/staging}/hpfs/ea.c (100%)
> rename {fs => drivers/staging}/hpfs/file.c (97%)
> rename {fs => drivers/staging}/hpfs/hpfs.h (100%)
> rename {fs => drivers/staging}/hpfs/hpfs_fn.h (91%)
> rename {fs => drivers/staging}/hpfs/inode.c (98%)
> rename {fs => drivers/staging}/hpfs/map.c (100%)
> rename {fs => drivers/staging}/hpfs/name.c (100%)
> rename {fs => drivers/staging}/hpfs/namei.c (96%)
> rename {fs => drivers/staging}/hpfs/super.c (98%)
> delete mode 100644 drivers/staging/smbfs/Kconfig
> delete mode 100644 drivers/staging/smbfs/Makefile
> delete mode 100644 drivers/staging/smbfs/TODO
> delete mode 100644 drivers/staging/smbfs/cache.c
> delete mode 100644 drivers/staging/smbfs/dir.c
> delete mode 100644 drivers/staging/smbfs/file.c
> delete mode 100644 drivers/staging/smbfs/getopt.c
> delete mode 100644 drivers/staging/smbfs/getopt.h
> delete mode 100644 drivers/staging/smbfs/inode.c
> delete mode 100644 drivers/staging/smbfs/ioctl.c
> delete mode 100644 drivers/staging/smbfs/proc.c
> delete mode 100644 drivers/staging/smbfs/proto.h
> delete mode 100644 drivers/staging/smbfs/request.c
> delete mode 100644 drivers/staging/smbfs/request.h
> delete mode 100644 drivers/staging/smbfs/smb.h
> delete mode 100644 drivers/staging/smbfs/smb_debug.h
> delete mode 100644 drivers/staging/smbfs/smb_fs.h
> delete mode 100644 drivers/staging/smbfs/smb_fs_i.h
> delete mode 100644 drivers/staging/smbfs/smb_fs_sb.h
> delete mode 100644 drivers/staging/smbfs/smb_mount.h
> delete mode 100644 drivers/staging/smbfs/smbfs.txt
> delete mode 100644 drivers/staging/smbfs/smbiod.c
> delete mode 100644 drivers/staging/smbfs/smbno.h
> delete mode 100644 drivers/staging/smbfs/sock.c
> delete mode 100644 drivers/staging/smbfs/symlink.c
> delete mode 100644 include/drm/i830_drm.h
> delete mode 100644 include/linux/smp_lock.h
> delete mode 100644 include/trace/events/bkl.h
> delete mode 100644 lib/kernel_lock.c
>
> [re-sent to mailing lists, due to overly long
> Cc list getting rejected by vger]
>
> Cc: autofs@linux.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-cifs@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-x25@vger.kernel.org
> Cc: netdev@vger.kernel.org
^ permalink raw reply
* Re: [PATCH] r8169: use RxFIFO overflow workaround for 8168c chipset
From: Francois Romieu @ 2011-01-27 14:32 UTC (permalink / raw)
To: Ivan Vecera; +Cc: netdev, Hayes
In-Reply-To: <1296127451-12640-1-git-send-email-ivecera@redhat.com>
Ivan Vecera <ivecera@redhat.com> :
> I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts
> generating RxFIFO overflow errors. The result is an infinite loop in
> interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11.
Acked-by: as your patch ties it to a specific 8168 revision (CFG_METHOD_6
in Realtek's parlance).
Surprising as it may seem, unconditionally enabling it has not always
produced the expected result. See 53f57357ff0afc37804f4e82ee3123e0c0a2cad6
for instance. Realtek's r8168 driver ignores it most of the time as well.
Was it a normal high load or a pktgen-like high load?
--
Ueimor
^ permalink raw reply
* [PATCH] xen: netfront: handle incoming GSO SKBs which are not CHECKSUM_PARTIAL
From: Ian Campbell @ 2011-01-27 14:14 UTC (permalink / raw)
To: netdev@vger.kernel.org
Cc: Ian Campbell, Jeremy Fitzhardinge, David Miller, xen-devel,
netdev
In-Reply-To: <1296042981.14780.6813.camel@zakaz.uk.xensource.com>
The Linux network stack expects all GSO SKBs to have ip_summed ==
CHECKSUM_PARTIAL (which implies that the frame contains a partial
checksum) and the Xen network ring protocol similarly expects an SKB
which has GSO set to also have NETRXF_csum_blank (which also implies a
partial checksum).
However there have been cases of buggy guests which mark a frame as
GSO but do not set csum_blank. If we detect that we are receiving such a
frame (which manifests as ip_summed != PARTIAL && skb_is_gso) then
force the SKB to partial and recalculate the checksum, since we cannot
rely on the peer having done so if they have not set csum_blank.
Add an ethtool stat to track occurrences of this event.
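For readers unfamiliar with what "recalculate the partial checksum" means here: with CHECKSUM_PARTIAL the TCP/UDP check field is expected to hold only the folded ones-complement sum of the IPv4 pseudo-header, and the rest of the checksum is completed later. The following is a user-space sketch of that arithmetic, not the kernel's csum_tcpudp_magic(). The names demo_csum_fold and demo_pseudo_hdr_sum are made up for illustration, and all inputs are assumed to be in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit running sum into 16 bits using end-around carry. */
static uint16_t demo_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	assert((sum >> 16) == 0);
	return (uint16_t)sum;
}

/*
 * Ones-complement sum over the IPv4 pseudo-header: source address,
 * destination address, protocol number and L4 length.  The result is
 * NOT complemented, which is the form a partial checksum takes.
 * Hypothetical helper; inputs are host byte order.
 */
static uint16_t demo_pseudo_hdr_sum(uint32_t saddr, uint32_t daddr,
				    uint8_t proto, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum += saddr >> 16;
	sum += saddr & 0xffffu;
	sum += daddr >> 16;
	sum += daddr & 0xffffu;
	sum += proto;		/* zero byte followed by the protocol */
	sum += l4_len;		/* TCP/UDP header plus payload length */

	return demo_csum_fold(sum);
}
```

In the patch the length argument is skb->len - iph->ihl*4, i.e. everything after the IP header; in this sketch that corresponds to l4_len.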
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: David Miller <davem@davemloft.net>
Cc: xen-devel@lists.xensource.com
Cc: netdev@vger.kernel.org
---
drivers/net/xen-netfront.c | 96 ++++++++++++++++++++++++++++++++++++++++----
1 files changed, 88 insertions(+), 8 deletions(-)
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 546de57..da1f121 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -120,6 +120,9 @@ struct netfront_info {
unsigned long rx_pfn_array[NET_RX_RING_SIZE];
struct multicall_entry rx_mcl[NET_RX_RING_SIZE+1];
struct mmu_update rx_mmu[NET_RX_RING_SIZE];
+
+ /* Statistics */
+ int rx_gso_checksum_fixup;
};
struct netfront_rx_info {
@@ -770,11 +773,29 @@ static RING_IDX xennet_fill_frags(struct netfront_info *np,
return cons;
}
-static int skb_checksum_setup(struct sk_buff *skb)
+static int checksum_setup(struct net_device *dev, struct sk_buff *skb)
{
struct iphdr *iph;
unsigned char *th;
int err = -EPROTO;
+ int recalculate_partial_csum = 0;
+
+ /*
+ * A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
+ * peers can fail to set NETRXF_csum_blank when sending a GSO
+ * frame. In this case force the SKB to CHECKSUM_PARTIAL and
+ * recalculate the partial checksum.
+ */
+ if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
+ struct netfront_info *np = netdev_priv(dev);
+ np->rx_gso_checksum_fixup++;
+ skb->ip_summed = CHECKSUM_PARTIAL;
+ recalculate_partial_csum = 1;
+ }
+
+ /* A non-CHECKSUM_PARTIAL SKB does not require setup. */
+ if (skb->ip_summed != CHECKSUM_PARTIAL)
+ return 0;
if (skb->protocol != htons(ETH_P_IP))
goto out;
@@ -788,9 +809,23 @@ static int skb_checksum_setup(struct sk_buff *skb)
switch (iph->protocol) {
case IPPROTO_TCP:
skb->csum_offset = offsetof(struct tcphdr, check);
+
+ if (recalculate_partial_csum) {
+ struct tcphdr *tcph = (struct tcphdr *)th;
+ tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
+ skb->len - iph->ihl*4,
+ IPPROTO_TCP, 0);
+ }
break;
case IPPROTO_UDP:
skb->csum_offset = offsetof(struct udphdr, check);
+
+ if (recalculate_partial_csum) {
+ struct udphdr *udph = (struct udphdr *)th;
+ udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
+ skb->len - iph->ihl*4,
+ IPPROTO_UDP, 0);
+ }
break;
default:
if (net_ratelimit())
@@ -829,13 +864,11 @@ static int handle_incoming_queue(struct net_device *dev,
/* Ethernet work: Delayed to here as it peeks the header. */
skb->protocol = eth_type_trans(skb, dev);
- if (skb->ip_summed == CHECKSUM_PARTIAL) {
- if (skb_checksum_setup(skb)) {
- kfree_skb(skb);
- packets_dropped++;
- dev->stats.rx_errors++;
- continue;
- }
+ if (checksum_setup(dev, skb)) {
+ kfree_skb(skb);
+ packets_dropped++;
+ dev->stats.rx_errors++;
+ continue;
}
dev->stats.rx_packets++;
@@ -1632,12 +1665,59 @@ static void netback_changed(struct xenbus_device *dev,
}
}
+static const struct xennet_stat {
+ char name[ETH_GSTRING_LEN];
+ u16 offset;
+} xennet_stats[] = {
+ {
+ "rx_gso_checksum_fixup",
+ offsetof(struct netfront_info, rx_gso_checksum_fixup)
+ },
+};
+
+static int xennet_get_sset_count(struct net_device *dev, int string_set)
+{
+ switch (string_set) {
+ case ETH_SS_STATS:
+ return ARRAY_SIZE(xennet_stats);
+ default:
+ return -EINVAL;
+ }
+}
+
+static void xennet_get_ethtool_stats(struct net_device *dev,
+ struct ethtool_stats *stats, u64 * data)
+{
+ void *np = netdev_priv(dev);
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(xennet_stats); i++)
+ data[i] = *(int *)(np + xennet_stats[i].offset);
+}
+
+static void xennet_get_strings(struct net_device *dev, u32 stringset, u8 * data)
+{
+ int i;
+
+ switch (stringset) {
+ case ETH_SS_STATS:
+ for (i = 0; i < ARRAY_SIZE(xennet_stats); i++)
+ memcpy(data + i * ETH_GSTRING_LEN,
+ xennet_stats[i].name, ETH_GSTRING_LEN);
+ break;
+ }
+}
+
static const struct ethtool_ops xennet_ethtool_ops =
{
.set_tx_csum = ethtool_op_set_tx_csum,
.set_sg = xennet_set_sg,
.set_tso = xennet_set_tso,
.get_link = ethtool_op_get_link,
+
+ .get_sset_count = xennet_get_sset_count,
+ .get_ethtool_stats = xennet_get_ethtool_stats,
+ .get_strings = xennet_get_strings,
};
#ifdef CONFIG_SYSFS
--
1.5.6.5
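The ethtool part of the patch above uses a table of { name, offsetof() } pairs so that adding a counter only needs one new table entry. A minimal user-space sketch of the same idea follows. struct demo_info, demo_stats and demo_get_stats are hypothetical names, and the sketch reads each counter through memcpy on a byte pointer rather than the void-pointer arithmetic used in the patch (which relies on a GCC extension).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_GSTRING_LEN 32
#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical private device state holding int counters. */
struct demo_info {
	int rx_packets;
	int rx_gso_checksum_fixup;
};

/* One table entry per exported counter: its name and where it lives. */
struct demo_stat {
	char name[DEMO_GSTRING_LEN];
	size_t offset;
};

static const struct demo_stat demo_stats[] = {
	{ "rx_gso_checksum_fixup",
	  offsetof(struct demo_info, rx_gso_checksum_fixup) },
};

/* Copy up to max counters into data[]; returns how many were written. */
static size_t demo_get_stats(const struct demo_info *info,
			     uint64_t *data, size_t max)
{
	const unsigned char *base = (const unsigned char *)info;
	size_t i;

	assert(info != NULL && data != NULL);
	for (i = 0; i < DEMO_ARRAY_SIZE(demo_stats) && i < max; i++) {
		int value;

		/* Every counter in the table is assumed to be an int. */
		memcpy(&value, base + demo_stats[i].offset, sizeof(value));
		data[i] = (uint64_t)value;
	}
	return i;
}
```

The table only works while every listed member has the same type (int here); a counter of a different width would need a size field in the table as well.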
^ permalink raw reply related
* Re: [PATCH v2] x25: remove the BKL
From: Arnd Bergmann @ 2011-01-27 13:43 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Andrew Hendry, linux-kernel, linux-x25, netdev
In-Reply-To: <1296134447.2614.5.camel@edumazet-laptop>
On Thursday 27 January 2011, Eric Dumazet wrote:
> On Thursday, 27 January 2011 at 13:38 +0100, Arnd Bergmann wrote:
> > diff --git a/net/x25/x25_out.c b/net/x25/x25_out.c
> > index d00649f..f1a6ff1 100644
> > --- a/net/x25/x25_out.c
> > +++ b/net/x25/x25_out.c
> > @@ -68,8 +68,11 @@ int x25_output(struct sock *sk, struct sk_buff *skb)
> > frontlen = skb_headroom(skb);
> >
> > while (skb->len > 0) {
> > - if ((skbn = sock_alloc_send_skb(sk, frontlen + max_len,
> > - noblock, &err)) == NULL){
> > + release_sock(sk);
> > + skbn = sock_alloc_send_skb(sk, frontlen + max_len,
> > + 1, &err);
> > + lock_sock(sk);
> > + if (!skbn) {
> > if (err == -EWOULDBLOCK && noblock){
> > kfree_skb(skb);
> > return sent;
>
> This part looks strange:
>
> noblock variable became "const 1 : NOBLOCK"
>
> Why release the socket if you don't block in sock_alloc_send_skb()?
Leftover from an earlier version of the patch, thanks for catching this!
Originally, I wrote this as
	long timeo = sock_sndtimeo(sk, noblock);
	do {
		skbn = sock_alloc_send_skb(sk, frontlen + max_len, 1, &err);
		if (skbn)
			break;
		release_sock(sk);
		timeo = sock_wait_for_wmem(sk, timeo);
		lock_sock(sk);
	} while (timeo);
Then I forgot to flip it back after I noticed that other protocols also just
call release_sock/lock_sock around sock_alloc_send_skb.
I think I'd better go over the whole series and see if there are more things
that got slightly broken...
Arnd
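The shape of the earlier loop described in this reply (try a non-blocking allocation while holding the lock, and drop the lock only while waiting for memory) can be modelled in user space as follows. This is a sketch under stated assumptions: struct demo_sock and every demo_* function are stand-ins invented for illustration, not kernel interfaces, and the "wait" simply counts down instead of sleeping.

```c
#include <assert.h>

/* Hypothetical socket model: a lock flag and a memory-pressure counter. */
struct demo_sock {
	int locked;
	int waits_needed;	/* waits required before memory is free */
};

static void demo_lock_sock(struct demo_sock *sk)
{
	assert(!sk->locked);
	sk->locked = 1;
}

static void demo_release_sock(struct demo_sock *sk)
{
	assert(sk->locked);
	sk->locked = 0;
}

/* Non-blocking attempt: succeeds only once memory is available. */
static int demo_try_alloc(const struct demo_sock *sk)
{
	return sk->waits_needed == 0;
}

/* Stand-in for a sleeping wait; must be called without the lock held. */
static long demo_wait_for_wmem(struct demo_sock *sk, long timeo)
{
	assert(!sk->locked);
	if (sk->waits_needed > 0)
		sk->waits_needed--;
	return timeo > 0 ? timeo - 1 : 0;
}

/* Caller holds the lock on entry and on return. Returns 1 on success. */
static int demo_alloc_with_timeout(struct demo_sock *sk, long timeo)
{
	do {
		if (demo_try_alloc(sk))
			return 1;
		demo_release_sock(sk);
		timeo = demo_wait_for_wmem(sk, timeo);
		demo_lock_sock(sk);
	} while (timeo);

	return 0;
}
```

As in the quoted loop, no further attempt is made once the timeout reaches zero, and the lock is only released around the step that may sleep.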
^ permalink raw reply
* Re: [PATCH v2] x25: remove the BKL
From: Eric Dumazet @ 2011-01-27 13:20 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: Andrew Hendry, linux-kernel, linux-x25, netdev
In-Reply-To: <201101271338.39295.arnd@arndb.de>
On Thursday, 27 January 2011 at 13:38 +0100, Arnd Bergmann wrote:
> diff --git a/net/x25/x25_out.c b/net/x25/x25_out.c
> index d00649f..f1a6ff1 100644
> --- a/net/x25/x25_out.c
> +++ b/net/x25/x25_out.c
> @@ -68,8 +68,11 @@ int x25_output(struct sock *sk, struct sk_buff *skb)
> frontlen = skb_headroom(skb);
>
> while (skb->len > 0) {
> - if ((skbn = sock_alloc_send_skb(sk, frontlen + max_len,
> - noblock, &err)) == NULL){
> + release_sock(sk);
> + skbn = sock_alloc_send_skb(sk, frontlen + max_len,
> + 1, &err);
> + lock_sock(sk);
> + if (!skbn) {
> if (err == -EWOULDBLOCK && noblock){
> kfree_skb(skb);
> return sent;
This part looks strange:
noblock variable became "const 1 : NOBLOCK"
Why release the socket if you don't block in sock_alloc_send_skb()?
^ permalink raw reply
* [PATCH v2] x25: remove the BKL
From: Arnd Bergmann @ 2011-01-27 12:38 UTC (permalink / raw)
To: Andrew Hendry; +Cc: linux-kernel, linux-x25, netdev
In-Reply-To: <201101271317.01502.arnd@arndb.de>
This replaces all instances of lock_kernel in x25
with lock_sock, taking care to release the socket
lock around sleeping functions (sock_alloc_send_skb
and skb_recv_datagram). It is not clear whether
this is a correct solution, but it seems to be what
other protocols do in the same situation.
Compile-tested only.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Cc: linux-x25@vger.kernel.org
Cc: netdev@vger.kernel.org
---
v2: fix possible NULL-pointer dereference in x25_sendmsg
net/x25/Kconfig | 1 -
net/x25/af_x25.c | 58 ++++++++++++++++------------------------------------
net/x25/x25_out.c | 7 ++++-
3 files changed, 23 insertions(+), 43 deletions(-)
diff --git a/net/x25/Kconfig b/net/x25/Kconfig
index 2196e55..e6759c9 100644
--- a/net/x25/Kconfig
+++ b/net/x25/Kconfig
@@ -5,7 +5,6 @@
config X25
tristate "CCITT X.25 Packet Layer (EXPERIMENTAL)"
depends on EXPERIMENTAL
- depends on BKL # should be fixable
---help---
X.25 is a set of standardized network protocols, similar in scope to
frame relay; the one physical line from your box to the X.25 network
diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index ad96ee9..4680b1e 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -40,7 +40,6 @@
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/sched.h>
-#include <linux/smp_lock.h>
#include <linux/timer.h>
#include <linux/string.h>
#include <linux/net.h>
@@ -432,15 +431,6 @@ void x25_destroy_socket_from_timer(struct sock *sk)
sock_put(sk);
}
-static void x25_destroy_socket(struct sock *sk)
-{
- sock_hold(sk);
- lock_sock(sk);
- __x25_destroy_socket(sk);
- release_sock(sk);
- sock_put(sk);
-}
-
/*
* Handling for system calls applied via the various interfaces to a
* X.25 socket object.
@@ -647,18 +637,19 @@ static int x25_release(struct socket *sock)
struct sock *sk = sock->sk;
struct x25_sock *x25;
- lock_kernel();
if (!sk)
- goto out;
+ return 0;
x25 = x25_sk(sk);
+ sock_hold(sk);
+ lock_sock(sk);
switch (x25->state) {
case X25_STATE_0:
case X25_STATE_2:
x25_disconnect(sk, 0, 0, 0);
- x25_destroy_socket(sk);
+ __x25_destroy_socket(sk);
goto out;
case X25_STATE_1:
@@ -678,7 +669,8 @@ static int x25_release(struct socket *sock)
sock_orphan(sk);
out:
- unlock_kernel();
+ release_sock(sk);
+ sock_put(sk);
return 0;
}
@@ -1085,7 +1077,7 @@ static int x25_sendmsg(struct kiocb *iocb, struct socket *sock,
size_t size;
int qbit = 0, rc = -EINVAL;
- lock_kernel();
+ lock_sock(sk);
if (msg->msg_flags & ~(MSG_DONTWAIT|MSG_OOB|MSG_EOR|MSG_CMSG_COMPAT))
goto out;
@@ -1148,7 +1140,9 @@ static int x25_sendmsg(struct kiocb *iocb, struct socket *sock,
size = len + X25_MAX_L2_LEN + X25_EXT_MIN_LEN;
+ release_sock(sk);
skb = sock_alloc_send_skb(sk, size, noblock, &rc);
+ lock_sock(sk);
if (!skb)
goto out;
X25_SKB_CB(skb)->flags = msg->msg_flags;
@@ -1231,26 +1225,10 @@ static int x25_sendmsg(struct kiocb *iocb, struct socket *sock,
len++;
}
- /*
- * lock_sock() is currently only used to serialize this x25_kick()
- * against input-driven x25_kick() calls. It currently only blocks
- * incoming packets for this socket and does not protect against
- * any other socket state changes and is not called from anywhere
- * else. As x25_kick() cannot block and as long as all socket
- * operations are BKL-wrapped, we don't need take to care about
- * purging the backlog queue in x25_release().
- *
- * Using lock_sock() to protect all socket operations entirely
- * (and making the whole x25 stack SMP aware) unfortunately would
- * require major changes to {send,recv}msg and skb allocation methods.
- * -> 2.5 ;)
- */
- lock_sock(sk);
x25_kick(sk);
- release_sock(sk);
rc = len;
out:
- unlock_kernel();
+ release_sock(sk);
return rc;
out_kfree_skb:
kfree_skb(skb);
@@ -1271,7 +1249,7 @@ static int x25_recvmsg(struct kiocb *iocb, struct socket *sock,
unsigned char *asmptr;
int rc = -ENOTCONN;
- lock_kernel();
+ lock_sock(sk);
/*
* This works for seqpacket too. The receiver has ordered the queue for
* us! We do one quick check first though
@@ -1300,8 +1278,10 @@ static int x25_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_flags |= MSG_OOB;
} else {
/* Now we can treat all alike */
+ release_sock(sk);
skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
flags & MSG_DONTWAIT, &rc);
+ lock_sock(sk);
if (!skb)
goto out;
@@ -1338,14 +1318,12 @@ static int x25_recvmsg(struct kiocb *iocb, struct socket *sock,
msg->msg_namelen = sizeof(struct sockaddr_x25);
- lock_sock(sk);
x25_check_rbuf(sk);
- release_sock(sk);
rc = copied;
out_free_dgram:
skb_free_datagram(sk, skb);
out:
- unlock_kernel();
+ release_sock(sk);
return rc;
}
@@ -1581,18 +1559,18 @@ out_cud_release:
case SIOCX25CALLACCPTAPPRV: {
rc = -EINVAL;
- lock_kernel();
+ lock_sock(sk);
if (sk->sk_state != TCP_CLOSE)
break;
clear_bit(X25_ACCPT_APPRV_FLAG, &x25->flags);
- unlock_kernel();
+ release_sock(sk);
rc = 0;
break;
}
case SIOCX25SENDCALLACCPT: {
rc = -EINVAL;
- lock_kernel();
+ lock_sock(sk);
if (sk->sk_state != TCP_ESTABLISHED)
break;
/* must call accptapprv above */
@@ -1600,7 +1578,7 @@ out_cud_release:
break;
x25_write_internal(sk, X25_CALL_ACCEPTED);
x25->state = X25_STATE_3;
- unlock_kernel();
+ release_sock(sk);
rc = 0;
break;
}
diff --git a/net/x25/x25_out.c b/net/x25/x25_out.c
index d00649f..f1a6ff1 100644
--- a/net/x25/x25_out.c
+++ b/net/x25/x25_out.c
@@ -68,8 +68,11 @@ int x25_output(struct sock *sk, struct sk_buff *skb)
frontlen = skb_headroom(skb);
while (skb->len > 0) {
- if ((skbn = sock_alloc_send_skb(sk, frontlen + max_len,
- noblock, &err)) == NULL){
+ release_sock(sk);
+ skbn = sock_alloc_send_skb(sk, frontlen + max_len,
+ 1, &err);
+ lock_sock(sk);
+ if (!skbn) {
if (err == -EWOULDBLOCK && noblock){
kfree_skb(skb);
return sent;
^ permalink raw reply related
* Re: [PATCH 12/20] x25: remove the BKL
From: Arnd Bergmann @ 2011-01-27 12:17 UTC (permalink / raw)
To: Andrew Hendry; +Cc: linux-kernel, linux-x25, netdev
In-Reply-To: <AANLkTim=d4Tok-ri1BozZUfBFS=rofaodN8ZBA5AjKKd@mail.gmail.com>
On Thursday 27 January 2011, Andrew Hendry wrote:
> Left it running and put about 3.0G through x.25, it was running fine
> until after about 20 hours.
> I was stopping the test programs and hit this.
>
> Jan 27 20:18:34 jaunty kernel: [80403.945790] PGD 1d8b00067 PUD 1ddec3067 PMD 0
Is there no log line above this saying what problem was hit? There
is normally one saying things like "BUG: unable to handle ..."
Well, never mind. It seems I could figure it out anyway:
> Jan 27 20:18:34 jaunty kernel: [80403.946083] RAX: 0000000000000080 RBX: ffff880228dbfd70 RCX: ffff880228dbfce4
> Jan 27 20:18:34 jaunty kernel: [80403.946096] RDX: 00000000fffffe00 RSI: 0000000000000000 RDI: ffff8801ba89f050
> Jan 27 20:18:34 jaunty kernel: [80403.946109] RBP: ffff880228dbfd18 R08: ffff88022aa91000 R09: 0000000000000000
> Jan 27 20:18:34 jaunty kernel: [80403.946482] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801ba89f000
> Jan 27 20:18:34 jaunty kernel: [80403.946495] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> ...
>
> If I have done it right, x25_sendmsg+0x1a7/0x530 is the skb_reserve
> which gets inlined here.
> (af_x25.c)
> /* Build a packet */
> SOCK_DEBUG(sk, "x25_sendmsg: sendto: building packet.\n");
>
> if ((msg->msg_flags & MSG_OOB) && len > 32)
> len = 32;
>
> size = len + X25_MAX_L2_LEN + X25_EXT_MIN_LEN;
>
> release_sock(sk);
> skb = sock_alloc_send_skb(sk, size, noblock, &rc);
> lock_sock(sk);
>
> X25_SKB_CB(skb)->flags = msg->msg_flags;
ok.
> objdump -dS shows it at 2197 here.
>
> static inline void skb_reserve(struct sk_buff *skb, int len)
> {
> skb->data += len;
> skb->tail += len;
> 2197: 41 83 87 b4 00 00 00 addl $0x16,0xb4(%r15) <---
> 219e: 16
> 219f: 41 89 47 28 mov %eax,0x28(%r15)
> 21a3: 49 8b 87 c8 00 00 00 mov 0xc8(%r15),%rax
> 21aa: 48 83 c0 16 add $0x16,%rax
> skb_reserve(skb, X25_MAX_L2_LEN + X25_EXT_MIN_LEN);
>
> But I'm not sure where to go from there...
It's pretty clear that %r15 is the skb in this, and from the registers in the dump,
you can see that it's NULL. skb has just been returned from sock_alloc_send_skb,
which means that this function failed.
And indeed:
> > @@ -1148,9 +1140,10 @@ static int x25_sendmsg(struct kiocb *iocb, struct socket *sock,
> >
> > size = len + X25_MAX_L2_LEN + X25_EXT_MIN_LEN;
> >
> > + release_sock(sk);
> > skb = sock_alloc_send_skb(sk, size, noblock, &rc);
> > - if (!skb)
> > - goto out;
> > + lock_sock(sk);
> > +
> > X25_SKB_CB(skb)->flags = msg->msg_flags;
I accidentally removed the error handling in my patch. No idea how that
happened, it certainly wasn't intentional. Thanks a lot for the thorough
testing and the detailed bug report!
I'll follow up with a fixed patch that puts the error path back in.
Arnd
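The bug analysed in this message comes down to a missing NULL check after an allocation made with the socket lock dropped. A user-space sketch of the intended shape (drop the lock, allocate, retake the lock, bail out through the common exit label on failure) could look like this. All demo_* names are hypothetical, and the allocation failure is simulated with a flag rather than real memory pressure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical socket model: lock depth plus a failure-injection flag. */
struct demo_sock {
	int lock_depth;
	int fail_alloc;
};

static void demo_lock_sock(struct demo_sock *sk)    { sk->lock_depth++; }
static void demo_release_sock(struct demo_sock *sk) { sk->lock_depth--; }

/* Stand-in for an allocation that may sleep, so the lock must be dropped. */
static void *demo_alloc_send_buf(struct demo_sock *sk, size_t size, int *err)
{
	void *buf;

	assert(sk->lock_depth == 0);
	if (sk->fail_alloc) {
		*err = -ENOBUFS;
		return NULL;
	}
	buf = malloc(size);
	if (!buf)
		*err = -ENOMEM;
	return buf;
}

static int demo_sendmsg(struct demo_sock *sk, size_t len)
{
	int rc = -EINVAL;
	void *buf;

	demo_lock_sock(sk);

	demo_release_sock(sk);
	buf = demo_alloc_send_buf(sk, len, &rc);
	demo_lock_sock(sk);
	if (!buf)
		goto out;	/* the check that went missing */

	memset(buf, 0, len);	/* stands in for building the packet */
	free(buf);
	rc = (int)len;
out:
	demo_release_sock(sk);
	return rc;
}
```

Because the failure path jumps to the same label that releases the lock, the lock stays balanced whether or not the allocation succeeds.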
^ permalink raw reply
* Re: [PATCH] r8169: use RxFIFO overflow workaround for 8168c chipset
From: Ivan Vecera @ 2011-01-27 11:27 UTC (permalink / raw)
To: netdev; +Cc: romieu
In-Reply-To: <1296127451-12640-1-git-send-email-ivecera@redhat.com>
On Thu, 2011-01-27 at 12:24 +0100, Ivan Vecera wrote:
> I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts
> generating RxFIFO overflow errors. The result is an infinite loop in
> interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11.
> With the workaround everything goes fine.
...of course starts generating RxFIFO overflow errors under very high
load. :-)
Ivan
^ permalink raw reply
* Re: [PATCH net-next-2.6] net_sched: sch_mqprio: dont leak kernel memory
From: Pádraig Brady @ 2011-01-27 11:17 UTC (permalink / raw)
To: Joe Perches; +Cc: Eric Dumazet, David Miller, netdev, John Fastabend
In-Reply-To: <1296064578.6115.39.camel@Joe-Laptop>
On 26/01/11 17:56, Joe Perches wrote:
> On Wed, 2011-01-26 at 18:49 +0100, Eric Dumazet wrote:
>> On Wednesday, 26 January 2011 at 09:43 -0800, Joe Perches wrote:
>>> I think the best style is to use memset so that any
>>> possible struct padding is guaranteed to be zeroed.
>> We use the { 0 } style in net/sched,
>
> That's nice, but it's the wrong style.
> https://lkml.org/lkml/2010/12/15/63
>
>> and there is no padding in this
>> structure, I checked this point.
>
> That may be true right now for this particular
> structure, but that style is not future-proof.
I think {0,} is a valid init style, because the handling
of the initialization is special and overrides the point
mentioned in the link above.
cheers,
Pádraig.
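The disagreement in this thread is about whether a brace initializer also clears padding bytes before a structure is copied out. A minimal sketch of the memset style argued for above, using a made-up struct demo_opt (not the real mqprio option layout) whose members typically leave padding between them:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option block; most ABIs pad between num_tc and hw. */
struct demo_opt {
	unsigned char num_tc;
	unsigned int hw;
};

/* Zero every byte first, then fill in the named members. */
static void demo_opt_init(struct demo_opt *opt, unsigned char num_tc,
			  unsigned int hw)
{
	assert(opt != NULL);
	memset(opt, 0, sizeof(*opt));
	opt->num_tc = num_tc;
	opt->hw = hw;
}

/* Return 1 if every byte outside the two named members is zero. */
static int demo_opt_padding_is_zero(const struct demo_opt *opt)
{
	const unsigned char *bytes = (const unsigned char *)opt;
	size_t i;

	for (i = offsetof(struct demo_opt, num_tc) + sizeof(opt->num_tc);
	     i < offsetof(struct demo_opt, hw); i++)
		if (bytes[i] != 0)
			return 0;

	for (i = offsetof(struct demo_opt, hw) + sizeof(opt->hw);
	     i < sizeof(*opt); i++)
		if (bytes[i] != 0)
			return 0;

	return 1;
}
```

The point of the memset form is that it does not depend on how a given compiler treats padding in a brace initializer. ISO C leaves padding values formally unspecified after member stores, so this is an illustration of the intent on common compilers rather than a language guarantee.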
^ permalink raw reply
* [PATCH] r8169: use RxFIFO overflow workaround for 8168c chipset
From: Ivan Vecera @ 2011-01-27 11:24 UTC (permalink / raw)
To: netdev; +Cc: romieu
I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts
generating RxFIFO overflow errors. The result is an infinite loop in
interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11.
With the workaround everything goes fine.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
drivers/net/r8169.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index bde7d61..9ab3b43 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3757,7 +3757,8 @@ static void rtl_hw_start_8168(struct net_device *dev)
RTL_W16(IntrMitigate, 0x5151);
/* Work around for RxFIFO overflow. */
- if (tp->mac_version == RTL_GIGA_MAC_VER_11) {
+ if (tp->mac_version == RTL_GIGA_MAC_VER_11 ||
+ tp->mac_version == RTL_GIGA_MAC_VER_22) {
tp->intr_event |= RxFIFOOver | PCSTimeout;
tp->intr_event &= ~RxOverflow;
}
@@ -4641,7 +4642,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
/* Work around for rx fifo overflow */
if (unlikely(status & RxFIFOOver) &&
- (tp->mac_version == RTL_GIGA_MAC_VER_11)) {
+ (tp->mac_version == RTL_GIGA_MAC_VER_11 ||
+ tp->mac_version == RTL_GIGA_MAC_VER_22)) {
netif_stop_queue(dev);
rtl8169_tx_timeout(dev);
break;
--
1.7.3.4
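The patch above repeats the same two-version test in both rtl_hw_start_8168() and the interrupt handler. One possible way to keep such checks in sync, shown here only as a sketch and not as something the patch or the driver does, is a single predicate. The enum and its numeric values are placeholders invented for this example and do not reflect the driver's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder version identifiers; values are illustrative only. */
enum demo_mac_version {
	DEMO_MAC_VER_11 = 11,
	DEMO_MAC_VER_12 = 12,
	DEMO_MAC_VER_22 = 22,
};

/* True for the chip revisions that get the RxFIFO overflow workaround. */
static bool demo_needs_rxfifo_workaround(enum demo_mac_version ver)
{
	switch (ver) {
	case DEMO_MAC_VER_11:
	case DEMO_MAC_VER_22:
		return true;
	default:
		return false;
	}
}

/* Small usage check covering one affected and one unaffected revision. */
static void demo_self_check(void)
{
	assert(demo_needs_rxfifo_workaround(DEMO_MAC_VER_22));
	assert(!demo_needs_rxfifo_workaround(DEMO_MAC_VER_12));
}
```

With a helper like this, adding another affected revision would touch one switch statement instead of two separate conditions.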
^ permalink raw reply related