Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v2 0/3] net: minor gso encapsulation fixes
From: David Miller @ 2014-10-20 18:04 UTC (permalink / raw)
  To: fw; +Cc: netdev, edumazet, therbert
In-Reply-To: <1413805758-30026-1-git-send-email-fw@strlen.de>

From: Florian Westphal <fw@strlen.de>
Date: Mon, 20 Oct 2014 13:49:15 +0200

> The following series fixes a minor bug in the gso segmentation handlers
> when encapsulation offload is used.
> 
> Theoretically this could cause kernel panic when the stack tries
> to software-segment such a GRE offload packet, but it looks like there
> is only one affected call site (tbf scheduler) and it handles NULL
> return value.
> 
> I've included a followup patch to add IS_ERR_OR_NULL checks where needed.
> 
> While looking into this, I also found that size computation of the individual
> segments is incorrect if skb->encapsulation is set.
> 
> Please see individual patches for delta vs. v1.

Series applied, thanks Florian.

Longer term I'd really like to see the ops->gso_segment() implementations not
return NULL, but rather locally determinned pointer error codes instead.

^ permalink raw reply

* Re: Adding new packet scheduler
From: Cong Wang @ 2014-10-20 17:59 UTC (permalink / raw)
  To: Josh Clark; +Cc: netdev
In-Reply-To: <CAHmvzZS3bN1qErryz7Mps-dOveBk1_d7MJo-sqbVDMaoeu7_HA@mail.gmail.com>

On Mon, Oct 20, 2014 at 10:51 AM, Josh Clark <jcinma@gmail.com> wrote:
>
> That all makes a lot of sense. What do I need to do to apply these
> patches on a live system? I have access to a network made of virtual
> machines with standard Ubuntu 14.04, to which I have SSH access.
> There's no way for me to upload a different image to use.

https://www.kernel.org/doc/Documentation/networking/netdev-FAQ.txt

Q: How do the changes posted to netdev make their way into Linux?

A: There are always two trees (git repositories) in play.  Both are driven
   by David Miller, the main network maintainer.  There is the "net" tree,
   and the "net-next" tree.  As you can probably guess from the names, the
   net tree is for fixes to existing code already in the mainline tree from
   Linus, and net-next is where the new code goes for the future release.
   You can find the trees here:

http://git.kernel.org/?p=linux/kernel/git/davem/net.git
http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git

^ permalink raw reply

* Re: Adding new packet scheduler
From: Josh Clark @ 2014-10-20 17:51 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev
In-Reply-To: <CAHA+R7PScLKyhF7V+RWahL1Ovfjb7eCO+Hh-aDYchPrfvAUDHg@mail.gmail.com>

On Mon, Oct 20, 2014 at 1:37 PM, Cong Wang <cwang@twopensource.com> wrote:
> On Mon, Oct 20, 2014 at 10:32 AM, Josh Clark <jcinma@gmail.com> wrote:
>> Hi everyone,
>>
>> I'm a student at NC State University, and I'm working on a project to
>> implement some new classful AQM algorithms and test their
>> effectiveness. However, I'm getting hung up on how to get the new
>> algorithm set up in the kernel.
>>
>> From what I've looked at, it looks like I need to add my code to
>> /net/sched/, and edit both the Kconfig and the Makefile to be able to
>> add my code as another module.
>
> Basically yes, take a look at:
> http://lwn.net/Articles/577208/
>
>>
>> Finally, in order to use the new scheduler, I need to select it using
>> the tc command. What do I need to do to add my algorithm to the tc
>> command options?
>
> Read the existing qdisc's in tc, for example:
> http://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/tree/tc/q_fq.c

That all makes a lot of sense. What do I need to do to apply these
patches on a live system? I have access to a network made of virtual
machines with standard Ubuntu 14.04, to which I have SSH access.
There's no way for me to upload a different image to use.

^ permalink raw reply

* Re: Adding new packet scheduler
From: Dave Taht @ 2014-10-20 17:46 UTC (permalink / raw)
  To: Josh Clark; +Cc: netdev@vger.kernel.org
In-Reply-To: <CAHmvzZQALDuwMPZC_ZncEP=BerniYis9Mx=oObUGNAY4wUGTZg@mail.gmail.com>

On Mon, Oct 20, 2014 at 10:32 AM, Josh Clark <jcinma@gmail.com> wrote:
> Hi everyone,
>
> I'm a student at NC State University, and I'm working on a project to
> implement some new classful AQM algorithms and test their
> effectiveness. However, I'm getting hung up on how to get the new
> algorithm set up in the kernel.
>
> From what I've looked at, it looks like I need to add my code to
> /net/sched/, and edit both the Kconfig and the Makefile to be able to
> add my code as another module.
>
> Finally, in order to use the new scheduler, I need to select it using
> the tc command. What do I need to do to add my algorithm to the tc
> command options?
>
> Any insight, articles, READMEs, or criticism you have for me is welcome.
>
> Thank you for all your help!

To give you an idea here are some currently out of tree, out of date,
and highly experimental codel patches of mine
that add in some new qdiscs, and patch the right places, with the
exception of patching include/uapi/linux/pkt_sched.h (which exports
the stuff to userspace)

https://github.com/dtaht/cerowrt-3.10/blob/master/target/linux/generic/patches-3.10/680-codel-add-experimental-codel-and-fq_codel-versions.patch

to add stuff to tc, you patch the iproute2 utility to match what is
exported from pkt_sched.h.




>
>
>
> -Josh Clark
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Dave Täht

thttp://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks

^ permalink raw reply

* Re: Adding new packet scheduler
From: Cong Wang @ 2014-10-20 17:37 UTC (permalink / raw)
  To: Josh Clark; +Cc: netdev
In-Reply-To: <CAHmvzZQALDuwMPZC_ZncEP=BerniYis9Mx=oObUGNAY4wUGTZg@mail.gmail.com>

On Mon, Oct 20, 2014 at 10:32 AM, Josh Clark <jcinma@gmail.com> wrote:
> Hi everyone,
>
> I'm a student at NC State University, and I'm working on a project to
> implement some new classful AQM algorithms and test their
> effectiveness. However, I'm getting hung up on how to get the new
> algorithm set up in the kernel.
>
> From what I've looked at, it looks like I need to add my code to
> /net/sched/, and edit both the Kconfig and the Makefile to be able to
> add my code as another module.

Basically yes, take a look at:
http://lwn.net/Articles/577208/

>
> Finally, in order to use the new scheduler, I need to select it using
> the tc command. What do I need to do to add my algorithm to the tc
> command options?

Read the existing qdisc's in tc, for example:
http://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/tree/tc/q_fq.c

^ permalink raw reply

* Adding new packet scheduler
From: Josh Clark @ 2014-10-20 17:32 UTC (permalink / raw)
  To: netdev

Hi everyone,

I'm a student at NC State University, and I'm working on a project to
implement some new classful AQM algorithms and test their
effectiveness. However, I'm getting hung up on how to get the new
algorithm set up in the kernel.

>From what I've looked at, it looks like I need to add my code to
/net/sched/, and edit both the Kconfig and the Makefile to be able to
add my code as another module.

Finally, in order to use the new scheduler, I need to select it using
the tc command. What do I need to do to add my algorithm to the tc
command options?

Any insight, articles, READMEs, or criticism you have for me is welcome.

Thank you for all your help!

-Josh Clark

^ permalink raw reply

* Re: Routing BUG with ppp over l2tp
From: James Carlson @ 2014-10-20 17:19 UTC (permalink / raw)
  To: Alan Stern, James Chapman, Michal Ostrowski; +Cc: linux-ppp, netdev
In-Reply-To: <Pine.LNX.4.44L0.1410201152580.2403-100000@iolanthe.rowland.org>

On 10/20/14 12:39, Alan Stern wrote:
> As far as I can tell, the problem is caused by bad routing.  The kernel
> gets confused because the IP address assigned by the VPN server to the
> server's end of the ppp tunnel is the _same_ as the server's actual IP
> address.

Indeed!  That's pretty darned lame behavior by that peer.  It would
probably be workable if you had a virtual router instance and were able
to put the L2TP connection in one routing instance and the PPP
connection in another routing instance, but that's likely not at all
simple to achieve.

> Unfortunately, I can't work around this problem by reconfiguring the
> VPN server -- there's no way to tell it to use a different IP address
> for its end of the VPN tunnel.  Furthermore, the server works just fine
> with clients running Windows or OS-X.

Really?  That seems ... improbable.

> So it looks like the problem has to be fixed either in the kernel or in 
> the way pppd sets up its routing entry.  Can you guys help?

I think the easiest solution is to configure pppd to lie to the kernel
about the remote address.  Who cares what the remote address is on a
point-to-point link anyway?

There's currently no option to do this, but the code change in ipcp_up()
in pppd/ipcp.c would be rather simple.  Just make the "noremoteip" code
run all the time:

/* Deliberately falsify the remote address.  We don't care. */
ho->hisaddr = htonl(0x0aa00002);

As long as you don't need to contact that specific remote server using
the badly-assigned "internal" VPN address and can live with the fact
that you'll either go through the regular Internet to that address or be
forced to use some other address configured on that server, you should
be good.

(The address I used above is 10.160.0.2.  That was one of the internal
DNS server addresses provided in the log you posted.  It's not necessary
that the address used here is exactly that, but it may well be helpful.)

If you can't do that for some reason, then I suppose it would be
possible to use IP Chains (or whatever the packet-modification tool du
jure is used in your Linux distribution) to nail up an exception so that
the outside packets go to the outside interface and the inside ones go
to the PPP interface.  Doing that likely requires selecting on (at
least!) source address, so it's messy and ugly and possibly error-prone,
but it might be doable.

Otherwise, contact the maintainer of that VPN server.  It's just plain
old broken, and life's too short for broken software.

-- 
James Carlson         42.703N 71.076W         <carlsonj@workingcode.com>

^ permalink raw reply

* Routing BUG with ppp over l2tp
From: Alan Stern @ 2014-10-20 16:39 UTC (permalink / raw)
  To: James Chapman, Michal Ostrowski; +Cc: linux-ppp, netdev

James and Michal:

I'm having problem setting up a VPN connection that uses ppp over l2tp
over ipsec (this shows up under both 3.16 and 3.17-rc7).  The ipsec
part is working fine, and xl2tpd sets up its connection okay.  The
problem arises when ppp starts up.

As far as I can tell, the problem is caused by bad routing.  The kernel
gets confused because the IP address assigned by the VPN server to the
server's end of the ppp tunnel is the _same_ as the server's actual IP
address.

Here are some details.  My local address is 192.168.0.203 (behind a 
NAT-ing wireless router).  The VPN server is 140.247.233.37, as you can 
see from this entry in the system log:

Oct 13 17:10:27 saphir NetworkManager: xl2tpd[2616]: Connecting to host 140.247.233.37, port 1701

The addresses of the ppp tunnel endpoints are given later in the log:

Oct 13 17:10:30 saphir pppd[2618]: local  IP address 10.170.30.1
Oct 13 17:10:30 saphir pppd[2618]: remote IP address 140.247.233.37

The overall status from NetworkManager shows up in the log like this:

Oct 13 17:10:30 saphir NetworkManager: ** Message: L2TP service (IP Config Get) reply received.
Oct 13 17:10:30 saphir NetworkManager[439]: <info> VPN connection 'Rowland' (IP4 Config Get) reply received from old-style plugin.
Oct 13 17:10:30 saphir NetworkManager[439]: <info> VPN Gateway: 140.247.233.37
Oct 13 17:10:30 saphir NetworkManager[439]: <info> Tunnel Device: ppp0
Oct 13 17:10:30 saphir NetworkManager[439]: <info> IPv4 configuration:
Oct 13 17:10:30 saphir NetworkManager[439]: <info>   Internal Address: 10.170.30.1
Oct 13 17:10:30 saphir NetworkManager[439]: <info>   Internal Prefix: 32
Oct 13 17:10:30 saphir NetworkManager[439]: <info>   Internal Point-to-Point Address: 140.247.233.37
Oct 13 17:10:30 saphir NetworkManager[439]: <info>   Maximum Segment Size (MSS): 0
Oct 13 17:10:30 saphir NetworkManager[439]: <info>   Forbid Default Route: no
Oct 13 17:10:30 saphir NetworkManager[439]: <info>   Internal DNS: 10.160.0.2
Oct 13 17:10:30 saphir NetworkManager[439]: <info>   Internal DNS: 10.160.0.3
Oct 13 17:10:30 saphir NetworkManager[439]: <info>   DNS Domain: '(none)'
Oct 13 17:10:30 saphir NetworkManager[439]: <info> No IPv6 configuration
Oct 13 17:10:30 saphir NetworkManager[439]: <info> VPN connection 'Rowland' (IP Config Get) complete.

Once the ppp tunnel was set up, xl2tpd started getting errors:

Oct 13 17:11:31 saphir NetworkManager: xl2tpd[2616]: network_thread: select timeout
Oct 13 17:11:32 saphir NetworkManager: xl2tpd[2616]: network_thread: select timeout
Oct 13 17:11:32 saphir NetworkManager: xl2tpd[2616]: Maximum retries exceeded for tunnel 33716.  Closing.
Oct 13 17:11:32 saphir NetworkManager: xl2tpd[2616]: Connection 147 closed to 140.247.233.37, port 1701 (Timeout)

Packet-level debugging showed that once I reached this stage, the
control messages sent by xl2tpd were not received by the server.  I 
believe this is because they were not routed correctly.

Unfortunately, at the moment I don't have a copy of the routing table.  
Nevertheless, it definitely appears that the packets xl2tpd wanted to
send directly to the VPN server were instead routed back through the
ppp tunnel!  Presumably this was because the routing table contained
two entries with their destinations both set to 140.247.233.37/32 (one
for the l2tp connection and one for the ppp tunnel), and the kernel
used the wrong entry.

In fact, on several occasions during testing, the system deadlocked.  I 
was able to get a stack dump:

[ 2214.970639] BUG: soft lockup - CPU#1 stuck for 22s! [pppd:9423]
[ 2214.970648] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core pppoe pppox ppp_generic slhc authenc cmac rmd160 crypto_null ip_vti ip_tunnel af_key ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm6_tunnel tunnel6 xfrm_ipcomp salsa20_i586 camellia_generic cast6_generic cast5_generic cast_common deflate cts gcm ccm serpent_sse2_i586 serpent_generic glue_helper blowfish_generic blowfish_common twofish_generic twofish_i586 twofish_common xcbc sha512_generic des_generic geode_aes tpm_rng tpm timeriomem_rng virtio_rng uas usb_storage fuse ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 ip6table_filter xt_conntrack ip6_tables nf_con
 ntrack vfat
[ 2214.970769]  fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel arc4 iwldvm snd_hda_controller snd_hda_codec uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev mac80211 snd_hwdep coretemp kvm_intel kvm media snd_seq snd_seq_device iTCO_wdt iTCO_vendor_support snd_pcm snd_timer snd joydev iwlwifi microcode serio_raw cfg80211 asus_laptop lpc_ich atl1c soundcore sparse_keymap rfkill input_polldev acpi_cpufreq binfmt_misc i915 i2c_algo_bit drm_kms_helper drm i2c_core video
[ 2214.970854] CPU: 1 PID: 9423 Comm: pppd Tainted: G        W     3.16.3-200.fc20.i686 #1
[ 2214.970860] Hardware name: ASUSTeK Computer Inc.         UL20A               /UL20A     , BIOS 207     11/02/2009
[ 2214.970866] task: f0706a00 ti: e359c000 task.ti: e359c000
[ 2214.970873] EIP: 0060:[<c0a077b8>] EFLAGS: 00200287 CPU: 1
[ 2214.970885] EIP is at _raw_spin_lock_bh+0x28/0x40
[ 2214.970890] EAX: e5ff02a4 EBX: e5ff02a4 ECX: 00000060 EDX: 0000005f
[ 2214.970895] ESI: e5ff02b0 EDI: e3470d40 EBP: e359dc34 ESP: e359dc34
[ 2214.970900]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 2214.970906] CR0: 8005003b CR2: b72eb000 CR3: 25f28000 CR4: 000407d0
[ 2214.970910] Stack:
[ 2214.970914]  e359dc9c f94efe2a f5140000 e359dc50 c045aba6 f5140000 00200286 e359dc78
[ 2214.970929]  c045c553 00000001 f5140000 001f9076 00200286 628a17c6 f075acc0 f075acc0
[ 2214.970942]  00000000 00200246 f4464848 00200246 00200246 e359dc9c e3470d84 f075acc0
[ 2214.970957] Call Trace:
[ 2214.970973]  [<f94efe2a>] ppp_push+0x32a/0x550 [ppp_generic]
[ 2214.970986]  [<c045aba6>] ? internal_add_timer+0x26/0x60
[ 2214.970994]  [<c045c553>] ? mod_timer_pending+0x63/0x130
[ 2214.971005]  [<f94f288d>] ppp_xmit_process+0x3cd/0x5e0 [ppp_generic]
[ 2214.971007]  [<c0914ae1>] ? harmonize_features+0x31/0x1d0
[ 2214.971007]  [<f94f2c78>] ppp_start_xmit+0x108/0x180 [ppp_generic]
[ 2214.971007]  [<c0915024>] dev_hard_start_xmit+0x2c4/0x540
[ 2214.971007]  [<c093244f>] sch_direct_xmit+0x9f/0x170
[ 2214.971007]  [<c091546a>] __dev_queue_xmit+0x1ca/0x430
[ 2214.971007]  [<c094c9b0>] ? ip_fragment+0x930/0x930
[ 2214.971007]  [<c09156df>] dev_queue_xmit+0xf/0x20
[ 2214.971007]  [<c091bacf>] neigh_direct_output+0xf/0x20
[ 2214.971007]  [<c094cb5a>] ip_finish_output+0x1aa/0x850
[ 2214.971007]  [<c094c9b0>] ? ip_fragment+0x930/0x930
[ 2214.971007]  [<c094dbbf>] ip_output+0x8f/0xe0
[ 2214.971007]  [<c094c9b0>] ? ip_fragment+0x930/0x930
[ 2214.971007]  [<c09a4f52>] xfrm_output_resume+0x342/0x3a0
[ 2214.971007]  [<c09a5013>] xfrm_output+0x43/0xf0
[ 2214.971007]  [<c0998f4d>] xfrm4_output_finish+0x3d/0x40
[ 2214.971007]  [<c0998e25>] __xfrm4_output+0x25/0x40
[ 2214.971007]  [<c0998f7f>] xfrm4_output+0x2f/0x70
[ 2214.971007]  [<c0998e00>] ? xfrm4_udp_encap_rcv+0x1b0/0x1b0
[ 2214.971007]  [<c094d2e7>] ip_local_out_sk+0x27/0x30
[ 2214.971007]  [<c094d5f4>] ip_queue_xmit+0x124/0x3f0
[ 2214.971007]  [<c0999f04>] ? xfrm_bundle_ok+0x64/0x170
[ 2214.971007]  [<c099a0ab>] ? xfrm_dst_check+0x1b/0x30
[ 2214.971007]  [<f94fd618>] l2tp_xmit_skb+0x298/0x4b0 [l2tp_core]
[ 2214.971007]  [<f950cd04>] pppol2tp_xmit+0x124/0x1d0 [l2tp_ppp]
[ 2214.971007]  [<f94f2adb>] ppp_channel_push+0x3b/0xb0 [ppp_generic]
[ 2214.971007]  [<f94f2d77>] ppp_write+0x87/0xc8 [ppp_generic]
[ 2214.971007]  [<f94f2cf0>] ? ppp_start_xmit+0x180/0x180 [ppp_generic]
[ 2214.971007]  [<c057723d>] vfs_write+0x9d/0x1d0
[ 2214.971007]  [<c0577951>] SyS_write+0x51/0xb0
[ 2214.971007]  [<c0a07b9f>] sysenter_do_call+0x12/0x12
[ 2214.971007] Code: 00 00 00 55 89 e5 66 66 66 66 90 64 81 05 90 b6 dc c0 00 02 00 00 ba 00 01 00 00 f0 66 0f c1 10 0f b6 ce 38 d1 75 04 5d c3 f3 90 <0f> b6 10 38 ca 75 f7 5d c3 90 90 90 90 90 90 90 90 90 90 90 90

The deadlock occurs because ppp_channel_push() (near the end of the
stack listing) holds the pch->downl spinlock while calling
pch->chan->ops->start_xmit().  The dump shows this call filtering down
through the routing layer and into ppp_push() (near the top of the
listing), which tries to acquire the same spinlock.

It sure looks like a ppp data packet was put into an l2tp wrapper 
and then sent back to the ppp layer for transmission, rather than 
getting sent out through the wlan0 interface.

Unfortunately, I can't work around this problem by reconfiguring the
VPN server -- there's no way to tell it to use a different IP address
for its end of the VPN tunnel.  Furthermore, the server works just fine
with clients running Windows or OS-X.

So it looks like the problem has to be fixed either in the kernel or in 
the way pppd sets up its routing entry.  Can you guys help?

Thanks,

Alan Stern

^ permalink raw reply

* Re: qdisc running
From: Jesper Dangaard Brouer @ 2014-10-20 16:17 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: john Fastabend, Herbert Xu, netdev@vger.kernel.org, eric Dumazet,
	brouer, Mathieu Desnoyers
In-Reply-To: <54440FFA.40307@mojatatu.com>


On Sun, 19 Oct 2014 15:24:42 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> Jesper,
> 
> You asked at the meeting the point to qdisc running.

Talking about __QDISC___STATE_RUNNING see slide 9/16:
 http://people.netfilter.org/hawk/presentations/LinuxPlumbers2014/performance_tx_qdisc_bulk_LPC2014.pdf

> Original intent is to allow only one cpu to enter the lower half of the
> qdisc path. IOW, if one cpu was already in the qdisc then that guy
> could be used to dequeue packets. i.e this is good for batching.
> Original idea was Herbert's with major improvement from Eric
> and a small one from me.

I guess it is good for our recent dequeue batching. But I think/hope we
can come up with a scheme that does not requires 6 lock/unlock
operations (as illustrated on slide 9).

John and I have talked about doing a lockless qdisc, but maintaining
this __QDISC___STATE_RUNNING in a lockless scenario, would cost us
extra atomic ops...

Are we still sure, that this model of only allowing a single CPU in the
dequeue path, is still the best solution?  (The TXQ lock should already
protect several CPUs in this code path).


> For history of different tried approaches look at:
> Look at slide 2: 
> http://vger.kernel.org/netconf2011_slides/jamal_netconf2011.pdf

I can see that you really needed the budget/fairness in the dequeue
loop, that we recently mangled with.

> then download the **amazing** flash animations which describe
> that history.
> http://vger.kernel.org/netconf2011_slides/netconf-2011-flash.tgz
> 
> Follow the bullets in slide2 and map to the flash animations.

What tool do I use to play these SWF files? (I tried VLC but no luck).

> If you go over them, you'll see it is still needed.

Too bad, I would like to avoid the second 
 
> I think someone oughta put those **amazing** animations on some
> website;->

I hope someone else can pick that up ;-)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net] net/mlx4_en: mlx4_en_netpoll shouldn't call napi_schedule when port is down
From: Ido Shamay @ 2014-10-20 15:59 UTC (permalink / raw)
  To: David Miller, amirv; +Cc: idos, netdev, yevgenyp, ogerlitz
In-Reply-To: <20140929.134704.646626597842088240.davem@davemloft.net>

On 9/29/2014 8:47 PM, David Miller wrote:
> From: Amir Vadai <amirv@mellanox.com>
> Date: Mon, 29 Sep 2014 14:04:55 +0300
>
>> From: Ido Shamay <idos@mellanox.com>
>>
>> mlx4_en_netpoll, which is mlx4_en ndo_poll_controller callback,
>> might be called when port is down, causing a napi_schedule when
>> napi->poll callback in NULL. mutex_trylock is needed to acquire
>> the port_state lock, since other threads may grab it and stop
>> the port while we are in napi scheduling. Using trylock since in atomic
>> context.
>>
>> Signed-off-by: Ido Shamay <idos@mellanox.com>
>> Signed-off-by: Amir Vadai <amirv@mellanox.com>
>
> I don't think netpoll should 'variably succeed' on poll controller's
> ability to immediately take a trylock.
>
> You need to find a way to make this code always process the queues
> when it is called, the approach you are taking here is not a correct.
>
> If you've designed things in such a way that the locking here is
> difficult, that's unfortunate but you still have to make this
> function behave properly.

Sorry about that, aborting this commit since the problem was already 
fixed in commit 3484aac1, which added netif_device_detach to driver port 
stop flow and that made netconsole not to call ndo_poll_controller callback.

Thanks.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [PATCH 0/7] netfilter fixes for net
From: David Miller @ 2014-10-20 15:58 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1413792639-3954-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Mon, 20 Oct 2014 10:10:32 +0200

> The following patchset contains netfilter fixes for your net tree,
> they are:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks a lot Pablo.

^ permalink raw reply

* Re: v3.18-rc1 bloat-o-meter
From: Alexei Starovoitov @ 2014-10-20 15:57 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linux-embedded, Josh Triplett, Linux Kernel Development,
	netdev@vger.kernel.org
In-Reply-To: <alpine.DEB.2.02.1410201018230.24954@ayla.of.borg>

On Mon, Oct 20, 2014 at 10:37 AM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
>         Hi all,
>
> Below is the bloat-o-meter output when comparing an m68k/atari_defconfig
> kernel for v3.17 and v3.18-rc1.
>
> Major culprit seems to be bpf. Can this become modular or optional?
> Currently it's always included if CONFIG_NET=y.

yes, the plan is to get it its own CONFIG_BPF so that tracing wouldn't
need to depend on whole NET and so it can be stubbed out.

^ permalink raw reply

* Re: [PATCH stable v3.2 v3.4] ipv4: disable bh while doing route gc
From: David Miller @ 2014-10-20 15:39 UTC (permalink / raw)
  To: ben; +Cc: mleitner, stable, netdev, hannes
In-Reply-To: <1413807808.31953.24.camel@decadent.org.uk>

From: Ben Hutchings <ben@decadent.org.uk>
Date: Mon, 20 Oct 2014 13:23:28 +0100

> On Mon, 2014-10-20 at 00:23 -0400, David Miller wrote:
>> From: Ben Hutchings <ben@decadent.org.uk>
>> Date: Mon, 20 Oct 2014 04:09:41 +0100
>> 
>> > I've appplied this and the previous two patches mentioned ('ipv4: move
>> > route garbage collector to work queue' and 'ipv4: avoid parallel route
>> > cache gc executions').  But I didn't get the other two from you.  The
>> > last batch of networking fixes I received and applied was dated
>> > 2014-08-07, and the next one I've seen is dated 2014-10-11 and has
>> > nothing for 3.2 or 3.4.  Did I miss one between these?
>> 
>> I'm at the point where I'm personally not going to go back more than
>> four releases, anything more than that is rediculous.
>> 
>> And this time that was 3.17, 3.16, 3.14, and 3.10
> 
> OK.  I appreciate all the work you've done to backport to 3.2, but would
> also have appreciated an explicit note to say you were dropping the
> earlier versions.

My apologies, but as time goes on and more -stable releases go active
I will certainly be tail dropping to keep it within 4 releases.

^ permalink raw reply

* Re: [PATCH] net: ethernet : stmicro: fixed power suspend and resume failure in stmmac driver
From: David Miller @ 2014-10-20 15:36 UTC (permalink / raw)
  To: hliang1025; +Cc: peppe.cavallaro, netdev, linux-kernel, adi-linux-docs
In-Reply-To: <CAPig_t=Dguhw8jmAX_KvMrB9ZjL1POB+7jrdBmwpvTzjasKD1w@mail.gmail.com>

From: Hao Liang <hliang1025@gmail.com>
Date: Mon, 20 Oct 2014 16:55:08 +0800

> I can't quite seize David's meaing whether the ->mac->xxx should
> inside the lock or outside the lock.  In my opinion, the ->mac->xxx
> calls did not operate any shared data or struct. The calls just
> control the hardware by write data to registers.  So ->mac->xxx
> calls can be removed from lock.

If you make a decision to program the hardware in a certain way, you
must make sure that hardware programming operation occurs atomically.

This could be because it takes multiple register writes to perform
the operation and they are dependent upon eachother, or you are
makeing multiple modifications which must occur in a certain
sequence.

If you allow these ->mac->xxx calls to run in parallel with eachother
I guarantee it will cause problems.

Stop brushing it off as a non-issue, unless you want your driver
to mysteriously not work when the race is triggered.

^ permalink raw reply

* Re: [PATCH RFT 0/8] Marvell PXA168 libphy handling and Berlin Ethernet
From: Antoine Tenart @ 2014-10-20 15:10 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: Antoine Tenart, David S. Miller, Florian Fainelli, Eric Miao,
	Haojian Zhuang, linux-arm-kernel, netdev, devicetree,
	linux-kernel
In-Reply-To: <54451E3F.1030700@gmail.com>

Sebastian,

On Mon, Oct 20, 2014 at 04:37:51PM +0200, Sebastian Hesselbarth wrote:
> On 16.10.2014 11:53, Antoine Tenart wrote:
> >On Thu, Oct 09, 2014 at 02:38:58PM +0200, Sebastian Hesselbarth wrote:
> >>This patch series deals with a removing a IP feature that can be found
> >>on all currently supported Marvell Ethernet IP (pxa168_eth, mv643xx_eth,
> >>mvneta). The MAC IP allows to automatically perform PHY auto-negotiation
> >>without software interaction.
> >>
> >>However, this feature (a) fundamentally clashes with the way libphy works
> >>and (b) is unable to deal with quirky PHYs that require special treatment.
> >>In this series, pxa168_eth driver is rewritten to completely disable that
> >>feature and properly deal with libphy provided PHYs. The other two drivers
> >>are suspect to future patch sets, also removing the code related with it.
> >>
> >>Currently, the patches are based on next-20141009 and will be resent once
> >>v3.18-rc1 drops. This is a Request-For-Test on both BG2Q and MMP/gplug as
> >
> >I tested the series on a BG2Q, it worked well.
> 
> Thanks for testing! I assume you have added a phy-connection-type
> property to BG2Q's ethernet node?

Yes, I added the following property to the ethernet-phy node:

	phy-connection-type = "mii";

Feel free to add this to your series, or I can also send a patch.


Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply

* Re: [PATCH v4 13/25] virtio_console: enable VQs early
From: Thomas Graf @ 2014-10-20 14:44 UTC (permalink / raw)
  To: Michael S. Tsirkin, Cornelia Huck
  Cc: wlinux-kernel, Rusty Russell, virtualization, linux-scsi,
	linux-s390, v9fs-developer, netdev, kvm, Amit Shah,
	Christian Borntraeger, David S. Miller, Paolo Bonzini,
	Arnd Bergmann, Greg Kroah-Hartman
In-Reply-To: <20141020140410.GA10994@redhat.com>

On 10/20/14 at 05:04pm, Michael S. Tsirkin wrote:
> Sure enough, this helps:
> 
> Tested-by: Michael S. Tsirkin <mst@redhat.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Pls repost as a top-level patch.

Thanks for fixing this so promptly.

^ permalink raw reply

* Re: [PATCH RFT 0/8] Marvell PXA168 libphy handling and Berlin Ethernet
From: Sebastian Hesselbarth @ 2014-10-20 14:37 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: David S. Miller, Florian Fainelli, Eric Miao, Haojian Zhuang,
	linux-arm-kernel, netdev, devicetree, linux-kernel
In-Reply-To: <20141016095323.GA1061@kwain>

On 16.10.2014 11:53, Antoine Tenart wrote:
> On Thu, Oct 09, 2014 at 02:38:58PM +0200, Sebastian Hesselbarth wrote:
>> This patch series deals with a removing a IP feature that can be found
>> on all currently supported Marvell Ethernet IP (pxa168_eth, mv643xx_eth,
>> mvneta). The MAC IP allows to automatically perform PHY auto-negotiation
>> without software interaction.
>>
>> However, this feature (a) fundamentally clashes with the way libphy works
>> and (b) is unable to deal with quirky PHYs that require special treatment.
>> In this series, pxa168_eth driver is rewritten to completely disable that
>> feature and properly deal with libphy provided PHYs. The other two drivers
>> are suspect to future patch sets, also removing the code related with it.
>>
>> Currently, the patches are based on next-20141009 and will be resent once
>> v3.18-rc1 drops. This is a Request-For-Test on both BG2Q and MMP/gplug as
>
> I tested the series on a BG2Q, it worked well.

Antoine,

Thanks for testing! I assume you have added a phy-connection-type
property to BG2Q's ethernet node?

I doubt there will be any Tested-by from MMP guys anytime soon, so
I'll resend this with the minor remarks to be merged for 3.19.

Sebastian

^ permalink raw reply

* Re: [PATCH v4 13/25] virtio_console: enable VQs early
From: Michael S. Tsirkin @ 2014-10-20 14:04 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, kvm, linux-scsi, Christian Borntraeger, netdev,
	linux-kernel, virtualization, Amit Shah, Greg Kroah-Hartman,
	Arnd Bergmann, Paolo Bonzini, Thomas Graf, v9fs-developer,
	David S. Miller
In-Reply-To: <20141020133555.GA27111@redhat.com>

On Mon, Oct 20, 2014 at 04:35:55PM +0300, Michael S. Tsirkin wrote:
> On Mon, Oct 20, 2014 at 02:42:23PM +0200, Cornelia Huck wrote:
> > On Mon, 20 Oct 2014 13:07:50 +0100
> > Thomas Graf <tgraf@suug.ch> wrote:
> > 
> > > On 10/13/14 at 10:50am, Michael S. Tsirkin wrote:
> > > > virtio spec requires drivers to set DRIVER_OK before using VQs.
> > > > This is set automatically after probe returns, virtio console violated this
> > > > rule by adding inbufs, which causes the VQ to be used directly within
> > > > probe.
> > > > 
> > > > To fix, call virtio_device_ready before using VQs.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > >  drivers/char/virtio_console.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > > 
> > > > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > > > index b585b47..6ebe8f6 100644
> > > > --- a/drivers/char/virtio_console.c
> > > > +++ b/drivers/char/virtio_console.c
> > > > @@ -1449,6 +1449,8 @@ static int add_port(struct ports_device *portdev, u32 id)
> > > >  	spin_lock_init(&port->outvq_lock);
> > > >  	init_waitqueue_head(&port->waitqueue);
> > > >  
> > > > +	virtio_device_ready(portdev->vdev);
> > > > +
> > > >  	/* Fill the in_vq with buffers so the host can send us data. */
> > > >  	nr_added_bufs = fill_queue(port->in_vq, &port->inbuf_lock);
> > > >  	if (!nr_added_bufs) {
> > > 
> > > Seems like probe and add_port() now both set VIRTIO_CONFIG_S_DRIVER_OK
> > 
> > I think we need to set this in the probe function instead, otherwise we
> > fail for multiqueue (which also wants to use the control queue early).
> > 
> > Completely untested patch below; I can send this with proper s-o-b if
> > it helps.
> > 
> > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > index bfa6400..cf7a561 100644
> > --- a/drivers/char/virtio_console.c
> > +++ b/drivers/char/virtio_console.c
> > @@ -1449,8 +1449,6 @@ static int add_port(struct ports_device *portdev, u32 id)
> >  	spin_lock_init(&port->outvq_lock);
> >  	init_waitqueue_head(&port->waitqueue);
> >  
> > -	virtio_device_ready(portdev->vdev);
> > -
> >  	/* Fill the in_vq with buffers so the host can send us data. */
> >  	nr_added_bufs = fill_queue(port->in_vq, &port->inbuf_lock);
> >  	if (!nr_added_bufs) {
> > @@ -2026,6 +2024,8 @@ static int virtcons_probe(struct virtio_device *vdev)
> >  	spin_lock_init(&portdev->ports_lock);
> >  	INIT_LIST_HEAD(&portdev->ports);
> >  
> > +	virtio_device_ready(portdev->vdev);
> > +
> >  	if (multiport) {
> >  		unsigned int nr_added_bufs;
> >  
> 
> I wanted to set DRIVER_OK as late as possible, to both reduce
> the chance it can fail after DRIVER_OK and to reduce  the risk of
> introducing a regression since old qemu might only start sending
> interrupts after DRIVER_OK is set.
> 
> So I wanted everything completely initialized before DRIVER_OK.
> 
> You patch makes sense to me since nothing can fail,
> and we won't get interrupts before we add inbufs.
> 
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Testig will report shortly, pls send with sob line.

Sure enough, this helps:

Tested-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>

Pls repost as a top-level patch.

> -- 
> MST

^ permalink raw reply

* Re: Queue with wait-free enqueue, blocking dequeue, splice
From: Jesper Dangaard Brouer @ 2014-10-20 14:02 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: brouer, Paul E. McKenney, netdev, Jamal Hadi Salim
In-Reply-To: <412768308.11171.1413632892841.JavaMail.zimbra@efficios.com>

On Sat, 18 Oct 2014 11:48:12 +0000 (UTC) Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> Following our LPC discussion on lock-free queue algorithms
> for qdisc, here is some info on the wfcqueue implementation
> found in Userspace RCU. See http://urcu.so for info and
> git repository.

Thank for following up on our very interesting discussions.

I've started with the more simple variant "urcu/static/wfqueue.h" to
understand the concepts.  And I'm now reading wfcqueue.h, which I guess
it replacing wfqueue.h.

> Here is the wfcqueue ported to the Linux kernel I sent last
> year as RFC:
> https://lkml.org/lkml/2013/3/14/289
> 
> I'm very interested to learn if it fits well for your
> use-case,

Does this wfcqueue API support bulk dequeue?  (A feature needed for the
lock-less qdisc implementation, else it cannot compete with our new
bulk dequeue strategy).

AFAIK your queue implementation is a CAS-based, Wait-Free on enqueue,
but Lock-Free on dequeue with the potential for waiting/blocking on
a enqueue processes.
 I'm not 100% sure, that we want this behavior for the qdisc system.

I can certainly use the wfcq_empty() check, but I guess I need to
maintain a separate counter to maintain the qdisc limit, right?
(I would use the approximate/split counter API percpu_counter to keep
this scalable, and wfcq_empty() would provide an accurate empty check)

I think, we/I should start micro benchmarking the different approaches.
As our time budget is only 67.2ns 
 http://netoptimizer.blogspot.dk/2014/05/the-calculations-10gbits-wirespeed.html
(or bulking tricks artificially "increase" this budget)

The motivation behind this lockless qdisc is, the current qdisc locking
cost 48ns, see slide 9/16 titled "Qdisc locking is nasty":
 http://people.netfilter.org/hawk/presentations/LinuxPlumbers2014/performance_tx_qdisc_bulk_LPC2014.pdf

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH v4 13/25] virtio_console: enable VQs early
From: Michael S. Tsirkin @ 2014-10-20 13:35 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, kvm, linux-scsi, Christian Borntraeger, netdev,
	linux-kernel, virtualization, Amit Shah, Greg Kroah-Hartman,
	Arnd Bergmann, Paolo Bonzini, Thomas Graf, v9fs-developer,
	David S. Miller
In-Reply-To: <20141020144223.5655dcbc.cornelia.huck@de.ibm.com>

On Mon, Oct 20, 2014 at 02:42:23PM +0200, Cornelia Huck wrote:
> On Mon, 20 Oct 2014 13:07:50 +0100
> Thomas Graf <tgraf@suug.ch> wrote:
> 
> > On 10/13/14 at 10:50am, Michael S. Tsirkin wrote:
> > > virtio spec requires drivers to set DRIVER_OK before using VQs.
> > > This is set automatically after probe returns, virtio console violated this
> > > rule by adding inbufs, which causes the VQ to be used directly within
> > > probe.
> > > 
> > > To fix, call virtio_device_ready before using VQs.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >  drivers/char/virtio_console.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > > index b585b47..6ebe8f6 100644
> > > --- a/drivers/char/virtio_console.c
> > > +++ b/drivers/char/virtio_console.c
> > > @@ -1449,6 +1449,8 @@ static int add_port(struct ports_device *portdev, u32 id)
> > >  	spin_lock_init(&port->outvq_lock);
> > >  	init_waitqueue_head(&port->waitqueue);
> > >  
> > > +	virtio_device_ready(portdev->vdev);
> > > +
> > >  	/* Fill the in_vq with buffers so the host can send us data. */
> > >  	nr_added_bufs = fill_queue(port->in_vq, &port->inbuf_lock);
> > >  	if (!nr_added_bufs) {
> > 
> > Seems like probe and add_port() now both set VIRTIO_CONFIG_S_DRIVER_OK
> 
> I think we need to set this in the probe function instead, otherwise we
> fail for multiqueue (which also wants to use the control queue early).
> 
> Completely untested patch below; I can send this with proper s-o-b if
> it helps.
> 
> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> index bfa6400..cf7a561 100644
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -1449,8 +1449,6 @@ static int add_port(struct ports_device *portdev, u32 id)
>  	spin_lock_init(&port->outvq_lock);
>  	init_waitqueue_head(&port->waitqueue);
>  
> -	virtio_device_ready(portdev->vdev);
> -
>  	/* Fill the in_vq with buffers so the host can send us data. */
>  	nr_added_bufs = fill_queue(port->in_vq, &port->inbuf_lock);
>  	if (!nr_added_bufs) {
> @@ -2026,6 +2024,8 @@ static int virtcons_probe(struct virtio_device *vdev)
>  	spin_lock_init(&portdev->ports_lock);
>  	INIT_LIST_HEAD(&portdev->ports);
>  
> +	virtio_device_ready(portdev->vdev);
> +
>  	if (multiport) {
>  		unsigned int nr_added_bufs;
>  

I wanted to set DRIVER_OK as late as possible, to both reduce
the chance it can fail after DRIVER_OK and to reduce  the risk of
introducing a regression since old qemu might only start sending
interrupts after DRIVER_OK is set.

So I wanted everything completely initialized before DRIVER_OK.

You patch makes sense to me since nothing can fail,
and we won't get interrupts before we add inbufs.

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

Testig will report shortly, pls send with sob line.

-- 
MST

^ permalink raw reply

* Re: [PATCH v4 13/25] virtio_console: enable VQs early
From: Michael S. Tsirkin @ 2014-10-20 13:14 UTC (permalink / raw)
  To: Thomas Graf
  Cc: linux-s390, kvm, linux-scsi, Christian Borntraeger, netdev,
	linux-kernel, virtualization, Greg Kroah-Hartman, Arnd Bergmann,
	Paolo Bonzini, Amit Shah, v9fs-developer, David S. Miller
In-Reply-To: <20141020131016.GA20489@redhat.com>

On Mon, Oct 20, 2014 at 04:10:16PM +0300, Michael S. Tsirkin wrote:
> On Mon, Oct 20, 2014 at 01:07:50PM +0100, Thomas Graf wrote:
> > On 10/13/14 at 10:50am, Michael S. Tsirkin wrote:
> > > virtio spec requires drivers to set DRIVER_OK before using VQs.
> > > This is set automatically after probe returns, virtio console violated this
> > > rule by adding inbufs, which causes the VQ to be used directly within
> > > probe.
> > > 
> > > To fix, call virtio_device_ready before using VQs.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >  drivers/char/virtio_console.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > > index b585b47..6ebe8f6 100644
> > > --- a/drivers/char/virtio_console.c
> > > +++ b/drivers/char/virtio_console.c
> > > @@ -1449,6 +1449,8 @@ static int add_port(struct ports_device *portdev, u32 id)
> > >  	spin_lock_init(&port->outvq_lock);
> > >  	init_waitqueue_head(&port->waitqueue);
> > >  
> > > +	virtio_device_ready(portdev->vdev);
> > > +
> > >  	/* Fill the in_vq with buffers so the host can send us data. */
> > >  	nr_added_bufs = fill_queue(port->in_vq, &port->inbuf_lock);
> > >  	if (!nr_added_bufs) {
> 
> I see Cornelia sent a patch already.
> I'd like to reproduce this though - could you send me
> the command line please?

Nevermind, the trick is to add a port it seems:

-device virtio-serial -chardev socket,path=/tmp/c1,server,nowait,id=foo
-device virtserialport,chardev=foo,name=org.fedoraproject.port.0

works fine without -device virtserialport.

-- 
MST

^ permalink raw reply

* Re: [PATCH v4 13/25] virtio_console: enable VQs early
From: Thomas Graf @ 2014-10-20 13:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Rusty Russell, virtualization, linux-scsi,
	linux-s390, v9fs-developer, netdev, kvm, Amit Shah, Cornelia Huck,
	Christian Borntraeger, David S. Miller, Paolo Bonzini,
	Arnd Bergmann, Greg Kroah-Hartman
In-Reply-To: <20141020131016.GA20489@redhat.com>

On 10/20/14 at 04:10pm, Michael S. Tsirkin wrote:
> On Mon, Oct 20, 2014 at 01:07:50PM +0100, Thomas Graf wrote:
> > On 10/13/14 at 10:50am, Michael S. Tsirkin wrote:
> > > virtio spec requires drivers to set DRIVER_OK before using VQs.
> > > This is set automatically after probe returns, virtio console violated this
> > > rule by adding inbufs, which causes the VQ to be used directly within
> > > probe.
> > > 
> > > To fix, call virtio_device_ready before using VQs.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >  drivers/char/virtio_console.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > > index b585b47..6ebe8f6 100644
> > > --- a/drivers/char/virtio_console.c
> > > +++ b/drivers/char/virtio_console.c
> > > @@ -1449,6 +1449,8 @@ static int add_port(struct ports_device *portdev, u32 id)
> > >  	spin_lock_init(&port->outvq_lock);
> > >  	init_waitqueue_head(&port->waitqueue);
> > >  
> > > +	virtio_device_ready(portdev->vdev);
> > > +
> > >  	/* Fill the in_vq with buffers so the host can send us data. */
> > >  	nr_added_bufs = fill_queue(port->in_vq, &port->inbuf_lock);
> > >  	if (!nr_added_bufs) {
> 
> I see Cornelia sent a patch already.
> I'd like to reproduce this though - could you send me
> the command line please?

1. Invoke qemu:

/usr/bin/qemu-system-x86_64 -machine accel=kvm -name f20-2 -S -machine
pc-i440fx-1.4,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp
4,sockets=4,cores=1,threads=1 -uuid
dd45ec4c-7c26-4b73-9b6b-81f2912c5235 -no-user-config -nodefaults
-chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/f20-2.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive
file=/virt/f20n2-clone,if=none,id=drive-virtio-disk0,format=raw
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ef:72:4e,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -chardev
spicevmc,id=charchannel0,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
-device usb-tablet,id=input0 -spice
port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on
-device
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2
-device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7

2. Attach console right away: virsh console f20-2

^ permalink raw reply

* Re: [PATCH v4 13/25] virtio_console: enable VQs early
From: Thomas Graf @ 2014-10-20 13:10 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Michael S. Tsirkin, linux-kernel, Rusty Russell, virtualization,
	linux-scsi, linux-s390, v9fs-developer, netdev, kvm, Amit Shah,
	Christian Borntraeger, David S. Miller, Paolo Bonzini,
	Arnd Bergmann, Greg Kroah-Hartman
In-Reply-To: <20141020144223.5655dcbc.cornelia.huck@de.ibm.com>

On 10/20/14 at 02:42pm, Cornelia Huck wrote:
> On Mon, 20 Oct 2014 13:07:50 +0100
> Thomas Graf <tgraf@suug.ch> wrote:
> 
> > On 10/13/14 at 10:50am, Michael S. Tsirkin wrote:
> > > virtio spec requires drivers to set DRIVER_OK before using VQs.
> > > This is set automatically after probe returns, virtio console violated this
> > > rule by adding inbufs, which causes the VQ to be used directly within
> > > probe.
> > > 
> > > To fix, call virtio_device_ready before using VQs.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >  drivers/char/virtio_console.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > > index b585b47..6ebe8f6 100644
> > > --- a/drivers/char/virtio_console.c
> > > +++ b/drivers/char/virtio_console.c
> > > @@ -1449,6 +1449,8 @@ static int add_port(struct ports_device *portdev, u32 id)
> > >  	spin_lock_init(&port->outvq_lock);
> > >  	init_waitqueue_head(&port->waitqueue);
> > >  
> > > +	virtio_device_ready(portdev->vdev);
> > > +
> > >  	/* Fill the in_vq with buffers so the host can send us data. */
> > >  	nr_added_bufs = fill_queue(port->in_vq, &port->inbuf_lock);
> > >  	if (!nr_added_bufs) {
> > 
> > Seems like probe and add_port() now both set VIRTIO_CONFIG_S_DRIVER_OK
> 
> I think we need to set this in the probe function instead, otherwise we
> fail for multiqueue (which also wants to use the control queue early).
> 
> Completely untested patch below; I can send this with proper s-o-b if
> it helps.

virtio_dev_probe() already sets DRIVER_OK if ->probe() returned
without an error. I assume Michael added it to add_port() because
probing doesn't always occur first. Can we just remove the BUG_ON()?

^ permalink raw reply

* Re: [PATCH v4 13/25] virtio_console: enable VQs early
From: Michael S. Tsirkin @ 2014-10-20 13:10 UTC (permalink / raw)
  To: Thomas Graf
  Cc: linux-s390, kvm, linux-scsi, Christian Borntraeger, netdev,
	linux-kernel, virtualization, Greg Kroah-Hartman, Arnd Bergmann,
	Paolo Bonzini, Amit Shah, v9fs-developer, David S. Miller
In-Reply-To: <20141020120750.GA18561@casper.infradead.org>

On Mon, Oct 20, 2014 at 01:07:50PM +0100, Thomas Graf wrote:
> On 10/13/14 at 10:50am, Michael S. Tsirkin wrote:
> > virtio spec requires drivers to set DRIVER_OK before using VQs.
> > This is set automatically after probe returns, virtio console violated this
> > rule by adding inbufs, which causes the VQ to be used directly within
> > probe.
> > 
> > To fix, call virtio_device_ready before using VQs.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  drivers/char/virtio_console.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > index b585b47..6ebe8f6 100644
> > --- a/drivers/char/virtio_console.c
> > +++ b/drivers/char/virtio_console.c
> > @@ -1449,6 +1449,8 @@ static int add_port(struct ports_device *portdev, u32 id)
> >  	spin_lock_init(&port->outvq_lock);
> >  	init_waitqueue_head(&port->waitqueue);
> >  
> > +	virtio_device_ready(portdev->vdev);
> > +
> >  	/* Fill the in_vq with buffers so the host can send us data. */
> >  	nr_added_bufs = fill_queue(port->in_vq, &port->inbuf_lock);
> >  	if (!nr_added_bufs) {

I see Cornelia sent a patch already.
I'd like to reproduce this though - could you send me
the command line please?


> Seems like probe and add_port() now both set VIRTIO_CONFIG_S_DRIVER_OK
> 
>     1.839658] kernel BUG at include/linux/virtio_config.h:125!
> [    1.839995] invalid opcode: 0000 [#1] SMP 
> [    1.840169] Modules linked in: serio_raw virtio_balloon pcspkr virtio_net virtio_console soundcore parport_pc floppy pvpanic parport i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc qxl drm_kms_helper ttm drm virtio_blk i2c_core virtio_pci ata_generic virtio_ring virtio pata_acpi
> [    1.840169] CPU: 2 PID: 266 Comm: kworker/2:2 Not tainted 3.17.0+ #1
> [    1.840169] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [    1.840169] Workqueue: events control_work_handler [virtio_console]
> [    1.840169] task: ffff8800364bc840 ti: ffff880078490000 task.ti: ffff880078490000
> [    1.840169] RIP: 0010:[<ffffffffa01d92c6>]  [<ffffffffa01d92c6>] virtio_device_ready.part.12+0x4/0x6 [virtio_console]
> [    1.840169] RSP: 0018:ffff880078493c78  EFLAGS: 00010202
> [    1.840169] RAX: 0000000000000007 RBX: ffff880036406200 RCX: 0000000000006e39
> [    1.840169] RDX: 000000000000c0b2 RSI: 0000000000000000 RDI: 000000000001c0b2
> [    1.840169] RBP: ffff880078493c78 R08: ffffffff81d2c6f8 R09: 0000000000000000
> [    1.840169] R10: 0000000000000001 R11: ffff8800364bd000 R12: ffff880036618400
> [    1.840169] R13: 0000000000000001 R14: ffff8800368c6800 R15: ffff880036406220
> [    1.840169] FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
> [    1.840169] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    1.840169] CR2: 00007f1c31c90000 CR3: 0000000001c14000 CR4: 00000000000006e0
> [    1.840169] Stack:
> [    1.840169]  ffff880078493ce8 ffffffffa01d886a ffff880000000001 ffffffff810e20cd
> [    1.840169]  ffff880078493cb8 0000000000000282 0000000000000000 0000000087f90194
> [    1.840169]  ffff880078493ce8 ffff88007ab1d4e0 ffff880036618498 ffff880036618410
> [    1.840169] Call Trace:
> [    1.840169]  [<ffffffffa01d886a>] add_port+0x40a/0x440 [virtio_console]
> [    1.840169]  [<ffffffff810e20cd>] ? trace_hardirqs_on+0xd/0x10
> [    1.840169]  [<ffffffffa01d8c6a>] control_work_handler+0x3ca/0x420 [virtio_console]
> [    1.840169]  [<ffffffff810b0e7b>] ? process_one_work+0x15b/0x530
> [    1.840169]  [<ffffffff810b0ef4>] process_one_work+0x1d4/0x530
> [    1.840169]  [<ffffffff810b0e7b>] ? process_one_work+0x15b/0x530
> [    1.840169]  [<ffffffff810b136b>] worker_thread+0x11b/0x490
> [    1.840169]  [<ffffffff810b1250>] ? process_one_work+0x530/0x530
> [    1.840169]  [<ffffffff810b67c3>] kthread+0xf3/0x110
> [    1.840169]  [<ffffffff81788f00>] ? _raw_spin_unlock_irq+0x30/0x50
> [    1.840169]  [<ffffffff810b66d0>] ? kthread_create_on_node+0x240/0x240
> [    1.840169]  [<ffffffff81789a7c>] ret_from_fork+0x7c/0xb0
> [    1.840169]  [<ffffffff810b66d0>] ? kthread_create_on_node+0x240/0x240
> [    1.840169] Code: ff 49 89 c4 4d 85 e4 0f 8f 25 ff ff ff eb ad 48 c7 c0 f4 ff ff ff e9 17 ff ff ff e8 f5 cd eb e0 90 55 48 89 e5 0f 0b 55 48 89 e5 <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 e8 99 e2 ff ff 48 c7 c7 c0 
> [    1.840169] RIP  [<ffffffffa01d92c6>] virtio_device_ready.part.12+0x4/0x6 [virtio_console]
> [    1.840169]  RSP <ffff880078493c78>

^ permalink raw reply

* Re: [PATCH v4 13/25] virtio_console: enable VQs early
From: Cornelia Huck @ 2014-10-20 12:42 UTC (permalink / raw)
  To: Thomas Graf
  Cc: linux-s390, kvm, linux-scsi, Michael S. Tsirkin, netdev,
	linux-kernel, virtualization, Christian Borntraeger,
	Greg Kroah-Hartman, Arnd Bergmann, Paolo Bonzini, Amit Shah,
	v9fs-developer, David S. Miller
In-Reply-To: <20141020120750.GA18561@casper.infradead.org>

On Mon, 20 Oct 2014 13:07:50 +0100
Thomas Graf <tgraf@suug.ch> wrote:

> On 10/13/14 at 10:50am, Michael S. Tsirkin wrote:
> > virtio spec requires drivers to set DRIVER_OK before using VQs.
> > This is set automatically after probe returns, virtio console violated this
> > rule by adding inbufs, which causes the VQ to be used directly within
> > probe.
> > 
> > To fix, call virtio_device_ready before using VQs.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  drivers/char/virtio_console.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > index b585b47..6ebe8f6 100644
> > --- a/drivers/char/virtio_console.c
> > +++ b/drivers/char/virtio_console.c
> > @@ -1449,6 +1449,8 @@ static int add_port(struct ports_device *portdev, u32 id)
> >  	spin_lock_init(&port->outvq_lock);
> >  	init_waitqueue_head(&port->waitqueue);
> >  
> > +	virtio_device_ready(portdev->vdev);
> > +
> >  	/* Fill the in_vq with buffers so the host can send us data. */
> >  	nr_added_bufs = fill_queue(port->in_vq, &port->inbuf_lock);
> >  	if (!nr_added_bufs) {
> 
> Seems like probe and add_port() now both set VIRTIO_CONFIG_S_DRIVER_OK

I think we need to set this in the probe function instead, otherwise we
fail for multiqueue (which also wants to use the control queue early).

Completely untested patch below; I can send this with proper s-o-b if
it helps.

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index bfa6400..cf7a561 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1449,8 +1449,6 @@ static int add_port(struct ports_device *portdev, u32 id)
 	spin_lock_init(&port->outvq_lock);
 	init_waitqueue_head(&port->waitqueue);
 
-	virtio_device_ready(portdev->vdev);
-
 	/* Fill the in_vq with buffers so the host can send us data. */
 	nr_added_bufs = fill_queue(port->in_vq, &port->inbuf_lock);
 	if (!nr_added_bufs) {
@@ -2026,6 +2024,8 @@ static int virtcons_probe(struct virtio_device *vdev)
 	spin_lock_init(&portdev->ports_lock);
 	INIT_LIST_HEAD(&portdev->ports);
 
+	virtio_device_ready(portdev->vdev);
+
 	if (multiport) {
 		unsigned int nr_added_bufs;

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox