From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1N1l6V-000711-UW for qemu-devel@nongnu.org; Sat, 24 Oct 2009 14:12:40 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1N1l6Q-0006u5-8Y for qemu-devel@nongnu.org; Sat, 24 Oct 2009 14:12:39 -0400
Received: from [199.232.76.173] (port=59872 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1N1l6Q-0006ts-1L for qemu-devel@nongnu.org; Sat, 24 Oct 2009 14:12:34 -0400
Received: from mail-yw0-f176.google.com ([209.85.211.176]:39806) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1N1l6P-0004yQ-Og for qemu-devel@nongnu.org; Sat, 24 Oct 2009 14:12:33 -0400
Received: by ywh6 with SMTP id 6so8362078ywh.4 for ; Sat, 24 Oct 2009 11:12:32 -0700 (PDT)
MIME-Version: 1.0
Date: Sun, 25 Oct 2009 02:12:31 +0800
Message-ID:
From: Scott Tsai
Content-Type: multipart/mixed; boundary=001636c92fcea94dc60476b245bf
Subject: [Qemu-devel] qemu: async sending in tap causes "NFS not responding" error
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: qemu-devel@nongnu.org
Cc: markmc@redhat.com, Sven_Rudolph@drewag.de

--001636c92fcea94dc60476b245bf
Content-Type: text/plain; charset=UTF-8

Dear all,

I recently found that this changeset:
http://git.savannah.gnu.org/cgit/qemu.git/commit/?id=e19eb22486f258a421108ac22b8380a4e2f16b97
"net: make use of async packet sending API in tap client"
causes NFS root Linux guest setups using TAP networking to fail with error messages like:

nfs: server 172.20.0.1 not responding, still trying
nfs: server 172.20.0.1 OK
< .... repeat infinitely ...>

This happens on both the "master" and "stable-0.11" branches of qemu.

The attached '0001-net-revert-e19eb22486f258a421108ac22b8380a4e2f16b97.patch' makes NFS root on qemu-emulated "arm-integrator-cp" boards work for me again.

I've uploaded Wireshark captures of qemu-0.10 (good, nfsroot works) and qemu-0.11 (bad) here:
http://scottt.tw/bug/qemu-async-tap-drops-packets/qemu-nfsroot-good.pcap
http://scottt.tw/bug/qemu-async-tap-drops-packets/qemu-nfsroot-bad.pcap

Inspecting frame 268 in "qemu-nfsroot-bad.pcap", I see:
"ICMP Fragment reassembly time exceeded", reply to request in frame 53, duplicate to the reply in frame 56
and suspect qemu is dropping Ethernet frames from the larger, fragmented IP packets used for NFS READ replies.

After finding:
http://lists.gnu.org/archive/html/qemu-devel/2009-09/msg01173.html
through Google and reading through the potentially bad commits that Sven found through bisection, I patched "tap_send()" to not run in a loop ("drain the tap send queue in one go"?) and the error goes away.

To reproduce this, download "zImage", "scripts/" and "trigger-bug" from:
http://scottt.tw/bug/qemu-async-tap-drops-packets/
and run the "trigger-bug" script.

I've only just started reading linux/Documentation/networking/tuntap.txt after encountering this problem, and I currently find the code called from "tap_send()" (e.g. qemu_send_packet_async, qemu_deliver_packet) and the semantics of their return values pretty confusing. I'm sure my patch should be refined so that both NFS root and the originally intended optimization work.

--001636c92fcea94dc60476b245bf Content-Type: text/x-patch; charset=US-ASCII; name="0001-net-revert-e19eb22486f258a421108ac22b8380a4e2f16b97.patch" Content-Disposition: attachment; filename="0001-net-revert-e19eb22486f258a421108ac22b8380a4e2f16b97.patch" Content-Transfer-Encoding: base64 X-Attachment-Id: f_g16ojin90 RnJvbSBmZmRlM2ZlYjE5MjJjMWQ1NjYzYWNlOWQ1YTMyZWFjYWEyNjQ5MWRkIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBTY290dCBUc2FpIDxzY290dHQudHdAZ21haWwuY29tPgpEYXRl OiBTdW4sIDI1IE9jdCAyMDA5IDAxOjUyOjM2ICswODAwClN1YmplY3Q6IFtQQVRDSF0gbmV0OiBy ZXZlcnQgZTE5ZWIyMjQ4NmYyNThhNDIxMTA4YWMyMmI4MzgwYTRlMmYxNmI5NyB0byBmaXggbmZz cm9vdAoKLS0tCiBuZXQuYyB8ICAgMTkgKysrKysrKystLS0tLS0tLS0tLQogMSBmaWxlcyBjaGFu Z2VkLCA4IGluc2VydGlvbnMoKyksIDExIGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL25ldC5j IGIvbmV0LmMKaW5kZXggNDcwODA4MC4uMzlkY2QwNCAxMDA2NDQKLS0tIGEvbmV0LmMKKysrIGIv bmV0LmMKQEAgLTEzNTcsMTcgKzEzNTcsMTQgQEAgc3RhdGljIHZvaWQgdGFwX3NlbmQodm9pZCAq b3BhcXVlKQogICAgIFRBUFN0YXRlICpzID0gb3BhcXVlOwogICAgIGludCBzaXplOwogCi0gICAg ZG8gewotICAgICAgICBzaXplID0gdGFwX3JlYWRfcGFja2V0KHMtPmZkLCBzLT5idWYsIHNpemVv ZihzLT5idWYpKTsKLSAgICAgICAgaWYgKHNpemUgPD0gMCkgewotICAgICAgICAgICAgYnJlYWs7 Ci0gICAgICAgIH0KLQotICAgICAgICBzaXplID0gcWVtdV9zZW5kX3BhY2tldF9hc3luYyhzLT52 Yywgcy0+YnVmLCBzaXplLCB0YXBfc2VuZF9jb21wbGV0ZWQpOwotICAgICAgICBpZiAoc2l6ZSA9 PSAwKSB7Ci0gICAgICAgICAgICB0YXBfcmVhZF9wb2xsKHMsIDApOwotICAgICAgICB9Ci0gICAg fSB3aGlsZSAoc2l6ZSA+IDApOworICAgIHNpemUgPSB0YXBfcmVhZF9wYWNrZXQocy0+ZmQsIHMt PmJ1Ziwgc2l6ZW9mKHMtPmJ1ZikpOworICAgIGlmIChzaXplIDw9IDApIHsKKyAgICAgICAgcmV0 dXJuOworICAgIH0KKyAgICBzaXplID0gcWVtdV9zZW5kX3BhY2tldF9hc3luYyhzLT52Yywgcy0+ YnVmLCBzaXplLCB0YXBfc2VuZF9jb21wbGV0ZWQpOworICAgIGlmIChzaXplID09IDApIHsKKyAg ICAgICAgdGFwX3JlYWRfcG9sbChzLCAwKTsKKyAgICB9CiB9CiAKICNpZmRlZiBUVU5TRVRTTkRC VUYKLS0gCjEuNi4yLjUKCg== --001636c92fcea94dc60476b245bf--