* BFQ + dm-mpath
@ 2017-08-24 23:16 Bart Van Assche
  2017-08-30 16:31 ` Paolo Valente
  0 siblings, 1 reply; 25+ messages in thread
From: Bart Van Assche @ 2017-08-24 23:16 UTC (permalink / raw)
  To: paolo.valente@linaro.org; +Cc: linux-block@vger.kernel.org

Hello Paolo,

Has BFQ ever been tested in combination with dm-mpath? A few seconds
after I had started a run of the srp-tests software with BFQ a kernel
oops appeared on the console. The command I ran is:

$ while true; do ~bart/software/infiniband/srp-test/run_tests -d -r 30 -e bfq; done

And this is what appeared on the console:

[89251.977201] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[89251.987831] IP: bfq_setup_cooperator+0x16/0x2e0 [bfq]
[89251.995241] PGD 0
[89251.995243] P4D 0
[89252.006423] Oops: 0000 [#1] PREEMPT SMP
[89252.012354] Modules linked in: ib_srp libcrc32c scsi_transport_srp target_core_file ib_srpt target_core_iblock target_core_mod scsi_debug brd bfq dm_service_time rdma_cm iw_cm bridge stp llc ip6table_filter ip6_tables iptable_filter intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm af_packet irqbypass ipmi_ssif dcdbas crct10dif_pclmul ioatdma crc32_pclmul mei_me joydev ghash_clmulni_intel mei aesni_intel sg shpchp ipmi_si aes_x86_64 ipmi_devintf crypto_simd dca cryptd glue_helper lpc_ich ipmi_msghandler acpi_power_meter acpi_pad wmi mfd_core ib_ipoib ib_cm ib_uverbs ib_umad mlx4_ib ib_core dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua netconsole configfs ip_tables x_tables autofs4 ext4 mbcache jbd2 mlx4_en hid_generic usbhid mgag200 i2c_algo_bit drm_kms_helper
[89252.103941]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ehci_pci mlx4_core tg3 ehci_hcd crc32c_intel devlink drm ptp usbcore usb_common pps_core megaraid_sas libphy [last unloaded: scsi_transport_srp]
[89252.128375] CPU: 13 PID: 352 Comm: kworker/13:1H Tainted: G        W       4.13.0-rc6-dbg+ #2
[89252.139884] Hardware name: Dell Inc. PowerEdge R720/0VWT90, BIOS 1.3.6 09/11/2012
[89252.150243] Workqueue: kblockd blk_mq_run_work_fn
[89252.157449] task: ffff880911d80040 task.stack: ffffc90007c64000
[89252.166038] RIP: 0010:bfq_setup_cooperator+0x16/0x2e0 [bfq]
[89252.174222] RSP: 0018:ffffc90007c67b20 EFLAGS: 00010082
[89252.182026] RAX: 00000000ffffffe0 RBX: ffff8807b9988700 RCX: 0000000000000001
[89252.191990] RDX: ffff8807b9988700 RSI: 0000000000000000 RDI: ffff880288554a68
[89252.201956] RBP: ffffc90007c67b68 R08: ffffffff825820c0 R09: 0000000000000000
[89252.211899] R10: ffffc90007c67ae8 R11: ffffffffa05a5ad5 R12: ffff880288554dc8
[89252.221821] R13: ffff880288554a68 R14: ffff880928078bd0 R15: ffff8807bbe03528
[89252.231715] FS:  0000000000000000(0000) GS:ffff88093f780000(0000) knlGS:0000000000000000
[89252.242667] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[89252.250958] CR2: 0000000000000018 CR3: 00000004aa731000 CR4: 00000000000406e0
[89252.260774] Call Trace:
[89252.265341]  bfq_insert_requests+0x10d/0xf20 [bfq]
[89252.272549]  ? lock_acquire+0xdc/0x1d0
[89252.278566]  blk_mq_sched_insert_request+0x123/0x170
[89252.285936]  blk_insert_cloned_request+0xae/0x1f0
[89252.293026]  map_request+0x123/0x290 [dm_mod]
[89252.299705]  dm_mq_queue_rq+0x94/0x110 [dm_mod]
[89252.306557]  blk_mq_dispatch_rq_list+0x1f1/0x3a0
[89252.313520]  blk_mq_sched_dispatch_requests+0x16d/0x1e0
[89252.321183]  __blk_mq_run_hw_queue+0x11d/0x1c0
[89252.327987]  blk_mq_run_work_fn+0x2c/0x30
[89252.334284]  process_one_work+0x200/0x630
[89252.340515]  worker_thread+0x4e/0x3b0
[89252.346317]  kthread+0x113/0x150
[89252.351572]  ? process_one_work+0x630/0x630
[89252.357890]  ? kthread_create_on_node+0x40/0x40
[89252.364607]  ? kthread_create_on_node+0x40/0x40
[89252.371303]  ret_from_fork+0x27/0x40
[89252.376912] Code: 96 08 01 00 00 48 c1 ea 06 83 e2 01 89 d0 5d c3 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 20 <4c> 8b 66 18 4d 85 e4 74 12 48 83 c4 20 4c 89 e0 5b 41 5c 41 5d
[89252.401319] RIP: bfq_setup_cooperator+0x16/0x2e0 [bfq] RSP: ffffc90007c67b20
[89252.410858] CR2: 0000000000000018
[89252.416257] ---[ end trace ec48e99a50d88cb4 ]---

$ gdb bfq.ko
(gdb) list *(bfq_setup_cooperator+0x16)
0x1d76 is in bfq_setup_cooperator (block/bfq-iosched.c:1959).
1954	bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
1955	                     void *io_struct, bool request)
1956	{
1957	        struct bfq_queue *in_service_bfqq, *new_bfqq;
1958	
1959	        if (bfqq->new_bfqq)
1960	                return bfqq->new_bfqq;
1961	
1962	        if (!io_struct ||
1963	            wr_from_too_long(bfqq) ||

Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: BFQ + dm-mpath
  2017-08-24 23:16 BFQ + dm-mpath Bart Van Assche
@ 2017-08-30 16:31 ` Paolo Valente
  2017-09-05  7:56   ` Paolo Valente
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Valente @ 2017-08-30 16:31 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-block@vger.kernel.org

> On 25 Aug 2017, at 01:16, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> 
> Hello Paolo,

Hi Bart

> Has BFQ ever been tested in combination with dm-mpath?

Unfortunately no.

> A few seconds
> after I had started a run of the srp-tests software with BFQ a kernel
> oops appeared on the console. The command I ran is:
> 
> $ while true; do ~bart/software/infiniband/srp-test/run_tests -d -r 30 -e bfq; done
> 
> And this is what appeared on the console:
> 
> [89251.977201] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> [89251.987831] IP: bfq_setup_cooperator+0x16/0x2e0 [bfq]
> [...]
> [89252.260774] Call Trace:
> [89252.265341]  bfq_insert_requests+0x10d/0xf20 [bfq]
> [89252.272549]  ? lock_acquire+0xdc/0x1d0
> [89252.278566]  blk_mq_sched_insert_request+0x123/0x170
> [89252.285936]  blk_insert_cloned_request+0xae/0x1f0
> [89252.293026]  map_request+0x123/0x290 [dm_mod]
> [89252.299705]  dm_mq_queue_rq+0x94/0x110 [dm_mod]
> [89252.306557]  blk_mq_dispatch_rq_list+0x1f1/0x3a0
> [89252.313520]  blk_mq_sched_dispatch_requests+0x16d/0x1e0
> [89252.321183]  __blk_mq_run_hw_queue+0x11d/0x1c0
> [89252.327987]  blk_mq_run_work_fn+0x2c/0x30
> [...]
> [89252.416257] ---[ end trace ec48e99a50d88cb4 ]---
> 
> $ gdb bfq.ko
> (gdb) list *(bfq_setup_cooperator+0x16)
> 0x1d76 is in bfq_setup_cooperator (block/bfq-iosched.c:1959).
> 1954	bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> 1955	                     void *io_struct, bool request)
> 1956	{
> 1957	        struct bfq_queue *in_service_bfqq, *new_bfqq;
> 1958	
> 1959	        if (bfqq->new_bfqq)
> 1960	                return bfqq->new_bfqq;
> 1961	
> 1962	        if (!io_struct ||
> 1963	            wr_from_too_long(bfqq) ||

At a first glance, bfq_insert_request seems to get a request that has
not been properly initialized: the field (rq)->elv.priv[1], where bfq
stores the bfq_queue the request belongs to (in the function
bfq_prepare_request), happens to be NULL. Then the parameter bfqq at
line 1959 is NULL, and the oops follows.

This lets me suspect that something wrong happens around
dm_mq_queue_rq, map_request and blk_insert_cloned_request.
Unfortunately, I don't remember ever having read the code of those
functions. So I have written this short, preliminary report just to
see whether it rings any bell for anyone reading this email. If not,
I'll dig into these functions to check whether the bug is really where
I suspect.

Thanks,
Paolo

> Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: BFQ + dm-mpath
  2017-08-30 16:31 ` Paolo Valente
@ 2017-09-05  7:56   ` Paolo Valente
  2017-09-05 14:15     ` Bart Van Assche
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Valente @ 2017-09-05 7:56 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-block@vger.kernel.org

> On 30 Aug 2017, at 18:31, Paolo Valente <paolo.valente@linaro.org> wrote:
> 
>> On 25 Aug 2017, at 01:16, Bart Van Assche <bart.vanassche@wdc.com> wrote:
>> 
>> Hello Paolo,
> 
> Hi Bart
> 
>> Has BFQ ever been tested in combination with dm-mpath?
> 
> Unfortunately no.
> 
>> A few seconds
>> after I had started a run of the srp-tests software with BFQ a kernel
>> oops appeared on the console. The command I ran is:
>> 
>> $ while true; do ~bart/software/infiniband/srp-test/run_tests -d -r 30 -e bfq; done
>> 
>> And this is what appeared on the console:
>> 
>> [89251.977201] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
>> [89251.987831] IP: bfq_setup_cooperator+0x16/0x2e0 [bfq]
>> [...]
>> $ gdb bfq.ko
>> (gdb) list *(bfq_setup_cooperator+0x16)
>> 0x1d76 is in bfq_setup_cooperator (block/bfq-iosched.c:1959).
>> [...]
>> 1959	        if (bfqq->new_bfqq)
>> 1960	                return bfqq->new_bfqq;
>> [...]
> 
> At a first glance, bfq_insert_request seems to get a request that has
> not been properly initialized: the field (rq)->elv.priv[1], where bfq
> stores the bfq_queue the request belongs to (in the function
> bfq_prepare_request), happens to be NULL. Then the parameter bfqq at
> line 1959 is NULL, and the oops follows.
> 
> This lets me suspect that something wrong happens around
> dm_mq_queue_rq, map_request and blk_insert_cloned_request.
> Unfortunately, I don't remember ever having read the code of those
> functions. So I have written this short, preliminary report just to
> see whether it rings any bell for anyone reading this email. If not,
> I'll dig into these functions to check whether the bug is really where
> I suspect.

Ok, my suspicions seem confirmed: the path dm_mq_queue_rq -> map_request
-> setup_clone -> blk_rq_prep_clone creates a cloned request without
invoking e->type->ops.mq.prepare_request for the target elevator e.
The cloned request is therefore not initialized for the scheduler, but
it is however inserted into the scheduler by
blk_mq_sched_insert_request. This seems an error for any scheduler
that needs to initialize fields in the incoming request, or in general
to take some preliminary action.

Am I missing something here?

Thanks,
Paolo

>> Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: BFQ + dm-mpath
  2017-09-05  7:56 ` Paolo Valente
@ 2017-09-05 14:15   ` Bart Van Assche
  2017-09-07 15:52     ` Mike Snitzer
  0 siblings, 1 reply; 25+ messages in thread
From: Bart Van Assche @ 2017-09-05 14:15 UTC (permalink / raw)
  To: paolo.valente@linaro.org; +Cc: linux-block@vger.kernel.org, snitzer@redhat.com

On Tue, 2017-09-05 at 09:56 +0200, Paolo Valente wrote:
> Ok, my suspicions seem confirmed: the path dm_mq_queue_rq -> map_request
> -> setup_clone -> blk_rq_prep_clone creates a cloned request without
> invoking e->type->ops.mq.prepare_request for the target elevator e.
> The cloned request is therefore not initialized for the scheduler, but
> it is however inserted into the scheduler by
> blk_mq_sched_insert_request. This seems an error for any scheduler
> that needs to initialize fields in the incoming request, or in general
> to take some preliminary action.
> 
> Am I missing something here?

(+Mike Snitzer)

Mike, do you perhaps have the time to look into this memory leak?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: BFQ + dm-mpath 2017-09-05 14:15 ` Bart Van Assche @ 2017-09-07 15:52 ` Mike Snitzer 2017-09-08 9:13 ` Paolo Valente 0 siblings, 1 reply; 25+ messages in thread From: Mike Snitzer @ 2017-09-07 15:52 UTC (permalink / raw) To: Bart Van Assche Cc: paolo.valente@linaro.org, linux-block@vger.kernel.org, dm-devel, axboe On Tue, Sep 05 2017 at 10:15am -0400, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Tue, 2017-09-05 at 09:56 +0200, Paolo Valente wrote: > > Ok, my suspects seem confirmed: the path dm_mq_queue_rq -> map_request > > -> setup_clone -> blk_rq_prep_clone creates a cloned request without > > invoking e->type->ops.mq.prepare_request for the target elevator e. > > The cloned request is therefore not initialized for the scheduler, but > > it is however inserted into the scheduler by > > blk_mq_sched_insert_request. This seems an error for any scheduler > > that needs to initialize fields in the incoming request, or in general > > to take some preliminary action. > > > > Am I missing something here? > > (+Mike Snitzer) > > Mike, do you perhaps have the time to look into this memory leak? It isn't a memory leak, it is missing initialization in the case of cloned requests (if I'm understanding Paolo correctly). But cloned requests shouldn't be going through the scheduler. Only the original requests should. Commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO schedulers") switched from blk_mq_insert_request() to blk_mq_sched_insert_request() and in doing so it opened dm-mpath up to this bug. Could be we need to take steps to ensure the block layer still supports bypassing the elevator by using direct insertion? Or blk_mq_sched_insert_request() needs updating to check if e->type->ops.mq.prepare_request were actually performed and to fallback to the !elevator case if not.. Not sure on the fix, but I can look closer if others (like Jens or Paolo) don't have quicker suggestions. Mike ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: BFQ + dm-mpath 2017-09-07 15:52 ` Mike Snitzer @ 2017-09-08 9:13 ` Paolo Valente 2017-09-08 16:41 ` [PATCH] dm mpath: switch IO scheduler of underlying paths to "none" [was: Re: BFQ + dm-mpath] Mike Snitzer 0 siblings, 1 reply; 25+ messages in thread From: Paolo Valente @ 2017-09-08 9:13 UTC (permalink / raw) To: Mike Snitzer Cc: Bart Van Assche, linux-block@vger.kernel.org, dm-devel, axboe

> On 7 Sep 2017, at 17:52, Mike Snitzer <snitzer@redhat.com> wrote:
>
> On Tue, Sep 05 2017 at 10:15am -0400,
> Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>
>> On Tue, 2017-09-05 at 09:56 +0200, Paolo Valente wrote:
>>> Ok, my suspects seem confirmed: the path dm_mq_queue_rq -> map_request
>>> -> setup_clone -> blk_rq_prep_clone creates a cloned request without
>>> invoking e->type->ops.mq.prepare_request for the target elevator e.
>>> The cloned request is therefore not initialized for the scheduler, but
>>> it is however inserted into the scheduler by
>>> blk_mq_sched_insert_request. This seems an error for any scheduler
>>> that needs to initialize fields in the incoming request, or in general
>>> to take some preliminary action.
>>>
>>> Am I missing something here?
>>
>> (+Mike Snitzer)
>>
>> Mike, do you perhaps have the time to look into this memory leak?
>
> It isn't a memory leak, it is missing initialization in the case of
> cloned requests (if I'm understanding Paolo correctly).
>

Exactly!

> But cloned requests shouldn't be going through the scheduler. Only the
> original requests should.
>
> Commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO
> schedulers") switched from blk_mq_insert_request() to
> blk_mq_sched_insert_request() and in doing so it opened dm-mpath up to
> this bug.
>
> Could be we need to take steps to ensure the block layer still
> supports bypassing the elevator by using direct insertion?
>
> Or blk_mq_sched_insert_request() needs updating to check if
> e->type->ops.mq.prepare_request were actually performed and to fallback
> to the !elevator case if not..
>
> Not sure on the fix, but I can look closer if others (like Jens or
> Paolo) don't have quicker suggestions.
>

No quick suggestion from me :(

Thanks for analyzing this bug,
Paolo

> Mike

^ permalink raw reply	[flat|nested] 25+ messages in thread
* [PATCH] dm mpath: switch IO scheduler of underlying paths to "none" [was: Re: BFQ + dm-mpath] 2017-09-08 9:13 ` Paolo Valente @ 2017-09-08 16:41 ` Mike Snitzer 2017-09-08 16:48 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Mike Snitzer @ 2017-09-08 16:41 UTC (permalink / raw) To: Paolo Valente, axboe, Bart Van Assche Cc: linux-block@vger.kernel.org, dm-devel On Fri, Sep 08 2017 at 5:13P -0400, Paolo Valente <paolo.valente@linaro.org> wrote: > > > Il giorno 07 set 2017, alle ore 17:52, Mike Snitzer <snitzer@redhat.com> ha scritto: > > > > On Tue, Sep 05 2017 at 10:15am -0400, > > Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > > > >> On Tue, 2017-09-05 at 09:56 +0200, Paolo Valente wrote: > >>> Ok, my suspects seem confirmed: the path dm_mq_queue_rq -> map_request > >>> -> setup_clone -> blk_rq_prep_clone creates a cloned request without > >>> invoking e->type->ops.mq.prepare_request for the target elevator e. > >>> The cloned request is therefore not initialized for the scheduler, but > >>> it is however inserted into the scheduler by > >>> blk_mq_sched_insert_request. This seems an error for any scheduler > >>> that needs to initialize fields in the incoming request, or in general > >>> to take some preliminary action. > >>> > >>> Am I missing something here? > >> > >> (+Mike Snitzer) > >> > >> Mike, do you perhaps have the time to look into this memory leak? > > > > It isn't a memory leak, it is missing initialization in the case of > > cloned requests (if I'm understanding Paolo correctly). > > > > Exactly! > > > But cloned requests shouldn't be going through the scheduler. Only the > > original requests should. > > > > Commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO > > schedulers") switched from blk_mq_insert_request() to > > blk_mq_sched_insert_request() and in doing so it opened dm-mpath up to > > this bug. 
> >
> > Could be we need to take steps to ensure the block layer still
> > supports bypassing the elevator by using direct insertion?
> >
> > Or blk_mq_sched_insert_request() needs updating to check if
> > e->type->ops.mq.prepare_request were actually performed and to fallback
> > to the !elevator case if not..
> >
> > Not sure on the fix, but I can look closer if others (like Jens or
> > Paolo) don't have quicker suggestions.
> >
>
> No quick suggestion from me :(
>
> Thanks for analyzing this bug,

Please see the following untested patch. All
testing/review/comments/acks appreciated.

I elected to use elevator_change() rather than fiddle with adding a new
blk-mq elevator hook (e.g. ->request_prepared) to verify that each
blk-mq elevator enabled request did in fact get prepared.

Bart, please test this patch and reply with your review/feedback.

Jens, if you're OK with this solution please reply with your Ack and
I'll send it to Linus along with the rest of the handful of DM changes I
have for 4.14.

Thanks,
Mike

From: Mike Snitzer <snitzer@redhat.com>
Date: Fri, 8 Sep 2017 11:45:13 -0400
Subject: [PATCH] dm mpath: switch IO scheduler of underlying paths to "none"

A NULL pointer crash was reported for the case of having the BFQ IO
scheduler attached to the underlying paths of a DM multipath device.
The crash occurs in blk_mq_sched_insert_request()'s call to
e->type->ops.mq.insert_requests().

Paolo Valente correctly summarized why the crash occurred with:

"the call chain (dm_mq_queue_rq -> map_request -> setup_clone ->
blk_rq_prep_clone) creates a cloned request without invoking
e->type->ops.mq.prepare_request for the target elevator e. The cloned
request is therefore not initialized for the scheduler, but it is
however inserted into the scheduler by blk_mq_sched_insert_request."
All said, there is no reason for IO scheduling in the underlying paths
because the top-level DM multipath request_queue handles all IO
scheduling of the original requests issued to the multipath device.
The multipath device's clones of the original requests are then just
inserted directly into the underlying path's dispatch queue(s).

Commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO
schedulers") switched blk_insert_cloned_request() from using
blk_mq_insert_request() to blk_mq_sched_insert_request(). Which
incorrectly added elevator machinery into a call chain that isn't
supposed to have any.

To fix this DM multipath now explicitly removes the IO scheduler from
all underlying paths during multipath device initialization. To do so
elevator_change() is needed, so elevator_change() is reinstated by
reverting commit c033269490 ("block: Remove elevator_change()").

Fixes: bd166ef18 ("blk-mq-sched: add framework for MQ capable IO schedulers")
Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/elevator.c         | 13 +++++++++++++
 drivers/md/dm-mpath.c    | 14 ++++++++++++--
 include/linux/elevator.h |  1 +
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 4bb2f0c..a5d9639 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -1084,6 +1084,19 @@ static int __elevator_change(struct request_queue *q, const char *name)
 	return elevator_switch(q, e);
 }
 
+int elevator_change(struct request_queue *q, const char *name)
+{
+	int ret;
+
+	/* Protect q->elevator from elevator_init() */
+	mutex_lock(&q->sysfs_lock);
+	ret = __elevator_change(q, name);
+	mutex_unlock(&q->sysfs_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL(elevator_change);
+
 static inline bool elv_support_iosched(struct request_queue *q)
 {
 	if (q->mq_ops && q->tag_set && (q->tag_set->flags &
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index bf280a9..de046b0 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -25,6 +25,7 @@
 #include <scsi/scsi_dh.h>
 #include <linux/atomic.h>
 #include <linux/blk-mq.h>
+#include <linux/elevator.h>
 
 #define DM_MSG_PREFIX "multipath"
 #define DM_PG_INIT_DELAY_MSECS 2000
@@ -757,8 +758,17 @@ static struct pgpath *parse_path(struct dm_arg_set *as, struct path_selector *ps
 		goto bad;
 	}
 
-	if (test_bit(MPATHF_RETAIN_ATTACHED_HW_HANDLER, &m->flags) || m->hw_handler_name)
-		q = bdev_get_queue(p->path.dev->bdev);
+	q = bdev_get_queue(p->path.dev->bdev);
+
+	/*
+	 * The underlying path's IO scheduler is _not_ used because all
+	 * scheduling is done by the top-level multipath request_queue.
+	 */
+	if (elevator_change(q, "none")) {
+		ti->error = "error switching underlying path's IO scheduler to 'none'";
+		dm_put_device(ti, p->path.dev);
+		goto bad;
+	}
 
 	if (test_bit(MPATHF_RETAIN_ATTACHED_HW_HANDLER, &m->flags)) {
 retain:
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index 5bc8f86..fe24004 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -220,6 +220,7 @@ extern ssize_t elv_iosched_store(struct request_queue *, const char *, size_t);
 extern int elevator_init(struct request_queue *, char *);
 extern void elevator_exit(struct request_queue *, struct elevator_queue *);
+extern int elevator_change(struct request_queue *, const char *);
 extern bool elv_bio_merge_ok(struct request *, struct bio *);
 extern struct elevator_queue *elevator_alloc(struct request_queue *,
 					struct elevator_type *);
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread
* Re: [PATCH] dm mpath: switch IO scheduler of underlying paths to "none" [was: Re: BFQ + dm-mpath] 2017-09-08 16:41 ` [PATCH] dm mpath: switch IO scheduler of underlying paths to "none" [was: Re: BFQ + dm-mpath] Mike Snitzer @ 2017-09-08 16:48 ` Jens Axboe 2017-09-08 17:07 ` Mike Snitzer 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2017-09-08 16:48 UTC (permalink / raw) To: Mike Snitzer, Paolo Valente, Bart Van Assche Cc: linux-block@vger.kernel.org, dm-devel On 09/08/2017 10:41 AM, Mike Snitzer wrote: > On Fri, Sep 08 2017 at 5:13P -0400, > Paolo Valente <paolo.valente@linaro.org> wrote: > >> >>> Il giorno 07 set 2017, alle ore 17:52, Mike Snitzer <snitzer@redhat.com> ha scritto: >>> >>> On Tue, Sep 05 2017 at 10:15am -0400, >>> Bart Van Assche <Bart.VanAssche@wdc.com> wrote: >>> >>>> On Tue, 2017-09-05 at 09:56 +0200, Paolo Valente wrote: >>>>> Ok, my suspects seem confirmed: the path dm_mq_queue_rq -> map_request >>>>> -> setup_clone -> blk_rq_prep_clone creates a cloned request without >>>>> invoking e->type->ops.mq.prepare_request for the target elevator e. >>>>> The cloned request is therefore not initialized for the scheduler, but >>>>> it is however inserted into the scheduler by >>>>> blk_mq_sched_insert_request. This seems an error for any scheduler >>>>> that needs to initialize fields in the incoming request, or in general >>>>> to take some preliminary action. >>>>> >>>>> Am I missing something here? >>>> >>>> (+Mike Snitzer) >>>> >>>> Mike, do you perhaps have the time to look into this memory leak? >>> >>> It isn't a memory leak, it is missing initialization in the case of >>> cloned requests (if I'm understanding Paolo correctly). >>> >> >> Exactly! >> >>> But cloned requests shouldn't be going through the scheduler. Only the >>> original requests should. 
>>> >>> Commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO >>> schedulers") switched from blk_mq_insert_request() to >>> blk_mq_sched_insert_request() and in doing so it opened dm-mpath up to >>> this bug. >>> >>> Could be we need to take steps to ensure the block layer still >>> supports bypassing the elevator by using direct insertion? >>> >>> Or blk_mq_sched_insert_request() needs updating to check if >>> e->type->ops.mq.prepare_request were actually performed and to fallback >>> to the !elevator case if not.. >>> >>> Not sure on the fix, but I can look closer if others (like Jens or >>> Paolo) don't have quicker suggestions. >>> >> >> No quick suggestion from me :( >> >> Thanks for analyzing this bug, > > Please see the following untested patch. All > testing/review/comments/acks appreciated. > > I elected to use elevator_change() rather than fiddle with adding a new > blk-mq elevator hook (e.g. ->request_prepared) to verify that each > blk-mq elevator enabled request did in fact get prepared. > > Bart, please test this patch and reply with your review/feedback. > > Jens, if you're OK with this solution please reply with your Ack and > I'll send it to Linus along with the rest of the handful of DM changes I > have for 4.14. I am not - we used to have this elevator change functionality from inside the kernel, and finally got rid of it when certain drivers killed it. I don't want to be bringing it back. Sounds like we have two issues here. One is that we run into issues with stacking IO schedulers, and the other is that we'd rather not have multiple schedulers in play for a stacked setup. Maybe it'd be cleaner to have the dm-mq side of things not insert through the scheduler, but rather just FIFO on the target end? -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dm mpath: switch IO scheduler of underlying paths to "none" [was: Re: BFQ + dm-mpath] 2017-09-08 16:48 ` Jens Axboe @ 2017-09-08 17:07 ` Mike Snitzer 2017-09-08 19:58 ` Mike Snitzer 0 siblings, 1 reply; 25+ messages in thread From: Mike Snitzer @ 2017-09-08 17:07 UTC (permalink / raw) To: Jens Axboe Cc: Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, dm-devel On Fri, Sep 08 2017 at 12:48pm -0400, Jens Axboe <axboe@kernel.dk> wrote: > On 09/08/2017 10:41 AM, Mike Snitzer wrote: > > On Fri, Sep 08 2017 at 5:13P -0400, > > Paolo Valente <paolo.valente@linaro.org> wrote: > > > >> > >>> Il giorno 07 set 2017, alle ore 17:52, Mike Snitzer <snitzer@redhat.com> ha scritto: > >>> > >>> On Tue, Sep 05 2017 at 10:15am -0400, > >>> Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > >>> > >>>> On Tue, 2017-09-05 at 09:56 +0200, Paolo Valente wrote: > >>>>> Ok, my suspects seem confirmed: the path dm_mq_queue_rq -> map_request > >>>>> -> setup_clone -> blk_rq_prep_clone creates a cloned request without > >>>>> invoking e->type->ops.mq.prepare_request for the target elevator e. > >>>>> The cloned request is therefore not initialized for the scheduler, but > >>>>> it is however inserted into the scheduler by > >>>>> blk_mq_sched_insert_request. This seems an error for any scheduler > >>>>> that needs to initialize fields in the incoming request, or in general > >>>>> to take some preliminary action. > >>>>> > >>>>> Am I missing something here? > >>>> > >>>> (+Mike Snitzer) > >>>> > >>>> Mike, do you perhaps have the time to look into this memory leak? > >>> > >>> It isn't a memory leak, it is missing initialization in the case of > >>> cloned requests (if I'm understanding Paolo correctly). > >>> > >> > >> Exactly! > >> > >>> But cloned requests shouldn't be going through the scheduler. Only the > >>> original requests should. 
> >>> > >>> Commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO > >>> schedulers") switched from blk_mq_insert_request() to > >>> blk_mq_sched_insert_request() and in doing so it opened dm-mpath up to > >>> this bug. > >>> > >>> Could be we need to take steps to ensure the block layer still > >>> supports bypassing the elevator by using direct insertion? > >>> > >>> Or blk_mq_sched_insert_request() needs updating to check if > >>> e->type->ops.mq.prepare_request were actually performed and to fallback > >>> to the !elevator case if not.. > >>> > >>> Not sure on the fix, but I can look closer if others (like Jens or > >>> Paolo) don't have quicker suggestions. > >>> > >> > >> No quick suggestion from me :( > >> > >> Thanks for analyzing this bug, > > > > Please see the following untested patch. All > > testing/review/comments/acks appreciated. > > > > I elected to use elevator_change() rather than fiddle with adding a new > > blk-mq elevator hook (e.g. ->request_prepared) to verify that each > > blk-mq elevator enabled request did in fact get prepared. > > > > Bart, please test this patch and reply with your review/feedback. > > > > Jens, if you're OK with this solution please reply with your Ack and > > I'll send it to Linus along with the rest of the handful of DM changes I > > have for 4.14. > > I am not - we used to have this elevator change functionality from > inside the kernel, and finally got rid of it when certain drivers killed > it. I don't want to be bringing it back. Fine. > Sounds like we have two issues here. One is that we run into issues with > stacking IO schedulers, and the other is that we'd rather not have > multiple schedulers in play for a stacked setup. > > Maybe it'd be cleaner to have the dm-mq side of things not insert > through the scheduler, but rather just FIFO on the target end? That was how blk_insert_cloned_request() was before. 
From the patch header (you may have missed it): "Commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO schedulers") switched blk_insert_cloned_request() from using blk_mq_insert_request() to blk_mq_sched_insert_request(). Which incorrectly added elevator machinery into a call chain that isn't supposed to have any." So shouldn't blk_insert_cloned_request() be made to _not_ use blk_mq_sched_insert_request()? We'd need a new block interface established, or equivalent open-coded in blk_insert_cloned_request(), to handle direct dispatch to an mq request_queue's queue(s). Mike ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dm mpath: switch IO scheduler of underlying paths to "none" [was: Re: BFQ + dm-mpath] 2017-09-08 17:07 ` Mike Snitzer @ 2017-09-08 19:58 ` Mike Snitzer 2017-09-08 20:28 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Mike Snitzer @ 2017-09-08 19:58 UTC (permalink / raw) To: Jens Axboe Cc: Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, dm-devel On Fri, Sep 08 2017 at 1:07pm -0400, Mike Snitzer <snitzer@redhat.com> wrote: > On Fri, Sep 08 2017 at 12:48pm -0400, > Jens Axboe <axboe@kernel.dk> wrote: > > > > Please see the following untested patch. All > > > testing/review/comments/acks appreciated. > > > > > > I elected to use elevator_change() rather than fiddle with adding a new > > > blk-mq elevator hook (e.g. ->request_prepared) to verify that each > > > blk-mq elevator enabled request did in fact get prepared. > > > > > > Bart, please test this patch and reply with your review/feedback. > > > > > > Jens, if you're OK with this solution please reply with your Ack and > > > I'll send it to Linus along with the rest of the handful of DM changes I > > > have for 4.14. > > > > I am not - we used to have this elevator change functionality from > > inside the kernel, and finally got rid of it when certain drivers killed > > it. I don't want to be bringing it back. > > Fine. BTW, while I conceded "Fine": I think your justification for not reintroducing elevator_change() lacks substance. What is inherently problematic about elevator_change()? Having an elevator attached to a DM multipath device's underlying path's request_queue just asks for trouble (especially given the blk-mq elevator interface). Please own this issue as a regression and help me arrive at a timely way forward. Thanks, Mike ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: dm mpath: switch IO scheduler of underlying paths to "none" [was: Re: BFQ + dm-mpath] 2017-09-08 19:58 ` Mike Snitzer @ 2017-09-08 20:28 ` Jens Axboe 2017-09-08 21:42 ` [PATCH] block: directly insert blk-mq request from blk_insert_cloned_request() Mike Snitzer 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2017-09-08 20:28 UTC (permalink / raw) To: Mike Snitzer Cc: Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, dm-devel On 09/08/2017 01:58 PM, Mike Snitzer wrote: > On Fri, Sep 08 2017 at 1:07pm -0400, > Mike Snitzer <snitzer@redhat.com> wrote: > >> On Fri, Sep 08 2017 at 12:48pm -0400, >> Jens Axboe <axboe@kernel.dk> wrote: >> >>>> Please see the following untested patch. All >>>> testing/review/comments/acks appreciated. >>>> >>>> I elected to use elevator_change() rather than fiddle with adding a new >>>> blk-mq elevator hook (e.g. ->request_prepared) to verify that each >>>> blk-mq elevator enabled request did in fact get prepared. >>>> >>>> Bart, please test this patch and reply with your review/feedback. >>>> >>>> Jens, if you're OK with this solution please reply with your Ack and >>>> I'll send it to Linus along with the rest of the handful of DM changes I >>>> have for 4.14. >>> >>> I am not - we used to have this elevator change functionality from >>> inside the kernel, and finally got rid of it when certain drivers killed >>> it. I don't want to be bringing it back. >> >> Fine. > > BTW, while I conceded "Fine": I think your justification for not > reintroducing elevator_change() lacks substance. What is inherently > problematic about elevator_change()? Because no in-kernel users should be mucking with the IO scheduler. Adding this back is just an excuse for drivers to start doing it again, which generally happens because whatever vendors driver team tests some synthetic benchmark and decide that X is better than the default of Y. So we're not going back to that. 
> Having an elevator attached to a DM multipath device's underlying path's > request_queue just asks for trouble (especially given the blk-mq > elevator interface). > > Please own this issue as a regression and help me arrive at a timely way > forward. I'm trying, I made suggestions on how we can proceed - we can have a way to insert to hctx->dispatch without bothering the IO scheduler. I'm open to other suggestions as well, just not open to exporting an interface to change IO schedulers from inside the kernel. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] block: directly insert blk-mq request from blk_insert_cloned_request() 2017-09-08 20:28 ` Jens Axboe @ 2017-09-08 21:42 ` Mike Snitzer 2017-09-08 21:50 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Mike Snitzer @ 2017-09-08 21:42 UTC (permalink / raw) To: Jens Axboe Cc: Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, dm-devel On Fri, Sep 08 2017 at 4:28P -0400, Jens Axboe <axboe@kernel.dk> wrote: > On 09/08/2017 01:58 PM, Mike Snitzer wrote: > > On Fri, Sep 08 2017 at 1:07pm -0400, > > Mike Snitzer <snitzer@redhat.com> wrote: > > > >> On Fri, Sep 08 2017 at 12:48pm -0400, > >> Jens Axboe <axboe@kernel.dk> wrote: > >> > >>>> Please see the following untested patch. All > >>>> testing/review/comments/acks appreciated. > >>>> > >>>> I elected to use elevator_change() rather than fiddle with adding a new > >>>> blk-mq elevator hook (e.g. ->request_prepared) to verify that each > >>>> blk-mq elevator enabled request did in fact get prepared. > >>>> > >>>> Bart, please test this patch and reply with your review/feedback. > >>>> > >>>> Jens, if you're OK with this solution please reply with your Ack and > >>>> I'll send it to Linus along with the rest of the handful of DM changes I > >>>> have for 4.14. > >>> > >>> I am not - we used to have this elevator change functionality from > >>> inside the kernel, and finally got rid of it when certain drivers killed > >>> it. I don't want to be bringing it back. > >> > >> Fine. > > > > BTW, while I conceded "Fine": I think your justification for not > > reintroducing elevator_change() lacks substance. What is inherently > > problematic about elevator_change()? > > Because no in-kernel users should be mucking with the IO scheduler. Adding > this back is just an excuse for drivers to start doing it again, which > generally happens because whatever vendors driver team tests some synthetic > benchmark and decide that X is better than the default of Y. So we're not > going back to that. Fine. 
But I really mean it this time, fair position, thanks :)

> > Having an elevator attached to a DM multipath device's underlying path's
> > request_queue just asks for trouble (especially given the blk-mq
> > elevator interface).
> >
> > Please own this issue as a regression and help me arrive at a timely way
> > forward.
>
> I'm trying, I made suggestions on how we can proceed - we can have a way
> to insert to hctx->dispatch without bothering the IO scheduler. I'm
> open to other suggestions as well, just not open to exporting an
> interface to change IO schedulers from inside the kernel.

What do you think of this?

From: Mike Snitzer <snitzer@redhat.com>
Date: Fri, 8 Sep 2017 11:45:13 -0400
Subject: [PATCH] block: directly insert blk-mq request from blk_insert_cloned_request()

A NULL pointer crash was reported for the case of having the BFQ IO
scheduler attached to the underlying blk-mq paths of a DM multipath
device. The crash occurred in blk_mq_sched_insert_request()'s call to
e->type->ops.mq.insert_requests().

Paolo Valente correctly summarized why the crash occurred with:

"the call chain (dm_mq_queue_rq -> map_request -> setup_clone ->
blk_rq_prep_clone) creates a cloned request without invoking
e->type->ops.mq.prepare_request for the target elevator e. The cloned
request is therefore not initialized for the scheduler, but it is
however inserted into the scheduler by blk_mq_sched_insert_request."

All said, a request-based DM multipath device's IO scheduler should be
the only one used -- when the original requests are issued to the
underlying paths as cloned requests they are inserted directly in the
underlying dispatch queue(s) rather than through an additional elevator.

But commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO
schedulers") switched blk_insert_cloned_request() from using
blk_mq_insert_request() to blk_mq_sched_insert_request(). Which
incorrectly added elevator machinery into a call chain that isn't
supposed to have any.
To fix this re-introduce blk_mq_insert_request(), albeit simpler and
blk-mq private, that blk_insert_cloned_request() calls to insert the
request without involving any elevator that may be attached to the
cloned request's request_queue.

Fixes: bd166ef18 ("blk-mq-sched: add framework for MQ capable IO schedulers")
Cc: stable@vger.kernel.org
Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-core.c |  2 +-
 block/blk-mq.c   | 27 ++++++++++++++++++---------
 block/blk-mq.h   |  1 +
 3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d709c0e..7a06b2b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2342,7 +2342,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
 	if (q->mq_ops) {
 		if (blk_queue_io_stat(q))
 			blk_account_io_start(rq, true);
-		blk_mq_sched_insert_request(rq, false, true, false, false);
+		blk_mq_insert_request(rq);
 		return BLK_STS_OK;
 	}
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3f18cff..5c5bb3f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1401,6 +1401,24 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	blk_mq_hctx_mark_pending(hctx, ctx);
 }
 
+static inline void blk_mq_queue_io(struct blk_mq_hw_ctx *hctx,
+				   struct blk_mq_ctx *ctx,
+				   struct request *rq)
+{
+	spin_lock(&ctx->lock);
+	__blk_mq_insert_request(hctx, rq, false);
+	spin_unlock(&ctx->lock);
+}
+
+void blk_mq_insert_request(struct request *rq)
+{
+	struct blk_mq_ctx *ctx = rq->mq_ctx;
+	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);
+
+	blk_mq_queue_io(hctx, ctx, rq);
+	blk_mq_run_hw_queue(hctx, false);
+}
+
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 			    struct list_head *list)
 
@@ -1494,15 +1512,6 @@ static inline bool hctx_allow_merges(struct blk_mq_hw_ctx *hctx)
 	!blk_queue_nomerges(hctx->queue);
 }
 
-static inline void blk_mq_queue_io(struct blk_mq_hw_ctx *hctx,
-				   struct blk_mq_ctx *ctx,
-				   struct request *rq)
-{
-	spin_lock(&ctx->lock);
-	__blk_mq_insert_request(hctx, rq, false);
-	spin_unlock(&ctx->lock);
-}
-
 static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
 {
 	if (rq->tag != -1)
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 98252b7..678ab76 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -54,6 +54,7 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
  */
 void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 				bool at_head);
+void blk_mq_insert_request(struct request *rq);
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 			    struct list_head *list);
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread
* Re: [PATCH] block: directly insert blk-mq request from blk_insert_cloned_request() 2017-09-08 21:42 ` [PATCH] block: directly insert blk-mq request from blk_insert_cloned_request() Mike Snitzer @ 2017-09-08 21:50 ` Jens Axboe 2017-09-08 22:03 ` Mike Snitzer 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2017-09-08 21:50 UTC (permalink / raw) To: Mike Snitzer Cc: Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, dm-devel

On 09/08/2017 03:42 PM, Mike Snitzer wrote:
> diff --git a/block/blk-core.c b/block/blk-core.c
> index d709c0e..7a06b2b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2342,7 +2342,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
>  	if (q->mq_ops) {
>  		if (blk_queue_io_stat(q))
>  			blk_account_io_start(rq, true);
> -		blk_mq_sched_insert_request(rq, false, true, false, false);
> +		blk_mq_insert_request(rq);
>  		return BLK_STS_OK;
>  	}

I think this is fine, since only dm uses this function. Would be nice to
have some check though, to ensure it doesn't get misused in the future.

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 3f18cff..5c5bb3f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1401,6 +1401,24 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  	blk_mq_hctx_mark_pending(hctx, ctx);
>  }
>
> +static inline void blk_mq_queue_io(struct blk_mq_hw_ctx *hctx,
> +				   struct blk_mq_ctx *ctx,
> +				   struct request *rq)
> +{
> +	spin_lock(&ctx->lock);
> +	__blk_mq_insert_request(hctx, rq, false);
> +	spin_unlock(&ctx->lock);
> +}

Any particular reason it isn't just added to the dispatch queue?
> +void blk_mq_insert_request(struct request *rq) > +{ > + struct blk_mq_ctx *ctx = rq->mq_ctx; > + struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu); > + > + blk_mq_queue_io(hctx, ctx, rq); > + blk_mq_run_hw_queue(hctx, false); > +} Would probably be cleaner as blk_mq_insert_and_run_request() or something, to make sure it's understood that it also runs the queue. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: block: directly insert blk-mq request from blk_insert_cloned_request() 2017-09-08 21:50 ` Jens Axboe @ 2017-09-08 22:03 ` Mike Snitzer 2017-09-11 16:16 ` [PATCH v2] " Mike Snitzer 0 siblings, 1 reply; 25+ messages in thread From: Mike Snitzer @ 2017-09-08 22:03 UTC (permalink / raw) To: Jens Axboe Cc: Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, dm-devel On Fri, Sep 08 2017 at 5:50pm -0400, Jens Axboe <axboe@kernel.dk> wrote: > On 09/08/2017 03:42 PM, Mike Snitzer wrote: > > diff --git a/block/blk-core.c b/block/blk-core.c > > index d709c0e..7a06b2b 100644 > > --- a/block/blk-core.c > > +++ b/block/blk-core.c > > @@ -2342,7 +2342,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request * > > if (q->mq_ops) { > > if (blk_queue_io_stat(q)) > > blk_account_io_start(rq, true); > > - blk_mq_sched_insert_request(rq, false, true, false, false); > > + blk_mq_insert_request(rq); > > return BLK_STS_OK; > > } > > I think this is fine, since only dm uses this function. Would be nice to > have some check though, to ensure it doesn't get misused in the future. Not sure what kind of check you're thinking. > > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index 3f18cff..5c5bb3f 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -1401,6 +1401,24 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, > > blk_mq_hctx_mark_pending(hctx, ctx); > > } > > > > +static inline void blk_mq_queue_io(struct blk_mq_hw_ctx *hctx, > > + struct blk_mq_ctx *ctx, > > + struct request *rq) > > +{ > > + spin_lock(&ctx->lock); > > + __blk_mq_insert_request(hctx, rq, false); > > + spin_unlock(&ctx->lock); > > +} > > Any particular reason it isn't just added to the dispatch queue? I just started from the blk_mq_insert_request() that was removed as part of commit bd166ef18 and then simplified it by reusing blk_mq_queue_io() rather than open-coding it again. 
So I moved blk_mq_queue_io() higher in the file and re-used it. Is there more efficiency associated with adding direct to dispatch queue? If so then I suppose it is worth it! But me doing it would take a bit more effort (to review code and expand my horizons, but that is fine if that is what you'd prefer to see). > > +void blk_mq_insert_request(struct request *rq) > > +{ > > + struct blk_mq_ctx *ctx = rq->mq_ctx; > > + struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu); > > + > > + blk_mq_queue_io(hctx, ctx, rq); > > + blk_mq_run_hw_queue(hctx, false); > > +} > > Would probably be cleaner as blk_mq_insert_and_run_request() or > something, to make sure it's understood that it also runs the queue. That's fine. Feel free to tweak this patch however you like! But if you'd like me to completely own driving this patch forward I'll fold this in to v2. Mike ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request() 2017-09-08 22:03 ` Mike Snitzer @ 2017-09-11 16:16 ` Mike Snitzer 2017-09-11 20:51 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Mike Snitzer @ 2017-09-11 16:16 UTC (permalink / raw) To: Jens Axboe Cc: Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, dm-devel Here is v2 that should obviate the need to rename blk_mq_insert_request (by using bools to control run_queue and async). As for inserting directly into dispatch, if that can be done that is great but I'd prefer to have that be a follow-up optimization. This fixes the regression in question, and does so in well-known terms. What do you think? Thanks, Mike From: Mike Snitzer <snitzer@redhat.com> Date: Fri, 8 Sep 2017 11:45:13 -0400 Subject: [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request() A NULL pointer crash was reported for the case of having the BFQ IO scheduler attached to the underlying blk-mq paths of a DM multipath device. The crash occurred in blk_mq_sched_insert_request()'s call to e->type->ops.mq.insert_requests(). Paolo Valente correctly summarized why the crash occurred with: "the call chain (dm_mq_queue_rq -> map_request -> setup_clone -> blk_rq_prep_clone) creates a cloned request without invoking e->type->ops.mq.prepare_request for the target elevator e. The cloned request is therefore not initialized for the scheduler, but it is however inserted into the scheduler by blk_mq_sched_insert_request." All said, a request-based DM multipath device's IO scheduler should be the only one used -- when the original requests are issued to the underlying paths as cloned requests they are inserted directly in the underlying dispatch queue(s) rather than through an additional elevator. But commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO schedulers") switched blk_insert_cloned_request() from using blk_mq_insert_request() to blk_mq_sched_insert_request(). 
Which incorrectly added elevator machinery into a call chain that isn't supposed to have any. To fix this re-introduce a blk-mq private blk_mq_insert_request() that blk_insert_cloned_request() calls to insert the request without involving any elevator that may be attached to the cloned request's request_queue. Fixes: bd166ef18 ("blk-mq-sched: add framework for MQ capable IO schedulers") Cc: stable@vger.kernel.org Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- block/blk-core.c | 2 +- block/blk-mq.c | 28 +++++++++++++++++++--------- block/blk-mq.h | 1 + 3 files changed, 21 insertions(+), 10 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index dbecbf4..9085013 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2330,7 +2330,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request * if (q->mq_ops) { if (blk_queue_io_stat(q)) blk_account_io_start(rq, true); - blk_mq_sched_insert_request(rq, false, true, false, false); + blk_mq_insert_request(rq, true, false); return BLK_STS_OK; } diff --git a/block/blk-mq.c b/block/blk-mq.c index 4603b11..05d9f7c 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1357,6 +1357,25 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, blk_mq_hctx_mark_pending(hctx, ctx); } +static inline void blk_mq_queue_io(struct blk_mq_hw_ctx *hctx, + struct blk_mq_ctx *ctx, + struct request *rq) +{ + spin_lock(&ctx->lock); + __blk_mq_insert_request(hctx, rq, false); + spin_unlock(&ctx->lock); +} + +void blk_mq_insert_request(struct request *rq, bool run_queue, bool async) +{ + struct blk_mq_ctx *ctx = rq->mq_ctx; + struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu); + + blk_mq_queue_io(hctx, ctx, rq); + if (run_queue) + blk_mq_run_hw_queue(hctx, async); +} + void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx, struct list_head *list) @@ -1450,15 +1469,6 @@ static inline bool 
hctx_allow_merges(struct blk_mq_hw_ctx *hctx) !blk_queue_nomerges(hctx->queue); } -static inline void blk_mq_queue_io(struct blk_mq_hw_ctx *hctx, - struct blk_mq_ctx *ctx, - struct request *rq) -{ - spin_lock(&ctx->lock); - __blk_mq_insert_request(hctx, rq, false); - spin_unlock(&ctx->lock); -} - static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq) { if (rq->tag != -1) diff --git a/block/blk-mq.h b/block/blk-mq.h index 60b01c0..01067b2 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -54,6 +54,7 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags, */ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, bool at_head); +void blk_mq_insert_request(struct request *rq, bool run_queue, bool async); void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx, struct list_head *list); -- 2.10.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
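The failure mode described in the changelog above can be sketched in a few lines of userspace C. This is a hypothetical, heavily simplified model -- none of these `toy_` names exist in the kernel: a request carries scheduler-private state that only the prepare hook initializes, the insert hook that trusts that state stands in for bfq_insert_request(), and the clone helper that skips preparation stands in for blk_rq_prep_clone().

```c
#include <stddef.h>

/* Hypothetical toy model of the reported bug; names do not match the kernel. */
struct toy_request {
    void *sched_data;   /* set only by the elevator's prepare_request hook */
};

/* Models e->type->ops.mq.prepare_request: sets up scheduler-private state. */
static void toy_prepare_request(struct toy_request *rq)
{
    static int bfqq_stub;       /* stand-in for per-request bfq_queue state */
    rq->sched_data = &bfqq_stub;
}

/* Models e->type->ops.mq.insert_requests: blindly trusts sched_data.
 * Returns 1 on a "successful" insert, 0 where the real code would oops. */
static int toy_sched_insert_request(struct toy_request *rq)
{
    if (rq->sched_data == NULL)
        return 0;               /* real BFQ would dereference NULL here */
    return 1;
}

/* Models blk_rq_prep_clone(): copies the request but never invokes
 * prepare_request, so the clone's scheduler state stays NULL. */
static struct toy_request toy_prep_clone(const struct toy_request *src)
{
    struct toy_request clone = { .sched_data = NULL };
    (void)src;                  /* payload copy elided in this sketch */
    return clone;
}
```

A normally submitted request passes through toy_prepare_request() before insert and survives; the clone goes straight to insert and trips the NULL check, mirroring the dm_mq_queue_rq -> map_request -> setup_clone -> blk_rq_prep_clone chain Paolo described.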
* Re: [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request() 2017-09-11 16:16 ` [PATCH v2] " Mike Snitzer @ 2017-09-11 20:51 ` Jens Axboe 2017-09-11 21:13 ` Mike Snitzer 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2017-09-11 20:51 UTC (permalink / raw) To: Mike Snitzer Cc: Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, dm-devel On 09/11/2017 10:16 AM, Mike Snitzer wrote: > Here is v2 that should obviate the need to rename blk_mq_insert_request > (by using bools to control run_queue and async). > > As for inserting directly into dispatch, if that can be done that is > great but I'd prefer to have that be a follow-up optimization. This > fixes the regression in question, and does so in well-known terms. > > What do you think? I think it looks reasonable. My only concern is the use of the software queues. Depending on the scheduler, they may or may not be used. I'd need to review the code, but my first thought is that this would break if you use blk_mq_insert_request() on a device that is managed by mq-deadline or bfq, for instance. Schedulers are free to use the software queues, but they are also free to ignore them and use internal queuing. Looking at the code, looks like this was changed slightly at some point, we always flush the software queues, if any of them contain requests. So it's probably fine. My earlier suggestion to use just hctx->dispatch for the IO and bypass the software queues completely. The use case for the dispatch list is the same, regardless of whether the device has a scheduler attached or not. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request() 2017-09-11 20:51 ` Jens Axboe @ 2017-09-11 21:13 ` Mike Snitzer 2017-09-11 21:27 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Mike Snitzer @ 2017-09-11 21:13 UTC (permalink / raw) To: Jens Axboe Cc: Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, dm-devel On Mon, Sep 11 2017 at 4:51pm -0400, Jens Axboe <axboe@kernel.dk> wrote: > On 09/11/2017 10:16 AM, Mike Snitzer wrote: > > Here is v2 that should obviate the need to rename blk_mq_insert_request > > (by using bools to control run_queue and async). > > > > As for inserting directly into dispatch, if that can be done that is > > great but I'd prefer to have that be a follow-up optimization. This > > fixes the regression in question, and does so in well-known terms. > > > > What do you think? > > I think it looks reasonable. My only concern is the use of the software > queues. Depending on the scheduler, they may or may not be used. I'd > need to review the code, but my first thought is that this would break > if you use blk_mq_insert_request() on a device that is managed by > mq-deadline or bfq, for instance. Schedulers are free to use the > software queues, but they are also free to ignore them and use internal > queuing. > > Looking at the code, looks like this was changed slightly at some point, > we always flush the software queues, if any of them contain requests. So > it's probably fine. OK good, but is that too brittle to rely on? Something that might change in the future? > My earlier suggestion to use just hctx->dispatch for the IO and bypass > the software queues completely. The use case for the dispatch list is > the same, regardless of whether the device has a scheduler attached or > not. I'm missing how these details relate to the goal of bypassing any scheduler that might be attached. Are you saying the attached elevator would still get in the way? 
Looking at blk_mq_sched_insert_request(), submission when an elevator isn't attached is exactly what I made blk_mq_insert_request() do (which is exactly what it did in the past). In the case of DM multipath, nothing else should be submitting IO to the device so elevator shouldn't be used -- only interface for submitting IO would be blk_mq_insert_request(). So even if a scheduler is attached it should be bypassed right? Mike ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request() 2017-09-11 21:13 ` Mike Snitzer @ 2017-09-11 21:27 ` Jens Axboe 2017-09-11 21:51 ` Mike Snitzer 2017-09-14 15:57 ` Ming Lei 0 siblings, 2 replies; 25+ messages in thread From: Jens Axboe @ 2017-09-11 21:27 UTC (permalink / raw) To: Mike Snitzer Cc: Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, dm-devel On 09/11/2017 03:13 PM, Mike Snitzer wrote: > On Mon, Sep 11 2017 at 4:51pm -0400, > Jens Axboe <axboe@kernel.dk> wrote: > >> On 09/11/2017 10:16 AM, Mike Snitzer wrote: >>> Here is v2 that should obviate the need to rename blk_mq_insert_request >>> (by using bools to control run_queue and async). >>> >>> As for inserting directly into dispatch, if that can be done that is >>> great but I'd prefer to have that be a follow-up optimization. This >>> fixes the regression in question, and does so in well-known terms. >>> >>> What do you think? >> >> I think it looks reasonable. My only concern is the use of the software >> queues. Depending on the scheduler, they may or may not be used. I'd >> need to review the code, but my first thought is that this would break >> if you use blk_mq_insert_request() on a device that is managed by >> mq-deadline or bfq, for instance. Schedulers are free to use the >> software queues, but they are also free to ignore them and use internal >> queuing. >> >> Looking at the code, looks like this was changed slightly at some point, >> we always flush the software queues, if any of them contain requests. So >> it's probably fine. > > OK good, but is that too brittle to rely on? Something that might change > in the future? I'm actually surprised we do flush software queues for that case, since we don't always have to. So it is a bit of a wart. If we don't have a scheduler, software queues is where IO goes. If we have a scheduler, the scheduler has complete control of where to queue IO. 
Generally, the scheduler will either utilize the software queues or it won't, there's nothing in between. I now realize I'm an idiot and didn't read it right. So here's the code in question: const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request; [...] } else if (!has_sched_dispatch) { blk_mq_flush_busy_ctxs(hctx, &rq_list); blk_mq_dispatch_rq_list(q, &rq_list); } so we only enter sw queue flushing if we don't have a scheduler with a dispatch_request hook. So now I am really wondering how your patch could work if the bottom device has bfq or mq-deadline attached? >> My earlier suggestion to use just hctx->dispatch for the IO and bypass >> the software queues completely. The use case for the dispatch list is >> the same, regardless of whether the device has a scheduler attached or >> not. > I'm missing how these details relate to the goal of bypassing any > scheduler that might be attached. Are you saying the attached elevator > would still get in the way? See above. > Looking at blk_mq_sched_insert_request(), submission when an elevator > isn't attached is exactly what I made blk_mq_insert_request() do > (which is exactly what it did in the past). Right, but that path is only used if we don't have a scheduler attached. So while the code will use that path IFF a scheduler isn't attached to that device, your use case will use it for both cases. > In the case of DM multipath, nothing else should be submitting IO to > the device so elevator shouldn't be used -- only interface for > submitting IO would be blk_mq_insert_request(). So even if a > scheduler is attached it should be bypassed right? The problem is the usage of the sw queue. Does the below work for you? 
diff --git a/block/blk-core.c b/block/blk-core.c index d709c0e3a2ac..aebe676225e6 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2342,7 +2342,12 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request * if (q->mq_ops) { if (blk_queue_io_stat(q)) blk_account_io_start(rq, true); - blk_mq_sched_insert_request(rq, false, true, false, false); + /* + * Since we have a scheduler attached on the top device, + * bypass a potential scheduler on the bottom device for + * insert. + */ + blk_mq_request_bypass_insert(rq); return BLK_STS_OK; } diff --git a/block/blk-mq.c b/block/blk-mq.c index 3f18cff80050..98a18609755e 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1401,6 +1401,22 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, blk_mq_hctx_mark_pending(hctx, ctx); } +/* + * Should only be used carefully, when the caller knows we want to + * bypass a potential IO scheduler on the target device. + */ +void blk_mq_request_bypass_insert(struct request *rq) +{ + struct blk_mq_ctx *ctx = rq->mq_ctx; + struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu); + + spin_lock(&hctx->lock); + list_add_tail(&rq->queuelist, &hctx->dispatch); + spin_unlock(&hctx->lock); + + blk_mq_run_hw_queue(hctx, false); +} + void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx, struct list_head *list) diff --git a/block/blk-mq.h b/block/blk-mq.h index 98252b79b80b..ef15b3414da5 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -54,6 +54,7 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags, */ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, bool at_head); +void blk_mq_request_bypass_insert(struct request *rq); void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx, struct list_head *list); -- Jens Axboe ^ permalink raw reply related [flat|nested] 25+ messages in thread
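The behavioral difference Jens is pointing at -- the software queues are flushed only when no scheduler dispatch hook exists, while hctx->dispatch is drained unconditionally -- can be illustrated with a toy single-slot model. This is a hypothetical sketch, not kernel code; the two one-slot pointers stand in for the per-ctx software queues and the hctx->dispatch list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy one-slot model of a blk-mq hardware context; purely illustrative. */
struct toy_hctx {
    const char *sw_queue;     /* stand-in for the per-ctx sw queues       */
    const char *dispatch;     /* stand-in for the hctx->dispatch list     */
    bool has_sched_dispatch;  /* e && e->type->ops.mq.dispatch_request    */
};

/* Models the dispatch loop quoted earlier in the thread: the dispatch
 * list is always drained, but the sw queues are flushed only when no
 * scheduler dispatch hook exists. Returns how many requests went out. */
static int toy_run_hw_queue(struct toy_hctx *h)
{
    int sent = 0;

    if (h->dispatch) {            /* always serviced, scheduler or not */
        h->dispatch = NULL;
        sent++;
    }
    if (h->has_sched_dispatch) {
        /* the scheduler pulls from its own internal queues here; it
         * never looks at requests parked in the sw queues */
    } else if (h->sw_queue) {     /* the !has_sched_dispatch branch */
        h->sw_queue = NULL;
        sent++;
    }
    return sent;
}
```

With has_sched_dispatch set, a request parked in sw_queue is simply never dispatched (the v2 patch's failure mode), while the same request placed on dispatch always goes out -- which is why blk_mq_request_bypass_insert() targets hctx->dispatch.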
* Re: [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request() 2017-09-11 21:27 ` Jens Axboe @ 2017-09-11 21:51 ` Mike Snitzer 2017-09-11 22:30 ` Mike Snitzer 2017-09-14 15:57 ` Ming Lei 1 sibling, 1 reply; 25+ messages in thread From: Mike Snitzer @ 2017-09-11 21:51 UTC (permalink / raw) To: Jens Axboe Cc: Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, dm-devel On Mon, Sep 11 2017 at 5:27pm -0400, Jens Axboe <axboe@kernel.dk> wrote: > On 09/11/2017 03:13 PM, Mike Snitzer wrote: > > On Mon, Sep 11 2017 at 4:51pm -0400, > > Jens Axboe <axboe@kernel.dk> wrote: > > > >> On 09/11/2017 10:16 AM, Mike Snitzer wrote: > >>> Here is v2 that should obviate the need to rename blk_mq_insert_request > >>> (by using bools to control run_queue and async). > >>> > >>> As for inserting directly into dispatch, if that can be done that is > >>> great but I'd prefer to have that be a follow-up optimization. This > >>> fixes the regression in question, and does so in well-known terms. > >>> > >>> What do you think? > >> > >> I think it looks reasonable. My only concern is the use of the software > >> queues. Depending on the scheduler, they may or may not be used. I'd > >> need to review the code, but my first thought is that this would break > >> if you use blk_mq_insert_request() on a device that is managed by > >> mq-deadline or bfq, for instance. Schedulers are free to use the > >> software queues, but they are also free to ignore them and use internal > >> queuing. > >> > >> Looking at the code, looks like this was changed slightly at some point, > >> we always flush the software queues, if any of them contain requests. So > >> it's probably fine. > > > > OK good, but is that too brittle to rely on? Something that might change > > in the future? > > I'm actually surprised we do flush software queues for that case, since > we don't always have to. So it is a bit of a wart. If we don't have a > scheduler, software queues is where IO goes. 
If we have a scheduler, the > scheduler has complete control of where to queue IO. Generally, the > scheduler will either utilize the software queues or it won't, there's > nothing in between. > > I know realize I'm an idiot and didn't read it right. So here's the code > in question: > > const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request; > > [...] > > } else if (!has_sched_dispatch) { > blk_mq_flush_busy_ctxs(hctx, &rq_list); > blk_mq_dispatch_rq_list(q, &rq_list); > } > > so we do only enter sw queue flushing, if we don't have a scheduler with > a dispatch_request hook. So now I am really wondering how your patch > could work if the bottom device has bfq or mq-deadline attached? I didn't test it.. I was an even bigger idiot and assumed blk-mq core wouldn't alter its IO processing based on scheduler or no. Nevermind that I tagged my patch for stable@ without testing.. /me knows better > >> My earlier suggestion to use just hctx->dispatch for the IO and bypass > >> the software queues completely. The use case for the dispatch list is > >> the same, regardless of whether the device has a scheduler attached or > >> not. > > > > I'm missing how these details relate to the goal of bypassing any > > scheduler that might be attached. Are you saying the attached elevator > > would still get in the way? > > See above. Yeap, got it. > > Looking at blk_mq_sched_insert_request(), submission when an elevator > > isn't attached is exactly what I made blk_mq_insert_request() do > > (which is exactly what it did in the past). > > Right, but that path is only used if we don't have a scheduler attached. > So while the code will use that path IFF a scheduler isn't attached to > that device, your use case will use it for both cases. > > > In the case of DM multipath, nothing else should be submitting IO to > > the device so elevator shouldn't be used -- only interface for > > submitting IO would be blk_mq_insert_request(). 
So even if a > > scheduler is attached it should be bypassed right? > > The problem is the usage of the sw queue. > > Does the below work for you? I _will_ test your patch and let you know! Thanks, much appreciated. Mike ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request() 2017-09-11 21:51 ` Mike Snitzer @ 2017-09-11 22:30 ` Mike Snitzer 2017-09-11 22:43 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Mike Snitzer @ 2017-09-11 22:30 UTC (permalink / raw) To: Jens Axboe Cc: Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, dm-devel On Mon, Sep 11 2017 at 5:51pm -0400, Mike Snitzer <snitzer@redhat.com> wrote: > On Mon, Sep 11 2017 at 5:27pm -0400, > Jens Axboe <axboe@kernel.dk> wrote: > > > > Does the below work for you? > > I _will_ test your patch and let you know! Tested with bfq on underlying paths and both none and bfq on upper-level DM multipath request_queue. Works perfectly, feel free to add my: Tested-by: Mike Snitzer <snitzer@redhat.com> Thanks again! ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request() 2017-09-11 22:30 ` Mike Snitzer @ 2017-09-11 22:43 ` Jens Axboe 0 siblings, 0 replies; 25+ messages in thread From: Jens Axboe @ 2017-09-11 22:43 UTC (permalink / raw) To: Mike Snitzer Cc: Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, dm-devel On 09/11/2017 04:30 PM, Mike Snitzer wrote: > On Mon, Sep 11 2017 at 5:51pm -0400, > Mike Snitzer <snitzer@redhat.com> wrote: > >> On Mon, Sep 11 2017 at 5:27pm -0400, >> Jens Axboe <axboe@kernel.dk> wrote: >>> >>> Does the below work for you? >> >> I _will_ test your patch and let you know! > > Tested with bfq on underlying paths and both none and bfq on upper-level > DM multipath request_queue. > > Works perfectly, feel free to add my: > > Tested-by: Mike Snitzer <snitzer@redhat.com> > > Thanks again! Great, thanks for testing! I'll commit this, stealing your change log almost verbatim. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request() 2017-09-11 21:27 ` Jens Axboe 2017-09-11 21:51 ` Mike Snitzer @ 2017-09-14 15:57 ` Ming Lei 2017-09-14 16:30 ` Jens Axboe 1 sibling, 1 reply; 25+ messages in thread From: Ming Lei @ 2017-09-14 15:57 UTC (permalink / raw) To: Jens Axboe Cc: Mike Snitzer, Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, open list:DEVICE-MAPPER (LVM) On Tue, Sep 12, 2017 at 5:27 AM, Jens Axboe <axboe@kernel.dk> wrote: > On 09/11/2017 03:13 PM, Mike Snitzer wrote: >> On Mon, Sep 11 2017 at 4:51pm -0400, >> Jens Axboe <axboe@kernel.dk> wrote: >> >>> On 09/11/2017 10:16 AM, Mike Snitzer wrote: >>>> Here is v2 that should obviate the need to rename blk_mq_insert_request >>>> (by using bools to control run_queue and async). >>>> >>>> As for inserting directly into dispatch, if that can be done that is >>>> great but I'd prefer to have that be a follow-up optimization. This >>>> fixes the regression in question, and does so in well-known terms. >>>> >>>> What do you think? >>> >>> I think it looks reasonable. My only concern is the use of the software >>> queues. Depending on the scheduler, they may or may not be used. I'd >>> need to review the code, but my first thought is that this would break >>> if you use blk_mq_insert_request() on a device that is managed by >>> mq-deadline or bfq, for instance. Schedulers are free to use the >>> software queues, but they are also free to ignore them and use internal >>> queuing. >>> >>> Looking at the code, looks like this was changed slightly at some point, >>> we always flush the software queues, if any of them contain requests. So >>> it's probably fine. >> >> OK good, but is that too brittle to rely on? Something that might change >> in the future? > > I'm actually surprised we do flush software queues for that case, since > we don't always have to. So it is a bit of a wart. If we don't have a > scheduler, software queues is where IO goes. 
If we have a scheduler, the > scheduler has complete control of where to queue IO. Generally, the > scheduler will either utilize the software queues or it won't, there's > nothing in between. > > I know realize I'm an idiot and didn't read it right. So here's the code > in question: > > const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request; > > [...] > > } else if (!has_sched_dispatch) { > blk_mq_flush_busy_ctxs(hctx, &rq_list); > blk_mq_dispatch_rq_list(q, &rq_list); > } > > so we do only enter sw queue flushing, if we don't have a scheduler with > a dispatch_request hook. So now I am really wondering how your patch > could work if the bottom device has bfq or mq-deadline attached? > >>> My earlier suggestion to use just hctx->dispatch for the IO and bypass >>> the software queues completely. The use case for the dispatch list is >>> the same, regardless of whether the device has a scheduler attached or >>> not. >> >> I'm missing how these details relate to the goal of bypassing any >> scheduler that might be attached. Are you saying the attached elevator >> would still get in the way? > > See above. > >> Looking at blk_mq_sched_insert_request(), submission when an elevator >> isn't attached is exactly what I made blk_mq_insert_request() do >> (which is exactly what it did in the past). > > Right, but that path is only used if we don't have a scheduler attached. > So while the code will use that path IFF a scheduler isn't attached to > that device, your use case will use it for both cases. > >> In the case of DM multipath, nothing else should be submitting IO to >> the device so elevator shouldn't be used -- only interface for >> submitting IO would be blk_mq_insert_request(). So even if a >> scheduler is attached it should be bypassed right? > > The problem is the usage of the sw queue. > > Does the below work for you? 
> > > diff --git a/block/blk-core.c b/block/blk-core.c > index d709c0e3a2ac..aebe676225e6 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -2342,7 +2342,12 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request * > if (q->mq_ops) { > if (blk_queue_io_stat(q)) > blk_account_io_start(rq, true); > - blk_mq_sched_insert_request(rq, false, true, false, false); > + /* > + * Since we have a scheduler attached on the top device, > + * bypass a potential scheduler on the bottom device for > + * insert. > + */ > + blk_mq_request_bypass_insert(rq); > return BLK_STS_OK; > } > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 3f18cff80050..98a18609755e 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -1401,6 +1401,22 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, > blk_mq_hctx_mark_pending(hctx, ctx); > } > > +/* > + * Should only be used carefully, when the caller knows we want to > + * bypass a potential IO scheduler on the target device. > + */ > +void blk_mq_request_bypass_insert(struct request *rq) > +{ > + struct blk_mq_ctx *ctx = rq->mq_ctx; > + struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu); > + > + spin_lock(&hctx->lock); > + list_add_tail(&rq->queuelist, &hctx->dispatch); > + spin_unlock(&hctx->lock); > + > + blk_mq_run_hw_queue(hctx, false); > +} > + Hello Jens and Mike, This patch sends flush request to ->dispatch directly too, which changes the previous behaviour, is that OK for dm-rq? -- Ming Lei ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request() 2017-09-14 15:57 ` Ming Lei @ 2017-09-14 16:30 ` Jens Axboe 2017-09-14 16:33 ` Ming Lei 0 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2017-09-14 16:30 UTC (permalink / raw) To: Ming Lei Cc: Mike Snitzer, Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, open list:DEVICE-MAPPER (LVM) On 09/14/2017 09:57 AM, Ming Lei wrote: > On Tue, Sep 12, 2017 at 5:27 AM, Jens Axboe <axboe@kernel.dk> wrote: >> On 09/11/2017 03:13 PM, Mike Snitzer wrote: >>> On Mon, Sep 11 2017 at 4:51pm -0400, >>> Jens Axboe <axboe@kernel.dk> wrote: >>> >>>> On 09/11/2017 10:16 AM, Mike Snitzer wrote: >>>>> Here is v2 that should obviate the need to rename blk_mq_insert_request >>>>> (by using bools to control run_queue and async). >>>>> >>>>> As for inserting directly into dispatch, if that can be done that is >>>>> great but I'd prefer to have that be a follow-up optimization. This >>>>> fixes the regression in question, and does so in well-known terms. >>>>> >>>>> What do you think? >>>> >>>> I think it looks reasonable. My only concern is the use of the software >>>> queues. Depending on the scheduler, they may or may not be used. I'd >>>> need to review the code, but my first thought is that this would break >>>> if you use blk_mq_insert_request() on a device that is managed by >>>> mq-deadline or bfq, for instance. Schedulers are free to use the >>>> software queues, but they are also free to ignore them and use internal >>>> queuing. >>>> >>>> Looking at the code, looks like this was changed slightly at some point, >>>> we always flush the software queues, if any of them contain requests. So >>>> it's probably fine. >>> >>> OK good, but is that too brittle to rely on? Something that might change >>> in the future? >> >> I'm actually surprised we do flush software queues for that case, since >> we don't always have to. So it is a bit of a wart. 
If we don't have a >> scheduler, software queues is where IO goes. If we have a scheduler, the >> scheduler has complete control of where to queue IO. Generally, the >> scheduler will either utilize the software queues or it won't, there's >> nothing in between. >> >> I know realize I'm an idiot and didn't read it right. So here's the code >> in question: >> >> const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request; >> >> [...] >> >> } else if (!has_sched_dispatch) { >> blk_mq_flush_busy_ctxs(hctx, &rq_list); >> blk_mq_dispatch_rq_list(q, &rq_list); >> } >> >> so we do only enter sw queue flushing, if we don't have a scheduler with >> a dispatch_request hook. So now I am really wondering how your patch >> could work if the bottom device has bfq or mq-deadline attached? >> >>>> My earlier suggestion to use just hctx->dispatch for the IO and bypass >>>> the software queues completely. The use case for the dispatch list is >>>> the same, regardless of whether the device has a scheduler attached or >>>> not. >>> >>> I'm missing how these details relate to the goal of bypassing any >>> scheduler that might be attached. Are you saying the attached elevator >>> would still get in the way? >> >> See above. >> >>> Looking at blk_mq_sched_insert_request(), submission when an elevator >>> isn't attached is exactly what I made blk_mq_insert_request() do >>> (which is exactly what it did in the past). >> >> Right, but that path is only used if we don't have a scheduler attached. >> So while the code will use that path IFF a scheduler isn't attached to >> that device, your use case will use it for both cases. >> >>> In the case of DM multipath, nothing else should be submitting IO to >>> the device so elevator shouldn't be used -- only interface for >>> submitting IO would be blk_mq_insert_request(). So even if a >>> scheduler is attached it should be bypassed right? >> >> The problem is the usage of the sw queue. >> >> Does the below work for you? 
>> >> >> diff --git a/block/blk-core.c b/block/blk-core.c >> index d709c0e3a2ac..aebe676225e6 100644 >> --- a/block/blk-core.c >> +++ b/block/blk-core.c >> @@ -2342,7 +2342,12 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request * >> if (q->mq_ops) { >> if (blk_queue_io_stat(q)) >> blk_account_io_start(rq, true); >> - blk_mq_sched_insert_request(rq, false, true, false, false); >> + /* >> + * Since we have a scheduler attached on the top device, >> + * bypass a potential scheduler on the bottom device for >> + * insert. >> + */ >> + blk_mq_request_bypass_insert(rq); >> return BLK_STS_OK; >> } >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index 3f18cff80050..98a18609755e 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -1401,6 +1401,22 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, >> blk_mq_hctx_mark_pending(hctx, ctx); >> } >> >> +/* >> + * Should only be used carefully, when the caller knows we want to >> + * bypass a potential IO scheduler on the target device. >> + */ >> +void blk_mq_request_bypass_insert(struct request *rq) >> +{ >> + struct blk_mq_ctx *ctx = rq->mq_ctx; >> + struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu); >> + >> + spin_lock(&hctx->lock); >> + list_add_tail(&rq->queuelist, &hctx->dispatch); >> + spin_unlock(&hctx->lock); >> + >> + blk_mq_run_hw_queue(hctx, false); >> +} >> + > > Hello Jens and Mike, > > This patch sends flush request to ->dispatch directly too, which changes the > previous behaviour, is that OK for dm-rq? That's a good question, I need to look into that. The flush behavior is so annoying. Did you make any progress on fixing up the patch you posted the other day on treating flushes like any other request? -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
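[Editor's note: the software-queue stranding problem Jens describes, and the dispatch-list bypass the patch adds, can be sketched with a toy model. This is plain Python, not kernel code; names like `sw_queue`, `dispatch`, and `has_sched_dispatch` only mirror the blk-mq concepts quoted above, and the per-CPU software queues are collapsed into one list for brevity.]

```python
class TinyScheduler:
    """Stand-in for bfq/mq-deadline: internal queuing plus a dispatch hook."""
    has_dispatch = True

    def __init__(self):
        self.q = []

    def insert(self, rq):
        self.q.append(rq)

    def drain(self):
        out, self.q = self.q, []
        return out


class HwQueue:
    def __init__(self, scheduler=None):
        self.sw_queue = []     # software queues (collapsed to one list here)
        self.dispatch = []     # models hctx->dispatch, always drained
        self.scheduler = scheduler
        self.dispatched = []   # requests handed to the driver

    def sw_insert(self, rq):
        # The problematic path: unconditionally insert into a sw queue.
        self.sw_queue.append(rq)

    def bypass_insert(self, rq):
        # Models blk_mq_request_bypass_insert(): straight to ->dispatch.
        self.dispatch.append(rq)

    def run_hw_queue(self):
        # Mirrors the quoted dispatch logic: the dispatch list is always
        # drained; sw queues are only flushed when no scheduler provides
        # a dispatch_request hook (has_sched_dispatch == False).
        self.dispatched.extend(self.dispatch)
        self.dispatch.clear()
        has_sched_dispatch = (self.scheduler is not None
                              and self.scheduler.has_dispatch)
        if has_sched_dispatch:
            self.dispatched.extend(self.scheduler.drain())
        else:
            self.dispatched.extend(self.sw_queue)
            self.sw_queue.clear()


# With a dispatch-hook scheduler attached, a request inserted into a
# sw queue is never flushed -- it is stranded ...
h = HwQueue(scheduler=TinyScheduler())
h.sw_insert("clone-A")
h.run_hw_queue()
print(h.dispatched)   # [] -- "clone-A" sits forever in the sw queue

# ... while the dispatch-list bypass reaches the driver regardless.
h.bypass_insert("clone-B")
h.run_hw_queue()
print(h.dispatched)   # ['clone-B']
```

The same model shows why the old path looked fine without a scheduler: with `scheduler=None`, `run_hw_queue()` does flush the sw queue, which is exactly the case Mike's original patch exercised.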
* Re: [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request() 2017-09-14 16:30 ` Jens Axboe @ 2017-09-14 16:33 ` Ming Lei 2017-09-14 16:34 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Ming Lei @ 2017-09-14 16:33 UTC (permalink / raw) To: Jens Axboe Cc: Mike Snitzer, Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, open list:DEVICE-MAPPER (LVM) On Fri, Sep 15, 2017 at 12:30 AM, Jens Axboe <axboe@kernel.dk> wrote: > On 09/14/2017 09:57 AM, Ming Lei wrote: >> On Tue, Sep 12, 2017 at 5:27 AM, Jens Axboe <axboe@kernel.dk> wrote: >>> On 09/11/2017 03:13 PM, Mike Snitzer wrote: >>>> On Mon, Sep 11 2017 at 4:51pm -0400, >>>> Jens Axboe <axboe@kernel.dk> wrote: >>>> >>>>> On 09/11/2017 10:16 AM, Mike Snitzer wrote: >>>>>> Here is v2 that should obviate the need to rename blk_mq_insert_request >>>>>> (by using bools to control run_queue and async). >>>>>> >>>>>> As for inserting directly into dispatch, if that can be done that is >>>>>> great but I'd prefer to have that be a follow-up optimization. This >>>>>> fixes the regression in question, and does so in well-known terms. >>>>>> >>>>>> What do you think? >>>>> >>>>> I think it looks reasonable. My only concern is the use of the software >>>>> queues. Depending on the scheduler, they may or may not be used. I'd >>>>> need to review the code, but my first thought is that this would break >>>>> if you use blk_mq_insert_request() on a device that is managed by >>>>> mq-deadline or bfq, for instance. Schedulers are free to use the >>>>> software queues, but they are also free to ignore them and use internal >>>>> queuing. >>>>> >>>>> Looking at the code, looks like this was changed slightly at some point, >>>>> we always flush the software queues, if any of them contain requests. So >>>>> it's probably fine. >>>> >>>> OK good, but is that too brittle to rely on? Something that might change >>>> in the future? 
>>> >>> I'm actually surprised we do flush software queues for that case, since >>> we don't always have to. So it is a bit of a wart. If we don't have a >>> scheduler, software queues is where IO goes. If we have a scheduler, the >>> scheduler has complete control of where to queue IO. Generally, the >>> scheduler will either utilize the software queues or it won't, there's >>> nothing in between. >>> >>> I know realize I'm an idiot and didn't read it right. So here's the code >>> in question: >>> >>> const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request; >>> >>> [...] >>> >>> } else if (!has_sched_dispatch) { >>> blk_mq_flush_busy_ctxs(hctx, &rq_list); >>> blk_mq_dispatch_rq_list(q, &rq_list); >>> } >>> >>> so we do only enter sw queue flushing, if we don't have a scheduler with >>> a dispatch_request hook. So now I am really wondering how your patch >>> could work if the bottom device has bfq or mq-deadline attached? >>> >>>>> My earlier suggestion to use just hctx->dispatch for the IO and bypass >>>>> the software queues completely. The use case for the dispatch list is >>>>> the same, regardless of whether the device has a scheduler attached or >>>>> not. >>>> >>>> I'm missing how these details relate to the goal of bypassing any >>>> scheduler that might be attached. Are you saying the attached elevator >>>> would still get in the way? >>> >>> See above. >>> >>>> Looking at blk_mq_sched_insert_request(), submission when an elevator >>>> isn't attached is exactly what I made blk_mq_insert_request() do >>>> (which is exactly what it did in the past). >>> >>> Right, but that path is only used if we don't have a scheduler attached. >>> So while the code will use that path IFF a scheduler isn't attached to >>> that device, your use case will use it for both cases. 
>>> >>>> In the case of DM multipath, nothing else should be submitting IO to >>>> the device so elevator shouldn't be used -- only interface for >>>> submitting IO would be blk_mq_insert_request(). So even if a >>>> scheduler is attached it should be bypassed right? >>> >>> The problem is the usage of the sw queue. >>> >>> Does the below work for you? >>> >>> >>> diff --git a/block/blk-core.c b/block/blk-core.c >>> index d709c0e3a2ac..aebe676225e6 100644 >>> --- a/block/blk-core.c >>> +++ b/block/blk-core.c >>> @@ -2342,7 +2342,12 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request * >>> if (q->mq_ops) { >>> if (blk_queue_io_stat(q)) >>> blk_account_io_start(rq, true); >>> - blk_mq_sched_insert_request(rq, false, true, false, false); >>> + /* >>> + * Since we have a scheduler attached on the top device, >>> + * bypass a potential scheduler on the bottom device for >>> + * insert. >>> + */ >>> + blk_mq_request_bypass_insert(rq); >>> return BLK_STS_OK; >>> } >>> >>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>> index 3f18cff80050..98a18609755e 100644 >>> --- a/block/blk-mq.c >>> +++ b/block/blk-mq.c >>> @@ -1401,6 +1401,22 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, >>> blk_mq_hctx_mark_pending(hctx, ctx); >>> } >>> >>> +/* >>> + * Should only be used carefully, when the caller knows we want to >>> + * bypass a potential IO scheduler on the target device. >>> + */ >>> +void blk_mq_request_bypass_insert(struct request *rq) >>> +{ >>> + struct blk_mq_ctx *ctx = rq->mq_ctx; >>> + struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu); >>> + >>> + spin_lock(&hctx->lock); >>> + list_add_tail(&rq->queuelist, &hctx->dispatch); >>> + spin_unlock(&hctx->lock); >>> + >>> + blk_mq_run_hw_queue(hctx, false); >>> +} >>> + >> >> Hello Jens and Mike, >> >> This patch sends flush request to ->dispatch directly too, which changes the >> previous behaviour, is that OK for dm-rq? 
>
> That's a good question, I need to look into that. The flush behavior is so
> annoying. Did you make any progress on fixing up the patch you posted the
> other day on treating flushes like any other request?

It has been ready, will post it out later.

--
Ming Lei

^ permalink raw reply	[flat|nested] 25+ messages in thread
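[Editor's note: Ming's concern can be illustrated with a hedged toy model. This is illustrative Python only; the real flush machinery in block/blk-flush.c is considerably more involved. The point is that a request carrying a pre-flush flag is normally expanded by the flush state machine into a device cache flush followed by the data write, whereas a direct ->dispatch insert queues the request as-is and skips that sequencing.]

```python
def insert(rq, dispatch, via_flush_machine):
    """Append the ops a request would generate onto a toy dispatch list."""
    if rq.get("preflush") and via_flush_machine:
        # Flush machinery: device-level cache flush first, then the write,
        # so earlier completed writes are durable before this one lands.
        dispatch.append({"op": "flush"})
        dispatch.append({"op": "write", "data": rq["data"]})
    else:
        # Bypass path: the request lands on ->dispatch unchanged.
        dispatch.append({"op": "write", "data": rq["data"]})


rq = {"preflush": True, "data": "D"}

normal, bypass = [], []
insert(rq, normal, via_flush_machine=True)
insert(rq, bypass, via_flush_machine=False)

print([r["op"] for r in normal])  # ['flush', 'write'] -- ordering preserved
print([r["op"] for r in bypass])  # ['write'] -- no preceding cache flush
```

Ming's follow-up patch (treating flushes like any other request) would make the two paths converge, which is why Jens prefers it over adding more flush/not-flush special casing.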
* Re: [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request() 2017-09-14 16:33 ` Ming Lei @ 2017-09-14 16:34 ` Jens Axboe 0 siblings, 0 replies; 25+ messages in thread From: Jens Axboe @ 2017-09-14 16:34 UTC (permalink / raw) To: Ming Lei Cc: Mike Snitzer, Paolo Valente, Bart Van Assche, linux-block@vger.kernel.org, open list:DEVICE-MAPPER (LVM) On 09/14/2017 10:33 AM, Ming Lei wrote: > On Fri, Sep 15, 2017 at 12:30 AM, Jens Axboe <axboe@kernel.dk> wrote: >> On 09/14/2017 09:57 AM, Ming Lei wrote: >>> On Tue, Sep 12, 2017 at 5:27 AM, Jens Axboe <axboe@kernel.dk> wrote: >>>> On 09/11/2017 03:13 PM, Mike Snitzer wrote: >>>>> On Mon, Sep 11 2017 at 4:51pm -0400, >>>>> Jens Axboe <axboe@kernel.dk> wrote: >>>>> >>>>>> On 09/11/2017 10:16 AM, Mike Snitzer wrote: >>>>>>> Here is v2 that should obviate the need to rename blk_mq_insert_request >>>>>>> (by using bools to control run_queue and async). >>>>>>> >>>>>>> As for inserting directly into dispatch, if that can be done that is >>>>>>> great but I'd prefer to have that be a follow-up optimization. This >>>>>>> fixes the regression in question, and does so in well-known terms. >>>>>>> >>>>>>> What do you think? >>>>>> >>>>>> I think it looks reasonable. My only concern is the use of the software >>>>>> queues. Depending on the scheduler, they may or may not be used. I'd >>>>>> need to review the code, but my first thought is that this would break >>>>>> if you use blk_mq_insert_request() on a device that is managed by >>>>>> mq-deadline or bfq, for instance. Schedulers are free to use the >>>>>> software queues, but they are also free to ignore them and use internal >>>>>> queuing. >>>>>> >>>>>> Looking at the code, looks like this was changed slightly at some point, >>>>>> we always flush the software queues, if any of them contain requests. So >>>>>> it's probably fine. >>>>> >>>>> OK good, but is that too brittle to rely on? Something that might change >>>>> in the future? 
>>>> >>>> I'm actually surprised we do flush software queues for that case, since >>>> we don't always have to. So it is a bit of a wart. If we don't have a >>>> scheduler, software queues is where IO goes. If we have a scheduler, the >>>> scheduler has complete control of where to queue IO. Generally, the >>>> scheduler will either utilize the software queues or it won't, there's >>>> nothing in between. >>>> >>>> I know realize I'm an idiot and didn't read it right. So here's the code >>>> in question: >>>> >>>> const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request; >>>> >>>> [...] >>>> >>>> } else if (!has_sched_dispatch) { >>>> blk_mq_flush_busy_ctxs(hctx, &rq_list); >>>> blk_mq_dispatch_rq_list(q, &rq_list); >>>> } >>>> >>>> so we do only enter sw queue flushing, if we don't have a scheduler with >>>> a dispatch_request hook. So now I am really wondering how your patch >>>> could work if the bottom device has bfq or mq-deadline attached? >>>> >>>>>> My earlier suggestion to use just hctx->dispatch for the IO and bypass >>>>>> the software queues completely. The use case for the dispatch list is >>>>>> the same, regardless of whether the device has a scheduler attached or >>>>>> not. >>>>> >>>>> I'm missing how these details relate to the goal of bypassing any >>>>> scheduler that might be attached. Are you saying the attached elevator >>>>> would still get in the way? >>>> >>>> See above. >>>> >>>>> Looking at blk_mq_sched_insert_request(), submission when an elevator >>>>> isn't attached is exactly what I made blk_mq_insert_request() do >>>>> (which is exactly what it did in the past). >>>> >>>> Right, but that path is only used if we don't have a scheduler attached. >>>> So while the code will use that path IFF a scheduler isn't attached to >>>> that device, your use case will use it for both cases. 
>>>> >>>>> In the case of DM multipath, nothing else should be submitting IO to >>>>> the device so elevator shouldn't be used -- only interface for >>>>> submitting IO would be blk_mq_insert_request(). So even if a >>>>> scheduler is attached it should be bypassed right? >>>> >>>> The problem is the usage of the sw queue. >>>> >>>> Does the below work for you? >>>> >>>> >>>> diff --git a/block/blk-core.c b/block/blk-core.c >>>> index d709c0e3a2ac..aebe676225e6 100644 >>>> --- a/block/blk-core.c >>>> +++ b/block/blk-core.c >>>> @@ -2342,7 +2342,12 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request * >>>> if (q->mq_ops) { >>>> if (blk_queue_io_stat(q)) >>>> blk_account_io_start(rq, true); >>>> - blk_mq_sched_insert_request(rq, false, true, false, false); >>>> + /* >>>> + * Since we have a scheduler attached on the top device, >>>> + * bypass a potential scheduler on the bottom device for >>>> + * insert. >>>> + */ >>>> + blk_mq_request_bypass_insert(rq); >>>> return BLK_STS_OK; >>>> } >>>> >>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>> index 3f18cff80050..98a18609755e 100644 >>>> --- a/block/blk-mq.c >>>> +++ b/block/blk-mq.c >>>> @@ -1401,6 +1401,22 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, >>>> blk_mq_hctx_mark_pending(hctx, ctx); >>>> } >>>> >>>> +/* >>>> + * Should only be used carefully, when the caller knows we want to >>>> + * bypass a potential IO scheduler on the target device. 
>>>> + */ >>>> +void blk_mq_request_bypass_insert(struct request *rq) >>>> +{ >>>> + struct blk_mq_ctx *ctx = rq->mq_ctx; >>>> + struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu); >>>> + >>>> + spin_lock(&hctx->lock); >>>> + list_add_tail(&rq->queuelist, &hctx->dispatch); >>>> + spin_unlock(&hctx->lock); >>>> + >>>> + blk_mq_run_hw_queue(hctx, false); >>>> +} >>>> + >>> >>> Hello Jens and Mike, >>> >>> This patch sends flush request to ->dispatch directly too, which changes the >>> previous behaviour, is that OK for dm-rq? >> >> That's a good question, I need to look into that. The flush behavior is so >> annoying. Did you make any progress on fixing up the patch you posted the >> other day on treating flushes like any other request? > > It has been ready, will post it out later. OK good, if that's clean enough, then I think going that route is a much better idea than introducing more flush/not-flush logic. I liked the initial patch from a concept point of view, and it enables us to get rid of a few nasty hacks. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2017-09-14 16:34 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-24 23:16 BFQ + dm-mpath Bart Van Assche
2017-08-30 16:31 ` Paolo Valente
2017-09-05  7:56   ` Paolo Valente
2017-09-05 14:15     ` Bart Van Assche
2017-09-07 15:52       ` Mike Snitzer
2017-09-08  9:13         ` Paolo Valente
2017-09-08 16:41 ` [PATCH] dm mpath: switch IO scheduler of underlying paths to "none" [was: Re: BFQ + dm-mpath] Mike Snitzer
2017-09-08 16:48   ` Jens Axboe
2017-09-08 17:07     ` Mike Snitzer
2017-09-08 19:58       ` Mike Snitzer
2017-09-08 20:28         ` Jens Axboe
2017-09-08 21:42 ` [PATCH] block: directly insert blk-mq request from blk_insert_cloned_request() Mike Snitzer
2017-09-08 21:50   ` Jens Axboe
2017-09-08 22:03     ` Mike Snitzer
2017-09-11 16:16 ` [PATCH v2] " Mike Snitzer
2017-09-11 20:51   ` Jens Axboe
2017-09-11 21:13     ` Mike Snitzer
2017-09-11 21:27       ` Jens Axboe
2017-09-11 21:51         ` Mike Snitzer
2017-09-11 22:30           ` Mike Snitzer
2017-09-11 22:43             ` Jens Axboe
2017-09-14 15:57 ` Ming Lei
2017-09-14 16:30   ` Jens Axboe
2017-09-14 16:33     ` Ming Lei
2017-09-14 16:34       ` Jens Axboe