Something I noticed while working on that deposit patch for ia64 was how
annoying it is to bundle instructions by hand, and that it has been done
incorrectly at least once. E.g.

    static inline void tcg_out_bswap64(TCGContext *s, TCGArg ret, TCGArg arg)
    {
        tcg_out_bundle(s, mII,
                       tcg_opc_m48(TCG_REG_P0, OPC_NOP_M48, 0),
                       tcg_opc_i18(TCG_REG_P0, OPC_NOP_I18, 0),
                       tcg_opc_i3 (TCG_REG_P0, OPC_MUX1_I3, ret, arg, 0xb));
    }

Notice that there's an unnecessary stop bit in there.

This patch does too much all at once, I'll admit that. But before I
bother going back to split it into smaller pieces, I wanted to get some
feedback -- including whether it would be considered at all.

Some statistics that I gleaned from grepping the -d out_asm output are
below. Both before and after the rewrite, the only two linux-user targets
that work are i386 and alpha. Clearly there are still bugs to fix, but
it's not necessarily a regression.

                      Code size                   Stop bits
                  old       new   change      old      new   change
    i386/ls   2309712   1602736    -31%    215520    98698    -54%
    alpha/ls  1088352    817888    -25%    106215    48440    -54%

On the system side, I am able to boot the arm-test kernel. But it's a
bit harder to get repeatable statistics there. I assume that comes down
to timing issues, what code gets run when, and how that affects the TB
cache.

Anyway, comments greatly appreciated.


r~