Our paper "Cross-ISA Machine Emulation for Multicores" has been accepted for publication at the upcoming CGO'17 conference to be held in Austin, TX. The work described in the paper enables parallel execution of parallel cross-ISA workloads in QEMU. That is, multi-core hosts can be exploited to speed up the emulation of guests that are (1) multi-core systems or (2) multi-threaded user-mode programs.
Code changes resulting from this work have started making their way to upstream QEMU. In particular, QEMU v2.7 includes (1) our improved hashing for the translation block (TB) hash table and (2) the implementation and use of QHT for scalable TB lookups (commit, commit); see the merge commit. Moreover, the following code has been merged for inclusion in the upcoming QEMU v2.8: (1) the use of the host's atomic instructions to emulate the guest's atomics (x86, arm, aarch64; merge commit), and (2) the necessary work to safely support multi-threaded execution (commit, commit; merge commit).
This work would not have been possible without the QEMU community. In particular, Paolo Bonzini and Alex Bennée have made key contributions and are coauthors of the paper. Other QEMU developers such as Sergey Fedorov and Richard Henderson have been actively involved by writing code as well as making significant improvements to our ideas and code. We are very grateful for their help and their patience with us.