The -a mode of perfquery is intended to loop through all ports on a single HCA and report output aggregated across those ports. The -l mode is intended to loop through all ports of a single HCA and report each port's data separately, without aggregation. Neither mode handles a machine with more than one HCA. Furthermore, I found that both -a and -l failed to loop properly on my Mellanox adapter: perfquery would read the first port and then error out trying to read the second.

So I wrote a new switch, -H, that loops through all ports on all HCAs in the system. Because of how it's implemented, it avoids the problem that both -a and -l had on my machine with the second Mellanox port. However, it does not produce aggregated output, because each HCA/port combination is treated as its own device.

I forgot to update the man page, though. If the current infiniband-diags maintainer wants it, I can add that (assuming the base patch is acceptable). I think Ira is the maintainer now, right? Anyway, the patch is attached.
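
To illustrate the idea behind -H without reproducing the patch, here is a minimal, self-contained sketch in C. It is not the patch's code and does not call libibumad or libibmad: struct hca_info, port_query_fn, query_all_hcas and print_port are hypothetical names made up for this example. Counting a failed port and moving on to the next one is likewise only an assumption about one reasonable behaviour, not a description of how the patch avoids the second-port error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one HCA; this is not a libibumad type. */
struct hca_info {
    const char *name;  /* illustrative device name, e.g. "hca0" */
    int num_ports;     /* HCA ports are numbered starting at 1 */
};

/* Hypothetical per-port query: returns 0 on success, nonzero on failure. */
typedef int (*port_query_fn)(const char *hca_name, int port, void *ctx);

/*
 * Visit every port of every HCA, treating each HCA/port pair as its own
 * device. Nothing is summed across ports. A failed pair is counted and
 * skipped instead of ending the loop (an assumption of this sketch).
 * Returns the number of successful queries; *failed receives the failures.
 */
int query_all_hcas(const struct hca_info *hcas, size_t num_hcas,
                   port_query_fn query, void *ctx, int *failed)
{
    int ok = 0;
    int bad = 0;

    assert(hcas != NULL || num_hcas == 0);
    assert(query != NULL);

    for (size_t i = 0; i < num_hcas; i++) {
        for (int port = 1; port <= hcas[i].num_ports; port++) {
            if (query(hcas[i].name, port, ctx) == 0)
                ok++;
            else
                bad++;
        }
    }

    if (failed != NULL)
        *failed = bad;
    return ok;
}

/*
 * Example callback: print one line per HCA/port pair. If ctx points to an
 * int, that port number is treated as failing, to simulate a bad port.
 */
int print_port(const char *hca_name, int port, void *ctx)
{
    const int *fail_port = ctx;

    if (fail_port != NULL && port == *fail_port) {
        fprintf(stderr, "%s port %d: simulated query failure\n",
                hca_name, port);
        return -1;
    }
    printf("%s port %d\n", hca_name, port);
    return 0;
}
```

Given two HCAs with two ports and one port respectively, query_all_hcas calls the callback three times, once per HCA/port pair, and a failure on one pair does not prevent the remaining pairs from being visited.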