In music, there are often two stages in the realization of a musical composition: composition and interpretation. One might suggest that the composition is the skeleton and the interpretation provides the flesh, although this is somewhat simplistic. Nonetheless, interpretation usually adds expressive qualities to the composition. It often gives the work additional emotional impact. It always alters the balance among the elements and facets of the composition. Any satisfying translation of music to image would need to allow space for the expression of these inflections and nuances.
Most interpretations of a composition can be considered modulations. In classical music, rubato is the interpreter's modulation of tempo. Particular notes are accented through modulations of intensity. Vibrato and the bending of strings in a guitar solo are modulations of frequency. Much of the content of an interpretation could therefore be expressed as modulations of the visual expression of the corresponding feature. Such modulations would have to be scaled so that, for the most part, they do not threaten the integrity of the feature they are modulating.
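One way to picture this scaling is as a bounded offset applied to a visual attribute. The sketch below is only an illustration under assumed conventions: the function name `scaled_modulation`, the 25% ceiling, and the choice of a line width as the visual feature are all hypothetical, not part of any established method.

```python
def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def scaled_modulation(base_value, deviation, deviation_range, max_fraction=0.25):
    """Modulate a visual value by a musical deviation without overwhelming it.

    base_value      -- the unmodulated visual value (assumed here: a line width)
    deviation       -- the musical deviation (e.g. pitch offset in cents)
    deviation_range -- the deviation that counts as "full" modulation
    max_fraction    -- hypothetical ceiling: the visual value never moves
                       more than this fraction away from base_value
    """
    normalized = clamp(deviation / deviation_range, -1.0, 1.0)
    return base_value * (1.0 + max_fraction * normalized)


# A vibrato swinging up to 50 cents either side of the note, drawn as a
# line of nominal width 40. The last value (200 cents) is an extreme bend
# that gets clamped rather than distorting the line beyond recognition.
widths = [scaled_modulation(40.0, cents, 50.0) for cents in (-50, 0, 25, 200)]
print(widths)
```

Clamping is only one possible choice; a smooth saturating curve would avoid the hard edge at the limit, at the cost of compressing large gestures such as a wide string bend.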
Unfortunately, while a lot of music has been translated into MIDI files, providing a good starting point for translation, most of these are literal or amateurish interpretations of the musical score. The interpretive nuances of a masterful interpretation would probably need to be extracted from the actual sound recording, and this is still somewhat beyond the capabilities of computer sound analysis, especially when the music is performed by larger ensembles. Some automated capture of this sort of nuance is possible using devices like the Disklavier piano, which record the details of a performance precisely.
On the other hand, more and more music is being constructed digitally, which means that any nuances have been specifically coded into the sequencer, and are at least theoretically available to be reprocessed for translation.
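As a rough illustration of what reprocessing such coded nuances might look like, the sketch below treats a sequenced passage as a list of notes and recovers two kinds of inflection: how far each onset sits from a rhythmic grid, and how much each note is accented relative to the passage average. The `Note` structure, the grid size of 120 ticks, and the sample data are assumptions made for the example; a real sequencer file would need to be parsed first.

```python
from dataclasses import dataclass


@dataclass
class Note:
    onset_ticks: int  # position in sequencer ticks
    velocity: int     # MIDI velocity, 0..127


def timing_deviation(onset_ticks, grid_ticks):
    """Signed distance, in ticks, from the nearest grid position."""
    nearest = round(onset_ticks / grid_ticks) * grid_ticks
    return onset_ticks - nearest


def accent_offsets(notes):
    """Velocity of each note relative to the mean velocity of the passage."""
    mean = sum(n.velocity for n in notes) / len(notes)
    return [n.velocity - mean for n in notes]


# Hypothetical passage: the second note is slightly late and accented,
# the third slightly early.
passage = [Note(0, 64), Note(485, 100), Note(950, 70)]

GRID = 120  # assumed grid spacing (a sixteenth note at 480 ticks per quarter)
timing = [timing_deviation(n.onset_ticks, GRID) for n in passage]
accents = accent_offsets(passage)
print(timing, accents)
```

The timing offsets could drive a tempo-like modulation and the accent offsets an intensity-like one, each scaled before being applied to its visual feature.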