At the large end of the scale, most aural and time-based visual creations have a duration, and there is no substantial difference between the modes in the way that duration is experienced, other than possible differences in how fatigue manifests itself when works are of extended duration.
The dividing up of a work into sections or movements seems to have similar meaning and result in similar experience in both modalities.
Tension and resolution are conventionally a function of narrative harmonic movement. Tension is established as music moves from a state of tonal stability into a state of relative dissonance. When this dissonance resolves back into consonant harmony, the accumulated tension is released. A similar movement between tension and resolution exists in most narrative forms, a simple example being the build-up and release of tension in a horror film. This mechanism in music is considered to carry inherent emotional impact, although this reaction is probably at least partially a learned response. There is no direct analogue in abstract vision, although movement back and forth between visual chaos and order might be able to serve a similar function.
Repetition of musical material is more common than repetition of visual material in time-based visuals, but the effect can be constructed in either modality. In sound there is the analogous natural phenomenon of the echo, which perhaps makes repetition feel more natural in sound than in image.
Rhythms can be expressed in an identical manner in sound and image. However, there are some distinct differences in the nature of the experience. An insistent musical rhythm is less fatiguing to the auditory system than an insistent visual rhythm of apparently similar intensity. Also, a heard rhythm tends to draw the body into a groove and create a sense of groundedness, whereas a visual rhythm, at least to my eyes, does not seem to settle, but rather, feels like it has a tendency to race.
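As a rough illustration of what "an identical manner" can mean in practice, the sketch below derives one list of onset times from a rhythmic pattern and then maps those same times onto audio sample positions and video frame numbers. The pattern notation, the function names, and the tempo, sample rate and frame rate are all assumptions chosen for the example, not anything prescribed by the discussion above.

```python
# A minimal sketch: one rhythm, expressed as onset times, drives both
# an audio stream (sample indices) and a visual stream (frame indices).
# All names and default values here are hypothetical, for illustration.

def onset_times(pattern, bpm=120, steps_per_beat=4):
    """Return onset times in seconds for every 'x' in a step pattern."""
    step = 60.0 / bpm / steps_per_beat  # duration of one step in seconds
    return [i * step for i, ch in enumerate(pattern) if ch == "x"]

def to_sample_indices(times, sample_rate=44100):
    """Map onset times to audio sample positions (e.g. where a click starts)."""
    return [round(t * sample_rate) for t in times]

def to_frame_indices(times, fps=24):
    """Map onset times to video frame numbers (e.g. where a flash appears)."""
    return [round(t * fps) for t in times]

pattern = "x...x...x.x.x..."           # 'x' marks an event, '.' marks a rest
times = onset_times(pattern)
samples = to_sample_indices(times)
frames = to_frame_indices(times)

print(times)    # onset times in seconds
print(samples)  # where each click would begin in the audio
print(frames)   # which frames would carry a flash
```

The timing data is shared; only the final rendering step differs. Any difference in how the two rhythms feel therefore arises in perception rather than in the description of the rhythm itself.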
Melody and harmonic movement have aesthetic parameters that are not easily replicated in image. Changes in frequency can be expressed by shifts in position along a spatial dimension, but there is more at play in melody than simply relative position on a continuum, in particular, issues of harmonic movement. Relationships between consecutive notes and groups of notes generate qualities related to consonance and dissonance that do not have a direct analogue in vision.
Individual events occurring in time are quite similar in sight and sound. In hearing, they are sometimes accompanied by reverberations that establish the event within an acoustic space (compare visual fade and blur).
Envelope refers to the trajectory of changes of frequency, volume and timbre during the life of a single sound event, such as a single note played by an instrument. Envelopes are often described in terms of attack (the rate of onset), decay (natural decay after the initial onset), sustain (relatively constant continuing presence through the body of the event), and release (the decay of the sound after the initiating event is over). These parameters tend to relate to the actual physics of the sound-making device and reflect the natural behaviours of materials across time. We tend to have greater sensitivity to the nuances of sonic envelopes than to visual envelopes (which are most often encountered as transitions such as cuts, crossfades, fade-ins and fade-outs).
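The four stages can be sketched as a simple piecewise-linear function of time. This is only one common simplification, restricted here to volume; real instruments follow curved and far less tidy trajectories. The function name, the stage durations and the sustain level are assumptions made for the example. The same curve is then applied to the loudness of a tone and to the opacity of an image, to show that an envelope can be specified identically in either modality.

```python
import math

def adsr_level(t, attack=0.05, decay=0.10, sustain_level=0.6,
               hold=0.50, release=0.30):
    """Return an envelope level between 0.0 and 1.0 at t seconds after onset.

    Piecewise-linear sketch; all default durations are hypothetical values.
    """
    if t < 0:
        return 0.0
    if t < attack:                  # attack: rise from silence to the peak
        return t / attack
    t -= attack
    if t < decay:                   # decay: fall from the peak to the sustain level
        return 1.0 - (1.0 - sustain_level) * (t / decay)
    t -= decay
    if t < hold:                    # sustain: hold a roughly constant level
        return sustain_level
    t -= hold
    if t < release:                 # release: fade from the sustain level to zero
        return sustain_level * (1.0 - t / release)
    return 0.0                      # the event is over

def enveloped_tone(freq=440.0, duration=1.0, sample_rate=8000):
    """Sine tone whose loudness follows the envelope (a list of samples)."""
    n = int(sample_rate * duration)
    return [adsr_level(i / sample_rate)
            * math.sin(2 * math.pi * freq * i / sample_rate)
            for i in range(n)]

def enveloped_opacity(duration=1.0, fps=25):
    """Per-frame opacity of an image following the same envelope."""
    return [adsr_level(f / fps) for f in range(int(fps * duration))]

tone = enveloped_tone()
opacity = enveloped_opacity()
```

With these values, a short attack and a longer release give the familiar shape of a struck or plucked note. A visual fade-in followed by a fade-out corresponds to using only the attack and release stages.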
Intervals between pitches, pitch and harmonic spectrum are acoustic features based on temporal phenomena of oscillating sound waves, but which are perceived as higher-order qualities. With the exception of the very lowest end of the range of hearing, the relevant frequencies of the oscillations involved in these features are too high for the auditory cortex to experience directly as patterns or vibrations. Of course, these features may vary through time according to an envelope or temporal profile.