Mark Changizi has an interesting way of looking at things. The brain has functions and faculties that evolved long ago to handle the situations an ape would need to deal with. He puts forward the idea that language and music adapted to what the brain can do, rather than the brain adapting to do what language and music require. This is a central idea of his book Harnessed: How Language and Music Mimicked Nature and Transformed Ape to Man. From its blurb:
In particular, language and music came to have the structures of the sounds in nature, just the sorts of sounds our brain had evolved to process. It is this nature-harnessing that explains who we are today. For speech, Changizi provides a barrage of evidence that speech across human languages mimics the fundamental sounds of physical events in the world. By mimicking the sounds that solid objects make when they hit, slide and ring, speech harnesses our ancient event-recognition powers that were never intended for language. And, for music, Changizi lays out his case that music mimics another equally important category of sound in the world: the sounds of human movement. Just as we possess brains specially designed to recognize facial expressions, our brains evolved to recognize what people are doing in our midst from the sounds they make. Music harnesses that ancient brain capability, turning a human action recognition system into a music appreciation machine.
There is some independent experimental evidence of a connection between music and movement. Deric Bownds (here) has a posting on a paper by B. Sievers and others, "Music and movement share a dynamic structure that supports universal expressions of emotion," PNAS, January 2013. Here is the abstract:
Music moves us. Its kinetic power is the foundation of human behaviors as diverse as dance, romance, lullabies, and the military march. Despite its significance, the music-movement relationship is poorly understood. We present an empirical method for testing whether music and movement share a common structure that affords equivalent and universal emotional expressions. Our method uses a computer program that can generate matching examples of music and movement from a single set of features: rate, jitter (regularity of rate), direction, step size, and dissonance/visual spikiness. We applied our method in two experiments, one in the United States and another in an isolated tribal village in Cambodia. These experiments revealed three things: (i) each emotion was represented by a unique combination of features, (ii) each combination expressed the same emotion in both music and movement, and (iii) this common structure between music and movement was evident within and across cultures.
Bownds's description of the computer program is clearer.
They designed an ingenious computer program with slider bars that adjusted either a music player or a bouncing ball, varying rate, jitter (regularity of rate), direction, step size, and dissonance/visual spikiness. Participants were instructed to take as much time as they needed to set the sliders to express five emotions: angry, happy, peaceful, sad, and scared. One set of participants was asked to express each emotion with the moving ball; the other set was asked to express it with music. One experimental group was U.S. college students; the other was the culturally isolated Kreung ethnic minority of northern Cambodia, whose music is formally dissimilar to Western music.
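To make the setup concrete, here is a toy sketch of how a single set of slider values might drive a melody generator. This is not the authors' program (which also animates a bouncing ball from the same parameters, and whose exact mappings are not described here); the parameter ranges and the way each slider shapes the output are my own illustrative assumptions.

```python
import random
from dataclasses import dataclass

@dataclass
class Sliders:
    # The five shared features from the paper; the numeric ranges are assumptions.
    rate: float        # events per second
    jitter: float      # 0 = perfectly regular timing, 1 = highly irregular
    direction: float   # -1 = falling pitch, +1 = rising pitch
    step_size: float   # typical size of each pitch step, in semitones
    spikiness: float   # dissonance (music) / visual spikiness (ball), 0 to 1

def render_melody(s: Sliders, n_events: int = 8, seed: int = 0):
    """Map one slider setting to a toy melody: a list of (onset_sec, midi_pitch)."""
    rng = random.Random(seed)
    base = 1.0 / s.rate                  # nominal inter-onset interval
    onset, pitch = 0.0, 60.0             # start at middle C
    events = []
    for _ in range(n_events):
        events.append((round(onset, 3), round(pitch)))
        # jitter perturbs the nominal interval between events
        onset += base * (1 + s.jitter * rng.uniform(-0.5, 0.5))
        # direction biases the sign of each step; step_size scales it
        step = s.step_size * (s.direction + rng.uniform(-0.5, 0.5))
        # spikiness adds occasional large "dissonant" leaps (tritones here)
        if rng.random() < s.spikiness:
            step += rng.choice([-6, 6])
        pitch += step
    return events

# A plausible "happy" setting: fast, regular, rising, small consonant steps.
happy = Sliders(rate=4.0, jitter=0.1, direction=0.7, step_size=2.0, spikiness=0.1)
print(render_melody(happy, n_events=4))
```

The same five numbers could just as easily parameterize a ball's bounce rate, trajectory, and path spikiness, which is the point of the experimental design: one feature vector, two output media.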