The Image of Music and Sound in Xunzi's 'On Music': A Study of Character, Cosmos, and the Cultivation of Rites and Music
This paper offers an in-depth interpretation of the 'Image of Music and Sound' (Sheng Yue zhi Xiang) discussed in Xunzi's 'On Music,' clarifying the Pre-Qin meaning of 'Xiang' (image/analogy) and elucidating how the qualities of sound correspond to the myriad things in the cosmos. It further situates this correspondence within Xunzi's Confucian framework of 'transforming human nature through rites and music' to explore the cosmological significance and pedagogical function of music.

Chapter Ten: Deep Inquiry: Why?
Section 1: Why Do Sound and Music Require "Imagery" ($\text{xiàng}$)?
First Question: Why do sound and music require "imagery" ($\text{xiàng}$)? Is not the beauty of sound and music directly perceptible? Why must we use language to describe its "imagery," that is, its qualities?
This question touches upon the fundamental relationship between experience and language.
The beauty of sound and music is indeed directly felt: hearing the drum inspires reverence, hearing the bell brings stability, hearing the wind instruments inspires vigor. These do not require the mediation of language. Yet, Xunzi insists on using precise terms like "grandly beautiful" ($\text{dà lì}$), "comprehensively substantial" ($\text{tǒng shí}$), and "pure and regulated" ($\text{lián zhì}$) to describe these experiences. Why?
Because Xunzi's goal is not merely to "feel" the beauty of music but to "understand" it—to elevate sensory experience to rational knowledge. Pure sensory experience is vague, personal, and incommunicable; rational knowledge is clear, public, and transmissible. By naming things with "imagery" ($\text{xiàng}$) and describing them with language, Xunzi elevates the aesthetic experience of music to a rational understanding, making it discussable, judgeable, and inheritable.
This aligns perfectly with the intent of "Rectifying Names" ($\text{zhèng míng}$): "A name is that by which we designate and accumulate substance ($\text{shí}$)." The "substance" ($\text{shí}$) of music is its acoustic quality; the "name" ($\text{míng}$) is the descriptive term ($\text{dà lì}$, $\text{tǒng shí}$, etc.). With these "names," people can "designate" (distinguish) the different qualities of various instruments and "accumulate" (transmit) this knowledge. Without these "images," there would be no standard to judge the good or bad of music, and no basis for the training of musicians.
Section 2: Why Do Instrument Qualities Correspond to Heaven, Earth, and the Myriad Things?
Second Question: Why can the qualities of musical instruments correspond to Heaven, Earth, and the myriad things? Is this correspondence objectively real or a human construction?
As discussed earlier, this correspondence is both objectively grounded (the natural quality of the material determines its natural acoustic trait) and humanly constructed (these natural traits are systematized into a complete scheme of correspondences).
More profoundly, why does the material of a thing happen to possess a quality that connects with the qualities of Heaven and Earth? Why does leather sound like Heaven’s thunder? Why does metal sound like Earth? Why does stone sound like flowing water?
The answer in pre-Qin thought is: because all things are fundamentally one body—"The myriad things share the same space but have different bodies... All things emerge from the Dao and are endowed with qi." Leather comes from animals, which live between Heaven and Earth, so the quality of leather naturally embodies the qualities of Heaven and Earth. Metal comes from underground ore, so the quality of metal naturally embodies the quality of Earth. Stone originates in mountains, and water flows over stone—the quality of stone naturally connects with the quality of water.
This is the cosmology of "the oneness of the myriad things" ($\text{wàn wù yī tǐ}$). Under this view, the correspondence between instrument qualities and cosmic elements is not coincidental or forced; it is the natural manifestation of the unity of all things.
The Yijing, Great Treatise, First Part, states:
"That which is above form is called the Dao; that which is below form is called the vessel ($\text{qì}$)."
The "vessel" ($\text{qì}$) of music (the instrument), though below form, presents an "imagery" ($\text{xiàng}$, quality) that points toward the formless Dao (the principle of Heaven and Earth). "The Imagery of Sound and Music" serves as the bridge from "vessel" to "Dao"—through the specific qualities of musical instruments, one perceives the abstract principle of Heaven and Earth.
Section 3: Why the Drum is the "Sovereign" ($\text{jūn}$) and Not the "First" ($\text{shǒu}$) or "Master" ($\text{zhǔ}$)?
Third Question: Why does Xunzi use the term "sovereign" ($\text{jūn}$) instead of "first" ($\text{shǒu}$) or "master" ($\text{zhǔ}$) to denote the drum's position in the ensemble?
This question, seemingly minor, is deeply significant.
"First" ($\text{shǒu}$) means head, foremost, or number one. Saying "Is not the drum the first of music?" ($\text{gǔ qí yuè zhī shǒu xié}$?) would imply only that the drum is first in sequence.
"Master" ($\text{zhǔ}$) means host, director, or principal. Saying "Is not the drum the master of music?" ($\text{gǔ qí yuè zhī zhǔ xié}$?) would imply that the drum is the controlling agent.
But "Sovereign" ($\text{jūn}$) in pre-Qin usage carries rich political and ethical connotations. A "sovereign" is not just a ruler but a moral exemplar, the core of order, and the source of group cohesion. Xunzi: The Way of the Ruler states:
"The ruler is the source ($\text{yuán}$) of the people. If the source is pure, the stream is pure; if the source is turbid, the stream is turbid."
The same chapter also states:
"The ruler is the model ($\text{yí}$). The people are the shadow ($\text{yǐng}$). If the model is correct, the shadow is correct."
By calling the drum the "sovereign" ($\text{jūn}$), Xunzi implies that the drum is not only first in rank ($\text{shǒu}$) and in control ($\text{zhǔ}$), but also functions as a moral exemplar for the entire ensemble. The quality of the drum sound determines the quality of the whole ensemble ("if the model is correct, the shadow is correct"), and the rhythm of the drum determines the rhythm of the whole ("if the source is pure, the stream is pure").
This precise terminology reflects Xunzi’s consistent tendency to deeply connect music with politics—the harmony of the ensemble mirrors the political order of the state. An orderly ensemble is like a well-governed state; chaos in music is like chaos in governance.
Section 4: Why Dance Holds the Highest Status in the Acoustic System
Fourth Question: Why does dance, rather than the drum or song, hold the highest status in the system of sound and music, its quality being "combining the intent of the Dao of Heaven" ($\text{yì tiān dào jiān}$)?
Superficially, the drum is the "sovereign of music," suggesting the highest status. However, upon closer examination, the drum’s quality is only "grand beauty" ($\text{dà lì}$)—this is just one aspect of the Dao (Heaven’s quality). The bell’s "substantiality" ($\text{tǒng shí}$) is Earth’s quality; the chime stone’s "regulation" ($\text{lián zhì}$) is Water’s quality—each captures one aspect of the Dao. Only dance, "combining" ($\text{jiān}$), encompasses the totality of the Dao of Heaven.
Why can dance "combine the intent of the Dao of Heaven"?
Answer 1: Completeness of Medium. Instrumental music uses material media (metal, stone, earth, leather, silk, wood, gourd, bamboo), inherently limited by the material's characteristics—leather sound can only resemble Heaven, metal sound only Earth, stone sound only Water. Song uses the human voice, which is more flexible than instruments but remains confined to the auditory realm. Only dance uses the entire human body as its medium—the entirety of bones, muscles, joints, and facial expressions—encompassing visual, kinesthetic, and auditory (in coordination with music) sensory channels. The more comprehensive the medium, the more comprehensive the content it can express—hence, only dance can "combine" ($\text{jiān}$) the entirety of the Dao of Heaven.
Answer 2: Unity of Movement and Stillness. Instrumental music is primarily "sound"—dynamic and temporal. Dance is primarily "form"—which is spatial (occupying space) and temporal (changing over time). Dance unifies space and time, movement and stillness, form and sound—this unification is the characteristic of the Dao of Heaven (which possesses both spatial vastness and infinite time, both static order and dynamic change).
Answer 3: Degree of Human Participation. Instrumental music expresses through objects—humans manipulate instruments to produce sound. Song expresses directly through the human voice. Dance expresses through the entire human body—the person participates directly with their whole being. From "through objects" to "through sound" to "through the body," the degree of human participation increases. The highest participation yields the strongest expression; the strongest expression is best able to "combine" ($\text{jiān}$) the totality of the Dao of Heaven’s intent.
Answer 4: Visibility. The sound of instruments is audible but invisible. The sound of song is also audible but invisible. Only the movements of dance are both visible and audible (coordinated with music). "Imagery" ($\text{xiàng}$) is visible—the tradition of "observing imagery" ($\text{guān xiàng}$) in the Yijing emphasizes the visibility of "imagery"—"When established in Heaven, they become imagery ($\text{xiàng}$)" (Heaven presents itself through visible celestial signs). Dance presents the Dao of Heaven through visible bodily movements—this is why the "imagery" of dance surpasses the "imagery" of mere sound.
Section 5: Why "Not Seeing Oneself, Not Hearing Oneself"?
Fifth Question: Why does Xunzi consider "not seeing oneself, not hearing oneself" ($\text{mù bù zì jiàn, ěr bù zì wén}$) the highest state of dance? Why not aim for precise self-control?
This question touches upon the core Confucian discussion of the relationship between "nature" ($\text{xìng}$) and "effort" ($\text{miǎnqiáng}$).
Logically, the highest state of dance might seem to be flawless self-control, with the dancer constantly monitoring every movement to ensure zero error. However, Xunzi argues the opposite: "not seeing oneself, not hearing oneself," meaning no self-monitoring, is the highest state. Why?
Because "self-monitoring" implies a separation between the actor and the action—"I" am watching "my movements"; "I" and "my action" are divided. While this division allows for error correction (because one can see the error), it limits the fluency and naturalness of the action—one must constantly switch between "thinking" and "doing," judging and adjusting, making truly fluid action impossible.
Conversely, "not seeing oneself, not hearing oneself" implies a complete unity between actor and action—no distance, no division, no switching between "thinking" and "doing." The dancer is the dance—the dance is simply happening. Under this condition, movement is most fluid, most natural, and most precise, because there is no interference from conscious deliberation.
This state resonates with Confucius's statement on reaching seventy: "At seventy, I could follow what my heart desired without overstepping the boundaries ($\text{bù yú jǔ}$)." (Analects, Wei Zheng)
"Following what the heart desires without overstepping the boundaries" ($\text{cóng xīn suǒ yù bù yú jǔ}$) is not about suppressing desire to obey the rules, but about the desire itself becoming integrated with the rules—what the heart desires is what the boundary requires.
The dancer’s "not seeing oneself, not hearing oneself" is the bodily equivalent of this—bodily actions flow naturally and perfectly align with the measures of the bell and drum. It is not that the body is forced to obey the rhythm, but that the body’s natural movement is the correct rhythm. This is the ultimate achievement of cultivation—where "nature" ($\text{xìng}$) and "artifice" ($\text{wěi}$) are perfectly merged, and naturalness and regulation are completely unified.
Section 6: Why Sound and Music Can "Move Deeply" and "Transform Quickly"
Sixth Question: Why are sound and music, more than other means of moral instruction (such as language or law), able to "move people deeply" ($\text{gǎn rén shēn}$) and "transform people quickly" ($\text{huà rén sù}$)?
Xunzi asserts that "the entry of sound and music into man is deep, and its transformation of man is fast." The basis for this assertion lies in:
Answer 1: Direct Appeal to Emotion. Verbal instruction relies on rationality—one must first understand the meaning of the words, then judge their correctness, and finally decide whether to accept them. This process involves "understanding—judgment—decision," where resistance can arise at any stage. Legal instruction relies on fear—people obey for fear of punishment, but their hearts may not agree. Sound and music are different—hearing a solemn drum sound, one does not need to "understand" its meaning or "judge" its correctness; reverence arises naturally. This is why it "enters deep" ($\text{rù rén yě shēn}$)—it penetrates directly to the core of emotion, bypassing rational mediation.
Answer 2: Collective Contagion. Instruction from one person to another is limited; the promulgation of a law, while public, depends on specific enforcement. Sound and music, however, can simultaneously move hundreds or thousands of people. As Xunzi states, "When music is performed in the ancestral temple, the ruler and ministers, the high and the low, listen together, and none is not harmonious and respectful." They "listen together" ($\text{tóng tīng zhī}$)—hearing the same music simultaneously, being moved by the same emotion simultaneously. This is the collective infectiousness of music, explaining why it "transforms fast" ($\text{huà rén yě sù}$).
Answer 3: Mobilization of All Senses. Speech primarily engages hearing (requiring comprehension); law primarily engages intellect. Music engages hearing (the sound), sight (the dance), kinesthesia (unconscious bodily movement), and even touch (feeling the vibration of low-frequency drums). The mobilization of all senses grants music a pervasive influence unmatched by means relying on a single sense.
The Record of Music states:
"Music is that which is an unchangeable aspect of feeling ($\text{qíng}$)."
The reason sound and music "move deeply and transform quickly" is that they touch the most fundamental aspect of human feeling: the core that precedes rationality, judgment, and all cultural constructs.
Section 7: Why Refute Mozi’s "Against Music" ($\text{fēi yuè}$)?
Seventh Question: Why did Xunzi specifically refute Mozi’s doctrine of "Against Music" in On Music? What exactly was wrong with Mozi’s argument?
Mozi’s argument against music is found in Mozi: Against Music, Part One:
"Furthermore, when a benevolent man ($\text{rén zhě}$) calculates for the world, he does not calculate based on what pleases the eyes, what pleases the ears, what pleases the mouth, or what pleases the body for comfort; he does not use resources gained by plundering the people’s clothing and food to achieve these ends. A benevolent man does not do this."
Mozi’s fundamental argument is: Music consumes vast human and material resources ($\text{yī lì wù}$), yet it does not increase material wealth; instead, it "plunders the people’s clothing and food resources" ($\text{kuī duó mín yī shí zhī cái}$), so the benevolent man should not engage in it.
Xunzi refutes this by stating:
"Now, music ($\text{yuè}$) is enjoyment ($\text{lè}$), an essential aspect of human feeling that cannot be avoided. Therefore, man cannot be without music ($\text{rén bù néng wú yuè}$)."
The core of the refutation is: Man must have music; if music is forcibly abolished, human emotions will find harmful outlets instead.
From the perspective of the "Imagery of Sound and Music," the fundamental error in Mozi’s argument is that Mozi only saw the "material cost" of music but failed to see its "spiritual benefit" (cultivating the heart, coordinating society, communicating with Heaven and Man). The "imagery" of sound and music—grand beauty, substantiality, regulation, harmony, fierceness, ampleness, goodness, femininity, purity, and encompassing the Dao—each quality is necessary for social harmony. Without the drum’s "grand beauty," society lacks sublime aspiration; without the bell’s "substantiality," society lacks a solid foundation; without the chime stone’s "regulation," society lacks measured restraint. The social function of music far outweighs its material cost.
Xunzi further states:
"Therefore, music is the means by which the Dao is expressed. Metal, stone, silk, and bamboo are the means by which morality is realized. When music is practiced, the people turn toward the right path. Therefore, music is the greatest means of governing the people."
"Metal, stone, silk, and bamboo are the means by which morality is realized" ($\text{jīn shí sī zhú, suǒ yǐ dào dé yě}$). "When music is practiced, the people turn toward the right path" ($\text{yuè xíng ér mín xiāng fāng yǐ}$). "Music is the greatest means of governing the people" ($\text{yuè zhě, zhì rén zhī shèng zhě yě}$).
This judgment elevates music from mere entertainment to the highest political tool. Mozi viewed music as useless consumption; Xunzi viewed it as an essential instrument of governance—the difference lies in their understanding of the "Imagery of Sound and Music"—the qualities and functions of music.