
Preface

Abstract

The subject of this thesis is a software toolkit to visualize and transform audio and other signals. The presented solution to this intramedia problem uses visual computing to enhance auditory perception. I have developed a reusable software system called tsKit using high-end software technology for recent graphics workstations (in particular from Silicon Graphics). For my purposes the term signal denotes a function of time; thus, although 'time signal' may appear redundant, I use this phrase selectively to stress the importance of time as a dimension.

The main software engineering task was to generalize a mono-media framework architecture (the 3D graphics library Open Inventor) into a multimedia library capable of synchronous real-time rendering of signals as auditory and visual output. Time is established as an integrated dimension for rendering in Open Inventor.

The result of this work is a collection of objects that are integrated into the Open Inventor architecture and can be used like built-in objects by scripting scene descriptions or by C++ programming. The extension offers specific features for visualizing, computing and playing back sound signals. The tools produced in this work have many potential applications in audio signal analysis and processing, speech recognition, computer music and other fields of research.
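To illustrate this style of usage, the following C++ sketch builds a small Open Inventor scene graph and marks where a tsKit node would appear alongside the built-in nodes. It is only a minimal sketch: the node name TsWaveform and its audioFile field are hypothetical placeholders, not the actual class names documented in the reference manual; the Open Inventor calls themselves are standard.

    // Minimal sketch (not code from the thesis): TsWaveform and its audioFile
    // field are hypothetical stand-ins for tsKit classes; everything else is
    // standard Open Inventor.
    #include <Inventor/SoDB.h>
    #include <Inventor/nodes/SoSeparator.h>
    #include <Inventor/nodes/SoMaterial.h>

    int main()
    {
        SoDB::init();                        // initialize the Open Inventor database
        // TsKit::init();                    // an extension would register its node classes here

        SoSeparator *root = new SoSeparator; // scene graph root
        root->ref();

        SoMaterial *mat = new SoMaterial;    // a built-in node ...
        mat->diffuseColor.setValue(0.2f, 0.8f, 0.4f);
        root->addChild(mat);

        // ... followed by a hypothetical tsKit node that renders a sound
        // signal as geometry, used exactly like any built-in node:
        //   TsWaveform *wave = new TsWaveform;
        //   wave->audioFile.setValue("example.aiff");
        //   root->addChild(wave);

        root->unref();
        return 0;
    }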

Motivation

The following subsections describe the motivations behind this work.

Making Perception Perceivable

One motivation comes from the field of auditory perception, also called psychoacoustics or auditory scene analysis [5]. Psychoacoustics is the part of acoustics that deals with the perception of acoustic waves, not their physics. In its most general formulation, the subject of psychoacoustics is the auditory systems of humans and animals and how these systems allow their possessors to experience the world. A central question of this field is how received stimuli become perception.

Approaches to such issues in psychoacoustics can be physiological, psychological, musical, and technological. The same auditory phenomena can be simply experienced or become the subject of analytic study. Computer models can be used to make the process of perception transparent to consciousness. This project promises to provide a metalanguage for formulating new feedback loops between different sensory fields.

My personal motivation has been to make auditory perception visually perceivable, in a way that fosters its use as a creative tool. I have built an open software tool that can create videos which enhance, through visual images, our ability to understand auditory stimuli. The result is a tool for research and creative endeavors rather than a particular theory or application.

The Walkman Effect: Guiding Video with Audio

I am sitting in the tram, listening to my walkman. The tram starts moving just as a new song begins on the tape. Outside, a couple of pigeons fly into the air shortly after the strings begin to play.

It often happens that a visual and an auditory event occur simultaneously, coinciding without a common source, sense or cause. Constructing relationships between these events gives me great joy. I like to imagine they coincide not randomly but for a reason; I love the moment at which apparent randomness falls away to reveal symmetry. Each sense of the stream of time comments on the other. With the help of a computer I want to create this synaesthetic effect: one medium guides the perception of another.

In [11] and [12] Shuhei Hosokawa analyzes a similar phenomenon of co-perception, which he calls the Walkman effect.

It is not yet possible to manipulate video within this framework, but this is one of my ultimate goals. I hope that this work will lay a foundation that will eventually enable me to reach this goal.

Computer Code as a Variation Machine and a Metamedium

In [15] Gene Youngblood, Woody Vasulka and Peter Weibel discuss what role a computer program can play in the art of cinema. They generalize cinema beyond its medium (film, video, holography and program code) to "[...] the art of organizing a stream of audiovisual events in time."

Thus, the basic phenomenology of the moving image, what Vasulka calls "the performance of the image on the surface of the screen", remains historically continuous across all media. Digital code, for example, has radically altered the epistemology and ontology of the moving image but has not fundamentally changed its phenomenology. There are no digital images that have not been prefigured in painting, film and video. With the code we can only summarize them, elaborate and unfold them, or exercise modalities. Vasulka calls the code a variation machine. There are no new classes of images, there are only new variations and new epistemological and ontological conditions for generating and witnessing those variations.

Later they write:

"[...] aesthetic strategies invented 100 years ago in photography and cinema - scaling, perspective/negative reversals, wipes, mattes - have now become machine elements whose operations are trivially invoked through the preset button. It is a question of primitives. The code is a metamedium: through it, high-level aesthetic constructs from previous media become the primitives of the new medium. This influences which aesthetic strategies will be emphasized." [emphasis added by author]

It is hard work to build a model inside a computer that lets it 'understand', even slightly, the information it handles. How, then, can we make aesthetic use of a computer, and especially of its graphical output capabilities? Why not let it vary something sensible? To me, most videos generated by digital code suffer from being low in dimensionality compared to things created in the 'real world'. I want to take advantage of the fact that a computer handles everything as uniform information, bits and bytes, yet has good access to human sensory material: audio and video. It can reduce audio and video to information and turn information back into audio and video. It can thus become a meta-instrument: an abstract instrument that incorporates other instruments.

Building an Intramedia Framework

Guided by the motivations mentioned above, I looked for a suitable tool. Because I did not find free software offering media-independent, fully extensible real-time rendering with orthogonal functionality, I decided to build my own in this thesis.

Goals of This Work

In short, my goal is to visualize audio. The goal of this computer science thesis is to develop a software system that efficiently supports various ways of visualizing audio using animated 3D graphics and video. The software should provide real-time auditory and visual feedback, support direct gestural control, and allow interactive creation and modification of mappings between the two media.

In the fields exploring the perception of audio (psychoacoustics, auditory scene analysis, electroacoustic music), visualization is an undervalued opportunity to model, verify and communicate concepts of reception and perception. This work attempts to build an easy-to-use framework that factors out the generic parts of this task, to make visualization a more widely used instrument for science and research in fields related to audio and music.

Structure of this Document

The thesis A Real-Time 3D Visualization Toolkit for Sound and other Signals comes in two parts.

Part 1: Thesis: An Open Inventor based Visualization-Toolkit for Sound and other Signals

Chapter Overview gives an overview of basic aspects of the system's architecture, its usage and applications.

Chapter Introduction is a short chapter that summarizes the requirements of the system and presents related work.

Chapter System Design deals with the software engineering aspects of this project and the base system it is integrated into. It discusses the concepts for storing, rendering, transforming and controlling different time media, both in real time and outside real time.

Chapter Conclusions draws conclusions and presents ideas for future work.

Part 2: The tsKit Reference Manual

The tsKit Reference Manual documents how the design is implemented in concrete C++ classes and applications. It describes aspects that are specific to the implementation of particular classes and class families, and provides the detailed documentation of the C++ classes developed in this work. This part is intended for developers who want to use the tsKit.

Acknowledgments

Thanks to David Wessel (Director), Adrian Freed (Director of Research) and Matt Wright (Musical Applications Programmer) for inviting me to the Center for New Music and Audio Technologies (CNMAT), and for the fruitful discussions and advice. Thanks to Michael Warner for catalyzing my ideas. Thanks to Gibson Guitars and Silicon Graphics for equipment support.

Prof. Dr. Godbersen was my helpful advisor at the Technische Fachhochschule Berlin, Germany.

Figure 1.1: CNMAT, located at 1750 Arch St., Berkeley, USA, near the campus of the University of California, Berkeley (UCB). CNMAT is a satellite of the UC Berkeley Department of Music.



Andreas Luecke
Mon Sep 15 10:08:08 PDT 1997