Music MetaData Quality: a multiyear Case Study using the Music of Skip James

Adrian Freed

CNMAT, UC Berkeley, 1750 Arch Street, Berkeley, CA 94709, USA

adrian@cnmat.berkeley.edu

 

ABSTRACT

The case study reported here is an exploratory step towards developing a quantitative system for audio and music metadata quality measurement. Errors, their sources, and their propagation mechanisms are carefully examined in a small but meaningful subset of music metadata centered on a single artist, Skip James.

1.         Introduction

Internet music portals such as the Apple iTunes Music Store, Yahoo! Music, Google Music Search and Amazon develop their offerings by aggregating and synthesizing music metadata of many types from many sources. These include biographical data about composers and performers, album track listings, UPC codes, CD track hashes, album cover graphics, related-artist and related-song links, product pricing data, etc. The quality of this metadata varies considerably between competing vendors in terms of accuracy and completeness. Unfortunately, there are no coordinated efforts to measure metadata quality independently, and the metadata industry has prioritized other aspects of the business such as timeliness, recommendation systems and selling advertising space.

The effects of inaccurate metadata are numerous. There is the direct impact on frustrated consumers and lost transactions for suppliers. There is the long-term problem of dilution of brand equity and competitive position as consumers lose confidence in their information sources. There is an amplification of errors as information flows from “trusted” sources into derivative publications. Finally, the music information retrieval research and application communities rely on symbolic metadata for validation and training of machine-learning-based music classification systems.

The main goal of the case study reported here is to identify, log and analyze errors, their sources and their propagation mechanisms in a small subset of music metadata. The idea is not to vilify or blame the individuals or organizations involved in the errors that are discussed. It would be unreasonable to generalize the results here to data sources that cover hundreds of thousands of artists. However, the surprisingly high number and variety of errors should serve to wake up the music metadata and online music distribution industry to the scope of the problem.

The music of Skip James was chosen because:

•    Data is available: the author has good access to primary reference materials.

•    Good temporal coverage: the performer was active at the beginning of the recorded music industry and again in the 1960’s, and his recordings are still being released.

•    Wide variety of metadata: the artist is popular enough that there are many kinds of metadata and derivative works, including videos, a biographical book, sheet music, calendars, etc.

•    Tractable amount of data: the artist was neither too popular nor too active.

•    Access to experts to corroborate information.

2.        Skip James

2.1.    Biographical Data


 

| Source | Name | Date of Birth | Place of Birth | Residence | Date of Death | Place of Death | Cause of Death |
| --- | --- | --- | --- | --- | --- | --- | --- |
| All Music Guide | Skip James | June 9th, 1902 | Bentonia, MS | | Oct. 3rd 1969 | Philadelphia, PA | |
| All Music Guide | Nehemiah James | June 21, 1902 | Bentonia, MS | | Oct. 3rd 1969 | Philadelphia, PA | |
| Calt [1] | Nehemiah Curtis James | June 21, 1902 | Yazoo City, MS | Woodbine | Oct. 3rd 1969 | Philadelphia, PA | Cancer of Penis |
| MusicBrainz.org | Skip James | June 21, 1902 | | | Oct. 3rd 1969 | | |
| Edward Komarra [2] | Nehemiah Curtis “Skip” James | June 21, 1902 | Yazoo City, MS | Woodbine | Oct. 3rd 1969 | Philadelphia, PA | |
| Charters [3] | | June 9th, 1902 | Whitehead Plantation | Whitehead | | | |
| Stambler [4] | | June 9th, 1902 | Bentonia, MS | | | | Cancer |
| Muze | | June 9th, 1902 | Bentonia, MS | | | | |
| Calt [5] | | | | Whitehead | | | |
| Bonnie Raitt [6] | | | | | | | Stomach Cancer |
| Larry Hoffman [7] | | | Woodbine Plantation | | | | |

Table 1   Biographical Metadata


2.1.1.  Name

An artist’s name is the most important biographical metadata element. Although the history of Mr. James’s names is relatively simple, it has proved tricky enough to contribute to an error with wide-ranging consequences: the existence in the AMG data of two distinct artist entries for the same person. This problem has recently been partially corrected after many years.

Skip James’s given name was “Nehemiah Curtis James”. According to him, he was given the nickname “Skippy” because he danced a lot as a child. Charters says vaguely that “Skippy” became “Skip” when he was a young man. Calt [1] says that Speir inadvertently introduced James to Laibly as “Skip”. The primacy of “Skip James” in the metadata stems from its use by the record company on the labels of the original 78 RPM records. The introduction of the book “Skip James: Blues Collection” [8] includes a phrase that serves as a good reminder of the importance of seeking out primary sources for factual data:

“Born Nehemiah Curtis “skip” James on June 21, 1902 in Yazoo City Mississippi, he was brought up as an only child on the Woodbine plantation….” This strange and misleading sentence construction, implying that his naming and birth occurred at the same time, is quite common in biographies, e.g. Memphis Minnie, ‘Born Lizzie "Kid" Douglas on June 3, 1897 near New Orleans’ [9].

Many artist naming problems can be solved simply by maintaining an exhaustive list of pseudonyms. Problems arise if a pseudonym on this list is shared with a different artist. In this case it is useful to use another attribute of the artist, for example the date of birth, to avoid confusion. This practice is not widespread. A well-known consequence involves three “Willie Browns” who played guitar in the South in the 30’s and 40’s. Most of the writings about these artists simply say “Willie Brown”, leaving the reader to guess whether the reference is to the Willie Brown who played with Memphis Minnie, the one who played with Charley Patton, or the one who recorded the classic “Mississippi Blues”. Sometimes it is not possible to use the birth date as a disambiguating attribute because it is not known.

An interesting case of this involves a pair of guitar players both nicknamed “Blind Blake”. One, Blake Alphonso Higgs from the Bahamas, had modest recognition in the 1950’s. His biographical data is readily available. The other was an extremely popular guitar player in the 1930’s with over 60 recordings on Paramount Records. We know practically nothing about him: not his real name, birthday or origins, nor the circumstances or date of his death. A consequence of this ambiguity could be seen in a search on EMusic for “Blind Blake”. Until EMusic recently dropped the music of Blake Alphonso Higgs, a search listed CD’s from both artists, and the pages for each of these CD’s linked back to a single artist entry of that name.
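The pseudonym list and disambiguating attribute described above can be sketched as a small registry. This is a minimal illustration, not a description of any vendor's system: the `Artist` class, the `artist_id` keys and the free-text `note` field are hypothetical, and the birth date shown for Skip James is the one given by Calt [1].

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Artist:
    """One real person with every name they are known by."""
    artist_id: str               # hypothetical internal key, not a real standard
    names: Tuple[str, ...]       # primary name first, then pseudonyms and nicknames
    born: Optional[str] = None   # date of birth if known, else None
    note: str = ""               # hypothetical fallback disambiguator


def find_artists(registry, name):
    """Return every artist that has used `name` (case-insensitive)."""
    wanted = name.casefold()
    return [a for a in registry if wanted in (n.casefold() for n in a.names)]


def describe(artist):
    """Label an artist with a disambiguating attribute when one exists."""
    if artist.born:
        qualifier = f"b. {artist.born}"
    else:
        qualifier = artist.note or "unknown"
    return f"{artist.names[0]} ({qualifier})"


REGISTRY = [
    Artist("a1", ("Skip James", "Nehemiah Curtis James", "Skippy"), born="1902-06-21"),
    Artist("a2", ("Blind Blake", "Blake Alphonso Higgs"), note="Bahamas, 1950s"),
    Artist("a3", ("Blind Blake",), note="Paramount, real name unknown"),
]

for match in find_artists(REGISTRY, "Blind Blake"):
    print(match.artist_id, describe(match))
```

A search for a shared name returns both entries, so the caller has to present the qualifier instead of silently merging them into a single artist page.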

Table 2 illustrates some of the many challenges associated with artist names in metadata. Note that artist names are best represented as signals, in the sense that they are functions of time. Not mentioned in the table are the considerable challenges associated with international character sets.


| Challenge | Example |
| --- | --- |
| Pseudonym | Robert Zimmerman/Bob Dylan |
| Contraction | Bob Dylan/Dylan, Carlos Santana/Santana, Blind Blake/Blind Arthur Blake |
| Pseudonym evolution | Prince/The Artist/Prince |
| Fan or popular nickname | The Artist/The Artist Formerly Known As Prince; Louis Armstrong/Satchmo |
| Aliasing | Willie Brown/Willie Brown/Willie Brown, Blind Blake/Blind Blake |
| Multiple title | Blind Gary Davis/Reverend Gary Davis |
| Abbreviations | Rev. Gary Davis/Reverend Gary Davis |
| Offensive | Fuc* |
| Surname qualifiers | Hank Williams/Hank Williams Jr. |
| Character set | ??? |
| With others | Martin and Jessica Simpson/Jessica Simpson |
| Associations | Count Basie and his Orchestra |
| Spelling | Charlie Patton/Charley Patton |
| Acronyms | TAFKAP |

Table 2   Author Naming Challenges


2.1.2.  Dates of Birth

In the case of Skip James in the All Music Guide data we have the opposite problem to contend with: one real Skip James with two artist entries, each with a different birthday. The two dates of birth arise in the biographical writings of two reputable scholars, Stephen Calt [1] and Sam Charters [3], neither of whom cites the source of his date. Charters wrote his date into a book thirty years ago, while Skip James was alive. It is surprising, therefore, that the date persists in current metadata despite the more extensive and more recent biographical work of Stephen Calt.
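As a first step towards a quantitative measure, the birth dates in Table 1 can be grouped by value to show how evenly the sources split. The sketch below is illustrative only: the function names are assumptions, the dates are transcribed from Table 1 in ISO form, and blank cells are left out.

```python
from collections import defaultdict

# (source, date of birth) pairs transcribed from Table 1; blank cells omitted.
BIRTH_DATES = [
    ("All Music Guide (Skip James)", "1902-06-09"),
    ("All Music Guide (Nehemiah James)", "1902-06-21"),
    ("Calt [1]", "1902-06-21"),
    ("MusicBrainz.org", "1902-06-21"),
    ("Komarra [2]", "1902-06-21"),
    ("Charters [3]", "1902-06-09"),
    ("Stambler [4]", "1902-06-09"),
    ("Muze", "1902-06-09"),
]


def group_by_value(pairs):
    """Map each reported value to the list of sources that report it."""
    groups = defaultdict(list)
    for source, value in pairs:
        groups[value].append(source)
    return dict(groups)


def agreement(pairs):
    """Fraction of sources backing the most common value (1.0 = unanimous)."""
    groups = group_by_value(pairs)
    return max(len(sources) for sources in groups.values()) / len(pairs)


for value, sources in sorted(group_by_value(BIRTH_DATES).items()):
    print(value, len(sources), sources)
print("agreement:", agreement(BIRTH_DATES))
```

A low agreement score flags a field for review; it says nothing about which value is right, since a widely copied error can easily outnumber the correct value.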

2.1.3.  Place of Birth

From Stephen Calt’s writings we can infer that the Woodbine and Whitehead plantations are one and the same, the former designation referring to the place and the latter to the owner at the time. Stephen Calt has used both designations in his writings [5].

2.1.4.  Death

The most consistently reported biographical information is Skip James’s date and place of death. Since his death was not controversial and occurred in a relatively modern hospital, there is reason to believe this information is accurate. Although his death was definitely related to the cancer he suffered from for many years, the nature and type of this cancer are not consistently reported.

3.        Recordings

3.1.    1931


| Title | Paramount Matrix # | Paramount Catalog # | Yazoo 2009 | AMG | Wolf WBJ-CD-009 | Biograph BLP-12029 | Document DOCD-5005 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Devil Got My Woman | L0746-1 | 13088A | | | | | |
| If You Haven't Any Hay Get on Down the Road | L0766-1 | 13066B, +Decca Champion 50031 | | Down the... | | If You Haven’t Any Hay | Haven t |
| Hard Luck Child | L0751-2 | 130106A | | | Hard-Luck | | Hard-Luck |
| Drunken Spree | L0758-2 | 130111 | | | | | |
| Little Cow and Calf Is Gonna Die Blues | L0763-1 | 13085B | | | | Little Cow & Calf | |
| Be Ready When He Comes | L0755-2 | 13108A | | | | | |
| How Long "Buck" | L0761-1 | 13085A | | | | | Long  Buck |
| I'm So Glad | L0759-1 | 13098A | | | | | |
| Cherry Ball Blues | L0748-2 | 13065A | | | | | |
| Hard Time Killin' Floor Blues | L0752-2 | 13065B | | | | | |
| 22-20 Blues | L0765-1 | 13066A, +Decca Champion 50031 | | | | | |
| Four O’Clock | L0750-1 | 13106B | 4 O'Clock | | 4 O'Clock | | |
| Jesus Is a Mighty Good Leader | L0754-1 | 13108B | | | | | |
| Yola My Blues Away | L0756-1 | 13072 | | | | | |
| What Am I to Do Blues | L0764-1 | 13111 | | | | | |
| Special Rider Blues | L0760-2 | 13098B | | | | | |
| Illinois Blues | L0749-1 | 13072 | | | | | |
| Cypress Grove Blues | L0747-2 | 13088B | | | | | |
| | | | | | | Throw Me Down | |

Table 3   1931 Recordings

3.1.1.  Titles

Readily available CD transfers of Skip James’s 1931 recordings were created from 78 RPM records, many of which are extremely rare. For example, there is only one known disc of “Illinois Blues”. The titles of these recordings correspond mostly to those found on the labels of the original 78’s and in record company logs. The differences noted in Table 3 appear to be due to:

•    a printing problem with quotation marks in the case of Document DOCD-5005;

•    a choice to incorrectly use the numeral 4 instead of the word Four in “Four O’Clock”;

•    hyphenation of “Hard Luck” in “Hard Luck Child”;

•    truncation due to space constraints on the label and/or sleeve of BLP-12029.

The titles above were taken from printed CD inserts or album covers, as it is now common practice to omit track listings on the CD itself, e.g., Yazoo 2009. This is the opposite of the common practice in the 1930’s, when 78 RPM record sleeves usually contained no information specific to the content.
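Several of the title differences listed for Table 3 are mechanical and can be detected by comparing normalized keys instead of raw strings. The sketch below is one possible approach under stated assumptions: the function names and the one-entry `NUMBER_WORDS` table are hypothetical, and the key is meant for matching only, never for display.

```python
import re

# Hypothetical, deliberately tiny substitution table for numerals in titles.
NUMBER_WORDS = {"4": "four"}

# Straight and curly quotation marks and apostrophes, removed before comparison.
QUOTES = "\"'\u2018\u2019\u201c\u201d"


def normalize_title(title):
    """Reduce a track title to a comparison key."""
    key = title.casefold().replace("&", " and ")
    key = key.translate({ord(c): None for c in QUOTES})
    key = re.sub(r"[^a-z0-9]+", " ", key)   # hyphens, commas, etc. become spaces
    words = [NUMBER_WORDS.get(w, w) for w in key.split()]
    return " ".join(words)


def same_title(a, b):
    """True when two titles differ only in punctuation, case or numerals."""
    return normalize_title(a) == normalize_title(b)


def is_truncation(short, full):
    """True when `short` is a proper word-level prefix of `full`."""
    s, f = normalize_title(short).split(), normalize_title(full).split()
    return 0 < len(s) < len(f) and f[:len(s)] == s


print(same_title("4 O'Clock", "Four O\u2019Clock"))
print(same_title("Hard-Luck Child", "Hard Luck Child"))
print(is_truncation("Little Cow & Calf", "Little Cow and Calf Is Gonna Die Blues"))
```

This does not recover every variant: the Document spelling “Haven t”, where an apostrophe was printed as a space, still yields a different key from “Haven't”, and deliberate retitlings need a separate synonym list.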

Song (1930’s title)

Vanguard 79517-2

VCD 77/88

GCD9910 piano

GCD9901

BCD122

Paramount

Alabama Bound/Elder Greene

 

 

All Night Long

All Night Long

All Night Long

If You Haven't Any Hay
Get on Down the Road

 (BCD122)

Look Down The Road

 

 

Look down the road
(w. drummer)

I Don’t Want A Woman To Stay Out All Night Long

+BCD 107

 

Hard Time Killin’ Floor

Hard Time Killing Floor Blues

Hard Time Killing Floor

 

Hard Time Killin’ Floor Blues

Hard Time Killing Floor
+BCD 107

 

 

 

 

 

Worried Blues

Skip’s Worried Blues

 

One Dime Blues

 

 

 

Broke & Hungry

 

 

Deep Blue Sea Blues

Catfish Blues

 

 

Catfish

Catfish Blues

 

 

Crow Jane

 

 

Crow Jane

 

 

 

How Long Blues

 

How Long Blues

 

 

 

 

 

 

 

 

 

Special Rider Blues

 

Cherry Ball Blues

 

 

 

Cherry Ball Blues

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Black Gal

Black Gal

 

 

 

 

 

 

Worried Blues

Skip’s Worried Blues

 

 

Little Cow, Little Calf Blues

 

Little Cow and Calf

 

 

 

 

Center Blues

 

 

Washington D.C. Hospital Center Blues

 

 

 

Table 4   1960’s Name Variants 1 of 2

The power to choose a song title (and artist pseudonym) rested in those days with the record company, not the performer, and confusions and errors abound. It was common for the same artist to perform under different pseudonyms to evade the constraints of an exclusive recording contract, and it was common for titles to be chosen to avoid copyright issues when artists covered each other’s hits. Skip James is reported to have said that “How Long ‘Buck’” was mislabeled as a buck dance [5]. It may be that the awkward title “If You Haven’t Any Hay, Get on Down the Road” was chosen to avoid legal disputes over one of the many variants of the song cluster known variously as “Alabama Bound/All Night Long/Elder Greene/Don’t Leave Me Here.” In Skip James’s recordings of the 1960s the song is named “All Night Long”, and the album listing on Biograph BLP-12016 has “All Night Long (If you Haven’t Any Hay).”

Skip James sings the words “special lover” on his 1960’s live recording of “Special Rider”, probably because he knew his northern audience would not understand his colloquial use of the term “rider”. Document Records’ choice to name the song “Special Lover” results in the loss of the connection between this recording and other recordings of the same song. This indicates the importance of synonym metadata for song titles as well as authors. Such connections are not just relevant to consumers considering which song to buy; they also have rights management implications. Unfortunately, the data used by organizations managing and licensing those rights, such as ASCAP, BMI, and Songfile, is no more reliable than the metadata sources, so it cannot be used to clarify or correct them. Space limitations prevent a complete analysis of these errors here, but the most important one is that the data is incomplete because of notoriously poor record keeping and the numerous songs that had no copyright filing at all.

3.1.2.  Attribution error

The attribution of “Throw Me Down” to Skip James on BLP-12029 is a mistake. According to the album notes, this track was from a “test pressing, previously unissued, bearing Skip James’s name”. Fortunately this error did not propagate beyond that album, and available CD’s contain the 18 currently recovered 1931 recordings. It is highly unlikely, but not impossible, that other 1931 recordings will be discovered, as Skip James claimed he made 26 in that year [1].


 

Song (1930’s title)

VMD79705-2

VMD79219

DOCD-5149

DOCD-5633

DOCD-5634

Alabama Bound/Elder Greene

 

All Night Long

 

 

 

 (BCD122)

 

Look Down The Road

I Don’t Want A Woman
To Stay Out All Night Long

Look Down The Road

 

Hard Time Killin’ Floor

 

Hard Times Killing Floor Blues

 

Hard Time Killin’ Floor Blues

 

One Dime Blues

One Dime Was All I Had

 

 

 

 

Deep Blue Sea Blues

 

 

 

 

Catfish Blues

 

 

Crow Jane

Someday You Gotta Die

 

 

 

 

How Long

How Long Blues

 

 

 

 

Special Rider Blues

Special Lover Blues

 

 

 

 

Cherryball

 

Cherry Ball Blues

Cherry Ball Blues

 

Oh, Mary Don’t You Weep

 

Mary Don’t You Weep

 

 

 

 

My Gal

 

 

Hard Headed Woman

 

 

Washington D.C. Hospital Center Blues

 

 

Washington D.C. Hospital Center Blues

 

 

 

Sickbed Blues

Sickbed BLues

 

 

 

 

Look at the People

 

 

 

 

 

Hard Luck Child

 

Hard-Luck Child

Table 5   1960’s Name Variants 2 of 2


With the exception of the single misattributed track, the printed information on CD’s, albums and 78 RPM records correctly attributes Skip James as performer on all tracks. However, the path of this information to electronic form is flawed in many ways. The problem is that most track listings have been manually re-keyed. The lack of a universally adopted standard or widely used business process to automate error-free transfer of this information from the record companies has created a small industry of competing sources for the information.

One source is CD wholesalers, who enter the information as part of the process of adding new products to their inventory. Another is CD owners, who enter track information into jukebox applications on their personal computers. The pioneer of this approach is GraceNote (originally called CDDB). Freedb.org, an “open source” competing effort, started in reaction to GraceNote’s assertion of proprietary rights over data that freedb.org considers to have been offered by users on a “free for all” basis. MusicMoz.org and MusicBrainz.org use volunteer human editors to filter errors from submitted information.

3.1.3.  Title Error

Most of the errors from CDDB/freedb.org stem from a weak database schema used in the original data capture system. The original schema did not offer any per-track attributes. This means there is no standardized way of entering the performer name for each track in a compilation. Genres, too, are associated with CD’s instead of tracks. The errors that result can be readily seen for Wolf WBJ-CD-009. This is in fact a compilation of the original 1931 Skip James recordings and five tracks recorded by Jack Owens, an artist from Bentonia purported to be from the same “Bentonia School” of musicians. One illustration of the problem is the title field from freedb.org: “ Skip James & Jack Owens / Skip James and Jack Owens 1931-1981Jack Owens”.
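The difference between a disc-level-only schema and one with per-track attributes can be sketched with two small record types. These classes are illustrative assumptions, not the actual CDDB or freedb.org formats, and the second track title is a placeholder because the Jack Owens titles are not listed here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiscOnlyEntry:
    """Disc-level attributes only: every track shares one artist and genre."""
    artist: str
    title: str
    genre: str
    track_titles: List[str] = field(default_factory=list)


@dataclass
class Track:
    title: str
    performer: Optional[str] = None   # None means "inherit from the disc"
    genre: Optional[str] = None       # None means "inherit from the disc"


@dataclass
class DiscEntry:
    """Disc-level defaults plus optional per-track overrides."""
    artist: str
    title: str
    genre: str
    tracks: List[Track] = field(default_factory=list)

    def performer_of(self, index):
        return self.tracks[index].performer or self.artist


compilation = DiscEntry(
    artist="Skip James & Jack Owens",
    title="Skip James and Jack Owens 1931-1981",
    genre="Blues",
    tracks=[
        Track("Devil Got My Woman", performer="Skip James"),
        Track("(placeholder Jack Owens track)", performer="Jack Owens"),
        Track("(placeholder track with no performer entered)"),
    ],
)

for i, track in enumerate(compilation.tracks):
    print(track.title, "->", compilation.performer_of(i))
```

With `DiscOnlyEntry` there is nowhere to record that the first track is by Skip James alone; with `DiscEntry` the joint credit only applies where nobody entered a per-track performer.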

3.1.4.  Song misattribution

When this CD is inserted into a Macintosh computer, Apple’s iTunes Jukebox program consults GraceNote’s CDDB servers and transfers the data into Apple’s own database format on the user’s computer. Apple’s database has a more general design that does allow for individual track attributes, but it is forced to populate these per-track fields with copies of the per-CD fields of CDDB. The result is that the original 18 Skip James tracks are attributed to both Skip James and Jack Owens. If iTunes is used to convert the CD audio data into MP3 files, the per-track data is transformed into ID3 tags and stored with the compressed audio. If these MP3 files are copied (legally or illegally), the incorrect artist attribute propagates. It is possible to correct this data in the local iTunes database, but there is no reliable mechanism to propagate the change back to GraceNote or to fix ID3 tags in MP3 files circulating on the Internet. The Freedb.org entry for WBJ-CD-009 does allow human users to attribute the correct performer to each song: each track name is preceded by “Skip James:” or “Jack Owens:”.
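A track name that carries its performer as a prefix can be split mechanically only when the performer is already known. The sketch below assumes a caller-supplied set of known performer names and a short list of separators; the function name and both parameters are hypothetical, and the sketch does not describe how iTunes or freedb.org actually process these fields.

```python
def split_performer(track_name, known_performers, separators=(":", " / ", " - ")):
    """Return (performer, title); performer is None when no known name leads."""
    for sep in separators:
        head, found, tail = track_name.partition(sep)
        if found and head.strip() in known_performers and tail.strip():
            return head.strip(), tail.strip()
    return None, track_name.strip()


KNOWN = {"Skip James", "Jack Owens"}

print(split_performer("Skip James: Devil Got My Woman", KNOWN))
print(split_performer("22-20 Blues", KNOWN))
print(split_performer("Skip James & Jack Owens / Devil Got My Woman", KNOWN))
```

Requiring a match against known names keeps a title such as “22-20 Blues” from being split at its hyphen, but it also means the joint credit in the last call is left unresolved, which is the safer failure.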

The approach of building artist names into track names is common in music metadata databases. For example, in July of 2003 an “Artist Search” on EMusic found Yazoo 2009 tracks but no compilation albums containing Skip James. A “Track Search” for Skip James found some compilation albums. Because of the aforementioned problems of artist name aliases and the numerous typographical conventions (dashes, colons, slashes, prefix, postfix), it is very difficult to separate artists from track names automatically. EMusic has worked to improve this situation: in April 2006, Skip James artist searches identified four albums and ten compilation albums. Two of the four albums listed are actually compilations. One of them is Yazoo 2009, which results in the misattribution of 7 recordings of Son House to Skip James. The source of this problem was challenging to track down, as Yazoo’s web site fails to list this CD at all. The source of the error appears to be the All Music Guide (which also erroneously tags Skip James as a kazoo player). The addition of Son House songs makes little sense musically or historically. Yazoo is perhaps attempting to compete with the JSP CD by filling available space on the CD with the same artist as JSP did for their boxed set. This is worrisome because it suggests compilation CD’s may not be a reliable source of good clustering data for “related artist” recommendations.

The EMusic encoding of DOCD-5005 has incorrect track data in the ID3 tags, and both the previews and the tracks are mislabeled. A customer review points this out and recommends another source. This suggests that EMusic and others could improve their metadata by creating a detailed form for customers to correct metadata and report encoding problems.


| Title | Instruments (DOCD-5005) | DOCD-5005 | AMG | Wolf WBJ-CD-009 | iTunes (cddb) for WBJ-CD-009 | Author's Genre | CDDB Genre | Yazoo 2009 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Devil Got My Woman | Guitar, Vocals | (James) Wynwood Music | James | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| If You Haven't Any Hay Get on Down the Road | Piano/Vocals/Foot tapping | (James) Wynwood Music | | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| Hard Luck Child | Guitar, Vocals | (Unidentified) Copyright Control | | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| Drunken Spree | Guitar, Vocals | (James) Wynwood Music | James | Skip James | Skip James & Jack Owens | | Blues | 1930 |
| Little Cow and Calf Is Gonna Die Blues | Piano/Vocals/Foot tapping | (James) Wynwood Music | James | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| Be Ready When He Comes | Guitar, Vocals | (James) Wynwood Music | | Skip James | Skip James & Jack Owens | Gospel | Blues | 1930 |
| How Long "Buck" | Guitar, Vocals | (James) Wynwood Music | | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| I'm So Glad | Guitar, Vocals | (James) Wynwood Music | James | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| Cherry Ball Blues | Guitar, Vocals | (James) Wynwood Music | James | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| Hard Time Killin' Floor Blues | Guitar, Vocals | (James) Wynwood Music | | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| 22-20 Blues | Piano/Vocals/Foot tapping | (James) Wynwood Music | James, Johnson | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| Four O’Clock | Guitar, Vocals | (James) Copyright Control | Durham | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| Jesus Is a Mighty Good Leader | Guitar, Vocals | (James) Wynwood Music | | Skip James | Skip James & Jack Owens | Gospel | Blues | 1930 |
| Yola My Blues Away | Guitar, Vocals | (James) Wynwood Music | | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| What Am I to Do Blues | Piano/Vocals/Foot tapping | (James) Wynwood Music | | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| Special Rider Blues | Guitar, Vocals | (James) Wynwood Music | James | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| Illinois Blues | Guitar, Vocals | (James) Wynwood Music | James | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| Cypress Grove Blues | Guitar, Vocals | (James) Wynwood Music | James | Skip James | Skip James & Jack Owens | Blues | Blues | 1930 |
| Backwater Blues | | Traditional | Traditional | | | Blues | Blues | |
| Everybody Ought To Live Right | | Traditional | Traditional | | | Gospel | Blues | |
| I Want To Be More Like Jesus | | Traditional | Traditional | | | Gospel | Blues | |
| Jack Of Diamonds | | Traditional | Traditional | | | Field Holler | Blues | |
| My Last Boogie | | Traditional | Traditional | | | Blues | Blues | |
| Lazy Bones | | Traditional | Traditional | Hoagy Carmichael | | Popular | Blues | |
| Let My Jesus Lead You | | Copyright Control | Traditional | | | Gospel | Blues | |
| My Own Blues | | Traditional | Traditional | | | Blues | Blues | |
| Oh, Mary Don't You Weep | | Traditional | Traditional | | | Gospel | Blues | |
| Omaha Blues | | Traditional | Traditional | | | Blues | Blues | |
| Bumble Bee | | Traditional | Traditional | | | Blues | Blues | |
| One Dime Was All I Had | | Traditional | Traditional | | | Blues | Blues | |
| Keep Your Lamp Trimmed And Burning | | Traditional | Traditional | | | Gospel | Blues | |
| Somebody Gonna Wish They Had Religion | | Traditional | Traditional | | | Gospel | Blues | |
| Somebody Loves You | | Traditional | Traditional | | | Gospel | Blues | |
| Sorry For To Leave You | | Traditional | Traditional | | | Gospel | Blues | |
| Sporting Life Blues | | Traditional | Traditional | | | Blues | Blues | |
| They Are Waiting For Me | | Traditional | Traditional | | | Gospel | Blues | |
| Walking The Sea | | Traditional | Traditional | | | Gospel | Blues | |

Table 6   Song Attributes


3.2.    Genre Attributes

CD owners can enter any term they choose into the CD genre attribute in freedb.org entries. The term “Bentonia Blues” is interesting because it is both controversial amongst scholars and unlikely to be very useful in practice, for example in a recommender system [reference], since there are so few other CD’s or artists that anyone is likely to label with that genre designation. Besides, as discussed below, attaching a single genre to an artist or album is a bad and obsolete idea.

Many professional musicians, and certainly Skip James’s contemporaries in the 1930’s, played songs in a wide variety of styles in different contexts (weddings, dances, church meetings, clubs, brothels, house parties, etc.) to diverse audiences. In fact many musicians were adept at arranging songs from core musical and textual material into different genres, e.g., Robert Wilkins’s songs “Prodigal Son” (Gospel) and “That’s No Way to Get Along” (Blues) use the same melody and guitar arrangement. Many musicians from this period had a “clean” and a ribald version of the same song, e.g., “Don’t Leave Me Here” and “Don’t Ease Me In” of Henry Thomas. From interviews with Skip James [1] and his recorded legacy seen as a whole, it is clear he was comfortable with many styles on both guitar and piano, including stride piano, boogie-woogie, blues, gospel, popular songs and field hollers. All online metadata the author is aware of describes Skip James’s songs as simply “blues” or a specific blues subgenre.
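The gap between a per-disc genre and per-track genres can be quantified from Table 6. The sketch below takes six rows from that table and compares the column headed “Authors Genre” with the CDDB genre; the function names are assumptions, and the sample is too small to say anything about the table as a whole.

```python
# (title, author's genre, CDDB genre) for a few rows transcribed from Table 6.
ROWS = [
    ("Devil Got My Woman", "Blues", "Blues"),
    ("Be Ready When He Comes", "Gospel", "Blues"),
    ("Jesus Is a Mighty Good Leader", "Gospel", "Blues"),
    ("Jack Of Diamonds", "Field Holler", "Blues"),
    ("Lazy Bones", "Popular", "Blues"),
    ("Omaha Blues", "Blues", "Blues"),
]


def genre_agreement(rows):
    """Fraction of tracks where the two genre labels match."""
    matches = sum(1 for _, ours, theirs in rows if ours == theirs)
    return matches / len(rows)


def disagreements(rows):
    """Tracks whose per-track genre differs from the disc-level genre."""
    return [(title, ours, theirs) for title, ours, theirs in rows if ours != theirs]


print(f"agreement: {genre_agreement(ROWS):.2f}")
for title, ours, theirs in disagreements(ROWS):
    print(f"{title}: {ours} vs {theirs}")
```

Because the CDDB genre is a single disc-level value, every non-blues track on the disc counts as a disagreement.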

The standard genre and subgenre hierarchies presented to users in music navigation systems are derived from an obsolete scheme designed to organize the inventory and the browsing behavior of customers in bricks-and-mortar record stores in the 1950’s and 1960’s. “World Music,” “New Age,” and “Folk Music” are current examples of dubious categorizations of little practical value [1].

4.        Conclusion and Proposals

For metadata errors to be resolved it is essential that the recording industry adopt the equivalent of the ISBN for individual sound recordings. One option is the Global Release Identifier (GRid) of the mi3p standard effort, currently in draft status. Without unambiguous identifiers it is impossible to reliably check, merge and correlate data from different sources.
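The value of an unambiguous identifier can be illustrated by merging two sources keyed on one. In the sketch below the Paramount matrix numbers from Table 3 stand in for a recording identifier; this is an assumption made for illustration and has nothing to do with the GRid format. It also assumes that a blank Yazoo 2009 cell in Table 3 means the title matches the first column, and the function name is hypothetical.

```python
def merge_by_id(*sources):
    """Merge (source_name, {recording_id: {field: value}}) pairs.

    Returns the merged records, with each value tagged by the first source
    that supplied it, and a list of conflicts between sources.
    """
    merged, conflicts = {}, []
    for name, records in sources:
        for rec_id, fields in records.items():
            target = merged.setdefault(rec_id, {})
            for key, value in fields.items():
                if key in target and target[key][0] != value:
                    conflicts.append((rec_id, key, target[key], (value, name)))
                else:
                    target.setdefault(key, (value, name))
    return merged, conflicts


paramount = ("Paramount label", {
    "L0746-1": {"title": "Devil Got My Woman"},
    "L0750-1": {"title": "Four O'Clock"},
})
yazoo = ("Yazoo 2009", {
    "L0746-1": {"title": "Devil Got My Woman"},
    "L0750-1": {"title": "4 O'Clock"},
})

merged, conflicts = merge_by_id(paramount, yazoo)
for conflict in conflicts:
    print(conflict)
```

With a shared key the title variant surfaces as an explicit conflict on one recording; without it, the two spellings would look like two different songs.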

Relating vendor product codes back to the recording codes will be challenging but is essential for honest digital rights management. Curiously, the best source of data for achieving this may not be record companies or any of the metadata sources mentioned so far; it may be carefully researched academic compilations [10-12].

Poor database schema and poor data on compilation albums result in the creation and propagation of artist and genre attribution errors.  Data on such albums (when they can be identified) need special scrutiny and probably careful human editing.

Larger, more complex database schemas will be needed to tackle the metadata errors and provide accurate digital rights management.

Metadata providers should provide feedback mechanisms so that the customers the metadata was created to serve can collaboratively contribute corrections. Currently it is difficult for customers to even identify the source organization to send corrections to.

Genre and other categorization schemes are an active area of research and development. The metadata industry now faces the challenge of blending their proprietary ontologies with personal and group ontologies enabled by web 2.0 semantic tagging [13-15].

Although it doesn’t fit the current habits of relational database users, many of the metadata values should be viewed as signals, i.e., functions of time that need to be tracked. Genres usually emerge decades after recordings; artists change their names and the names of their songs. The complexity of the required scheme should not be underestimated. The over 900 items in the draft mi3p data dictionary are a sobering read.
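A metadata value treated as a function of time can be sketched as a step function that is queried by year. The class below is a hypothetical illustration; the example uses the song released in 1931 as “If You Haven't Any Hay Get on Down the Road” and named “All Night Long” on the 1960s recordings, with 1960 chosen arbitrarily as the changeover year.

```python
from bisect import bisect_right


class TimeVaryingValue:
    """A metadata value as a step function of time (year -> value)."""

    def __init__(self):
        self._years = []    # sorted years at which the value changes
        self._values = []   # value in force from the matching year onward

    def set_from(self, year, value):
        """Record that `value` applies from `year` until the next change."""
        i = bisect_right(self._years, year)
        self._years.insert(i, year)
        self._values.insert(i, value)

    def at(self, year):
        """Return the value in force in `year`, or None before the first entry."""
        i = bisect_right(self._years, year)
        return self._values[i - 1] if i else None


title = TimeVaryingValue()
title.set_from(1931, "If You Haven't Any Hay Get on Down the Road")
title.set_from(1960, "All Night Long")   # changeover year is an assumption

print(title.at(1931))
print(title.at(1965))
print(title.at(1900))
```

A single-valued column in a relational table can hold only one of these titles; keeping the change points lets a query return the title that was current for a given release.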

Inevitably, more audio metadata will be embedded in encoded audio and be available during playback in consumer devices. More efficient binary representations will be needed than the XML schemas in current use. Mature time-tagging and sound description standards such as SDIF may be helpful.

5.        Future Work

This work can be extended to evaluating how long it takes metadata providers to correct their data and how long it takes for corrections to be propagated.

This work can also be used as a starting point for a more extensive statistical analysis of metadata quality.

Skip James enthusiasts and scholars can complement this work by analyzing the quality of metadata on video and images and musical transcriptions.


6.        Web Resources

 

All Music Guide (AMG)

http://www.allmusic.com/

Muze

http://www.muze.com/

SDIF

http://www.cnmat.berkeley.edu/SDIF

EMusic

http://www.emusic.com/

Freedb

http://freedb.org/

Gracenote

http://www.gracenote.com/

Yazoo records

http://www.yazoorecords.com

Musicbrainz

http://musicbrainz.org/

MusicMoz

http://musicmoz.org/

Songfile

http://www.harryfox.com/songfile/public/publicsearch.jsp

BMI

http://www.bmi.com/

ASCAP

http://www.ascap.com/

Mi3p

http://www.mi3p-standard.org/


7.        REFERENCES

 

 

[1]       S. Calt, I'd rather be the devil : Skip James and the blues. New York: Da Capo Press, 1994.

[2]       E. Komarra, Introduction to The Skip James Blues Collection: Hal Leonard.

[3]       S. B. Charters, The bluesmen; the story and the music of the men who made the blues. New York: Oak Publications, 1967.

[4]       I. Stambler and G. Landon, Encyclopedia of folk, country & western music, 2nd ed. New York, N.Y.: St. Martin's Press, 1983.

[5]       S. Calt, "Liner Notes of Skip James: The Complete Early Recordings," Yazoo, 1986.

[6]       Hocheman, "Passion For Blues and Filmmaking," in LA Times Calendar, 2002.

[7]       L. Hoffman, "Liner Notes to She Lyin' by Skip James," Genes Records, 1993.

[8]       J. Perin, The Skip James Blues Collection: Hal Leonard.

[9]       "Drinkin' in the Blues," 2006. http://www.geocities.com/BourbonStreet/Quarter/5939/memphisminnie.html

[10]     M. Leadbitter and N. Slaven, Blues records, 1943-1970 : a selective discography. London, England: Record Information Services, 1987.

[11]     M. Leadbitter and N. Slaven, Blues records, January 1943 to December 1966. London: Hanover Books, 1968.

[12]     J. Godrich and R. M. W. Dixon, Blues & gospel records, 1902-1942. London: Storyville Publications, 1969.

[13]     G. Stamou and S. Kollias, Multimedia content and the semantic Web: methods, standards, and tools. Chichester, West Sussex, England ; Hoboken, NJ: John Wiley & Sons, 2005.

[14]     J. Davies, R. Studer, and P. Warren, Semantic Web technologies : trends and research in ontology-based systems. Chichester, England ; Hoboken, NJ: John Wiley & Sons, 2006.

[15]     S. Sirmakessis, Adaptive and personalized semantic web. Berlin ; New York: Springer, 2006.