Posts Tagged ‘project’

Optical Music Recognition

Some time ago, the What’s the score at the Bodleian? project team went to see Matt McGrattan at The Bodleian Digital Library Systems and Services. We wanted to find out what it would take to be able to use our digitized scores to automatically generate a sound file to go with the sheet music, and Matt had been looking into this.

A number of programs exist that will convert images of music into a kind of notation that can be read by computers to, for example, generate a sound file or be used as input into music editing programs. Background reading on the matter had suggested that it was unlikely that our material would convert easily as far too many variables were non-ideal (some references in the Optical Music Recognition Bibliography). We nevertheless wanted to explore what it would take to make it worthwhile to include automatic music recognition in the project.

Screenshot of Audiveris interface (from

The program Matt used for our initial test was Audiveris. Audiveris is an open-source Optical Music Recognition (OMR) tool that can ‘interpret’ music notation and convert it to a form of data (MusicXML) that can then be used as input into other programs.
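To give an idea of what ‘data that can be used as input into other programs’ looks like, here is a minimal Python sketch that reads a tiny MusicXML fragment and turns each note into a MIDI note number, the first step towards generating sound. The fragment is hand-written for illustration and is not output from our scans, and the helper names (`read_notes`, `midi_number`, `frequency_hz`) are our own inventions. It assumes a single part with no chords, ties or key changes, which real scores will of course contain.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written MusicXML fragment (hypothetical, not from the project's scans).
SAMPLE = """<score-partwise version="3.1">
  <part-list><score-part id="P1"><part-name>Piano</part-name></score-part></part-list>
  <part id="P1">
    <measure number="1">
      <attributes><divisions>1</divisions></attributes>
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>1</duration></note>
      <note><pitch><step>F</step><alter>1</alter><octave>4</octave></pitch><duration>1</duration></note>
      <note><rest/><duration>1</duration></note>
      <note><pitch><step>A</step><octave>4</octave></pitch><duration>1</duration></note>
    </measure>
  </part>
</score-partwise>"""

# Semitone offset of each note name above C.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def midi_number(step, octave, alter=0):
    """Convert a pitch (e.g. C, octave 4) to a MIDI note number (middle C = 60)."""
    return 12 * (octave + 1) + SEMITONES[step] + alter


def frequency_hz(midi):
    """Equal-tempered frequency for a MIDI note number, with A4 (69) at 440 Hz."""
    return 440.0 * 2 ** ((midi - 69) / 12)


def read_notes(xml_text):
    """Return a list of (midi_number or None for a rest, duration) pairs."""
    root = ET.fromstring(xml_text)
    notes = []
    for note in root.iter("note"):
        duration = int(note.findtext("duration", default="0"))
        pitch = note.find("pitch")
        if pitch is None:
            # No <pitch> element: treat the note as a rest.
            notes.append((None, duration))
            continue
        step = pitch.findtext("step")
        octave = int(pitch.findtext("octave"))
        # <alter> is optional; 1 means sharp, -1 means flat (microtones are rounded here).
        alter = round(float(pitch.findtext("alter", default="0")))
        notes.append((midi_number(step, octave, alter), duration))
    return notes


if __name__ == "__main__":
    for midi, duration in read_notes(SAMPLE):
        if midi is None:
            print("rest", duration)
        else:
            print(midi, round(frequency_hz(midi), 1), duration)
```

A wrongly recognised accidental or duration in the MusicXML carries straight through to the numbers above, which is why errors in the recognition step are so audible in the generated sound file.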

Before we could use the program, our sample file had to be pre-processed (for example making sure it was the right format and size). The file was then loaded into Audiveris and processed as illustrated in the Quick example found on the Audiveris website.

The initial output that we got was not perfect, and what this meant was obvious when the file was used to automatically generate a sound file. Matt suggested it sounded ‘like something by Scott Joplin’. For some kinds of music that is the desired effect, but in this case it was not. It is perfectly possible to post-edit the initial output and manually correct some of the problems, but the time and effort necessary for this means we could not fit it into the current phase of the project. It is, however, something we want to continue to look into.

This test only included one program (Audiveris) and was performed on only one of our samples. It is possible that other programs will suit our material better, or that this process will be better suited for other types of material. As we are hoping to be able to digitize and make available other kinds of scores in the future, we will continue to explore options for automatic optical music recognition. We’ll report on any further findings as and when we have some.


Sample files

Cover for Abbey House Schottische

‘fancy fonts’

As with all digitisation projects, it is important to test your technology on a small sample of material before you finalise your plans. That was, naturally, also done on the What’s the score at the Bodleian? project. Our sample consisted of a few items from the collection, both loose sheets of music (with colour covers and covers without illustration) and a bound volume. Although the currently planned part of this project will be focussing on purely instrumental piano music, we also included some pieces with lyrics in the sample for testing.

The material was scanned at different resolutions and Optical Character Recognition (OCR) was run on the files to see how any text was picked up. The result showed that the material was eminently ‘scannable’ and we received clear and good scans. Not unexpectedly, it was found that in many cases the OCR was not particularly successful when it came to identifying text in ‘fancy fonts’. As many of our covers consist of text in decorative lettering, that means we will not be able to rely on that for the description of the covers. Luckily, humans tend to be able to read this kind of text without too much effort, so it shouldn’t be difficult to decode for the people contributing to the project later.

Categories: project progress

What’s a duplicate?

March 15, 2011

As we are going through the boxes, we are identifying duplicates, the idea being that we do not need two identical copies of the same item. But what is ‘the same’? It may seem obvious at first – if it is the same piece of music it is a duplicate. But what if it is a different edition, where some changes may have been made to the music? Well, then it is not an identical copy and thus not a duplicate. But what if the music is the same but the cover differs?

We have taken the view here that if the cover is different, the items are not duplicates, even if the music would sound identical irrespective of which copy you play it from. The reasoning behind this is that these items are not only music scores. The actual physical copies are interesting, and variations there can very well be of interest to someone researching the genre or period.

The differences between two versions of a score can be quite obvious, like the Valentine Galop pictured here.

Different covers for Valentine Galop

The covers look different – one has an illustration while the other uses different fonts in a decorative way – which may make it less obvious that this is actually the same music. It is the same composer (although called M Relle on one cover and Moritz Relle on the other), but the title is slightly different (St Valentine’s Galop vs Valentine Galop). It is only by looking at the actual music notation that we will know if it is the same piece. In this case it is easy to justify scanning both copies, since there is so much to look at and compare for someone researching the area.

In other cases, duplicates may be less obvious. It may be that the cover looks very similar, but a closer inspection reveals small differences, for example that the advertisements on the back are different or that the list of titles in the series contains a different number of items. If these were to be considered duplicates, which one should be scanned? Who should decide that one set of adverts is more important or interesting than another? We have refrained from making that decision and are instead scanning both copies in cases like these. This will allow different kinds of research on the material. The actual number of near-duplicate scores is fairly low, so in the grand scheme of things scanning the near-duplicates is a small extra. Having them does, however, also allow a further interesting use, namely quality assurance. Having the same title described twice will allow us to make comparisons between the different descriptions and see in what way they differ (if at all). That will help us understand how much variation we should expect in the descriptions that we are getting. There are other ways this quality assurance can be performed, and we will be using various methods to get material that is truly useful for those who wish to make use of it.
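As a rough illustration of how two descriptions of the same title might be compared, here is a small Python sketch using the standard library's `difflib`. It is not the method the project uses; the field names (`title`, `composer`), the example descriptions and the helper functions are all hypothetical, and the similarity score is only one of many possible measures.

```python
from difflib import SequenceMatcher


def normalise(value):
    """Lower-case the text and collapse runs of whitespace before comparing."""
    return " ".join(value.lower().split())


def compare_descriptions(first, second):
    """Return a similarity score between 0 and 1 for every field in either description."""
    report = {}
    for field in sorted(set(first) | set(second)):
        a = normalise(first.get(field, ""))
        b = normalise(second.get(field, ""))
        report[field] = round(SequenceMatcher(None, a, b).ratio(), 2)
    return report


def fields_that_differ(first, second, threshold=1.0):
    """List the fields whose similarity falls below the threshold."""
    scores = compare_descriptions(first, second)
    return [field for field, score in scores.items() if score < threshold]


if __name__ == "__main__":
    # Two hypothetical descriptions of near-duplicate copies.
    description_a = {"title": "Valentine Galop", "composer": "M Relle"}
    description_b = {"title": "valentine  galop", "composer": "Moritz Relle"}
    print(compare_descriptions(description_a, description_b))
    print(fields_that_differ(description_a, description_b))
```

Differences in capitalisation and spacing are ignored here, so only substantive variation, such as an abbreviated composer name, is flagged. Lowering `threshold` would let small spelling differences pass as well.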


February 24, 2011

This is the first time I have had a really good look at the material. What strikes me at once is the covers. Many of the pamphlets have colourful cover illustrations, usually something that relates to the title of the piece of music.
Cover for scores 'Fancy Ball'
Some depict people in ‘exotic’ dress, or involved in some special pursuit, such as a fancy ball.

Cover for Bird's Nest Polka

Bird's Nest Polka

Scenes from nature are also frequent – either scenic views or some particular element, like a bird’s nest.

Cover for Welcome Home

Welcome Home

Some covers give a good idea of the fashion of the time – pictures of women in dresses and men with exquisite neckties and perhaps a little moustache. Or men in uniform – there are quite a few of those.

What I really like about all these covers is that not only are they pretty and interesting but they are REAL – they are not reproductions made to look like something from a different time. These are just like they would have looked when someone in the late 19th century bought the score and took it home. Was it because of the cover that a particular piece was chosen? We’ll never know, but we can enjoy looking at them all the same.

Categories: sample scores

Counting boxes

February 24, 2011

I’m really excited – we’ve started working on the actual material that will be included in the pilot phase.

64 boxes with piano music. The scores are arranged in the boxes in alphabetical order by composer and each box has the information ‘Piano scores Macmillan – Mozart’ or similar. Other than that we do not know what is hiding in the collection. For example, how many different pieces are there?

Project manager labelling boxes and counting scores.

Counting boxes

Our first job is to prepare the material to be scanned. The boxes are retrieved from their storage and we open them and count the number of items. Fragile items are placed in plastic sleeves and we keep a tally of how many there are. We also look for duplicates and note which one is to be scanned (the project resources are better used for other purposes than scanning identical items twice). All this is recorded so that both we and the people doing the actual scanning know what we are working with.

Once all the scores have been counted and recorded, the boxes will be taken away for the content to be scanned.

Categories: project progress