Archive for the ‘project progress’ Category

Optical Music Recognition

Some time ago, the What’s the score at the Bodleian? project team went to see Matt McGrattan at the Bodleian Digital Library Systems and Services. Matt had been looking into how our digitized scores could be used to automatically generate a sound file to accompany the sheet music, and we wanted to find out what that would take.

A number of programs exist that convert images of music into a notation that computers can read, so that it can, for example, be used to generate a sound file or serve as input to music editing programs. Background reading had suggested that our material was unlikely to convert easily, as far too many variables were less than ideal (see some of the references in the Optical Music Recognition Bibliography). We nevertheless wanted to explore what it would take to make automatic music recognition a worthwhile part of the project.

Screenshot of Audiveris interface (from

The program Matt used for our initial test was Audiveris, an open-source Optical Music Recognition (OMR) tool that can ‘interpret’ music notation and convert it into data (MusicXML) that other programs can then use as input.
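MusicXML is ordinary XML, so other programs can read it with standard tools. As a rough illustration of what ‘input into other programs’ can mean, the Python sketch below reads a tiny, hand-written MusicXML fragment and renders its notes as plain sine tones in a WAV file. This is not necessarily how the sound file in our test was produced: the fragment, the function names (`musicxml_to_notes`, `notes_to_wav_bytes`) and the tempo are hypothetical, and the parser deliberately ignores chords, ties, multiple parts, voices and dynamics.

```python
import io
import math
import struct
import wave
import xml.etree.ElementTree as ET

# Hypothetical hand-written MusicXML fragment; real Audiveris output is far
# longer and also carries layout, clefs, key and time signatures, etc.
SAMPLE_MUSICXML = """<score-partwise version="3.1">
  <part-list><score-part id="P1"><part-name>Piano</part-name></score-part></part-list>
  <part id="P1">
    <measure number="1">
      <attributes><divisions>1</divisions></attributes>
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>1</duration></note>
      <note><pitch><step>E</step><octave>4</octave></pitch><duration>1</duration></note>
      <note><pitch><step>G</step><octave>4</octave></pitch><duration>1</duration></note>
      <note><rest/><duration>1</duration></note>
    </measure>
  </part>
</score-partwise>"""

# Semitone offset of each note letter above C.
STEP_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def musicxml_to_notes(xml_text):
    """Return (MIDI note number or None for a rest, length in quarter notes) pairs.

    Simplified sketch: assumes a single part and reads notes in document order.
    """
    root = ET.fromstring(xml_text)
    notes = []
    divisions = 1  # MusicXML duration units per quarter note
    for measure in root.iter("measure"):
        div = measure.find("attributes/divisions")
        if div is not None:
            divisions = int(div.text)
        for note in measure.findall("note"):
            if note.find("chord") is not None:
                continue  # simplification: only the first tone of a chord is kept
            dur_el = note.find("duration")
            if dur_el is None:
                continue  # grace notes have no duration element
            quarters = int(dur_el.text) / divisions
            pitch = note.find("pitch")
            if pitch is None:
                notes.append((None, quarters))  # rest (or unpitched note)
                continue
            step = pitch.findtext("step")
            octave = int(pitch.findtext("octave"))
            alter = int(float(pitch.findtext("alter", default="0")))  # sharps/flats
            midi = 12 * (octave + 1) + STEP_SEMITONES[step] + alter  # C4 -> 60
            notes.append((midi, quarters))
    return notes


def notes_to_wav_bytes(notes, tempo_bpm=100, sample_rate=22050):
    """Render notes as sine tones into an in-memory mono 16-bit WAV file."""
    seconds_per_quarter = 60.0 / tempo_bpm
    frames = bytearray()
    for midi, quarters in notes:
        n_samples = round(quarters * seconds_per_quarter * sample_rate)
        # Equal temperament with A4 (MIDI 69) = 440 Hz; rests are silence.
        freq = 440.0 * 2 ** ((midi - 69) / 12) if midi is not None else 0.0
        for i in range(n_samples):
            value = 0.3 * math.sin(2 * math.pi * freq * i / sample_rate) if freq else 0.0
            frames += struct.pack("<h", int(value * 32767))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()


if __name__ == "__main__":
    with open("sample.wav", "wb") as f:
        f.write(notes_to_wav_bytes(musicxml_to_notes(SAMPLE_MUSICXML)))
```

The result is a mechanical, evenly timed rendering, and any wrongly recognised note comes straight through into the audio, which is how errors in OMR output become audible.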

Before we could use the program, our sample file had to be pre-processed (for example, to make sure it was in the right format and size). The file was then loaded into Audiveris and processed following the Quick example on the Audiveris website.

The initial output was not perfect, and the problems became obvious when the file was used to automatically generate a sound file: Matt suggested it sounded ‘like something by Scott Joplin’. For some kinds of music that is the desired effect, but not in this case. It is perfectly possible to post-edit the initial output and correct some of the problems manually, but the time and effort this takes means we could not fit it into the current phase of the project. It is, however, something we want to keep looking into.

This test involved only one program (Audiveris) and only one of our samples. Other programs may suit our material better, or the process may work better for other types of material. As we hope to digitize and make other kinds of scores available in the future, we will continue to explore options for optical music recognition. We’ll report on any further findings as and when we have them.


Sample files

Cover for Abbey House Schottische

‘fancy fonts’

As with all digitisation projects, it is important to test your technology on a small sample of material before finalising your plans, and naturally this was done for the What’s the score at the Bodleian? project too. Our sample consisted of a few items from the collection: loose sheets of music (some with colour covers, some with unillustrated covers) and a bound volume. Although the currently planned part of the project will focus on purely instrumental piano music, we also included some pieces with lyrics in the test sample.

The material was scanned at different resolutions, and Optical Character Recognition (OCR) was run on the files to see how well any text was picked up. The results showed that the material was eminently ‘scannable’, and the scans were clear and of good quality. Not unexpectedly, the OCR was often not particularly successful at identifying text in ‘fancy fonts’. As many of our covers consist of decorative lettering, we will not be able to rely on OCR to describe them. Luckily, humans can usually read this kind of text without much effort, so it shouldn’t be difficult for the people contributing to the project later to decode it.

Categories: project progress


Some of the boxes containing our scores

As we were working away on our boxes (I had just finished counting box no. 39 of 64), we heard the fire alarm. After a short while it became obvious that this was not a test or a brief false alarm: the bell kept ringing steadily, and we had no option but to leave the building. I hated doing that, leaving all our boxes behind. What if it really WAS a fire? What would happen to my galops and waltzes and beautiful covers? I had to fight the urge to carry them all out with me; I didn’t even take the Wedding Valse. What shall I do if I come back and discover it has all turned to cinders? At least I have some photographs to remind me of what the boxes looked like…

On a more serious note, although this incident turned out not to be a real fire, it highlights how important digitisation really is. By digitising material, we will be able to use it and rejoice in what it has to offer even if we cannot access the original physical copies.

Counting boxes

February 24, 2011

I’m really excited – we’ve started working on the actual material that will be included in the pilot phase.

There are 64 boxes of piano music. The scores are arranged in the boxes alphabetically by composer, and each box is labelled ‘Piano scores Macmillan – Mozart’ or similar. Other than that, we do not know what is hiding in the collection. For example, how many different pieces are there?

Project manager labelling boxes and counting scores.

Our first job is to prepare the material for scanning. The boxes are retrieved from storage, and we open them and count the items. Fragile items are placed in plastic sleeves, and we keep a tally of how many there are. We also look for duplicates and note which copy is to be scanned (project resources are better spent elsewhere than on scanning identical items twice). All this is recorded so that both we and the people doing the actual scanning know what we are working with.
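The exact recording format isn't specified here, so the following is a purely hypothetical Python sketch of the kind of per-box record this preparation step produces: a count of items, a tally of fragile (sleeved) items, and one copy of each duplicated piece marked for scanning. The class and field names are invented for illustration, and treating two items as duplicates when composer and title match is a simplification; in practice, editions and publishers would need to be compared by eye.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Item:
    composer: str
    title: str
    fragile: bool = False  # True if the item was placed in a plastic sleeve


@dataclass
class BoxRecord:
    box_number: int
    label: str  # the label on the box, e.g. "Piano scores Macmillan – Mozart"
    items: list = field(default_factory=list)

    def add(self, item):
        self.items.append(item)

    def summary(self):
        # Simplification: items with the same composer and title count as duplicates.
        keys = [(i.composer, i.title) for i in self.items]
        counts = Counter(keys)
        return {
            "items": len(self.items),
            "fragile": sum(i.fragile for i in self.items),
            "duplicates": sum(c - 1 for c in counts.values()),
            # First copy of each piece, in box order, is the one marked for scanning.
            "to_scan": list(dict.fromkeys(keys)),
        }
```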

Once all the scores have been counted and recorded, the boxes will be taken away for the content to be scanned.
