Tuesday, July 07, 2009

How many times are people going to scan "One Hundred Years of Solitude"?

A question: Can I rip my real books into ebook format? I want to because it would free up a whole room and one wall.

Of course, one of the reasons for the massive success of the iPod is that people could rip their existing collections into a format it plays. Like chumps, the record companies fought against this, saying that somehow we did not own the records we had bought. In fact, the de facto ability to rip was a major part of creating a new market for them.


So, how do I rip a book? It is easy to scan or photograph pages into a computer, but Optical Character Recognition (OCR) technology has only a 95-99 per cent accuracy rate. That sounds good, but it actually means that there is a blunder every two or three sentences. It is therefore impossible to "rip" a novel without spending lots of time correcting it.
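To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the accuracy figure is per character and that a typical sentence runs to about 90 characters; both are my assumptions, not measurements, and the function names are made up for illustration.

```python
# Rough arithmetic only: assumes accuracy is measured per character and
# that errors are independent. The 90-character sentence is a guess.

def expected_errors_per_sentence(accuracy: float, chars_per_sentence: int = 90) -> float:
    """Average number of misread characters in one sentence."""
    return (1.0 - accuracy) * chars_per_sentence


def chance_sentence_is_clean(accuracy: float, chars_per_sentence: int = 90) -> float:
    """Probability that every character in a sentence is read correctly."""
    return accuracy ** chars_per_sentence


if __name__ == "__main__":
    for accuracy in (0.95, 0.99, 0.995):
        errors = expected_errors_per_sentence(accuracy)
        clean = chance_sentence_is_clean(accuracy)
        print(f"{accuracy:.1%} accurate: {errors:.2f} errors per sentence, "
              f"{clean:.0%} of sentences error-free")
```

Under these assumptions, even the top of the quoted range leaves a large share of sentences with at least one mistake, which is why hand-correcting a whole novel is such a slog.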

However, this would be very easy to do if we had Optical Sentence Recognition rather than Optical Character Recognition software. If I scanned in my book and the software compared the strings of letters in my book to other strings of letters previously scanned in (and possibly corrected) by other users, it could get well above 99 per cent accuracy. It would recognise a sentence rather than a letter, and use the consensus (or the most authoritative, human-corrected version) to inform its recognition of my page. Ripping would be a doddle because most books are replicated thousands of times across different personal libraries. How many times are people going to have to scan in and independently correct "One Hundred Years of Solitude"?
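As far as I know nobody sells this, so what follows is only a toy sketch of the idea in Python, using the standard library's difflib for fuzzy matching. The function names, the (text, human_corrected) pair format and the 0.8 similarity threshold are all my own inventions for illustration; a real system would need a proper index rather than comparing against every stored sentence.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 score for how alike two strings are."""
    return SequenceMatcher(None, a, b).ratio()


def recognise_sentence(ocr_sentence, known_versions, threshold=0.8):
    """Replace a noisy OCR sentence with the best shared version.

    known_versions is a list of (text, human_corrected) pairs contributed
    by other users. The threshold is an arbitrary illustrative value.
    """
    # Keep only stored sentences that look like the one just scanned.
    candidates = [
        (text, corrected)
        for text, corrected in known_versions
        if similarity(ocr_sentence, text) >= threshold
    ]
    if not candidates:
        # Nobody has scanned this sentence before: keep the raw OCR output.
        return ocr_sentence

    # Prefer human-corrected versions; otherwise fall back to all matches.
    corrected_texts = [text for text, corrected in candidates if corrected]
    pool = corrected_texts or [text for text, _ in candidates]

    # Use the consensus: the version most users agree on.
    return Counter(pool).most_common(1)[0][0]


if __name__ == "__main__":
    shared = [
        ("The quick brown fox jumps over the lazy dog.", True),
        ("The quick brovvn fox jumps over the lazy dog.", False),
    ]
    print(recognise_sentence("The qu1ck brovvn fox jumps ovcr the lazy dog.", shared))
```

The design choice here is simply that a human-corrected version beats an uncorrected one, and among equals the most common version wins. That is my reading of "consensus or most authoritative", not a tested recipe.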

Alternatively, a company could just offer to digitise people's book collections, speeding the process up massively by keeping corrected versions of the most popular books on its system.

The key point here is that people own these books and they have the right to digitise them. But I reckon both the solutions above would meet resistance from publishers and distributors like Amazon. I am speculating without any factual basis, but I think they would say that they own the copyright on the sentences in the Optical Sentence Recognition database, even if people could establish that they privately owned the books they were digitising.

This would not be an attempt to frustrate piracy. It would be an attempt to restrict readers' rights over their own property, on the utterly false premise that significant numbers of people are going to go out and buy digital versions of what they already own.

On a slightly broader, less speculative point, the more I get into this ebook thing the more I realise that DRM systems (basically systems that restrict the user's ability to freely use the text they buy) are being used not to stop piracy but to try to control consumers and the market. These tactics are, in various ways, inhibiting the growth of the ebook/enewspaper market [1,2,3,4,5,6]. This is not in the interests of authors or publishers, but it is in the interest of Amazon (or at least in line with a certain very narrow strategy that Amazon seems to be pursuing at the moment).
