Setting up a Digital Library using Koha and Opensource Technology
by Indranil Das Gupta

About Koha

About this award-winning software, Joshua Ferraro, a leading Koha developer, says on his website: “Koha was built using Perl scripting language (http://perl.org), MySQL Relational Database Management System (http://mysql.org), and Apache Web Server, running on the GNU/Linux Operating System (http://kernel.org); however, it has been ported to other operating systems (including Windows) and should be compatible with any system running an SQL database, a web server, and Perl.”

The current stable version of Koha is 2.0, and this is the version in use at the university library. The next stable release, Koha v2.2, is scheduled for late October or early November 2004. It will usher in a slew of new features, such as MARC management of authority and thesauri files, comprehensive serials management, improved templates and barcode management from within Koha, to name just a few. Meanwhile, the beta version of 2.2 can be downloaded from the website; this lets one test the software, help the developers find and fix bugs, and be ready to migrate to the latest version when it is released. The WBUT implementation is taking this route.

Setting up Koha

In the early days of the implementation, Joshua told this implementor during a discussion: “Setting up Koha for the first time can be bit of a bear”. He was right: installing and configuring Koha isn't just about grabbing the right pieces of software and installing them. Koha is a functional piece of software, and setting it up requires bringing to the table the right mix of Linux system administration, database management and configuration management skills on the one hand, and library and information science skills on the other.
The implementors should be as much at ease debugging a MySQL database setup that isn't working right as they are with cataloguing, managing serials, collection building, Dewey/Universal Decimal Classification, MARC21 tags, authority files and the like.

At the university, the implementation runs on Gentoo Linux (www.gentoo.org) as well as Fedora Core 2 (http://fedora.redhat.com) from Red Hat, Inc. The production system is housed in a high-powered, dual-CPU IBM pSeries p630 enterprise-class server (running Gentoo 2004.2 for the 64-bit PowerPC architecture). The development server is an x86-class assembled system (single-CPU Athlon XP 2000+, 1 GB RAM and a 40 GB hard disk). The clients are either normal desktop systems (running Windows XP or Fedora Core 2 Linux) or low-cost LTSP-based thin clients.

The Software Prerequisites

1. A working GNU/Linux server system (with networking configured if you want client systems to connect to the OPAC).
2. MySQL Relational Database Server.
3. Perl 5.8.0 or later.
4. The following Perl modules:
   a) Event
   b) HTML::Template
   c) Mail::Sendmail
   d) MARC::Record
   e) Net::Z3950
5. Apache web server.
6. YAZ, a C/C++ programmer's toolkit supporting the development of Z39.50/SRW/SRU clients and servers.
7. And of course, a copy of the Koha software.

Barcodes, barcode symbologies and their ilk

The library is successfully using barcodes to track all the physical items (i.e. books, serials, CD-ROMs etc.) in its holdings. After downloading and evaluating several FLOSS-based barcoding solutions, we found KBarcode (www.kbarcode.net) to be the right candidate for the job. KBarcode provides a simple graphical frontend to barcode generator backends such as GNU Barcode. It also provides a label editor for designing custom barcode labels, and supports batch printing of barcodes using barcode numbers stored in a database.
Once generated, the barcodes can be exported to several formats such as PostScript and PDF. We chose PDF because it is camera-ready for printing and readily portable across GNU/Linux, Windows and Mac OS platforms.

However, getting KBarcode to work the way one wants calls for a fair understanding of a GNU/Linux system and of how to resolve software dependencies. It can therefore be something of an uphill task for a non-IT person, but the good thing is that once set up, it just works! The software dependencies required by KBarcode are clearly listed on the KBarcode site. An implementor should keep an eye on which version of KBarcode she is using, as the requirements are somewhat version-specific.

Extending Koha

The CD-ROM Image Server: Necessity, the Mother of Invention

With its high data capacity and amazingly low cost, the Compact Disc or CD-ROM has firmly established itself as a ubiquitous means of publishing large volumes of information. These days several publications are made available electronically on CD-ROM alone, while books and periodicals are very often accompanied by one. While setting up the Koha system, this posed a challenge. How does the library manage this electronic resource? How can members be given access to it while making sure that the conditions set forth in Section 52 of the Indian Copyright Act (1999 Amendment) are adhered to? With the proliferation of cheap CD writers and throwaway-priced blank CD media, handing out CD-ROMs to the members for reference was not an option.

The solution seemed simple: buy a CD tower/jukebox system. These are essentially special-purpose computers with a fast and large hard disk (or a number of them), which can store bit-for-bit mirror images (called ISO images in technical parlance) of literally hundreds of CDs and make them available to users spread across the network, just the way they would access a local CD-ROM drive.
The response to the enquiries was less than heartening. Yes, such systems were available, and no, the vendors informed us, they were not available on the GNU/Linux platform. And by the by, they would cost around Rs. 3.75 to 4.0 lakh only.

Things looked rather dismal, but as the saying goes, every cloud has a silver lining, and so did this one. A close, hard look at the GNU/Linux software stack provided the answer. The ingredients are:

- the 2.6 family of the Linux kernel, with its support for an extended number of loopback devices (up to 256, from the original 8);
- the automounter daemon (a piece of software that makes it possible to access filesystems that are not readily available, on demand);
- the Apache web server, to provide one way of accessing the mounted ISO images using a web browser (e.g. Internet Explorer, Mozilla, Firefox, Opera etc.) on the client side;
- Samba, to provide Windows-based users with an alternative means of access (aside from the browser).

Tie all these together using a bit of symlinking magic, some nifty Unix shell scripting and the odd cron job to take care of the daily updates. Voila! You have a CD-ROM Image Server based entirely on open source software.

The beauty of this system is its clean simplicity. At any given time, a single physical server can now provide simultaneous access to some 250 or more unique CD-ROM titles, whereas the number of CD-ROMs whose images can be stored is technically limited only by the size of your hard disk(s).

Let's look at the hardware cost of a CD Image Server. At 650 megabytes to a CD, a 120 gigabyte hard disk costing around Rs. 5000 can store around 180 CDs. Since most of the CDs that accompany books as additional material do not cross 200 megabytes in size, realistically you could be storing somewhere upwards of 400 CDs. As for the rest of the hardware for the image server, the implementors used standard off-the-shelf components, costing no more than Rs. 30,000 and assembled locally.
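The capacity figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, disk sizes are taken in decimal units (1 GB = 1000 MB), filesystem overhead is ignored, and the 20/80 split between full-size and small images is an assumption made here, not a measurement from the library's holdings.

```python
MB_PER_GB = 1000  # decimal units, the way disk vendors quote capacity


def images_that_fit(disk_gb, image_mb):
    """Number of whole CD images of image_mb megabytes that fit on a disk_gb disk."""
    return int(disk_gb * MB_PER_GB // image_mb)


def images_for_mix(disk_gb, full_share, full_mb=650, small_mb=200):
    """Estimate for a collection where full_share of the images are full-size CDs.

    full_share is a fraction between 0 and 1; the remaining images are
    assumed to be small_mb megabytes each.
    """
    average_mb = full_share * full_mb + (1 - full_share) * small_mb
    return int(disk_gb * MB_PER_GB // average_mb)


full_cds = images_that_fit(120, 650)    # every image is a full 650 MB CD
small_cds = images_that_fit(120, 200)   # every image is a 200 MB companion CD
mixed_cds = images_for_mix(120, 0.2)    # assumed mix: 20% full-size, 80% small

print(full_cds, small_cds, mixed_cds)
```

Under these assumptions a 120 GB disk holds 184 full-size images or 600 small ones, and the assumed mix gives a little over 400, which is in line with the estimate in the text.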
To err on the side of safety, one can always go for RAID 1 (disk mirroring), using either a well-established software RAID solution such as MD-RAID or the more recent ATA RAID (a standard feature on most high-performance desktop motherboards these days). Those with deeper pockets, or needing better RAID performance, can look towards hardware-based RAID solutions that use the more expensive SCSI disks. RAID can also benefit those looking for faster access, through RAID 0 (disk striping). Whether using a PATA, SATA or SCSI drive, it is important to choose the fastest disk with the largest I/O buffer. On ATA variants, I/O buffers of 8 MB are becoming standard these days.

Integrating a Search Engine for the CD Server

Now that the library had a CD Image Server, the obvious question was how a browsing user could know what is stored on it. In other words, how do we search for anything that could possibly be stored on those hundreds of CD-ROMs? The answer was to provide the users with intranet search engines that act as a friendly, in-house Google. Enter Namazu and HTDig. Both are well-established open source search engines that support intelligent searching with Boolean operators, soundex and wildcard patterns, full-text searching, and searching for files of a certain type. How these tools were set up and configured is well beyond the scope of this article; suffice it to say that both are web-based and can be accessed by the user through a browser.

Koha Users Community in India

The global Koha users' community is growing at a fast pace, and India is no exception. In view of this interest, and to provide a common forum for discussing the typical problems faced by Indian implementors, an Indian Koha Interest Group (KIG-India) has been formed and a mailing list started earlier this month.
From only 5 members, it has grown within just 2 weeks to 22 members from all over India, a clear indication of the active interest in Koha in the country. News of the formation of KIG-India was picked up by UNDP's APDIP programme, which looks after UNDP-funded ICT4D programmes in the Asia Pacific region, and the group has found a prestigious place on their website.

About the Author

Indranil Das Gupta has been a GNU/Linux and open source software user for the past several years. At work, he generally attempts to pass himself off as an Open Source technology consultant to the unsuspecting people around him. He can be reached at indradg@icbic.com