Category Archives: Content Management

October 19, 2013 · 1:27 pm

The King James Bible Virtual Exhibit : The King James Bible


Here is an interesting DH project from the Ohio State University Libraries about the King James Bible. It was developed as part of a pilot project, as the developer describes below:

The exhibits pilot innovation grant project was a partnership of three departments, Digital Content Services (formerly SRI), Rare Books and Manuscripts, and the Web Implementation Team (now Applications, Development and Support). The Preservation and Reformatting Department (Amy McCrory) and the Copyright Resources Center (Sandra Enimil) were also heavily involved. The grant was “to develop a new model for creating and delivering digital exhibits at the Libraries.” The project was developmental in scope, and the specific goals were to create a polished digital version of a physical exhibit, and to gather information about what would be required to develop an exhibits program in the Libraries.

The King James Bible exhibit, curated by Eric Johnson, is indeed a polished exhibit. We learned a great deal from working on it, such as the need to create a glossary of terms as reference for all people on the project. We also identified the strengths and weaknesses of the Omeka software for our environment. The research into what it would take to build a sustainable program took many forms. We looked at existing digital exhibits at OSUL, as well as curator expectations for exhibit functionality, and the use of Omeka at other institutions. We tracked information on the time it took to create the exhibit.

What’s next? The report is done and has been given to the Executive Committee. The suggestions in the report are just that – suggestions. We were not charged to develop a program. We applied for funding to explore the possibilities; the report is what we discovered. It is also worth noting that the environment has changed since the report was written. Most important, the Libraries have hired an Exhibits Coordinator. However, many of you have expressed interest in our results.

Read Report Here (docx).


Filed under Content Management, Digital Collections, Digital Humanities, digital repository, Library science, Omeka

Tagged as digital collection, digital humanities, King James Bible, Ohio State Libraries, Omeka

December 8, 2011 · 2:00 am

Unit 13 – I’ve learned a lot this semester!


I have learned so much this semester that it’s hard to know where to begin, but I guess I need to begin with metadata and taxonomy. Early in the semester I posted that “I read some articles this week that made me realize that much of what added value librarians provide to collections is in the form of metadata. I guess I always thought of librarians mainly as reference librarians or subject specialists, not as experts in classifying and indexing information.” Now I know (a little) about the value of good metadata and taxonomies, and I’ve learned about metadata standards such as Dublin Core, which help to standardize metadata usage across the web. I’ve learned about the semantic web, and the idea of linking data and building ontologies that describe the relations between concepts. I’ve learned about the contrasting benefits of controlled vocabularies versus “folksonomy” (i.e., tagging). And I’ve learned a little about harvesting metadata, using PKPHarvester to harvest records from several databases and data providers.
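PKPHarvester collects records from OAI-PMH data providers, and those records usually arrive as simple Dublin Core (oai_dc). As a rough illustration of what a harvested record contains, here is a minimal Python sketch that pulls the Dublin Core fields out of a ListRecords response. The sample response and its identifier are made up for illustration; a real harvester would also fetch each page over HTTP and follow resumption tokens.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH responses that carry simple Dublin Core (oai_dc).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, trimmed ListRecords response containing a single record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Middlemarch</dc:title>
          <dc:creator>Eliot, George</dc:creator>
          <dc:subject>English fiction</dc:subject>
          <dc:subject>Provincial life</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict per record: its OAI identifier plus Dublin Core fields.

    Repeatable elements (e.g. several dc:subject entries) become lists.
    """
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        dc = record.find(".//oai_dc:dc", NS)
        fields = {}
        if dc is not None:
            for element in dc:
                # Tags look like "{http://purl.org/dc/elements/1.1/}title";
                # keep only the local name ("title").
                name = element.tag.split("}", 1)[1]
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, "dc": fields})
    return records


for rec in parse_records(SAMPLE_RESPONSE):
    print(rec["identifier"], rec["dc"])
```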

I’ve also learned more about the Open Access movement, and open access initiatives; and about the issues of “freeing” information from behind paywalls. The main obstacle to this is that knowledge (and its associated data) is a currency that has value, and that making it freely available will necessitate basic structural changes in academia and in academic publishing.

Those structural changes include major changes in the role of the library and librarians in the production and preservation of knowledge. These changes present significant challenges to libraries in managing, curating and preserving digital materials and data. Librarians are increasingly expected to have the technical skills to design and select Content Management Systems for their libraries, to design, create, and maintain digital collections and digital repositories, and to train other librarians to do the same, often with limited technical staff and limited budgets. Open Source software is a boon to the small library or non-profit or museum that needs these types of functionality; but again, that requires technical knowledge and skill on the part of the librarian to install, configure, and maintain operating systems and small in-house servers.

In order to gain those technical skills, I learned how to create several virtual machines with Linux stacks of various sorts, and to install and configure four different content management/digital repository software systems (Drupal, DSpace, Eprints, and Omeka). I created a sample digital collection in each one, and used the experience to compare each system’s strengths and weaknesses, and then to decide which sorts of digital collections (and environments) each system is best suited for. I then chose which system to use to host my digital collection, set up the system, entered the records and created the metadata, and then wrote a paper on the process, which will contribute toward my digital portfolio. In my case, I decided upon Drupal as the system I want to use for my digital collection. Drupal has a steep learning curve, and I learned a lot about Drupal in a short time through the process of designing my collection, downloading and installing extra modules to provide the functions I needed, and troubleshooting the installation. I’m proud of how well my prototype digital collection works, but I already have plans to keep working on it: to make it work even better, to extend its functions, and to redesign certain features. I’m turning into a Drupal geek already.

Librarians are also expected to conduct outreach to their various communities, in order to make the services of the library more accessible and useful. This seems to be especially needful for the humanities scholar community. We read many articles about the obstacles that keep humanities scholars from embracing digital initiatives and from using digital resources (and those articles confirmed my own observations). We learned how humanities scholarship, data, and workflows are vastly different from those of the scientific community; identified some of the obstacles that prevent humanities scholars from using (or producing) digital resources, including digital repositories; and read about several digital humanities initiatives, both in the U.S. and in Europe.

I think one of the most enjoyable aspects of the course for me was just the chance to see so many different examples of digital collections; to interact with my fellow students over their collections and interests; and to explore what is already being done, what is possible and useful. A find that was very helpful to me was the UK Reading Experience Database (RED). This is a database that contains much of the same sorts of data that I wanted to collect in my own digital collection, so it gave me some assurance that I was on the right track with my ideas.

Finally, I have included a slideshow of some screen shots from my project.


Filed under Content Management, Digital Collections, Digital Humanities, digital repository, digital surrogate, Drupal, dSPace, ePrints, George Eliot, Library science, metadata, Omeka, Project Management, semantic web, SIRLS 675, taxonomy, Ubuntu Linux, VRE (Virtual Research Environment)

Tagged as Drupal, learning

November 28, 2011 · 3:14 pm

Unit 12: pre-installed VM versus DIY

I think the real question here is: what skills does a librarian working in digital collections actually need? That is not easy to answer, because libraries vary so much in staff size, budget, and training. I certainly feel much more confident about installing and configuring virtual machines (VMs) as a result of this course; but I wonder if it was the best use of class assignment time, especially since it could be so time-consuming. On the other hand, if I were the sole librarian in a small non-profit or museum, with no technical staff, and I wanted to create and host a digital collection, the ability to create a VM from scratch would be important. I guess the question is, how common is this scenario, and how common will it be in the future? And what sorts of librarianship does the DigIN program want to support?

I don’t know if a middle ground might have worked better; maybe we could have installed only three VMs and spent more time on metadata and on actually working with collections. I think DSpace and Eprints were somewhat repetitive; maybe we could have been given a choice to install one or the other as our example of digital repository software. I think it was good to see Drupal because it is so ubiquitous, and to get an idea of the more technical end of the spectrum in digital collections management. And Omeka seems to represent the other, simpler end of digital repository management.


Filed under Content Management, Digital Collections, digital repository, Drupal, dSPace, ePrints, Library science, Omeka, SIRLS 675

November 27, 2011 · 11:49 pm

Unit 11: Repository Software Homepages – an assessment

A repository software package’s homepage ought to be attractive and clear, and it should inspire confidence in its users. Some of these homepages do that better than others, but they are also geared toward very different audiences. In general, I think each site is geared toward the user who could best benefit from it.

Eprints (http://www.eprints.org/): clearly states what it is, the interface is clean, and provides a live demo as well as links to documentation, downloads, and a description of the principles of open access.
Omeka (http://omeka.org/): again, the homepage is well-designed, attractive, and provides clear links to all the information a user could want. It seems geared especially to draw in the new or uninitiated user (i.e. me). The user could be an individual rather than an institution.
DSpace (http://www.dspace.org/): There is a lot of white space on this page. For some reason I find that intimidating. There is a very clear statement about what it is–if you know what an “institutional repository application” is. The logo at the top identifies it as a “scholar space” – This is definitely geared toward an institutional user/IT professional that already knows what an institutional repository is. This might be more confidence-building if you are an institutional administrator looking to find a turnkey application.
Drupal (http://drupal.org/): This is a very busy page. But the tag line: “Come for the Software, Stay for the Community” is catchy. The page goes out of its way to show you how world-wide its scope is; you can tell that the software is geared for IT professionals and developers; they even have announcements about DrupalCons (a very geeky term for conventions). This is definitely a geek community and that means that I am not the sort of user they are targeting.
PKP (Public Knowledge Project http://pkp.sfu.ca/): This site also has a lot of projects besides the harvester software. It takes a while exploring and reading to figure out what this site is and what it contains. It’s not for the casual user, and it seems to already assume that the user is committed to open source and open access projects.
JHove (http://hul.harvard.edu/jhove/): This is a site full of technical jargon, definitely targeting the technical user, not the casual user or the repository administrator.

I guess each site has its advantages for the type of user it is seeking. As a librarian or a non-profit or museum curator, I find the first three more attractive and accessible.


Filed under Content Management, Digital Collections, digital repository, Drupal, dSPace, ePrints, Omeka, SIRLS 675

November 22, 2011 · 12:01 am

Unit 9 – metadata difficulties

Creating a catalog record is expensive — some estimates range from $50 to well over $100 per record. It’s also not easy to create good metadata that is consistent enough so that queries across repositories (or even across different catalogers) return precise results. Discuss briefly the challenges you are having cataloging your items in terms of subject listings, key words and tags, categories and other facets. How are you approaching the problem of consistency (or are you)?

Consistency is a goal that has been pretty difficult for me so far, especially because I tend not to be very systematic when it comes to creating taxonomies and tagging data. I am learning about the value of good metadata in this course, though, so I’m trying to be better at it. But it is difficult to be consistent when each repository program has its own way of adding and labeling metadata. Because the fields are not consistent and not very customizable, I find myself having to make the data fit existing fields and categories. A controlled vocabulary is very helpful when I am tagging, so I think that I would prefer to find a program that lets me define my own, or that at least keeps track of what I have already used and suggests it.
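A tool that keeps track of tags already used and suggests them could work roughly like the hypothetical Python class below. It normalizes case and spacing, maps known variants (synonyms) to one preferred term, counts how often each term is used, and suggests existing terms by prefix. The class and its names are illustrative assumptions, not a feature of any of the repository programs discussed here.

```python
from collections import Counter


class TagVocabulary:
    """Hypothetical helper: remembers tags already used and suggests them,
    so the same concept keeps getting the same label."""

    def __init__(self, preferred_terms=(), synonyms=None):
        # Terms are stored normalized (trimmed, lower-case, single spaces).
        self.preferred = {self._normalize(t) for t in preferred_terms}
        # Map variant spellings to a preferred term, e.g. "GE" -> "George Eliot".
        self.synonyms = {
            self._normalize(k): self._normalize(v)
            for k, v in (synonyms or {}).items()
        }
        self.usage = Counter()

    @staticmethod
    def _normalize(tag):
        return " ".join(tag.strip().lower().split())

    def apply(self, tag):
        """Normalize a tag, map a synonym to its preferred term, record the use."""
        term = self._normalize(tag)
        term = self.synonyms.get(term, term)
        self.preferred.add(term)  # new terms join the vocabulary
        self.usage[term] += 1
        return term

    def suggest(self, prefix, limit=5):
        """Suggest known terms starting with prefix, most-used first."""
        p = self._normalize(prefix)
        matches = [t for t in self.preferred if t.startswith(p)]
        matches.sort(key=lambda t: (-self.usage[t], t))
        return matches[:limit]
```

Because every new tag is added to the vocabulary, this behaves more like a managed folksonomy than a strict controlled vocabulary; a stricter version would reject any term that is not already in the list.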

I realize that I need to learn more about cataloging — another course later on in the program, I imagine.


Filed under Content Management, Digital Collections, digital repository, metadata

October 1, 2011 · 4:44 pm

DuraSpace | Open technologies for durable digital content


This is the home page of the non-profit organization that runs DSpace and Fedora Commons. I wanted to link it here for future reference.


Filed under Content Management, Digital Collections, Digital Humanities, dSPace, SIRLS 675

Tagged as dSpace

September 27, 2011 · 1:49 pm

Unit 5 – Using Drupal for my digital collection

Discuss either a) which module you decided to try from assignment 2 and how it enhances your collection; include, if you like, any problems or tips related to installation; or
b) now that you have some experience, how you feel overall about the suitability of Drupal for your collection.

It is clear that Drupal, in the hands of a trained Drupal programmer, could be a powerful, customized tool for managing my digital collection. However, it does not seem to be designed for the type of content I would like to include: many large, searchable text files (in PDF or other formats, especially files with specialized markup). By that I mean that the native content types don’t lend themselves to it (although I have not experimented with the “book” type). Of course, there are many modules that add that type of functionality; I saw several that seemed designed to create RDF-type relations between nodes, but I was too intimidated by all the dependencies to try installing them, and the help material was too technical for a casual Drupal user to understand.

I did find an apparently simple module that added some necessary functionality to my site, i.e., the ability to search attached text files. The module is called, appropriately, search_files.

Because this is a crucial function for my collection, I decided to install it, even though it requires several “helper applications” in Linux.

Helper Applications

In order to extract text, this module calls ‘helper apps’ such as cat and pdftotext. Drupal administrators can configure any helpers they like. Helper apps need to be installed on the server and need to be set up to print to stdout.

Most Linux distributions have the following helper apps available:

cat – generic text (txt) files

pdftotext – Adobe Acrobat (pdf) Documents

catdoc – Microsoft Word (doc) Documents

xls2csv – Microsoft Excel (xls) files

catppt – Microsoft Power Point (ppt) files

unrtf – Rich Text Format (rtf) files

For more information about helpers and how to configure them, see hints for Linux and Windows. It is also possible to configure helpers in a shared hosting environment.
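The module itself is written in PHP, but the helper-app mechanism is easy to illustrate. Below is a minimal Python sketch, not the module’s actual code, that maps file extensions to the helpers listed above and captures whatever the helper prints to stdout. It assumes the helpers are on the PATH; `pdftotext` writes to stdout when its output-file argument is `-`, and `unrtf` needs `--text` to emit plain text instead of its default HTML.

```python
import shutil
import subprocess
from pathlib import Path

# Extension -> argv template for a helper that prints plain text to stdout.
# "{path}" is replaced with the file name.
HELPERS = {
    ".txt": ["cat", "{path}"],
    ".pdf": ["pdftotext", "{path}", "-"],  # "-" sends the text to stdout
    ".doc": ["catdoc", "{path}"],
    ".xls": ["xls2csv", "{path}"],
    ".ppt": ["catppt", "{path}"],
    ".rtf": ["unrtf", "--text", "{path}"],  # plain text instead of HTML
}


def build_command(path):
    """Return the helper argv for a file, or None if no helper handles it."""
    template = HELPERS.get(Path(path).suffix.lower())
    if template is None:
        return None
    return [arg.replace("{path}", str(path)) for arg in template]


def extract_text(path):
    """Run the matching helper and return its stdout.

    Returns None for unsupported formats or when the helper is not installed;
    raises subprocess.CalledProcessError if the helper exits with an error.
    """
    cmd = build_command(path)
    if cmd is None or shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

The extracted text is what a search index would then be built from; if a helper is missing, the file simply contributes no searchable text, which matches the symptom of attached PDFs not turning up in searches.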

I assumed that my Linux installation might already have these applications available, although I could install them separately if need be. So I downloaded and installed search_files-6.x-1.6.

I had no difficulty installing it or configuring it in Drupal. But it can’t search the pdf files I have attached, so I’m assuming I also need to install the helper applications in Linux.
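In a situation like this, a quick check of which helpers are actually on the PATH helps separate a missing helper from a problem in the module itself. A small Python sketch, assuming a Debian/Ubuntu system: the package names are the Debian/Ubuntu ones (`pdftotext` ships in poppler-utils, and `catdoc`, `xls2csv`, and `catppt` all come from the catdoc package) and will differ on other distributions.

```python
import shutil

# Helper programs the module can call, mapped to the Debian/Ubuntu package
# that provides each one (assumption: a Debian-based system).
HELPER_PACKAGES = {
    "cat": "coreutils",
    "pdftotext": "poppler-utils",
    "catdoc": "catdoc",
    "xls2csv": "catdoc",
    "catppt": "catdoc",
    "unrtf": "unrtf",
}


def missing_helpers(find=shutil.which):
    """Return {helper: package} for every helper not found on the PATH."""
    return {name: pkg for name, pkg in HELPER_PACKAGES.items() if find(name) is None}


def install_hint(missing):
    """Build an apt-get command line that would install the missing helpers."""
    packages = sorted(set(missing.values()))
    return "sudo apt-get install " + " ".join(packages) if packages else ""


if __name__ == "__main__":
    missing = missing_helpers()
    if missing:
        print("Missing helpers:", ", ".join(sorted(missing)))
        print(install_hint(missing))
    else:
        print("All helpers are installed.")
```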

UPDATE: as it turns out, this module worked in Drupal 5 but is broken in Drupal 6. Evidently it works in Drupal 7, so hopefully when I update my system I can get this working. Otherwise, I will need to find a different CMS, because this search functionality is crucial.


Filed under Content Management, Digital Collections, Digital Humanities, Drupal, Operating systems, SIRLS 675

Tagged as digital collection, Drupal

September 27, 2011 · 1:46 pm

Unit 4 – Drupal as a content management system – initial thoughts

This week, you might choose to comment on how suitable Drupal might be for your collection. Begin to develop some criteria you would use to judge how well an application such as Drupal meets the needs of your collection and its users. We will expand on this problem over the semester.

We have been reading about the need for humanities scholars to be able to use a digital collection with a degree of confidence about the nature and authority of the relations between objects, with the structure of those relations made clear so that the information added is objective rather than subjective. What I would really like to make is a database, collection, or semantic web of all the texts (with attached full text) that George Eliot read or interacted with, with some measure of confidence about how influential those texts were. One could argue that there is a sort of taxonomy to how much she interacted with a text, in ascending order from hearing it read aloud, to reading it in translation, to reading it herself in the original language, to reviewing it, to editing it, to translating it from another language into English. These are all types of relations with a text. One can also argue that reading it more than once, or attesting to its influence in letters or in research notebooks, is also a measure of influence. I have been reading about RDF (Resource Description Framework), which seems to offer exactly the sort of inferential structure I want to be able to capture, starting with the simplest: what she read, with some sort of statement about her relation to the text, and a documentary page showing the authority for that relation. One can then infer the direction of influence between texts according to who read what and when.
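The ascending scale of engagement and the “who read what and when” inference can be sketched in plain Python before committing to any RDF tooling. In the sketch below, each statement is a simplified stand-in for an RDF triple plus its year and authority; the texts, works, years, and sources are placeholders rather than real data, and an edge returned by `possible_influences` is only a candidate for influence, not evidence of it.

```python
from enum import IntEnum


class Relation(IntEnum):
    """Degrees of engagement with a text, in the ascending order described above."""
    HEARD_READ_ALOUD = 1
    READ_IN_TRANSLATION = 2
    READ_IN_ORIGINAL = 3
    REVIEWED = 4
    EDITED = 5
    TRANSLATED = 6


# (subject, relation, object, year, authority): a triple plus provenance.
# All values except the person are placeholders for illustration.
statements = [
    ("George Eliot", Relation.READ_IN_ORIGINAL, "Text A", 1850, "Letter X"),
    ("George Eliot", Relation.REVIEWED, "Text A", 1851, "Periodical Y"),
    ("George Eliot", Relation.HEARD_READ_ALOUD, "Text B", 1860, "Journal Z"),
]


def strongest_relation(person, text, stmts):
    """Return the highest-ranked relation a person has to a text, or None."""
    rels = [rel for subj, rel, obj, _, _ in stmts if subj == person and obj == text]
    return max(rels, default=None)


def possible_influences(person, works, stmts):
    """Pair each text the person engaged with to each of the person's own
    works written later, as (source text, later work) candidate edges."""
    edges = []
    for subj, _, text, year, _ in stmts:
        if subj != person:
            continue
        for work, written in works:
            if written > year and (text, work) not in edges:
                edges.append((text, work))
    return edges
```

Using an ordered enum keeps the relation types comparable, so "strongest relation" is just a maximum; in an RDF version the same ordering would have to be stated explicitly in the vocabulary.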

Because eventually I would want this to be part of a larger database of “Literary interlocutors,” I’m having trouble figuring out whether the key entity in this collection is texts or a person. The way I envision the normalized tables in a database would be a table of persons, a table of texts, and a table of links between the two, in the form of “GE read Rousseau’s Les Confessions, in French, in 1834, according to these authorities, and here is a link to that edition of Les Confessions in French (or perhaps a digital image), plus a searchable English translation.” I have been thinking that I needed to include all the standard metadata for each text in each entry, but that seems a waste of space. The new and useful information to be collected is the table of links, so all I really need to capture is what I have underlined; each underlined phrase is a field in my collection.
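The three-table design can be sketched with Python’s built-in sqlite3 module. The table and column names are hypothetical; the sample row reuses the example sentence from this post, with a placeholder standing in for the authorities. The bibliographic metadata lives once in `texts`, and only the link table carries the new information.

```python
import sqlite3

SCHEMA = """
CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE texts (id INTEGER PRIMARY KEY, title TEXT NOT NULL, author TEXT);
-- One row per documented encounter between a person and a text.
CREATE TABLE links (
    person_id   INTEGER NOT NULL REFERENCES persons(id),
    text_id     INTEGER NOT NULL REFERENCES texts(id),
    relation    TEXT NOT NULL,
    language    TEXT,
    year        INTEGER,
    authority   TEXT,
    edition_url TEXT
);
"""


def build_db():
    """Create an in-memory database holding the example from the post."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO persons (id, name) VALUES (1, 'George Eliot')")
    conn.execute(
        "INSERT INTO texts (id, title, author) VALUES (1, 'Les Confessions', 'Rousseau')"
    )
    conn.execute(
        "INSERT INTO links VALUES (?, ?, ?, ?, ?, ?, ?)",
        (1, 1, "read", "French", 1834, "authority placeholder", None),
    )
    conn.commit()
    return conn


def readings(conn, person_name):
    """List (title, relation, language, year) for everything a person engaged with."""
    return conn.execute(
        """SELECT t.title, l.relation, l.language, l.year
           FROM links AS l
           JOIN persons AS p ON p.id = l.person_id
           JOIN texts AS t ON t.id = l.text_id
           WHERE p.name = ?
           ORDER BY l.year""",
        (person_name,),
    ).fetchall()
```

Because persons and texts are both first-class tables, the same links can be queried from either side, which sidesteps the question of whether the key entity is the person or the text.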

Any content management system I use for my collection will need to be able to search and manage large attached text files in a variety of formats, to query the collection of these files with a full-text search, and to offer a faceted search that narrows the query results by type of relation, subject, language, year, or type of text file. I also want to be able to widen the search if necessary, though, across subjects, dates, etc. The idea is to be able to use this collection to specify a group of texts to search, and to document the relationships and direction of influence between them. I would love to be able to actually graph the connected nodes in some sort of network display and to assess the degree of influence.
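Faceted search of this kind comes down to three steps: match the full-text query, filter on the selected facet values, then count the remaining facet values so the user can narrow (or widen) the search. A minimal in-memory Python sketch, with hypothetical records and field names; a real system would use a search engine with a proper full-text index rather than the substring matching used here.

```python
from collections import Counter

FACETS = ("relation", "subject", "language", "year", "format")

# Hypothetical records: one per link, with facet values and extracted full text.
RECORDS = [
    {"id": 1, "relation": "read", "subject": "philosophy", "language": "French",
     "year": 1850, "format": "pdf", "text": "notes on the social contract and the general will"},
    {"id": 2, "relation": "translated", "subject": "religion", "language": "German",
     "year": 1854, "format": "txt", "text": "notes on the essence of religion and free will"},
    {"id": 3, "relation": "reviewed", "subject": "philosophy", "language": "English",
     "year": 1855, "format": "pdf", "text": "notes on positive philosophy"},
]


def _facet_matches(rec, facet, wanted):
    # A single value narrows; a set/list/tuple of values widens the facet.
    allowed = wanted if isinstance(wanted, (set, frozenset, list, tuple)) else {wanted}
    return rec.get(facet) in allowed


def search(records, query="", **filters):
    """Keep records whose text contains every query word (case-insensitive
    substring match) and whose facet values satisfy every filter."""
    words = query.lower().split()
    return [
        rec for rec in records
        if all(w in rec["text"].lower() for w in words)
        and all(_facet_matches(rec, f, v) for f, v in filters.items())
    ]


def facet_counts(hits):
    """Count values per facet in the current results, for display beside each facet."""
    return {f: Counter(rec[f] for rec in hits if f in rec) for f in FACETS}
```

Passing a set of values for a facet, for example `language={"French", "English"}`, is one simple way to widen the search across several values at once.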


Filed under Content Management, Drupal, SIRLS 675

Tagged as digital collection, digital humanities, Drupal

September 7, 2011 · 5:28 pm

IRLS 765 Unit 2 – reviewing content management solutions

We were asked to skim a special issue of Library Hi Tech on content management systems (Vol. 24, issue 1, 2006), pick an article, summarize it, and discuss it.

First, a few notes about the special issue: almost all the articles contained case studies of libraries and their processes in selecting a CMS (either open-source, proprietary, or developed in-house) and then implementing that system. I was rather dismayed to find that most of the libraries, large and small, ended up developing a CMS in-house, because other systems were either too expensive, not flexible enough, or would have required the library to jettison too many established workflows and/or existing built-in software. This dismays me because designing and installing a custom CMS in almost every case took extensive programming knowledge and resources outside of the library staff. That tells me that although librarians are increasingly expected to be involved in designing and selecting CMS for their libraries, and although there are many CMS packages out there, implementing a workable CMS without significant outside help is still far beyond the capabilities/resources of most library staff.

For example, Matt Benzing’s article “Luwak: a content management solution” documents how the Rensselaer Research Libraries in Troy, New York, were able to adapt and extend a piece of software that had already been developed at their parent institution, Rensselaer Polytechnic Institute. Prior to adapting that software, the libraries had been using Dreamweaver to develop their web pages, which provided some “design consistency” but did not provide enough flexibility in content management nor enough access control to avoid the occasional “misstep by a librarian new to HTML” resulting in the accidental erasure or overwriting of web pages (9). The software that they adapted, an “XML-based application” named Luwak, had already been developed elsewhere at the institution “as a solution to the problem of how to adapt web pages for users of PDAs and other handheld devices without having to maintain multiple copies of the same information.” This software already contained the types of functionality the library staff were looking for, and just needed to be extended to the particular uses of the library.

Luwak was implemented using open-source software, written in Java and utilizing a MySQL database. It had already been deployed to manage the campus newsletters and a campus-wide information system. Its ability to control content creation and site maintenance through user roles, to separate content from format and reformat content on the fly, to validate content before posting to the site, and to allow timed updates made it useful for the library as well. The developers were not the librarians, but the technical staff at the Institute; the IT librarian developed the style sheets for the site. Since the new system resided on a separate server, they were able to develop the new library site in Luwak without disturbing the old site. The other librarians were quickly on board with the switch, as most of them just wanted to generate content and did not want to be involved with the formatting or site design. The implementation and switch over went smoothly. The article concludes that “Further work needs to be done on providing a useful on the fly stylesheet for handheld devices, and in exploiting some of the design and functionality capabilities of the system, such as providing collapsible hierarchies of links, multiple page skins, and the importation of library news bulletins into an RSS feed. The website as it now stands is more flexible, efficient, and consistent than it has ever been” (13).
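The core idea the article describes, storing content once as XML, validating it before it is posted, and reformatting it on the fly for different devices, can be illustrated in a few lines. Luwak itself was a Java application driven by style sheets; this Python sketch swaps in plain functions for the style sheets, and the sample page and validation rules are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content item, stored once and independent of presentation.
CONTENT = """<page>
  <title>Library Hours</title>
  <section heading="Main Library">
    <para>Open 8am to midnight on weekdays.</para>
  </section>
  <section heading="Archives">
    <para>Open by appointment.</para>
  </section>
</page>"""


def validate(root):
    """Return a list of problems; an empty list means the page may be posted."""
    errors = []
    if not (root.findtext("title") or "").strip():
        errors.append("missing title")
    for i, sec in enumerate(root.findall("section"), start=1):
        if not sec.get("heading"):
            errors.append(f"section {i} has no heading")
    return errors


def render_desktop(root):
    """Full layout: headings and paragraphs as HTML (expects validated content)."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    for sec in root.findall("section"):
        parts.append(f"<h2>{escape(sec.get('heading'))}</h2>")
        parts.extend(f"<p>{escape(p.text or '')}</p>" for p in sec.findall("para"))
    return "\n".join(parts)


def render_handheld(root):
    """Compact layout for small screens: one line per section."""
    lines = [root.findtext("title").upper()]
    for sec in root.findall("section"):
        text = " ".join((p.text or "").strip() for p in sec.findall("para"))
        lines.append(f"{sec.get('heading')}: {text}")
    return "\n".join(lines)
```

Since both renderers read the same source, editing the content once updates every output format, which is the “multiple copies of the same information” problem Luwak was built to avoid.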


Filed under Content Management, Digital Collections, Library science, SIRLS 675

Tagged as CMS, Content Management System, library

September 7, 2011 · 3:06 pm

LibX – a Firefox extension for enhanced library access

Emerald | Library Hi Tech | LibX – a Firefox extension for enhanced library access.

I wonder why I hadn’t heard of this. There is a version for the UA library available, which I downloaded and installed in Firefox. It is very useful to be able to browse the UA catalog system from my browser, and it connects with Google Scholar as well. It also tells me if a book I’m looking at on Amazon is available at the UA library.


Filed under Content Management
