I think the question here is really: what skills are needed in a librarian working in digital collections? That is not easy to answer, because libraries vary so much in staff size, budget, and training. I certainly feel much more confident about installing and configuring virtual machines (VMs) as a result of this course, but I wonder whether it was the best use of class assignment time, especially since it could be so time-consuming. But if I were the sole librarian in a small non-profit or museum, with no technical staff, and I wanted to create and host a digital collection, the ability to create a VM from scratch would be important. I guess the question is: how common is this scenario, and how common will it be in the future? And what sorts of librarianship does the DigIN program want to support?
I don’t know if a middle ground might have worked better; maybe install only three VMs and spend more time on metadata and actually working with collections. I think DSpace and Eprints were somewhat repetitive; maybe we could have been given a choice to install one or the other as our example of digital repository software. It was good to see Drupal because it is so ubiquitous, and to get a sense of the more technical end of the spectrum in digital collections management. Omeka seems to represent the other, simpler end of digital repository management.
As opposed to “close reading” of a text, “distant reading” allows the Lit Lab researchers to analyze not just one or two books but thousands of them at a time. This exploratory project is made possible by a vast reservoir of computing and library resources, in addition to the people working in the Lab.
via Recasting the humanities through ‘distant reading’ | Stanford Daily.
A repository software package’s homepage ought to be attractive and clear, and ought to inspire users’ confidence. Some of these homepages do that better than others, but they are also geared toward very different audiences. In general, I think each site is geared toward the user who could best benefit from it.
- Eprints (http://www.eprints.org/): The homepage clearly states what Eprints is, has a clean interface, and provides a live demo as well as links to documentation, downloads, and a description of the principles of open access.
- Omeka (http://omeka.org/): again, the homepage is well-designed, attractive, and provides clear links to all the information a user could want. It seems geared especially to draw in the new or uninitiated user (i.e. me). The user could be an individual rather than an institution.
- DSpace (http://www.dspace.org/): There is a lot of white space on this page. For some reason I find that intimidating. There is a very clear statement about what it is, provided you know what an “institutional repository application” is. The logo at the top identifies it as a “scholar space.” This is definitely geared toward an institutional user or IT professional who already knows what an institutional repository is. It might be more confidence-building if you are an institutional administrator looking for a turnkey application.
- Drupal (http://drupal.org/): This is a very busy page. But the tag line: “Come for the Software, Stay for the Community” is catchy. The page goes out of its way to show you how world-wide its scope is; you can tell that the software is geared for IT professionals and developers; they even have announcements about DrupalCons (a very geeky term for conventions). This is definitely a geek community and that means that I am not the sort of user they are targeting.
- PKP (Public Knowledge Project http://pkp.sfu.ca/): This site also hosts a lot of projects besides the harvester software. It takes a while of exploring and reading to figure out what this site is and what it contains. It’s not for the casual user, and it seems to assume that the user is already committed to open source and open access projects.
- JHove (http://hul.harvard.edu/jhove/): This is a site full of technical jargon, definitely targeting the technical user, not the casual user or the repository administrator.
I guess each site has its advantages for the type of user it is seeking. As a librarian or a non-profit or museum curator, I find the first three more attractive and accessible.
Like many others in class, I had difficulty finding working links on the list of open archives service providers we were given. Most service providers seemed to collect scientific database metadata, and many sites were in foreign languages, which is great but not useful to me. There were only a few that provided harvests from humanities archives; these were almost exclusively the same sites that I found with my harvester. I also found many dead links and sites that would not load.
I looked at the DL-Harvest site from the University of Arizona; its description states that “It brings together full-text, scholarly materials in the Information Sciences from many different OAI-PMH compliant repositories.” The site seemed true to its description, well organized, and I was able to search it easily. It would be very useful if I were looking for articles in library science or information management.
I looked at the OAIster site, which is the huge federated site that supports WorldCat. That site is truly amazing, and the search is faceted in very useful ways, mainly by type of resource, which works for me most of the time. I think this service is very useful; there have been many times I have used WorldCat to find things I could not find in other ways. In particular, when I am searching for archival material it is invaluable. I think it is especially helpful when one is searching for unique items, or a set of items that is small in number. A general search would return too many hits; but being able to select archival material usually returns a manageable set. Sometimes the links don’t click through, or the site can’t resolve a particular record to a particular library or collection, but that is pretty rare in my experience.
The Perseus site was really interesting. It collects metadata from sites containing Greek and Latin texts, as well as Arabic and Old Norse. It was very easy to use and search, and it included a dictionary for each language. I liked the interface; when I searched for “shame” in the general search bar, I got hits that were actually shown to me in context. It is very useful for the subject matter it covers.
It seems that the services that work best have either a very specific focus, or a very, very broad one with faceted tools to cull the search results. The descriptions need to specify the language used by the federated site. They also need to indicate the last time the federated archive was updated.
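All of these service providers gather their records the same way: by sending OAI-PMH requests to each member archive. A minimal sketch of what one of those requests looks like, using a hypothetical repository endpoint and set name (the URL and set are placeholders, not a real archive):

```python
from urllib.parse import urlencode

def build_listrecords_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    A real harvester would fetch this URL, parse the XML response,
    and follow resumptionTokens to page through the full archive.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # restrict the harvest to one set
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint; "hum" stands in for a humanities set:
print(build_listrecords_url("http://example.org/oai", set_spec="hum"))
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=hum
```

Every OAI-PMH-compliant archive answers the same small set of verbs (ListRecords, ListSets, Identify, and so on), which is what makes federated sites like OAIster possible in the first place.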
We will not be covering Fedora Commons as a digital repository, so I wanted to experiment with it a little. I am interested in creating more of a semantic web than a traditional digital repository, and Fedora is designed to enable this data model. According to their website under the section “Fedora Basics,”
“While Fedora can easily be used to model digital collections of surrogates of traditional, catalog-based collections, it has been designed to be able to support durable web-like information architectures. Because each object completely contains all of the content, metadata and attributes of a unit of content, and can assert any number of relationships to any other object, it is easy to support schemes in which objects have multiple contexts with no dependencies upon each other.”
Fedora supports an RDF-like data model, where
“Relationships are asserted from the perspective of one object to another object as in the following general pattern:
<subjectFedoraObject> <relationshipProperty> <targetFedoraObject>
The first Fedora object is considered the “subject” of the relationship assertion. The relationship, itself, is considered a property of the subject. The target Fedora object is the related object. Thus, a valid relationship assertion as an English-language sentence might be:
<MyCatVideo> <is a member of the collection> <GreatCatVideos>”
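To make the subject-property-target pattern concrete, here is a minimal sketch in plain Python, using tuples as triples. The object names come from Fedora’s own example above, but the storage and query helpers are my illustration only, not Fedora’s actual API:

```python
# Each relationship is a (subject, property, target) triple,
# asserted from the perspective of the subject object.
triples = set()

def assert_relationship(subject, prop, target):
    """Record a relationship asserted from subject to target."""
    triples.add((subject, prop, target))

def members_of(collection):
    """Find every object asserting membership in a collection."""
    return {s for (s, p, t) in triples
            if p == "isMemberOfCollection" and t == collection}

assert_relationship("MyCatVideo", "isMemberOfCollection", "GreatCatVideos")
assert_relationship("MyDogVideo", "isMemberOfCollection", "GreatDogVideos")

print(members_of("GreatCatVideos"))  # {'MyCatVideo'}
```

Because each object can assert any number of these triples, the same object can belong to many collections, or relate to other objects in entirely different ways, with no dependencies between the contexts. That is the “web-like information architecture” the documentation describes.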
They have an online sandbox environment that you can try out. I will post more when I get the chance to play around.
I found Eprints to be more difficult to install than either DSpace or Drupal. I had problems creating repositories, and had to try four times before I got my repository irls675 installed and configured correctly. I still don’t know what went wrong. I eventually fixed the problems I was having by restoring a snapshot of my VM taken right after I installed Eprints but before I configured any repositories, and then going from that point. Once I got the repository irls675 configured and once I could access it via Firefox, things went more smoothly.
Not perfectly smoothly, however. Changing the description of the repository went as described. When I tried changing the logo, I had problems. I tried both methods mentioned in the instructions Bruce gave us. The first method did not work, but the second one did: editing the configuration file to reference the new file name for the logo, then restarting the Apache server and rebuilding the repository. Then we were offered the choice of whether to change the theme. I elected not to install the “glass” theme, since others indicated that it did not look very different. I also chose to go with the LOC subject classifications rather than build my own taxonomy.
I found adding records to be easy, although I would like the ability to customize the fields. I had to put some information in the abstract field that I would have liked to create some special fields for. I found the LOC classifications to be limiting.
At first, searches would not return any of my records, even though I could browse and see them. But after I re-indexed the repository, I could search all the fields, including the full text of the attached files. Success!
I installed two programs from the ePrints Bazaar: the “Batch-edit eprints via Excel Export/Import – version 1.0.0” package, and the comments and notes package. To my delight, I can now add comments and notes to my records. I can’t figure out how to use the other package (the batch edit).
UK Reading Experience Database – Home.
Wow. Here is a database that may do some of what I want to do with my Literary Interlocutors database. I found the link when I was examining the “What Middletown Read” site. I will comment more on this when I have a chance to look at it more closely.