Category Archives: SIRLS 675

These are posts related to the course Advanced Digital Collections

December 8, 2011 · 2:00 am

Unit 13 – I’ve learned a lot this semester!

I have learned so much this semester that it’s hard to know where to begin, but I guess I need to begin with metadata and taxonomy. Early in the semester I posted that “I read some articles this week that made me realize that much of what added value librarians provide to collections is in the form of metadata. I guess I always thought of librarians mainly as reference librarians or subject specialists, not as experts in classifying and indexing information.” Now I know (a little) about the value of good metadata and taxonomies, and I’ve learned about metadata standards such as Dublin Core, which help to standardize metadata usage across the web. I’ve learned about the semantic web, and the idea of linking data and building ontologies that describe the relations between concepts. I’ve learned about the contrasting benefits of controlled vocabularies versus “folksonomy” (i.e., tagging). And I’ve learned a little about harvesting metadata, using the PKP harvester to harvest metadata from several databases and data providers.

I’ve also learned more about the Open Access movement and open access initiatives, and about the issues involved in “freeing” information from behind paywalls. The main obstacle is that knowledge (and its associated data) is a currency that has value, so making it freely available will necessitate basic structural changes in academia and in academic publishing.

Those structural changes include major changes in the role of the library and librarians in the production and preservation of knowledge. These changes present significant challenges to libraries in managing, curating, and preserving digital materials and data. Librarians are increasingly expected to have the technical skills to select and design Content Management Systems for their libraries, to design, create, and maintain digital collections and digital repositories, and to train other librarians to do the same, often with limited technical staff and limited budgets. Open Source software is a boon to the small library, non-profit, or museum that needs these types of functionality; but again, it requires technical knowledge and skill on the part of the librarian to install, configure, and maintain operating systems and small in-house servers.

In order to gain those technical skills, I learned how to create several virtual machines with Linux stacks of various sorts, and to install and configure four different content management/digital repository software systems (Drupal, DSpace, Eprints, and Omeka). I created a sample digital collection in each one, and used the experience to compare each system’s strengths and weaknesses and to decide which sorts of digital collections (and environments) each system is best suited for. I then chose which system to use to host my digital collection, set up the system, entered the records and created the metadata, and wrote a paper on the process, which will contribute toward my digital portfolio. In my case, I decided upon Drupal. Drupal has a steep learning curve, and I learned a lot about it in a short time through the process of designing my collection, downloading and installing extra modules to provide the functions I needed, and troubleshooting the installation. I’m proud of how well my prototype digital collection works, but I already have plans to keep working on the prototype to get it working even better, to extend its functions, and to redesign certain features. I’m turning into a Drupal geek already.

Librarians are also expected to conduct outreach to their various communities, in order to make the services of the library more accessible and useful. This seems to be especially necessary for the humanities scholar community. We read many articles about the obstacles that keep humanities scholars from embracing digital initiatives and from using digital resources (and those articles confirmed my own observations). We learned how humanities scholarship, data, and workflows are vastly different from those of the scientific community; identified some of the obstacles that prevent humanities scholars from using (or producing) digital resources, including digital repositories; and read about several digital humanities initiatives, both in the U.S. and in Europe.

I think one of the most enjoyable aspects of the course for me was just the chance to see so many different examples of digital collections; to interact with my fellow students over their collections and interests; and to explore what is already being done, what is possible and useful. A find that was very helpful to me was the UK Reading Experience Database (RED). This is a database that contains much of the same sorts of data that I wanted to collect in my own digital collection, so it gave me some assurance that I was on the right track with my ideas.

Finally, I have included a slideshow of some screen shots from my project.

Filed under Content Management, Digital Collections, Digital Humanities, digital repository, digital surrogate, Drupal, dSPace, ePrints, George Eliot, Library science, metadata, Omeka, Project Management, semantic web, SIRLS 675, taxonomy, Ubuntu Linux, VRE (Virtual Research Environment)

Tagged as Drupal, learning

November 28, 2011 · 3:14 pm

Unit 12: pre-installed VM versus DIY

I think the question here is really: what skills does a librarian working in digital collections actually need? That is not easy to answer, because libraries vary so much in terms of staff size, budget, and training. I certainly feel much more confident about installing and configuring virtual machines (VMs) as a result of this course, but I wonder if it was the best use of class assignment time, especially since it could be so time-consuming. But if I were the sole librarian in a small non-profit or museum, with no technical staff, and I wanted to create and host a digital collection, the ability to create a VM from scratch would be important. I guess the question is, how common is this scenario, and how common will it be in the future? And what sorts of librarianship does the DigIN program want to support?

I wonder whether a middle ground might have worked better: maybe install only three VMs and spend more time on metadata and on actually working with collections. I think DSpace and Eprints were somewhat repetitive; maybe we could have been given a choice to install one or the other as our example of digital repository software. I think it was good to see Drupal because it is so ubiquitous, and to get an idea of the more technical end of the spectrum in digital collections management. And I think Omeka represents the other end, the simple end of digital repository management.

Filed under Content Management, Digital Collections, digital repository, Drupal, dSPace, ePrints, Library science, Omeka, SIRLS 675

November 27, 2011 · 11:49 pm

Unit 11: Repository Software Homepages – an assessment

A repository software package’s homepage ought to be attractive and clear, and it ought to inspire the user’s confidence. Some of these homepages do that better than others, but they are also geared toward very different audiences. In general I think each site is geared toward the user that could best benefit from it.

Eprints (http://www.eprints.org/): clearly states what it is, has a clean interface, and provides a live demo as well as links to documentation, downloads, and a description of the principles of open access.
Omeka (http://omeka.org/): again, the homepage is well-designed, attractive, and provides clear links to all the information a user could want. It seems geared especially to draw in the new or uninitiated user (i.e. me). The user could be an individual rather than an institution.
DSpace (http://www.dspace.org/): There is a lot of white space on this page. For some reason I find that intimidating. There is a very clear statement about what it is, if you know what an “institutional repository application” is. The logo at the top identifies it as a “scholar space”; this is definitely geared toward an institutional user or IT professional who already knows what an institutional repository is. It might be more confidence-building if you are an institutional administrator looking for a turnkey application.
Drupal (http://drupal.org/): This is a very busy page. But the tag line: “Come for the Software, Stay for the Community” is catchy. The page goes out of its way to show you how world-wide its scope is; you can tell that the software is geared for IT professionals and developers; they even have announcements about DrupalCons (a very geeky term for conventions). This is definitely a geek community and that means that I am not the sort of user they are targeting.
PKP (Public Knowledge Project http://pkp.sfu.ca/): This site also has a lot of projects besides the harvester software. It takes a while exploring and reading to figure out what this site is and what it contains. It’s not for the casual user, and it seems to already assume that the user is committed to open source and open access projects.
JHove (http://hul.harvard.edu/jhove/): This is a site full of technical jargon, definitely targeting the technical user, not the casual user or the repository administrator.

I guess each site has its advantages for the type of user it is seeking. As a librarian or a non-profit or museum curator, I find the first three more attractive and accessible.

Filed under Content Management, Digital Collections, digital repository, Drupal, dSPace, ePrints, Omeka, SIRLS 675

November 25, 2011 · 9:48 pm

Unit 10 – Open Archives Service Providers

Like many others in class, I had difficulty finding working links on the list of open archives service providers we were given: many links were dead, and other sites would not load. Most service providers seemed to collect scientific database metadata, and many sites were in foreign languages, which is great but not useful to me. There were only a few that provided harvests from humanities archives; these were almost exclusively the same sites that I found with my harvester.

I looked at the DL-Harvest site from the University of Arizona; its description states that “It brings together full-text, scholarly materials in the Information Sciences from many different OAI-PMH compliant repositories.” The site seemed true to its description, well organized, and I was able to search it easily. It would be very useful if I were looking for articles in library science or information management.
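For anyone curious what an OAI-PMH harvest looks like under the hood, here is a minimal Python sketch of my own; it is not how DL-Harvest or any particular harvester is implemented. The `ListRecords` verb and the `oai_dc` metadata prefix come from the OAI-PMH protocol, but the base URL, the function names, and the sample response are assumptions I made up for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint, for illustration only.
BASE_URL = "https://repository.example.org/oai"

# XML namespace for Dublin Core elements in an oai_dc record.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL a harvester would request to list a repository's records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# A made-up, heavily trimmed response, so the sketch runs without a network.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def titles(xml_text):
    """Pull the Dublin Core titles out of a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//dc:title", NS)]


if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    print(titles(SAMPLE_RESPONSE))
```

A real harvester would also have to follow resumption tokens and handle errors, which this sketch leaves out.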

I looked at the OAIster site, the huge federated catalog that supports WorldCat. That site is truly amazing, and the search is faceted in very useful ways, mainly by type of resource, which works for me most of the time. I think this service is extremely useful; there have been many times I have used WorldCat to find things I could not find in other ways. It is invaluable when I am searching for archival material, and especially helpful when I am searching for unique items or a small set of items. A general search would return too many hits, but limiting the search to archival material usually returns a manageable set. Sometimes the links don’t click through, or the service can’t resolve a particular record to a particular library or collection, but that is pretty rare in my experience.

The Perseus site was really interesting. It collects metadata from sites containing Greek and Latin texts, as well as Arabic and Old Norse. It was very easy to use and search; it also included a dictionary for each language. I liked the interface; when I searched for “shame” in the general search bar, the results showed me each hit in context. It is very useful for the subject matter it covers.

It seems that the services that work best have either a very specific focus or a very, very broad one with faceted searching to cull the results. The descriptions need to specify the language used by the federated site. They also need to indicate the last time the federated archive was updated.

Filed under Digital Collections, digital repository, SIRLS 675

November 21, 2011 · 11:44 pm

Unit 7 – Experimenting with Fedora Commons

We will not be covering Fedora Commons as a digital repository, so I wanted to experiment with it a little. I am interested in creating more of a semantic web than a traditional digital repository, and Fedora is designed to enable this data model. According to their website under the section “Fedora Basics,”

“While Fedora can easily be used to model digital collections of surrogates of traditional, catalog-based collections, it has been designed to be able to support durable web-like information architectures. Because each object completely contains all of the content, metadata and attributes of a unit of content, and can assert any number of relationships to any other object, it is easy to support schemes in which objects have multiple contexts with no dependencies upon each other.”

Fedora supports an RDF-like data model, where

“Relationships are asserted from the perspective of one object to another object as in the following general pattern:

<subjectFedoraObject> <relationshipProperty> <targetFedoraObject>

The first Fedora object is considered the “subject” of the relationship assertion. The relationship, itself, is considered a property of the subject. The target Fedora object is the related object. Thus, a valid relationship assertion as an English-language sentence might be:

<MyCatVideo> <is a member of the collection> <GreatCatVideos>”
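To make the pattern concrete for myself, here is a minimal Python sketch of subject/relationship/target assertions. This is only my own illustration of the idea, not Fedora’s API or storage format. The `MyCatVideo` and `GreatCatVideos` names come from the example above; the relationship label `isMemberOfCollection`, the other object names, and the helper functions are hypothetical.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    """One relationship, stated from the subject's point of view."""
    subject: str
    relationship: str
    target: str


# Illustrative assertions. Only the first one comes from the quoted example;
# the rest are hypothetical additions to show one object in several contexts.
ASSERTIONS = [
    Assertion("MyCatVideo", "isMemberOfCollection", "GreatCatVideos"),
    Assertion("MyOtherCatVideo", "isMemberOfCollection", "GreatCatVideos"),
    Assertion("MyCatVideo", "isMemberOfCollection", "FamilyVideos"),
]


def members_of(collection, assertions=ASSERTIONS):
    """Return every object that asserts membership in the given collection."""
    return [a.subject for a in assertions
            if a.relationship == "isMemberOfCollection" and a.target == collection]


def contexts_of(obj, assertions=ASSERTIONS):
    """Return every collection a single object says it belongs to."""
    return [a.target for a in assertions
            if a.subject == obj and a.relationship == "isMemberOfCollection"]


if __name__ == "__main__":
    print(members_of("GreatCatVideos"))
    print(contexts_of("MyCatVideo"))
```

The point of the sketch is that the same object can sit in two collections at once without either collection depending on the other, which is the “multiple contexts” idea from the quotation.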

They have an online sandbox environment that you can use for testing. I will post more when I get the chance to play around.

Filed under Digital Collections, Library science, semantic web, SIRLS 675

Tagged as Digital Repository, Fedora Commons

November 20, 2011 · 3:20 am

Unit 8 – Eprints install and branding

I found Eprints to be more difficult to install than either DSpace or Drupal. I had problems creating repositories, and had to try four times before I got my repository irls675 installed and configured correctly. I still don’t know what went wrong. I eventually fixed the problems I was having by restoring a snapshot of my VM taken right after I installed Eprints but before I configured any repositories, and then going from that point. Once I got the repository irls675 configured and once I could access it via Firefox, things went more smoothly.

Not perfectly smoothly, however. Changing the description of the repository went as described. When I tried changing the logo, I had problems. I tried both methods mentioned in the instructions Bruce gave us. The first method did not work, but the second one did; it involved editing the file to reference the new file name for the logo, and then restarting the Apache server and rebuilding the repository. Then we were offered the choice whether to change the theme. I elected not to install the “glass” theme, since others indicated that it did not look very different. I also chose to go with the Library of Congress (LOC) subject classifications rather than build my own taxonomy.

I found adding records to be easy, although I would like the ability to customize the fields. I had to put some information in the abstract field that I would have liked to create some special fields for. I found the LOC classifications to be limiting.

At first, searches would not return any of my records, even though I could browse and see them. But after I re-indexed the repository, I could search all the fields, including the full text of the attached files. Success!!

I installed two programs from the ePrints Bazaar: the “Batch-edit eprints via Excel Export/Import – version 1.0.0” package, and the comments and notes package. To my delight, I can now add comments and notes to my records. I can’t figure out how to use the other package (the batch edit).

Filed under Digital Collections, digital repository, ePrints, SIRLS 675

Tagged as Eprints

October 14, 2011 · 1:44 pm

Unit 7 – Digital Humanities Centers and libraries as “Third Spaces.”

I found this 2008 report, A Survey of Digital Humanities Centers in the United States, on the Council on Library and Information Resources website. It is a massive document that describes the survey’s goals, methods, and findings, identifies trends and issues, and places DHCs in the broader context of other models, including the sciences. I can only pick out a few ideas to talk about here, but I recommend the report for anyone interested in the state of the digital humanities in the U.S.

The foreword to the report identifies DHCs (and libraries) as examples of

“interdisciplinary ‘third places’—a term sociologist Ray Oldenburg has used to identify a social space, distinct from home and workplace. Third places foster important ties and are critical to community life. Familiar examples are barbershops, beauty salons, and coffee shops where, in the age of wireless, we see tables of students hunched over laptops, textbooks, and notepads. The academic library plays a role similar to that of a third place, providing resources, seminar rooms, and collaborative work spaces. It probably should not surprise us that both centers and libraries are frequently cited as elements in the emerging cyberinfrastructure to support advanced research in the sciences, technology, and humanities.”

Such third spaces are an important part of the “emerging cyberinfrastructure” precisely because of the issues we have been identifying in class: the insular nature of traditional humanities research and reward structures, and institutional inertia or resistance. Thus “the centers, whether virtual or physical, effectively become safe places, hospitable to innovation and experimentation.” Such shared infrastructure “requires compromise, negotiation, and, ultimately, trust” since they are “cooperative social systems.” I think the idea of “trust” goes back to the way the humanities (and academic institutions in general) create cultural capital. A new model is needed, one that values cooperative research, as in the sciences. The third space approach may be the beginning, as third spaces provide “safe” places outside disciplinary and institutional boundaries to forge new alliances, to foster trust, and to forge new structures of cultural and academic value.

However, humanities funding and research structures/sources in the U.S. tend to work against the idea of a third space. DHCs are almost exclusively associated with universities in the U.S. The executive summary identifies the tendency for DHCs in the U.S. to be “silos” that, because they “favor individual projects that address specialized research interests,” do not “effectively leverage resources community-wide.” This silo effect is an “inefficient use of the scarce resources available to the humanities community,” and leaves these projects at risk for “being orphaned over time.” The executive summary concludes that “new models are needed for large-scale cyberinfrastructure projects” and suggests that “the sciences offer a useful framework. Large-scale collaborations in the sciences have been the subject of research that examines the organizational structures and behaviors of these entities and identifies the criteria needed to ensure their success. The humanities should look to this work in planning its own strategies for regional or national models of collaboration.”

Section 6.3 of the report suggests that regional and national centers will be a necessary future development for the digital humanities, since “the form of collaboration that takes place in today’s centers is also inadequate for future scholarship. The differences between the small-scale, narrowly focused collaborations common among DHCs, and the more coordinated, large-scale organizational collaborations characteristic of regional and national centers are more than just differences in size and degree. They involve wholly new processes of management, communication, and interaction.” This suggests that this third space will itself become standardized and institutionalized (much like the internet itself as a third space) and will require the development of a fourth space (as some researchers are now developing alternate internets). But I digress...

The executive summary does not address where the funding for such regional and national models is to come from. According to section 4.5.2 of the report, centers often could not account for all of their funding sources, which were a hodgepodge of grants, funding lines, and other sources. “It is, however, certain that universities, followed by grants and foundations, are the most frequently cited funding sources for centers.” So long as funding comes from such heterogeneous sources, and especially from universities whose narrow interests will dictate which projects get funded, it seems that a larger model will not be feasible. But getting funding on the state or national level will be difficult, given the current fiscal crisis in this country. It seems that the European Union is ahead of the U.S. with its Europeana initiative (an immensely-scaled cyberinfrastructure project funded by the EU). Of course, if the EU goes broke bailing out its weaker members, perhaps all bets are off. But I think even the effort will have been worthwhile because of the way it has intensified the development of protocols and standards, and raised international awareness.

The report addresses the issue of motivation: many DHCs did not see the need for large-scale collaboration or regional/national centers. Section 6.4 asks, “As digital humanities computing becomes an integrative, multi-team endeavor, the motivations, support structures, and reward systems that make for successful collaboration become critically important. What aspects of collaboration may be critical to the success of regional or national centers?” (emphasis mine). The report suggests several aspects:

Compelling, Community-Wide Research Needs – examples such as digital preservation issues, developing repositories for digital collections, and the creation of large datasets
No Center Left Behind – clarification of the role of individual DHCs in the context of regional and national centers, so that current investment is not lost.
Trust as the Tie that Binds – the ability to trust the level of prestige/cultural capital associated with a center: “Academic tenure-and-review committees have long been accused of failing to give credence to digital scholarship. Michael Shanks, codirector of the Stanford Humanities Lab, believes the reason for their hesitation is rooted in trust. These committees want to know if an individual on a team has done the work, or if he or she is simply riding on someone else’s coattails.” The report suggests regional and national centers will confer more prestige and thus, more trust.
Individual Motivations – suggests that using web 2.0 technologies to give feedback and confer prestige on contributors will help, since “reward systems that enhance the personal reputation of contributors are important.” The report also suggests that structural motivations/requirements such as exist in the sciences will help (i.e., requiring the deposit of data in repositories as a prerequisite for publication and/or funding).
The Nature of the Work – “successful large-scale collaborations occur most frequently when the work is easily divided into components.”

So the report assesses where we currently are and suggests a needed direction; but it doesn’t really have much in the way of concrete suggestions except to look to the sciences for institutional and structural models. This is probably because it appears that the digital humanities are still building consensus that such regional and national centers are needed, and that such institutional and disciplinary changes are requisite.

Where does the library fit in? The report refers to libraries as other examples of such “third spaces,” and it seems that libraries as a whole are much more aware of and committed to developing such regional and national structures/centers. It seems that libraries can thus take the lead in developing such centers, which would put them squarely in the center of the developing cyberinfrastructure in the U.S. But the study’s author is not a librarian per se but a museum specialist; according to the website, Diane M. Zorich “is a cultural heritage consultant specializing in planning and managing the delivery of cultural information. Her clients include the J. Paul Getty Trust, the American Association of Museums, the Smithsonian Institution, RLG Programs/OCLC, and many other cultural organizations and institutions . . . . [She] has graduate degrees in anthropology and museum studies.” So libraries really need to work on developing such resources and connections with cultural organizations in order to make this work.

Filed under Digital Humanities, Library science, SIRLS 675

Tagged as digital humanities, library

October 4, 2011 · 2:01 pm

Unit 6 – DSpace Install notes and comments

I was able to install and configure DSpace with no problems; we followed these steps:

We set up a new virtual machine, and built, not a LAMP stack, but an LTPJ stack: Linux-Tomcat-PostgreSQL-Java. Once those programs were installed, we needed to create all the structure for DSpace: we used sudo to create Linux directories and users for DSpace, set their permissions, and then set up a related user and space in PostgreSQL. Then we set up a DSpace database and directories in Tomcat.

Once those structures were ready, we downloaded the DSpace source code and set up a configuration file, then used Maven to actually “build” the installation according to the configuration we specified. I’m guessing that means Maven compiled all the code using the modules and settings we specified in the configuration files. Then we used Ant to do a “fresh install”; I guess it installed the compiled binary code that Maven created.

Then we had to create a DSpace administrator/user at the Linux command line and edit some configuration files to give that user privileges; then we rebooted the system and were able to access DSpace from the browser and set up our collection.

—-

The alternate instructions Bruce suggests at

https://wiki.duraspace.org/display/DSPACE/Installing+DSpace+1.7+on+Ubuntu

http://wiki.lib.sun.ac.za/index.php/SUNScholar/Dspace

look like they could be followed, although the comments on those instructions show there is some room for error in interpretation. The details of the steps are different from what Bruce gave us, but they seem to follow the same general outline. I’m not sure I could follow them without technical support. Bruce’s step-by-step commands are probably best if you are going to try to do this without support; but the screenshots in the second link are probably helpful, and I like the clear delineation of steps in the first link.

Filed under dSPace, Operating systems, SIRLS 675

Tagged as dSpace, Linux, virtual machine

October 1, 2011 · 4:44 pm

DuraSpace | Open technologies for durable digital content

This is the home page of the non-profit organization that runs DSpace and Fedora Commons. I wanted to link it here for future reference.

Filed under Content Management, Digital Collections, Digital Humanities, dSPace, SIRLS 675

Tagged as dSpace

September 27, 2011 · 1:49 pm

Unit 5 – Using Drupal for my digital collection

Discuss either a) which module you decided to try from assignment 2 and how it enhances your collection; include if you like any problems or tips related to installation; or
b) now that you have some experience, how you feel overall about the suitability of Drupal for your collection.

It is clear that Drupal, in the hands of a trained Drupal programmer, would be a powerful and customizable tool for managing my digital collection, although it seems that it is not really designed for the type of content I would like to include: many large searchable text files (in PDF or other formats, especially including files with specialized markup). When I say that it is not really designed for it, I mean that the native content types don’t lend themselves to it (although I have not experimented with the “book” type). Of course there are many modules that add that type of functionality; I saw several that seemed designed to make RDF-type relations between nodes. But I was too intimidated by all the dependencies to try to install such modules, and the help material was too technical for a casual Drupal user to understand.

I did find an apparently simple module that added some necessary functionality to my site, i.e., the ability to search attached text files. The module is called, appropriately, search-files. Here is a screenshot of the kind of output the module produces:

Because this is a crucial function for my collection, I decided to install it, even though it requires several “helper applications” in Linux.

Helper Applications

In order to extract text, this module calls ‘helper apps’ such as cat and pdftotext. Drupal administrators can configure any helpers they like. Helper apps need to be installed on the server and need to be setup to print to stdout.

Most Linux distributions have the following helper apps available:

cat – generic text (txt) files

pdftotext – Adobe Acrobat (pdf) Documents

catdoc – Microsoft Word (doc) Documents

xls2csv – Microsoft Excel (xls) files

catppt – Microsoft Power Point (ppt) files

unrtf – Rich Text Format (rtf) files

For more information about helpers and how to configure them, see hints for Linux and Windows. It is also possible to configure helpers in a shared hosting environment.

I assumed that my Linux installation might already have these applications available, although I could enable them separately if need be. So I downloaded and installed search_files-6.x-1.6.

I had no difficulty installing it or configuring it in Drupal. But it can’t search the PDF files I have attached, so I’m assuming I also need to install the helper applications in Linux.
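One way to check that assumption would be to ask the system which of the helper apps are actually on the PATH. Here is a minimal Python sketch of such a check. It is my own illustration and not part of the module; the mapping of file extensions to helper names follows the list above, while the function name and the way the lookup is passed in are hypothetical choices made to keep the sketch testable.

```python
import shutil

# File extension -> helper app, following the module's list of helpers.
HELPERS = {
    ".txt": "cat",
    ".pdf": "pdftotext",
    ".doc": "catdoc",
    ".xls": "xls2csv",
    ".ppt": "catppt",
    ".rtf": "unrtf",
}


def missing_helpers(helpers=HELPERS, which=shutil.which):
    """Return the file extensions whose helper app is not found on the PATH.

    `which` defaults to shutil.which, which returns None when a command
    cannot be found; it is a parameter only so a stand-in can be supplied.
    """
    return sorted(ext for ext, cmd in helpers.items() if which(cmd) is None)


if __name__ == "__main__":
    missing = missing_helpers()
    if missing:
        print("No helper app found for:", ", ".join(missing))
    else:
        print("All helper apps are installed.")
```

If `.pdf` shows up in the output, pdftotext is not installed (or not on the PATH of the account running the check), which would be one explanation for the PDF files not being searchable.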

UPDATE: as it turns out, this module worked in Drupal 5 but is broken in Drupal 6. Evidently it works in Drupal 7, so hopefully when I update my system I can get this working. Otherwise I will need to find a different CMS, because this search functionality is crucial.

Filed under Content Management, Digital Collections, Digital Humanities, Drupal, Operating systems, SIRLS 675

Tagged as digital collection, Drupal
