<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Authoritative Opinion &#187; Experiments</title>
	<atom:link href="http://authoritativeopinion.com/blog/category/experiments/feed/" rel="self" type="application/rss+xml" />
	<link>http://authoritativeopinion.com/blog</link>
	<description></description>
	<lastBuildDate>Mon, 19 Jul 2010 00:04:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1-alpha</generator>
		<item>
		<title>A Fedora in a Pairtree</title>
		<link>http://authoritativeopinion.com/blog/2010/01/18/a-fedora-in-a-pairtree/</link>
		<comments>http://authoritativeopinion.com/blog/2010/01/18/a-fedora-in-a-pairtree/#comments</comments>
		<pubDate>Mon, 18 Jan 2010 14:13:05 +0000</pubDate>
		<dc:creator>chris</dc:creator>
				<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Repository]]></category>
		<category><![CDATA[cdl]]></category>
		<category><![CDATA[digital library]]></category>
		<category><![CDATA[fedora]]></category>
		<category><![CDATA[micro-services]]></category>

		<guid isPermaLink="false">http://authoritativeopinion.com/blog/?p=289</guid>
		<description><![CDATA[The California Digital Library (CDL) has released a number of exciting micro-services specifications for digital libraries. The Fedora repository from DuraSpace takes the opposite approach: it is a monolithic application composed of a number of modules. Because of that modular design, it should be possible to slip micro-services under the hood of Fedora easily. Here is [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.cdlib.org/inside/diglib/">California Digital Library</a> (CDL) has released a number of exciting micro-services specifications for digital libraries. The <a href="http://fedora-commons.org/">Fedora</a> repository from DuraSpace takes the opposite approach: it is a monolithic application composed of a number of modules. Because of that modular design, it should be possible to slip micro-services under the hood of Fedora easily.</p>
<p>Here is a first attempt at implementing the <a href="http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html">Pairtree filesystem hierarchy</a> for Fedora:</p>
<pre name="code" class="java">
package fedora.server.storage.lowlevel;

import java.io.File;
import java.util.Map;

import fedora.server.errors.LowlevelStorageException;

/**
 * @author Chris Beer
 */
class PairtreePathAlgorithm
        extends PathAlgorithm {

    private final String storeBase;

    private static final String SEP = File.separator;

    public PairtreePathAlgorithm(Map<String, ?> configuration) {
        super(configuration);
        storeBase = (String) configuration.get("storeBase");
    }

    @Override
    public final String get(String pid) throws LowlevelStorageException {
        return format(pid);
    }

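    public String format(String pid) throws LowlevelStorageException {
        // toPairtree() returns a path that begins and ends with SEP, so the
        // object directory "obj" can be appended directly.
        String pt = toPairtree(pid);
        return storeBase + pt + "obj" + SEP + pid;
    }

    private String toPairtree(String s) {
        String src = escape(s);
        StringBuilder pt = new StringBuilder(SEP);

        // Split the escaped identifier into two-character directory names.
        // Math.min limits the last segment to one character when the
        // identifier has odd length, instead of reading past the end.
        for (int i = 0; i < src.length(); i += 2) {
            pt.append(src, i, Math.min(i + 2, src.length())).append(SEP);
        }

        return pt.toString();
    }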
    private String escape(String s) {
        /*
         Fedora PIDs do not support non-visible ASCII or the characters below,
         so we skip hex encoding:
         "   hex 22           <   hex 3c           ?   hex 3f
         *   hex 2a           =   hex 3d           ^   hex 5e
         +   hex 2b           >   hex 3e           |   hex 7c
         ,   hex 2c
         */
        // Single-character substitutions from the Pairtree spec:
        // "/" becomes "=", ":" becomes "+", and "." becomes ",".
        return s.replace("/", "=").replace(":", "+").replace(".", ",");
    }
}
</pre>
<p>See also: <a href="http://gist.github.com/280020">http://gist.github.com/280020</a></p>
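<p>To see what the mapping produces without a running Fedora instance, here is a minimal standalone sketch of the same two steps (escape, then split into pairs). The class name <code>PairtreeDemo</code>, the store root <code>/data/objects</code>, and the sample PIDs are made up for illustration, and the separator is hard-coded to <code>/</code> so the output is the same on every platform. Odd-length identifiers end in a one-character directory, as the Pairtree spec describes.</p>

```java
public class PairtreeDemo {
    // Hypothetical store root, for illustration only.
    static final String STORE_BASE = "/data/objects";

    // Single-character substitutions: "/" to "=", ":" to "+", "." to ",".
    static String escape(String id) {
        return id.replace('/', '=').replace(':', '+').replace('.', ',');
    }

    // Split the escaped identifier into two-character directory names.
    static String toPairtree(String id) {
        String src = escape(id);
        StringBuilder pt = new StringBuilder("/");
        for (int i = 0; i < src.length(); i += 2) {
            // The last segment is one character long for odd-length input.
            pt.append(src, i, Math.min(i + 2, src.length())).append('/');
        }
        return pt.toString();
    }

    // Same layout as format() in the class above: pairs, then obj/, then the PID.
    static String pathFor(String pid) {
        return STORE_BASE + toPairtree(pid) + "obj/" + pid;
    }

    public static void main(String[] args) {
        System.out.println(pathFor("demo:123")); // /data/objects/de/mo/+1/23/obj/demo:123
        System.out.println(pathFor("demo:12"));  // /data/objects/de/mo/+1/2/obj/demo:12
    }
}
```

<p>Note that the file name at the end of the path is the unescaped PID, so the colon survives there; that is fine on most Unix filesystems but would need escaping on Windows.</p>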
<p>This basic service replaces the Timestamp Path algorithm for FOXML storage and creates a minimally compliant Pairtree. A better implementation could add:</p>
<ul>
<li>Splitting of Fedora datastreams into individual files on the filesystem. A first step would be to implement an appropriate managed content mapper.</li>
<li>The identifier cleaning specified in §3 of the Pairtree specification. Much of this was omitted in this implementation, on the assumption that the repository core would handle identifier validation.</li>
<li>Support for pairtree initialization (§4). The current assumption is that the repository maintainer would pre-establish a pairtree hierarchy for Fedora to populate. To do this properly, I think one would need to override the DefaultLowlevelStorageModule to add an initialization step.</li>
</ul>
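<p>As a rough idea of what the fuller identifier cleaning from §3 might look like, here is a sketch based on my reading of the spec: first hex-encode (as <code>^hh</code>) every UTF-8 byte that is not visible ASCII or that is one of the reserved characters, then apply the three single-character substitutions. The class name <code>PairtreeCleaner</code> is hypothetical. The reserved set is the list from the code comment above plus the backslash, which is my assumption from the spec and should be checked against it.</p>

```java
import java.nio.charset.StandardCharsets;

public class PairtreeCleaner {
    // Characters that get hex-encoded in addition to non-visible ASCII.
    // The backslash is an assumption based on my reading of the spec;
    // it does not appear in the list in the original code comment.
    private static final String RESERVED = "\"*+,<=>?\\^|";

    static String clean(String id) {
        StringBuilder out = new StringBuilder();
        // Step 1: work on UTF-8 bytes and hex-encode anything unsafe.
        for (byte b : id.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xff;
            if (c < 0x21 || c > 0x7e || RESERVED.indexOf(c) >= 0) {
                out.append('^').append(String.format("%02x", c));
            } else {
                out.append((char) c);
            }
        }
        // Step 2: single-character substitutions for common path characters.
        return out.toString().replace('/', '=').replace(':', '+').replace('.', ',');
    }

    public static void main(String[] args) {
        System.out.println(clean("ark:/13030/xt12t3"));  // ark+=13030=xt12t3
        System.out.println(clean("what-the-*@?#!^!?")); // what-the-^2a@^3f#!^5e!^3f
    }
}
```

<p>The order matters: a literal <code>+</code> or <code>=</code> in the input is hex-encoded in the first step, so the <code>+</code> and <code>=</code> that appear in the output can only have come from <code>:</code> and <code>/</code>.</p>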
]]></content:encoded>
			<wfw:commentRss>http://authoritativeopinion.com/blog/2010/01/18/a-fedora-in-a-pairtree/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
