The California Digital Library (CDL) has released a number of exciting micro-services specifications for digital libraries. The Fedora repository from DuraSpace takes the opposite approach: it is a monolithic application composed of a number of modules. Because of that modular design, it should be possible to slip micro-services under the hood of Fedora fairly easily.
Here is a first attempt at implementing the Pairtree filesystem hierarchy for Fedora:
package fedora.server.storage.lowlevel;

import java.io.File;
import java.util.Map;

import fedora.server.errors.LowlevelStorageException;

/**
 * Maps Fedora PIDs onto a Pairtree directory hierarchy.
 *
 * @author Chris Beer
 */
class PairtreePathAlgorithm
        extends PathAlgorithm {

    private final String storeBase;

    private static final String SEP = File.separator;

    public PairtreePathAlgorithm(Map configuration) {
        super(configuration);
        storeBase = (String) configuration.get("storeBase");
    }

    @Override
    public final String get(String pid) throws LowlevelStorageException {
        return format(pid);
    }

    public String format(String pid) throws LowlevelStorageException {
        String pt = to_pairtree(pid);
        return storeBase + pt + "obj" + SEP + pid;
    }

    // Split the escaped identifier into two-character directory names.
    // The result always starts and ends with a separator.
    private String to_pairtree(String s) {
        String pt = SEP;
        String src = escape(s);
        int i = 0;
        // Only take a pair while two characters remain; otherwise
        // substring(i, i + 2) throws for odd-length identifiers.
        while (i + 2 <= src.length()) {
            pt += src.substring(i, i + 2) + SEP;
            i += 2;
        }
        // A trailing single character gets its own directory.
        if (i < src.length()) {
            pt += src.substring(i) + SEP;
        }
        return pt;
    }

    private String escape(String s) {
        /*
         * Fedora PIDs do not support non-visible ASCII or the characters below,
         * so we skip hex encoding:
         *
         *   "  hex 22    <  hex 3c    ?  hex 3f
         *   *  hex 2a    =  hex 3d    ^  hex 5e
         *   +  hex 2b    >  hex 3e    |  hex 7c
         *   ,  hex 2c
         */
        // Pairtree single-character substitutions: / to =, : to +, . to ,
        return s.replace("/", "=").replace(":", "+").replace(".", ",");
    }
}
See also: http://gist.github.com/280020
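Because the class above only compiles inside the Fedora source tree, here is a minimal standalone sketch of the same identifier-to-path mapping that can be run on its own. It also performs the hex-encoding step that the Fedora version skips, using the character set listed in the comment above. The names PairtreeSketch, clean and toPath are hypothetical and not part of Fedora or any Pairtree library, and the sketch always uses "/" as the separator.

```java
import java.nio.charset.StandardCharsets;

// Standalone sketch; class and method names are illustrative only.
class PairtreeSketch {

    // Visible ASCII characters that get hex-encoded as ^hh
    // (the set listed in the escape() comment of the Fedora class).
    private static final String HEX_ENCODED = "\"*+,<=>?^|";

    // Clean an identifier: hex-encode unsafe bytes, then apply the
    // single-character substitutions / to =, : to +, . to ,
    static String clean(String id) {
        StringBuilder out = new StringBuilder();
        for (byte b : id.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xff;
            if (c < 0x21 || c > 0x7e || HEX_ENCODED.indexOf(c) >= 0) {
                out.append('^').append(String.format("%02x", c));
            } else if (c == '/') {
                out.append('=');
            } else if (c == ':') {
                out.append('+');
            } else if (c == '.') {
                out.append(',');
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    // Split the cleaned identifier into two-character directory names;
    // a trailing single character becomes its own directory.
    static String toPath(String id) {
        String src = clean(id);
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < src.length(); i += 2) {
            path.append(src, i, Math.min(i + 2, src.length())).append('/');
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // A Fedora-style PID of the form namespace:id
        System.out.println(toPath("changeme:123"));
    }
}
```

The per-byte loop handles both cleaning steps in one pass, which is safe because the hex-encoded output never contains "/", ":" or ".".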
This basic service replaces the Timestamp path algorithm for FOXML storage and creates a minimally compliant Pairtree. A better implementation could add:
- Splitting of Fedora datastreams into individual files on the filesystem. A first step would be to implement an appropriate managed content mapper.
- The identifier cleaning specified in §3 of the Pairtree specification. Much of it was omitted in this implementation, on the assumption that the repository core handles identifier validation.
- Support for Pairtree initialization (§4). The current assumption is that the repository maintainer pre-establishes a Pairtree hierarchy for Fedora to populate. To do this properly, I think one would need to override the DefaultLowlevelStorageModule to add an initialization step.
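As a rough idea of what the initialization step in the last item could look like, here is a minimal sketch that is independent of Fedora. It assumes, from my reading of the Pairtree specification, that an initialized tree consists of a pairtree_root directory with a pairtree_version0_1 marker file beside it; the class name PairtreeInitSketch, the method initialize and the marker file's contents are hypothetical. How this would be wired into DefaultLowlevelStorageModule is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Standalone sketch; not part of Fedora's low-level storage API.
class PairtreeInitSketch {

    // Create the pairtree root and version marker under storeBase if they
    // are missing, and return the directory the path algorithm should use.
    static Path initialize(Path storeBase) {
        try {
            Path root = storeBase.resolve("pairtree_root");
            Files.createDirectories(root);

            // Assumed marker file name and text; check the spec before relying on them.
            Path version = storeBase.resolve("pairtree_version0_1");
            if (Files.notExists(version)) {
                String text = "This directory conforms to Pairtree Version 0.1.\n";
                Files.write(version, text.getBytes(StandardCharsets.UTF_8));
            }
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this in place, the storeBase handed to the path algorithm would be the returned pairtree_root directory rather than the top-level store directory.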