Warning: This page has not been updated in over a year and may be outdated or deprecated.

Differences

This shows you the differences between two versions of the page.

other_than_marc [2010/01/12 18:41] – Moved interface spec into SVN. demiankatz
legacy:vufind_1.x_developer_manual:supporting_a_new_metadata_format [2015/12/14 16:52] – ↷ Links adapted because of a move operation demiankatz
Line 1: Line 1:
-====== Support for other record formats ======
+====== Support for New Record Formats ======
  
-===== Problem =====
+//This page refers to VuFind 1.x; for documentation on newer versions, see the [[development:howtos:supporting_a_new_metadata_format|VuFind 2.x version of the page]].//
  
-VuFind currently is tightly bound to MARCish record formats. The standard indexer ([[http://code.google.com/p/solrmarc/|solrmarc]]) handles MARCish records fast and efficiently, but there is no support for other record formats at the moment. Even if one writes an appropriate indexer for other formats (which isn't so hard, as long as it doesn't need to meet the quality of solrmarc), the "full record"/"details" view in VuFind still depends on a MARC record to show record data.
+//Note: This feature was introduced in VuFind 1.0. If you are using VuFind 1.0RC2 or earlier, please upgrade to gain access to this functionality.//
  
-===== Solutions =====
+===== Introduction =====
  
 +VuFind is currently bundled with a standard indexer ([[indexing:solrmarc|SolrMarc]]) that handles MARC records quickly and efficiently.  However, VuFind also has the capability to support any other form of metadata if you are willing to do the following:
 +  * write an indexer to get the data into VuFind's index (necessary!)
 +  * write a "Record Driver" to display that data appropriately within the VuFind interface (optional, if the index-based display meets your needs)
  
-==== Indexer ====
+===== Indexer =====
-Indexing is not an integrated part of VuFind. For indexing MARCish records, [[http://code.google.com/p/solrmarc/|solrmarc]] is distributed with VuFind. For other record formats one needs different custom indexers. That's out of the scope of this document (and the VuFind project, isn't it?), so only general hints on how to create such an indexer are given here:
+Indexing is not an integrated part of VuFind. For indexing MARC records, [[http://code.google.com/p/solrmarc/|solrmarc]] is distributed with VuFind. For other record formats, one needs different custom indexers. Some general hints on how to create such an indexer are given here.
  
 Writing a custom indexer for other record formats may be done in almost any programming language. Steps to be done:
   * parse record format
-  * map format entities to Solr index fields
+  * map format entities to Solr index fields (see solr/biblio/conf/schema.xml for VuFind's index schema; the MARC mappings in import/marc.properties should be helpful for understanding the meaning of the various fields)
-  * create XMLish document out of these fields
+  * create XMLish document out of these fields (see the [[http://wiki.apache.org/solr/UpdateXmlMessages|Solr Wiki]] for details)
   * POST document to Solr's update handler (FYI: it doesn't have to be a POST per se; you can use SOLR/Lucene APIs to add documents directly to the index which is much faster)
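
As a rough illustration of the four steps above, the following Python sketch maps an already-parsed record to Solr fields, wraps it in an ''<add>'' update message, and POSTs it to Solr. Only "fullrecord" and "recordtype" are field names taken from this page; the other field names, the "mytype" record type, the keys of the parsed record, and the update URL are assumptions made for illustration, so check solr/biblio/conf/schema.xml for the fields your VuFind version actually expects.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical Solr update URL; adjust host, port and core for your installation.
SOLR_UPDATE_URL = "http://localhost:8080/solr/biblio/update"


def map_record(record, raw):
    """Map a parsed source record (a dict) to Solr index fields.

    The field names "id", "title" and "author" are assumptions; "recordtype"
    and "fullrecord" are the fields the Record Driver mechanism relies on.
    """
    return {
        "id": record["identifier"],
        "title": record["title"],
        "author": record.get("creator", ""),
        "recordtype": "mytype",  # hypothetical format identifier
        "fullrecord": raw,       # original record, stored as-is
    }


def build_add_message(docs):
    """Build a Solr XML <add> message (bytes) from a list of field dicts."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            if value == "":
                continue  # skip empty fields
            field = ET.SubElement(doc, "field", name=name)
            field.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="utf-8")


def post_to_solr(payload, url=SOLR_UPDATE_URL):
    """POST an XML update message (bytes) to Solr's update handler."""
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "text/xml; charset=utf-8"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A caller would parse each source record, pass the result through ''map_record'' and ''build_add_message'', and hand the payload to ''post_to_solr''. Solr does not make new documents searchable until a commit is issued, so a real indexer would also send a commit message after its last batch.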
  
 +Note that for any metadata format available as XML, you can save yourself some steps by using the [[:indexing:xml|XSLT tool]] provided with recent versions of VuFind.
  
 +===== Record Display =====
  
-
+Record display is handled by a family of "Record Driver" classes that extract information from the stored Solr fields and return it through a standard interface.  The top-level parent Record Driver (found in web/RecordDrivers/IndexRecord.php) relies entirely on the Solr index fields, but children of this class (for example, web/RecordDrivers/MarcRecord.php) override and expand methods by using record-specific data extracted from the "fullrecord" field.  Whenever record information is needed by VuFind, a Record Driver is instantiated through the Record Driver Factory (web/RecordDrivers/Factory.php) -- the exact driver built is based on the "recordtype" field from Solr, and the default index-based Record Driver is used if no type-specific driver exists.
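
The dispatch performed by the Record Driver Factory can be pictured with the short Python sketch below. It illustrates the pattern only and is not a translation of VuFind's PHP code: the ''DRIVERS'' registry, the "xmlexample" record type, the ''XmlExampleRecord'' class, and the ''get_summary'' and ''_get_title'' method names are all hypothetical.

```python
import xml.etree.ElementTree as ET


class IndexRecord:
    """Default driver: relies only on the stored Solr index fields."""

    def __init__(self, fields):
        self.fields = fields  # dict of Solr fields for one record

    def _get_title(self):
        # Structured helper, intended for use inside the driver hierarchy only.
        return self.fields.get("title", "")

    def get_summary(self):
        # Hypothetical public method; callers never touch the helpers directly.
        return {"title": self._get_title()}


class XmlExampleRecord(IndexRecord):
    """Hypothetical type-specific driver that reads the "fullrecord" field."""

    def _get_title(self):
        # Override: prefer the title found in the original record and
        # fall back to the index field if it is missing.
        root = ET.fromstring(self.fields["fullrecord"])
        title = root.findtext("title")
        return title if title else super()._get_title()


# Hypothetical registry mapping "recordtype" values to driver classes.
DRIVERS = {"xmlexample": XmlExampleRecord}


def init_record_driver(fields):
    """Pick a driver from "recordtype"; use the index-based default otherwise."""
    driver_class = DRIVERS.get(fields.get("recordtype"), IndexRecord)
    return driver_class(fields)
```

Because unknown record types fall through to the index-based class, a new format can be indexed and displayed first, and a type-specific driver added later only if the default display proves insufficient.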
-==== Record display ====
-Currently full record display in VuFind is done by pulling a MARC record out of Solr index field fullrecord. The content to display is then taken out of MARC fields (e.g. 245$a as title...).
-
-There has been [[http://www.nabble.com/other-data-sources-for-the-index-to23867424.html|some discussion on the vufind-tech mailing]] list on how to support other record formats, with a [[http://www.nabble.com/Re%3A-Using-a-defined-format-for-display-instead-of-MARC-p23930906.html|result]].
-
-So the idea is:
-
-when indexing:
-  * index whatever you want by mapping it to the index schema
-  * put the original record into the "fullrecord" field in Solr as is (or somewhere else, may even be a flat file in the file system, because that one won't be searchable anyway, will it?)
-  * put an "identifier" for the record format into an index field for that purpose (let's call it recordtype)
-
-when displaying:
-  * when a record is retrieved for display, look at recordtype
-  * call a class appropriate for handling that format
-  * that class may fetch the full record and render it
-  * if no special class for a record format exists, take a fallback generic class, that renders a view out of the stored Solr index fields
-
-
-
-
-
-===== Implementation =====
-Currently services/Record/Record.php implements the Object Record that parses MARC records, then applies them to interfaces/theme/[theme]/Record/view.tpl for display. Different children of that Record object are created when the different full title views (Holdings, Details, Reviews, ...) are called. Record implements some methods for adding external data from WorldCat and a constructor that does the MARC parsing.
-
-One way to implement handling of other record formats would be by implementing a new family of "Record Driver" classes that extract information from the stored Solr fields and return it through a standard interface.  The top-level parent Record Driver (let's call it the default Record Driver) could rely entirely on the Solr index fields, but children of this class could override and expand methods by using record-specific data extracted from the "fullrecord" field.  When record information is needed (by the current Record constructor or by other areas of code that currently extract MARC data directly), a Record Driver could be instantiated based on the "recordtype" field from Solr (using the default Record Driver if no type-specific driver exists).
-
-General "data enrichment services" (like the WorldCat Calls, and calls to extract tags and such from the MySQL database) may be moved to the default Record Driver and inherited by its children.
-
-==== Integration with Templates ====
-
-The current view.tpl used by the Record module includes basic record details along with export options, clickable tabs and other framing details.  It uses a sub-template mechanism to fill in the contents of the tabbed area.
-
-For greater flexibility, this sub-template mechanism should be extended.  Many methods of the Record Driver interface will assign values to Smarty and then return the name of a template that can display the assigned values.  In some cases, this template can be displayed by itself, but in most cases, it will actually be assigned to Smarty and included as another sub-template.
-
-
-
-
-
  
 ==== Driver API Specification ====

-The interface that must be implemented by all Record Driver classes used to be part of this page.  Now that coding has begun in an SVN branch, you can see the file directly in the [[https://vufind.svn.sourceforge.net/svnroot/vufind/branches/record_drivers/web/RecordDrivers/Interface.php|SVN repository]].
+The interface that must be implemented by all Record Driver classes can be viewed in the [[https://vufind.svn.sourceforge.net/svnroot/vufind/trunk/web/RecordDrivers/Interface.php|SVN repository]].

 Record Driver naming conventions:

   * Filename: web/RecordDrivers/[record type]Record.php (e.g. web/RecordDrivers/MarcRecord.php)
-  * Default Filename (if no match was found for [record type]Record.php): web/RecordDrivers/DefaultRecord.php
-  * Class name: same as the filename -- [record type]Record (e.g. MarcRecord, DefaultRecord)
-  * Config file: web/conf/[record type]Record.ini (e.g. MarcRecord.ini, DefaultRecord.ini) -- not sure yet if it will be necessary to have configuration options for record drivers, but it certainly might be.
+  * Class name: same as the filename -- [record type]Record (e.g. MarcRecord, IndexRecord)
  
 ==== Notes ====

-  * The public record driver interface is intentionally very abstract -- we want to be able to support records of all kinds, and we don't want to make any assumptions about the structure of the records.
+  * The public Record Driver interface is intentionally very abstract -- we want to be able to support records of all kinds, and we don't want to make any assumptions about the structure of the records.
-  * In spite of the generic interface, it is often useful to have very structured methods (getTitle, getAuthor, etc.).  These should be implemented as protected methods in the default Record Driver.  They are useful within the driver itself, but should never be accessed from the outside -- the interface between the record and the presentation layer must remain abstract to ensure future flexibility.  However, there is nothing wrong with taking advantage of structure within the record driver hierarchy to make implementation of similar record formats easier.
+  * In spite of the generic interface, it is often useful to have very structured methods (getTitle, getAuthor, etc.).  These are implemented as protected methods in the default index-based Record Driver.  They are useful within the driver itself, but should never be accessed from the outside -- the interface between the record and the presentation layer must remain abstract to ensure future flexibility.  However, there is nothing wrong with taking advantage of structure within the record driver hierarchy to make implementation of similar record formats easier.
-  * A lot of discussion has been deleted to reduce clutter on the page now that our ideas seem to be taking firmer shape.  No offense to any of the past contributors is intended -- please look at the old revisions if you are interested in seeing previous comments, especially if you see flaws in what is here now and think old ideas and comments need to be brought back.
+  * To summarize the previous two points: if your data format represents bibliographic data in a way roughly similar to MARC, you can probably extend the IndexRecord class and override a few methods to do what you need.  If your data format is wildly different, you probably want to build a whole new class based on the generic interface, and you shouldn't worry about the internals of the IndexRecord class.
  
legacy/vufind_1.x_developer_manual/supporting_a_new_metadata_format.txt · Last modified: 2018/12/19 14:02 by demiankatz