Warning: This page has not been updated in over a year and may be outdated or deprecated.
developers_call:minutes20130205

Differences

This shows you the differences between two versions of the page.

developers_call:minutes20130205 [2013/02/05 15:39] demiankatz
developers_call:minutes20130205 [2014/06/13 13:14] (current) – external edit 127.0.0.1
Line 44: Line 44:
 ==== 5. DataImportHandler Update ====
  
-Filipe has continued experimenting with the Solr DataImportHandler using an XML file containing the [[http://ocw.mit.edu/index.htm|MIT OCW]] record set.  He has run into some limitations of the XPathEntityProcessor and a problem with the DataImportHandler failing to respect <copyField> directives in the Solr schema.
+Filipe has continued experimenting with the Solr DataImportHandler using an [[http://ocw.mit.edu/rss/all/mit-allcourses.xml|XML file (warning: large)]] containing the [[http://ocw.mit.edu/index.htm|MIT OCW]] record set.  He had to work around some limitations of the XPathEntityProcessor and a problem with the DataImportHandler failing to respect <copyField> directives in the Solr schema.  Eventually these problems were overcome -- here is an [[http://pastebin.com/rvvaXD51|example configuration]] used to push data through the DataImportHandler after first preprocessing it with some XSLT transforms.
+
+Filipe has also experimented with using Google Reader to pull data from RSS feeds for import.  In the OCW example, RSS data is available to supplement the initial raw XML document.  There are limitations on how many records can be retrieved in this fashion (1,000 per pack) and the resulting XML requires some preprocessing (Filipe used sed) before it can be cleanly imported through the DataImportHandler.
  
 ==== 6. Other Topics? ====
Line 54: Line 56:
 The next call will be Tuesday, February 19, 2013 at 10am Eastern Standard Time (15:00 GMT).
 ---- struct data ----
 +properties.Page Owner : 
 ---- ----
  
developers_call/minutes20130205.1360078762.txt.gz · Last modified: 2014/06/13 13:13 (external edit)