VuFind API Documentation

VuFindSitemap extends VuFind

XSLT support class. All methods of this class must be public and static; they are automatically made available to your XSL stylesheet for use with the php:function() function.
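The following is a minimal sketch of how PHP exposes public static methods to a stylesheet through php:function(). It is for illustration only. DemoSupport and its shout() method are hypothetical stand-ins and are not part of VuFind. The example assumes the PHP xsl extension is installed.

```php
<?php
// Hypothetical stand-in for an XSLT support class (not part of VuFind).
class DemoSupport
{
    // Public and static, so php:function() can call it as "Class::method".
    public static function shout(string $in): string
    {
        return strtoupper($in);
    }
}

$xsl = new DOMDocument();
$xsl->loadXML(<<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:php="http://php.net/xsl">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="php:function('DemoSupport::shout', string(/doc/title))"/>
  </xsl:template>
</xsl:stylesheet>
XSL);

$xml = new DOMDocument();
$xml->loadXML('<doc><title>hello</title></doc>');

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);
// With no arguments, every PHP function and static method becomes callable.
$proc->registerPHPFunctions();
$result = $proc->transformToXML($xml);
echo $result, "\n"; // HELLO
```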

Tags
category

VuFind

author

Demian Katz demian.katz@villanova.edu

license

http://opensource.org/licenses/gpl-2.0.php GNU General Public License

link

Wiki

Table of Contents

ISO8601_FORMAT  = 'Y-m-d\\TH:i:s\\Z'
ISO8601 date format string
$serviceLocator  : ServiceLocatorInterface
Service locator
arrayToSolrXml()  : string
Convert an associative array of fields into a Solr document.
explode()  : DOMDocument
Proxy the explode PHP function for use in XSL transformation.
extractBestDateOrRange()  : string
Try to find the best single year or date range in a set of DOM elements.
extractEarliestYear()  : string
Try to find a four-digit year in a set of DOM elements.
getApertureCommand()  : string
Generic method for building an Aperture command.
getChangeTracker()  : ChangeTrackerServiceInterface
Get the change tracker service object.
getConfig()  : Config
Get a configuration file.
getDocument()  : string
Harvest the contents of a document file (PDF, Word, etc.) using Aperture.
getFirstIndexed()  : string
Get the date/time of the first time this record was indexed.
getLastIndexed()  : string
Get the date/time of the most recent time this record was indexed.
getParser()  : string
Read the parser method from fulltext.ini.
getTikaCommand()  : array<string|int, mixed>
Generic method for building a Tika command.
harvestTextFile()  : string
Harvest the contents of a text file for inclusion in the output.
harvestWithAperture()  : string
Harvest the contents of a document file (PDF, Word, etc.) using Aperture.
harvestWithParser()  : string
Call the parsing method selected by the parser setting in fulltext.ini.
harvestWithTika()  : string
Harvest the contents of a document file (PDF, Word, etc.) using Tika.
implode()  : string
Proxy the implode PHP function for use in XSL transformation.
invertName()  : string
Invert "Firstname Lastname" authors into "Lastname, Firstname."
invertNames()  : DOMDocument
Call invertName on all matching elements; return a DOMDocument with a name tag for each inverted name.
isInvertedName()  : bool
Is the provided name inverted ("Last, First") or not ("First Last")?
mapString()  : string
Map string using a config file from the translation_maps folder.
removeOuterBrackets()  : string
Remove single square bracket characters if they are the start and/or end chars (matched or unmatched) and are the only square bracket chars in the string.
removeTagAndReturnXMLasText()  : string
Remove a given tag from the provided nodes, then convert them into XML and return the result as text. This is useful for populating the fullrecord field with the raw input XML while allowing removal of certain elements (e.g. a full-text field).
setServiceLocator()  : void
Set the service locator.
solrMarcStyleCleanData()  : string
Port of logic from SolrMarc's DataUtil::cleanData method.
stripAccents()  : string
Strip accents from a string.
stripArticles()  : string
Strip articles from the front of the text (for creating sortable titles).
stripBadChars()  : string
Strip illegal XML characters from a string.
stripPunctuation()  : string
Strip punctuation from a string.
titleSortLower()  : string
Perform text processing roughly equivalent to SolrMarc's titleSortLower feature to allow consistent indexing into the title_sort field.
xmlAsText()  : string
Convert provided nodes into XML and return as text. This is useful for populating the fullrecord field with the raw input XML.
getApertureFields()  : array<string|int, mixed>
Load metadata about an HTML document using Aperture.
getDocumentFieldArray()  : array<string|int, mixed>
Support method for getDocument() -- retrieve associative array of field data.
getHtmlFields()  : array<string|int, mixed>
Extract key metadata from HTML.
getTikaFields()  : array<string|int, mixed>
Load metadata about an HTML document using Tika.

Constants

ISO8601_FORMAT

ISO8601 date format string

protected string ISO8601_FORMAT = 'Y-m-d\\TH:i:s\\Z'
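The short sketch below shows what this format string produces when it is passed to PHP's date functions. In a single-quoted PHP string, '\\T' and '\T' are the same two characters. The backslashes escape the literal T and Z, so they are not treated as format codes.

```php
<?php
// Same value as the documented constant.
const ISO8601_FORMAT = 'Y-m-d\\TH:i:s\\Z';

// gmdate() formats in UTC, which matches the literal "Z" suffix.
$stamp = gmdate(ISO8601_FORMAT, 0);
echo $stamp, "\n"; // 1970-01-01T00:00:00Z
```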

Properties

$serviceLocator

Service locator

protected static ServiceLocatorInterface $serviceLocator

Methods

arrayToSolrXml()

Convert an associative array of fields into a Solr document.

public static arrayToSolrXml(array<string|int, mixed> $fields) : string
Parameters
$fields : array<string|int, mixed>

Field data

Return values
string
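The following is a minimal sketch of the kind of Solr XML document this method produces. It assumes the standard Solr XML update layout (<add><doc><field name="...">) and writes one <field> element per value for multi-valued fields. sketchArrayToSolrXml() is a hypothetical illustration, not the VuFind implementation.

```php
<?php
// Hypothetical sketch: turn an associative array into a Solr XML <add><doc>.
function sketchArrayToSolrXml(array $fields): string
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $add = $dom->appendChild($dom->createElement('add'));
    $doc = $add->appendChild($dom->createElement('doc'));
    foreach ($fields as $name => $values) {
        // (array) wraps a scalar so single- and multi-valued fields share a path.
        foreach ((array)$values as $value) {
            $field = $dom->createElement('field');
            $field->setAttribute('name', (string)$name);
            // createTextNode() escapes &, < and > in the value.
            $field->appendChild($dom->createTextNode((string)$value));
            $doc->appendChild($field);
        }
    }
    return $dom->saveXML($add);
}

$xml = sketchArrayToSolrXml(['id' => 'doc1', 'title' => 'A & B', 'topic' => ['x', 'y']]);
echo $xml, "\n";
```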

explode()

Proxy the explode PHP function for use in XSL transformation.

public static explode(string $delimiter, string $string) : DOMDocument
Parameters
$delimiter : string

Delimiter for splitting $string

$string : string

String to split

Return values
DOMDocument
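The following is a sketch of an explode() proxy that returns a DOMDocument, so that a stylesheet can iterate over the pieces as nodes. The <parts>/<part> element names are assumptions made for illustration. This page does not document the element names the real method uses.

```php
<?php
// Hypothetical sketch; <parts> and <part> are assumed element names.
function sketchExplode(string $delimiter, string $string): DOMDocument
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $root = $dom->appendChild($dom->createElement('parts'));
    foreach (explode($delimiter, $string) as $piece) {
        $part = $dom->createElement('part');
        $part->appendChild($dom->createTextNode($piece));
        $root->appendChild($part);
    }
    return $dom;
}

$dom = sketchExplode(';', 'History;Science;Art');
foreach ($dom->getElementsByTagName('part') as $part) {
    echo $part->textContent, "\n";
}
```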

extractBestDateOrRange()

Try to find the best single year or date range in a set of DOM elements.

public static extractBestDateOrRange(array<string|int, mixed> $input) : string

Best is defined as the first value to consist of only YYYY or YYYY-ZZZZ, with no other text. If no "best" match is found, the first value is used.

Parameters
$input : array<string|int, mixed>

DOM elements to search.

Return values
string
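The "best" rule described above can be sketched as follows. To keep the example short, the sketch takes plain strings rather than DOM elements (with real DOM input you would compare each element's textContent). sketchBestDateOrRange() is hypothetical.

```php
<?php
// Hypothetical sketch of the "first YYYY or YYYY-ZZZZ, else first value" rule.
function sketchBestDateOrRange(array $values): string
{
    foreach ($values as $value) {
        // The D modifier stops $ from matching before a trailing newline.
        if (preg_match('/^\d{4}(?:-\d{4})?$/D', $value)) {
            return $value;
        }
    }
    // No "best" match: fall back to the first value (or '' if there is none).
    return (string)(array_values($values)[0] ?? '');
}

echo sketchBestDateOrRange(['c1999', '1999-2001', '2005']), "\n"; // 1999-2001
```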

extractEarliestYear()

Try to find a four-digit year in a set of DOM elements.

public static extractEarliestYear(array<string|int, mixed> $input) : string
Parameters
$input : array<string|int, mixed>

DOM elements to search.

Return values
string
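The following is a sketch of one plausible reading of this method. It collects every standalone four-digit number and returns the smallest, which the method name suggests is the intended behaviour. This is an assumption: the page does not specify how candidates are chosen. The sketch takes plain strings instead of DOM elements.

```php
<?php
// Hypothetical sketch: earliest standalone four-digit year across all values.
function sketchEarliestYear(array $values): string
{
    $years = [];
    foreach ($values as $value) {
        // Exactly four digits, not embedded in a longer run of digits.
        if (preg_match_all('/(?<!\d)\d{4}(?!\d)/', $value, $matches)) {
            $years = array_merge($years, $matches[0]);
        }
    }
    // All candidates are four-digit numeric strings, so min() picks the earliest.
    return $years === [] ? '' : min($years);
}

echo sketchEarliestYear(['published c1998, reprinted 1972']), "\n"; // 1972
```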

getApertureCommand()

Generic method for building an Aperture command.

public static getApertureCommand(string $input, string $output[, string $method = 'webcrawler' ]) : string
Parameters
$input : string

Name of input file, or URL.

$output : string

Name of output file.

$method : string = 'webcrawler'

Crawler to use: webcrawler or filecrawler.

Return values
string

Command to be executed.

getConfig()

Get a configuration file.

public static getConfig([string $config = 'config' ]) : Config
Parameters
$config : string = 'config'

Configuration name

Return values
Config

getDocument()

Harvest the contents of a document file (PDF, Word, etc.) using Aperture.

public static getDocument(string $url) : string

This method will only work if Aperture is properly configured in the web/conf/fulltext.ini file. Without proper configuration, this will simply return an empty string.

Parameters
$url : string

URL of file to retrieve.

Return values
string

Text contents of file.

getFirstIndexed()

Get the date/time of the first time this record was indexed.

public static getFirstIndexed(string $core, string $id, string $date) : string
Parameters
$core : string

Solr core holding this record.

$id : string

Record ID within specified core.

$date : string

Date record was last modified.

Return values
string

First index date/time.

getLastIndexed()

Get the date/time of the most recent time this record was indexed.

public static getLastIndexed(string $core, string $id, string $date) : string
Parameters
$core : string

Solr core holding this record.

$id : string

Record ID within specified core.

$date : string

Date record was last modified.

Return values
string

Latest index date/time.

getParser()

Read the parser method from fulltext.ini.

public static getParser() : string
Return values
string

Name of parser to use (Aperture or Tika).

getTikaCommand()

Generic method for building a Tika command.

public static getTikaCommand(string $input, string $output, string $arg) : array<string|int, mixed>
Parameters
$input : string

URL or file resource.

$output : string

Name of output file.

$arg : string

Additional Tika arguments.

Return values
array<string|int, mixed>

Parameters for the proc_open() call.
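The following is a sketch of how arguments for proc_open() might be assembled to run the Tika app. Several parts are hypothetical assumptions for illustration: the function name, the explicit $tikaJar path (the real method presumably reads its settings from fulltext.ini), and the shape of the returned array. Passing the command as an array (supported by proc_open() since PHP 7.4) avoids shell quoting.

```php
<?php
// Hypothetical sketch; the jar path and return shape are assumptions.
function sketchTikaCommand(string $tikaJar, string $input, string $output, string $arg): array
{
    // Argument list for proc_open(); no shell is involved.
    $command = ['java', '-jar', $tikaJar, $arg, $input];
    $descriptors = [
        0 => ['pipe', 'r'],
        1 => ['file', $output, 'w'], // Tika's stdout is written to the output file
        2 => ['pipe', 'w'],
    ];
    return [$command, $descriptors];
}

[$cmd, $spec] = sketchTikaCommand(
    '/opt/tika/tika-app.jar',
    'https://example.org/doc.pdf',
    '/tmp/out.txt',
    '--text'
);
// Launching it would then be: $proc = proc_open($cmd, $spec, $pipes);
```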

harvestTextFile()

Harvest the contents of a text file for inclusion in the output.

public static harvestTextFile(string $url) : string
Parameters
$url : string

URL of file to retrieve.

Return values
string

File contents.

harvestWithAperture()

Harvest the contents of a document file (PDF, Word, etc.) using Aperture.

public static harvestWithAperture(string $url[, string $method = 'webcrawler' ]) : string

This method will only work if Aperture is properly configured in the fulltext.ini file. Without proper configuration, this will simply return an empty string.

Parameters
$url : string

URL of file to retrieve.

$method : string = 'webcrawler'

Crawler to use: webcrawler or filecrawler.

Return values
string

Text contents of file.

harvestWithParser()

Call the parsing method selected by the parser setting in fulltext.ini.

public static harvestWithParser(string $url) : string
Parameters
$url : string

URL to harvest

Return values
string

Text contents of URL
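The following is a sketch of the dispatch described above. It reads a parser name from fulltext.ini-style text, then hands the URL to the matching harvester. The [General] section and the "parser" key are assumptions about the file layout, and the two harvesters are passed in as placeholder callables. This is not the real configuration or harvesting code.

```php
<?php
// Hypothetical sketch; [General]/parser is an assumed fulltext.ini layout.
function sketchGetParser(string $iniText): string
{
    $config = parse_ini_string($iniText, true);
    return (string)($config['General']['parser'] ?? '');
}

function sketchHarvestWithParser(string $url, string $parser, callable $aperture, callable $tika): string
{
    return match (strtolower($parser)) {
        'aperture' => $aperture($url),
        'tika' => $tika($url),
        default => '', // unknown or missing parser: nothing is harvested
    };
}

$parser = sketchGetParser("[General]\nparser = Tika\n");
$text = sketchHarvestWithParser(
    'https://example.org/doc.pdf',
    $parser,
    fn (string $url): string => "aperture:$url",
    fn (string $url): string => "tika:$url"
);
echo $text, "\n"; // tika:https://example.org/doc.pdf
```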

harvestWithTika()

Harvest the contents of a document file (PDF, Word, etc.) using Tika.

public static harvestWithTika(string $url[, string $arg = '--text' ]) : string

This method will only work if Tika is properly configured in the fulltext.ini file. Without proper configuration, this will simply return an empty string.

Parameters
$url : string

URL of file to retrieve.

$arg : string = '--text'

Optional argument(s) for Tika.

Return values
string

Text contents of file.

implode()

Proxy the implode PHP function for use in XSL transformation.

public static implode(string $glue, array<string|int, mixed> $pieces) : string
Parameters
$glue : string

Glue string

$pieces : array<string|int, mixed>

DOM elements to join together.

Return values
string
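The following is a sketch of an implode() proxy. Node-sets passed through php:function() arrive as arrays of DOM nodes, so the sketch joins each node's text content with the glue string. sketchImplode() is hypothetical, not the VuFind implementation.

```php
<?php
// Hypothetical sketch: join the text of DOM nodes with a glue string.
function sketchImplode(string $glue, array $pieces): string
{
    $text = array_map(fn (DOMNode $node): string => $node->textContent, $pieces);
    return implode($glue, $text);
}

$doc = new DOMDocument();
$doc->loadXML('<subjects><s>History</s><s>Science</s></subjects>');
$nodes = iterator_to_array($doc->getElementsByTagName('s'));
$joined = sketchImplode('; ', $nodes);
echo $joined, "\n"; // History; Science
```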

invertName()

Invert "Firstname Lastname" authors into "Lastname, Firstname."

public static invertName(string $rawName) : string
Parameters
$rawName : string

Raw name

Return values
string

invertNames()

Call invertName on all matching elements; return a DOMDocument with a name tag for each inverted name.

public static invertNames(array<string|int, mixed> $input) : DOMDocument
Parameters
$input : array<string|int, mixed>

DOM elements to adjust

Return values
DOMDocument

isInvertedName()

Is the provided name inverted ("Last, First") or not ("First Last")?

public static isInvertedName(string $name) : bool
Parameters
$name : string

Name to check

Return values
bool
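The following is a deliberately naive sketch of the name inversion described in invertName() and isInvertedName(). It treats any name containing a comma as already inverted, and moves the last space-separated token to the front. Both rules are assumptions. They ignore suffixes such as "Jr." and surname particles, which the real methods may handle differently.

```php
<?php
// Hypothetical sketch; the comma test and "last token is the surname" rule
// are simplifying assumptions.
function sketchIsInvertedName(string $name): bool
{
    return str_contains($name, ',');
}

function sketchInvertName(string $rawName): string
{
    $name = trim($rawName);
    $pos = strrpos($name, ' ');
    if ($pos === false || sketchIsInvertedName($name)) {
        return $name; // single token, or already "Last, First"
    }
    return substr($name, $pos + 1) . ', ' . substr($name, 0, $pos);
}

echo sketchInvertName('Jane Q. Public'), "\n"; // Public, Jane Q.
```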

mapString()

Map string using a config file from the translation_maps folder.

public static mapString(string $in, string $filename) : string
Parameters
$in : string

String to map.

$filename : string

Filename of map file.

Return values
string

Mapped text.
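The following is a sketch of a key/value lookup in the spirit of mapString(). It assumes a map file made of simple "key = value" lines, parsed here with parse_ini_string(), and it returns the input unchanged when no mapping exists. Both the file format and the fallback are assumptions, and sketchMapString() takes the map text directly rather than a filename.

```php
<?php
// Hypothetical sketch; map format and "unchanged on miss" are assumptions.
function sketchMapString(string $in, string $mapText): string
{
    // Default INI scanning; note it converts values such as yes/no/true.
    $map = parse_ini_string($mapText);
    return is_array($map) && isset($map[$in]) ? (string)$map[$in] : $in;
}

$mapText = "eng = English\nfre = French\n";
echo sketchMapString('fre', $mapText), "\n"; // French
```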

removeOuterBrackets()

Remove single square bracket characters if they are the start and/or end chars (matched or unmatched) and are the only square bracket chars in the string.

public static removeOuterBrackets(string $str) : string

Ported from SolrMarc's DataUtil class.

Parameters
$str : string

Text string with possible enclosing brackets

Return values
string

Processed string with the brackets removed.
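The following is a sketch of one reading of the rule above. A leading "[" and/or trailing "]" is removed only when it is the only square bracket of its position in the string, so "[1999]", "[1999" and "1999]" all become "1999", while "[a] b" is left alone. sketchRemoveOuterBrackets() is hypothetical and may differ from the SolrMarc port in edge cases such as surrounding whitespace.

```php
<?php
// Hypothetical sketch of the outer-bracket rule.
function sketchRemoveOuterBrackets(string $str): string
{
    $brackets = substr_count($str, '[') + substr_count($str, ']');
    $first = str_starts_with($str, '[');
    $last = str_ends_with($str, ']');
    if ($first && $last && $brackets === 2) {
        return substr($str, 1, -1); // matched pair enclosing the whole string
    }
    if ($first && !$last && $brackets === 1) {
        return substr($str, 1); // lone opening bracket at the start
    }
    if ($last && !$first && $brackets === 1) {
        return substr($str, 0, -1); // lone closing bracket at the end
    }
    return $str;
}

echo sketchRemoveOuterBrackets('[1999]'), "\n"; // 1999
```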

removeTagAndReturnXMLasText()

Remove a given tag from the provided nodes, then convert them into XML and return the result as text. This is useful for populating the fullrecord field with the raw input XML while allowing removal of certain elements (e.g. a full-text field).

public static removeTagAndReturnXMLasText(array<string|int, mixed> $in, string $tag) : string
Parameters
$in : array<string|int, mixed>

Array of DOMElement objects.

$tag : string

Name of tag to remove.

Return values
string

XML as string
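The following is a sketch of the remove-then-serialize behaviour described above. It works on a deep copy so that the caller's DOM is not modified, and it collects the matches before removing them, because getElementsByTagName() returns a live list. sketchRemoveTagAndReturnXml() is a hypothetical illustration, not the VuFind code.

```php
<?php
// Hypothetical sketch: drop every <$tag> descendant, then serialize.
function sketchRemoveTagAndReturnXml(array $in, string $tag): string
{
    $xml = '';
    foreach ($in as $node) {
        $copy = $node->cloneNode(true); // leave the original tree untouched
        // Snapshot the live NodeList before removing anything from it.
        foreach (iterator_to_array($copy->getElementsByTagName($tag)) as $child) {
            $child->parentNode->removeChild($child);
        }
        $xml .= $node->ownerDocument->saveXML($copy);
    }
    return $xml;
}

$doc = new DOMDocument();
$doc->loadXML('<record><title>T</title><fulltext>long text</fulltext></record>');
$out = sketchRemoveTagAndReturnXml([$doc->documentElement], 'fulltext');
echo $out, "\n"; // <record><title>T</title></record>
```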

setServiceLocator()

Set the service locator.

public static setServiceLocator(ServiceLocatorInterface $serviceLocator) : void
Parameters
$serviceLocator : ServiceLocatorInterface

Locator to register

Return values
void

solrMarcStyleCleanData()

Port of logic from SolrMarc's DataUtil::cleanData method.

public static solrMarcStyleCleanData(string $str) : string
Parameters
$str : string

String to process.

Return values
string

Processed string.

stripAccents()

Strip accents from a string.

public static stripAccents(string $str) : string
Parameters
$str : string

String to process.

Return values
string

Processed string.

stripArticles()

Strip articles from the front of the text (for creating sortable titles).

public static stripArticles(string $in) : string
Parameters
$in : string

Title to process.

Return values
string

Article-stripped text.
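The following is a sketch of article stripping for sort keys. The article list ('a', 'an', 'the') is an assumption made for illustration, since this page does not say which articles the real method recognises. An article is removed only when it is followed by a space, so "Anthem" is left intact.

```php
<?php
// Hypothetical sketch; the default article list is an assumption.
function sketchStripArticles(string $in, array $articles = ['a', 'an', 'the']): string
{
    foreach ($articles as $article) {
        $prefix = $article . ' ';
        // Case-insensitive match at the very start of the title.
        if (stripos($in, $prefix) === 0) {
            return substr($in, strlen($prefix));
        }
    }
    return $in;
}

echo sketchStripArticles('The Hobbit'), "\n"; // Hobbit
```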

stripBadChars()

Strip illegal XML characters from a string.

public static stripBadChars(string $in) : string
Parameters
$in : string

String to process

Return values
string
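The following is a sketch of one common way to strip characters that are illegal in XML. It keeps only the code points allowed by the XML 1.0 Char production: tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF. sketchStripBadChars() is hypothetical, and returning '' for invalid UTF-8 input is an assumption.

```php
<?php
// Hypothetical sketch based on the XML 1.0 "Char" production.
function sketchStripBadChars(string $in): string
{
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    // preg_replace() returns null when $in is not valid UTF-8.
    return preg_replace($pattern, '', $in) ?? '';
}

echo sketchStripBadChars("a\x01b\x0Bc"), "\n"; // abc
```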

stripPunctuation()

Strip punctuation from a string.

public static stripPunctuation(string $str) : string
Parameters
$str : string

String to process.

Return values
string

Processed string.

titleSortLower()

Perform text processing roughly equivalent to SolrMarc's titleSortLower feature to allow consistent indexing into the title_sort field.

public static titleSortLower(string $str) : string
Parameters
$str : string

String to process.

Return values
string

Processed string.

xmlAsText()

Convert provided nodes into XML and return as text. This is useful for populating the fullrecord field with the raw input XML.

public static xmlAsText(array<string|int, mixed> $in) : string
Parameters
$in : array<string|int, mixed>

Array of DOMElement objects.

Return values
string

XML as string

getApertureFields()

Load metadata about an HTML document using Aperture.

protected static getApertureFields(string $htmlFile) : array<string|int, mixed>
Parameters
$htmlFile : string

File on disk containing HTML.

Return values
array<string|int, mixed>

getDocumentFieldArray()

Support method for getDocument() -- retrieve associative array of field data.

protected static getDocumentFieldArray(string $url) : array<string|int, mixed>
Parameters
$url : string

URL of file to retrieve.

Return values
array<string|int, mixed>

getHtmlFields()

Extract key metadata from HTML.

protected static getHtmlFields(string $html) : array<string|int, mixed>

NOTE: This method uses some non-standard meta tags; it is intended as an example that can be overridden/extended to support local practices.

Parameters
$html : string

HTML content.

Return values
array<string|int, mixed>
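The following is a sketch of basic HTML metadata extraction. It takes the <title> text plus every <meta name="..." content="..."> pair, with names lower-cased. The output keys are hypothetical. The real method uses its own, partly non-standard, set of meta tags, which this page does not list.

```php
<?php
// Hypothetical sketch; output keys are illustrative, not VuFind's field names.
function sketchHtmlFields(string $html): array
{
    $previous = libxml_use_internal_errors(true); // tolerate imperfect HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $titles = $dom->getElementsByTagName('title');
    if ($titles->length > 0) {
        $fields['title'] = trim($titles->item(0)->textContent);
    }
    foreach ($dom->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name !== '') {
            $fields[$name] = $meta->getAttribute('content');
        }
    }
    return $fields;
}

$fields = sketchHtmlFields('<html><head><title> Report </title>'
    . '<meta name="Author" content="J. Doe"></head><body></body></html>');
print_r($fields);
```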

getTikaFields()

Load metadata about an HTML document using Tika.

protected static getTikaFields(string $htmlFile) : array<string|int, mixed>
Parameters
$htmlFile : string

File on disk containing HTML.

Return values
array<string|int, mixed>
