VuFind API Documentation

LuceneSyntaxHelper

Lucene query syntax helper class.

Tags
category

VuFind

author

Andrew S. Nagy vufind-tech@lists.sourceforge.net

author

David Maus maus@hab.de

author

Demian Katz demian.katz@villanova.edu

author

Ere Maijala ere.maijala@helsinki.fi

license

http://opensource.org/licenses/gpl-2.0.php GNU General Public License

link
https://vufind.org

Table of Contents

SOLR_RANGE_RE  = '/(\\[.+\\s+TO\\s+.+\\])|(\\{.+\\s+TO\\s+.+\\})/'
Regular expression matching a Solr range.
$allBools  : array<string|int, mixed>
All boolean operators supported by the class.
$caseSensitiveBooleans  : bool|string
Force boolean operators to uppercase? Set to true to make all Booleans case-sensitive, false to make no Booleans case-sensitive, or a comma-separated string to make only certain operators case-sensitive.
$caseSensitiveRanges  : bool
Force ranges to uppercase?
$insideQuotes  : string
Regular expression lookahead used to detect whether the current match position is inside quotes.
__construct()  : mixed
Constructor.
capitalizeBooleans()  : string
Capitalize boolean operators.
capitalizeCaseInsensitiveBooleans()  : string
Wrapper around capitalizeBooleans that accounts for the caseSensitiveBooleans property of this class.
capitalizeRanges()  : string
Capitalize range operator.
containsAdvancedLuceneSyntax()  : bool
Return true if the search string contains advanced Lucene syntax.
containsBooleans()  : bool
Return true if the search string contains boolean operators.
containsRanges()  : bool
Return true if the search string contains ranges.
extractSearchTerms()  : string
Extract search terms from a query string for spell checking.
hasCaseSensitiveBooleans()  : bool
Are there any case-sensitive Boolean operators configured?
hasCaseSensitiveRanges()  : bool
Are case-sensitive ranges configured?
normalizeSearchString()  : string
Return normalized input string.
capitalizeRangesCallback()  : string
Callback helper function for capitalizeRanges().
countNonQuoted()  : int
Count occurrences of a character in non-quoted parts of the string.
getBoolsToCap()  : array<string|int, mixed>
Convert the caseSensitiveBooleans property into an array for use with the capitalizeBooleans function.
normalizeBoosts()  : string
Normalize boosts in a query.
normalizeBracesAndBrackets()  : string
Normalize braces/brackets in a query.
normalizeColons()  : string
Normalize field specifications within the query.
normalizeFancyQuotes()  : string
Normalize fancy quotes in a query.
normalizeParens()  : string
Normalize parentheses in a query.
normalizeUnquotedText()  : string
Normalize various problems found in unquoted text within the query.
normalizeWildcards()  : string
Normalize wildcards in a query.
prepareForLuceneSyntax()  : string
Prepare input for use in a Solr query.
processQueryString()  : void
Process a Lucene query string with a callback.
removeNonQuoted()  : string
Remove occurrences of given characters in non-quoted parts of the string.

Constants

SOLR_RANGE_RE

Regular expression matching a Solr range.

public string SOLR_RANGE_RE = '/(\\[.+\\s+TO\\s+.+\\])|(\\{.+\\s+TO\\s+.+\\})/'
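
As a rough illustration of what this pattern accepts, the sketch below applies the same expression with preg_match(). The function name looksLikeRange() is hypothetical and is not part of LuceneSyntaxHelper. Note that the pattern has no case-insensitive flag, so only an uppercase TO is recognized.

```php
<?php
// Hypothetical helper for illustration only; not part of LuceneSyntaxHelper.
function looksLikeRange(string $query): bool
{
    // Pattern copied from SOLR_RANGE_RE. In a single-quoted PHP string,
    // "\\" is one literal backslash, so "\\[" reaches PCRE as "\[".
    $solrRangeRe = '/(\\[.+\\s+TO\\s+.+\\])|(\\{.+\\s+TO\\s+.+\\})/';

    return preg_match($solrRangeRe, $query) === 1;
}

var_dump(looksLikeRange('year:[1990 TO 2000]')); // bool(true): square brackets
var_dump(looksLikeRange('title:{a TO m}'));      // bool(true): curly braces
var_dump(looksLikeRange('year:[1990 to 2000]')); // bool(false): lowercase "to"
```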

Properties

$allBools

All boolean operators supported by the class.

protected array<string|int, mixed> $allBools = ['AND', 'OR', 'NOT']

$caseSensitiveBooleans

Force boolean operators to uppercase? Set to true to make all Booleans case-sensitive, false to make no Booleans case-sensitive, or a comma-separated string to make only certain operators case-sensitive.

protected bool|string $caseSensitiveBooleans

$caseSensitiveRanges

Force ranges to uppercase?

protected bool $caseSensitiveRanges

$insideQuotes

Regular expression lookahead used to detect whether the current match position is inside quotes.

protected static string $insideQuotes = '(?=(?:[^\\"]*+\\"[^\\"]*+\\")*+[^\\"]*+$)'
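
Reading the expression itself, the lookahead succeeds only when an even number of double quotes remains between the current position and the end of the string. With balanced quotes, that means the position lies outside any quoted phrase. The sketch below appends it to a simple pattern to count colons outside quotes; countColonsOutsideQuotes() is a hypothetical name used only for this example.

```php
<?php
// Hypothetical helper for illustration only; not part of LuceneSyntaxHelper.
function countColonsOutsideQuotes(string $query): int
{
    // Lookahead copied from the $insideQuotes property.
    $lookahead = '(?=(?:[^\\"]*+\\"[^\\"]*+\\")*+[^\\"]*+$)';

    // A colon matches only if the lookahead succeeds right after it.
    return (int) preg_match_all('/:' . $lookahead . '/', $query);
}

// The colon inside "a:b" is followed by an odd number of quotes and is skipped.
echo countColonsOutsideQuotes('title:"a:b" author:smith'), "\n"; // 2
```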

Methods

__construct()

Constructor.

public __construct([bool|string $csBools = true ][, bool $csRanges = true ]) : mixed
Parameters
$csBools : bool|string = true

Case sensitive Booleans setting

$csRanges : bool = true

Case sensitive ranges setting

Return values
mixed

capitalizeBooleans()

Capitalize boolean operators.

public capitalizeBooleans(string $string[, array<string|int, mixed> $bools = ['AND', 'OR', 'NOT'] ]) : string
Parameters
$string : string

Search string

$bools : array<string|int, mixed> = ['AND', 'OR', 'NOT']

Which booleans to capitalize (default = all)

Return values
string
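
The documented behavior can be sketched as follows, assuming that operators are recognized only when surrounded by whitespace and that text inside double quotes is left alone. This is a minimal approximation under those assumptions, not the class's actual code, and capitalizeBooleanWords() is a hypothetical name.

```php
<?php
// Hypothetical sketch for illustration only; not the VuFind implementation.
function capitalizeBooleanWords(string $string, array $bools = ['AND', 'OR', 'NOT']): string
{
    // Same expression as the $insideQuotes property: it succeeds only when
    // an even number of double quotes follows the match.
    $lookahead = '(?=(?:[^\\"]*+\\"[^\\"]*+\\")*+[^\\"]*+$)';

    foreach ($bools as $bool) {
        // Match the operator in any letter case, surrounded by whitespace.
        $pattern = '/\s+' . preg_quote($bool, '/') . '\s+' . $lookahead . '/i';
        $string = preg_replace($pattern, ' ' . $bool . ' ', $string);
    }

    return $string;
}

echo capitalizeBooleanWords('cats and dogs not "fish or birds"'), "\n";
// cats AND dogs NOT "fish or birds"
```

Passing a shorter list as the second argument, for example ['AND'], limits the conversion to those operators.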

capitalizeCaseInsensitiveBooleans()

Wrapper around capitalizeBooleans that accounts for the caseSensitiveBooleans property of this class.

public capitalizeCaseInsensitiveBooleans(string $string) : string
Parameters
$string : string

Search string

Return values
string

capitalizeRanges()

Capitalize range operator.

public capitalizeRanges(string $string) : string
Parameters
$string : string

Search string

Return values
string
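
A simplified sketch of range capitalization with preg_replace_callback() is shown below. It only uppercases the operator inside bracketed or braced ranges; it does not check for quotes, and whatever else the real method and its callback do is not reproduced here. The name capitalizeRangeOperator() is hypothetical.

```php
<?php
// Hypothetical sketch for illustration only; not the VuFind implementation.
function capitalizeRangeOperator(string $string): string
{
    $patterns = [
        '/(\[)([^\]]+)\s+to\s+([^\]]+)(\])/i', // [start to end]
        '/(\{)([^}]+)\s+to\s+([^}]+)(\})/i',   // {start to end}
    ];

    return preg_replace_callback(
        $patterns,
        // $match[1]/[4] are the delimiters, $match[2]/[3] the range endpoints.
        function (array $match): string {
            return $match[1] . $match[2] . ' TO ' . $match[3] . $match[4];
        },
        $string
    );
}

echo capitalizeRangeOperator('year:[1990 to 2000] AND title:{a to m}'), "\n";
// year:[1990 TO 2000] AND title:{a TO m}
```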

containsAdvancedLuceneSyntax()

Return true if the search string contains advanced Lucene syntax.

public containsAdvancedLuceneSyntax(string $searchString) : bool
Parameters
$searchString : string

Search string

Return values
bool

containsBooleans()

Return true if the search string contains boolean operators.

public containsBooleans(string $searchString) : bool
Parameters
$searchString : string

Search string

Return values
bool

containsRanges()

Return true if the search string contains ranges.

public containsRanges(string $searchString) : bool
Parameters
$searchString : string

Search string

Return values
bool

extractSearchTerms()

Extract search terms from a query string for spell checking.

public extractSearchTerms(string $query) : string

This handles only the most common simple cases.

Parameters
$query : string

Query string

Return values
string

hasCaseSensitiveBooleans()

Are there any case-sensitive Boolean operators configured?

public hasCaseSensitiveBooleans() : bool
Return values
bool

hasCaseSensitiveRanges()

Are case-sensitive ranges configured?

public hasCaseSensitiveRanges() : bool
Return values
bool

normalizeSearchString()

Return normalized input string.

public normalizeSearchString(string $searchString) : string
Parameters
$searchString : string

Input search string

Return values
string

capitalizeRangesCallback()

Callback helper function for capitalizeRanges().

protected capitalizeRangesCallback(array<string|int, mixed> $match) : string
Parameters
$match : array<string|int, mixed>

Match array as passed by preg_replace_callback()

Tags
see
LuceneSyntaxHelper::capitalizeRanges()
todo

Check possible problem with umlauts/non-ASCII word characters

Return values
string

countNonQuoted()

Count occurrences of a character in non-quoted parts of the string.

protected countNonQuoted(string $needle, string $haystack) : int
Parameters
$needle : string

Character to look for (non-escaped)

$haystack : string

String to process

Return values
int

getBoolsToCap()

Convert the caseSensitiveBooleans property into an array for use with the capitalizeBooleans function.

protected getBoolsToCap() : array<string|int, mixed>
Return values
array<string|int, mixed>
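
One plausible reading of this conversion, based on the property description: true means every operator is case-sensitive, so nothing should be capitalized; false means every operator should be capitalized; a comma-separated string names the case-sensitive operators, so the remaining ones are capitalized. The sketch below follows that reading. The function name and the trimming and uppercasing of list entries are assumptions.

```php
<?php
// Hypothetical sketch for illustration only; not the VuFind implementation.
/**
 * @param bool|string $caseSensitive Value shaped like $caseSensitiveBooleans
 * @param string[]    $all           Operators shaped like $allBools
 *
 * @return string[] Operators that are NOT case-sensitive
 */
function boolsToCapitalize($caseSensitive, array $all = ['AND', 'OR', 'NOT']): array
{
    if ($caseSensitive === true) {
        return [];   // everything is case-sensitive: capitalize nothing
    }
    if ($caseSensitive === false) {
        return $all; // nothing is case-sensitive: capitalize everything
    }

    // Assumption: list entries may carry stray spaces or lowercase letters.
    $sensitive = array_map(
        function (string $item): string {
            return strtoupper(trim($item));
        },
        explode(',', $caseSensitive)
    );

    // array_values() reindexes the result after array_diff() leaves gaps.
    return array_values(array_diff($all, $sensitive));
}

print_r(boolsToCapitalize('and, or')); // only NOT remains
```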

normalizeBoosts()

Normalize boosts in a query.

protected normalizeBoosts(string $input) : string
Parameters
$input : string

String to normalize

Return values
string

normalizeBracesAndBrackets()

Normalize braces/brackets in a query.

protected normalizeBracesAndBrackets(string $input) : string

IMPORTANT: This should only be called on a string that has already been cleaned up by normalizeBoosts().

Parameters
$input : string

String to normalize

Return values
string

normalizeColons()

Normalize field specifications within the query.

protected normalizeColons(string $input) : string
Parameters
$input : string

String to normalize

Return values
string

normalizeFancyQuotes()

Normalize fancy quotes in a query.

protected normalizeFancyQuotes(string $input) : string
Parameters
$input : string

String to normalize

Return values
string
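
As a minimal sketch of the idea, the example below maps typographic quotes to their ASCII equivalents with strtr(). The set of characters handled here (the four curly quotes U+2018, U+2019, U+201C and U+201D) is an assumption; the actual method may cover a different set. The name straightenQuotes() is hypothetical.

```php
<?php
// Hypothetical sketch for illustration only; the character list is assumed.
function straightenQuotes(string $input): string
{
    return strtr($input, [
        "\u{201C}" => '"', // left double quotation mark
        "\u{201D}" => '"', // right double quotation mark
        "\u{2018}" => "'", // left single quotation mark
        "\u{2019}" => "'", // right single quotation mark
    ]);
}

echo straightenQuotes("\u{201C}war and peace\u{201D}"), "\n"; // "war and peace"
```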

normalizeParens()

Normalize parentheses in a query.

protected normalizeParens(string $input) : string

Removes all non-quoted parentheses if they're not balanced.

Parameters
$input : string

String to normalize

Return values
string
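
The rule above can be sketched as follows, assuming that "balanced" means the number of opening and closing parentheses outside quotes is equal. The sketch ignores backslash-escaped parentheses for brevity, and stripUnbalancedParens() is a hypothetical name.

```php
<?php
// Hypothetical sketch for illustration only; not the VuFind implementation.
function stripUnbalancedParens(string $input): string
{
    // Same expression as the $insideQuotes property: it succeeds only when
    // an even number of double quotes follows the match.
    $lookahead = '(?=(?:[^\\"]*+\\"[^\\"]*+\\")*+[^\\"]*+$)';

    $open = preg_match_all('/\(' . $lookahead . '/', $input);
    $close = preg_match_all('/\)' . $lookahead . '/', $input);

    if ($open === $close) {
        return $input; // counts agree: leave the query untouched
    }

    // Counts differ: drop every parenthesis that sits outside quotes.
    return preg_replace('/[()]' . $lookahead . '/', '', $input);
}

echo stripUnbalancedParens('(a OR b'), "\n";               // a OR b
echo stripUnbalancedParens('title:"(draft" AND (x'), "\n"; // title:"(draft" AND x
```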

normalizeUnquotedText()

Normalize various problems found in unquoted text within the query.

protected normalizeUnquotedText(string $input) : string
Parameters
$input : string

String to normalize

Return values
string

normalizeWildcards()

Normalize wildcards in a query.

protected normalizeWildcards(string $input) : string
Parameters
$input : string

String to normalize

Return values
string

prepareForLuceneSyntax()

Prepare input for use in a Solr query.

protected prepareForLuceneSyntax(string $input) : string

Handles certain cases where the input might conflict with Lucene syntax rules.

Parameters
$input : string

Input string

Tags
todo

Check whether it is safe to assume that $input is a UTF-8 encoded string.

Return values
string

processQueryString()

Process a Lucene query string with a callback.

protected processQueryString(callable $callback, string $str) : void
Parameters
$callback : callable

Callback that gets called for each character

$str : string

String to process

Return values
void
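
To illustrate the per-character callback pattern, the sketch below walks a string while tracking quote and escape state, then uses that walk to count unquoted, unescaped opening parentheses. The callback signature (character, quoted flag, escaped flag) is an assumption, as is the name walkQuery(); the real method may pass different arguments.

```php
<?php
// Hypothetical sketch for illustration only; the callback signature is assumed.
function walkQuery(callable $callback, string $str): void
{
    $quoted = false;  // currently inside a double-quoted phrase?
    $escaped = false; // was the previous character a backslash?
    $length = strlen($str);

    // Byte-wise iteration is safe here because the delimiters are ASCII and
    // never occur inside a multi-byte UTF-8 sequence.
    for ($i = 0; $i < $length; $i++) {
        $char = $str[$i];
        $callback($char, $quoted, $escaped);

        if ($escaped) {
            $escaped = false; // an escaped character never changes state
        } elseif ($char === '\\') {
            $escaped = true;
        } elseif ($char === '"') {
            $quoted = !$quoted;
        }
    }
}

// Count "(" characters that are neither quoted nor escaped.
$open = 0;
walkQuery(
    function (string $char, bool $quoted, bool $escaped) use (&$open): void {
        if ($char === '(' && !$quoted && !$escaped) {
            $open++;
        }
    },
    '(a OR "b (c") AND \\(d'
);
echo $open, "\n"; // 1
```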

removeNonQuoted()

Remove occurrences of given characters in non-quoted parts of the string.

protected removeNonQuoted(array<string|int, mixed> $needles, string $haystack) : string
Parameters
$needles : array<string|int, mixed>

Characters to remove (non-escaped)

$haystack : string

String to process

Return values
string
