Warning: This page has not been updated in over a year and may be outdated or deprecated.

SolrMarc Troubleshooting

This page contains tips for troubleshooting common MARC import problems.

Why are records missing from my index?

Sometimes when you run an import, the number of records in your index will not match the number of records in your MARC file. This can happen for a few possible reasons:

  1. You may have duplicate IDs in your records. Solr stores records based on their unique IDs. If two records have the same ID, the most recently indexed one will overwrite any previous record with a matching ID. If this is a possibility in your data, you may need to review your import rules and determine a better way to uniquely identify each record. Note that records with NO ID value can also cause problems: if the field you index as the unique ID is missing from a record, that record will not be indexed correctly (it may be added to the index with a blank ID).
  2. There may be other data problems (either bad records or problematic import configuration rules) that are causing some records to be skipped during the import process. If this is the case, the import tool should report errors or warnings during the import.
  3. Your import configuration might include a rule that deletes records based on specific conditions, and some of those conditions are being met. VuFind®'s default import configuration does not include any such rules, so if you are using defaults, this is not the explanation. But if you have customized the rules, you may wish to review whether any are using the DeleteRecordIfFieldEmpty modifier.
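If you have customized your import rules, a quick search can show whether any of them use the DeleteRecordIfFieldEmpty modifier. The sketch below uses a hypothetical helper, `find_delete_rules`, that greps the `.properties` files in the directories you pass it. It assumes your SolrMarc rules live in the `import` directories under `$VUFIND_HOME` and `$VUFIND_LOCAL_DIR`; adjust the paths if your installation differs.

```shell
# Hypothetical helper: list import rules that use DeleteRecordIfFieldEmpty.
# Assumption: SolrMarc rules are kept in *.properties files somewhere under
# the directories given as arguments.
find_delete_rules() {
  local dir
  for dir in "$@"; do
    # Skip directories that do not exist (e.g. when VUFIND_LOCAL_DIR is unset).
    [ -d "$dir" ] || continue
    # -r recurses, -n adds line numbers, -H always prints the file name.
    # "|| true" because finding no match is not an error here.
    grep -rnH --include='*.properties' 'DeleteRecordIfFieldEmpty' "$dir" || true
  done
}

# Example:
# find_delete_rules "$VUFIND_HOME/import" "$VUFIND_LOCAL_DIR/import"
```

Any line it prints points at a rule that can delete records during import.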

Here is a process for identifying whether you have one of the first two problems:

First, run the import process and redirect its output (both standard output and standard error) to a file:

$VUFIND_HOME/import-marc.sh /path/to/your/marc-file.mrc > /tmp/import.log 2>&1

Next, filter out DEBUG messages from the log and see if there are any ERROR or WARNING messages indicating corrupt records:

grep -v DEBUG /tmp/import.log
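To focus on the problem lines themselves, a sketch like the following may help. The function names are hypothetical. Matching on `WARN` also catches `WARNING`, and `-B 1` prints the preceding line, which may be the DEBUG line for the record that triggered the message (this assumes log lines appear in the order records were processed).

```shell
# Hypothetical helpers for reviewing an import log such as /tmp/import.log.

# Print ERROR and WARN/WARNING lines with their line numbers, plus one line
# of preceding context (context lines are marked "N-", matches "N:").
show_import_problems() {
  grep -n -B 1 -E 'ERROR|WARN' "$1"
}

# Count the lines that mention ERROR or WARN/WARNING.
count_import_problems() {
  grep -c -E 'ERROR|WARN' "$1"
}

# Example:
# count_import_problems /tmp/import.log
# show_import_problems /tmp/import.log | less
```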

If this does not reveal the cause of your problem, you can look for duplicate IDs by sorting the log and comparing it against a sorted copy with duplicate lines removed. For example:

sort < /tmp/import.log > /tmp/import-sorted.log
sort -u < /tmp/import.log > /tmp/import-sorted-unique.log
diff /tmp/import-sorted.log /tmp/import-sorted-unique.log

If you have duplicate IDs, the diff output will show the repeated DEBUG lines, which contain the IDs of the duplicated records.
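An alternative that prints only the repeated lines, each with a count, is to pipe the sorted DEBUG lines through `uniq -c`. The function name below is hypothetical, and like the diff approach it assumes that the DEBUG lines for two records with the same 001 value are identical.

```shell
# Hypothetical helper: print each DEBUG line that occurs more than once in
# the log, prefixed by the number of times it occurs.
list_duplicate_debug_lines() {
  # sort groups identical lines together so uniq -c can count them;
  # awk keeps only lines whose count (first field) is greater than 1.
  grep DEBUG "$1" | sort | uniq -c | awk '$1 > 1'
}

# Example:
# list_duplicate_debug_lines /tmp/import.log
```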

IMPORTANT: This process will only work if your unique IDs are in the 001 field, because the SolrMarc debug output only includes 001 values. If your identifiers are stored somewhere else, you will need a different method of detecting duplicates.
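A rough check that does not depend on where your IDs are stored is to compare the number of records in the MARC file with the number of documents in the index. The sketch below assumes a binary MARC (ISO 2709) file, in which every record ends with the record terminator byte 0x1D, and VuFind's default Solr location (`http://localhost:8983/solr`, core `biblio`); the function names are hypothetical. It only tells you how many records are missing, not which ones, and it assumes the index contains nothing but this file's records.

```shell
# Hypothetical helper: count records in a binary MARC file by counting the
# record terminator byte 0x1D (octal 035). Not valid for MARCXML.
count_marc_records() {
  local n
  # LC_ALL=C makes tr treat the input as raw bytes.
  n=$(LC_ALL=C tr -cd '\035' < "$1" | wc -c)
  # Arithmetic expansion strips the padding some wc versions add.
  echo $((n))
}

# Hypothetical helper: read a Solr JSON response on stdin and print the
# first numFound value.
solr_num_found() {
  grep -o '"numFound":[0-9]*' | head -n 1 | cut -d: -f2
}

# Example (assumes VuFind's default Solr URL and "biblio" core):
# count_marc_records /path/to/your/marc-file.mrc
# curl -s 'http://localhost:8983/solr/biblio/select?q=*:*&rows=0' | solr_num_found
```

If the index holds fewer documents than the file holds records and the log shows no errors, duplicate or missing IDs become the likeliest explanation.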

indexing/solrmarc/troubleshooting.txt · Last modified: 2024/02/23 11:34 by demiankatz