Video 5: Administering a VuFind® Server

The fifth VuFind® instructional video shows you some important server administration tasks to keep your VuFind site happy and healthy: making sure that Solr auto-starts on boot-up (using systemd), and configuring cleanup of old search and session data.

Video is available as an mp4 download or through YouTube.

Related Resources

Starting and Stopping Solr wiki page

Update Notes

This video was recorded using VuFind 6.1. In VuFind 7.0, two significant changes were made which impact the content of this video:

Solr now runs on port 8983 instead of port 8080. When creating the vufind.service file as shown in the video, the filename in the PIDFile line should be changed from solr-8080.pid to solr-8983.pid.
The command line utilities still work the same as portrayed in the video, but they have been visually improved, so some displays may look different.
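
As a small illustration of the first note, a unit file that was already created with the VuFind 6.1 value could be adjusted along these lines. This is only a sketch: the update_pid_port function name is hypothetical, and it simply swaps the port number inside the PID filename as described above.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the PIDFile line of a unit file for VuFind 7.0 and later.
update_pid_port() {
    unit_file="$1"
    tmp_file="$unit_file.tmp"
    # Swap the old port number in the PID filename for the new one,
    # writing to a temporary file first so a failed sed leaves the original intact.
    sed 's/solr-8080\.pid/solr-8983.pid/' "$unit_file" > "$tmp_file" &&
        mv "$tmp_file" "$unit_file"
}

# Example call (needs root for the real file):
# update_pid_port /etc/systemd/system/vufind.service
```

After changing an installed unit file, sudo systemctl daemon-reload tells systemd to re-read it.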

Transcript

Welcome to the fifth VuFind video. This time around, we are going to talk about administering a VuFind server after you've built and configured it, since there are a few common tasks that it's useful to know about. We're going to cover both getting Solr set up in a secure way and making sure that it starts automatically when your server boots up. And we're going to talk about some cleanup you'll want to do to make sure that you don't accidentally fill up your disk without realizing it.

So let's start with Solr setup. For the purposes of this video, I am going to show you how to set up VuFind autostart using systemd. Systemd is a set of tools shared across Linux distributions which are focused on system and service management. This is a comparatively new development in Linux land, which is to say it's been around for several years, but if you've been in Linux a long time, you may be more familiar with the earlier system that used symbolic links to a directory called init.d. Systemd gets rid of obscure bash scripts and replaces them with configuration files, and while it took a little getting used to, I've come to really like it, and so I'm going to go through it in some detail in this video to help you understand how it works and what it's doing for you.

But first, let's talk about Solr security quickly. So when you install Solr, you're creating a web service that can have data sent to it and that VuFind communicates with to do searching. I hope it goes without saying, but it might not, that you do not want to expose your Solr index to the whole world because people can do malicious things to it. So you should always have Solr behind a firewall so that VuFind can talk to it, but nobody who doesn't need to has access. Additionally, it's a really good idea to create a user account dedicated to running Solr and give ownership of the Solr directories to that account, so that if somebody does somehow get to your Solr web interface and exploit a bug that allows them to do something malicious, their ability to do harm is somewhat constrained by file ownerships and so forth.

So what we are going to do in this video is create a Solr user, give ownership of the Solr directories to that Solr user, and then set up systemd so that the Solr user starts up Solr when the server boots. So first of all, we'll just do the bare minimum to create a user. We will say sudo adduser solr. So this creates a user named solr, and we're going to set the disabled password switch (sudo adduser --disabled-password solr) because we don't need to have a password for this account. We're not going to be logging in as it.

And we're just going to accept all the defaults because this is a demo. So now that we have a user created called solr, we can give it ownership of our Solr directories using chown: “-R” for recursive, “solr:solr” to give both user and group ownership of the directory in question, and we're going to say “$VUFIND_HOME/solr” (sudo chown -R solr:solr $VUFIND_HOME/solr). And now if I do an ls -l of the VuFind home Solr directory, I see that it is owned by solr and in the solr group.
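
Collected in one place, the account setup described above might look like the following sketch. The run helper and its APPLY switch are hypothetical additions (not part of VuFind or the video), so the script only prints the commands unless you opt in. The /usr/local/vufind fallback is an assumption based on the paths used later in the video, and adduser with --disabled-password is the Debian/Ubuntu form of the command.

```shell
#!/bin/sh
# Hypothetical helper: print each command, and only execute it when APPLY=1.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Assumption: fall back to the install path used in this video if VUFIND_HOME is unset.
VUFIND_HOME="${VUFIND_HOME:-/usr/local/vufind}"

# Create a dedicated account with no password, since nobody logs in as it.
run sudo adduser --disabled-password solr

# Give the solr user and group ownership of the Solr directories, recursively.
run sudo chown -R solr:solr "$VUFIND_HOME/solr"

# Check the result: owner and group should both read "solr".
run ls -l "$VUFIND_HOME/solr"
```

Running the script as-is lists the three commands; running it with APPLY=1 set in the environment actually executes them.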

Now we are all set to create a systemd service to boot up Solr. There is a directory called /etc/systemd/system, which is where service definitions live, so I'm going to use my nano editor. So, sudo nano /etc/systemd/system/vufind.service. Every service definition needs to end in .service to tell systemd that it is a service. So I'm creating a blank file and I'm going to type in a whole bunch of parameters here to explain how the service is supposed to work.

I'm going to start with a [Unit] section containing After=network.target. The After setting allows us to essentially define dependencies, so you can make services that wait for other services to start before they start and so forth, but in this instance we're just going to use network.target, which is a predefined target in systemd, which means wait for there to be network online before you do anything. Solr doesn't do much good without network access.

And I'm going to create a [Service] section where most of the main settings will live. I'm going to say Type=forking. This is used when you run a script that exits quickly but spawns a long-lived child process, which is an accurate description of the solr.sh script that VuFind uses to start up Solr: the script returns, but it forks a process that lives until we stop it.

Next I'm going to say ExecStart=/bin/sh -l -c '/usr/local/vufind/solr.sh start' -x. So, as you can probably guess from the setting name, ExecStart is where you tell systemd what command to use to start the service. You'll notice that I use full paths to everything because we don't want to make any assumptions about the environment that's set up when systemd is running things. /bin/sh is just saying use the standard shell. The -l switch makes the shell act as if a user has logged into it when running a command, which sets up the environment correctly, so this gives us access to VUFIND_HOME, VUFIND_LOCAL_DIR, etc. Then the -c is just telling the shell what command to run: in that quoted string, we're running the solr.sh script to start Solr. Finally, the -x just provides extra detailed output from the shell, which can be useful for error logging and troubleshooting.

Next I'm going to say PIDFile=/usr/local/vufind/solr/vendor/bin/solr-8080.pid. This tells systemd where the file containing the process ID of the running Solr process lives. It's something that's set up by the solr.sh script when we start things up and it's useful for knowing whether or not the service is running and also understanding how to stop the process.

Next I'm going to say User=solr. This is where we specify that the Solr user we created earlier will be used to run the Solr process.

Next, ExecStop=/bin/sh -l -c '/usr/local/vufind/solr.sh stop' -x. So as you can see this exactly mirrors the ExecStart command except this is the command used to stop Solr instead of to start it up.

We also add SuccessExitStatus=0. This tells systemd that the ExecStart command will return a value of zero when it succeeds, so if solr.sh comes back with some other exit status, something is wrong and the process will throw an error.

Finally we add LimitNOFILE=65000 and LimitNPROC=65000. You may have noticed that recently, when you start up solr.sh from the command line, it throws warnings about file and process limit settings. If you don't allow Solr to have lots of files open, it can potentially cause performance issues, and so it's recommended that you set these limits to these values when running Solr in production. Using systemd provides a really convenient way to set those settings and then not have to think about them anymore.

Finally, we create an [Install] section containing WantedBy=multi-user.target. This section tells systemd what circumstances it should start this process under when the process is enabled. multi-user.target means that the server is running and accepting logins but isn't necessarily presenting a graphical interface, so it's kind of a low threshold for system is up and running in a normal mode.

If you used the older Linux startup system with init.d, there were things called run levels. Those don't exist anymore. Instead there are these targets, and multi-user.target is a safe and appropriate option for this use case.

So I've now saved this file, I'm going to exit out of here and hope I didn't make any typos.
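
For reference, the settings dictated above add up to a file along the following lines. This is a sketch rather than an authoritative copy of the file from the video. The [Unit] header is included because systemd expects After= to live in that section. The UNIT_FILE variable is a hypothetical convenience so that the script writes a local draft instead of touching /etc directly. The paths assume VuFind 6.1 installed in /usr/local/vufind (for VuFind 7.0 and later, see the PIDFile change in the Update Notes).

```shell
#!/bin/sh
# Hypothetical variable: where to write the draft unit file (defaults to the current directory).
UNIT_FILE="${UNIT_FILE:-./vufind.service}"

# The quoted 'EOF' delimiter stops the shell from expanding anything inside the file body.
cat > "$UNIT_FILE" <<'EOF'
[Unit]
After=network.target

[Service]
Type=forking
ExecStart=/bin/sh -l -c '/usr/local/vufind/solr.sh start' -x
PIDFile=/usr/local/vufind/solr/vendor/bin/solr-8080.pid
User=solr
ExecStop=/bin/sh -l -c '/usr/local/vufind/solr.sh stop' -x
SuccessExitStatus=0
LimitNOFILE=65000
LimitNPROC=65000

[Install]
WantedBy=multi-user.target
EOF

echo "Wrote $UNIT_FILE"
# After reviewing the draft, install it with:
#   sudo cp "$UNIT_FILE" /etc/systemd/system/vufind.service
```

If you edit the installed file later, sudo systemctl daemon-reload makes systemd re-read it.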

So just to show you, I'm going to open up VuFind in a web browser and try to do a search, and it fails because Solr is not running. But now that I've defined a service, I can start Solr up using the standard systemctl command, which systemd uses to start and stop services.

I just do sudo systemctl start vufind because I named my file vufind.service.

I wait a moment while it spins up. Now I'm back at the prompt, so it appears to have succeeded. Let's refresh our browser. And now we have search results. It worked. So if I wanted to stop or restart the service, I could do sudo systemctl stop vufind, which stops it. And if it were running, I could just substitute restart for stop to stop and then start it.

What we are really concerned about here is ensuring that Solr starts every time our server boots up, so that we don't have to remember to start it by hand, and if something happens in the middle of the night, it just recovers on its own. So that's easily done. We just say sudo systemctl enable vufind. And now the system has enabled the service. Based on that WantedBy setting we put in the service file, it knows that when it's enabled it needs to start up whenever the system is running in multi-user mode and accepting connections.

So let's prove that this works and reboot the server. All right. So now I'm going to log in, and now if everything worked, I should be able to open up a web browser and access my VuFind instance and do searches without having to manually start anything. So here's VuFind. Do a blank search. Results. We are successful.

Now that we've covered getting Solr to start automatically, let's also talk a little bit about cleaning up, because VuFind potentially has a lot of users accessing it, and some of the activity that users perform creates traces that can over time accumulate into quite a bit of data.

So first of all, search history. Every time anybody does a search, it creates a row in a table called search in MySQL or whatever database platform you're using. The reason for the search table is that it allows us to maintain a search history. So if I go here and look, I can see that I did a blank search. Every user has a search history maintained in the search table. Also, there's the save button here, so users can potentially save their searches so that they can refer back to them in future. Also, recent releases of VuFind have a notification feature that, when enabled, lets people subscribe to searches and get e-mails indicating what new results have shown up in those sets.

So it's useful to have this table, but of course, the vast majority of searches that get entered in it are just forgotten about and never referred to again, and if you have people doing thousands or millions of searches in your system, this database table can get really big. Fortunately, VuFind has a command line utility called expire_searches, which will clean out the table.

So just to demonstrate, if I cd into my VuFind home directory and run php public/index.php util expire_searches, in this example, it deleted 70 old searches from all of the times I've done searching in past videos. And you can see that if I managed to create 70 searches just in the process of recording these videos, you can end up with a lot of these things if you have search engines crawling you and/or a large user base. So I strongly encourage you to set up a cron job that regularly runs this expire_searches task. Otherwise, you can find that your MySQL database gets incredibly enormous.

You'll also learn, if you find yourself in that situation, that while MySQL can grow, it can never shrink. Once a MySQL database gets really, really big, even if you clear data out of it, it doesn't get smaller again. It just reuses the already claimed space. If you really need to reclaim disk space from an out-of-control MySQL database, the best thing to do is to dump the whole database with mysqldump, then drop the database and reimport it, which will clean up all the disk space and make a nice, new, small, optimized file for you. So just a heads-up.

This might also be a good time to point out that VuFind has a whole bunch of command line tools for you, and if you just run the public/index.php script from the command line, you will get a list of all of them. So there are a few different kinds of expiration tasks and all sorts of odds and ends, some of which we will go into in more detail in future videos, but just be aware these exist. They might come in handy.

So getting back to the subject of cleaning up after ourselves, there's one other thing that can potentially take up a lot of space, and that is user sessions. So the way that PHP, and really any web-based system, allows users to have a persistent state within the system, such as being logged in or tracking a partially completed workflow, is to store some data on the server called a session. So PHP sends a session cookie to the user, which gives them a unique identifier that's tied to a session file on the server, and every time the user comes in with that cookie, PHP loads that session data and then can use it to see who is being interacted with and what they're currently in the process of doing.

VuFind doesn't use the session too heavily most of the time, but there are certainly places where it's important, such as enabling you to log in and stay logged in, or tracking what page to redirect you to after you've completed a login process. VuFind has a configuration setting that controls how user session data is stored, because there are actually several options.

The default is to use PHP's built-in disk-based session handling, where it just sticks files in a directory, but you can also set it up to use a database table or to use different kinds of memory-based stores like Redis or Memcached. Depending on what option you choose, you may have different maintenance issues to deal with.

With the default disk-based sessions, normally PHP should clean up after itself, and you shouldn't have to worry about it, but I have experienced situations where things have not quite gone as planned, and session files have accumulated faster than desired. For example, if you have such a heavy load that there are too many files in the session directory for PHP to handle, it might stop cleaning up after itself. So this is something you may want to monitor on your server, perhaps with a cron job that cleans out files past a certain age in the directory used for holding sessions.
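
One possible shape for such a cleanup job is sketched below. Everything specific in it is an assumption to adjust for your server: the session directory (PHP's session.save_path, or a non-default save path if you have configured one in VuFind), the one-hour cutoff (chosen to match the default session lifetime mentioned later in this video), and the clean_old_sessions function name, which is purely illustrative. The sess_ prefix is the filename pattern used by PHP's file-based session handler, and the -maxdepth, -mmin and -delete options are available in GNU and BSD find.

```shell
#!/bin/sh
# Hypothetical helper: delete PHP session files in a directory
# that have not been modified for more than a given number of minutes.
clean_old_sessions() {
    session_dir="$1"
    max_age_minutes="$2"

    # Refuse to run against a missing directory rather than guessing.
    if [ ! -d "$session_dir" ]; then
        echo "No such directory: $session_dir" >&2
        return 1
    fi

    # -maxdepth 1 keeps find from descending into subdirectories;
    # -name 'sess_*' limits deletion to PHP session files;
    # -mmin +N matches files last modified more than N minutes ago.
    find "$session_dir" -maxdepth 1 -type f -name 'sess_*' \
        -mmin +"$max_age_minutes" -delete
}

# Example call (assumed path; check session.save_path on your system):
# clean_old_sessions /var/lib/php/sessions 60
```

Saved as a script and called from cron under an account that can write to the session directory, this removes stale files without touching sessions that are still in use.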

If you use the database-based session storage, there's an expire_sessions command line utility, which you can see listed right here, which cleans up the session table in the database.

And just to show you where these settings live, if you look in your config.ini file, here in local/config/vufind/config.ini, there is a section in the file called Session, which I'm going to search for, and as you can see, you can set the type, which here defaults to file, but other options include memcache and database. You can set the lifetime of the session, which defaults to an hour, so in theory, these things should be cleaned up after an hour if a user stops being active. You can encrypt the session data if you're worried about anything sensitive in there, and then there are a number of settings that are specific to different session handlers. So for example, if you're using files, you can specify a non-default save path for the directory where the sessions live. If you're using memcache, you can specify how to connect to the memcache server, etc.
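
To put the two expiration utilities from this video on a schedule, a cron fragment along these lines could be used. This is a sketch under several assumptions: the file is meant for a cron.d-style directory (hence the user column), www-data stands in for whichever account can read your VuFind configuration, the times are arbitrary, the two environment variables carry the install paths used in this video, and the command form follows what is typed in the video (VuFind 6.1). CRON_FILE is a hypothetical variable so that the script writes a local draft for review.

```shell
#!/bin/sh
# Hypothetical variable: where to write the draft cron fragment.
# No file extension is used, since some cron implementations skip
# files in /etc/cron.d whose names contain a dot.
CRON_FILE="${CRON_FILE:-./vufind-cleanup}"

# The quoted 'EOF' delimiter keeps the shell from expanding anything in the body.
cat > "$CRON_FILE" <<'EOF'
# cron does not load a login environment, so set the VuFind variables here (assumed paths).
VUFIND_HOME=/usr/local/vufind
VUFIND_LOCAL_DIR=/usr/local/vufind/local

# minute hour day-of-month month day-of-week user command
15 2 * * * www-data cd /usr/local/vufind && php public/index.php util expire_searches
# Only relevant when sessions are stored in the database:
45 2 * * * www-data cd /usr/local/vufind && php public/index.php util expire_sessions
EOF

echo "Wrote $CRON_FILE"
# After review, a file like this would typically be copied into /etc/cron.d/.
```

Running the cleanup nightly is just a starting point; a busy site that is crawled heavily may want the search cleanup to run more often.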

So that's all I have for today. I hope that's helpful. There are certainly other issues to think about when administering a server, and there are some wiki pages that talk about this in more detail, but if you can get Solr to start and you can avoid filling up your disk, you are well on your way to having a happy and healthy VuFind server. More next month.

This is an edited version of an automated transcript. Apologies for any errors.
