Warning: This page has not been updated in over a year and may be outdated or deprecated.

This is an old revision of the document!


Video 5: Administering a VuFind Server

The fifth VuFind instructional video shows you some important server administration tasks to keep your VuFind site happy and healthy: making sure that Solr auto-starts on boot-up (using systemd), and configuring cleanup of old search and session data.

Video is available as an mp4 download or through YouTube.

Update Notes

This video was recorded using VuFind 6.1. In VuFind 7.0, two significant changes were made which impact the content of this video:

  • Solr now runs on port 8983 instead of port 8080. When creating the vufind.service file as shown in the video, the filename in the PIDFile line should be changed from solr-8080.pid to solr-8983.pid.
  • The command line utilities still work the same as portrayed in the video, but they have been visually improved, so some displays may look different.
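
For reference, the service file that the transcript below dictates line by line can be reconstructed roughly as follows, with the VuFind 7.0 PIDFile name already applied. This is a sketch, not an official file: the /usr/local/vufind install path is the one used in the video, while the Description line, the UNIT_FILE variable, and the scratch output path are illustrative assumptions.

```shell
# Write the unit file to a scratch path for review. UNIT_FILE is an
# illustrative variable for this sketch, not something VuFind defines.
unit_file="${UNIT_FILE:-./vufind.service}"

cat > "$unit_file" <<'EOF'
[Unit]
# Description is an illustrative addition; it is not dictated in the video.
Description=VuFind Solr index
# Wait for networking before starting Solr.
After=network.target

[Service]
# solr.sh exits quickly but leaves a long-lived child process behind.
Type=forking
ExecStart=/bin/sh -l -c '/usr/local/vufind/solr.sh start' -x
# VuFind 7.0 and later use port 8983; the video (6.1) uses solr-8080.pid.
PIDFile=/usr/local/vufind/solr/vendor/bin/solr-8983.pid
User=solr
ExecStop=/bin/sh -l -c '/usr/local/vufind/solr.sh stop' -x
SuccessExitStatus=0
LimitNOFILE=65000
LimitNPROC=65000

[Install]
WantedBy=multi-user.target
EOF

echo "Wrote $unit_file"
```

After review, the file would be copied to /etc/systemd/system/vufind.service with sudo, and the service started and enabled with “sudo systemctl start vufind” and “sudo systemctl enable vufind”, as described in the transcript.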

Transcript

This is a raw machine-generated transcript; it will be cleaned up as time permits.

Welcome to the fifth VuFind video. This time around we are going to talk about administering a VuFind server after you've built and configured it, since there are a few common tasks that it's useful to know about. We're going to cover both getting Solr set up in a secure way and making sure that it starts automatically when your server boots up, and we're going to talk about some cleanup you'll want to do to make sure that you don't accidentally fill up your disk without realizing it.

So let's start with Solr setup. For the purposes of this video I am going to show you how to set up VuFind auto start using systemd. Systemd is a set of tools shared across Linux distributions which are focused on system and service management. This is a comparatively new development in Linux land, which is to say it's been around for several years, but if you've been in Linux a long time you may be more familiar with the earlier system that used symbolic links to a directory called init.d. Systemd gets rid of obscure Bash scripts and replaces them with configuration files. And while it took a little getting used to, I've come to really like it, and so I'm going to go through it in some detail in this video to help you understand how it works and what it's doing for you.

But first let's talk about Solr security quickly. So when you install Solr you're creating a web service that can have data sent to it and that VuFind communicates with to do searching. I hope it goes without saying, but it might not, that you do not want to expose your Solr index to the whole world because people can do malicious things to it. So you should always have Solr behind a firewall so that VuFind can talk to it but nobody who doesn't need to has access. Additionally it's a really good idea to create a user account dedicated to running Solr and give ownership of the Solr directories to that account so that if somebody does somehow get to your Solr web interface and exploit a bug that allows them to do something malicious, their ability to do harm is somewhat constrained by file ownerships and so forth. So what we are going to do in this video is create a Solr user, give ownership of the Solr directories to that Solr user, and then set up systemd so that the Solr user starts up Solr when the server boots.

So first of all, we'll just do the bare minimum to create a user. We will say “sudo adduser solr”, which creates a user named solr, and we're going to set the disabled password switch because we don't need a password for this account; we're not going to be logging in as it. And we're just going to accept all the defaults because this is a demo. So now that we have a user called solr, we can give it ownership of our Solr directories: “sudo chown -R”, for recursive, “solr:solr”, to give both user and group ownership of the directory in question, and we're going to say “$VUFIND_HOME/solr”. And now if I do an “ls -l” of the VuFind home Solr directory, I see that it is owned by solr and in the solr group.

Now we are all set to create a systemd service to boot up Solr with. There is a directory called /etc/systemd/system, which is where service definitions live. So I'm going to use my nano editor: “sudo nano /etc/systemd/system/vufind.service”. Every service definition needs to end in .service to tell systemd that it is a service. So I'm creating a blank file, and I'm going to type in a whole bunch of parameters here to explain how the service is supposed to work.
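
The account setup described above can be sketched as a small script. It assumes a Debian or Ubuntu style “adduser” (where the disabled password switch is spelled “--disabled-password”) and falls back to /usr/local/vufind if VUFIND_HOME is not set. The “run” helper and the DRY_RUN variable are hypothetical conveniences for this sketch: by default the commands are only printed, so they can be reviewed before anything is changed.

```shell
# Fall back to the install path used in the video if VUFIND_HOME is unset.
VUFIND_HOME="${VUFIND_HOME:-/usr/local/vufind}"

# Hypothetical helper: print the command in dry-run mode (the default),
# or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a dedicated account with no password (it is never logged into).
run sudo adduser --disabled-password solr

# Give the solr user and group ownership of the Solr directories.
run sudo chown -R solr:solr "$VUFIND_HOME/solr"

# Confirm the new ownership.
run ls -l "$VUFIND_HOME/solr"
```

Running the script with DRY_RUN=0 executes the commands for real and requires sudo privileges.
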
I'm going to start with “After=network.target”. The After command allows us to essentially define dependencies, so you can make services that wait for other services to start before they start, and so forth. But in this instance we're just going to use network.target, which is a predefined setting in systemd that means: wait for there to be networking online before you do anything. Solr doesn't do much good without network access.

Then I'm going to create a Service section, where most of the main settings will live. I'm going to say “Type=forking”. This is used when you run a script that exits quickly but spawns a long-lived child process, which is an accurate description of the solr.sh script that VuFind uses to start up Solr: the script returns, but it forks a process that lives until we stop it.

Next I'm going to say “ExecStart=/bin/sh -l -c” and then, in single quotes, “/usr/local/vufind/solr.sh start”, end single quote, “-x”. As you can probably guess from the setting name, ExecStart is where you tell systemd what command to use to start the service. You'll notice that I use full paths to everything, because we don't want to make any assumptions about the environment that's set up when systemd is running things. So we're just saying: use the standard shell. The -l switch makes the shell act as if a user has logged into it when running a command, which sets up the environment correctly, so this gives us access to VUFIND_HOME, VUFIND_LOCAL_DIR, etc. Then the -c is just telling the shell what command to run, so in that quoted string we're running the Solr script to start Solr. And finally the -x just provides extra detailed output from the shell, which can be useful for error logging and troubleshooting.

Next I'm going to say “PIDFile=/usr/local/vufind/solr/vendor/bin/solr-8080.pid”. This tells systemd where the file containing the process ID of the running Solr process lives. This is something that's set up by the solr.sh script when we start things up, and it's useful for knowing whether or not the service is running and also for understanding how to stop the process.

Next I'm going to say “User=solr”. This is where we specify that the solr user we created earlier will be used to run the Solr process.

Next, “ExecStop=/bin/sh -l -c”, then “/usr/local/vufind/solr.sh stop”, all in single quotes, and “-x”. As you can see, this exactly mirrors the ExecStart command, except this is the command used to stop Solr instead of to start it up.

We add “SuccessExitStatus=0”. This tells systemd that the ExecStart command will return a value of zero when it succeeds, so if solr.sh comes back with some other exit status, something is wrong and the process will throw an error.

Finally, “LimitNOFILE=65000” and “LimitNPROC=65000”. You may have noticed that recently, when you start up solr.sh from the command line, it throws warnings about file and process limit settings. If you don't allow Solr to have lots of files open, it can potentially cause performance issues, and so it's recommended that you set these limits to these values when running Solr in production. Using systemd provides a really convenient way to set those settings and then not have to think about them anymore.

Finally, we create an Install section containing “WantedBy=multi-user.target”. This section tells systemd what circumstances it should start this process under when the process is enabled, and multi-user.target means that the server is running and accepting logins but isn't necessarily presenting a graphical interface, so it's kind of a low threshold for the system being up and running in a normal mode. If you used the older Linux startup system with init.d, there were things called run levels. Those don't exist anymore; instead there are these targets. But multi-user.target is a safe and appropriate option for this use case.

So I've now saved this file. I'm going to exit out of here and hope I didn't make any typos. Just to show you, I'm going to open up VuFind in a web browser and try to do a search, and it fails because Solr is not running. But now that I've defined a service, I can start Solr up using the standard systemctl command, which systemd uses to start and stop services. So I just say “sudo systemctl start vufind”, because I named my file vufind.service, and I wait a moment while it spins up. And now I'm back at the prompt, so it appears to have succeeded. Let's refresh our browser, and now we have search results. It worked. If I wanted to stop or restart the service, I could do “sudo systemctl stop vufind”, which stops it, and if it were running I could just substitute restart for stop to stop and then start it.

But what we are really concerned about here is ensuring that Solr starts every time our server boots up, so that we don't have to remember to start it by hand, and if something happens in the middle of the night it just recovers on its own. That's easily done: we just say “sudo systemctl enable vufind”, and now the system has enabled the service. Based on that WantedBy setting we put in the service file, it knows that when it's enabled it needs to start up whenever the system is running in multi-user mode and accepting connections. So let's prove that this works and reboot the server. All right, so now I'm going to log in, and if everything worked I should be able to open up a web browser, access my VuFind instance, and do searches without having to manually start anything. So here's VuFind; do a blank search; results. We are successful.

Now that we've covered getting Solr to start automatically, let's also talk a little bit about cleaning up, because VuFind potentially has a lot of users accessing it, and some of the activity that users perform creates traces that can over time accumulate into quite a bit of data.

So first of all, search history. Every time anybody does a search, it creates a row in a table called search in your database, whether that's MySQL or whatever database platform you're using. The reason for the search table is that it allows us to maintain a search history, so if I go here and look, I can see that I did a blank search. Every user has a search history maintained in the search table. Also, there's the Save button here, so users can potentially save their searches so that they can refer back to them in future. Also, recent releases of VuFind have a notification feature that, when enabled, lets people subscribe to searches and get emails indicating what new results have shown up in those sets. So it's useful to have this database table, but of course the vast majority of searches that get entered in the database are just forgotten about and never referred to again, and if you have people doing thousands or millions of searches in your system, this database table can get really big.

Fortunately, VuFind has a command-line utility called expire_searches which will clean out the table. Just to demonstrate: if I cd into my VuFind home directory and run “php public/index.php util expire_searches”, in this example it deleted 70 old searches from all of the searching I've done in past videos. And you can see that if I managed to create 70 searches just in the process of recording these videos, you can end up with a lot of these things if you have search engines crawling you and/or a large user base. So I strongly encourage you to set up a cron job that regularly runs this expire_searches task; otherwise you can find that your MySQL database gets incredibly enormous. You'll also learn, if you find yourself in that situation, that while MySQL can grow, it can never shrink: once a MySQL database gets really, really big, even if you clear data out of it, it doesn't get smaller again; it just reuses the already claimed space. If you really need to reclaim disk space from an out-of-control MySQL database, the best thing to do is to dump the whole database with mysqldump, then drop the database and re-import it, which will clean up all the disk space and make a nice, new, small, optimized file for you. So, just a heads up.

This might also be a good time to point out that VuFind has a whole bunch of command-line tools for you, and if you just run the public/index.php script from the command line, you will get a list of all of them. There are a few different kinds of expiration tasks and all sorts of odds and ends, some of which we will go into in more detail in future videos. But just be aware these exist; they might come in handy.

So, getting back to the subject of cleaning up after ourselves, there's one other thing that can potentially take up a lot of space, and that is user sessions. The way that PHP, and really any web-based system, allows users to have a persistent state within the system, such as being logged in or tracking a partially completed workflow, is to store some data on the server called a session. PHP sends a session cookie to the user, which gives them a unique identifier that's tied to a session file on the server, and every time the user comes in with that cookie, PHP loads that session data and can then use it to see who is being interacted with and what they're currently in the process of doing. VuFind doesn't use the session too heavily most of the time, but there are certainly places where it's important, such as enabling you to log in and stay logged in, or tracking what page to redirect you to after you've completed a login process.

VuFind has a configuration setting that controls how user session data is stored, because there are actually several options. The default is to use PHP's built-in disk-based session handling, where it just sticks files in a directory, but you can also set it up to use a database table or to use different kinds of memory-based stores like Redis or Memcached. Depending on what option you choose, you may have different maintenance issues to deal with. With the default disk-based sessions, normally PHP should clean up after itself and you shouldn't have to worry about it, but I have experienced situations where things have not quite gone as planned and session files have accumulated faster than desired. For example, if you have such a heavy load that there are too many files in the session directory for PHP to handle, it might stop cleaning up after itself. So this is something you may want to monitor on your server, perhaps with a cron job that cleans out files past a certain age in the directory used for holding sessions. If you use the database-based session storage, there's an expire_sessions command-line utility, which you can see listed right here, which cleans up the table in the database.

And just to show you where these settings live: if you look in your config.ini file, here in local/config/vufind/config.ini, there is a section in the file called session, which I'm going to search for. As you can see, you can set the type, which here defaults to file, but other options include memcache and database. You can set the lifetime of the session, which defaults to an hour, so in theory these things should be cleaned up after an hour if a user stops being active. You can encrypt the session data if you're worried about anything sensitive in there. And then there are a number of settings that are specific to different session handlers; for example, if you're using files, you can specify a non-default save path for the directory where the sessions live; if you're using memcache, you can specify how to connect to the Memcached server; etc.

So that's all I have for today. I hope that's helpful. There are certainly other issues to think about when administering a server, and there are some wiki pages that talk about this in more detail, but if you can get Solr to start and you can avoid filling up your disk, you are well on your way to having a happy and healthy VuFind server. More next month.
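
The two cleanup tasks recommended above (expiring old searches and clearing out stale session files) could be sketched as shell functions for a nightly cron script. Everything beyond the “php public/index.php util expire_searches” command is an assumption: the session directory is a common Debian/Ubuntu default, the one-day age threshold is arbitrary, the “sess_” prefix is the naming used by PHP's default file handler, and the script path in the sample cron line is hypothetical. The find options used are GNU/BSD extensions, which are available on typical Linux servers.

```shell
# Assumed locations and limits; adjust for the actual server.
VUFIND_HOME="${VUFIND_HOME:-/usr/local/vufind}"
SESSION_DIR="${SESSION_DIR:-/var/lib/php/sessions}"
MAX_AGE_MINUTES="${MAX_AGE_MINUTES:-1440}"

# Delete PHP session files in a directory that were last modified more
# than the given number of minutes ago.
clean_old_sessions() {
    dir="$1"
    minutes="$2"
    # Nothing to do if the directory does not exist.
    [ -d "$dir" ] || return 0
    find "$dir" -maxdepth 1 -type f -name 'sess_*' -mmin "+$minutes" -delete
}

# Clear old rows from the search table with VuFind's command-line utility.
expire_searches() {
    ( cd "$VUFIND_HOME" && php public/index.php util expire_searches )
}

# A cleanup script would call both functions:
#   expire_searches
#   clean_old_sessions "$SESSION_DIR" "$MAX_AGE_MINUTES"
#
# Sample crontab line (hypothetical script path), nightly at 03:15:
#   15 3 * * * /usr/local/bin/vufind-cleanup.sh
```

Cron jobs usually run with a minimal environment, so variables such as VUFIND_HOME and VUFIND_LOCAL_DIR may need to be set explicitly in the script or the crontab. For database-based session storage, the expire_sessions utility mentioned in the transcript would take the place of the file cleanup.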

videos/administering_a_vufind_server.1639785917.txt.gz · Last modified: 2021/12/18 00:05 by akilsdonk