videos:administering_a_vufind_server

videos:administering_a_vufind_server [2023/04/25 19:17] – [Transcript] crhallberg
videos:administering_a_vufind_server [2023/04/26 13:34] (current) – [Transcript] crhallberg
Line 1: Line 1:
-====== Video 5: Administering a VuFind Server ======
+====== Video 5: Administering a VuFind® Server ======
  
-The fifth VuFind instructional video shows you some important server administration tasks to keep your VuFind site happy and healthy: making sure that Solr auto-starts on boot-up (using systemd), and configuring cleanup of old search and session data.
+The fifth VuFind® instructional video shows you some important server administration tasks to keep your VuFind site happy and healthy: making sure that Solr auto-starts on boot-up (using systemd), and configuring cleanup of old search and session data.
  
 Video is available as an [[https://vufind.org/video/Administering_VuFind.mp4|mp4 download]] or through [[https://www.youtube.com/watch?v=KsSomqmB19g&feature=youtu.be|YouTube]].
Line 18: Line 18:
 ===== Transcript =====
  
-Welcome to the fifth VuFind™ video. This time around, we are going to talk about administering a VuFind™ server after you've built and configured it, since there are a few common tasks that it's useful to know about. We're going to cover both getting Solr set up in a secure way and making sure that it starts automatically when your server boots up. And we're going to talk about some cleanup you'll want to do to make sure that you don't accidentally fill up your disk without realizing it.
+Welcome to the fifth VuFind video. This time around, we are going to talk about administering a VuFind server after you've built and configured it, since there are a few common tasks that it's useful to know about. We're going to cover both getting Solr set up in a secure way and making sure that it starts automatically when your server boots up. And we're going to talk about some cleanup you'll want to do to make sure that you don't accidentally fill up your disk without realizing it.
  
-So let's start with Solr setup. For the purposes of this video, I am going to show you how to set up VuFind™ autostart using systemd. Systemd is a set of tools shared across Linux distributions which are focused on system and service management. This is a comparatively new development in Linux land, which is to say it's been around for several years, but if you've been in Linux a long time, you may be more familiar with the earlier system that used symbolic links to a directory called init.d. Systemd gets rid of obscure bash scripts and replaces them with configuration files, and while it took a little getting used to, I've come to really like it, and so I'm going to go through it in some detail in this video to help you understand how it works and what it's doing for you.
+So let's start with Solr setup. For the purposes of this video, I am going to show you how to set up VuFind autostart using systemd. Systemd is a set of tools shared across Linux distributions which are focused on system and service management. This is a comparatively new development in Linux land, which is to say it's been around for several years, but if you've been in Linux a long time, you may be more familiar with the earlier system that used symbolic links to a directory called init.d. Systemd gets rid of obscure bash scripts and replaces them with configuration files, and while it took a little getting used to, I've come to really like it, and so I'm going to go through it in some detail in this video to help you understand how it works and what it's doing for you.
  
-But first, let's talk about Solr security quickly. So when you install Solr, you're creating a web service that can have data sent to it and that VuFind™ communicates with to do searching. I hope it goes without saying, but it might not, that you do not want to expose your Solr index to the whole world because people can do malicious things to it. So you should always have Solr behind a firewall so that VuFind™ can talk to it, but nobody who doesn't need to has access. Additionally, it's a really good idea to create a user account dedicated to running Solr and give ownership of the Solr directories to that account, so that if somebody does somehow get to your Solr web interface and exploit a bug that allows them to do something malicious, their ability to do harm is somewhat constrained by file ownerships and so forth.
+But first, let's talk about Solr security quickly. So when you install Solr, you're creating a web service that can have data sent to it and that VuFind communicates with to do searching. I hope it goes without saying, but it might not, that you do not want to expose your Solr index to the whole world because people can do malicious things to it. So you should always have Solr behind a firewall so that VuFind can talk to it, but nobody who doesn't need to has access. Additionally, it's a really good idea to create a user account dedicated to running Solr and give ownership of the Solr directories to that account, so that if somebody does somehow get to your Solr web interface and exploit a bug that allows them to do something malicious, their ability to do harm is somewhat constrained by file ownerships and so forth.
  
 So what we are going to do in this video is create a Solr user, give ownership of the Solr directories to that Solr user, and then set up systemd so that the Solr user starts up Solr when the server boots. So first of all, we'll just do the bare minimum to create a user. We will say ''sudo adduser solr''. So this creates a user named Solr, and we're going to set the disabled password switch (''--disabled-password'') because we don't need to have a password for this account. We're not going to be logging in as it.
  
-And we're just going to accept all the defaults because this is a demo. So now that we have a user created called solr, we can give it ownership of our Solr directories, "-R" for recursive, "solr:solr" to give both user and group ownership of the directory in question, and we're going to say "$VUFIND_HOME/solr" (''sudo chown -R solr:solr $VUFIND_HOME/solr''). And now if I do an ''ls -l'' of the VuFind™ home Solr directory, I see that it is owned by Solr and in the Solr group.
+And we're just going to accept all the defaults because this is a demo. So now that we have a user created called solr, we can give it ownership of our Solr directories, "-R" for recursive, "solr:solr" to give both user and group ownership of the directory in question, and we're going to say "$VUFIND_HOME/solr" (''sudo chown -R solr:solr $VUFIND_HOME/solr''). And now if I do an ''ls -l'' of the VuFind home Solr directory, I see that it is owned by Solr and in the Solr group.
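The two steps just described (creating the account, then handing it the Solr directory) might be scripted roughly as follows. This is only a sketch: it assumes a Debian/Ubuntu-style ''adduser'' and that ''$VUFIND_HOME'' is set. The ''SUDO'' override and the ''--gecos ""'' flag (which skips the prompts that the video answers by hand) are additions for illustration, not something shown in the video.

```shell
#!/bin/sh
# Sketch only: create a password-less "solr" account and give it the Solr directory.
# SUDO is a hypothetical override (e.g. SUDO=echo) for previewing the commands.

create_solr_user() {
    # --disabled-password: no password is set, so nobody can log in with one.
    # --gecos "": assumption, skips the interactive name/phone prompts.
    ${SUDO:-sudo} adduser --disabled-password --gecos "" solr
}

give_solr_ownership() {
    # Abort with a message if VUFIND_HOME is unset, rather than chown-ing "/solr".
    solr_dir="${VUFIND_HOME:?VUFIND_HOME is not set}/solr"
    # -R recurses; solr:solr sets both the owning user and the owning group.
    ${SUDO:-sudo} chown -R solr:solr "$solr_dir"
    # Show the directory entry itself so the new owner and group are visible.
    ls -ld "$solr_dir"
}
```

Running ''SUDO=echo create_solr_user'' prints the command instead of executing it, which may be a comfortable way to check it before touching a real server.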
  
 Now we are all set to create a systemd service to boot up Solr. There is a directory called ''/etc/systemd/system'', which is where service definitions live, so I'm going to use my nano editor. So, ''sudo nano /etc/systemd/system/vufind.service''. Every service definition needs to end in ''.service'' to tell systemd that it is a service. So I'm creating a blank file and I'm going to type in a whole bunch of parameters here to explain how the service is supposed to work.
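The parameters typed into the file are not reproduced in this excerpt, so the following is only a minimal sketch of what such a unit might contain, not the file from the video. It assumes VuFind lives in ''/usr/local/vufind'' (unless ''$VUFIND_HOME'' says otherwise), that Solr is controlled through VuFind's ''solr.sh'' script, and that the service runs as the dedicated ''solr'' user. The description text and the draft file name are placeholders.

```shell
#!/bin/sh
# Sketch only: write a draft unit file for review; nothing is installed.
# Assumptions: the install path, solr.sh start/stop, and the "solr" account.
vufind_home="${VUFIND_HOME:-/usr/local/vufind}"
unit_draft="${UNIT_DRAFT:-./vufind.service}"   # hypothetical draft location

cat > "$unit_draft" <<EOF
[Unit]
Description=VuFind Solr index
After=network.target

[Service]
# solr.sh is assumed to background Solr and return, hence "forking".
Type=forking
User=solr
# "sh -l" starts a login shell so profile-defined variables are loaded.
ExecStart=/bin/sh -l -c '${vufind_home}/solr.sh start'
ExecStop=/bin/sh -l -c '${vufind_home}/solr.sh stop'

[Install]
# Start whenever the system reaches normal multi-user operation.
WantedBy=multi-user.target
EOF

echo "Draft written to $unit_draft"
```

After reviewing the draft, it could be copied into place with something like ''sudo cp ./vufind.service /etc/systemd/system/vufind.service''.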
Line 52: Line 52:
 So I've now saved this file, I'm going to exit out of here and hope I didn't make any typos.
  
-So just to show you, I'm going to open up VuFind™ in a web browser and try to do a search, and it fails because Solr is not running. But now that I've defined a service, I can start Solr up using the standard ''systemctl'' command, which systemd uses to start and stop services.
+So just to show you, I'm going to open up VuFind in a web browser and try to do a search, and it fails because Solr is not running. But now that I've defined a service, I can start Solr up using the standard ''systemctl'' command, which systemd uses to start and stop services.
  
 I just do ''sudo systemctl start vufind'' because I named my file ''vufind.service''.
Line 60: Line 60:
 What we are really concerned about here is ensuring that Solr starts every time our server boots up so that we don't have to remember to start it by hand and if something happens in the middle of the night, it just recovers on its own. So that's easily done. We just say ''sudo systemctl enable vufind''. And now the system has enabled the service. Based on that ''WantedBy'' setting we put in the service file, it knows that when it's enabled, it needs to start up whenever the system is running in multi-user mode and accepting connections.
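Putting the start and enable steps together, a small helper might look like the sketch below. The function names and the ''SUDO'' override are hypothetical, and the ''daemon-reload'' step is an addition that only matters when a unit file has been edited since systemd last read it.

```shell
#!/bin/sh
# Sketch only. SUDO is a hypothetical override (e.g. SUDO=echo) for a dry run.

start_and_enable_vufind() {
    # Re-read unit files; needed after a unit file has been edited.
    ${SUDO:-sudo} systemctl daemon-reload
    # Start Solr now ...
    ${SUDO:-sudo} systemctl start vufind
    # ... and register it so that it also starts on every boot.
    ${SUDO:-sudo} systemctl enable vufind
}

report_vufind_service() {
    # Each call prints a one-word state such as "active" or "enabled".
    systemctl is-active vufind
    systemctl is-enabled vufind
}
```

Calling ''report_vufind_service'' after a reboot is one way to confirm the result without opening a browser.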
  
-So let's prove that this works and reboot the server. All right. So now I'm going to log in, and now if everything worked, I should be able to open up a web browser and access my VuFind™ instance and do searches without having to manually start anything. So here's VuFind. Do a blank search. Results. We are successful. Now that we've covered getting Solr to start automatically, let's also talk a little bit about cleaning up, because VuFind™ potentially has a lot of users accessing it, and some of the activity that users perform creates traces that can over time accumulate into quite a bit of data. So first of all, search history. Every time anybody does a search, it creates a row in a database table called ''search'' (in MySQL or whatever database platform you're using). The reason for the search table is that it allows us to maintain a search history. So if I go here and look, I can see that I did a blank search. Every user has a search history maintained in the search table. Also, there's the save button here, so users can potentially save their searches so that they can refer back to them in future. Also, recent releases of VuFind™ have a notification feature that, when enabled, lets people subscribe to searches and get e-mails indicating what new results have shown up in those sets. So it's useful to have this database, but of course, the vast majority of searches that get entered in the database are just forgotten about and never referred to again, and if you have people doing thousands or millions of searches in your system, this database table can get really big. Fortunately, VuFind™ has a command line utility called expire_searches, which will clean out the table.
+So let's prove that this works and reboot the server. All right. So now I'm going to log in, and now if everything worked, I should be able to open up a web browser and access my VuFind instance and do searches without having to manually start anything. So here's VuFind. Do a blank search. Results. We are successful. Now that we've covered getting Solr to start automatically, let's also talk a little bit about cleaning up, because VuFind potentially has a lot of users accessing it, and some of the activity that users perform creates traces that can over time accumulate into quite a bit of data. So first of all, search history. Every time anybody does a search, it creates a row in a database table called ''search'' (in MySQL or whatever database platform you're using). The reason for the search table is that it allows us to maintain a search history. So if I go here and look, I can see that I did a blank search. Every user has a search history maintained in the search table. Also, there's the save button here, so users can potentially save their searches so that they can refer back to them in future. Also, recent releases of VuFind have a notification feature that, when enabled, lets people subscribe to searches and get e-mails indicating what new results have shown up in those sets. So it's useful to have this database, but of course, the vast majority of searches that get entered in the database are just forgotten about and never referred to again, and if you have people doing thousands or millions of searches in your system, this database table can get really big. Fortunately, VuFind has a command line utility called expire_searches, which will clean out the table.
  
-So just to demonstrate, if I cd into my VuFind™ home directory and run ''php public/index.php util expire_searches'', in this example, it deleted 70 old searches from all of the times I've done searching in past videos, and you can see that if I managed to create 70 searches just in the process of recording these videos, you can end up with a lot of these things if you have search engines crawling you and/or a large user base. So I strongly encourage you to set up a cron job that regularly runs this expire_searches task. Otherwise, you can find that your MySQL database gets incredibly enormous. You'll also learn if you find yourself in that situation that while MySQL can grow, it can never shrink. Once a MySQL database gets really, really big, even if you clear data out of it, it doesn't get smaller again. It just reuses the already claimed space. If you really need to reclaim disk space from an out-of-control MySQL database, the best thing to do is to dump the whole database with ''mysqldump'', then drop the database and reimport it, which will clean up all the disk space and make a nice, new, small, optimized file for you. So just a heads-up. This might also be a good time to point out that VuFind™ has a whole bunch of command line tools for you, and if you just run the ''public/index.php'' script from the command line, you will get a list of all of them. So there are a few different kinds of expiration tasks and all sorts of odds and ends, some of which we will go into in more detail in future videos, but just be aware these exist. They might come in handy. So getting back to the subject of cleaning up after ourselves, there's one other thing that can potentially take up a lot of space, and that is user sessions. So the way that PHP and really any web-based system allows users to have a persistent state within the system, such as being logged in or tracking a partially completed workflow, is to store some data on the server called a session. So PHP sends a session cookie to the user, which gives them a unique identifier that's tied to a session file on the server, and every time the user comes in with that cookie, PHP loads that session data and then can use it to see who is being interacted with and what they're currently in the process of doing. VuFind™ doesn't use the session too heavily most of the time, but there are certainly places where it's important, such as enabling you to log in and stay logged in, or tracking what page to redirect you to after you've completed a login process. VuFind™ has a configuration setting that controls how user session data is stored, because there are actually several options.
+So just to demonstrate, if I cd into my VuFind home directory and run ''php public/index.php util expire_searches'', in this example, it deleted 70 old searches from all of the times I've done searching in past videos, and you can see that if I managed to create 70 searches just in the process of recording these videos, you can end up with a lot of these things if you have search engines crawling you and/or a large user base. So I strongly encourage you to set up a cron job that regularly runs this expire_searches task. Otherwise, you can find that your MySQL database gets incredibly enormous. You'll also learn if you find yourself in that situation that while MySQL can grow, it can never shrink. Once a MySQL database gets really, really big, even if you clear data out of it, it doesn't get smaller again. It just reuses the already claimed space. If you really need to reclaim disk space from an out-of-control MySQL database, the best thing to do is to dump the whole database with ''mysqldump'', then drop the database and reimport it, which will clean up all the disk space and make a nice, new, small, optimized file for you. So just a heads-up. This might also be a good time to point out that VuFind has a whole bunch of command line tools for you, and if you just run the ''public/index.php'' script from the command line, you will get a list of all of them. So there are a few different kinds of expiration tasks and all sorts of odds and ends, some of which we will go into in more detail in future videos, but just be aware these exist. They might come in handy. So getting back to the subject of cleaning up after ourselves, there's one other thing that can potentially take up a lot of space, and that is user sessions. So the way that PHP and really any web-based system allows users to have a persistent state within the system, such as being logged in or tracking a partially completed workflow, is to store some data on the server called a session. So PHP sends a session cookie to the user, which gives them a unique identifier that's tied to a session file on the server, and every time the user comes in with that cookie, PHP loads that session data and then can use it to see who is being interacted with and what they're currently in the process of doing. VuFind doesn't use the session too heavily most of the time, but there are certainly places where it's important, such as enabling you to log in and stay logged in, or tracking what page to redirect you to after you've completed a login process. VuFind has a configuration setting that controls how user session data is stored, because there are actually several options.
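For the cron job recommended above for the search cleanup, a crontab line could be generated along these lines. This is a sketch under assumptions: the 03:15 schedule and the ''/usr/local/vufind'' default path are arbitrary, the function name is hypothetical, and the exact spelling of the subcommand may differ between VuFind releases, so it is worth checking it against the list printed by running ''public/index.php'' with no arguments.

```shell
#!/bin/sh
# Sketch only: print a crontab line that runs the search cleanup once a day.
# The schedule, the default path, and the subcommand spelling are assumptions.

expire_searches_cron_line() {
    vufind_home="${VUFIND_HOME:-/usr/local/vufind}"
    # Pass a different spelling as $1 if your release lists the command differently.
    subcommand="${1:-util expire_searches}"
    # Fields: minute hour day-of-month month day-of-week command (03:15 daily).
    printf '15 3 * * * cd %s && php public/index.php %s\n' "$vufind_home" "$subcommand"
}

expire_searches_cron_line
```

The printed line can be pasted into ''crontab -e'', or appended with ''( crontab -l 2>/dev/null; expire_searches_cron_line ) | crontab -''. Whichever account owns the crontab needs to be able to read VuFind's configuration.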
  
 The default is to use PHP's built-in disk-based session handling, where it just sticks files in a directory, but you can also set it up to use a database table or to use different kinds of memory-based stores like Redis or Memcached. Depending on what option you choose, you may have different maintenance issues to deal with.
Line 70: Line 70:
 If you use the database-based session storage, there's an expire_sessions command line utility, which you can see listed right here, which cleans up the session table in the database. And just to show you where these settings live, if you look in your config.ini file, here in ''local/config/vufind/config.ini'', there is a section in the file called Session, which I'm going to search for, and as you can see, you can set the type, which here defaults to file, but other options include memcache and database. You can set the lifetime of the session, which defaults to an hour, so in theory, these things should be cleaned up after an hour if a user stops being active. You can encrypt the session data if you're worried about anything sensitive in there, and then there are a number of settings that are specific to different session handlers, so for example, if you're using files, you can specify a non-default save path for the directory where the sessions live. If you're using memcache, you can specify how to connect to the memcache server, etc.
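Rather than searching in an editor, the session settings can also be pulled out of the file from the command line. The helper below is a hypothetical convenience, not a VuFind tool: it prints one section of an ini file, on the assumption that section headers sit alone on their line with no trailing whitespace.

```shell
#!/bin/sh
# Sketch only: print a single [section] of an ini file, header included.
# show_ini_section is a hypothetical helper name, not part of VuFind.

show_ini_section() {
    # $1 = path to the ini file, $2 = section name without brackets
    awk -v want="[$2]" '
        /^\[/ { in_section = ($0 == want) }   # every header switches printing on or off
        in_section { print }                  # print lines while inside the wanted section
    ' "$1"
}
```

For example, ''show_ini_section "$VUFIND_HOME/local/config/vufind/config.ini" Session'' should list the type and lifetime settings discussed above, along with the handler-specific options.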
  
-So that's all I have for today. I hope that's helpful. There are certainly other issues to think about when administering a server, and there are some wiki pages that talk about this in more detail, but if you can get Solr to start and you can avoid filling up your disk, you are well on your way to having a happy and healthy VuFind™ server. More next month.
+So that's all I have for today. I hope that's helpful. There are certainly other issues to think about when administering a server, and there are some wiki pages that talk about this in more detail, but if you can get Solr to start and you can avoid filling up your disk, you are well on your way to having a happy and healthy VuFind server. More next month.
  
 //This is an edited version of an automated transcript. Apologies for any errors.//
videos/administering_a_vufind_server.1682450270.txt.gz · Last modified: 2023/04/25 19:17 by crhallberg