Set up a Server

From DISI
Revision as of 09:21, 22 September 2020 by Khtang (Talk | contribs)


This page describes how to install CentOS 7 and how to set up and troubleshoot Puppet.

Installing CentOS 7

Getting a Bootable USB stick

Borrow one from the sysadmin, or make your own (4.4 GB or more of storage) by following the instructions here

Change Boot Order

1. Insert the USB stick and connect the monitor to the machine

2. Reboot the machine

3. Bring up the BIOS menu by pressing the Del key while the machine is booting

- In Boot, change the boot order so that the USB stick boots first

- Save changes and reboot

Install CentOS 7

Adapted from this guide: https://phoenixnap.com/kb/how-to-install-centos-7

Select Test this media and install CentOS 7

Step 1 : Choose Keyboard and Language

Step 2 : Network Configuration

Select NETWORK & HOSTNAME

1. Switch on the Ethernet

2. Change Host name at the bottom

3. Select Configure

Select IPv4 Settings
DNS Servers:
  [alpha private IP address]
Search domains:
  cluster.ucsf.bkslab.org, ucsf.bkslab.org, bkslab.org, compbio.ucsf.edu, ucsf.edu
Check "Require IPv4 addressing for this connection to complete".
Save.
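
Once the installed system is up, the DNS settings entered above should appear in /etc/resolv.conf. The following is a minimal sketch for checking that, assuming NetworkManager writes the settings to that file. The function name check_resolv_conf is hypothetical and not part of the original procedure.

```shell
# Hypothetical helper: confirm that a resolv.conf-style file has at least
# one nameserver entry and a search line that contains the expected domain.
check_resolv_conf() {
    file="${1:-/etc/resolv.conf}"
    domain="${2:-cluster.ucsf.bkslab.org}"

    if ! grep -q '^nameserver[[:space:]]' "$file"; then
        echo "no nameserver entry in $file"
        return 1
    fi
    if ! grep '^search[[:space:]]' "$file" | grep -qF -- "$domain"; then
        echo "search domain $domain missing in $file"
        return 1
    fi
    echo "ok"
}
```

Calling check_resolv_conf with no arguments checks /etc/resolv.conf for the first search domain listed above.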

Step 3: Set Date and Time

Turn on Network Time and Select the local timezone.

Step 4: Partitioning

Select INSTALLATION DESTINATION.

Option 1: Automatic Partitioning

Under the Other Storage Options heading, select the Automatically configure partitioning checkbox. This ensures the selected disk is automatically partitioned into / (root), /home and swap partitions, created as LVM logical volumes with the XFS file system.

If you do not have enough free space, you can reclaim disk space and instruct the system to delete files.

When finished, click the Done button.

Option 2: Manual Partitioning

Select the I will configure partitioning checkbox and choose Done.

Choose this option if you want to use other file systems (such as ext4 or vfat) or a non-LVM partitioning scheme such as btrfs. It opens a configuration pop-up where you can set up your partitioning manually.

Step 5: Software Selection

Select Compute Node on the left menu, then select Add-Ons on the right menu.

Step 6: Enable KDUMP

Double-check if KDUMP is enabled.

Step 7: Start installation Process

Hit Begin Installation

Step 8: Setup Root Password & User

During installation, you will see two items at the top:

Root Password

The usual one

User Creation

Create a local administrator account

User name : survival
Check "Make this user administrator"
Check "Require a password for this account"
Password : [Hint it starts with G and has t somewhere in the middle]

Reboot when the installation is complete.

Install Puppet and Create Puppet Certificate

Packages Installation

Log in as the root user.

  • Install the EPEL release package. EPEL is a repository of extra packages for enterprise releases.
$ sudo yum install epel-release
This gives the machine access to the public EPEL repository. A GPG key is provided to verify that the transaction is valid.
  • Update the CentOS packages
$ sudo yum update
  • Install Puppet
$ sudo yum install puppet
  • Install sssd
$ sudo yum install sssd
  • Install nss-pam-ldapd
$ sudo yum install nss-pam-ldapd
  • Install oddjob-mkhomedir
$ sudo yum install oddjob-mkhomedir
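
The package steps above can be collected into one script. This is only a sketch: the -y flag (answer yes to prompts) and the DRY_RUN switch are additions for unattended use and are not part of the original steps, and the function names are hypothetical.

```shell
# Packages named in the steps above, in the order they are installed.
PACKAGES="puppet sssd nss-pam-ldapd oddjob-mkhomedir"

# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_server_packages() {
    run yum install -y epel-release || return 1
    run yum update -y || return 1
    # $PACKAGES is intentionally unquoted so it splits into separate arguments.
    run yum install -y $PACKAGES
}
```

Setting DRY_RUN=1 before calling install_server_packages prints the three yum commands without running them, which is a cheap way to review the list first.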

Edit Puppet configuration on foreman.ucsf.bkslab.org

  1. Search for the host to check whether it already exists.
  2. Edit the Puppet settings
    1. If the machine is brand new, click 'New Host', choose 'Testing' as the Host Group, and copy the settings of the other existing desktops.
    2. In Parameters, click "Override" on "variant" and assign "cluster" as its value at the bottom.
    3. In Puppet class, Choose :
           * nfs-mounts.*
           * ssd*
           

Issue new Puppet Certificate

In a second terminal, log in as root

  • Log into alpha to create a new Puppet certificate for the new computer. First list all of the current Puppet certificates and check whether a certificate already exists for this machine
$ sudo puppet cert list -a | grep <hostname>.cluster.ucsf.bkslab.org
  • To clean out an existing certificate
$ sudo puppet cert clean <hostname>.cluster.ucsf.bkslab.org

BEFORE PROCEEDING TO THE NEXT STEP, MAKE SURE that you have 2 terminals open: one logged in as root on the new computer (client) and the other logged in as s_ on alpha (server).

1. On the client side:

$ puppet agent --test --waitforcert=10
The "puppet agent --test" command performs the initial integration of a new computer with Puppet, or reintegrates an existing one. Without this command, the machine will not have access to /mnt/nfs, /nfs/* and /nfs/soft.
"--waitforcert=10" makes the agent check every 10 seconds whether the server has signed its certificate.

2. On server (alpha) side:

Sign the certificate
$ sudo puppet cert sign <hostname>.cluster.ucsf.bkslab.org
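
Before signing, it can help to know what state the certificate is in. The sketch below classifies a host from the output of `puppet cert list -a`. It assumes the listing prefixes signed certificates with `+` and revoked ones with `-`, and leaves pending requests unprefixed; check this against the real output on alpha before relying on it. The function name cert_state is hypothetical.

```shell
# Hypothetical helper: read `puppet cert list -a` output on stdin and report
# the state of one host. The +/- prefix convention is an assumption.
cert_state() {
    fqdn="$1"
    # Certificate names are quoted in the listing, so match the quoted name.
    line=$(grep -F "\"$fqdn\"" | head -n 1)
    case "$line" in
        "")  echo "absent" ;;
        +*)  echo "signed" ;;
        -*)  echo "revoked" ;;
        *)   echo "pending" ;;
    esac
}
```

On alpha this would be used as `sudo puppet cert list -a | cert_state <hostname>.cluster.ucsf.bkslab.org`. A result of "pending" means the client request has arrived and is waiting for `puppet cert sign`.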


Testing puppet

$ id <user_name>

If this fails, run these commands and try again:

$ systemctl restart sssd
$ systemctl enable sssd

$ authconfig-tui
This opens the authconfig-tui screen. Use the space bar to change a setting.
1. Uncheck "Use Shadow Password".
2. Uncheck "Use Fingerprint reader" so that it does not raise any fingerprint error later. Click "Next" afterwards.
3. Under "LDAP Settings", make sure it says:
   [*] Use TLS
   Server: ldaps://ds.ucsf.bkslab.org/
   Base DN: dc=bkslab, dc=org
$ systemctl start oddjobd
$ systemctl enable oddjobd
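
The `id` test above can be wrapped in a small function so that it is easy to re-run after each fix. This is a sketch only, and the function name check_user is hypothetical.

```shell
# Hypothetical helper: report whether a user name resolves on this machine.
# `id` exits non-zero when the user cannot be looked up.
check_user() {
    user="$1"
    if id "$user" >/dev/null 2>&1; then
        echo "$user resolves"
    else
        echo "$user does not resolve; restart sssd and re-check authconfig-tui"
        return 1
    fi
}
```

For example, `check_user <user_name>` after `systemctl restart sssd` shows whether the restart was enough.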

GPU

Nouveau is the open-source NVIDIA driver that is enabled by default. For the proprietary NVIDIA driver to work, nouveau must be disabled.

How to check whether nouveau is loaded

$ lsmod | grep nouveau

How to disable nouveau

$ vim /etc/default/grub
Append 'rd.driver.blacklist=nouveau nouveau.modeset=0' to the end of the GRUB_CMDLINE_LINUX value, inside the quotes
$ mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
$ echo "blacklist nouveau" > /etc/modprobe.d/nouveau-blacklist.conf 
$ dracut /boot/initramfs-$(uname -r).img $(uname -r)
$ reboot
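
The manual edit of /etc/default/grub can also be scripted. The sketch below appends the two kernel arguments inside the quotes of GRUB_CMDLINE_LINUX and does nothing if they are already there. The function name add_nouveau_args is hypothetical, and the script assumes the value is on a single double-quoted line.

```shell
NOUVEAU_ARGS="rd.driver.blacklist=nouveau nouveau.modeset=0"

# Hypothetical helper: append the nouveau arguments to GRUB_CMDLINE_LINUX.
add_nouveau_args() {
    file="${1:-/etc/default/grub}"
    # Do nothing if the arguments are already present (safe to re-run).
    if grep -q 'rd.driver.blacklist=nouveau' "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Insert the arguments just before the closing quote of the value.
    if sed "s/^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"/\1 $NOUVEAU_ARGS\"/" "$file" > "$tmp"; then
        # Write back with cat so the original file keeps its permissions.
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

One assumption beyond the steps above: on CentOS 7, changes to /etc/default/grub normally take effect only after the GRUB configuration is regenerated, for example with `grub2-mkconfig -o /boot/grub2/grub.cfg` on BIOS-booted machines. After the reboot, `lsmod | grep nouveau` should print nothing.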

Troubleshooting

Puppet SSL issue

  • Datetime mismatch
http://wiki.docking.org/index.php/Troubleshooting_-_Puppet_Failed_to_generate_additional_resources_using_%27eval_generate:_SSL_connect_returned%3D1%27

These are some issues from n-5-34/5 and the proposed solutions

  • Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid tag "" on node

This error happens because Puppet uses a cached version of the node instead of creating a new one. You must clean all traces of the node on alpha before issuing a new certificate.

[root@alpha tmp]# puppet node deactivate samekh.cluster.ucsf.bkslab.org

  • To reissue the Puppet certificate for a machine:
- Revoke the Puppet certificate on alpha
 $ sudo puppet cert clean <hostname>.cluster.ucsf.bkslab.org
- Remove this directory on the machine itself (not on alpha)
 $ rm -rf /var/lib/puppet/ssl
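
Because the reissue touches both alpha and the client, it is easy to run a command on the wrong side. The sketch below only prints the sequence, combining the two steps above with the agent and signing commands from the certificate section. The function name reissue_plan is hypothetical, and it assumes the ssl directory is removed on the client, not on alpha.

```shell
DOMAIN="cluster.ucsf.bkslab.org"

# Hypothetical helper: print, in order, which command runs on which machine.
# Nothing is executed; the output is a checklist.
reissue_plan() {
    fqdn="$1.$DOMAIN"
    cat <<EOF
alpha:  sudo puppet cert clean $fqdn
client: rm -rf /var/lib/puppet/ssl
client: puppet agent --test --waitforcert=10
alpha:  sudo puppet cert sign $fqdn
EOF
}
```

For example, `reissue_plan n-5-34` prints the four steps with the full host name filled in.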

1. Network configuration (/etc/resolv.conf)

Issue 1 : DNS and nameserver are empty (Ethernet connection was not configured during installation)

What I did:

$ nmtui
- nmtui is the NetworkManager text user interface. Edit the connection by following the example from n-1-136

Issue 2: nameserver 127.0.0.1

What I did:

- Commented out all items in the [main] section of /etc/NetworkManager/NetworkManager.conf
- Changed the nameserver in /etc/resolv.conf to 10.20.1.1
$ systemctl restart NetworkManager.service
$ systemctl restart network
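
Issue 2 can be detected with a one-line check. This is a sketch under the assumption that any nameserver in 127.0.0.0/8 counts as the problem case; the function name has_loopback_nameserver is hypothetical.

```shell
# Hypothetical helper: succeed if the file lists a loopback nameserver.
has_loopback_nameserver() {
    file="${1:-/etc/resolv.conf}"
    grep -Eq '^nameserver[[:space:]]+127\.' "$file"
}
```

Running `has_loopback_nameserver && echo "fix resolv.conf"` after the NetworkManager restart shows whether the loopback entry has come back.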

2. Yum not working (http://yum/centos/7/contrib/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found)

Issue: Puppet overwrote the existing CentOS-Base.repo (CentOS 7) with a CentOS 6 CentOS-Base.repo file

What I did:

- Overwrote /etc/yum.repos.d/CentOS-Base.repo with a copy of the correct version from n-1-136
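
To spot the wrong repo file before yum fails, the sections it defines can be listed. The sketch below treats a [contrib] section as the sign of a CentOS 6 file. That is an assumption based on the 404 URL above, which points at a contrib repository, so compare against the file on n-1-136 to be sure. Both function names are hypothetical.

```shell
# Hypothetical helper: print the section names of a yum .repo file,
# one per line, without the surrounding brackets.
list_repo_sections() {
    file="${1:-/etc/yum.repos.d/CentOS-Base.repo}"
    sed -n 's/^\[\(.*\)\]$/\1/p' "$file"
}

# Succeed if the file defines a [contrib] section (assumed CentOS 6 marker).
has_contrib_repo() {
    list_repo_sections "$1" | grep -qx 'contrib'
}
```

If `has_contrib_repo` succeeds on a CentOS 7 machine, Puppet has probably replaced the file again.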

3. Machine not recognizing users

Issue 1: sssd was not installed

What I did:

$ yum install sssd
$ systemctl start sssd
$ systemctl enable sssd

Issue 2:

$ id s_khtang
uid=2006(s_khtang) gid=1000(n-5-34) groups=1000(n-5-34),1002(portal)

This means the machine mistakes the sysadmin group (GID 1000) for n-5-34

What I did:

$ vim /etc/group
Change n-5-34:x:1000:n-5-34 to sysadmin:x:1000:n-5-34
$ authconfig-tui
Uncheck 'Shadow Password'
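
The group clash in Issue 2 comes from a local entry in /etc/group that claims GID 1000. A short sketch for finding which local group name owns a given GID follows; the function name group_for_gid is hypothetical.

```shell
# Hypothetical helper: print the name of every group in a group file
# whose numeric GID (third colon-separated field) equals the one given.
group_for_gid() {
    gid="$1"
    file="${2:-/etc/group}"
    awk -F: -v gid="$gid" '$3 == gid { print $1 }' "$file"
}
```

For example, `group_for_gid 1000` printing n-5-34 instead of sysadmin indicates the same problem described above.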