/sbin/init

Phew, it’s been a long time. The system doesn’t even start with /sbin/init anymore; it has been replaced by systemd. Maybe we are all obsolete and replaced.

Introduction

In the old days, PID 1 (not zero) belonged to /sbin/init, which was basically responsible for bringing the system up for users and starting the system daemons. Nowadays, you get something similar to the following:

$ ps auxw|head -n 2
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  1.0  0.0 166468 12368 ?        Ss   19:24   0:31 /run/current-system/systemd/lib/systemd/systemd

Yes, most of the System V components are now replaced by System D. Why the name? My guess is, V is Roman numeral 5 so the upgrade can only be 500.

Inittab & Runlevels

The rc.d system was a neat one that employed shell scripts to load and guard system services such as the old xinetd daemon. /sbin/init basically executes a series of operations defined to reach something called a runlevel. The runlevels are defined in the /etc/inittab file; here is a sample inittab file.

#
# inittab This file describes how the INIT process should set up
# the system in a certain run-level.
#
# Author: Miquel van Smoorenburg, miquels@drinkel.nl.mugnet.org
# Modified for RHS Linux by Marc Ewing and Donnie Barnes
#
# Default runlevel. The runlevels used by RHS are:
# 0 - halt (Do NOT set initdefault to this)
# 1 - Single user mode
# 2 - Multiuser, without NFS (The same as 3, if you do not have networking)
# 3 - Full multiuser mode
# 4 - unused
# 5 - X11
# 6 - reboot (Do NOT set initdefault to this)
#

id:3:initdefault:

# System initialization.
si::sysinit:/etc/rc.d/rc.sysinit
l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6

# Trap CTRL-ALT-DELETE
#ca::ctrlaltdel:/sbin/shutdown -t3 -r now
# When our UPS tells us power has failed, assume we have a few minutes
# of power left. Schedule a shutdown for 2 minutes from now.
# This does, of course, assume you have powerd installed and your
# UPS connected and working correctly.

pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"
# If power was restored before the shutdown kicked in, cancel it.
pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Cancelled"

# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6

# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon

As you might observe, there are 7 runlevels defined in the file. Levels 0 and 6 are defined for halt and reboot. If you run init 0 or init 6, then the system will halt or reboot itself, respectively. Level 4 is unused. So we are left with 1, 2, 3 and 5. Level 1 is a single-user mode where the system boots into a shell with the root filesystem mounted. Levels 2 and 3 are almost the same; the only difference is that level 3 has NFS partitions/directories exported. Level 5 adds X11 on top of that.

In addition to system services, mingetty is spawned on consoles 1 to 6 to enable console logins. Let’s see how they are defined. The first column is the entry’s id (here it simply matches the console number), the second lists the runlevels in which the entry should be spawned, the third is for options, and the last is the program with its arguments.

# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6
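
The runlevel column is effectively a set of characters, so deciding whether an entry applies comes down to a membership test. Here is a minimal sketch of that idea, not sysvinit’s actual code: the helper name entry_applies is made up for this post, and treating an empty field as "every runlevel" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: does an inittab entry with this runlevel field
 * apply to the given runlevel?
 */
bool entry_applies(const char *rlevel, char runlevel)
{
    assert(rlevel != NULL);
    if (runlevel == '\0')
        return false;   /* strchr() would otherwise match the terminator */
    if (rlevel[0] == '\0')
        return true;    /* assumption: an empty field matches everything */
    return strchr(rlevel, runlevel) != NULL;
}
```

With the sample file, entry_applies("2345", '3') holds for the mingetty lines, while the xdm entry ("5") only matches runlevel 5.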

The package responsible for init is called sysvinit. I have found a copy of the repo on GitHub. The procedure responsible for parsing the inittab file starts on line 1276 and has the type signature static void read_inittab(void).

/*
 *	Read the inittab file.
 */
static
void read_inittab(void)
{
  FILE		*fp;			/* The INITTAB file */
  CHILD		*ch, *old, *i;		/* Pointers to CHILD structure */
  CHILD		*head = NULL;		/* Head of linked list */

On line 1310 the file is opened.

  /*
   *	Open INITTAB and read line by line.
   */
  if ((fp = fopen(INITTAB, "r")) == NULL)
	initlog(L_VB, "No inittab file found");

Later on, fields separated by : are decoded on line 1340.

	/*
	 *	Decode the fields
	 */
	id =      strsep(&p, ":");
	rlevel =  strsep(&p, ":");
	action =  strsep(&p, ":");
	process = strsep(&p, "\n");
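
To see the split in isolation, here is a small sketch that cuts one inittab line into its four fields in place. It is an illustration rather than the sysvinit code: struct inittab_entry, next_field and parse_inittab_line are hypothetical names, and next_field is a hand-rolled stand-in for strsep() with a single delimiter so that the example only needs standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the four colon-separated fields. */
struct inittab_entry {
    char *id;
    char *rlevel;
    char *action;
    char *process;
};

/*
 * Cut the string at the first `delim`, return the token and advance the
 * cursor past it. Returns NULL once the input is exhausted.
 */
static char *next_field(char **cursor, char delim)
{
    char *start = *cursor;
    if (start == NULL)
        return NULL;
    char *end = strchr(start, delim);
    if (end != NULL) {
        *end = '\0';
        *cursor = end + 1;
    } else {
        *cursor = NULL;     /* last token: nothing left after it */
    }
    return start;
}

/* Split `line` in place. Returns 0 on success, -1 if a field is missing. */
int parse_inittab_line(char *line, struct inittab_entry *out)
{
    assert(line != NULL && out != NULL);
    char *p = line;
    out->id      = next_field(&p, ':');
    out->rlevel  = next_field(&p, ':');
    out->action  = next_field(&p, ':');
    out->process = next_field(&p, '\n');
    return (out->id && out->rlevel && out->action && out->process) ? 0 : -1;
}
```

Feeding it "1:2345:respawn:/sbin/mingetty tty1\n" yields the id "1", the runlevels "2345", the action "respawn" and the process "/sbin/mingetty tty1". The buffer is modified, so it must be writable.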

The options column I have mentioned before is actually the action field. So, what might an action be?

        /*  
	 *	Decode the "action" field
	 */
	actionNo = -1;
	for(f = 0; actions[f].name; f++)
		if (strcasecmp(action, actions[f].name) == 0) {
			actionNo = actions[f].act;
			break;
		}
	if (actionNo == -1) {
		initlog(L_VB, "%s[%d]: %s: unknown action field",
			INITTAB, lineNo, action);
		continue;
	}

Actions

The actions structure is defined in the same file.

/* ascii values for the `action' field. */
struct actions {
  char *name;
  int act;
} actions[] = {
  { "respawn", 	   RESPAWN	},
  { "wait",	   WAIT		},
  { "once",	   ONCE		},
  { "boot",	   BOOT		},
  { "bootwait",	   BOOTWAIT	},
  { "powerfail",   POWERFAIL	},
  { "powerfailnow",POWERFAILNOW },
  { "powerwait",   POWERWAIT	},
  { "powerokwait", POWEROKWAIT	},
  { "ctrlaltdel",  CTRLALTDEL	},
  { "off",	   OFF		},
  { "ondemand",	   ONDEMAND	},
  { "initdefault", INITDEFAULT	},
  { "sysinit",	   SYSINIT	},
  { "kbrequest",   KBREQUEST    },
  { NULL,	   0		},
};

Similarly, constants are defined in init.h.

/* Actions to be taken by init */
#define RESPAWN			1
#define WAIT			2
#define ONCE			3
#define	BOOT			4
#define BOOTWAIT		5
#define POWERFAIL		6
#define POWERWAIT		7
#define POWEROKWAIT		8
#define CTRLALTDEL		9
#define OFF		       10
#define	ONDEMAND	       11
#define	INITDEFAULT	       12
#define SYSINIT		       13
#define POWERFAILNOW           14
#define KBREQUEST              15
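
Put together, the table and the constants give a simple name-to-number lookup. The following is a trimmed, standalone sketch of that pattern with only three entries; the DEMO_ prefixed names and lookup_action are made up here so they do not pretend to be the sysvinit symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>    /* strcasecmp() (POSIX) */

/* Hypothetical subset of the action constants. */
enum {
    DEMO_UNKNOWN = -1,
    DEMO_RESPAWN = 1,
    DEMO_WAIT    = 2,
    DEMO_ONCE    = 3
};

static const struct {
    const char *name;
    int act;
} demo_actions[] = {
    { "respawn", DEMO_RESPAWN },
    { "wait",    DEMO_WAIT    },
    { "once",    DEMO_ONCE    },
    { NULL,      0            },    /* sentinel that ends the scan */
};

/* Case-insensitive lookup; returns DEMO_UNKNOWN for unlisted names. */
int lookup_action(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; demo_actions[i].name != NULL; i++) {
        if (strcasecmp(name, demo_actions[i].name) == 0)
            return demo_actions[i].act;
    }
    return DEMO_UNKNOWN;
}
```

Because the comparison ignores case, "WAIT" and "wait" resolve to the same constant, and anything not in the table ends up as the "unknown action field" case.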

In the inittab(4) manual page, I’ve found descriptions for the actions.

  1. respawn

If the process does not exist, then start the process; do not wait for its termination (continue scanning the inittab file), and when the process dies, restart the process. If the process currently exists, do nothing and continue scanning the inittab file.

So this is the natural choice for mingetty, since we want the ttys to be respawned when the user logs out with quit or exit.

  1. wait

When init enters the run level that matches the entry’s rstate, start the process and wait for its termination. All subsequent reads of the inittab file while init is in the same run level cause init to ignore this entry.

rstate is basically runlevel in this entry.

  1. once

When init enters a run level that matches the entry’s rstate, start the process, do not wait for its termination. When it dies, do not restart the process. If init enters a new run level and the process is still running from a previous run level change, the program is not restarted.

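The key difference between wait and once is blocking: with wait, init starts the process and waits for it to terminate, whereas with once it starts the process and moves on. In neither case is the process restarted when it dies.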

  1. boot

The entry is to be processed only at init’s boot-time read of the inittab file. init is to start the process and not wait for its termination; when it dies, it does not restart the process. In order for this instruction to be meaningful, the rstate should be the default or it must match init’s run level at boot time. This action is useful for an initialization function following a hardware reboot of the system.

  1. bootwait

The entry is to be processed the first time init goes from single-user to multi-user state after the system is booted. (If initdefault is set to 2, the process runs right after the boot.) init starts the process, waits for its termination and, when it dies, does not restart the process.

These two are used to spawn processes during boot.

  1. powerfail

Execute the process associated with this entry only when init receives a power fail signal, SIGPWR (see signal(3C)).

  1. powerwait

Execute the process associated with this entry only when init receives a power fail signal, SIGPWR, and wait until it terminates before continuing any processing of inittab.

These two are really convenient when a UPS is connected over a serial line: a monitoring daemon can tell init to shut down with just kill -SIGPWR 1, where 1 is the PID of init as usual. On x86, signal(7) lists SIGPWR as number 30.

  1. off

If the process associated with this entry is currently running, send the warning signal SIGTERM and wait 5 seconds before forcibly terminating the process with the kill signal SIGKILL. If the process is nonexistent, ignore the entry.

I have never used this one, but it seems that if we don’t want a process to run at a given runlevel, we just put an off action in inittab so that init gets rid of it.

  1. ondemand

This instruction is really a synonym for the respawn action. It is functionally identical to respawn but is given a different keyword in order to divorce its association with run levels. This instruction is used only with the a, b or c values described in the rstate field.

This is really interesting and needs attention. Never read about this before either.

  1. initdefault

An entry with this action is scanned only when init is initially invoked. init uses this entry to determine which run level to enter initially. It does this by taking the highest run level specified in the rstate field and using that as its initial state. If the rstate field is empty, this is interpreted as 0123456 and init will enter run level 6. This will cause the system to loop (it will go to firmware and reboot continuously). Additionally, if init does not find an initdefault entry in inittab, it requests an initial run level from the user at reboot time.

initdefault basically defines the default runlevel as in the following:

id:3:initdefault:

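I guess the interesting cases are the odd ones. According to the manual, if initdefault is not defined at all, init asks for the runlevel at boot. If the entry exists but its rstate field is empty, the system enters run level 6 and reboots forever. Might be a nice joke for a close friend I guess :) Let me know what happens if you comment out the line above.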

  1. sysinit

Entries of this type are executed before init tries to access the console (that is, before the Console Login: prompt). It is expected that this entry will be used only to initialize devices that init might try to ask the run level question. These entries are executed and init waits for their completion before continuing.

This pretty much does all the heavy lifting, like starting devfs, mounting /proc, unmounting initrd, setting the hostname and the system clock, initializing LVM, remounting the root fs, etc.
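
Coming back to initdefault for a moment: the manual’s rule (take the highest run level listed, and read an empty field as 0123456) is easy to restate in code, which also shows why an empty field lands in run level 6. This is a sketch of the documented rule only; pick_initial_runlevel is a hypothetical name and the code is not taken from any init implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the highest run level digit ('0'..'6') found in `rstate`.
 * An empty field is read as "0123456", as the manual describes.
 * Returns '\0' if the field contains no run level digit at all.
 */
char pick_initial_runlevel(const char *rstate)
{
    assert(rstate != NULL);
    if (rstate[0] == '\0')
        rstate = "0123456";

    char highest = '\0';
    for (const char *p = rstate; *p != '\0'; p++) {
        if (*p >= '0' && *p <= '6' && *p > highest)
            highest = *p;
    }
    return highest;
}
```

For id:3:initdefault: the field is "3" and the result is '3'. With an empty field the result is '6', which is the reboot level, hence the endless loop.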

So, what about ondemand? On line 1264, it can be observed that it is not really different from RESPAWN.

    case BOOT:
    case POWERFAIL:
    case ONCE:
	     if (ch->flags & XECUTED) break;
    case ONDEMAND:
    case RESPAWN:
	     ch->flags |= RUNNING;
	     (void)spawn(ch, &(ch->pid));
	     break;

Again, in spawn(), on line 992, these are treated as the same.

  ch->flags |= XECUTED;

  if (ch->action == RESPAWN || ch->action == ONDEMAND) {
	/* Is the date stamp from less than 2 minutes ago? */
	time(&t);

So we can safely assume they are the same. Let me know if there is a difference :)
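
The fallthrough in the switch quoted above can be restated as a small predicate: BOOT, POWERFAIL and ONCE entries are spawned only while their XECUTED flag is clear, whereas ONDEMAND and RESPAWN entries are spawned regardless. A sketch under stated assumptions: the enum, the flag value and should_spawn are made up for illustration and only cover the five cases shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the action constants and the XECUTED flag. */
enum demo_action {
    DEMO_RESPAWN,
    DEMO_ONDEMAND,
    DEMO_ONCE,
    DEMO_BOOT,
    DEMO_POWERFAIL
};
#define DEMO_XECUTED 0x1u

/* Would the quoted switch reach the spawn() call for this entry? */
bool should_spawn(enum demo_action action, unsigned flags)
{
    switch (action) {
    case DEMO_BOOT:
    case DEMO_POWERFAIL:
    case DEMO_ONCE:
        /* one-shot entries: skipped once they have been executed */
        return (flags & DEMO_XECUTED) == 0;
    case DEMO_ONDEMAND:
    case DEMO_RESPAWN:
        /* no XECUTED check on this path */
        return true;
    }
    assert(0 && "action not covered by this sketch");
    return false;
}
```

In this predicate ONDEMAND and RESPAWN are indistinguishable, which matches the observation that the two are handled identically.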

“got the power?”

Let’s learn something interesting, shall we? How would a UPS daemon signal init about a power outage? To find out, let’s look into the apcupsd svn trunk. In examples/hid-ups.c, on line 265, void powerfail(int state) is defined as follows:

/* Tell init the power has either gone or is back. */
void powerfail(int state) {
#ifndef TESTING
    int fd;
    
    /* Create an info file needed by init to shutdown/cancel shutdown */
    unlink(PWRSTAT);
    if ((fd = open(PWRSTAT, O_CREAT|O_WRONLY, 0644)) >= 0) {
	if (state > 0)
            write(fd, "FAIL\n", 5);
	else if (state < 0)
            write(fd, "LOW\n", 4);
	else
            write(fd, "OK\n", 3);
	close(fd);
    }
    kill(1, SIGPWR);
#else
    printf("We are in powerfail() with state=%d ", state);
    if (state > 0)
        printf("POWER FAILURE\n");
    else if (state < 0)
        printf("BATTERY LOW\n");
    else
        printf("OK\n");
#endif
    
}

Basically, the function opens a file called PWRSTAT, writes current status and signals PID 1 with SIGPWR. The following on line 52 defines the target file:

#define PWRSTAT "/etc/powerstatus"

On the other side, in static void process_signals() in init.c,

static
void process_signals()
{
  CHILD		*ch;
  int		pwrstat;
  int		oldlevel;
  int		fd;
  char		c;

  if (ISMEMBER(got_signals, SIGPWR)) {
	INITDBG(L_VB, "got SIGPWR");
	/* See _what_ kind of SIGPWR this is. */
	pwrstat = 0;
	if ((fd = open(PWRSTAT, O_RDONLY)) >= 0) {
		if (read(fd, &c, 1) != 1)
			c = 0;
		pwrstat = c;
		close(fd);
		unlink(PWRSTAT);
	} else if ((fd = open(PWRSTAT_OLD, O_RDONLY)) >= 0) {
		/* Path changed 2010-03-20.  Look for the old path for a while. */
		initlog(L_VB, "warning: found obsolete path %s, use %s instead",
			PWRSTAT_OLD, PWRSTAT);
		if (read(fd, &c, 1) != 1)
			c = 0;
		pwrstat = c;
		close(fd);
		unlink(PWRSTAT_OLD);
        }
	do_power_fail(pwrstat);
	DELSET(got_signals, SIGPWR);
  }

PWRSTAT and PWRSTAT_OLD are defined in src/paths.h as follows:

#define PWRSTAT_OLD	"/etc/powerstatus"	/* COMPAT: SIGPWR reason (OK/BAD) */
#define PWRSTAT		"/var/run/powerstatus"	/* COMPAT: SIGPWR reason (OK/BAD) */

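Therefore pwrstat is basically the first char of the power status file, one of ‘F’, ‘L’ or ‘O’ (or 0 if no file could be read). Note that the apcupsd example above still writes to /etc/powerstatus, which init only reads as a fallback, with a warning, when /var/run/powerstatus cannot be opened. The value is handled in static void do_power_fail(int pwrstat) on line 1944, which does not switch runlevels itself; it clears the XECUTED flag on the matching inittab entries so that they become eligible to run again.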

static
void do_power_fail(int pwrstat)
{
	CHILD *ch;
	/*
	 *	Tell powerwait & powerfail entries to start up
	 */
	for (ch = family; ch; ch = ch->next) {
		if (pwrstat == 'O') {
			/*
		 	 *	The power is OK again.
		 	 */
			if (ch->action == POWEROKWAIT)
				ch->flags &= ~XECUTED;
		} else if (pwrstat == 'L') {
			/*
			 *	Low battery, shut down now.
			 */
			if (ch->action == POWERFAILNOW)
				ch->flags &= ~XECUTED;
		} else {
			/*
			 *	Power is failing, shutdown imminent
			 */
			if (ch->action == POWERFAIL||ch->action == POWERWAIT)
				ch->flags &= ~XECUTED;
		}
	}
}
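
One detail worth spelling out is the final else branch: anything other than ‘O’ or ‘L’ counts as a power failure, including the 0 that init ends up with when no status file can be read. The following sketch isolates that mapping. The enum and both function names are hypothetical, and unlike init this version does not unlink the file after reading it.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical names for the three outcomes handled by do_power_fail(). */
enum power_event { POWER_OK, POWER_LOW, POWER_FAIL };

/* Map the first byte of the status file to an event. */
enum power_event classify_power_status(int first_byte)
{
    switch (first_byte) {
    case 'O': return POWER_OK;      /* "OK": power is back */
    case 'L': return POWER_LOW;     /* "LOW": battery almost empty */
    default:  return POWER_FAIL;    /* "FAIL", garbage, or nothing at all */
    }
}

/* Read the first byte of `path`; 0 if the file is missing or empty. */
int read_power_status(const char *path)
{
    assert(path != NULL);
    char c = 0;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    if (read(fd, &c, 1) != 1)
        c = 0;
    close(fd);
    return c;
}
```

So classify_power_status(read_power_status(path)) reports a power failure for a path that does not exist, which means a bare SIGPWR without any status file is treated as a failing power supply.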

This seems pretty adequate. One thing to notice: if you have write access to either of the files above and are allowed to signal PID 1, you can shut down any server running a UPS daemon by writing “F” to the file and running kill -SIGPWR 1, and the daemon will not notice.

Conclusion

Unfortunately, System V init has been replaced by System D. It provides more functionality and gives more control over the system and its services. System logs are also handled by the mighty journald. The part I like the most is coredumpctl, where I can list all the crashes on the system, see stack traces and even run gdb on the core dumps.

Until next time!