
Solaris Service Management Facility - Quickstart Guide
Introduction
UNIX operating systems have traditionally included a set of
services: software programs not associated with any interactive user
login that listen for and respond to requests to perform certain tasks,
such as delivering email, responding to ftp requests, or permitting remote
command execution. These traditional services were usually individual
applications that executed as a single process that started at boot time
and executed continuously while a system was up and running, servicing any
requests that were received.
Today, administrators must contend with a collection of services that
has grown to such a point that it has exceeded the utility of this original
model. Sun has created the Service Management Facility (SMF)
to simplify management of these system services. SMF is a new feature of
the Solaris Operating System that creates a supported, unified model for
services and service management on each Solaris system. It is a core part
of the Predictive Self-Healing technology available in Solaris 10, which
provides automatic recovery from software and hardware failures as well as
administrative errors.
In this guide, we'll describe the features and benefits of SMF, point
out some parts of Solaris that have changed significantly, and show how to
accomplish typical administration tasks using SMF. A comprehensive guide
to SMF and Predictive Self-Healing is available on
Sun's BigAdmin website.
Features
The Service Management Facility has improved several aspects of the
Solaris administrative model. Some of the most notable updates are:
- Services are represented as first-class objects that can be viewed
(using the new svcs(1) command) and managed (using svcadm(1M) and
svccfg(1M)).
- Failed services are automatically restarted in dependency order,
whether they failed as the result of administrator error, software
bug, or were affected by an uncorrectable hardware error.
- More information is available about misconfigured or misbehaving
services, including an explanation of why a service isn't running
(using "svcs -x"), as well as individual, persistent log
files for each service.
- Problems during the boot process are easier to debug, as boot
verbosity can be controlled, service startup messages are logged, and
console access is provided more reliably during startup failures.
- Snapshots of service configurations are taken automatically, making it
easier to back up, restore, and undo changes to services.
- Services can be enabled and disabled using a supported tool
(svcadm(1M)), allowing the changes to persist across upgrades
and patches.
- Administrators can securely delegate tasks to non-root users more
easily, including the ability to configure, start, stop, or restart
services (as described in the smf_security(5) man page).
- Large systems boot faster by starting services in parallel according
to their dependencies.
Despite these changes, compatibility with existing administrative
practices has been preserved wherever possible. For example, most
site-local and ISV-supplied "rc" scripts will still work as usual.
Notable Changes
Most of the new features provided by SMF happen "behind the scenes" or
are accessed by new commands; however, some changes will be apparent very
quickly. Here's what some of these changes look like.
In previous versions of Solaris, a fair amount of output would be
printed to the system console during boot. Although the messages gave some
insight into what was happening, they were not very helpful in several
regards. A handful of services would print messages indicating that they
had come online, while many others didn't. Some failure modes would print
messages (such as "WARNING: Timed out waiting for NIS to come up")
that didn't help diagnose the underlying problem. Error messages would
sometimes be printed directly to the console, but they wouldn't show up in
any log.
The boot process is much quieter now.
Here's an example of what a machine looks like while booting under SMF:
SunOS Release 5.10 Version Generic 64-bit
Copyright 1983-2004 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: demobox
NIS domain name is testlab.example.com
checking ufs filesystems
demobox console login:
Although fewer messages are printed, SMF has made the boot process more
observable. Each service has a log file in the /var/svc/log
directory (or the /etc/svc/volatile directory, for services
started before the single-user milestone) indicating when and how it was
started, whether it started successfully, and any messages it may have
printed during its initialization. If a severe problem occurs during boot,
you will be able to log in on the console in maintenance mode, and you can
use the svcs(1) command to help diagnose the problem. This is
even the case for problems which would have caused boot to hang -- such as
the NIS failure mentioned above. Finally, the new "-m" boot
option (see kernel(1M)) allows you to configure the boot process
to be more verbose, printing a simple message when each service starts.
You may also notice processes "refusing to die" after being killed.
For example:
# ps -fp `pgrep -d, sendmail`
UID PID PPID C STIME TTY TIME CMD
root 330 1 0 14:21:05 ? 0:00 /usr/lib/sendmail -bd -q15m
smmsp 331 1 0 14:21:05 ? 0:00 /usr/lib/sendmail -Ac -q15m
# pkill -9 sendmail
# ps -fp `pgrep -d, sendmail`
UID PID PPID C STIME TTY TIME CMD
root 530 1 0 14:51:02 ? 0:00 /usr/lib/sendmail -bd -q15m
smmsp 531 1 0 14:51:02 ? 0:00 /usr/lib/sendmail -Ac -q15m
At first glance, it may appear that nothing happened, despite having
used pkill -9. But notice that the PIDs are different, and
the process start times have changed; the old sendmail processes did, in
fact, die.
SMF has added to the Solaris kernel the notion of a relationship between
a service, its processes, and another service (its restarter) that is
responsible for restarting it. This restart relationship is tightly
integrated with Sun's new technologies for fault management on Solaris,
permitting SMF restarters to understand whether a service process failed as
the result of an administrator error, failure of a dependent service,
software bug, or underlying hardware failure. Once this information has
been captured following any service failure, SMF informs the appropriate
restarter, which decides either to restart the service automatically or,
if the service appears to be defective, to disable it by placing it in
maintenance mode. The default SMF restarter, svc.startd,
is responsible for starting and restarting most of the services on your
Solaris system. In the example above, svc.startd noticed that
sendmail had died, logged a message to that effect, and restarted it
automatically.
If you want to stop a service and not have its processes restarted, use
the svcadm(1M) command (see the "Common Tasks" section
below). Note also that not all system services have been converted to use
SMF yet; any processes belonging to these legacy services will not be
restarted if they are killed.
Finally, you may notice that the
/etc/init.d and /etc/rc*.d directories, as well as the
/etc/inittab file, are quite a bit emptier than in previous
releases of Solaris. SMF-managed services no longer use rc scripts or
inittab entries for startup and shutdown, so the scripts
corresponding to those services have been removed. In future releases of
Solaris, more services will be managed by SMF, and these directories will
become less and less populated. rc scripts and inittab entries which
manage ISV-provided or locally developed services will continue to be run
at boot. These services may not run at exactly the same point in boot as
they had before the advent of SMF, but they are guaranteed to not run any
earlier -- so any services which they had implicitly depended on will still
be available.
Service Names
Solaris uses a URI string called an FMRI (Fault Managed
Resource Identifier) to identify system objects for which advanced
fault and resource management capabilities are provided. Services managed
by SMF are assigned FMRI strings prefixed with the scheme name
"svc", as shown in the following examples for the Solaris service
syslogd(1M):
svc://localhost/system/system-log:default
svc:/system/system-log:default
system/system-log:default
Notice that these service FMRIs used by SMF can be expressed in
three ways: first as an absolute path including a location such as
"localhost"; second as a path relative to the local machine; and
third as simply the service identifier with the string prefixes implied.
The SMF administrator tools described in the rest of this document
typically describe services using the third form, as they are assumed to be
operating on local services. Other management tools that operate on
multiple types of resources or across machine boundaries may use one of the
other forms to describe services. The SMF tools in the current release of
Solaris can only manage services on the local host.
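The three forms can be reduced to the third one with a little string handling. The following is a minimal sketch, assuming only the local-host forms shown above; fmri_short is a hypothetical helper name, not an SMF command.

```shell
# fmri_short FMRI: print the short form of a local service FMRI.
# Hypothetical helper; handles only the three forms shown in this guide.
fmri_short() {
    case $1 in
        svc://localhost/*)
            # Absolute form: drop the scheme and the "localhost" location.
            printf '%s\n' "${1#svc://localhost/}" ;;
        svc:/[!/]*)
            # Form relative to the local machine: drop the scheme only.
            printf '%s\n' "${1#svc:/}" ;;
        svc:*)
            # Any other location is outside the scope of this sketch.
            echo "unsupported FMRI: $1" >&2
            return 1 ;;
        *)
            # Already in short form.
            printf '%s\n' "$1" ;;
    esac
}

fmri_short svc://localhost/system/system-log:default
fmri_short svc:/system/system-log:default
fmri_short system/system-log:default
```

All three calls print system/system-log:default.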
Since the FMRI strings are fairly long, the SMF tools allow abbreviated
forms of the FMRIs to be used. The abbreviation must be unique, must match
the trailing part of the service name (although the ":default" can
be left off), and must begin after a "/". So the following are
acceptable abbreviations of the above FMRI:
system-log:default
system-log
These abbreviations should be used with care, as a new service may be
added at some point that includes the same substring (e.g.
"svc:/mysite/system-log:default"). The SMF tools will print a
warning if a non-unique abbreviation is used.
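To make the abbreviation rule concrete, here is a rough sketch of the rule as it is worded above. It is not the matching code used by the SMF tools, and fmri_matches and count_matches are hypothetical helper names.

```shell
# fmri_matches ABBREV FMRI: succeed if ABBREV equals FMRI or is a trailing
# part of it that begins right after a "/". A trailing ":default" on the
# FMRI may be left off. (Hypothetical helper, not an SMF command.)
fmri_matches() {
    abbrev=$1
    fmri=$2
    # Try the FMRI as given, then with a trailing ":default" removed.
    for candidate in "$fmri" "${fmri%:default}"; do
        case $candidate in
            "$abbrev" | */"$abbrev") return 0 ;;
        esac
    done
    return 1
}

# count_matches ABBREV FMRI...: print how many of the FMRIs match.
# A count above 1 means the abbreviation is not unique.
count_matches() {
    abbrev=$1
    shift
    n=0
    for f in "$@"; do
        if fmri_matches "$abbrev" "$f"; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

count_matches system-log \
    svc:/system/system-log:default \
    svc:/mysite/system-log:default
```

The final call prints 2, which is the ambiguous case the warning is meant to catch. An abbreviation such as "log" matches neither FMRI, because it does not begin after a "/".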
The FMRIs for Solaris system services include a general functional
category, such as "application", "milestone", "network", "platform", and
"system", as well as a descriptive name similar to the name of the
service's daemon or the old rc script. The svcs(1) command will
list all active services available on a machine:
% svcs
STATE STIME FMRI
...
online 11:19:35 svc:/network/nfs/status:default
offline 18:20:30 svc:/application/print/rfc1179:default
maintenance 18:20:26 svc:/network/ntp:default
Since services are now first-class objects, SMF can even provide
information about services that aren't enabled, using the "-a"
option of the svcs(1) command.
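Because svcs(1) prints one service per line, its output is easy to filter. The sketch below assumes the three-column layout shown above (STATE, STIME, FMRI); not_online is a hypothetical helper name.

```shell
# not_online: read svcs(1) output on stdin and print "STATE FMRI" for
# every service that is not online. Lines that do not have exactly three
# columns, and the header line, are skipped. (Hypothetical helper.)
not_online() {
    awk 'NF == 3 && $1 != "STATE" && $1 != "online" { print $1, $3 }'
}

# Sample input copied from the listing above.
not_online <<'EOF'
STATE STIME FMRI
online 11:19:35 svc:/network/nfs/status:default
offline 18:20:30 svc:/application/print/rfc1179:default
maintenance 18:20:26 svc:/network/ntp:default
EOF
```

On a Solaris 10 system the same filter could be fed live data with "svcs -a | not_online".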
Common Tasks
SMF is a particularly notable change in Solaris because it impacts the
administrative model. So although we encourage you to read more about the
features of SMF (see the "More information"
section below), you may want to start by learning how to do some common
system administration tasks.
Enabling and disabling services
Prior to Solaris 10, there wasn't a good way to permanently disable
a service in Solaris. The typical method was to rename the
relevant rc script to a name that wouldn't get executed, but that change
could be overlooked the next time the system was upgraded. Furthermore,
inetd-based services were enabled and disabled by a totally different
method -- editing a configuration file. Under SMF, both types of
services can be configured using the svcadm(1M) command, and
the changes will persist if the machine is upgraded. Here's a
comparison of how to enable and disable some services:
Before SMF | With SMF
mv /etc/rc2.d/S75cron /etc/rc2.d/x.S75cron | svcadm disable system/cron:default
edit /etc/inet/inetd.conf, uncomment the finger line | svcadm enable network/finger:default
The last argument to svcadm in these examples is the FMRI
of the service.
Note that svcadm should only be used for SMF services --
legacy rc script-controlled services work the same as in past
releases.
Stopping, starting, and restarting services
Traditionally, services have been started at boot by an rc script
run with the argument start. Some rc scripts also provide a
stop option, and a few allow restart as well. In SMF,
these tasks are all accomplished with the svcadm(1M) command:
Before SMF | With SMF
/etc/init.d/sshd stop | svcadm disable -t network/ssh:default
/etc/init.d/sshd start | svcadm enable -t network/ssh:default
/etc/init.d/sshd stop; /etc/init.d/sshd start | svcadm restart network/ssh:default
kill -HUP `cat /var/run/sshd.pid` | svcadm refresh network/ssh:default
The "-t" option to svcadm enable and
svcadm disable indicates that the requested action should
be temporary -- it will not affect whether the service is
started the next time that the system boots. This is in contrast to
the "Enabling and disabling services" example, above.
As with the enabling and disabling of services, svcadm
should not be used to control rc script-controlled services; they
continue to work the same as in past releases.
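For administrators used to rc-script verbs, the mapping in this section can be wrapped in a small function. This is only a sketch: rc_to_smf and the verb "reload" (standing in for the kill -HUP case) are hypothetical names, and the SVCADM variable is an assumption added so the commands can be previewed without being run.

```shell
# Command to run; set SVCADM=echo to print the svcadm arguments instead
# of executing them (a dry run).
SVCADM=${SVCADM:-svcadm}

# rc_to_smf VERB FMRI: run the svcadm equivalent of an rc-script verb.
# start and stop use -t, so they do not change what happens at next boot.
rc_to_smf() {
    verb=$1
    fmri=$2
    case $verb in
        stop)    "$SVCADM" disable -t "$fmri" ;;
        start)   "$SVCADM" enable -t "$fmri" ;;
        restart) "$SVCADM" restart "$fmri" ;;
        reload)  "$SVCADM" refresh "$fmri" ;;
        *)
            echo "usage: rc_to_smf {start|stop|restart|reload} FMRI" >&2
            return 2 ;;
    esac
}
```

With SVCADM=echo set, "rc_to_smf stop network/ssh:default" prints "disable -t network/ssh:default" rather than stopping the service. Like svcadm itself, this applies only to SMF services, not to legacy rc scripts.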
Observing the boot process
As mentioned in the "Notable Changes"
section, the boot process is much quieter by default than in
previous releases of Solaris. This was done to reduce the amount of
uninformative "chatter" that could obscure any real problems that
occur during boot.
Some new boot options have been added to control the verbosity of
boot. One that you may find particularly useful is
"-m verbose", which prints a line of information when
each service attempts to start up. This is similar to the default boot
mode for some other UNIX-based and UNIX-like operating systems.
Verbose boot looks like this:
{1} ok boot -m verbose
Rebooting with command: boot -m verbose
Boot device: /pci@1c,600000/scsi@2/disk@0,0:a File and args: -m verbose
SunOS Release 5.10 Version Generic 64-bit
Copyright 1983-2004 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
[ network/pfil:default starting (pfil) ]
[ network/loopback:default starting (Loopback network interface) ]
[ system/filesystem/root:default starting (Root filesystem mount) ]
Oct 18 13:53:02/13: system start time was Mon Oct 18 13:52:57 2004
[ network/physical:default starting (Physical network interfaces) ]
[ system/filesystem/usr:default starting (/usr and / mounted read/write) ]
( more service messages elided )
[ system/filesystem/local:default starting (Local filesystem mounts) ]
[ network/ntp:default starting (network time protocol (NTP)) ]
[ system/utmp:default starting (utmpx monitoring) ]
[ system/filesystem/local:default starting (Local filesystem mounts) ]
[ system/console-login:default starting (Console login) ]
demobox console login: checking ufs filesystems
/dev/rdsk/c0t0d0s7: is logging.
Oct 18 13:53:14/50: system/system-log:default starting
Oct 18 13:53:14/51: network/inetd:default starting
Oct 18 13:53:14/52: system/cron:default starting
( more service messages elided )
The order of the service start messages may change from boot to
boot, because SMF starts services in parallel according to their
dependency relationships.
If a service fails to start successfully, warning messages will
be printed in addition to the start message. Here's an example where
the NTP service failed to start up:
[ system/filesystem/local:default starting (Local filesystem mounts) ]
[ network/ntp:default starting (network time protocol (NTP)) ]
Oct 25 13:58:42/49 ERROR: svc:/network/ntp:default:
Method "/lib/svc/method/xntp" failed with exit status 96.
Oct 25 13:58:42 svc.startd[4]: svc:/network/ntp:default:
Method "/lib/svc/method/xntp" failed with exit status 96.
[ network/ntp:default misconfigured (see 'svcs -x' for details) ]
[ system/utmp:default starting (utmpx monitoring) ]
( more service messages elided )
The first two error messages would appear during both normal boot
and verbose boot; the last one ("network/ntp:default
misconfigured ...") would only appear during verbose boot.
Discovering what's going wrong
Before SMF, Solaris did not have a comprehensive place to look for
problems with system services. Some solutions exist to help catch and
diagnose these problems, ranging from coreadm(1M) logging to site-specific
monitoring scripts to comprehensive products such as Sun Cluster. The
new svcs(1) command includes an "explain" option
("svcs -x"), which prints out detailed, solution-driven
messages about the services that are not running.
svcs -x shows when and why the service failed, provides
pointers to more information about the problem, and lists what other
services are affected by this problem.
Continuing with the example of the NTP service failing to start up:
# svcs -x
svc:/network/ntp:default (Network Time Protocol (NTP).)
State: maintenance since Mon Oct 18 13:58:42 2004
Reason: Start method exited with $SMF_EXIT_ERR_CONFIG.
See: http://sun.com/msg/SMF-8000-KS
See: ntpq(1M)
See: ntpdate(1M)
See: xntpd(1M)
Impact: 0 services are not running.
The NTP service has been placed into maintenance mode because the
startup script indicated that there was a problem with the service's
configuration. Further information about the service failure is
available in the service's log file in the /var/svc/log
directory (or the /etc/svc/volatile directory). The log file
name is based on the short form of the FMRI, with each "/"
replaced by a "-". So the log file for the
svc:/network/ntp:default service is
/var/svc/log/network-ntp:default.log. In this example, the log file
quickly led to the conclusion that the NTP daemon's configuration file,
/etc/inet/ntp.conf, had been removed.
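The naming rule for log files is simple enough to script. The following sketch applies it to an FMRI given in the "svc:/" form or the short form; smf_log_path is a hypothetical helper name, and only the /var/svc/log location is covered.

```shell
# smf_log_path FMRI: print the log file path for a service, following the
# rule described above. (Hypothetical helper, not an SMF command.)
smf_log_path() {
    fmri=$1
    # Drop a leading "svc:/" to get the short form of the FMRI.
    short=${fmri#svc:/}
    # Replace every "/" in the short form with "-".
    name=$(printf '%s' "$short" | tr '/' '-')
    printf '/var/svc/log/%s.log\n' "$name"
}

smf_log_path svc:/network/ntp:default
```

This prints /var/svc/log/network-ntp:default.log, matching the example in the text. For a service started before the single-user milestone, the log may be in /etc/svc/volatile instead.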
Another example shows SMF's ability to track dependencies and point
out problems relating to disabled services. We use the "-v"
option in this example to see the list of impacted services.
# svcs -x -v
svc:/application/print/server:default (LP Print Service)
State: disabled since Mon Oct 18 16:17:27 2004
Reason: Disabled by an administrator.
See: http://sun.com/msg/SMF-8000-05
See: man -M /usr/share/man -s 1M lpsched
Impact: 1 service is not running:
svc:/application/print/rfc1179:default
Here, the application/print/server:default service has
been explicitly disabled, but another service which depended on it
(application/print/rfc1179:default) was not disabled. So the
disabling of the first service has kept the second one from running.
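When many services are affected, it can help to boil "svcs -x" output down to one line per service. The sketch below assumes the layout shown in the two examples above, and in particular that a service heading carries a description in parentheses while the impacted services listed by "-v" do not. summarize_svcs_x is a hypothetical helper name.

```shell
# summarize_svcs_x: read "svcs -x" output on stdin and print one line per
# reported service: FMRI, state in brackets, then the reason.
# (Hypothetical helper; the field layout is taken from the examples above.)
summarize_svcs_x() {
    awk '
        # Heading line: an FMRI followed by a description.
        $1 ~ /^svc:\// && NF > 1 { fmri = $1 }
        # "State: maintenance since ..." -> keep only the state name.
        $1 == "State:"           { state = $2 }
        # "Reason: ..." -> strip the label and print the summary line.
        $1 == "Reason:" {
            sub(/^[ \t]*Reason:[ \t]*/, "")
            print fmri " [" state "]: " $0
        }
    '
}

# Sample input copied from the example above.
summarize_svcs_x <<'EOF'
svc:/application/print/server:default (LP Print Service)
State: disabled since Mon Oct 18 16:17:27 2004
Reason: Disabled by an administrator.
See: http://sun.com/msg/SMF-8000-05
Impact: 1 service is not running:
svc:/application/print/rfc1179:default
EOF
```

On a live system the input would come from "svcs -x -v | summarize_svcs_x".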
Observing services
In earlier versions of Solaris, the only way to see what services
were available was to use the ps(1) command and list all the
active processes on the system, and then look around for the names of
processes that match the names of service applications. Unfortunately,
since most systems have many processes, and new services are introduced
with each new version of Solaris and when other software packages are
added, it's very difficult to track things this way. To further
complicate the situation, many modern services are no longer
implemented as single processes. Some services are implemented as
collections of processes, multi-threaded processes, or both
simultaneously.
The new svcs(1) command makes it much easier to observe
the status of a system service. The "-p" option shows all the
processes associated with a service:
% svcs -p network/smtp:sendmail
STATE STIME FMRI
online 18:20:30 svc:/network/smtp:sendmail
18:20:30 655 sendmail
18:20:30 657 sendmail
% ps -fp 655,657
UID PID PPID C STIME TTY TIME CMD
root 655 1 0 18:20:30 ? 0:01 /usr/lib/sendmail -bd -q15m
smmsp 657 1 0 18:20:30 ? 0:00 /usr/lib/sendmail -Ac -q15m
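The step from "svcs -p" to "ps -fp" above can be automated by pulling the PIDs out of the svcs output. This is a small sketch that assumes the column layout shown above, where process lines carry the PID in the second column; service_pids is a hypothetical helper name.

```shell
# service_pids: read "svcs -p" output on stdin and print the PIDs as a
# comma-separated list, the format "ps -fp" accepts.
# Only process lines have an all-digit second column. (Hypothetical helper.)
service_pids() {
    awk '$2 ~ /^[0-9]+$/ { print $2 }' | paste -s -d, -
}

# Sample input copied from the example above.
service_pids <<'EOF'
STATE STIME FMRI
online 18:20:30 svc:/network/smtp:sendmail
18:20:30 655 sendmail
18:20:30 657 sendmail
EOF
```

For the sample input this prints 655,657. On a live system, the two commands could be combined as ps -fp "$(svcs -p network/smtp:sendmail | service_pids)".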
The "-d" option shows what other services this service
depends on, and the "-D" option shows what other services
depend on this service:
% svcs -d network/smtp:sendmail
STATE STIME FMRI
online 18:20:14 svc:/system/identity:domain
online 18:20:26 svc:/network/service:default
online 18:20:27 svc:/system/filesystem/local:default
online 18:20:27 svc:/milestone/name-services:default
online 18:20:27 svc:/system/system-log:default
online 18:20:30 svc:/system/filesystem/autofs:default
% svcs -D network/smtp:sendmail
STATE STIME FMRI
online 18:20:32 svc:/milestone/multi-user:default
We can see that sendmail requires networking, local file systems,
name services, the syslog daemon, and the automount daemon to be
running before it will run, and sendmail itself must be running before
the multi-user milestone can be reached. The service start times (the
STIME column) illustrate that these dependencies have been
followed.
Changing run levels
SMF has introduced the concept of milestones, which supplant
the traditional notion of run levels. Run levels provide a basic
description of the set of services running on the machine,
traditionally grouped as the services necessary for one user to log in
on the machine console (run level S), and for multiple users to log in
to the machine (run levels 2 and 3). These system states are
represented in SMF as milestones, which are stable services that
represent a group of other services. "svcs -d" can be
used to see what services must be running before a milestone is
reached.
svcadm(1M) is now the preferred method of setting the
system's default run level. This is done with the milestone
subcommand and the FMRI of a valid milestone:
Before SMF | With SMF
edit /etc/inittab | svcadm milestone -d milestone/single-user:default
The "-d" option indicates that the default
milestone should be set to the named FMRI. Without "-d",
"svcadm milestone" transitions the system to the named
milestone immediately.
The boot process has been updated to be aware of milestones. In
addition to the traditional "boot -s" (boot into
single-user mode), there is now
"boot -m milestone=<milestone>", to
boot to the named milestone. <milestone> can be
"single-user", "multi-user", or
"multi-user-server", as well as the special milestones
"all" (all enabled services online) and "none" (no
services at all). The "none" milestone can be very useful in
repairing systems that have failures early in the boot process.
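Since only a handful of milestone names are accepted, a script that prepares boot arguments can check the name first. This is a minimal sketch using just the five names listed above; boot_args_for is a hypothetical helper name.

```shell
# boot_args_for MILESTONE: print the boot arguments for one of the
# milestone names listed above, or fail for any other name.
# (Hypothetical helper; it only builds the string, it does not boot.)
boot_args_for() {
    case $1 in
        single-user | multi-user | multi-user-server | all | none)
            printf '%s\n' "-m milestone=$1" ;;
        *)
            echo "unknown milestone: $1" >&2
            return 1 ;;
    esac
}

boot_args_for none
```

The call prints "-m milestone=none", which would be typed after "boot" at the ok prompt.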
Booting to the single-user milestone (with
"-m milestone=single-user") is slightly different from
the old "boot -s". After "boot -s", exiting the console
administrative shell transitions the system to multi-user mode; when
the system is explicitly booted to a milestone, it does not.
To move to multi-user mode after
"boot -m milestone=single-user", use the command
"svcadm milestone milestone/multi-user-server:default".
Enabling, disabling, and monitoring legacy services
Services that are started by traditional rc scripts (referred to as
legacy services) will generally continue to work as they always
have. They will show up in the output of svcs(1), with an
FMRI based on the pathname of their rc script, but they cannot be
controlled by svcadm(1M). They should be stopped and started
by running the rc script directly.
As mentioned in the "Notable Changes"
section, rc scripts may not run at exactly the same point in boot
as they had in earlier versions of Solaris. In particular, scripts
which depend on running before certain Solaris-provided rc scripts may
encounter problems. The vast majority of scripts should continue to
work without any trouble, though.
Adding new services to inetd.conf
The Internet services daemon, inetd(1M), has been
rewritten as part of SMF. It stores all of its configuration data in
the SMF database, rather than /etc/inet/inetd.conf, allowing
the SMF tools to be used to control and observe inetd-based services.
Most inetd-based services that ship with Solaris will no longer have
entries in inetd.conf. To provide compatibility for services
which haven't been converted to SMF, entries can still be added to
inetd.conf using the same syntax as always, and the new
inetconv(1M) command will convert the new entries to SMF
services. inetconv should always be run after editing
/etc/inet/inetd.conf; it can be run without any arguments.
More information
To learn more about SMF, refer to the following documentation: