# Resolving Problems With ZFS

[https://docs.oracle.com/cd/E19253-01/819-5461/gbbuw/index.html](https://docs.oracle.com/cd/E19253-01/819-5461/gbbuw/index.html)

The following sections describe how to identify and resolve problems with your ZFS file systems or storage pools:

<div class="pc11 imgMax-590" id="bkmrk-determining-if-probl">- [Determining If Problems Exist in a ZFS Storage Pool](https://docs.oracle.com/cd/E19253-01/819-5461/gbcwb/index.html)
- [Reviewing <kbd>zpool status</kbd> Output](https://docs.oracle.com/cd/E19253-01/819-5461/gbcve/index.html)
- [System Reporting of ZFS Error Messages](https://docs.oracle.com/cd/E19253-01/819-5461/gbcvk/index.html)

</div>You can use the following features to identify problems with your ZFS configuration:

<div class="pc11 imgMax-590" id="bkmrk-detailed-zfs-storage">- Detailed ZFS storage pool information can be displayed by using the <kbd>zpool status</kbd> command.
- Pool and device failures are reported through ZFS/FMA diagnostic messages.
- Previous ZFS commands that modified pool state information can be displayed by using the <kbd>zpool history</kbd> command.

</div>Most ZFS troubleshooting involves the <kbd>zpool status</kbd> command. This command analyzes the various failures in a system and identifies the most severe problem, presenting you with a suggested action and a link to a knowledge article for more information. Note that the command only identifies a single problem with a pool, though multiple problems can exist. For example, data corruption errors generally imply that one of the devices has failed, but replacing the failed device might not resolve all of the data corruption problems.

In addition, a ZFS diagnostic engine diagnoses and reports pool failures and device failures. Checksum, I/O, device, and pool errors associated with these failures are also reported. ZFS failures as reported by <kbd>fmd</kbd> are displayed on the console as well as the system messages file. In most cases, the <kbd>fmd</kbd> message directs you to the <kbd>zpool status</kbd> command for further recovery instructions.<a name="indexterm-672"></a><a name="indexterm-673"></a>

The basic recovery process is as follows:

<div class="pc11 imgMax-590" id="bkmrk-if-appropriate%2C-use-">- If appropriate, use the <kbd>zpool history</kbd> command to identify the ZFS commands that preceded the error scenario. For example:
    
    <table border="1" cellpadding="1" width="100%"><tbody><tr><td nowrap="nowrap">  
    ```
    # <strong><kbd>zpool history tank</kbd></strong>
    History for 'tank':
    2010-07-15.12:06:50 zpool create tank mirror c0t1d0 c0t2d0 c0t3d0
    2010-07-15.12:06:58 zfs create tank/erick
    2010-07-15.12:07:01 zfs set checksum=off tank/erick
    ```
    
    </td></tr></tbody></table>
    
    In this output, note that checksums are disabled for the <kbd>tank/erick</kbd> file system. This configuration is not recommended.
- Identify the errors through the <kbd>fmd</kbd> messages that are displayed on the system console or in the <kbd>/var/adm/messages</kbd> file.
- Find further repair instructions by using the <kbd>zpool status -x</kbd> command.
- Repair the failures, which involves the following steps:
    
    
    - Replacing the faulted or missing device and bringing it online.
    - Restoring the faulted configuration or corrupted data from a backup.
    - Verifying the recovery by using the <kbd>zpool status</kbd> <kbd>**-x**</kbd> command.
    - Backing up your restored configuration, if applicable.

</div>This section describes how to interpret <kbd>zpool status</kbd> output in order to diagnose the type of failures that can occur. Although most of the work is performed automatically by the command, it is important to understand exactly what problems are being identified in order to diagnose the failure. Subsequent sections describe how to repair the various problems that you might encounter.

<div class="pc11 imgMax-590" id="bkmrk-"><a name="6n7ht6r73"></a></div>## Determining If Problems Exist in a ZFS Storage Pool

The easiest way to determine if any known problems exist on a system is to use the <kbd>zpool status</kbd> <kbd>**-x**</kbd> command. This command describes only pools that are exhibiting problems. If no unhealthy pools exist on the system, then the command displays the following:

<div class="pc11 imgMax-590" id="bkmrk-%23-zpool-status--x-al"><table border="1" cellpadding="1" width="100%"><tbody><tr><td nowrap="nowrap">  
```
# <strong><kbd>zpool status -x</kbd></strong>
all pools are healthy
```

</td></tr></tbody></table>

</div>Without the <kbd>**-x**</kbd> flag, the command displays the complete status for all pools (or the requested pool, if specified on the command line), even if the pools are otherwise healthy.<a name="indexterm-674"></a><a name="indexterm-675"></a>

For more information about command-line options to the <kbd>zpool status</kbd> command, see [Querying ZFS Storage Pool Status](https://docs.oracle.com/cd/E19253-01/819-5461/gaynp/index.html).
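The healthy-pool check above lends itself to simple automation. The following sketch wraps the comparison in a shell function; the function name <kbd>check_pool_health</kbd> is hypothetical, and in practice you would pass it the live output of <kbd>zpool status -x</kbd> rather than a fixed string:

```shell
#!/bin/sh
# Hypothetical helper: report whether captured `zpool status -x`
# output indicates that all pools are healthy.
check_pool_health() {
    # $1 is the captured output of `zpool status -x`
    if [ "$1" = "all pools are healthy" ]; then
        echo "OK"
    else
        echo "ATTENTION"
    fi
}

# On a live system you would run: check_pool_health "$(zpool status -x)"
check_pool_health "all pools are healthy"    # prints "OK"
```

A monitoring job could run this check periodically and alert only when the output is not the healthy-pool message.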

<div class="pc11 imgMax-590" id="bkmrk--1"><a name="6n7ht6r74"></a></div>## Reviewing <kbd>zpool status</kbd> Output

The complete <kbd>zpool status</kbd> output looks similar to the following:

<div class="pc11 imgMax-590" id="bkmrk-%23-zpool-status-tank-"><table border="1" cellpadding="1" width="100%"><tbody><tr><td nowrap="nowrap">  
```
# <strong><kbd>zpool status tank</kbd></strong>
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  UNAVAIL      0     0     0  cannot open

errors: No known data errors
```

</td></tr></tbody></table>

</div>Each part of this output is described in the following sections:

<div class="pc11 imgMax-590" id="bkmrk--2"><a name="6n7ht6r77"></a></div>### Overall Pool Status Information

This section in the <kbd>zpool status</kbd> output contains the following fields, some of which are only displayed for pools exhibiting problems:<a name="indexterm-676"></a><a name="indexterm-677"></a>

<div class="pc11 imgMax-590" id="bkmrk-pool-identifies-the-"><dl><dt><tt>pool</tt></dt><dd>Identifies the name of the pool.

</dd><dt><tt>state</tt></dt><dd>Indicates the current health of the pool. This information refers only to the ability of the pool to provide the necessary replication level.

</dd><dt><tt>status</tt></dt><dd>Describes what is wrong with the pool. This field is omitted if no errors are found.

</dd><dt><tt>action</tt></dt><dd>A recommended action for repairing the errors. This field is omitted if no errors are found.

</dd><dt><tt>see</tt></dt><dd>Refers to a knowledge article containing detailed repair information. Online articles are updated more often than this guide, so always reference them for the most up-to-date repair procedures. This field is omitted if no errors are found.

</dd><dt><tt>scrub</tt></dt><dd>Identifies the current status of a scrub operation, such as the date and time that the last scrub was completed, whether a scrub is in progress, or whether no scrub has been requested.

</dd><dt><tt>errors</tt></dt><dd>Identifies known data errors or the absence of known data errors.

</dd></dl><a name="6n7ht6r78"></a></div>### Pool Configuration Information

The <tt>config</tt> field in the <kbd>zpool status</kbd> output describes the configuration of the devices in the pool, as well as their state and any errors generated from the devices. The state can be one of the following: <tt>ONLINE</tt>, <tt>FAULTED</tt>, <tt>DEGRADED</tt>, <tt>UNAVAIL</tt>, or <tt>OFFLINE</tt>. If the state is anything but <tt>ONLINE</tt>, the fault tolerance of the pool has been compromised.

The second section of the configuration output displays error statistics. These errors are divided into three categories:

<div class="pc11 imgMax-590" id="bkmrk-read%C2%A0%E2%80%93-i%2Fo-errors-th">- <tt>READ</tt> – I/O errors that occurred while issuing a read request
- <tt>WRITE</tt> – I/O errors that occurred while issuing a write request
- <tt>CKSUM</tt> – Checksum errors, meaning that the device returned corrupted data as the result of a read request

</div>These errors can be used to determine if the damage is permanent. A small number of I/O errors might indicate a temporary outage, while a large number might indicate a permanent problem with the device. These errors do not necessarily correspond to data corruption as interpreted by applications. If a device is in a redundant configuration, it might show uncorrectable errors while no errors appear at the mirror or RAID-Z device level. In such cases, ZFS successfully retrieved the good data and attempted to heal the damaged data from existing replicas.

For more information about interpreting these errors, see [Determining the Type of Device Failure](https://docs.oracle.com/cd/E19253-01/819-5461/gbbzs/index.html).
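As a sketch of how these counters can be scanned from a script, the following fragment filters a saved copy of the <tt>config</tt> section (adapted from the examples in this chapter, with nonzero counters on the failed device) for devices reporting any <tt>READ</tt>, <tt>WRITE</tt>, or <tt>CKSUM</tt> errors; on a live system you would feed it the output of <kbd>zpool status</kbd> instead:

```shell
#!/bin/sh
# Saved copy of a config section, adapted from the sample output
# in this chapter (not a live query).
config='        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  UNAVAIL      4     1     0  cannot open'

# Print any device whose READ + WRITE + CKSUM counters are nonzero.
printf '%s\n' "$config" | awk 'NR > 1 && ($3 + $4 + $5) > 0 {print $1, $2}'
# prints "c1t1d0 UNAVAIL"
```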

Finally, additional auxiliary information is displayed in the last column of the <kbd>zpool status</kbd> output. This information expands on the <tt>state</tt> field, aiding in the diagnosis of failures. If a device is <tt>FAULTED</tt>, this field indicates whether the device is inaccessible or whether the data on the device is corrupted. If the device is undergoing resilvering, this field displays the current progress.

For information about monitoring resilvering progress, see [Viewing Resilvering Status](https://docs.oracle.com/cd/E19253-01/819-5461/gbcus/index.html).

<div class="pc11 imgMax-590" id="bkmrk--3"><a name="6n7ht6r79"></a></div>### Scrubbing Status

The scrub section of the <kbd>zpool status</kbd> output describes the current status of any explicit scrubbing operations. This information is distinct from whether any errors are detected on the system, though this information can be used to determine the accuracy of the data corruption error reporting. If the last scrub ended recently, any known data corruption has most likely already been discovered.

Scrub completion messages persist across system reboots.

For more information about the data scrubbing and how to interpret this information, see [Checking ZFS File System Integrity](https://docs.oracle.com/cd/E19253-01/819-5461/gbbwa/index.html).
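A script can extract the reported error count from a saved scrub completion line; the following is a minimal sketch using a sample line from this chapter, not a live query:

```shell
#!/bin/sh
# Saved scrub line from a `zpool status` run (sample from this chapter).
scrub_line=' scrub: scrub completed after 0h0m with 0 errors on Tue Feb  2 13:08:42 2010'

# Pull the reported error count out of the completion message.
errors=$(printf '%s\n' "$scrub_line" | sed -n 's/.*with \([0-9][0-9]*\) errors.*/\1/p')
echo "$errors"    # prints "0"
```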

<div class="pc11 imgMax-590" id="bkmrk--4"><a name="6n7ht6r7a"></a></div>### Data Corruption Errors

The <kbd>zpool status</kbd> command also shows whether any known errors are associated with the pool. These errors might have been found during data scrubbing or during normal operation. ZFS maintains a persistent log of all data errors associated with a pool. This log is rotated whenever a complete scrub of the system finishes.

Data corruption errors are always fatal. Their presence indicates that at least one application experienced an I/O error due to corrupt data within the pool. Device errors within a redundant pool do not result in data corruption and are not recorded as part of this log. By default, only the number of errors found is displayed. A complete list of errors and their specifics can be found by using the <kbd>zpool status</kbd> <kbd>**-v**</kbd> option. For example:<a name="indexterm-678"></a><a name="indexterm-679"></a><a name="indexterm-680"></a>

<div class="pc11 imgMax-590" id="bkmrk-%23-zpool-status--v-po"><table border="1" cellpadding="1" width="100%"><tbody><tr><td nowrap="nowrap">  
```
# <strong><kbd>zpool status -v</kbd></strong>
  pool: tank
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: scrub completed after 0h0m with 0 errors on Tue Feb  2 13:08:42 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        UNAVAIL      0     0     0  insufficient replicas
          c1t0d0    ONLINE       0     0     0
          c1t1d0    UNAVAIL      4     1     0  cannot open

errors: Permanent errors have been detected in the following files: 

/tank/data/aaa
/tank/data/bbb
/tank/data/ccc
```

</td></tr></tbody></table>

</div>A similar message is also displayed by <kbd>fmd</kbd> on the system console and the <kbd>/var/adm/messages</kbd> file. These messages can also be tracked by using the <kbd>fmdump</kbd> command.

For more information about interpreting data corruption errors, see [Identifying the Type of Data Corruption](https://docs.oracle.com/cd/E19253-01/819-5461/gbcuz/index.html).
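The file list printed by <kbd>zpool status</kbd> <kbd>**-v**</kbd> can be isolated for further processing, for example to feed a restore-from-backup job. The following sketch operates on a saved copy of the errors section shown above:

```shell
#!/bin/sh
# Saved errors section from the sample `zpool status -v` output above.
status_v='errors: Permanent errors have been detected in the following files:

/tank/data/aaa
/tank/data/bbb
/tank/data/ccc'

# List only the affected file paths (lines that begin with a slash).
printf '%s\n' "$status_v" | awk '/^\//'
# prints the three /tank/data paths, one per line
```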

<div class="pc11 imgMax-590" id="bkmrk--5"><a name="6n7ht6r7b"></a></div>## System Reporting of ZFS Error Messages

In addition to persistently tracking errors within the pool, ZFS also displays <tt>syslog</tt> messages when events of interest occur. The following scenarios generate events to notify the administrator:<a name="indexterm-681"></a><a name="indexterm-682"></a><a name="indexterm-683"></a>

<div class="pc11 imgMax-590" id="bkmrk-device-state-transit">- **Device state transition** – If a device becomes <tt>FAULTED</tt>, ZFS logs a message indicating that the fault tolerance of the pool might be compromised. A similar message is sent if the device is later brought online, restoring the pool to health.
- **Data corruption** – If any data corruption is detected, ZFS logs a message describing when and where the corruption was detected. This message is only logged the first time it is detected. Subsequent accesses do not generate a message.
- **Pool failures and device failures** – If a pool failure or a device failure occurs, the fault manager daemon reports these errors through <tt>syslog</tt> messages as well as the <kbd>fmdump</kbd> command.

</div>If ZFS detects a device error and automatically recovers from it, no notification occurs. Such errors do not constitute a failure in the pool redundancy or in data integrity. Moreover, such errors are typically the result of a driver problem accompanied by its own set of error messages.
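These <tt>syslog</tt> entries can be scanned from a script. The sketch below counts lines carrying a ZFS message identifier in a saved sample of <kbd>/var/adm/messages</kbd>; the sample log lines are illustrative only, as the exact <kbd>fmd</kbd> log format varies, and on a live system you would read the messages file itself:

```shell
#!/bin/sh
# Hypothetical sample of /var/adm/messages content; the exact fmd
# log format varies, so treat these lines as illustrative only.
messages='Jul 15 12:10:22 host fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-2Q, TYPE: Fault
Jul 15 12:10:22 host fmd: [ID 441519 daemon.error] DESC: A ZFS device could not be opened.
Jul 15 12:11:03 host sshd: [ID 800047 auth.info] unrelated entry'

# Count entries that carry a ZFS message identifier.
printf '%s\n' "$messages" | grep -c 'ZFS-8000'    # prints "1"
```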