IPMCTL-START-DIAGNOSTIC(1) | ipmctl | IPMCTL-START-DIAGNOSTIC(1) |
ipmctl-start-diagnostic - Starts a diagnostic test
ipmctl start [OPTIONS] -diagnostic [TARGETS]
Starts a diagnostic test.
-h, -help
-ddrt
-smbus
The -ddrt and -smbus options are mutually exclusive and may not be used together.
-lpmb
-spmb
The -lpmb and -spmb options are mutually exclusive and may not be used together.
-o (text|nvmxml), -output (text|nvmxml)
-diagnostic [Quick|Config|Security|FW]
-dimm [DimmIDS]
Starts all diagnostics.
ipmctl start -diagnostic
Starts the quick check diagnostic on PMem module 0x0001.
ipmctl start -diagnostic Quick -dimm 0x0001
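The synopsis and option rules above can be sketched as a small helper that assembles the command line before it is handed to a process launcher. This is a minimal sketch, not part of ipmctl: the function name `build_start_diagnostic_argv` and its parameters are hypothetical, and it only uses the flags and test names listed in this page.

```python
from typing import List, Optional

# Test names accepted by the -diagnostic target, as listed in this page.
VALID_TESTS = ("Quick", "Config", "Security", "FW")

# Option pairs that this page documents as mutually exclusive.
EXCLUSIVE_PAIRS = (("-ddrt", "-smbus"), ("-lpmb", "-spmb"))


def build_start_diagnostic_argv(
    test: Optional[str] = None,
    dimm_id: Optional[str] = None,
    options: Optional[List[str]] = None,
) -> List[str]:
    """Build argv for `ipmctl start [OPTIONS] -diagnostic [TARGETS]` (hypothetical helper)."""
    options = list(options or [])
    # Reject option combinations that may not be used together.
    for first, second in EXCLUSIVE_PAIRS:
        if first in options and second in options:
            raise ValueError(f"{first} and {second} may not be used together")
    if test is not None and test not in VALID_TESTS:
        raise ValueError(f"unknown diagnostic test: {test}")
    # Options precede the -diagnostic target, following the synopsis order.
    argv = ["ipmctl", "start", *options, "-diagnostic"]
    if test is not None:
        argv.append(test)
    if dimm_id is not None:
        argv += ["-dimm", dimm_id]
    return argv
```

Calling `build_start_diagnostic_argv("Quick", "0x0001")` yields the argument list for the second example above; the list can then be passed to something like `subprocess.run` on a system where ipmctl is installed.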
If a PMem module is unmanageable, the Quick test reports the reason, while the Config, Security, and FW tests skip unmanageable PMem modules.
Each diagnostic generates one or more log messages. A successful test generates a single log message per PMem module indicating that no errors were found. A failed test might generate multiple log messages, each highlighting a specific error with all the relevant details. Each log contains the following information:
Test
State
Message
SubTestName
Test Name | Valid SubTest Names |
Quick | Manageability, Boot status, Health |
Config | PMem module specs, Duplicate PMem module, System Capability, Namespace LSA, PCD |
Security | Encryption status, Inconsistency |
FW | FW Consistency, Viral Policy, Threshold check, System Time |
State
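The test-to-subtest table above can be carried in a script as a simple lookup, for example to check a SubTestName read from a log entry. This is a minimal sketch: the names `SUBTESTS` and `is_valid_subtest` are hypothetical, and the strings are copied from the table rather than from ipmctl source.

```python
# Valid SubTest names per Test name, transcribed from the table in this page.
SUBTESTS = {
    "Quick": ("Manageability", "Boot status", "Health"),
    "Config": (
        "PMem module specs",
        "Duplicate PMem module",
        "System Capability",
        "Namespace LSA",
        "PCD",
    ),
    "Security": ("Encryption status", "Inconsistency"),
    "FW": ("FW Consistency", "Viral Policy", "Threshold check", "System Time"),
}


def is_valid_subtest(test: str, subtest: str) -> bool:
    """Return True if `subtest` is listed for `test` (hypothetical helper)."""
    return subtest in SUBTESTS.get(test, ())
```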
Invoking the Start Diagnostics command analyzes the Intel® Optane™ PMem module for potential issues and generates events that report the results.
Diagnostic events may fall into the following categories:
Each event includes the following pieces of information:
The following sections list each of the possible events grouped by category of the event.
The quick health check diagnostic verifies that the Intel® Optane™ PMem module’s host mailboxes are accessible and that basic health indicators can be read and are currently reporting acceptable values.
Table 1. Quick Health Check Events
Code | Severity | Message | Arguments |
500 | Info | The quick health check succeeded. | |
501 | Warning | The quick health check detected that PMem module [1] is not manageable because subsystem vendor ID [2] is not supported. UID: [3] | 1. PMem module Handle; 2. Subsystem Vendor ID; 3. PMem module UID |
502 | Warning | The quick health check detected that PMem module [1] is not manageable because subsystem device ID [2] is not supported. UID: [3] | 1. PMem module Handle; 2. Subsystem Device ID; 3. PMem module UID |
503 | Warning | The quick health check detected that PMem module [1] is not manageable because firmware API version [2] is not supported. UID: [3] | 1. PMem module Handle; 2. FW API version; 3. PMem module UID |
504 | Warning | The quick health check detected that PMem module [1] is reporting a bad health state [2]. UID: [3] | 1. PMem module Handle; 2. Actual Health State; 3. PMem module UID |
505 | Warning | The quick health check detected that PMem module [1] is reporting a media temperature of [2] C which is above the alarm threshold [3] C. UID: [4] | 1. PMem module Handle; 2. Actual Media Temperature; 3. Media Temperature Threshold; 4. PMem module UID |
506 | Warning | The quick health check detected that PMem module [1] is reporting percentage remaining at [2]% which is less than the alarm threshold [3]%. UID: [4] | 1. PMem module Handle; 2. Actual Percentage Remaining; 3. Percentage Remaining Threshold; 4. PMem module UID |
507 | Warning | The quick health check detected that PMem module [1] is reporting reboot required. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
511 | Warning | The quick health check detected that PMem module [1] is reporting a controller temperature of [2] C which is above the alarm threshold [3] C. UID: [4] | 1. PMem module Handle; 2. Actual Controller Temperature; 3. Controller Temperature Threshold; 4. PMem module UID |
513 | Error | The quick health check detected that the boot status register of PMem module [1] is not readable. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
514 | Error | The quick health check detected that the firmware on PMem module [1] is reporting that the media is not ready. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
515 | Error | The quick health check detected that the firmware on PMem module [1] is reporting an error in the media. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
519 | Error | The quick health check detected that PMem module [1] failed to initialize BIOS POST testing. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
520 | Error | The quick health check detected that the firmware on PMem module [1] has not initialized successfully. The last known Major:Minor Checkpoint is [2]. UID: [3] | 1. PMem module Handle; 2. Major checkpoint : Minor checkpoint in Boot Status Register; 3. PMem module UID |
523 | Error | The quick health check detected that PMem module [1] is reporting a viral state. The PMem module is now read-only. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
529 | Warning | The quick health check detected that PMem module [1] is reporting that it has no package spares available. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
530 | Info | The quick health check detected that the firmware on PMem module [1] experienced an unsafe shutdown before its latest restart. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
533 | Error | The quick health check detected that the firmware on PMem module [1] is reporting that the AIT DRAM is not ready. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
534 | Error | The quick health check detected that the firmware on PMem module [1] is reporting that the media is disabled. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
535 | Error | The quick health check detected that the firmware on PMem module [1] is reporting that the AIT DRAM is disabled. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
536 | Error | The quick health check detected that the firmware on PMem module [1] failed to load successfully. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
538 | Error | PMem module [1] is reporting that the DDRT IO Init is not complete. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
539 | Error | PMem module [1] is reporting that the mailbox interface is not ready. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
540 | Error | An internal error caused the quick health check to abort on PMem module [1]. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
541 | Error | The quick health check detected that PMem module [1] is busy. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
542 | Error | The quick health check detected that the platform FW did not map a region to SPA on PMem module [1]. ACPI NFIT NVPMem module State Flags Error Bit 6 Set. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
543 | Error | The quick health check detected that PMem module [1] DDRT Training is not complete/failed. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
544 | Error | PMem module [1] is reporting that the DDRT IO Init is not started. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
545 | Error | The quick health check detected that the ROM on PMem module [1] has failed to complete initialization, last known Major:Minor Checkpoint is [2]. | 1. PMem module Handle; 2. Major checkpoint : Minor checkpoint in Boot Status Register; 3. PMem module UID |
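In the event tables, each bracketed number in the Message column refers to the entry with the same number in the Arguments column. The substitution can be sketched as follows. This is an illustrative sketch only, not how ipmctl itself renders messages: `format_event` is a hypothetical helper, and the handle and UID values in the usage note are made up.

```python
import re
from typing import Sequence

# Matches numbered placeholders such as [1] or [4] in an event message.
_PLACEHOLDER = re.compile(r"\[(\d+)\]")


def format_event(template: str, args: Sequence[object]) -> str:
    """Replace each [n] in `template` with the n-th argument (1-based)."""

    def substitute(match: "re.Match[str]") -> str:
        index = int(match.group(1))
        if not 1 <= index <= len(args):
            raise IndexError(f"no argument supplied for placeholder [{index}]")
        return str(args[index - 1])

    return _PLACEHOLDER.sub(substitute, template)
```

For event 507, `format_event("The quick health check detected that PMem module [1] is reporting reboot required. UID: [2]", ["0x0001", "8089-a2-0000-00000001"])` fills in the handle and the UID in that order.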
This diagnostic test group verifies that the BIOS platform configuration matches the installed hardware and the platform configuration conforms to best known practices.
Table 2. Platform Configuration Check Events
Code | Severity | Message | Arguments |
600 | Info | The platform configuration check succeeded. | |
601 | Info | The platform configuration check detected that there are no manageable PMem modules. | |
606 | Info | The platform configuration check detected that PMem module [1] is not configured. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
608 | Error | The platform configuration check detected [1] PMem modules installed on the platform with the same serial number [2]. | 1. Number of PMem modules with duplicate serial numbers; 2. The duplicate serial number |
609 | Info | The platform configuration check detected that PMem module [1] has a goal configuration that has not yet been applied. A system reboot is required for the new configuration to take effect. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
618 | Error | The platform configuration check detected that a PMem module with physical ID [1] is present in the system but failed to initialize. UID: [2] | 1. PMem module handle in the SMBIOS table; 2. PMem module UID |
621 | Error | The platform configuration check detected PCD contains invalid data on PMem module [1]. UID: [2] | 1. PMem module Handle; 2. PMem module UID |
622 | Error | The platform configuration check was unable to retrieve the namespace information. | |
623 | Warning | The platform configuration check detected that the BIOS settings do not currently allow memory provisioning from this software. | |
624 | Error | The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] because of errors in the goal data. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6]. | 1. PMem module Handle; 2. Validation Status; 3. Text error code corresponding to the status code; 4. Partition Size Change Status; 5. Interleave Change Status; 6. Interleave Change Status |
625 | Error | The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] because the system has insufficient resources. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6]. | 1. PMem module Handle; 2. Validation Status; 3. Text error code corresponding to the status code; 4. Partition Size Change Status; 5. Interleave Change Status; 6. Interleave Change Status |
626 | Error | The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] because of a firmware error. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6]. | 1. PMem module Handle; 2. Validation Status; 3. Text error code corresponding to the status code; 4. Partition Size Change Status; 5. Interleave Change Status; 6. Interleave Change Status |
627 | Error | The platform configuration check detected that the BIOS could not apply the configuration goal on PMem module [1] for an unknown reason. The detailed status is COUT table status: [2] [3], Partition change table status: [4], Interleave change table 1 status: [5], Interleave change table 2 status: [6]. | 1. PMem module Handle; 2. Validation Status; 3. Text error code corresponding to the status code; 4. Partition Size Change Status; 5. Interleave Change Status; 6. Interleave Change Status |
628 | Error | The platform configuration check detected that interleave set [1] is broken because the PMem modules were moved [2]. | 1. Interleave set index ID; 2. List of moved PMem modules |
629 | Error | The platform configuration check detected that the platform does not support ADR and therefore data integrity is not guaranteed on the PMem modules. | |
630 | Error | An internal error caused the platform configuration check to abort. | |
631 | Error | The platform configuration check detected that interleave set [1] is broken because the PMem module with UID: [2] is missing from location (Socket-Die-iMC-Channel-Slot) [3]. | 1. Interleave set index ID; 2. PMem module UID; 3. Location ID |
632 | Error | The platform configuration check detected that interleave set [1] is broken because the PMem module with UID: [2] is misplaced. It is currently in location (Socket-Die-iMC-Channel-Slot) [3] and should be moved to (Socket-Die-iMC-Channel-Slot) [4]. | 1. Interleave set index ID; 2. PMem module UID; 3. Location ID; 4. Location ID |
633 | Error | The platform configuration check detected that the BIOS could not fully map memory on PMem module [1] because of an error in current configuration. The detailed status is CCUR table status: [2] [3]. | 1. PMem module Handle; 2. Current Configuration Status; 3. Text error code corresponding to the status code |
The security check diagnostic test group verifies that all Intel® Optane™ PMem modules have a consistent security state.
Table 3. Security Check Events
Code | Severity | Message | Arguments |
800 | Info | The security check succeeded. | |
801 | Info | The security check detected that there are no manageable PMem modules. | |
802 | Warning | The security check detected that security settings are inconsistent [1]. | 1. A comma separated list of the number of PMem modules in each security state |
804 | Info | The security check detected that security is not supported on all PMem modules. | |
805 | Error | An internal error caused the security check to abort. | |
This test group verifies that all PMem modules of a given subsystem device ID have consistent FW installed and other FW modifiable attributes are set in accordance with best practices.
Table 4. Firmware Consistency and Settings Check Events
Code | Severity | Message | Arguments |
900 | Info | The firmware consistency and settings check succeeded. | |
901 | Info | The firmware consistency and settings check detected that there are no manageable PMem modules. | |
902 | Warning | The firmware consistency and settings check detected that firmware version on PMem modules [1] with subsystem device ID [2] is non-optimal, preferred version is [3]. | 1. Comma separated list of PMem module UIDs; 2. Subsystem device ID; 3. Preferred firmware version |
903 | Warning | The firmware consistency and settings check detected that PMem module [1] is reporting a non-critical media temperature threshold of [2] C which is above the fatal threshold [3] C. UID: [4] | 1. PMem module Handle; 2. Current media temperature threshold; 3. Fatal media temperature threshold; 4. PMem module UID |
904 | Warning | The firmware consistency and settings check detected that PMem module [1] is reporting a non-critical controller temperature threshold of [2] C which is above the fatal threshold [3] C. UID: [4] | 1. PMem module Handle; 2. Current controller temperature threshold; 3. Fatal controller temperature threshold; 4. PMem module UID |
905 | Warning | The firmware consistency and settings check detected that PMem module [1] is reporting a percentage remaining of [2]% which is below the recommended threshold [3]%. UID: [4] | 1. PMem module Handle; 2. Current percentage remaining threshold; 3. Recommended percentage remaining threshold; 4. PMem module UID |
906 | Warning | The firmware consistency and settings check detected that PMem modules have inconsistent viral policy settings. | |
910 | Error | An internal error caused the firmware consistency and settings check to abort. | |
911 | Warning | The firmware consistency and settings check detected that PMem modules have inconsistent first fast refresh settings. | |
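The four event tables use disjoint code ranges: 5xx for the quick health check, 6xx for the platform configuration check, 8xx for the security check, and 9xx for the firmware consistency and settings check. A script that post-processes events could group them as sketched below. The grouping by hundreds is an inference from the codes listed in this page, not a rule documented by ipmctl, and `event_category` is a hypothetical name.

```python
from typing import Optional

# Leading hundreds digit of an event code mapped to its category.
# Inferred from the codes listed in Tables 1 to 4; treat as an assumption.
CATEGORY_BY_HUNDREDS = {
    5: "Quick Health Check",
    6: "Platform Configuration Check",
    8: "Security Check",
    9: "Firmware Consistency and Settings Check",
}


def event_category(code: int) -> Optional[str]:
    """Return the category name for an event code, or None if it is not recognized."""
    return CATEGORY_BY_HUNDREDS.get(code // 100)
```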
2022-09-26 | ipmctl |