Bug 61998

Summary: Fails to read status from WD Raptors
Product: libatasmart
Reporter: Phillip Susi <psusi>
Component: library
Assignee: Lennart Poettering <lennart>
Status: NEW
QA Contact: Lennart Poettering <lennart>
Severity: normal
Priority: medium
CC: auxsvr, b.bellec, bugzilla, pocek, samuel, thommi336, zeuthen
Version: unspecified
Keywords: have-backtrace, patch
Hardware: Other
OS: All
Attachments: fix-status-io-error.patch, ssd crash

Description Phillip Susi 2013-03-08 03:40:22 UTC
I get an I/O error trying to read health status:

psusi@faldara:~$ sudo skdump /dev/sdc
Device: sat16:/dev/sdc
Type: 16 Byte SCSI ATA SAT Passthru
Size: 35304 MiB
Model: [WDC WD360GD-00FNA0]
Serial: [WD-WMAH91337618]
Firmware: [35.06K35]
SMART Available: yes
Quirks:
Awake: yes
SMART Disk Health Good: Input/output error
Off-line Data Collection Status: [Off-line data collection activity was completed without error.]
Total Time To Complete Off-Line Data Collection: 1572 s
Self-Test Execution Status: [The previous self-test routine completed without error or no self-test has ever been run.]
Percent Self-Test Remaining: 0%
Conveyance Self-Test Available: yes
Short/Extended Self-Test Available: yes
Start Self-Test Available: yes
Abort Self-Test Available: yes
Short Self-Test Polling Time: 2 min
Extended Self-Test Polling Time: 28 min
Conveyance Self-Test Polling Time: 5 min
Bad Sectors: 0 sectors
Powered On: 1.7 years
Power Cycles: 5260
Average Powered On Per Power Cycle: 2.8 h
Temperature: 35.0 C
Attribute Parsing Verification: Good
Overall Status: Input/output error
ID# Name                        Value Worst Thres Pretty      Raw            Type    Updates Good Good/Past
  1 raw-read-error-rate         200   200    51   0           0x000000000000 prefail online  yes  yes 
  3 spin-up-time                 88    84    21   2.1 s       0x340800000000 prefail online  yes  yes 
  4 start-stop-count             95    95    40   5571        0xc31500000000 old-age online  yes  yes 
  5 reallocated-sector-count    200   200   140   0 sectors   0x000000000000 prefail online  yes  yes 
  7 seek-error-rate             200   200    51   0           0x000000000000 prefail online  yes  yes 
  9 power-on-hours               80    80     0   1.7 years   0x833900000000 old-age online  n/a  n/a 
 10 spin-retry-count            100   100    51   0           0x000000000000 prefail online  yes  yes 
 11 calibration-retry-count     100   100    51   0           0x000000000000 prefail online  yes  yes 
 12 power-cycle-count            95    95     0   5260        0x8c1400000000 old-age online  n/a  n/a 
194 temperature-celsius-2       108   253     0   35.0 C      0x230000000000 old-age online  n/a  n/a 
196 reallocated-event-count     200   200     0   0           0x000000000000 old-age online  n/a  n/a 
197 current-pending-sector      200   200     0   0 sectors   0x000000000000 old-age online  n/a  n/a 
198 offline-uncorrectable       200   200     0   0 sectors   0x000000000000 old-age online  n/a  n/a 
199 udma-crc-error-count        200   253     0   0           0x000000000000 old-age online  n/a  n/a 
200 multi-zone-error-rate       200   125    51   0           0x000000000000 prefail offline yes  yes 


I traced it to this check in sk_disk_smart_status():

        /* SAT/USB bridges truncate packets, so we only check for 4F,
         * not for 2C on those */
        if ((d->type == SK_DISK_TYPE_ATA_PASSTHROUGH_12 || cmd[3] == htons(0x00C2U)) &&
            cmd[4] == htons(0x4F00U))
                *good = TRUE;
        else if ((d->type == SK_DISK_TYPE_ATA_PASSTHROUGH_12 || cmd[3] == htons(0x002CU)) &&
                 cmd[4] == htons(0xF400U))
                *good = FALSE;
        else {
                errno = EIO;    /* <- execution reaches this line */
                return -1;
        }

(gdb) print d->type
$5 = SK_DISK_TYPE_ATA_PASSTHROUGH_16
(gdb) print /x cmd
$4 = {0x0, 0x0, 0x0, 0xc200, 0x454f, 0x5000}
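To see which bits break the comparison, the gdb words can be turned back into the bytes they occupy in memory. This sketch assumes the reporter's host was little-endian (x86). That is consistent with cmd[3] printing as 0xc200, which is how htons(0x00C2U) appears on such a host. Under that assumption, 0x454f is stored as the bytes 0x4F, 0x45, so the low-order byte of cmd[4] is 0x45 rather than the 0x00 the check requires. The helper names are hypothetical and not part of libatasmart.

```c
#include <arpa/inet.h> /* htons(), ntohs() */
#include <stdint.h>
#include <string.h>    /* memcpy() */

/* Hypothetical helper: rebuild a cmd[] word from the two bytes it occupies
 * in memory, so the example behaves the same on any host byte order. */
static uint16_t word_from_bytes(uint8_t first, uint8_t second) {
    uint8_t raw[2] = { first, second };
    uint16_t w;
    memcpy(&w, raw, sizeof w);
    return w;
}

/* High-order byte of a word stored in network (big-endian) order. */
static uint8_t be_high(uint16_t w) { return (uint8_t)(ntohs(w) >> 8); }

/* Low-order byte of a word stored in network (big-endian) order. */
static uint8_t be_low(uint16_t w) { return (uint8_t)(ntohs(w) & 0xFFU); }
```

For example, word_from_bytes(0x4F, 0x45) yields a word whose be_high() is 0x4F and whose be_low() is 0x45. That word compares unequal to htons(0x4F00U), so neither branch of the check matches and the function sets EIO.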
Comment 1 Phillip Susi 2013-03-18 23:10:59 UTC
Created attachment 76719
fix-status-io-error.patch

This simple patch fixes the bug. The existing code requires 8 bits that the standard leaves undefined to be zero, and my drives do not set them to zero. The patch masks off the undefined bits when comparing.
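Here is a minimal sketch of the masking idea. It assumes the 8 undefined bits are the low-order byte of cmd[4], which is the only byte that differs from the expected value in the gdb dump above. This is not the attached patch: smart_status_masked() and its sat12 flag are hypothetical stand-ins for the code in sk_disk_smart_status().

```c
#include <arpa/inet.h> /* htons() */
#include <stdint.h>

/* Hypothetical stand-alone version of the status check quoted above.
 * cmd3/cmd4 are the reply words in network byte order, as in libatasmart.
 * sat12 mirrors d->type == SK_DISK_TYPE_ATA_PASSTHROUGH_12, which skips the
 * cmd3 check because SAT/USB bridges truncate the reply.
 * Returns 1 for "good", 0 for "threshold exceeded", -1 if unrecognized. */
static int smart_status_masked(uint16_t cmd3, uint16_t cmd4, int sat12) {
    /* Keep only the high-order byte of cmd4; the low-order byte is assumed
     * to be the undefined one and is ignored. */
    const uint16_t mask = htons(0xFF00U);

    if ((sat12 || cmd3 == htons(0x00C2U)) && (cmd4 & mask) == htons(0x4F00U))
        return 1;
    if ((sat12 || cmd3 == htons(0x002CU)) && (cmd4 & mask) == htons(0xF400U))
        return 0;
    return -1; /* the original code sets errno = EIO here */
}
```

With the reporter's reply, smart_status_masked(htons(0x00C2U), htons(0x4F45U), 0) returns 1 (good) instead of failing. Because the mask is applied in network byte order, like the constants it is compared against, the check works the same on little- and big-endian hosts.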
Comment 2 Martin Pitt 2013-03-25 06:41:26 UTC
Thanks Phillip! I applied your patch to the Debian package (and will sync it into Ubuntu).
Comment 3 Phillip Susi 2013-09-25 14:19:29 UTC
Hi Lennart, it has been 6 months since I submitted this patch and it hasn't been applied yet.  Could you take a look and at least comment?
Comment 4 Orion Poplawski 2014-06-16 21:10:18 UTC
The patch appears to fix this for me, as well as bug 53475.
Comment 5 Orion Poplawski 2014-06-16 21:11:45 UTC
*** Bug 53475 has been marked as a duplicate of this bug. ***
Comment 6 Benjamin Bellec 2016-03-23 22:33:02 UTC
Created attachment 122509
ssd crash

I have a cheap SSD (Corsair Force LS) in my work computer, and it crashes regularly. I think this is related to this bug; see my attachment.
Comment 7 Phillip Susi 2016-05-13 00:31:05 UTC
3 years on and this patch is still waiting to be applied.
Comment 8 RASG 2017-10-04 12:08:21 UTC
Another year, and still the same problem.

In my case, it happens with a hybrid (HDD+SSD) 500 GB ST95005620AS.
