M500/M5x0 QUEUED-TRIM data corruption alert (mostly for Linux users)

JEDEC Jedi

Re: M500/M5x0 QUEUED-TRIM data corruption alert (mostly for Linux users)

Thanks for the clarification :)

uname -r
3.14.5-031405-generic

 

After the first reboot I noticed the following entries:

[   25.818233] ata1: log page 10h reported inactive tag 0
[   25.818242] ata1.00: exception Emask 0x1 SAct 0x50000000 SErr 0x0 action 0x0
[   25.818244] ata1.00: irq_stat 0x40000008
[   25.818247] ata1.00: failed command: READ FPDMA QUEUED
[   25.818252] ata1.00: cmd 60/60:e0:78:d4:15/00:00:09:00:00/40 tag 28 ncq 49152 in
[   25.818252]          res 40/00:f4:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
[   25.818254] ata1.00: status: { DRDY }
[   25.818256] ata1.00: failed command: SEND FPDMA QUEUED
[   25.818260] ata1.00: cmd 64/01:f0:00:00:00/00:00:00:00:00/a0 tag 30 ncq 512 out
[   25.818260]          res 40/00:f4:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
[   25.818262] ata1.00: status: { DRDY }
[   25.818490] ata1.00: supports DRM functions and may not be fully accessible
[   25.824747] ata1.00: supports DRM functions and may not be fully accessible
[   25.830741] ata1.00: configured for UDMA/133
[   25.830754] ata1.00: device reported invalid CHS sector 0
[   25.830779] sd 0:0:0:0: [sda]
[   25.830781] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   25.830783] sd 0:0:0:0: [sda]
[   25.830784] Sense Key : Aborted Command [current] [descriptor]
[   25.830787] Descriptor sense data with sense descriptors (in hex):
[   25.830788]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[   25.830795]         00 00 00 00
[   25.830798] sd 0:0:0:0: [sda]
[   25.830800] Add. Sense: No additional sense information
[   25.830802] sd 0:0:0:0: [sda] CDB:
[   25.830804] Write same(16): 93 08 00 00 00 00 02 93 68 38 00 00 00 08 00 00
[   25.830812] end_request: I/O error, dev sda, sector 43214904
[   25.830827] ata1: EH complete
[   25.831278] EXT4-fs (sda1): discard request in group:164 block:27655 count:1 failed with -5
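For anyone watching their own logs for the same pattern, here is a minimal sketch in Python that pulls the "failed command" lines out of kernel log text. The function names are made up for illustration. It assumes the libata message format shown in the log above, and it treats SEND FPDMA QUEUED as the command of interest because that is the NCQ command the kernel uses to carry queued TRIM.

```python
import re

# Matches libata lines such as "ata1.00: failed command: READ FPDMA QUEUED".
FAILED_CMD = re.compile(r"(ata\d+(?:\.\d+)?): failed command: (.+?)\s*$")


def failed_commands(log_text):
    """Return (device, command) pairs for every 'failed command' line."""
    hits = []
    for line in log_text.splitlines():
        match = FAILED_CMD.search(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


def saw_queued_trim_failure(log_text):
    """True if any failed command was SEND FPDMA QUEUED (hypothetical helper)."""
    return any(cmd == "SEND FPDMA QUEUED" for _, cmd in failed_commands(log_text))
```

Feeding it the output of dmesg (or the contents of a saved kernel log) as a string should be enough. A hit only means the drive rejected a queued command; it says nothing about whether data was actually corrupted.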

Then I created large files (1 GB, 15 GB and 30 GB), deleted them, and issued the fstrim command:

sudo fstrim -v /
/: 106956468224 bytes were trimmed

 

Right now there are no more errors in the log apart from those recorded after the first reboot, so I guess I should just wait for data corruption?

______________________________________

FAQs and Top Forum Solutions
Did a user help you? Say thanks by giving Kudos!
How do I know what memory to buy?
Still need help? Contact Crucial Customer Service
Remember to regularly backup your important data!

hmh
Kilobyte Kid

Re: M500/M5x0 QUEUED-TRIM data corruption alert (mostly for Linux users)


bogdan wrote:

Right now there are no more errors in the log apart from those recorded after the first reboot, so I guess I should just wait for data corruption?


 

It should show up much faster if you have the filesystem mounted with the "discard" option, which enables online discard mode.  My best guess is that corruption won't trigger on just any write: you likely need writes pending inside the SSD and a trim issued near them, or something along those lines.

 

Running the "mount" command as root should show you the mount options ("sudo mount" might do it in default Ubuntu).  "sudo mount -o discard,remount /" might enable it if it isn't already there.  Or try "sudo su -" to go root, and issue the commands without "sudo".
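If you would rather check this from a script than read the "mount" output by eye, something like the following sketch could work. It assumes the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options, then two numeric fields); the function names are invented for this example.

```python
def mount_options(mounts_text, mount_point):
    """Return the set of options for mount_point from /proc/mounts-style text."""
    options = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            # A later entry for the same mount point shadows earlier ones,
            # so keep overwriting and end up with the last match.
            options = set(fields[3].split(","))
    return options


def has_online_discard(mounts_text, mount_point="/"):
    """True if the filesystem at mount_point is mounted with 'discard'."""
    return "discard" in mount_options(mounts_text, mount_point)
```

On a live system the text would come from reading /proc/mounts, for example has_online_discard(open("/proc/mounts").read()).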

 

Where the corruption would end up landing, should it happen, I don't know.

 

EDIT:  Doing a lot of filesystem work might help as well: maybe running bonnie++ (warning: it will do a lot of writes), or several concurrent file creation/removal workloads.  Doing it either in online discard mode or with fstrim running concurrently should do it, I guess.
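As a rough illustration of the "several concurrent file creation/remove workloads" idea, here is a small Python sketch. It is only a load generator under assumed defaults (4 threads, 50 rounds, 1 MiB files); nothing here is known to trigger the bug, and the function name and parameters are made up for this example. Like bonnie++, it will do a lot of writes if you raise the numbers.

```python
import os
import tempfile
import threading


def churn(directory, workers=4, rounds=50, size=1 << 20):
    """Create, sync, and delete files from several threads.

    Returns the total number of bytes written.
    """
    written = [0] * workers

    def worker(index):
        payload = os.urandom(size)
        for _ in range(rounds):
            fd, path = tempfile.mkstemp(dir=directory)
            try:
                with os.fdopen(fd, "wb") as handle:
                    handle.write(payload)
                    handle.flush()
                    os.fsync(handle.fileno())  # push the data out to the device
            finally:
                os.remove(path)  # freed blocks become candidates for discard
            written[index] += size

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(written)
```

Point it at a directory on the SSD under test, e.g. churn("/some/dir/on/the/ssd"), with the filesystem in online discard mode or with fstrim running at the same time in another terminal.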

Kilobyte Kid

Re: M500/M5x0 QUEUED-TRIM data corruption alert (mostly for Linux users)

A simple smoke test might be to run something like

  sudo dd if=/dev/sda of=/dev/zero &
  sudo find / >/dev/null 2>&1 &
  sudo grep -r . / >/dev/null &

and run fstrim-all in parallel.

This is a quote from the following Ubuntu bug report, and it should trigger the bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1259829

hmh
Kilobyte Kid

Re: M500/M5x0 QUEUED-TRIM data corruption alert (mostly for Linux users)

Hmm, if that code snippet does manage to trigger corruption, then it is much much worse than I imagined.

 

You will notice it doesn't do any writes at all, other than, maybe, those caused by inode access time updates(!).  That's actually something important to check.

 

To disable inode access time updates:  mount filesystem with the "noatime" option.    (mount -o noatime,remount /).

To enable reduced inode access time updates (default): use the "relatime" option.    (mount -o relatime,remount /).

To enable full inode access time updates (not recommended for SSDs): use the "strictatime" option.
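To see which of the three modes a filesystem is actually using, a sketch like the one below could classify the option string from the fourth column of /proc/mounts. It assumes the kernel lists "noatime" or "relatime" when one of them is active and lists neither for strict atime; the function name is hypothetical.

```python
def atime_mode(options):
    """Classify a comma-separated mount option string by atime behaviour."""
    opts = set(options.split(","))
    if "noatime" in opts:
        return "noatime"      # no access time updates at all
    if "relatime" in opts:
        return "relatime"     # reduced access time updates (the default)
    # Assumption: neither flag listed means full (strict) atime updates.
    return "strictatime"
```

For example, atime_mode("rw,relatime,discard") classifies a typical default mount. If the result is not "noatime", the read-only smoke test can still cause writes through access time updates.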

Kilobyte Kid

Re: M500/M5x0 QUEUED-TRIM data corruption alert (mostly for Linux users)


hmh wrote:

The M550 (all released firmware), and the M500 (up to MU04) can cause data corruption when QUEUED TRIM is used.

 


Hello. Does anyone know if the new MX100 is susceptible to this problem as well? Seems it uses the same controller and firmware as the M550:

 

http://www.anandtech.com/show/8066/crucial-mx100-256gb-512gb-review

 

I have the 512GB version installed on Xubuntu 14.04, and so far I haven't seen any problems in dmesg while issuing `sudo fstrim -v /` repeatedly while under heavy IO load. I don't have `discard` in my `/etc/fstab`. `uname -a` gives:

 

Linux box 3.13.0-30-generic #54-Ubuntu SMP Mon Jun 9 22:45:01 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

hmh
Kilobyte Kid

Re: M500/M5x0 QUEUED-TRIM data corruption alert (mostly for Linux users)

The MX100 should be as buggy as the M5X0, indeed. However, chances are your kernel is blacklisting QUEUED TRIM already, so it shouldn't trip the bug.
Kilobyte Kid

Re: M500/M5x0 QUEUED-TRIM data corruption alert (mostly for Linux users)

So far I have bought a RealSSD C300, 2x Crucial m4 and 3x Crucial M500; my last one was the 960GB version for about 500 bucks. And at least 15 people in my circle of friends and acquaintances, maybe even more, have bought a Crucial SSD just because of my recommendation. So it would be nice to get an official statement from Crucial on whether they are actively working on this, or whether they don't care about their loyal customers...

Kilobyte Kid

Re: M500/M5x0 QUEUED-TRIM data corruption alert (mostly for Linux users)


hmh wrote:
The MX100 should be as buggy as the M5X0, indeed. However, chances are your kernel is blacklisting QUEUED TRIM already, so it shouldn't trip the bug.

Ack. OK, should this thread be renamed to include the MX100?

JEDEC Jedi

Re: M500/M5x0 QUEUED-TRIM data corruption alert (mostly for Linux users)


mspacek wrote:

Seems it uses the same controller and firmware as the M550:

 


What makes you think it has the same firmware?  It uses different NAND so the firmware would need added compatibility with that for starters.

hmh
Kilobyte Kid

Re: M500/M5x0 QUEUED-TRIM data corruption alert (mostly for Linux users)

And I just checked: Linux has no QUEUED TRIM blacklist entry for the MX100.

 

So let's hope the MX100 really does not have the same firmware as the M5x0, otherwise people will lose data.