endrebjo, posted 4 November 2007 (edited)

I have a file server up and running. Specs:
- Debian 4.0 r1
- Gigabyte motherboard with i810 chipset
- Pentium III 600 MHz
- 256 MB SDRAM
- SDM PCI SATA RAID 2P (Sil3512) controller
- 500 GB SATA disk
- 80 GB PATA disk

But it hangs when I copy large files (e.g. 200 MB). This happens both when I transfer from the SATA disk to the PATA disk and when I copy from the SATA disk to itself. Judging by the log file, the SATA disk appears to be the problem.

Nov 4 12:13:44 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
Nov 4 12:13:44 server kernel: ata1.00: (BMDMA stat 0x60)
Nov 4 12:13:44 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x4 (device error)
Nov 4 12:13:44 server kernel: ata1: EH complete
Nov 4 12:13:44 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov 4 12:13:44 server kernel: sda: Write Protect is off
Nov 4 12:13:44 server kernel: sda: Mode Sense: 00 3a 00 00
Nov 4 12:13:44 server kernel: SCSI device sda: drive cache: write back
Nov 4 12:13:44 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x2 frozen
Nov 4 12:13:44 server kernel: ata1.00: (BMDMA stat 0x64)
Nov 4 12:13:44 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x2 stat 0xff err 0xff (HSM violation)
Nov 4 12:13:45 server kernel: ata1: soft resetting port
Nov 4 12:13:45 server kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov 4 12:14:15 server kernel: ata1.00: qc timeout (cmd 0xec)
Nov 4 12:14:15 server kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 4 12:14:15 server kernel: ata1.00: revalidation failed (errno=-5)
Nov 4 12:14:15 server kernel: ata1: failed to recover some devices, retrying in 5 secs
Nov 4 12:14:20 server kernel: ata1: hard resetting port
Nov 4 12:14:20 server kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov 4 12:14:20 server kernel: ata1.00: configured for UDMA/100
Nov 4 12:14:20 server kernel: ata1: EH complete
Nov 4 12:14:20 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov 4 12:14:20 server kernel: sda: Write Protect is off
Nov 4 12:14:20 server kernel: sda: Mode Sense: 00 3a 00 00
Nov 4 12:14:20 server kernel: SCSI device sda: drive cache: write back
Nov 4 12:14:21 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
Nov 4 12:14:21 server kernel: ata1.00: (BMDMA stat 0x60)
Nov 4 12:14:21 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x4 (device error)
Nov 4 12:14:21 server kernel: ata1: EH complete
Nov 4 12:14:21 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov 4 12:14:21 server kernel: sda: Write Protect is off
Nov 4 12:14:21 server kernel: sda: Mode Sense: 00 3a 00 00
Nov 4 12:14:21 server kernel: SCSI device sda: drive cache: write back
Nov 4 12:14:21 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
Nov 4 12:14:21 server kernel: ata1.00: (BMDMA stat 0x60)
Nov 4 12:14:21 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x4 (device error)
Nov 4 12:14:21 server kernel: ata1: EH complete
Nov 4 12:14:21 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov 4 12:14:21 server kernel: sda: Write Protect is off
Nov 4 12:14:21 server kernel: sda: Mode Sense: 00 3a 00 00
Nov 4 12:14:21 server kernel: SCSI device sda: drive cache: write back
Nov 4 12:14:21 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
Nov 4 12:14:21 server kernel: ata1.00: (BMDMA stat 0x60)
Nov 4 12:14:21 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x4 (device error)
Nov 4 12:14:21 server kernel: ata1: EH complete
Nov 4 12:14:21 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov 4 12:14:21 server kernel: sda: Write Protect is off
Nov 4 12:14:21 server kernel: sda: Mode Sense: 00 3a 00 00
Nov 4 12:14:21 server kernel: SCSI device sda: drive cache: write back
Nov 4 12:14:21 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x2 frozen
Nov 4 12:14:21 server kernel: ata1.00: (BMDMA stat 0x64)
Nov 4 12:14:21 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x2 stat 0xff err 0xff (HSM violation)
Nov 4 12:14:22 server kernel: ata1: soft resetting port
Nov 4 12:14:22 server kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov 4 12:14:52 server kernel: ata1.00: qc timeout (cmd 0xec)
Nov 4 12:14:52 server kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 4 12:14:52 server kernel: ata1.00: revalidation failed (errno=-5)
Nov 4 12:14:52 server kernel: ata1: failed to recover some devices, retrying in 5 secs
Nov 4 12:14:57 server kernel: ata1: hard resetting port
Nov 4 12:14:57 server kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov 4 12:14:57 server kernel: ata1.00: configured for UDMA/100
Nov 4 12:14:57 server kernel: ata1: EH complete
Nov 4 12:14:57 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov 4 12:14:57 server kernel: sda: Write Protect is off
Nov 4 12:14:57 server kernel: sda: Mode Sense: 00 3a 00 00
Nov 4 12:14:57 server kernel: SCSI device sda: drive cache: write back
Nov 4 12:16:26 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
Nov 4 12:16:26 server kernel: ata1.00: (BMDMA stat 0x60)
Nov 4 12:16:26 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x4 (device error)
Nov 4 12:16:26 server kernel: ata1: EH complete
Nov 4 12:16:26 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov 4 12:16:26 server kernel: sda: Write Protect is off
Nov 4 12:16:26 server kernel: sda: Mode Sense: 00 3a 00 00
Nov 4 12:16:26 server kernel: SCSI device sda: drive cache: write back
Nov 4 12:16:28 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
Nov 4 12:16:28 server kernel: ata1.00: (BMDMA stat 0x60)
Nov 4 12:16:28 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x4 (device error)
Nov 4 12:16:28 server kernel: ata1: EH complete
Nov 4 12:16:28 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov 4 12:16:28 server kernel: sda: Write Protect is off
Nov 4 12:16:28 server kernel: sda: Mode Sense: 00 3a 00 00
Nov 4 12:16:28 server kernel: SCSI device sda: drive cache: write back
Nov 4 12:16:58 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x2 frozen
Nov 4 12:16:58 server kernel: ata1.00: (BMDMA stat 0x60)
Nov 4 12:16:58 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
Nov 4 12:16:58 server kernel: ata1: soft resetting port
Nov 4 12:16:58 server kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov 4 12:16:59 server kernel: ata1.00: configured for UDMA/100
Nov 4 12:16:59 server kernel: ata1: EH complete
Nov 4 12:16:59 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov 4 12:16:59 server kernel: sda: Write Protect is off
Nov 4 12:16:59 server kernel: sda: Mode Sense: 00 3a 00 00
Nov 4 12:16:59 server kernel: SCSI device sda: drive cache: write back
Nov 4 12:17:00 server kernel: ata1.00: limiting speed to UDMA/66
Nov 4 12:17:00 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x2 frozen
Nov 4 12:17:00 server kernel: ata1.00: (BMDMA stat 0x64)
Nov 4 12:17:00 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x2 stat 0xff err 0xff (HSM violation)
Nov 4 12:17:00 server kernel: ata1: soft resetting port
Nov 4 12:17:00 server kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov 4 12:17:01 server /USR/SBIN/CRON[3483]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Nov 4 12:17:30 server kernel: ata1.00: qc timeout (cmd 0xec)
Nov 4 12:17:30 server kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 4 12:17:30 server kernel: ata1.00: revalidation failed (errno=-5)
Nov 4 12:17:30 server kernel: ata1: failed to recover some devices, retrying in 5 secs
Nov 4 12:17:35 server kernel: ata1: hard resetting port
Nov 4 12:17:35 server kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov 4 12:17:35 server kernel: ata1.00: configured for UDMA/66
Nov 4 12:17:35 server kernel: ata1: EH complete
Nov 4 12:17:35 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov 4 12:17:35 server kernel: sda: Write Protect is off
Nov 4 12:17:35 server kernel: sda: Mode Sense: 00 3a 00 00
Nov 4 12:17:35 server kernel: SCSI device sda: drive cache: write back

Does anyone have an idea what is wrong? Is it a hardware or a software problem?

Edited 4 November 2007 by endrebjorsvik
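Editorial aside on reading a log like this: the `SErr` field is a bitmask taken from the SATA SError register, and the text in parentheses at the end of each `tag ... cmd` line names the error class. The following is a minimal Python sketch of decoding both. It assumes the diagnostic bit positions used in the Linux libata headers; the function names and the shortened sample are illustrative and not part of the original post.

```python
import re
from collections import Counter

# Diagnostic bits of the SATA SError register. Bit positions follow the
# Linux libata headers as far as I know; treat the table as an assumption.
SERR_BITS = {
    16: "PhyRdy change",
    17: "PHY internal error",
    18: "Comm wake",
    19: "10b/8b decode error",
    20: "Disparity error",
    21: "CRC error",
    22: "Handshake error",
    23: "Link sequence error",
    24: "Transport state transition error",
    25: "Unrecognized FIS",
    26: "Device exchanged",
}


def decode_serr(value):
    """Return the names of the diagnostic bits set in an SError value."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


def count_error_classes(log_text):
    """Count the parenthesised error class ending each 'tag ... cmd' line."""
    pattern = re.compile(r"tag \d+ cmd 0x[0-9a-f]+ .*\(([^)]+)\)$", re.MULTILINE)
    return Counter(pattern.findall(log_text))


# Shortened excerpt of the kernel log quoted in the post (hypothetical selection).
SAMPLE = """\
Nov 4 12:13:44 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
Nov 4 12:13:44 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x4 (device error)
Nov 4 12:13:44 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x2 stat 0xff err 0xff (HSM violation)
Nov 4 12:16:58 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
"""

serr = int(re.search(r"SErr (0x[0-9a-f]+)", SAMPLE).group(1), 16)
print(decode_serr(serr))
print(count_error_classes(SAMPLE))
```

For the value 0x280000 seen throughout this log, the sketch reports a 10b/8b decode error and a CRC error. Both are link-level conditions, which would tend to point at the signalling path between controller and disk (cable, connector, controller) rather than at the disk media, although a log alone cannot settle that.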
vega77, posted 8 November 2007

How about checking the SMART status of your disks? Install Webmin on your server, for example, and check through that.
endrebjo (author), posted 8 November 2007 (edited)

Good idea to check the SMART status. Is smartmontools good enough? It seems less extensive than Webmin.

The problem disk is brand new, by the way, and so is the controller. I have not done anything at all to install them: no drivers and no configuration (just mount). Could that be where the problem lies?

Edited 8 November 2007 by endrebjorsvik
vega77, posted 8 November 2007

As long as it is formatted, the rest should sort itself out. smartmontools probably works too, though I have never tried it. New disks can be faulty as well. A few weeks ago I returned a 750 GB Hitachi. It had been in service for less than 24 hours and was nothing but trouble.
endrebjo (author), posted 8 November 2007

I am running a SMART extended self-test on the problem disk now ($ smartctl -t long -d ata /dev/sda). It will finish in about an hour and a half. The system disk took much less time (20 min). Do the SMART values from the system disk look OK?

=== START OF INFORMATION SECTION ===
Model Family:     IBM/Hitachi Deskstar 120GXP family
Device Model:     IC35L080AVVA07-0
Serial Number:    VNC402A4D0JEXA
Firmware Version: VA4OA52A
User Capacity:    82,348,277,760 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
Local Time is:    Thu Nov 8 18:58:42 2007 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (2288) seconds.
Offline data collection capabilities: (0x1b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. No General Purpose Logging support.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 38) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b 099   099   060    Pre-fail Always  -           2
  2 Throughput_Performance  0x0005 100   100   050    Pre-fail Offline -           0
  3 Spin_Up_Time            0x0007 102   102   024    Pre-fail Always  -           269 (Average 269)
  4 Start_Stop_Count        0x0012 100   100   000    Old_age  Always  -           1643
  5 Reallocated_Sector_Ct   0x0033 100   100   005    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x000b 100   100   067    Pre-fail Always  -           0
  8 Seek_Time_Performance   0x0005 100   100   020    Pre-fail Offline -           0
  9 Power_On_Hours          0x0012 099   099   000    Old_age  Always  -           9144
 10 Spin_Retry_Count        0x0013 100   100   060    Pre-fail Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           1613
192 Power-Off_Retract_Count 0x0032 099   099   050    Old_age  Always  -           1786
193 Load_Cycle_Count        0x0012 099   099   050    Old_age  Always  -           1786
194 Temperature_Celsius     0x0002 166   166   000    Old_age  Always  -           33 (Lifetime Min/Max 2/54)
196 Reallocated_Event_Count 0x0032 100   100   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0022 100   100   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0008 100   100   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x000a 200   200   000    Old_age  Always  -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description  Status                  Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline  Completed without error 00%       9144            -

Device does not support Selective Self Tests/Logging

Even though it is a "DeathStar", it has been faithful for maybe 8-9 years (it shipped with Win98).
endrebjo (author), posted 8 November 2007

Can anyone say something about the SMART values of this disk? This is the new one, the one I may be having problems with.

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD5000AAKS-75TMA0
Serial Number:    WD-WCAPW3670592
Firmware Version: 12.01C01
User Capacity:    500,107,862,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Nov 8 21:17:51 2007 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (12000) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 150) minutes.
Conveyance self-test routine recommended polling time: ( 6) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f 200   200   051    Pre-fail Always  -           0
  3 Spin_Up_Time            0x0003 177   177   021    Pre-fail Always  -           6108
  4 Start_Stop_Count        0x0032 100   100   000    Old_age  Always  -           7
  5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x000e 200   200   051    Old_age  Always  -           0
  9 Power_On_Hours          0x0032 100   100   000    Old_age  Always  -           594
 10 Spin_Retry_Count        0x0012 100   253   051    Old_age  Always  -           0
 11 Calibration_Retry_Count 0x0012 100   253   051    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           7
192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           1
193 Load_Cycle_Count        0x0032 200   200   000    Old_age  Always  -           7
194 Temperature_Celsius     0x0022 114   106   000    Old_age  Always  -           36
196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0012 200   200   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0010 200   200   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x003e 200   198   000    Old_age  Always  -           75
200 Multi_Zone_Error_Rate   0x0008 200   200   051    Old_age  Offline -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description  Status                  Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline  Completed without error 00%       594             -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
   1       0       0 Not_testing
   2       0       0 Not_testing
   3       0       0 Not_testing
   4       0       0 Not_testing
   5       0       0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Attributes 10 and 11 look a bit odd.
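Editorial aside: one way to read an attribute table like this systematically is to compare each normalised VALUE with its THRESH, and separately to look at raw counters that are normally expected to stay at zero. The Python sketch below does that for a few rows copied from the output above. The parser, the helper names and the selection of "expected zero" attributes are assumptions made for illustration; none of this is part of smartmontools.

```python
from typing import NamedTuple


class Attr(NamedTuple):
    ident: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: str


def parse_attributes(table):
    """Parse smartctl-style attribute rows, one attribute per line."""
    attrs = []
    for line in table.splitlines():
        # Ten columns; the last one (RAW_VALUE) may itself contain spaces.
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit():
            attrs.append(Attr(int(fields[0]), fields[1], int(fields[3]),
                              int(fields[4]), int(fields[5]), fields[9].strip()))
    return attrs


# Raw counters commonly expected to stay at 0 on a healthy setup
# (an illustrative selection, not an official list).
ZERO_EXPECTED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def findings(attrs):
    """Return human-readable notes for attributes that deserve a closer look."""
    notes = []
    for a in attrs:
        if a.thresh > 0 and a.value <= a.thresh:
            notes.append(f"{a.name}: normalised value {a.value} "
                         f"at or below threshold {a.thresh}")
        if a.name in ZERO_EXPECTED and a.raw.split()[0] != "0":
            notes.append(f"{a.name}: raw value {a.raw} is non-zero")
    return notes


# A few rows copied from the WD5000AAKS output in the post.
SAMPLE = """\
  3 Spin_Up_Time            0x0003 177 177 021 Pre-fail Always - 6108
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always - 0
 10 Spin_Retry_Count        0x0012 100 253 051 Old_age  Always - 0
199 UDMA_CRC_Error_Count    0x003e 200 198 000 Old_age  Always - 75
"""

for note in findings(parse_attributes(SAMPLE)):
    print(note)
```

On these rows the only note produced is for UDMA_CRC_Error_Count, whose raw value is 75. That counter records transfers that failed their checksum on the interface between disk and controller, so a non-zero value is often associated with cabling or connector trouble rather than with the disk surface. It is a hint worth following up, not a diagnosis.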
endrebjo (author), posted 9 November 2007 (edited)

After comparing with this table, it looks like only Spin_Up_Time is odd. 6 seconds is more than usual, isn't it?

Edit: No. That is actually not so bad. But then there is nothing wrong with the SMART values. So what could be wrong?

Edited 9 November 2007 by endrebjorsvik
endrebjo (author), posted 15 November 2007

There must have been a bad connection somewhere. I took everything apart, tested it in another PC, moved it back again afterwards, and now it works. SATA plugs do not sit as firmly as the good old PATA connectors. It is just a shame that PATA cables have to be so thick and awful.