The server hangs when I transfer large files.


I have a file server that is up and running. Specs:

- Debian 4.0 r1
- Gigabyte motherboard with i810 chipset
- Pentium III 600 MHz
- 256 MB SDRAM
- SDM PCI SATA RAID 2P (Sil3512) controller
- 500 GB SATA disk
- 80 GB PATA disk

But it hangs when I copy large files (e.g. 200 MB). This happens both when I transfer from the SATA disk to the PATA disk and when I copy from the SATA disk to the SATA disk.

In the log file it looks like the SATA disk is the problem.

Nov  4 12:13:44 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
Nov  4 12:13:44 server kernel: ata1.00: (BMDMA stat 0x60)
Nov  4 12:13:44 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x4 (device error)
Nov  4 12:13:44 server kernel: ata1: EH complete
Nov  4 12:13:44 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov  4 12:13:44 server kernel: sda: Write Protect is off
Nov  4 12:13:44 server kernel: sda: Mode Sense: 00 3a 00 00
Nov  4 12:13:44 server kernel: SCSI device sda: drive cache: write back
Nov  4 12:13:44 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x2 frozen
Nov  4 12:13:44 server kernel: ata1.00: (BMDMA stat 0x64)
Nov  4 12:13:44 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x2 stat 0xff err 0xff (HSM violation)
Nov  4 12:13:45 server kernel: ata1: soft resetting port
Nov  4 12:13:45 server kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov  4 12:14:15 server kernel: ata1.00: qc timeout (cmd 0xec)
Nov  4 12:14:15 server kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov  4 12:14:15 server kernel: ata1.00: revalidation failed (errno=-5)
Nov  4 12:14:15 server kernel: ata1: failed to recover some devices, retrying in 5 secs
Nov  4 12:14:20 server kernel: ata1: hard resetting port
Nov  4 12:14:20 server kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov  4 12:14:20 server kernel: ata1.00: configured for UDMA/100
Nov  4 12:14:20 server kernel: ata1: EH complete
Nov  4 12:14:20 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov  4 12:14:20 server kernel: sda: Write Protect is off
Nov  4 12:14:20 server kernel: sda: Mode Sense: 00 3a 00 00
Nov  4 12:14:20 server kernel: SCSI device sda: drive cache: write back
Nov  4 12:14:21 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
Nov  4 12:14:21 server kernel: ata1.00: (BMDMA stat 0x60)
Nov  4 12:14:21 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x4 (device error)
Nov  4 12:14:21 server kernel: ata1: EH complete
Nov  4 12:14:21 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov  4 12:14:21 server kernel: sda: Write Protect is off
Nov  4 12:14:21 server kernel: sda: Mode Sense: 00 3a 00 00
Nov  4 12:14:21 server kernel: SCSI device sda: drive cache: write back
Nov  4 12:14:21 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
Nov  4 12:14:21 server kernel: ata1.00: (BMDMA stat 0x60)
Nov  4 12:14:21 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x4 (device error)
Nov  4 12:14:21 server kernel: ata1: EH complete
Nov  4 12:14:21 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov  4 12:14:21 server kernel: sda: Write Protect is off
Nov  4 12:14:21 server kernel: sda: Mode Sense: 00 3a 00 00
Nov  4 12:14:21 server kernel: SCSI device sda: drive cache: write back
Nov  4 12:14:21 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
Nov  4 12:14:21 server kernel: ata1.00: (BMDMA stat 0x60)
Nov  4 12:14:21 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x4 (device error)
Nov  4 12:14:21 server kernel: ata1: EH complete
Nov  4 12:14:21 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov  4 12:14:21 server kernel: sda: Write Protect is off
Nov  4 12:14:21 server kernel: sda: Mode Sense: 00 3a 00 00
Nov  4 12:14:21 server kernel: SCSI device sda: drive cache: write back
Nov  4 12:14:21 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x2 frozen
Nov  4 12:14:21 server kernel: ata1.00: (BMDMA stat 0x64)
Nov  4 12:14:21 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x2 stat 0xff err 0xff (HSM violation)
Nov  4 12:14:22 server kernel: ata1: soft resetting port
Nov  4 12:14:22 server kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov  4 12:14:52 server kernel: ata1.00: qc timeout (cmd 0xec)
Nov  4 12:14:52 server kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov  4 12:14:52 server kernel: ata1.00: revalidation failed (errno=-5)
Nov  4 12:14:52 server kernel: ata1: failed to recover some devices, retrying in 5 secs
Nov  4 12:14:57 server kernel: ata1: hard resetting port
Nov  4 12:14:57 server kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov  4 12:14:57 server kernel: ata1.00: configured for UDMA/100
Nov  4 12:14:57 server kernel: ata1: EH complete
Nov  4 12:14:57 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov  4 12:14:57 server kernel: sda: Write Protect is off
Nov  4 12:14:57 server kernel: sda: Mode Sense: 00 3a 00 00
Nov  4 12:14:57 server kernel: SCSI device sda: drive cache: write back
Nov  4 12:16:26 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
Nov  4 12:16:26 server kernel: ata1.00: (BMDMA stat 0x60)
Nov  4 12:16:26 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x4 (device error)
Nov  4 12:16:26 server kernel: ata1: EH complete
Nov  4 12:16:26 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov  4 12:16:26 server kernel: sda: Write Protect is off
Nov  4 12:16:26 server kernel: sda: Mode Sense: 00 3a 00 00
Nov  4 12:16:26 server kernel: SCSI device sda: drive cache: write back
Nov  4 12:16:28 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
Nov  4 12:16:28 server kernel: ata1.00: (BMDMA stat 0x60)
Nov  4 12:16:28 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x4 (device error)
Nov  4 12:16:28 server kernel: ata1: EH complete
Nov  4 12:16:28 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov  4 12:16:28 server kernel: sda: Write Protect is off
Nov  4 12:16:28 server kernel: sda: Mode Sense: 00 3a 00 00
Nov  4 12:16:28 server kernel: SCSI device sda: drive cache: write back
Nov  4 12:16:58 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x2 frozen
Nov  4 12:16:58 server kernel: ata1.00: (BMDMA stat 0x60)
Nov  4 12:16:58 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
Nov  4 12:16:58 server kernel: ata1: soft resetting port
Nov  4 12:16:58 server kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov  4 12:16:59 server kernel: ata1.00: configured for UDMA/100
Nov  4 12:16:59 server kernel: ata1: EH complete
Nov  4 12:16:59 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov  4 12:16:59 server kernel: sda: Write Protect is off
Nov  4 12:16:59 server kernel: sda: Mode Sense: 00 3a 00 00
Nov  4 12:16:59 server kernel: SCSI device sda: drive cache: write back
Nov  4 12:17:00 server kernel: ata1.00: limiting speed to UDMA/66
Nov  4 12:17:00 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x2 frozen
Nov  4 12:17:00 server kernel: ata1.00: (BMDMA stat 0x64)
Nov  4 12:17:00 server kernel: ata1.00: tag 0 cmd 0xc8 Emask 0x2 stat 0xff err 0xff (HSM violation)
Nov  4 12:17:00 server kernel: ata1: soft resetting port
Nov  4 12:17:00 server kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov  4 12:17:01 server /USR/SBIN/CRON[3483]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Nov  4 12:17:30 server kernel: ata1.00: qc timeout (cmd 0xec)
Nov  4 12:17:30 server kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov  4 12:17:30 server kernel: ata1.00: revalidation failed (errno=-5)
Nov  4 12:17:30 server kernel: ata1: failed to recover some devices, retrying in 5 secs
Nov  4 12:17:35 server kernel: ata1: hard resetting port
Nov  4 12:17:35 server kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov  4 12:17:35 server kernel: ata1.00: configured for UDMA/66
Nov  4 12:17:35 server kernel: ata1: EH complete
Nov  4 12:17:35 server kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Nov  4 12:17:35 server kernel: sda: Write Protect is off
Nov  4 12:17:35 server kernel: sda: Mode Sense: 00 3a 00 00
Nov  4 12:17:35 server kernel: SCSI device sda: drive cache: write back

Does anyone have a clue what is wrong? Is it a hardware or a software problem?

Edited by endrebjorsvik
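For anyone trying to read the hex fields in the log above, here is a minimal sketch of how they can be decoded. It assumes the standard SATA SError diagnostic bit layout (bits 16-26) and the classic ATA status and error register bits; the table names, `decode` and `tally` are illustrative helpers of my own, not anything the kernel provides. The three sample lines are copied from the log.

```python
import re
from collections import Counter

# SATA SError register, diagnostic half (assumed standard layout).
SERR_BITS = {
    16: "PHYRDY change",
    17: "PHY internal error",
    18: "COMWAKE detected",
    19: "10b-to-8b decode error",
    20: "disparity error",
    21: "CRC error",
    22: "handshake error",
    23: "link sequence error",
    24: "transport state transition error",
    25: "unrecognized FIS type",
    26: "exchanged",
}

# ATA status register bits (bit 7 down to bit 0).
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 4: "DSC",
               3: "DRQ", 2: "CORR", 1: "IDX", 0: "ERR"}

# A few ATA error register bits; not an exhaustive list.
ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}


def decode(value, names):
    """Return the names of all bits set in value, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]


# Matches the outcome libata prints in parentheses at the end of a "tag" line.
OUTCOME = re.compile(r"\((device error|HSM violation|timeout)\)\s*$")


def tally(lines):
    """Count how often each outcome appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = OUTCOME.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE_LOG = [
    "ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x4 (device error)",
    "ata1.00: tag 0 cmd 0xc8 Emask 0x2 stat 0xff err 0xff (HSM violation)",
    "ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)",
]

print(decode(0x280000, SERR_BITS))  # ['10b-to-8b decode error', 'CRC error']
print(decode(0x51, STATUS_BITS))
print(decode(0x04, ERROR_BITS))
print(tally(SAMPLE_LOG))
```

Under those assumptions, SErr 0x280000 is a decode error plus a CRC error on the SATA link, which would fit a signal problem between disk and controller at least as well as a failing disk or a driver issue.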

Good idea to check the SMART status. :)

Is smartmontools good enough? It seems less extensive than Webmin.

By the way, the problem disk is brand new, and so is the controller. I have not done anything at all to install them: no drivers, no configuration (just mount). Could that be where the problem lies?


As long as it is formatted, the rest should sort itself out.

smartmontools probably works too, though I have never tested it.

New disks can be faulty too. A few weeks ago I returned a 750 GB Hitachi. It had been running for less than 24 hours and was nothing but trouble.


I am running a SMART self-test on the problem disk right now ($ smartctl -t long -d ata /dev/sda). It will finish in an hour and a half.
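A small sketch of how that command and the follow-up that reads the self-test log could be built from a script. The function names are hypothetical; the flags are the `-t` and `-d ata` from the command above plus smartctl's `-l selftest`. The script only prints the commands, so nothing touches the disk until you run them yourself as root.

```python
import shlex

TEST_KINDS = {"short", "long", "conveyance"}


def selftest_argv(device, kind="long", dev_type="ata"):
    """Build the argv that starts a self-test, mirroring the command in the post."""
    if kind not in TEST_KINDS:
        raise ValueError(f"unknown self-test kind: {kind!r}")
    return ["smartctl", "-t", kind, "-d", dev_type, device]


def selftest_log_argv(device, dev_type="ata"):
    """Build the argv that prints the self-test log once the test has finished."""
    return ["smartctl", "-l", "selftest", "-d", dev_type, device]


for argv in (selftest_argv("/dev/sda"), selftest_log_argv("/dev/sda")):
    # Only print the command; pass argv to subprocess.run() to execute it.
    print(shlex.join(argv))
```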

The system disk took much less time (20 min). Do its SMART values look OK?

=== START OF INFORMATION SECTION ===
Model Family:	 IBM/Hitachi Deskstar 120GXP family
Device Model:	 IC35L080AVVA07-0
Serial Number:	VNC402A4D0JEXA
Firmware Version: VA4OA52A
User Capacity:	82,348,277,760 bytes
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
Local Time is:	Thu Nov  8 18:58:42 2007 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
									was never started.
									Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0) The previous self-test routine completed
									without error or no self-test has ever 
									been run.
Total time to complete Offline 
data collection:				 (2288) seconds.
Offline data collection
capabilities:					(0x1b) SMART execute Offline immediate.
									Auto Offline data collection on/off support.
									Suspend Offline collection upon new
									command.
									Offline surface scan supported.
									Self-test supported.
									No Conveyance Self-test supported.
									No Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
									power-saving mode.
									Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
									No General Purpose Logging support.
Short self-test routine 
recommended polling time:		(   1) minutes.
Extended self-test routine
recommended polling time:		(  38) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
 1 Raw_Read_Error_Rate	 0x000b   099   099   060	Pre-fail  Always	   -	   2
 2 Throughput_Performance  0x0005   100   100   050	Pre-fail  Offline	  -	   0
 3 Spin_Up_Time			0x0007   102   102   024	Pre-fail  Always	   -	   269 (Average 269)
 4 Start_Stop_Count		0x0012   100   100   000	Old_age   Always	   -	   1643
 5 Reallocated_Sector_Ct   0x0033   100   100   005	Pre-fail  Always	   -	   0
 7 Seek_Error_Rate		 0x000b   100   100   067	Pre-fail  Always	   -	   0
 8 Seek_Time_Performance   0x0005   100   100   020	Pre-fail  Offline	  -	   0
 9 Power_On_Hours		  0x0012   099   099   000	Old_age   Always	   -	   9144
10 Spin_Retry_Count		0x0013   100   100   060	Pre-fail  Always	   -	   0
12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   1613
192 Power-Off_Retract_Count 0x0032   099   099   050	Old_age   Always	   -	   1786
193 Load_Cycle_Count		0x0012   099   099   050	Old_age   Always	   -	   1786
194 Temperature_Celsius	 0x0002   166   166   000	Old_age   Always	   -	   33 (Lifetime Min/Max 2/54)
196 Reallocated_Event_Count 0x0032   100   100   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0022   100   100   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0008   100   100   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x000a   200   200   000	Old_age   Always	   -	   0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed without error	   00%	  9144		 -

Device does not support Selective Self Tests/Logging

Even though it is a "DeathStar", it has been faithful for maybe 8-9 years (it shipped with Win98).


Can anyone say something about the SMART values of this disk? This is the new one, the one I may be having problems with.

=== START OF INFORMATION SECTION ===
Device Model:	 WDC WD5000AAKS-75TMA0
Serial Number:	WD-WCAPW3670592
Firmware Version: 12.01C01
User Capacity:	500,107,862,016 bytes
Device is:		Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:	Thu Nov  8 21:17:51 2007 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
									was completed without error.
									Auto Offline Data Collection: Enabled.
Self-test execution status:	  (   0) The previous self-test routine completed
									without error or no self-test has ever 
									been run.
Total time to complete Offline 
data collection:				 (12000) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
									Auto Offline data collection on/off support.
									Suspend Offline collection upon new
									command.
									Offline surface scan supported.
									Self-test supported.
									Conveyance Self-test supported.
									Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
									power-saving mode.
									Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
									General Purpose Logging supported.
Short self-test routine 
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 150) minutes.
Conveyance self-test routine
recommended polling time:		(   6) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
 1 Raw_Read_Error_Rate	 0x000f   200   200   051	Pre-fail  Always	   -	   0
 3 Spin_Up_Time			0x0003   177   177   021	Pre-fail  Always	   -	   6108
 4 Start_Stop_Count		0x0032   100   100   000	Old_age   Always	   -	   7
 5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
 7 Seek_Error_Rate		 0x000e   200   200   051	Old_age   Always	   -	   0
 9 Power_On_Hours		  0x0032   100   100   000	Old_age   Always	   -	   594
10 Spin_Retry_Count		0x0012   100   253   051	Old_age   Always	   -	   0
11 Calibration_Retry_Count 0x0012   100   253   051	Old_age   Always	   -	   0
12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   7
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   1
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   7
194 Temperature_Celsius	 0x0022   114   106   000	Old_age   Always	   -	   36
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0012   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0010   200   200   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x003e   200   198   000	Old_age   Always	   -	   75
200 Multi_Zone_Error_Rate   0x0008   200   200   051	Old_age   Offline	  -	   0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed without error	   00%	   594		 -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
1		0		0  Not_testing
2		0		0  Not_testing
3		0		0  Not_testing
4		0		0  Not_testing
5		0		0  Not_testing
Selective self-test flags (0x0):
 After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

10 and 11 look a bit odd.
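One way to go through a table like the one above is to pull out the raw counters that are usually worth a look. This is a rough sketch, not a smartmontools feature: the `WATCH` set and the function names are my own choices, and the sample rows are copied from the output above with the whitespace normalised.

```python
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
 10 Spin_Retry_Count        0x0012   100   253   051    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   198   000    Old_age   Always       -       75
"""

# Attribute IDs whose raw value ideally stays at zero (an illustrative selection).
WATCH = {5, 197, 198, 199}


def parse_attributes(text):
    """Parse smartctl attribute rows into dicts, skipping the header line."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 9)  # the raw value may itself contain spaces
        if len(parts) == 10 and parts[0].isdigit():
            rows.append({
                "id": int(parts[0]),
                "name": parts[1],
                "value": int(parts[3]),
                "worst": int(parts[4]),
                "thresh": int(parts[5]),
                "raw": parts[9].strip(),
            })
    return rows


def flagged(rows):
    """Return (name, raw) for watched attributes with a non-zero raw counter."""
    result = []
    for row in rows:
        first = row["raw"].split()[0]
        if row["id"] in WATCH and first.isdigit() and int(first) > 0:
            result.append((row["name"], int(first)))
    return result


print(flagged(parse_attributes(SAMPLE)))  # [('UDMA_CRC_Error_Count', 75)]
```

On this disk the only non-zero counter in that set is UDMA_CRC_Error_Count (75). It counts CRC errors on the link between disk and controller, so it tends to point at cabling or connectors rather than at the platters.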


There must have been a bad connection somewhere. I took it all apart, tested it in another PC, moved it back again afterwards, and now it works. :)

The SATA plugs do not sit as firmly as the good old PATA connectors. Just a shame that PATA cables have to be so thick and awkward.
