Troublesome software RAID


I have a setup with 5 disks: sda dual-boots WIN7/FC14, and sd[b-e]1 make up a software RAID 5 that is used only by Linux.

One fine(?) day I get the following error messages at boot:

mdadm: failed to start array /dev/md0: Input/output error
Setter opp logisk volumhåndtering: No volume groups found

Sjekker filsystemer
/dev/sda3: clean, 702535/3276800 files, 4962539/13107200 blocks
fsck.ext4: Unable to resolve ´UUID=3431887e-d190-48b6-8c07-a028ca9cb318´

 

Next, I try booting the PC from a USB stick with SystemRescueCd.

There, I start with:

root@sysresccd /root % mdadm -A -s
mdadm: No arrays found in config file or automatically

root@sysresccd /root % mdadm --examine --scan /dev/sd[b-e]1
ARRAY /dev/md0 UUID=a1439113:09de74ea:e94f2cff:2743ab37

root@sysresccd /root % cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md126 : inactive sdb1[0](S) sdd1[2](S) sdc1[1](S)
     2930279808 blocks

md127 : inactive sde1[3](S)
     976759936 blocks

unused devices: <none>

 

So I dig out /etc/mdadm.conf from the Fedora installation. It reads:

# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=a1439113:09de74ea:e94f2cff:2743ab37

 

I try adding that last line to SystemRescueCd's mdadm.conf and running mdadm again:

root@sysresccd /root % mdadm -A -s -v           
mdadm: looking for devices for /dev/md0
mdadm: cannot open device /dev/sdj1: Device or resource busy
mdadm: /dev/sdj1 has wrong uuid.
mdadm: cannot open device /dev/sdj: Device or resource busy
mdadm: /dev/sdj has wrong uuid.
mdadm: no recogniseable superblock on /dev/sda6
mdadm: /dev/sda6 has wrong uuid.
mdadm: no recogniseable superblock on /dev/sda5
mdadm: /dev/sda5 has wrong uuid.
mdadm: no recogniseable superblock on /dev/sda4
mdadm: /dev/sda4 has wrong uuid.
mdadm: cannot open device /dev/sda3: Device or resource busy
mdadm: /dev/sda3 has wrong uuid.
mdadm: no recogniseable superblock on /dev/sda2
mdadm: /dev/sda2 has wrong uuid.
mdadm: no recogniseable superblock on /dev/sda1
mdadm: /dev/sda1 has wrong uuid.
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sda has wrong uuid.
mdadm: cannot open device /dev/sde1: Device or resource busy
mdadm: /dev/sde1 has wrong uuid.
mdadm: cannot open device /dev/sde: Device or resource busy
mdadm: /dev/sde has wrong uuid.
mdadm: cannot open device /dev/sdc1: Device or resource busy
mdadm: /dev/sdc1 has wrong uuid.
mdadm: cannot open device /dev/sdc: Device or resource busy
mdadm: /dev/sdc has wrong uuid.
mdadm: cannot open device /dev/sdd1: Device or resource busy
mdadm: /dev/sdd1 has wrong uuid.
mdadm: cannot open device /dev/sdd: Device or resource busy
mdadm: /dev/sdd has wrong uuid.
mdadm: cannot open device /dev/sdb1: Device or resource busy
mdadm: /dev/sdb1 has wrong uuid.
mdadm: cannot open device /dev/sdb: Device or resource busy
mdadm: /dev/sdb has wrong uuid.
mdadm: cannot open device /dev/loop0: Device or resource busy
mdadm: /dev/loop0 has wrong uuid.

root@sysresccd /root % mdadm --assemble -v /dev/sd[b,c,d,e]1
mdadm: device /dev/sdb1 exists but is not an md array.

 

Then I try to get some more information:

root@sysresccd /root % mdadm -E /dev/sd[b-e]1
/dev/sdb1:
         Magic : a92b4efc
       Version : 0.90.00
          UUID : a1439113:09de74ea:e94f2cff:2743ab37
 Creation Time : Tue Feb 24 12:14:32 2009
    Raid Level : raid5
 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
    Array Size : 2930279424 (2794.53 GiB 3000.61 GB)
  Raid Devices : 4
 Total Devices : 3
Preferred Minor : 0

   Update Time : Sat Apr  2 20:34:19 2011
         State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
 Spare Devices : 0
      Checksum : 36bd99a6 - correct
        Events : 3336472

        Layout : left-symmetric
    Chunk Size : 256K

     Number   Major   Minor   RaidDevice State
this     0       8       17        0      active sync   /dev/sdb1

  0     0       8       17        0      active sync   /dev/sdb1
  1     1       0        0        1      faulty removed
  2     2       8       49        2      active sync   /dev/sdd1
  3     3       8       65        3      active sync   /dev/sde1
/dev/sdc1:
         Magic : a92b4efc
       Version : 0.90.00
          UUID : a1439113:09de74ea:e94f2cff:2743ab37
 Creation Time : Tue Feb 24 12:14:32 2009
    Raid Level : raid5
 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
    Array Size : 2930279424 (2794.53 GiB 3000.61 GB)
  Raid Devices : 4
 Total Devices : 4
Preferred Minor : 0

   Update Time : Fri May 28 22:36:42 2010
         State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
 Spare Devices : 0
      Checksum : 34cb2303 - correct
        Events : 347922

        Layout : left-symmetric
    Chunk Size : 256K

     Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

  0     0       8       17        0      active sync   /dev/sdb1
  1     1       8       33        1      active sync   /dev/sdc1
  2     2       8       49        2      active sync   /dev/sdd1
  3     3       8       65        3      active sync   /dev/sde1
/dev/sdd1:
         Magic : a92b4efc
       Version : 0.90.00
          UUID : a1439113:09de74ea:e94f2cff:2743ab37
 Creation Time : Tue Feb 24 12:14:32 2009
    Raid Level : raid5
 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
    Array Size : 2930279424 (2794.53 GiB 3000.61 GB)
  Raid Devices : 4
 Total Devices : 3
Preferred Minor : 0

   Update Time : Sat Apr  2 20:34:19 2011
         State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
 Spare Devices : 0
      Checksum : 36bd99ca - correct
        Events : 3336472

        Layout : left-symmetric
    Chunk Size : 256K

     Number   Major   Minor   RaidDevice State
this     2       8       49        2      active sync   /dev/sdd1

  0     0       8       17        0      active sync   /dev/sdb1
  1     1       0        0        1      faulty removed
  2     2       8       49        2      active sync   /dev/sdd1
  3     3       8       65        3      active sync   /dev/sde1
/dev/sde1:
         Magic : a92b4efc
       Version : 0.90.00
          UUID : a1439113:09de74ea:e94f2cff:2743ab37
 Creation Time : Tue Feb 24 12:14:32 2009
    Raid Level : raid5
 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
    Array Size : 2930279424 (2794.53 GiB 3000.61 GB)
  Raid Devices : 4
 Total Devices : 3
Preferred Minor : 0

   Update Time : Sat Apr  2 20:34:23 2011
         State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 1
 Spare Devices : 0
      Checksum : 368ab0c8 - correct
        Events : 3336473

        Layout : left-symmetric
    Chunk Size : 256K

     Number   Major   Minor   RaidDevice State
this     3       8       65        3      active sync   /dev/sde1

  0     0       8       17        0      active sync   /dev/sdb1
  1     1       0        0        1      faulty removed
  2     2       8       49        2      active sync   /dev/sdd1
  3     3       8       65        3      active sync   /dev/sde1

 

There seems to be a problem with sdc1 here. Running smartctl gives me:

root@sysresccd /root % smartctl -Hc /dev/sdc1
smartctl 5.40 2010-10-16 r3189 [i486-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
				was never started.
				Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		 (11816) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 198) minutes.
Conveyance self-test routine
recommended polling time: 	 (  21) minutes.
SCT capabilities: 	       (0x003f)	SCT Status supported.
				SCT Error Recovery Control supported.
				SCT Feature Control supported.
				SCT Data Table supported.

 

I'm no expert, but the disk seems to be alive?

Can any of the gurus on this forum tell what is wrong here, and how I can get the RAID up and running again?

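Yes, it looks like /dev/sdc1 has fallen out of step with the other three. Its superblock was last updated on 28 May 2010 at event count 347922, while sdb1, sdd1 and sde1 were updated on 2 April 2011 at about 3336472 and list slot 1 as "faulty removed".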

 

If you're confident that /dev/sdc1 is the one that has fallen out, you can try assembling the array from the other three disks:

mdadm --assemble /dev/md0 /dev/sd[bde]1

 

If the assembly succeeds, /proc/mdstat should show md0 as active but degraded, running on three of its four disks.

If you then trust sdc's hardware, you can try adding it back; /proc/mdstat should then show the array rebuilding onto it.

mdadm /dev/md0 --add /dev/sdc1

Hmm... now it apparently can't open the partitions, or they are missing a "superblock" (whether I leave out sdc1 or sdb1). What on earth does this mean?

root@sysresccd /root % mdadm --assemble /dev/md0 /dev/sd[b,d,e]1
mdadm: cannot open device /dev/sdb1: Device or resource busy
mdadm: /dev/sdb1 has no superblock - assembly aborted

root@sysresccd /root % mdadm --assemble /dev/md0 /dev/sd[c-e]1  
mdadm: cannot open device /dev/sdc1: Device or resource busy
mdadm: /dev/sdc1 has no superblock - assembly aborted


If the rescue CD/USB you're using has already tried to start the array, it may have created some inactive arrays, like the ones you showed earlier:

md126 : inactive sdb1[0](S) sdd1[2](S) sdc1[1](S)
     2930279808 blocks

md127 : inactive sde1[3](S)
     976759936 blocks

In that case, you need to stop them first:

mdadm --stop /dev/md126 /dev/md127
