so - just another harddrive failed

 
Bob Winter
Ranch Hand
Posts: 127
6
Ah, I remember the days when I was one of the lucky kids with a hard drive bigger than 2 GB (it was a 4 GB drive, if I remember correctly). Still, the extra 400 MB drive I got from a schoolmate was a real advantage.
Fast forward about 20 years: today I have five 3 TB drives linked together as a RAID 5, which gives about 12 TB of contiguous disk space. Compared to that poor 400 MB drive back then, that's several orders of magnitude more.
When I set up the RAID I initially put in five identical drives of type Seagate Barracuda st3000dm001-1er166.
Why? To take advantage of what RAID 5 has to offer? No, just because I can.
The first drive failed about a year ago, and as I didn't have any spare drives on hand I ordered 3 new ones (st3000dm008-2dm116), so basically the same model as before but a newer revision.
The first rebuild took about 14 hours. If only I had known that before I got the stupid idea of filling my case with a bunch of hard drives, I would have used proper backups instead.
Anyway, thanks to how RAID 5 works I didn't lose a single bit of data. Sure, most of it I could have just re-downloaded from the net, but I also have some important personal data on those drives.
Skip ahead about half a year and all of a sudden my OS drive crashed. Well, OK, it was a very old notebook drive I stole from some system 10 years ago, so it should have failed a long time ago. I'm still impressed it survived that long, as it was a WD drive. Long story short: I replaced it with yet another Seagate Barracuda (a st3500418basq this time) and was lucky enough to pull off some important data.
And today, just about 2 hours ago, my RAID 5 went critical once again, all of a sudden, without the log showing any issues over the last few days (which it often does when a drive is about to fail).
Luckily I had a spare drive on hand, so replacing the drive and starting the rebuild took only about half an hour this time, instead of a week of waiting for new drives.
As I was already at 15% an hour after starting the rebuild, it looks like it will finish sooner this time.
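As a rough sanity check on that guess, the progress figure can be extrapolated. This is only a minimal sketch that assumes the rebuild runs at a constant rate, which a real controller does not guarantee; the function names are made up for illustration.

```python
def estimate_total_rebuild_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linearly extrapolate the total rebuild time from the progress so far.

    Assumption: the rebuild speed stays constant over the whole array.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


def format_hours(hours: float) -> str:
    """Render a duration in hours as e.g. '6h40m'."""
    total_minutes = round(hours * 60)
    return f"{total_minutes // 60}h{total_minutes % 60:02d}m"


# 15% done one hour after starting the rebuild
total = estimate_total_rebuild_hours(elapsed_hours=1.0, fraction_done=0.15)
print("estimated total:    ", format_hours(total))        # 6h40m
print("estimated remaining:", format_hours(total - 1.0))  # 5h40m
```

So at this pace the rebuild would land at well under half of the 14 hours the first one took.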

TL;DR: it's true that a RAID is not a backup, and with RAID 5 there's a chance of another drive failing during the rebuild, which would end in data loss. Hence I plan to migrate to a RAID 6 solution soon.
Lesson learned from these failures: always have a backup on another drive.
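To put numbers on the RAID 5 vs. RAID 6 trade-off for an array like this one, here is a small sketch. It uses the textbook capacity rules for equally sized drives and ignores filesystem and controller overhead; the helper names are hypothetical.

```python
def raid_usable_tb(level: int, drives: int, drive_tb: float) -> float:
    """Usable capacity in TB for an array of equally sized drives."""
    min_drives = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in min_drives:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < min_drives[level]:
        raise ValueError(f"RAID {level} needs at least {min_drives[level]} drives")
    if level == 0:
        return drives * drive_tb  # striping only, no redundancy
    if level == 1:
        return drive_tb           # every drive holds the same data
    parity_drives = 1 if level == 5 else 2
    return (drives - parity_drives) * drive_tb


def tolerated_failures(level: int, drives: int) -> int:
    """How many drives may fail before data is lost."""
    return {0: 0, 1: drives - 1, 5: 1, 6: 2}[level]


for level in (5, 6):
    print(f"RAID {level}: {raid_usable_tb(level, 5, 3.0):.0f} TB usable, "
          f"survives {tolerated_failures(level, 5)} failed drive(s)")
# RAID 5: 12 TB usable, survives 1 failed drive(s)
# RAID 6: 9 TB usable, survives 2 failed drive(s)
```

With the same five 3 TB drives, RAID 6 gives up 3 TB of space in exchange for surviving a second failure during a rebuild.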

How about you? Do you use RAID? If so: what type and how many drives? What do you use it for? And have you had a drive fail yet?
I would like to hear others' stories about using RAID.
 
Tim Cooke
Sheriff
Posts: 4869
317
IntelliJ IDE Python Java Linux
Nothing anywhere near as elaborate as you. My data requirements are low so I have 2 x 1TB disks configured as RAID 1 (Mirror) which gives a logical disk of 1TB. I do it that way because it's simple and the 2 bay NAS is a nice size to tuck away in a cupboard. No failures yet so can't attest to the ease of replacement process.
 
Stephan van Hulst
Saloon Keeper
Posts: 12027
257
Somehow I'm lucky that I've never had a hard drive fail before. The only time I ever did a RAID setup it was RAID 0 and I never had problems with it.

I wouldn't really worry if it did happen though. Most of my saved games are in Steam, my code is in GitHub and my other personal files are in Dropbox. If one of my systems dies, I mostly just have to reinstall software. The most annoying thing would probably be that I would have to reconfigure my IDEs, so it might be worth exporting my settings and backing those up.
 
Bob Winter
Ranch Hand
Posts: 127
6
Just a small update: the rebuild took only 6h46m this time, which is quite fast in comparison. Weird, because even the first time I let my system just sit there "idle" to give the rebuild as much power as possible.

Tim Cooke wrote: Nothing anywhere near as elaborate as you. My data requirements are low so I have 2 x 1TB disks configured as RAID 1 (Mirror) which gives a logical disk of 1TB. I do it that way because it's simple and the 2 bay NAS is a nice size to tuck away in a cupboard. No failures yet so can't attest to the ease of replacement process.


I guess a (rather small) mirror set is sufficient for most users who even bother with setting up a RAID or a NAS. It has some benefits: it protects the data if one drive fails, it has no parity write penalty, and it often even improves read performance.
Having it sit as a small 2-bay NAS in some corner, so you can access it over the network from all devices, is a good idea. But when not done correctly, it can lead to issues when two (or more) devices try to write at the same time.
Lucky you for not having had a failure yet, but I can assure you: someday it will happen.
As for "requirements": well, it's not like I really "require" that much space. It's more like: "yeah, I had the money back then to afford it, so I did it". But as it starts to fill up (I'm currently closing in on 4 TB), every time I think about it its value increases a tiny bit, in the sense of: "I have the space, so I don't even care to check for duplicates" (of which I'm sure I have at least some).

Stephan van Hulst wrote: Somehow I'm lucky that I've never had a hard drive fail before. The only time I ever did a RAID setup it was RAID 0 and I never had problems with it.

I wouldn't really worry if it did happen though. Most of my saved games are in Steam, my code is in GitHub and my other personal files are in Dropbox. If one of my systems dies, I mostly just have to reinstall software. The most annoying thing would probably be that I would have to reconfigure my IDEs, so it might be worth exporting my settings and backing those up.


Also lucky you for not having had a drive failure yet. I have had so many since I touched my first C64 (sadly I have only a few fragmented memories of those times), let alone how many floppies I destroyed.
As for "not worrying if a drive fails": well, sure, the really, really important data is also stored on another disk, offline in the bedroom, so for that there's a "real" backup. But I see it this way: when something goes wrong and the backup isn't immediately accessible, being able to quickly grab a thumb drive and copy over the files when a drive goes bad is worth all the effort.
Sure, a lot of the space is taken up by several TB of games from Steam and all those other launchers, so getting them back after a loss would only be a matter of hours (I have a really fast 250 Mbit/s connection, which Steam can fully utilize even over several hours). And to be honest, most of them I haven't touched in years, so some may ask why I bother to keep them on disk. Well, because I can.

Also, I know that some reading this may think: why use Seagates? Or why use the same model?
Yes, I'm aware that a "good" array should consist of a mix of drives of different models, types and manufacturers, to minimize the risk of a production defect affecting more than one drive at once. As for why Seagate: I can't really tell, but Seagate's Barracudas are the only ones that don't fail on me within the first month of use. Other models, and most drives from other manufacturers, fail pretty fast, within weeks or a few months. But Seagate Barracudas? They keep running for years. I don't know if it's me. And as for diversity: as many other types keep failing fast, I would be rebuilding my RAID every other week, not to mention the pile of failed drives I would have collected this way.
I also know that Barracudas are not really rated for 24/7 operation or for close proximity in RAID configurations. But I have tried a lot of different types, even special datacenter ones specifically designed for my use case. They cost way too much (sorry, but paying about twice the price is pretty tough) and in the end fail too fast anyway, so they're not worth their price.
 
Tim Holloway
Saloon Keeper
Posts: 22126
151
Android Eclipse IDE Tomcat Server Redhat Java Linux
  • Likes 2
Here's a funny one. Back in the last century, I was administrator for a couple of Windows NT servers with hardware RAID in them. For two years running all would be more or less well until I went on annual vacation. About 2 days in, a drive would fail. No problem, RAID can deal with that. But before I got back, a second RAID disk would fail, taking the array out of service entirely.

Recovery was a major pain. First of all, the big-name commercial backup/restore program never did what it was supposed to do and we had no way to confirm it was operating properly short of building a spare server - which didn't come up as an option at the time. And, incidentally, restoring Windows was a MAJOR pain, since there was no way of restoring only the parts of the Registry that the rebuilt system needed.

The second problem was worse. Product volatility was so high that we could never manage to buy an exact replacement for failing disks. So just rebuilding the array was a challenge.

I'm happy to say that this sort of stuff has never been an issue on Linux. You can get much better tools for free there than you can get help on the commercial products I've worked with.

While several of my servers had motherboard-based RAID as an option, I've only used software RAID on Linux. However, there are several other options.

One alternative to RAID is to use LVM mirroring. This is block mirroring at the logical volume level. It works best for comfort as opposed to "must-restore" environments. You can have mirrors on the same physical disk, so if it's just sector failures you have a cushion while you migrate to another disk or block the failing sectors. LVM can also do more complicated (and safer) mirroring, and I think even specific official RAID architectures now.

Another alternative is distributed mirroring. I'm presently using a glusterfs filesystem where the "bricks" mirror on 3 separate computers. That raises the odds against a killing system failure astronomically. Perhaps literally, since it would take something like a meteor smashing through the roof and trashing all 3 boxes at once. OK, maybe a hurricane, which around here is more likely. The gluster filesystem can rebuild a brick transparently in the background, so there's no downtime. Bigger enterprises use Ceph, but Ceph is a pig and has a fairly narrow arena of supported OS's/versions.

On average, I have 1-2 disk failures a year, so continuity is an essential part of my operational scheme. I also do nightly backups with periodic backup to external media and have provisioning systems to rapidly rebuild servers. Something always goes wrong, of course, but the last time I had to swap out a server it was relatively painless.

Incidentally, you can buy hot-swap disk bays for about $20-25 if you're using something like a tower machine that didn't come with hot-swap built in. It's not a bad investment. I can build servers offline then walk the disks over to their production homes.
 
Stephan van Hulst
Saloon Keeper
Posts: 12027
257
  • Likes 1

Bob Winter wrote: Other models, and most drives from other manufacturers, fail pretty fast, within weeks or a few months.


This is not normal. If it happens so frequently, there must be some cause not related to the disks themselves. Are your drives close to a strong source of electromagnetic fields, or mechanical vibration?
 
Tim Cooke
Sheriff
Posts: 4869
317
IntelliJ IDE Python Java Linux
Tim, you never fail to get my mostly dormant 'nerdy tinkering mode' up and paying attention, especially after the last time you shared an insight into your data management systems.
 
Tim Holloway
Saloon Keeper
Posts: 22126
151
Android Eclipse IDE Tomcat Server Redhat Java Linux

Stephan van Hulst wrote:

Bob Winter wrote: Other models, and most drives from other manufacturers, fail pretty fast, within weeks or a few months.


This is not normal. If it happens so frequently, there must be some cause not related to the disks themselves. Are your drives close to a strong source of electromagnetic fields, or mechanical vibration?



Unfortunately, sometimes the cause is external. I'm afraid I developed an aversion to Western Digital drives because back in the '90s I worked at a place where we bought a batch of 40GB drives and about 30% of them failed totally in the first three weeks of service. And it turned out that our network administrator hadn't implemented a decent backup plan. The president had me literally dumpster diving in an attempt to find something usable. I lost about 3 weeks of work.

My wife got an Acer notebook with a WD 2-inch drive that was about 60% bad sectors from the factory. I bought a new one and it was even worse. The third attempt, however, worked just fine and still does. It was a different model, though.

On the other hand, the enterprise-grade Seagate drives I used on the next job came with a 7-year warranty. And the only time I actually cashed in on one, the support person was in Little Rock. And had a son living about 15 miles from where we were.

Your Mileage May Vary. I've heard of some bad Seagate runs, I don't think they warrant anything for 7 years anymore, and Western Digital has been happily supplying drives to Apple without noticeable complaints.

Those are the two primary vendors these days, having eaten all the others such as LaCie, Fujitsu, IBM and the like.
 
Marshal
Posts: 69495
277

Tim Holloway wrote:. . . I don't think they warrant anything for 7 years anymore . . . .

When mSATA drives were all the thing for laptops, their manufacturers advertised them with an MTBF of 1,000,000 hours. I wondered how thoroughly they had tested it, particularly since 1,000,000 hours is about 114 years.
 
Tim Holloway
Saloon Keeper
Posts: 22126
151
Android Eclipse IDE Tomcat Server Redhat Java Linux
It has been explained. It's largely statistical. Plus some stressing, I think. An Internet search should turn up answers.
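Roughly, the statistical reading goes like this. The sketch below assumes a constant failure rate (exponentially distributed lifetimes), which is a simplification and not necessarily how any particular manufacturer derived its figure. Under that assumption, the MTBF describes a large population of drives during their service life, not how long a single drive lasts.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760


def mtbf_to_years(mtbf_hours: float) -> float:
    """Naive conversion of an MTBF figure into years."""
    return mtbf_hours / HOURS_PER_YEAR


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Probability that one drive fails within a year of continuous use.

    Assumption: constant failure rate, i.e. exponential lifetimes.
    """
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(mtbf_hours: float, drive_count: int) -> float:
    """Expected number of failures per year in a fleet running 24/7."""
    return drive_count * HOURS_PER_YEAR / mtbf_hours


mtbf = 1_000_000
print(f"{mtbf_to_years(mtbf):.1f} years")                          # 114.2 years
print(f"{annualized_failure_rate(mtbf):.2%} per drive per year")   # 0.87% per drive per year
print(f"{expected_failures_per_year(mtbf, 1000):.2f} failures/year in 1000 drives")  # 8.76
```

So a 1,000,000 hour MTBF can be estimated by running a few thousand drives for some weeks and counting failures; nobody has to wait 114 years.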
 
Bob Winter
Ranch Hand
Posts: 127
6

Stephan van Hulst wrote:

Bob Winter wrote: Other models, and most drives from other manufacturers, fail pretty fast, within weeks or a few months.


This is not normal. If it happens so frequently, there must be some cause not related to the disks themselves. Are your drives close to a strong source of electromagnetic fields, or mechanical vibration?


I'm aware that my observation is far from "normal". As far as I can tell, the drives that failed on me were never handled any worse than the ones that didn't fail (or didn't fail as fast). So, honestly, I can't tell if there is or was an external cause that made so many drives, from different models to entirely different manufacturers, fail faster than the ones branded Seagate Barracuda. All I can say is: yes, I have tried different types several times, as I know that diversity should at least be taken into consideration. But, and these are just observations, drives from WD, and even Seagate drives that weren't Barracudas, had a shorter lifespan than those with Barracuda on their label. I can't point to any difference or offer even a somewhat plausible idea as to why this keeps happening to me. All I can say is that it does.
On the other hand: sure, I have had at least some drives with a pretty long lifespan, like that one 2.5-inch WD drive I had as a system drive for several years, but these were exceptions rather than the rule.
As for my current setup: I currently use a Cooler Master CM Storm Trooper tower case. It has two hard drive cages for four drives each. The spacing between the drives is about the same as in those old full-height AT towers from the mid-90s. The mounting system is a plastic sled that slides into the metal carrier frame. As far as I can tell there is hardly any anti-vibration material like rubber grommets. In fact, the vibration of the 6 drives in the case alone (5 for the RAID + 1 for the OS), without any other moving parts, is enough to make the case itself vibrate at a very low frequency, which I can feel when I have my foot near one of the case stands. So I'm sure these drives are under a lot of stress from all-day vibration they may not really be designed for, which surely is another factor in 3 of them failing over the past 12 months. On the other hand, I'm sure that Cooler Master put at least some thought or testing into spacing and mounting the drives the way this case does. At least I hope so.

TL;DR: Yes, I know that what I observed is rather unusual, and it may come with a tiny bit of "belief in the superiority of Seagate Barracudas" (I won't deny it). But without any intention to "hate" other drives, these are just my observations, and I can't tell if "me" is at least a small factor in it.

@Tim Holloway
Haha, yeah, that's quite some story.
As for hardware vs. software RAID: I'm aware of the pros and cons of both. In fact, if I had known back then what I know now about how this RAID stuff works, I surely would have gone with software RAID too. But for now I'm just too lazy to fix it. On the other hand, there seems to be no way to set up a RAID 5 on a Windows client version, at least none that I was able to find via Google. I'm still using Win7 x64 Ultimate SP1 for about the same reason: drivers!
Very unfortunately, when I came up with the plan of setting up my RAID I didn't know what a "fake RAID" is. Today I know: it's not a hardware RAID like those RAID controller cards, but rather a mock-up software RAID done by the BIOS, using the power of the main CPU and requiring a rather special driver to make it work and access it.
I use an ASUS Crosshair V Formula-Z. It has an AMD 990FX chipset, and this "RAID functionality" comes along with it. The main issue I face in migrating to any other OS is that AMD provides the RAID drivers for Windows 7 only. They don't work on Windows 8/8.1, they don't work on Windows 10, and sadly they also don't work on Linux. Whereas Linux is at least able to see the drives themselves without the RAID, Windows 8/8.1/10 fail to recognize the controller at all unless it's set to AHCI mode. So would it be viable to just set the controller to AHCI and use a software-based RAID? Well, if I went with Linux and used its tools to manage RAID (there are several), it wouldn't be an issue, as I would have to re-format from NTFS to some Unix-supported FS anyway to make it work. Staying on Windows would stop me from using RAID 5, as this RAID level is only supported on Windows Server and not on the client versions. Aside from all that, the "BIOS menu" for whatever reason doesn't have an option to rebuild the array after a drive replacement. This has to be done via the Windows driver, which provides a web-based front end used from a browser, so without it I wouldn't have any chance of rebuilding the RAID after another drive failure.
So, unless someone has an idea how I could use my drives in a RAID 5 on a Windows client version, I guess I'm stuck with Windows 7. Sure, I could go with Linux, but in the end I'm a gamer, and sadly most of the games I like to play just don't work on Linux, mostly because of stupid DRM things like the Denuvo copy protection or the BattlEye anti-cheat service (which for some odd reason doesn't seem to have a working Linux client implementation). Some day, when I have the money to build an entirely new system, I may convert my current one to a big, power-hungry NAS, or find some other way to make use of it.
 
Tim Holloway
Saloon Keeper
Posts: 22126
151
Android Eclipse IDE Tomcat Server Redhat Java Linux
I think - and this is a VERY long time ago - that Microsoft explicitly considered RAID as an "enterprise" feature and thus only included it on their server systems.

It's not unreasonable. A typical Windows user (client) system would only have 1 or at most 2 hard drives and often wouldn't even be running in a box that could accommodate any more - much less RAID-5.

RAID is typically bulk store for critical enterprise resources. Desktops would normally either use LAN-based data or local data that would - ideally, but alas, rarely actually - be backed up by the enterprise.

Home users, unfortunately, aren't expected to be that ambitious.

Having just experienced the massive data download that came with a single game, I can see where you might want some local security, though.

You can buy a stand-alone NAS device for a fairly low price these days. In fact, Lenovo has a NAS box for under $50 - you supply the drives. And all a NAS device ultimately is is a storage-only LAN server. You can - and many people do - make one using a Raspberry Pi if cost is an overriding factor.

What I've considered, however, is simply making a backup copy of the data on a portable external disk. Which, in fact, is one of the several offline backup storage techniques I use should a tornado take out all my mirrored servers.

Really, RAID-5 for anything less than a major business environment is overkill. For most of us, a simple 2-disk mirror is adequate. RAID-5 does make it more likely that you can continue operating without pause in case of drive failure, but as you can see from my own experience, even that isn't foolproof. It's ridiculous that an array should have failed not once, but twice, but that's how it went.

Far more important is that you maintain a regular backup schedule and do periodic tests. I back up changed files every night at 4AM, do full backups once a week with rotations and archive backups offline periodically. Aside from protecting against hardware failures, not even version control systems can replace restoring from a nightly backup if you've really scrambled some files.
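For anyone who wants to try the "back up changed files" part, here is a minimal sketch of the idea. It is not the tooling described above; it simply assumes that "changed" means "modified since the last run", and the function name and demo paths are invented for illustration. Scheduling it for 4AM would be left to cron or a similar scheduler, and deleted files and rotations are not handled.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import List


def backup_changed_files(source: Path, dest: Path, since: float) -> List[Path]:
    """Copy every file under `source` modified after `since` (epoch seconds)
    into `dest`, keeping the directory layout. Returns the relative paths copied."""
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime > since:
            relative = path.relative_to(source)
            target = dest / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(relative)
    return copied


# Demo in a throwaway directory, so nothing real is touched
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "data"), Path(tmp, "backup")
    src.mkdir()
    (src / "old.txt").write_text("unchanged since the last backup")
    (src / "new.txt").write_text("edited today")

    last_backup = time.time() - 3600      # pretend the last run was an hour ago
    two_hours_ago = last_backup - 3600
    os.utime(src / "old.txt", (two_hours_ago, two_hours_ago))

    copied = backup_changed_files(src, dst, since=last_backup)
    print(copied)  # only new.txt is newer than the last backup
```

The periodic restore test matters just as much: a copy you have never read back is only a hope, not a backup.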

Incidentally, I learned from Windows backups that there's no point in archiving installed software and the only reliable backup of a complete operating system is to do full-disk image backups. As I said, the registry isn't suitable for partial restores.

I don't back up Linux software these days. Since I have automated provisioning, it's simpler to simply direct them to re-install/update everything. And in the case of Docker images, the master "backup" is the repository image and/or local Dockerfile. Similarly with Vagrant. I'm really only backing up data.
 
Adrian Grabowski
Rancher
Posts: 167
7
Mac OS X IntelliJ IDE
What are your thoughts on using Mac's Time Machine (software)? I have been using it for a while now (with external HDD) but never had to do a recovery yet, I'm wondering if it's going to be painless to restore the system and all the data if something happens.
 
Tim Holloway
Saloon Keeper
Posts: 22126
151
Android Eclipse IDE Tomcat Server Redhat Java Linux

Adrian Grabowski wrote: What are your thoughts on using Mac's Time Machine (software)? I have been using it for a while now (with external HDD) but never had to do a recovery yet, I'm wondering if it's going to be painless to restore the system and all the data if something happens.



I haven't heard about the Time Machine for years. I assume that

A) No one ever uses it
B) No one even knows about it
or
C) It's there and it works so perfectly that no one ever talks about it. Which would be nice, but even Apple rarely gets it done that well.

The drawback is that, as far as I know, the Time Machine doesn't mirror like RAID; it just keeps back generations. So it's good for "oops" problems, but not for hardware failure.
 