JMartin
Mar 25 2004, 04:22 AM
Just to let you guys know that today I noticed that mojo4 was stuck, not responsive to VNC. I visually checked it and it looked fine, power on, NIC had signal, so I powered it off. After connecting a KVM to it, I powered it back on and so far it appears to be dead; most likely the mobo, CPU, or memory. I don't have time to diagnose it fully right now, but I'll check it out more on Saturday, then let you know.
-Jeff
BadHat
Mar 25 2004, 08:54 AM
Bummer, I have an XP2100 processor if you need it.
Or maybe just chicken soup?
ColdinCbus
Mar 25 2004, 01:10 PM
Our first casualty of the war.
Dersgniw
Mar 25 2004, 04:09 PM
Hmmm... good thing we all have multiple crunchers to make things like this easier to diagnose.
Does anyone know offhand what the warranties are on the different parts?
JMartin
Mar 25 2004, 04:15 PM
QUOTE
Does anyone know offhand what the warranties are on the different parts?
That's one of the things I will be looking into!
I did some further testing this morning and it looks like a mobo at this point. I'll dig into it later on.
JMartin
Mar 27 2004, 04:20 AM
Update!
Mojo 4 is well. Here's the short version: ID10T error - me.
I set the BIOS for all these blades so that if power is lost, the machine will restart automatically when power returns.
All blades, that is, except for Mojo 4, which is now compliant.
ColdinCbus
Mar 27 2004, 01:59 PM
That's good news.
lilhurricane
Mar 27 2004, 02:55 PM
Wonderful news, Jeff!!
BadHat
Mar 27 2004, 08:55 PM
I thought I was the only one that did those ID10T errors...
jlhugh
Mar 28 2004, 04:05 AM
Yep, I get those same errors every now and then.
Good to see that you figured it out. That is all that matters.
slava
Mar 28 2004, 01:21 PM
QUOTE
Update!
Mojo 4 is well. Here's the short version: ID10T error - me.
I set the BIOS for all these blades so that if power is lost, the machine will restart automatically when power returns.
All blades, that is, except for Mojo 4, which is now compliant.

Way to go Jeff.
Great to hear you fixed it.
JMartin
Apr 16 2004, 02:40 PM
Well, she's acting up again. The board is running but something is halting the OS. I think I'm going to take it down tomorrow and re-image on a new drive.
slava
Apr 17 2004, 02:07 AM
QUOTE
Well, she's acting up again. The board is running but something is halting the OS. I think I'm going to take it down tomorrow and re-image on a new drive.
Hope it's just the OS and you can work around that and get it fixed.
How is the heat in that room, now that it's getting warmer down there?
JMartin
Apr 17 2004, 07:47 PM
It's staying around 72 or lower, but we haven't had any real heat here to speak of. That room is the first right off the ducting and I'm usually able to keep it like a meat locker.
JMartin
Apr 18 2004, 03:03 AM
Here's an odd one. I finally had time to dig into this, so I hooked up a k/v/m to it so I could really see what was going on. Bottom line, it was being denied an IP address. Nothing I did to the settings would get it an address. I think it has to be something with my network rather than the board, so I just gave it a fixed IP for now until I can research the *real* problem.
So, she's no longer sick.
jlhugh
Apr 18 2004, 07:31 PM
Yep, that is a weird problem.
Glad you got it somewhat figured out.
JMartin
Apr 23 2004, 02:38 PM
It looks like I'm going to have to rip into this thing again. It'll run for half a day and then hang.
lilhurricane
Apr 23 2004, 09:30 PM
QUOTE
It looks like I'm going to have to rip into this thing again. It'll run for half a day and then hang.
You'll get it up & running, Jeff...
jlhugh
Apr 24 2004, 09:59 PM
You can do it.
JMartin
Apr 28 2004, 03:02 AM
Update.
I put a different hard drive on it and loaded a fresh image. I'll run it for a day without crunching just to make sure it's stable. Then, I'll start it crunching. My thinking is that if it stays up idle for a day, which is longer than it would stay up before, and then hangs once I throw a load on it by crunching, I'll assume it's a heat problem. How to diagnose things from there, I'm not sure.
slava
Apr 30 2004, 01:06 AM
QUOTE
Update.
I put a different hard drive on it and loaded a fresh image. I'll run it for a day without crunching just to make sure it's stable. Then, I'll start it crunching. My thinking is that if it stays up idle for a day, which is longer than it would stay up before, and then hangs once I throw a load on it by crunching, I'll assume it's a heat problem. How to diagnose things from there, I'm not sure.
Keep at it Jeff, I know whatever you run up against with those blades you can fix....
lilhurricane
Apr 30 2004, 01:11 AM
Hear, hear!! No doubt about that.
jlhugh
Apr 30 2004, 04:26 PM
If these boards have temp monitors on them, then you can see how much the temp goes up when you put the load on it. If it's quite a bit, we may need to buy some Arctic Silver for it and scrape that thermal pad off the HSF.
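A rough sketch of that idle-versus-load comparison, in Python, assuming you have already jotted down a few temperature readings (in degrees C) in each state. The function names and the 20-degree cutoff are made up for illustration; how you actually read the sensor depends on the board and is not shown here.

```python
from statistics import mean


def temp_rise(idle_c, load_c):
    """Average temperature rise (degrees C) from idle readings to load readings."""
    if not idle_c or not load_c:
        raise ValueError("need at least one reading for each state")
    return mean(load_c) - mean(idle_c)


def looks_heat_related(idle_c, load_c, threshold_c=20.0):
    """True if the average rise meets the (hypothetical) threshold."""
    return temp_rise(idle_c, load_c) >= threshold_c


if __name__ == "__main__":
    idle = [40, 42]   # example readings while idle, not real data from mojo4
    load = [60, 62]   # example readings while crunching
    print("rise:", temp_rise(idle, load))
    print("heat suspect:", looks_heat_related(idle, load))
```

A big jump points toward the heatsink or thermal pad; a small one suggests looking elsewhere.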
JMartin
May 1 2004, 12:02 AM
Ok, new drive and image ran fine for over a day. Put a load on it and within a half day, it locked up.
I rebooted into BIOS and did a reset to default, then set the CPU performance-related parameters back to what the other boards use. It's been running fine ever since. Wonder how it got mucked up...