@taizi said:
docker 45 is now UP
Downtime: 27w 5d 9 hr 50 min
Target: Server Agent
Noticed at: 2023-02-14 13:03:51 (UTC+08:00)
docker 45 is now DOWN
Target: Server Agent
Noticed at: 2023-02-14 13:09:49 (UTC+08:00)
?????????????????????
my colocrossing dedi resurrected??
for 6 mins
System Uptime
273d 12hr 3min
so they just unplugged the network, but the server is still on power
@VirMach Is there anything wrong with Node PHXZ001?
This node seems to have gone offline last weekend, which left the control panel unable to load.
Now the control panel page loads, but the VPS remains offline even after trying to boot/reboot it from the control panel.
@VirMach said:
That does not mean it will be fast or guaranteed. If that bothers you, there's only one other option: we tell you we don't have it and provide a prompt reply. We don't want to do that but if you want that, that's always been the default state. You have a service, it's functional. You don't have the data because a re-install occurred. We still offered to extend the service, look for backup and restore it free of charge. Backup has not been located. Backup that did get located ran into filesystem errors and disappeared for a lot of these that are still left over. We're not just sitting here twiddling our thumbs and making you wait for no reason.
On January 21st 2023 (in addition to your posts above), you updated my Ticket (#634655) on the Virmach website, saying this:
"Due date has been extended. I'm going to do one last check over the weekend and if nothing viable is found, I'll let you know so we can close out this ticket. If it is found then we'll proceed with restoring it how you prefer: did you want it to override your service or provide credentials for it to be compressed and dumped inside the VM?"
I replied on the same day with:
"Thanks. Do you mean all folders from the old VM would zipped with a password and dumped on the current VM's C: drive? If so, then that would probably be easiest."
I haven't heard anything back from you since. Did you check for the backup? When can I expect that?
Also, needless to say, the "extension" was rather pointless as there's been no conclusion and your system has already billed me again.
@HiEndSoul said: @VirMach Is there anything wrong with Node PHXZ001?
This node seems to have gone offline last weekend, which left the control panel unable to load.
Now the control panel page loads, but the VPS remains offline even after trying to boot/reboot it from the control panel.
My VM on node PHXZ001 is working fine, so I expect it is not a node issue.
Will migration from Tokyo Ryzen to EPYC be available in the future?
If I cancel Tokyo now, will I still be charged a setup fee in the future?
Thank you.
Yes.
@taizi said:
Sorry, your account is not eligible to create any orders.
imagine your account can't even purchase the Account Support Level
That's a feature, not a bug. To be able to purchase the support level you can mention it in your account appeal ticket. Not you specifically though, just in general that's how it would happen. For you I don't recommend making another ticket, it'd just get merged. Your ticket's the only one I think still in the queue and not placed on hold so about 500 more tickets to go and I'll get to it.
@fan said: Possible disk error/corruption on TYOC040 like 026? Just found the node was unlocked and boot disk is gone.
Not necessarily the whole disk, but your LVM is obviously knackered. Not 100% sure that it applies to your case, but I've had this happen before. Normally VirMach will automatically fix it within a couple of weeks. One sign that it has been fixed is that the O/S shown in SolusVM is blank. You will then need to reinstall.
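If you want to confirm from the rescue ISO whether the virtual disk behind the VM is actually throwing read errors before waiting it out, something along these lines works. This is just a rough sketch; /dev/vda is an assumption and it may show up as /dev/sda depending on the template.

```bash
# Run from the SolusVM rescue/recovery ISO inside the VPS.
lsblk -f                                                       # is the virtual disk enumerated at all?
dd if=/dev/vda of=/dev/null bs=1M count=256 status=progress    # simple sequential read test
dmesg | grep -iE 'i/o error|blk_update_request'                # kernel-side read errors
```

If the read test fails or dmesg fills with I/O errors, the problem is on the node side and a reinstall from the panel won't get you anywhere until that's fixed.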
@fan said:
Possible disk error/corruption on TYOC040 like 026? Just found the node was unlocked and boot disk is gone.
Update: I get an I/O error when accessing the virtual disk, so reinstallation won't work.
It just keeps getting knocked offline. As in, the PCIe link drops. All Tokyo servers are already pretty much patched to the max to resolve all the previous problems, but there was possibly at some point a kernel update, firmware update, or BIOS update, and now it's no longer in proper equilibrium.
I remember @FrankZ was able to emulate a situation that took down the drive on AMSD030X, so it's not necessarily indicative of a "bad" drive. Could be perfect health. Could also be a reputable-brand SSD. These new problems popping up are NOT related to the XPG fiasco.
(edit) Oh, I forgot why I mentioned Frank: that node has basically been stable ever since he stopped stressing the server. So if he can do that, it also means other people can possibly trigger a dropoff, whether intentionally or not. And it's not an easy case of identifying abuse. This can unfortunately happen in a fraction of a second, not hours of thrashing. I'd basically need to be a kernel engineer with a full-time job of diagnosing this to go any further with it. And don't worry, this isn't a case of me being incapable; I also phoned in a lot of intelligent friends and they all basically couldn't take it that far. One of them did assist us in fixing maybe 1 out of 10 things that could cause a dropoff, and instead it just "overloads" in those scenarios. The overloads happen if, for example, people start mass re-installing after they see a disk message like yours; it balloons out of control before it can recover. If we could code up a better/faster detection system that isn't intensive, what we could do is force the server to basically lock itself out from SolusVM. We've gotten that done to some degree, I just need to push out an update.
It's definitely frustrating, but this is something that's had 6 years of Linux kernel bug reports. It seems like every kernel update may introduce a new specific scenario where it occurs: perhaps someone's VM ends up using swap space, or something super specific happens, or multiple VMs perform certain extremely spiky behavior. It would explain why we keep seeing it in Tokyo, since that entire region is very spiky in usage. I'm open to any suggestions that aren't "go back in time and buy U.2 drives."
Basically, for NVMe SSDs to function properly, the motherboard, CPU, kernel, firmware, everything has to perform spectacularly or else it will go away. We've since coded up a "rescuer" that checks and runs on a cron and does everything it possibly can to automatically bring it back up, but once it drops off it creates a domino effect that has a low success rate without a cold reboot on Linux. On Windows, in my testing, when I stressed the NVMe and it dropped off it would basically fix itself within seconds. On Linux, not so much.
Some of these, if it ends up being related to a specific motherboard being sub-par or not on the perfect combo of everything, will drop off and only come back after hours of attempts.
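For anyone curious, a bare-bones version of that kind of cron "rescuer" could look something like the sketch below. To be clear, this is not VirMach's actual script, just an illustration of the detect-then-remove/rescan idea; the device name and PCI address are placeholders.

```bash
#!/bin/bash
# Hypothetical nvme-rescuer cron job: if the NVMe block device has vanished,
# try a PCI remove + rescan before giving up and requiring a cold reboot.
NVME_DEV=/dev/nvme0n1
PCI_ADDR=0000:01:00.0   # placeholder; find the real slot with: lspci -nn | grep -i nvme

[ -b "$NVME_DEV" ] && exit 0                      # device still present, nothing to do

logger "nvme-rescuer: $NVME_DEV missing, attempting PCI remove/rescan"

echo 1 > "/sys/bus/pci/devices/$PCI_ADDR/remove" 2>/dev/null   # drop the stuck device
sleep 2
echo 1 > /sys/bus/pci/rescan                                   # ask the kernel to re-enumerate
sleep 5

if [ -b "$NVME_DEV" ]; then
    logger "nvme-rescuer: $NVME_DEV is back"
else
    logger "nvme-rescuer: rescan failed, cold reboot probably needed"
fi
```

In practice the rescan only helps in the friendlier failure modes, which matches the low success rate described above.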
@AuroraZero said:
Always @FrankZ fault, every damned time with that guy!!!
---
@VirMach I am sure @fan will be happy to hear that the disk dropped offline and that his LVM is more than likely fine. I did not realize that the disk issue that we were able to recreate when testing AMSD030X was an ongoing issue. Thank you for the update and the reasoning above. I will add this to my list of answers to questions like fan's.
@VirMach said:
Basically, for NVMe SSDs to function properly, the motherboard, CPU, kernel, firmware, everything has to perform spectacularly or else it will go away. We've since coded up a "rescuer" that checks and runs on a cron and does everything it possibly can to automatically bring it back up, but once it drops off it creates a domino effect that has a low success rate without a cold reboot on Linux. On Windows, in my testing, when I stressed the NVMe and it dropped off it would basically fix itself within seconds. On Linux, not so much.
Have you tried doing a "Secondary Bus Reset" on the PCIe controller to bring back the NVMe?
If I recall, Linux doesn't do this (by default or at all) -- there's a hackish way to do it with setpci.
I'd speculate that if the bursts are giving the NVMe firmware a bad day, the HMB (Host Memory Buffer) might be disabled or the buffer might be too small to smooth it out.
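For reference, the setpci hack for a Secondary Bus Reset looks roughly like this. A sketch only: the bridge address below is a placeholder, you'd find the port above the NVMe with lspci -t, and you'd normally remove or unbind the nvme device first so the driver isn't surprised.

```bash
BRIDGE=0000:00:01.1                          # placeholder: the PCIe port/bridge above the NVMe
ORIG=$(setpci -s "$BRIDGE" 3e.w)             # Bridge Control register (offset 0x3e)
setpci -s "$BRIDGE" 3e.w=$(printf '%x' $((0x$ORIG | 0x40)))   # set bit 6 = Secondary Bus Reset
sleep 0.2
setpci -s "$BRIDGE" 3e.w="$ORIG"             # clear it again; the link retrains
echo 1 > /sys/bus/pci/rescan                 # re-enumerate whatever comes back
```

Newer kernels can also trigger a reset through /sys/bus/pci/devices/ADDR/reset when a supported reset method is available, which is a bit less hackish than poking the register directly.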
@taoqi said:
I want to know: is there any way to solve the multiple-accounts problem in the future?
Change households, get a new computer, maybe get a legal name change for good measure and then pay with Coinbase.
Because I wanted to buy more VPSes, I registered two accounts on January 4; one of them has been refunded and the other has been marked. I admit to this, but I also want to change the current state. Is there any way to deal with the marked one? I can pay the management fees for multiple accounts; I just want to make better use of the VPSes.
my theory is that VirMach builds with Ryzen 5XXXs (or specifically the 5900X) aren't exactly stable due to "insert answer"
feel free to comment, y'all
People running multiple simultaneous yabs disk tests.
I have 4 VPS on the same node in SEA (3900X). If VirMach allows, I can try YABS on all of them in one go and see how that goes. Heck, he can even verify if they are all provisioned on the same disk.
I was just funning with you about the yabs. What node are you on in Seattle? I am on most nodes in Seattle so before you break my stuff ....
I informed him of the method I used when asked to try and break the testing node (AMSD030X); it was not yabs, so he does know how to get the disk to drop off if he wants to. I think the issue may be that trying to completely resolve the problem with the motherboards/kernels/etc. available has been elusive.
sets on @cybertech shoulder and whispers do it into his ear
Comments
CHI2z027 seems down?
@VirMach Hello, my service on node TYOC036 has been offline for 51 days. Can you help me deal with it?
Invoice #1413533
Thanks.
I want to know: is there any way to solve the multiple-accounts problem in the future?
Yes, it's very easy to solve - just don't use multiple accounts.
Haven't bought a single service in VirMach Great Ryzen 2022 - 2023 Flash Sale.
https://lowendspirit.com/uploads/editor/gi/ippw0lcmqowk.png
@VirMach Can you please respond to my query posted above? Thanks.
time for kernel 6.1!
what CPU's on AMSD030X?
RYZEN 5900X
WHY ARE WE YELLING?
"I would have gotten away with it too, if it wasn't for that meddling Frankz and Mason!!"
Roar..!!
Are you changing propic every two days lol.
https://microlxc.net/
Just a copy/paste from my past VM list. I always write "RYZEN" in caps because it just does not look right as "Ryzen".
Apologies if I negatively affected any of your sensitivities.
(insert evil laugh gif from above)
@FrankZ Hey, Hey, Hey @cybertech started it.
@Flying_Chinaman frustrating isn't it?
"I would have gotten away with it too, if it wasn't for that meddling Frankz and Mason!!"
my theory is that virmach builds with Ryzen 5XXXs (or specifically 5900X) arent exactly stable due to "insert answer"
feel free to comment yall
Edit: seems like DALZ008 is also 5900X
WUT DA F**
I have 4 VPS on the same node in SEA (3900X). If VirMach allows, I can try YABS on all of them in one go and see how that goes. Heck, he can even verify if they are all provisioned on the same disk.
you just want to break terms of service.
you know you want to so do it
what's the worst that can happen?
give into the darkside and do it
"I would have gotten away with it too, if it wasn't for that meddling Frankz and Mason!!"