FireZen – My Island in the Sea of Data.

Long time..

I haven’t updated my blog..

Lots has changed in the lab:

Now running a cluster of 4-5 Proxmox servers to play with HA and CEPH. Both have been truly eye-opening when it comes to maintaining 0.99x uptime. With this combination I can update them, set one to reboot, and watch as it shunts all its HA VMs/CTs to other systems, reboots, and takes them back. No more services down, no more wife/kids screaming, and no more waiting until the dead of night for updates.

A note before I continue about these “Servers”: the only real server-grade one is the one I built in “Lab runneth over”. It’s been upgraded over the years and now has dual Intel(R) Xeon(R) CPU E5-2667 v2 and 256GB of RAM. The rest are used/old hand-me-downs and a repurposed gaming PC I built. You don’t need the best to learn in a home lab.

The cluster also came with a need for 10Gb networking. I beta tested the cluster before switching it to live, and the biggest problem I ran into (the internet had long told me it would be an issue, but again, beta) was running CEPH over 1Gb. Before bringing it live I bought them all 10Gb SFP+ NICs and a new eight-port SFP+ switch.
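A rough back-of-the-envelope for why 1Gb hurt so much: the sketch below compares how long it takes to move the same amount of data at ideal line rate on each link. The 500GB figure is a made-up example, not something measured in my lab, and real CEPH traffic has overhead this ignores.

```python
def transfer_seconds(data_gigabytes: float, link_gigabits_per_s: float) -> float:
    """Ideal time to move data over a link, ignoring all protocol overhead."""
    data_gigabits = data_gigabytes * 8  # 1 byte = 8 bits
    return data_gigabits / link_gigabits_per_s


# Hypothetical amount of data that has to be re-replicated between nodes.
REBALANCE_GB = 500

for link_gbps in (1, 10):
    minutes = transfer_seconds(REBALANCE_GB, link_gbps) / 60
    print(f"{link_gbps:>2} Gb/s link: about {minutes:.0f} minutes")
```

With these assumed numbers that is over an hour on 1Gb against a few minutes on 10Gb, and that is before any VM traffic shares the same wire.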

The new switch brought about a complete network design overhaul. Originally it was:

WAN>(1GbE)Router>(10Gb SFP+)24/2P Switch>Everything
I could have daisy-chained the new 8-port SFP+ switch off the 24/2P, as it had a spare SFP+ port.

Instead I split them:
WAN>(1GbE)Router>(10Gb SFP+, VLANs 1 and 2)24/2P Switch

WAN>(1GbE)Router>(10Gb SFP+, VLANs 3 and 99)8P SFP+ (also self-contained VLANs 10 and 11*)

Then both switches are linked and pass their respective VLANs (*other than 10 and 11) between them.

*The new switch brought about new fun. The beta ran CEPH and Proxmox’s cluster traffic over my default “LAN” network. Yeah, not what I wanted, and after the 1Gb test I also didn’t want the Firewall/IDS/IPS anywhere near that traffic adding latency. So they each got their own VLAN, and those two VLANs only traverse the SFP+ switch.
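To make the split easier to picture, here is the layout above written out as plain data, with a couple of helper functions to check which VLANs go where. This is only a model of what I described, not switch or router config, and the function names are made up for the example.

```python
# VLANs each switch receives from the router over its own 10Gb SFP+ uplink.
ROUTER_UPLINKS = {
    "24/2P": {1, 2},
    "8P SFP+": {3, 99},
}

# VLANs that live only inside one switch (CEPH and Proxmox cluster traffic).
LOCAL_ONLY = {
    "8P SFP+": {10, 11},
}


def routed_vlans() -> set[int]:
    """Every VLAN that reaches the router, and so the Firewall/IDS/IPS."""
    return set().union(*ROUTER_UPLINKS.values())


def inter_switch_vlans() -> set[int]:
    """VLANs passed over the link between the two switches."""
    local = set().union(*LOCAL_ONLY.values())
    return routed_vlans() - local


def touches_firewall(vlan: int) -> bool:
    """True if traffic on this VLAN can ever pass through the router."""
    return vlan in routed_vlans()


print(sorted(inter_switch_vlans()))                   # [1, 2, 3, 99]
print([v for v in (10, 11) if touches_firewall(v)])   # []
```

The empty list at the end is the whole point: 10 and 11 never leave the SFP+ switch.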

Tied AD to Authentik, originally for giggles after a random YouTube video. Now everything that can use it does, and while having a single SSO solution by itself was awesome, the real magic came when I figured out MFA and then, shortly after, how to use my Yubikey. Now logging into anything is simply: punch in Yubikey PIN, touch Yubikey, done.

Also a check-in on one of my other posts, “Another from the It Bucket list”, where I got Linux tied to AD: using a combo of that and the above, I have a Debian box tied to AD that I (and a friend who also toys in my lab) can SSO into.

Tried out various other projects that I’ll give short blurbs about:

Immich: I like it, but it needs a way to stay synced with my Google Photos until I’m ready to actually make the switch.
Graphene (Android ROM): Ran it for a month with Work and Google profiles separated from the “default”, got exasperated with the constant profile switching, and went back to stock.
Bunch of random AI/LLM stuff: Things ranging from following a tutorial on getting Open WebUI started to falling down the rabbit hole of writing Python scripts to make an AI sort my mailbox. Also n8n.

I know I’m probably missing things, it’s been a long time since I posted.

That’s it for now. Thanks for reading.
FireZen

AD….Again

No Fancy Pictures this time.

So something I have wanted to try for a long while, and I think I even tried last year during/after the Linux AD lab, was setting up a Windows NPS server to run as a RADIUS front end to AD and getting one of my access points to auth against that, as opposed to FreeRADIUS, which I have been using for years.

Now there is nothing wrong with FreeRADIUS. I started with it back in my Pi2 testing days and ran it off one of them for years, until I migrated my main router to PF and then OPNsense, where I had that ability without a second device, along with a nice GUI and self-contained Let’s Encrypt for certs..

I digress.

After figuring out NPS I got radtest authenticating to it from my Linux PC. Neat but not secure; FreeRADIUS was at least using certs and such.

Certs… Wasn’t there an AD component for certs? Yup. I installed AD CA and generated a cert for the computer, and it was now able to authenticate via eapol_test.

Could I take this a step further though? At this point it verified the server and used the AD username and password for authentication…

User certs! Yeah… In its current configuration I am set up for cert/smartcard login with user certs. No password. Neat, and something I have never seen FreeRADIUS do.

Next steps: I have multiple networks I want to provide separate logins for, and I can see how to do this in NPS. It’s something I don’t see in FreeRADIUS, at least not the one in OPNsense, so I will slowly migrate them over.

I think my only complaint is that either NPS or AD CA required the GUI version of Windows Server. I would have much preferred to run it from Server Core.

Another from the It Bucket list

Most may have guessed I am not a Windows guy. I prefer Linux, mainly Debian for servers and Arch for my desktop and laptops.

I do dabble to keep my Windows skills sharp enough to be dangerous: spinning up a new Active Directory server every major Windows Server revision, installing and getting a fully functioning Exchange environment last fall, and a Windows Distribution Server (PXE boot Windows installs over the network). Most of these don’t last longer than their intended projects, and with them all being virtual I keep them around and updated for around 6 months to a year before killing them and starting over.

One thing I haven’t been able to do over the years was tie a Linux box to MS AD for authentication. I have tried for years (read: once, for maybe a max of a week per year, for years) and could just never get it to work.

Can’t say that anymore:

I got the Server tied to AD, a test user created:

Initial Login was successful with the only quirk being the home folder being broken.

I fixed that and made a new group for “LinuxAdmins” to give anyone in it ‘sudo’ access:

End result:

Lab runneth over

It seems I update this page yearly, so let’s see what’s happened over the last year.

I upgraded the network switching from the old 1G Dell PowerConnect 48-port I had gotten free to a combination of MikroTiks: one with 24x 1G Eth and 2x 10G SFP+ (CSS326-24G-2S+) and one with 4x 10G SFP+ and 1x 1G Eth (CRS305-1G-4S+).

With this network upgrade I got both “Servers” and “Wireless” onto VLANs, and both now go over the 10G link coming out of the OPNsense router to the new switches. From there “Wireless” is split out to the access points.

I was forced off of ESXi at the end of last year by the announcement that VMware was ending support for version 6.7 in November of 2021. I have just read they actually extended it to October 2022.

The end of support hurt, as 7.0 doesn’t support the hardware in my server, nor does it support the 10G SFP+ card inside.

To get ahead of the end of support and make sure I didn’t get lost in security holes, I started seeking out alternatives.

The one I landed on, and have been running since roughly October of last year, is Proxmox VE. 95% of my virtual machines were converted without a problem. The two that did cause an issue, from memory, were my Ansible control Linux system and a Windows 10 box. These were recreated and have been functioning happily.

I decided to forgo the hardware RAID and return to my long-lost love, ZFS. I originally opted for RAID-Z3 until I saw a lot of performance degradation, so I dropped it down to RAID-Z2 and have been running it like that since.
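For anyone weighing the same choice, this little sketch shows the basic trade between the two levels: how many drives are left for data and how many failures the vdev survives. The drive count and size are hypothetical (I am not listing my actual pool here), and the usable figure is the simple "drives minus parity" estimate, which ignores ZFS padding and metadata overhead. It says nothing about performance, which was my actual reason for switching.

```python
def raidz_summary(drives: int, drive_tb: float, parity: int) -> dict:
    """Rough RAID-Z numbers for one vdev: data drives, usable TB, failures survived."""
    if parity not in (1, 2, 3):
        raise ValueError("RAID-Z parity level must be 1, 2 or 3")
    if drives <= parity:
        raise ValueError("need more drives than parity devices")
    data_drives = drives - parity
    return {
        "data_drives": data_drives,
        "usable_tb": data_drives * drive_tb,  # simple estimate, no overhead
        "failures_tolerated": parity,
    }


# Hypothetical vdev: 8 drives of 1 TB each.
for level in (3, 2):
    print(f"RAID-Z{level}:", raidz_summary(drives=8, drive_tb=1.0, parity=level))
```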

I also went through a time of testing NVMe drives as log and cache for the SSDs. This ended up being more detrimental than helpful, so instead I have since created an NVMe mirrored pool that hosts a single VM I use for work.

My server is starting to get aged.. I first bought it on February 10, 2016, and even then the motherboard, a Z9PA-D8, was 3 years old, having come out at least as early as February 7, 2013. Which next year will make it a decade old.

The CPUs in it are not the best it can handle. Currently it has Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz, and the best I can find for it is Intel Xeon E5-2690 v2 @ 3.00 GHz. This would give me a 900MHz boost per core (5.4GHz across the six existing cores, though I know that’s not how it works) on top of 4 extra cores, per CPU, as this is a dual-CPU system. Both processors are discontinued and came out in Q3’13.
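Here is the napkin math behind those numbers. Multiplying cores by base clock is a crude yardstick (again, not how it really works, and it ignores turbo), so treat the "aggregate GHz" figure as a rough comparison only. The class and function names are just for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    name: str
    cores: int
    base_ghz: float


SOCKETS = 2  # dual-CPU board

current = Cpu("Xeon E5-2620 v2", cores=6, base_ghz=2.10)
upgrade = Cpu("Xeon E5-2690 v2", cores=10, base_ghz=3.00)


def aggregate_ghz(cpu: Cpu, sockets: int = SOCKETS) -> float:
    """Crude 'cores x base clock' total across all sockets."""
    return cpu.cores * cpu.base_ghz * sockets


per_core_gain = upgrade.base_ghz - current.base_ghz
print(f"Per-core gain: {per_core_gain * 1000:.0f} MHz")
print(f"Gain over the existing {current.cores} cores: {per_core_gain * current.cores:.1f} GHz per CPU")
print(f"Extra cores per CPU: {upgrade.cores - current.cores}")
print(f"Aggregate: {aggregate_ghz(current):.1f} GHz now vs {aggregate_ghz(upgrade):.1f} GHz after")
```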

On the RAM side, I upgraded from 128GB to 256GB not long ago with the switch to Proxmox and ZFS. ZFS likes yummy RAM.

Looking over Amazon for a new server, a matching build that allows for expandability is roughly five thousand dollars:

ASUS Z11PA-D8 Server CEB Motherboard Socket-P LGA3647 for Intel Xeon Skylake Scalable Processors Featuring DDR4: $619.97
Intel Xeon Gold 6226 Processor 12 Core 2.70GHZ CPU CD8069504283404 (OEM Tray Processor): Qty 1, $1,628.88
Samsung 64GB/4Gx4 DDR4-2666 ECC/REG Load Reduced CL19 Server Memory Model M386A8K40BM2-CTD7Q: Qty 4, $325.67
Noctua NH-D9 DX-3647 4U, Premium CPU Cooler for Intel Xeon LGA3647 (Brown): $99.95
10Gb PCI-E Network Card NIC Compatible for Intel X520-DA2 (Intel E10G42BTDA), Dual SFP+ Port, with Intel 82599EN Controller: Qty 1, $175.00
ASUS - MOTHERBOARDS TPM SPI Module System Components MOTHERBOARDS: Qty 1, $19.97
CORSAIR RM1000X 80+ 1000w GOLD MODULAR PSU: Qty 1, $265.99
RROYJJ 4U Rackmount Server Case Chassis with 24 Hot-Swappable SATA/SAS Drive Bays: Qty 1

Subtotal (11 items): $4,802.43
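As a sanity check on that cart, the sketch below adds up the prices that are listed. It assumes the listed prices are per unit and that the motherboard and cooler, which show no quantity, are one each. Under those assumptions whatever is left over from the subtotal would be the one item with no price shown (the case), but that is an inference from the arithmetic, not a price I copied down.

```python
from decimal import Decimal

# (item, quantity, unit price) for every cart line that shows a price.
cart = [
    ("ASUS Z11PA-D8 motherboard", 1, Decimal("619.97")),      # qty assumed
    ("Intel Xeon Gold 6226", 1, Decimal("1628.88")),
    ("Samsung 64GB DDR4-2666 ECC", 4, Decimal("325.67")),     # per-unit assumed
    ("Noctua NH-D9 DX-3647 4U", 1, Decimal("99.95")),         # qty assumed
    ("X520-DA2 compatible 10Gb NIC", 1, Decimal("175.00")),
    ("ASUS TPM SPI module", 1, Decimal("19.97")),
    ("Corsair RM1000X PSU", 1, Decimal("265.99")),
]

LISTED_SUBTOTAL = Decimal("4802.43")

priced_total = sum(qty * price for _, qty, price in cart)
priced_items = sum(qty for _, qty, _ in cart)
remainder = LISTED_SUBTOTAL - priced_total

print(f"Priced lines: {priced_items} items, ${priced_total}")   # 10 items, $4112.44
print(f"Left over for unpriced items: ${remainder}")            # $689.99
```

Ten priced items plus the case also matches the "11 items" in the subtotal line.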

That’s my next goal.

Of Routers and Labs

So over the last few months the ‘Router’ ESXi system was screaming its head off whenever it got close to 100% CPU usage, which was generally whenever anything in the house caused a spike in the OPNsense VM.

Along with this, the drives in there were the last of the consumer drives from the V1 lab, and it seems another drive was in prefailure. No data was lost, but ESXi wouldn’t allow me to update, as the drive would disappear any time I tried.

I had a small 256GB enterprise drive left over from the last set, so I migrated all the VMs from the Router to the main system (so much for Disaster Mode) and set about ripping all the drives out of the Router system’s chassis. I put in the single 256GB drive and installed OPNsense on bare metal. That worries me come upgrade time, but for the moment the backup restored easily; I only had to reassign the NICs to the physical interfaces rather than the virtual ones. My 10G NIC came back after a config tweak, so the ‘Servers’ branch of my big server (most of my VMs run on this, as it’s my DMZ) runs over a 10G link again.

It could be placebo, but it feels a lot faster, and the system runs cool. I cleaned it at the same time, so I bet that helped.

System Status

It’s been a hell of a ride.

So within recent memory I pulled the consumer drives from the Main server, as they were causing issues again and kept making the system unstable.

Both the Main and Sub server updated to ESXi 7 during a routine update, which didn’t go well. The Main server’s age proved to be a detriment: its hardware RAID, the one I had just finished putting the new enterprise SSDs in, was no longer supported in 7, nor were the 10/40Gb SFP+ NICs. I ended up downgrading the Main server back to 6.7 but keeping the Sub on 7. The Main server needed the RAID, but I figured I could always test on the Sub and see if those NICs ever came back.

I bought a Raspberry Pi 4. After a few different projects (RetroPie, Zabbix server, cluster head), it settled into the role of NAS for a few months. I got a four-drive USB 3 “toaster” that I put four of the consumer SSDs in and backed it with a 5TB USB 3 HDD. This ran great: low power usage, and the SSDs never gave me any grief. I had the system rsync the changes from the flash “raid” to the HDD nightly. Until..

Introduction of the i7! OK, so it’s nothing new. This is my father’s old system, and by old I mean a 3960X. It had 64GB of RAM, but before giving me the system he pulled half for other systems. Not a big deal. First thing I did was yank it out of the case it was in and put it in a 4U rack case. That was a little tricky with its 240mm water cooler, but eventually I got it all to fit. Stole the drives back from the Pi. All of them. Built the system using Manjaro Architect.

Things were OK for a while. I bought the missing 32GB for the i7 and was running a Minecraft server on it for my kids and family, but slowly instability crept in, along with corruption. I shut it down for months, as I didn’t have time to troubleshoot what was going on. Then I bought some new drives and decided to tear it all apart and test components, and found one of the 8GB RAM sticks was bad. I rebuilt it with a couple of 2TB enterprise SSDs, a 120GB enterprise boot SSD, the 4x500 consumer toaster, and the 5TB rust. It’s been running Arch since last weekend, and after tuning the network settings it’s been stable.

The Raspberry Pi 4 4GB became my Zabbix system. I bought one of the newer Pi 4 8GB models and ran it as my desktop for about a month before calling it quits. I’m thinking of moving my (this) website and mail server to it, so it’s easier to shut down the servers without interrupting services.
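For the curious, the nightly rsync from the flash “raid” to the HDD mentioned above boils down to something like the sketch below. The mount points are placeholders, not my real paths, and I am only using rsync’s archive flag (-a) here. Something like cron or a systemd timer would call nightly_sync once a night.

```python
import subprocess


def build_rsync_command(source: str, destination: str) -> list[str]:
    """Build an rsync call that copies the contents of source into destination."""
    # A trailing slash on the source tells rsync to copy the directory's
    # contents rather than the directory itself.
    return ["rsync", "-a", source.rstrip("/") + "/", destination]


def nightly_sync(source: str, destination: str) -> None:
    """Run the sync and raise CalledProcessError if rsync reports a failure."""
    subprocess.run(build_rsync_command(source, destination), check=True)


# Placeholder mount points, shown without actually running rsync.
print(" ".join(build_rsync_command("/mnt/flash", "/mnt/backup")))
```

Because rsync only transfers what changed, a nightly run like this stays quick even over USB 3.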

Months Later

Alright, it’s been a little over a month since the crash.

Main server is running perfectly again, this time with hardware RAID on enterprise-grade SSDs. The old consumer ones were reinstalled after being checked over multiple times; they are also under hardware RAID, but I’m not willing to host anything on them outside of projects I am willing to lose.

This web server still runs across six Raspberry Pi 3B(+)s and I plan to keep it this way for the foreseeable future. It’s quite capable, running HTTPS and multiple web apps.

Back up and running!

So the system is back up and most systems got through unscathed. It seems my FreeNAS box, which hosts my in-home Plex server, is the worse for wear. I’m backing it up and getting ready to pull the 3TB HDDs it sits on and put them through a battery of tests to see if they are indeed failing as well, or if they were just corrupted by the crash.

After pulling the other SSDs from the system and putting them through their paces, it seems they are unscathed. I plan to put them back in, likely in the 3TBs’ place.

All for now!

Bonus! Picture of the pi rack this runs on:

Don’t mind the stray pi2.

A Crash for the Ages

The title says it all. I went on vacation a few weeks back. Everything was running fine, but I decided to bring 80% of the systems down, as no one would be here to use them, i.e. me.

I got back and started booting them all up, only to start having issues left and right in different machines. Background: most of my ‘Lab’ is virtualized and maintained on one main server: dual 6-core CPUs, 128GB RAM, over 3TB of SSD and over 3TB of HDD. The sad part is that, under a money crunch, the 3TB of SSD were consumer grade, and that’s what ended up being my downfall. It seems one began to fail and ESXi just freaked out.. Now the entire system is down. I have made backups of what I could, ordered new enterprise drives, and await their arrival.

This is running on 6 Raspberry Pis. You may remember my old cluster project: 7 Raspberry Pis running Gentoo for a single goal, self-sufficiency. That goal was met, with one building the kernel and another storing everything outside of root. Slowly they all started to die, and without the will to rebuild it, it was reborn into this: Gluster backend, MariaDB Galera split over the six of them.

Edit!: And now with https!