ToDo
-
Improve docs and add configuration script for new users
- Ansible inventory config for metal provisioning
- Prompt user to edit the inventory file
- Update docs - add sample inventory file contents for reference
- Search-and-replace throughout the repo files:
- Domain name
- Github repository name
- ... ?
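A minimal sketch of what the search-and-replace step of such a configuration script could look like. The function name `replace_in_repo`, the skipped `.git` directory, and the example values are assumptions for illustration, not something that exists in the repo today.

```python
from pathlib import Path

SKIP_DIRS = {".git"}  # assumption: only the git metadata directory needs skipping


def replace_in_repo(root, replacements):
    """Apply each old -> new replacement to every UTF-8 text file under root.

    Returns the list of files that were changed.
    """
    root = Path(root)
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or SKIP_DIRS & set(path.relative_to(root).parts):
            continue
        try:
            # Read bytes and decode manually so line endings are preserved.
            text = path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            continue  # not valid UTF-8 (likely binary), leave untouched
        new_text = text
        for old, new in replacements.items():
            new_text = new_text.replace(old, new)
        if new_text != text:
            path.write_bytes(new_text.encode("utf-8"))
            changed.append(path)
    return changed
```

It could be called as `replace_in_repo(".", {"old.example.com": "new.example.com"})`, with the real domain and repository names supplied by the user when prompted.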
-
Automate personal gitea user provisioning
-
Stream/export sensor data from baremetal and visualize it in grafana
- some potentially-useful resources to explore:
- https://grafana.com/blog/2019/11/06/how-to-stream-sensor-data-with-grafana-and-influxdb/
- https://grafana.com/blog/2021/08/12/streaming-real-time-sensor-data-to-grafana-using-mqtt-and-grafana-live/
- https://grafana.com/blog/2024/01/03/how-to-create-alert-rules-to-monitor-sensor-data-with-grafana-and-raspberry-pi/
- https://github.com/sbnb-io/sbnb/blob/main/README-GRAFANA.md
- https://grafana.com/grafana/dashboards/237-sensors/
- https://github.com/ncabatoff/sensor-exporter
-
Automate firmware updates
-
Test zapping devices and make sure it works fine. The current version was tested on `draupnir`, which seemed to wipe the disk fine, but I had to do some manual steps afterwards to create the OSD on the new node once the node was re-provisioned and had joined the cluster again.
-
Install ARA to record ansible executions
-
Try rke2, which bundles Cilium (selectable as the CNI; Canal is the default), nginx-ingress, and an embedded etcd db
-
Replace kickstart with cloud-init
- Kickstart may be going away (?), and cloud-init is much more distro-agnostic; it can also trigger ansible-pull
- Cloud-init can also be used across multiple OSes, whereas kickstart is tied to Fedora
- ref: https://github.com/khuedoan/homelab/issues/179#issue-2875515756
-
Improve github+gitea workflow
- One of the downsides is that I need to delete non-master (PR) branches on gitea manually (well, via cli, but still)
- Maybe use a separate branch for gitea? E.g. `main`?
- Or maybe use a separate remote for gitea (or for github? since gitea is technically considered "the origin"?) E.g. `gitea` (or `github`, for github repo origin)?
-
Encrypt kubeconfig with sops so it can be committed to git
-
Update architecture/overview components
- Basic diagram of code components and their relations
- Description of components and their purpose
-
Update concepts/pxe_boot with a visual "in-action" showcase of how it works, once it's in place
-
Add up-to-date config files of C1111 and C3560 for reference
- Can be placed in a separate note (probably don't even need to make it visible in nav menu) and referenced from installation/production/network
-
Check that devices on Guest WiFi network (when Eero is in AP/Bridge mode!) are still isolated and cannot see or communicate with each other or the main network.
- Eero in Bridge mode loses a lot of security-related functionality (it also becomes "greyed out" in the app). However, it seems that the guest network can still be enabled from the app. Hopefully that guest network is still isolated, but this needs double-checking.
- Some related links:
-
When storing terraform state locally, one needs to think about where/how to back it up. An alternative would be to use terraform cloud or opentofu TACOS, which are paid services (plus, your state is stored on someone else's computer and hence should be encrypted).
- What can be alternatives to storing the state locally?
- Initial provisioning can be done with local state
- Once the cluster is up and running, we can host Atlantis and migrate the state to it.
- As an added benefit, this makes it possible to run terraform from PRs
- Store/commit sops-encrypted state. Run `terraform` with a script/make wrapper that decrypts the state before running `terraform` commands, and re-encrypts it at the end.
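The sops wrapper idea could be sketched roughly like this. The file names `terraform.tfstate` / `terraform.tfstate.enc` and the function `run_terraform` are hypothetical, and the sketch assumes sops is already configured with a key (e.g. via creation rules), otherwise the encrypt step will fail.

```python
import subprocess
from pathlib import Path

STATE = Path("terraform.tfstate")          # plaintext state, assumed to be git-ignored
ENCRYPTED = Path("terraform.tfstate.enc")  # hypothetical name for the committed file


def run_terraform(args, run=subprocess.run):
    """Decrypt the state, run terraform, then always try to re-encrypt and clean up."""
    if ENCRYPTED.exists():
        run(["sops", "--decrypt", "--output", str(STATE), str(ENCRYPTED)], check=True)
    try:
        # Do not raise on terraform failure: the state may still have changed.
        return run(["terraform", *args], check=False).returncode
    finally:
        if STATE.exists():
            run(["sops", "--encrypt", "--output", str(ENCRYPTED), str(STATE)], check=True)
            # Only reached when encryption succeeded, so the plaintext is never lost.
            STATE.unlink()
```

A make target could then call something like `run_terraform(["plan"])` instead of invoking `terraform` directly. The `run` parameter exists only so the command runner can be swapped out when testing.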
-
Configure `/etc/hosts` on local controller machine as part of `metal` provisioning

    # midgard.local homelab
    # network devices
    10.10.10.1 muspell
    10.10.10.2 bifrost
    # k8s cluster
    10.10.10.10 odin
    10.10.10.11 freyja
    10.10.10.12 heimdall
    10.10.10.20 mjolnir
    10.10.10.21 gungnir
    10.10.10.22 draupnir
    10.10.10.23 megingjord
    10.10.10.24 hofund
    10.10.10.25 gjallarhorn
    10.10.10.26 brisingamen
    # storage devices
    10.10.10.30 yggdrasil
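One possible way to keep this idempotent is to manage the entries as a marked block, so re-running provisioning replaces the block instead of appending duplicates. The sketch below only transforms the file content as a string; the marker lines and function names are assumptions, and writing to `/etc/hosts` (which needs root) is left out.

```python
BEGIN = "# BEGIN homelab hosts"  # hypothetical marker lines
END = "# END homelab hosts"


def render_block(hosts):
    """Render a name -> IP mapping as hosts-file lines wrapped in the markers."""
    lines = [BEGIN] + [f"{ip} {name}" for name, ip in hosts.items()] + [END]
    return "\n".join(lines) + "\n"


def upsert_block(content, hosts):
    """Return content with the managed block replaced, or appended if absent."""
    block = render_block(hosts)
    start = content.find(BEGIN)
    end = content.find(END)
    if start != -1 and end > start:
        # Keep everything before and after the old block untouched.
        tail = content[end + len(END):].lstrip("\n")
        return content[:start] + block + tail
    if content and not content.endswith("\n"):
        content += "\n"
    return content + block
```

For example, `upsert_block(current_text, {"odin": "10.10.10.10", "freyja": "10.10.10.11"})` returns the new file content, and calling it again on its own output changes nothing.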
-
Configure `~/.ssh/config` on local controller machine as part of `metal` provisioning

    Host 10.10.10.*
      StrictHostKeyChecking no
      LogLevel ERROR
      UserKnownHostsFile /dev/null

    # muspell (C1111 router) in homelab vlan
    Host 10.10.10.1 muspell
      User cisco
      PasswordAuthentication yes

    # bifrost (C3560 switch) in homelab vlan
    Host 10.10.10.2 bifrost
      User cisco
      PasswordAuthentication yes
      KexAlgorithms +diffie-hellman-group14-sha1
      HostKeyAlgorithms +ssh-rsa

    # k8s cluster nodes in homelab vlan
    Host 10.10.10.1* 10.10.10.2* odin freyja heimdall mjolnir draupnir gungnir megingjord hofund brisingamen gjallarhorn
      User root
      IdentityFile ~/.ssh/homelab_id_ed25519
      StrictHostKeyChecking no
      LogLevel ERROR
      UserKnownHostsFile /dev/null
      # not supported on OS I use today for servers
      GSSAPIAuthentication no

    # storage nodes in homelab vlan
    Host 10.10.10.3* yggdrasil
      User root
      IdentityFile ~/.ssh/homelab_id_ed25519
      StrictHostKeyChecking no
      LogLevel ERROR
      UserKnownHostsFile /dev/null
-
Check if server is up before sending WoL magic packets
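A rough sketch of how this check could work, assuming that "up" means the server accepts a TCP connection on the ssh port (22) and that magic packets go out as UDP broadcasts on port 9. The function names and defaults are hypothetical.

```python
import socket


def is_up(host, port=22, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (assumed to mean 'up')."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def magic_packet(mac):
    """Build a WoL magic packet: 6 bytes of 0xFF followed by the MAC repeated 16 times."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"invalid MAC address: {mac}")
    return b"\xff" * 6 + raw * 16


def wake_if_down(host, mac, broadcast="255.255.255.255"):
    """Send a magic packet only when the host does not answer; return True if sent."""
    if is_up(host):
        return False
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, 9))
    return True
```

A server that is powered on but still booting would fail the ssh check and get a (harmless) magic packet anyway, so this only avoids the obviously redundant case.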
-
Ask before proceeding when running `make bootstrap` in `metal` provisioning
- The server prefers to boot from Network when woken up, which will erase all data on the disk and re-install the OS
- Ask the user for confirmation before proceeding.
- Mention the `make wake` alternative, which can be used just to wake up the machines
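The confirmation step could look something like the sketch below. The function names and the exact wording are made up for illustration; the `ask` parameter is only there so the prompt can be replaced in tests.

```python
def confirm(prompt, ask=input):
    """Return True only if the user types 'yes' (case-insensitive); anything else aborts."""
    answer = ask(f"{prompt} Type 'yes' to continue: ")
    return answer.strip().lower() == "yes"


def bootstrap_guard(ask=input):
    """Warn about the destructive network boot and return whether to proceed."""
    print("WARNING: woken-up servers prefer to boot from the network,")
    print("which erases all data on the disk and re-installs the OS.")
    print("To only wake the machines up, run `make wake` instead.")
    return confirm("Proceed with bootstrap?", ask=ask)
```

The bootstrap target could run a small script that calls `bootstrap_guard()` and exits with a non-zero status when it returns `False`, so that make stops before any machine is woken up.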
-
Consider restricting ssh access from homelab to router/switch SVI to specific IPs
- [ ] Limit permits to specific IP addresses instead of using `10.10.10.0` so that e.g. k8s servers couldn't ssh to Homelab's router or switch

    ! --- Define ACL for traffic FROM Homelab Network ---
    ip access-list extended ACL_FROM_HOMELAB_NETWORK
     ! ...
     ! (Optional: Add permits if Homelab needs to SSH to router's Homelab SVI - local management when e.g. laptop is physically connected to homelab network)
     103 remark Permit SSH from Homelab to router's Homelab SVI (local management)
     103 permit tcp 10.10.10.0 0.0.0.255 host 10.10.10.1 eq 22
     104 remark Permit SSH from Homelab to switch's Homelab SVI (local management)
     104 permit tcp 10.10.10.0 0.0.0.255 host 10.10.10.2 eq 22
     199 remark --- END ---
     ! ...
    exit
-
Provision cisco devices with Ansible
-
Explore Enhanced Power Saving Mode in BIOS
- Newer Lenovo machines support enhanced power saving mode which lowers power consumption during power-off.
- Won't do: WoL is not supported!
-
Configure and document BIOS -> Power -> After Power Loss
- What option is better for my use-cases? Make sure it's configured everywhere and document it.
-
Figure out why dnf is very slow
- References:
- Seems like adding `fastestmirror=True` to `/etc/dnf/dnf.conf` helps at least to some degree
-
Set up pi-hole on the cluster