Tuesday, May 19, 2026

Automating Podcast Transcription, Subtitles, and AI Summaries with Whisper and Ollama

One of the things I want to add to the CloudTalkShow production pipeline is automatic transcription. After each episode records, I want three things to exist without anyone having to think about it: a full transcript, an SRT subtitle file ready to upload to YouTube, and a summary that Raquel can use as a starting point for show notes. This post walks through how I built that, what went wrong, and how I fixed it. 

The Setup 

The script runs on the Windows 11 VM I have dedicated to OBS production. That VM has an NVIDIA GeForce RTX 4060 passed through from the Proxmox host, which matters a lot here — running Whisper on a GPU versus a CPU is not a minor difference. I also have Ollama installed on the same VM to handle the summarization step locally without sending episode content to an external API. 

Step 1 — Installing the Dependencies 

Everything installs cleanly from the command line. Start with Python and FFmpeg, which Whisper needs to process audio from video files:
winget install Python.Python.3.12
winget install Gyan.FFmpeg
Then install Whisper and the progress bar library:
pip install -U openai-whisper
pip install tqdm
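Before moving on, a quick check that the pieces are reachable can save a confusing first run. The sketch below is not part of the original script: missing_tools and missing_modules are hypothetical helper names, and all it does is confirm that ffmpeg is on PATH and that the Python packages can be found.

```python
import importlib.util
import shutil


def missing_tools(names):
    """Return the command-line tools from `names` that are not on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names):
    """Return the top-level Python modules from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing tools:", missing_tools(["ffmpeg"]))
    print("Missing modules:", missing_modules(["whisper", "tqdm"]))
```

Two empty lists mean the installs above landed where Python and the shell can see them.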

Step 2 — The First Script 

The starting point was straightforward. Point Whisper at every MP4 in the current directory, transcribe each one, and save the result as a text file. Skip anything that already has a transcript so reruns are safe.
import os
import glob
import whisper

model = whisper.load_model("base")

video_files = glob.glob("*.mp4")
if not video_files:
    print("No files found")
else:
    print(f"Found {len(video_files)} to transcribe")

for video_path in video_files:
    base_name = os.path.splitext(video_path)[0]
    transcript_path = f"{base_name}.transcript.txt"
    if os.path.exists(transcript_path):
        print(f"Skipping '{video_path}'")
        continue
    print(f"Processing: '{video_path}'")
    try:
        result = model.transcribe(video_path)
        with open(transcript_path, "w", encoding="utf-8") as f:
            f.write(result["text"])
        print(f"Success: saved to '{transcript_path}'.\n")
    except Exception as e:
        print(f"Error processing '{video_path}': {e}\n")

print("All files processed!")
Running this immediately produced a warning:
warnings.warn("FP16 is not supported on CPU; using FP32 instead")


Whisper was running on the CPU. The 4060 was sitting there doing nothing. The fix is to reinstall PyTorch with CUDA support — the default pip install does not include it:
pip uninstall torch torchvision torchaudio -y
pip cache purge
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
After that, verify the GPU is visible to PyTorch before touching the script:
python -c "import torch; print('GPU Available:', torch.cuda.is_available()); print('Device Name:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None')"
You want to see your GPU name come back, not None. Once that is confirmed, change the model load line to tell Whisper to use CUDA:
model = whisper.load_model("base", device="cuda")
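If the same script might also run on a machine without a working CUDA build, hard-coding device="cuda" will fail at load time. A minimal sketch of a fallback, assuming nothing beyond PyTorch's own availability check; pick_device is a hypothetical helper, not something Whisper provides.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # No PyTorch at all: nothing can run on the GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Whisper will run on: {device}")
# In the script: model = whisper.load_model("base", device=device)
```

Printing the chosen device at startup also makes a silent fall back to the CPU easy to spot.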

 

Step 3 — Better Model, Better Output 

With the GPU working, the base model felt like leaving performance on the table. The 4060 has 8GB of VRAM, which is enough to run Whisper's turbo model, an optimized version of large-v3 that runs significantly faster with only a small loss in accuracy. I also added timing and a progress bar with tqdm so it is obvious what is happening and how long it takes.
import os
import glob
import time
import whisper
from tqdm import tqdm

print("Loading Whisper 'turbo' model...")
model = whisper.load_model("turbo", device="cuda")

video_files = glob.glob("*.mp4")
if not video_files:
    print("No .mp4 files found in the current directory.")
else:
    print(f"Found {len(video_files)} video file(s). Starting queue...\n")

for video_path in tqdm(video_files, desc="Overall Progress", unit="video"):
    base_name = os.path.splitext(video_path)[0]
    transcript_path = f"{base_name}.transcript.txt"

    if os.path.exists(transcript_path):
        continue

    print(f"\n[Processing] {video_path}")
    start_time = time.time()

    try:
        result = model.transcribe(video_path)

        with open(transcript_path, "w", encoding="utf-8") as f:
            f.write(result["text"])

        elapsed_time = time.time() - start_time
        mins, secs = divmod(int(elapsed_time), 60)
        print(f"[Success] Saved transcript. Time taken: {mins}m {secs}s.")

    except Exception as e:
        print(f"[Error] Failed on '{video_path}': {e}")

print("\nAll tasks finished!")


Step 4 — Adding Subtitles and Summaries 

Whisper's transcription result includes segment-level timing data, which makes generating an SRT subtitle file essentially free — just format the timestamps correctly. I added that along with a summarization step using Ollama and llama3.1 running locally. 

First, install the Ollama Python library and pull the model:
pip install ollama
ollama pull llama3.1
Then the updated script that produces all three output files — transcript, subtitles, and summary — for each MP4:
import os
import glob
import time
import whisper
import ollama
from tqdm import tqdm

def format_srt_time(seconds):
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    secs = int(seconds % 60)
    milliseconds = int((seconds % 1) * 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"

model = whisper.load_model("turbo", device="cuda")

video_files = glob.glob("*.mp4")
if not video_files:
    print("No files found")
else:
    print(f"Found {len(video_files)} to transcribe")

for video_path in tqdm(video_files, desc="Overall Progress", unit="video"):
    base_name = os.path.splitext(video_path)[0]
    transcript_path = f"{base_name}.transcript.txt"
    subtitle_path = f"{base_name}.subtitles.srt"
    summary_path = f"{base_name}.summary.txt"

    if os.path.exists(transcript_path) and os.path.exists(subtitle_path) and os.path.exists(summary_path):
        print(f"Skipping '{video_path}'")
        continue

    print(f"\nProcessing: '{video_path}'")
    start_time = time.time()

    try:
        print(" -> Step 1/3: Transcribing...")
        result = model.transcribe(video_path)
        raw_text = result["text"].strip()

        with open(transcript_path, "w", encoding="utf-8") as f:
            f.write(raw_text)

        print(" -> Step 2/3: Creating Subtitles...")
        with open(subtitle_path, "w", encoding="utf-8") as srt_file:
            for index, segment in enumerate(result["segments"], start=1):
                start = format_srt_time(segment["start"])
                end = format_srt_time(segment["end"])
                text = segment["text"].strip()
                srt_file.write(f"{index}\n{start} --> {end}\n{text}\n\n")

        print(" -> Step 3/3: Summarizing...")
        prompt = (
            f"You are writing show notes for 'The Cloud Talk Show', a podcast hosted by "
            f"Larry Smithmier and Ralph Lecesse that covers cloud-native development, "
            f"self-hosted infrastructure, and hands-on technical topics.\n\n"
            f"Based on the following transcript, write engaging show notes in the style of a "
            f"knowledgeable tech blogger. Write in present tense as if describing the episode "
            f"to a potential listener. Do not use phrases like 'this transcript' or 'the host' "
            f"— use 'Larry', 'Ralph', or 'this episode' instead.\n\n"
            f"Structure the output as:\n"
            f"1. A 3-4 sentence episode overview in an engaging, direct tone\n"
            f"2. A bulleted list of key topics and takeaways\n"
            f"3. A one-sentence closing that tells the reader why they should watch\n\n"
            f"Transcript:\n{raw_text}"
        )

        ollama_response = ollama.generate(
            model="llama3.1",
            prompt=prompt
        )

        with open(summary_path, "w", encoding="utf-8") as f:
            f.write(ollama_response['response'])

        elapsed_time = time.time() - start_time
        mins, secs = divmod(int(elapsed_time), 60)
        print(f"Success: All files saved for {video_path}! (Time: {mins}m {secs}s)\n")

    except Exception as e:
        print(f"Error processing '{video_path}': {e}\n")

print("All files processed!")
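One detail in format_srt_time: it truncates the fractional part, and floating-point representation means a value such as 2.3 seconds comes out as 299 milliseconds rather than 300. That is invisible in practice, but a variant that rounds to whole milliseconds first avoids it. This is a sketch, not what the script above uses; format_srt_time_rounded, segments_to_srt, and the sample segments are made up for illustration.

```python
def format_srt_time_rounded(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm), rounded to the nearest millisecond."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from dicts with "start", "end", and "text" keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time_rounded(seg["start"])
        end = format_srt_time_rounded(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Joining with a newline leaves one blank line between subtitle blocks
    return "\n".join(blocks)


# Made-up segments in the same shape as Whisper's result["segments"]
sample = [
    {"start": 0.0, "end": 2.3, "text": " Welcome to the show."},
    {"start": 2.3, "end": 3661.5, "text": " Let's get started."},
]
print(segments_to_srt(sample))
```

Working in integer milliseconds also means a value like 59.9996 seconds carries cleanly into the next minute instead of being cut to 59,999.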


Step 5 — The Summary Was Bad

The script worked. The transcript and subtitle files were exactly what I wanted. The summary was not.

The output read like someone who had never heard of the show and was hedging every sentence. It referred to things as "the host" instead of Larry or Ralph, and it ended with the phrase "this transcript provides a fascinating glimpse" — which tells you the model had no idea it was writing show notes for a podcast. It was just pattern-matching on "I was given a transcript, I will write about a transcript." 

There were two problems. 
  • Problem 1: The prompt gave the model no context. Without knowing what the show is, who the hosts are, or what tone is appropriate, the model defaults to generic academic summarization. The fix is to front-load the prompt with enough context that the model knows exactly what role it is playing before it reads a single word of the transcript. 
  • Problem 2: The context window was too small. This one is less obvious. Here is what is actually happening under the hood.

Understanding Model Size, Context, and VRAM 

When you run a model through Ollama, two separate things consume your GPU's VRAM: 
  • Model weights are fixed. When you see llama3.1 listed at 4.9GB in Ollama, that is how much VRAM the model itself needs regardless of what you ask it to do. It loads once and stays there. 
  • The KV cache is dynamic. Every token the model processes — both input and output — needs to be stored in what is called the key-value cache. The larger the context window, the more VRAM the cache consumes. This is why models that support 128K context cannot actually use all of it on consumer hardware. 

Rules of thumb for an 8GB GPU like the 4060:

  • Model weights consume roughly what Ollama reports as the model size
  • Each 8K of context adds approximately 0.5-1GB of KV cache for a 7-8B model
  • 32K context adds roughly 2-3GB on top of the model weights — so llama3.1 (4.9GB) at 32K context needs about 7-8GB total. Tight but workable on a 4060.
  • 128K context on an 8GB GPU is not realistic — the KV cache alone would need 15-20GB
  • The Ollama default context is small (historically 2K to 4K tokens, depending on the version), which for a podcast transcript means the model is almost certainly reading a truncated version of what you gave it and filling in the gaps with hallucination
That last point explains the summary I got. The model did not read the whole episode. It read part of it and made up the rest. The fix is to explicitly set num_ctx in the Ollama call to give it more room:
ollama_response = ollama.generate(
    model="llama3.1",
    prompt=prompt,
    options={"num_ctx": 32768}
)
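The rules of thumb above can be sanity-checked with a little arithmetic. The KV cache stores one key and one value vector per layer, per KV head, per token. The sketch below assumes Llama 3.1 8B's published architecture (32 layers, 8 KV heads, head dimension 128) and a 16-bit cache. kv_cache_gib is a hypothetical helper, and real usage will differ with quantized caches and runtime overhead.

```python
def kv_cache_gib(n_ctx, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Estimate KV cache size in GiB for a context of n_ctx tokens.

    The defaults are assumptions for Llama 3.1 8B with a 16-bit cache.
    """
    # 2x because both a key and a value are stored for every token.
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return n_ctx * bytes_per_token / 1024**3


for n_ctx in (8192, 32768, 131072):
    print(f"{n_ctx:>7} tokens: {kv_cache_gib(n_ctx):5.1f} GiB (16-bit), "
          f"{kv_cache_gib(n_ctx, bytes_per_value=1):5.1f} GiB (8-bit)")
```

Under these assumptions 8K of context costs about 1 GiB and 128K about 16 GiB. 32K comes out at about 4 GiB with a 16-bit cache, or about 2 GiB if the cache is stored in 8 bits, so the real number depends on how the runtime stores it.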

Step 6 — The VRAM Conflict

Running the updated script with the improved prompt and increased context window immediately hit a new error:
model requires more system memory (6.8 GiB) than is available (3.9 GiB) (status code: 500)

The problem is straightforward once you see it. The script loads Whisper at startup and keeps it in VRAM for the entire run. Whisper's turbo model uses about 6GB of the 4060's 8GB, which leaves only 2GB free. When the script gets to the Ollama step and tries to load llama3.1 at 32K context, which needs roughly 7-8GB, there is simply nowhere to put it. The fix is to explicitly release Whisper from VRAM before calling Ollama so that the full 8GB is available for the summarization step. Python's garbage collector does not do this automatically for GPU memory; you have to do it yourself with three lines:

del model
gc.collect()
torch.cuda.empty_cache()
del model removes the Python reference. gc.collect() runs the garbage collector to clean up anything still holding the model through a reference cycle. torch.cuda.empty_cache() is the important one: it tells PyTorch to release its cached VRAM back to the GPU rather than holding it in reserve for potential reuse.

Two other changes go along with this. First, import gc and import torch need to be added to the imports at the top of the script. Second, the Whisper model load moves from the top of the script to inside the loop, so it loads fresh for each file rather than once at startup. Reloading adds a few seconds per file, but for podcast episodes that is negligible. The updated loop now looks like this:
# --- Step 1/3: Transcribe (Whisper on GPU) ---
print(" -> Step 1/3: Transcribing...")
model = whisper.load_model("turbo", device="cuda")
result = model.transcribe(video_path)
raw_text = result["text"].strip()

with open(transcript_path, "w", encoding="utf-8") as f:
    f.write(raw_text)

# --- Step 2/3: Subtitles (no GPU needed) ---
print(" -> Step 2/3: Creating Subtitles...")
with open(subtitle_path, "w", encoding="utf-8") as srt_file:
    for index, segment in enumerate(result["segments"], start=1):
        start = format_srt_time(segment["start"])
        end = format_srt_time(segment["end"])
        text = segment["text"].strip()
        srt_file.write(f"{index}\n{start} --> {end}\n{text}\n\n")

# --- Free Whisper from VRAM before loading Ollama ---
del model
gc.collect()
torch.cuda.empty_cache()
print("    (Whisper unloaded from VRAM)")

# --- Step 3/3: Summarize (Ollama on GPU) ---
print(" -> Step 3/3: Summarizing...")
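I also had to unload the Ollama model once the summary is written, so its VRAM is free again before Whisper loads for the next file:
# After the main ollama.generate call, unload the model from VRAM.
# An empty prompt with keep_alive=0 asks Ollama to unload the model right away.
ollama.generate(
    model="llama3.1",
    prompt="",
    keep_alive=0
)
print("    (Ollama unloaded from VRAM)")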

The terminal will now print (Whisper unloaded from VRAM) between the subtitle and summarization steps and (Ollama unloaded from VRAM) after, which makes it easy to confirm the sequence is working correctly when you watch a run in progress.
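One caveat with the three-line cleanup: if model.transcribe raises, the lines after it never run and Whisper stays in VRAM for the next file. A sketch of wrapping the load, use, and release in one function with try/finally follows. run_then_release and FakeModel are hypothetical names, and the torch import is guarded so the example also runs on a machine without PyTorch.

```python
import gc


def run_then_release(load_model, use_model):
    """Load a model, run use_model(model), then drop the model and clear the CUDA cache."""
    model = load_model()
    try:
        return use_model(model)
    finally:
        # Runs on success and on failure. After an exception, the traceback can
        # still reference the model until the caller has handled the error.
        del model
        gc.collect()
        try:
            import torch
        except ImportError:
            torch = None
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()


# Stand-in object so the sketch runs without Whisper installed
class FakeModel:
    def transcribe(self, path):
        return {"text": f"transcript of {path}", "segments": []}


result = run_then_release(FakeModel, lambda m: m.transcribe("episode.mp4"))
print(result["text"])

# With Whisper, the call would look like:
# result = run_then_release(
#     lambda: whisper.load_model("turbo", device="cuda"),
#     lambda m: m.transcribe(video_path),
# )
```

Because the model only ever lives in a local variable inside the function, nothing in the main loop can accidentally keep it alive.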



Step 7 — The Updated Script 

Here is the final version, incorporating the improved prompt, the increased context window, and the VRAM handling from Step 6. Note the bug fix in the timing output as well: an earlier draft used min as a variable name (which shadows Python's built-in) and then referenced mins in the print statement, which would have thrown a NameError on any file that actually completed successfully. The listings in this post already use mins throughout.
import os
import glob
import time
import whisper
import ollama
import gc
import torch
from tqdm import tqdm

def format_srt_time(seconds):
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    secs = int(seconds % 60)
    milliseconds = int((seconds % 1) * 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"

video_files = glob.glob("*.mp4")
if not video_files:
    print("No files found")
else:
    print(f"Found {len(video_files)} to transcribe")

for video_path in tqdm(video_files, desc="Overall Progress", unit="video"):
    # At the start of each loop iteration, ensure Ollama isn't holding VRAM
    try:
        ollama.generate(model="llama3.1", prompt="", keep_alive=0)
    except Exception:
        pass  # Best-effort unload; ignore failures (e.g. Ollama not running)
    base_name = os.path.splitext(video_path)[0]
    transcript_path = f"{base_name}.transcript.txt"
    subtitle_path = f"{base_name}.subtitles.srt"
    summary_path = f"{base_name}.summary.txt"

    if os.path.exists(transcript_path) and os.path.exists(subtitle_path) and os.path.exists(summary_path):
        print(f"Skipping '{video_path}'")
        continue

    print(f"\nProcessing: '{video_path}'")
    start_time = time.time()

    try:
        # --- Step 1/3: Transcribe (Whisper on GPU) ---
        print(" -> Step 1/3: Transcribing...")
        model = whisper.load_model("turbo", device="cuda")
        result = model.transcribe(video_path)
        raw_text = result["text"].strip()

        with open(transcript_path, "w", encoding="utf-8") as f:
            f.write(raw_text)

        # --- Step 2/3: Subtitles (no GPU needed) ---
        print(" -> Step 2/3: Creating Subtitles...")
        with open(subtitle_path, "w", encoding="utf-8") as srt_file:
            for index, segment in enumerate(result["segments"], start=1):
                start = format_srt_time(segment["start"])
                end = format_srt_time(segment["end"])
                text = segment["text"].strip()
                srt_file.write(f"{index}\n{start} --> {end}\n{text}\n\n")

        # --- Free Whisper from VRAM before loading Ollama ---
        del model
        gc.collect()
        torch.cuda.empty_cache()
        print("    (Whisper unloaded from VRAM)")

        # --- Step 3/3: Summarize (Ollama on GPU) ---
        print(" -> Step 3/3: Summarizing...")
        prompt = (
            f"You are writing show notes for 'The Cloud Talk Show', a podcast hosted by "
            f"Larry Smithmier and Ralph Lecesse that covers cloud-native development, "
            f"self-hosted infrastructure, and hands-on technical topics.\n\n"
            f"Based on the following transcript, write engaging show notes in the style of a "
            f"knowledgeable tech blogger. Write in present tense as if describing the episode "
            f"to a potential listener. Do not use phrases like 'this transcript' or 'the host' "
            f"— use 'Larry', 'Ralph', or 'this episode' instead.\n\n"
            f"Structure the output as:\n"
            f"1. A 3-4 sentence episode overview in an engaging, direct tone\n"
            f"2. A bulleted list of key topics and takeaways\n"
            f"3. A one-sentence closing that tells the reader why they should watch\n\n"
            f"Transcript:\n{raw_text}"
        )

        ollama_response = ollama.generate(
            model="llama3.1",
            prompt=prompt,
            options={"num_ctx": 32768}
        )

        with open(summary_path, "w", encoding="utf-8") as f:
            f.write(ollama_response['response'])

        # After the ollama.generate call, unload the model from VRAM
        ollama.generate(
            model="llama3.1",
            prompt="",
            keep_alive=0
        )
        print("    (Ollama unloaded from VRAM)")

        elapsed_time = time.time() - start_time
        mins, secs = divmod(int(elapsed_time), 60)
        print(f"Success: All files saved for {video_path}! (Time: {mins}m {secs}s)\n")

    except Exception as e:
        print(f"Error processing '{video_path}': {e}\n")

print("All files processed!")
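One limitation of the all-or-nothing skip check: if the summary step fails, the next run redoes the transcription as well, because only two of the three files exist. A possible refinement is to check each output separately. This is a sketch rather than part of the script above; pending_outputs is a hypothetical helper that reuses the script's file naming.

```python
import os
import tempfile


def pending_outputs(video_path):
    """Return the output kinds ("transcript", "subtitles", "summary") still missing for a video."""
    base_name = os.path.splitext(video_path)[0]
    outputs = {
        "transcript": f"{base_name}.transcript.txt",
        "subtitles": f"{base_name}.subtitles.srt",
        "summary": f"{base_name}.summary.txt",
    }
    return [kind for kind, path in outputs.items() if not os.path.exists(path)]


# Demo in a temporary folder: only the transcript exists so far
with tempfile.TemporaryDirectory() as folder:
    video = os.path.join(folder, "episode01.mp4")
    with open(os.path.join(folder, "episode01.transcript.txt"), "w", encoding="utf-8") as f:
        f.write("hello")
    demo_pending = pending_outputs(video)
    print(demo_pending)
```

With that list, the loop could re-read an existing transcript from disk and run only the summary. Subtitles need Whisper's segment timings, so a missing .srt still means transcribing again unless the segments are saved too.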


Alternative: mistral-nemo for Better Long-Context Summarization 

If the summary quality on long episodes still feels like it is missing content, there is a better model for this job. mistral-nemo is a 12B model that runs at 7.1GB — it fits on the 4060 — and it handles long-form structured output significantly better than llama3.1:8b. It also has a native 128K context window, which means it is less likely to lose the thread on a long episode even at moderate num_ctx settings. 

Pull the model:
ollama pull mistral-nemo
The only change to the script is the model name in the ollama.generate call:
ollama_response = ollama.generate(
    model="mistral-nemo",
    prompt=prompt,
    options={"num_ctx": 32768}
)
At 32K context, mistral-nemo needs roughly 9-10GB of VRAM, slightly over the 4060's 8GB. In practice Ollama will offload some layers to system RAM automatically, so it will still run, just a bit slower than a model that fits entirely on the GPU. Whether the quality improvement justifies that trade-off is worth testing on a real episode.
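Whichever model is used, it helps to know roughly how many tokens a transcript is before picking num_ctx. The sketch below uses a crude rule of thumb of about four characters per token for English text. That ratio, the 2,000-token allowance for the prompt and reply, and the estimate_tokens and suggest_num_ctx helpers are all assumptions for illustration; a real tokenizer would give a different count.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate for English text (assumed ~4 characters per token)."""
    return int(len(text) / chars_per_token) + 1


def suggest_num_ctx(transcript, overhead_tokens=2000,
                    choices=(8192, 16384, 32768, 65536, 131072)):
    """Return the smallest context size from `choices` that fits transcript + overhead, or None."""
    needed = estimate_tokens(transcript) + overhead_tokens
    for num_ctx in choices:
        if needed <= num_ctx:
            return num_ctx
    return None


# Made-up transcript of 9,000 short words, standing in for a long conversation
fake_transcript = "word " * 9000
print(estimate_tokens(fake_transcript), suggest_num_ctx(fake_transcript))
```

Picking the smallest context that fits keeps the KV cache, and therefore the VRAM bill, as low as the episode allows.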

What Is Next 

The script is working but it is not yet wired into the production pipeline. My plan is to add a trigger so it runs automatically when a new recording lands in the OneDrive folder — either a file system watcher or a scheduled task that checks for new MP4s. Once that is in place, Raquel will have the transcript, subtitles, and summary waiting for her before she even opens Camtasia. 
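For the scheduled-task variant, the main thing to get right is not starting on a recording that is still being written or synced. A sketch of a polling check using only the standard library follows. is_stable and find_ready_recordings are hypothetical names, and the size-has-stopped-changing test is a heuristic rather than a guarantee.

```python
import glob
import os
import tempfile
import time


def is_stable(path, wait_seconds=5.0):
    """Heuristic: treat a file as complete if its size does not change over wait_seconds."""
    first = os.path.getsize(path)
    time.sleep(wait_seconds)
    return os.path.getsize(path) == first


def find_ready_recordings(folder, wait_seconds=5.0):
    """Return MP4s in folder that have no summary yet and appear to be fully written."""
    ready = []
    for video_path in sorted(glob.glob(os.path.join(folder, "*.mp4"))):
        summary_path = os.path.splitext(video_path)[0] + ".summary.txt"
        if not os.path.exists(summary_path) and is_stable(video_path, wait_seconds):
            ready.append(video_path)
    return ready


# Demo in a temporary folder: one new recording, one already summarized
with tempfile.TemporaryDirectory() as folder:
    for name in ("new.mp4", "done.mp4", "done.summary.txt"):
        with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
            f.write("placeholder")
    ready_names = [os.path.basename(p) for p in find_ready_recordings(folder, wait_seconds=0.1)]
    print(ready_names)
```

A scheduled task could call find_ready_recordings on the recordings folder every few minutes and hand each result to the transcription loop.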

I also want to upload the SRT file to YouTube automatically as part of the same pipeline, since that step is currently manual. More on both of those when I get there. 

The final version of the script is available if you want to use it. Drop a comment or reach out if you have questions.

Friday, May 15, 2026

CloudTalkShow: Building a Lobby and Adding Chat

In my last post I talked about getting NGINX and VDO.Ninja up and running on my Proxmox setup. At the time I mentioned that I was planning on using VDO.Ninja to help with production of the Cloud Talk Show and that I wanted to expose the OBS WebSocket interface for the producer. I have done both of those things, and I want to walk through what I built and the problems I ran into along the way.

The Setup

The goal was to replace Microsoft Teams as the coordination and recording tool for the show. Teams compresses video streams significantly and the labor involved in mixing everything into a program was more than we wanted to deal with long term. Here is what I ended up with:

  • VDO.Ninja running in my NGINX LXC to handle peer-to-peer video streams
  • OBS Studio running on a Windows 11 VM to mix the streams into a program
  • OBS-Web running on the NGINX LXC so the producer can control OBS remotely through a browser
  • A lobby web page that everyone uses to meet before the show starts

Each participant pushes two separate streams into VDO.Ninja — one for their camera and one for their screen share. I have five participants, so that is ten streams total. The lobby page pulls all ten of those streams into small preview blocks arranged in two columns, and also shows the live program output from an OBS virtual camera so everyone can see what is going out. I call it the lobby because we use it to get organized before we go live, the same way you would mill around in the lobby before a meeting starts.

Setting up the VDO.Ninja streams was straightforward once I understood the push/view URL pattern. Each participant gets a push URL they open in their browser, and the lobby page uses the corresponding view URLs in iframes. The OBS-Web setup required a bit more NGINX configuration to proxy the WebSocket connection through to the OBS WebSocket server running on the Windows 11 VM at port 4455.
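The push/view pattern is regular enough to generate. The sketch below builds a push URL and the matching view URL for every participant and stream type, using VDO.Ninja's push and view query parameters. The host name, the participant IDs, and the stream_urls helper are made up for illustration, not the values used on the show.

```python
from urllib.parse import urlencode


def stream_urls(base_url, participants, kinds=("cam", "screen")):
    """Build a push URL and matching view URL for every participant/stream pair."""
    urls = []
    for person in participants:
        for kind in kinds:
            stream_id = f"{person}_{kind}"
            urls.append({
                "stream_id": stream_id,
                "push": f"{base_url}/?{urlencode({'push': stream_id})}",
                "view": f"{base_url}/?{urlencode({'view': stream_id})}",
            })
    return urls


# Hypothetical host and participant IDs; real stream IDs should be hard to guess
pairs = stream_urls("https://vdo.example.net", ["host1", "host2", "guest1", "guest2", "producer"])
print(len(pairs))
print(pairs[0]["push"], pairs[0]["view"])
```

Five participants with two stream types each gives the ten push/view pairs that the lobby page embeds.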

Adding Chat

We have recorded 2 shows using the setup, but there were some shortcomings.  The lobby page was missing one thing that we had in our old MS Teams setup: a way for participants to talk to each other while they are getting set up. VDO.Ninja has a built-in chat feature but it didn't work with how I had the streams set up — I am using completely separate push/view pairs for each stream rather than a room, so the chat context isn't shared across the lobby page.

I wanted something simple and ephemeral. No accounts, no database, messages are gone when the server restarts. I already had Node.js v20 installed on the LXC, so I went with a minimal ws WebSocket server. No Express, no Socket.IO, just the WebSocket library and about 20 lines of code.

Here is the server:

const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 3001 });

wss.on('connection', (ws) => {
  ws.on('message', (data) => {
    // Broadcast every incoming message to all connected clients
    wss.clients.forEach((client) => {
      if (client.readyState === 1) { // 1 is WebSocket.OPEN
        client.send(data.toString());
      }
    });
  });
});

console.log('Chat server running on port 3001');

I installed it as a systemd service so it starts automatically and restarts if it crashes:

[Unit]
Description=CloudTalkShow Chat Server
After=network.target

[Service]
ExecStart=/usr/bin/node /opt/chat-server/server.js
Restart=always
User=www-data
WorkingDirectory=/opt/chat-server

[Install]
WantedBy=multi-user.target

Then I added a new NGINX vhost for chat.smithmier.net to proxy WebSocket connections through to port 3001, grabbed a Let's Encrypt cert with Certbot, and dropped a chat widget into the lobby page HTML. The widget is plain JavaScript — a WebSocket connection, an input for your name, an input for the message, and a scrolling message area. Enter key sends. Nothing fancy.

The Problems

Getting here took longer than it should have, and the problems were all interesting enough to be worth documenting.

The Certbot chicken-and-egg problem. I configured the NGINX vhost with the SSL certificate paths before the certificate existed. NGINX failed its config test, Certbot couldn't run because NGINX was broken, and I was stuck. The fix is simple once you know it: start with an HTTP-only vhost, get the cert, then let Certbot add the SSL configuration. Obvious in retrospect.

Hairpin NAT. Once the cert was in place and the server was running, the chat worked when I tested it locally on the LXC but timed out from every other machine. The curl output told the story — the DNS A record pointed to my public IP 173.49.39.254, but my Verizon CR1000A router does not support hairpin NAT. Traffic that leaves the network and tries to come back in through the same public IP just gets dropped. My other subdomains worked because I had already added them to Pi-Hole with the internal IP 192.168.1.50. Adding chat.smithmier.net → 192.168.1.50 to Pi-Hole fixed it immediately.
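A quick way to see which side of the router a name resolves to from a given machine is to resolve it and check whether the answer is a private address. A small sketch using only the standard library; classify_address and check_hostname are hypothetical helper names.

```python
import ipaddress
import socket


def classify_address(ip):
    """Return "private" if Python's ipaddress module considers the address private, else "public"."""
    return "private" if ipaddress.ip_address(ip).is_private else "public"


def check_hostname(hostname):
    """Resolve hostname with this machine's DNS settings and classify the answer."""
    ip = socket.gethostbyname(hostname)
    return ip, classify_address(ip)


# The two addresses from the text
print(classify_address("192.168.1.50"))
print(classify_address("173.49.39.254"))
# On the LAN, check_hostname("chat.smithmier.net") should report a private address
# once the local DNS override is in place.
```

If a machine on the LAN still gets the public address back, it is not using the local DNS override and its traffic will hit the router's missing hairpin NAT.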

The Chrome service worker. After all of that, the lobby page itself stopped loading in Chrome and Edge. Firefox was fine. The error in the Chrome console was The FetchEvent resulted in a network error response: the promise was rejected, which is a service worker error, not a network error. OBS-Web is a Svelte PWA and it had registered a service worker in Chrome at some point that got into a broken state. Clearing it from chrome://serviceworker-internals/ fixed it. Edge had the same issue since it runs the same engine.

What Is Next

The lobby and chat are working well. The next thing I want to write about is the overall show workflow — how the producer uses OBS-Web to control the mix, how we handle the recording, and how we get from a raw recording to a finished YouTube upload. I also want to dig into the VDO.Ninja configuration in more detail, because there are some bitrate and quality settings that made a significant difference in stream quality that are worth sharing.

If you are running a similar setup or have questions about any of the pieces, feel free to reach out.

Friday, March 20, 2026

NGINX and VDO.Ninja: the most recent additions to my local Proxmox

 I have finally decided to expose some of my internal services to the outside world and thus need some security in place.  I have decided to set up NGINX as a proxy and web host.  I initially looked at Caddy, but found the installation and configuration more than what I was able to do in my spare time.  NGINX allowed me to manage the installation and configuration with small steps.

First, I built an LXC and installed NGINX and ddclient to manage keeping my CloudFlare domain names mapped to the dynamic IP address provided by my ISP.  I installed Debian and used apt to install NGINX and ddclient.  I created and installed an SSL certificate from CloudFlare to allow secure communication.  Lastly, I added a simple test web site and confirmed that I could access it externally.

I then installed VDO.Ninja as a local website and set up a proxy through to an instance of LeanTime that I have been using to track my internal projects.  I added a couple of domain names and confirmed that they resolved correctly through the proxy.  I do have additional services that I will expose in the future, but this is enough for now.

As an aside, I plan on using VDO.Ninja to help with production of the Cloud Talk Show going forward.  It allows peer-to-peer connections for video streams that can be included in OBS Studio as sources.  We have been using MS Teams to coordinate and record the shows and OBS to capture our program locally. We then have a producer mixing the different streams into a program that is uploaded to YouTube and published.  The main issues are that MS Teams compresses video streams, which greatly reduces their fidelity, and that mixing and producing the program takes a lot of labor.

I have built a Windows 11 VM with OBS installed on it to manage streams.  As mentioned above, I am going to eventually expose the OBS WebSocket interface for the producer to interact with directly.

Monday, February 23, 2026

OpenMediaVault for Sharing Files

The next service I added to my Proxmox was OpenMediaVault (OMV) in a virtual machine rather than an LXC.  LXCs are great, lightweight containers, but they use a shared kernel and don't have the level of isolation you get from a VM.  I also wanted to easily pass hardware through to the VM to allow it to control the USB ports on my machine to give it the ability to directly control my external drives.

There could be a strong case made that I should have installed OMV first and had it controlling the media containing my .mkv files served by Plex. And if I were to sit down and build it all out in a weekend, then that would surely be the way I would go about it.  However, it was a few months after my initial setup that I got around to setting up a NAS.  

When my server lost a drive (RAID 0, so total data loss, but I did have backups) and I installed replacement drives, I dedicated one to OMV and redirected all backups to it.  It wasn't hard to do; I was able to do it all using directions found on the Proxmox site.

I have since opened up NFS, SMB, and FTP shared directories.  I then added Veeam Backup agents on my desktops and laptops with OMV as a backup target.  

Thank you for following me on my journey, please feel free to reach out to me if there is anything you want me to dig into more.

Sunday, December 21, 2025

Plex for home video serving

 The first service I installed on my home server was Plex.  Back during the time of Blockbuster, we decided to purchase movies rather than rent them if there was a chance that we would want to watch them more than once.  We ended up with over 400, and have long ago stored them in books and thrown out the cases but it was still a lot of trouble to dig through them and find what we wanted to watch.  I wanted to rip the DVDs and eventually BluRays but didn't have the storage for them for a long time.  I eventually got an external drive of sufficient size and started ripping movies.  Eventually I found MakeMKV and purchased a license for it. I have been running Plex since 2020 and purchased a lifetime subscription in 2023, so it wasn't a new thing for me to have internally.  

I had been running it on my desktop with an external drive.  I have an Intel NUC that I tried hosting it on for a while, but it wasn't powerful enough to really do it justice.  I often ran into reboot issues due to patching, or I would shut down my machine at the end of the day and have to go and start it back up when someone wanted to watch a movie.

Adding the Plex LXC service was super easy using the Proxmox VE Helper-Script to build and install the image.  After adding it, I only needed to mount my external drive and start serving movies.  I had set up 5 of my 6 2TB drives as a RAID 0 and moved many of the movies there to increase .  When I lost a drive in the array last week, I replaced 2 of them with very large drives (18TB and 22TB) and pulled all of my movies onto one of them.  I repurposed the external drive as the large drive on one of my new cluster servers and have it mounted as a target for Proxmox Backup.

I don't have much more to say about running Plex locally, it doesn't require much maintenance.  I added the service to Observium to monitor it and I update the base operating system and run the update of Plex itself through its interface.  As I write this, I am now looking at scripts to automatically update the underlying images. ;-)

I do expose the UI in NextCloud, but we really only use it through our Rokus.  I also picked up a FireStick last month when they were on deep discount at Target.  I hadn't plugged it in until just now, and I haven't finished playing with it yet.  Plex is honestly one of the easiest home applications I am currently running and if you have a CD, DVD, or BluRay collection at home there is no good reason to not take the plunge.

Friday, December 19, 2025

What I am running at home

Since we moved from Brooklyn to Wilmington, I have enough room to start setting up a real home lab.   

Side note: I have had my R710 since 2018 (thank you TechMikeNY) but it hasn't had a real home where I could actually run it as a server for over 3 years. I initially purchased it when Magenic was getting deep into a Pivotal Cloud Foundry partnership so I could run CF locally. I overbought (High-End Dell PowerEdge R710 Server 2x 2.93Ghz X5670 6C 144GB 6x 2TB) but it was something I had been wanting to do for a while and I was able to run it in the Magenic offices in Manhattan. I got a 1/4 rack on wheels and had a great time with it until Magenic closed the office and I had to bring it home to a Manhattan apartment.

I have a 1Gbps Verizon FiOS connection running in, a semi-finished basement, and a full-sized room for my office.  With a door!  Anyway, let me do a quick inventory and I will try and come back and talk about each entry in more detail later.  Sharing is caring, and I do want to brag a bit about what I have going on.

First, I chose Proxmox as my base system.  I know VMware has ESXi as a free hypervisor and Microsoft still has Hyper-V Server 2019 available, but I wanted open source as well as free with a great UI.  I looked at Unraid as it has perpetual licensing and looks great, but I decided Proxmox VE was a better fit.

Here are the different computers I have running as part of my Proxmox cluster:

  • Dell Poweredge R710
  • Dell XPS 17 L702X
  • Dell Precision M4800
  • HP Omen 40L
  • Dell Alienware Aurora R5
I started out just running on the R710 (The Beast) but thought it would be fun to try clustering and, admittedly, things got a little bit out of hand.  The services that I am currently hosting are:
I have some Windows 11 and openSUSE desktops running in VMs, and use Veeam to backup our desktops to OpenMediaVault.

I have not moved to Ceph storage to allow for VM migrations yet, as it is honestly a bit daunting and I still have a lot of work to do with my standard services.

I will try and get back into blogging more and will dig into the individual pieces as I do.  For example, I have Ollama running locally on my desktop, in a VM with a PCI passthrough, and on The Beast using raw memory and CPU.  The VM and local instances are memory constrained by the video cards I have, but the raw CPU and memory instance is slow.

Thanks for your interest!

Tuesday, May 31, 2022

Tool roundup

http://carnackeys.com/ is a great utility for presenting on Windows. It shows your keystrokes so you don't have to talk through your shortcuts.


https://ko-fi.com/ is a site to monetize your site or blog.

https://github.com/NVlabs/ffhq-dataset is a dataset of high quality faces to be used for AI training.

https://www.youtube.com/watch?v=6Rxqk3Lcvrw is one of my favorite movie clips to use when explaining scope creep.

https://jamstack.org/ is an architectural style I am interested in learning more about.

https://hyper.is/ is an open source terminal.

https://umbraco.com/products/umbraco-cms/ is an open source CMS built on .NET.

Learn how to branch using Git on this site https://learngitbranching.js.org/.

https://anuket.io/ is trying to build standards for cloud native networking through reference infrastructures and test suites.

https://www.joelonsoftware.com/2008/03/17/martian-headsets/ is from a while ago, but the best article I have found to explain technical debt and the complexity of compatibility.