AI assistant for Beaglebone using LLM
- Anup Halarnkar

- Mar 23
- 3 min read
Introduction
For this project, I have used llama.cpp as the local inference engine and TinyLlama-1.1B-Chat-v1.0 as the language model.
llama.cpp is a lightweight C/C++ inference framework designed to run LLMs locally with minimal setup across CPUs and GPUs. It is well suited for embedded and edge-oriented workflows because it supports efficient local execution without depending on cloud APIs.
The TinyLlama model used here is the chat-tuned 1.1B parameter variant published on Hugging Face under the Apache 2.0 license. The model is available in GGUF format. GGUF is a file format introduced by the llama.cpp team as a replacement for GGML, which llama.cpp no longer supports.
For controlled tool-planning and summarization tasks, we launch the model in non-conversation mode so that the model behaves like a constrained completion engine rather than an interactive chatbot.
In the llama.cpp tooling, the completion-style path is intended for prompt-to-output generation, while the chat-oriented tools enable conversation behavior and chat templates.
In my workflow, I use two prompt templates: a planner template that selects diagnostic tools, and a summarizer template that condenses the tool output.

A typical launch pattern looks like this:
llama-cli.exe ^
-m .\tinyllama.gguf ^
-f .\planner_prompt.txt ^
-jf .\planner_schema.json ^
-n 160 ^
--temp 0.1 ^
--top-p 0.9 ^
--simple-io ^
-no-cnv ^
--no-display-prompt ^
--no-warmup
Here, the important options are:
-m to load the GGUF model
-f to pass a prompt template from a file
-jf to constrain output with a JSON schema
-no-cnv to disable conversation mode
--simple-io to make subprocess integration cleaner
--no-warmup to reduce startup overhead during rapid testing
--temp 0.1 makes the model highly deterministic, choosing the most likely token almost every time (great for summarization)
--top-p 0.9 is a sampling safeguard
For summarization, the same pattern can be reused with a different template file:
llama-cli.exe ^
-m .\tinyllama.gguf ^
-f .\summarizer_prompt.txt ^
-n 80 ^
--temp 0.1 ^
--top-p 0.9 ^
--simple-io ^
-no-cnv ^
--no-display-prompt ^
--no-warmup

This approach works well because the model is not asked to “know everything” about the target device. Instead, it performs two narrower jobs:
Study the user query and select the correct tools
Summarize the evidence returned by those tools
That makes the system much more reliable than allowing free-form generation. The Linux truth still comes from deterministic commands executed over SSH, while TinyLlama is used mainly for interpretation and orchestration.
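Because llama-cli runs in plain completion mode with --simple-io, it can be driven from a host-side script as an ordinary subprocess. Below is a minimal sketch of that integration; the function names and the 120-second timeout are my own illustrative choices, and the flag list simply mirrors the launch pattern shown above:

```python
import subprocess

def build_planner_cmd(model_path, prompt_path, schema_path, n_predict=160):
    """Build the llama-cli argument list for constrained, non-conversational planning."""
    return [
        "llama-cli",
        "-m", model_path,            # GGUF model
        "-f", prompt_path,           # prompt template file
        "-jf", schema_path,          # JSON schema constraint
        "-n", str(n_predict),
        "--temp", "0.1",
        "--top-p", "0.9",
        "--simple-io",
        "-no-cnv",                   # disable conversation mode
        "--no-display-prompt",
        "--no-warmup",
    ]

def run_planner(model_path, prompt_path, schema_path):
    """Run llama-cli as a subprocess and return its raw stdout."""
    cmd = build_planner_cmd(model_path, prompt_path, schema_path)
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
    return result.stdout.strip()
```

With --no-display-prompt set, stdout contains only the model's completion, so the caller can hand the returned string straight to a JSON parser.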
System Architecture

Key Components
2.1 Llama Builder
The builder uses an LLM to decide which diagnostic tools should run.
Example prompt:
User question: Is HDMI connected?
Allowed tools:
hdmi.status
Planner output:
{
"tools": [
{
"name": "hdmi.status",
"args": {}
}
],
"confidence": 0.95
}

The builder does not generate commands. It only selects from predefined tools. This prevents unsafe command execution.
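Since the planner may only pick from a fixed whitelist, its JSON output can be validated before anything executes. A minimal sketch of that check, using the tool name from the example above (the function name and error handling are illustrative):

```python
import json

# The whitelist of tools the planner is allowed to select.
ALLOWED_TOOLS = {"hdmi.status"}

def validate_plan(raw: str) -> list:
    """Parse planner JSON and reject any tool not in the whitelist."""
    plan = json.loads(raw)
    tools = plan.get("tools", [])
    for tool in tools:
        if tool.get("name") not in ALLOWED_TOOLS:
            raise ValueError(f"planner selected unknown tool: {tool.get('name')!r}")
    return tools

plan_text = '{"tools": [{"name": "hdmi.status", "args": {}}], "confidence": 0.95}'
print(validate_plan(plan_text))  # → [{'name': 'hdmi.status', 'args': {}}]
```

Any hallucinated tool name fails loudly here instead of reaching the device.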
2.2 Tool Registry
Each diagnostic tool is defined in the engine.
Example:
hdmi.status
Command executed on device:
for f in /sys/class/drm/*HDMI*/status; do
  printf "%s: %s\n" "$f" "$(cat "$f")"
done
The registry maps the tool name to the command and output parser.
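One straightforward way to represent such a registry is a plain mapping from tool name to its shell command and parser. This is a sketch, not the project's actual code; the entry shape and lookup helper are illustrative:

```python
# Each entry pairs the remote shell command with the name of its output parser,
# which the engine resolves to a function at runtime.
TOOL_REGISTRY = {
    "hdmi.status": {
        "command": (
            'for f in /sys/class/drm/*HDMI*/status; do '
            'printf "%s: %s\\n" "$f" "$(cat "$f")"; done'
        ),
        "parser": "parse_hdmi_status",
    },
}

def lookup_tool(name: str) -> dict:
    """Return the registry entry for a tool, failing loudly on unknown names."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    return TOOL_REGISTRY[name]
```

Keeping commands as data, rather than letting the model compose them, is what makes the planner safe: the LLM chooses a key, never a shell string.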
2.3 SSH Execution Engine
The engine connects to the device using SSH.
Instead of passing long shell commands through SSH arguments, the command is sent via standard input.
Example:
ssh debian@192.168.1.11 bash -s --

The command script is then streamed to the remote shell.
This approach avoids complex quoting issues across:
Windows
SSH
Bash
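The stdin-streaming pattern can be sketched with subprocess. The host string is the example's; the function names and 30-second timeout are my own assumptions:

```python
import subprocess

def build_ssh_cmd(host: str) -> list:
    """SSH argument list: 'bash -s --' makes bash read the script from stdin."""
    return ["ssh", host, "bash", "-s", "--"]

def run_remote(script: str, host: str = "debian@192.168.1.11") -> str:
    """Run a shell script on the device by streaming it to bash over SSH stdin."""
    # Because the script travels on stdin, it never appears in the SSH
    # argument list, sidestepping Windows/SSH/Bash quoting entirely.
    result = subprocess.run(
        build_ssh_cmd(host),
        input=script,
        capture_output=True,
        text=True,
        timeout=30,
    )
    if result.returncode != 0:
        raise RuntimeError(f"remote command failed: {result.stderr.strip()}")
    return result.stdout
```

A call like run_remote(entry["command"]) then executes a registry command verbatim on the BeagleBone and hands the raw output to the parser.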
2.4 Parsing
Raw Linux output is converted into structured data.
Example raw output:
/sys/class/drm/card0-HDMI-A-1/status: disconnected
Parsed result:
{
"connected": false,
"entries": [
{
"path": "/sys/class/drm/card0-HDMI-A-1/status",
"status": "disconnected"
}
]
}

Structured evidence makes the results easier to analyze and summarize.
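The HDMI parser itself is only a few lines: split each "path: status" line from sysfs and mark the connector connected if any entry reports it. A sketch matching the example output above:

```python
def parse_hdmi_status(raw: str) -> dict:
    """Turn 'path: status' lines from sysfs into structured evidence."""
    entries = []
    for line in raw.strip().splitlines():
        # Each line looks like "/sys/class/drm/...-HDMI-A-1/status: disconnected"
        path, _, status = line.partition(": ")
        entries.append({"path": path, "status": status.strip()})
    return {
        "connected": any(e["status"] == "connected" for e in entries),
        "entries": entries,
    }

raw = "/sys/class/drm/card0-HDMI-A-1/status: disconnected"
print(parse_hdmi_status(raw))
```

Boards with multiple HDMI connectors simply yield multiple entries, and "connected" reflects whether any of them is live.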
2.5 LLM Response
Finally, the LLM converts evidence into a concise explanation.
Example output:
- HDMI is disconnected.
- Checked DRM HDMI status from sysfs.
- Connector /sys/class/drm/card0-HDMI-A-1/status reported disconnected.

The response is constrained to use only the evidence and not invent information.
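Constraining the summary is mostly a prompting concern: the parsed evidence is embedded verbatim in the summarizer template, with an explicit instruction to answer only from it. A sketch of how that prompt might be assembled (the template wording here is illustrative, not the project's actual summarizer_prompt.txt):

```python
import json

def build_summarizer_prompt(question: str, evidence: dict) -> str:
    """Embed parsed tool evidence into a completion-style summarizer prompt."""
    # The instruction to use ONLY the evidence block is the guardrail
    # against the model inventing information.
    return (
        "Answer the question using ONLY the evidence below. "
        "If the evidence is insufficient, say so.\n\n"
        f"Question: {question}\n"
        f"Evidence:\n{json.dumps(evidence, indent=2)}\n\n"
        "Answer:"
    )
```

The resulting text is written to a file and passed to llama-cli with -f, just like the planner prompt.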
Example Questions the System Can Answer
3.1 Display diagnostics
Is HDMI connected?

3.2 Sensor discovery
Are any sensors detected?

3.3 Performance analysis
Is the system overloaded?

3.4 Networking
What is the IP address of the device?

3.5 Service health
Are any system services failing?

GUI Screenshots




