Experiments with FunctionGemma

Exploring how FunctionGemma, a 300MB specialized LLM, can act as a bridge between natural language and local system automation.

FunctionGemma - Natural Language in - Function Call out

FunctionGemma is a new, very lightweight LLM from Google. It is not designed to be the typical chatbot that most LLMs are today. Instead, it acts as a specialized model that takes text as input and produces a structured function call as output.

What is “Function Calling”?

Say, for example, you have written a function that can query the current temperature for any city. Such a function could look like this:

def get_weather(city: str) -> float: ...

This function looks very normal, and indeed it is! It might call an external API, run a calculation, or do anything with the input parameter city to generate an output. For example:

>>> get_weather("Augsburg")
4.0

Here, the function was called by me, a human who wrote that line of code so that the function gets executed. I had to decide which function I wanted to call and with which parameters.

The Natural Language Interface

That’s exactly where FunctionGemma comes in. It serves as a natural language interface for my functions. Instead of manually writing the function call, I can just write a sentence describing the action I want to perform:

“What’s the temperature in Augsburg?”

FunctionGemma then tries to figure out the name of the function and its arguments. Its answer to the previous example looks like this:

ToolCall(function=Function(name='get_weather', arguments={'city': 'Augsburg'}))

And now, I’m able to dynamically call this function!

Wiring it Up

Of course, we need to provide the model with the available functions. In Python, this is incredibly easy because functions are first-class objects.

# The user message from the weather example above
messages = [{"role": "user", "content": "What's the temperature in Augsburg?"}]

tools = [get_weather]

# chat() is the call to the locally running model
response = chat(
    "functiongemma",
    messages,
    tools=tools,
)

The model not only sees the function name and parameters but can also read the functions' docstrings. This is helpful because it adds more context, allowing the model to decide more easily which function fits best.
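To make this concrete, here is a minimal sketch of how a client could turn a Python function into a tool description for the model. The helper `function_to_schema` is hypothetical (it is not FunctionGemma's actual API); it only illustrates that the name, parameters, and docstring are all available via introspection.

```python
import inspect

def get_weather(city: str) -> float:
    """Return the current temperature in degrees Celsius for a city."""
    ...

def function_to_schema(func) -> dict:
    """Build a minimal tool description from a function's
    signature and docstring (hypothetical helper)."""
    params = {
        name: param.annotation.__name__
        for name, param in inspect.signature(func).parameters.items()
    }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": params,
    }

# The docstring ends up in the description, giving the model
# extra context for choosing the right tool.
schema = function_to_schema(get_weather)
```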

To make the function callable by its name, we can use a registry approach:

tool_registry = {f.__name__: f for f in tools}

# Retrieve and execute the function
func_name = tool_call.function.name
func = tool_registry[func_name]
func(**tool_call.function.arguments)
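A slightly more defensive version of that dispatch is worth sketching, since the model can hallucinate function names that are not in the registry. The `ToolCall` and `Function` classes below are stand-ins mirroring the model's output shape, so the sketch is self-contained:

```python
from dataclasses import dataclass, field

# Stub classes mirroring the shape of the model's tool-call output
@dataclass
class Function:
    name: str
    arguments: dict = field(default_factory=dict)

@dataclass
class ToolCall:
    function: Function

def get_weather(city: str) -> float:
    """Dummy implementation for the sketch."""
    return 4.0

tools = [get_weather]
tool_registry = {f.__name__: f for f in tools}

def dispatch(tool_call: ToolCall):
    """Look up the requested function and execute it,
    rejecting names the model may have hallucinated."""
    func = tool_registry.get(tool_call.function.name)
    if func is None:
        raise ValueError(f"Unknown tool: {tool_call.function.name}")
    return func(**tool_call.function.arguments)

result = dispatch(ToolCall(Function("get_weather", {"city": "Augsburg"})))
```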

Controlling my Desktop

I took this a step further and created a script that allows me to control my desktop environment using natural language. I defined functions that map to keyboard shortcuts using ydotool (a generic Linux command-line automation tool).

For example, I defined a function to increase the volume:

def volume_up() -> str:
    """Increase the system volume. Use this when the user wants to make audio louder..."""
    return run_ydotool_command(Keys.VOLUME_UP)

Now, when I type “Pump up the volume!”, FunctionGemma recognizes the intent and outputs a call to volume_up().
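Here is a sketch of how `run_ydotool_command` and `Keys` could be wired; this is my reading of the setup, not the author's exact code. `ydotool key` takes `keycode:1` (press) and `keycode:0` (release) pairs, with keycodes taken from `linux/input-event-codes.h`:

```python
import subprocess
from enum import Enum

class Keys(Enum):
    # Keycodes from linux/input-event-codes.h
    VOLUME_UP = 115
    VOLUME_DOWN = 114

def ydotool_key_args(key: Keys) -> list[str]:
    """Build the argv for a ydotool key press+release event."""
    return ["ydotool", "key", f"{key.value}:1", f"{key.value}:0"]

def run_ydotool_command(key: Keys) -> str:
    """Send the key event via the ydotool daemon (requires ydotoold)."""
    subprocess.run(ydotool_key_args(key), check=True)
    return f"pressed {key.name}"
```

Keeping the argv construction in its own function makes it easy to test without actually injecting input events.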

In this example you can see FunctionGemma switching my desktop workspace for me (video: controlling my WM via FunctionGemma).

This effectively turns FunctionGemma into a semantic layer between my natural language requests and my operating system’s raw input events.

Reflection

To sum it up, I’m amazed that I can run this model locally on my laptop. Its size is only 300 MB!

I started this experiment to explore how LLMs can bridge the gap between human intent and machine execution, and I think this is one of the most useful application areas for LLMs. They are excellent at interpreting natural language, but they make mistakes and don't truly "understand" logic. Functions, on the other hand, execute logic deterministically and without errors, but they are rigid and tedious to call.

Using FunctionGemma as the interface allows "non-programmers" (or just lazy programmers like me ;)) to enjoy ease of use while maintaining the correctness of the underlying code.

What’s next?

I’m thinking about how a shell could be reimagined with FunctionGemma. If you think about it, a shell is just a collection of commands (functions) with help texts (man pages).

In a future experiment, I might try to make an adapter for my shell using FunctionGemma locally. Imagine saying “Find all my Python virtual environments” and having the model instantly translate that to:

find . -type d -name .venv

That would be amazing!
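A first cut of that adapter could expose individual shell commands as tools, with the docstring playing the role of the man page. The `find_dirs` tool below is hypothetical, just to show the shape:

```python
import shlex

def find_dirs(name: str) -> str:
    """Find all directories with the given name under the current
    directory. Use this when the user wants to locate folders,
    e.g. Python virtual environments named '.venv'."""
    return shlex.join(["find", ".", "-type", "d", "-name", name])

# The model would pick the tool and fill in the argument;
# here we call it directly to show the resulting command.
command = find_dirs(".venv")
```

Building the command with `shlex.join` keeps arguments properly quoted, which matters once model-generated strings end up in a shell.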
