Local LLMs with .NET

Phi-3/Mistral/TinyLlama CPU inference with LLamaSharp

Introduction

In this article we'll explore running inference on GGUF models with llama.cpp through the LLamaSharp NuGet package. It sounds like it should take longer than it actually does.

GGUF models are probably among the easiest to work with, in both Python and C#. Whether your goal is to integrate a local model into an existing C# project or to build something from the ground up, there's no need to search too far: LLamaSharp has all the basic functionality you require.

We'll just use a new console application in Visual Studio for the sake of not overcomplicating this.

Step 1: Install LLamaSharp

First, you need to install the LLamaSharp library. LLamaSharp is a cross-platform library that lets you run LLaMA/LLaVA models (and others) on your local device. To install it, either:

  1. Use the NuGet package manager, or

  2. Use the command:

     dotnet add package LLamaSharp
    

Step 2: Install LLamaSharp.Backend.Cpu

Next, install the LLamaSharp.Backend.Cpu package, which provides the native backend needed to run LLamaSharp on the CPU alone. As before, either:

  1. Use the NuGet package manager, or

  2. Use the command:

     dotnet add package LLamaSharp.Backend.Cpu
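
If you prefer to edit the project file directly, both references can live in your .csproj instead. The version number below is illustrative; whichever release you pick, keep the main package and the backend on the same version:

     <ItemGroup>
       <!-- Versions are illustrative; keep both packages on the same version. -->
       <PackageReference Include="LLamaSharp" Version="0.13.0" />
       <PackageReference Include="LLamaSharp.Backend.Cpu" Version="0.13.0" />
     </ItemGroup>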
    

Step 3: Select a GGUF Model

You can use almost any GGUF model with LLamaSharp, but here are some models that I have tested and found to be efficient in terms of speed and accuracy:

  1. Microsoft/Phi-3

  2. Mistral 7B

  3. TinyLlama

There are plenty more I have used, but these offered the best balance of speed and accuracy. Don't expect miracles either; you won't be running GPT-4o on a CPU anytime soon, but this is nifty for certain use cases. Once your model is downloaded, the quick check below can confirm the file is actually a GGUF.
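
Every GGUF file begins with the four ASCII bytes "GGUF", so a few lines of C# can verify that a download completed properly before you try to load it. This is just an optional sanity-check sketch; the model path is a placeholder:

     using System;
     using System.IO;

     string modelPath = @"<Your Model Path>"; // placeholder: point this at your .gguf file

     // Read the first four bytes and compare them against the GGUF magic number.
     using var stream = File.OpenRead(modelPath);
     var magic = new byte[4];
     bool isGguf = stream.Read(magic, 0, 4) == 4
         && magic[0] == (byte)'G' && magic[1] == (byte)'G'
         && magic[2] == (byte)'U' && magic[3] == (byte)'F';

     Console.WriteLine(isGguf
         ? "Looks like a valid GGUF file."
         : "This does not look like a GGUF file; check your download.");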

Step 4: Implement the code

Now that you have installed the necessary packages and selected your GGUF model, it's time to implement the code. Below is an example of how to set up and run a model using LLamaSharp and the CPU backend.

  1. Create a new C# project (if you haven't already):

     dotnet new console -n LLaMaSharpExample
     cd LLaMaSharpExample
    
  2. Implement the code in your Program.cs file:

     using LLama.Common;
     using LLama;
    
     string modelPath = @"<Your Model Path>"; // change it to your own model path.
    
     var parameters = new ModelParams(modelPath)
     {
         ContextSize = 1024, // Maximum number of tokens the model keeps as chat memory.
         GpuLayerCount = 0 // We're using the CPU-only backend, so no layers are offloaded to a GPU.
     };
     using var model = LLamaWeights.LoadFromFile(parameters);
     using var context = model.CreateContext(parameters);
     var executor = new InteractiveExecutor(context);
    
     // Seed the chat history with a system prompt that tells the model how to behave.
     var chatHistory = new ChatHistory();
     chatHistory.AddMessage(AuthorRole.System, "Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.");
     chatHistory.AddMessage(AuthorRole.User, "Hello, Bob.");
     chatHistory.AddMessage(AuthorRole.Assistant, "Hello. How may I help you today?");
    
     ChatSession session = new(executor, chatHistory);
    
     InferenceParams inferenceParams = new InferenceParams()
     {
         MaxTokens = 256, // Cap the answer at 256 tokens. Remove this if the antiprompt alone gives enough control.
         AntiPrompts = new List<string> { "User:" } // Stop generating once an antiprompt appears.
     };
    
     Console.ForegroundColor = ConsoleColor.Yellow;
     Console.Write("The chat session has started.\nUser: ");
     Console.ForegroundColor = ConsoleColor.Green;
     string userInput = Console.ReadLine() ?? "";
    
     while (userInput != "exit")
     {
         // Stream the response token by token as it is generated.
         await foreach (var text in session.ChatAsync(
             new ChatHistory.Message(AuthorRole.User, userInput),
             inferenceParams))
         {
             Console.ForegroundColor = ConsoleColor.White;
             Console.Write(text);
         }
         Console.ForegroundColor = ConsoleColor.Green;
         userInput = Console.ReadLine() ?? "";
     }
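
If you don't need a stateful chat session, LLamaSharp also offers a StatelessExecutor for one-shot completions, where every call is independent and no history is carried over. This is a minimal, standalone sketch that assumes the same modelPath and ModelParams setup as above; the API names match the LLamaSharp version I tested, so check the SciSharp docs if yours differs:

     // One-shot completion with no session state.
     using var model = LLamaWeights.LoadFromFile(parameters);
     var executor = new StatelessExecutor(model, parameters);

     var oneShotParams = new InferenceParams
     {
         MaxTokens = 64,
         AntiPrompts = new List<string> { "Q:" }
     };

     await foreach (var text in executor.InferAsync(
         "Q: What is the capital of France? A:", oneShotParams))
     {
         Console.Write(text);
     }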
    

Step 5: AI away

Congratulations! You've successfully set up your console application. Now, it's time to put your model to the test. Run your application, ask some questions, and explore the power of AI. Let your creativity flow and see what amazing insights and ideas your model can generate.

Related Links:

  1. Thank you to SciSharp

  2. Hugging Face, as always, for hosting our models

  3. TheBloke for all his contributions to this space