How to Build Local AI on PC Using OpenAI gpt-oss-20b in VS Code

Want to Build Powerful AI on Your Own PC? Microsoft’s Free AI Toolkit for VS Code Lets You Develop With OpenAI’s gpt-oss-20b Locally, Privately, and at No Cost.

You can now build and run powerful Artificial Intelligence (AI) applications right on your personal computer. Microsoft has provided a clear guide showing how developers can use OpenAI’s gpt-oss-20b model locally with the AI Toolkit in Visual Studio (VS) Code. This approach allows you to work with advanced AI without needing to connect to the cloud, offering more privacy and control.

The gpt-oss-20b model is a game-changer because it has strong reasoning skills but can run on regular consumer hardware. This is perfect for projects that need to work offline or on devices at the edge of a network, like smart cameras or sensors.

Understanding the gpt-oss-20b Model

OpenAI recently released gpt-oss-20b and the larger gpt-oss-120b as open-weight models that anyone can download and use. Both are built with a “mixture-of-experts” design, which helps them run efficiently. Here’s what makes the smaller gpt-oss-20b so useful:

  • Low Memory Needs: It requires only 16GB of GPU memory, which is common in many gaming or development computers.
  • Large Context Window: It can remember and process up to 128,000 tokens of information at once, allowing for more complex conversations and tasks.
  • Free to Use: The model is released under an Apache 2.0 license, meaning you can use, change, and build upon it for free, even for commercial products.
  • Powerful Capabilities: Despite its smaller size, it performs very well on tasks that require reasoning, problem-solving, and using tools.

Your Toolkit for Local AI Development

The AI Toolkit for Visual Studio Code is a free extension that brings all the necessary tools for AI development into one place. It helps you manage the entire process, from downloading and testing models to building them into your applications. With this toolkit, you can deploy, test, and use the gpt-oss-20b model without relying on external cloud APIs, which can be costly and complicated.

How to Get Started with Local Deployment

Setting up the gpt-oss-20b model on your machine is a straightforward process with the AI Toolkit.

System Requirements

Before you start, make sure your computer meets these requirements:

  • GPU: 16GB or more VRAM
  • Software: Visual Studio Code with AI Toolkit extension
  • Operating system: Windows, macOS, or Linux

Method 1: Direct Deployment

  1. Install Visual Studio Code if you don’t have it already
  2. Add the AI Toolkit extension from the marketplace
  3. Open the Command Palette with Ctrl+Shift+P and open the AI Toolkit’s Model Catalog
  4. Find gpt-oss-20b in the catalog and click “Add Model”
  5. Wait for the download to finish; this typically takes 15-30 minutes depending on your connection
  6. Check deployment status in the AI Toolkit’s model management interface

Method 2: Using Ollama

You can also use Ollama for more flexibility:

  1. Install Ollama on your computer
  2. Run the command: ollama run gpt-oss
  3. Add to AI Toolkit through the Resources section

This method gives you API access and works with different development frameworks.

Added Flexibility with Ollama

For developers who prefer working with the GGUF model format, the AI Toolkit also supports Ollama. This lets you run gpt-oss-20b through Ollama’s local server while still managing it within the toolkit: install Ollama, pull the gpt-oss model, and then add it to your resources in the AI Toolkit.
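
Because Ollama exposes an OpenAI-compatible API on localhost, you can also call the local model from ordinary application code. The sketch below is a minimal example under a few assumptions: Ollama is running on its default port (11434), the model has been pulled as shown above, and the openai Python package is installed. The gpt-oss:20b tag is an assumption, so substitute whatever name ollama list reports.

```python
# Minimal sketch: calling gpt-oss-20b through Ollama's OpenAI-compatible endpoint.
# Assumes Ollama is running locally on its default port and the model has been
# pulled with `ollama run gpt-oss`; adjust the model tag to match `ollama list`.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's local OpenAI-compatible API
    api_key="ollama",                      # placeholder; Ollama ignores the key
)

response = client.chat.completions.create(
    model="gpt-oss:20b",  # assumed tag; use whatever name Ollama shows
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
)

print(response.choices[0].message.content)
```

The same pattern works with any framework or SDK that can point at an OpenAI-style endpoint, which is what makes this method so flexible.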

Testing and Building with Your New AI

Once the model is deployed, the AI Toolkit provides features to help you test its capabilities and build applications.

The Playground

The toolkit includes a “Playground” where you can test prompts and compare models. In the Playground you can:

  • Compare different models side-by-side
  • Test programming tasks like creating HTML5 games
  • Evaluate gpt-oss-20b against other local models like Qwen3-Coder

For example, you can test the prompt “Creating an HTML5 Tetris application” to see how well the model generates working code.
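
If you want to reproduce a Playground-style comparison outside the editor, a small script can send the same prompt to two local models and print both answers. This is a rough sketch under the same Ollama assumptions as above; both model tags are assumptions and should match whatever ollama list reports.

```python
# Rough sketch: send the same coding prompt to two local models via Ollama's
# OpenAI-compatible API and print both answers for a manual side-by-side check.
# Model tags are assumptions; substitute the names shown by `ollama list`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

PROMPT = "Create an HTML5 Tetris application in a single index.html file."

for model in ("gpt-oss:20b", "qwen3-coder"):  # assumed tags
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(f"\n===== {model} =====\n")
    print(reply.choices[0].message.content)
```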

Agent Builder

For more advanced projects, the AI Toolkit’s Agent Builder is a visual tool for creating AI agents powered by gpt-oss-20b. These agents can combine your local model with other services to perform complex tasks. The Agent Builder helps you:

  • Build agent applications quickly
  • Combine multiple services using the Model Context Protocol (MCP)
  • Create prototypes for business applications

This feature makes it easy to experiment with AI agents without complex coding.
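
To give an agent something concrete to call, you can expose your own tool over MCP and connect it as a tool source. Below is a minimal sketch assuming the official MCP Python SDK (pip install mcp); the server name and the check_stock tool are hypothetical placeholders, not anything shipped with the AI Toolkit.

```python
# inventory_server.py - minimal sketch of an MCP tool server (hypothetical example).
# Assumes the official MCP Python SDK is installed: pip install mcp
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory-demo")  # hypothetical server name

@mcp.tool()
def check_stock(sku: str) -> str:
    """Return a canned stock level for a product SKU (stand-in for a real lookup)."""
    return f"SKU {sku}: 42 units in stock"

if __name__ == "__main__":
    mcp.run()  # serves over stdio, the transport MCP clients commonly attach to
```

An agent built around gpt-oss-20b could then call check_stock whenever a prompt needs inventory data, keeping both the model and the tool entirely on your machine.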

Performance and Optimization

The gpt-oss-20b model performs well on consumer hardware:

  • Laptops with 16GB RAM: 15-25 tokens per second
  • Apple Silicon Macs: 20-30 tokens per second with Metal optimization
  • High-end smartphones: 8-12 tokens per second with quantization

For better throughput, you can use techniques such as memory-mapped weight loading and torch.compile() optimization, as sketched below.
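
As a rough illustration of those techniques, the sketch below loads the open weights with Hugging Face transformers (which streams memory-mapped safetensors checkpoints when low_cpu_mem_usage is set) and wraps the model in torch.compile(). It assumes a GPU with roughly 16GB of VRAM and recent torch/transformers releases that support the gpt-oss checkpoints; it illustrates the optimization ideas rather than the AI Toolkit’s own pipeline.

```python
# Rough sketch: loading gpt-oss-20b with Hugging Face transformers and applying
# torch.compile(). Assumes a GPU with ~16GB VRAM and recent torch/transformers;
# shown for illustration, not as the AI Toolkit's internal deployment path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "openai/gpt-oss-20b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",      # keep the checkpoint's native precision
    device_map="auto",       # place weights on the available GPU(s)
    low_cpu_mem_usage=True,  # stream memory-mapped weights instead of full CPU copies
)

model = torch.compile(model)  # JIT-compile the forward pass for faster decoding

inputs = tokenizer("Explain memory mapping in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```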

Real-World Applications

Developers are using gpt-oss-20b for various applications:

  • Edge computing solutions that work without internet
  • Privacy-focused AI assistants
  • Local development environments for testing
  • Offline AI applications for mobile devices

Security and Safety

OpenAI has evaluated gpt-oss-20b across multiple safety domains, including biological, chemical, and cybersecurity risks. The model includes safety measures designed to prevent misuse while maintaining high performance.

Future Updates

Microsoft plans to add CPU-only deployment in future releases. Currently, GPU acceleration is required for optimal performance.

By bringing powerful, open-weight models like gpt-oss-20b into a simple development environment, Microsoft is making it easier for anyone to experiment with AI. This local-first approach gives developers more freedom and control, helping them innovate faster without the frustrating costs and limitations of cloud-based services.