A system handling 200,000 requests per second (RPS) can truly be considered high-load. Achieving such performance often requires hundreds or even thousands of service instances and database shards. In this article, however, I will focus on how a single instance of an application written in Golang can reach this level of performance without complicated sharding or replication. Golang is an excellent choice for high-performance applications: it provides powerful concurrency tools, simplicity, reliability, and speed. The concepts discussed here apply broadly to high-performance systems, not just to Go.

# Example Application

Let's consider a simple recommendation feed service. To keep things simple, we assume the feed is updated offline: it is generated once at application startup and never changes during runtime. This pattern is common in recommendation systems and helps handle high concurrency effectively. This HTTP-based application has one straightforward goal: return a pre-generated feed for a given user and save the user's progress through it.

TL;DR benchmark results (MacBook Pro 14, M3, 16GB RAM):

```
Thread Stats   Avg      Stdev     Max   +/- Stdev
  Latency     1.07ms    3.19ms 106.26ms   96.11%
Requests/sec: 211042.29
```

# Business Logic

Let's first define the service's business logic in code, starting with the `Service` struct that contains it:

```
type Service struct {
	feedStorage       feedStorage
	randomFeedStorage randomFeedStorage
	errRecorder       errRecorder
	logger            *slog.Logger
}

func NewService(
	feedStorage feedStorage,
	randomFeedStorage randomFeedStorage,
	errRecorder errRecorder,
	logger *slog.Logger,
) *Service {
	return &Service{
		feedStorage:       feedStorage,
		randomFeedStorage: randomFeedStorage,
		errRecorder:       errRecorder,
		logger:            logger,
	}
}

type feedStorage interface {
	NextFeed(ctx context.Context, userId uint32, size uint8) ([]uint32, error)
}

type randomFeedStorage interface {
	RandomFeed(ctx context.Context, size uint8, excludeItems []uint32) []uint32
}

type errRecorder interface {
	RecordFeedError(ctx context.Context, userId uint32, err error)
}
```

Now, let's implement the method that retrieves the feed:

```
func (f *Service) RetrieveFeed(ctx context.Context, r FeedRequest) ([]uint32, error) {
	// Set default size if not specified
	if r.Size == 0 {
		r.Size = defaultNextFeedSize
	}

	// Get personalized feed for user
	persFeed, err := f.feedStorage.NextFeed(ctx, r.UserId, r.Size)
	if err != nil {
		f.errRecorder.RecordFeedError(ctx, r.UserId, err)
	}

	// Fill remaining items with random feed
	randomFeedSize := r.Size - uint8(len(persFeed))
	if randomFeedSize > 0 {
		randomFeed := f.randomFeedStorage.RandomFeed(ctx, randomFeedSize, persFeed)
		persFeed = append(persFeed, randomFeed...)
	}

	// Validate final feed size
	if len(persFeed) != int(r.Size) {
		f.errRecorder.RecordFeedError(ctx, r.UserId,
			fmt.Errorf("feed size is not equal to requested size"))
		f.logger.ErrorContext(ctx, "critical error: feed size is not equal to requested size",
			"userId", r.UserId,
			"randomFeedSize", randomFeedSize,
			"persFeedSize", len(persFeed),
			"requestedSize", r.Size)
		if len(persFeed) == 0 {
			return nil, errors.New("no feed items")
		}
	}

	return persFeed, nil
}
```

To ensure stability, if there aren't enough personalized items (for example, the user has already viewed them all, or an error occurred), we supplement the feed with random items. Go's explicit error handling significantly contributes to the reliability of this approach. Finally, if it is still impossible to build a feed, we record the error and return it to the caller.
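For reference, here is a minimal sketch of the `FeedRequest` type and the default-size constant used above. Their exact definitions aren't shown in this article, so treat this as an assumption inferred from how `RetrieveFeed` and the HTTP handler use them:

```
// Assumed definitions, inferred from usage; the real values may differ.
const defaultNextFeedSize uint8 = 10 // hypothetical default page size

type FeedRequest struct {
	UserId uint32 // user whose feed is requested
	Size   uint8  // number of items to return; 0 means "use the default"
}
```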
# API

A good choice of web server for the HTTP API is fasthttp, a high-performance server library known for its zero-allocation design. Zero allocation means minimal memory allocation and no unnecessary goroutine spawning per request; instead, fasthttp relies on preallocation, buffering, and worker pools to optimize performance. One caveat can be confusing: request data is only valid within the request's lifecycle. If data needs to persist beyond it, you must explicitly copy it, which can affect performance.

Using `fiber`, built on top of fasthttp, simplifies HTTP request handling:

```
type App struct {
	feedService *feed.Service
	fiberApp    *fiber.App
}

func NewApp() *App {
	feedService := feed.NewService(...)
	app := &App{
		feedService: feedService,
		fiberApp:    fiber.New(),
	}
	app.fiberApp.Get("/feed/:userId", app.feedHandler)
	return app
}

func (a *App) feedHandler(ctx *fiber.Ctx) error {
	// Get userId from path params
	userId, err := ctx.ParamsInt("userId")
	if err != nil {
		return ctx.Status(fiber.StatusUnprocessableEntity).SendString("userId is required")
	}

	// Get optional size from query params
	size := ctx.QueryInt("size", 0)

	// Call feed service to get items
	items, err := a.feedService.RetrieveFeed(ctx.Context(), feed.FeedRequest{
		UserId: uint32(userId),
		Size:   uint8(size),
	})
	if err != nil {
		return ctx.Status(fiber.StatusInternalServerError).SendString(err.Error())
	}

	// Format feed items as a JSON-style array without a serialization library
	var sb strings.Builder
	sb.WriteString("[")
	for i, id := range items {
		if i > 0 {
			sb.WriteString(",")
		}
		sb.WriteString(strconv.FormatUint(uint64(id), 10))
	}
	sb.WriteString("]")

	// Return response
	return ctx.Status(fiber.StatusOK).SendString(sb.String())
}
```

This example uses custom serialization, which is highly efficient. In many cases, serialization and deserialization consume significant resources and can slow down the application, especially when working with JSON that contains many fields. To improve performance, you can optimize JSON handling with high-performance libraries or code-generation tools that create parsers for specific data structures. You can even switch to a different message format altogether, for example binary formats such as Protobuf over gRPC, or other custom serializers.
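To tie the API together, here is a minimal sketch of how the app might be wired and started. The `main` function and the listen address are my assumptions, not code from the article; it assumes `App` lives in package `main`:

```
package main

import "log"

func main() {
	app := NewApp() // wires the feed service and routes, as shown above
	// Listen blocks, serving requests until the server stops or fails.
	log.Fatal(app.fiberApp.Listen(":8080"))
}
```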
# Storage

The fastest way to handle a high volume of requests is to keep the data in memory. This avoids the overhead of querying external caches or databases: everything is already available in RAM, and we just need to access it directly.

Below is the implementation of the feedStorage used by the Service:

```
type Storage struct {
	feeds     map[uint32][feed.TotalFeedSize]uint32
	offsets   sync.Map
	numExceed atomic.Uint64
}

func NewStorage() *Storage {
	return &Storage{
		feeds: make(map[uint32][feed.TotalFeedSize]uint32),
	}
}

func (s *Storage) NextFeed(ctx context.Context, userId uint32, size uint8) ([]uint32, error) {
	// Get current offset for user
	offsetVal, _ := s.offsets.Load(userId)
	var offset uint16
	if offsetVal != nil {
		offset = offsetVal.(uint16)
	}

	// Return empty if user has seen all items
	if int(offset) >= feed.TotalFeedSize {
		return nil, nil
	}

	// Calculate how many items to return, bounded by total feed size
	lastItem := min(int(offset)+int(size), feed.TotalFeedSize)
	if lastItem >= feed.TotalFeedSize {
		s.numExceed.Add(1)
	}

	// Get the user's feed array and slice the requested portion
	userFeed, ok := s.feeds[userId]
	if !ok {
		return nil, fmt.Errorf("no feed found for user %d", userId)
	}
	items := userFeed[offset:lastItem]

	// Update user's offset
	s.offsets.Store(userId, uint16(lastItem))

	return items, nil
}
```

For in-memory storage, we use two kinds of hash maps: Go's built-in map and sync.Map. The main performance benefit of a hash map is constant-time (O(1)) access to random elements, which makes it ideal for retrieving data by key. However, regular maps are not safe for concurrent read-write access and can lead to race conditions and panics. There are two main strategies for safe concurrent access (a sketch of the first follows the lists below):

- Protect the map with a sync.RWMutex, locking it during writes.
- Use sync.Map, which avoids locking on most read paths.

The best choice depends on the type and frequency of access. In highly concurrent environments, sync.Map often performs better because most of its reads bypass locking. In our case:

- For the static arrays of video IDs, we use a regular map, since it is read-only at runtime.
- For tracking user offsets, we use sync.Map, because they need frequent concurrent updates.
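For comparison, here is a minimal sketch of the mutex-based strategy from the list above. The `offsetStore` type is hypothetical and not part of the original code:

```
// offsetStore is a hypothetical mutex-protected alternative to sync.Map
// for tracking per-user feed offsets.
type offsetStore struct {
	mu      sync.RWMutex
	offsets map[uint32]uint16
}

func newOffsetStore() *offsetStore {
	return &offsetStore{offsets: make(map[uint32]uint16)}
}

// Get takes a read lock, so concurrent readers don't block each other.
func (o *offsetStore) Get(userId uint32) uint16 {
	o.mu.RLock()
	defer o.mu.RUnlock()
	return o.offsets[userId]
}

// Set takes the write lock, blocking all readers and other writers.
func (o *offsetStore) Set(userId uint32, offset uint16) {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.offsets[userId] = offset
}
```

Under heavy write contention, every Set serializes on a single lock, which is one reason sync.Map (or a sharded map) tends to scale better for this access pattern.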
# Benchmark Results

Now it's time to generate data for a benchmark. For this test I generated data for 5 million users, each with 200 feed items (around 7GB of RAM); I couldn't fit more on my laptop. For benchmarking I use wrk, a simple tool that will send requests for random users, like `http://localhost:8080/feed/{random 0…5000000}`. In my testing scenario, sending requests for randomly selected users worked well: it simulates a typical case where users behave similarly and access their feeds at a fairly even rate.

```
wrk -t5 -c200 -d30s --latency -s benchmark/random_ids.lua http://localhost:8080
Running 30s test @ http://localhost:8080
  5 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.07ms    3.19ms 106.26ms   96.11%
    Req/Sec    42.49k     9.63k   76.44k    70.86%
  Latency Distribution
     50%  393.00us
     75%  728.00us
     90%    1.91ms
     99%   12.85ms
  6352517 requests in 30.10s, 1.06GB read
  Socket errors: connect 0, read 61, write 0, timeout 0
Requests/sec: 211042.29
Transfer/sec:     36.14MB
```

## Interpreting the Benchmark

The system was loaded with 5 million users and successfully handled around 200,000 requests per second (RPS). That means the service can support all 5 million users if, on average, each user requests their feed once every 25 seconds (5,000,000 / 200,000 ≈ 25s), which is a realistic usage pattern for many real-world applications. Latency is also low: the average request takes just over 1 millisecond, and 99% of requests complete in under 13 milliseconds, which feels instant to the user.

By the end of the benchmark, fewer than 1% of users had exhausted their personalized feed and started receiving random content. This confirms that the system's high performance isn't just due to falling back on random data: it is capable of delivering personalized results at scale.

# Conclusion

You can find the full example here: https://github.com/Gamazic/fast-rec-feed

My goal with this article was to show how you can build a high-performance system in Go, based on a near real-world use case. Similar architectures are used in many high-load systems: advertising platforms, trading engines, recommendation services, and search engines. This example demonstrates several key patterns that enable such performance. While I didn't cover every aspect of high-load system design, this implementation follows some core principles and can serve as a practical starting point for learning how to build scalable, fast applications.