AI coding agents are getting powerful enough that the hard part is no longer just “can it write code?”
The harder product question is this:
Can you keep track of the work, review what changed, respond at the right moment, and trust the interface enough to use it every day?
That is the reason I built a Flutter client for Jules.
Jules already has the interesting primitive: sessions that can plan, ask for approval, work against a repository, report progress, accept follow-up messages, and produce code changes. But an agent workflow quickly becomes messy if the client only feels like a thin chat box.
You need a workbench.
This release is that workbench: a polished Flutter app for reviewing Jules sessions, chatting with the agent, switching between accounts, searching historical work, and inspecting generated diffs in a native desktop-style experience.
What Shipped
The first useful release focuses on the work loop that matters most:
- Add one or more Jules API accounts.
- Switch accounts without mixing cached sessions.
- Browse sessions grouped around their repositories.
- Search session history by title, repository, prompt, and state.
- Start a new Jules session against a source returned by the API.
- Read the agent timeline across prompts, plans, progress updates, and responses.
- Send follow-up messages into an existing session.
- Approve generated plans when Jules waits for confirmation.
- Inspect generated code changes in a resizable diff panel.
- Keep recently loaded sessions available locally through Hive.
- Recover gracefully when the network drops and sync again when it returns.
- Use the same Flutter codebase across desktop, mobile, and web targets.
That sounds like a normal feature list, but the engineering behind it is where the product starts to feel good.
The app does not treat the API as a random set of buttons. It wraps the API in a repository, scopes cached data by account, loads local state before remote state, and keeps the UI responsive while background indexing fills in the rest.
That is the difference between a demo and a tool.
The Architecture in One Pass
The app is intentionally boring in the best way:
Flutter UI
-> Provider state
-> JulesRepository
-> ApiClient for remote calls
-> LocalStorageService for Hive cache
The UI knows about sessions, activities, loading states, selected panes, and settings. It does not know how to build API URLs or how Hive keys are scoped.
The repository is the boundary:
class JulesRepository {
  final ApiClient _apiClient;
  final LocalStorageService _local;
  final String accountId;

  JulesRepository(this._apiClient, this._local, {required this.accountId});
}
That accountId looks small, but it is one of the most important product decisions in the app.
If a developer connects a work account and a personal account, the client must never blend those histories together. A cached session from one account should not appear while another account is active. API keys should feel like separate workspaces, not just separate strings in a settings screen.
So the local storage layer scopes keys:
String _scopedKey(String key, String? accountId) {
  if (accountId == null || accountId.isEmpty) return key;
  return '$accountId::$key';
}
That tiny convention turns Hive from a global bucket into an account-aware cache.
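As a usage sketch of that convention (the `ScopedCache` class below is an illustrative stand-in with a plain `Map`, not the app's actual Hive-backed service), scoping turns one physical store into per-account namespaces:

```dart
// Minimal sketch of an account-scoped cache. The Map stands in for a Hive
// box; the scoping rule is the same: prefix every key with the account id.
class ScopedCache {
  final Map<String, Object?> _box = {};

  String _scopedKey(String key, String? accountId) {
    if (accountId == null || accountId.isEmpty) return key;
    return '$accountId::$key';
  }

  void put(String key, Object? value, {String? accountId}) =>
      _box[_scopedKey(key, accountId)] = value;

  Object? get(String key, {String? accountId}) =>
      _box[_scopedKey(key, accountId)];
}

void main() {
  final cache = ScopedCache();
  cache.put('sessions', ['work-1'], accountId: 'work');
  cache.put('sessions', ['personal-1'], accountId: 'personal');
  // Each account only ever sees its own cached sessions.
  print(cache.get('sessions', accountId: 'work')); // [work-1]
}
```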
Pattern 1: Load Local First, Then Sync
Fast apps cheat, but in a responsible way.
They show what they already know before they ask the network for what changed.
In this client, session loading starts with local data:
_sessions = await repo.getSessions();
_restoreCurrentSessionFromList(
  allowDefaultSelection: !_hasUserSelectedView,
);
notifyListeners();
Then the remote sync runs:
_isSyncing = true;
notifyListeners();
final response = await repo.syncSessions(pageSize: 100);
_sessions = _mergeSessions(_sessions, response.sessions);
That sequence matters.
If the user opens the app on a slow connection, they still get a useful screen quickly. If the API returns newer sessions, the provider merges them into the list. If the device is offline, the cached history still explains what happened last time.
This is the local-first rule I keep coming back to:
The cache should make the app useful. The network should make it current.
Do not reverse those roles.
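The `_mergeSessions` helper referenced above isn't shown in the post. A minimal sketch of the idea, assuming each session carries an `id` and that remote data wins on conflict while cached-only sessions survive for offline history (the `Session` shape here is simplified, not the app's real model):

```dart
// Hypothetical merge: remote sessions overwrite cached ones with the same
// id; cached sessions the server didn't return are kept.
class Session {
  final String id;
  final String title;
  Session(this.id, this.title);
}

List<Session> mergeSessions(List<Session> local, List<Session> remote) {
  final byId = {for (final s in local) s.id: s};
  for (final s in remote) {
    byId[s.id] = s; // the network makes the cache current
  }
  return byId.values.toList();
}

void main() {
  final merged = mergeSessions(
    [Session('a', 'cached'), Session('b', 'offline-only')],
    [Session('a', 'fresh')],
  );
  print(merged.map((s) => '${s.id}:${s.title}').join(', '));
}
```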
Pattern 2: Keep API Calls Behind a Small Contract
The Jules API surface the app needs is compact:
Future<SessionsResponse> syncSessions({int pageSize = 100});
Future<List<Activity>> syncSessionActivities(String sessionId);
Future<SourcesResponse> syncSources({int pageSize = 100});
Future<Session> createSession(String prompt, String title, String repo);
Future<void> sendMessage(String sessionId, String prompt);
Future<void> approvePlan(String sessionId);
That is enough to build the entire workflow:
- Load sources so the user can choose a repository.
- Create a session with a prompt and source context.
- Poll the selected session for state and activity updates.
- Render the timeline as activities arrive.
- Send follow-up messages when the user needs to steer the work.
- Approve the plan when Jules waits for the human.
- Show change sets when artifacts include patches.
The app does not sprinkle raw endpoint strings across widgets. The UI asks the provider to perform product actions. The provider asks the repository to perform Jules actions. The repository asks the API client to perform HTTP actions.
That layering is not academic cleanliness. It lets the product change without turning every button into a network integration.
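The layering can be sketched in a few lines. Everything below is illustrative (the names and the endpoint path are assumptions, not the app's actual code); the point is that each layer only speaks the vocabulary of the layer beneath it:

```dart
// Stand-in for the real HTTP client.
class ApiClient {
  Future<String> post(String path, Map<String, Object?> body) async => 'ok';
}

class JulesRepository {
  final ApiClient _api;
  JulesRepository(this._api);

  // Jules action: the repository knows the endpoint, the UI never does.
  Future<void> approvePlan(String sessionId) =>
      _api.post('/sessions/$sessionId:approvePlan', {});
}

class SessionProvider {
  final JulesRepository _repo;
  SessionProvider(this._repo);

  // Product action: what a button in the UI actually calls.
  Future<void> onApprovePressed(String sessionId) =>
      _repo.approvePlan(sessionId);
}
```

A widget only ever sees `onApprovePressed`; swapping transports or endpoints never touches the UI.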
Pattern 3: Model Agent Work as a Timeline, Not a Chat Log
Most agent UIs start as chat UIs. That is fine for a prototype, but code work has more structure than conversation.
A Jules session can contain:
- User messages.
- Agent messages.
- Generated plans.
- Progress updates.
- Artifacts.
- Change sets.
- State transitions such as planning, awaiting approval, in progress, completed, or failed.
The data model reflects that:
class Activity {
  final ActivityOriginator originator;
  final String? description;
  final Plan? planGenerated;
  final String? userMessage;
  final String? agentMessage;
  final ProgressUpdate? progressUpdated;
  final List<Artifact> artifacts;
}
That makes the UI more honest.
A plan is not just a message with fancy formatting. A progress update is not the same thing as a user prompt. A patch is not just text to scroll past. These are different objects in the work loop, so the app can render them differently and let the user respond to them differently.
This is one of the easiest ways to improve an AI product: stop flattening everything into a transcript.
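One way to see the payoff: the renderer can dispatch on which payload field is populated instead of pattern-matching transcript text. A sketch with a simplified `Activity` (the field names follow the model above; the returned labels are placeholders for real widgets):

```dart
// Simplified Activity: at most one payload field is non-null per entry.
class Activity {
  final String? userMessage;
  final String? agentMessage;
  final String? planGenerated;   // stand-in for a Plan object
  final String? progressUpdated; // stand-in for a ProgressUpdate object
  Activity({this.userMessage, this.agentMessage,
      this.planGenerated, this.progressUpdated});
}

// The UI picks a distinct surface per activity kind instead of
// flattening everything into chat bubbles.
String renderKind(Activity a) {
  if (a.planGenerated != null) return 'plan card with approve button';
  if (a.progressUpdated != null) return 'progress row';
  if (a.userMessage != null) return 'user bubble';
  if (a.agentMessage != null) return 'agent bubble';
  return 'unknown';
}
```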
Pattern 4: Put Code Review Beside the Conversation
The product release feature I care about most is the diff panel.
When Jules produces code changes, the app finds the latest activity with change sets and renders the patch beside the conversation:
final codeActivities = session?.activities
        .where((a) => a.artifacts.any((art) => art.changeSet != null))
        .toList() ??
    [];
final changeSets = codeActivities.isEmpty
    ? []
    : codeActivities.last.artifacts
        .where((art) => art.changeSet != null)
        .map((art) => art.changeSet!)
        .toList();
This is a small decision with a large workflow impact.
If the patch is hidden in a separate page, the user has to keep switching contexts:
- What did I ask for?
- What did the agent say?
- What file changed?
- Does the diff match the plan?
- Do I need to reply?
Putting the diff beside the timeline keeps the review loop intact. The user can read the agent’s reasoning, inspect the files, and send a correction without losing the thread.
The first renderer is deliberately practical. It parses unified diff lines, tracks old and new line numbers, and styles added and removed lines:
if (line.startsWith('@@')) {
  // firstMatch returns null on a malformed hunk header, so guard it.
  final match = RegExp(r'@@ -(\d+),?\d* \+(\d+),?\d* @@').firstMatch(line);
  if (match != null) {
    oldLine = int.parse(match.group(1)!);
    newLine = int.parse(match.group(2)!);
  }
} else if (line.startsWith('+')) {
  result.add(_ParsedLine(line.substring(1), _LineType.added, '+', null, newLine));
  newLine++;
} else if (line.startsWith('-')) {
  result.add(_ParsedLine(line.substring(1), _LineType.removed, '-', oldLine, null));
  oldLine++;
}
There is room to make the parser more complete over time. But the product lesson is already clear:
Agent output becomes more trustworthy when code review is part of the primary screen.
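One obvious gap in the parser above is context lines, which belong to both sides of the diff. A hedged sketch of that extension (the `_ParsedLine` and `_LineType` shapes below mirror what the snippet implies; they are reconstructions, not the app's actual definitions):

```dart
// Context lines appear in both the old and new file, so both counters
// advance. Added and removed lines only advance their own side.
enum _LineType { added, removed, context }

class _ParsedLine {
  final String text;
  final _LineType type;
  final String marker;
  final int? oldLine;
  final int? newLine;
  _ParsedLine(this.text, this.type, this.marker, this.oldLine, this.newLine);
}

List<_ParsedLine> parseHunkBody(List<String> lines, int oldStart, int newStart) {
  var oldLine = oldStart, newLine = newStart;
  final result = <_ParsedLine>[];
  for (final line in lines) {
    if (line.startsWith('+')) {
      result.add(_ParsedLine(line.substring(1), _LineType.added, '+', null, newLine++));
    } else if (line.startsWith('-')) {
      result.add(_ParsedLine(line.substring(1), _LineType.removed, '-', oldLine++, null));
    } else {
      // Context line: strip the leading space and count it on both sides.
      result.add(_ParsedLine(line.isEmpty ? '' : line.substring(1),
          _LineType.context, ' ', oldLine++, newLine++));
    }
  }
  return result;
}
```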
Pattern 5: Make Desktop Great, Then Collapse Responsively
The desktop layout uses three surfaces:
Sidebar | Chat timeline | Diff panel
That is the ideal shape for agent work. The session list stays visible. The conversation remains central. The generated code sits at the edge where a developer expects review material.
But the Flutter app also needs to behave on smaller screens. The home screen calculates breakpoints and adapts:
final width = MediaQuery.of(context).size.width;
final isMobile = width < AppConstants.mobileBreakpoint;
final isTablet = !isMobile && width < AppConstants.tabletBreakpoint;
On mobile, the session list moves into a drawer. On tablet-sized widths, the diff panel can collapse so the conversation stays readable. On desktop, the diff panel can be resized and toggled.
This is a better approach than designing a mobile app and stretching it across a desktop window.
Agent work is dense. It needs panes, persistent context, and review space. Flutter gives you enough layout control to make that density feel native instead of cramped.
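The breakpoint checks above can be collapsed into a single mapping from width to pane layout. A sketch (the enum names and breakpoint values here are illustrative, not the app's actual constants):

```dart
// Map a window width to one of the three pane arrangements the post
// describes: drawer on mobile, two panes on tablet, three on desktop.
enum PaneLayout { mobileDrawer, tabletTwoPane, desktopThreePane }

PaneLayout layoutForWidth(double width,
    {double mobileBreakpoint = 600, double tabletBreakpoint = 1100}) {
  if (width < mobileBreakpoint) return PaneLayout.mobileDrawer;
  if (width < tabletBreakpoint) return PaneLayout.tabletTwoPane;
  return PaneLayout.desktopThreePane;
}
```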
Pattern 6: Treat Connectivity as a Product State
Offline support is not just a cache. It is also feedback.
The app watches connectivity and triggers a sync when the device comes back online:
_connectivitySubscription =
    _connectivity?.onConnectivityChanged.listen((isOnline) {
  if (isOnline) {
    refreshSessions();
  }
});
When a socket error occurs, the UI shows an offline status instead of pretending nothing happened.
That matters because agent work is asynchronous. A session might still be running elsewhere. The user needs to know whether the app is current, stale, syncing, or disconnected.
Good developer tools do not hide operational state. They make it legible.
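One lightweight way to keep that state legible is to model it explicitly and render it verbatim. A sketch (the enum and labels are assumptions for illustration, not the app's actual code):

```dart
// Operational state the UI surfaces directly, instead of hiding it
// behind a spinner or pretending the data is always fresh.
enum SyncStatus { current, syncing, stale, offline }

String statusLabel(SyncStatus s) => switch (s) {
      SyncStatus.current => 'Up to date',
      SyncStatus.syncing => 'Syncing…',
      SyncStatus.stale => 'Showing cached data',
      SyncStatus.offline => 'Offline — will sync when reconnected',
    };
```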
Try This Pattern in Your Own Agent Client
If you are building a client for an AI coding workflow, you can copy the shape even if you are not using Flutter:
1. Put all remote calls behind a repository.
2. Scope local cache records by account or workspace.
3. Load cached state before remote state.
4. Model work as typed activities, not just chat messages.
5. Keep review artifacts beside the conversation.
6. Make sync, indexing, and offline states visible.
7. Design the desktop layout around panes, not pages.
The core idea is simple: an AI coding agent is not only a text generator. It is a coworker producing stateful work against real repositories.
Your UI should respect that.
What Comes Next
The current app is already useful for reviewing Jules work, but the next layer is about deeper review ergonomics:
- Better diff navigation for large change sets.
- Per-file review state.
- Richer artifact previews.
- More complete pagination controls for older sessions and sources.
- Safer handling for non-main starting branches.
- Better packaging around desktop releases.
Those are good problems to have because the basic shape is already right.
Jules can plan. Jules can write. Jules can report progress. This Flutter client gives that work a place to live.
And that is where agent engineering starts to feel less like a trick and more like a serious toolchain.
