Chat Settings

The Chat settings panel controls whether the Ask feature is available to users, which AI model it uses, and gives you visibility into usage and cost.

Where Settings Live

In the app, go to Admin → Chat. This panel is visible only to app administrators.

Enabling and Disabling Ask

A toggle at the top of the panel turns the Ask feature on or off for all users across all projects.

Enabled (default): the Ask icon appears in every project's sidebar. Users can open threads and ask questions.
Disabled: the Ask icon is hidden. Existing threads and messages are retained in the database but are not accessible to users. Re-enabling restores access to prior threads without data loss.

Model Selection

Choose which Claude model runs the Ask feature. Three options are available:

Claude Sonnet 4.6 (default): the best starting point for most deployments. Handles multi-contract reasoning well at reasonable cost.
Claude Opus 4.7: higher capability, particularly for complex queries that span many contracts or require nuanced reasoning. Higher cost per query.
Claude Haiku 4.5: fastest and cheapest. Well-suited when users are asking simple, factual questions and cost is the priority.

The model applies to all projects. There is no per-project model override.

All three models are accessed through AWS Bedrock. Costs are tracked in the usage panel below.

Usage Chart

The usage chart shows daily message volume and estimated cost over the past 30 days. Use this to understand whether usage is growing and which days drive the most traffic.

Recent Queries Table

Below the chart, a table shows the last 50 assistant turns with the following columns:

Column	Description
User	The user who sent the query
Project	The project the thread belongs to
Model	The model that generated the response
Tokens (in / out)	Input and output token counts
Cost	Estimated cost for that turn
Tool calls	Number of contract-text lookups the model made to answer the query

Message content is not shown. The table is intentionally a billing and metering view. The actual question the user asked, and Ask's response, are omitted from the data returned by the admin query. App admins can see usage patterns and costs without reading user conversations.

What Happens When You Disable Ask

The Ask icon disappears from all project sidebars immediately.
Users who are mid-conversation cannot submit new messages.
Threads and message history are retained in the database. No data is deleted.
Re-enabling the toggle restores the Ask icon and full access to prior threads.

Preamble & Policy

These settings shape the system prompt sent to the model before each chat turn. Changes take effect on the next message. Settings are read once per request, so a settings change committed while a turn is in-flight does not retroactively apply to that turn.

Tone

Professional, Conversational, or Plain language. Maps to a one-line tone directive in the prompt. Default: Professional.

Citation strictness

Hard (recommended): every factual sentence must end with a citation. If the model cannot cite a claim, it must say so rather than guess.
Soft: citation markers requested but not required.

Refusal strictness

Strict: refuse if no exact match exists.
Helpful (recommended): explore adjacent matrix terms when the exact concept is missing.
Partial-friendly: answer the cited part, flag what's missing.

Refusal phrase

The canned line shown when the model refuses. Plain text, ≤ 200 characters.

Tool-call aggressiveness

Liberal (recommended): call tools whenever the matrix snippet is truncated, exact wording matters, or absence needs confirming.
Conservative: answer from the matrix when possible.

Behavior toggles

Quote-then-answer (default on): model internally quotes relevant matrix rows before composing the answer.
Conflict handling (default on): explicit rules for temporal, factual, and within-document conflicts across contracts.
Calibrated uncertainty (default on): model distinguishes what documents say from what it infers, with hedging words on inferences.
Partial-answer mode (default on): answer the cited part, then state what's missing, instead of refusing outright.
Inject current date (default on): Current date: YYYY-MM-DD is prepended so the model can compute renewal windows, expiries, and "what's coming up" answers.
Inject project context (default on): project name and template name are added so the model knows what portfolio it's reasoning about.

Custom addendum

Free text (≤ 2000 characters) appended after the locked sections. Use this for one-off experiments without a code deploy.

Preview generated preamble

Paste a project UUID and click Show. The exact system prompt that would be sent for that project, given the current settings, is returned. Use this to eyeball toggle combinations before saving, especially helpful when chaining a few changes together.

Generation & Retrieval

These settings shape what the model sees in its context window and how it responds.

Temperature

0.00 to 1.00. Default: 0.00.

0 (deterministic, recommended): the model picks the highest-probability next token at each step. Best for factual contract Q&A.
Higher values: more variation in phrasing. Useful for exploratory or creative use cases; rarely the right answer for contract analysis.

Max output tokens

512 to 8192. Default: 2048.

The maximum length of a single model response. The model stops generating when it hits this cap, and the response is suffixed with _(Answer truncated, hit max_tokens cap.)_ if it would have continued. Raise this only if you're seeing truncated answers on legitimate questions; higher caps add cost per turn.

Matrix token cap

10000 to 180000. Default: 100000.

The largest extracted-matrix payload sent in the prompt. The matrix is the structured table of every (contract × template field) row in the project. If a project's matrix exceeds this cap, the system trims low-confidence rows first, then drops source_text snippets.

Lower: faster responses, lower cost per turn, but the model may lose information on large projects.
Higher: better recall on large portfolios, more cost per turn.

WARNING

Lowering this below ~80K historically caused citation hallucination (98% → 90% term-id match rate at 60K) because the model remembered terms it could no longer see and invented UUIDs for them. Reduce with care.

History depth

0 to 100. Default: 20.

Number of prior conversation turns retained in the prompt. Lower values make each turn more independent; higher values let the model reference earlier exchanges.

0: each turn is stateless. Useful for keeping costs predictable.
20: typical multi-turn investigation.
100: long-form deep-dives.

Tools

When the matrix snippet alone isn't enough to answer (e.g., a quoted term is truncated, or you need to know whether a phrase appears at all), the model can call one of two tools to look deeper.

read_contract_full_text

Reads the full extracted text of a single contract from S3. The text is wrapped in <untrusted_document> tags before going to the model so any "ignore previous instructions" content from the contract is treated as data, not instructions. Capped at 100,000 characters per call.

Setting: tools.read_contract_full_text_enabled (default: enabled)

grep_project

Case-insensitive literal-substring search across every contract in the project. Useful for "does anything mention …" questions. Bounded to scan at most 50 contracts and 5 MB total per call; phrases under 2 characters are rejected.

Setting: tools.grep_project_enabled (default: enabled)

Max calls per turn

0 to 10. Default: 5.

Hard ceiling on how many tool calls the model can make in a single user turn. Setting this to 0 disables tool use entirely (the model sees no tools at all), even if the individual enable flags are on. If the model wants more calls than the cap allows, the response is suffixed with a note that the tool budget was exhausted.

Setting: tools.max_calls_per_turn (default: 5)

Limits & Governance

These settings control how chat usage is capped per user and across the org, and how the system reacts when limits are exceeded.

Settings

Setting	Default	Range	What it does
Per-user daily messages	100	0–100000	Hard cap on user-sent messages per 24h per user. 0 = unlimited.
Per-user monthly cost cap	$20 (2000 cents)	0–$10000	Trailing-30d Bedrock spend per user. 0 = unlimited.
Org daily kill switch	$50 (5000 cents)	0–$100000	Trailing-24h org-wide spend ceiling. Crossing it auto-disables chat and emails admin. 0 = disabled.
Enforcement mode	hard_refuse	hard_refuse / soft_warn	How per-user-limit crossings are handled.

Enforcement modes

hard_refuse -- Crossing a per-user cap returns 429 to the user. They must wait until their window rolls forward.
soft_warn -- Crossing a per-user cap allows the request to proceed but attaches a warning to the streaming done event (shown as an inline callout below the assistant message) and emails the admin. One email per 23h regardless of how many users cross.

Anti-flood windows (per-minute, per-hour, per-project-day) are always hard. The enforcement mode only applies to the two per-user windows above.

The kill switch

If trailing-24h org-wide chat cost exceeds the kill-switch cap, the system automatically sets chat.enabled=false and emails the admin. Users get a 403 on their next chat request.

Recovery. Re-enable chat from the admin panel. The system writes a baseline timestamp; the next 24-hour cost-sum window starts fresh from that timestamp, so chat will not immediately re-trip on the prior 24 hours of usage. The baseline expires naturally once 24h have elapsed.

Do not hand-edit kill-switch sentinel rows (chat.limits.last_kill_switch_at, chat.limits.kill_switch_baseline_at, chat.limits.last_soft_warn_at) in app_settings. The rate limiter expects them to contain valid timestamps; a malformed value causes the next kill-switch evaluation to fail silently and leaves the switch broken until the row is fixed.

The admin UI shows a confirmation modal when lowering the kill-switch cap below 50% of its prior value OR below 80% of current trailing-24h spend. The modal prevents accidentally nuking chat by mis-typing a number.

→ For the user-facing feature description, see Ask. → For more detail, see Roles & Capabilities.

Chat Settings ​

Where Settings Live ​

Enabling and Disabling Ask ​

Model Selection ​

Usage Chart ​

Recent Queries Table ​

What Happens When You Disable Ask ​

Preamble & Policy ​

Tone ​

Citation strictness ​

Refusal strictness ​

Refusal phrase ​

Tool-call aggressiveness ​

Behavior toggles ​

Custom addendum ​

Preview generated preamble ​

Generation & Retrieval ​

Temperature ​

Max output tokens ​

Matrix token cap ​

History depth ​

Tools ​

read_contract_full_text ​

grep_project ​

Max calls per turn ​

Limits & Governance ​

Settings ​

Enforcement modes ​

The kill switch ​

Confirmation modal ​