
# Idle Timeout Feature

## Overview

The webterm now includes an **idle timeout mechanism** to prevent "grey goo" accumulation of orphaned terminal processes. This feature automatically cleans up inactive PTY sessions, preventing resource leaks when users refresh pages or abandon sessions.

## How It Works

### Architecture

The idle timeout is implemented in the `Terminal` actor (`src/lib.rs`):

1. **Activity Tracking**: Each `Terminal` maintains a `last_activity` timestamp that is updated whenever user interaction occurs
2. **Periodic Checking**: A background task runs every 30 seconds to check if the session has been idle
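3. **Automatic Cleanup**: If a session has been idle for at least the configured timeout when a check runs, the `Terminal` actor stops itself, cleaning up the PTY and child process. Because checks run every 30 seconds, the stop can lag the timeout by up to one check interval.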
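
The three steps above can be modeled without the actor framework. The following is a minimal std-only sketch, not the actual `Terminal` code: `IdleTracker`, `touch`, and `is_expired` are hypothetical names, and time is passed in explicitly so the logic can be exercised without waiting.

```rust
use std::time::{Duration, Instant};

/// Std-only model of the idle bookkeeping (hypothetical type, not the actor itself).
pub struct IdleTracker {
    last_activity: Instant,
    idle_timeout: Duration,
}

impl IdleTracker {
    /// Start tracking; a new session counts as active at creation time.
    pub fn new(now: Instant, idle_timeout: Duration) -> Self {
        Self { last_activity: now, idle_timeout }
    }

    /// Step 1: record a user interaction.
    pub fn touch(&mut self, now: Instant) {
        self.last_activity = now;
    }

    /// Step 2: the periodic check. `true` means the session should stop (step 3).
    pub fn is_expired(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_activity) >= self.idle_timeout
    }
}
```

In the real actor, `touch` corresponds to the assignments in the message handlers and `is_expired` to the comparison inside the interval callback.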

### What Counts as Activity

The `last_activity` timestamp is updated on:

- **User Input** (`TerminadoMessage::Stdin`): Keyboard input from the user
- **Terminal Resize** (`TerminadoMessage::Resize`): Window resize events
- **Direct IO** (`event::IO`): Any direct I/O from the WebSocket

Note: Output from the PTY to the terminal (stdout) does NOT reset the idle timer. This is intentional: the timer tracks user activity, not program output.
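
The rule above can be summarized as a small classification function. This is an illustrative sketch only: `SessionEvent` and `resets_idle_timer` are hypothetical names that stand in for the real `TerminadoMessage` and `event::IO` types.

```rust
/// Hypothetical event kinds for illustration; the real code uses
/// `TerminadoMessage` and `event::IO` as described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionEvent {
    /// Keyboard input from the user.
    Stdin,
    /// Window resize.
    Resize,
    /// Direct I/O arriving from the WebSocket.
    DirectIo,
    /// Output flowing from the PTY to the browser.
    PtyOutput,
}

/// Returns `true` for events that count as user activity.
pub fn resets_idle_timer(event: SessionEvent) -> bool {
    match event {
        SessionEvent::Stdin | SessionEvent::Resize | SessionEvent::DirectIo => true,
        // Program output alone never keeps a session alive.
        SessionEvent::PtyOutput => false,
    }
}
```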

### Configuration

The timeout values are configured as constants in `src/lib.rs`:

```rust
const IDLE_TIMEOUT: Duration = Duration::from_secs(300); // 5 minutes
const IDLE_CHECK_INTERVAL: Duration = Duration::from_secs(30); // Check every 30 seconds
```

**Default Settings:**
- **Idle Timeout**: 5 minutes (300 seconds)
- **Check Interval**: 30 seconds
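
Because the idle check only runs once per interval, the moment a session actually stops depends on both constants. The sketch below estimates that moment under a simplifying assumption: checks fire at exact multiples of the interval after the actor starts. `stop_time` is a hypothetical helper, not part of the codebase.

```rust
use std::time::Duration;

/// Time (measured from actor start) of the first periodic check that sees
/// the session as idle. `last_activity` is also measured from actor start.
/// Assumes checks fire at exact multiples of `interval`.
pub fn stop_time(last_activity: Duration, timeout: Duration, interval: Duration) -> Duration {
    assert!(!interval.is_zero(), "check interval must be non-zero");
    // Earliest instant at which the session counts as idle.
    let deadline = last_activity + timeout;
    // Round up to the next tick; the first check runs after one full interval.
    let step = interval.as_nanos();
    let ticks = ((deadline.as_nanos() + step - 1) / step).max(1);
    interval * (ticks as u32)
}
```

With the defaults, a last keystroke 10 seconds after start gives a deadline of 310 seconds, so the check at 330 seconds is the first to stop the session. In general the delay after the last activity lies between the timeout and the timeout plus one check interval.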

### Behavior Scenarios

#### Scenario 1: Page Refresh
1. User refreshes the browser page
2. Old WebSocket disconnects → old `Websocket` actor stops
3. Old `Terminal` actor continues running (no new messages arrive)
4. After 5 minutes of no activity, old `Terminal` times out and stops
5. New WebSocket and Terminal are created for the new page

**Result**: Old session is cleaned up roughly 5 to 5.5 minutes after its last activity (the timeout plus up to one 30-second check interval)

#### Scenario 2: User Goes Idle
1. User leaves terminal open but inactive
2. No keyboard input or resize events occur
3. Program output (if any) continues, but doesn't reset timer
4. After 5 minutes, `Terminal` stops

**Result**: Idle session is cleaned up

#### Scenario 3: Active Use
1. User actively types commands or interacts with terminal
2. Each interaction resets `last_activity`
3. `Terminal` never reaches idle timeout
4. Session continues indefinitely while active

**Result**: Active sessions remain alive

#### Scenario 4: Long-Running Command
1. User starts a long-running command (e.g., `tail -f`, continuous monitoring)
2. Program produces output, but user doesn't interact
3. After 5 minutes of no user input, `Terminal` times out
4. Child process is killed

**Result**: Long-running unattended processes are cleaned up

> **Note**: If you need to run long-lived monitoring commands, you may want to:
> - Increase the `IDLE_TIMEOUT` constant
> - Periodically send a no-op interaction (like a resize event) to keep the session alive
> - Use a different mechanism (like tmux/screen) for persistent sessions

## Implementation Details

### Terminal Struct

```rust
pub struct Terminal {
    pty_write: Option<AsyncPtyMasterWriteHalf>,
    child: Option<Child>,
    ws: Addr<Websocket>,
    command: Command,
    last_activity: Instant,      // NEW: Track last activity
    idle_timeout: Duration,       // NEW: Timeout duration
}
```

### Initialization

In `Terminal::new()`:
```rust
Self {
    pty_write: None,
    child: None,
    ws,
    command,
    last_activity: Instant::now(),  // Initialize to current time
    idle_timeout: IDLE_TIMEOUT,      // Set configured timeout
}
```

### Periodic Check

In `Terminal::started()`:
```rust
ctx.run_interval(IDLE_CHECK_INTERVAL, |act, ctx| {
    let idle_duration = Instant::now().duration_since(act.last_activity);
    if idle_duration >= act.idle_timeout {
        info!(
            "Terminal idle timeout reached ({:?} idle), stopping session",
            idle_duration
        );
        ctx.stop();
    }
});
```

### Activity Updates

In message handlers:
```rust
// Handler<event::IO>
fn handle(&mut self, msg: event::IO, ctx: &mut Context<Self>) {
    self.last_activity = Instant::now();  // Reset timer
    // ... rest of handler
}

// Handler<event::TerminadoMessage>
fn handle(&mut self, msg: event::TerminadoMessage, ctx: &mut Context<Self>) {
    match msg {
        TerminadoMessage::Stdin(io) => {
            self.last_activity = Instant::now();  // Reset on input
            // ...
        }
        TerminadoMessage::Resize { rows, cols } => {
            self.last_activity = Instant::now();  // Reset on resize
            // ...
        }
        // ...
    }
}
```

## Customization

### Changing the Timeout Duration

To adjust the idle timeout, modify the constants in `src/lib.rs`:

```rust
// Define IDLE_TIMEOUT exactly once; the alternatives are commented out.

// For a 10-minute timeout:
const IDLE_TIMEOUT: Duration = Duration::from_secs(600);

// For a 1-minute timeout (more aggressive):
// const IDLE_TIMEOUT: Duration = Duration::from_secs(60);

// For a 30-second timeout (very aggressive):
// const IDLE_TIMEOUT: Duration = Duration::from_secs(30);
```

### Making It Configurable

To make the timeout configurable via environment variables:

```rust
// In Terminal::new():
let idle_timeout = std::env::var("IDLE_TIMEOUT_SECS")
    .ok()
    .and_then(|s| s.parse().ok())
    .map(Duration::from_secs)
    .unwrap_or(IDLE_TIMEOUT);

Self {
    // ...
    idle_timeout,
}
```

Then set it when running:
```bash
IDLE_TIMEOUT_SECS=600 cargo run
```

Or in Docker:
```dockerfile
ENV IDLE_TIMEOUT_SECS=600
```
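
The parsing step can be pulled into a small function so it is easy to test. This is a sketch under stated assumptions: `parse_idle_timeout` is a hypothetical helper, and treating `0` as invalid is a design choice made here (a zero timeout would stop every session at its first check), not documented behavior.

```rust
use std::time::Duration;

/// Hypothetical helper: turn the raw `IDLE_TIMEOUT_SECS` value into a `Duration`.
/// Missing, unparsable, or zero values fall back to `default`.
pub fn parse_idle_timeout(raw: Option<&str>, default: Duration) -> Duration {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        // Assumption: reject 0 rather than create a session that expires immediately.
        .filter(|&secs| secs > 0)
        .map(Duration::from_secs)
        .unwrap_or(default)
}
```

In `Terminal::new()` this could be called as `parse_idle_timeout(std::env::var("IDLE_TIMEOUT_SECS").ok().as_deref(), IDLE_TIMEOUT)`.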

## Monitoring and Debugging

### Log Messages

The idle timeout feature produces these log messages:

- `INFO`: `"Started Terminal"` - When a new terminal session begins
- `INFO`: `"Terminal idle timeout reached ({duration} idle), stopping session"` - When idle timeout triggers
- `INFO`: `"Stopping Terminal"` - When terminal is stopping (for any reason)
- `INFO`: `"Stopped Terminal"` - After terminal cleanup completes
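
When searching logs for the timeout message, it helps to know how the `{duration}` placeholder renders. The message is formatted with `{:?}`, so it uses the `Debug` output of `std::time::Duration`. The sketch below reproduces the message text in a hypothetical helper, `idle_timeout_message`, without the logging macro.

```rust
use std::time::Duration;

/// Builds the same text as the idle-timeout log line (hypothetical helper).
/// `{:?}` on a `Duration` prints seconds with an `s` suffix, e.g. `330s`.
pub fn idle_timeout_message(idle: Duration) -> String {
    format!(
        "Terminal idle timeout reached ({:?} idle), stopping session",
        idle
    )
}
```

Real log lines will usually carry a fractional part, since the check rarely lands on a whole second, so a search pattern should match on the fixed text `Terminal idle timeout reached` rather than on a specific duration.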

### Checking Active Sessions

To see how many terminal processes are running:

```bash
# Count socktop processes
ps aux | grep socktop-agent | grep -v grep | wc -l

# See all with details
ps aux | grep socktop-agent | grep -v grep
```

### Testing the Timeout

To test with a shorter timeout (30 seconds):

1. Modify `IDLE_TIMEOUT` in `src/lib.rs`:
   ```rust
   const IDLE_TIMEOUT: Duration = Duration::from_secs(30);
   ```

2. Rebuild: `cargo build`

3. Start the server and connect

4. Stop interacting and watch the logs

5. Within 30 to 60 seconds (the timeout plus up to one 30-second check interval), you should see: `"Terminal idle timeout reached"`

6. Verify the process is gone: `ps aux | grep socktop-agent | grep -v grep`

## Trade-offs and Considerations

### Pros
✅ Prevents resource leaks from abandoned sessions  
✅ Automatic cleanup without manual intervention  
✅ Handles page refreshes gracefully  
✅ Simple implementation with low overhead  

### Cons
❌ Long-running unattended commands will be killed  
❌ Users must stay "active" to keep sessions alive  
❌ Fixed timeout may not suit all use cases  

### Recommendations

**For Development**: Use a longer timeout (10-15 minutes) to avoid interruption during debugging

**For Production**: 
- Start with 5 minutes (current default)
- Monitor logs to see how often timeouts occur
- Adjust based on your users' typical session patterns
- Consider making it configurable per-deployment

**For Public/Demo Instances**: Use a shorter timeout (1-2 minutes) to aggressively reclaim resources

## Future Enhancements

Possible improvements:

1. **Per-User Configurable Timeouts**: Allow users to set their preferred timeout
2. **Activity-Aware Timeout**: Don't time out if the PTY is producing output (indicates active command)
3. **Session Persistence**: Integration with tmux/screen for sessions that survive disconnects
4. **Metrics Collection**: Track session duration, timeout frequency, resource usage
5. **Graceful Shutdown Warnings**: Send a warning message to the terminal before timeout
6. **Reconnection Support**: Allow reconnecting to an existing session within the timeout window
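
As one illustration, the graceful shutdown warning (item 5) could be built on a three-state classification of the idle duration. Nothing below exists in the current code: `IdleState`, `classify_idle`, and the `warn_before` parameter are hypothetical names for a possible design.

```rust
use std::time::Duration;

/// Hypothetical session states for a warning-before-timeout scheme.
#[derive(Debug, PartialEq, Eq)]
pub enum IdleState {
    /// Plenty of time left; do nothing.
    Active,
    /// Inside the warning window; `remaining` is the time until expiry.
    Warning { remaining: Duration },
    /// Timeout reached; the session should stop.
    Expired,
}

/// Classify an idle duration against the timeout and a warning window.
pub fn classify_idle(idle: Duration, timeout: Duration, warn_before: Duration) -> IdleState {
    if idle >= timeout {
        IdleState::Expired
    } else if timeout - idle <= warn_before {
        // Safe subtraction: this branch is only reached when idle < timeout.
        IdleState::Warning { remaining: timeout - idle }
    } else {
        IdleState::Active
    }
}
```

If this ran inside the existing periodic check, `warn_before` would need to be at least as long as the check interval; otherwise a check might never land inside the warning window.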

## Related Files

- `src/lib.rs` - Main implementation
- `src/event.rs` - Message types and events
- `Cargo.toml` - Dependencies

## See Also

- [Docker Deployment Guide](DOCKER_DEPLOYMENT.md)
- [Xterm.js Upgrade Documentation](XTERM_UPGRADE.md)
- [Catppuccin Styling Guide](CATPPUCCIN_STYLING.md)