# Idle Timeout Feature
## Overview
The webterm now includes an **idle timeout mechanism** to prevent "grey goo" accumulation of orphaned terminal processes. This feature automatically cleans up inactive PTY sessions, preventing resource leaks when users refresh pages or abandon sessions.
## How It Works
### Architecture
The idle timeout is implemented in the `Terminal` actor (`src/lib.rs`):
1. **Activity Tracking**: Each `Terminal` maintains a `last_activity` timestamp that is updated whenever user interaction occurs
2. **Periodic Checking**: A periodic check, scheduled on the actor's context, runs every 30 seconds to see whether the session has been idle
3. **Automatic Cleanup**: If a session is idle for longer than the configured timeout, the Terminal actor stops itself, cleaning up the PTY and child process
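The three steps above can be sketched without any actor machinery. This is a minimal illustration, not the code in `src/lib.rs`: `IdleTracker`, `touch`, and `is_idle` are hypothetical names, and the current time is passed in explicitly so the logic can be exercised without waiting.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the idle-related state kept by `Terminal`.
struct IdleTracker {
    last_activity: Instant,
    idle_timeout: Duration,
}

impl IdleTracker {
    fn new(now: Instant, idle_timeout: Duration) -> Self {
        Self { last_activity: now, idle_timeout }
    }

    /// Step 1: record user activity (input, resize, direct I/O).
    fn touch(&mut self, now: Instant) {
        self.last_activity = now;
    }

    /// Steps 2 and 3: the periodic check asks this question and stops
    /// the session when the answer is `true`.
    fn is_idle(&self, now: Instant) -> bool {
        // `saturating_duration_since` returns zero if `now` is earlier
        // than `last_activity`, so the check itself never panics.
        now.saturating_duration_since(self.last_activity) >= self.idle_timeout
    }
}
```

Passing `now` as a parameter is a design choice made for this sketch only; the handlers shown later in this document call `Instant::now()` directly.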
### What Counts as Activity
The `last_activity` timestamp is updated on:
- **User Input** (`TerminadoMessage::Stdin`): Keyboard input from the user
- **Terminal Resize** (`TerminadoMessage::Resize`): Window resize events
- **Direct IO** (`event::IO`): Any direct I/O from the WebSocket
Note: Output from the PTY to the terminal (stdout) does NOT reset the idle timer. This is intentional: the timer tracks user activity, not program output.
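The rule can be summarized as a small classification function. The `SessionEvent` enum below is a hypothetical simplification for illustration; the real message types live in `src/event.rs` and carry payloads.

```rust
/// Hypothetical, payload-free view of the events a session sees.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionEvent {
    Stdin,     // corresponds to TerminadoMessage::Stdin
    Resize,    // corresponds to TerminadoMessage::Resize
    DirectIo,  // corresponds to event::IO
    PtyOutput, // program output flowing back to the browser
}

/// Returns `true` when the event should reset `last_activity`.
fn counts_as_activity(event: SessionEvent) -> bool {
    match event {
        SessionEvent::Stdin | SessionEvent::Resize | SessionEvent::DirectIo => true,
        // Output alone says nothing about whether a user is present.
        SessionEvent::PtyOutput => false,
    }
}
```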
### Configuration
The timeout values are configured as constants in `src/lib.rs`:
```rust
const IDLE_TIMEOUT: Duration = Duration::from_secs(300); // 5 minutes
const IDLE_CHECK_INTERVAL: Duration = Duration::from_secs(30); // Check every 30 seconds
```
**Default Settings:**
- **Idle Timeout**: 5 minutes (300 seconds)
- **Check Interval**: 30 seconds
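Because the timeout is only evaluated on each check, a session can outlive `IDLE_TIMEOUT` by up to one check interval. The sketch below works through that arithmetic. It assumes checks fire at exact multiples of the interval, counted from when the actor starts; `first_stopping_check_secs` is a hypothetical helper, not part of the codebase.

```rust
/// Given the time of the last activity (in seconds since the actor
/// started), return the time of the first periodic check that will
/// stop the session. Checks are assumed to fire at `interval_secs`,
/// `2 * interval_secs`, and so on.
fn first_stopping_check_secs(
    last_activity_secs: u64,
    timeout_secs: u64,
    interval_secs: u64,
) -> u64 {
    assert!(interval_secs > 0, "check interval must be non-zero");
    // Earliest moment at which the session counts as idle.
    let deadline = last_activity_secs + timeout_secs;
    // Round up to the next check tick (integer ceiling division).
    let ticks = (deadline + interval_secs - 1) / interval_secs;
    ticks.max(1) * interval_secs
}
```

With the defaults, a last keystroke 10 seconds after startup gives a deadline of 310 seconds, and the session is stopped by the check at 330 seconds, that is, after 320 seconds of idleness.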
### Behavior Scenarios
#### Scenario 1: Page Refresh
1. User refreshes the browser page
2. Old WebSocket disconnects → old `Websocket` actor stops
3. New WebSocket and `Terminal` are created for the new page
4. Old `Terminal` actor continues running (no new messages arrive)
5. Once the old `Terminal` has been idle for 5 minutes, the next periodic check stops it
**Result**: Old session is cleaned up roughly 5 to 5.5 minutes after its last activity (the timeout plus at most one check interval)
#### Scenario 2: User Goes Idle
1. User leaves terminal open but inactive
2. No keyboard input or resize events occur
3. Program output (if any) continues, but does not reset the timer
4. After 5 minutes, `Terminal` stops
**Result**: Idle session is cleaned up
#### Scenario 3: Active Use
1. User actively types commands or interacts with terminal
2. Each interaction resets `last_activity`
3. `Terminal` never reaches idle timeout
4. Session continues indefinitely while active
**Result**: Active sessions remain alive
#### Scenario 4: Long-Running Command
1. User starts a long-running command (e.g., `tail -f`, continuous monitoring)
2. Program produces output, but user doesn't interact
3. After 5 minutes of no user input, `Terminal` times out
4. Child process is killed
**Result**: Long-running unattended processes are cleaned up
> **Note**: If you need to run long-lived monitoring commands, you may want to:
> - Increase the `IDLE_TIMEOUT` constant
> - Periodically send a no-op interaction (like a resize event) to keep the session alive
> - Use a different mechanism (like tmux/screen) for persistent sessions
## Implementation Details
### Terminal Struct
```rust
pub struct Terminal {
    pty_write: Option<AsyncPtyMasterWriteHalf>,
    child: Option<Child>,
    ws: Addr<Websocket>,
    command: Command,
    last_activity: Instant, // NEW: Track last activity
    idle_timeout: Duration, // NEW: Timeout duration
}
```
### Initialization
In `Terminal::new()`:
```rust
Self {
    pty_write: None,
    child: None,
    ws,
    command,
    last_activity: Instant::now(), // Initialize to current time
    idle_timeout: IDLE_TIMEOUT,    // Set configured timeout
}
```
### Periodic Check
In `Terminal::started()`:
```rust
ctx.run_interval(IDLE_CHECK_INTERVAL, |act, ctx| {
    let idle_duration = Instant::now().duration_since(act.last_activity);
    if idle_duration >= act.idle_timeout {
        info!(
            "Terminal idle timeout reached ({:?} idle), stopping session",
            idle_duration
        );
        ctx.stop();
    }
});
```
### Activity Updates
In message handlers:
```rust
// Handler<event::IO>
fn handle(&mut self, msg: event::IO, ctx: &mut Context<Self>) {
    self.last_activity = Instant::now(); // Reset timer
    // ... rest of handler
}

// Handler<event::TerminadoMessage>
fn handle(&mut self, msg: event::TerminadoMessage, ctx: &mut Context<Self>) {
    match msg {
        TerminadoMessage::Stdin(io) => {
            self.last_activity = Instant::now(); // Reset on input
            // ...
        }
        TerminadoMessage::Resize { rows, cols } => {
            self.last_activity = Instant::now(); // Reset on resize
            // ...
        }
        // ...
    }
}
```
## Customization
### Changing the Timeout Duration
To adjust the idle timeout, modify the constants in `src/lib.rs`:
```rust
// Define IDLE_TIMEOUT exactly once; the alternatives are commented out.
// For a 10-minute timeout:
const IDLE_TIMEOUT: Duration = Duration::from_secs(600);
// For a 1-minute timeout (more aggressive):
// const IDLE_TIMEOUT: Duration = Duration::from_secs(60);
// For a 30-second timeout (very aggressive):
// const IDLE_TIMEOUT: Duration = Duration::from_secs(30);
```
### Making It Configurable
To make the timeout configurable via environment variables:
```rust
// In Terminal::new():
let idle_timeout = std::env::var("IDLE_TIMEOUT_SECS")
    .ok()
    .and_then(|s| s.parse().ok())
    .map(Duration::from_secs)
    .unwrap_or(IDLE_TIMEOUT);

Self {
    // ...
    idle_timeout,
}
```
Then set it when running:
```bash
IDLE_TIMEOUT_SECS=600 cargo run
```
Or in Docker:
```dockerfile
ENV IDLE_TIMEOUT_SECS=600
```
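If the parsing is pulled into its own function, the fallback behavior can be checked without touching the process environment. The helper below is a sketch under assumptions: `idle_timeout_from` is a hypothetical name, and rejecting a value of `0` is a choice made here (a zero timeout would stop every session at its first check), not something the current code does.

```rust
use std::time::Duration;

const IDLE_TIMEOUT: Duration = Duration::from_secs(300); // default, as above

/// Hypothetical helper: turn the raw value of `IDLE_TIMEOUT_SECS` into a
/// timeout, falling back to the default when it is missing, not a
/// non-negative integer, or zero.
fn idle_timeout_from(raw: Option<&str>) -> Duration {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|&secs| secs > 0) // assumption: treat 0 as invalid
        .map(Duration::from_secs)
        .unwrap_or(IDLE_TIMEOUT)
}
```

In `Terminal::new()` it could be called as `idle_timeout_from(std::env::var("IDLE_TIMEOUT_SECS").ok().as_deref())`.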
## Monitoring and Debugging
### Log Messages
The idle timeout feature produces these log messages:
- `INFO`: `"Started Terminal"` - When a new terminal session begins
- `INFO`: `"Terminal idle timeout reached ({duration} idle), stopping session"` - When idle timeout triggers
- `INFO`: `"Stopping Terminal"` - When terminal is stopping (for any reason)
- `INFO`: `"Stopped Terminal"` - After terminal cleanup completes
### Checking Active Sessions
To see how many terminal processes are running:
```bash
# Count socktop processes
ps aux | grep socktop-agent | grep -v grep | wc -l
# See all with details
ps aux | grep socktop-agent | grep -v grep
```
### Testing the Timeout
To test with a shorter timeout (30 seconds):
1. Modify `IDLE_TIMEOUT` in `src/lib.rs`:
   ```rust
   const IDLE_TIMEOUT: Duration = Duration::from_secs(30);
   ```
2. Rebuild: `cargo build`
3. Start the server and connect
4. Stop interacting and watch the logs
5. Within 30 to 60 seconds (the timeout plus up to one 30-second check interval), you should see: `"Terminal idle timeout reached"`
6. Verify the process is gone: `ps aux | grep socktop-agent | grep -v grep`
## Trade-offs and Considerations
### Pros
- ✅ Prevents resource leaks from abandoned sessions
- ✅ Automatic cleanup without manual intervention
- ✅ Handles page refreshes gracefully
- ✅ Simple implementation with low overhead
### Cons
- ❌ Long-running unattended commands will be killed
- ❌ Users must stay "active" to keep sessions alive
- ❌ Fixed timeout may not suit all use cases
### Recommendations
**For Development**: Use a longer timeout (10-15 minutes) to avoid interruption during debugging
**For Production**:
- Start with 5 minutes (current default)
- Monitor logs to see how often timeouts occur
- Adjust based on your users' typical session patterns
- Consider making it configurable per-deployment
**For Public/Demo Instances**: Use a shorter timeout (1-2 minutes) to aggressively reclaim resources
## Future Enhancements
Possible improvements:
1. **Per-User Configurable Timeouts**: Allow users to set their preferred timeout
2. **Activity-Aware Timeout**: Don't timeout if the PTY is producing output (indicates active command)
3. **Session Persistence**: Integration with tmux/screen for sessions that survive disconnects
4. **Metrics Collection**: Track session duration, timeout frequency, resource usage
5. **Graceful Shutdown Warnings**: Send a warning message to the terminal before timeout
6. **Reconnection Support**: Allow reconnecting to an existing session within the timeout window
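As an illustration of item 5, the periodic check could distinguish a warning phase from expiry. Nothing below exists in the codebase: `IdleState`, `idle_state`, and the `warn_before` parameter are hypothetical, and actually delivering the warning to the browser is left out.

```rust
use std::time::Duration;

/// Hypothetical three-way outcome of an idle check.
#[derive(Debug, PartialEq, Eq)]
enum IdleState {
    Active,
    /// The session will time out after `remaining` more idle time.
    Warn { remaining: Duration },
    Expired,
}

/// Classify a session by how long it has been idle.
fn idle_state(idle: Duration, timeout: Duration, warn_before: Duration) -> IdleState {
    if idle >= timeout {
        IdleState::Expired
    } else if timeout - idle <= warn_before {
        // `timeout - idle` cannot underflow: `idle < timeout` holds here.
        IdleState::Warn { remaining: timeout - idle }
    } else {
        IdleState::Active
    }
}
```

Since checks run only every 30 seconds, a `warn_before` shorter than the check interval could be skipped entirely, so a design along these lines would probably need a warning window of at least one interval.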
## Related Files
- `src/lib.rs` - Main implementation
- `src/event.rs` - Message types and events
- `Cargo.toml` - Dependencies
## See Also
- [Docker Deployment Guide](DOCKER_DEPLOYMENT.md)
- [Xterm.js Upgrade Documentation](XTERM_UPGRADE.md)
- [Catppuccin Styling Guide](CATPPUCCIN_STYLING.md)