// LAB 006 — REPLICATION & MONITORING

AD Replication & Health Monitoring

Used repadmin and dcdiag to monitor Active Directory replication health, intentionally broke replication between DCs, diagnosed the failure, restored service, and documented findings in a formal After-Action Report.

repadmin, dcdiag, AD Replication, Event Viewer, ITIL AAR, Windows Server 2019, Sites & Services
ENVIRONMENT: 2-DC VirtualBox Lab
DCs: DC01 + DC02
DOMAIN: corp.local
PROCESS: ITIL Incident + AAR

WHAT THIS LAB DEMONSTRATES

AD replication failures can go unnoticed while domain controllers drift out of sync: user password changes don't propagate, GPO updates stall, and conflicting objects get created. This lab builds hands-on familiarity with the core tools a Tier 2 sysadmin uses to monitor and recover replication: baseline health checks with dcdiag and repadmin, break/fix scenarios, Event Log analysis, and documented incident closure using an After-Action Report (ITIL format).


STEP 01 // BASELINE HEALTH CHECK WITH DCDIAG

Ran dcdiag /v on both domain controllers to establish a healthy baseline and documented all passing tests. Key tests reviewed: Advertising, KccEvent, KnowsOfRoleHolders, MachineAccount, NCSecDesc, NetLogons, ObjectsReplicated, Replications, RidManager, SystemLog. The transcript below shows a targeted run of three of them on DC01.

CMD — dcdiag baseline
C:\> dcdiag /v /test:Replications /test:Advertising /test:KccEvent

Directory Server Diagnosis
Performing initial setup:
Trying to find home server...
Home Server = DC01
* Identified AD Forest.

Starting test: Replications
* Replications Check
......................... DC01 passed test Replications
Starting test: Advertising
......................... DC01 passed test Advertising
Starting test: KccEvent
......................... DC01 passed test KccEvent

REM Baseline: all tests PASSED on both DC01 and DC02
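
A baseline is easier to compare against later if the pass/fail lines are captured in structured form. The following is a minimal Python sketch, assuming dcdiag output has been saved as text and that result lines follow the "<dots> SERVER passed|failed test NAME" shape seen in the transcript above. The function names `parse_dcdiag` and `failed_tests` are hypothetical helpers, not part of dcdiag.

```python
import re

# Matches dcdiag result lines such as:
#   "......................... DC01 passed test Replications"
RESULT_LINE = re.compile(r"^\.+\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)\s*$")


def parse_dcdiag(output: str) -> list[tuple[str, str, bool]]:
    """Return (server, test, passed) for each result line in dcdiag output."""
    results = []
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            server, verdict, test = match.groups()
            results.append((server, test, verdict == "passed"))
    return results


def failed_tests(output: str) -> list[tuple[str, str]]:
    """Return (server, test) pairs for every test that did not pass."""
    return [(server, test) for server, test, ok in parse_dcdiag(output) if not ok]


# Sample text copied from the baseline transcript in this lab.
SAMPLE = """\
Starting test: Replications
* Replications Check
......................... DC01 passed test Replications
Starting test: Advertising
......................... DC01 passed test Advertising
Starting test: KccEvent
......................... DC01 passed test KccEvent
"""

if __name__ == "__main__":
    for server, test, ok in parse_dcdiag(SAMPLE):
        print(f"{server:6} {test:15} {'PASS' if ok else 'FAIL'}")
```

Saving the parsed baseline alongside the raw output would make a later regression (a test flipping from passed to failed) straightforward to spot.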
STEP 02 // MONITOR REPLICATION WITH REPADMIN

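Used repadmin /showrepl to view per-partition replication status and repadmin /replsummary for a high-level pass/fail view across all DCs. Forced manual replication with repadmin /syncall and verified changes propagated within seconds. The /AdeP switches sync all naming contexts held on the home server (A), identify servers by distinguished name in messages (d), include DCs in all sites (e), and push changes outward from the home server (P).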

CMD — repadmin monitoring
C:\> repadmin /replsummary

Replication Summary Start Time: 2026-03-10 14:22:01

Beginning data collection for replication summary, this may take awhile:
.........................
Source DSA      largest delta    fails/total   %%    error
DC01              00h:03m:12s      0 /   3      0%
DC02              00h:03m:08s      0 /   3      0%

C:\> repadmin /syncall /AdeP
Syncing all NC's held on DC01.
Syncing partition: DC=corp,DC=local
CALLBACK MESSAGE: The following replication completed successfully:
From: DC02 To: DC01 NC: DC=corp,DC=local
SyncAll Finished Successfully.
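
The summary rows above lend themselves to automated checking, for example from a scheduled task. Below is a minimal Python sketch, assuming the replsummary output has been captured as text and that each row looks like the ones in this transcript (name, largest delta, "fails / total", percentage). `ReplRow`, `parse_replsummary`, and `dcs_with_failures` are hypothetical names, and real output may contain rows this pattern does not cover.

```python
import re
from typing import NamedTuple


class ReplRow(NamedTuple):
    dsa: str            # DC name from the summary table
    largest_delta: str  # kept as text, e.g. "00h:03m:12s"
    fails: int
    total: int


# One summary row: name, largest delta, "fails / total", then the percentage.
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s*/\s*(\d+)\s+\d+%?")


def parse_replsummary(output: str) -> list[ReplRow]:
    """Extract the per-DC rows from captured replsummary text."""
    rows = []
    for line in output.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(ReplRow(m.group(1), m.group(2), int(m.group(3)), int(m.group(4))))
    return rows


def dcs_with_failures(output: str) -> list[str]:
    """Return the sorted names of DCs reporting at least one failed link."""
    return sorted({row.dsa for row in parse_replsummary(output) if row.fails > 0})


# Rows copied from the healthy run in this lab.
HEALTHY = """\
Source DSA largest delta fails/total %% error
DC01 00h:03m:12s 0 / 3 0%
DC02 00h:03m:08s 0 / 3 0%
"""

if __name__ == "__main__":
    failing = dcs_with_failures(HEALTHY)
    print("ALERT: " + ", ".join(failing) if failing else "Replication healthy")
```

The header and progress lines are skipped because they do not contain a numeric "fails / total" pair in the third position.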
STEP 03 // BREAK/FIX — SIMULATE REPLICATION FAILURE

Intentionally broke replication between DC01 and DC02 by adding a Windows Firewall rule on DC02 that blocks inbound TCP 135 (the RPC Endpoint Mapper). Observed the failure surface in repadmin /showrepl and in Event Viewer (Event IDs 1311, 1864, 1925). Confirmed the failure, then restored connectivity by removing the firewall rule.

CMD — Failure injection and detection
REM -- On DC02: block RPC endpoint mapper port to simulate network failure
C:\> netsh advfirewall firewall add rule name="BREAK-RPC" protocol=TCP dir=in localport=135 action=block
Ok.

REM -- On DC01: force sync attempt — should now fail
C:\> repadmin /syncall /AdeP
CALLBACK MESSAGE: The following replication FAILED:
From: DC02 To: DC01
Naming Context: DC=corp,DC=local
LDAP Error 58 (0x3a): The specified server cannot perform the requested operation.

C:\> repadmin /showrepl DC01
Last attempt @ 2026-03-10 15:17:32 was successful.
Last attempt @ 2026-03-10 15:44:01 FAILED, result 1722 (0x6ba):
The RPC server is unavailable.

REM -- Fix: remove blocking rule on DC02
C:\> netsh advfirewall firewall delete rule name="BREAK-RPC"
Deleted 1 rule(s).

C:\> repadmin /syncall /AdeP
SyncAll Finished Successfully.
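
Error 1722 points at connectivity, so a quick TCP probe of the endpoint mapper port can confirm or rule out a network-level block before digging further. This is a minimal Python sketch using only the standard library. `tcp_port_open` is a hypothetical helper, and the host name in the usage comment is the lab's DC02.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (dropped packets), and
        # name-resolution failures.
        return False


# Example usage from DC01 or a management workstation (assumed host name):
#   tcp_port_open("DC02", 135)  # TCP 135 is the RPC Endpoint Mapper
```

A successful probe of TCP 135 does not by itself prove replication will work, since RPC also negotiates dynamic ports after the endpoint mapper responds. A failed probe, however, is consistent with the firewall block introduced in this step.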
STEP 04 // EVENT VIEWER ANALYSIS — AD REPLICATION EVENTS

Reviewed the Directory Service event log in Event Viewer during and after the failure. Documented the key Event IDs that surface during replication failures and what each indicates. Confirmed that no new errors were logged once the firewall rule was removed and replication recovered.

PowerShell — Event Log Query
PS> Get-EventLog -LogName "Directory Service" -EntryType Error,Warning -Newest 10 | Select TimeWritten,EventID,Message

TimeWritten EventID Message
----------- ------- -------
3/10/2026 3:44:01 PM 1311 The Knowledge Consistency Checker (KCC) has detected that successive attempts to replicate...
3/10/2026 3:44:01 PM 1864 This is the replication status for the naming context DC=corp,DC=local from the source DC...
3/10/2026 3:44:01 PM 1925 The attempt to establish a replication link for the following writable directory partition failed.

# After fix — clean health check
PS> Get-EventLog -LogName "Directory Service" -EntryType Error -Newest 5
No events found matching the specified criteria.
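
When the same query is run repeatedly during an incident, tallying the Event IDs of interest gives a quick view of whether the failure is still active. The sketch below is a minimal Python example, assuming the Get-EventLog output has been saved as text in the US date format shown above. The `WATCHED` notes paraphrase the message text captured in this lab rather than official event descriptions, and `count_watched_events` is a hypothetical helper.

```python
import re
from collections import Counter

# Event IDs tracked in this lab; notes paraphrase the transcript messages.
WATCHED = {
    1311: "KCC reports successive replication attempts failing",
    1864: "Replication status report for a naming context",
    1925: "Attempt to establish a replication link failed",
}

# Matches lines like "3/10/2026 3:44:01 PM 1311 The Knowledge ..."
EVENT_LINE = re.compile(
    r"^\s*\d{1,2}/\d{1,2}/\d{4}\s+\d{1,2}:\d{2}:\d{2}\s+[AP]M\s+(\d+)\s"
)


def count_watched_events(output: str) -> Counter:
    """Count occurrences of each watched Event ID in captured log text."""
    counts = Counter()
    for line in output.splitlines():
        m = EVENT_LINE.match(line)
        if m and int(m.group(1)) in WATCHED:
            counts[int(m.group(1))] += 1
    return counts


# Lines copied from the failure window in this lab.
SAMPLE = """\
3/10/2026 3:44:01 PM 1311 The Knowledge Consistency Checker (KCC) has detected that successive attempts to replicate...
3/10/2026 3:44:01 PM 1864 This is the replication status for the naming context DC=corp,DC=local from the source DC...
3/10/2026 3:44:01 PM 1925 The attempt to establish a replication link for the following writable directory partition failed.
"""

if __name__ == "__main__":
    for event_id, n in sorted(count_watched_events(SAMPLE).items()):
        print(f"{event_id}: {n} x {WATCHED[event_id]}")
```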
STEP 05 // AFTER-ACTION REPORT (ITIL FORMAT)

Documented the incident end-to-end in a structured After-Action Report following ITIL incident management principles. The AAR captures timeline, root cause analysis, impact, and corrective actions — the same format used in enterprise environments to close incidents and prevent recurrence.

// INCIDENT REPORT — AAR-2026-001
INCIDENT TITLE: AD Replication Failure Between DC01 and DC02
DATE / TIME: 2026-03-10 15:44 UTC, resolved 16:02 UTC (18 min outage)
SEVERITY: P2 (High). Replication failure; no auth outage due to single site
DETECTED BY: repadmin /replsummary scheduled task alert, error 1722 (RPC unavailable)
ROOT CAUSE: Windows Firewall rule on DC02 blocked inbound TCP 135 (RPC Endpoint Mapper), preventing DC01 from establishing a replication connection
IMPACT: AD object changes made on DC02 did not replicate to DC01 for 18 minutes. No user-facing authentication failures due to single Active Directory site
REMEDIATION: Removed blocking firewall rule on DC02. Forced replication via repadmin /syncall /AdeP. Verified clean replication with replsummary and dcdiag
CORRECTIVE ACTION: Added firewall audit GPO to all DCs. Created monitoring script to alert on repadmin failures > 0 in daily replsummary. Documented RPC port requirements in internal runbook
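
The outage duration in the report can be derived from the two timestamps rather than computed by hand, which helps keep AAR figures consistent. This is a minimal Python sketch, assuming both times are recorded in UTC in "YYYY-MM-DD HH:MM" form. `outage_minutes` is a hypothetical helper.

```python
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%d %H:%M"


def outage_minutes(detected: str, resolved: str) -> int:
    """Return whole minutes between detection and resolution (both UTC)."""
    start = datetime.strptime(detected, TIME_FORMAT).replace(tzinfo=timezone.utc)
    end = datetime.strptime(resolved, TIME_FORMAT).replace(tzinfo=timezone.utc)
    if end < start:
        raise ValueError("resolved time is earlier than detected time")
    return int((end - start).total_seconds() // 60)


if __name__ == "__main__":
    # Timestamps taken from AAR-2026-001 above.
    print(outage_minutes("2026-03-10 15:44", "2026-03-10 16:02"))  # prints 18
```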

WHAT WAS ACHIEVED
  • Established baseline AD health using dcdiag across all test categories on a 2-DC lab
  • Monitored replication topology and forced manual sync using repadmin /showrepl, /replsummary, /syncall
  • Intentionally introduced and confirmed a replication failure (RPC error 1722) via firewall block
  • Diagnosed root cause using Event IDs 1311, 1864, 1925 in the Directory Service log
  • Restored full replication and verified clean health with no errors in Event Viewer
  • Produced a complete ITIL-format After-Action Report covering timeline, root cause, and corrective actions
