Environment
.NET 6
StackExchange.Redis NuGet version: 2.6.116
Connects to a Redis Cluster (3 masters, 3 slaves) and tests failover
For the environment setup, see w4560000 - Redis Cluster
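Set cluster-node-timeout 1000
The value is in milliseconds, so a master that does not respond for 1 second is flagged as subjectively down (PFAIL), and the other masters also try to reach that master
If more than half of the masters report the connection failure within cluster-node-timeout * 2, that master is flagged as objectively down (FAIL)
and the failover process starts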
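The rule above can be sketched as a small model. This is a simplified illustration, not Redis source code: the `FailureDetection` class, its method, and the way reports are represented are all hypothetical, and only the two conditions stated above are modeled (a report counts while it is no older than `cluster-node-timeout * 2`, and more than half of the masters must report).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical model of the rule described above; not Redis source code.
public static class FailureDetection
{
    // reportAges: age of the latest failure report from each master that sent one.
    // Returns true when more than half of the masters hold a report that is
    // still inside the validity window (cluster-node-timeout * 2).
    public static bool IsObjectivelyDown(
        IEnumerable<TimeSpan> reportAges,
        int masterCount,
        TimeSpan nodeTimeout)
    {
        TimeSpan validity = TimeSpan.FromTicks(nodeTimeout.Ticks * 2);
        int validReports = reportAges.Count(age => age <= validity);
        int majority = masterCount / 2 + 1; // strictly more than half
        return validReports >= majority;
    }
}
```

With 3 masters and a timeout of 1000 ms, two reports that are both younger than 2000 ms are enough in this model, while one fresh report plus one stale report is not.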
Current Redis Cluster layout
Master | Slave (replica) | Slots | Keys |
---|---|---|---|
10.240.0.11 (m1) | 10.240.0.15 (s1) | 0-5460 (5461 slots) | Key1 |
10.240.0.12 (m2) | 10.240.0.16 (s2) | 5461-10922 (5462 slots) | Key2 |
10.240.0.13 (m3) | 10.240.0.14 (s3) | 10923-16383 (5461 slots) | Key3 |
Set Key1 and confirm that its key slot is served by m1
docker exec -it redis redis-cli cluster keyslot Key1
# Output
(integer) 5291
docker exec -it redis redis-cli -c SET Key1 "1"
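For reference, the slot that `cluster keyslot` reports can be reproduced on the client side. Redis Cluster maps a key to a slot as CRC16(key) mod 16384, using the CRC16 variant with polynomial 0x1021 and initial value 0, and hashes only the hash tag if the key contains a non-empty `{...}` part. The sketch below is a minimal implementation of that rule; the `KeySlot` class and the `OwnerOf` helper are hypothetical names, and the slot ranges in `OwnerOf` are copied from the layout table in this post.

```csharp
using System;
using System.Text;

// Minimal sketch of the key-to-slot mapping: CRC16(key) mod 16384.
public static class KeySlot
{
    public static int Compute(string key)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(ExtractHashTag(key));
        return Crc16(bytes) % 16384;
    }

    // If the key contains "{...}" with a non-empty body, only that body is hashed.
    private static string ExtractHashTag(string key)
    {
        int open = key.IndexOf('{');
        if (open < 0) return key;
        int close = key.IndexOf('}', open + 1);
        if (close < 0 || close == open + 1) return key;
        return key.Substring(open + 1, close - open - 1);
    }

    // Bitwise CRC16 (polynomial 0x1021, initial value 0, no bit reflection).
    public static ushort Crc16(byte[] data)
    {
        ushort crc = 0;
        foreach (byte b in data)
        {
            crc ^= (ushort)(b << 8);
            for (int i = 0; i < 8; i++)
            {
                crc = (crc & 0x8000) != 0
                    ? (ushort)((crc << 1) ^ 0x1021)
                    : (ushort)(crc << 1);
            }
        }
        return crc;
    }

    // Slot ranges taken from the cluster layout table in this post (assumption:
    // the layout has not changed since the table was written).
    public static string OwnerOf(int slot) =>
        slot <= 5460 ? "m1 (10.240.0.11)" :
        slot <= 10922 ? "m2 (10.240.0.12)" :
        "m3 (10.240.0.13)";
}
```

Calling `KeySlot.OwnerOf(KeySlot.Compute("Key1"))` should agree with the `cluster keyslot` result above, which places Key1 in the range served by m1.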
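Test scenario
A console application simulates a workload that reads and updates Key1 once per second
StringGet uses CommandFlags.PreferReplica (reads go to a replica first and fall back to the master only when no replica is available)
StringSet uses CommandFlags.DemandMaster (writes are restricted to the master)
Run MONITOR on m1 and s1 at the same time to confirm that reads and writes are split
Note: s1 also shows the SET Key1 operations because the master replicates data to the replica, not because the client writes to the replica
Stop m1 and observe the failover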
Test results
Redis Log
m1:
09:24:25.725 # User requested shutdown...
s1:
09:24:25.727 # Connection with master lost.
09:24:25.727 * Caching the disconnected master state.
09:24:26.419 * Connecting to MASTER 10.240.0.11:6379
09:24:26.419 * MASTER <-> REPLICA sync started
09:24:26.419 # Error condition on socket for SYNC: Operation now in progress
09:24:27.059 * FAIL message received from 5cf40f5453157184bcaaf72705bd8e366a01a45b about a8205cb1beca30ed91412f0524b8fe7891a5e151
09:24:27.059 # Cluster state changed: fail
09:24:27.125 # Start of election delayed for 611 milliseconds (rank #0, offset 6743).
09:24:27.428 * Connecting to MASTER 10.240.0.11:6379
09:24:27.428 * MASTER <-> REPLICA sync started
09:24:27.428 # Error condition on socket for SYNC: Operation now in progress
09:24:27.829 # Starting a failover election for epoch 30.
09:24:27.866 # Failover election won: I'm the new master.
09:24:27.866 # configEpoch set to 30 after successful failover
09:24:27.866 # Setting secondary replication ID to 8573bc7a38fdd760b944b1300a270097f179c5f2, valid up to offset: 6744. New replication ID is be0c19c20de3875ee8d8eb2c
09:24:27.866 * Discarding previously cached master state.
09:24:27.866 # Cluster state changed: ok
Application log
2023-08-19 09:24:24.841 Key1 = 26
2023-08-19 09:24:24.841 Key1 will be updated to 27
Updated
After update, Key1 = 27
2023-08-19 09:24:25.845 Key1 = 27
2023-08-19 09:24:25.845 Key1 will be updated to 28
Update failed, Error:The message timed out in the backlog attempting to send because no connection became available, command=SET, timeout: 1000, inst: 0, qu: 0, qs: 0, aw: False, bw: Inactive, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 27, cur-in: 0, sync-ops: 80, async-ops: 2, serverEndpoint: 10.240.0.12:6379, conn-sec: 27.27, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: instance-1(SE.Redis-v2.6.122.38350), PerfCounterHelperkeyHashSlot: 5291, IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=0,Free=32767,Min=2,Max=32767), POOL: (Threads=7,QueuedItems=0,CompletedItems=281,Timers=3), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
08/19/2023 09:24:26 Redis connection failed. Retrying (00:00:01)...
Update failed, Error:CLUSTERDOWN The cluster is down
08/19/2023 09:24:27 Redis connection failed. Retrying (00:00:01)...
Update failed, Error:InternalFailure on [0]:SET Key1 (BooleanProcessor)
08/19/2023 09:24:28 Redis connection failed. Retrying (00:00:01)...
Updated
After update, Key1 = 28
2023-08-19 09:24:30.865 Key1 = 28
2023-08-19 09:24:30.865 Key1 will be updated to 29
Updated
After update, Key1 = 29
Code
using Newtonsoft.Json;
using Polly;
using Polly.Retry;
using StackExchange.Redis;

namespace RedisClusterTest
{
    public class RedisConnectionManager
    {
        private readonly object _lock = new object();
        private IConnectionMultiplexer _connectionMultiplexer;
        private readonly ConfigurationOptions _configurationOptions;
        private readonly RetryPolicy _retryPolicy;

        public RedisConnectionManager(ConfigurationOptions configurationOptions, RetryPolicy retryPolicy)
        {
            _configurationOptions = configurationOptions;
            _retryPolicy = retryPolicy;
            SetConnection();
        }

        // Creates the multiplexer, or recreates it when it is no longer connected.
        public IConnectionMultiplexer SetConnection()
        {
            return _retryPolicy.Execute(() =>
            {
                lock (_lock)
                {
                    // Check inside the lock so two threads cannot both reconnect.
                    if (_connectionMultiplexer == null || !_connectionMultiplexer.IsConnected)
                    {
                        _connectionMultiplexer?.Close();
                        _connectionMultiplexer = ConnectionMultiplexer.Connect(_configurationOptions);
                    }
                    return _connectionMultiplexer;
                }
            });
        }

        // Reads prefer a replica and fall back to the master.
        public T? Get<T>(string key)
        {
            return _retryPolicy.Execute(() =>
            {
                try
                {
                    var value = _connectionMultiplexer.GetDatabase().StringGet(key, flags: CommandFlags.PreferReplica);
                    if (value.IsNullOrEmpty)
                        return default;
                    return JsonConvert.DeserializeObject<T>(value);
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"Get failed, Error:{ex.Message}");
                    throw; // rethrow without resetting the stack trace
                }
            });
        }

        // Writes are restricted to the master.
        public void Update(string key, string data)
        {
            _retryPolicy.Execute(() =>
            {
                try
                {
                    _connectionMultiplexer.GetDatabase().StringSet(key, data, flags: CommandFlags.DemandMaster);
                    Console.WriteLine("Updated");
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"Update failed, Error:{ex.Message}");
                    throw; // rethrow without resetting the stack trace
                }
            });
        }
    }

    internal class Program
    {
        static void Main(string[] args)
        {
            // One initial attempt plus up to 3 retries, waiting 1 second before each retry.
            // The second callback argument is the wait duration (a TimeSpan), not a counter,
            // which is why the log prints "Retrying (00:00:01)".
            var retryPolicy = Policy.Handle<RedisConnectionException>()
                .Or<RedisTimeoutException>()
                .Or<RedisServerException>()
                .WaitAndRetry(3, _ => TimeSpan.FromSeconds(1), (exception, sleepDuration) =>
                {
                    Console.WriteLine($"{DateTime.Now} Redis connection failed. Retrying ({sleepDuration})...");
                });

            var configuration = new ConfigurationOptions()
            {
                EndPoints = {
                    { "10.240.0.11:6379" },
                    { "10.240.0.12:6379" },
                    { "10.240.0.13:6379" },
                    { "10.240.0.14:6379" },
                    { "10.240.0.15:6379" },
                    { "10.240.0.16:6379" },
                },
                AbortOnConnectFail = true,
                ConnectTimeout = 1000,
                SyncTimeout = 1000,
                ConnectRetry = 5
            };

            var redisConnectionManager = new RedisConnectionManager(configuration, retryPolicy);

            // Read Key1, increment it, write it back, and read it again, once per second.
            while (true)
            {
                var value = redisConnectionManager.Get<string>("Key1");
                Console.WriteLine($"{DateTime.Now:yyyy-MM-dd HH:mm:ss.fff} Key1 = {value}");
                var newValue = Convert.ToInt32(value) + 1;
                Console.WriteLine($"{DateTime.Now:yyyy-MM-dd HH:mm:ss.fff} Key1 will be updated to {newValue}");
                redisConnectionManager.Update("Key1", newValue.ToString());
                Console.WriteLine($"After update, Key1 = {redisConnectionManager.Get<string>("Key1")}\n");
                Thread.Sleep(1000);
            }
        }
    }
}
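The retry behavior seen in the application log (three failed SET attempts, then a success) follows from the policy allowing one initial attempt plus up to three retries. The sketch below mimics that behavior in plain C# so the attempt counting is easy to see. It is a hypothetical stand-in, not Polly's implementation: the `SimpleRetry` class and its signature are made up for illustration, and unlike the policy in the code above it retries on any exception type.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a "retry N times with a fixed wait" policy.
// It retries on every exception type; the real policy above filters by type.
public static class SimpleRetry
{
    public static T Execute<T>(
        Func<T> action,
        int retryCount,
        TimeSpan wait,
        Action<Exception, int> onRetry)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception ex) when (attempt < retryCount)
            {
                // Reached only while retries remain; the final failure propagates.
                onRetry(ex, attempt + 1);
                Thread.Sleep(wait);
            }
        }
    }
}
```

With `retryCount` set to 3, an action that fails three times and then succeeds returns normally on the fourth call, while an action that keeps failing throws after the fourth call.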
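Notes
Suppose the Redis Cluster is on an internal network and cluster-announce-ip on each node is set to its internal IP (10.240.0.11 to 10.240.0.16),
while the application runs on an external network, and the firewall of every node has the port open and allows the application's external IP.
Connecting through StackExchange.Redis still fails in this setup,
because when IConnectionMultiplexer first connects, Redis returns the cluster-announce-ip of the other cluster nodes for the client to connect to.
Here it returns 10.240.0.11 to 10.240.0.16, which are internal IPs that an external application cannot reach. An application on the same internal network connects normally.
PS: If the Redis Cluster nodes are on the same internal network but each node's cluster-announce-ip is set to its own external IP,
an application on an external network can connect normally.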
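To make the point concrete: the addresses a cluster-aware client ends up using are the ones the nodes announce, for example in the second field of each `CLUSTER NODES` line, which has the form `ip:port@cport`. The sketch below parses that field and does a rough private-range check. It is an illustration under assumptions: the `ClusterNodesLine` class is hypothetical, the sample line in the usage note is made up rather than captured from this cluster, and the private-range check only covers the common IPv4 ranges (10/8, 172.16/12, 192.168/16).

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper for inspecting one line of CLUSTER NODES output.
public static class ClusterNodesLine
{
    // Returns "ip:port" from the second field, dropping "@cport" and any ",hostname" suffix.
    public static string AnnouncedEndpoint(string line)
    {
        string[] fields = line.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        if (fields.Length < 2) throw new FormatException("Not a CLUSTER NODES line.");
        string address = fields[1];
        int cut = address.IndexOfAny(new[] { '@', ',' });
        return cut >= 0 ? address.Substring(0, cut) : address;
    }

    // Rough check: true when the host part is an IPv4 address in a private range.
    public static bool IsPrivateIPv4(string endpoint)
    {
        int colon = endpoint.LastIndexOf(':');
        string host = colon >= 0 ? endpoint.Substring(0, colon) : endpoint;
        if (!IPAddress.TryParse(host, out var ip)) return false;
        if (ip.AddressFamily != AddressFamily.InterNetwork) return false;
        byte[] b = ip.GetAddressBytes();
        return b[0] == 10
            || (b[0] == 172 && b[1] >= 16 && b[1] <= 31)
            || (b[0] == 192 && b[1] == 168);
    }
}
```

For a made-up line such as `node-id-placeholder 10.240.0.11:6379@16379 master - 0 0 30 connected 0-5460`, `AnnouncedEndpoint` gives `10.240.0.11:6379` and `IsPrivateIPv4` reports it as private, which is the situation where an application outside the network cannot follow the announced address.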
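Conclusion
When an application connects to a Redis Cluster, make sure that it can reach the cluster-announce-ip of every node
Normally the application and the Redis Cluster are on the same internal network and can reach each other, so this is rarely a problem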
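Supplement: StackExchange.Redis.CommandFlags
CommandFlags | Description |
---|---|
None (0) | Default (reads and writes both go to the master) |
HighPriority (1) | Obsolete |
FireAndForget (2) | Fire and forget (does not wait for the Redis response) |
PreferMaster (0) | Prefers the master and falls back to a replica when the master is unavailable |
DemandMaster (4) | Master only |
PreferReplica (8) | Prefers a replica and falls back to the master when no replica is available |
DemandReplica (12) | Replica only |
NoRedirect (64) | On a MOVED or ASK error, returns the error instead of redirecting automatically |
NoScriptCache (512) | Runs Lua scripts without the script cache (EVAL instead of SCRIPT LOAD + EVALSHA) |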
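The numeric values in the table explain two details that are easy to miss: None and PreferMaster are the same value (0), and the routing choice occupies the two bits 4 and 8, so 12 is simply both bits set. The sketch below uses a local copy of a few values from the table to show this. `DemoFlags` and `DemoFlagsInfo` are hypothetical names used only for illustration; real code should use `StackExchange.Redis.CommandFlags`.

```csharp
using System;

// Local copy of some values from the table above, for illustration only.
// This is not the library type.
[Flags]
public enum DemoFlags
{
    None = 0,
    PreferMaster = 0,
    FireAndForget = 2,
    DemandMaster = 4,
    PreferReplica = 8,
    DemandReplica = 12,
}

public static class DemoFlagsInfo
{
    // Bits 4 and 8 together carry the routing choice (0, 4, 8, or 12).
    public const DemoFlags RoutingMask = (DemoFlags)12;

    // Strips every non-routing bit, such as FireAndForget.
    public static DemoFlags Routing(DemoFlags flags) => flags & RoutingMask;
}
```

Because PreferMaster is 0, passing no flag at all already means "prefer the master", and a combination such as `DemoFlags.FireAndForget | DemoFlags.PreferReplica` keeps its routing part intact when masked with `RoutingMask`.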
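To set up read/write splitting:
Use PreferReplica for reads
Reads go to a replica first. After a failover the replica is promoted to master, and because the flag is PreferReplica, reads still succeed
(With DemandReplica, reads fail for as long as the slot has no replica left, so use it with care)
Use DemandMaster for writes
Writes are restricted to the master, because a replica is read-only unless it is configured otherwise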
References
为个人日志 - Redis cluster集群failover测试