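Testing Failover When Connecting to a Redis Cluster from .NET Core

  1. Environment
  2. Failover with a Redis Cluster of 3 masters and 3 replicas
    1. Test scenario
    2. Test results
  3. Notes
  4. Conclusion
  5. Appendix: StackExchange.Redis.CommandFlags
  6. References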

Environment

.NET 6
StackExchange.Redis NuGet version: 2.6.116

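Failover with a Redis Cluster of 3 masters and 3 replicas

For the environment setup, see w4560000 - Redis Cluster

Set cluster-node-timeout 1000

The value is in milliseconds, so a master that does not respond for 1 second is flagged as subjectively down (PFAIL) by the node that noticed, and the other masters also try to reach that master.
If more than half of the masters report it as unreachable within cluster-node-timeout * 2, the master is flagged as objectively down (FAIL)
and the failover process starts.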
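
As a rough sketch of the arithmetic behind this setting, the numbers for this test (cluster-node-timeout 1000, 3 masters) work out as follows. The helper names are made up for illustration; they are not Redis or StackExchange.Redis APIs.

```csharp
using System;

// Hypothetical helpers that only restate the rules described in the text.

// A node is flagged PFAIL once it has been unreachable for cluster-node-timeout.
static int PfailAfterMs(int timeoutMs) => timeoutMs;

// Failure reports from other masters only count within cluster-node-timeout * 2.
static int FailReportWindowMs(int timeoutMs) => timeoutMs * 2;

// "More than half of the masters" means a strict majority.
static int MajorityOf(int masterCount) => masterCount / 2 + 1;

int nodeTimeoutMs = 1000; // cluster-node-timeout used in this test
int masters = 3;          // m1, m2, m3

Console.WriteLine($"PFAIL after {PfailAfterMs(nodeTimeoutMs)} ms");                  // PFAIL after 1000 ms
Console.WriteLine($"Report window {FailReportWindowMs(nodeTimeoutMs)} ms");          // Report window 2000 ms
Console.WriteLine($"Masters needed for FAIL: {MajorityOf(masters)} of {masters}");   // Masters needed for FAIL: 2 of 3
```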

Current Redis Cluster layout

| Master | Replica | Slots | Keys |
| --- | --- | --- | --- |
| 10.240.0.11 (m1) | 10.240.0.15 (s1) | 0-5460 (5461 slots) | Key1 |
| 10.240.0.12 (m2) | 10.240.0.16 (s2) | 5461-10922 (5462 slots) | Key2 |
| 10.240.0.13 (m3) | 10.240.0.14 (s3) | 10923-16383 (5461 slots) | Key3 |

Set Key1 and confirm that its key slot belongs to m1

docker exec -it redis redis-cli cluster keyslot Key1

# Output
(integer) 5291

docker exec -it redis redis-cli -c SET Key1 "1"
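
For readers who want to see where 5291 comes from: Redis Cluster maps a key to a slot with CRC16 (XMODEM variant) modulo 16384, hashing only the hash tag between { and } when one is present. The following is a minimal sketch of that calculation. Crc16, KeySlot, and OwnerOf are illustrative names, not StackExchange.Redis APIs, and OwnerOf hard-codes the slot ranges of this particular cluster.

```csharp
using System;
using System.Text;

// CRC16, XMODEM variant: polynomial 0x1021, initial value 0, no bit reflection.
static ushort Crc16(byte[] data)
{
    ushort crc = 0;
    foreach (byte b in data)
    {
        crc ^= (ushort)(b << 8);
        for (int i = 0; i < 8; i++)
        {
            crc = (crc & 0x8000) != 0
                ? (ushort)((crc << 1) ^ 0x1021)
                : (ushort)(crc << 1);
        }
    }
    return crc;
}

// Slot = CRC16(key) mod 16384. If the key has a non-empty {hash tag}, only the tag is hashed.
static int KeySlot(string key)
{
    int open = key.IndexOf('{');
    if (open >= 0)
    {
        int close = key.IndexOf('}', open + 1);
        if (close > open + 1)
            key = key.Substring(open + 1, close - open - 1);
    }
    return Crc16(Encoding.UTF8.GetBytes(key)) % 16384;
}

// Slot ranges copied from the cluster layout in this post (assumption: they have not been resharded).
static string OwnerOf(int slot) =>
    slot <= 5460 ? "m1" : slot <= 10922 ? "m2" : "m3";

int slotOfKey1 = KeySlot("Key1");
Console.WriteLine($"Key1 -> slot {slotOfKey1} on {OwnerOf(slotOfKey1)}"); // Key1 -> slot 5291 on m1
```

In practice CLUSTER KEYSLOT remains the authoritative answer; the sketch is only meant to make the mapping less of a black box.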

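Test scenario

A console app updates and reads the value of Key1 once per second.
StringGet uses CommandFlags.PreferReplica (reads go to a replica first and fall back to the master only when no replica is available)
StringSet uses CommandFlags.DemandMaster (writes are restricted to the master)

You can run MONITOR on m1 and s1 at the same time to confirm that reads and writes are split.
Note: s1 also shows the SET operations on Key1 because the master replicates data to the replica, not because the client writes to the replica.

Stop the m1 service and observe the failover.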

Test results

Redis Log

m1:
09:24:25.725 # User requested shutdown...

s1:
09:24:25.727 # Connection with master lost.
09:24:25.727 * Caching the disconnected master state.
09:24:26.419 * Connecting to MASTER 10.240.0.11:6379
09:24:26.419 * MASTER <-> REPLICA sync started
09:24:26.419 # Error condition on socket for SYNC: Operation now in progress
09:24:27.059 * FAIL message received from 5cf40f5453157184bcaaf72705bd8e366a01a45b about a8205cb1beca30ed91412f0524b8fe7891a5e151
09:24:27.059 # Cluster state changed: fail
09:24:27.125 # Start of election delayed for 611 milliseconds (rank #0, offset 6743).
09:24:27.428 * Connecting to MASTER 10.240.0.11:6379
09:24:27.428 * MASTER <-> REPLICA sync started
09:24:27.428 # Error condition on socket for SYNC: Operation now in progress
09:24:27.829 # Starting a failover election for epoch 30.
09:24:27.866 # Failover election won: I'm the new master.
09:24:27.866 # configEpoch set to 30 after successful failover
09:24:27.866 # Setting secondary replication ID to 8573bc7a38fdd760b944b1300a270097f179c5f2, valid up to offset: 6744. New replication ID is be0c19c20de3875ee8d8eb2c
09:24:27.866 * Discarding previously cached master state.
09:24:27.866 # Cluster state changed: ok
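
To read the timeline out of the Redis log above, the small sketch below computes how long each step took after m1 was shut down. The timestamps are copied from the log; At and MsBetween are made-up helper names.

```csharp
using System;
using System.Globalization;

// Parse an "HH:mm:ss.fff" log timestamp into a time of day.
static TimeSpan At(string stamp) =>
    TimeSpan.ParseExact(stamp, @"hh\:mm\:ss\.fff", CultureInfo.InvariantCulture);

// Milliseconds between two timestamps on the same day.
static long MsBetween(string start, string end) =>
    (long)Math.Round((At(end) - At(start)).TotalMilliseconds);

string shutdown = "09:24:25.725";      // m1: User requested shutdown
string failReceived = "09:24:27.059";  // s1: FAIL message received
string electionStart = "09:24:27.829"; // s1: Starting a failover election
string promoted = "09:24:27.866";      // s1: Failover election won, cluster state ok

Console.WriteLine($"FAIL received after {MsBetween(shutdown, failReceived)} ms");     // FAIL received after 1334 ms
Console.WriteLine($"Election started after {MsBetween(shutdown, electionStart)} ms"); // Election started after 2104 ms
Console.WriteLine($"New master after {MsBetween(shutdown, promoted)} ms");            // New master after 2141 ms
```

So with cluster-node-timeout 1000 the shard was without a master for a little over 2 seconds in this run.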

Application log

2023-08-19 09:24:24.841 Key1 = 26
2023-08-19 09:24:24.841 Key1 will be updated to 27
Updated
After update, Key1 = 27

2023-08-19 09:24:25.845 Key1 = 27
2023-08-19 09:24:25.845 Key1 will be updated to 28
Update failed, Error:The message timed out in the backlog attempting to send because no connection became available, command=SET, timeout: 1000, inst: 0, qu: 0, qs: 0, aw: False, bw: Inactive, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 27, cur-in: 0, sync-ops: 80, async-ops: 2, serverEndpoint: 10.240.0.12:6379, conn-sec: 27.27, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: instance-1(SE.Redis-v2.6.122.38350), PerfCounterHelperkeyHashSlot: 5291, IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=0,Free=32767,Min=2,Max=32767), POOL: (Threads=7,QueuedItems=0,CompletedItems=281,Timers=3), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
08/19/2023 09:24:26 Redis connection failed. Retrying (00:00:01)...
Update failed, Error:CLUSTERDOWN The cluster is down
08/19/2023 09:24:27 Redis connection failed. Retrying (00:00:01)...
Update failed, Error:InternalFailure on [0]:SET Key1 (BooleanProcessor)
08/19/2023 09:24:28 Redis connection failed. Retrying (00:00:01)...
Updated
After update, Key1 = 28

2023-08-19 09:24:30.865 Key1 = 28
2023-08-19 09:24:30.865 Key1 will be updated to 29
Updated
After update, Key1 = 29

Code

using Newtonsoft.Json;
using Polly;
using Polly.Retry;
using StackExchange.Redis;

namespace RedisClusterTest
{
    public class RedisConnectionManager
    {
        private readonly object _lock = new object();
        private IConnectionMultiplexer _connectionMultiplexer;
        private readonly ConfigurationOptions _configurationOptions;
        private readonly RetryPolicy _retryPolicy;

        public RedisConnectionManager(ConfigurationOptions configurationOptions, RetryPolicy retryPolicy)
        {
            _configurationOptions = configurationOptions;
            _retryPolicy = retryPolicy;
            SetConnection();
        }

        public IConnectionMultiplexer SetConnection()
        {
            return _retryPolicy.Execute(() =>
            {
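                if (_connectionMultiplexer == null || !_connectionMultiplexer.IsConnected)
                {
                    lock (_lock)
                    {
                        // Re-check inside the lock so concurrent callers do not reconnect twice.
                        if (_connectionMultiplexer == null || !_connectionMultiplexer.IsConnected)
                        {
                            _connectionMultiplexer?.Close();
                            _connectionMultiplexer = ConnectionMultiplexer.Connect(_configurationOptions);
                        }
                    }
                }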

                return _connectionMultiplexer;
            });
        }

        public T? Get<T>(string key)
        {
            return _retryPolicy.Execute(() =>
            {
                try
                {
                    var value = _connectionMultiplexer.GetDatabase().StringGet(key, flags: CommandFlags.PreferReplica);

                    if (value.IsNullOrEmpty)
                        return default;

                    return JsonConvert.DeserializeObject<T>(value);
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"Get failed, Error:{ex.Message}");
                    // Rethrow with "throw;" so the original stack trace is preserved.
                    throw;
                }
            });
        }

        public void Update(string key, string data)
        {
            _retryPolicy.Execute(() =>
            {
                try
                {
                    _connectionMultiplexer.GetDatabase().StringSet(key, data, flags: CommandFlags.DemandMaster);
                    Console.WriteLine("Updated");
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"Update failed, Error:{ex.Message}");
                    // Rethrow with "throw;" so the original stack trace is preserved.
                    throw;
                }
            });
        }
    }

    internal class Program
    {
        static void Main(string[] args)
        {

            // Retry up to 3 times, waiting 1 second between attempts.
            var retryPolicy = Policy.Handle<RedisConnectionException>()
                        .Or<RedisTimeoutException>()
                        .Or<RedisServerException>()
                        // In this overload the second callback argument is the wait TimeSpan, not the retry count.
                        .WaitAndRetry(3, _ => TimeSpan.FromSeconds(1), (exception, delay) =>
                        {
                            Console.WriteLine($"{DateTime.Now} Redis connection failed. Retrying ({delay})...");
                        });


            var configuration = new ConfigurationOptions()
            {
                EndPoints = {
                        { "10.240.0.11:6379" },
                        { "10.240.0.12:6379" },
                        { "10.240.0.13:6379" },
                        { "10.240.0.14:6379" },
                        { "10.240.0.15:6379" },
                        { "10.240.0.16:6379" },
                    },
                AbortOnConnectFail = true,
                ConnectTimeout = 1000,
                SyncTimeout = 1000,
                ConnectRetry = 5
            };

            var redisConnectionManager = new RedisConnectionManager(configuration, retryPolicy);

            while (true)
            {
                var value = redisConnectionManager.Get<string>("Key1");
                Console.WriteLine($"{DateTime.Now:yyyy-MM-dd HH:mm:ss.fff} Key1 = {value}");

                var newValue = Convert.ToInt32(value) + 1;

                Console.WriteLine($"{DateTime.Now:yyyy-MM-dd HH:mm:ss.fff} Key1 will be updated to {newValue}");
                redisConnectionManager.Update("Key1", newValue.ToString());
                Console.WriteLine($"After update, Key1 = {redisConnectionManager.Get<string>("Key1")}\n");
                Thread.Sleep(1000);
            }
        }
    }
}

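Notes

Suppose the Redis Cluster is on an internal network and cluster-announce-ip is set to the internal IPs 10.240.0.11 ~ 10.240.0.16,
while the application is on an external network and the firewall of every Redis Cluster node has the port open and allows that external IP.
Connecting through StackExchange.Redis still fails in this case.

The reason is that when IConnectionMultiplexer first connects, Redis returns the cluster-announce-ip of the other cluster nodes for the application to connect to.
Here it returns 10.240.0.11 ~ 10.240.0.16, which are internal IPs that an application on an external network cannot reach. An application on the same internal network connects normally.

PS: If the Redis Cluster nodes are on the same internal network but each node's cluster-announce-ip is set to its own external IP,
an application on an external network can connect normally.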

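Conclusion

When an application connects to a Redis Cluster, make sure the application can reach the cluster-announce-ip of every node.
Normally the application and the Redis Cluster sit on the same internal network and can reach each other, so this is rarely a problem.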
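
One quick way to check this from the application host is a plain TCP connect to each announced address. The sketch below uses only System.Net.Sockets; CanConnect is a made-up helper, and the addresses and port 6379 are the ones used in this post. A successful connect only shows that the port is reachable, not that the node is healthy.

```csharp
using System;
using System.Net.Sockets;

// Returns true if a TCP connection to host:port can be opened within the timeout.
static bool CanConnect(string host, int port, TimeSpan timeout)
{
    try
    {
        using var client = new TcpClient();
        // Wait returns false on timeout and throws if the connection attempt fails.
        return client.ConnectAsync(host, port).Wait(timeout) && client.Connected;
    }
    catch (Exception)
    {
        return false;
    }
}

// cluster-announce-ip values of the six nodes in this post.
string[] announcedIps =
{
    "10.240.0.11", "10.240.0.12", "10.240.0.13",
    "10.240.0.14", "10.240.0.15", "10.240.0.16",
};

foreach (var ip in announcedIps)
{
    bool reachable = CanConnect(ip, 6379, TimeSpan.FromSeconds(1));
    Console.WriteLine($"{ip}:6379 reachable = {reachable}");
}
```

If any line prints False when run from the application host, StackExchange.Redis will most likely not be able to use that node either.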

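Appendix: StackExchange.Redis.CommandFlags

| CommandFlags | Description |
| --- | --- |
| None (0) | Default (reads and writes both go to the master) |
| HighPriority (1) | Obsolete |
| FireAndForget (2) | Fire and forget (do not wait for the Redis reply) |
| PreferMaster (0) | Prefer the master, fall back to a replica (used when the master is unavailable) |
| DemandMaster (4) | Master only |
| PreferReplica (8) | Prefer a replica, fall back to the master (used when no replica is available) |
| DemandReplica (12) | Replica only |
| NoRedirect (64) | On a MOVED or ASK error, return the error instead of redirecting automatically |
| NoScriptCache (512) | Run Lua scripts without caching them on the Redis server |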
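
The numeric values listed above explain two details: PreferMaster shares the value 0 with None, and DemandReplica (12) is DemandMaster (4) combined with PreferReplica (8). The sketch below redeclares those values as local constants so it runs without the NuGet package; Preference is a made-up helper, and reading the two bits as one field is my interpretation of the values rather than a documented API.

```csharp
using System;

// Local copies of the numeric values listed above (not the real StackExchange.Redis enum).
const int PreferMaster = 0;
const int FireAndForget = 2;
const int DemandMaster = 4;
const int PreferReplica = 8;
const int DemandReplica = 12;

// The master/replica preference lives in two bits (4 and 8).
const int PreferenceMask = DemandMaster | PreferReplica;

// Extract the server preference from a combined flags value.
string Preference(int flags) => (flags & PreferenceMask) switch
{
    PreferMaster => "PreferMaster",
    DemandMaster => "DemandMaster",
    PreferReplica => "PreferReplica",
    _ => "DemandReplica",
};

Console.WriteLine(DemandReplica == (DemandMaster | PreferReplica)); // True
Console.WriteLine(Preference(0));                                   // PreferMaster
Console.WriteLine(Preference(FireAndForget | PreferReplica));       // PreferReplica
```

One practical consequence is that combining DemandMaster and PreferReplica with | yields DemandReplica, which is rarely what a caller intends.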

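To split reads and writes:

Use PreferReplica for reads

Reads go to a replica first. After a failover the replica is promoted to master, and because the flag is PreferReplica the read still succeeds by falling back to the master.
(With DemandReplica the key cannot be read for as long as that shard has no replica, so use it with care.)

Use DemandMaster for writes

Writes are restricted to the master, because a replica is read-only by default unless configured otherwise.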

References

为个人日志 - Redis cluster集群failover测试


Please credit the source when reposting. If anything is wrong or unclear, feel free to leave a comment or email leozheng0621@gmail.com