TiDB
TiDB is a distributed NewSQL database system developed by PingCAP. It stays compatible with the MySQL protocol while providing horizontal scalability, strong consistency, and high availability. It was designed to solve two problems at once: the scalability limits of traditional relational databases and the weak transactional guarantees of NoSQL databases.
Architecture Overview
TiDB is a distributed system made up of several components. Each component can be scaled independently and has a specific responsibility.
TiDB Server
TiDB Server acts as a stateless computing layer. It receives SQL requests from clients, parses and optimizes them, and generates distributed execution plans. Because it is compatible with the MySQL protocol, existing MySQL client libraries and tools can be used unchanged. Each TiDB Server node runs independently, so if any node goes down the remaining nodes keep serving requests, which provides high availability.
SQL processing proceeds as follows. The parser first converts the SQL statement into an abstract syntax tree (AST), and a validator then checks that the statement is valid. Next, the optimizer selects the best execution plan based on statistics, and the executor carries out the actual work. For distributed execution, tasks are placed on the appropriate TiKV nodes with data locality in mind, which minimizes network traffic.
Placement Driver (PD)
Placement Driver (PD) is the central management component responsible for cluster-wide metadata management and scheduling. It uses etcd internally to persist metadata and achieves high availability through the Raft consensus algorithm[1].
One of PD's main responsibilities is generating and managing global timestamps. Because TiDB uses timestamp-based MVCC for distributed transactions, it needs timestamps that are globally unique and monotonically increasing. PD acts as the TSO (Timestamp Oracle) and provides a hybrid clock that combines physical time with a logical counter.
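As a rough illustration of such a hybrid timestamp, the sketch below packs a millisecond physical time and a logical counter into a single integer. The 18-bit logical width follows TiDB's documented TSO layout; the class and function names are hypothetical, and a real PD allocator additionally persists time windows and handles leader changes, which this toy ignores.

```python
import threading
import time

LOGICAL_BITS = 18  # low bits reserved for the logical counter


class ToyTSO:
    """Hypothetical single-process timestamp oracle (illustration only)."""

    def __init__(self, clock=lambda: int(time.time() * 1000)):
        self._clock = clock          # returns physical time in milliseconds
        self._lock = threading.Lock()
        self._physical = 0
        self._logical = 0

    def next_ts(self):
        with self._lock:
            now = self._clock()
            if now > self._physical:
                # Physical time advanced: reset the logical counter.
                self._physical, self._logical = now, 0
            else:
                # Same (or earlier) millisecond: bump the logical counter.
                self._logical += 1
                if self._logical >= (1 << LOGICAL_BITS):
                    # Counter exhausted: borrow the next millisecond.
                    self._physical += 1
                    self._logical = 0
            return (self._physical << LOGICAL_BITS) | self._logical


def split_ts(ts):
    """Return (physical_ms, logical) for a packed timestamp."""
    return ts >> LOGICAL_BITS, ts & ((1 << LOGICAL_BITS) - 1)
```

Because the logical counter only grows within a millisecond and the physical part never moves backwards, the packed values stay strictly increasing even if the wall clock stalls.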
PD is also responsible for Region scheduling. Data stored in TiKV is divided into units called Regions, and each Region has multiple replicas. PD monitors the size and access frequency of each Region and the load on each node, and when necessary it instructs Regions to split or merge and replicas to be relocated.
TiKV
TiKV is TiDB's distributed storage engine. It is designed as a key-value store and exposes an ordered-map interface. Internally it uses RocksDB as the local storage engine and builds a Raft-based replication layer on top of it[2].
Data is divided into contiguous key ranges called Regions. By default each Region has a size limit of 96 MB and is split automatically once it exceeds that value (the exact default depends on the TiKV version). Each Region forms an independent Raft group and normally has three replicas. Thanks to the Raft protocol, reads and writes can continue as long as a majority of the replicas are healthy.
TiKV implements Multi-Version Concurrency Control (MVCC) and keeps multiple versions of each key. As a result, reads do not block writes, and TiKV provides consistent reads at the snapshot isolation level.
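The idea of a snapshot read over multiple versions can be sketched as follows. This is a minimal in-memory model with hypothetical names, not TiKV's actual key layout: each key keeps a list of (commit_ts, value) pairs, and a read at read_ts returns the newest version committed at or before that timestamp.

```python
class ToyMVCCStore:
    """Hypothetical in-memory multi-version map (not TiKV's real layout)."""

    def __init__(self):
        # key -> list of (commit_ts, value), kept sorted by commit_ts
        self._versions = {}

    def put(self, key, value, commit_ts):
        versions = self._versions.setdefault(key, [])
        versions.append((commit_ts, value))
        versions.sort(key=lambda version: version[0])

    def get(self, key, read_ts):
        """Return the newest value with commit_ts <= read_ts, or None."""
        result = None
        for commit_ts, value in self._versions.get(key, []):
            if commit_ts > read_ts:
                break  # later versions are invisible to this snapshot
            result = value
        return result
```

A reader holding an old read_ts keeps seeing the old value while writers append newer versions, which is why reads and writes do not block each other in this model.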
TiFlash
TiFlash is the columnar storage engine that provides TiDB's HTAP (Hybrid Transactional and Analytical Processing) capability. It replicates data asynchronously from TiKV and stores it in a format optimized for analytical queries.
Columnar storage performs well for the aggregation and filtering operations that analytical queries use heavily. TiFlash builds on ClickHouse and comes with efficient compression algorithms and a vectorized execution engine.
Distributed Transactions
TiDB's distributed transactions are based on the design of Google Percolator[3]. Percolator is an incremental processing system built on top of Bigtable that supports ACID transactions while scaling across large distributed environments.
Two-Phase Commit
TiDB implements transactions by combining optimistic concurrency control with two-phase commit (2PC). A transaction proceeds through the following phases:
When a transaction begins, TiDB obtains a start_ts from PD. This timestamp determines the snapshot of data the transaction reads: every read during the transaction sees the data as of start_ts.
At commit time, the Prewrite phase of 2PC runs first. TiDB acquires a lock on every key to be written and writes the provisional values. The primary key is processed first, followed by the secondary keys. The primary key is a special key that represents the state of the transaction and decides whether the transaction succeeds or fails. Once every Prewrite has succeeded, TiDB obtains a new timestamp, commit_ts, from PD.
In the Commit phase, TiDB first commits the primary key, which records the commit at commit_ts and releases its lock. Once the primary key is committed, the whole transaction is considered committed. The secondary keys are committed asynchronously, which shortens the response time seen by the client.
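The flow described above can be sketched with a toy, single-process store. Everything here (ToyPercolatorStore, run_transaction, the tso callable) is a hypothetical illustration in the spirit of Percolator, not TiDB's or TiKV's actual code; it omits lock timeouts, crash recovery through the primary key, and real asynchrony.

```python
class ConflictError(Exception):
    """Raised when Prewrite or Commit detects a conflict."""


class ToyPercolatorStore:
    def __init__(self):
        self.data = {}    # (key, start_ts) -> provisional or committed value
        self.locks = {}   # key -> (primary_key, start_ts)
        self.writes = {}  # key -> list of (commit_ts, start_ts)

    def prewrite(self, key, value, primary, start_ts):
        # Write-write conflict: someone committed after our snapshot.
        if any(cts > start_ts for cts, _ in self.writes.get(key, [])):
            raise ConflictError(f"write-write conflict on {key!r}")
        # Lock conflict: another transaction is mid-commit on this key.
        if key in self.locks:
            raise ConflictError(f"{key!r} is locked")
        self.locks[key] = (primary, start_ts)
        self.data[(key, start_ts)] = value

    def commit(self, key, start_ts, commit_ts):
        lock = self.locks.get(key)
        if lock is None or lock[1] != start_ts:
            raise ConflictError(f"lock on {key!r} is missing")
        self.writes.setdefault(key, []).append((commit_ts, start_ts))
        del self.locks[key]

    def rollback(self, key, start_ts):
        if self.locks.get(key, (None, None))[1] == start_ts:
            del self.locks[key]
        self.data.pop((key, start_ts), None)


def run_transaction(store, tso, mutations):
    """mutations: dict of key -> value; tso: callable returning increasing ints."""
    start_ts = tso()
    keys = list(mutations)
    primary = keys[0]
    prewritten = []
    try:
        for key in keys:  # primary first, then secondaries
            store.prewrite(key, mutations[key], primary, start_ts)
            prewritten.append(key)
    except ConflictError:
        for key in prewritten:
            store.rollback(key, start_ts)
        raise
    commit_ts = tso()  # fetched only after every Prewrite succeeded
    store.commit(primary, start_ts, commit_ts)  # the commit point
    for key in keys[1:]:  # a real system can do this asynchronously
        store.commit(key, start_ts, commit_ts)
    return start_ts, commit_ts
```

Committing the primary key is the single atomic step that decides the outcome; the secondaries can always be finished later by looking at the primary's state.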
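Conflict Detection and Resolution
With optimistic concurrency control, no conflict detection takes place while the transaction is running; conflicts are checked for the first time at commit. In the Prewrite phase, the following checks are performed on each key to be written: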
- Write-write conflict: whether another transaction has committed a write to the same key after this transaction's start_ts
- Lock conflict: whether another transaction holds a lock on the same key
If a conflict is detected, the transaction is rolled back and an error is returned to the client. Transient conflicts can be resolved by implementing retry logic in the application layer.
Consistency Guarantees in a Distributed Environment
TiDB provides full ACID guarantees even in a distributed environment. Atomicity is guaranteed by the 2PC protocol, and consistency is maintained through constraint checks and transaction isolation. Isolation is achieved with MVCC and snapshot isolation, and durability is guaranteed by Raft replication and WAL (Write-Ahead Logging).
Particularly important is the guarantee of causal consistency in a distributed environment. Using the global timestamps provided by PD, TiDB guarantees a chronological order even across different nodes. As a result, a transaction that starts after observing the result of another transaction always reads data that reflects the effects of that earlier transaction.
Raft Consensus Algorithm
TiKV uses the Raft consensus algorithm for data replication and consistency[1]. Raft is an algorithm for reaching agreement on state among multiple nodes in a distributed system, and it is designed to be easier to understand than Paxos[4].
Leader Election
Each Raft group consists of one leader and several followers. The leader handles all write requests and replicates log entries to the followers. If the leader fails, a new leader is elected from among the followers.
The election process starts when a follower has not received a heartbeat from the leader for a certain period of time. The follower increments its own term, transitions to the Candidate state, and requests votes from the other nodes. A candidate that receives votes from a majority becomes the new leader.
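The voting rule can be sketched as below. This is a simplified, synchronous model with hypothetical class and function names; it leaves out timers, RPCs, and persistence. The "candidate's log must be at least as up to date" check is not mentioned in the text above but comes from the Raft paper[1] and is included because vote granting is unsafe without it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RaftNode:
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None
    log: list = field(default_factory=list)  # list of (term, command)

    def last_log(self):
        """Return (last_term, last_index); index 0 means an empty log."""
        return (self.log[-1][0], len(self.log)) if self.log else (0, 0)

    def start_election(self):
        """Election timeout fired: bump the term and vote for ourselves."""
        self.current_term += 1
        self.voted_for = self.node_id
        return self.current_term

    def handle_request_vote(self, term, candidate_id, last_term, last_index):
        if term < self.current_term:
            return False  # stale candidate
        if term > self.current_term:
            # Newer term seen: adopt it and forget any earlier vote.
            self.current_term, self.voted_for = term, None
        up_to_date = (last_term, last_index) >= self.last_log()
        if self.voted_for in (None, candidate_id) and up_to_date:
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate, peers):
    """Return True if the candidate wins a majority of the whole group."""
    term = candidate.start_election()
    last_term, last_index = candidate.last_log()
    votes = 1 + sum(
        peer.handle_request_vote(term, candidate.node_id, last_term, last_index)
        for peer in peers
    )
    return votes > (len(peers) + 1) // 2
```

Since each node grants at most one vote per term and a winner needs a majority, two candidates cannot both win the same term.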
Log Replication
The leader records each write request from a client as a log entry and replicates it to the followers. Each log entry contains a command, a term number, and an index.
The leader commits an entry once a majority of nodes (including itself) have persisted it. A committed entry is applied to the state machine, and a success response is returned to the client.
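The majority rule for advancing the commit point can be expressed in a few lines. The function below is an illustrative helper with an assumed input shape (a map from node ID to the highest log index that node has persisted); full Raft additionally requires that the entry at that index belongs to the leader's current term before it is committed.

```python
def quorum_commit_index(match_index):
    """Highest log index that is persisted on a majority of nodes.

    match_index maps node id -> highest index known to be persisted
    on that node, with the leader's own entry included.
    """
    ordered = sorted(match_index.values(), reverse=True)
    majority = len(ordered) // 2 + 1
    # The majority-th largest value is stored on at least `majority` nodes.
    return ordered[majority - 1]
```

For a three-node group where the leader is at index 7 and the followers are at 5 and 3, the result is 5: entries up to index 5 exist on two of the three nodes.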
Safety Guarantees
Raft guarantees the following safety properties:
- Election safety: at most one leader exists in any given term
- Log matching: if two logs contain an entry with the same index and term, all preceding entries are identical as well
- Leader completeness: a committed entry is present in the logs of all future leaders
- State machine safety: if any node has applied a log entry at a given index to its state machine, no other node applies a different entry at that index
These properties keep data consistent even when network partitions or node failures occur.
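SQL Execution Engine
TiDB's SQL execution engine is designed for efficient query processing in a distributed environment. A query passes through several stages between parsing and execution.
Query Parsing and Optimization
A SQL query is first converted into an abstract syntax tree (AST) by the lexer and parser. TiDB uses a yacc-based parser and achieves a high degree of compatibility with MySQL syntax.
After parsing, the semantic analyzer checks that the referenced tables and columns exist, performs type checking, and verifies privileges. The execution plan is then generated in two stages: logical optimization and physical optimization.
Logical optimization includes predicate pushdown, column pruning, and join reordering. These optimizations are applied according to the equivalence transformation rules of relational algebra. For example, moving WHERE conditions as close to the data source as possible reduces the amount of data that has to be processed.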
-- Original query
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'Japan' AND o.amount > 1000;
-- After predicate pushdown
SELECT * FROM
(SELECT * FROM orders WHERE amount > 1000) o
JOIN
(SELECT * FROM customers WHERE country = 'Japan') c
ON o.customer_id = c.id;
Physical optimization decides the concrete execution method based on the available indexes, data statistics, and cost estimates. This includes choosing the join algorithm (Hash Join, Merge Join, Index Join) and the access path (table scan or index scan).
Distributed Execution
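TiDB splits a query into multiple tasks and executes them in parallel; for analytical workloads it can also use an MPP (Massively Parallel Processing) architecture. The execution plan places tasks on the storage nodes with data locality in mind.
Each TiKV node executes the tasks it receives in its coprocessor. The coprocessor performs operations such as filtering, projection, and partial aggregation at the storage layer, which reduces the amount of data transferred over the network.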
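To make the benefit of pushing work to the storage layer concrete, the sketch below computes an average in two steps: each storage node filters its own rows and returns only a partial (sum, count), and the SQL layer merges the partials. The function names and the row shape are assumptions for illustration, not the coprocessor's real request format.

```python
def coprocessor_partial_avg(rows, min_amount):
    """Runs next to the data: filter, then return a partial (sum, count)."""
    total, count = 0, 0
    for row in rows:
        if row["amount"] > min_amount:
            total += row["amount"]
            count += 1
    return total, count


def merge_partials(partials):
    """Runs on the SQL layer: combine partial results into the final AVG."""
    total = sum(partial[0] for partial in partials)
    count = sum(partial[1] for partial in partials)
    return total / count if count else None
```

Only one small tuple per Region crosses the network instead of every matching row, which is the saving the paragraph above describes.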
Vectorized Execution
TiDB supports vectorized execution that can take advantage of SIMD (Single Instruction, Multiple Data) instructions. Instead of the traditional row-at-a-time execution model, it processes multiple rows per batch, which makes better use of CPU caches and instruction-level parallelism.
In vectorized execution, data is laid out in memory in columnar form and the same operation is applied to many data elements at once. This yields large performance gains, especially for aggregation queries and filtering.
Index Structure
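TiDB supports MySQL-style indexes, but it does not keep them as local B+ trees the way MySQL does. It has its own implementation adapted to a distributed environment, in which index entries are stored as ordered key-value pairs in TiKV.
Primary Keys and Clustered Indexes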
Internally, every TiDB table needs a key that identifies each row. If no primary key is declared explicitly, TiDB generates a hidden column named _tidb_rowid.
By default, recent TiDB versions use clustered indexes. This means that table data is physically stored in primary key order. Clustered indexes make range scans by primary key efficient and improve data locality.
Secondary Indexes
A secondary index is implemented as a mapping from the index key to the primary key. Index entries are stored in TiKV in the following format:
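Unique index
Key: {table_id}_{index_id}_{index_value}
Value: primary_key
Non-unique index
Key: {table_id}_{index_id}_{index_value}_{primary_key}
Value: null
With this design, an index scan has to be followed by a lookup of the row by primary key. In return, index entries and rows are all ordinary key-value pairs written in the same transaction, which makes it easier to keep index updates consistent in a distributed environment.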
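The two-step lookup implied by this layout can be sketched as follows. The string keys here are a readable stand-in chosen for illustration; TiDB actually uses a binary, order-preserving encoding, and all function names are hypothetical.

```python
def record_key(table_id, row_id):
    """Key under which the row itself is stored."""
    return f"t{table_id}_r{row_id}"


def unique_index_key(table_id, index_id, index_value):
    """Unique index: the key has no row ID; the value points at the row."""
    return f"t{table_id}_i{index_id}_{index_value}"


def non_unique_index_key(table_id, index_id, index_value, row_id):
    """Non-unique index: the row ID is part of the key; the value is empty."""
    return f"t{table_id}_i{index_id}_{index_value}_{row_id}"


def lookup_by_unique_index(kv, table_id, index_id, index_value):
    """Step 1: index key -> row ID. Step 2: row ID -> row."""
    row_id = kv.get(unique_index_key(table_id, index_id, index_value))
    if row_id is None:
        return None
    return kv.get(record_key(table_id, row_id))
```

Appending the row ID to non-unique index keys is what lets several rows share the same indexed value without their entries overwriting one another.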
Global Indexes
Newer TiDB versions support global indexes on partitioned tables (the feature reached general availability in the v8.x series). With traditional local indexes, each partition has its own independent index, so searches that span partitions are inefficient. A global index manages the data of all partitions in a single index structure and enables cross-partition lookups.
Storage Engine Details
RocksDB, the foundation of TiKV, is a key-value store developed by Facebook and based on the LSM-Tree (Log-Structured Merge-Tree). The LSM-Tree is a data structure optimized for write performance: it relies on sequential writes to reduce random I/O.
LSM-Tree Structure
An LSM-Tree consists of an in-memory MemTable and multiple levels of on-disk SSTables (Sorted String Tables). A new write is first added to the MemTable. When the MemTable reaches a certain size it is converted into an immutable MemTable and persisted to disk in the background as an SST file.
The SST files at each level are periodically reorganized by a merge process called compaction. Compaction removes superseded versions of keys, cleans up deletion markers, and moves data to lower levels. This keeps read performance and storage efficiency in balance.
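The write and read paths described above can be modeled in miniature. The class below is a deliberately simplified, hypothetical sketch: it keeps "SSTables" as sorted in-memory lists, has a single level, and has no WAL, so it illustrates the control flow rather than RocksDB's implementation.

```python
TOMBSTONE = object()  # deletion marker written instead of removing data


class ToyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.sstables = []  # newest first; each is a sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)

    def flush(self):
        """Freeze the memtable and write it out as one sorted run."""
        if self.memtable:
            self.sstables.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for table in self.sstables:
            for k, v in table:
                if k == key:
                    return None if v is TOMBSTONE else v
        return None

    def compact(self):
        """Merge all runs, keep the newest version, and drop tombstones."""
        merged = {}
        for table in reversed(self.sstables):  # oldest first, newer overwrite
            merged.update(table)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.sstables = [sorted(live.items())] if live else []
```

Reads get slower as runs accumulate because more tables may need to be checked, which is the cost compaction pays down.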
Write Optimizations
RocksDB implements several optimization techniques to maximize write throughput:
- Group commit: batches multiple write requests into a single WAL write, reducing the number of fsync calls
- Parallel flush: flushes multiple column families in parallel, speeding up the conversion from MemTable to SST
- Pipelined write: overlaps WAL writes with MemTable updates
Read Optimizations
The following features are implemented to improve read performance:
- Bloom filter: attaches a Bloom filter to each SST file to speed up checks for whether a key is present
- Block cache: caches frequently accessed data blocks in memory
- Prefix seek: optimizes range scans over keys that share a common prefix
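Of the read optimizations listed above, the Bloom filter is the easiest to show in isolation. The sketch below is a generic textbook Bloom filter with hypothetical names and arbitrarily chosen sizes; it is not RocksDB's filter format. It can answer "definitely not present" or "possibly present", which lets a reader skip SST files that cannot contain the key.

```python
import hashlib


class ToyBloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

False positives only cost an unnecessary file read, while false negatives never occur, so the filter is safe to consult before every point lookup.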
Statistics and Execution Plans
TiDB's optimizer selects the best execution plan based on accurate statistics. The statistics include table row counts, column value distributions, and index selectivity.
Histograms and CM Sketches
The value distribution of a column is represented by a combination of an equi-depth histogram and a Count-Min Sketch (CM sketch). The histogram divides the value range into buckets of equal frequency and records the boundary values of each bucket. This allows the selectivity of range queries to be estimated accurately.
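A minimal sketch of how such a histogram supports range estimation is shown below. The function names are hypothetical and the estimator assumes values are spread uniformly inside each bucket; a production histogram also avoids splitting a repeated value across buckets, which this toy does not handle.

```python
def build_equi_depth_histogram(values, num_buckets):
    """Return a list of (lower, upper, count) with roughly equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    buckets = []
    for i in range(num_buckets):
        lo, hi = i * n // num_buckets, (i + 1) * n // num_buckets
        if lo < hi:
            buckets.append((ordered[lo], ordered[hi - 1], hi - lo))
    return buckets


def estimate_range_rows(buckets, low, high):
    """Estimate rows with low <= value <= high, assuming uniform buckets."""
    estimate = 0.0
    for lower, upper, count in buckets:
        if upper < low or lower > high:
            continue  # bucket lies entirely outside the range
        if upper == lower:
            estimate += count  # single-value bucket is fully covered
            continue
        overlap = min(upper, high) - max(lower, low)
        estimate += count * overlap / (upper - lower)
    return estimate
```

Because every bucket holds about the same number of rows, dense value ranges get narrow buckets and therefore finer-grained estimates than an equal-width histogram would give.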
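A CM sketch is a probabilistic data structure for estimating how often individual values occur. It is used in particular to estimate the selectivity of equality conditions. It consists of an array of hash functions and a two-dimensional array of counters, and it stores frequency information in a memory-efficient way.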
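The structure just described, a set of hash functions over a two-dimensional counter array, can be sketched as follows. This is a generic Count-Min Sketch with assumed dimensions and a salted SHA-256 standing in for independent hash functions; it is not TiDB's statistics code.

```python
import hashlib


class CountMinSketch:
    def __init__(self, depth=4, width=256):
        self.depth, self.width = depth, width
        # One row of counters per hash function.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, value):
        digest = hashlib.sha256(f"{row}:{value}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, value, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, value)] += count

    def estimate(self, value):
        """Never underestimates; hash collisions can only inflate the count."""
        return min(
            self.table[row][self._index(row, value)]
            for row in range(self.depth)
        )
```

Taking the minimum across rows limits the damage from collisions: an estimate is inflated only if the value collides with other values in every row at once.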
Dynamic Statistics Updates
TiDB can update statistics automatically. When the volume of data changes exceeds a threshold, statistics are collected again in the background. TiDB has also implemented a mechanism that uses feedback from query execution to refine statistics incrementally.
HTAP Capabilities
TiDB's HTAP (Hybrid Transactional and Analytical Processing) architecture makes it possible to process OLTP and OLAP workloads efficiently in a single system.
TiFlash Columnar Storage
TiFlash manages columnar data with a delta tree structure. New data is first written to the delta layer and merged into the stable layer in the background. This design supports both real-time data updates and efficient analytical queries.
In a columnar format, data from the same column is stored contiguously, which greatly improves compression efficiency. TiFlash supports compression algorithms such as LZ4 and ZSTD and chooses a suitable compression method according to the data type.
Intelligent Query Routing
TiDB's optimizer automatically decides whether to run a query on TiKV or on TiFlash based on the characteristics of the query. In general, point queries and small range scans run on TiKV, while large aggregation queries run on TiFlash.
TiDB can also generate execution plans that combine both storage engines. For example, it can read one side of a join from TiKV and the other from TiFlash, making the most of the strengths of each engine.
Operations and Monitoring
Proper monitoring and tuning are essential for running a TiDB cluster reliably. TiDB provides a comprehensive, Prometheus-based metrics collection system.
Key Metrics
The following metrics are important in operation:
- QPS (Queries Per Second): the basic indicator of each TiDB node's processing capacity
- Latency: query response time at the P50, P95, and P99 percentiles
- TiKV Raft store CPU usage: an indicator of replication load
- Region health: the number of Regions that are missing replicas or have no leader
Performance Tuning
Optimizing TiDB performance requires adjusting settings to match the characteristics of the workload. Important configuration parameters include the following:
TiDB level:
- tidb_distsql_scan_concurrency: concurrency of distributed scans
- tidb_index_join_batch_size: batch size for index joins
- tidb_mem_quota_query: memory limit per query
TiKV level:
- storage.block-cache.capacity: size of the RocksDB block cache
- raftstore.apply-pool-size: concurrency of Raft log application
- coprocessor.region-max-keys: threshold for Region splitting
Failure Handling
TiDB provides high availability, but responding appropriately when a failure occurs is still important. Common failure scenarios and how to handle them:
- TiKV node failure: failover happens automatically through Raft replication. PD creates new replicas to maintain the replica count.
- Network partition: because of Raft's majority rule, the minority side of the partition stops accepting writes. Synchronization resumes automatically once the network recovers.
- Insufficient disk space: TiKV starts rejecting writes. Resolve this by deleting unneeded data or adding nodes.
Security Features
TiDB provides the comprehensive security features required in enterprise environments.
Authentication and Privilege Management
TiDB implements a privilege system compatible with MySQL. It supports the concepts of users, roles, and privileges, and allows fine-grained access control.
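-- Create a user with specific privileges (use a strong password in practice)
CREATE USER 'analyst'@'%' IDENTIFIED BY 'password';
-- app_db is a placeholder schema name; DATABASE is a reserved word and cannot be used unquoted
GRANT SELECT ON app_db.* TO 'analyst'@'%';
-- Role-based access control
CREATE ROLE 'read_only';
GRANT SELECT ON *.* TO 'read_only';
GRANT 'read_only' TO 'analyst'@'%';
-- Make the role active by default when the user logs in
SET DEFAULT ROLE 'read_only' TO 'analyst'@'%';
Encryption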
TiDB supports encryption of both data in transit and data at rest:
- TLS/SSL communication: encrypts traffic between clients and TiDB, and between components, with TLS
- Transparent Data Encryption (TDE): protects data stored in TiKV with AES encryption
- Encryption key management: integrates with external key management systems such as AWS KMS
Audit Logs
TiDB can generate detailed audit logs of database access. The audit log records the SQL statements executed, access times, user information, execution results, and so on. Log retention periods and filtering conditions can be configured to match compliance requirements.
Partitioning
TiDB provides partitioning to make large tables easier to manage. With partitioning, a table that is logically one table can be physically divided into multiple parts.
Partition Types
TiDB supports the following partition types:
- Range Partitioning: divides rows into partitions based on ranges of column values
- List Partitioning: divides rows based on discrete lists of column values
- Hash Partitioning: distributes rows evenly using a hash function
- Key Partitioning: divides rows using an internal hash function
-- Range partitioning example
CREATE TABLE sales (
id INT,
sale_date DATE,
amount DECIMAL(10,2)
) PARTITION BY RANGE (YEAR(sale_date)) (
PARTITION p2020 VALUES LESS THAN (2021),
PARTITION p2021 VALUES LESS THAN (2022),
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024)
);
Partition Pruning
At query execution time, TiDB's optimizer excludes unneeded partitions based on the conditions in the WHERE clause. This reduces the amount of data scanned and improves query performance.
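For a RANGE-partitioned table such as the sales example, pruning amounts to an interval-overlap test against each partition's bounds. The sketch below hard-codes the partition bounds from that example and uses a hypothetical function name; the real optimizer derives the bounds from table metadata and handles many more predicate shapes.

```python
# Exclusive upper bounds of the RANGE partitions in the sales example.
PARTITIONS = [("p2020", 2021), ("p2021", 2022), ("p2022", 2023), ("p2023", 2024)]


def prune_partitions(year_from, year_to):
    """Return partitions that can hold rows with year_from <= YEAR <= year_to."""
    selected = []
    lower = None  # inclusive lower bound; None for the first partition
    for name, upper in PARTITIONS:
        overlaps_low = lower is None or year_to >= lower
        overlaps_high = year_from < upper
        if overlaps_low and overlaps_high:
            selected.append(name)
        lower = upper  # the next partition starts where this one ends
    return selected
```

A predicate restricted to the year 2021 touches only p2021, so the other three partitions are never scanned.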
Change Data Capture (CDC)
TiCDC is TiDB's Change Data Capture component. It captures data changes in real time and delivers them to downstream systems.
Architecture
TiCDC reads the change logs that TiKV produces from its Raft log and extracts change events from them. It supports horizontal scalability through multiple CDC nodes and provides high availability.
Delivery Guarantees
TiCDC provides an at-least-once delivery guarantee. Event ordering is also guaranteed: changes to the same row are delivered in the order in which they occurred. This allows downstream systems to keep their data consistent.
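At-least-once delivery means a downstream consumer may see the same event more than once, so the consumer should apply events idempotently. The sketch below shows one common way to do that by remembering the last applied commit timestamp per row. The event shape (pk, commit_ts, op, row) and the class name are assumptions made for this illustration and do not correspond to any specific TiCDC output protocol.

```python
class IdempotentSink:
    """Hypothetical downstream applier that tolerates redelivered events."""

    def __init__(self):
        self.rows = {}        # primary key -> current row value
        self.applied_ts = {}  # primary key -> commit_ts of last applied change

    def apply(self, event):
        """event: dict with 'pk', 'commit_ts', 'op' ('upsert' or 'delete'), 'row'."""
        pk, ts = event["pk"], event["commit_ts"]
        if ts <= self.applied_ts.get(pk, -1):
            return False  # duplicate or already superseded: skip it
        if event["op"] == "delete":
            self.rows.pop(pk, None)
        else:
            self.rows[pk] = event["row"]
        self.applied_ts[pk] = ts
        return True
```

Combined with per-row ordering, this check makes replays harmless: a redelivered event carries a timestamp that has already been applied and is simply ignored.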
Summary
TiDB is a database system that combines the convenience of traditional relational databases with the scalability of distributed systems. With strong consistency through Raft, Percolator-based distributed transactions, and HTAP capabilities through TiFlash, it offers a comprehensive feature set for modern data processing requirements. Its design philosophy and implementation serve as an important reference for building large-scale distributed database systems.
[1] Diego Ongaro and John Ousterhout. "In Search of an Understandable Consensus Algorithm." USENIX ATC 2014.
[2] Siddon Tang et al. "TiKV: A Distributed Transactional Key-Value Database."
[3] Daniel Peng and Frank Dabek. "Large-scale Incremental Processing Using Distributed Transactions and Notifications." OSDI 2010.
[4] Leslie Lamport. "The Part-Time Parliament." ACM Transactions on Computer Systems, 1998.