Performance issues after restoring AWS RDS from a snapshot

사랑초 2023. 6. 2. 14:45

Today's knowledge: 2023.05.31


Reportedly, restoring a production RDS instance from a snapshot occasionally leaves it running very slowly.



Why is my RDS/EBS volume slow after restoring from a snapshot? - Someone else's computer



New EBS volumes receive their maximum performance the moment that they are available and do not require initialization (formerly known as pre-warming). However, storage blocks on volumes that were restored from snapshots must be initialized (pulled down from Amazon S3 and written to the volume) before you can access the block. This preliminary action takes time and can cause a significant increase in the latency of an I/O operation the first time each block is accessed. Performance is restored after the data is accessed once.
For most applications, amortizing the initialization cost over the lifetime of the volume is acceptable. To ensure that your restored volume always functions at peak capacity in production, you can force the immediate initialization of the entire volume using dd or fio.
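What `dd` (or `fio`) does here is simply read every block of the restored volume once and discard the data, as in `dd if=/dev/xvdf of=/dev/null bs=1M`. Below is a minimal Python sketch of the same idea. The device path `/dev/xvdf` and the helper name `initialize_volume` are assumptions for illustration. It only applies to an EBS volume attached to an EC2 instance you control, and reading a raw device usually requires root.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB per read, mirroring `bs=1M` in the dd command


def initialize_volume(device_path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Read every block of device_path once and throw the data away.

    Touching each block forces it to be pulled from the snapshot in S3,
    so later reads no longer pay the first-access penalty.
    Returns the total number of bytes read.
    """
    total = 0
    # buffering=0 opens an unbuffered raw file, so each read() goes to the device.
    with open(device_path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:  # EOF: every block has now been read once
                break
            total += len(chunk)
    return total


# Hypothetical usage, as root on the EC2 instance (check `lsblk` for the real device name):
#   initialize_volume("/dev/xvdf")
```

In practice `dd` or `fio` (which can keep several reads in flight) will be faster; the sketch only makes the "touch every block once" idea explicit.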


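So when restoring a large volume from a snapshot, it is recommended to read through the entire volume once with dd or fio to initialize it. Note that dd and fio work on EBS volumes attached to an EC2 instance; RDS gives you no access to the underlying block device, so on RDS the equivalent is to read the data through the database, for example with full-table scans (SELECT *).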
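On RDS there is no block device to point `dd` at, so the practical way to force the data to load is to read it through the database engine; the RDS documentation suggests full-table scans such as `SELECT *` for tables that need to be fast right away. Below is a minimal sketch assuming any Python DB-API 2.0 connection (for example from `psycopg2` or `PyMySQL`). The function name `warm_tables` and the table list you pass in are hypothetical, and it only reads table data, not indexes.

```python
import re
from typing import Dict, Iterable

# Plain or schema-qualified identifiers only: table names cannot be bind parameters,
# so anything else is rejected instead of being interpolated into SQL.
_SAFE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def warm_tables(conn, tables: Iterable[str], batch_size: int = 10_000) -> Dict[str, int]:
    """Full-scan each table with SELECT * and discard the rows.

    `conn` is any DB-API 2.0 connection. Reading every row makes the
    storage layer load that table's blocks once. Returns rows read per table.
    """
    counts: Dict[str, int] = {}
    cur = conn.cursor()
    try:
        for table in tables:
            if not _SAFE_NAME.fullmatch(table):
                raise ValueError(f"unexpected table name: {table!r}")
            cur.execute(f"SELECT * FROM {table}")
            rows_read = 0
            while True:
                batch = cur.fetchmany(batch_size)
                if not batch:
                    break
                rows_read += len(batch)  # rows are discarded; only the read matters
            counts[table] = rows_read
    finally:
        cur.close()
    return counts
```

With default (client-side) cursors, drivers such as psycopg2 and PyMySQL pull the whole result set into memory on `execute`, so for large tables a server-side cursor (a named cursor in psycopg2, `SSCursor` in PyMySQL) is the safer choice.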


The cause of this behavior lies in how snapshot restores work.


Each snapshot contains all of the information that is needed to restore your data (from the moment when the snapshot was taken) to a new EBS volume. When you create an EBS volume based on a snapshot, the new volume begins as an exact replica of the original volume that was used to create the snapshot. The replicated volume loads data in the background so that you can begin using it immediately. If you access data that hasn't been loaded yet, the volume immediately downloads the requested data from Amazon S3, and then continues loading the rest of the volume's data in the background. For more information, see Create Amazon EBS snapshots.


Amazon EBS snapshots - Amazon Elastic Compute Cloud



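In other words, if you access data that hasn't been loaded yet, the volume immediately downloads the requested data from Amazon S3 and then continues loading the rest of the volume's data in the background.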


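→ On a newly restored RDS instance, any data that hasn't been loaded yet has to be fetched from S3 the first time it is accessed, and that fetch is where the stalls ("hangs") come from.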
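To make the mechanism concrete, here is a toy Python model of the behavior described above: a block exists only in the snapshot store until it is first read (or until a background loader reaches it), and only that first access pays the fetch cost. It is a simulation under simplifying assumptions (a dict standing in for S3, a fixed `fetch_delay`), not how EBS or RDS is actually implemented.

```python
import time


class LazyRestoredVolume:
    """Toy model of a volume restored from a snapshot (not AWS's real implementation)."""

    def __init__(self, snapshot: dict, fetch_delay: float = 0.05):
        self._snapshot = snapshot       # block id -> bytes; stands in for S3
        self._local = {}                # blocks already loaded onto the volume
        self._fetch_delay = fetch_delay
        self.fetches = 0                # blocks fetched from "S3" so far (on demand or background)

    def _hydrate(self, block: int) -> None:
        time.sleep(self._fetch_delay)   # simulated S3 round trip: the slow part
        self._local[block] = self._snapshot[block]
        self.fetches += 1

    def read(self, block: int) -> bytes:
        if block not in self._local:    # not loaded yet: fetch it on demand first
            self._hydrate(block)
        return self._local[block]       # already local: fast path

    def background_load_step(self) -> bool:
        """Load one not-yet-loaded block; return False once everything is local."""
        for block in self._snapshot:
            if block not in self._local:
                self._hydrate(block)
                return True
        return False
```

With a non-zero `fetch_delay`, timing two consecutive `read()` calls on the same block shows the pattern from the quote: the first call is slow, the second returns almost immediately. Calling `background_load_step()` until it returns `False` plays the role of the initialization step, after which no read has to go to S3.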