How to Speed Up Rsync of a Software RAID 6 Array

Rsync is a powerful tool for synchronizing files and directories between two locations, either locally or remotely. It can be used for backup, mirroring, or migration purposes. However, rsync can also be slow, especially when dealing with large amounts of data or complex file systems.

One common scenario where rsync performance can suffer is syncing a software RAID 6 array. RAID 6 is a type of RAID (Redundant Array of Independent Disks) that provides fault tolerance by storing two parity blocks per stripe, distributed across all member disks. This means that even if two disks fail, the data can still be recovered from the remaining disks.

However, RAID 6 also has some drawbacks, such as higher write overhead, lower write performance, and longer rebuild time. These factors can affect the speed of rsync, especially if the array is not fully synchronized or optimized.

In this article, we will look at a real-world example of how to speed up rsync of a software RAID 6 array. We will also provide some general tips and best practices for improving rsync performance and avoiding common pitfalls.

Problem: How to Speed Up Rsync of a Software RAID 6 Array

The questioner has a software RAID 6 array with 8 x 8 TB disks, formatted with the ext4 file system. The array is mounted on /mnt/raid6 and contains about 30 TB of data. The questioner wants to sync the array to another server using rsync over SSH, but the transfer speed is very slow, around 40 MB/s.
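
To put the 40 MB/s figure in perspective, the arithmetic below estimates how long a full copy would take. It is a rough sketch that assumes decimal units (1 TB = 1,000,000 MB) and a constant transfer rate. The function name transfer_days and the 110 MB/s comparison rate are made up for this illustration.

```shell
#!/bin/sh
# transfer_days SIZE_TB RATE_MBPS
# Prints the estimated duration in days, assuming decimal units
# (1 TB = 1,000,000 MB) and a constant transfer rate.
transfer_days() {
    awk -v tb="$1" -v rate="$2" \
        'BEGIN { printf "%.1f\n", tb * 1000000 / rate / 86400 }'
}

transfer_days 30 40     # the case described above: prints 8.7
transfer_days 30 110    # hypothetical faster rate for comparison: prints 3.2
```

At the observed rate, an initial copy of the whole array takes more than a week, which is why even a modest speed-up is worth pursuing.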

The questioner has tried various rsync options, such as -a (archive mode), -z (compression), --inplace (update destination files in-place), --no-whole-file (use delta-transfer algorithm), --sparse (handle sparse files efficiently), and --bwlimit (limit bandwidth usage). However, none of these options made any significant difference in the transfer speed.

The questioner also checked the CPU, memory, disk, and network usage on both servers, but none of them seemed to be the bottleneck. The questioner suspected that the problem might be related to the RAID 6 array itself, but was not sure how to diagnose or fix it.

Solution 1

Check the status and health of the RAID 6 array using mdadm, a tool for managing software RAID devices. To do this, run mdadm --detail /dev/md0 (assuming /dev/md0 is the device name of the array), or look at /proc/mdstat. Look for any errors or warnings, such as failed disks, a degraded state, or a resync/recovery in progress. A degraded or resyncing array competes with rsync for disk I/O, so fix any such issues, or wait for the resync to finish, before proceeding with rsync.
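
The check can also be scripted. The sketch below assumes the usual /proc/mdstat layout, in which each array has a status field such as [UUUUUUUU] and a missing member appears as an underscore. The function name mdstat_degraded is hypothetical.

```shell
#!/bin/sh
# mdstat_degraded: reads /proc/mdstat-style text on stdin and succeeds
# (exit status 0) if any array shows a missing member, i.e. an
# underscore inside the [UUUU] status field.
mdstat_degraded() {
    grep -Eq '\[[U_]*_[U_]*\]'
}

# Typical use on a live system (commented out so the sketch has no side effects):
#   mdstat_degraded < /proc/mdstat && echo "array degraded: fix it before running rsync"
```

This only detects missing members. A resync or recovery in progress shows up as a separate progress line in /proc/mdstat and still needs a look.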

Solution 2

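Check and repair the ext4 file system on the RAID 6 array using e2fsck, a tool for checking and repairing ext4 file systems. e2fsck does not tune performance, but it rules out file system errors as a cause of slow or failing reads. The file system must be unmounted first, because running e2fsck on a mounted file system can corrupt it: run umount /mnt/raid6, then e2fsck -f -y /dev/md0 (assuming /dev/md0 is the device name of the array). This forces a full check and automatically answers "yes" to every repair prompt. To see what would be changed without modifying anything, run e2fsck -n /dev/md0 first. Note that a full check may take a long time to complete, depending on the size and condition of the file system.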
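
Because checking a mounted file system is risky, a small guard can help. The following is a minimal sketch, assuming the array appears in /proc/mounts under the same device name that is passed in (for example /dev/md0). The names is_mounted and safe_fsck are hypothetical.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE appears in the first column of MOUNTS_FILE
# (defaults to /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# safe_fsck DEVICE: hypothetical wrapper that refuses to check a mounted device.
safe_fsck() {
    if is_mounted "$1"; then
        echo "refusing: $1 is mounted, unmount it first" >&2
        return 1
    fi
    e2fsck -f "$1"
}
```

Calling safe_fsck /dev/md0 runs the check only when the device is not listed as mounted.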

Solution 3

Tune the ext4 file system parameters using tune2fs, a tool for adjusting various aspects of ext4 file systems. To do this, run tune2fs -o journal_data_writeback /dev/md0 (assuming /dev/md0 is the device name of the array). This stores data=writeback as a default mount option in the superblock, so the file system is mounted in writeback mode unless another data= mode is given. In writeback mode only metadata is journaled, and data blocks may reach the disk after the metadata that refers to them, which can improve write performance. However, after a power failure or system crash, recently written files may contain stale or garbage data, so make sure you have a reliable backup before using this option.
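
To confirm that the setting was stored, the default mount options can be read back from the superblock with tune2fs -l. The sketch below only parses that output. The function name default_mount_opts is hypothetical.

```shell
#!/bin/sh
# default_mount_opts: reads `tune2fs -l` output on stdin and prints the
# value of the "Default mount options:" line.
default_mount_opts() {
    sed -n 's/^Default mount options:[[:space:]]*//p'
}

# On a live system (commented out; needs root and a real device):
#   tune2fs -l /dev/md0 | default_mount_opts
```

After the tune2fs command above, the printed list should include journal_data_writeback.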

Solution 4

Mount the ext4 file system with optimal mount options using mount, a tool for mounting file systems. To do this, edit /etc/fstab, a configuration file that specifies how file systems are mounted at boot time. Add or modify the following line:

/dev/md0 /mnt/raid6 ext4 defaults,noatime,nodiratime,data=writeback,nobarrier 0 0

This will mount /dev/md0 on /mnt/raid6 with ext4 file system and default options, plus:

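  • noatime: Do not update access time for files and directories. This can reduce disk I/O and improve performance.
  • nodiratime: Do not update access time for directories. noatime already implies this, so the option is redundant but harmless.
  • data=writeback: Journal only metadata and do not order data writes against it. This is consistent with the previous step of tuning the file system with tune2fs.
  • nobarrier: Do not issue write barriers (cache flush commands) to the disks. This can improve write performance, but also increases the risk of data loss or file system corruption in case of power failure or system crash. Consider it only if the disks' write caches are protected, for example by a battery-backed controller or a UPS.

After editing /etc/fstab, unmount and mount the file system again (umount /mnt/raid6, then mount /mnt/raid6), or reboot the system. Running mount -a is not enough here: it only mounts file systems that are not already mounted, and ext4 does not allow the data= journaling mode to be changed on a remount.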
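
Once the file system is mounted again, the options actually in effect can be read from /proc/mounts. This is a minimal sketch. The helper names mount_opts and has_opt are hypothetical, and the kernel may display some options differently from how they were written in /etc/fstab.

```shell
#!/bin/sh
# mount_opts MOUNTPOINT [MOUNTS_FILE]
# Prints the option string (4th field) for MOUNTPOINT from MOUNTS_FILE
# (defaults to /proc/mounts).
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $4 }' "${2:-/proc/mounts}"
}

# has_opt OPTION OPTION_STRING
# Succeeds if the comma-separated OPTION_STRING contains OPTION as a whole entry.
has_opt() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (commented out; depends on the live system):
#   has_opt noatime "$(mount_opts /mnt/raid6)" && echo "noatime is active"
```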

Solution 5

Use optimal rsync options for transferring large amounts of data over SSH. To do this, run rsync with the following options:

rsync -avz --progress --inplace --no-whole-file --sparse /mnt/raid6/ user@remote_server:/path/to/destination/

This will sync /mnt/raid6/ to /path/to/destination/ on the remote server using SSH, with the following options:

  • -a: Archive mode. This recurses into directories and preserves permissions, ownership, timestamps, symbolic links, and device files. It does not preserve hard links, ACLs, or extended attributes; add -H, -A, or -X if you need them.
  • -v: Verbose mode. This prints detailed information about the transfer progress and statistics.
  • -z: Compression mode. This compresses the data before sending it over the network, which reduces bandwidth usage. It speeds up the transfer only when the network is the bottleneck and the data is compressible; on a fast LAN, or with already-compressed data, the extra CPU work can make the transfer slower.
  • --progress: Progress mode. This shows the percentage of completion and estimated time remaining for each file.
  • --inplace: In-place mode. This updates the destination files in-place, rather than creating temporary files and renaming them. This can save disk space and reduce disk I/O, but also increases the risk of data corruption in case of interruption or failure.
  • --no-whole-file: Delta-transfer mode. This transfers only the changed parts of files, rather than the whole files. Delta transfer is already the default for remote transfers, so the option is redundant here. It reduces bandwidth usage for large files that have small changes, but it brings no benefit for files that do not yet exist on the destination.
  • --sparse: Sparse mode. This handles sparse files efficiently, by not writing runs of zero blocks on the destination. This can save disk space and reduce disk I/O. Note that rsync versions older than 3.1.3 reject the combination of --sparse and --inplace.
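
Whether -z is worth its CPU cost depends on how compressible the data is. The sketch below estimates this by compressing a sample of one file with gzip, used here as a rough stand-in for rsync's own compression. The function name compressed_pct and the default sample size are assumptions made for this illustration.

```shell
#!/bin/sh
# compressed_pct FILE [SAMPLE_KIB]
# Compresses the first SAMPLE_KIB KiB of FILE (default 1024) with gzip and
# prints the compressed size as a whole-number percentage of the sample.
compressed_pct() {
    sample_kib=${2:-1024}
    orig=$(dd if="$1" bs=1024 count="$sample_kib" 2>/dev/null | wc -c | tr -d ' ')
    comp=$(dd if="$1" bs=1024 count="$sample_kib" 2>/dev/null | gzip -c | wc -c | tr -d ' ')
    if [ "$orig" -eq 0 ]; then
        echo "empty or unreadable: $1" >&2
        return 1
    fi
    echo $(( comp * 100 / orig ))
}

# Example (commented out; the path is hypothetical):
#   compressed_pct /mnt/raid6/some/large/file
```

A result near 100 suggests the data is already compressed and -z will mostly burn CPU; a low value suggests compression may help on a slow link.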

Additional Steps

When you’re dealing with a software RAID setup, optimizing the resync speed requires special attention. Let’s explore additional steps to boost the speed of your software RAID resync:

  • CPU Performance: Software RAID heavily relies on CPU power. Make sure your CPU isn’t the bottleneck during the resync process. Monitor CPU utilization and consider upgrading if necessary.
  • I/O Scheduler: I/O scheduler adjustments can still make a difference for software RAID. Experiment with different schedulers to find the one that offers the best performance for your specific workload.
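  • Kernel Parameters: The md driver throttles resync speed through two sysctl settings, dev.raid.speed_limit_min and dev.raid.speed_limit_max (also available under /proc/sys/dev/raid/), both in KiB/s per device. Raising them allows a higher resync rate, at the cost of less I/O bandwidth for other workloads while the resync runs.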
  • Filesystem and Block Size: Ensure that your filesystem and block sizes are optimized for your specific tasks. Larger block sizes can enhance performance for larger files, while smaller ones might benefit smaller files.
  • Monitor Memory Usage: Excessive memory usage can lead to performance issues. Ensure your system has sufficient RAM and monitor memory usage during the resync.
  • Check for Software Updates: Keep your Linux distribution and RAID software up-to-date. Updates often include performance enhancements and bug fixes.
  • Optimize Disk Alignment: Proper partition alignment is crucial. Misaligned partitions can significantly impact performance, especially on SSDs and on drives with 4 KiB physical sectors.
  • RAID Chunk Size: Similar to hardware RAID, software RAID enables you to configure the chunk size. Experiment with different chunk sizes to see if it affects resync speed.
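  • Parallel Resync: mdadm has no single switch for parallel resync, but for RAID 5/6 arrays the md driver exposes tunables under /sys/block/md0/md/ (assuming the array is md0), such as stripe_cache_size and group_thread_cnt, that can raise stripe-processing throughput. The best values depend on the hardware, so change them one at a time and measure.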
  • Backup and Restore: In some scenarios, it might be faster to create a new RAID array and restore data from backups, rather than waiting for a slow resync, especially if you regularly back up your data.
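
To judge whether a change helps, it is useful to read the current resync speed before and after. This sketch parses the progress line that /proc/mdstat shows while a resync or recovery is running. The function name resync_speed is hypothetical, and the limit value in the comment is only an example, not a recommendation.

```shell
#!/bin/sh
# resync_speed: reads /proc/mdstat-style text on stdin and prints the
# number from any "speed=<N>K/sec" field (one line per array in progress).
resync_speed() {
    sed -n 's/.*speed=\([0-9][0-9]*\)K\/sec.*/\1/p'
}

# On a live system (commented out; changing limits needs root):
#   resync_speed < /proc/mdstat
#   sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max
#   sysctl -w dev.raid.speed_limit_max=500000   # example value only
```

If nothing is printed, no resync or recovery is currently running.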

Remember to make changes incrementally, test their impact, and monitor system stability. This ensures that any optimizations do not introduce instability or data corruption. By following these steps, you can fine-tune your software RAID and significantly improve the resync speed for a smoother experience.

Tips and Best Practices

Here are some general tips and best practices for improving rsync performance and avoiding common pitfalls:

  • Use a fast and reliable network connection between the source and destination servers. If possible, use a wired connection rather than a wireless connection, and avoid network congestion or interference.
  • Use SSH keys rather than passwords for authentication. This lets rsync run unattended from scripts or scheduled jobs and improves security. It does not change the transfer speed itself.
  • Use parallel rsync processes rather than a single rsync process. This can take advantage of multiple CPU cores and network interfaces, and improve transfer speed by distributing the load. However, this also increases the complexity and resource consumption of the rsync operation, so use it with caution and moderation.
  • Exclude unnecessary files or directories from the rsync operation. This can reduce the amount of data to be transferred and processed, and improve transfer speed by focusing on the relevant data. You can use --exclude or --exclude-from options to specify patterns or files that contain patterns to be excluded.
  • Keep your file system clean and organized. This can reduce the fragmentation and overhead of the file system, and improve disk I/O performance by optimizing the layout and allocation of files and directories.
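
The parallel rsync tip can be sketched as follows: one rsync per top-level entry of the source directory, with a cap on how many run at once. This is an illustration under stated assumptions, not a tuned solution. It relies on find and xargs options (-mindepth, -maxdepth, -print0, -0, -P) that GNU and BSD tools provide but POSIX does not. The function name parallel_rsync and the RSYNC_BIN override are hypothetical.

```shell
#!/bin/sh
# parallel_rsync SRC DEST [JOBS]
# Starts one rsync per top-level entry of SRC, at most JOBS (default 4) at once.
# RSYNC_BIN is a hypothetical override so the command lines can be previewed,
# e.g. with RSYNC_BIN=echo, before any data is copied.
parallel_rsync() {
    src=${1%/}
    dest=$2
    jobs=${3:-4}
    find "$src" -mindepth 1 -maxdepth 1 -print0 |
        xargs -0 -P "$jobs" -I{} "${RSYNC_BIN:-rsync}" -a {} "$dest/"
}

# Preview first, then run for real (commented out; paths are from the example above):
#   RSYNC_BIN=echo parallel_rsync /mnt/raid6 user@remote_server:/path/to/destination 4
#   parallel_rsync /mnt/raid6 user@remote_server:/path/to/destination 4
```

The split is only as even as the top-level directories are. On spinning disks, too many parallel readers can also cause extra seeking, so start with a small job count and measure.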

Conclusion

In this article, we have learned how to speed up rsync of a software RAID 6 array, based on a real-world example from serverfault.com. We have followed several steps to check, optimize, tune, and mount the RAID 6 array and its ext4 file system, and to use optimal rsync options for transferring large amounts of data over SSH. We have also shared some general tips and best practices for improving rsync performance and avoiding common pitfalls.

By following the steps and tips in this article, you can improve the speed of rsync of a software RAID 6 array significantly, and save time and resources in your backup, mirroring, or migration tasks. However, you should also be aware of the risks and trade-offs involved in some of the steps and options, such as data corruption or loss in case of power failure or system crash. Therefore, you should always have a reliable backup before making any changes to your file system or using rsync.

We hope you enjoyed this article and found it useful. If you have any questions or feedback, please feel free to leave a comment below or contact us through our website.