Many Samsung SSD products now include support for Flexible Data Placement (FDP), a way to optimize drive operation by taking placement directives from the host. We’ve heard about data placement methodologies before, first with NVMe Streams and again with NVMe Zoned Namespaces (ZNS). Why is there a need for yet another way to do it?
Data Placement Background
Let’s review why data placement capabilities are valuable, and what has been attempted until now to facilitate this practice. We’ll start with a quick review of the SSD factors involved.
WAF. Write Amplification Factor (WAF) is the ratio of the amount of data the SSD actually has to write to NAND to the amount of data the host originally wrote. It starts out at 1:1 for an empty drive but gets worse over time as old data gets updated. Even if the host stays well below the capacity of the drive, writing replacement data causes old pages to be invalidated and their replacement data to be written elsewhere – eventually filling up the drive and forcing old blocks to be erased and reused.
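As a rough illustration of this ratio, the sketch below computes WAF from two byte counts. The function name and the sample figures are assumptions made for the example, not measurements from any drive.

```python
def write_amplification_factor(host_bytes_written: int, nand_bytes_written: int) -> float:
    """Return NAND bytes written divided by host bytes written."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return nand_bytes_written / host_bytes_written


GIB = 1024 ** 3

# Hypothetical figures: the host wrote 100 GiB, but relocating still-valid
# pages during block reuse made the drive write 250 GiB to NAND in total.
waf_example = write_amplification_factor(100 * GIB, 250 * GIB)
print(f"WAF = {waf_example:.2f}")
```

A value of 1.0 would mean the drive wrote nothing beyond what the host asked for; anything above that is extra NAND wear and extra energy.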
Block Erase. The NAND storage in an SSD is internally organized by blocks and pages. Typically, a block consists of several hundred pages. NAND can be written on a page-by-page basis, but only erased at the block level – the whole block must be wiped clean if any already-used page within it must be updated or overwritten. Accordingly, when a host writes data to an SSD, the SSD controller may need to read, erase, and then write back the data multiple times in order to complete the operation.
Garbage Collection. As outdated data is replaced during these read-erase-write cycles and invalidated pages build up, the SSD controller orchestrates internal garbage collection (GC) to reorganize pages and reuse blocks. Because NAND blocks can only be erased and reused a limited number of times before starting to fail, the controller does so in a way that spreads block erasures evenly across the physical device. Nonetheless, without directives from the host as to the lifetime of the data it writes, it is difficult for the SSD controller to minimize garbage collection. The more frequent the GC, the more power is consumed and the shorter the drive’s life.
Superblock. The SSD blocks physically reside in what is referred to as a Superblock. These can be in a single NAND die or can span multiple NAND dice in the same package, depending on the SSD architecture. Knowing how blocks align to Superblocks is important to the host, because the host can organize the data writes to avoid crossing Superblock boundaries.
Data Placement. A key means of reducing WAF and minimizing garbage collection is to write pages in such a way that data is segregated, such as by static data vs. data that will be updated frequently (cold vs. hot, or short vs. long lifetime). If the host knows characteristics like this about the data, then it can proactively ask the drive to place the data with like data to reduce WAF as well as GC frequency.
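To make the effect of segregation concrete, here is a minimal sketch using a toy model: two blocks of four pages each, where every page is labeled "hot" or "cold" and all hot pages are later rewritten (and therefore invalidated). The labels, block size, and function names are assumptions for illustration; real controllers track validity per physical page and choose victims with more elaborate policies.

```python
from typing import Sequence, Set


def pages_to_relocate(block: Sequence[str], invalidated: Set[str]) -> int:
    """Count pages that are still valid and must be copied before the block is erased."""
    return sum(1 for label in block if label not in invalidated)


def cheapest_reclaim(blocks: Sequence[Sequence[str]], invalidated: Set[str]) -> int:
    """Return the copy cost of the cheapest block to reclaim (greedy victim choice)."""
    return min(pages_to_relocate(block, invalidated) for block in blocks)


# Same eight pages, placed two different ways.
mixed_layout = [["hot", "cold", "hot", "cold"], ["cold", "hot", "cold", "hot"]]
segregated_layout = [["hot"] * 4, ["cold"] * 4]

# The host rewrites all hot data, so every "hot" page becomes invalid.
rewritten = {"hot"}

mixed_cost = cheapest_reclaim(mixed_layout, rewritten)
segregated_cost = cheapest_reclaim(segregated_layout, rewritten)
print(f"pages copied to free one block: mixed={mixed_cost}, segregated={segregated_cost}")
```

In the mixed layout every block still holds two valid cold pages, so freeing a block costs two extra page writes. In the segregated layout the hot block is entirely invalid and can be erased with no copying at all.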
The NVMe specification previously incorporated two methods to allow such segregation.
- With Streams, the host specifies a stream number associated with each page of data to be written. The controller then decides how and where to place this data so that it is handled efficiently with other data of the same type. In this method, the host has no direct control over physical placement within the SSD, and is limited to a single namespace per stream.
- With Zoned Namespaces (ZNS), the host specifies a zone within the SSD for each page of data to be written. The overall host software task is complex with ZNS, requiring that writes be sequential and therefore that the host “stage” random data writes to make them appear to the drive as a sequential write. In this method, the host has more direct control over the placement of data within the SSD – but only at a Superblock level.
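The staging burden that ZNS places on host software can be sketched with a toy model. The `Zone` and `StagingHost` classes below are hypothetical and do not reflect the actual NVMe ZNS command set; they only show that a zone accepts writes at its write pointer, so the host must append data in arrival order and keep its own map from logical addresses to zone offsets.

```python
class Zone:
    """Toy zone: a write is accepted only at the current write pointer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages = []

    @property
    def write_pointer(self) -> int:
        return len(self.pages)

    def write(self, offset: int, data: str) -> None:
        if offset != self.write_pointer:
            raise ValueError("zone writes must land on the write pointer")
        if offset >= self.capacity:
            raise ValueError("zone is full")
        self.pages.append(data)


class StagingHost:
    """Toy host layer that turns random logical writes into sequential zone appends."""

    def __init__(self, zone: Zone) -> None:
        self.zone = zone
        self.location = {}  # logical address -> offset of the latest copy in the zone

    def write(self, logical_address: int, data: str) -> None:
        offset = self.zone.write_pointer
        self.zone.write(offset, data)
        self.location[logical_address] = offset

    def read(self, logical_address: int) -> str:
        return self.zone.pages[self.location[logical_address]]


host = StagingHost(Zone(capacity=8))
host.write(7, "v1 of address 7")
host.write(2, "v1 of address 2")
host.write(7, "v2 of address 7")  # an update is appended, not written in place

try:
    host.zone.write(0, "in-place overwrite")
    overwrite_rejected = False
except ValueError:
    overwrite_rejected = True
```

The mapping table, the stale copy of address 7 left behind in the zone, and the eventual need to reset the zone are all work the host has to manage itself in this model.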
The limitations of these two methods made them less than optimal for many existing applications. As a result, FDP was conceived and added to the NVMe specification as another option.
FDP Features and Benefits
Flexible Data Placement removes the restrictions of Streams and ZNS by introducing the concept of a Reclaim Unit, describing the physical part of the SSD into which the host can direct the SSD to place data.
Additional features are noteworthy.
For many applications, these combined features make FDP a more practical data placement choice than the other options.
Power Consumption. Allowing the host to optimize data placement on the drive lowers the write amplification factor (WAF), which in turn reduces garbage collection (GC) frequency. This reduction in GC frequency not only improves power consumption significantly but also brings a corresponding reduction in temperature. For example, a typical application of FDP in a Samsung SSD has the following performance for a 4TB/8TB-capacity drive.
- Sequential read 10,700 MB/s, sequential write 5,500 MB/s
- Random read 1,600K IOPS, random write 280K IOPS
Enabling FDP for this case results in a 43.6% reduction of power consumption.
With the example device having an E1.S 25mmT form factor, it is easy to see that the reduced power consumption will beneficially affect power supply and cooling requirements for the server, paving the way to a lower-cost solution.
Ease of Utilization. The benefits of FDP can be realized even with a limited investment in software changes. Linux support is provided at the filesystem/block I/O level for both application-driven and filesystem-driven placement. I/O passthrough support is also offered.
In both cases, the host is given a means of including “write hints” when writing data – essentially telling the drive whether the data lifetime is expected to be short, medium, long, or extremely long. With this information, the drive maps that write to the NAND area that the host has previously identified for that lifetime.
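As one possible way for an application to attach such a hint, the sketch below uses the Linux `fcntl` command `F_SET_RW_HINT`, which takes a 64-bit write-lifetime value. The numeric constants are copied from the Linux `<linux/fcntl.h>` header because Python's `fcntl` module may not export them. Whether a hint set this way is translated into an FDP placement depends on the kernel version, filesystem, and drive, so treat that link as an assumption; the helper names and the temporary file are illustrative only.

```python
import fcntl
import struct
import tempfile

# Value from the Linux UAPI header <linux/fcntl.h>, spelled out here in case
# the running Python version does not export it.
F_SET_RW_HINT = 1036  # F_LINUX_SPECIFIC_BASE (1024) + 12

WRITE_LIFETIME_HINTS = {
    "short": 2,    # RWH_WRITE_LIFE_SHORT
    "medium": 3,   # RWH_WRITE_LIFE_MEDIUM
    "long": 4,     # RWH_WRITE_LIFE_LONG
    "extreme": 5,  # RWH_WRITE_LIFE_EXTREME
}


def hint_payload(lifetime: str) -> bytes:
    """Pack a lifetime name into the 64-bit value that F_SET_RW_HINT expects."""
    return struct.pack("Q", WRITE_LIFETIME_HINTS[lifetime])


def set_write_hint(fd: int, lifetime: str) -> None:
    """Attach a write-lifetime hint to the inode behind an open file descriptor."""
    fcntl.fcntl(fd, F_SET_RW_HINT, hint_payload(lifetime))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as log_file:
        try:
            set_write_hint(log_file.fileno(), "short")
            log_file.write(b"frequently rewritten record\n")
            print("hint applied")
        except OSError as error:
            # Older kernels or some filesystems reject the command.
            print(f"write hints not supported here: {error}")
```

Setting the hint once per file keeps the application change small: subsequent writes through that file inherit the lifetime without any per-write bookkeeping.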
Moreover, since FDP provides a feedback mechanism for drive statistics, the Linux support also includes a way to feed this data back to the host – allowing the host, for example, to learn how many garbage collection cycles are taking place. This feedback path lets the host actively monitor the success of its write activities to fine-tune and optimize drive operation.
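A host-side monitor built on this feedback might look like the sketch below. It assumes the host can periodically read two cumulative counters, host bytes written and media bytes written, and compares snapshots to get the WAF for just that interval. The `StatsSnapshot` fields, the sample numbers, and the 1.2 threshold are hypothetical; how the counters are actually retrieved from the drive is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StatsSnapshot:
    """Cumulative counters as read at one point in time (hypothetical field names)."""
    host_bytes_written: int
    media_bytes_written: int


def interval_waf(before: StatsSnapshot, after: StatsSnapshot) -> Optional[float]:
    """Return WAF over the interval, or None if the host wrote nothing in it."""
    host_delta = after.host_bytes_written - before.host_bytes_written
    media_delta = after.media_bytes_written - before.media_bytes_written
    if host_delta <= 0:
        return None
    return media_delta / host_delta


# Made-up counter values for two consecutive readings.
baseline = StatsSnapshot(host_bytes_written=1_000, media_bytes_written=1_500)
latest = StatsSnapshot(host_bytes_written=3_000, media_bytes_written=4_500)

recent_waf = interval_waf(baseline, latest)
WAF_ALERT_THRESHOLD = 1.2  # arbitrary example threshold
needs_tuning = recent_waf is not None and recent_waf > WAF_ALERT_THRESHOLD
print(f"interval WAF = {recent_waf}, revisit placement policy: {needs_tuning}")
```

Working on deltas rather than lifetime totals lets the host see quickly whether a change in its placement policy helped, instead of having the signal diluted by the drive's entire write history.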
Finally, the extensive analysis possible using CacheLib along with other provided tools makes it easy to prove out the return on investment of any host software changes being considered to utilize FDP for a given application.
Summary
Flexible Data Placement is a no-cost, easy-to-implement, and easy-to-utilize means to reduce SSD power consumption, lessen over-temperature concerns, and increase drive life by minimizing WAF. Its advantages over previous data placement methods make it an ideal solution for many applications.
Samsung will once again be attending the Future of Memory and Storage (FMS) in 2024. FDP will be part of the FMS demo showcase. Be sure to stop by to see what all the excitement is about!