Journaling in Filesystems: What Is It and Why Does It Matter?


Introduction

When you work with computers or servers, your data is stored on a filesystem, which acts like a giant, organized filing cabinet. But what happens if your computer crashes or loses power in the middle of a file operation? You could lose data, or worse, your filesystem could become corrupted. This is where journaling comes in. In this blog post, I will explain in simple terms what journaling is, how it works, and how it helps protect your data.

What is Journaling?

Journaling is a process used by some filesystems to keep track of changes before they are made to the main filesystem. In simple terms, think of it like writing down a to-do list before starting a project. In case something goes wrong, you can always check your list and know exactly what you were in the middle of doing.

Here is how it works:

  • When you change a file (for example, by creating a new file or moving a directory), the filesystem first writes a description of the change to the journal.
  • The journal stores this information temporarily. Once the whole change is safely recorded in the journal (marked by a "commit" record), the filesystem applies it to its actual location on disk, and that journal space can later be reused.
  • If the system crashes before the operation completes, the journal is read when the system restarts: changes that were fully committed to the journal are "replayed", and incomplete ones are discarded. This prevents corruption and speeds up recovery from crashes.
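The ordering in the steps above (journal first, commit, then the real write) can be sketched as a toy shell script. This is only an illustration under made-up assumptions: a temporary directory plays the "disk", each file in it is one block, and a text file named .journal stands in for the reserved journal area. ext4 and XFS use binary on-disk formats, and journaled_write is a hypothetical helper, not a real tool.

```shell
#!/usr/bin/env bash
# Toy model of write-ahead journaling (illustration only; ext4 and XFS use
# binary on-disk formats). The "disk" is a temporary directory where each
# file is one block, and ".journal" stands in for the reserved journal area.
set -eu

DISK=$(mktemp -d)
JOURNAL="$DISK/.journal"
trap 'rm -rf "$DISK"' EXIT

# Hypothetical helper: update one block the way a journaling filesystem would.
journaled_write() {
    local block=$1 content=$2
    # 1. Record the new block contents in the journal.
    printf 'BLOCK %s %s\n' "$block" "$content" > "$JOURNAL"
    sync                              # the journal record reaches disk first
    # 2. Add the commit record: from now on the change counts as complete.
    printf 'COMMIT\n' >> "$JOURNAL"
    sync
    # 3. Only now write the block to its actual location.
    printf '%s\n' "$content" > "$DISK/$block"
    sync
    # 4. Checkpoint: the change is in place, so the journal space is freed.
    : > "$JOURNAL"
}

journaled_write 7 "hello"
cat "$DISK/7"    # prints: hello
```

If the script were interrupted between steps 2 and 3, the journal would still hold a committed record describing block 7, which is exactly what crash recovery replays.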

Wait: if data is first written to the journal and then to the filesystem, is the journal kept in memory? And isn't this duplication? I have been asked similar questions, so let's answer them one by one.

Does the Journal Reside in Memory?

No, the journal is stored on disk, not in memory. It lives in a reserved area of the disk (on ext4, by default, a hidden file with inode number 8), so journal entries survive a crash. However, the kernel first collects changes in memory and commits them to the on-disk journal in batches to improve performance; ext4, for example, commits every 5 seconds by default, tunable with the commit= mount option.

Is it Duplication?

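Yes, journaling writes some information twice, although in ext4's default mode and in XFS only metadata is duplicated, not file contents. This temporary duplication is worthwhile because: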

  • It is necessary for data safety. If the system crashes before the data is fully written to disk, the journal can “replay” the logged changes to ensure filesystem consistency.
  • This temporary duplication is a small trade-off for improved fault tolerance and faster recovery after a crash.

Types of Journaling

In Ext4 Filesystem

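  • Ordered (data=ordered, metadata-only journaling): Only metadata (information about files, such as file names, timestamps, and inodes) is recorded in the journal, but ext4 writes the related file data to disk before committing the metadata, so files do not end up pointing at stale data after a crash. This is the default in ext4.
  • Full data journaling (data=journal): Both file data and metadata are written to the journal. This provides the most protection but can slow down performance, because file data is written twice.
  • Writeback (data=writeback): Only metadata is journaled, and file data may be written to disk before or after the metadata is committed, with no ordering guarantee. This mode favors performance over data safety: after a crash, recently modified files can contain stale data.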
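The mode is selected with the data= mount option (for example data=journal in the options column of /etc/fstab). To see which mode each mounted ext4 filesystem is using, a small helper can read the mount options. The sketch below assumes input in /proc/mounts format ("device mountpoint type options ..."); ext4_journal_modes is a hypothetical name. It also assumes that a missing data= option means the filesystem's default mode, normally ordered, since depending on the kernel version /proc/mounts may not list data=ordered explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<mountpoint> <mode>" for every ext4 entry in a
# file in /proc/mounts format (defaults to /proc/mounts itself).
# "default" means no data= option was listed, so the filesystem's default
# mode applies (normally "ordered").
ext4_journal_modes() {
    awk '$3 == "ext4" {
        mode = "default"
        n = split($4, opts, ",")              # mount options are comma-separated
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^data=/)
                mode = substr(opts[i], 6)     # text after "data="
        print $2, mode
    }' "${1:-/proc/mounts}"
}
```

On a Linux machine, calling ext4_journal_modes with no argument reads the live /proc/mounts; passing a file path makes it easy to try on saved output.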

In XFS Filesystem

XFS journals only metadata: changes such as file creation, deletion, and changes to file attributes or inodes are logged in the journal, while file data is written directly to its final location. Unlike ext4, XFS has no full data journaling mode.

How Journaling Helps During a Crash

Let’s say you are editing a file and suddenly the power goes out. If your system uses a journaling filesystem, here is what happens when the system restarts:

  • The filesystem will check the journal and find the incomplete operation.
  • It replays operations that were fully recorded (committed) in the journal and discards any that were incomplete, so each operation ends up either fully done or as if it never started.
  • Your system is up and running again, without needing to perform a long and tedious disk check.
  • Without journaling, the system would need to scan the entire disk to look for errors, which could take a long time and potentially lead to data loss or corruption.
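The recovery rule described above (replay what was committed, discard what was not) can be sketched with the same kind of toy journal. The text format here is made up for illustration: "BLOCK <n> <content>" records followed by a "COMMIT" line once a transaction is complete. Real journals, such as jbd2 for ext4 and the XFS log, are binary and use sequence numbers and checksums; recover is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Toy model of crash recovery using a made-up text journal format:
#   BLOCK <n> <content>   new contents for block <n>
#   COMMIT                the records before it form a complete transaction
set -e

DISK=$(mktemp -d)
trap 'rm -rf "$DISK"' EXIT

# Hypothetical helper: replay committed transactions from $1 into directory $2.
recover() {
    local journal=$1 disk=$2 tag block content rec
    local -a pending=()
    while read -r tag block content; do
        case $tag in
            BLOCK)
                pending+=("$block $content") ;;   # remembered, not applied yet
            COMMIT)
                for rec in "${pending[@]}"; do    # transaction complete: replay it
                    printf '%s\n' "${rec#* }" > "$disk/${rec%% *}"
                done
                pending=() ;;
        esac
    done < "$journal"
    # Records still in "pending" never got a COMMIT, so they are discarded.
    : > "$journal"    # recovery finished; the journal can be emptied
}

# A journal left behind by a crash: the first transaction was committed,
# the second was cut off before its COMMIT record was written.
printf '%s\n' 'BLOCK 1 alpha' 'BLOCK 2 beta' 'COMMIT' 'BLOCK 3 gamma' > "$DISK/journal"
recover "$DISK/journal" "$DISK"
cat "$DISK/1" "$DISK/2"                         # prints alpha, then beta
[ -e "$DISK/3" ] || echo "block 3 was discarded"
```

Because each record holds the full new contents of a block, replaying the same committed transaction twice gives the same result, so a crash during recovery itself does no harm.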

Tools for Manual Recovery

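The fsck tool is commonly used in Linux to scan and repair filesystem issues. On ext4, the journal is normally replayed automatically when the filesystem is mounted, and e2fsck (the ext4 version of fsck) also replays it before checking. Because the journal records the last changes made before a crash, replaying it is usually enough to make the filesystem consistent again, so a full scan of the entire disk is rarely needed.

In most cases, because journaling logs changes in advance, the filesystem returns to a consistent state with little or no data loss. Data that had not yet reached the disk (or the journal) when the crash happened can still be lost, though.

For XFS, use xfs_repair instead. fsck.xfs does essentially nothing, because XFS replays its log automatically at mount time.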
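As a reminder of which tool goes with which filesystem, here is a small hypothetical helper, check_cmd, that only prints a read-only check command and never runs it. It relies on two documented no-modify options: e2fsck -n (open read-only, answer "no" to all questions) and xfs_repair -n (no-modify mode). The device names are placeholders. Run the printed command only on an unmounted filesystem; if xfs_repair complains about a dirty log, mounting and unmounting the filesystem once lets the kernel replay the log first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) a read-only check command for a
# device, based on its filesystem type.
check_cmd() {
    local fstype=$1 device=$2
    case $fstype in
        ext2|ext3|ext4) echo "e2fsck -f -n $device" ;;   # -f: check even if marked clean
        xfs)            echo "xfs_repair -n $device" ;;   # -n: report only, change nothing
        *)              echo "unsupported filesystem: $fstype" >&2
                        return 1 ;;
    esac
}

check_cmd xfs /dev/sdb1    # prints: xfs_repair -n /dev/sdb1
```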

Checking the Journal

On an ext4 filesystem, you can view journal details with the dumpe2fs command (run it as root):

  • First, identify the device and its filesystem type (-T adds a Type column):
df -hT
  • Then, use dumpe2fs to print the superblock summary (-h) and filter for the journal fields:
dumpe2fs -h <device> | grep -i journal

Checking the Journal in XFS Filesystem

XFS calls its journal the log. You can use the xfs_info command to see the log's size and whether it is internal (stored inside the filesystem) or on a separate device:

  • Check the journal size and location with:
xfs_info /mount/point
  • Sample output from my lab (ext4 first, then XFS):
# dumpe2fs /dev/sda1 | grep -i journal
dumpe2fs 1.42.9 (28-Dec-2013)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Journal inode:            8
Journal backup:           inode blocks
Journal features:         journal_64bit
Journal size:             32M
Journal length:           8192
Journal sequence:         0x000001a4
Journal start:            1


# xfs_info /rhosp_virt
meta-data=/dev/mapper/vglocal00-rhosp_virt isize=512    agcount=1608, agsize=163840 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=263454720, imaxpct=25
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2 
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
  • You can also use xfs_logprint to dump the contents of the log. Its output is very low-level and mainly useful for debugging:
xfs_logprint <device>
  • You can also check the kernel log (dmesg) for journal-related messages, such as the recovery messages ext4 and XFS print when they replay the journal at mount time.
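The journal size in the sample output above is just block count times block size. For ext4, the "Journal length: 8192" blocks at a 4 KiB block size gives 8192 × 4096 bytes = 32 MiB, matching "Journal size: 32M" (the block size is assumed here; dumpe2fs -h shows it as "Block size"). For XFS, the log line reports bsize=4096 and blocks=2560, which is 10 MiB. The sketch below does the XFS calculation; xfs_log_mib is a hypothetical helper that assumes xfs_info's text layout, with the log on a line starting with "log".

```shell
#!/usr/bin/env bash
# Hypothetical helper: read xfs_info output on stdin and print the log
# (journal) size in MiB, computed as log blocks x log block size.
xfs_log_mib() {
    awk '/^log/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^bsize=/)  bsize  = substr($i, 7)   # after "bsize="
            if ($i ~ /^blocks=/) blocks = substr($i, 8)   # after "blocks="
        }
        sub(/,$/, "", blocks)             # xfs_info prints "blocks=2560,"
        print blocks * bsize / 1048576
    }'
}
```

For the lab filesystem above, xfs_info /rhosp_virt | xfs_log_mib would print 10.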

Conclusion

Journaling is a safety net for your filesystem, ensuring that it can recover from crashes and system failures with minimal data loss. Whether you are using ext4 or XFS, journaling helps maintain the integrity of your files and allows for quicker recovery.

Next time you think about your filesystem, remember that journaling is quietly working in the background to keep your data safe! 🙂
