Crate sanitize_filename_reader_friendly

Converts strings into a file-system-friendly and human-readable form.

The algorithm replaces or deletes characters from the input stream by applying the following filters in order:

  1. Replace all whitespace with space.
  2. Filter all control characters.
  3. REPLACE_ORIG_WITH_UNDERSCORE
  4. REPLACE_ORIG_WITH_SPACE
  5. FILTER_PROCESSED_AFTER_LAST_PROCESSED_WAS_SPACE
  6. FILTER_PROCESSED_AFTER_LAST_PROCESSED_WAS_UNDERSCORE
  7. FILTER_ORIG_AFTER_LAST_PROCESSED_WAS_WHITESPACE
  8. FILTER_ORIG_NON_PRINTING_CHARS
  9. TRIM_LINE_CHARS
  10. INSERT_LINE_SEPARATOR
  11. TRIM_END_LINES

For details see the definition and documentation of the above (private) constants.
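
The ordering of the filters can be illustrated with a much-simplified sketch. It is not the crate's implementation: `sanitize_sketch`, `TO_UNDERSCORE` and `TRIM` are hypothetical names, the character sets are assumptions, and only a few of the listed steps are modelled (whitespace replacement, control-character filtering, underscore replacement, collapsing of repeated spaces or underscores, and a final trim).

```rust
/// Hypothetical stand-in for the characters replaced by an underscore.
/// The crate's private constants may contain a different set.
const TO_UNDERSCORE: &[char] = &['<', '>', ':', '"', '\\', '/', '|', '?', '*'];

/// Hypothetical stand-in for the characters trimmed from both ends.
const TRIM: &[char] = &[' ', '_'];

/// Simplified sketch of a sequential filter pipeline (assumption, not the
/// crate's code).
fn sanitize_sketch(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    // The last character that was actually written to `out`.
    let mut last: Option<char> = None;

    for c in input.chars() {
        // Step 1: every whitespace character becomes a plain space.
        let c = if c.is_whitespace() { ' ' } else { c };
        // Step 2: remaining control characters are dropped.
        if c.is_control() {
            continue;
        }
        // Replace characters from the stand-in set with an underscore.
        let c = if TO_UNDERSCORE.contains(&c) { '_' } else { c };
        // Drop a space that follows a space, or an underscore that follows
        // an underscore, so that runs collapse to a single character.
        if (c == ' ' || c == '_') && last == Some(c) {
            continue;
        }
        out.push(c);
        last = Some(c);
    }

    // Final trim of the stand-in trim characters on both ends.
    out.trim_matches(TRIM).to_string()
}
```

Because every step either keeps a character, replaces it with a single ASCII character, or drops it, a pipeline of this shape cannot produce more bytes than it consumes.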

§Rationale

Exclude NTFS critical characters: <>:"\/|?*
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx

These are considered unsafe in URLs: <>#%{}|\^~[] `
https://perishablepress.com/stop-using-unsafe-characters-in-urls/

New in version 2.0.0: characters that are restricted in FAT32 are no longer excluded: +,;=[]
https://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words
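
To check a candidate file name against the NTFS-critical set quoted above, a small helper along the following lines may be useful. The names `NTFS_CRITICAL` and `ntfs_critical_in` are hypothetical and not part of the crate.

```rust
/// The NTFS-critical characters quoted in the rationale: <>:"\/|?*
const NTFS_CRITICAL: &[char] = &['<', '>', ':', '"', '\\', '/', '|', '?', '*'];

/// Returns the NTFS-critical characters found in `name`, in order of
/// appearance. An empty vector means none were found.
fn ntfs_critical_in(name: &str) -> Vec<char> {
    name.chars().filter(|c| NTFS_CRITICAL.contains(c)).collect()
}
```

Applied to a raw URL such as `Read: http://blog.getreu.net/projects/tp-note/`, the helper reports two colons and five slashes; applied to `Read_ http_blog.getreu.net_projects_tp-note`, it reports nothing.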

use sanitize_filename_reader_friendly::sanitize;
let output = sanitize("Read: http://blog.getreu.net/projects/tp-note/");
assert_eq!(output, "Read_ http_blog.getreu.net_projects_tp-note");

The output string’s length is guaranteed to be less than or equal to the input string’s length.
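
The length guarantee can be spot-checked with a small helper like the one below. This is a sketch under assumptions: `grows_on` is a hypothetical name, length is measured in bytes via `str::len`, and any function with a `&str -> String` shape can be passed in (the crate's `sanitize` should fit, assuming a compatible signature).

```rust
/// Returns every input for which `f` produced an output that is longer
/// (in bytes) than the input itself. An empty result means the
/// "never grows" property held for all samples.
fn grows_on<'a>(f: impl Fn(&str) -> String, inputs: &[&'a str]) -> Vec<&'a str> {
    inputs
        .iter()
        .copied()
        .filter(|&s| f(s).len() > s.len())
        .collect()
}
```

Such a check only samples the property for the given inputs; it does not prove it.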

Constants§

  • A set of characters that is always replaced or filtered and will never appear in the output stream. Note that, in addition to these characters, all is_whitespace() characters are always replaced by a space and all is_control() characters are always filtered.
  • An unordered list of all characters that are potentially replaced under certain conditions. Note that, in addition to these characters, all is_whitespace() characters are always replaced by a space and all is_control() characters are always filtered.
  • Groups characters into lines (separated by newlines) and trims the quoted characters from both sides of every line. Whitespace is trimmed in addition to the listed characters. Because the filter operates line by line, it guarantees that none of the listed characters can appear at the beginning or at the end of the output string.
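
The line-wise trimming described in the last item can be sketched as follows. The function name `trim_lines` and the set `TRIM_SKETCH` are hypothetical; the characters actually listed in the crate's constant may differ.

```rust
/// Hypothetical trim set (assumption); whitespace is handled separately.
const TRIM_SKETCH: &[char] = &['_', '-'];

/// Splits `input` into lines and trims whitespace and the characters in
/// `TRIM_SKETCH` from both ends of every line, then rejoins the lines.
fn trim_lines(input: &str) -> String {
    input
        .lines()
        .map(|line| {
            line.trim_matches(|c: char| c.is_whitespace() || TRIM_SKETCH.contains(&c))
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Since every line is trimmed on both sides, the first and last lines are too, which is why no trimmed character can remain at either end of the whole string.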

Functions§

  • Converts strings into a file-system-friendly and human-readable form.