If `escaper::decode_html_buf_sloppy()` just truncates the text (which happens when it fails to
html-decode it at some position), then it's probably not HTML at all and should be left as
is. That's what happens with hyperlinks f.e. and there was even a test on this wrong behaviour which
is fixed now. So, now hyperlinks are not truncated in messages and should work as expected.
simplify() is written to process incoming plaintext messages
and extract footers and quotes from them.
Incoming messages contain various quote styles
and simplify() implements heuristics to detects them.
If dehtml() output is processed by simplify(),
simplify() heuristics may erroneously detect
footers and quotes in produced plaintext.
dehtml() should directly detect quotes
instead of converting them to plaintext quotes
for parsing with simplify().
* draft API to deal with uncut message texts
* add column mime_modified
* add mime_modified flag to MimeParser and save it in the database
* save mime_headers also when mime_modified is set
* cargo fmt
* set mime_modified on parsed html-texts and when there are multiple alternative-parts; add test for that
* prototype functions, add to repl and ffi
* use correct mime_modified flag
* basically parse Mime-Structure to HTML
* add basic tests for HTML-parsing
* convert text/plain to html for getting original
* respect charset for plain texts
* make test more specific
* fix handling non-utf-8 charsets for plain messages
* add test for plain_to_html()
* add failing test for plaintext linkify
* linkify urls in plain text
* fix regex
* plain text linkify: add failing test for encapsulated links as <https://domain.com>
* plain text linkify: make encapsulated links as <https://domain.com> work
* plain text linkify: require word boundary at beginning of link, add tests for that
* plain text linkify: linkify emails
* plain text: support format=flowed
* plain text: support quotes
* make clippy happy
* set mime-modified also when simplify() cuts non-html messages, add tests for that
* streamline mime recursion
* repl tool: write original html to file for further processing
* convert cid:- to data:-protocol
* add a test for cid: to data: conversion
* make clippy happy
* fix html-tests to work with windows-lineends
* clarify what the returned html-code may contain
* add some more detailed doc comments
* add mime_modified column only if not exist
this additional check is needed
as the column may added with another dbversion in
some shipped beta-versions.
* incorporate documentation suggestions from review
* rename get_original_mime_html() to more simple get_html()
* rename api is_mime_modified() to more simple has_html(); internally, mime_modified-flag stays as-is, however
* rename MimeS to MimeMultipartType
* do not set mime-modified flag for encrypted messages that need extra-handling for saved mime-structure
* fix typo
* move get_msg_html() to MsgId.get_html()
* incorporate more documentation suggestions from review
* remove unused return value from collect_texts_recursive()
* avoid mime_modified being mutable in write-parts-loop
* move 'use futures::future::FutureExt' atop of html.rs
* move attributes defining plain-text to a dedicated structure
* more PlainText to separate file
* escape cid when building regex
* let dc_get_msg_html() return NULL when calling with bad param
This fixes#1804 in two ways: First, it removes a <!doctype html> from
the start of the mail, if there is any.
Then, it parses the html itself it quick-xml fails, just stripping
everything between < and >.
Both of these would have fixed this specific issue.
Also, add tests for both fixes.