add option to access original message (#2125)

* draft API to deal with uncut message texts

* add column mime_modified

* add mime_modified flag to MimeParser and save it in the database

* save mime_headers also when mime_modified is set

* cargo fmt

* set mime_modified on parsed html-texts and when there are multiple alternative-parts; add test for that

* prototype functions, add to repl and ffi

* use correct mime_modified flag

* basically parse Mime-Structure to HTML

* add basic tests for HTML-parsing

* convert text/plain to html for getting original

* respect charset for plain texts

* make test more specific

* fix handling non-utf-8 charsets for plain messages

* add test for plain_to_html()

* add failing test for plaintext linkify

* linkify urls in plain text

* fix regex

* plain text linkify: add failing test for encapsulated links as <https://domain.com>

* plain text linkify: make encapsulated links as <https://domain.com> work

* plain text linkify: require word boundary at beginning of link, add tests for that

* plain text linkify: linkify emails

* plain text: support format=flowed

* plain text: support quotes

* make clippy happy

* set mime-modified also when simplify() cuts non-html messages, add tests for that

* streamline mime recursion

* repl tool: write original html to file for further processing

* convert cid:- to data:-protocol

* add a test for cid: to data: conversion

* make clippy happy

* fix html-tests to work with windows-lineends

* clarify what the returned html-code may contain

* add some more detailed doc comments

* add mime_modified column only if it does not exist

this additional check is needed
as the column may have been added with another dbversion in
some shipped beta-versions.

* incorporate documentation suggestions from review

* rename get_original_mime_html() to more simple get_html()

* rename api is_mime_modified() to more simple has_html(); internally, mime_modified-flag stays as-is, however

* rename MimeS to MimeMultipartType

* do not set mime-modified flag for encrypted messages that need extra-handling for saved mime-structure

* fix typo

* move get_msg_html() to MsgId.get_html()

* incorporate more documentation suggestions from review

* remove unused return value from collect_texts_recursive()

* avoid mime_modified being mutable in write-parts-loop

* move 'use futures::future::FutureExt' atop of html.rs

* move attributes defining plain-text to a dedicated structure

* move PlainText to separate file

* escape cid when building regex

* let dc_get_msg_html() return NULL when calling with bad param
bjoern
2021-01-11 17:40:35 +01:00
committed by GitHub
parent bb9e6038c4
commit e2688f6355
21 changed files with 1141 additions and 47 deletions
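
The two functions this commit adds to the C API, dc_msg_has_html() and dc_get_msg_html(), are meant to be used together: the first decides whether a "Show full message" button is offered, the second loads the HTML once the user asks for it. The following is only a minimal sketch of that flow, not code from the commit. So that it compiles on its own, it replaces deltachat.h with small stand-ins whose struct layouts are hypothetical (the real types are opaque). The helpers should_offer_full_message() and show_full_message(), the callback type render_sandboxed_fn and the sample callback record_length() are likewise assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* ---- Stand-ins so the sketch compiles without deltachat.h. ----
 * In a real UI, delete this section and include deltachat.h instead.
 * The struct layouts are hypothetical; the real types are opaque. */
typedef struct dc_context { uint32_t html_msg_id; const char* html; } dc_context_t;
typedef struct dc_msg { uint32_t id; int has_html; } dc_msg_t;

int dc_msg_has_html(dc_msg_t* msg)
{
    return (msg != NULL && msg->has_html) ? 1 : 0;
}

char* dc_get_msg_html(dc_context_t* context, uint32_t msg_id)
{
    /* Like the documented function, return NULL on errors. */
    if (context == NULL || context->html == NULL || msg_id != context->html_msg_id) {
        return NULL;
    }
    char* copy = malloc(strlen(context->html) + 1);
    if (copy != NULL) {
        strcpy(copy, context->html);
    }
    return copy;
}

void dc_str_unref(char* str)
{
    free(str);
}
/* ---- End of stand-ins. ---- */

/* Hypothetical callback type: hands the HTML to whatever sandboxed
 * viewer the UI uses (external browser, WebView without scripting). */
typedef void (*render_sandboxed_fn)(const char* html, void* userdata);

/* Sample callback that only records the HTML length; a real UI would
 * open its sandboxed viewer here. */
void record_length(const char* html, void* userdata)
{
    *(size_t*)userdata = strlen(html);
}

/* Returns 1 if the bubble should get a "Show full message" button.
 * The bubble text itself still comes from dc_msg_get_text(). */
int should_offer_full_message(dc_msg_t* msg)
{
    return dc_msg_has_html(msg) == 1;
}

/* Loads the uncut HTML and passes it to the viewer callback.
 * Returns 1 on success, 0 if no HTML could be loaded. */
int show_full_message(dc_context_t* context, uint32_t msg_id,
                      render_sandboxed_fn render, void* userdata)
{
    assert(render != NULL);
    char* html = dc_get_msg_html(context, msg_id);
    if (html == NULL) {
        return 0; /* documented error case: NULL is returned */
    }
    render(html, userdata);
    dc_str_unref(html); /* the result must be released by the caller */
    return 1;
}
```

Passing the viewer as a callback keeps the sandboxing decision in the UI layer, which is where the API documentation places it; the sketch itself never inspects or rewrites the HTML.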

@@ -1447,6 +1447,54 @@ int dc_set_chat_mute_duration (dc_context_t* context, ui
char* dc_get_msg_info (dc_context_t* context, uint32_t msg_id);
/**
* Get uncut message, if available.
*
* Delta Chat tries to break the message into simple parts such as plain text or images
* that are retrieved using dc_msg_get_viewtype(), dc_msg_get_text(), dc_msg_get_file() and so on.
* This works well for Delta Chat to Delta Chat communication.
* However, when the counterpart uses another e-mail client, this has limits:
*
* - even if we do a good job of removing quotes,
* sometimes one needs to see them
* - HTML-only messages might lose information on conversion to text,
* esp. when there are lots of embedded images
* - even if an HTML message has a plain text part,
* that part is often poor and hard to read due to long links
*
* In these cases, dc_msg_has_html() returns 1
* and you can ask dc_get_msg_html() for some HTML-code
* that shows the uncut text (which is close to the original).
* For simplicity, the function _always_ returns HTML-code;
* this removes the need for the UI
* to deal with different formatting options of PLAIN-parts.
*
* **Note:** The returned HTML-code may contain scripts,
* external images that may be misused as hidden read-receipts and so on.
* Taking care of these parts
* while maintaining compatibility with the then generated HTML-code
* is not easily doable, if at all.
* E.g. taking care of tags and attributes is not sufficient;
* we would have to deal with linked content (e.g. script, css),
* text (e.g. script-blocks) and values (e.g. javascript-protocol) as well.
* On this level, we would have to deal with encodings, browser peculiarities and so on,
* and would still risk overlooking something and breaking things.
*
* To avoid starting this cat-and-mouse game,
* and to close this issue in a sustainable way,
* it is up to the UI to display the HTML-code in an **appropriate sandbox environment**,
* e.g. an external browser or a WebView with scripting disabled.
*
* @memberof dc_context_t
* @param context The context object.
* @param msg_id The message id for which the uncut text should be loaded.
* @return Uncut text as HTML.
* In case of errors, NULL is returned.
* The result must be released using dc_str_unref().
*/
char* dc_get_msg_html (dc_context_t* context, uint32_t msg_id);
/**
* Get the raw mime-headers of the given message.
* Raw headers are saved for incoming messages
@@ -3603,6 +3651,32 @@ int dc_msg_get_videochat_type (const dc_msg_t* msg);
#define DC_VIDEOCHATTYPE_JITSI 2
/**
* Checks if the message has a full HTML version.
*
* Messages have a full HTML version
* if the original message _may_ contain important parts
* that are removed by some heuristics
* or if the message is just too long or too complex
* to get displayed properly by just using plain text.
* If so, the UI should offer a button such as
* "Show full message" that shows the uncut message using dc_get_msg_html().
*
* Even if a "Show full message" button is recommended,
* the UI should display the text in the bubble
* using the normal dc_msg_get_text() function -
* which will still be fine in many cases.
*
* @memberof dc_msg_t
* @param msg The message object.
* @return 0=Message as displayed using dc_msg_get_text() is just fine;
* 1=The message has a full HTML version;
* it should still be displayed using dc_msg_get_text(),
* and a button to show the full version should be offered.
*/
int dc_msg_has_html (dc_msg_t* msg);
/**
* Set the text of a message object.
* This does not alter any information in the database; this may be done by dc_send_msg() later.