add option to access original message (#2125)

* draft API to deal with uncut message texts

* add column mime_modified

* add mime_modified flag to MimeParser and save it in the database

* save mime_headers also when mime_modified is set

* cargo fmt

* set mime_modified on parsed html-texts and when there are multiple alternative-parts; add test for that

* prototype functions, add to repl and ffi

* use correct mime_modified flag

* basic parsing of the MIME structure to HTML

* add basic tests for HTML-parsing

* convert text/plain to html for getting original

* respect charset for plain texts

* make test more specific

* fix handling non-utf-8 charsets for plain messages

* add test for plain_to_html()

* add failing test for plaintext linkify

* linkify urls in plain text

* fix regex

* plain text linkify: add failing test for encapsulated links as <https://domain.com>

* plain text linkify: make encapsulated links as <https://domain.com> work

* plain text linkify: require word boundary at beginning of link, add tests for that

* plain text linkify: linkify emails

* plain text: support format=flowed

* plain text: support quotes

* make clippy happy

* set mime-modified also when simplify() cuts non-html messages, add tests for that

* streamline mime recursion

* repl tool: write original html to file for further processing

* convert cid: URLs to data: URLs

* add a test for cid: to data: conversion

* make clippy happy

* fix HTML tests to work with Windows line endings

* clarify what the returned html-code may contain

* add some more detailed doc comments

* add mime_modified column only if not exist

this additional check is needed
as the column may have been added with another dbversion in
some shipped beta versions.

* incorporate documentation suggestions from review

* rename get_original_mime_html() to the simpler get_html()

* rename API is_mime_modified() to the simpler has_html(); internally, however, the mime_modified flag stays as-is

* rename MimeS to MimeMultipartType

* do not set mime-modified flag for encrypted messages that need extra handling for the saved MIME structure

* fix typo

* move get_msg_html() to MsgId.get_html()

* incorporate more documentation suggestions from review

* remove unused return value from collect_texts_recursive()

* avoid mime_modified being mutable in write-parts-loop

* move 'use futures::future::FutureExt' to the top of html.rs

* move attributes defining plain-text to a dedicated structure

* move PlainText to a separate file

* escape cid when building regex

* let dc_get_msg_html() return NULL when calling with bad param
bjoern
2021-01-11 17:40:35 +01:00
committed by GitHub
parent bb9e6038c4
commit e2688f6355
21 changed files with 1141 additions and 47 deletions
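The plain-text linkify steps listed in the commit message (URLs must start at a word boundary, links wrapped as `<https://domain.com>` must work, e-mail addresses get linkified too) can be sketched as below. This is a minimal std-only sketch, not the regex-based code from the commit; `linkify_plain()` and its helpers are hypothetical names, and the sets of punctuation trimmed around a link are assumptions.

```rust
/// Escapes characters that are special in HTML text and attributes.
fn escape_html(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            _ => out.push(c),
        }
    }
    out
}

// Assumed punctuation that may wrap a link, e.g. the brackets in `<https://domain.com>`.
fn is_leading_punct(c: char) -> bool {
    matches!(c, '<' | '(' | '[' | '"' | '\'')
}

fn is_trailing_punct(c: char) -> bool {
    matches!(c, '>' | ')' | ']' | '"' | '\'' | '.' | ',' | ';' | ':' | '!' | '?')
}

fn is_url(s: &str) -> bool {
    ["https://", "http://"]
        .iter()
        .any(|&scheme| s.starts_with(scheme) && s.len() > scheme.len())
}

fn is_email(s: &str) -> bool {
    match s.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty()
                && !domain.contains('@')
                && domain.contains('.')
                && !domain.starts_with('.')
                && !domain.ends_with('.')
        }
        None => false,
    }
}

/// Hypothetical helper: linkifies one whitespace-free token, escaping the rest.
fn linkify_token(token: &str) -> String {
    let rest = token.trim_start_matches(is_leading_punct);
    let lead = &token[..token.len() - rest.len()];
    let core = rest.trim_end_matches(is_trailing_punct);
    let tail = &rest[core.len()..];
    let link = if is_url(core) {
        format!("<a href=\"{0}\">{0}</a>", escape_html(core))
    } else if is_email(core) {
        format!("<a href=\"mailto:{0}\">{0}</a>", escape_html(core))
    } else {
        return escape_html(token);
    };
    format!("{}{}{}", escape_html(lead), link, escape_html(tail))
}

/// Hypothetical entry point: escapes plain text and turns URLs/e-mails into links,
/// keeping the original whitespace between tokens.
pub fn linkify_plain(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut start = 0;
    for (i, c) in text.char_indices() {
        if c.is_whitespace() {
            out.push_str(&linkify_token(&text[start..i]));
            out.push(c);
            start = i + c.len_utf8();
        }
    }
    out.push_str(&linkify_token(&text[start..]));
    out
}
```

Splitting on whitespace gives the word-boundary rule for free: `xhttps://a.org` is never linkified because the token does not start with a scheme, while `<https://domain.com>` keeps its brackets as escaped text around the link.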

@@ -0,0 +1,53 @@
From: =?utf-8?Q?Bj=C3=B6rn_Petersen?= <somewhere-apple@me.com>
Content-Type: multipart/alternative;
	boundary="Apple-Mail=_19251BCB-E12B-423A-9553-5A68560C2AFD"
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.4\))
Subject: a jpeg
Message-Id: <BC47DA72-6C78-443A-8EBF-2CD199ABAD09@me.com>
Date: Sat, 9 Jan 2021 00:36:11 +0100
To: somewhere-nonapple@testrun.org
X-Mailer: Apple Mail (2.3608.120.23.2.4)

--Apple-Mail=_19251BCB-E12B-423A-9553-5A68560C2AFD
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
	charset=us-ascii

a jpeg
--Apple-Mail=_19251BCB-E12B-423A-9553-5A68560C2AFD
Content-Type: multipart/related;
	type="text/html";
	boundary="Apple-Mail=_4C3710FD-D75D-47FB-8D41-983220390856"

--Apple-Mail=_4C3710FD-D75D-47FB-8D41-983220390856
Content-Transfer-Encoding: 7bit
Content-Type: text/html;
	charset=us-ascii

<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"><base></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><base class=""><div class="Apple-Mail-URLShareUserContentTopClass"><br class=""></div><div class="Apple-Mail-URLShareWrapperClass"><blockquote type="cite" style="border-left-style: none; color: inherit; padding: inherit; margin: inherit;" class="">a jpeg
<img apple-inline="yes" id="118F6150-5EF5-4DE8-917F-1851EC94FB7C" src="cid:8AE052EF-BC90-486F-BB78-58D3590308EC@fritz.box" class=""></blockquote></div></body></html>
--Apple-Mail=_4C3710FD-D75D-47FB-8D41-983220390856
Content-Transfer-Encoding: base64
Content-Disposition: inline;
	filename=small.jpg
Content-Type: image/jpeg;
	x-unix-mode=0666;
	name="small.jpg"
Content-Id: <8AE052EF-BC90-486F-BB78-58D3590308EC@fritz.box>

/9j/4AAQSkZJRgABAQEBLAEsAAD/2wBDABELDA8MChEPDg8TEhEUGSobGRcXGTMkJh4qPDU/Pjs1
OjlDS2BRQ0daSDk6U3FUWmNma2xrQFB2fnRofWBpa2f/2wBDARITExkWGTEbGzFnRTpFZ2dnZ2dn
Z2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2f/wgARCAAIAAgDAREA
AhEBAxEB/8QAFAABAAAAAAAAAAAAAAAAAAAABP/EABUBAQEAAAAAAAAAAAAAAAAAAAMF/9oADAMB
AAIQAxAAAAF8s//EABQQAQAAAAAAAAAAAAAAAAAAAAD/2gAIAQEAAQUCf//EABQRAQAAAAAAAAAA
AAAAAAAAAAD/2gAIAQMBAT8Bf//EABQRAQAAAAAAAAAAAAAAAAAAAAD/2gAIAQIBAT8Bf//EABQQ
AQAAAAAAAAAAAAAAAAAAAAD/2gAIAQEABj8Cf//EABQQAQAAAAAAAAAAAAAAAAAAAAD/2gAIAQEA
AT8hf//aAAwDAQACAAMAAAAQ3//EABQRAQAAAAAAAAAAAAAAAAAAAAD/2gAIAQMBAT8Qf//EABQR
AQAAAAAAAAAAAAAAAAAAAAD/2gAIAQIBAT8Qf//EABQQAQAAAAAAAAAAAAAAAAAAAAD/2gAIAQEA
AT8Qf//Z
--Apple-Mail=_4C3710FD-D75D-47FB-8D41-983220390856--
--Apple-Mail=_19251BCB-E12B-423A-9553-5A68560C2AFD--
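The test message above references its inline image as `src="cid:8AE052EF-...@fritz.box"` and carries the matching `Content-Id` header on the `image/jpeg` part. A minimal sketch of the cid: to data: rewrite, assuming the Content-Id has already been stripped of its angle brackets and the part's base64 body is at hand; `cid_to_data()` is a hypothetical helper, not the function added by the commit.

```rust
/// Hypothetical helper: rewrites `"cid:<id>"` references in `html` into inline
/// `data:` URLs.
///
/// Each entry of `parts` is `(content_id, mime_type, base64_body)`, with the
/// angle brackets of the Content-Id header already removed.
pub fn cid_to_data(html: &str, parts: &[(&str, &str, &str)]) -> String {
    let mut out = html.to_string();
    for &(cid, mime, body) in parts {
        // Base64 transfer encoding wraps lines; a data: URL must be a single line.
        let b64: String = body.chars().filter(|c| !c.is_ascii_whitespace()).collect();
        // Matching the double-quoted form avoids rewriting a cid that is only a
        // prefix of another one. Single-quoted attributes are not handled here.
        let needle = format!("\"cid:{}\"", cid);
        let replacement = format!("\"data:{};base64,{}\"", mime, b64);
        out = out.replace(&needle, &replacement);
    }
    out
}
```

Plain string replacement needs no escaping; a regex-based version has to escape the cid first (as the commit does), since Content-Ids such as `...@fritz.box` contain regex metacharacters like `.`.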

@@ -0,0 +1,16 @@
Subject: mime-modified test
Message-ID: 12345@testrun.org
Date: Sat, 07 Dec 2019 19:00:27 +0000
To: recp@testrun.org
From: sender@testrun.org
Content-Type: multipart/alternative; boundary="==BREAK=="

--==BREAK==
Content-Type: text/html; charset=utf-8

<html>
<p>mime-modified <b>set</b>; simplify is always regarded as lossy.</p>
</html>
--==BREAK==--

@@ -0,0 +1,16 @@
Subject: mime-modified test
Message-ID: 12345@testrun.org
Date: Sat, 07 Dec 2019 19:00:27 +0000
To: recp@testrun.org
From: sender@testrun.org
Content-Type: multipart/alternative; boundary="==BREAK=="

--==BREAK==
Content-Type: text/plain; charset=utf-8

mime-modified should not be set set as there is no html and no special stuff;
although not being a delta-message.
test some special html-characters as < > and & but also " and ' :)
--==BREAK==--
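This message exercises HTML escaping when plain text is converted to HTML for getting the original. A minimal sketch of that step, escaping the five special characters and emitting `<br/>` per line; `plain_to_html_body()` is a hypothetical name, and the entity chosen for `'` is an assumption.

```rust
/// Hypothetical helper: converts plain text into an HTML body fragment by
/// escaping special characters and ending every line with `<br/>`.
pub fn plain_to_html_body(text: &str) -> String {
    let mut html = String::with_capacity(text.len() + 16);
    // `str::lines()` splits on '\n' and drops a trailing '\r', so Windows
    // line endings produce the same output as Unix ones.
    for line in text.lines() {
        for c in line.chars() {
            match c {
                '&' => html.push_str("&amp;"),
                '<' => html.push_str("&lt;"),
                '>' => html.push_str("&gt;"),
                '"' => html.push_str("&quot;"),
                '\'' => html.push_str("&#x27;"),
                _ => html.push(c),
            }
        }
        html.push_str("<br/>\n");
    }
    html
}
```

Escaping `&` is what makes the rest safe: once it becomes `&amp;`, no sequence in the input can be misread as an entity.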

@@ -0,0 +1,23 @@
Subject: mime-modified test
Message-ID: 12345@testrun.org
Date: Sat, 07 Dec 2019 19:00:27 +0000
To: recp@testrun.org
From: sender@testrun.org
Content-Type: multipart/alternative; boundary="==BREAK=="

--==BREAK==
Content-Type: text/plain; charset=utf-8

this is plain
--==BREAK==
Content-Type: text/html; charset=utf-8

<html>
<p>
this is <b>html</b>
</p>
</html>
--==BREAK==--
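Of the three multipart/alternative messages above, the HTML-only one and the plain+HTML one are meant to set mime_modified, while the single plain-text part should leave it unset. The rules from the commit message can be condensed into one predicate. `PartSummary` and `mime_modified()` are hypothetical; flattening the conditions into a single struct is a simplification for illustration, not how MimeParser is structured.

```rust
/// Hypothetical summary of what parsing one message revealed.
pub struct PartSummary {
    /// A text/html part was used for the displayed text.
    pub has_html: bool,
    /// Number of parts inside a multipart/alternative container.
    pub alternative_parts: usize,
    /// simplify() cut text (quotes, footers, ...) from a non-HTML message.
    pub simplify_cut_text: bool,
    /// Encrypted message whose saved MIME structure would need extra handling.
    pub encrypted_needs_extra_handling: bool,
}

/// Returns whether the displayed text may differ from the original,
/// i.e. whether offering the original HTML makes sense.
pub fn mime_modified(s: &PartSummary) -> bool {
    if s.encrypted_needs_extra_handling {
        return false;
    }
    s.has_html || s.alternative_parts > 1 || s.simplify_cut_text
}
```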

@@ -0,0 +1,11 @@
Subject: mime-modified test
Message-ID: 12345@testrun.org
Date: Sat, 07 Dec 2019 19:00:27 +0000
To: recp@testrun.org
From: sender@testrun.org
Content-Type: text/html; charset=utf-8

<html>
<p>mime-modified <b>set</b>; simplify is always regarded as lossy.</p>
</html>

@@ -0,0 +1,11 @@
Message-Id: <lkjsdf01u@example.org>
Date: Sat, 14 Sep 2019 19:00:13 +0200
From: lmn <x@tux.org>
To: abc <abc@bcd.com>
Content-Type: text/plain; charset=utf-8; format=flowed

This line ends with a space 
and will be merged with the next one due to format=flowed.
This line does not end with a space
and will be wrapped as usual.
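Per RFC 3676, a line ending in a space is a soft line break in `format=flowed` text, so the first two body lines above are meant to be joined while the last two stay separate. A minimal sketch of unflowing, assuming `DelSp=no` and ignoring quote depth; `unflow()` is a hypothetical helper.

```rust
/// Hypothetical helper: unwraps a `format=flowed` body (RFC 3676, DelSp=no,
/// no quote-depth handling).
pub fn unflow(body: &str) -> String {
    let mut out = String::with_capacity(body.len());
    for raw in body.lines() {
        // Undo space-stuffing: the sender adds one leading space to some lines.
        let line = raw.strip_prefix(' ').unwrap_or(raw);
        out.push_str(line);
        // A trailing space marks a soft break, except for the signature separator "-- ".
        let soft_break = line.ends_with(' ') && line != "-- ";
        if !soft_break {
            out.push('\n');
        }
    }
    out
}
```

With `DelSp=no` the trailing space is kept, which is why the joined line reads "space and" rather than "spaceand".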

@@ -0,0 +1,7 @@
Message-Id: <lkjsdf01u@example.org>
Date: Sat, 14 Sep 2019 19:00:13 +0200
From: lmn <x@tux.org>
To: abc <abc@bcd.com>
Content-Type: text/plain; charset=iso-8859-1

message with a non-UTF-8 encoding: <20><><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>
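The body above is ISO-8859-1 encoded, so its non-ASCII bytes cannot be shown as UTF-8 here. Decoding Latin-1 needs no lookup table because every byte value equals its Unicode code point (U+0000 to U+00FF). A minimal sketch; `decode_latin1()` is a hypothetical name, and the commit may well rely on a charset-decoding crate instead.

```rust
/// Hypothetical helper: decodes ISO-8859-1 (Latin-1) bytes into a UTF-8 String.
/// Each byte maps directly to the Unicode scalar value with the same number.
pub fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}
```

This does not cover windows-1252, which assigns printable characters such as `€` to bytes 0x80 to 0x9F where Latin-1 has control codes.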

@@ -0,0 +1,6 @@
Message-Id: <lkjsdf01u@example.org>
Date: Sat, 14 Sep 2019 19:00:13 +0200
From: lmn <x@tux.org>
To: abc <abc@bcd.com>

This message does not have Content-Type nor Subject.
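Without a Content-Type header, RFC 2045 says a message defaults to `text/plain; charset=us-ascii`, which is how a parser can still treat the message above as plain text. A minimal sketch of that fallback; `content_type()` is hypothetical and ignores folded (continued) header lines.

```rust
/// Hypothetical helper: returns the Content-Type value from the header block
/// (the lines before the first empty line), or the RFC 2045 default.
pub fn content_type(raw: &str) -> String {
    for line in raw.lines() {
        // An empty line ends the header section.
        if line.is_empty() {
            break;
        }
        if let Some((name, value)) = line.split_once(':') {
            // Header names are case-insensitive.
            if name.trim().eq_ignore_ascii_case("content-type") {
                return value.trim().to_string();
            }
        }
    }
    "text/plain; charset=us-ascii".to_string()
}
```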