Improve onboarding by scanning all folders from time to time (#2067)

Start implementing #1994

TODO (in later PRs):

- Add a hint to the watch settings that all folders are fetched from time to time (to be done in the individual UIs)
- folder names are case-insensitive, so double-check that all comparisons are case-insensitive
- The `scan_folders.rs` file didn't get as large as I expected and it's probably not worth it having an extra file for it. But if there are no objections, I'll make another PR to rename it to `folders.rs` and also put into it `configure_folders()` from `imap/mod.rs` and `needs_move()` with all its tests from `message.rs`.

Done:

- Most mailboxes have a "Drafts" folder where constantly new emails appear but we don't actually want to show them, what do we do about this? The most reliable way to detect such messages that we found up to now is:
  If there is no `Received` header AND it's not in the `ConfiguredSentbox`, then ignore the email.
- before or after INBOX idle trigger a new "scan all folders for messages".  It does a "list folders" and then goes through all folders with select-statements, checking if "next-uid" was changed since checked last time.  This might be batchable but in any case should not consume a lot of traffic. We might debounce this scan activity to happen at most every N minutes 

- if next-uid changed for a folder, we "prefetch" and "fetch" DC-messages as as needed ("dc-messages" are not just those with "Chat-Version" headers, but can also be regular emails) 

- if we discover DC-messages in folders that have the "/Spam" flag (maybe excluding ContactRequests) we automatically move them to INBOX/DeltaChat folder to help  provider-spam-systems to regard this contact/mail as non-spam 

- for now, we do not change any user visible option, but introduce this "scan all" automatically and on top of what exists.   The DeltaChat folder-watching does not perform scan-all-folders (maybe with the exception to trigger scan-all also with DeltaChat if INBOX is not watched)

- Tests (except if you have ideas to improve them)
- all folders, their last uidvalidity, next-seen etc. are kept in a separate "imap-sync" sqlite table.  Maybe this can be used to streamline some of the "Sent" folder and "DeltaChat" folder detection code we already have. 

- We now also move self-sent messages from the Inbox to the Sent folder if `mvbox_move` is off, as this was very easy to do now. This way, we now behave more like a normal MUA if the user wants this.

FOR LATER PRs:

- maybe for the first 50 messages or so, we could reduce the IDLE-timeout (currently 23 minutes or so) to faster detect messages sent to non-inbox folders. However, on Android and iOS, we would likely trigger scan-all when the app moves to foreground, and so  it might not be neccessary to reduce the current idle-timeout at least for them.  We can leave this "faster discovery" question for the end, after we move to real-life testing. 

- (Later on, after the above works, we can consider heuristics on which folders to perform IDLE on, and remove the Watch-folder options (inbox, deltachat, sent). We tried to find a safe scheme for already doing it but failed to fine one, too many unknowns, also some questions regarding multi-device (you might have different settings with each of it, one moves, the other doesn't etc.) so we postponed this in favor of the above incremental suggestion.)




* Start implementing #1994

* Add debug logs, it seems like the SQL migration can go into another pr

* Let fetch_new_messages return whether there are new emails

* Code style

* Don't prefetch if there are no new emails

* clippy

* Even more debug logs

* If the folder was not newly selected, return always try to fetch as

uid_next is probably outdated

* Fix new bug

* Recognize spam folder

* if we discover DC-messages in folders that have the "/Spam" flag (excluding ContactRequests) we automatically move them to INBOX/DeltaChat folder to help provider-spam-systems to regard this contact/mail as non-spam

* Clippy, prioritize folder_meaning over folder_name_meaning

* Add a first test, for the first day after installation only debounce to 2s

* Start adding two tests (both of them fail)

* Don't abort folder scan if one folder fails

* More consts

* Replace bool return value by enum

* Split test up into multiple tests

* Print logs during rust tests

* Rust tests pass now

* .

* One of the Python tests passes now - reconfigure folders during scanning

* Make the last test pass - Delete emails in all folders when starting the test, not only inbox and mvbox

The problem had been that emails were left in the folder "xyz"

* lint

* DB migration (untested)

* Store uid_next in SQL instead of lastseen in a config

* Revert "If Inbox-watch is disabled and enabled again, do not fetch emails from in between"

all folders are always watched, anyway

* clippy, rm debug logs, comments

* Codestyle, comments

* fixing things again

* Fix another test: don't fetch from uid_next-1 but uid_next; make some {} to {:#} so that we can use `.context(...)`

* move self-sent, non-setupmessage chat messages to the Sent folder if `mvbox_move` is off

* comment

* Comments, make sure things work even if there is no uid_next

* Style

* Comments

* The rust test tested wrongly

* comments, small codestyle change

* Ignore emails that are probably only drafts

Most mailboxes have a "Drafts" folder where constantly new emails appear
but we don't actually want to show them.
So: If there is no Received header AND it's not in the ConfiguredSentbox,
then ignore the email.

Also: Add test.

* Fix occasional test failure, it was introduced as DC now moves messages from Inbox to Sent

* Add `Received` header to the rust tests

* After this PR we will always watch all folders and delete messages there if server_delete is enabled. So, for people who have server_delete on, disable it and add a hint to the devicechat

* comment, small fix

* link2xt's first review

* Use ON CONFLICT(FOLDER) DO to update and if it doesn't exist, then insert

Reason from link2xt: We had a problem with multiple peerstates inserted due to key fingerprint parsing error previously. With logic in Rust a similar problem can occur: an UPDATE can fail for reasons other than a conflict. PRIMARY KEY should ensure uniqueness in this case, but anyway.

* Remove two TODO statements, remove fetch_new_messages: ignoring uid {}, uid_next was {} log

* Next TODO: Make uidvalidity and uid_next DEFAULT 0

* rm two TODOs, Seems like we are not going to `exclude folders that are watched anyway` in this PR

* small tweak: Handle instants more carefully

* Add scan_all_folders_debounce_secs config for tests, set debounce to 60s (before it was just 2s during the first day)

* Don't use bold letters for the device message

* React to changes in the folders better

Before, if there was a configured Sent folder, but then it got
removed and replaced with another folder with a name meaning "Sent" but
without Sent flag, it would be ignored.

So, instead of checking against ConfiguredSentboxFolder,
create two Option variables at the beginning of the loop and replace
them with Some if it is None. At the end of the loop, store the new
values into ConfiguredSendboxFolder and ConfiguredSpamFolder, even if it
is None.

Also, derive some useful traits.

* move job: Return a meaningful error if server_folder is None instead of panicing

* small error-handling fix

* Fix test_fetch_existing() python test

Before, we sometimes got a race condition where scan_folders() sees that
there is a Sentbox and saves this info after we set the
ConfiguredSentbox to None and before the message is sent.

So, just expect that the message is moved to the sentbox.

* migration is 72 now

* rm 2 TODOs, Don't infinitely retry when dc_receive_imf() returns Err

* clippy: Remove glob imports

* Delete created folders at the beginning of tests

(some created folders made problems in the next tests because)

* Improve resetting accounts between tests
This commit is contained in:
Hocuri
2021-01-13 14:09:51 +01:00
committed by GitHub
parent 3c387a3cb3
commit ebccdbbcb9
27 changed files with 1095 additions and 205 deletions

View File

@@ -579,12 +579,12 @@ class Account(object):
raise ValueError("account not configured, cannot start io")
lib.dc_start_io(self._dc_context)
def configure(self):
def configure(self, reconfigure=False):
""" Start configuration process and return a Configtracker instance
on which you can block with wait_finish() to get a True/False success
value for the configuration process.
"""
assert not self.is_configured()
assert self.is_configured() == reconfigure
if not self.get_config("addr") or not self.get_config("mail_pw"):
raise MissingCredentials("addr or mail_pwd not set in config")
configtracker = ConfigureTracker(self)

View File

@@ -9,6 +9,7 @@ import ssl
import pathlib
from imapclient import IMAPClient
from imapclient.exceptions import IMAPClientError
import imaplib
import deltachat
from deltachat import const
@@ -25,13 +26,29 @@ def dc_account_extra_configure(account):
""" Reset the account (we reuse accounts across tests)
and make 'account.direct_imap' available for direct IMAP ops.
"""
if not hasattr(account, "direct_imap"):
imap = DirectImap(account)
if imap.select_config_folder("mvbox"):
imap.delete(ALL, expunge=True)
assert imap.select_config_folder("inbox")
imap.delete(ALL, expunge=True)
setattr(account, "direct_imap", imap)
try:
if not hasattr(account, "direct_imap"):
imap = DirectImap(account)
for folder in imap.list_folders():
if folder.lower() == "inbox" or folder.lower() == "deltachat":
assert imap.select_folder(folder)
imap.delete(ALL, expunge=True)
else:
imap.conn.delete_folder(folder)
# We just deleted the folder, so we have to make DC forget about it, too
if account.get_config("configured_sentbox_folder") == folder:
account.set_config("configured_sentbox_folder", None)
if account.get_config("configured_spam_folder") == folder:
account.set_config("configured_spam_folder", None)
setattr(account, "direct_imap", imap)
except Exception as e:
# Uncaught exceptions here would lead to a timeout without any note written to the log
account.log("=============================== CAN'T RESET ACCOUNT: ===============================")
account.log("===================", e, "===================")
@deltachat.global_hookimpl
@@ -90,6 +107,12 @@ class DirectImap:
except (OSError, IMAPClientError):
print("Could not logout direct_imap conn")
def create_folder(self, foldername):
try:
self.conn.create_folder(foldername)
except imaplib.IMAP4.error as e:
print("Can't create", foldername, "probably it already exists:", str(e))
def select_folder(self, foldername):
assert not self._idling
return self.conn.select_folder(foldername)
@@ -240,3 +263,9 @@ class DirectImap:
res = self.conn.idle_done()
self._idling = False
return res
def append(self, folder, msg):
if msg.startswith("\n"):
msg = msg[1:]
msg = '\n'.join([s.lstrip() for s in msg.splitlines()])
self.conn.append(folder, msg)

View File

@@ -359,11 +359,7 @@ def acfactory(pytestconfig, tmpdir, request, session_liveconfig, data):
def wait_configure_and_start_io(self):
started_accounts = []
for acc in self._accounts:
if hasattr(acc, "_configtracker"):
acc._configtracker.wait_finish()
acc._evtracker.consume_events()
acc.get_device_chat().mark_noticed()
del acc._configtracker
self.wait_configure(acc)
acc.set_config("bcc_self", "0")
if acc.is_configured() and not acc.is_started():
acc.start_io()
@@ -373,6 +369,13 @@ def acfactory(pytestconfig, tmpdir, request, session_liveconfig, data):
for acc in started_accounts:
acc._evtracker.wait_all_initial_fetches()
def wait_configure(self, acc):
if hasattr(acc, "_configtracker"):
acc._configtracker.wait_finish()
acc._evtracker.consume_events()
acc.get_device_chat().mark_noticed()
del acc._configtracker
def run_bot_process(self, module, ffi=True):
fn = module.__file__

View File

@@ -1136,6 +1136,51 @@ class TestOnlineAccount:
assert not device_chat.can_send()
assert device_chat.get_draft() is None
def test_dont_show_emails_in_draft_folder(self, acfactory):
"""Most mailboxes have a "Drafts" folder where constantly new emails appear but we don't actually want to show them.
So: If there is no Received header AND it's not in the sentbox, then ignore the email."""
ac1 = acfactory.get_online_configuring_account()
ac1.set_config("show_emails", "2")
acfactory.wait_configure(ac1)
ac1.direct_imap.create_folder("Drafts")
ac1.direct_imap.create_folder("Sent")
acfactory.wait_configure_and_start_io()
# Wait until each folder was selected once and we are IDLEing again:
ac1._evtracker.get_info_contains("INBOX: Idle entering wait-on-remote state")
ac1.stop_io()
ac1.direct_imap.append("Drafts", """
From: Bob <bob@example.org>
Subject: subj
To: alice@example.com
Message-ID: <aepiors@example.org>
Content-Type: text/plain; charset=utf-8
message in Drafts
""")
ac1.direct_imap.append("Sent", """
From: Bob <bob@example.org>
Subject: subj
To: alice@example.com
Message-ID: <hsabaeni@example.org>
Content-Type: text/plain; charset=utf-8
message in Sent
""")
ac1.set_config("scan_all_folders_debounce_secs", "0")
ac1.start_io()
msg = ac1._evtracker.wait_next_messages_changed()
# Wait until each folder was scanned, this is necessary for this test to test what it should test:
ac1._evtracker.get_info_contains("INBOX: Idle entering wait-on-remote state")
assert msg.text == "subj message in Sent"
assert len(msg.chat.get_messages()) == 1
def test_prefer_encrypt(self, acfactory, lp):
"""Test quorum rule for encryption preference in 1:1 and group chat."""
ac1, ac2, ac3 = acfactory.get_many_online_accounts(3)
@@ -2088,26 +2133,82 @@ class TestOnlineAccount:
assert received_reply.quoted_text == "hello"
assert received_reply.quote.id == out_msg.id
@pytest.mark.parametrize("folder,move,expected_destination,", [
("xyz", False, "xyz"), # Test that emails are recognized in a random folder but not moved
("xyz", True, "DeltaChat"), # ...emails are found in a random folder and moved to DeltaChat
("Spam", False, "INBOX") # ...emails are moved from the spam folder to the Inbox
])
# Testrun.org does not support the CREATE-SPECIAL-USE capability, which means that we can't create a folder with
# the "\Junk" flag (see https://tools.ietf.org/html/rfc6154). So, we can't test spam folder detection by flag.
def test_scan_folders(self, acfactory, lp, folder, move, expected_destination):
"""Delta Chat periodically scans all folders for new messages to make sure we don't miss any."""
variant = folder + "-" + str(move) + "-" + expected_destination
lp.sec("Testing variant " + variant)
ac1 = acfactory.get_online_configuring_account(move=move)
ac2 = acfactory.get_online_configuring_account()
acfactory.wait_configure(ac1)
ac1.direct_imap.create_folder(folder)
acfactory.wait_configure_and_start_io()
# Wait until each folder was selected once and we are IDLEing:
ac1._evtracker.get_info_contains("INBOX: Idle entering wait-on-remote state")
ac1.stop_io()
# Send a message to ac1 and move it to the mvbox:
ac1.direct_imap.select_config_folder("inbox")
ac1.direct_imap.idle_start()
acfactory.get_accepted_chat(ac2, ac1).send_text("hello")
ac1.direct_imap.idle_check(terminate=True)
ac1.direct_imap.conn.move(["*"], folder) # "*" means "biggest UID in mailbox"
lp.sec("Everything prepared, now see if DeltaChat finds the message (" + variant + ")")
ac1.set_config("scan_all_folders_debounce_secs", "0")
ac1.start_io()
msg = ac1._evtracker.wait_next_incoming_message()
assert msg.text == "hello"
# Wait until the message was moved (if at all) and we are IDLEing again:
ac1._evtracker.get_info_contains("INBOX: Idle entering wait-on-remote state")
ac1.direct_imap.select_folder(expected_destination)
assert len(ac1.direct_imap.get_all_messages()) == 1
if folder != expected_destination:
ac1.direct_imap.select_folder(folder)
assert len(ac1.direct_imap.get_all_messages()) == 0
@pytest.mark.parametrize("mvbox_move", [False, True])
def test_fetch_existing(self, acfactory, lp, mvbox_move):
"""Delta Chat reads the recipients from old emails sent by the user and adds them as contacts.
This way, we can already offer them some email addresses they can write to.
Also test that existing emails are fetched during onboarding.
Also, the newest existing emails from each folder are fetched during onboarding.
Lastly, tests that bcc_self messages moved to the mvbox are marked as read."""
Additionally tests that bcc_self messages moved to the mvbox are marked as read."""
ac1 = acfactory.get_online_configuring_account(mvbox=mvbox_move, move=mvbox_move)
ac2 = acfactory.get_online_configuring_account()
acfactory.wait_configure(ac1)
ac1.direct_imap.create_folder("Sent")
ac1.set_config("sentbox_watch", "1")
# We need to reconfigure to find the new "Sent" folder.
# `scan_folders()`, which runs automatically shortly after `start_io()` is invoked,
# would also find the "Sent" folder, but it would be too late:
# The sentbox thread, started by `start_io()`, would have seen that there is no
# ConfiguredSentboxFolder and do nothing.
ac1._configtracker = ac1.configure(reconfigure=True)
acfactory.wait_configure_and_start_io()
chat = acfactory.get_accepted_chat(ac1, ac2)
lp.sec("send out message with bcc to ourselves")
if mvbox_move:
ac1.direct_imap.select_config_folder("mvbox")
else:
ac1.direct_imap.select_config_folder("sentbox")
ac1.direct_imap.idle_start()
lp.sec("send out message with bcc to ourselves")
ac1.set_config("bcc_self", "1")
chat = acfactory.get_accepted_chat(ac1, ac2)
chat.send_text("message text")
# now wait until the bcc_self message arrives