Files
chatmail-core/python/src/deltachat/direct_imap.py
Hocuri ebccdbbcb9 Improve onboarding by scanning all folders from time to time (#2067)
Start implementing #1994

TODO (in later PRs):

- Add a hint to the watch settings that all folders are fetched from time to time (to be done in the individual UIs)
- folder names are case-insensitive, so double-check that all comparisons are case-insensitive
- The `scan_folders.rs` file didn't get as large as I expected and it's probably not worth it having an extra file for it. But if there are no objections, I'll make another PR to rename it to `folders.rs` and also put into it `configure_folders()` from `imap/mod.rs` and `needs_move()` with all its tests from `message.rs`.

Done:

- Most mailboxes have a "Drafts" folder where constantly new emails appear but we don't actually want to show them, what do we do about this? The most reliable way to detect such messages that we found up to now is:
  If there is no `Received` header AND it's not in the `ConfiguredSentbox`, then ignore the email.
- before or after INBOX idle trigger a new "scan all folders for messages".  It does a "list folders" and then goes through all folders with select-statements, checking if "next-uid" was changed since checked last time.  This might be batchable but in any case should not consume a lot of traffic. We might debounce this scan activity to happen at most every N minutes 

- if next-uid changed for a folder, we "prefetch" and "fetch" DC-messages as as needed ("dc-messages" are not just those with "Chat-Version" headers, but can also be regular emails) 

- if we discover DC-messages in folders that have the "/Spam" flag (maybe excluding ContactRequests) we automatically move them to INBOX/DeltaChat folder to help  provider-spam-systems to regard this contact/mail as non-spam 

- for now, we do not change any user visible option, but introduce this "scan all" automatically and on top of what exists.   The DeltaChat folder-watching does not perform scan-all-folders (maybe with the exception to trigger scan-all also with DeltaChat if INBOX is not watched)

- Tests (except if you have ideas to improve them)
- all folders, their last uidvalidity, next-seen etc. are kept in a separate "imap-sync" sqlite table.  Maybe this can be used to streamline some of the "Sent" folder and "DeltaChat" folder detection code we already have. 

- We now also move self-sent messages from the Inbox to the Sent folder if `mvbox_move` is off, as this was very easy to do now. This way, we now behave more like a normal MUA if the user wants this.

FOR LATER PRs:

- maybe for the first 50 messages or so, we could reduce the IDLE-timeout (currently 23 minutes or so) to faster detect messages sent to non-inbox folders. However, on Android and iOS, we would likely trigger scan-all when the app moves to foreground, and so  it might not be neccessary to reduce the current idle-timeout at least for them.  We can leave this "faster discovery" question for the end, after we move to real-life testing. 

- (Later on, after the above works, we can consider heuristics on which folders to perform IDLE on, and remove the Watch-folder options (inbox, deltachat, sent). We tried to find a safe scheme for already doing it but failed to fine one, too many unknowns, also some questions regarding multi-device (you might have different settings with each of it, one moves, the other doesn't etc.) so we postponed this in favor of the above incremental suggestion.)




* Start implementing #1994

* Add debug logs, it seems like the SQL migration can go into another pr

* Let fetch_new_messages return whether there are new emails

* Code style

* Don't prefetch if there are no new emails

* clippy

* Even more debug logs

* If the folder was not newly selected, return always try to fetch as

uid_next is probably outdated

* Fix new bug

* Recognize spam folder

* if we discover DC-messages in folders that have the "/Spam" flag (excluding ContactRequests) we automatically move them to INBOX/DeltaChat folder to help provider-spam-systems to regard this contact/mail as non-spam

* Clippy, prioritize folder_meaning over folder_name_meaning

* Add a first test, for the first day after installation only debounce to 2s

* Start adding two tests (both of them fail)

* Don't abort folder scan if one folder fails

* More consts

* Replace bool return value by enum

* Split test up into multiple tests

* Print logs during rust tests

* Rust tests pass now

* .

* One of the Python tests passes now - reconfigure folders during scanning

* Make the last test pass - Delete emails in all folders when starting the test, not only inbox and mvbox

The problem had been that emails were left in the folder "xyz"

* lint

* DB migration (untested)

* Store uid_next in SQL instead of lastseen in a config

* Revert "If Inbox-watch is disabled and enabled again, do not fetch emails from in between"

all folders are always watched, anyway

* clippy, rm debug logs, comments

* Codestyle, comments

* fixing things again

* Fix another test: don't fetch from uid_next-1 but uid_next; make some {} to {:#} so that we can use `.context(...)`

* move self-sent, non-setupmessage chat messages to the Sent folder if `mvbox_move` is off

* comment

* Comments, make sure things work even if there is no uid_next

* Style

* Comments

* The rust test tested wrongly

* comments, small codestyle change

* Ignore emails that are probably only drafts

Most mailboxes have a "Drafts" folder where constantly new emails appear
but we don't actually want to show them.
So: If there is no Received header AND it's not in the ConfiguredSentbox,
then ignore the email.

Also: Add test.

* Fix occasional test failure, it was introduced as DC now moves messages from Inbox to Sent

* Add `Received` header to the rust tests

* After this PR we will always watch all folders and delete messages there if server_delete is enabled. So, for people who have server_delete on, disable it and add a hint to the devicechat

* comment, small fix

* link2xt's first review

* Use ON CONFLICT(FOLDER) DO to update and if it doesn't exist, then insert

Reason from link2xt: We had a problem with multiple peerstates inserted due to key fingerprint parsing error previously. With logic in Rust a similar problem can occur: an UPDATE can fail for reasons other than a conflict. PRIMARY KEY should ensure uniqueness in this case, but anyway.

* Remove two TODO statements, remove fetch_new_messages: ignoring uid {}, uid_next was {} log

* Next TODO: Make uidvalidity and uid_next DEFAULT 0

* rm two TODOs, Seems like we are not going to `exclude folders that are watched anyway` in this PR

* small tweak: Handle instants more carefully

* Add scan_all_folders_debounce_secs config for tests, set debounce to 60s (before it was just 2s during the first day)

* Don't use bold letters for the device message

* React to changes in the folders better

Before, if there was a configured Sent folder, but then it got
removed and replaced with another folder with a name meaning "Sent" but
without Sent flag, it would be ignored.

So, instead of checking against ConfiguredSentboxFolder,
create two Option variables at the beginning of the loop and replace
them with Some if it is None. At the end of the loop, store the new
values into ConfiguredSendboxFolder and ConfiguredSpamFolder, even if it
is None.

Also, derive some useful traits.

* move job: Return a meaningful error if server_folder is None instead of panicing

* small error-handling fix

* Fix test_fetch_existing() python test

Before, we sometimes got a race condition where scan_folders() sees that
there is a Sentbox and saves this info after we set the
ConfiguredSentbox to None and before the message is sent.

So, just expect that the message is moved to the sentbox.

* migration is 72 now

* rm 2 TODOs, Don't infinitely retry when dc_receive_imf() returns Err

* clippy: Remove glob imports

* Delete created folders at the beginning of tests

(some created folders made problems in the next tests because)

* Improve resetting accounts between tests
2021-01-13 14:09:51 +01:00

272 lines
9.3 KiB
Python

"""
Internal Python-level IMAP handling used by the testplugin
and for cleaning up inbox/mvbox for each test function run.
"""
import io
import email
import ssl
import pathlib
from imapclient import IMAPClient
from imapclient.exceptions import IMAPClientError
import imaplib
import deltachat
from deltachat import const
SEEN = b'\\Seen'
DELETED = b'\\Deleted'
FLAGS = b'FLAGS'
FETCH = b'FETCH'
ALL = "1:*"
@deltachat.global_hookimpl
def dc_account_extra_configure(account):
""" Reset the account (we reuse accounts across tests)
and make 'account.direct_imap' available for direct IMAP ops.
"""
try:
if not hasattr(account, "direct_imap"):
imap = DirectImap(account)
for folder in imap.list_folders():
if folder.lower() == "inbox" or folder.lower() == "deltachat":
assert imap.select_folder(folder)
imap.delete(ALL, expunge=True)
else:
imap.conn.delete_folder(folder)
# We just deleted the folder, so we have to make DC forget about it, too
if account.get_config("configured_sentbox_folder") == folder:
account.set_config("configured_sentbox_folder", None)
if account.get_config("configured_spam_folder") == folder:
account.set_config("configured_spam_folder", None)
setattr(account, "direct_imap", imap)
except Exception as e:
# Uncaught exceptions here would lead to a timeout without any note written to the log
account.log("=============================== CAN'T RESET ACCOUNT: ===============================")
account.log("===================", e, "===================")
@deltachat.global_hookimpl
def dc_account_after_shutdown(account):
""" shutdown the imap connection if there is one. """
imap = getattr(account, "direct_imap", None)
if imap is not None:
imap.shutdown()
del account.direct_imap
class DirectImap:
def __init__(self, account):
self.account = account
self.logid = account.get_config("displayname") or id(account)
self._idling = False
self.connect()
def connect(self):
host = self.account.get_config("configured_mail_server")
port = int(self.account.get_config("configured_mail_port"))
security = int(self.account.get_config("configured_mail_security"))
user = self.account.get_config("addr")
pw = self.account.get_config("mail_pw")
if security == const.DC_SOCKET_PLAIN:
ssl_context = None
else:
ssl_context = ssl.create_default_context()
# don't check if certificate hostname doesn't match target hostname
ssl_context.check_hostname = False
# don't check if the certificate is trusted by a certificate authority
ssl_context.verify_mode = ssl.CERT_NONE
if security == const.DC_SOCKET_STARTTLS:
self.conn = IMAPClient(host, port, ssl=False)
self.conn.starttls(ssl_context)
elif security == const.DC_SOCKET_PLAIN:
self.conn = IMAPClient(host, port, ssl=False)
elif security == const.DC_SOCKET_SSL:
self.conn = IMAPClient(host, port, ssl_context=ssl_context)
self.conn.login(user, pw)
self.select_folder("INBOX")
def shutdown(self):
try:
self.conn.idle_done()
except (OSError, IMAPClientError):
pass
try:
self.conn.logout()
except (OSError, IMAPClientError):
print("Could not logout direct_imap conn")
def create_folder(self, foldername):
try:
self.conn.create_folder(foldername)
except imaplib.IMAP4.error as e:
print("Can't create", foldername, "probably it already exists:", str(e))
def select_folder(self, foldername):
assert not self._idling
return self.conn.select_folder(foldername)
def select_config_folder(self, config_name):
""" Return info about selected folder if it is
configured, otherwise None. """
if "_" not in config_name:
config_name = "configured_{}_folder".format(config_name)
foldername = self.account.get_config(config_name)
if foldername:
return self.select_folder(foldername)
def list_folders(self):
""" return list of all existing folder names"""
assert not self._idling
folders = []
for meta, sep, foldername in self.conn.list_folders():
folders.append(foldername)
return folders
def delete(self, range, expunge=True):
""" delete a range of messages (imap-syntax).
If expunge is true, perform the expunge-operation
to make sure the messages are really gone and not
just flagged as deleted.
"""
self.conn.set_flags(range, [DELETED])
if expunge:
self.conn.expunge()
def get_all_messages(self):
assert not self._idling
# Flush unsolicited responses. IMAPClient has problems
# dealing with them: https://github.com/mjs/imapclient/issues/334
# When this NOOP was introduced, next FETCH returned empty
# result instead of a single message, even though IMAP server
# can only return more untagged responses than required, not
# less.
self.conn.noop()
return self.conn.fetch(ALL, [FLAGS])
def get_unread_messages(self):
assert not self._idling
res = self.conn.fetch(ALL, [FLAGS])
return [uid for uid in res
if SEEN not in res[uid][FLAGS]]
def mark_all_read(self):
messages = self.get_unread_messages()
if messages:
res = self.conn.set_flags(messages, [SEEN])
print("marked seen:", messages, res)
def get_unread_cnt(self):
return len(self.get_unread_messages())
def dump_account_info(self, logfile):
def log(*args, **kwargs):
kwargs["file"] = logfile
print(*args, **kwargs)
cursor = 0
for name, val in self.account.get_info().items():
entry = "{}={}".format(name.upper(), val)
if cursor + len(entry) > 80:
log("")
cursor = 0
log(entry, end=" ")
cursor += len(entry) + 1
log("")
def dump_imap_structures(self, dir, logfile):
assert not self._idling
stream = io.StringIO()
def log(*args, **kwargs):
kwargs["file"] = stream
print(*args, **kwargs)
empty_folders = []
for imapfolder in self.list_folders():
self.select_folder(imapfolder)
messages = list(self.get_all_messages())
if not messages:
empty_folders.append(imapfolder)
continue
log("---------", imapfolder, len(messages), "messages ---------")
# get message content without auto-marking it as seen
# fetching 'RFC822' would mark it as seen.
requested = [b'BODY.PEEK[]', FLAGS]
for uid, data in self.conn.fetch(messages, requested).items():
body_bytes = data[b'BODY[]']
if not body_bytes:
log("Message", uid, "has empty body")
continue
flags = data[FLAGS]
path = pathlib.Path(str(dir)).joinpath("IMAP", self.logid, imapfolder)
path.mkdir(parents=True, exist_ok=True)
fn = path.joinpath(str(uid))
fn.write_bytes(body_bytes)
log("Message", uid, fn)
email_message = email.message_from_bytes(body_bytes)
log("Message", uid, flags, "Message-Id:", email_message.get("Message-Id"))
if empty_folders:
log("--------- EMPTY FOLDERS:", empty_folders)
print(stream.getvalue(), file=logfile)
def idle_start(self):
""" switch this connection to idle mode. non-blocking. """
assert not self._idling
res = self.conn.idle()
self._idling = True
return res
def idle_check(self, terminate=False):
""" (blocking) wait for next idle message from server. """
assert self._idling
self.account.log("imap-direct: calling idle_check")
res = self.conn.idle_check(timeout=30)
if len(res) == 0:
raise TimeoutError
if terminate:
self.idle_done()
self.account.log("imap-direct: idle_check returned {!r}".format(res))
return res
def idle_wait_for_seen(self):
""" Return first message with SEEN flag
from a running idle-stream REtiurn.
"""
while 1:
for item in self.idle_check():
if item[1] == FETCH:
if item[2][0] == FLAGS:
if SEEN in item[2][1]:
return item[0]
def idle_done(self):
""" send idle-done to server if we are currently in idle mode. """
if self._idling:
res = self.conn.idle_done()
self._idling = False
return res
def append(self, folder, msg):
if msg.startswith("\n"):
msg = msg[1:]
msg = '\n'.join([s.lstrip() for s in msg.splitlines()])
self.conn.append(folder, msg)