When receiving messages, blobs will be deduplicated with the new
function `create_and_deduplicate_from_bytes()`. For sending files, this
adds a new function `set_file_and_deduplicate()` instead of
deduplicating by default.
This is for
https://github.com/deltachat/deltachat-core-rust/issues/6265; read the
issue description there for more details.
TODO:
- [x] Set files as read-only
- [x] Don't do a write when the file is already identical
- [x] The first 32 chars or so of the 64-character hash are enough. I
calculated that if 10b people (i.e. all of humanity) use DC, and each of
them has 200k distinct blob files (I have 4k in my day-to-day account),
and we used 20 chars, then the expected value for the number of name
collisions would be ~0.0002 (and the probability that there is a least
one name collision is lower than that) [^1]. I added 12 more characters
to be on the super safe side, but this wouldn't be necessary and I could
also make it 20 instead of 32.
- Not 100% sure whether that's necessary at all - it would mainly be
necessary if we might hit a length limit on some file systems (the
blobdir is usually sth like
`accounts/2ff9fc096d2f46b6832b24a1ed99c0d6/dc.db-blobs` (53 chars), plus
64 chars for the filename would be 117).
- [x] "touch" the files to prevent them from being deleted
- [x] TODOs in the code
For later PRs:
- Replace `BlobObject::create(…)` with
`BlobObject::create_and_deduplicate(…)` in order to deduplicate
everytime core creates a file
- Modify JsonRPC to deduplicate blob files
- Possibly rename BlobObject.name to BlobObject.file in order to prevent
confusion (because `name` usually means "user-visible-name", not "name
of the file on disk").
[^1]: Calculated with both https://printfn.github.io/fend/ and
https://www.geogebra.org/calculator, both of which came to the same
result
([1](https://github.com/user-attachments/assets/bbb62550-3781-48b5-88b1-ba0e29c28c0d),
[2](https://github.com/user-attachments/assets/82171212-b797-4117-a39f-0e132eac7252))
---------
Co-authored-by: l <link2xt@testrun.org>
Add a `create` param to `select_with_uidvalidity()` instead of always trying to create the folder
and return `Ok(false)` from it if the folder doesn't exist and shouldn't be created, and handle this
in `store_seen_flags_on_imap()` by just removing "jobs" from the `imap_markseen` table. Also don't
create the folder in other code paths where it's not necessary.
this PR removes most usages of the `descr` parameter.
- to avoid noise in different branches etc. (as annoying on similar, at
a first glance simple changes), i left the external API stable
- also, the effort to do a database migration seems to be over the top,
so the column is left and set to empty strings on future updates - maybe
we can recycle the column at some point ;)
closes#6245
Without this fix IMAP loop may get stuck
trying to download non-existing message over and over
like this:
```
src/imap.rs:372: Logging into IMAP server with LOGIN.
src/imap.rs:388: Successfully logged into IMAP server
src/scheduler.rs:361: Failed to download message Msg#3467: Message Msg#3467 does not exist.
src/scheduler.rs:418: Failed fetch_idle: Failed to download messages: Message Msg#3467 does not exist
```
The whole download operation fails
due to attempt to set the state of non-existing message
to "failed". Now download of the message
will "succeed" if the message does not exist
and we don't try to set its state.
API now pretends that trashed messages don't exist.
This way callers don't have to check if loaded message
belongs to trash chat.
If message may be trashed by the time it is attempted to be loaded,
callers should use Message::load_from_db_optional.
Most changes are around receive_status_update() function
because previously it relied on loading trashed status update
messages immediately after adding them to the database.
Connection establishment now happens only in one place in each IMAP loop.
Now all connection establishment happens in one place
and is limited by the ratelimit.
Backoff was removed from fake_idle
as it does not establish connections anymore.
If connection fails, fake_idle will return an error.
We then drop the connection and get back to the beginning of IMAP
loop.
Backoff may be still nice to have to delay retries
in case of constant connection failures
so we don't immediately hit ratelimit if the network is unusable
and returns immediate error on each connection attempt
(e.g. ICMP network unreachable error),
but adding backoff for connection failures is out of scope for this change.
If a Delta Chat message has the Message-ID already existing in the db, but a greater "Date", it's a
resent message that can be deleted. Messages having the same "Date" mustn't be deleted because they
can be already seen messages moved back to INBOX. Also don't delete messages having lesser "Date" to
avoid deleting both messages in a multi-device setting.
Previously the message was removed from `download` table,
but message bubble was stuck in InProgress state.
Now download state is updated by the caller,
so it cannot be accidentally skipped.
Also add a test on downloading a message later. Although it doesn't reproduce #4700 for some reason,
it fails w/o the fix because before a message state was changing to `InSeen` after a full download
which doesn't look correct. The result of a full message download should be such as if it was fully
downloaded initially.
Such a message may be assigned to a wrong chat (e.g. undecipherable group msgs often get assigned to
the 1:1 chat with the sender). Add `DownloadState::Undecipherable` so that messages referencing
undecipherable ones don't go to that wrong chat too. Also do not reply to not fully downloaded
messages. Before `Message.error` was checked for that purpose, but a message can be error for many
reasons.
Message.set_text() and Message.get_text() are modified accordingly
to accept String and return String.
Messages which previously contained None text
are now represented as messages with empty text.
Use Message.set_text("".to_string())
instead of Message.set_text(None).
Moved custom ToSql trait including Send + Sync from lib.rs to sql.rs.
Replaced most params! and paramsv! macro usage with tuples.
Replaced paramsv! and params_iterv! with params_slice!,
because there is no need to construct a vector.
Gmail archives messages marked as `\Deleted` by default if those messages aren't in the Trash. But
if move them to the Trash instead, they will be auto-deleted in 30 days.
get_chat_msgs() function is split into new get_chat_msgs() without flags
and get_chat_msgs_ex() which accepts booleans instead of bitflags.
FFI call dc_get_chat_msgs() is still using bitflags for compatibility.
JSON-RPC calls get_message_ids() and get_message_list_items()
accept booleans instead of bitflags now.
* Because both only make problems with mailing lists, it's easiest to just disable them. If we want, we can make them work properly with mailing lists one day and re-enable them, but this needs some further thoughts.
Part of #3701
* Use load_from_db() in more tests
* clippy
* Changelog
* Downgrade warning to info, improve message
* Use lifetimes instead of cloning
This way no temporary rows are created and it is easier to maintain
because UPDATE statement is right below the INSERT statement,
unlike `merge_messages` function which is easy to forget about.
* save webxdc-updates for not yet downloaded messages, that are probably webxdc instances then
* test webxdc updates received while instance is not yet downloaded
* keep msg_id on downloading messages
keeping msg_id on downloading messages
has the advantage that webxdc updates and other references to the msg_id
can be processed as usual.
if a message expands to multiple msg_id,
the last one is kept,
however, this does not affect webxdc at all.
(alternatives may be to update `msgs_status_updates`
but that seems more complicated and even less elegant,
another alternative would be to use different keys (eg. `rfc274_mid`),
but that also seems not to be much easier and would waste space as well.
also both alternatives would need adaption for other foreign keys)
* update CHANGELOG
* do not emit WebxdcStatusUpdate event in case the message is not yet downloaded
* move DELETE/UPDATE to an transaction
* make merge_msg_id() a little less confusing
* use some webxdc-update-param from placeholder
(the placeholder may be updated,
the just downloaded messages is not)
* more precise function name
* test not directly downloading status updates
* test not directly downloading mdn
Message-IDs are now retrieved only during fetching and saved into imap
table. dc_receive_imf_inner does not attempt to extract the Message-ID
anymore.
For messages without Message-ID the ID is now generated in
imap::fetch_new_messages rather than dc_receive_imf_inner,
so the same ID is used in the imap table (maintained by the imap
module) and msgs table (maintained by dc_receive_imf module).
Message-ID generation based on the Date, From and To field hashing has
been replaced with a simple dc_create_id() to avoid retrieving Date,
From, and To fields in the imap module, as it's hard to test that it
stays compatible between Delta Chat versions in this module. This
breaks jump-to-quote for quoted messages without Message-ID, which is
not critical.
Also prefetch X-Microsoft-Original-Message-ID, so retrieval of
duplicate messages with X-Microsoft-Original-Message-ID can be skipped
like it is done for messages with Message-ID header.
- Replace .ok_or_else() and .map_err() with anyhow::Context where possible.
- Use .context() to check Option for None when it's an error
- Resultify Chatlist.get_chat_id()
- Add useful .context() to some errors
- IMAP error handling cleanup
`imap` table maps Message-IDs to UIDs on the server. `dc_receive_imf`
no longer gets the UID of the message as an argument and does not
insert the folder and UID of the message into the `msgs`
table. `server_folder` and `server_uid` columns in `msgs` table are
deprecated.
MoveMsg and DeleteMsgOnImap jobs are removed. Now messages are moved
and deleted only in the `fetch_move_delete` procedure that consults
the `target` column of the `imap` table to determine where the message
should go.
Where the message should go is determined after prefetching by the
`imap::target_folder()` procedure. Messages are only downloaded once
they reach their target folder to avoid race conditions in multidevice
setting, such as:
1. One device trying to FETCH the message while the other tries to
MOVE it.
2. One device marking the message as \Seen in the Inbox while the
other has already copied unseen message to the Movebox and is going to
delete the \Seen message in the Inbox.
3. Device downloads the message from the Inbox while there are newer
messages in the Movebox placed there by the other device, thus
processing the messages out of order.
When there is an outgoing message in a chat, mark all older messages in this chat as seen.
Android already has a similar behavior, however, this led to the issue https://github.com/deltachat/deltachat-android/issues/2163 and should be changed back.
--
From the issue description at https://github.com/deltachat/deltachat-android/issues/2163, I implemented these fixes:
> Core should take care that if the last message in a chat is not fresh|noticed, no messages in the chat can be fresh. [...] Do this [...] in a function that's called at the end of fetch_new_messages(). Then dc_receive_imf() wouldn't get slower by this and we could re-use the same function for migration.
So, I didn't do this inside `dc_receive_imf()` in order not to make it take even longer. This obviously has the downside of higher complexity.
And I think we should implement this:
> On Androd, show the unread badge when unread!=0 again (see deltachat/deltachat-android@618af02). Then the user can see that there is a chat with an unread message and click it to get rid of it.
because it shouldn't be the UI's job to decide whether an unread badge is shown, but the core's.