refactor: make dc_dehtml() function safe

* Make dc_dehtml() function safe

* Change type of is_msgrmsg parameter to bool

* Narrow type of local variable in simplify_plain_text()

* Export less fields of `Simplify' record

* Demote is_cut_* from fields of `Simplify' to local variables

* Refactor part of simplify_plain_text()

Refactor footer ("-- " and similar) code into separate function,
and re-implement it with standard Rust string methods.

It simplifies code and allows removing one mutable local variable.

* Replace dc_split_into_lines with String.split()

  * src/dc_simplify.rs(find_message_footer): adjust type signature to accept
    slice of &str, not slice of pointers

  * src/dc_simplify.rs(simplify_plain_text): adjust code to use '==' operator
    instead of strcmp(3).

  * src/dc_simplify.rs(is_empty_line, is_quoted_headline, is_plain_quote):
    + adjust type signatures to accept &str, not 'const char *'
    + remove no longer needed 'unsafe' qualifier

  * src/dc_tools(dc_split_into_lines, dc_free_splitted_lines): remove no longer
    used functions.

In addition to additional type-safety, this change reduces number of
allocations: String.split returns iterator of &str.

* Make simplify_plain_text() safe

* Make Simplify.simplify return String, not pointer

* Refactor Simplify.simplify to use String methods, not pointers

* Make Simplify.simplify() safe

* Avoid neeless allocation in Simplify.simplify when input is html

* Add tests for simplify utilities

* Document discussion about is_empty_line() discussion
This commit is contained in:
Dmitry Bogatov
2019-08-26 15:15:14 +00:00
committed by Friedel Ziegelmayer
parent 8a73f84003
commit d7d7147549
4 changed files with 127 additions and 233 deletions

View File

@@ -2,9 +2,6 @@ use lazy_static::lazy_static;
use quick_xml;
use quick_xml::events::{BytesEnd, BytesStart, BytesText};
use crate::dc_tools::*;
use crate::x::*;
lazy_static! {
static ref LINE_RE: regex::Regex = regex::Regex::new(r"(\r?\n)+").unwrap();
}
@@ -24,19 +21,20 @@ enum AddText {
// dc_dehtml() returns way too many lineends; however, an optimisation on this issue is not needed as
// the lineends are typically remove in further processing by the caller
pub unsafe fn dc_dehtml(buf_terminated: *mut libc::c_char) -> *mut libc::c_char {
dc_trim(buf_terminated);
if *buf_terminated.offset(0isize) as libc::c_int == 0i32 {
return dc_strdup(b"\x00" as *const u8 as *const libc::c_char);
pub fn dc_dehtml(buf_terminated: &str) -> String {
let buf_terminated = buf_terminated.trim();
if buf_terminated.is_empty() {
return "".into();
}
let mut dehtml = Dehtml {
strbuilder: String::with_capacity(strlen(buf_terminated)),
strbuilder: String::with_capacity(buf_terminated.len()),
add_text: AddText::YesRemoveLineEnds,
last_href: None,
};
let mut reader = quick_xml::Reader::from_str(as_str(buf_terminated));
let mut reader = quick_xml::Reader::from_str(buf_terminated);
let mut buf = Vec::new();
@@ -61,7 +59,7 @@ pub unsafe fn dc_dehtml(buf_terminated: *mut libc::c_char) -> *mut libc::c_char
buf.clear();
}
dehtml.strbuilder.strdup()
dehtml.strbuilder
}
fn dehtml_text_cb(event: &BytesText, dehtml: &mut Dehtml) {