Readability returns the longest HTML element as the most plausible candidate if an article's header is longer than the actual content



11/17/23, 11:08 PM

When reader mode (or the standalone Readability library) encounters a page whose header contains more text than the actual article or main content, the object returned by Readability has its `content` and `textContent` members set to the header rather than the article. As a result, reader mode displays the wrong content for that page.

An example Blogger post where Readability / reader mode works as expected: https://gnosticesotericstudyworkaids.blogspot.com/2023/11/learning-to-let-go-of-small-details-4902.html

An example Blogger post where Readability / reader mode does not return the article's main content: https://gnosticesotericstudyworkaids.blogspot.com/2023/11/really-understand-ego-or-anything-when.html

It appears that the logic in `_grabArticle` fails to nominate the actual article content as the most plausible candidate: https://github.com/mozilla/readability/blob/main/Readability.js#L876
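To illustrate the kind of failure I suspect, here is a toy model in plain JavaScript. It is not Readability's actual code: `scoreBlock`, `pickTopCandidate`, and the candidate shape are hypothetical names I made up, and the scoring (one base point, one point per comma, up to three points for length) is only a rough approximation of how I understand text blocks to be weighted. The real `_grabArticle` does considerably more, so treat this as a sketch of the symptom and not a diagnosis.

```javascript
// Toy model only, NOT the real Readability implementation.
// Scores a block of text: 1 base point, 1 point per comma,
// and up to 3 points for every 100 characters of length.
function scoreBlock(text) {
  const commas = text.split(",").length - 1;
  const lengthBonus = Math.min(Math.floor(text.length / 100), 3);
  return 1 + commas + lengthBonus;
}

// Picks the candidate with the highest summed paragraph score.
// `candidates` is a hypothetical shape: [{ name, paragraphs: string[] }].
function pickTopCandidate(candidates) {
  let best = null;
  for (const candidate of candidates) {
    const score = candidate.paragraphs.reduce(
      (sum, paragraph) => sum + scoreBlock(paragraph),
      0
    );
    if (best === null || score > best.score) {
      best = { name: candidate.name, score };
    }
  }
  return best;
}

// A long, comma-heavy blog header next to a very short article body.
const header = {
  name: "header",
  paragraphs: ["A long blog description, with many clauses, ".repeat(10)],
};
const article = {
  name: "article",
  paragraphs: ["Short body text."],
};

const top = pickTopCandidate([header, article]);
console.log(top.name); // the header outscores the short article in this toy model
```

In this sketch the header wins purely because it has more text and more commas than the article, which matches what I see on the second Blogger page above.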
