Readability returns the longest HTML element as the most plausible candidate when an article's header is longer than its actual content

When reader mode (or the standalone Readability library) encounters a page whose header contains more text than the article or main content, the object returned by Readability has `content` and `textContent` members that match the header rather than the article. As a result, reader mode displays the wrong content for that article.

An example Blogger post where readability / reader mode work as expected: https://gnosticesotericstudyworkaids.blogspot.com/2023/11/learning-to-let-go-of-small-details-4902.html

An example Blogger page where readability / reader mode do not correctly return the article's main content: https://gnosticesotericstudyworkaids.blogspot.com/2023/11/really-understand-ego-or-anything-when.html
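
For anyone who wants to compare the two pages outside of reader mode, here is a minimal sketch of a reproduction script. It assumes Node 18+ (for the global `fetch`) and that `jsdom` and `@mozilla/readability` are installed; `summarize`, `extract`, and `main` are hypothetical helper names that are not part of either library.

```javascript
// Sketch: compare what Readability extracts for the two example pages.
// Assumes Node 18+ and `npm install jsdom @mozilla/readability`.

const WORKING_URL =
  "https://gnosticesotericstudyworkaids.blogspot.com/2023/11/learning-to-let-go-of-small-details-4902.html";
const FAILING_URL =
  "https://gnosticesotericstudyworkaids.blogspot.com/2023/11/really-understand-ego-or-anything-when.html";

// Hypothetical helper: condense a parse() result into a few comparable fields.
function summarize(article) {
  if (!article) {
    // parse() returns null when no article could be extracted.
    return { parsed: false, length: 0, preview: "" };
  }
  const text = (article.textContent || "").replace(/\s+/g, " ").trim();
  return { parsed: true, length: text.length, preview: text.slice(0, 80) };
}

// Hypothetical helper: fetch a page and run the standalone library on it.
async function extract(url) {
  // Required lazily so summarize() can be used without these packages.
  const { JSDOM } = require("jsdom");
  const { Readability } = require("@mozilla/readability");
  const html = await (await fetch(url)).text();
  const dom = new JSDOM(html, { url });
  return new Readability(dom.window.document).parse();
}

async function main() {
  for (const url of [WORKING_URL, FAILING_URL]) {
    console.log(url, summarize(await extract(url)));
  }
}

// Uncomment to run against the live pages:
// main().catch(console.error);
```

If the behaviour described above reproduces, the preview for the second URL should start with the header text instead of the post body.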

It appears that the logic in `_grabArticle` fails to nominate the actual article content as the most plausible candidate: https://github.com/mozilla/readability/blob/main/Readability.js#L876
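
To make the suspected failure mode concrete, here is a toy model of it. This is not Readability's actual scoring code, which I have not reduced to a test case; it only assumes, for illustration, that candidates are ranked by text length alone. The candidate data and the `pickTopCandidate` name are made up.

```javascript
// Toy model only: NOT Readability's real scoring. It ranks candidates purely
// by text length to show how a long header could outrank a shorter article.

// Hypothetical candidate blocks, standing in for elements on a page.
const candidates = [
  { name: "header", text: "x".repeat(1200) },
  { name: "article", text: "y".repeat(400) },
];

// Return the block with the most text; ties keep the earlier block.
function pickTopCandidate(blocks) {
  return blocks.reduce((best, block) =>
    block.text.length > best.text.length ? block : best
  );
}

console.log(pickTopCandidate(candidates).name); // prints "header"
```

Under that assumption the header wins whenever it is longer than the post body, which matches what I see on the second example page.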

