Conversation
|
LGTM You can also update the content_vector_gin to use the new |
|
@pauloxnet interesting. just to make sure that I'm getting this right since the Mastodon languages are stored as 2-char strings (ISO-639-1, |
|
It seems right. In the Django Project code I used a dictionary to map two-character ISO language codes to language names for the PostgreSQL config. Maybe there's a similar way to map language codes to config names without an additional field? |
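A minimal sketch of the dictionary approach mentioned above. The names `PG_CONFIG_BY_LANG`, `DEFAULT_PG_CONFIG` and `pg_config_for` are hypothetical and not taken from the Django Project code or from this PR; the config names shown are ones that ship with a stock PostgreSQL install, and anything unmapped falls back to `simple`.

```python
# Hypothetical mapping from ISO 639-1 codes to PostgreSQL text search
# configuration names. Only a few entries are shown for illustration.
PG_CONFIG_BY_LANG = {
    "de": "german",
    "en": "english",
    "es": "spanish",
    "fr": "french",
    "it": "italian",
    "pt": "portuguese",
}

# Assumed fallback for languages without a dedicated configuration.
DEFAULT_PG_CONFIG = "simple"


def pg_config_for(language_code):
    """Return a text search config name for a language code."""
    if not language_code:
        return DEFAULT_PG_CONFIG
    # Use only the primary subtag so that "es-MX" or "pt_BR" still map.
    primary = language_code.replace("_", "-").split("-")[0].lower()
    return PG_CONFIG_BY_LANG.get(primary, DEFAULT_PG_CONFIG)
```

With a lookup like this the config name could be derived from the stored two-letter code at query time, though an expression index would still need the value available inside the database.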
|
I think the problem with changing the SearchVector language is that it's embedded into the index, is it not? We can't have 20-odd indexes on the content of posts, one for each language, and doing a search query without an index for it sounds painful. |
Actually, I think you can have an index based on the language stored in a column in the same table, but I'd leave the change until after this PR is merged. |
|
Did a bit more research on adding the
-- adding a column of type `regconfig` to store each record's tsvector config
ALTER TABLE activities_post ADD COLUMN tsvector_config regconfig DEFAULT 'simple';
-- creating the GIN index
DROP INDEX content_vector_gin;
CREATE INDEX content_vector_gin ON activities_post USING GIN (to_tsvector(tsvector_config, content));
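For an expression index like the one above to be considered by the planner, the search query has to use the same `to_tsvector(tsvector_config, content)` expression. A rough sketch of how the query side might look in Python follows; `build_search` is a hypothetical helper, the `id` column is assumed, and `plainto_tsquery` with the searcher's config is one possible choice, not necessarily what this PR should use.

```python
# Table and column names follow the SQL above; the "id" column and the
# use of plainto_tsquery are assumptions made for illustration.
SEARCH_SQL = (
    "SELECT id FROM activities_post "
    "WHERE to_tsvector(tsvector_config, content) "
    "@@ plainto_tsquery(%s::regconfig, %s)"
)


def build_search(query_config, query_text):
    """Return (sql, params) for a parameterised full-text search.

    query_config is the text search config used to parse the search
    terms (for example the searcher's language); each row's own
    tsvector_config is still used on the indexed side.
    """
    return SEARCH_SQL, [query_config, query_text.strip()]
```

The pair returned here would be passed to a database cursor's `execute(sql, params)`, which keeps the user-supplied text out of the SQL string.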
|
The SQL code is ok for me. |
andrewgodwin
left a comment
On the migration - let's get this landed without it, and we'll figure that out separately. The new Django index objects should support this, but if it's complex or weird enough then we'll just do a raw SQL one, since we only support PostgreSQL anyway.
[
    (lang.alpha_2, lang.name)
    for lang in pycountry.languages
    if hasattr(lang, "alpha_2")
This is probably good enough for now - I think it's how most of the other clients work - but we might need to add support for separating the various Spanish dialects in future, for example.
Yeah, the Mastodon docs are pretty explicit that it's the two-letter language code. Not sure how other clients would behave if the language code were different. Increasing the max length of the database field now sounds safe enough, though 👍
df29a43 to b6f9df6
|
@chdorner since the |
|
@chdorner @AstraLuma this is an absolutely great feature. Any chance of getting this updated / merged? Happy to do anything I can to help. |
if isinstance(self.type_data, QuestionData)
else None,
"card": None,
"language": None,
Does this need to be updated to forward the stored language?
I think this is everything that's needed to implement Mastodon's post language feature.
`language` key in the request. `contentMap` `lang` attribute on the post content
Note on client compatibility:
`posting:default:language` preference, but uses its interface language as the default value when creating a new post