The user directory is maintained based on users that are 'visible' to the homeserver - i.e. ones which are local to the server and ones which any local user shares a room with.
The directory info is stored in various tables, which can sometimes get out of
sync (although this is considered a bug). If this happens, the current fix is
to use the admin API to run the regenerate_directory job. This starts a
background task to
flush the current tables and regenerate the directory. Depending on the size
of your homeserver (number of users and rooms) this can take a while.
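As a rough sketch of what running the job might look like, the snippet below builds (but does not send) an admin API request. The endpoint path and the job_name body field follow Synapse's background updates admin API as I understand it; the homeserver URL, the token, and the helper name build_regenerate_request are placeholders for illustration, so check the admin API documentation for your Synapse version before relying on it.

```python
import json
import urllib.request


def build_regenerate_request(base_url: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a request that starts the regenerate_directory job.

    Assumption: the background updates admin API accepts a POST with a
    JSON body of the form {"job_name": "..."} at the path used below.
    """
    body = json.dumps({"job_name": "regenerate_directory"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_synapse/admin/v1/background_updates/start_job",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical homeserver URL and admin token; replace with your own.
request = build_regenerate_request("https://synapse.example.com", "ADMIN_TOKEN")
# To actually send it: urllib.request.urlopen(request)
```

The request is only constructed here so that the sketch has no side effects; sending it requires the access token of a server admin.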
There are five relevant tables that collectively form the "user directory". Three of them track a list of all known users. The last two (collectively called the "search tables") track which users are visible to each other.
From all of these tables we exclude three types of local user:
A description of each table follows:
user_directory. This contains the user ID, display name and avatar of each user.
user_directory_search. To be joined to user_directory. It contains an extra
column that enables full text search based on user IDs and display names.
Different schemas for SQLite and Postgres are used.
user_directory_stream_pos. When the initial background update to populate
the directory is complete, we record a stream position here. This indicates
that Synapse should now listen for room changes and incrementally update
the directory where necessary. (See stream positions.)
users_in_public_rooms. Contains associations between users and the public
rooms they're in. Used to determine which users are in public rooms and should
be publicly visible in the directory. Both local and remote users are tracked.
users_who_share_private_rooms. Rows are triples (L, M, room id) where L is a
local user and M is a local or remote user. L and M should be different, but
this isn't enforced by a constraint.
Note that if two local users share a room then there will be two entries:
(user1, user2, !room_id) and (user2, user1, !room_id).
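To make the shape of the users_who_share_private_rooms rows concrete, here is a minimal sketch that derives the (L, M, room id) triples for a single room. The helper names is_local and private_room_rows, and the example user IDs, are hypothetical and not taken from Synapse itself.

```python
from typing import Iterable, List, Tuple


def is_local(user_id: str, server_name: str) -> bool:
    """A Matrix user ID looks like @localpart:domain; compare the domain."""
    return user_id.split(":", 1)[1] == server_name


def private_room_rows(
    room_id: str, members: Iterable[str], server_name: str
) -> List[Tuple[str, str, str]]:
    """Return (L, M, room_id) triples where L is local and M is any other member."""
    members = list(members)
    rows = []
    for l_user in members:
        if not is_local(l_user, server_name):
            continue  # L must be a local user
        for m_user in members:
            if m_user != l_user:  # L and M should be different
                rows.append((l_user, m_user, room_id))
    return rows


rows = private_room_rows(
    "!room_id", ["@user1:example.com", "@user2:example.com"], "example.com"
)
```

With two local members, rows holds both orderings of the pair, matching the two entries described above. A remote member would only ever appear in the M position.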
The exact way user search works can be tweaked via some server-level configuration options.
The information is not repeated here, but the options are mentioned below.
If search_all_users is false, then results are limited to users who:
1. appear in the users_in_public_rooms table, or
2. appear in users_who_share_private_rooms where L is the requesting user and M is the search result.

Otherwise, if search_all_users is true, no such limits are placed and all
users known to the server (matching the search query) will be returned.
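The effect of search_all_users can be sketched as a simple filter over candidate results. This is an in-memory illustration only: the function visible_results and its set-based arguments stand in for the database tables and are assumptions, not Synapse's actual query.

```python
from typing import Iterable, List, Set, Tuple


def visible_results(
    requester: str,
    candidates: Iterable[str],
    public_users: Set[str],
    private_pairs: Set[Tuple[str, str]],
    search_all_users: bool,
) -> List[str]:
    """Filter search candidates according to the search_all_users option.

    public_users stands in for users_in_public_rooms; private_pairs stands in
    for the (L, M) columns of users_who_share_private_rooms.
    """
    if search_all_users:
        return list(candidates)  # no visibility limits
    return [
        user
        for user in candidates
        if user in public_users or (requester, user) in private_pairs
    ]


candidates = ["@bob:example.com", "@carol:example.com", "@dave:other.org"]
public_users = {"@bob:example.com"}
private_pairs = {("@alice:example.com", "@dave:other.org")}

limited = visible_results(
    "@alice:example.com", candidates, public_users, private_pairs, False
)
everyone = visible_results(
    "@alice:example.com", candidates, public_users, private_pairs, True
)
```

Here limited contains only the users the requester can already see, while everyone contains every matching candidate.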
By default, locked users are not returned. If show_locked_users is true, then
no filtering on the locked status of a user is done.
The user-provided search term is lowercased and normalized using NFKC. This treats the string as case-insensitive, canonicalizes different forms of the same text, and maps some "roughly equivalent" characters together.
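A minimal sketch of this normalization step, using Python's standard unicodedata module. The function name normalize_search_term is hypothetical, and Synapse's own implementation may differ in detail (for example in whitespace handling).

```python
import unicodedata


def normalize_search_term(term: str) -> str:
    """Lowercase the term, then apply NFKC normalization."""
    return unicodedata.normalize("NFKC", term.lower())


# Fullwidth Latin letters and ligatures are mapped to their plain equivalents.
fullwidth = normalize_search_term("\uff21\uff2c\uff29\uff23\uff25")  # fullwidth "ALICE"
ligature = normalize_search_term("\ufb01sh")  # "fi" ligature followed by "sh"
```

After this step, a fullwidth or mixed-case spelling of a name compares equal to its plain lowercase form.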
The search term is then split into words:
The queries for PostgreSQL and SQLite are detailed below, but their overall goal is to find matching users, preferring users who are "real" (e.g. not bots, not deactivated). It is assumed that real users will have a display name and avatar set.
For PostgreSQL, the above words are then transformed into two queries, both built using to_tsquery: an "exact" query and a "prefix" query.
Results are composed of all rows in the user_directory_search table whose information
matches one (or both) of these queries. Results are ordered by calculating a weighted
score for each result; higher scores are returned first:
- The full text search rank from the ts_rank_cd function against the "exact" search query; this has four variables with the following weightings:
  - D: 0.1 for the user ID's domain
  - C: 0.1 for unused
  - B: 0.9 for the user's display name (or an empty string if it is not set)
  - A: 0.1 for the user ID's localpart
- The full text search rank from the ts_rank_cd function against the "prefix" search query. (Using the same weightings as above.)
- If prefer_local_users is true, then 2x if the user is local to the homeserver.

Note that ts_rank_cd returns a weight between 0 and 1. The initial weighting of
all results is 1.
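As an illustration of two pieces of this scoring, the sketch below formats the four weightings as the array literal that PostgreSQL's ts_rank_cd expects (its weights argument is ordered {D, C, B, A}) and applies the 2x local-user multiplier. The helper names are hypothetical, and this is not the full score calculation, only the parts spelled out above.

```python
# Weightings from the text, keyed by tsvector weight label.
WEIGHTS = {"D": 0.1, "C": 0.1, "B": 0.9, "A": 0.1}


def weights_array_literal(weights: dict) -> str:
    """Format weights as a PostgreSQL array literal in the order {D, C, B, A}."""
    ordered = [weights[label] for label in ("D", "C", "B", "A")]
    return "{" + ", ".join(str(w) for w in ordered) + "}"


def apply_local_preference(
    score: float, is_local: bool, prefer_local_users: bool
) -> float:
    """Double the score of local users when prefer_local_users is enabled."""
    return score * 2 if (prefer_local_users and is_local) else score


literal = weights_array_literal(WEIGHTS)
boosted = apply_local_preference(1.0, is_local=True, prefer_local_users=True)
```

Because the display name carries the B weighting of 0.9, a match in the display name contributes far more to the rank than a match in the localpart or domain.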
For SQLite, results are composed of all rows in the user_directory_search table whose information
matches the query. Results are ordered by the following information, with each
subsequent column used as a tiebreaker, for each result:
1. The rank of the full text search results using the matchinfo function. Higher ranks are returned first.
2. If prefer_local_users is true, then users local to the homeserver are returned first.
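The tiebreaker ordering can be sketched as a sort on a tuple key. This is an in-memory illustration of the two criteria listed above, not the SQL that Synapse runs; the dictionary fields rank and is_local and the function order_results are assumptions made for the example.

```python
from typing import Dict, List


def order_results(results: List[Dict], prefer_local_users: bool) -> List[Dict]:
    """Sort by rank (highest first), then optionally put local users first."""

    def sort_key(result: Dict):
        # 0 sorts before 1, so local users win ties when the option is enabled.
        local_first = 0 if (prefer_local_users and result["is_local"]) else 1
        return (-result["rank"], local_first)

    return sorted(results, key=sort_key)


results = [
    {"user_id": "@remote:other.org", "rank": 2, "is_local": False},
    {"user_id": "@local:example.com", "rank": 2, "is_local": True},
    {"user_id": "@top:other.org", "rank": 5, "is_local": False},
]
ordered = [r["user_id"] for r in order_results(results, prefer_local_users=True)]
```

The highest-ranked user comes first regardless of locality; locality only decides between results with equal rank.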