{"paper_id":"541466f7-cbe6-4e19-b937-109e832f3ea5","body_text":"Research on the experiences of patients with endometriosis highlights several areas of endometriosis care that require improvement. Medical treatments should be more holistic, taking into consideration the social, emotional, and psychological costs of endometriosis for patients and their loved ones [ 1 - 3 ]. Health care providers should improve their communication to validate patients’ concerns, meet their informational needs, and avoid misunderstandings [ 4 - 6 ]. Because loved ones are also affected by this condition, they should receive education and training on it from health care professionals [ 1 , 7 - 9 ].\nPrior work identifies a lack of large-scale mixed methods analyses on patient experiences and online communities, which could further support these findings [ 1 , 10 ]. Existing qualitative research on patient experiences has been limited to small patient samples. Previous quantitative analyses, though larger in scale, are limited to ontologies that are defined by researchers rather than those inferred from patient narratives [ 1 , 10 , 11 ].\nThis study fills this gap by providing a large-scale analysis of user behavior in 2 endometriosis online health communities (OHCs),  r/Endo  and  r/endometriosis , on the social media platform Reddit (Reddit, Inc). Using natural language processing and qualitative analysis, we identified and mapped the associations between a post’s subject matter (“topics”), the people and relationships (“personas”) mentioned, and the type of support the post seeks (“intent”). We investigated 2 research questions (RQs):\nRQ1—What aspects of the endometriosis experience are discussed in OHCs?\nRQ2—Which aggregate unmet needs emerge from the OHCs?\nMost studies on the effects of endometriosis on quality of life fail to investigate the impact of the condition on loved ones and informal caregivers as well as adolescents [ 1 ]. Instead, they only include people with an endometriosis diagnosis within the research population [ 5 , 6 , 12 - 14 ]. Given the long average delay between symptom onset and diagnosis [ 2 , 4 ], many adolescents with endometriosis are missed by this research.\nStudying  r/Endo  and  r/endometriosis , we discovered the unmet needs of these hard-to-reach groups. The endometriosis OHCs of interest in this study are open and accessible to anyone regardless of whether they have a diagnosis or not. As a result, numerous members self-identify as belonging to populations that have been missed by endometriosis research, that is, people who have yet to receive a diagnosis, adolescents, and loved ones of people with a diagnosis. Moreover, due to the pseudonymity afforded by the platforms, users feel more comfortable discussing needs that they might not have the time or courage to address in clinical settings.\nResearchers have also highlighted OHCs’ role in the improvement of health care [ 15 , 16 ]. An existing study of a polycystic ovary syndrome (PCOS) subreddit found concordance between trends from laboratory results posted to the OHC and trends from clinical research. This indicates that, although OHCs often include patients that are typically excluded from clinical trials (such as those with multiple conditions), studying these communities is useful to understand patient populations [ 17 ]. Indeed, content analysis of these communities reveals patterns across patients’ experiences of care and symptoms [ 18 - 21 ], and OHC members’ expertise in providing support to peers could be leveraged to deliver health care interventions and programs [ 16 , 22 - 24 ].\nEndometriosis is a chronic condition that affects 10% of people with a uterus. It is characterized by the presence of uterine lining tissue outside of the uterus [ 2 ]. This condition causes a range of painful, persistent, and life-altering symptoms, including, but not limited to, chronic pelvic pain, painful menstruation, constipation, painful urination, painful sexual intercourse, and infertility. Its estimated annual economic impact exceeds US $60 billion in direct treatment costs and lost productivity in the United States alone [ 25 ]. There is no cure for endometriosis; therefore, treatment focuses on symptom management and relief [ 2 , 26 , 27 ]. Treatments might include hormonal therapy, surgical removal of endometriosis lesions, and fertility treatment. However, such therapies cause numerous side effects and rarely provide long-term relief to patients [ 26 ].\nDue to the absence of condition-specific symptoms and biomarkers, the normalization of menstrual pain, the need for surgery to make a diagnosis, and the lack of knowledge about the condition by both the public and clinicians, the average time until diagnosis is estimated to be between 6 and 11 years, depending on the health care system of reference [ 4 , 28 , 29 ]. A confirmed diagnosis can only be reached through laparoscopic excision of endometriosis tissue, an invasive surgical procedure [ 30 ].\nPatients with endometriosis face numerous difficulties during their health care journeys. They not only struggle to find information but also encounter negative attitudes from physicians [ 5 , 12 , 31 ]. Patients’ concerns are often dismissed as “just period pain” by health care providers [ 13 ]. Negative attitudes seem to derive from physicians’ own discomfort with unexplained symptoms [ 32 ] as well as from the continued presence of hysteria discourse and androcentric views in medical literature [ 33 ].\nBecause of these interconnected factors, endometriosis has dire impacts on patients’ quality of life [ 34 ]. This condition forces people to leave their education and employment and opt out of social events and everyday activities. Due to sexual pain and infertility, patients may feel inadequate as partners and fear abandonment [ 1 ].\nPatients with endometriosis require support from partners, family members, and friends to overcome these struggles and receive a diagnosis [ 3 , 29 ]. Self-care practices are time-consuming and labor-intensive for both patients and their loved ones. As patients focus on following complex treatment regimens and become expert in their own care [ 6 , 35 - 37 ], a wide range of other responsibilities falls onto their partners and family members. These responsibilities can include financial and housekeeping duties, helping to navigate the health care system, and relaying medical information, among others [ 38 - 41 ].\nOHCs are groups of individuals who come together on an internet-based platform (eg, social media, website, or forum) to discuss general or condition-specific medical topics. Members may be patients, medical professionals, informal caregivers, patients’ loved ones, or members of the general public [ 28 , 29 ].\nOHCs have been shown to provide support to users who experience dissatisfaction or constrained access to medical care, limited social support, or the absence of a local community of people with the same condition [ 38 , 42 ]. Indeed, some members join OHCs after feeling alienated from the medical community or becoming distrustful of medical knowledge and care [ 43 , 44 ]. Others join to learn about alternative treatment options, understand their test results, or to advocate for better awareness of their condition [ 42 , 45 , 46 ].\nAs these communities allow varying levels of pseudonymity and anonymity, users with stigmatic and chronic conditions can share intimate or stigmatized information with reduced fear of social repercussions [ 47 , 48 ]. People with chronic conditions often use OHCs to make sense of their experiences and receive validation [ 49 , 50 ].\nStudies of OHCs show that an individual member’s support needs as well as the support they provide may change over time and with age [ 51 - 53 ]. Earlier work on support matching suggests that different types of support may be more appropriate for certain needs [ 54 ]. A study of a breast cancer OHC found that the presence of emotional or informational support increased the original poster’s satisfaction, though users expressed less satisfaction if they received emotional support when seeking informational support [ 55 ]. A separate study on a mental health OHC found that support matching positively predicted satisfaction, but there was significant variance across users [ 52 ].\nParticipating in OHCs empowers members as they become better informed about their health concerns, learn to manage their condition, and gain strategies for communicating with health care providers [ 22 , 23 , 31 , 43 , 53 , 56 - 58 ]. In many cases, participants ultimately feel less isolated. Contrary to common belief, Huh [ 59 ] found that OHC members do not share misinformation, and they commonly invite peers to consult a provider for medical advice. Other studies of OHCs confirm the beneficial effects of engaging in these communities, showing that members gradually express more positive emotions than negative ones with sustained participation [ 60 , 61 ]. One study of an addiction recovery OHC found that engagement in the community correlates positively with recovery [ 62 ].\nDue to the significant impacts of the condition on patients’ lives, people with endometriosis often turn to both offline and online communities for help. The former generally consist of dedicated in-person meetings and activities, and access depends on proximity [ 63 , 64 ]. The latter exist in a variety of forms, such as blogs, mailing lists, Facebook pages, and Instagram accounts; their activities depend on the specific platform [ 43 , 49 ].\nWhelan [ 65 ] found that both an offline and an online endometriosis group are epistemic communities. As members shared their stories and interacted with peers, they built a new epistemology in which patient experiences became valid forms of knowledge.\nPrevious research also focuses on the kinds of support and content shared in endometriosis online communities. In a study of Facebook pages for people with endometriosis, Towne et al [ 11 ] showed that 48% of the posts provided emotional support, while educational posts made up 21% of the total posts. Furthermore, they found that 94% of the educational posts shared accurate information. By contrast, Metzler et al [ 10 ] found that most posts on Facebook and Instagram accounts about endometriosis offered inspiration or support, awareness about the disease, or personal information. Followers mostly engaged with posts that are humorous, generate awareness, and contain personal content. Finally, Shoebotham and Coulson [ 66 ] demonstrated that several therapeutic benefits are related to joining endometriosis online support groups. They found that members felt reassured and empowered while improving their knowledge of endometriosis.\n\nEndometriosis OHCs exist on many platforms in many forms [ 43 , 49 ]. We studied  r/Endo  and  r/endometriosis , two thriving forums about endometriosis on Reddit. Reddit is a social media platform with 93 million daily active users as of September 2024. It is a collection of forums (called subreddits), which users can choose to join, and they may participate as posters or commenters. We focused on Reddit because its OHCs have been thoroughly studied, and previous analyses have demonstrated their potential for understanding patient experiences and improving care [ 15 , 17 , 21 ]. Furthermore, both  r/Endo  and  r/endometriosis  featured high membership and participation numbers ( Table 1 ) as well showed promise of continued growth ( Table 2 ).\nWe collected posts and comments from  r/Endo  and  r/endometriosis  from their inception (January 2012 and November 2014, respectively) to December 2021 using the Pushshift Reddit application programming interface (API). The custom Python code used for the data collection process and subsequent analysis are available for reference.\nAfter reading the posts, examining general statistics of the subreddits ( Table 1 ), and comparing their community-specific languages using the “fightin’ words” method by Monroe et al [ 67 ], we found that the 2 communities shared sufficient similarities to justify treating them as a single dataset ( Multimedia Appendix 1 ). Members used both subreddits to ask questions about endometriosis, share experiences, and seek advice, and this similarity was reflected in their high vocabulary overlap. Existing differences in word use were relatively minor and not clearly relevant to our RQs.\nGeneral statistics of r/Endo and r/endometriosis subreddits and of the combined dataset.\nNumber of posts and comments in r/Endo and r/endometriosis subreddits over time.\nPeople with endometriosis have historically been failed by research and medical institutions. Similar to other gendered conditions, considering its severity, economic impact, and the number of people diagnosed with it, endometriosis research is greatly underfunded [ 68 ]. There is a persistent imbalance between the high percentage of people with endometriosis and the low number of endometriosis experts [ 2 ]. Patients deal with ongoing disbelief, invalidation, or trivialization of their symptoms, even from members of the medical community [ 12 , 31 , 33 ].\nThough data from  r/Endo  and  r/endometriosis  are public, members of online communities do not necessarily anticipate that their posts and comments could be used by academic researchers [ 69 ]. By collecting, analyzing, and publishing research about these data, we extracted the data from its intended audience and brought it to a new, unanticipated audience [ 70 ]. Following previous examples of handling sensitive, health-related data [ 18 , 45 , 71 ], we obscured the source data to protect members from being identified in relation to their posts or comments. Obfuscation was performed in two ways as follows: (1) throughout this paper, we paraphrased any quoted material and (2) we did not rerelease the underlying text data themselves. Any quoted material in the paper underwent rewording at the sentence level to make it less directly searchable, but we retained as much content of the original version as possible. We released all code and our codebooks so that other researchers may replicate our results on future versions of the OHC, subject to users’ later in situ modifications or deletions of their contributions.\nBecause data from Reddit are publicly available, our study was not considered to be human participant research and was exempt from a review by Cornell's institutional review board (IRB0145192). Still, we contacted the moderators of the 2 OHCs to inform them that we were conducting this study.\nTo answer our RQs, we used a combination of natural language processing techniques. These have gained popularity and are being widely implemented in the health care domain, both in research and clinical practice [ 72 , 73 ]. Specifically, we used complementary supervised and unsupervised methods to not only isolate specific instances of personas and intents but also allow topics to emerge beyond the RQs we designed.\nFollowing suggestions from research on endometriosis experiences [ 1 ], we extracted topics from the combined endometriosis OHCs using an abductive approach rather than a priori categories. We used latent Dirichlet allocation (LDA) topic modeling [ 74 ], an unsupervised machine learning technique that allows topics to emerge from a corpus of documents without the need to set predefined categories. LDA and similar unsupervised techniques have been successfully used to extract topics from OHCs in previous studies [ 18 , 48 , 75 ].\nLDA identifies groups of similar words (topics) in a corpus based on the statistical probability of these words occurring together. The model assigns, to each document, the likelihood of containing each topic based on the words it features.\nBefore training the LDA model, we cleaned posts and comments using the string processor included in the little-mallet-wrapper by Antoniak [ 76 ], which is designed to prepare raw text for topic modeling. The string processor splits strings into a series of tokens (words separated by punctuation or spaces), removes punctuation and common words, converts all characters to lower case, and returns the transformed string. After this initial cleaning, we removed any post and comment written by or responding to bots by searching for the strings “bot” and “torrent” in both the username and the text of the document. We also removed all documents that had <5 words.\nWe implemented the LDA function using the  tomotopy  Python package [ 77 ]. We experimented by running multiple models with different combinations of the following parameters: number of topics=10;15;20; and 25 and number of most frequently removed words=5;10;15; and 20. We also explored training the model with different document lengths. We first ran LDA on whole posts and comments, and we chunked these into paragraphs and sentences.\nTo evaluate the performance of each model, we read each topic’s top 100 documents by average probability and assigned a descriptive label to each topic based on the content of those documents and themes previously identified in qualitative research.\nFollowing this evaluation procedure, we found that the topic model trained with 25 topics on paragraph chunks best suited our purposes. We grouped topics into 5 overarching categories based on conceptual similarity and interconnectedness as follows:  symptoms ,  medications ,  health care ,  self-care , and  life issues . The detailed description and listing of the 5 categories, the 25 topics, and each topic’s top 10 keywords are shown in the Results section subsequently. We then compared our topics against themes identified in previous research on endometriosis.\nAs a complement to the unsupervised topics, we designed 2 supervised tasks as follows: the identification of people based on their social roles ( personas ) in posts and the identification of the goal ( intent ) of a post. Supervised machine learning allowed us to assign OHC-specific labels, including personas and intent, to all posts in our dataset.\nPersonas are types of people, organized by social roles, who often interact with a person with endometriosis. We identified discussions of personas in endometriosis OHC posts to better understand how endometriosis interfaces with interpersonal relationships. Specifically, we studied the following 4 most frequent personas mentioned in the endometriosis OHCs, based on a qualitative analysis of 200 posts:  medical professionals ,  partners ,  family members , and  endometriosis OHCs . Given the variety of terms that could represent each persona (eg, a gynecologist, a subcategory of  medical professional , could also be referred to as gyno, OB-GYN, gynecologist, obstetrician, doctor, doc, provider, or many others), instead of using a keyword search for each persona category, we trained a supervised model to identify personas based on hand-labeled examples.\nAmong the personas studied,  medical professional  was any type of professional in the health care system with a patient-facing role, such as a physician, gynecologist, nurse, etc. The  partner  persona included romantic partners, and  family  included mentions of family members (eg, parents, children, and siblings). Depending on paragraph context,  family  also encompassed  partners . The  endometriosis OHCs  label involved the  r/Endo  and  r/endometriosis  subreddit communities. Paragraphs that mentioned the subreddit did so by name, but they also included posts that spoke directly to the reader (eg, “Can you tell me if you’ve experienced this?”). The  endometriosis OHCs  label differed from the others, given that the endometriosis OHCs tended to be both the audience and subject matter of a post.\nIn a random sample of paragraphs from posts in the corpus, we assigned the paragraph a label for every present persona category. If there was no persona present, the paragraph did not receive a label. To assess interrater reliability, using the labeling scheme described earlier (alongside a codebook included in  Multimedia Appendix 2 ), 2 authors labeled 200 of the same randomly sampled paragraphs. We then evaluated interrater reliability using Cohen κ. After reaching sufficient interrater reliability (0.7) for all categories, for each persona, one author labeled paragraphs until reaching enough labeled data for acceptable classification performance.\nFor each persona category, we fine-tuned a pretrained DistilBERT model on the persona-annotated paragraphs to perform a binary classification task [ 78 ]. DistilBERT is an English-language large language model that can be fine-tuned on a given dataset to perform a specific task, such as supervised classification [ 79 ]. DistilBERT provides a lightweight version of the BERT encoder language model [ 79 ] that retains much of BERT’s performance, making it easier for other works to replicate our results and to use our trained models. For each persona category, we fine-tuned DistilBERT on paragraphs from the combined endometriosis OHCs to best predict the assigned categorical label. We kept all training hyperparameters consistent across models, using a learning rate of 5e-5, 50 warm-up steps, and a weight decay of 0.01 in 3 training epochs. As a baseline model, we also performed logistic regression on each persona category, with input texts in term frequency-inverse document frequency structure. We used each trained model to predict instances of personas in paragraphs in the rest of the corpus. A paragraph could receive more than one persona label, if it mentioned multiple distinct personas.\nPrevious research on support in OHCs has established multiple overarching categories of support, often characterized as either emotional or informational support [ 52 , 53 , 55 , 80 , 81 ]. OHC research takes these support categories and maps them onto behavioral features in the data, which suggest the type of support a person seeks or provides [ 51 ]. Our work specifically considered what users desired from the act of posting, which we call their intent, but we acknowledged that the intent of a post was unavailable to researchers without directly speaking to the person who shared a post. To develop a set of intent categories that are tailored to the endometriosis OHCs, we iteratively labeled, discussed, and revised our labels. We identified 4 common categories of intent as follows:  seeking informational support ,  seeking experiences ,  seeking emotional support , and  venting .\nSeeking informational support occurred when a person posts on the OHC to find medical information. We built upon previous definitions of seeking informational support [ 51 , 82 ] but incorporated a novel yet simple heuristic for labeling as follows: Could the post’s question be usefully posed to a physician? After revising the  seeking informational support  definition, we found major improvements in labeling consistency, speed, and interrater reliability. Adding this question also created an effective distinction between  seeking informational support  and  seeking experiences :\nMy gyno said there’s a chance I have endo, but that I can’t be diagnosed yet since I’m too young (21). Is that true? Is there some sort of test I should be pushing for? I had a doctor who refused to perform a pelvic exam because she said I couldn’t have digestive problems because of endo. I’m feeling skeptical and I don’t know how to advocate for myself.\nSeeking experiences was the inverse of seeking informational support, as posts that sought experiences could only be answered by someone exposed to the endometriosis experience or who has been on the receiving end of care. Posts that sought experiences asked the community about their experiences with a variety of medical procedures or their day-to-day experiences living with endometriosis. Some of these posts also asked if members of the community had experienced similar symptoms:\nDoes this sound like endo? How did you get your diagnosis? Did you go to a specialist? Any other advice is appreciated.\nSeeking emotional support included posts that asked for encouragement, empathy, validation, or help navigating emotional situations. These posts might have looked for emotional support after a negative experience, or they might have asked for a celebration from the community after a major milestone in care, such as a diagnosis, improvements in symptoms, or successful self-advocacy:\nI’m feeling really down and I can’t talk to my doctor. The only reason she agreed to do this was because of my mental illness. I’m so afraid that either outcome will break my heart. How do I live with the results?\nOur final label, venting, occurred when a person posted about their grievances living with endometriosis or frustration at a specific situation. We are not aware of similar labels in previous OHC research. Both communities supported the practice of venting or ranting and even had “flares” (tags) for posts that vent or rant:\nThis is a long post, but I’m feeling hopeless. I started dealing with things since around 12 years old and now I’m 26. This pain has lasted for weeks and I can’t do any of the physical activities that I love and I feel useless and everyone is dismissing me like a crazy person. I feel dismissed by today’s doctor, some woman on the phone, all the doctors I’ve ever dealt with since 12. Ugh sorry I know this is long but I needed to rant. Anyway thanks for listening to me talk it out.\nWe found that most of the posts began or ended by stating the person’s intent and their preferred form of support. Whenever possible, we chose the intent that aligned with a post’s explicitly stated purpose.\nUsing this codebook ( Multimedia Appendix 3 ), one author labeled 1500 sampled posts from  r/Endo  and  r/endometriosis  to be used as training data for our models. Each post could receive between 0 to all 4 intent category labels, though most of the posts had a primary, explicitly expressed intent. A second author labeled 200 of the same posts as those used for training the models to be used for measuring interrater reliability.\nWe fine-tuned a series of DistilBERT models on these hand-labeled data to perform binary classification to predict each intent category in a post. We then used the fine-tuned models to predict the intent of posts in the entire corpus. Because there was a binary model fine-tuned to perform classification for each intent category, a post could have 0 to 4 intent classes.\n\nLeveraging topic probabilities, we investigated which aspects of endometriosis experiences were discussed in the endometriosis OHCs. First, we considered what topics emerged from the endometriosis OHCs once we performed LDA topic modeling on paragraph chunks from posts and comments. Second, we analyzed which of those topics were discussed the most in posts.\nUsing LDA topic modeling, we found discussions of 5 main topic categories in the endometriosis OHCs as follows: symptoms, medications, health care, self-care practices, and life issues. A complete list of the 5 qualitatively assigned categories that encompassed our 25 computationally identified topics is provided subsequently. Although health care was the category with the largest number of topics, symptoms and life issues were also commonly discussed in these communities. We further identified a category of self-care practices.\nA major pattern in the 2 OHCs was the presence of topics related to symptoms ( Table 3 ). People with endometriosis experience a wide range of chronic symptoms such as  gastrointestinal  issues;  pelvic floor  pain; heavy, irregular, and painful  menstruation ;  muscular  cramps in their legs and abdomen; and others. Many users shared these symptoms with the communities in the hope of receiving or providing support.\nTopics in the symptoms category. Topic numbers were assigned randomly by the model, while labels were assigned upon reading the top 100 documents for each topic.\nDue to the chronic nature and current incurability of endometriosis, people with endometriosis use a variety of drugs and treatments. Users of the 2 OHCs often listed their  pain management  routine, shared  hormonal treatment experiences  (“18 months ago I started using the Nuva ring and I love it”), recounted the side effects of specific drugs they have used, or provided medical  information on hormonal drugs  (“Orlissa is a GnRH antagonist, so it lowers estrogen directly without relying on the same feedback mechanism as Lupron”). The  medications  category grouped these experiences ( Table 4 ).\nTopics in the medications category. Topic numbers were assigned randomly by the model, while labels were assigned upon reading the top 100 documents for each topic.\nIn the  health care  category, we grouped topics regarding the medical aspects of endometriosis and how patients with endometriosis experienced the health care system ( Table 5 ). Often, senior members of the OHCs provided new users with  medical information  on the condition, overviews on the process of  getting diagnosed , as well as  information on surgery . Users also advised each other on how to prepare for their  medical appointments . They often pointed to competent endometriosis  specialists , compared  insurance  policies, and highlighted helpful  online resources .\nTopics in the health care category. Topic numbers were assigned randomly by the model, while labels were assigned upon reading the top 100 documents for each topic.\nAs endometriosis requires a considerable amount of  self-care  ( Table 6 ), patients were faced with the challenge of caring for themselves while maintaining other responsibilities. Users of the OHCs found support against exhaustion and isolation by comparing experiences and tips about their  postsurgery recovery . They also provided detailed information on their  diet ,  product recommendations  for items that help with daily activities, and various  comfort items  for when symptoms flared up.\nTopics in the self-care category. Topic numbers were assigned randomly by the model, while labels were assigned upon reading the top 100 documents for each topic.\nThe last category,  life issues , included groups users’ discussions of general life issues connected with having a severe chronic condition ( Table 7 ). In these communities, users shared experiences of  dismissal and abuse  and their  medical stories  as patients. They gave each other support through their  fertility  struggles. Community members exchanged expressions of  gratitude  and  empathy  with their peers.\nTable 7. Topics in the life issues category. Topic numbers were assigned randomly by the model, while labels were assigned upon reading the top 100 documents for each topic.\nTo investigate which aspects of the experiences of patients with endometriosis were most discussed in the endometriosis OHCs, we measured which topics had the highest average probability in all posts. If a topic had a high average probability across all posts, it indicated that the topic was highly present in the endometriosis OHCs. In posts, the topics with the highest average probability were  medical stories ,  medical appointments ,  sharing symptoms ,  menstruation,  and  empathy  ( Table 8 ,  Figure 1 ).\nWe found that  medical stories  and  medical appointments  were the 2 most discussed topics. New or returning users frequently recounted their health care journey at the beginning of their posts—from their first symptoms as teenagers to undergoing surgery and choosing between treatment options. Users also asked specific questions, such as how to book their medical appointment, what to do if an appointment was moved or the physician did not show up, and what strategies others used to communicate successfully with their physicians.\nTwo  symptom  topics,  sharing symptoms  and  menstruation , were among the most frequent topics. Users of the endometriosis OHCs shared detailed accounts of their symptoms in order to elicit their peers’ opinions on whether they should seek urgent care, whether a new symptom might be caused by their treatment rather than by endometriosis, and whether what they were going through resembled other people’s experiences with endometriosis.\nTopics with the highest average probability in posts.\nAverage topic probabilities in posts collected from the 2 endometriosis online health communities ordered by category. Medical stories, medical appointments, sharing symptoms, menstruation, and empathy had the highest average probability.\nA large number of posts in the OHCs were solely dedicated to describing menstrual symptoms. New users of these communities were often undiagnosed teenagers who wondered whether they should seek medical assistance given their experiences with menstruation. Furthermore, endometriosis was typically treated with hormonal medicines, which caused additional changes to patients’ menstrual cycles. Patients shared such changes with peers to understand if the treatment had been effective at relieving their pain.\nEmpathy  was the fifth most present topic in posts of the 2 OHCs, underlining that demonstrations of empathy were valued by patients with endometriosis. Users often also lamented feeling misunderstood and dismissed.\nIn this section, we consider the needs expressed by members of the OHCs. For each of the topics outlined in RQ1, we considered the following: (1) Which topics are more likely when different personas are mentioned? and (2) What is the intent of posts when they mention each topic? By doing so, we could better understand the interplay between endometriosis experiences, interpersonal relationships, and the goals of OHC members.\nBefore analyzing personas in the OHCs, we validated the model by considering interrater reliability in our hand-labeled dataset and the classification performance of our model across the different classes and assessed the balance between false positives and negatives for each persona.\nFirst, as shown in  Table 9 , we reached sufficient interrater reliability for each of the persona classes. The lowest agreement occurred for paragraphs that were assigned the family label and the highest for paragraphs assigned to the medical professional label.\nNext, we considered classification performance for each of the fine-tuned persona models.\nClassification accuracy for a holdout test set of 25% of the total labeled paragraphs is listed in  Table 10 . For all classification results, we presented macroaveraged scores, which is a more pessimistic scoring method that treats both classes equally, regardless of class imbalance. All 4 fine-tuned models reached a high classification performance.\nIn addition to the high classification performance listed in  Table 10 , we considered the frequency of false positives and negatives for each class through confusion matrices ( Figure 2 ). Each confusion matrix compared the hand-labeled persona categories with the model-predicted categories for all annotated paragraphs. On the basis of these confusion matrices, we found that each model performed quite well, but false positives were more likely for physician, family, and partner labels and false negatives for the endometriosis OHC when the model was incorrect.\nNumber of paragraphs assigned to the persona labels of the total paragraphs labeled and interrater reliability.\nClassification performance for each persona category for both logistic regression and DistilBERT. All scores are reported as macroaverages.\nConfusion matrices for the 4 persona categories. Positive refers to the presence of the persona category in the paragraph. OHC: online health community.\nOf the 4 persona categories, posts on the endometriosis OHC most often mentioned the  endometriosis OHCs , followed by  medical professional ,  family , and  partners  ( Table 11 ).\nOf the posts predicted with at least 1 of the 4 personas, we found the topics that were most present. For each persona, we found the average topic probabilities for all posts predicted with each persona converted to  z  scores.  Figure 3  displays this result, depicting the members of the OHCs who were most likely to discuss when they mentioned each persona.\nWhen a  medical professional  was mentioned, posts were more likely about  medical appointments  and  medical stories , highlighting the important role that providers had in shaping patient medical pathways. However,  medical professional  was the least likely of any persona to be discussed in combination with  empathy  ( P <.001).\nInterestingly, posts in the  endometriosis OHCs  were more likely to discuss  medical appointments  than posts in  medical professional  ( P <.001). In alignment with our findings in RQ1, users of the OHC requested the assistance of the community to prepare for visits, as this support might not have been available to them in clinical settings.\nPosts that mentioned  partner  or  family  were likely to discuss topics from the  life issues  category, in particular  fertility  ( P <.001). These posts often emphasized how navigating fertility deeply affected relationships. Mentions of  family  in posts about  fertility  might have to do with family planning and personal goals in growing a family. Some users expressed concern about being able to have or keep a partner when dealing with infertility. These posts also mentioned feeling pressured to have children by family or partners.\nFinally, posts that mentioned  partner  often also discussed  postsurgery recovery  ( P <.001). Partners can indeed play an important role in helping patients with endometriosis access treatment and maintain self-care routines. In addition, it was sometimes the partner of a person with endometriosis who asked for advice from the OHC.\nPercentage of posts with more than one mention of each persona in the endometriosis online health communities (OHCs; n=34,715).\nAverage topic probabilities (converted to z scores) for posts with different personas. OHC: online health community.\nAs with our persona models, we first validated our intent models by assessing interrater reliability, classification performance, and the presence of false positives and negatives.\nWe reached acceptable interrater reliability across all categories ( Table 12 ). The interrater reliability for intent was quite similar to the scores for personas; highest agreement was reached for posts that sought experiences and the lowest agreement for posts that included venting.\nClassification accuracy for the fine-tuned intent models on a held-out test set of 25% of the total labeled paragraphs is listed in  Table 13 . Overall, the intent models reached acceptable performance, though this performance was lower than that of our persona models. This slightly lower performance was expected because of the more complex nature of the intent categories.\nWe then considered the balance between false positives and negatives for the intent models.  Figure 4  shows the confusion matrices comparing the ground-truth intent labels with the model predictions for data where the ground-truth intent was not none. We found that when the model was incorrect, false positives and negatives were nearly equally likely for venting and seeking emotional support intents, false positives were more likely for seeking experiences intent, and false negatives were more likely for seeking informational support intent.\nNumber of posts assigned to the intent labels and interrater reliability for each label (n=1500).\nClassification performance for each intent category for both logistic regression and DistilBERT. All scores are reported as macroaverages.\nConfusion matrices for the 4 intent categories. Positive refers to the presence of the intent category in the post.\nWe considered the goals of members of the community in their posts through our algorithmic intent predictions. Across all posts, we found that users were most likely to  seek experiences  from the OHC; they did so in roughly half of the posts.  Seeking informational support  occurred in around a quarter of the posts, and  seeking emotional support  and  venting  were the least common intent types ( Table 14 ).\nWe found answers to the following questions: (1) When a member sought information from the community, what were they trying to learn about? And (2) When a member simply wanted to vent, what subjects were most often related to their frustration?\nWe found an important divide between the subject matter of posts that  sought experiences  or  informational support  and those that  sought emotional support  or  vented  ( Figure 5 ). The subject matter of posts that  sought information  or  experiences  was more often about topics in the  symptoms ,  medications , and  health care  categories, while members were more likely to  seek emotional support  and  vent  about the  life issues  topics, including  dismissal ,  medical stories ,  fertility , and  empathy .\nHowever, members of the endometriosis OHCs did  seek emotional support , and  vented , about  pain management  and when  sharing symptoms . While a person with endometriosis might look for information or experiences regarding their symptoms and pain, they were more likely to look for emotional support from the community or to vent their frustrations.\nPercentage of posts with each intent label in the endometriosis online health communities (n=34,715).\nAverage topic probabilities (converted to z scores) for posts with different intents.\n\nUsing topic modeling we find that OHCs are spaces dedicated to narrations of users’ health care pathways, directions on how to find care and manage symptoms, as well as expressions of validation between peers regarding their health concerns.\nParticularly, the most discussed topics in the 2 communities are  medical stories ,  medical appointments ,  sharing symptoms ,  menstruation , and  empathy . These results extend previous findings from small-scale, qualitative research collecting experiences of patients with endometriosis. These include evidence of the benefits of sharing one’s story within a community [ 49 , 65 ], the need for assistance with treatment regimens and appointments [ 14 , 31 ], the uncertainty experienced by patients related to their symptomatology [ 13 ], as well as the value of receiving validation regarding health concerns and symptoms [ 8 ].\nAn existing study of a PCOS subreddit has also found concordance between the OHC user population and research-identified patient cohorts [ 17 ]. Although the PCOS OHC includes patients that are typically excluded from clinical trials (such as those with multiple conditions), trends found in laboratory test results posted to the community are consistent with clinically reported results.\nOur results also align with studies on OHCs, showing that OHC users become better at communicating with their providers and managing their conditions as well as feel less isolated [ 23 , 43 , 49 , 50 , 56 - 58 ].\nWe find that posts mention the endometriosis OHCs more than they mention medical professionals, highlighting the vital role that these groups play in the users’ health care decisions, and most of the posts are written to seek experiential advice. Venting is the least common of our intent categories, but venting still occurs in a substantial fraction (10%) of the posts.\nWe find that users need assistance with accessing and preparing for medical visits as well as navigating fertility options. To meet these needs, patients currently turn to the OHCs, their partners, and their family. Interestingly, members of the OHCs seldom associate medical professionals and providers with empathy.\nWe also find that patients’ relationships with their partners and family members can be affected by their condition. Users share how physical manifestations of endometriosis, such as infertility, alter their life goals and complicate personal relationships. At the same time, partners and family members play a vital role by serving as informal caregivers. They even use the OHCs for advice in creating a strong support system.\nFurthermore, while users seek experiential knowledge regarding treatments and health care processes, they also wish to vent and establish an emotional connection about the life-altering aspects of the condition.\nThese results align with previous research on the areas of endometriosis care that need improvement, including nonholistic treatments [ 1 - 3 ], unsatisfactory patient–health care provider communication [ 5 , 9 , 18 ], and the lack of training or educational resources for patients’ loved ones [ 1 , 7 - 9 ].\nThe potential implications for clinical practice of this work are significant. Physicians who treat endometriosis would benefit from knowing the depth and breadth of medical information shared on these forums. Patients belonging to such communities often have a higher baseline level of knowledge about endometriosis and will benefit from more nuanced discussion of symptoms and treatment options. In addition, if physicians are aware of the overall supportive nature of these communities, they may be able to recommend them to patients who feel isolated with their diagnosis.\nOn a wider policy level, this work highlights the need for better education about endometriosis in medical schools and obstetrics and gynecology residency training programs. With improved awareness of the condition, it is less likely that communications with patients would be perceived as dismissive. In addition, this work calls attention to the need for more physicians with expertise in endometriosis, as highlighted by the popularity of posts on finding endometriosis specialists.\nLimitations of the study include its sole focus on Reddit, which may exclude patients who prefer receiving support on other social media platforms or through offline groups. Our focus on 2 primarily English-language subreddits may not capture the experience of non–English-speaking users.\nIn addition, we cannot assume that every post describes endometriosis, as many users rely on the community for information before an official endometriosis diagnosis. While this provides us with valuable information about the prediagnosis experience, it is possible that users post, in some instances, about medical issues that are not endometriosis.\nThe study spans nearly a decade, but it does not explicitly analyze how trends in discussion topics or user needs evolve over time (eg, the period before the COVID-19 pandemic vs the period after the COVID-19 pandemic). Future work could perform temporal analyses designed to detect how external factors, such as a pandemic or major developments in endometriosis research, change discussion topics and user needs in the community.\nIn addition, while we learn vital information about the endometriosis experience, there is a more complex picture of how endometriosis is impacted by social conditions. Future work could analyze how social determinants (eg, employment status and cultural stigma) influence the endometriosis experience.\nIn this study, we conducted a large-scale analysis of user needs in 2 endometriosis OHCs,  r/Endo  and  r/endometriosis . We found that these communities provide members with a space where they can discuss care pathways, learn to manage symptoms, and receive validation. Our results point to the need for greater empathy within clinical settings, easier access to appointments, more information on health care processes, and further support for patients’ loved ones.\nOur study demonstrates the value of quantitative analyses of OHCs. OHCs provide very large datasets on patient experiences. In this work, we analyzed hundreds of thousands of posts and comments by tens of thousands of users. To the best of our knowledge, the sample size of this study is one order of magnitude larger than the sample size of any other study on the needs and experiences of patients with endometriosis. Therefore, our results reinforce and extend findings from small-scale studies about patient experiences and provide insights into hard-to-reach groups. Finally, we believe that studies of OHCs can help design interventions to improve care, as argued in previous studies [ 15 , 17 , 18 , 45 ].","source_license":"CC-BY-4.0","license_restricted":false}