Valid Inference Using Language Model Predictions from Verbal Autopsies
In contexts where most deaths occur outside healthcare systems, verbal autopsies (VAs)—interviews with decedents’ caregivers—help predict causes of death (COD). However, making VA data actionable requires both accurate COD prediction from interviews and valid inference using these predictions. In this paper, we introduce multiPPI++, a method extending “prediction-powered inference” for multinomial classification. Leveraging advanced NLP techniques, multiPPI++ ensures accurate COD inference across various predictors, from GPT-4-32k to simpler models like KNN. Our findings emphasize the value of high-quality labeled data for reliable public health insights, regardless of the NLP model used for prediction.
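To make the idea behind prediction-powered inference concrete, the following is a minimal sketch of a PPI-style point estimate of cause-of-death proportions. It is not the multiPPI++ method itself, whose details the abstract does not give: it only shows the generic rectifier that corrects the predicted class shares on unlabeled data using a small labeled sample. The function names, the `lam` tuning weight, and the toy data are assumptions made for illustration, and confidence intervals are omitted.

```python
from collections import Counter


def proportions(labels, classes):
    """Share of each class in a list of labels."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts.get(c, 0) / n for c in classes}


def ppi_proportions(pred_unlabeled, pred_labeled, true_labeled, classes, lam=1.0):
    """PPI-style estimate of class proportions (illustrative sketch).

    estimate = true share on labeled data
               + lam * (predicted share on unlabeled - predicted share on labeled)

    lam = 0 ignores the predictions entirely; lam = 1 applies the full correction.
    The result is not clipped, so an entry can fall slightly outside [0, 1].
    """
    p_unl = proportions(pred_unlabeled, classes)
    p_lab = proportions(pred_labeled, classes)
    y_lab = proportions(true_labeled, classes)
    return {c: y_lab[c] + lam * (p_unl[c] - p_lab[c]) for c in classes}


# Hypothetical toy data: the predictor over-assigns cause "A" on the labeled set.
classes = ["A", "B"]
true_labeled = ["A", "A", "B", "B"]
pred_labeled = ["A", "A", "A", "B"]
pred_unlabeled = ["A"] * 6 + ["B"] * 4

estimate = ppi_proportions(pred_unlabeled, pred_labeled, true_labeled, classes)
print(estimate)
```

In this toy case the raw predicted share of "A" on the unlabeled data is 0.6, but the labeled sample shows the predictor overstates "A" by 0.25, so the corrected estimate is about 0.35. This is the sense in which labeled data matter regardless of how good the predictor is.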
Bayesian Active Questionnaire Design for Cause-of-Death Assignment Using Verbal Autopsies
Globally, only about one-third of deaths have a medically certified cause, posing challenges for understanding deaths outside medical facilities. Verbal autopsy (VA) is commonly used to collect cause-of-death data through a structured questionnaire given to relatives of the deceased. However, the length of the VA questionnaire makes it costly and time-consuming to administer, limiting its scalability. This research proposes a new active questionnaire design that dynamically optimizes question order, enabling accurate cause-of-death assignments with fewer questions. Our approach is fully Bayesian and compatible with any probabilistic cause-of-death method, incorporating an early stopping criterion to manage model uncertainty and a penalized score to respect existing question structures. Testing on both synthetic and real data demonstrates that this strategy achieves accurate cause-of-death results more efficiently than traditional VA surveys.
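As a rough illustration of adaptive question ordering with early stopping, the sketch below greedily asks the yes/no question with the lowest expected posterior entropy and stops once one cause is sufficiently probable. Everything specific here is an assumption and not the paper's procedure: the naive Bayes response model, the entropy-based selection rule, the 0.9 stopping threshold, and the toy causes and symptoms are all hypothetical, and the penalized score for respecting question structure is not modeled.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a distribution given as {cause: prob}."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)


def update(posterior, p_yes, question, answer):
    """Bayes update of the cause posterior after one yes/no answer.

    Assumes every p_yes value lies strictly between 0 and 1, so the
    normalizing constant is never zero.
    """
    unnorm = {}
    for cause, prob in posterior.items():
        like = p_yes[cause][question] if answer else 1.0 - p_yes[cause][question]
        unnorm[cause] = prob * like
    z = sum(unnorm.values())
    return {cause: v / z for cause, v in unnorm.items()}


def expected_entropy(posterior, p_yes, question):
    """Posterior entropy averaged over the two possible answers."""
    prob_yes = sum(prob * p_yes[c][question] for c, prob in posterior.items())
    total = 0.0
    for answer, p_answer in ((True, prob_yes), (False, 1.0 - prob_yes)):
        if p_answer > 0:
            total += p_answer * entropy(update(posterior, p_yes, question, answer))
    return total


def run_interview(prior, p_yes, answer_fn, threshold=0.9):
    """Ask the most informative remaining question until one cause is confident."""
    posterior = dict(prior)
    remaining = set(next(iter(p_yes.values())))
    asked = []
    while remaining and max(posterior.values()) < threshold:
        # sorted() makes tie-breaking deterministic
        q = min(sorted(remaining), key=lambda x: expected_entropy(posterior, p_yes, x))
        posterior = update(posterior, p_yes, q, answer_fn(q))
        remaining.remove(q)
        asked.append(q)
    return posterior, asked


# Hypothetical toy model: P(answer is "yes" | cause) for three questions.
p_yes = {
    "malaria": {"fever": 0.9, "accident": 0.05, "cough": 0.5},
    "injury": {"fever": 0.1, "accident": 0.9, "cough": 0.5},
}
prior = {"malaria": 0.5, "injury": 0.5}
answers = {"fever": False, "accident": True, "cough": False}  # an injury death

posterior, asked = run_interview(prior, p_yes, answers.get)
print(asked, posterior)
```

In this toy setup the uninformative "cough" question is never asked, and the interview ends after a single question because the posterior for one cause already exceeds the threshold.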
Knots and their effect on the tensile strength of lumber: a case study
When assessing the strength of sawn lumber for use in engineering applications, the sizes and locations of knots are an important consideration. Knots, which result from the growth of tree branches, are the most common visual characteristics of lumber. Large individual knots, as well as clusters of distinct knots, are known to have strength-reducing effects. However, industry grading rules that govern knots are informed by subjective judgment to some extent, particularly with regard to the spatial interaction of knots and their relationship with lumber strength. This case study reports the results of an experiment that investigated and modeled the strength-reducing effects of knots on a sample of Douglas Fir lumber. Experimental data were obtained by taking scans of lumber surfaces and applying tensile strength testing. The modeling approach presented incorporates all relevant knot information in a Bayesian framework, thereby contributing a more refined way of managing the quality of manufactured lumber.
Ellipse Detection and Localization with Applications to Knots in Sawn Lumber Images
While general object detection has seen tremendous progress, localization of elliptical objects has received little attention in the literature. Our motivating application is the detection of knots in sawn timber images, which is an important problem since the number and types of knots are visual characteristics that adversely affect the quality of sawn timber. We demonstrate how models can be tailored to the elliptical shape and thereby improve on general purpose detectors; more generally, elliptical defects are common in industrial production, such as enclosed air bubbles when casting glass or plastic. In this paper, we adapt the Faster R-CNN with its Region Proposal Network (RPN) to model elliptical objects with a Gaussian function, and extend the existing Gaussian Proposal Network (GPN) architecture by adding the region-of-interest pooling and regression branches, as well as using the Wasserstein distance as the loss function to predict the precise locations of elliptical objects. Our proposed method has promising results on the lumber knot dataset: knots are detected with an average intersection over union of 73.05%, compared to 63.63% for general purpose detectors. Specific to the lumber application, we also propose an algorithm to correct any misalignment in the raw timber images during scanning, and contribute the first open-source lumber knot dataset by labeling the elliptical knots in the preprocessed images.
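To illustrate how a Wasserstein distance can compare two elliptical objects once each is modeled as a Gaussian, the sketch below uses the standard closed form for the squared 2-Wasserstein distance between two 2D Gaussians. The mapping from ellipse to Gaussian (semi-axes treated as standard deviations along the rotated axes) and the function names are assumptions for illustration; the abstract does not specify the paper's exact parameterization or any loss normalization.

```python
import math


def ellipse_to_gaussian(cx, cy, a, b, theta):
    """Map an ellipse to a 2D Gaussian (assumed convention).

    Mean is the center; covariance is R diag(a^2, b^2) R^T, where R rotates
    by theta. Returns (mean, (sxx, sxy, syy)).
    """
    c, s = math.cos(theta), math.sin(theta)
    sxx = a * a * c * c + b * b * s * s
    syy = a * a * s * s + b * b * c * c
    sxy = (a * a - b * b) * s * c
    return (cx, cy), (sxx, sxy, syy)


def wasserstein2_squared(g1, g2):
    """Squared 2-Wasserstein distance between two 2D Gaussians.

    W2^2 = |m1 - m2|^2 + tr(S1) + tr(S2) - 2 tr((S2^(1/2) S1 S2^(1/2))^(1/2)).
    For 2x2 positive semidefinite matrices the last trace equals
    sqrt(tr(S1 S2) + 2 sqrt(det(S1) det(S2))), so no matrix square root is needed.
    """
    (m1, (a1, b1, d1)), (m2, (a2, b2, d2)) = g1, g2
    mean_term = (m1[0] - m2[0]) ** 2 + (m1[1] - m2[1]) ** 2
    tr_prod = a1 * a2 + 2 * b1 * b2 + d1 * d2  # tr(S1 S2) for symmetric matrices
    det1 = a1 * d1 - b1 * b1
    det2 = a2 * d2 - b2 * b2
    # max(..., 0.0) guards against tiny negative values from rounding
    cross = math.sqrt(max(tr_prod + 2 * math.sqrt(max(det1 * det2, 0.0)), 0.0))
    return mean_term + (a1 + d1) + (a2 + d2) - 2 * cross


# Hypothetical knots: a predicted ellipse and a ground-truth ellipse.
predicted = ellipse_to_gaussian(10.0, 20.0, 4.0, 2.0, 0.3)
target = ellipse_to_gaussian(11.0, 20.5, 4.5, 2.0, 0.2)
print(wasserstein2_squared(predicted, target))
```

Unlike intersection over union, this distance stays informative when two ellipses do not overlap at all, since it still grows with the offset between their centers.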